
Korean National Assembly data for political science education.
Documentation: https://kyusik-yang.github.io/assemblykor/
The goal of assemblykor is to provide a curated
collection of Korean National Assembly datasets for teaching
quantitative methods in political science. Think of it as a Korean
politics counterpart to palmerpenguins.
The package includes seven built-in datasets covering legislators, bills, asset declarations, policy seminars, committee speeches, plenary votes, and roll call records, all drawn from public data of the Korean National Assembly (2000-2026).
This package and all its tutorials follow tidyverse conventions. We use
dplyr, ggplot2, tidyr, and the
pipe operator (%>%) throughout.
Why tidyverse-first for teaching?
Note: This package is designed for educational purposes only. The datasets have been processed and curated for classroom use and may not reflect the most up-to-date or complete records. For research or policy analysis, please verify against the original data sources listed in the Data sources section below.
|
20-22대 국회의원 메타데이터 (이름, 정당, 선거구, 성별, 선수, 발의 건수) |
법안 메타데이터 (제목, 위원회, 발의일, 처리 결과, 대표발의자) |
|
의원 재산신고 패널 (순자산, 부동산, 예금, 주식, 13개 시점) |
정책세미나 활동 패널 (개최 건수, 초당적 비율, 여당 여부) |
|
22대 과학기술정보방송통신위원회 상임위 회의록 전수 (발언 전문, 화자 역할 포함) |
본회의 표결 결과 (20-22대, 찬성/반대/기권 수, 의결 결과) |
|
22대 개별 의원 표결 기록 (의원별 찬성/반대/기권/불참, 1,233개 법안) |
|
# install.packages("remotes")
remotes::install_github("kyusik-yang/assemblykor")library(assemblykor)
data(legislators)
data(bills)
data(wealth)
data(seminars)
data(speeches)
data(votes)
data(roll_calls)library(dplyr)
legislators %>%
filter(assembly == 22) %>%
count(party, gender) %>%
filter(n >= 3)library(ggplot2)
ggplot(wealth, aes(x = net_worth / 1e6)) +
geom_histogram(bins = 50, fill = "steelblue") +
labs(x = "Net worth (billion KRW)", y = "Count",
title = "Distribution of legislator wealth")bills %>%
count(result, sort = TRUE) %>%
head(5)
#> result n
#> 1 임기만료폐기 30678
#> 2 대안반영폐기 13692
#> 3 수정가결 2222
#> 4 원안가결 1048
#> 5 철회 578ggplot2 plots with Korean text (axis labels, titles) may show broken characters. Run this once per session to fix it:
set_ko_font()
#> Korean font set: Apple SD Gothic NeoThis auto-detects a Korean font on your system (macOS, Windows,
Linux). The interactive tutorials (run_tutorial()) handle
this automatically.
Larger datasets are available via download functions (requires the
arrow package):
# Bill propose-reason texts (60,925 texts, ~40 MB download)
texts <- get_bill_texts()
# Co-sponsorship records (769,773 rows, ~25 MB download)
proposers <- get_proposers()The package includes nine Korean-language tutorials designed for classroom use in political science methods courses:
| # | Tutorial | Topic | Level |
|---|---|---|---|
| 1 | R 기초와 tidyverse | dplyr 핵심 함수, 파이프, 데이터 결합 |
Beginner |
| 2 | ggplot2 시각화 | 막대, 히스토그램, 산점도, 박스플롯, facet | Beginner |
| 3 | 회귀분석 | OLS, 다중회귀, 로그 변환, 상호작용, 계수 시각화 | Intermediate |
| 4 | 패널 데이터와 고정효과 | 합동 OLS vs FE, 양방향 FE, DiD, 군집 표준오차 | Intermediate |
| 5 | 텍스트 분석 입문 | 키워드 빈도, TF-IDF, 발언 분석, 위원회별 비교 | Intermediate |
| 6 | 네트워크 분석 | 공동발의 네트워크, 중심성, 커뮤니티 탐지, 초당적 분석 | Advanced |
| 7 | 기명투표 분석 | Rice Index, 이탈투표, 투표 히트맵, 정당 응집력 | Advanced |
| 8 | 법안 가결 요인 | 이항 변수, 로지스틱 회귀, 승산비, 예측 확률 | Intermediate |
| 9 | 발언 패턴 분석 | 발언 빈도/길이, 발언 순서, 의제 키워드, 다양성 | Advanced |
Each tutorial is available in two formats:
Option A: Interactive browser (recommended for self-study)
# Launch an interactive tutorial with exercises in the browser
run_tutorial(1) # or run_tutorial("01-tidyverse-basics")Requires the learnr package
(install.packages("learnr")). Students can type and run
code directly in the browser with hints and solutions.
Option B: Plain R Markdown (for editing in RStudio)
# List available tutorials
list_tutorials()
# Copy an Rmd file to your working directory
open_tutorial(1) # or open_tutorial("01-tidyverse-basics")Students can edit and knit the Rmd file in RStudio at their own pace.
All datasets share the member_id and/or
assembly columns for easy joining:
# Merge legislators with wealth data
leg_wealth <- legislators %>%
inner_join(wealth, by = "member_id", relationship = "many-to-many")CSV versions of the smaller datasets are available for teaching file I/O:
# Find the file path
path_to_file("legislators.csv")
# Read directly
legislators_csv <- read.csv(path_to_file("legislators.csv"), fileEncoding = "UTF-8")All data in this package are derived from publicly available sources.
| Dataset | Source | License |
|---|---|---|
| Legislators, bills, proposers, votes, roll calls | Open National Assembly Information API (open.assembly.go.kr) | Public domain (Korean government open data) |
| Speeches (committee minutes) | Open National Assembly Information API (open.assembly.go.kr) | Public domain (Korean government open data) |
| Asset declarations | OpenWatch | CC BY-SA 4.0 |
| Policy seminars | National Assembly Seminar Database | Public data |
Inspired by: palmerpenguins (Horst, Hill & Gorman, 2020), which demonstrates how domain-specific datasets can make methods teaching more engaging.
This package is intended for educational use only (teaching quantitative methods in political science courses). The datasets have been processed, filtered, and in some cases sampled for classroom convenience. They should not be treated as authoritative records.
For research or policy analysis, please consult the original data sources listed above and verify the data independently.
citation("assemblykor")Yang, Kyusik (2026). assemblykor: Korean National Assembly Data for
Political Science Education. R package version 0.1.1.
https://github.com/kyusik-yang/assemblykor
MIT (package code). Data licenses as noted above.