Rfuzzydid: Fuzzy Difference-in-Differences

Title

fuzzydid — Estimation with Fuzzy Difference-in-Difference Designs

Installation

Install the development version from GitHub with:

install.packages("remotes")
remotes::install_github("kmfrick/Rfuzzydid")

Full documentation and worked examples are available at https://kmfrick.github.io/Rfuzzydid/.

Syntax

fuzzydid(
  data,
  formula,
  group,
  time,
  group_forward = NULL,
  did = FALSE,
  tc = FALSE,
  cic = FALSE,
  lqte = FALSE,
  newcateg = NULL,
  numerator = FALSE,
  partial = FALSE,
  nose = FALSE,
  cluster = NULL,
  breps = 50,
  eqtest = FALSE,
  modelx = NULL,
  sieves = FALSE,
  sieveorder = NULL,
  tagobs = FALSE,
  backend = c("auto", "native"),
  seed = NULL,
  treatment = NULL
)

Description

fuzzydid() computes estimators of local average and quantile treatment effects in fuzzy DID designs, following de Chaisemartin and D’Haultfoeuille (2018a). It also computes their standard errors and confidence intervals.

Rfuzzydid is an R port of the Stata fuzzydid package. It aims for feature parity with the Stata package while exposing the estimators through a formula-first R interface.

Lifecycle and Prior Art

Rfuzzydid is a maturing R implementation of the estimators introduced by de Chaisemartin and D’Haultfoeuille (2018a) and implemented for Stata by de Chaisemartin, D’Haultfoeuille, and Guyonvarch (2018b). New development is focused on native R parity, input validation, and review-ready documentation rather than adding estimators beyond those references.

Arguments:

A detailed introduction to the methodology is given in de Chaisemartin et al. (2018b; doi:10.1177/1536867X19854019).

y, d, group, time, and group_forward must be numeric vectors. Numeric covariates are treated as continuous; factor, character, and logical covariates are treated as qualitative predictors. NA and NaN values are removed by complete-case filtering over all analysis variables. Inf and -Inf are rejected. Use tagobs = TRUE to recover the retained-row mask.
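The filtering rules above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the helper `filter_analysis_rows()` and the toy data frame are hypothetical, and the `retained` element only imitates the kind of row mask that `tagobs = TRUE` is described as returning.

```r
# Hypothetical helper: mimics the documented rules, not Rfuzzydid internals.
filter_analysis_rows <- function(data, vars) {
  x <- data[vars]
  # Inf and -Inf are treated as errors rather than as missing data.
  has_inf <- vapply(
    x,
    function(col) is.numeric(col) && any(is.infinite(col)),
    logical(1)
  )
  if (any(has_inf)) {
    stop("Infinite values found in: ", paste(vars[has_inf], collapse = ", "))
  }
  # complete.cases() is FALSE for any row with NA or NaN in an analysis variable.
  keep <- complete.cases(x)
  list(data = data[keep, , drop = FALSE], retained = keep)
}

toy <- data.frame(
  y = c(1.2, NA, 0.7, NaN, 2.1),
  d = c(0, 1, 1, 0, 1),
  g = c(0, 0, 1, 1, 1),
  t = c(0, 1, 0, 1, 1)
)

res <- filter_analysis_rows(toy, c("y", "d", "g", "t"))
res$retained    # logical mask over the original rows
nrow(res$data)  # number of rows kept for analysis
```

Rows 2 and 4 are dropped because `y` is missing there; replacing any value with `Inf` makes the helper stop with an error instead.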

Options

Estimators:

At least one of did, tc, cic, or lqte must be specified. If several are specified, all requested estimators are computed.
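For intuition about the quantity behind `did = TRUE`, assume it corresponds to the Wald-DID of de Chaisemartin and D'Haultfoeuille (2018a), as in the Stata command: the DID of the outcome divided by the DID of the treatment. The sketch below computes that ratio by hand in base R. The helpers `wald_did()` and `make_cell()` and the simulated data are hypothetical, and the code does not reproduce the package's covariate handling or inference.

```r
# Hypothetical helper: Wald-DID = (DID of y) / (DID of d) from four cell means.
# Expects columns y (outcome), d (treatment), g (group), and t (period).
wald_did <- function(data) {
  cell_mean <- function(v, g, t) mean(data[[v]][data$g == g & data$t == t])
  did_of <- function(v) {
    (cell_mean(v, 1, 1) - cell_mean(v, 1, 0)) -
      (cell_mean(v, 0, 1) - cell_mean(v, 0, 0))
  }
  did_of("y") / did_of("d")
}

# Toy data: the treatment rate rises in group 1 and stays flat in group 0.
# The outcome has a period shift (0.5), a group shift (0.7), and an effect of 1.8.
set.seed(1)
make_cell <- function(n, g, t, p) {
  d <- rbinom(n, 1, p)
  data.frame(y = rnorm(n, mean = 1 + 0.5 * t + 0.7 * g + 1.8 * d),
             g = g, t = t, d = d)
}
sim <- rbind(
  make_cell(5000, 0, 0, 0.2), make_cell(5000, 0, 1, 0.2),
  make_cell(5000, 1, 0, 0.3), make_cell(5000, 1, 1, 0.7)
)

wald_did(sim)  # should land near the simulated effect of 1.8
```

The ratio is only meaningful when the treatment rate evolves differently in the two groups; if the denominator is close to zero the estimate becomes unstable.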

Treatment categorization:

Numerators and bounds:

Inference:

Covariates:

When covariates are included and neither modelx nor sieves is specified, all conditional expectations are estimated by OLS by default.
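As a rough illustration of what estimating a conditional expectation by OLS means, the sketch below fits E[y | x] inside a single group-by-period cell with `lm()` and evaluates it at chosen covariate values. The variable names and simulated data are hypothetical, and how `fuzzydid()` combines such fits internally is not shown here.

```r
set.seed(42)
n <- 500

# One hypothetical group-by-period cell with a single continuous covariate x.
cell <- data.frame(x = rnorm(n))
cell$y <- 2 + 0.5 * cell$x + rnorm(n)

# OLS fit of the conditional mean E[y | x] within the cell.
fit_ols <- lm(y ~ x, data = cell)

# Predicted conditional means at selected covariate values.
new_x <- data.frame(x = c(-1, 0, 1))
cond_mean <- predict(fit_ols, newdata = new_x)
cond_mean
```

A linear fit like this is only as good as the linearity assumption; that is the usual motivation for more flexible alternatives such as the `modelx` and `sieves` options listed in the syntax.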

Other:

Extractors

fuzzydid objects support print(), summary(), coef(), confint(), nobs(), formula(), vcov(), plot(), generics::tidy(), and generics::glance(). They do not implement predict(), fitted(), or residuals() because the object summarizes causal estimands rather than observation-level fitted outcomes.

Returned Values

An object of class "fuzzydid" containing:

Data frames:

Matrices (Stata-parity):

Counts:

Examples

Generate the dataset


# Generate simulated data (saved to CSV for R/Stata parity verification)
set.seed(50321)
n_cell <- 80

# Draw treatment d first so that each unit's outcome y depends on its own d
# (effect 1.8); `base` carries the group and period shifts.
make_cell <- function(g, t, base, p) {
  d <- rbinom(n_cell, 1, p)
  data.frame(y = rnorm(n_cell, base + 1.8 * d), g = g, t = t, d = d)
}

df <- rbind(
  make_cell(g = 0, t = 0, base = 1, p = 0.20),
  make_cell(g = 0, t = 1, base = 1 + 0.5, p = 0.35),
  make_cell(g = 1, t = 0, base = 1 + 0.7, p = 0.30),
  make_cell(g = 1, t = 1, base = 1 + 0.7 + 0.5, p = 0.70)
)

# Save for Stata comparison
write.csv(df, "fuzzydid_example.csv", row.names = FALSE)

R

library(Rfuzzydid)
df <- read.csv("fuzzydid_example.csv")

fit <- fuzzydid(
  data = df,
  formula = y ~ d,
  group = "g",
  time = "t",
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  breps = 50
)

summary(fit)

Stata

import delimited "fuzzydid_example.csv", clear

fuzzydid y g t d, did tc cic breps(50)

Note: The Stata command is shown for parity/reference only. Rfuzzydid does not bundle the Stata fuzzydid sources, so Stata users need that command installed separately in their Stata environment.

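Point estimates from R and Stata are identical for the covered parity fixtures. Bootstrap standard errors and confidence intervals can differ because the two platforms use different random number generators and therefore draw different bootstrap samples. Such differences reflect bootstrap simulation noise only, not a difference in the estimators.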
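The contrast between deterministic point estimates and seed-dependent bootstrap intervals can be seen in a small base R sketch. It resamples rows within each group-by-period cell and takes percentile intervals of a hand-computed Wald-DID ratio. The helpers `wald_did()`, `make_cell()`, and `boot_ci()` are hypothetical, and the resampling scheme is an assumption, not necessarily the one used by Rfuzzydid or by the Stata command.

```r
# Hypothetical helper: Wald-DID = (DID of y) / (DID of d) from four cell means.
wald_did <- function(data) {
  cell_mean <- function(v, g, t) mean(data[[v]][data$g == g & data$t == t])
  did_of <- function(v) {
    (cell_mean(v, 1, 1) - cell_mean(v, 1, 0)) -
      (cell_mean(v, 0, 1) - cell_mean(v, 0, 0))
  }
  did_of("y") / did_of("d")
}

# Hypothetical percentile bootstrap, resampling rows within each (g, t) cell.
boot_ci <- function(data, breps = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cells <- split(seq_len(nrow(data)), list(data$g, data$t))
  stats <- replicate(breps, {
    idx <- unlist(
      lapply(cells, function(i) i[sample.int(length(i), replace = TRUE)]),
      use.names = FALSE
    )
    wald_did(data[idx, ])
  })
  quantile(stats, c(0.025, 0.975), names = FALSE)
}

set.seed(50321)
make_cell <- function(n, g, t, p) {
  d <- rbinom(n, 1, p)
  data.frame(y = rnorm(n, mean = 1 + 0.5 * t + 0.7 * g + 1.8 * d),
             g = g, t = t, d = d)
}
sim <- rbind(
  make_cell(500, 0, 0, 0.20), make_cell(500, 0, 1, 0.35),
  make_cell(500, 1, 0, 0.30), make_cell(500, 1, 1, 0.70)
)

point <- wald_did(sim)                       # no randomness involved
ci_a  <- boot_ci(sim, breps = 50, seed = 1)
ci_b  <- boot_ci(sim, breps = 50, seed = 2)  # different draws, different interval
ci_a2 <- boot_ci(sim, breps = 50, seed = 1)  # same seed reproduces ci_a

identical(ci_a, ci_a2)
```

Fixing the seed makes the intervals reproducible within R, but it cannot make them match a platform that uses a different generator; raising `breps` reduces the simulation noise on both sides.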

References

License

AGPL-3.0