This R package provides a big-data-friendly and memory-efficient difference-in-differences (DiD) estimator for staggered (and non-staggered) treatment contexts. It supports controlling for time-varying covariates, heteroskedasticity-robust standard errors, and (single and multi-way) clustered standard errors. It addresses 4 issues that arise in the context of large administrative datasets:
1. DiDforBigData provides estimation and inference for staggered DiD with millions of observations on a personal laptop. It is orders of magnitude faster than other available software if the sample size is large; see the demonstration here.
2. DiDforBigData helps by using much less memory than other software; see the demonstration here.
3. It depends on data.table for big data management and sandwich for robust standard error estimation, which are already installed with most R distributions. Optionally, it will use the fixest package to speed up the estimation if it is installed. If the progress package is installed, it will also provide a progress bar so you know how much longer the estimation will take.
4. DiDforBigData makes parallelization easy as long as the parallel package is installed.
To install the package from CRAN:
install.packages("DiDforBigData")
To install the package from GitHub:
devtools::install_github("setzler/DiDforBigData")
To use the package after it is installed:
library(DiDforBigData)
It is recommended to also make sure these optional packages have been installed:
library("progress")
library("fixest")
library("parallel")
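Calling library() on a package that is not installed stops with an error, so it can help to check first which of the optional packages are available. The following is a minimal sketch using base R's requireNamespace(); the variable names are illustrative and not part of DiDforBigData.

```r
# Optional packages named in this README.
optional_pkgs <- c("progress", "fixest", "parallel")

# requireNamespace() returns TRUE/FALSE instead of raising an error.
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- optional_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
} else {
  message("All optional packages are available.")
}
```

Any package reported as missing can then be installed with install.packages() before running the estimation.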
There are only 3 functions in this package:
1. SimDiD(): This function simulates data.
2. DiDge(): This function estimates DiD for a single cohort and a single event time.
3. DiD(): This function estimates DiD for all available cohorts and event times.
Details for each function are available from the Function Documentation.
Before estimation, set up a variable list with the names of your variables:
varnames = list()
varnames$time_name = "year"
varnames$outcome_name = "Y"
varnames$cohort_name = "cohort"
varnames$id_name = "id"
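A misspelled column name in the variable list is easy to miss, so it may be worth confirming that every entry matches a column in the data before estimating. The sketch below uses a hypothetical helper, check_varnames(), which is not part of DiDforBigData. It checks only the four entries shown above, and the toy data frame is made up for illustration.

```r
# Hypothetical helper (not provided by DiDforBigData): verifies that the four
# varnames entries exist and that each one names a column in the data.
check_varnames <- function(data, varnames) {
  required <- c("time_name", "outcome_name", "cohort_name", "id_name")
  missing_entries <- setdiff(required, names(varnames))
  if (length(missing_entries) > 0) {
    stop("varnames is missing: ", paste(missing_entries, collapse = ", "))
  }
  columns <- unlist(varnames[required])
  missing_columns <- setdiff(columns, names(data))
  if (length(missing_columns) > 0) {
    stop("columns not found in data: ", paste(missing_columns, collapse = ", "))
  }
  invisible(TRUE)
}

varnames <- list(time_name = "year", outcome_name = "Y",
                 cohort_name = "cohort", id_name = "id")

# Made-up data with the expected columns.
yourdata <- data.frame(id = 1:2, year = c(2009, 2010),
                       cohort = c(2010, 2010), Y = c(0.5, 1.2))

check_varnames(yourdata, varnames)  # passes silently
```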
To estimate DiD for a single cohort and event time, use the DiDge command. For example:
DiDge(inputdata = yourdata, varnames = varnames,
cohort_time = 2010, event_postperiod = 3)
A detailed manual explaining the various features available in DiDge is available here or by running this command in R:
?DiDge
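To make the idea of a DiD for a single cohort and event time concrete, the following base-R sketch computes a textbook two-group, two-period comparison by hand on made-up data. It assumes the reference period is the year before treatment and that the control group is the never-treated units. Both are assumptions made for illustration, and the sketch does not describe how DiDge is implemented.

```r
# Made-up panel: units 1-2 are in the 2010 cohort, units 3-4 are never
# treated (cohort = Inf). Each unit is observed in 2009 and 2013.
toy <- data.frame(
  id     = rep(1:4, each = 2),
  year   = rep(c(2009, 2013), times = 4),
  cohort = rep(c(2010, 2010, Inf, Inf), each = 2),
  Y      = c(1, 5,  2, 6,  1, 2,  3, 4)
)

cohort_time      <- 2010
event_postperiod <- 3
base_year <- cohort_time - 1                 # assumed reference period
post_year <- cohort_time + event_postperiod  # 2013

# Change in the mean outcome between the reference year and the post year.
mean_change <- function(d) {
  mean(d$Y[d$year == post_year]) - mean(d$Y[d$year == base_year])
}

treated_change <- mean_change(toy[toy$cohort == cohort_time, ])  # 5.5 - 1.5 = 4
control_change <- mean_change(toy[is.infinite(toy$cohort), ])    # 3 - 2 = 1
did_estimate   <- treated_change - control_change                # 4 - 1 = 3
print(did_estimate)
```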
To estimate DiD for many cohorts and event times, use the DiD command. For example:
DiD(inputdata = yourdata, varnames = varnames,
min_event = -3, max_event = 5)
A detailed manual explaining the various features available in DiD is available here or by running this command in R:
?DiD
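The pieces above can be combined into one end-to-end sketch on simulated data. It assumes that SimDiD() accepts sample_size and seed arguments, that it returns a list whose simdata element has columns named id, year, cohort and Y, and that the simulated data include a 2010 cohort. None of these details is stated in this README, so check ?SimDiD before relying on them. The data are copied before each call as a precaution, in case the estimation functions modify their input.

```r
# Variable list as described above.
varnames <- list(time_name = "year", outcome_name = "Y",
                 cohort_name = "cohort", id_name = "id")

if (requireNamespace("DiDforBigData", quietly = TRUE)) {
  library(DiDforBigData)

  # Assumed arguments and return structure; see ?SimDiD.
  sim <- SimDiD(sample_size = 1000, seed = 123)
  simdata <- sim$simdata

  # Single cohort and event time.
  single <- DiDge(inputdata = data.table::copy(simdata), varnames = varnames,
                  cohort_time = 2010, event_postperiod = 3)
  print(single)

  # All available cohorts and event times in the requested window.
  all_results <- DiD(inputdata = data.table::copy(simdata), varnames = varnames,
                     min_event = -3, max_event = 5)
  print(all_results)
} else {
  message("DiDforBigData is not installed; skipping the example.")
}
```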
For more information, read the following articles:
Acknowledgements: Thanks to Mert Demirer and Kirill Borusyak for helpful comments.