Introduction to etwfe

Background

A canonical research design for social scientists is the so-called “differences-in-differences” (DiD) design.1 In the classic 2x2 DiD case (two units, two periods), a simple interaction effect between two dummy variables suffices to recover the treatment effect. In base R this might look something like:

lm(y ~ Dtreated_unit * Dpost_treatment, data = somedata)

In practice, rather than specify the interaction directly, researchers often use an equivalent formulation known as two-way fixed effects (TWFE). The core idea of TWFE is that we can subsume our dummy variables above (Dtreated_unit and Dpost_treatment) by adding unit and time fixed effects to the regression. A single treatment dummy can then be used to capture the effect of treatment directly. This shortcut is especially nice for more complicated panel data settings with multiple units and multiple times periods. A TWFE in base R might look as follows:

lm(y ~ Dtreat + factor(id) + factor(period), data = somedata)

This type of regression is easy to run and intuitive, and for a long time everyone was happy. But it was too good to last. A cottage industry of smart research now demonstrates that things aren’t quite so simple. Among other things, the core TWFE result is very likely to be biased if the rollout of treatment was staggered over time (a common occurrence).

Fortunately, just as econometricians were taking away one of our favourite tools, they were kind enough to replace it with some new ones. Among these, the proposed approach by Wooldridge (2021, 2022) is noteworthy. His idea might be paraphrased as stating that the problem with TWFE is not that we were doing it in the first place. Rather, it’s that we weren’t doing it enough. Instead of only including a single treatment × time interaction, Wooldridge recommends that we saturate our model with all possible interactions between treatment and time variables, including treatment cohorts, as well as other covariates. He goes on to show that this approach actually draws an equivalence between different types of estimators (pooled OLS, twoway Mundlak regression, etc.) So it’s not entirely clear what to call it. But Wooldridge refers to the general idea as as extended TWFE—or, ETWFE—which I rather like and is where this package takes its name.

The Wooldridge ETWFE solution is intuitive and elegant. But it is also rather tedious and error prone to code up manually. The etwfe package aims to simplify the process by providing convenience functions that do the work for you.

Dataset

To demonstrate the core functionality of etwfe, we’ll use the mpdta dataset on US teen employment from the did package (which you’ll need to install separately).

# install.packages("did")
data("mpdta", package = "did")
head(mpdta)
#>     year countyreal     lpop     lemp first.treat treat
#> 866 2003       8001 5.896761 8.461469        2007     1
#> 841 2004       8001 5.896761 8.336870        2007     1
#> 842 2005       8001 5.896761 8.340217        2007     1
#> 819 2006       8001 5.896761 8.378161        2007     1
#> 827 2007       8001 5.896761 8.487352        2007     1
#> 937 2003       8019 2.232377 4.997212        2007     1

In the examples that follow, our goal is to estimate the effect of treatment (treat) on log employment (lemp). Notice that the panel ID is at the county level (countyreal), but treatment was staggered across cohorts (first.treat) so that a group of counties were treated at the same time. In addition to these staggered treatment effects, we also observe log population (lpop) as a potential, time-invariant control.2

Basic usage

Okay, let’s load etwfe and work through its basic functionality. Given the package name, it won’t surprise you to know that the key estimating function is etwfe(). Here’s how it would look for our example dataset.

library(etwfe)

mod =
  etwfe(
    fml  = lemp ~ lpop, # outcome ~ (time-invariant) controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
    )

There are a few things to say about our etwfe() argument choices and other function options, but we’ll leave those details aside until a bit later. Right now, just know that all of the above arguments are required except vcov (though I generally recommend it too, since we probably want to cluster our standard errors at the individual unit level).

Let’s take a look at our model object.

mod
#> OLS estimation, Dep. Var.: lemp
#> Observations: 2,500 
#> Fixed-effects: first.treat: 4,  year: 5
#> Varying slopes: lpop (first.treat: 4),  lpop (year: 5)
#> Standard-errors: Clustered (countyreal) 
#>                                               Estimate Std. Error   t value
#> .Dtreat:first.treat::2004:year::2004         -0.021248   0.021728 -0.977890
#> .Dtreat:first.treat::2004:year::2005         -0.081850   0.027375 -2.989963
#> .Dtreat:first.treat::2004:year::2006         -0.137870   0.030795 -4.477097
#> .Dtreat:first.treat::2004:year::2007         -0.109539   0.032322 -3.389024
#> .Dtreat:first.treat::2006:year::2006          0.002537   0.018883  0.134344
#> .Dtreat:first.treat::2006:year::2007         -0.045093   0.021987 -2.050907
#> .Dtreat:first.treat::2007:year::2007         -0.045955   0.017975 -2.556568
#> .Dtreat:first.treat::2004:year::2004:lpop_dm  0.004628   0.017584  0.263184
#> .Dtreat:first.treat::2004:year::2005:lpop_dm  0.025113   0.017904  1.402661
#> .Dtreat:first.treat::2004:year::2006:lpop_dm  0.050735   0.021070  2.407884
#> .Dtreat:first.treat::2004:year::2007:lpop_dm  0.011250   0.026617  0.422648
#> .Dtreat:first.treat::2006:year::2006:lpop_dm  0.038935   0.016472  2.363731
#> .Dtreat:first.treat::2006:year::2007:lpop_dm  0.038060   0.022477  1.693276
#> .Dtreat:first.treat::2007:year::2007:lpop_dm -0.019835   0.016198 -1.224528
#>                                                Pr(>|t|)    
#> .Dtreat:first.treat::2004:year::2004         3.2860e-01    
#> .Dtreat:first.treat::2004:year::2005         2.9279e-03 ** 
#> .Dtreat:first.treat::2004:year::2006         9.3851e-06 ***
#> .Dtreat:first.treat::2004:year::2007         7.5694e-04 ***
#> .Dtreat:first.treat::2006:year::2006         8.9318e-01    
#> .Dtreat:first.treat::2006:year::2007         4.0798e-02 *  
#> .Dtreat:first.treat::2007:year::2007         1.0866e-02 *  
#> .Dtreat:first.treat::2004:year::2004:lpop_dm 7.9252e-01    
#> .Dtreat:first.treat::2004:year::2005:lpop_dm 1.6134e-01    
#> .Dtreat:first.treat::2004:year::2006:lpop_dm 1.6407e-02 *  
#> .Dtreat:first.treat::2004:year::2007:lpop_dm 6.7273e-01    
#> .Dtreat:first.treat::2006:year::2006:lpop_dm 1.8474e-02 *  
#> .Dtreat:first.treat::2006:year::2007:lpop_dm 9.1027e-02 .  
#> .Dtreat:first.treat::2007:year::2007:lpop_dm 2.2133e-01    
#> ... 10 variables were removed because of collinearity (.Dtreat:first.treat::2006:year::2004, .Dtreat:first.treat::2006:year::2005 and 8 others [full set in $collin.var])
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.537131     Adj. R2: 0.87167 
#>                  Within R2: 8.449e-4

What etwfe() has done underneath the hood is construct a treatment dummy variable .Dtreat and saturated it together with the other variables of interest as a set of multiway interaction terms.3

You may also have noticed that our etwfe() call returns a standard fixest object, since this is what it uses to perform the underlying estimation. All of the associated methods and functions from the fixest package are thus compatible with our model object. For example, we could plot the raw regression coefficients with fixest::coefplot(), or print them to a nice regression table with fixest::etable(). However, the raw coefficients from an etwfe() estimation are not particularly meaningful in of themselves. Remember that these are complex multiway interaction terms that are probably hard to to interpret on their own. So, we probably want to aggregate them along some dimension of interest instead (e.g., by groups or time, or as an event study). A natural way to perform these aggregations is by recovering the appropriate marginal effects. The etwfe package provides another convenience function for doing so: emfx(), which is a thin(ish) wrapper around marginaleffects::marginaleffects().

For example, we can recover the average treatment effect as follows.

emfx(mod)
#>       type    term    contrast        dydx  std.error statistic      p.value
#> 1 response .Dtreat mean(dY/dX) -0.05062703 0.01249979 -4.050232 5.116687e-05
#>      conf.low  conf.high predicted predicted_hi predicted_lo .Dtreat
#> 1 -0.07512616 -0.0261279  8.555778     8.555769     8.555778       1

Or, we can recover the dynamic treatment effects a la an event study by invoking the type argument. This time I’ll save the resulting object since I plan to reuse it in a moment.

mod_es = emfx(mod, type = "event")
# mod_es
summary(mod_es) # Summary print looks a bit nicer
#>      Term    Contrast event   Effect Std. Error z value   Pr(>|z|)    2.5 %
#> 1 .Dtreat mean(dY/dX)     0 -0.03321    0.01337  -2.484 0.01297951 -0.05941
#> 2 .Dtreat mean(dY/dX)     1 -0.05735    0.01715  -3.343 0.00082830 -0.09097
#> 3 .Dtreat mean(dY/dX)     2 -0.13787    0.03079  -4.477 7.5665e-06 -0.19823
#> 4 .Dtreat mean(dY/dX)     3 -0.10954    0.03232  -3.389 0.00070142 -0.17289
#>     97.5 %
#> 1 -0.00701
#> 2 -0.02373
#> 3 -0.07751
#> 4 -0.04619
#> 
#> Model type:  etwfe 
#> Prediction type:  response

(Other types of aggregations include "calendar" (here: treatment by year effects) and "group" (here: treatment by cohort).)

Since emfx produces a standard marginaleffects object, we can pass it on to other supported methods and packages. For example, we can get get pass it on to modelsummary to get a nice regression table of the event study coefficients. Note the use of the shape and coef_rename arguments below; these are optional but help to make the output look a bit nicer.

library(modelsummary)

rename_fn = function(old_names) {
  new_names = gsub(".Dtreat", "Years post treatment =", old_names)
  setNames(new_names, old_names)
}

modelsummary(
  mod_es,
  shape = term:event:statistic ~ model,
  coef_rename =  rename_fn,
  gof_omit = "Adj|Within|RMSE",
  title = "Event study",
  notes = "Std. errors are clustered at the county level"
)
Event study
Model 1
Years post treatment = 0 −0.033
(0.013)
Years post treatment = 1 −0.057
(0.017)
Years post treatment = 2 −0.138
(0.031)
Years post treatment = 3 −0.110
(0.032)
Num.Obs. 2500
R2 0.873
FE..first.treat X
FE..year X
Std. errors are clustered at the county level

For vizualization, you can pass it on to your preferred plotting method. For example:

library(ggplot2)

ggplot(mod_es, aes(x = event, y = dydx, ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = 0, col = "grey50") +
  geom_pointrange() +
  labs(x = "Years post treatment", y = "Effect on log employment") +
  theme_minimal()

Other families

A key feature of the ETWFE approach—one that sets it apart from other advanced DiD implementations and extensions—is that it supports nonlinear model (distribution / link) families. Users need simply invoke the family argument. Here’s a brief example, where we recast our earlier event-study as a Poisson regression.

mpdta$emp = exp(mpdta$lemp)

etwfe(
  emp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
  family = "poisson"
  ) |>
  emfx("event") |>
  summary()
#>      Term    Contrast event  Effect Std. Error  z value   Pr(>|z|)   2.5 %
#> 1 .Dtreat mean(dY/dX)     0 -22.849      16.24 -1.40674 0.15950582  -54.68
#> 2 .Dtreat mean(dY/dX)     1   3.654      43.48  0.08405 0.93301382  -81.56
#> 3 .Dtreat mean(dY/dX)     2 -71.521      21.28 -3.36144 0.00077537 -113.22
#> 4 .Dtreat mean(dY/dX)     3 -97.798      26.50 -3.69045 0.00022386 -149.74
#>    97.5 %
#> 1   8.986
#> 2  88.865
#> 3 -29.819
#> 4 -45.858
#> 
#> Model type:  etwfe 
#> Prediction type:  response

What is etwfe doing under the hood?

Now that you’ve seen etwfe in action, let’s circle back to what the package is doing under the hood. This section isn’t necessary for you to use any of the functions. Feel free to skip it. But a review of the internal details should help you to optimize for different scenarios and also give you an understanding of etwfe’s default choices.

As I’ve stated several times, the Wooldridge ETWFE approach basically involves saturating the regression with interaction effects. You can easily grab the formula of an estimated model to see for yourself.

mod$fml_all
#> $linear
#> lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003)/lpop_dm
#> <environment: 0x55c42d9af748>
#> 
#> $fixef
#> ~first.treat + first.treat[[lpop]] + year + year[[lpop]]

At this point, however, you may notice a few things. The first is that our formula references several variables that aren’t in the original dataset. An obvious one is the .Dtreat treatment dummy. A more subtle one is lpop_dm, which is the demeaned (i.e., group-centered) version of our lpop control variable. All (time-invariant) control variables have to be demeaned before they are interacted in the ETWFE setting. Here’s how you could have constructed the dataset ahead of time and estimated the ETWFE regression manually:

# First construct the dataset
mpdta2 = mpdta |>
  transform(
    .Dtreat = as.integer(year >= first.treat & first.treat != 0),
    lpop_dm = ave(lpop, first.treat, year, FUN = \(x) x - mean(x, na.rm = TRUE))
  )

# Then estimate the manual version of etwfe
mod2 = fixest::feols(
  lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) / lpop_dm |
    first.treat[lpop] + year[lpop],
  data = mpdta2,
  vcov = ~countyreal
)

We can confirm that the manual approach yields the same output as our original etwfe regression. I’ll use modelsummary to do that here, since I’ve already loaded it above.4.

modelsummary(
  list("etwfe" = mod, "manual" = mod2),
  gof_map = NA # drop all goodness-of-fit info for brevity
)
etwfe manual
.Dtreat × first.treat = 2004 × year = 2004 −0.021 −0.021
(0.022) (0.022)
.Dtreat × first.treat = 2004 × year = 2005 −0.082 −0.082
(0.027) (0.027)
.Dtreat × first.treat = 2004 × year = 2006 −0.138 −0.138
(0.031) (0.031)
.Dtreat × first.treat = 2004 × year = 2007 −0.110 −0.110
(0.032) (0.032)
.Dtreat × first.treat = 2006 × year = 2006 0.003 0.003
(0.019) (0.019)
.Dtreat × first.treat = 2006 × year = 2007 −0.045 −0.045
(0.022) (0.022)
.Dtreat × first.treat = 2007 × year = 2007 −0.046 −0.046
(0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2004 × lpop_dm 0.005 0.005
(0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2005 × lpop_dm 0.025 0.025
(0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2006 × lpop_dm 0.051 0.051
(0.021) (0.021)
.Dtreat × first.treat = 2004 × year = 2007 × lpop_dm 0.011 0.011
(0.027) (0.027)
.Dtreat × first.treat = 2006 × year = 2006 × lpop_dm 0.039 0.039
(0.016) (0.016)
.Dtreat × first.treat = 2006 × year = 2007 × lpop_dm 0.038 0.038
(0.022) (0.022)
.Dtreat × first.treat = 2007 × year = 2007 × lpop_dm −0.020 −0.020
(0.016) (0.016)

Let’s switch gears and talk about fixed-effects quickly. If you are a regular fixest user, you may have noticed that we’ve been invoking its varying slopes syntax in the fixed-effect slot (i.e., first.treat[lpop] and year[lpop]). The reason for this is part practical, part philosophical. From a practical perspective, factor_var[numeric_var] is equivalent to base R’s factor_var/numeric_var “nesting” syntax but is much faster for high-dimensional factors.5 From a philosphical perspective, etwfe tries to limit the amount of extraneous information that it reports to users. Most of the interaction effects in the ETWFE framework are just acting as controls. By relegating them to the fixed-effects slot, we can avoid polluting the user’s console with a load of extra coefficients. Nonetheless, we can control this behaviour with the fe (“fixed effects”) argument. Consider the following options and their manual equivalents.

# fe = "feo" (fixed effects only)
mod_feo = etwfe(
  lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
  fe = "feo"
)
# ... which is equivalent to the manual regression
mod_feo2 = fixest::feols(
  lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) / lpop_dm +
    lpop + i(first.treat, lpop, ref = 0) + i(year, lpop, ref = 2003) |
    first.treat + year,
  data = mpdta2, vcov = ~countyreal
)

# fe = "none"
mod_none = etwfe(
  lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, vcov = ~countyreal,
  fe = "none"
)
# ... which is equivalent to the manual regression
mod_none2 = fixest::feols(
  lemp ~ .Dtreat:i(first.treat, i.year, ref = 0, ref2 = 2003) / lpop_dm +
    lpop + i(first.treat, lpop, ref = 0) + i(year, lpop, ref = 2003) +
    i(first.treat, ref = 0) + i(year, ref = 2003),
  data = mpdta2, vcov = ~countyreal
)

I’ll leave it up to you to pass any of these to models emfx to confirm that they give correct aggregated treatment effects. But we can quickly demonstrate in a regression table that they all return the same raw coefficients.

mods = list(
  "etwfe"         = mod,
  "manual"        = mod2,
  "etwfe (feo)"   = mod_feo,
  "manual (feo)"  = mod_feo2,
  "etwfe (none)"  = mod_none,
  "manual (none)" = mod_none2
)

modelsummary(mods, gof_map = NA)
etwfe manual etwfe (feo) manual (feo) etwfe (none) manual (none)
.Dtreat × first.treat = 2004 × year = 2004 −0.021 −0.021 −0.021 −0.021 −0.021 −0.021
(0.022) (0.022) (0.022) (0.022) (0.022) (0.022)
.Dtreat × first.treat = 2004 × year = 2005 −0.082 −0.082 −0.082 −0.082 −0.082 −0.082
(0.027) (0.027) (0.027) (0.027) (0.027) (0.027)
.Dtreat × first.treat = 2004 × year = 2006 −0.138 −0.138 −0.138 −0.138 −0.138 −0.138
(0.031) (0.031) (0.031) (0.031) (0.031) (0.031)
.Dtreat × first.treat = 2004 × year = 2007 −0.110 −0.110 −0.110 −0.110 −0.110 −0.110
(0.032) (0.032) (0.032) (0.032) (0.032) (0.032)
.Dtreat × first.treat = 2006 × year = 2006 0.003 0.003 0.003 0.003 0.003 0.003
(0.019) (0.019) (0.019) (0.019) (0.019) (0.019)
.Dtreat × first.treat = 2006 × year = 2007 −0.045 −0.045 −0.045 −0.045 −0.045 −0.045
(0.022) (0.022) (0.022) (0.022) (0.022) (0.022)
.Dtreat × first.treat = 2007 × year = 2007 −0.046 −0.046 −0.046 −0.046 −0.046 −0.046
(0.018) (0.018) (0.018) (0.018) (0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2004 × lpop_dm 0.005 0.005 0.005 0.005 0.005 0.005
(0.018) (0.018) (0.018) (0.018) (0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2005 × lpop_dm 0.025 0.025 0.025 0.025 0.025 0.025
(0.018) (0.018) (0.018) (0.018) (0.018) (0.018)
.Dtreat × first.treat = 2004 × year = 2006 × lpop_dm 0.051 0.051 0.051 0.051 0.051 0.051
(0.021) (0.021) (0.021) (0.021) (0.021) (0.021)
.Dtreat × first.treat = 2004 × year = 2007 × lpop_dm 0.011 0.011 0.011 0.011 0.011 0.011
(0.027) (0.027) (0.027) (0.027) (0.027) (0.027)
.Dtreat × first.treat = 2006 × year = 2006 × lpop_dm 0.039 0.039 0.039 0.039 0.039 0.039
(0.016) (0.016) (0.016) (0.016) (0.016) (0.016)
.Dtreat × first.treat = 2006 × year = 2007 × lpop_dm 0.038 0.038 0.038 0.038 0.038 0.038
(0.022) (0.022) (0.022) (0.022) (0.022) (0.022)
.Dtreat × first.treat = 2007 × year = 2007 × lpop_dm −0.020 −0.020 −0.020 −0.020 −0.020 −0.020
(0.016) (0.016) (0.016) (0.016) (0.016) (0.016)
lpop 1.065 1.065 1.065 1.065
(0.022) (0.022) (0.022) (0.022)
first.treat = 2004 × lpop 0.051 0.051 0.051 0.051
(0.038) (0.038) (0.038) (0.038)
first.treat = 2006 × lpop −0.041 −0.041 −0.041 −0.041
(0.047) (0.047) (0.047) (0.047)
first.treat = 2007 × lpop 0.056 0.056 0.056 0.056
(0.039) (0.039) (0.039) (0.039)
year = 2004 × lpop 0.011 0.011 0.011 0.011
(0.008) (0.008) (0.008) (0.008)
year = 2005 × lpop 0.021 0.021 0.021 0.021
(0.008) (0.008) (0.008) (0.008)
year = 2006 × lpop 0.011 0.011 0.011 0.011
(0.011) (0.011) (0.011) (0.011)
year = 2007 × lpop 0.021 0.021 0.021 0.021
(0.012) (0.012) (0.012) (0.012)
(Intercept) 2.260 2.260
(0.088) (0.088)
first.treat = 2004 0.038 0.038
(0.154) (0.154)
first.treat = 2006 0.461 0.461
(0.213) (0.213)
first.treat = 2007 −0.286 −0.286
(0.167) (0.167)
year = 2004 −0.090 −0.090
(0.031) (0.031)
year = 2005 −0.110 −0.110
(0.033) (0.033)
year = 2006 −0.052 −0.052
(0.040) (0.040)
year = 2007 −0.057 −0.057
(0.045) (0.045)

  1. Good textbook introductions to DiD are available here and here, among many other places.↩︎

  2. The time-invariance property is important; time-varying controls are not allowed in the ETWFE framework. But you can also omit controls if you don’t have any.↩︎

  3. It has also demeaned the lpop control variable, but that again is something we’ll get back too later. It’s not particularly important for interpreting the final results.↩︎

  4. Another option would be to use fixest::etable()↩︎

  5. We won’t see a speed-up for this small dataset, but it can make a significant difference for large datasets.↩︎