Interactive Fixed Effects with xtife


Binzhi Chen

2026-04-21

Introduction

Standard two-way fixed effects (TWFE) estimators control for unobserved unit-specific and time-specific heterogeneity through additive fixed effects \(\alpha_i\) and \(\xi_t\). In many empirical settings, however, unobserved confounders interact: a unit’s exposure to a common shock depends on its own latent characteristics.

The Interactive Fixed Effects (IFE) model of Bai (2009) generalises TWFE by adding a factor structure to the additive effects:

\[y_{it} = \alpha_i + \xi_t + X_{it}'\beta + \lambda_i'F_t + u_{it}\]

where \(F_t \in \mathbb{R}^r\) are common factors and \(\lambda_i \in \mathbb{R}^r\) are unit-specific loadings. The term \(\lambda_i'F_t\) captures unobserved heterogeneity that varies over both dimensions simultaneously.
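To see why the factor term matters, the following base-R sketch simulates a one-factor panel in which the regressor is correlated with \(\lambda_i F_t\), then compares the TWFE slope with a simple iteration that alternates between a rank-1 SVD fit and OLS, in the spirit of Bai (2009). The data-generating process, the helper `demean2()`, and the stopping rule are illustrative assumptions; this is not the code that xtife runs.

```r
# Illustrative simulation (assumed DGP, not xtife internals)
set.seed(1)
N <- 100; TT <- 50; beta <- 1
lambda <- rnorm(N)                        # unit loadings (r = 1)
f      <- rnorm(TT)                       # common factor
LF <- outer(lambda, f)                    # N x T factor component
X  <- LF + matrix(rnorm(N * TT), N, TT)   # regressor correlated with LF
Y  <- beta * X + LF + matrix(rnorm(N * TT), N, TT)

# Two-way demeaning of a balanced N x T matrix (hypothetical helper)
demean2 <- function(M) {
  sweep(sweep(M, 1, rowMeans(M)), 2, colMeans(M)) + mean(M)
}
Xd <- demean2(X); Yd <- demean2(Y)

# TWFE slope: additive effects cannot absorb lambda_i * f_t
b_twfe <- sum(Xd * Yd) / sum(Xd^2)

# Alternate between a rank-1 SVD fit of the residual matrix and OLS for beta
b <- b_twfe
for (iter in 1:500) {
  W      <- Yd - b * Xd
  s      <- svd(W, nu = 1, nv = 1)
  LF_hat <- s$d[1] * s$u %*% t(s$v)       # best rank-1 approximation of W
  b_new  <- sum(Xd * (Yd - LF_hat)) / sum(Xd^2)
  done   <- abs(b_new - b) < 1e-8
  b      <- b_new
  if (done) break
}
b_ife <- b
c(true = beta, twfe = b_twfe, ife = b_ife)
```

In this design the TWFE slope should be pulled well away from the true value of 1, while the iterated estimate should land much closer to it.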

xtife provides a pure base-R implementation with:

Analytical standard errors: homoskedastic, HC1 robust, cluster-robust by unit
Asymptotic bias correction: Bai (2009) static and Moon & Weidner (2017) dynamic
Factor number selection: information criteria from Bai & Ng (2002) and Bai (2009)

Quick Start

library(xtife)
data(cigar)

# Fit IFE with r=2 factors, two-way FE, cluster-robust SE
fit <- ife(sales ~ price, data = cigar,
           index  = c("state", "year"),
           r      = 2,
           force  = "two-way",
           se     = "cluster")
print(fit)
#> 
#> Interactive Fixed Effects (Bai 2009, Econometrica)
#> -------------------------------------------------------
#> Panel    : N = 46 units, T = 30 periods
#> Factors  : r = 2
#> Force    : two-way fixed effects
#> SE type  : cluster (by state)
#> Outcome  : sales
#> -------------------------------------------------------
#>       Estimate Std.Error t.value Pr(>|t|)             95% CI    
#> price  -0.5242    0.0802 -6.5333   0.0000 [-0.6816, -0.3667] ***
#> ---
#> Signif. codes: *** <0.01  ** <0.05  * <0.1
#> -------------------------------------------------------
#> sigma^2  : 18.456076 | df = 1157
#> Converged: YES | Iterations: 10
#> -------------------------------------------------------
#> Factor selection criteria at r = 2 [IC1-3: Bai & Ng 2002; IC_bic/PC: Bai 2009]:
#>   IC1 =   3.2347  |  IC2 =   3.2900  |  IC3 =   3.1421
#>   PC  =  36.8093  |  IC (BIC-style) =   4.2661
#>   -> Run ife_select_r() to compare criteria across r = 0, 1, ..., r_max
#>      and identify the IC-minimising number of factors.

The output shows the coefficient table, convergence status, factor selection criteria, and (when enabled) bias correction details.

Access individual components:

# Estimated coefficients
fit$coef
#>      price 
#> -0.5241574

# Standard errors
fit$se
#>     price 
#> 0.0802281

# p-values
fit$pval
#>        price 
#> 9.616369e-11

# 95% confidence intervals
fit$ci
#>         CI.lower   CI.upper
#> price -0.6815663 -0.3667486

# Estimated factors (T x r matrix)
dim(fit$F_hat)
#> [1] 30  2

# Estimated loadings (N x r matrix)
dim(fit$Lambda_hat)
#> [1] 46  2

Standard Error Types

Three SE estimators are available via the se argument:

| `se =` | Formula | When to use |
|---|---|---|
| `"standard"` | \(\hat{\sigma}^2 (\tilde{X}'\tilde{X})^{-1}\) | Benchmark; assumes homoskedasticity |
| `"robust"` | HC1 sandwich | Heteroskedasticity across \(it\) cells |
| `"cluster"` | Cluster-robust by unit \(i\) | Serial correlation within units (most panels) |

fit_std <- ife(sales ~ price, data = cigar,
               index = c("state", "year"), r = 2, se = "standard")
fit_rob <- ife(sales ~ price, data = cigar,
               index = c("state", "year"), r = 2, se = "robust")
fit_cl  <- ife(sales ~ price, data = cigar,
               index = c("state", "year"), r = 2, se = "cluster")

# Compare standard errors
se_table <- data.frame(
  se_type  = c("standard", "robust (HC1)", "cluster"),
  coef     = c(fit_std$coef, fit_rob$coef, fit_cl$coef),
  se       = c(fit_std$se,   fit_rob$se,   fit_cl$se),
  t_stat   = c(fit_std$tstat, fit_rob$tstat, fit_cl$tstat)
)
print(se_table, digits = 4, row.names = FALSE)
#>       se_type    coef      se  t_stat
#>      standard -0.5242 0.03987 -13.146
#>  robust (HC1) -0.5242 0.04510 -11.623
#>       cluster -0.5242 0.08023  -6.533

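The point estimate is the same in all three fits; only the standard error changes. Here the cluster-robust SE (0.080) is about twice the robust (0.045) and standard (0.040) SEs because it accounts for serial dependence within units. This ordering is typical for panels but not guaranteed.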
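The three variance formulas can be sketched in base R for a single regressor. This is a minimal illustration on simulated data with a persistent unit component in both the regressor and the error. The data-generating process is an assumption, and the small-sample adjustments xtife applies (for example to the cluster estimator) are not stated here, so the cluster formula below uses none.

```r
# Illustrative one-regressor panel with unit-level dependence (assumed DGP)
set.seed(2)
N <- 100; TT <- 20
id <- rep(seq_len(N), each = TT)
x  <- rnorm(N)[id] + rnorm(N * TT)   # regressor with a persistent unit component
e  <- rnorm(N)[id] + rnorm(N * TT)   # error with a persistent unit component
y  <- 0.5 * x + e

n   <- length(y); k <- 1
b   <- sum(x * y) / sum(x^2)         # OLS slope without intercept
res <- y - b * x
Sxx <- sum(x^2)

# Homoskedastic: sigma^2 * (x'x)^{-1}
se_standard <- sqrt(sum(res^2) / (n - k) / Sxx)

# HC1 sandwich: n / (n - k) scaling of the White estimator
se_hc1 <- sqrt(n / (n - k) * sum(x^2 * res^2)) / Sxx

# Cluster-robust by unit: sum x * residual within each unit, then square
score_i    <- tapply(x * res, id, sum)
se_cluster <- sqrt(sum(score_i^2)) / Sxx   # no small-sample factor applied

round(c(standard = se_standard, hc1 = se_hc1, cluster = se_cluster), 4)
```

Because both `x` and `e` are correlated within units, the within-unit sums of `x * res` do not average out, and the cluster-robust SE comes out larger than the other two.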

Selecting the Number of Factors

Use ife_select_r() to compare information criteria across candidate values of \(r\):

# Not run during package build (takes ~30 s on cigar data)
sel <- ife_select_r(sales ~ price, data = cigar,
                    index = c("state", "year"),
                    r_max = 6,
                    force = "two-way")

The function prints a table of IC1, IC2, IC3 (Bai & Ng 2002), IC(BIC), and PC (Bai 2009) criteria for each \(r\). The recommended criterion for small-to-moderate panels (\(\min(N,T) < 60\)) is IC(BIC), which imposes a stronger \(\log(NT)\) penalty and avoids overselection.
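For intuition, the three Bai & Ng (2002) criteria can be computed by hand on a pure factor panel. The sketch below simulates data with two strong factors and evaluates \(\mathrm{IC}_j(k) = \log V(k) + k\,g_j(N,T)\), where \(V(k)\) is the mean squared residual after removing \(k\) principal components. The simulated data and all variable names are assumptions. `ife_select_r()` applies the criteria to IFE residuals and also reports IC(BIC) and PC, which are not reproduced here.

```r
# Illustrative pure factor panel with r0 = 2 strong factors (assumed DGP)
set.seed(3)
N <- 100; TT <- 60; r0 <- 2; r_max <- 6
Lambda <- matrix(rnorm(N * r0), N, r0)
Fmat   <- matrix(rnorm(TT * r0), TT, r0)
E_mat  <- Lambda %*% t(Fmat) + matrix(rnorm(N * TT), N, TT)

d2 <- svd(E_mat, nu = 0, nv = 0)$d^2       # squared singular values
# V(k): mean squared residual after removing the first k principal components
V <- sapply(0:r_max, function(k) sum(d2[seq_along(d2) > k]) / (N * TT))

k    <- 0:r_max
c2   <- min(N, TT)
pen1 <- (N + TT) / (N * TT) * log(N * TT / (N + TT))   # IC1 penalty per factor
pen2 <- (N + TT) / (N * TT) * log(c2)                  # IC2 penalty per factor
pen3 <- log(c2) / c2                                   # IC3 penalty per factor

ic <- cbind(r   = k,
            IC1 = log(V) + k * pen1,
            IC2 = log(V) + k * pen2,
            IC3 = log(V) + k * pen3)
print(round(ic, 4))

# Number of factors that minimises each criterion
r_ic1 <- k[which.min(ic[, "IC1"])]
r_ic2 <- k[which.min(ic[, "IC2"])]
c(IC1 = r_ic1, IC2 = r_ic2)
```

With factors this strong, IC1 and IC2 should both pick the true number of factors; in real panels with weaker factors the criteria can disagree, which is why comparing them across \(r\) is useful.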

Asymptotic Bias Correction

In large balanced panels, the IFE estimator has an asymptotic bias of order \(1/N + 1/T\). Setting bias_corr = TRUE applies the analytical correction from Bai (2009) Section 7:

\[\hat{\beta}^\dagger = \hat{\beta} - \hat{B}/N - \hat{C}/T\]

fit_bc <- ife(sales ~ price, data = cigar,
              index     = c("state", "year"),
              r         = 2,
              se        = "standard",
              bias_corr = TRUE)
print(fit_bc)
#> 
#> Interactive Fixed Effects (Bai 2009, Econometrica)
#> -------------------------------------------------------
#> Panel    : N = 46 units, T = 30 periods
#> Factors  : r = 2
#> Force    : two-way fixed effects
#> SE type  : standard (homoskedastic)
#> Outcome  : sales
#> -------------------------------------------------------
#>       Estimate Std.Error  t.value Pr(>|t|)             95% CI    
#> price  -0.5309    0.0399 -13.3155   0.0000 [-0.6092, -0.4527] ***
#> ---
#> Signif. codes: *** <0.01  ** <0.05  * <0.1
#> -------------------------------------------------------
#> sigma^2  : 18.457095 | df = 1157
#> Converged: YES | Iterations: 10
#> -------------------------------------------------------
#> Bias correction (Bai 2009 Sec. 7): beta^ = beta_raw - B/N - C/T
#>   Conditions: T/N=0.652  T/N^2=0.01418  N/T^2=0.05111
#>   price        raw= -0.5242  B/N= 0.004232  C/T= 0.002545  corrected= -0.5309
#> -------------------------------------------------------
#> Factor selection criteria at r = 2 [IC1-3: Bai & Ng 2002; IC_bic/PC: Bai 2009]:
#>   IC1 =   3.2347  |  IC2 =   3.2900  |  IC3 =   3.1421
#>   PC  =  36.8093  |  IC (BIC-style) =   4.2661
#>   -> Run ife_select_r() to compare criteria across r = 0, 1, ..., r_max
#>      and identify the IC-minimising number of factors.

The bias correction is most important when \(T/N\) is non-negligible (e.g., \(T/N > 0.3\)). For the cigar panel (\(N = 46\), \(T = 30\), \(T/N \approx 0.65\)) the correction shifts the price coefficient from \(-0.524\) to \(-0.531\).
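The corrected estimate and the ratios on the "Conditions" line can be reproduced from the printed numbers. This is only an arithmetic check, using the more precise raw coefficient reported by `fit$coef` earlier; the variable names are illustrative.

```r
# Numbers copied from the printed output above
N <- 46; TT <- 30
raw      <- -0.5241574   # uncorrected coefficient (fit$coef)
B_over_N <- 0.004232     # printed B/N term
C_over_T <- 0.002545     # printed C/T term

# beta_corrected = beta_raw - B/N - C/T
corrected <- raw - B_over_N - C_over_T
round(corrected, 4)

# Ratios reported on the "Conditions" line
conditions <- c("T/N" = TT / N, "T/N^2" = TT / N^2, "N/T^2" = N / TT^2)
signif(conditions, 4)
```

The first result rounds to \(-0.5309\), matching the `corrected=` value in the output, and the three ratios match 0.652, 0.01418, and 0.05111.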

Dynamic IFE (Moon & Weidner 2017)

When regressors include lagged dependent variables or variables correlated with past errors (predetermined, not strictly exogenous), use method = "dynamic". This applies the double projection \(M_\Lambda M_F\) in the SVD loop and the three-term bias correction from Moon & Weidner (2017):

\[\hat{\beta}^* = \hat{\beta} + \hat{W}^{-1}\left(\frac{\hat{B}_1}{T} + \frac{\hat{B}_2}{N} + \frac{\hat{B}_3}{T}\right)\]

where \(\hat{B}_1\) captures Nickell-type dynamic bias.

fit_dyn <- ife(sales ~ price, data = cigar,
               index     = c("state", "year"),
               r         = 2,
               se        = "standard",
               method    = "dynamic",
               bias_corr = TRUE,
               M1        = 1L)
print(fit_dyn)
#> 
#> Interactive Fixed Effects -- Dynamic (Moon & Weidner 2017, ET)
#> -------------------------------------------------------
#> Panel    : N = 46 units, T = 30 periods
#> Factors  : r = 2
#> Force    : two-way fixed effects
#> SE type  : standard (homoskedastic)
#> Outcome  : sales
#> -------------------------------------------------------
#>       Estimate Std.Error  t.value Pr(>|t|)             95% CI    
#> price  -0.5317    0.0417 -12.7399   0.0000 [-0.6136, -0.4498] ***
#> ---
#> Signif. codes: *** <0.01  ** <0.05  * <0.1
#> -------------------------------------------------------
#> sigma^2  : 18.457329 | df = 1157
#> Converged: YES | Iterations: 6
#> -------------------------------------------------------
#> Bias correction (Moon & Weidner 2017): beta* = beta + W^{-1}(B1/T + B2/N + B3/T)
#>   Method: dynamic  N=46  T=30  M1=1
#>   price        raw= -0.5242  B1/T=-0.000831  B2/N= 0.004636  B3/T= 0.002788  corrected= -0.5317
#> -------------------------------------------------------
#> Factor selection criteria at r = 2 [IC1-3: Bai & Ng 2002; IC_bic/PC: Bai 2009]:
#>   IC1 =   3.2347  |  IC2 =   3.2900  |  IC3 =   3.1421
#>   PC  =  36.8093  |  IC (BIC-style) =   4.2661
#>   -> Run ife_select_r() to compare criteria across r = 0, 1, ..., r_max
#>      and identify the IC-minimising number of factors.

For price (approximately exogenous), \(B_1/T \approx -0.0008\) is close to zero, so the static and dynamic bias-corrected estimates nearly coincide (\(-0.5309\) versus \(-0.5317\)).
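The call above uses only price. To put a lagged dependent variable on the right-hand side, the lag first has to be built within each unit. The sketch below shows one base-R way to do that on a toy long-format panel; the data frame and its column names (`id`, `time`, `y`, `y_lag`) are hypothetical.

```r
# Toy long-format panel (hypothetical data, 3 units x 4 periods)
panel <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  time = rep(1:4, times = 3),
  y    = c(1, 2, 4, 7,  10, 9, 8, 7,  0, 5, 0, 5)
)

# Sort by unit and time so that the shift below is a true one-period lag
panel <- panel[order(panel$id, panel$time), ]

# Lag y within each unit; the first period of every unit has no predecessor
panel$y_lag <- ave(panel$y, panel$id,
                   FUN = function(v) c(NA, head(v, -1)))

# Drop the first period of each unit so the panel stays balanced and NA-free
panel_dyn <- panel[!is.na(panel$y_lag), ]
panel_dyn
```

A data frame prepared this way could presumably be passed to `ife()` with `y_lag` among the regressors and `method = "dynamic"`. That call is not shown in this vignette, so treat it as an assumption and check the package documentation for how missing values and unbalanced panels are handled.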

Comparison with TWFE

Setting r=0 reduces ife() to the standard two-way FE estimator. In a balanced panel such as cigar, this matches lm() on two-way demeaned data, which is in turn equivalent to lm() with unit and time dummies:

fit0 <- ife(sales ~ price, data = cigar,
            index = c("state", "year"), r = 0)

# Manual two-way demeaning
cigar$y_dm <- cigar$sales  - ave(cigar$sales,  cigar$state) -
                              ave(cigar$sales,  cigar$year)  + mean(cigar$sales)
cigar$x_dm <- cigar$price  - ave(cigar$price,  cigar$state) -
                              ave(cigar$price,  cigar$year)  + mean(cigar$price)
lm0 <- lm(y_dm ~ x_dm - 1, data = cigar)

cat(sprintf("ife (r=0): %.6f\n", fit0$coef["price"]))
#> ife (r=0): -1.084712
cat(sprintf("lm TWFE:  %.6f\n", coef(lm0)["x_dm"]))
#> lm TWFE:  -1.084712
cat(sprintf("diff:     %.2e\n",
            abs(fit0$coef["price"] - coef(lm0)["x_dm"])))
#> diff:     6.66e-16

The difference is at machine precision (\(\approx 10^{-15}\)), which is what one expects if r=0 is algebraically equivalent to TWFE.
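The manual demeaning used above relies on the panel being balanced. The following self-contained sketch checks, on simulated data, that two-way demeaning and explicit unit and time dummies give the same slope in that case. The simulated data frame `d` and the helper `dm()` are illustrative assumptions.

```r
# Simulated balanced panel (assumed DGP; variable names are illustrative)
set.seed(4)
N <- 20; TT <- 8
d <- data.frame(id = rep(seq_len(N), each = TT),
                t  = rep(seq_len(TT), times = N))
d$x <- rnorm(N)[d$id] + rnorm(TT)[d$t] + rnorm(N * TT)
d$y <- 2 * d$x + rnorm(N)[d$id] + rnorm(TT)[d$t] + rnorm(N * TT)

# (a) unit and time dummies
b_dummy <- coef(lm(y ~ x + factor(id) + factor(t), data = d))[["x"]]

# (b) two-way demeaning: subtract unit and time means, add back the grand mean
dm <- function(v) v - ave(v, d$id) - ave(v, d$t) + mean(v)
b_demean <- coef(lm(dm(y) ~ dm(x) - 1, data = d))[[1]]

all.equal(b_dummy, b_demean)
```

In an unbalanced panel this simple demeaning no longer reproduces the dummy-variable estimate, so the dummy specification (or an iterative demeaning routine) would be the safer benchmark.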

References

Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4), 1229–1279. doi:[10.3982/ECTA6135](https://doi.org/10.3982/ECTA6135)

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1), 191–221. doi:[10.1111/1468-0262.00273](https://doi.org/10.1111/1468-0262.00273)

Moon, H.R. and Weidner, M. (2017). Dynamic linear panel regression models with interactive fixed effects. Econometric Theory, 33(1), 158–195. doi:[10.1017/S0266466615000328](https://doi.org/10.1017/S0266466615000328)

Baltagi, B.H. (1995). Econometric Analysis of Panel Data. Wiley.
