---
title: "Doubly robust estimation of the LATE and LATT with drlate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Doubly robust estimation of the LATE and LATT with drlate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 4.2, dpi = 110,
                      out.width = "100%")
```

## Overview

`drlate` estimates the local average treatment effect (LATE) and the local
average treatment effect on the treated (LATT) from observational data with
a binary instrument. It implements the complete estimator suite of
Słoczyński, Uysal, and Wooldridge: the doubly robust estimators of their
2022 paper (the Stata command `drlate`, Statistical Software Components
S459708) and the Abadie-kappa weighting estimators of their 2025 *JBES*
paper (the Stata command `kappalate`, S459257), unified behind one
interface and one inference architecture.

The estimation core supports:

* **Doubly robust and regression/weighting estimators** (`method`):
  inverse-probability-weighted regression adjustment (`"ipwra"`, the
  default, doubly robust), inverse probability weighting (`"ipw"`),
  augmented inverse probability weighting (`"aipw"`, doubly robust), and
  regression adjustment (`"ra"`).
* **Abadie-kappa weighting estimators**: `"kappa"`
  (kappalate's `tau_a`), `"kappa0"` (`tau_a,0`), and `"kappa10"`
  (`tau_a,10`); together with the two IPW variants (which correspond to
  `tau_u` and `tau_a,1`), these complete the five-estimator menu of the 2025
  paper.
* **Outcome and treatment models**: linear, logistic, probit, or Poisson,
  plus fractional-logit and fractional-probit for outcomes in `[0, 1]`, so the
  response may be continuous, binary, a count, or a proportion (matching the
  Stata `lateffects` `omodel`/`tmodel` options).
* **Instrument propensity score models** (`ivmodel`): logistic regression by
  maximum likelihood (default), covariate balancing (`"cbps"`, Imai and
  Ratkovic 2014), inverse probability tilting (`"ipt"`, Graham, Pinto,
  and Egel 2012), or probit maximum likelihood (`"probit"`, for the
  weighting estimators).
* **Normalized** (default) or unnormalized weighting for IPW and AIPW.
* Sampling weights and cluster-robust standard errors.

Beyond what the two Stata commands offer, the package adds a common workflow
layer. The layer also covers the kappa weighting estimators, for which
`kappalate` itself offers only robust and cluster-robust standard errors:

* **Diagnostics**: `plot()` displays of propensity-score overlap,
  covariate balance, and implied weights; `balance()` tables and the
  `balance_test()` overidentification balance test; `complier_means()`
  complier profiling; first-stage strength on every printout.
* **Weak-instrument-robust inference**: Fieller confidence sets via
  `confint(method = "fieller")` (for the ratio-form estimators, including
  `"kappa"` and `"kappa0"`).
* **Bootstrap inference**: `vcov = "bootstrap"` (cluster-aware,
  parallelizable).
* **The DR Hausman test** of unconfoundedness from the 2022 paper's
  Section 5 (`dr_hausman()`), with an analytic standard error from a
  jointly stacked moment system.
* **Estimator comparison**: `drlate_compare()` with a dot-whisker plot.

The estimators are validated against the authors' Stata commands by
golden-fixture parity tests (estimates and standard errors); the inference
extensions are validated by Monte Carlo simulation.

### Joint inference

`drlate` computes point estimates from sequential weighted regressions. For
inference, it stacks the moment conditions of *every* estimation stage (the
instrument propensity score, the outcome regressions, the treatment
regressions, and the causal aggregates) into one just-identified
M-estimation system. The variance is the sandwich
\(A^{-1} B A^{-\top} / n\) evaluated at the estimates, where \(A\) is the
average Jacobian of the stacked moments and, in the unclustered case, \(B\)
is their average outer product. This reproduces the Stata package's
`gmm, onestep iterate(0)` construction, so the standard errors account for
the estimation uncertainty of each stage, including the first-stage
propensity score.
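
To make the stacking concrete, the following minimal sketch applies the same idea to the simplest case, the covariate-free Wald estimator, using base R only. It is an illustration under stated assumptions, not the package's internal code: the simulated data and all object names (prefixed `sk_`) are hypothetical.

```{r}
# Illustrative sketch only: stacked-moment (M-estimation) sandwich for the
# covariate-free Wald estimator. Data and object names are hypothetical.
set.seed(42)
sk_n <- 2000
sk_z <- rbinom(sk_n, 1, 0.5)                 # binary instrument
sk_d <- rbinom(sk_n, 1, 0.2 + 0.5 * sk_z)    # treatment, first stage = 0.5
sk_y <- 1 + 0.5 * sk_d + rnorm(sk_n)         # outcome

# Stage estimates: arm means of outcome and treatment, then the ratio
sk_a1 <- mean(sk_y[sk_z == 1]); sk_a0 <- mean(sk_y[sk_z == 0])
sk_b1 <- mean(sk_d[sk_z == 1]); sk_b0 <- mean(sk_d[sk_z == 0])
sk_tau <- (sk_a1 - sk_a0) / (sk_b1 - sk_b0)

# One row of moment contributions per observation, one column per parameter
# (a1, a0, b1, b0, tau); the last moment defines tau and carries no data
sk_psi <- cbind(sk_z * (sk_y - sk_a1), (1 - sk_z) * (sk_y - sk_a0),
                sk_z * (sk_d - sk_b1), (1 - sk_z) * (sk_d - sk_b0),
                (sk_a1 - sk_a0) - sk_tau * (sk_b1 - sk_b0))

# A: average Jacobian of the moments; B: average outer product
sk_pz <- mean(sk_z)
sk_A <- rbind(c(-sk_pz, 0, 0, 0, 0),
              c(0, -(1 - sk_pz), 0, 0, 0),
              c(0, 0, -sk_pz, 0, 0),
              c(0, 0, 0, -(1 - sk_pz), 0),
              c(1, -1, -sk_tau, sk_tau, -(sk_b1 - sk_b0)))
sk_B <- crossprod(sk_psi) / sk_n
sk_Ainv <- solve(sk_A)
sk_V <- sk_Ainv %*% sk_B %*% t(sk_Ainv) / sk_n

c(estimate = sk_tau, std_error = sqrt(sk_V[5, 5]))
```

Because the last row of `sk_A` links the ratio to the four arm means, the standard error of the ratio picks up the sampling error of every earlier stage, which is the point of stacking.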

## Example

The bundled `drlate_sim` data simulates a binary instrument `rsncode`, a
binary treatment `nvstat` with two-sided noncompliance, and outcomes on
three scales. The true complier effect on `lwage` is 0.5.

```{r}
library(drlate)
data(drlate_sim)

fit <- drlate(lwage ~ age + educ,      # outcome model
              nvstat ~ age + educ,     # treatment model
              rsncode ~ age + educ,    # instrument propensity score model
              data = drlate_sim)
summary(fit)
```

The three reported quantities mirror the Stata package's output: the causal estimate
(LATE), the intent-to-treat effect of the instrument on the outcome
(numerator), and the first-stage effect of the instrument on the treatment
(denominator), with the LATE formed as their ratio.
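
In symbols, writing \(\hat\Delta_Y\) for the intent-to-treat estimate and \(\hat\Delta_D\) for the first-stage estimate (notation introduced here for illustration, not taken from the package output), the relationship is:

\[
\hat\tau_{\mathrm{LATE}} = \frac{\hat\Delta_Y}{\hat\Delta_D}
\]

A first-stage estimate close to zero makes this ratio unstable, which is why first-stage strength matters for inference.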

```{r}
coef(fit)
confint(fit)
```

### Other estimators

```{r}
# AIPW with unnormalized moments
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
       data = drlate_sim, method = "aipw", normalized = FALSE)

# IPW: no covariates in the outcome/treatment equations
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
       data = drlate_sim, method = "ipw")

# Regression adjustment: no instrument covariates
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ 1,
       data = drlate_sim, method = "ra")
```
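
The `normalized` switch is assumed here to correspond to the usual distinction between Horvitz-Thompson and Hájek weighting. As a hedged illustration in standard notation (the symbols are introduced here and are not package output), with \(\hat p(X_i)\) the estimated instrument propensity score, the \(Z_i = 1\) arm mean entering the intent-to-treat numerator can be written either way:

\[
\hat\mu_{1}^{\text{unnorm}} = \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i Y_i}{\hat p(X_i)},
\qquad
\hat\mu_{1}^{\text{norm}} = \frac{\sum_{i=1}^{n} Z_i Y_i / \hat p(X_i)}{\sum_{i=1}^{n} Z_i / \hat p(X_i)}
\]

The normalized form rescales the weights to sum to one within the instrument arm. Analogous expressions hold for the \(Z_i = 0\) arm with \(1 - \hat p(X_i)\), and for the treatment \(D_i\) in the denominator of the LATE.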

### Abadie-kappa weighting estimators

The kappa methods are pure weighting estimators — covariates enter only
through the instrument propensity score, so the outcome and treatment
formulas are intercept-only. The printed output shows each estimator's
`kappalate` name:

```{r}
# Normalized Abadie kappa (kappalate tau_a,10); reports the LATE only,
# since the estimator is a difference of two ratios
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
       data = drlate_sim, method = "kappa10")

# Unnormalized Abadie kappa (tau_a); Fieller sets available
fit_k <- drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age + educ,
                data = drlate_sim, method = "kappa")
confint(fit_k, method = "fieller")
```
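
For intuition, the kappa weights can be computed by hand. The sketch below uses base R on simulated data and follows the standard definitions of Abadie's (2003) kappa and of the ratio estimators built from it. It is not the package's implementation, it produces point estimates only, and every object name (prefixed `kp_`) is hypothetical.

```{r}
# Illustrative sketch only: Abadie-kappa weights by hand on simulated data.
# All object names are hypothetical; point estimates only, no inference.
set.seed(1)
kp_n <- 10000
kp_x <- rnorm(kp_n)
kp_z <- rbinom(kp_n, 1, plogis(0.3 * kp_x))              # instrument
kp_type <- sample(c("complier", "always", "never"), kp_n,
                  replace = TRUE, prob = c(0.5, 0.2, 0.3))
kp_d <- as.integer(kp_type == "always" |
                   (kp_type == "complier" & kp_z == 1))  # treatment
kp_y <- 1 + 0.5 * kp_d + kp_x + rnorm(kp_n)              # true effect 0.5

# Instrument propensity score by logistic regression
kp_p <- fitted(glm(kp_z ~ kp_x, family = binomial()))

# Kappa weights; each has mean equal to the complier share in expectation
kp_k1 <- kp_d * (kp_z - kp_p) / (kp_p * (1 - kp_p))
kp_k0 <- (1 - kp_d) * (kp_p - kp_z) / (kp_p * (1 - kp_p))
kp_k  <- 1 - kp_d * (1 - kp_z) / (1 - kp_p) - (1 - kp_d) * kp_z / kp_p

# Common numerator: the weighted intent-to-treat effect on the outcome
kp_num <- mean(kp_y * (kp_z - kp_p) / (kp_p * (1 - kp_p)))

kp_est <- c(
  tau_a    = kp_num / mean(kp_k),                 # ratio form
  tau_a_0  = kp_num / mean(kp_k0),                # ratio form
  tau_a_10 = mean(kp_k1 * kp_y) / mean(kp_k1) -   # difference of two ratios
             mean(kp_k0 * kp_y) / mean(kp_k0)
)
round(kp_est, 3)
```

The first two estimators share one numerator and differ only in which kappa estimates the complier share, whereas the last one normalizes the treated and untreated complier means separately.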

### LATT, other model families, and IPT

```{r}
# LATT with an inverse-probability-tilted instrument propensity score
drlate(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
       data = drlate_sim, estimand = "latt", ivmodel = "ipt")

# Poisson outcome model for the positive wage level
drlate(kwage ~ age + educ, nvstat ~ age + educ, rsncode ~ 1,
       data = drlate_sim, method = "ra", omodel = "poisson")
```

### Clustered standard errors and weights

```{r}
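# Cluster-robust standard errors; `educ` serves as the cluster identifier
# purely for illustration
drlate(lwage ~ age, nvstat ~ age, rsncode ~ age, data = drlate_sim,
       cluster = drlate_sim$educ)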
```

## Diagnostics

`plot()` provides the standard design checks — propensity-score overlap,
covariate balance before/after weighting (the love plot), and the implied
weight distributions; `balance()` returns the standardized mean
differences as a data frame:

```{r diag-balance}
fit <- drlate(lwage ~ age + educ, nvstat ~ age + educ,
              rsncode ~ age + educ, data = drlate_sim)
plot(fit, type = "balance")
balance(fit)
```
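
As a point of reference for reading the balance table, a standardized mean difference can be computed by hand. The sketch below, in base R on simulated data, compares the raw difference with the inverse-probability-weighted one. The pooled-standard-deviation scaling and all names (`bl_smd()` and the `bl_` objects) are assumptions of this illustration, and `balance()` may use a different convention.

```{r}
# Illustrative sketch only: standardized mean difference (SMD) of one
# covariate across instrument arms, before and after weighting.
# `bl_smd()` and all `bl_*` objects are hypothetical names.
set.seed(2)
bl_n <- 5000
bl_x <- rnorm(bl_n)
bl_z <- rbinom(bl_n, 1, plogis(0.8 * bl_x))   # instrument depends on x
bl_p <- fitted(glm(bl_z ~ bl_x, family = binomial()))

bl_smd <- function(x, z, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[z == 1], w[z == 1])
  m0 <- weighted.mean(x[z == 0], w[z == 0])
  # Scale by the unweighted pooled SD so both SMDs share one yardstick
  (m1 - m0) / sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)
}

# Inverse-probability weights for each instrument arm
bl_w <- ifelse(bl_z == 1, 1 / bl_p, 1 / (1 - bl_p))
c(unweighted = bl_smd(bl_x, bl_z),
  weighted   = bl_smd(bl_x, bl_z, bl_w))
```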

```{r diag-overlap}
plot(fit, type = "overlap")
```

`complier_means()` profiles how the compliers differ from the population
(weighting by Abadie's kappa), and `balance_test()` runs the
Imai--Ratkovic overidentification test of whether the propensity-score
model balances the covariates. Both diagnostics mirror the
postestimation suite of Stata's `lateffects` command:

```{r diag-postest}
complier_means(fit)
balance_test(fit)
```
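
The kappa weighting behind complier profiling rests on a result of Abadie (2003): for a function \(g\) of the covariates with finite mean, the complier mean is identified by a kappa-weighted average. In notation introduced here for illustration, with \(\kappa\) denoting Abadie's weight:

\[
\mathrm{E}[g(X) \mid \text{complier}] = \frac{\mathrm{E}[\kappa \, g(X)]}{\mathrm{E}[\kappa]}
\]

Whether `complier_means()` uses exactly this sample analogue or a normalized variant is not stated here, so the display should be read as the underlying identification result rather than as a description of the implementation.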

## Inference beyond the default sandwich

Every printout reports the first-stage z statistic (z² ≈ F for a single
binary instrument) and flags the instrument as weak when F falls below 10.
Beyond the default sandwich, the package adds two inference tools:

```{r}
# Weak-instrument-robust Fieller confidence set (may be unbounded when
# the first stage is weak -- that is the honest answer)
confint(fit, method = "fieller")

# Nonparametric bootstrap (percentile CIs; clusters resampled whole
# when `cluster` is supplied)
fit_b <- drlate(lwage ~ age + educ, nvstat ~ age + educ,
                rsncode ~ age + educ, data = drlate_sim,
                vcov = "bootstrap", boot_reps = 199, boot_seed = 1)
confint(fit_b)
```
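
The logic of a Fieller set can be sketched in a few lines of base R. Given a numerator estimate, a denominator estimate, and their variances and covariance, the set collects every ratio value that a Wald test of `numerator - ratio * denominator = 0` does not reject, which reduces to a quadratic inequality. The helper `fieller_sketch()` and its arguments are hypothetical and written for illustration; the sketch does not reproduce how `confint(method = "fieller")` is implemented.

```{r}
# Illustrative sketch only: Fieller confidence set for a ratio a / b.
# `fieller_sketch()` is a hypothetical helper, not part of the package.
fieller_sketch <- function(a, b, vaa, vbb, vab = 0, level = 0.95) {
  c2 <- qchisq(level, df = 1)
  # Solve (a - t*b)^2 <= c2 * (vaa - 2*t*vab + t^2*vbb) for t, i.e.
  # q2 * t^2 - 2 * q1 * t + q0 <= 0
  q2 <- b^2 - c2 * vbb
  q1 <- a * b - c2 * vab
  q0 <- a^2 - c2 * vaa
  disc <- q1^2 - q2 * q0
  if (q2 > 0) {
    # Denominator significantly different from zero: a bounded interval
    r <- (q1 + c(-1, 1) * sqrt(disc)) / q2
    list(type = "bounded", lower = r[1], upper = r[2])
  } else if (q2 < 0 && disc > 0) {
    # Weak denominator: everything outside the gap between the two roots
    r <- sort((q1 + c(-1, 1) * sqrt(disc)) / q2)
    list(type = "two unbounded rays", gap_from = r[1], gap_to = r[2])
  } else {
    # No real roots (the knife-edge q2 == 0 case is lumped in for brevity)
    list(type = "whole real line")
  }
}

fieller_sketch(a = 0.25, b = 0.50, vaa = 0.0025, vbb = 0.0004)  # strong first stage
fieller_sketch(a = 0.25, b = 0.02, vaa = 0.0025, vbb = 0.0004)  # weak first stage
```

The set is bounded exactly when the denominator is itself significantly different from zero at the chosen level, which is why a weak first stage can produce an unbounded set.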

## The DR Hausman test of unconfoundedness

Under one-sided noncompliance (nobody takes the treatment without the
instrument), the instrument-based LATT equals the unconfoundedness-based
ATT if treatment assignment is unconfounded given the covariates.
Section 5 of the 2022 paper turns this equality into a heterogeneity-robust
Hausman test, implemented here; the Stata package does not provide it:

```{r}
d_os <- drlate_sim
d_os$nvstat[d_os$rsncode == 0] <- 0L
dr_hausman(lwage ~ age + educ, nvstat ~ age + educ, rsncode ~ age + educ,
           data = d_os)
```

The simulated treatment is confounded by construction, and the test
rejects.
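
The mechanics of such a test can be illustrated generically: a Hausman-type statistic contrasts two estimates of the same quantity and scales the squared difference by the variance of the difference, which requires their joint covariance matrix. The base R sketch below uses made-up numbers and a hypothetical helper, `hausman_sketch()`. It is assumed to mirror the general form only, not the exact statistic or output of `dr_hausman()`.

```{r}
# Illustrative sketch only: generic Hausman-type contrast of two estimates.
# `hausman_sketch()` and the numbers below are hypothetical.
hausman_sketch <- function(est_iv, est_uc, V) {
  # V is the 2 x 2 joint covariance matrix of (est_iv, est_uc)
  delta <- est_iv - est_uc
  var_delta <- V[1, 1] + V[2, 2] - 2 * V[1, 2]
  stat <- delta^2 / var_delta
  c(difference = delta, chi2 = stat,
    p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Made-up inputs: an instrument-based LATT, an unconfoundedness-based ATT,
# and their joint covariance
hausman_sketch(est_iv = 0.50, est_uc = 0.80,
               V = matrix(c(0.010, 0.002,
                            0.002, 0.010), nrow = 2))
```

The covariance term matters: ignoring it would misstate the variance of the difference whenever the two estimators are computed from the same sample.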

## Comparing estimators

`drlate_compare()` reports the estimators side by side for one
specification, and `plot()` draws the comparison as a dot-whisker plot:

```{r compare-plot}
cmp <- drlate_compare(lwage ~ age + educ, nvstat ~ age + educ,
                      rsncode ~ age + educ, data = drlate_sim)
cmp
plot(cmp)
```

## Replicating the Stata examples

The Stata help file's examples use a public extract from the Survey of
Income and Program Participation (SIPP). The equivalent R calls are:

```{r, eval = FALSE}
sipp <- haven::read_dta("https://people.brandeis.edu/~tslocz/sipp.dta")
sipp <- subset(as.data.frame(sipp),
               !is.na(kwage) & !is.na(educ) & rsncode != 999)
sipp$lwage <- log(sipp$kwage)

# Stata: drlate (lwage age_5) (nvstat age_5) (rsncode age_5)
drlate(lwage ~ age_5, nvstat ~ age_5, rsncode ~ age_5, data = sipp)

# Stata: drlate (lwage age_5) (nvstat age_5) (rsncode age_5, ipt), latt
drlate(lwage ~ age_5, nvstat ~ age_5, rsncode ~ age_5, data = sipp,
       ivmodel = "ipt", estimand = "latt")

# Stata: kappalate lwage (nvstat = rsncode) age_5, zmodel(logit) which(all)
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age_5, data = sipp,
       method = "kappa")     # tau_a; likewise "kappa0", "kappa10",
                             # and method = "ipw" for tau_u / tau_a,1

# Stata: kappalate lwage (nvstat = rsncode) age_5, zmodel(probit)
drlate(lwage ~ 1, nvstat ~ 1, rsncode ~ age_5, data = sipp,
       method = "kappa", ivmodel = "probit")
```

The package's test suite verifies numerical equivalence of estimates and
standard errors against fixtures generated by both Stata commands on this
dataset (see `inst/stata/make-fixtures.do` and
`inst/stata/make-kappalate-fixtures.do`).

## Citation

If you use `drlate` in your research, please cite the R package, the
methodological paper for the estimators you use, and the original Stata
module (see `citation("drlate")` for BibTeX entries):

> Venkitasubramanian, K. (2026). drlate: Doubly Robust Estimation of the
> Local Average Treatment Effect in R. R package version
> `r as.character(packageVersion("drlate"))`.
> https://github.com/kvenkita/drlate

> Słoczyński, T., Uysal, S. D., & Wooldridge, J. M. (2025). Abadie's Kappa
> and Weighting Estimators of the Local Average Treatment Effect.
> *Journal of Business & Economic Statistics* 43(1), 164–177.

> Uysal, D., Słoczyński, T., & Wooldridge, J. M. (2026). DRLATE: Stata
> module to perform doubly robust estimation of the local average
> treatment effect (LATE) and the local average treatment effect on the
> treated (LATT). Statistical Software Components S459708, Boston College
> Department of Economics.

## References

* Słoczyński, T., S. D. Uysal, and J. M. Wooldridge (2022). "Doubly Robust
  Estimation of Local Average Treatment Effects Using Inverse Probability
  Weighted Regression Adjustment." arXiv:2208.01300.
* Słoczyński, T., S. D. Uysal, and J. M. Wooldridge (2025). "Abadie's
  Kappa and Weighting Estimators of the Local Average Treatment Effect."
  *Journal of Business & Economic Statistics* 43(1), 164–177.
* Abadie, A. (2003). "Semiparametric Instrumental Variable Estimation of
  Treatment Response Models." *Journal of Econometrics* 113(2), 231–263.
* Donald, S. G., Y.-C. Hsu, and R. P. Lieli (2014). "Testing the
  Unconfoundedness Assumption via Inverse Probability Weighted Estimators
  of (L)ATT." *Journal of Business & Economic Statistics* 32(3), 395–415.
* Fieller, E. C. (1954). "Some Problems in Interval Estimation." *Journal
  of the Royal Statistical Society: Series B* 16(2), 175–185.
* Graham, B. S., C. C. de Xavier Pinto, and D. Egel (2012). "Inverse
  Probability Tilting for Moment Condition Models with Missing Data."
  *Review of Economic Studies* 79(3), 1053–1079.
* Imai, K., and M. Ratkovic (2014). "Covariate Balancing Propensity Score."
  *Journal of the Royal Statistical Society: Series B* 76(1), 243–263.