
Exponential Tilting Theory

Overview

This vignette explains the matrix-vectorized implementation of the exponential tilting (EXPTILT) estimator used in this package. The exposition focuses on concepts and linear-algebraic operations used in the implementation (functions such as generate_conditional_density_matrix, generate_C_matrix, generate_Odds, and the vectorized s_function), and mirrors the statistical equations that underlie the algorithm (see Riddles et al. (2016) for the theoretical background).

The method operates on a dataset split into respondents and non-respondents. Let \(n = n_0 + n_1\), where \(n_1\) is the number of respondents (observed \(Y\)) and \(n_0\) is the number of non-respondents (missing \(Y\)). The algorithm always treats these two groups separately: the conditional-density model is fit on respondents and used to provide the imputation support for non-respondents; the missingness model is fit using covariates that include the candidate \(y\) values.
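For concreteness, a minimal R sketch of this split, assuming a data frame df whose outcome column y is NA exactly for the non-respondents (the object names are illustrative):

```r
# Illustrative split into respondents and non-respondents; assumes a data frame
# `df` whose outcome column `y` is NA exactly for the non-respondents.
delta <- !is.na(df$y)                          # response indicator (TRUE = respondent)
respondents    <- df[delta, , drop = FALSE]    # n_1 rows, y observed
nonrespondents <- df[!delta, , drop = FALSE]   # n_0 rows, y missing
n1 <- nrow(respondents)
n0 <- nrow(nonrespondents)
stopifnot(n0 + n1 == nrow(df))                 # n = n_0 + n_1
```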

We emphasize two disjoint sets of covariates throughout; the outcome-model and missingness-model covariates are required to have empty intersection: \[\mathcal{X}_{\text{outcome}} \cap \mathcal{X}_{\text{missingness}} = \varnothing.\]

Distinguishing these sets clearly in notation and code is crucial: the conditional density model is fit using covariates_for_outcome on respondents only, while the missingness model uses covariates_for_missingness and (importantly) takes \(y\) as an additional predictor when forming \(\pi(\cdot;\phi)\).
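A hedged illustration of keeping the two sets separate in code (the column names x1 and x2 are hypothetical; the real sets are supplied by the user):

```r
# Hypothetical column names; the actual sets are supplied by the user.
covariates_for_outcome     <- c("x1")
covariates_for_missingness <- c("x2")

# Enforce the empty-intersection requirement on the two covariate sets.
stopifnot(length(intersect(covariates_for_outcome,
                           covariates_for_missingness)) == 0)
```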

Notation and main objects

Let:

  • \(\delta_r \in \{0,1\}\) be the response indicator of unit \(r\) (\(\delta_r = 1\) for respondents, \(\delta_r = 0\) for non-respondents);
  • \(y_j\), \(j = 1,\dots,n_1\), be the outcome values observed on the respondents;
  • \(x^{\text{out}}\) (covariates_for_outcome) and \(x^{\text{miss}}\) (covariates_for_missingness) denote the covariates of the outcome model and of the missingness model, respectively;
  • \(f_1(y \mid x; \gamma)\) be the conditional density of \(Y\) given the outcome covariates among respondents, with estimate \(\hat\gamma\);
  • \(\pi(x^{\text{miss}}, y; \phi)\) be the response probability of the missingness model.

We build three central matrices:

  1. Conditional-density matrix \(F\) (denoted in code as f_matrix_nieobs):

    • Size: \(n_0 \times n_1\), where \(n_0\) is the number of non-respondents and \(n_1\) is the number of respondent \(y\) values used as the imputation support (one column per respondent).
    • Entries: \[F_{ij} = f_1(y_j \mid x^{\text{un}}_i; \hat\gamma)\]
      where \(\hat\gamma\) is the (estimated) parameter vector of the conditional-density model fit on respondents. This corresponds to the empirical approximation in equation (12): \[\hat f_1(y_j) \propto \sum_{k:\,\delta_k=1} f_1(y_j \mid x_k; \hat\gamma).\]
  2. Column-normalizer vector \(C\) (denoted in code as C_matrix_nieobs):

    • Size: \(n_1 \times 1\) (a column vector).
    • Entries: the column-sums of the conditional densities evaluated at respondent covariates: \[C_j = C(y_j;\hat\gamma) = \sum_{k:\,\delta_k=1} f_1(y_j \mid x^{\text{obs}}_k;\hat\gamma).\] Conceptually this is the denominator that appears when fractional weights are formed (see below).
  3. Odds matrix \(O(\phi)\) (constructed by generate_Odds):

    • Size: \(n_0 \times n_1\).
    • Entries: the odds of non-response evaluated at each non-respondent's missingness covariates and each candidate respondent outcome: \[O_{ij}(\phi) = \frac{1-\pi(x^{\text{miss}}_i, y_j;\phi)}{\pi(x^{\text{miss}}_i, y_j;\phi)}.\]

Using these objects we form a non-normalized weight matrix \[U_{ij}(\phi) = O_{ij}(\phi) \cdot F_{ij}\] and then a normalized fractional-weight matrix \[W_{ij}(\phi) = \frac{U_{ij}(\phi)}{C_j} = \frac{O_{ij}(\phi)\,f_1(y_j\mid x^{\text{un}}_i;\hat\gamma)}{\sum_{k:\,\delta_k=1} f_1(y_j\mid x^{\text{obs}}_k;\hat\gamma)}.\]
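A minimal R sketch of this normalization, assuming F_mat and O_mat (both \(n_0\times n_1\)) and the length-\(n_1\) vector C_vec have already been built (the names are illustrative, not the package's internal ones):

```r
# F_mat: n0 x n1 conditional densities, O_mat: n0 x n1 odds, C_vec: length-n1
# column normalizers (illustrative names).
U <- O_mat * F_mat                                   # elementwise product, n0 x n1
W <- sweep(U, MARGIN = 2, STATS = C_vec, FUN = "/")  # divide column j by C_j
```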

These \(W_{ij}\) match the weights appearing in the theoretical mean score approximation (equation (15) in the notes): they are the fractional contribution of each imputed \((i,j)\) pair to the conditional expectation for unit \(i\).

The vectorized score S_2 and matrix algebra

Recall the mean-score approximation that leads to the estimating equation used to solve for \(\phi\) (cf. equation (13) in the notes): \[ S_2(\phi; \hat\phi^{(t)}, \hat\gamma) = \sum_{r=1}^n \Big[ \delta_r s(\phi;\delta_r, x^{\text{miss}}_r, x^{\text{out}}_r, y_r) + (1-\delta_r)\widetilde{E}_0\{ s(\phi;\delta, x^{\text{miss}}_r, x^{\text{out}}_r, Y) \mid x^{\text{out}}_r;\hat\phi^{(t)},\hat\gamma\} \Big] = 0. \]

It is convenient to decompose this as the sum of observed and unobserved contributions: \[ S_2(\phi) = S_{\text{obs}}(\phi) + S_{\text{un}}(\phi), \] where \[ S_{\text{obs}}(\phi) = \sum_{r:\,\delta_r=1} s(\phi;\delta=1, x^{\text{miss}}_r, x^{\text{out}}_r, y_r) \] is the score contribution from respondents, and the missing-unit contribution is approximated by the discrete-support expectation \[ S_{\text{un}}(\phi) \approx \sum_{i:\,\delta_i=0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi;\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j). \] The remainder of this section explains how the double sum in \(S_{\text{un}}(\phi)\) is computed via matrix operations.

The crucial observation for vectorization is that the inner conditional expectation for a non-respondent \(i\) is approximated by a weighted finite-sum over the respondent support \(\{y_j\}\): \[ \widetilde{E}_0\{ s(\phi;\delta, x^{\text{miss}}_i, x^{\text{out}}_i, Y) \mid x^{\text{out}}_i\} \approx \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi;\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j). \]

Stacking the non-respondent expectations across all non-respondents turns the double sum into matrix operations. Let \(S^{(m)}(\phi)\) be the \(n_0 \times n_1\) matrix whose \((i,j)\) entry is the \(m\)-th component of \(s(\phi;\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j)\). Then the \(m\)-th component of \(S_{\text{un}}(\phi)\) is the sum of all entries of the elementwise product \(W(\phi)\circ S^{(m)}(\phi)\).

In the implementation this reduces to a sequence of dense matrix operations and row/column sums rather than explicit loops over the expanded index set of length \(n_0\times n_1\). This yields large speed and memory benefits for real datasets.
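To make the reduction concrete, here is a hedged sketch for a single score component \(m\), where S_m is the \(n_0\times n_1\) matrix of score values for the non-respondent/candidate pairs and W is the fractional-weight matrix from the earlier sketch:

```r
# S_m: n0 x n1 matrix holding the m-th component of s(phi; delta = 0, ...) for
# every (non-respondent i, candidate y_j) pair; W as in the previous sketch.

# Naive double loop over the expanded n0 x n1 index set:
S_un_m_loop <- 0
for (i in seq_len(n0)) {
  for (j in seq_len(n1)) {
    S_un_m_loop <- S_un_m_loop + W[i, j] * S_m[i, j]
  }
}

# Equivalent vectorized reduction: one elementwise product and one sum.
S_un_m <- sum(W * S_m)
```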

Concise vectorized S_2 recipe (conceptual; a minimal R sketch follows the list):

  1. Build \(F\) (size \(n_0\times n_1\)) via generate_conditional_density_matrix(model).
  2. Build \(C\) (size \(n_1\)) via generate_C_matrix(model) by summing the conditional densities over respondents.
  3. For a candidate \(\phi\), compute \(O(\phi)\) (size \(n_0\times n_1\)) via generate_Odds(model, phi).
  4. Form \(W(\phi)=\dfrac{O(\phi)\circ F}{\mathbf{1}_{n_0} C^\top}\) (i.e. divide each column of \(O\circ F\) by the corresponding scalar \(C_j\)).
  5. Compute observed-score sum: \(S_{\text{obs}}(\phi)=\sum_{r:\,\delta_r=1} s(\phi;\delta=1,x^{\text{miss}}_r,x^{\text{out}}_r,y_r)\) (this is small: one score vector per respondent).
  6. Compute non-respondent expected-score: use \(W(\phi)\) and parameter-wise factor matrices derived from \(s(\cdot)\) to compute \[S_{\text{un}}(\phi)=\sum_{i=1}^{n_0}\sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi;\delta=0,x^{\text{miss}}_i,x^{\text{out}}_i,y_j)\] implemented via matrix multiplications and row/column sums.
  7. Return the total score \(S_2(\phi)=S_{\text{obs}}(\phi) + S_{\text{un}}(\phi)\).
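A compact R skeleton of this recipe, treating generate_conditional_density_matrix(), generate_C_matrix(), generate_Odds() and the vectorized s_function as black boxes with the interfaces described above (the exact signatures used here, and the model object, are assumptions for illustration, not the package's documented API):

```r
# Sketch of the vectorized estimating function S_2(phi).  The signatures of
# s_function() and the `model` object are assumptions made for illustration.
make_S2 <- function(model) {
  F_mat <- generate_conditional_density_matrix(model)  # n0 x n1, fixed in phi
  C_vec <- generate_C_matrix(model)                    # length n1, fixed in phi

  function(phi) {
    O_mat <- generate_Odds(model, phi)                 # n0 x n1
    W     <- sweep(O_mat * F_mat, 2, C_vec, "/")       # fractional weights

    # Observed-respondent score (one p-vector per respondent, summed).
    S_obs <- s_function(model, phi, part = "respondents")

    # Assumed to return a list of p matrices (n0 x n1), one per parameter.
    S_mats <- s_function(model, phi, part = "nonrespondents")
    S_un   <- vapply(S_mats, function(S_m) sum(W * S_m), numeric(1))

    S_obs + S_un
  }
}
```

The closure returned by make_S2 recomputes only the \(\phi\)-dependent pieces, since \(F\) and \(C\) are fixed once \(\hat\gamma\) has been estimated.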

The root-finder then searches for \(\phi\) such that \(S_2(\phi)=0\). In the package this is implemented by repeatedly forming \(O(\phi)\) for the current candidate \(\phi\), computing \(W(\phi)\) and \(S_2(\phi)\), and letting nleqslv perform the iteration.
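Using the illustrative constructor from the previous sketch (model and phi_start are assumed inputs), the outer iteration is a single call to nleqslv:

```r
library(nleqslv)

S2_fun  <- make_S2(model)                       # `model` as in the sketch above (assumed)
fit     <- nleqslv(x = phi_start, fn = S2_fun)  # phi_start: user-chosen starting value
phi_hat <- fit$x                                # root of the estimating equations
```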

Mean estimation and survey weights

After the response-model parameters \(\hat\phi\) are obtained, the package reports a mean estimate of the target outcome using an inverse-probability reweighting form. Let respondents be indexed by \(j=1,\dots,n_1\); denote by \(\pi_j = \pi(x^{\text{miss}}_j,y_j;\hat\phi)\) the fitted response probability evaluated at respondent \(j\) (here we write \(x^{\text{miss}}_j\) for the missingness-model covariates evaluated at respondent \(j\)). With design weights \(w_j\) (by default \(w_j\equiv 1\) for non-survey data), the point estimator computed in the implementation is \[ \hat\mu = \frac{\sum_{j=1}^{n_1} w_j\, y_j / \pi_j}{\sum_{j=1}^{n_1} w_j / \pi_j}.\tag{mean-est} \]
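A direct R transcription of (mean-est), assuming vectors y, w and pi_hat of length \(n_1\) holding the respondent outcomes, design weights and fitted response probabilities (names illustrative):

```r
# y, w, pi_hat: outcome, design weight and fitted response probability for each
# of the n1 respondents (illustrative vector names).
mu_hat <- sum(w * y / pi_hat) / sum(w / pi_hat)   # IPW-type ratio estimator
```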

Notes on survey weights and \(f_1\) fitting: survey weights enter in two places, namely (i) the conditional-density estimation (if requested) and (ii) the final mean calculation, through the weighted IPW-type ratio in (mean-est).

Arguments passed to exptilt (summary)

The exptilt / exptilt.data.frame user-facing function accepts a number of arguments that control model specification and computation; among the most important are the outcome-model covariates (covariates_for_outcome), the missingness-model covariates (covariates_for_missingness), optional design/survey weights, and the solver stopping threshold (stopping_threshold).

These arguments appear in the exptilt.data.frame function signature and control how the matrices \(F\), \(C\), and \(O\) are built and how the solver is run.

Connection to the EM viewpoint

The EM-like update displayed in the theoretical notes (equation (14)) \[\hat\phi^{(t+1)} \leftarrow \text{solve } S_2(\phi \mid \hat\phi^{(t)},\hat\gamma)=0\] is exactly what the implementation achieves: for a fixed conditional-density estimate \(\hat\gamma\) and a current \(\hat\phi^{(t)}\), the expectations for the missing units are approximated by the discrete support on observed \(y_j\) and the resulting equation for \(\phi\) is solved (via root-finding). The heavy-lift is performed by the matrix calculus described earlier — constructing \(F\), \(C\), \(O\) and then computing \(W\) and multiplying by the parameter-wise score factors.

Stopping criterion (maximum-norm)

Practical optimization requires a convergence criterion. The implementation uses the maximum absolute component of the vector-valued score as a stopping rule. Concretely, if the solver produces a score vector \(S_2(\phi)\) with \[\max_{m=1,\dots,p} |S_{2,m}(\phi)| < \varepsilon\] for a user-specified stopping_threshold \(\varepsilon\), the algorithm treats this as converged. In the code this is used as an early exit inside the target function passed to the nonlinear solver: when the score’s sup-norm is below the threshold, a zero vector is returned to signal convergence and avoid further unnecessary computation.

This choice matches the intuition that the root-finder should stop when the estimating equations are (componentwise) negligibly small.
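A minimal sketch of this early exit, assuming S2_fun is the vector-valued estimating function (for instance, the closure from the earlier sketch) and stopping_threshold holds \(\varepsilon\):

```r
# Wraps, rather than replaces, the S_2 computation.
S2_with_early_exit <- function(phi) {
  score <- S2_fun(phi)                 # vector-valued estimating equation
  if (max(abs(score)) < stopping_threshold) {
    return(rep(0, length(score)))      # signal convergence to the solver
  }
  score
}
```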

Practical tutorial: from raw data to matrix operations (conceptual steps)

  1. Fit the conditional density \(f_1(y\mid x;\gamma)\) using respondents and covariates_for_outcome. This gives a function that evaluates \(f_1(y \mid x; \hat\gamma)\) for any \(y\) and any covariate row \(x\) (used for both respondents and non-respondents).

  2. Evaluate the conditional density at the Cartesian product of non-respondent covariates and observed respondent \(y\) values to form \(F\) (done by generate_conditional_density_matrix). This is the empirical imputation support. Think of rows as target non-respondents and columns as candidate respondent outcomes.

  3. Evaluate the same conditional density at respondent covariates to form the column-normalizer \(C\) (done by generate_C_matrix) — this is simply the column-sum of densities over respondents.

  4. For each trial value \(\phi\) of the response-model parameters, compute the odds matrix \(O(\phi)\) using the separable linear predictor and link function (implemented efficiently via outer() in code; a sketch of this construction appears after the list). Combine \(O\) with \(F\) and normalize columns by \(C\) to obtain \(W(\phi)\).

  5. Use the vectorized s_function to obtain parameter-specific factor matrices for the non-respondent-imputed scores; multiply (elementwise) by \(W(\phi)\) and reduce (row/column sums) to compute the non-respondent contribution to \(S_2(\phi)\).

  6. Add the observed-respondent score and use a root-finder (e.g. nleqslv) to find \(\phi\) with \(S_2(\phi)=0\). The solver may use the maximum-norm stopping threshold described above to exit early.
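For step 4, a hedged sketch of the outer()-based construction under the assumption of a logistic response model \(\pi(x,y;\phi) = \mathrm{plogis}(\phi_0 + x^\top\phi_x + \phi_y y)\), so that the non-response odds are \((1-\pi)/\pi = \exp(-\eta)\); the objects X_miss, y_obs and the \(\phi\) components are illustrative names:

```r
# Assumes a logistic response model pi(x, y; phi) = plogis(phi_0 + x' phi_x + phi_y * y),
# so the odds of NON-response are (1 - pi) / pi = exp(-eta).
# X_miss: missingness covariates of the n0 non-respondents; y_obs: the n1 respondent outcomes.
eta_rows <- drop(cbind(1, X_miss) %*% c(phi_0, phi_x))  # length n0: phi_0 + x_i' phi_x
eta      <- outer(eta_rows, phi_y * y_obs, "+")         # n0 x n1 separable linear predictor
O_mat    <- exp(-eta)                                   # odds matrix O(phi)
```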

Why the vectorization matters (practical remarks)

The double sum in \(S_{\text{un}}(\phi)\) ranges over \(n_0 \times n_1\) pairs. Computing it through dense matrix products, elementwise multiplication, and row/column sums avoids materializing this expanded index set explicitly, which keeps both the per-iteration cost of the root-finder and the memory footprint manageable for realistic sample sizes.

Closing notes and references

This vignette mapped the implementation functions to the math in the theoretical notes and showed how the EM-like mean-score step reduces to a small set of matrix operations. The implementation follows the ideas described in Riddles et al. (2016) for exponential tilting under NMAR: fit the conditional density on respondents, approximate the missing-data expectation by a finite sum over observed \(y\) values, and solve the resulting estimating equations for the missingness-model parameters.
