This vignette explains the matrix-vectorized implementation of the
exponential tilting (EXPTILT) estimator used in this package. The
exposition focuses on concepts and linear-algebraic operations used in
the implementation (functions such as
generate_conditional_density_matrix,
generate_C_matrix, generate_Odds, and the
vectorized s_function), and mirrors the statistical
equations that underlie the algorithm (see Riddles et al. (2016) for the
theoretical background).
The method operates on a dataset split into respondents and non-respondents. Let \[n = n_0 + n_1\] where \(n_1\) is the number of respondents (observed \(Y\)) and \(n_0\) is the number of non-respondents (missing \(Y\)). The algorithm always treats these two groups separately: the conditional-density model is fit on respondents and used to impute support for non-respondents; the missingness model is fit using covariates that include the candidate \(y\) values.
We emphasize two distinct sets of covariates throughout and require them to be disjoint: \[\mathcal{X}_{\text{outcome}} \cap \mathcal{X}_{\text{missingness}} = \varnothing.\]
Distinguishing these sets clearly in notation and code is crucial:
the conditional density model is fit using
covariates_for_outcome on respondents only, while the
missingness model uses covariates_for_missingness and
(importantly) takes \(y\) as an
additional predictor when forming \(\pi(\cdot;\phi)\).
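As a concrete illustration of this separation, here is a minimal Python sketch (the package itself is R; the function and parameter names below are ours, not the package's) of a logistic response probability \(\pi(x^{\text{miss}}, y; \phi)\) in which \(y\) enters the linear predictor alongside the missingness covariates:

```python
import numpy as np

def response_prob(x_miss, y, phi):
    """Logistic response probability pi(x_miss, y; phi).

    The linear predictor is phi[0] + phi[1]*x_miss + phi[2]*y, so the
    outcome y itself acts as an additional predictor. Illustrative only;
    the package's internal parametrization may differ.
    """
    eta = phi[0] + phi[1] * x_miss + phi[2] * y
    return 1.0 / (1.0 + np.exp(-eta))

# Example: one missingness covariate and one candidate outcome value.
phi = np.array([0.5, -1.0, 0.8])
pi_val = response_prob(x_miss=0.3, y=1.2, phi=phi)
```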
Let the respondents be indexed by \(j=1,\dots,n_1\) (with observed outcomes \(y_j\)) and the non-respondents by \(i=1,\dots,n_0\). We build three central matrices:

- Conditional-density matrix \(F\) (denoted in code as f_matrix_nieobs): the \(n_0\times n_1\) matrix with entries \(F_{ij} = f_1(y_j\mid x^{\text{un}}_i;\hat\gamma)\), i.e. the fitted conditional density of each candidate respondent outcome \(y_j\) evaluated at each non-respondent's outcome covariates.
- Column-normalizer vector \(C\) (denoted in code as C_matrix_nieobs): the length-\(n_1\) vector with entries \(C_j = \sum_{k:\,\delta_k=1} f_1(y_j\mid x^{\text{obs}}_k;\hat\gamma)\), the column sums of the same density evaluated over respondent covariates.
- Odds matrix \(O(\phi)\) (constructed by generate_Odds): the \(n_0\times n_1\) matrix of nonresponse odds \(O_{ij}(\phi) = \{1-\pi(x^{\text{miss}}_i, y_j;\phi)\}/\pi(x^{\text{miss}}_i, y_j;\phi)\); because the linear predictor is separable in \(i\) and \(j\), the code uses outer() to form the \(n_0\times n_1\) matrix efficiently.

Using these objects we form a non-normalized weight matrix \[U_{ij}(\phi) = O_{ij}(\phi) \cdot F_{ij}\] and then a normalized fractional-weight matrix \[W_{ij}(\phi) = \frac{U_{ij}(\phi)}{C_j} = \frac{O_{ij}(\phi)\,f_1(y_j\mid x^{\text{un}}_i;\hat\gamma)}{\sum_{k:\,\delta_k=1} f_1(y_j\mid x^{\text{obs}}_k;\hat\gamma)}.\]
These \(W_{ij}\) match the weights appearing in the theoretical mean score approximation (equation (15) in the notes): they are the fractional contribution of each imputed \((i,j)\) pair to the conditional expectation for unit \(i\).
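The construction of \(F\), \(C\), \(O(\phi)\), and \(W(\phi)\) can be sketched in a few lines of numpy (a toy illustration under assumed models: a normal conditional density for \(f_1\) and a logistic response model; all names are ours, not the package's):

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 4, 6                      # non-respondents, respondents
x_un = rng.normal(size=n0)         # outcome covariates, non-respondents
x_obs = rng.normal(size=n1)        # outcome covariates, respondents
y_obs = rng.normal(size=n1)        # observed outcomes (imputation support)
x_miss_un = rng.normal(size=n0)    # missingness covariates, non-respondents

gamma = (0.5, 1.0)                 # assumed fitted f1: y | x ~ N(g0 + g1*x, 1)
phi = np.array([0.2, -0.5, 0.3])   # candidate response-model parameters

def f1(y, x):
    """Normal conditional density f1(y | x; gamma) with unit variance."""
    mu = gamma[0] + gamma[1] * x
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

# F: (n0 x n1) density of candidate y_j given non-respondent covariates x_i.
F = f1(y_obs[None, :], x_un[:, None])
# C: column normalizer, summing the same density over respondent covariates.
C = f1(y_obs[None, :], x_obs[:, None]).sum(axis=0)
# O(phi): nonresponse odds (1 - pi) / pi for a logistic pi; the separable
# linear predictor is what lets the R code use outer() here.
eta = phi[0] + np.add.outer(phi[1] * x_miss_un, phi[2] * y_obs)
pi = 1.0 / (1.0 + np.exp(-eta))
O = (1.0 - pi) / pi
# Normalized fractional weights W_ij = O_ij * F_ij / C_j.
W = O * F / C
```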
Recall the mean-score approximation that leads to the estimating equation used to solve for \(\phi\) (cf. equation (13) in the notes): \[ S_2(\phi; \hat\phi^{(t)}, \hat\gamma) = \sum_{r=1}^n \Big[ \delta_r s(\phi;\delta_r, x^{\text{miss}}_r, x^{\text{out}}_r, y_r) + (1-\delta_r)\widetilde{E}_0\{ s(\phi;\delta, x^{\text{miss}}_r, x^{\text{out}}_r, Y) \mid x^{\text{out}}_r;\hat\phi^{(t)},\hat\gamma\} \Big] = 0. \]
It is convenient to decompose this as the sum of observed and unobserved contributions: \[ S_2(\phi) = S_{\text{obs}}(\phi) + S_{\text{un}}(\phi), \] where \[ S_{\text{obs}}(\phi) = \sum_{r:\,\delta_r=1} s(\phi;\delta=1, x^{\text{miss}}_r, x^{\text{out}}_r, y_r) \] is the score contribution from respondents, and the missing-unit contribution is approximated by the discrete-support expectation \[ S_{\text{un}}(\phi) \approx \sum_{i:\,\delta_i=0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi;\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j). \] The remainder of this section explains how the double sum in \(S_{\text{un}}(\phi)\) is computed via matrix operations.
The crucial observation for vectorization is that the inner conditional expectation for a non-respondent \(i\) is approximated by a weighted finite-sum over the respondent support \(\{y_j\}\): \[ \widetilde{E}_0\{ s(\phi;\delta, x^{\text{miss}}_i, x^{\text{out}}_i, Y) \mid x^{\text{out}}_i\} \approx \sum_{j=1}^{n_1} W_{ij}(\phi)\, s(\phi;\delta=0, x^{\text{miss}}_i, x^{\text{out}}_i, y_j). \]
Stacking the non-respondent expectations across all non-respondents yields a single matrix operation.
For each pair \((i,j)\) evaluate
the vector-valued score s at \((\delta=0,
x^{\text{miss}}_i, x^{\text{out}}_i, y_j)\). Collect these
quickly by exploiting the algebraic factorization of the score (see the
s_function implementation): many parameter-specific
components are separable in \(i\) and
\(j\) which allows creation of
low-memory representations.
Denote by \(S^{(0)}_{ij}\) the
p-dimensional score vector at \((i,j)\). Organize these so that for each
parameter index \(m\) we have an \(n_0\times n_1\) matrix of values \([S^{(0)}_{\bullet\bullet}]_{m}\)
(parameter-wise maps). Then the non-respondent contribution to the
overall score vector is \[
\sum_{i=1}^{n_0} \sum_{j=1}^{n_1} W_{ij}(\phi)\, S^{(0)}_{ij} = \Big[\; \mathbf{1}_{n_0}^\top \big( W \circ [S^{(0)}]_{1} \big)\, \mathbf{1}_{n_1}\;,\ \ldots\ ,\; \mathbf{1}_{n_0}^\top \big( W \circ [S^{(0)}]_{p} \big)\, \mathbf{1}_{n_1} \;\Big]^\top
\] where \(\circ\) denotes elementwise (Hadamard) multiplication and pre- and post-multiplying by vectors of ones carries out the double summation; in code this reduces to elementwise products followed by row- or column-sums, chosen to match each parameter's factorization. In concrete computational terms:
- For parameter components that factor per row (i.e. a function of \(x_i\) times a factor depending on \(y_j\)), we take the elementwise product of \(W\) with the factor matrix, row-sum across \(j\) to get an \(n_0\times 1\) contribution per non-respondent, and finally sum across \(i\).
- For intercept-like or column-wise components, a column-sum followed by a weighted multiplication suffices.
In the implementation this reduces to a sequence of dense matrix operations and row/column sums rather than explicit loops over the expanded index set of length \(n_0\times n_1\). This yields large speed and memory benefits for real datasets.
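The equivalence between the explicit double loop over \((i,j)\) and the elementwise-product-plus-reduction form can be checked directly (a numpy sketch; the package code is R, and the \(p\) parameter-wise score maps here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n0, n1, p = 5, 7, 3
W = rng.random((n0, n1))            # fractional weights W(phi)
S = rng.normal(size=(p, n0, n1))    # p parameter-wise (n0 x n1) score maps

# Naive double loop over the expanded (i, j) index set.
loop = np.zeros(p)
for i in range(n0):
    for j in range(n1):
        loop += W[i, j] * S[:, i, j]

# Vectorized form: elementwise product, then a full reduction
# (row sums followed by a sum over rows) per parameter.
vectorized = (W[None, :, :] * S).sum(axis=(1, 2))
```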
Concise vectorized \(S_2\) recipe (conceptual):

1. Build \(F\) with generate_conditional_density_matrix(model).
2. Build \(C\) with generate_C_matrix(model) by summing the conditional densities over respondents.
3. Build \(O(\phi)\) with generate_Odds(model, \(\phi\)).

The root-finder then searches for \(\phi\) such that \(S_2(\phi)=0\). In the package this is implemented by repeatedly forming \(O(\phi)\) for the current candidate \(\phi\), computing \(W(\phi)\) and \(S_2(\phi)\), and letting nleqslv perform the iteration.
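As a sketch of this root-finding loop (in Python for illustration; the package delegates to nleqslv in R), a small Newton iteration on a toy estimating equation plays the role of the solver:

```python
import numpy as np

def S2(phi):
    """Toy vector-valued estimating equation standing in for S_2(phi).

    In the package, each evaluation rebuilds O(phi), then W(phi), and
    reduces against the score factor matrices; here a simple smooth
    system keeps the sketch short.
    """
    return np.array([phi[0] ** 3 - 1.0, phi[0] + phi[1] - 3.0])

def solve_root(fn, phi0, tol=1e-10, max_iter=50):
    """Newton iteration with a finite-difference Jacobian, standing in
    for the nleqslv call used by the package."""
    phi = phi0.astype(float)
    for _ in range(max_iter):
        f = fn(phi)
        if np.max(np.abs(f)) < tol:
            break
        eps = 1e-7
        J = np.column_stack([
            (fn(phi + eps * e) - f) / eps
            for e in np.eye(phi.size)
        ])
        phi = phi - np.linalg.solve(J, f)
    return phi

phi_hat = solve_root(S2, np.array([0.5, 0.5]))
```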
After the response-model parameters \(\hat\phi\) are obtained the package reports a mean estimate of the target outcome using an inverse-probability reweighting form. Let respondents be indexed by \(j=1,\dots,n_1\); denote by \(\pi_j = \pi(x^{\text{miss}}_j,y_j;\hat\phi)\) the fitted response probabilities evaluated at each respondent (here we write \(x^{\text{miss}}_j\) for the covariates used in the missingness model evaluated at respondent \(j\)). With design weights \(w_j\) (by default \(w_j\equiv 1\) for non-survey data) the point estimator computed in the implementation is \[ \hat\mu = \frac{\sum_{j=1}^{n_1} w_j\, y_j / \pi_j}{\sum_{j=1}^{n_1} w_j / \pi_j}.\tag{mean-est} \]
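The estimator in (mean-est) is a one-line computation; a numpy sketch with assumed toy numbers (not package output):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])      # respondent outcomes y_j
pi = np.array([0.5, 0.8, 0.4])     # fitted response probabilities pi_j
w = np.ones_like(y)                # design weights, default 1

# Weighted IPW-type ratio estimator (mean-est).
mu_hat = np.sum(w * y / pi) / np.sum(w / pi)
```

Because the estimator is a ratio of weighted sums, it is a convex combination of the observed \(y_j\) and therefore always lies within their range.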
Notes on survey weights and \(f_1\) fitting:

- The conditional density \(f_1\) can be fit with weights when design_weights or a survey design is supplied.
- Weights are stored in model$design_weights and default to 1 for non-survey use.

In short: survey weights enter in two places: (i) the conditional-density estimation (if requested) and (ii) the final mean calculation through the weighted IPW-type ratio in (mean-est).
exptilt (summary)

The exptilt / exptilt.data.frame user-facing function accepts a number of arguments that control model specification and computation; the most important are:

- data: a data.frame with the outcome and covariates.
- formula: a partitioned formula of the form y ~ aux1 + aux2 | miss1 + miss2, where the left part (before |) lists covariates_for_outcome and the right part lists covariates_for_missingness (the package helper splits these automatically).
- auxiliary_means: (optional) population or target means for auxiliary variables used for scaling.
- standardize: logical; whether to standardize features before fitting.
- prob_model_type: character, either "logit" or "probit", selecting the response-probability family.
- y_dens: choice of conditional-density family; "auto", "normal", "lognormal", or "exponential".
- variance_method: "delta" or "bootstrap" for variance estimation.
- bootstrap_reps: number of bootstrap replications when the bootstrap is used.
- control: list of control parameters forwarded to the nonlinear solver (nleqslv).
- stopping_threshold: numeric; the sup-norm threshold for early stopping of the score (see above).
- on_failure: behavior on failure ("return" or "error").
- supress_warnings: logical; silences certain warnings.
- design_weights: optional vector of respondent design weights (or full-sample weights, which are subset internally).
- survey_design: optional survey.design object; when provided, some internal logic uses the survey path.
- trace_level: integer controlling verbosity.
- sample_size: integer for stratified subsampling used when data are large.
- outcome_label, user_formula: utility arguments used for presentation and bookkeeping.

These arguments appear in the exptilt.data.frame function signature and control how the matrices \(F\), \(C\), and \(O\) are built and how the solver is run.
The EM-like update displayed in the theoretical notes (equation (14)) \[\hat\phi^{(t+1)} \leftarrow \text{solve } S_2(\phi \mid \hat\phi^{(t)},\hat\gamma)=0\] is exactly what the implementation achieves: for a fixed conditional-density estimate \(\hat\gamma\) and a current \(\hat\phi^{(t)}\), the expectations for the missing units are approximated by the discrete support on observed \(y_j\) and the resulting equation for \(\phi\) is solved (via root-finding). The heavy-lift is performed by the matrix calculus described earlier — constructing \(F\), \(C\), \(O\) and then computing \(W\) and multiplying by the parameter-wise score factors.
Practical optimization requires a convergence criterion. The
implementation uses the maximum absolute component of the vector-valued
score as a stopping rule. Concretely, if the solver produces a score
vector \(S_2(\phi)\) with \[\max_{m=1,\dots,p} |S_{2,m}(\phi)| < \varepsilon\] for a user-specified threshold \(\varepsilon\) (the stopping_threshold argument), the algorithm treats this as converged. In the code this is used as an early exit inside the
target-function passed to the nonlinear solver: when the score’s
sup-norm is below the threshold, a zero vector is returned to signal
convergence and avoid further unnecessary computations.
This choice matches the intuition that the root-finder should stop when the estimating equations are (componentwise) negligibly small.
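This early-exit logic can be sketched as a wrapper around the score function handed to the solver (a Python illustration; function names here are ours):

```python
import numpy as np

def make_target(score_fn, stopping_threshold):
    """Wrap a score function so the solver sees an exact zero vector
    once the sup-norm of the score drops below the threshold."""
    def target(phi):
        s = score_fn(phi)
        if np.max(np.abs(s)) < stopping_threshold:
            # Signal convergence: a zero vector stops further iterations.
            return np.zeros_like(s)
        return s
    return target

# Toy score phi - 1 with a sup-norm threshold of 1e-4.
target = make_target(lambda phi: phi - 1.0, stopping_threshold=1e-4)
```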
1. Fit the conditional density \(f_1(y\mid x;\gamma)\) using respondents and covariates_for_outcome. This gives a function that evaluates \(f_1(y, x)\) for any \(y\) and any covariate row \(x\) (used for both respondents and non-respondents).
2. Evaluate the conditional density at the Cartesian product of non-respondent covariates and observed respondent \(y\) values to form \(F\) (done by generate_conditional_density_matrix). This is the empirical imputation support: rows are target non-respondents, columns are candidate respondent outcomes.
3. Evaluate the same conditional density at respondent covariates to form the column-normalizer \(C\) (done by generate_C_matrix); this is simply the column sum of densities over respondents.
4. For each trial value \(\phi\) of the response-model parameters, compute the odds matrix \(O(\phi)\) using the separable linear predictor and link function (implemented efficiently via outer() in code). Combine \(O\) with \(F\) and normalize columns by \(C\) to obtain \(W(\phi)\).
5. Use the vectorized s_function to obtain parameter-specific factor matrices for the non-respondent imputed scores; multiply elementwise by \(W(\phi)\) and reduce (row/column sums) to compute the non-respondent contribution to \(S_2(\phi)\).
6. Add the observed-respondent score and use a root-finder (e.g. nleqslv) to find \(\phi\) with \(S_2(\phi)=0\). The solver may use the maximum-norm stopping threshold described above to exit early.
Every step relies on vectorized primitives (outer, matrix multiplications, column/row sums) to avoid large temporary allocations.

This vignette mapped the implementation functions to the math in the theoretical notes and showed how the EM-like mean-score step reduces to a small set of matrix operations. The implementation follows the ideas described in Riddles et al. (2016) for exponential tilting under NMAR: fit the conditional density on respondents, approximate the missing-data expectation by a finite sum over observed \(y\) values, and solve the resulting estimating equations for the missingness-model parameters.