The ‘residuals.em.glm’ method returns two types of residuals: deviance and Pearson. The residuals are defined using standard GLM theory, where the log-likelihood of an observation is expressed in the form:
\[ l(y_i) = \frac{w_i(y_i\theta_i - b(\theta_i))}{\phi} + c(y_i, \phi) \]
where \(y_i\) is the observed value, \(w_i\) is the observation weight, \(\phi\) is the dispersion parameter (currently assumed to be 1 for the models implemented), and \(b\) and \(c\) are distribution-dependent functions. The function \(b\) determines the mean of \(y_i\) through \(\mu_i = E(y_i) = b'(\theta_i)\).
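For concreteness, for the Poisson family with \(w_i = 1\) and \(\phi = 1\) we have \(\theta = \log\mu\), \(b(\theta) = e^\theta\) and \(c(y, \phi) = -\log(y!)\). A minimal sketch (illustrative values only) checking this form against dpois:

# Sketch: exponential-family form of the Poisson log-likelihood (w = 1, phi = 1),
# with theta = log(mu), b(theta) = exp(theta), c(y, phi) = -lgamma(y + 1)
y <- 3; mu <- 2.5                                # illustrative values
theta <- log(mu)
ll_family <- y * theta - exp(theta) - lgamma(y + 1)
all.equal(ll_family, dpois(y, mu, log = TRUE))   # TRUE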
The deviance residuals, \(d_i\), are defined as:
\[d_i = \operatorname{sign}(y_i - \mu_i) \sqrt{2 \left( l_{sat.}(y_i) - l_{model}(y_i) \right) }\]
where \(l_{sat.}\) is the log-likelihood of the ‘saturated model’, in which each \(\mu_i\) is set equal to \(y_i\), and \(l_{model}\) is the log-likelihood of \(y_i\) at the predicted value of \(\mu_i\). We will not go into the relationship between \(\mu_i\), the covariates \(X\), and the fit parameters \(B\) here.
The deviance residuals are implemented as:
rho <- x %*% B                 # linear predictor
mu <- family()$linkinv(rho)    # fitted mean via the inverse link
# Poisson deviance
sign(y - w * mu) * sqrt(2 * (dpois(y, y, log = TRUE) - dpois(y, w * mu, log = TRUE)))
# Binomial deviance
sign(y - w * mu) * sqrt(2 * (dbinom(y, w, y / w, log = TRUE) - dbinom(y, w, mu, log = TRUE)))
where x is the input data (the design matrix), B is the vector of fitted parameters, w is the vector of weights, y is the vector of target values, and family is (currently) either poisson or binomial.
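As a hedged sanity check, the Poisson expression above should reproduce the deviance residuals reported by stats::glm. A minimal sketch on simulated data, taking B from the glm fit purely for illustration (the EM fit is not assumed here):

# Sketch: cross-check the Poisson deviance residuals against stats::glm
# (simulated data; B is taken from glm, not from the EM fit)
set.seed(1)
n <- 50
x <- cbind(1, rnorm(n))                          # design matrix with intercept
w <- sample(1:5, n, replace = TRUE)              # exposures
y <- rpois(n, w * exp(x %*% c(0.2, 0.5)))        # observed counts
fit <- glm(y ~ x[, 2] + offset(log(w)), family = poisson)
B <- coef(fit)
mu <- drop(poisson()$linkinv(x %*% B))           # rate per unit exposure
d <- sign(y - w * mu) *
  sqrt(2 * (dpois(y, y, log = TRUE) - dpois(y, w * mu, log = TRUE)))
all.equal(unname(d), unname(residuals(fit, type = "deviance")))   # TRUE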
When the weights all equal 1, the Pearson residuals, \(r_i\), are defined as:
\[ r_i = \frac{y_i - \mu_i}{\sqrt{Var(y_i)}} = \frac{y_i - \mu_i}{\sqrt{V(\mu_i)}}\]
where \(V(\mu)\) is the variance function of the GLM and depends on the underlying distribution.
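For example, for an unweighted Poisson model \(V(\mu) = \mu\), so \(r_i = (y_i - \mu_i)/\sqrt{\mu_i}\). A minimal sketch on simulated data, checked against stats::glm:

# Sketch: unweighted Pearson residuals for a Poisson GLM, V(mu) = mu
# (simulated data, purely for illustration)
set.seed(2)
z <- rnorm(50)
y <- rpois(50, exp(0.1 + 0.4 * z))
fit <- glm(y ~ z, family = poisson)
mu <- fitted(fit)
r <- (y - mu) / sqrt(poisson()$variance(mu))
all.equal(unname(r), unname(residuals(fit, type = "pearson")))   # TRUE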
When the model is weighted (for example when modelling the rate of successes of a binomial process, or when a Poisson process has an exposure), the Pearson residuals become:
\[ r_i = \frac{y_i / w_i - \mu_i}{\sqrt{Var(y_i / w_i)}} = \frac{y_i/w_i - \mu_i}{\sqrt{V(\mu_i)/w_i}} = \frac{y_i - w_i \mu_i}{\sqrt{w_i V(\mu_i)}}\]
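A minimal sketch of the weighted form for a binomial rate model (simulated data, purely for illustration); \(y_i\) successes out of \(w_i\) trials are passed to stats::glm as cbind(successes, failures), which fits the proportions with prior weights \(w_i\):

# Sketch: weighted Pearson residuals for a binomial rate model
# (simulated data; y = successes, w = trials)
set.seed(3)
z <- rnorm(50)
w <- sample(5:20, 50, replace = TRUE)
y <- rbinom(50, w, plogis(-0.3 + 0.8 * z))
fit <- glm(cbind(y, w - y) ~ z, family = binomial)
mu <- fitted(fit)                                # fitted probabilities
r <- (y - w * mu) / sqrt(w * binomial()$variance(mu))
all.equal(unname(r), unname(residuals(fit, type = "pearson")))   # TRUE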
The relationship between \(Var(y_i)\) and \(V(\mu_i)\) arises from differentiating \(l\) with respect to \(\theta\). The first derivative gives the score \(u\):
\[ \frac{d l}{d \theta} = u = \frac{w_i (y_i - b'(\theta))}{\phi}\]
and the second derivative gives:
\[ \frac{d^2 l}{d \theta^2} = -\frac{w_i b''(\theta)}{\phi} \]
Combining these two results with Bartlett’s second identity:
\[ Var_\theta\left[ \frac{d l}{d\theta} \right] = - E_\theta\left[ \frac{d^2 l}{d\theta^2} \right] \]
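As an illustrative aside, the identity can be checked numerically for a single Poisson observation with \(w_i = 1\) and \(\phi = 1\), where \(b'(\theta) = b''(\theta) = e^\theta\):

# Sketch: Monte Carlo check of Bartlett's second identity for a Poisson
# observation (w = 1, phi = 1): Var[dl/dtheta] should be close to b''(theta)
set.seed(4)
mu <- 3
theta <- log(mu)
y <- rpois(1e5, mu)
u <- y - exp(theta)                              # score: w * (y - b'(theta)) / phi
c(var_of_score = var(u), b_double_prime = exp(theta))   # both approximately 3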
We can deduce that:
\[ Var_\theta\left[ \frac{w_i (y_i - b'(\theta))}{\phi} \right] = - E_\theta\left[ -\frac{w_i b''(\theta)}{\phi} \right] \]
\[ \frac{w_i^2 Var(y_i) }{\phi^2} = \frac{w_i b''(\theta)}{\phi} \]
\[ Var(y_i) = \frac{\phi b''(\theta)}{w_i} = \frac{\phi V(\theta)}{w_i} \]
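This result can also be checked by simulation; a minimal sketch for a weighted binomial observation, taking \(y_i\) as the observed proportion of successes out of \(w_i\) trials (\(\phi = 1\), \(b''(\theta) = \mu(1 - \mu)\)):

# Sketch: simulated check that Var(y_i) = phi * b''(theta) / w_i, with y_i the
# observed proportion of successes out of w trials
set.seed(5)
w <- 10; mu <- 0.3
y <- rbinom(1e5, w, mu) / w
c(empirical = var(y), theoretical = mu * (1 - mu) / w)   # both approximately 0.021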
In the stats package, each distribution family carries its own variance function, defined relative to \(\mu\) rather than \(\theta\): \(V(\mu) = b''(b'^{-1}(\mu))\). Hence we use:
\[ Var(y_i) = \frac{\phi V(\mu)} {w_i} \]
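For reference, the variance functions carried by the stats family objects, and the resulting \(Var(y_i)\) for an illustrative weighted observation:

# Variance functions exposed by the stats family objects, V(mu)
poisson()$variance      # function (mu) mu
binomial()$variance     # function (mu) mu * (1 - mu)
# Illustrative values: Var(y_i) = phi * V(mu) / w
phi <- 1; w <- 10; mu <- 0.4
phi * binomial()$variance(mu) / w                # 0.024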