Choosing a transformation: Box-Cox in practice

library(shewhartr)
library(ggplot2)

Box & Cox (1964) introduced a one-parameter family of power transformations,

\[ y(\lambda) = \begin{cases} (x^\lambda - 1)/\lambda & \lambda \neq 0 \\ \log(x) & \lambda = 0, \end{cases} \]

and a procedure for choosing \(\lambda\) by maximum likelihood. The goal is to find a scale on which the residuals are approximately normal and homoscedastic — the assumptions that classical inferential tools, including Shewhart charts, presuppose.
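As a minimal sketch of the family above, the transformation can be written in a few lines of base R. The function name box_cox() and the tolerance used to detect \(\lambda = 0\) are illustrative assumptions; they are not part of shewhartr.

```r
# Hypothetical helper (not from shewhartr): the one-parameter Box-Cox family.
box_cox <- function(x, lambda, tol = 1e-8) {
  # The transformation is only defined for strictly positive data.
  stopifnot(is.numeric(x), all(x > 0))
  if (abs(lambda) < tol) {
    log(x)                       # limiting case lambda = 0
  } else {
    (x^lambda - 1) / lambda      # general power transformation
  }
}

box_cox(c(1, 4, 9), lambda = 0.5)   # square-root scale, shifted and rescaled
box_cox(c(1, 4, 9), lambda = 0)     # log scale
```

Subtracting 1 and dividing by \(\lambda\) makes the family continuous in \(\lambda\): as \(\lambda \to 0\) the power branch converges to \(\log(x)\).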

shewhart_box_cox() returns the profile log-likelihood, the maximiser \(\hat \lambda\), and a 95% confidence interval based on the chi-square approximation to twice the log-likelihood drop.
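To make the three returned quantities concrete, here is a hedged base-R sketch of a grid-based profile likelihood. It assumes an intercept-only normal model on the transformed scale, for which the profile log-likelihood is \(\ell(\lambda) = -\tfrac{n}{2}\log \hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log x_i\) up to a constant. The grid, the helper name profile_loglik(), and the model are assumptions for illustration; shewhart_box_cox() may be implemented differently.

```r
# Illustrative only: a hand-rolled profile likelihood, not shewhartr internals.
set.seed(2025)
x <- rlnorm(200, meanlog = 0, sdlog = 0.5)

profile_loglik <- function(lambda, x) {
  z <- if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
  sigma2 <- mean((z - mean(z))^2)        # ML variance on the transformed scale
  # Normal log-likelihood (up to a constant) plus the Jacobian of the transform
  -length(x) / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

grid <- seq(-2, 2, by = 0.05)            # assumed search grid
ll <- vapply(grid, profile_loglik, numeric(1), x = x)

lambda_hat <- grid[which.max(ll)]        # grid maximiser

# 95% CI: all lambda whose log-likelihood is within qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(ll) - qchisq(0.95, df = 1) / 2
ci <- range(grid[ll >= cutoff])

lambda_hat
ci
```

The cutoff follows from the chi-square approximation: twice the drop in log-likelihood from its maximum is compared with the 0.95 quantile of a chi-square distribution with one degree of freedom.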

A textbook example

set.seed(2025)
y <- rlnorm(200, meanlog = 0, sdlog = 0.5)   # log-normal -> lambda = 0
bc <- shewhart_box_cox(y)
bc
#> 
#> ── Box-Cox profile likelihood ──────────────────────────────────────────────────
#> • n = 200
#> • lambda_hat = 0
#> • 95% CI: [-0.25, 0.2]

The estimate \(\hat \lambda\) is zero (the log transformation), and the 95% CI covers zero, as expected for log-normal data. Let’s plot the profile:

autoplot(bc)

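If the CI for \(\lambda\) contains 1, no transformation is needed (the data are approximately normal as is). If it excludes 1 but contains 0, take logs; if it contains 0.5, take square roots; and so on. When the interval covers several of these conventional values, prefer the simplest and most interpretable one.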
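The rule of thumb above can be sketched as a small check of which conventional values fall inside an interval. The helper name conventional_in_ci() is hypothetical, and the first interval is typed in by hand from the printed output earlier rather than extracted from the bc object, whose internal structure is not shown here.

```r
# Hypothetical helper: which conventional lambda values lie inside a CI?
conventional_in_ci <- function(ci) {
  conventional <- c(log = 0, sqrt = 0.5, identity = 1)
  names(conventional)[conventional >= ci[1] & conventional <= ci[2]]
}

# Interval copied from the printed output above: only the log scale qualifies.
conventional_in_ci(c(-0.25, 0.2))
#> [1] "log"

# A made-up, wider interval: more than one scale is compatible with the data.
conventional_in_ci(c(0.3, 1.2))
#> [1] "sqrt"     "identity"
```

The second call shows why the rule is a guide rather than a decision procedure: a wide interval can be compatible with several scales at once.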

Interaction with shewhart_regression(model = "auto")

The "auto" model in shewhart_regression() calls shewhart_box_cox() internally on the response (with a +1 shift so that zeros remain valid) and selects among the linear, log, and loglog models according to the value of \(\hat \lambda\).

This is a guidance step, not a guarantee. Always inspect the residual diagnostics afterwards via shewhart_diagnostics().

When not to transform

If your data are counts, proportions, times-to-event, or other quantities with a known parametric family, model that family explicitly. Box was clear about this: if you can model the right distribution, do so. Transforms exist for the case where the right distribution isn’t tractable and a normal approximation on a suitably-chosen scale is the best available compromise.

The c, u, p, and np charts in this package implement that advice: they support limits = "poisson" (or "binomial") for exact distribution-aware limits, instead of relying on a transformation to coerce counts into approximate normality.
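To illustrate why distribution-aware limits matter for counts, the following base-R sketch compares the usual three-sigma limits for a c chart with limits taken from Poisson quantiles at the same nominal tail area. The mean count c_bar = 4 is a made-up value, and using qpois() this way is an assumption about what distribution-aware limits could look like; it is not necessarily how limits = "poisson" is computed in shewhartr.

```r
# Illustrative comparison only; values and method are assumptions.
c_bar <- 4                               # hypothetical average count per unit
alpha <- 2 * pnorm(-3)                   # two-sided tail area of 3-sigma limits

# Normal-approximation limits, with the lower limit truncated at zero
normal_limits <- c(
  lower = max(0, c_bar - 3 * sqrt(c_bar)),
  upper = c_bar + 3 * sqrt(c_bar)
)

# Probability limits from the Poisson distribution itself
poisson_limits <- setNames(
  qpois(c(alpha / 2, 1 - alpha / 2), lambda = c_bar),
  c("lower", "upper")
)

normal_limits
poisson_limits
```

For small mean counts the Poisson distribution is right-skewed, so the symmetric three-sigma limits and the quantile-based limits need not agree.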

References
