The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
This tutorial is written for applied researchers across education, economics, sociology, political science, public health, and adjacent fields, who suspect that one regression line is hiding more than one story, and who care about more than the average. You do not need prior exposure to quantile regression or to mixture models. Each idea is introduced in words before any equation, every analysis step is shown in runnable code, and every result is visualized.
By the end you will be able to: recognize when a mixture of quantile regressions is the right tool; fit one with mixqr; read and visualize the estimates; classify observations into the recovered subgroups; check whether the answer can be trusted; describe effects across the whole outcome distribution; and report the analysis reproducibly.
This tutorial covers the core of the framework. mixqr is built on a single EM platform with an engine/extension contract, so the same machinery also supports expectile and M-quantile component families, component-specific penalized selection, and non-crossing multi-quantile estimation (see the Extensions article), while the companion package mixqrgate adds location-varying gating.
Most regression models a single relationship: “a one-unit change in x moves y by β.” But populations are often mixtures of unobserved subgroups, each with its own relationship. A few examples from the social sciences:
When you fit one line to a mixture, you get a number that averages away the groups and mirrors none of them. A finite mixture of regressions instead assumes the data come from a small number of latent groups (called components or regimes) and estimates a separate regression in each, discovering the grouping and the group-specific effects jointly, without a pre-existing label (McLachlan and Peel 2000).
mixqr does this for quantile
regressions rather than mean regressions. That buys two things,
explained next.
Why quantiles. A quantile regression at level \(\tau\) models the conditional \(\tau\)-th quantile of the outcome (the median at \(\tau = 0.5\), a lower tail at \(\tau = 0.1\), an upper tail at \(\tau = 0.9\)) instead of the mean (Koenker and Bassett 1978; Koenker 2005). The median is far more robust to skew and outliers than the mean, and fitting several quantiles reveals effects that a mean model cannot: a predictor may lift the bottom of the distribution while leaving the top alone.
Why a mixture. Within each latent regime,
mixqr fits a quantile regression; across regimes, a
mixing probability says how prevalent each regime is. Formally,
with a latent class \(Z \in \{1, \dots,
m\}\) and \(\Pr(Z = j) =
\pi_j\), the conditional \(\tau\)-th quantile in regime \(j\) is \[
Q_\tau(Y \mid x, Z = j) = x'\beta_j , \] and the components
are estimated by an EM algorithm so that the grouping and the \(\beta_j\) are found together (Wu and Yao
2016).
Why not simpler tools?
| Approach | What it misses |
|---|---|
| OLS / single mean regression | averages the regimes away; sensitive to skew |
| A single quantile regression | robust and distributional, but still one line |
| Cluster first (k-means), then regress | two-stage; clusters ignore the relationship, and the second-stage SEs ignore the clustering uncertainty |
mixqr |
finds regimes and their quantile-specific effects jointly, with calibrated uncertainty |
To keep everything reproducible and transparent, we simulate
a student- achievement example with a known structure, then let
mixqr recover it. (You would use your own data; the
workflow is identical.)
We generate 600 students. Each belongs to one of two unobserved school regimes:
Two features make this a job for quantile methods. Scores are right-skewed (a few unusually high achievers), so the median is a steadier summary than the mean. Within each regime the spread of scores grows with SES, so SES affects the top of the distribution differently from the bottom, a pattern a mean model cannot express.
simulate_schools <- function(n = 600, pi_A = 0.6) {
ses <- rnorm(n) # standardized SES
regime <- ifelse(runif(n) < pi_A, "A", "B")
err <- exp(rnorm(n, 0, 0.45)) - 1 # right-skewed, median 0
spread <- ifelse(regime == "A", 6, 9) * exp(0.30 * ses) # grows with SES
score <- ifelse(regime == "A",
56 + 8.0 * ses, # supportive schools
46 + 3.0 * ses) + spread * err
data.frame(score = score, ses = ses, regime = factor(regime))
}
schools <- simulate_schools()
head(schools)
#> score ses regime
#> 1 48.77229 0.52058907 B
#> 2 46.61438 -1.07969076 A
#> 3 55.72871 0.13923812 A
#> 4 58.51637 -0.08474878 A
#> 5 53.17665 -0.66663962 A
#> 6 40.02345 -2.51608903 BThe raw scatter already hints at two fans of points, but a single median line (dashed) splits the difference and represents neither regime:
single <- rq(score ~ ses, tau = 0.5, data = schools)
ggplot(schools, aes(ses, score)) +
geom_point(alpha = 0.35, colour = "grey30") +
geom_abline(intercept = coef(single)[1], slope = coef(single)[2],
linewidth = 1.1, linetype = "dashed") +
labs(x = "Family SES (standardized)", y = "Math score",
title = "One median line hides the subgroups",
subtitle = "A single quantile regression averages two regimes together") +
theme_mixqr()We fit a two-component mixture of median regressions. We ask for stochastic-EM standard errors (the recommended, calibrated method; see Section 8) and several restarts because the mixture likelihood is multimodal.
fit <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
nstart = 25, variance = "stochEM",
vcontrol = mixqr_vcontrol(B = 200, seed = 1))
fit
#> Mixture of quantile regressions (mixqr)
#> engine: ald tau = 0.5 components m = 2 n = 600
#> converged: TRUE in 29 iterations
#>
#> Mixing probabilities (pi):
#> comp1 comp2
#> 0.2626 0.7374
#>
#> Component coefficients (beta):
#> comp1 comp2
#> (Intercept) 44.8695 55.3545
#> ses 2.2444 7.7639
#>
#> logLik = -1844.296 AIC = 3702.59 BIC = 3733.37Read the output as two regressions plus the mixing probabilities.
summary() adds standard errors, \(z\)-values, and the diagnostics discussed
later:
summary(fit)
#> Mixture of quantile regressions (mixqr) -- summary
#> engine: ald tau = 0.5 m = 2 n = 600
#>
#> Component 1 (pi = 0.2626):
#> Estimate Std.Err z value Pr(>|z|)
#> (Intercept) 44.8695 0.3719 120.649 < 2e-16 ***
#> ses 2.2444 0.3879 5.786 7.21e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Component 2 (pi = 0.7374):
#> Estimate Std.Err z value Pr(>|z|)
#> (Intercept) 55.3545 0.2026 273.27 <2e-16 ***
#> ses 7.7639 0.1926 40.31 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Missing-information fraction (separability): 0.242
#> Responsibility overlap (0 = separated, 1 = overlapping): 0.293
#>
#> logLik = -1844.296 (ALD working likelihood) AIC = 3702.59 BIC = 3733.37The estimated components recover the design: one regime with a high intercept and a steep SES slope (the supportive schools), and one with a lower intercept and a shallower slope (the under-resourced schools). Components are ordered by slope, so component 1 is always the shallower-gradient regime.
The single most useful picture overlays the component median lines on the data, colored by each student’s most likely regime.
# Components are ordered by SES slope, so component 1 is the shallower-gradient
# (under-resourced) regime and component 2 the steeper (supportive) one.
comp_label <- c("Under-resourced schools", "Supportive schools")
pal2 <- c("Under-resourced schools" = "#e07b39",
"Supportive schools" = "#1b6ca8")
schools$regime_hat <- factor(comp_label[predict(fit, type = "class")],
levels = comp_label)
xseq <- seq(min(schools$ses), max(schools$ses), length.out = 100)
lines_df <- do.call(rbind, lapply(1:2, function(j) {
data.frame(ses = xseq,
score = as.numeric(cbind(1, xseq) %*% fit$beta[, j]),
regime = factor(comp_label[j], levels = comp_label))
}))
ggplot(schools, aes(ses, score)) +
geom_point(aes(colour = regime_hat), alpha = 0.45, size = 1.8) +
geom_line(data = lines_df, aes(colour = regime), linewidth = 1.3) +
scale_colour_manual(values = pal2, name = "Recovered regime") +
labs(x = "Family SES (standardized)", y = "Math score",
title = "Two SES–achievement gradients, recovered jointly",
subtitle = "Component median regressions from mixqr") +
theme_mixqr()For reporting, a coefficient plot with confidence intervals
communicates the substantive contrast between regimes at a glance. We
pull the component coefficients and their intervals straight from
confint().
ci <- confint(fit) # Wald intervals from stochastic-EM SEs
keep <- grep("^comp", rownames(ci))
coef_df <- data.frame(
label = rownames(ci)[keep],
est = c(as.numeric(fit$beta)),
lo = ci[keep, 1], hi = ci[keep, 2]
)
comp_idx <- as.integer(sub("comp(\\d+):.*", "\\1", coef_df$label))
coef_df$regime <- factor(comp_label[comp_idx], levels = comp_label)
coef_df$term <- sub("comp\\d+:", "", coef_df$label)
coef_df$term <- factor(coef_df$term, labels = c("Intercept", "SES slope"))
ggplot(coef_df, aes(est, regime, colour = regime)) +
geom_pointrange(aes(xmin = lo, xmax = hi), linewidth = 0.9, size = 0.7) +
facet_wrap(~ term, scales = "free_x") +
scale_colour_manual(values = pal2, guide = "none") +
labs(x = "Estimate (95% CI)", y = NULL,
title = "Regime-specific effects with uncertainty",
subtitle = "The SES gradient is far steeper in the supportive regime") +
theme_mixqr()Mixture models give soft memberships: each
observation has a posterior probability of belonging to each regime.
Hard labels (type = "class") take the most likely regime,
but the probabilities themselves are informative: many points are
assigned with near-certainty, a few are ambiguous.
post <- predict(fit, type = "posterior")
schools$p_supportive <- post[, 2] # component 2 = supportive (steeper) regime
ggplot(schools, aes(p_supportive)) +
geom_histogram(bins = 30, fill = "#1b6ca8", colour = "white") +
labs(x = "Posterior probability of the supportive regime",
y = "Students",
title = "Most students are classified with confidence",
subtitle = "Mass near 0 and 1 means confident assignments") +
theme_mixqr()Here about 75% of students are assigned to a regime with probability above 0.9. Only a few sit in the ambiguous middle. The package summarizes the separation in one number, the responsibility overlap (0 = perfectly separated, 1 = fully overlapping), moderate in this example:
Two checks and a caveat matter for a mixture of quantile regressions.
mixqr’s semiparametric engine estimates each regime’s
error distribution non-parametrically, constrained so its \(\tau\)-th quantile is exactly zero (Hall and Presnell
1999). Plotting these densities shows the shape of each
regime’s residuals and confirms the constraint (the mass below 0 equals
\(\tau\)).
fit_kd <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
engine = "kdEM", init = "ald", nstart = 15)
rng <- range(vapply(fit_kd$density, function(d) range(d$grid), numeric(2)))
tt <- seq(rng[1], rng[2], length.out = 400)
dens_df <- do.call(rbind, lapply(1:2, function(j) {
data.frame(resid = tt, density = fit_kd$density[[j]]$eval(tt),
regime = factor(comp_label[j], levels = comp_label))
}))
ggplot(dens_df, aes(resid, density, colour = regime)) +
geom_line(linewidth = 1.1) +
geom_vline(xintercept = 0, linetype = 3, colour = "grey40") +
scale_colour_manual(values = pal2, name = "Regime") +
labs(x = "Residual", y = "Density",
title = "Each regime has its own (skewed) error distribution",
subtitle = "Estimated by the Wu & Yao kernel-density EM; tau-quantile fixed at 0") +
theme_mixqr()The fast "ald" engine assumes a parametric (asymmetric
Laplace) error; the "kdEM" engine estimates the error
shape. When they give similar component lines, you can trust the
parametric default; when they diverge, the data have structure the
parametric model misses. Here they agree closely.
The semiparametric estimator can be biased when the regimes overlap heavily and the errors are strongly asymmetric (a property of the estimating equations, not a bug; see Wu and Yao (2016), sec. 6). The overlap diagnostic above is your early warning, and cross-checking the two engines is the practical guard. With well-separated regimes (the common, useful case), the estimates are reliable.
mixqr offers two standard-error methods:
variance = "sparsity" — fast, but conditional
on the classification: it treats group membership as known and
therefore understates uncertainty. summary() flags
this.variance = "stochEM" — a stochastic-EM
multiple-imputation estimator (Wu and Yao 2016, Alg. 3.1) that
propagates the classification and mixing uncertainty. Use this method
for inference. A Monte-Carlo benchmark (see the Validation
article) shows its intervals reach roughly nominal 95% coverage for
the regression coefficients.confint(fit)
#> 2.5 % 97.5 %
#> comp1:(Intercept) 44.1405626 45.598387
#> comp1:ses 1.4841146 3.004633
#> comp2:(Intercept) 54.9574726 55.751493
#> comp2:ses 7.3864166 8.141325
#> pi1 0.2140562 0.311070One caveat: the mixing-probability estimate carries a finite-sample bias, so its interval is correctly sized but can sit off-center. Here the true split is 60/40, yet the estimate is about 74/26 — a live reminder to report \(\pi\) as an approximate split, not to the decimal.
The real payoff of quantile mixtures is describing each regime across the outcome distribution, beyond its median. We refit at \(\tau = 0.1, 0.5, 0.9\) and draw each regime’s conditional-quantile lines. (Components are slope-ordered, so they align across \(\tau\).)
taus <- c(0.1, 0.5, 0.9)
multi <- do.call(rbind, lapply(taus, function(tt) {
f <- mixqr(score ~ ses, data = schools, tau = tt, m = 2, nstart = 15)
do.call(rbind, lapply(1:2, function(j) {
data.frame(ses = xseq,
score = as.numeric(cbind(1, xseq) %*% f$beta[, j]),
regime = factor(comp_label[j], levels = comp_label), tau = tt)
}))
}))
ggplot(schools, aes(ses, score)) +
geom_point(alpha = 0.18, colour = "grey45", size = 1.2) +
geom_line(data = multi,
aes(colour = regime, group = interaction(regime, tau),
linewidth = factor(tau))) +
scale_colour_manual(values = pal2, name = "Regime") +
scale_linewidth_manual(values = c("0.1" = 0.5, "0.5" = 1.2, "0.9" = 0.5),
name = "Quantile") +
labs(x = "Family SES (standardized)", y = "Math score",
title = "Conditional quantile bands by regime",
subtitle = "Wider bands at high SES reveal growing within-regime inequality") +
theme_mixqr()The vertical spread between a regime’s 10th- and 90th-percentile
lines is its within-regime inequality at a given SES. Where
that gap widens with SES, the top and bottom of the regime pull apart —
a distributional finding invisible to a mean model. (Fitting several
\(\tau\) separately can in principle
let a regime’s quantile lines cross; jointly enforcing non-crossing is
the role of a companion extension to mixqr.)
We assumed two regimes. In practice you choose m.
mixqr_select() compares component counts. Cross-validated
predictive log-likelihood ("cv") is engine- agnostic and
penalizes complexity; AIC/BIC are available
too (with the usual caveat that mixture-component counts sit on a
parameter-space boundary).
sel <- mixqr_select(score ~ ses, data = schools, tau = 0.5, m = 1:4,
criterion = "cv", folds = 5, nstart = 8,
control = mixqr_control(seed = 1))
sel$table
#> m cv_negloglik
#> 1 1 1977.857
#> 2 2 1850.712
#> 3 3 1849.940
#> 4 4 1856.446
ggplot(sel$table, aes(m, cv_negloglik)) +
geom_line(colour = "#1b6ca8") +
geom_point(size = 3, colour = "#1b6ca8") +
geom_point(data = sel$table[sel$table$m == 2, ],
size = 5, shape = 21, fill = "#e07b39", colour = "#e07b39") +
labs(x = "Number of components (m)", y = "CV held-out negative log-likelihood",
title = "Choosing the number of regimes",
subtitle = "Lower is better; the elbow at two regimes is the parsimonious choice") +
theme_mixqr()The sharp drop from one to two components, then the near-flat stretch beyond, is the signature of two genuine regimes. The strict cross-validation minimum here falls at m = 3 (1850 versus 1851 at m = 2 — a rounding-error apart, while m = 1 is far worse). Read the elbow, not the bare minimum: two regimes is the parsimonious, defensible choice.
A complete, reproducible write-up needs little:
m and how it was chosen, the engine,
and the number of restarts;# everything needed to reproduce the headline fit
fit <- mixqr(score ~ ses, data = schools, tau = 0.5, m = 2,
nstart = 25, variance = "stochEM",
vcontrol = mixqr_vcontrol(B = 200, seed = 1))
summary(fit)That is the whole arc. One model call turned “one line, two stories” into two regimes, their quantile-specific effects, calibrated intervals, and diagnostics that say when to trust them. The same few lines of code scale from this teaching example to your own data.
If mixqr supports your research, please cite the package
and the underlying method:
Venkitasubramanian, K. (2026). mixqr: Finite Mixtures of Quantile Regressions. R package version 0.1.0.9000. https://github.com/kvenkita/mixqr
Wu, Q. & Yao, W. (2016). Mixtures of quantile regressions. Computational Statistics & Data Analysis, 93, 162–176.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.