The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Sufficient dimension reduction (SDR) reduces the dimensionality of predictors \(\mathbf{X}\) while preserving their relationship with a response \(Y\). SDR estimates a basis \(\mathbf{B}\) of the central subspace \(\mathcal{S}_{Y\mid\mathbf{X}}\) defined by \[ Y \perp \mathbf{X} \mid \mathbf{B}^\top \mathbf{X}. \] In high dimensions, sparse SDR improves interpretability and accuracy by driving many rows of \(\mathbf{B}\) to zero, which is achieved by adding a sparsity-inducing penalty to the SDR optimization problem.
The ppmSDR package implements a unified framework
for sparse SDR based on the penalized principal machine (\(\mathrm{P}^2\mathrm{M}\)). A single
front-end, ppm(), dispatches to ten loss-specific
estimators, all fitted by one group coordinate descent (GCD) engine, and
ppm_tune() selects the sparsity parameter by
cross-validation.
Given data \((y_i, \mathbf{x}_i) \in \mathbb{R} \times \mathbb{R}^p\) with centered predictors, and a sequence of cutoffs \(r_1 < \cdots < r_h\), the sample principal machine solves \[ (\beta_{0k}, \boldsymbol{\beta}_k) = \arg\min_{\beta_0, \boldsymbol{\beta}} \boldsymbol{\beta}^\top \hat{\Sigma} \boldsymbol{\beta} + \frac{c}{n} \sum_{i=1}^n L_k\!\left(\tilde{y}_{ik}, \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i\right), \quad k = 1, \ldots, h, \] where \(\hat{\Sigma} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top / n\). The basis \(\hat{\mathbf{B}}\) is estimated by the leading \(d\) eigenvectors of \(\sum_{k=1}^h \hat{\boldsymbol{\beta}}_k \hat{\boldsymbol{\beta}}_k^\top\).
Under the sparsity assumption the slopes \(\boldsymbol{\beta}_k\) share a common support across \(k\), leading to the row-group penalized objective \[ \sum_{k=1}^{h} \left[ \boldsymbol{\beta}_k^\top \hat{\Sigma} \boldsymbol{\beta}_k + \frac{c}{n} \sum_{i=1}^n L_k(\tilde{y}_{ik}, \beta_{0k} + \boldsymbol{\beta}_k^\top \mathbf{x}_i) \right] + \sum_{j=1}^{p} p_{\lambda}\!\left(\|\boldsymbol{\beta}_{(j)}\|_2\right), \] where \(\boldsymbol{\beta}_{(j)} = (\beta_{1j}, \ldots, \beta_{hj})^\top\) and \(p_\lambda(\cdot)\) is the group LASSO, SCAD or MCP penalty. The penalty acts group-wise on all coefficients of predictor \(j\), so variable selection corresponds to identifying the predictors that form a sparse basis.
| Machine | Response | Type | Loss \(L_k(\tilde y_k, f)\) | Algorithm |
|---|---|---|---|---|
| \(\mathrm{P}^2\mathrm{LSM}\) | Continuous | RPM | \((1 - \tilde y_k f)^2\) | GCD |
| \(\mathrm{P}^2\mathrm{WLSM}\) | Binary | LPM | \(w_k(1 - y f)^2\) | GCD |
| \(\mathrm{P}^2\mathrm{LR}\) | Continuous | RPM | \(\log(1 + e^{-\tilde y_k f})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{WLR}\) | Binary | LPM | \(w_k \log(1 + e^{-y f})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{AR}\) | Both | LPM | \((y-f)^2(\rho_k\mathbb{I}\{y \ge f\} + (1-\rho_k)\mathbb{I}\{y<f\})\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{L2M}\) | Continuous | RPM | \([\max\{0, 1 - \tilde y_k f\}]^2\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{WL2M}\) | Binary | LPM | \(w_k[\max\{0, 1 - y f\}]^2\) | Iterative GCD |
| \(\mathrm{P}^2\mathrm{SVM}\) | Continuous | RPM | \(\max\{0, 1 - \tilde y_k f\}\) | MM-GCD |
| \(\mathrm{P}^2\mathrm{WSVM}\) | Binary | LPM | \(w_k \max\{0, 1 - y f\}\) | MM-GCD |
| \(\mathrm{P}^2\mathrm{QR}\) | Both | LPM | \((y-f)(\rho_k - \mathbb{I}\{y<f\})\) | MM-GCD |
The squared loss yields an exact group-penalized least-squares problem solved directly by GCD. Smooth losses (logistic, asymmetric least squares, L2-hinge) are reduced to a sequence of penalized least-squares problems via quadratic approximation (iterative GCD). Non-differentiable losses (hinge, check) are handled by quadratic majorization within a majorization-minimization scheme (MM-GCD).
The argument loss selects the estimator;
penalty is one of "grSCAD",
"grLasso" or "grMCP"; lambda
controls sparsity and is best chosen by cross-validation (see
below).
set.seed(1)
n <- 1000; p <- 10
B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1
x <- matrix(rnorm(n * p), n, p)
y <- (x %*% B[, 1]) / (0.5 + (x %*% B[, 2] + 1)^2) + 0.2 * rnorm(n)
## penalized principal least-squares SVM (P^2LSM)
fit <- ppm(x, y, H = 10, C = 1, loss = "lssvm", penalty = "grSCAD", lambda = 0.01)
round(fit$evectors[, 1:2], 3)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
#> [3,] 0 0
#> [4,] 0 0
#> [5,] 0 0
#> [6,] 0 0
#> [7,] 0 0
#> [8,] 0 0
#> [9,] 0 0
#> [10,] 0 0
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm Penalty: grSCAD lambda = 0.01
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> x1 1e+00 -1e-04
#> x2 1e-04 1e+00
#> x3 0e+00 0e+00
#> x4 0e+00 0e+00
#> x5 0e+00 0e+00
#> x6 0e+00 0e+00
#> x7 0e+00 0e+00
#> x8 0e+00 0e+00
#> x9 0e+00 0e+00
#> x10 0e+00 0e+00
#>
#> Selected variables (2 of 10): x1, x2For a binary response the weighted losses code the classes internally as \(\{-1, +1\}\).
y.binary <- sign(y)
## penalized principal weighted least-squares SVM (P^2WLSM)
fit2 <- ppm(x, y.binary, H = 10, C = 1, loss = "asls",
penalty = "grSCAD", lambda = 0.03)
round(fit2$evectors[, 1:2], 3)
#> [,1] [,2]
#> [1,] 1.000 -0.005
#> [2,] 0.005 1.000
#> [3,] 0.000 0.000
#> [4,] 0.000 0.000
#> [5,] 0.005 0.004
#> [6,] 0.000 0.000
#> [7,] 0.006 0.001
#> [8,] 0.000 0.000
#> [9,] 0.000 0.000
#> [10,] 0.000 0.000lambda by cross-validationppm_tune() performs \(K\)-fold cross-validation, selecting the
lambda that maximizes the held-out distance correlation
between the response and the estimated sufficient predictors. Call
set.seed() beforehand for reproducible folds.
set.seed(1)
cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5,
nlambda = 10, lambda.max = 0.02)
cv$opt.lambda
#> [1] 0.009283178
summary(cv$fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: lssvm Penalty: grSCAD lambda = 0.00928318
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> x1 1e+00 -3e-04
#> x2 3e-04 1e+00
#> x3 0e+00 0e+00
#> x4 0e+00 0e+00
#> x5 0e+00 0e+00
#> x6 0e+00 0e+00
#> x7 0e+00 0e+00
#> x8 0e+00 0e+00
#> x9 0e+00 0e+00
#> x10 0e+00 0e+00
#>
#> Selected variables (2 of 10): x1, x2We illustrate a weighted estimator on the Wisconsin Diagnostic Breast
Cancer (WDBC) data, coding the diagnosis as malignant = +1
and benign = -1. After fitting the penalized principal
weighted L2-SVM (P2WL2M), the predictors are projected onto the
estimated two-dimensional central subspace and plotted by class.
data(wdbc)
x <- scale(as.matrix(wdbc[, -1]))
y <- ifelse(wdbc$diagnosis == "M", 1, -1)
fit <- ppm(x, y, loss = "wl2svm", penalty = "grSCAD", lambda = 0.3)
summary(fit, d = 2)
#> Penalized Principal Machine (P2M) summary
#> Loss: wl2svm Penalty: grSCAD lambda = 0.3
#> Working dimension d = 2
#>
#> Estimated basis of the central subspace:
#> Dir1 Dir2
#> radius_mean -0.2733 -0.3393
#> texture_mean 0.0000 0.0000
#> perimeter_mean -0.2929 -0.3362
#> area_mean -0.2650 0.0599
#> smoothness_mean 0.0000 0.0000
#> compactness_mean -0.0819 -0.0947
#> concavity_mean -0.2452 0.0401
#> concave_points_mean -0.3706 0.2268
#> symmetry_mean 0.0000 0.0000
#> fractal_dimension_mean 0.0000 0.0000
#> radius_se -0.1022 0.3825
#> texture_se 0.0000 0.0000
#> perimeter_se -0.0773 0.2433
#> area_se -0.0670 0.2180
#> smoothness_se 0.0000 0.0000
#> compactness_se 0.0000 0.0000
#> concavity_se 0.0000 0.0000
#> concave_points_se 0.0000 0.0000
#> symmetry_se 0.0000 0.0000
#> fractal_dimension_se 0.0000 0.0000
#> radius_worst -0.3641 0.0832
#> texture_worst 0.0000 0.0000
#> perimeter_worst -0.3715 0.0384
#> area_worst -0.3223 0.4892
#> smoothness_worst 0.0000 0.0000
#> compactness_worst -0.0743 -0.0829
#> concavity_worst -0.1698 -0.2341
#> concave_points_worst -0.3679 -0.3794
#> symmetry_worst 0.0000 0.0000
#> fractal_dimension_worst 0.0000 0.0000
#>
#> Selected variables (15 of 30): radius_mean, perimeter_mean, area_mean, compactness_mean, concavity_mean, concave_points_mean, radius_se, perimeter_se, area_se, radius_worst, perimeter_worst, area_worst, compactness_worst, concavity_worst, concave_points_worst
B <- fit$evectors[, 1:2]
scores <- x %*% B
plot(scores[, 1], scores[, 2],
col = ifelse(y == 1, "red", "blue"),
pch = ifelse(y == 1, 17, 1),
xlab = "1st SDR direction", ylab = "2nd SDR direction")
legend("topright", legend = c("malignant (+1)", "benign (-1)"),
col = c("red", "blue"), pch = c(17, 1))The Boston housing data (data(boston)) provide a
continuous-response counterpart for the response-based estimators such
as "lssvm".
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.