This vignette walks through the full SS-SVCQR workflow on the small
Lucas County housing subset shipped with the package
(lucas_housing_sample, \(n = 150\)). The aim is to make every step of the
case study reproducible without having to download external data: the model
is the same one used in the JSS software paper but on a subset rather than
the full \(n = 25{,}357\) panel. The replication script at the project root
reproduces the full-sample case study from
data/lucas_housing_clean.csv.
The hedonic interpretation is conventional: log sale price is regressed on global controls (log total living area, log lot size, sale-year indicators) and on candidate spatially varying age effects. Because the Lucas County region is geographically compact, longitude and latitude serve as a pragmatic spatial index for graph construction; for larger study regions, projected coordinates should be used.
library("sssvcqr")
data("lucas_housing_sample")
housing <- lucas_housing_sample
str(housing)
#> 'data.frame': 150 obs. of 8 variables:
#> $ log_price : num 11.9 11 11.4 11.1 11.3 ...
#> $ log_TLA : num 7.92 6.51 7.45 7.29 7.86 ...
#> $ log_lotsize: num 10.69 8.48 9.9 8.78 8.82 ...
#> $ age_scaled : num 0.09 0.56 0.76 0.63 0.9 0.71 0.99 0.63 1.07 0.49 ...
#> $ age2_scaled: num 0.0081 0.3136 0.5776 0.3969 0.81 ...
#> $ longitude : num -83.8 -83.5 -83.7 -83.6 -83.6 ...
#> $ latitude : num 41.7 41.7 41.6 41.6 41.7 ...
#> $ sale_year : int 1994 1997 1997 1997 1996 1995 1996 1996 1997 1998 ...
summary(housing[, c("log_price", "log_TLA", "log_lotsize", "age_scaled")])
#>    log_price         log_TLA       log_lotsize       age_scaled
#>  Min.   : 8.434   Min.   :6.275   Min.   : 7.601   Min.   :0.0300
#>  1st Qu.:10.612   1st Qu.:7.020   1st Qu.: 8.434   1st Qu.:0.3700
#>  Median :11.097   Median :7.207   Median : 8.716   Median :0.5650
#>  Mean   :10.989   Mean   :7.217   Mean   : 8.887   Mean   :0.5703
#>  3rd Qu.:11.460   3rd Qu.:7.416   3rd Qu.: 9.113   3rd Qu.:0.7600
#>  Max.   :12.801   Max.   :8.595   Max.   :12.257   Max.   :1.1700
y, Z, X, and coordinates
The matrix interface separates always-global controls
(Z) from candidate spatially varying covariates
(X). The coordinates (u) drive both the
proximity graph and the location-indexed deviation fields.
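The chunk that builds these four inputs is not shown above, so the following is only a sketch of one layout consistent with the dimensions reported by summary(fit) later on (q = 4, p = 2). The column assignment is an assumption: an intercept plus log_TLA, log_lotsize, and numeric sale_year in Z, and the two age terms in X. A simulated data frame with the same column names stands in for lucas_housing_sample so that the block runs without the package.

```r
# Simulated stand-in with the same column names as lucas_housing_sample
# (the values are toy data, not the shipped sample).
set.seed(1)
n <- 20
housing <- data.frame(
  log_price   = rnorm(n, 11, 0.6),
  log_TLA     = rnorm(n, 7.2, 0.3),
  log_lotsize = rnorm(n, 8.9, 0.7),
  age_scaled  = runif(n, 0, 1.2),
  longitude   = runif(n, -83.9, -83.4),
  latitude    = runif(n, 41.5, 41.8),
  sale_year   = sample(1993:1998, n, replace = TRUE)
)
housing$age2_scaled <- housing$age_scaled^2

# Response: log sale price.
y <- housing$log_price

# Assumed layout of the always-global controls (q = 4 columns).
Z <- cbind(
  intercept = 1,
  as.matrix(housing[, c("log_TLA", "log_lotsize", "sale_year")])
)

# Candidate spatially varying covariates (p = 2 columns).
X <- as.matrix(housing[, c("age_scaled", "age2_scaled")])

# Coordinates used as the spatial index.
u <- as.matrix(housing[, c("longitude", "latitude")])

# All inputs must describe the same observations.
stopifnot(nrow(Z) == length(y), nrow(X) == length(y), nrow(u) == length(y))
```

With the real data, the same four objects would be built from the housing data frame loaded earlier; whether the package expects an explicit intercept column in Z should be checked in ?ss_svcqr.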
It is good practice to query the graph before fitting, both to check
that the proximity structure is connected and to inspect the degree
distribution. build_graph_laplacian() returns the sparse
adjacency, the degree vector, the chosen Laplacian, and component
membership.
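The quantities listed above can be illustrated in base R without the package. The sketch below is not the implementation of build_graph_laplacian(): it assumes a symmetrized k-nearest-neighbour graph and the unnormalized Laplacian \(L = D - A\), uses simulated coordinates, and stores dense matrices for brevity. The value k_nn = 8 mirrors the fit call used later in this vignette.

```r
set.seed(42)
u <- cbind(runif(60), runif(60))   # simulated coordinates
k_nn <- 8
n <- nrow(u)

# Pairwise Euclidean distances; a point is never its own neighbour.
D <- as.matrix(dist(u))
diag(D) <- Inf

# Directed k-NN adjacency, then symmetrized: i ~ j if either is a
# k-nearest neighbour of the other (an assumed convention).
A <- matrix(0, n, n)
for (i in seq_len(n)) {
  A[i, order(D[i, ])[seq_len(k_nn)]] <- 1
}
A <- pmax(A, t(A))

deg <- rowSums(A)       # degree vector
L <- diag(deg) - A      # unnormalized graph Laplacian

# Component membership by breadth-first search.
comp <- integer(n)
current <- 0L
for (s in seq_len(n)) {
  if (comp[s] != 0L) next
  current <- current + 1L
  comp[s] <- current
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    nb <- which(A[v, ] == 1 & comp == 0L)
    comp[nb] <- current
    queue <- c(queue, nb)
  }
}

n_components <- max(comp)   # 1 means the graph is connected
summary(deg)                # degree distribution
```

A graph with more than one component is not an error in itself, but it means the deviation fields are smoothed separately within each component, which is worth knowing before fitting.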
The fit below uses fixed penalties for speed; the JSS software paper tunes \((\lambda_1, \lambda_2)\) by spatially blocked cross-validation on the full sample. Tighter ADMM tolerances are appropriate for the moderate sample sizes encountered in this vignette.
fit <- ss_svcqr(
y = y, Z = Z, X = X, u = u,
tau = 0.5,
lambda1 = 3, lambda2 = 1, k_nn = 8,
control = list(max_iter = 100, warn_nonconvergence = FALSE)
)
summary(fit)
#> Sparse-smooth SVC quantile regression summary
#> n = 150 q = 4 p = 2 tau = 0.5
#> lambda1 = 3 lambda2 = 1
#> iterations = 70 converged = TRUE
#>
#> alpha:
#> [1] -67.52811362 0.92689808 0.11572866 0.03569393
#> beta_G:
#> [1] 0.2598358 -1.2339241
#> delta L2 norms:
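#> [1] 0.5186381 0.0000000
The deviation L2 norms make the global-versus-local decision explicit: a norm of exactly zero means the group penalty has classified the corresponding candidate as global. In this fit the second candidate is classified as global, while the first retains a spatially varying deviation field.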
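Exact zeros of this kind are typical of group penalties. A minimal base-R sketch of the mechanism follows, assuming a block soft-thresholding step of the kind commonly used for group-lasso penalties in ADMM; the source does not state that the package uses exactly this update, and group_soft_threshold() is a hypothetical helper, not a package function.

```r
# Block soft-thresholding: shrink a whole vector toward zero in L2 norm.
# Vectors whose norm is at most `threshold` are set to exactly zero.
group_soft_threshold <- function(v, threshold) {
  nrm <- sqrt(sum(v^2))
  if (nrm <= threshold) {
    return(numeric(length(v)))
  }
  (1 - threshold / nrm) * v
}

set.seed(7)
n <- 50
# Simulated deviation fields: one strong, one weak.
delta_raw <- cbind(rnorm(n, sd = 1), rnorm(n, sd = 0.05))

# Apply the shrinkage column by column (one group per candidate).
delta <- apply(delta_raw, 2, group_soft_threshold, threshold = 1)

delta_norms <- sqrt(colSums(delta^2))
is_global <- delta_norms == 0   # exact zero => treated as global
```

Because the weak field is zeroed as a whole rather than shrunk entry by entry, comparing the norm with exact zero is a meaningful test and needs no tolerance.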
For a single candidate effect, the package’s plot()
method renders the local total coefficient surface over the first two
coordinate columns. Inverse-distance-weighted interpolation gives a
smooth visual summary; the observed locations are overlaid as small
reference marks.
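The kind of picture described above can be approximated in base R. The sketch below is not the package's plot() method: it assumes that the local total coefficient is the global coefficient plus the local deviation, uses simulated locations and deviations, and applies inverse-distance weighting with an assumed power of 2 over all sites. The helper idw() is hypothetical.

```r
set.seed(3)
n <- 40
u <- cbind(lon = runif(n), lat = runif(n))   # simulated locations
beta_G <- 0.26                               # illustrative global coefficient
delta <- 0.3 * sin(2 * pi * u[, "lon"])      # simulated local deviations
total <- beta_G + delta                      # assumed local total coefficient

# Inverse-distance-weighted value at one target point.
idw <- function(target, sites, values, power = 2) {
  d <- sqrt((sites[, 1] - target[1])^2 + (sites[, 2] - target[2])^2)
  if (any(d == 0)) {
    return(values[which.min(d)])   # exact hit: return the observed value
  }
  w <- 1 / d^power
  sum(w * values) / sum(w)
}

# Regular grid over the bounding box; expand.grid varies `lon` fastest,
# which matches the column-major fill of the matrix below.
gx <- seq(0, 1, length.out = 25)
gy <- seq(0, 1, length.out = 25)
grid <- expand.grid(lon = gx, lat = gy)
surface <- matrix(
  apply(as.matrix(grid), 1, idw, sites = u, values = total),
  nrow = length(gx), ncol = length(gy)
)

# Draw the surface with the observed locations as small crosses.
if (interactive()) {
  image(gx, gy, surface, xlab = "longitude", ylab = "latitude")
  points(u, pch = 3, cex = 0.5)
}
```

Inverse-distance weighting is a convex combination of the observed values, so the interpolated surface never leaves the range of the fitted coefficients; it is a visual summary, not a model-based prediction.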
For real analyses, the penalties should be tuned rather than fixed by hand. The example below evaluates a small grid with three spatial folds on this subset; empirical applications should use a broader grid and more iterations.
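As background for the call that follows, the two ingredients of the reported criterion can be sketched in base R: the check (pinball) loss \(\rho_\tau(r) = r\,(\tau - 1\{r < 0\})\), and held-out evaluation over spatial blocks. The fold construction here, k-means clustering of the coordinates, is an assumption made for illustration and may differ from what cv_ss_svcqr() does; the predictor is a plain training-fold quantile rather than an SS-SVCQR fit, and the data are simulated.

```r
# Check (pinball) loss for residuals r at quantile level tau.
check_loss <- function(r, tau) r * (tau - (r < 0))

set.seed(11)
n <- 90
u <- cbind(runif(n), runif(n))   # simulated coordinates
y <- rnorm(n)                    # simulated response
tau <- 0.5

# Assumed blocking scheme: three spatial folds from k-means on coordinates.
folds <- kmeans(u, centers = 3, nstart = 5)$cluster

# Held-out mean check loss per fold, using the training-fold quantile
# as a deliberately simple stand-in for a fitted model.
fold_loss <- sapply(1:3, function(k) {
  train <- folds != k
  q_hat <- quantile(y[train], probs = tau)
  mean(check_loss(y[!train] - q_hat, tau))
})

mean(fold_loss)   # the quantity a grid search would minimize
```

Blocking by location rather than at random keeps neighbouring observations in the same fold, so the held-out loss is not flattered by spatial dependence between training and test points.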
cv <- cv_ss_svcqr(
y = y, Z = Z, X = X, u = u,
tau = 0.5,
lambda1_seq = c(2, 3),
lambda2_seq = c(0.5, 1),
K_folds = 3, adaptive_weights = FALSE,
control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
#> Spatially blocked CV for SS-SVCQR
#> tau = 0.5
#> best lambda1 = 2
#> best lambda2 = 0.5
#> best mean check loss = 0.1643256
kkt_sssvcqr() returns first-order optimality summaries
and the maximum violation of the per-component degree-weighted centering
constraints. Both should be small after a converged fit.
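The centering diagnostic can be illustrated in base R. The sketch assumes that the constraint takes the form \(\sum_{i \in C} d_i \delta_{ij} = 0\) for every connected component \(C\) and every candidate \(j\), with \(d_i\) the node degree; this form is inferred from the description above rather than taken from the package, and both helper functions are hypothetical. Degrees, memberships, and deviations are simulated.

```r
set.seed(5)
n <- 30
deg <- sample(4:10, n, replace = TRUE)   # simulated degree vector
comp <- rep(1:2, each = 15)              # simulated component membership
delta <- matrix(rnorm(n * 2), n, 2)      # uncentered deviations

# Largest absolute degree-weighted column sum over all components.
centering_violation <- function(delta, deg, comp) {
  viol <- sapply(unique(comp), function(c_id) {
    idx <- comp == c_id
    max(abs(colSums(deg[idx] * delta[idx, , drop = FALSE])))
  })
  max(viol)
}

# Subtract the degree-weighted mean within each component.
center_by_component <- function(delta, deg, comp) {
  for (c_id in unique(comp)) {
    idx <- comp == c_id
    block <- delta[idx, , drop = FALSE]
    w_mean <- colSums(deg[idx] * block) / sum(deg[idx])
    delta[idx, ] <- sweep(block, 2, w_mean)
  }
  delta
}

before <- centering_violation(delta, deg, comp)
after <- centering_violation(center_by_component(delta, deg, comp), deg, comp)
c(before = before, after = after)
```

A violation near machine precision indicates that each deviation field is identified relative to its global coefficient; a large value would mean part of the global effect has leaked into the deviations.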
The predict() method returns fitted conditional
quantiles when given new \(Z\), \(X\), and \(u\) (type = "response", the
default), or local coefficient surfaces when called with
type = "coefficients". New-location deviations are
extrapolated by inverse-distance-weighted averaging of the \(k\) nearest training deviations.
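The extrapolation rule can be sketched in base R under stated assumptions: the fitted conditional quantile is taken to be \(z^\top \alpha + x^\top (\beta_G + \delta(u))\), the weights are \(1/d\) over the \(k\) nearest training sites, and a new point that coincides with a training site inherits that site's deviation. The helper idw_knn(), the weighting power, and all numeric values are illustrative and are not taken from predict.sssvcqr().

```r
# k-nearest-neighbour inverse-distance-weighted average of training deviations.
idw_knn <- function(u_new, u_train, delta_train, k = 8, power = 1) {
  out <- matrix(0, nrow(u_new), ncol(delta_train))
  for (i in seq_len(nrow(u_new))) {
    d <- sqrt(colSums((t(u_train) - u_new[i, ])^2))
    nn <- order(d)[seq_len(k)]
    if (d[nn[1]] == 0) {
      out[i, ] <- delta_train[nn[1], ]   # exact match with a training site
    } else {
      w <- 1 / d[nn]^power
      out[i, ] <- colSums(w * delta_train[nn, , drop = FALSE]) / sum(w)
    }
  }
  out
}

set.seed(9)
n <- 40
u_train <- cbind(runif(n), runif(n))                        # simulated sites
delta_train <- cbind(0.3 * sin(2 * pi * u_train[, 1]), 0)   # second field global
alpha <- c(1, 0.9)          # illustrative global-control coefficients
beta_G <- c(0.26, -1.23)    # illustrative global parts of the candidates

u_new <- rbind(c(0.5, 0.5), u_train[1, ])   # one new site, one training site
Z_new <- cbind(1, c(7.2, 7.0))
X_new <- cbind(c(0.5, 0.8), c(0.25, 0.64))

delta_new <- idw_knn(u_new, u_train, delta_train, k = 8)
coef_new <- sweep(delta_new, 2, beta_G, "+")                 # local coefficients
q_hat <- drop(Z_new %*% alpha) + rowSums(X_new * coef_new)   # fitted quantiles
```

Here coef_new plays the role of the output requested with type = "coefficients" and q_hat that of type = "response". A candidate whose training deviations are all zero stays at its global coefficient at every new location.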
The help pages (?ss_svcqr,
?cv_ss_svcqr, ?build_graph_laplacian,
?kkt_sssvcqr, ?predict.sssvcqr) document each
argument and return value.