| Type: | Package |
| Title: | Examples for Integrating Prediction Error Estimation into Regression Models |
| Version: | 0.1.1 |
| Date: | 2021-11-03 |
| Depends: | R (≥ 2.14.1), perry (≥ 0.3.0), robustbase |
| Imports: | stats, quantreg, lars |
| Description: | Examples for integrating package 'perry' for prediction error estimation into regression models. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyLoad: | yes |
| Author: | Andreas Alfons [aut, cre] |
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.1.2 |
| NeedsCompilation: | no |
| Packaged: | 2021-11-03 10:28:20 UTC; andreas |
| Repository: | CRAN |
| Date/Publication: | 2021-11-03 11:20:02 UTC |
Examples for Integrating Prediction Error Estimation into Regression Models
Description
Examples for integrating package 'perry' for prediction error estimation into regression models.
Details
The DESCRIPTION file:
| Package: | perryExamples |
| Type: | Package |
| Title: | Examples for Integrating Prediction Error Estimation into Regression Models |
| Version: | 0.1.1 |
| Date: | 2021-11-03 |
| Depends: | R (>= 2.14.1), perry (>= 0.3.0), robustbase |
| Imports: | stats, quantreg, lars |
| Description: | Examples for integrating package 'perry' for prediction error estimation into regression models. |
| License: | GPL (>= 2) |
| LazyLoad: | yes |
| Authors@R: | person("Andreas", "Alfons", email = "alfons@ese.eur.nl", role = c("aut", "cre")) |
| Author: | Andreas Alfons [aut, cre] |
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.1.2 |
Index of help topics:
Bundesliga              Austrian Bundesliga football player data
TopGearMPG              Top Gear fuel consumption data
ladlasso                LAD-lasso with penalty parameter selection
lasso                   Lasso with penalty parameter selection
perry-methods           Resampling-based prediction error for fitted models
perryExamples-package   Examples for Integrating Prediction Error Estimation into Regression Models
ridge                   Ridge regression with penalty parameter selection
Author(s)
Andreas Alfons [aut, cre]
Maintainer: Andreas Alfons <alfons@ese.eur.nl>
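A minimal sketch of the intended workflow, combining a least squares fit with the perry method for "lm" objects documented below (the model formula is illustrative; foldControl and perry come from package 'perry' and this package's methods):
## load example data and fit a linear model
data("Bundesliga")
Bundesliga$logMarketValue <- log(Bundesliga$MarketValue)
fit <- lm(logMarketValue ~ Age + Matches + Goals, data = Bundesliga)
## estimate the prediction error via 5-fold cross-validation
perry(fit, foldControl(K = 5), seed = 1234)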
Austrian Bundesliga football player data
Description
The data set contains information on the market value of midfielders and forwards in the Austrian Bundesliga, together with player information and performance measures. The data are collected for the (still ongoing) 2013/14 season, with performance measures referring to competitions on the Austrian level (Bundesliga, Cup) or the European level (UEFA Champions League, UEFA Europa League). Only players with complete information are included in the data set.
Usage
data("Bundesliga")
Format
A data frame with 123 observations on the following 20 variables.
Player: factor; the player's name.
Team: factor; the player's team.
MarketValue: numeric; the player's market value (in Euros).
Age: numeric; the player's age (in years).
Height: numeric; the player's height (in cm).
Foreign: a dummy variable indicating whether the player is foreign or Austrian.
Forward: a dummy variable indicating whether the player is a forward or a midfielder.
BothFeet: a dummy variable indicating whether the player is equally strong with both feet or has one stronger foot.
AtClub: numeric; the number of seasons the player has been with his current club (at the upcoming transfer window).
Contract: numeric; the remaining number of seasons in the player's contract (at the upcoming transfer window).
Matches: numeric; the number of matches in which the player was on the field.
Goals: numeric; the number of goals the player scored.
OwnGoals: numeric; the number of own goals the player scored.
Assists: numeric; the number of assists the player gave.
Yellow: numeric; the number of yellow cards the player received.
YellowRed: numeric; the number of times the player was sent off with two yellow cards within one game.
Red: numeric; the number of times the player was sent off with a red card.
SubOn: numeric; the number of times the player was substituted on.
SubOff: numeric; the number of times the player was substituted off.
Minutes: numeric; the total number of minutes the player was on the field.
Source
The data were scraped from http://www.transfermarkt.com on 2014-03-02.
Examples
data("Bundesliga")
summary(Bundesliga)
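For a quick graphical impression beyond the summary, the following sketch (plain base graphics, not part of the original examples) compares log market values of forwards and midfielders via the Forward dummy:
## compare log market values of forwards (1) and midfielders (0)
boxplot(log(MarketValue) ~ Forward, data = Bundesliga,
        xlab = "Forward", ylab = "log(MarketValue)")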
Top Gear fuel consumption data
Description
The data set contains information on fuel consumption of cars featured on the website of the popular BBC television show Top Gear, together with car information and performance measures. Only cars with complete information are included in the data set.
Usage
data("TopGearMPG")
Format
A data frame with 255 observations on the following 11 variables.
Maker: factor; the car maker.
Model: factor; the car model.
Type: factor; the exact model type.
MPG: numeric; the combined fuel consumption (urban + extra urban; in miles per gallon).
Cylinders: numeric; the number of cylinders in the engine.
Displacement: numeric; the displacement of the engine (in cc).
BHP: numeric; the power of the engine (in bhp).
Torque: numeric; the torque of the engine (in lb/ft).
Acceleration: numeric; the time it takes the car to get from 0 to 62 mph (in seconds).
TopSpeed: numeric; the car's top speed (in mph).
Weight: numeric; the car's curb weight (in kg).
Source
The data were scraped from http://www.topgear.com/uk/ on 2014-02-24.
Examples
data("TopGearMPG")
plot(TopGearMPG[, -(1:3)])
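To illustrate how the performance measures relate to fuel consumption, one might fit a simple least squares model; this is a sketch rather than part of the original examples:
## regress fuel consumption on weight and engine power
fit <- lm(MPG ~ Weight + BHP, data = TopGearMPG)
summary(fit)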
LAD-lasso with penalty parameter selection
Description
Fit LAD-lasso models and select the penalty parameter by estimating the
respective prediction error via (repeated) K-fold cross-validation,
(repeated) random splitting (also known as random subsampling or Monte Carlo
cross-validation), or the bootstrap.
Usage
ladlasso(
x,
y,
lambda,
standardize = TRUE,
intercept = TRUE,
splits = foldControl(),
cost = mape,
selectBest = c("hastie", "min"),
seFactor = 1,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
ladlasso.fit(x, y, lambda, standardize = TRUE, intercept = TRUE, ...)
Arguments
x: a numeric matrix containing the predictor variables.
y: a numeric vector containing the response variable.
lambda: for ladlasso, a numeric vector of non-negative values to be used as penalty parameter; for ladlasso.fit, a single non-negative value to be used as penalty parameter.
standardize: a logical indicating whether the predictor variables should be standardized to have unit MAD (the default is TRUE).
intercept: a logical indicating whether a constant term should be included in the model (the default is TRUE).
splits: an object giving data splits to be used for prediction error estimation (see perryTuning).
cost: a cost function measuring prediction loss (see perryTuning; the default is mape, the mean absolute prediction error).
selectBest, seFactor: arguments specifying a criterion for selecting the best model (see perryTuning).
ncores, cl: arguments for parallel computing (see perryTuning).
seed: optional initial seed for the random number generator (see .Random.seed).
...: for ladlasso, additional arguments to be passed to ladlasso.fit; for ladlasso.fit, additional arguments to be passed to the underlying model fitting function.
Value
For ladlasso, an object of class "perryTuning" (see perryTuning). It contains information on the prediction error criterion, and includes the final model with the optimal tuning parameter as component finalModel.
For ladlasso.fit, an object of class "ladlasso" with the following components:
lambda: numeric; the value of the penalty parameter.
coefficients: a numeric vector containing the coefficient estimates.
fitted.values: a numeric vector containing the fitted values.
residuals: a numeric vector containing the residuals.
standardize: a logical indicating whether the predictor variables were standardized to have unit MAD.
intercept: a logical indicating whether the model includes a constant term.
muX: a numeric vector containing the medians of the predictors.
sigmaX: a numeric vector containing the MADs of the predictors.
muY: numeric; the median of the response.
call: the matched function call.
Author(s)
Andreas Alfons
References
Wang, H., Li, G. and Jiang, G. (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics, 25(3), 347–355.
See Also
perryTuning
Examples
## load data
data("Bundesliga")
Bundesliga <- Bundesliga[, -(1:2)]
f <- log(MarketValue) ~ Age + I(Age^2) + .
mf <- model.frame(f, data=Bundesliga)
x <- model.matrix(terms(mf), mf)[, -1]
y <- model.response(mf)
## set up repeated random splits
splits <- splitControl(m = 40, R = 10)
## select optimal penalty parameter
lambda <- seq(40, 0, length.out = 20)
fit <- ladlasso(x, y, lambda = lambda, splits = splits, seed = 2014)
fit
## plot prediction error results
plot(fit, method = "line")
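Repeated random splitting is only one of the estimators mentioned above; as a sketch, the same tuning can be carried out via repeated 5-fold cross-validation by changing the splits argument:
## select the penalty parameter via repeated 5-fold cross-validation
fitCV <- ladlasso(x, y, lambda = lambda,
                  splits = foldControl(K = 5, R = 10), seed = 2014)
fitCV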
Lasso with penalty parameter selection
Description
Fit lasso models and select the penalty parameter by estimating the
respective prediction error via (repeated) K-fold cross-validation,
(repeated) random splitting (also known as random subsampling or Monte Carlo
cross-validation), or the bootstrap.
Usage
lasso(
x,
y,
lambda = seq(1, 0, length.out = 50),
mode = c("fraction", "lambda"),
standardize = TRUE,
intercept = TRUE,
splits = foldControl(),
cost = rmspe,
selectBest = c("hastie", "min"),
seFactor = 1,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
lasso.fit(
x,
y,
lambda = seq(1, 0, length.out = 50),
mode = c("fraction", "lambda"),
standardize = TRUE,
intercept = TRUE,
...
)
Arguments
x: a numeric matrix containing the predictor variables.
y: a numeric vector containing the response variable.
lambda: a numeric vector of non-negative values to be used as penalty parameter (the default is a decreasing sequence of 50 values between 1 and 0; see mode).
mode: a character string specifying the type of penalty parameter. If "fraction", lambda gives the fractions of the smallest value of the penalty parameter for which all coefficients are zero (hence all values of lambda should be in the interval [0, 1]). If "lambda", lambda gives the grid of values for the penalty parameter directly.
standardize: a logical indicating whether the predictor variables should be standardized to have unit variance (the default is TRUE).
intercept: a logical indicating whether a constant term should be included in the model (the default is TRUE).
splits: an object giving data splits to be used for prediction error estimation (see perryTuning).
cost: a cost function measuring prediction loss (see perryTuning; the default is rmspe, the root mean squared prediction error).
selectBest, seFactor: arguments specifying a criterion for selecting the best model (see perryTuning).
ncores, cl: arguments for parallel computing (see perryTuning).
seed: optional initial seed for the random number generator (see .Random.seed).
...: for lasso, additional arguments to be passed to lasso.fit; for lasso.fit, additional arguments to be passed to the underlying model fitting function from package lars.
Value
For lasso, an object of class "perryTuning" (see perryTuning). It contains information on the prediction error criterion, and includes the final model with the optimal tuning parameter as component finalModel.
For lasso.fit, an object of class "lasso" with the following components:
lambda: numeric; the value of the penalty parameter.
coefficients: a numeric vector containing the coefficient estimates.
fitted.values: a numeric vector containing the fitted values.
residuals: a numeric vector containing the residuals.
standardize: a logical indicating whether the predictor variables were standardized to have unit variance.
intercept: a logical indicating whether the model includes a constant term.
muX: a numeric vector containing the means of the predictors.
sigmaX: a numeric vector containing the standard deviations of the predictors.
mu: numeric; the mean of the response.
call: the matched function call.
Author(s)
Andreas Alfons
References
Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.
See Also
perryTuning
Examples
## load data
data("Bundesliga")
Bundesliga <- Bundesliga[, -(1:2)]
f <- log(MarketValue) ~ Age + I(Age^2) + .
mf <- model.frame(f, data=Bundesliga)
x <- model.matrix(terms(mf), mf)[, -1]
y <- model.response(mf)
## set up repeated random splits
splits <- splitControl(m = 40, R = 10)
## select optimal penalty parameter
fit <- lasso(x, y, splits = splits, seed = 2014)
fit
## plot prediction error results
plot(fit, method = "line")
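The bootstrap estimators mentioned in the description can be requested through a bootstrap control object; a sketch using the 0.632 estimator:
## select the penalty parameter via the 0.632 bootstrap estimator
fitBoot <- lasso(x, y, splits = bootControl(R = 10, type = "0.632"),
                 seed = 2014)
fitBoot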
Resampling-based prediction error for fitted models
Description
Estimate the prediction error of a fitted model via (repeated) K-fold
cross-validation, (repeated) random splitting (also known as random
subsampling or Monte Carlo cross-validation), or the bootstrap. Methods are
available for least squares fits computed with lm as
well as for the following robust alternatives: MM-type models computed with
lmrob and least trimmed squares fits computed with
ltsReg.
Usage
## S3 method for class 'lm'
perry(
object,
splits = foldControl(),
cost = rmspe,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
## S3 method for class 'lmrob'
perry(
object,
splits = foldControl(),
cost = rtmspe,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
## S3 method for class 'lts'
perry(
object,
splits = foldControl(),
fit = c("reweighted", "raw", "both"),
cost = rtmspe,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
Arguments
object: the fitted model for which to estimate the prediction error.
splits: an object giving the data splits to be used for prediction error estimation, typically a control object as generated by foldControl, splitControl or bootControl (the default is foldControl() for K-fold cross-validation).
cost: a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value, or a list with the first component containing the prediction error and the second component containing the standard error. The default is to use the root mean squared prediction error (rmspe) for the "lm" method and the root trimmed mean squared prediction error (rtmspe) for the "lmrob" and "lts" methods. A custom cost function is sketched after this list.
ncores: a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used.
cl: a parallel cluster for parallel computing, e.g., as generated by makeCluster from package parallel.
seed: optional initial seed for the random number generator (see .Random.seed).
...: additional arguments to be passed to the prediction loss function cost.
fit: a character string specifying for which fit to estimate the prediction error. Possible values are "reweighted" (the default) for the reweighted fit, "raw" for the raw fit, or "both" for both fits.
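Since cost only needs to accept the observed response as its first argument and the predictions as its second, and to return a non-negative scalar, a user-defined loss can be plugged in. A minimal sketch with a hypothetical median absolute prediction error (medape is not part of the package):
## hypothetical cost function: median absolute prediction error
medape <- function(y, yHat) median(abs(y - yHat))
## usage, assuming a fitted 'lm' object named fit:
## perry(fit, foldControl(K = 5), cost = medape)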
Value
An object of class "perry" with the following components:
pe: a numeric vector containing the estimated prediction errors. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits. In case of more than one replication, these are average values over all replications.
se: a numeric vector containing the estimated standard errors of the prediction loss. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits.
reps: a numeric matrix containing the estimated prediction errors from all replications. For the "lm" and "lmrob" methods, this is a matrix with one column. For the "lts" method, this contains one column for each of the requested fits. This component is only returned in case of more than one replication.
splits: an object giving the data splits used to estimate the prediction error.
y: the response.
yHat: a list containing the predicted values from all replications.
call: the matched function call.
Note
The perry methods extract the data from the fitted model and
call perryFit to perform resampling-based prediction
error estimation.
Author(s)
Andreas Alfons
See Also
perryFit
Examples
## load data
data("Bundesliga")
n <- nrow(Bundesliga)
## fit linear model
Bundesliga$logMarketValue <- log(Bundesliga$MarketValue)
fit <- lm(logMarketValue ~ Contract + Matches + Goals + Assists,
data=Bundesliga)
## perform K-fold cross-validation
perry(fit, foldControl(K = 5, R = 10), seed = 1234)
## perform random splitting
perry(fit, splitControl(m = n/3, R = 10), seed = 1234)
## perform bootstrap prediction error estimation
# 0.632 estimator
perry(fit, bootControl(R = 10, type = "0.632"), seed = 1234)
# out-of-bag estimator
perry(fit, bootControl(R = 10, type = "out-of-bag"), seed = 1234)
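The same interface applies to the robust alternatives; continuing the example above, a sketch for an MM-type fit computed with lmrob from package robustbase (which then uses the robust default cost rtmspe):
## estimate the prediction error of an MM-type regression fit
fitRob <- lmrob(logMarketValue ~ Contract + Matches + Goals + Assists,
                data = Bundesliga)
perry(fitRob, foldControl(K = 5, R = 10), seed = 1234)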
Ridge regression with penalty parameter selection
Description
Fit ridge regression models and select the penalty parameter by estimating
the respective prediction error via (repeated) K-fold
cross-validation, (repeated) random splitting (also known as random
subsampling or Monte Carlo cross-validation), or the bootstrap.
Usage
ridge(
x,
y,
lambda,
standardize = TRUE,
intercept = TRUE,
splits = foldControl(),
cost = rmspe,
selectBest = c("hastie", "min"),
seFactor = 1,
ncores = 1,
cl = NULL,
seed = NULL,
...
)
ridge.fit(x, y, lambda, standardize = TRUE, intercept = TRUE, ...)
Arguments
x: a numeric matrix containing the predictor variables.
y: a numeric vector containing the response variable.
lambda: a numeric vector of non-negative values to be used as penalty parameter.
standardize: a logical indicating whether the predictor variables should be standardized to have unit variance (the default is TRUE).
intercept: a logical indicating whether a constant term should be included in the model (the default is TRUE).
splits: an object giving data splits to be used for prediction error estimation (see perryTuning).
cost: a cost function measuring prediction loss (see perryTuning; the default is rmspe, the root mean squared prediction error).
selectBest, seFactor: arguments specifying a criterion for selecting the best model (see perryTuning).
ncores, cl: arguments for parallel computing (see perryTuning).
seed: optional initial seed for the random number generator (see .Random.seed).
...: for ridge, additional arguments to be passed to ridge.fit; for ridge.fit, additional arguments to be passed to the underlying model fitting function.
Value
For ridge, an object of class "perryTuning" (see perryTuning). It contains information on the prediction error criterion, and includes the final model with the optimal tuning parameter as component finalModel.
For ridge.fit, an object of class "ridge" with the following components:
lambda: a numeric vector containing the values of the penalty parameter.
coefficients: a numeric vector or matrix containing the coefficient estimates.
fitted.values: a numeric vector or matrix containing the fitted values.
residuals: a numeric vector or matrix containing the residuals.
standardize: a logical indicating whether the predictor variables were standardized to have unit variance.
intercept: a logical indicating whether the model includes a constant term.
muX: a numeric vector containing the means of the predictors.
sigmaX: a numeric vector containing the standard deviations of the predictors.
muY: numeric; the mean of the response.
call: the matched function call.
Author(s)
Andreas Alfons
References
Hoerl, A.E. and Kennard, R.W. (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
See Also
perryTuning
Examples
## load data
data("Bundesliga")
Bundesliga <- Bundesliga[, -(1:2)]
f <- log(MarketValue) ~ Age + I(Age^2) + .
mf <- model.frame(f, data=Bundesliga)
x <- model.matrix(terms(mf), mf)[, -1]
y <- model.response(mf)
## set up repeated random splits
splits <- splitControl(m = 40, R = 10)
## select optimal penalty parameter
lambda <- seq(600, 0, length.out = 50)
fit <- ridge(x, y, lambda = lambda, splits = splits, seed = 2014)
fit
## plot prediction error results
plot(fit, method = "line")
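As with the other tuning functions in this package, K-fold cross-validation can be used instead of repeated random splitting; a sketch:
## select the penalty parameter via 5-fold cross-validation
fitCV <- ridge(x, y, lambda = lambda, splits = foldControl(K = 5), seed = 2014)
fitCV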