The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Variable Importance Vignette

A paper that describes the variable importance measures in more detail should be available soon.

1 Variable Importance

1.1 Definition

L0-penalization based modified variable importance is defined in the following way

\[mVI(i|X, y, \lambda) = \min_{\beta:\beta_i \neq 0} Q(\beta|X, y, \lambda) - \min_{\beta:\beta_i = 0} Q(\beta|X, y, \lambda) + \lambda |S_i|\] where \(Q(\beta|X, y, \lambda) = -2l(\beta|X, y) + \lambda ||\beta||_0\), \(||\beta||_0\) is the number of nonzero elements in \(\beta\), and \(||S_i||\) is the number of beta parameters associated with the ith set of variables. The number of parameters in the ith set of variables is 1 for continuous variables and is the number of levels minus 1 for categorical variables. \(\lambda\) is defined by the chosen metric where AIC results in \(\lambda = 2\), BIC results in \(\lambda = \log{(n)}\), and HQIC results in \(\lambda = 2\log{(\log{(n)})}\).

These variable importance values are equivalent to the traditional likelihood ratio test for beta parameters when \(\lambda = 0\). However, when \(\lambda > 0\), the null distribution of the variable importance values may not be chi-squared distributed. P-values for the variable importance values may be obtained from the VariableImportance.boot() function which uses a parametric bootstrap approach to approximate the null distribution. This process entails performing best subset selection many times over, so it is quite slow.

1.2 Variable importance example

L0-penalization based variable importance values may be calculated with the VariableImportance() function. The VariableImportance() function requires an object returned from calling the VariableSelection() function. The exact variable importance values are returned if a branch and bound algorithm is used with the VariableSelection() function. If a heuristic method is used with the VariableSelection() function, then approximate variable importance values based on the specified heuristic method are returned.

# Loading BranchGLM package
library(BranchGLM)

# Using iris dataset to demonstrate usage of VI
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI
#> Best Subset Selection VI(BIC)
#> ------------------------------
#>                   VI    mVI df
#> (Intercept)       NA     NA NA
#> Sepal.Width  21.6472  26.66  1
#> Petal.Length 99.3510 104.36  1
#> Petal.Width  -0.0572   4.95  1
#> Species       0.0572  10.08  2

We can visualize the variable importance values with the barplot() function.

# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)

par(mar = oldmar)

1.2.1 P-values

We can get approximate p-values based on the L0-penalization based variable importance values from the VariableImportance.boot() function. This function uses a parametric bootstrap approach to create an approximate null distribution for the variable importance values. This approach is very slow, so it is not feasible to get these p-values when there are many sets of variables.

# Getting approximate null distributions
set.seed(59903)
myBoot <- VariableImportance.boot(VI, nboot = 1000, showprogress = FALSE)
myBoot
#> Null Distributions of BSS mVI(BIC)
#> -----------------------------------
#>                 mVI p.values
#> (Intercept)      NA       NA
#> Sepal.Width   26.66    0.000
#> Petal.Length 104.36    0.000
#> Petal.Width    4.95    0.028
#> Species       10.08    0.004

We can visualize the results from VariableImportance.boot() with the hist() function or the boxplot() function. The boxplot() approach is convenient because we can look at all of the results in one plot while the hist() approach only contains the results for one set of variables in each plot.

# Plotting histogram of results for second set of variables
hist(myBoot)


# Plotting boxplots of results
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)

par(mar = oldmar)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.