This release is intended to be the last before stable version 1.0.0.
Passing a background dataset bg_X is now optional. If the explanation data X is sufficiently large (>= 50 rows), bg_X is derived as a random sample of bg_n = 200 rows from X. If X has fewer than bg_n rows, then simply bg_X = X. If X has too few rows (< 50), you will have to pass an explicit bg_X.
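The rule for deriving the background data can be sketched in a few lines of base R. This is only an illustration of the rule as described here, not the package's internal code: the helper name default_bg_X and its argument min_rows are hypothetical.

```r
# Hypothetical helper that mirrors the rule described above
# (not part of kernelshap; names are illustrative only).
default_bg_X <- function(X, bg_n = 200L, min_rows = 50L) {
  n <- nrow(X)
  if (n < min_rows) {
    stop("X has too few rows; please pass an explicit bg_X.")
  }
  if (n <= bg_n) {
    return(X)  # not more rows than bg_n: use X itself as background data
  }
  X[sample(n, bg_n), , drop = FALSE]  # random sample of bg_n rows
}

bg_X <- default_bg_X(iris)
nrow(bg_X)  # 150, since iris has fewer than 200 rows
```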
ranger() survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument survival = "chf" in kernelshap() and permshap() to distinguish cumulative hazards (default) and survival probabilities per time point.
The objects returned by kernelshap() and permshap() now contain the bg_X and bg_w used to calculate the SHAP values.
gam::gam().
New additive explainer additive_shap() that works for models fitted via lm(), glm(), mgcv::gam(), mgcv::bam(), gam::gam(), survival::coxph(), survival::survreg(). The explainer uses predict(..., type = "terms"), a beautiful trick used in fastshap::explain.lm(). The results will be identical to those returned by kernelshap() and permshap() but are obtained exponentially faster. Thanks to David Watson for the great idea discussed in #130.
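The predict(..., type = "terms") trick can be tried with base R alone. The sketch below is not the code of additive_shap(); it only illustrates, for a linear model on the built-in iris data (an arbitrary choice), that the centered term contributions plus a constant add up to the prediction, which is the property an additive explainer can rely on.

```r
# Fit a model that is additive in its terms
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
X <- iris[1:5, ]

# One column per model term; contributions are centered on the training data
S <- predict(fit, newdata = X, type = "terms")
baseline <- attr(S, "constant")  # average prediction on the training data

# Term contributions plus baseline reproduce the usual predictions
all.equal(
  unname(rowSums(S) + baseline),
  unname(predict(fit, newdata = X))
)
```

No model evaluations on perturbed data are needed here, which is why this route is so much cheaper than a sampling-based or permutation-based explainer.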
permshap() now returns an object of class “kernelshap” to reduce the number of redundant methods.
The outputs of kernelshap(), permshap() (and additive_shap()) got an element “algorithm”.
is.permshap() has been removed.
predict_type = "prob".
Sped up permshap() by caching calculations for the two special permutations of all 0 and all 1. Consequently, the m_exact component in the output is reduced by 2.
New function permshap() to calculate exact permutation SHAP values. The function currently works for up to 14 features.
S and SE lists.
feature_names as dimnames (https://github.com/ModelOriented/kernelshap/issues/96).
Removed the ks_extract() function. It was designed to extract objects like the matrix S of SHAP values from the resulting “kernelshap” object x. We feel that the standard extraction options (x$S, x[["S"]], or getElement(x, "S")) are sufficient.
X, and \(K\) is the dimension of a single prediction (usually 1).
verbose = FALSE no longer suppresses the warning about too large background data. Use suppressWarnings() instead.
When bg_X contained more columns than X, inflexible prediction functions could fail when applied to bg_X.
The argument feature_names allows specifying the features to calculate SHAP values for. The default equals colnames(X). This should be changed only in situations where X (the dataset to be explained) contains non-feature columns.
Thanks to David Watson, exact calculations are now also possible for \(p>5\) features. By default, the algorithm uses exact calculations for \(p \le 8\) and a hybrid strategy otherwise, see the next section. At the same time, the exact algorithm became much more efficient.
A word of caution: Exact calculations mean creating \(2^p-2\) on-off vectors \(z\) (cheap step) and evaluating the model on a whopping \((2^p-2)N\) rows, where \(N\) is the number of rows of the background data (expensive step). As this explodes with large \(p\), we do not recommend the exact strategy for \(p > 10\).
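To get a feeling for these numbers, the row count \((2^p-2)N\) can be tabulated for a few values of \(p\). The background size N = 200 is an assumption chosen for illustration, and exact_rows is a hypothetical helper.

```r
# Rows on which the model is evaluated by the exact strategy (per explained row)
exact_rows <- function(p, N) (2^p - 2) * N

p <- c(5, 8, 10, 14)
data.frame(
  p = p,
  on_off_vectors = 2^p - 2,         # 30, 254, 1022, 16382
  rows = exact_rows(p, N = 200)     # 6000, 50800, 204400, 3276400
)
```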
The iterative Kernel SHAP sampling algorithm of Covert and Lee (2021) [1] works by randomly sampling \(m\) on-off vectors \(z\) so that their sum follows the SHAP Kernel weight distribution (renormalized to the range from \(1\) to \(p-1\)). Based on these vectors, many predictions are formed. Then, Kernel SHAP values are derived as the solution of a constrained linear regression, see [1] for details. This is done multiple times until convergence.
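The sampling step can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes that the number of ones \(k\) in a vector \(z\) is drawn with probability proportional to \((p-1)/(k(p-k))\) for \(k = 1, \dots, p-1\), and that the positions of the ones are then chosen uniformly at random. The function name sample_Z is hypothetical.

```r
# Draw m on-off vectors of length p (one per row of the result)
sample_Z <- function(m, p) {
  stopifnot(p >= 2)
  k <- seq_len(p - 1)
  probs <- (p - 1) / (k * (p - k))  # assumed weight of each vector sum k
  probs <- probs / sum(probs)       # renormalize over k = 1, ..., p - 1
  sizes <- sample.int(p - 1, m, replace = TRUE, prob = probs)
  Z <- matrix(0L, nrow = m, ncol = p)
  for (i in seq_len(m)) {
    Z[i, sample.int(p, sizes[i])] <- 1L  # switch on sizes[i] random positions
  }
  Z
}

set.seed(1)
Z <- sample_Z(m = 100, p = 6)
table(rowSums(Z))  # how often each vector sum was drawn
```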
A drawback of this strategy is that many (at least 75%) of the \(z\) vectors will have \(\sum z \in \{1, p-1\}\), producing many duplicates. Similarly, at least 92% of the mass will be used for the \(p(p+1)\) possible vectors with \(\sum z \in \{1, 2, p-1, p-2\}\) etc. This inefficiency can be fixed by a hybrid strategy, combining exact calculations with sampling. The hybrid algorithm has two steps:
The default behaviour of kernelshap() is as follows:
It is also possible to use a pure sampling strategy, see Section “User visible changes” below. While this is usually not advisable compared to a hybrid approach, the options of kernelshap() allow studying different properties of Kernel SHAP and doing empirical research on the topic.
Kernel SHAP in the Python implementation “shap” uses a quite similar hybrid strategy, but without iterating. The new logic in the R package thus combines the efficiency of the Python implementation with the convergence monitoring of [1].
[1] Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
The default value of m is reduced from \(8p\) to \(2p\) except when hybrid_degree = 0 (pure sampling).
The default of exact is now TRUE for \(p \le 8\) instead of \(p \le 5\).
A new argument hybrid_degree is introduced to control the exact part of the hybrid algorithm. The default is 2 for \(4 \le p \le 16\) and 1 otherwise. Set to 0 to force a pure sampling strategy (not recommended but useful to demonstrate superiority of hybrid approaches).
The default of tol was reduced from 0.01 to 0.005.
The default of max_iter was reduced from 250 to 100.
m.
print() is now slimmer.
The summary() function shows more information.
m_exact (the number of on-off vectors used for the exact part), prop_exact (proportion of mass treated in exact fashion), exact flag, and txt (the info message when starting the algorithm).
Predictions of mgcv::gam() would cause an error in check_pred() (they are 1D arrays).
The interface of kernelshap() has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default pred_fun is now stats::predict, which works in most cases. Some other cases are caught via model class (“ranger” and mlr3 “Learner”). The pred_fun can be overwritten by a function of the form function(object, X, ...). Additional arguments to the prediction function are passed via ... of kernelshap().
Some examples:
kernelshap(fit, X, bg_X)
kernelshap(fit, X, bg_X, type = "response")
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))
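What a custom pred_fun like the one in the third call does can be checked without the package. The sketch below uses simulated data and a Poisson GLM (both assumptions made only for this illustration) to show that exp(predict(m, X)) gives predictions on the response scale, the same values that type = "response" returns for a log link.

```r
# Simulated data for illustration only
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = runif(100))
df$y <- rpois(100, lambda = exp(0.3 * df$x1 + df$x2))

fit <- glm(y ~ x1 + x2, family = poisson(), data = df)
X <- df[1:5, c("x1", "x2")]

# Prediction function of the form function(object, X, ...)
pred_fun <- function(m, X, ...) exp(predict(m, X, ...))

on_response_scale <- pred_fun(fit, X)
all.equal(on_response_scale, predict(fit, X, type = "response"))
```

For a model like this, passing type = "response" through ... and passing the custom pred_fun should therefore explain the same predictions.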
kernelshap() has received a more intuitive interface, see breaking change above.
kernelshap(), e.g., using the “doFuture” package, and then set parallel = TRUE. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by parallel_args, a list of arguments passed to foreach().
kernelshap() has become much faster.
Besides matrix, data.frames, and tibbles, the package now also accepts data.tables (if the prediction function can deal with them).
kernelshap() is less picky regarding the output structure of pred_fun().
kernelshap() is less picky about the column structure of the background data bg_X. It should simply contain the columns of X (but can have more columns or a different column order). The old behaviour was to throw an error if colnames(X) != colnames(bg_X).
The default m = "auto" has been changed from trunc(20 * sqrt(p)) to max(trunc(20 * sqrt(p)), 5 * p). This has an effect in cases where the number of features \(p > 16\). The change implies more robust results for large p.
ks_extract(, what = "S").
MASS::ginv(), the Moore-Penrose pseudoinverse using svd().
This is the initial release.