* Fixed the R-squared calculation, which previously used the `defaultSummary()` function from caret. That function computes the square of the Pearson correlation coefficient (r-squared) rather than the correct coefficient of determination, calculated as 1 - RSS/TSS, where RSS = residual sum of squares and TSS = total sum of squares. The correct formula for R-squared is now applied (see the sketch below).
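  A minimal sketch in base R (made-up data, not package code) showing how the two quantities diverge for biased predictions:

  ```r
  # Squared Pearson correlation vs coefficient of determination
  # R^2 = 1 - RSS/TSS.
  set.seed(42)
  obs  <- rnorm(100)
  pred <- 0.5 * obs + 2            # correlated with obs, but biased
  rss  <- sum((obs - pred)^2)      # residual sum of squares
  tss  <- sum((obs - mean(obs))^2) # total sum of squares
  cor(obs, pred)^2  # squared Pearson r: exactly 1, misleadingly high
  1 - rss / tss     # true R^2: strongly negative for these predictions
  ```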
* Fix for the case where `x` is a single predictor.
* New function `prc()` which enables easy building of precision-recall curves from 'nestedcv' models and `repeatcv()` results.
* New `predict()` method for `cva.glmnet`.
* The base pipe `|>` can be used instead.
* New function `metrics()` which gives additional performance metrics for binary classification models such as F1 score, Matthews correlation coefficient and precision-recall AUC (definitions illustrated below).
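  For reference, the standard definitions behind two of these metrics in plain base R (illustrative only, not the `metrics()` implementation):

  ```r
  # F1 score and Matthews correlation coefficient from a 2x2
  # confusion matrix with counts tp, fp, fn, tn.
  f1 <- function(tp, fp, fn) 2 * tp / (2 * tp + fp + fn)
  mcc <- function(tp, fp, fn, tn) {
    (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  }
  f1(tp = 40, fp = 10, fn = 5)            # 0.842
  mcc(tp = 40, fp = 10, fn = 5, tn = 45)  # ~0.70
  ```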
* New filter `pls_filter()` which uses partial least squares regression to filter features.
* Parallelisation of `repeatcv()`, leading to a significant improvement in speed.
* Improved parallelisation of `nestcv.train()`: if argument `cv.cores` > 1, OpenMP multithreading is now disabled, which prevents the caret models `xgbTree` and `xgbLinear` from crashing and allows them to be parallelised efficiently over the outer CV loops (usage sketch below).
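  A usage sketch (data and arguments are illustrative; `method` takes a caret model name, assuming the usual `nestcv.train()` interface):

  ```r
  # Parallelise the outer CV loops over 4 cores; with cv.cores > 1,
  # OpenMP multithreading inside xgboost is disabled so the outer
  # loops parallelise cleanly.
  library(nestedcv)
  fit <- nestcv.train(y, x,
                      method = "xgbTree",  # caret model
                      n_outer_folds = 5,
                      cv.cores = 4)
  ```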
* Improvements to `var_stability()` and its plots.
* Fixes in `nestcv.glmnet()`.
* New function `repeatcv()` to apply repeated nested CV to the main nestedcv model functions for robust measurement of model performance.
* New `modifyX` argument for all nestedcv models. This allows more powerful manipulation of the predictors, such as scaling, imputing missing values, or adding extra columns through variable manipulations. Importantly, these modifications are applied to train and test input data separately (see the sketch below).
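  A minimal sketch of the idea, assuming `modifyX` accepts a function applied to the predictor data (check the package documentation for the exact signature):

  ```r
  # Hypothetical illustration: centre and scale predictors inside the
  # CV loop, so the transformation is applied to train and test data
  # separately as described above.
  scale_x <- function(x) scale(x)
  fit <- nestcv.glmnet(y, x,
                       family = "binomial",
                       modifyX = scale_x)
  ```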
* New `predict()` function for `nestcv.SuperLearner()`.
* New `pred_SuperLearner` wrapper for use with `fastshap::explain`.
* Fix for `nestcv.SuperLearner()` on Windows.
* Fixes in `nestcv.glmnet()`.
* New `verbose` argument in `nestcv.train()`, `nestcv.glmnet()` and `outercv()` to show progress.
* New `multicore_fork` argument in `nestcv.train()` and `outercv()` to allow a choice of parallelisation between forked multicore processing using `mclapply` or non-forked processing using `parLapply`. This can help prevent errors with certain multithreaded caret models, e.g. `model = "xgbTree"` (see the sketch below).
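  A usage sketch (illustrative data; argument names as described above):

  ```r
  # Use non-forked PSOCK clusters (parLapply) instead of forked
  # mclapply, which can avoid crashes with multithreaded caret models
  # such as xgbTree on some platforms.
  fit <- nestcv.train(y, x,
                      method = "xgbTree",
                      cv.cores = 4,
                      multicore_fork = FALSE)  # parLapply, not mclapply
  ```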
* `one_hot()`: the `all_levels` argument default has been changed to `FALSE` to be compatible with regression models by default.
* Expanded the `lm_filter()` full results table.
* Fix bug in `lm_filter()` where variables with zero variance incorrectly reported very low p-values in linear models instead of returning `NA`. This is due to how rank-deficient models are handled by `RcppEigen::fastLmPure`. The default method for `fastLmPure` has been changed to 0 to allow detection of rank-deficient models (see the sketch below).
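  A sketch of the underlying behaviour (assuming column-pivoted QR flags redundant columns, as described above):

  ```r
  # With a zero-variance column, fastLmPure with method = 0
  # (column-pivoted QR) can detect rank deficiency and return NA for
  # the redundant coefficient rather than a spurious estimate.
  library(RcppEigen)
  set.seed(1)
  y <- rnorm(10)
  X <- cbind(intercept = rep(1, 10),
             zerovar   = rep(0, 10))  # zero-variance predictor
  fastLmPure(X, y, method = 0L)$coefficients
  ```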
* Fix bug in `weight()` caused by `NA` values. `weight()` now tolerates character vectors.
* A `keep_factors` option has been added to filters to control filtering of factors with 3 or more levels.
* New function `one_hot()` for fast one-hot encoding of factors and character columns by creating dummy variables (see the sketch below).
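  A usage sketch (made-up data; `all_levels` as described above):

  ```r
  # One-hot encode character/factor columns as dummy variables. With
  # all_levels = FALSE (the new default), one level per factor is
  # dropped, which suits regression models.
  df <- data.frame(age = c(21, 35, 52),
                   sex = c("M", "F", "F"))
  one_hot(df, all_levels = FALSE)
  ```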
* New function `stat_filter()` which applies univariate filtering to dataframes of mixed datatype (continuous and categorical combined).
* Switched `anova_filter()` from `Rfast::ftests()` to `matrixTests::col_oneway_welch()` for much better accuracy.
* Improvement to `nestcv.train()` (Matt Siggins suggestion).
* New `n_inner_folds` argument to `nestcv.train()` to make it easier to set the number of inner CV folds, and `inner_folds` argument which enables setting the inner CV fold indices directly (suggestion from Aline Wildberger).
* Fix bug in `plot_shap_beeswarm()` caused by a change in fastshap 0.1.0 output from tibble to matrix.
* Fixes in `nestcv.train()`.
* Added `pass_outer_folds` to both `nestcv.glmnet` and `nestcv.train`: this enables passing of outer CV fold indices stored in `outer_folds` to the final round of CV. Note this can only work if `n_outer_folds` equals the number of inner CV folds and balancing is not applied, so that `y` is a consistent length (usage sketch below).
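  A usage sketch (illustrative arguments):

  ```r
  # Reuse the outer CV fold indices for the final round of CV. As
  # noted above, n_outer_folds must equal the number of inner folds
  # and balancing must not be applied.
  fit <- nestcv.glmnet(y, x,
                       family = "binomial",
                       n_outer_folds = 10,
                       pass_outer_folds = TRUE)
  ```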
* Ensure `nfolds` for the final CV equals `n_inner_folds` in `nestcv.glmnet()`.
* Improved `plot_var_stability()` to be more user friendly.
* Added a `top` argument to SHAP plots.
* Switched to `fastshap` for calculating SHAP values.
* Added a `force_vars` argument to `glmnet_filter()`.
* New `ranger_filter()`.
* Reduced the object size of `nestcv.train()` results from models such as `gbm`. This fixes a multicore bug when using the standard R GUI on Mac/Linux.
* Fix for when a `nestcv.glmnet()` model has 0 or 1 coefficients.
* nestedcv models now return `xsub`, containing a subset of the predictor matrix `x` with filtered variables across outer folds and the final fit.
* `boxplot_model()` no longer needs the predictor matrix to be specified, as it is contained in `xsub` in nestedcv models.
* `boxplot_model()` now works for all nestedcv model types.
* New function `var_stability()` to assess variance and stability of variable importance across outer folds, and directionality for binary outcomes.
* New function `plot_var_stability()` to plot variable stability across outer folds.
* New `finalCV = NA` option which skips fitting the final model completely. This gives a useful speed boost if performance metrics are all that is needed (see the sketch below).
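  A usage sketch (illustrative arguments):

  ```r
  # Skip fitting the final model entirely when only nested CV
  # performance metrics are needed, for a useful speed boost.
  fit <- nestcv.glmnet(y, x, family = "binomial", finalCV = NA)
  summary(fit)
  ```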
* The `model` argument in `outercv` now prefers a character value instead of a function for the model to be fitted.
* Improved error handling in `outercv` and `nestcv.train`, which improves error detection in caret, so `nestcv.train` can be run in multicore mode straightaway.
* Fixes in `nestcv.glmnet`.
* New `outer_train_predict` argument to enable saving of predictions on outer training folds.
* New function `train_preds()` to obtain outer training fold predictions.
* New function `train_summary()` to show performance metrics on outer training folds.
* Fix in `smote()`.
* Support for the `SuperLearner` package.
* Updates to `nestcv.train` and `nestcv.glmnet`.
* Fix in `nestcv.train` for caret models with tuning parameters which are factors.
* Fix in `nestcv.train` for caret models using regression.
* Option in `nestcv.train` and `nestcv.glmnet` to tune final model parameters using a final round of CV on the whole dataset.
* Updates to `nestcv.train` and `outercv`.
* New function `randomsample()` to handle class imbalance using random over/undersampling.
* New function `smote()`, an implementation of the SMOTE algorithm for increasing minority class data (see the sketch below).
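  A usage sketch, assuming imbalance handling is selected by passing the function name via a `balance` argument (check the package documentation):

  ```r
  # Apply SMOTE resampling within the CV procedure rather than to the
  # whole dataset. The balance argument here is an assumption for
  # illustration.
  fit <- nestcv.glmnet(y, x, family = "binomial", balance = "smote")
  ```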
* New function `boot_ttest()`.
* The final lambda in `nestcv.glmnet()` is the mean of the best lambdas on the log scale.
* New `plot_varImp()` for plotting variable importance for `nestcv.glmnet` final models.
* Fixes in `nestcv.glmnet()`.
* New `cva.glmnet()`.
* New `plot.cva.glmnet` method.
* New `alphaSet` argument in `plot.cva.glmnet`.
* Support for the `train` function of caret.
* Passing arguments to `filterFUN` is no longer done through `...`, but with a list of arguments passed through a new argument, `filter_options` (usage sketch below).
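  A usage sketch (the filter `ttest_filter` and option `nfilter` are used for illustration and are assumptions here):

  ```r
  # Filter arguments are now passed as a list via filter_options
  # instead of through `...`.
  fit <- nestcv.glmnet(y, x,
                       family = "binomial",
                       filterFUN = ttest_filter,
                       filter_options = list(nfilter = 100))
  ```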