mirai support for parallelization and
encapsulation.
when argument.
$selected_features returns error when model is not
trained yet.
data_format argument of
$data() method of DataBackend.
data_formats field from
Learner.
DataBackendMatrix class.
materialize_view() method to
Task to replace the internal DataBackend with
a new one after operations like $select() and
$filter().
fget in assert_predictable.
$levels() of Task returns in the
correct order.
Task
printer.
quantiles and
quantile_response are set.
MeasureRegrRQR for quantile
regression.
$predict_newdata_fast() method to
Learner to speed up prediction.
configure_learner is passed on
run_experiment() for autotest learners.
Learner failed when the
validate field was set.
data.table::setattr() for less copying.
BREAKING CHANGE: The mlr3 ecosystem has a base logger now which
is named mlr3. The mlr3/core logger is a child
of the mlr3 logger and is used for logging messages from
the mlr3 package. Some extension packages have their own
loggers which are children of the mlr3 logger e.g. mlr3/mlr3pipelines
and mlr3/bbotk for tuning.
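As a rough illustration of the logger hierarchy described above, the following sketch adjusts thresholds through the lgr package. The chosen thresholds are only an example. Note that lgr::get_logger() creates a logger if it does not exist yet, so the snippet also runs when the extension packages are not loaded.

```r
library(lgr)

# Only show warnings and errors for the whole mlr3 ecosystem
# by configuring the base logger.
lgr::get_logger("mlr3")$set_threshold("warn")

# Make one child logger more verbose again
# (logger name taken from the entry above).
lgr::get_logger("mlr3/bbotk")$set_threshold("info")
```

Child loggers without their own threshold inherit the setting of the base logger, which is why configuring the mlr3 logger is usually sufficient.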
BREAKING CHANGE: weights property and functionality
is split into weights_learner and
weights_measure:
weights_learner: Weights used during training by the
Learner.
weights_measure: Weights used during scoring
predictions via measures.
Each of these can be disabled via the new field
use_weights in Learner and
Measure objects.
feat: Add $confusion_weighted field to
PredictionClassif.
feat: Add $weights field to Prediction.
It contains the weights_measure weights from the
Task that was used for prediction.
feat: Add "macro_weighted" option to
Measure$average field.
feat: MeasureRegrRSQ and
MeasureClassifCost gain "weights"
property.
feat: LearnerClassifFeatureless,
LearnerRegrFeatureless, LearnerClassifDebug,
LearnerRegrDebug gain "weights"
property.
feat: Learner printer now prints information about
encapsulation and weights use.
feat: Add score_roc_measures() to score a prediction
on various roc measures.
feat: A more informative error message is now thrown for an error that
often occurs when the validate field of a GraphLearner is
configured incorrectly.
feat: Added method $set_threshold() to
BenchmarkResult and ResampleResult, which
allows setting the threshold for the response prediction of
classification learners, provided they output a probability prediction
(#1270).
feat: Added field $uhash_table to
BenchmarkResult and functions uhash() and
uhashes() to easily compute uhashes for given learner,
task, or resampling ids (#1270).
feat: You can now change the default predict type of
classification learners to "prob" by setting the option
mlr3.prob_as_default to TRUE (#1273).
feat: benchmark_grid() will now throw a warning if
you mix different predict types in the design (#1273).
feat: Converting a BenchmarkResult to a
data.table now includes the task_id,
learner_id, and resampling_id columns
(#1275).
fix: Add missing parameters for "regr.pinball" and
"sim.phi" measures.
feat: Add new measure "regr.rqr" for quantile
regression.
col_role offset in Task and
offset Learner property. A warning is produced if a learner
that doesn’t support offsets is trained with a task that has an offset
column.
$predict_newdata() method of
Learner now automatically conducts type conversions
(#685).
Task with the wrong
column information is now an error and not a warning.
mlr3.allow_utf8_names is removed.
Learner$predict_types is read-only
now.
Learner$predict_type after
training.
resample() and
benchmark().
assert_measure() with checks for trained
models in assert_scorable().
tsk("boston_housing") with
tsk("california_housing").
benchmark_grid().
$loglik() method from all
learners.
future.globals.maxSize when
future::plan("sequential") is used.
$characteristics field to Task
to store additional information.
mlr_reflections were broken when an extension
package was not loaded on the workers. Extension packages must now
register themselves in the mlr_reflections$loaded_packages
field.
data_format and
data_formats for Learner, Task,
and DataBackend classes.
partition() function creates training, test
and validation sets now.
Task$col_info.
Learner$predict (#943).
$internal_valid_task can now be set to an
integer vector.
$predict_sets
(#1094). This is relevant for measures that only extract information
from the model of a learner (such as internal validation scores or AIC /
BIC).
$divide() method
Task$cbind() now works with non-standard primary
keys for data.frames (#961).
"info" instead of "debug" (#972).
regr.pinball here and in
mlr3measures.
mu_auc here and in
mlr3measures.
msr("regr.rsq").
classif.debug and
regr.debug have new methods $importance() and
$selected_features() for testing, also in downstream
packages.
default_fallback().
$set_col_roles()
and $col_roles.
$encapsulate(method, fallback) method. The
$fallback field is read-only now and the encapsulate status
can be retrieved from the $encapsulation field.
"primary_iters"
$obs_loss. This is possible for
Prediction, ResampleResult and
BenchmarkResult.
Measures now also return a vector of
numerics.
msr("classif.mcc").
"marshal" property, which allows
learners to process models so they can be serialized. This happens
automatically during resample() and
benchmark().
default_values.Learner()
function.
lgr package.
mlr_learners respects
prototype arguments recently added in mlr3misc.
resample().
data.table tests on mac.
data_prototype when resampling from
learner$state to reduce memory consumption.
data.table and BLAS to
1 when running resample() or benchmark() in
parallel.
resample() and
benchmark() by reducing the number of hashing
operations.
HotstartStack anymore
when the model is missing.
hotstart_threshold are not added to
the HotstartStack anymore.
learner$state$train_time in hotstarted learners is
now only the time of the last training.
HotstartStack did not work with
column roles set in the task.
design of benchmark() can now include
parameter settings.
packageVersion().
col_info to allow adding new
methods for backends.
"mlr3.exec_chunk_bins" option to split the
resampling iterations into a number of bins.
data.table() is now re-exported.
"try", which works similar to
"none" but captures errors.
paired to benchmark_grid()
function, which can be used to create a benchmark design, where
resamplings have been instantiated on tasks.
ResultData for
as_resample_result() converter.
list for
as_resample_result() converter.
print method to make the output more readable.
distr6.
GraphLearner.
as_prediction_classif() for
data.frame() input (#872).
Learner during
train for early stopping.
mauc_aunu,
mauc_aunp, mauc_au1u,
mauc_au1p.
classif.costs does not require a
Task anymore.
as_task_unsupervised()
mlr_reflections.
"mlr3.exec_random" and
"mlr3.exec_chunk_size"). These options are passed down to
the respective map functions in package future.apply.
head() and tail() methods for
Task.
label,
i.e. Task, TaskGenerator,
Learner, Resampling, and
Measure.
as.data.table() methods for objects of class
Dictionary have been extended with additional columns.
as_task_classif.formula() and
as_task_regr.formula() now remove additional attributes
attached to the data which caused some learners to break.
$train()
and $predict() methods of a Learner. This
ensures that package loading errors are properly propagated and not
affected by encapsulation (#771).
"evaluate" (#763).
as_task_classif() and as_task_regr() now
support the construction of tasks using the formula interface,
e.g. as_task_regr(mpg ~ ., data = mtcars) (#761).
default_values() function to extract parameter
default values from Learner objects.
"validation" has been renamed to
"holdout". In the next release, mlr3 will
start switching to the now more common terms
"train"/"validation" instead of
"train"/"test" for the sets created during
resampling.
ResampleResult and BenchmarkResult.
resample() and benchmark() got a new
argument clone to control which objects to clone before
performing computations.
data.frame to Task in
as_task_classif() and as_task_regr(). A
warning is signaled if any column contains infinite values.
(classif|regr|surv).xgboost with hyperparameter
nrounds updated) can now optionally store a stack of
trained learners to be used to hotstart their training. Note that this
feature is still somewhat experimental. See HotstartStack
and #719.
sim.jaccard (Jaccard Index) and sim.phi (Phi
coefficient) (#690).
predict_newdata() now also supports
DataBackend as input.
install_pkgs() to install required
packages. This generic works for all objects with a
packages field as well as ResampleResult and
BenchmarkResult (#728).
regr.debug for debugging.
Task method $set_levels() to control
how data with factor columns is returned, independent of the used
DataBackend.
NA if prerequisites are not met
(#699). This makes it convenient to score your experiments with multiple
measures having different requirements.
%.
Task$label(). These will be used in visualizations in the
future.
Task$add_strata().
partition() to split a task into a
training and test set.
loglik() for class
Learner.
"aic" and "bic" to compute
the Akaike Information Criterion or the Bayesian Information Criterion,
respectively.
ResamplingCustomCV. Creates a
custom resampling split based on the levels of a user-provided factor
variable.
encapsulate for resample()
and benchmark() to conveniently enable encapsulation and
also set the fallback learner to the featureless learner. This is simply
for convenience, configuring each learner individually is still possible
and allows a more fine-grained control (#634, #642).
parallel_predict for Learner to
enable parallel predictions via the future backend. This currently is
only enabled while calling the $predict() or
$predict_newdata methods and is disabled during
resample() and benchmark() where you have
other means to parallelize.
$data in ResampleResult and
BenchmarkResult to simplify the API and avoid confusion.
The converter as.data.table() can be used instead to access
the internal data.
beta.
ordered in
Task$data() from TRUE to
FALSE.
ResamplingRepeatedCV$folds() (#643).
uri. This role will be
split up into multiple roles by the mlr3keras package.
as.data.table.Resampling method.
"row_id" to "row_ids" in
the as.data.table() methods for
PredictionClassif and PredictionRegr
(#547).
as_prediction_classif() and
as_prediction_regr() to reverse the operation of
as.data.table.PredictionClassif() and
as.data.table.PredictionRegr().
learner$predict_newdata() is not mandatory anymore
(#563).
Task$data() defaults to return only active rows and
columns, instead of asserting to only return rows and columns. As a
result, the $data() method can now also be used to query
inactive rows and cols from the DataBackend.
uri which is intended to
point to external resources, e.g. images on the file system.
set_threads() to control the number of
threads during calls to external packages. All objects will be migrated
to have threading disabled in their defaults to avoid conflicting
parallelization techniques (#605).
mlr3.debug: avoid calls to
future in resample() and
benchmark() to improve the readability of tracebacks.
mlr3.allow_utf8_names: allow
non-ascii characters in column names in tasks.
ResampleResult and
BenchmarkResult now optionally remove the DataBackend of
the Tasks in order to reduce file size and memory footprint after
serialization. To remove the backends from the containers, set
store_backends to FALSE in
resample() or benchmark(), respectively. Note
that this behavior will eventually be the default for future
releases.
Learner$predict_newdata() now have row ids starting from 1
instead of auto incremented row ids of the training task.
as.data.table.DictionaryTasks now returns an additional
column properties.
conditions to
ResampleResult$score() and
BenchmarkResult$score() to allow working with failing
learners more conveniently.
Task: $set_col_roles and
$set_row_roles as a replacement for the deprecated and less
flexible $set_col_role and $set_row_role.
friedman.test.BenchmarkResult() in
favor of the new mlr3benchmark package.
MeasureOOBError now has set property
minimize to TRUE.
"featureless" to tag learners
which can operate on featureless tasks.
predict_sets
for returned [Prediction] objects.
lgr.
NaN for
BenchmarkResult for resamplings with a single iteration
(#551).
future (mlr3tuning#270).
ResampleResult and BenchmarkResult now
share a common interface to store the experiment results. Manual
construction is still possible with helper function
as_result_data().
ResamplingCV and
ResamplingRepeatedCV.
classif.prauc (area under precision-recall
curve).
bibtex.
saveRDS() or
serialize().
ResampleResult or
BenchmarkResult are now de-duplicated for an optimized
serialization.
breast_cancer: all factor features are
now correctly stored as ordered factors.
convert_task().
breast_cancer
ResamplingLOO for leave-one-out resampling.
"distr" using the
distr6 package.
ResamplingBootstrap in combination with grouping
(#514).
TaskGeneratorMoons.
keep_model to learners
"classif.rpart" and "regr.rpart".
"cassini",
"circle", "simplex", "spirals",
and "moons").
plot() method for most task generators.
german_credit (#514).
future.apply is now imported (instead of
suggested). This is necessary to ensure reproducibility: This way
exactly the same result is calculated, independent of the parallel
backend.
Task$order.
classif.bbrier (binary Brier score)
and classif.mbrier (multi-class Brier score).
ResamplingInsample.
TaskUnsupervised.
ResampleResults and
BenchmarkResults with c().
Task$predict_newdata()/Task$rbind()
(#423).
Switched to new roxygen2 documentation format for R6
classes.
resample() and benchmark() now support
progress bars via the package progressr.
Row ids now must be numeric. It was previously allowed to have
character row ids, but this led to confusion and unnecessary code
bloat. Row identifiers (e.g., to be used in plots) can still be part of
the task, with row role "name".
Row names can now be queried with
Task$row_names.
DataBackendMatrix now supports storing an optional
(numeric) dense part.
Added new method $filter() to filter
ResampleResults to a subset of iterations.
Removed deprecated character() -> object
converters.
Empty test sets are now handled separately by learners (#421). An empty prediction object is returned for all learners.
The internal train and predict functions of Learner
should now be implemented as private methods: instead of public methods
train_internal and predict_internal, private
methods .train and .predict are now
encouraged.
It is now encouraged to move some internal methods from public to private:
Learner$train_internal should now be private method
$.train.
Learner$predict_internal should now be private method
$.predict.
Measure$score_internal should now be private method
$.score. The public methods will be deprecated in a future
release.
Removed arguments from the constructor of measures
classif.debug and classif.costs. These can be
set directly by msr().
We have published an article about mlr3 in the Journal of Open
Source Software: https://joss.theoj.org/papers/10.21105/joss.01903. See
citation("mlr3") for the citation info.
New method Learner$reset().
New method BenchmarkResult$filter().
Learners returned by BenchmarkResult$learners are
reset to encourage the safer alternative
BenchmarkResult$score() to access trained models.
Fix ordering of levels in
PredictionClassif$set_threshold() (triggered an
assertion).
Switched from package Metrics to package
mlr3measures.
Measures can now calculate all scores using micro or macro averaging (#400).
Measures can now be configured to return a customizable
performance score (instead of NA) in case the score cannot
be calculated.
Character columns are now treated differently from factor
columns. In the long term, character() columns are supposed
to store text.
Fixed a bug triggered by integer grouping variables in
Task (#396).
benchmark_grid() now accepts instantiated
resamplings under certain conditions.
Task$set_col_roles() and
Task$set_row_roles() are now deprecated. Instead it is
recommended for now to work with the lists Task$col_roles
and Task$row_roles directly.
Learner$predict_newdata() now works without argument
task if the learner has been fitted with
Learner$train() (#375).
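A minimal sketch of the behaviour described above, using the iris task and the rpart learner that ship with mlr3 as arbitrary examples. The selected rows are purely illustrative.

```r
library(mlr3)

task = tsk("iris")
learner = lrn("classif.rpart")
learner$train(task)

# New observations as a plain data.frame. No task argument is passed;
# the learner relies on the task information stored during training.
newdata = iris[c(1, 51, 101), ]
prediction = learner$predict_newdata(newdata)
print(prediction)
```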
Names of column roles have been unified ("weights",
"label", "stratify" and "groups"
have been renamed).
Replaced MeasureClassifF1 with
MeasureClassifFScore and fixed a bug in the F1 performance
calculation (#353). Thanks to @001ben for reporting.
Stratification is now controlled via a task column role (was a
parameter of class Resampling before).
Added an S3 predict() method for class
Learner to increase interoperability with other
packages.
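The S3 method can be used like predict() for other model classes. The sketch below assumes the default predict type "response", in which case a vector of predicted labels is expected rather than a Prediction object. Task and learner are arbitrary examples.

```r
library(mlr3)

learner = lrn("classif.rpart")
learner$train(tsk("iris"))

# Dispatches to the predict() method for class Learner.
labels = predict(learner, newdata = iris[1:5, ])
print(labels)
```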
Many objects now come with a $help() which opens the
respective manual page.
It is now possible to predict and score results on the training
set or on both training and test set. Learners can be instructed to
predict on multiple sets by setting predict_sets (default:
"test"). Measures operate on all sets specified in their
field predict_sets (default: "test").
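The interplay of the two predict_sets settings can be sketched as follows. The ids ce_train and ce_test are names chosen for this example only; task, learner and resampling are arbitrary.

```r
library(mlr3)

task = tsk("iris")

# Instruct the learner to predict on both the training and the test set.
learner = lrn("classif.rpart", predict_sets = c("train", "test"))
rr = resample(task, learner, rsmp("holdout"))

# Each measure operates only on the set named in its predict_sets field.
measures = list(
  msr("classif.ce", id = "ce_train", predict_sets = "train"),
  msr("classif.ce", id = "ce_test", predict_sets = "test")
)
scores = rr$aggregate(measures)
print(scores)
```

Comparing the two scores gives a quick impression of how strongly a learner overfits the training data.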
ResampleResult$prediction and
ResampleResult$predictions() are now methods instead of
fields, and allow extracting predictions for different predict
sets.
ResampleResult$performance() has been renamed to
ResampleResult$score() for consistency.
BenchmarkResult$performance() has been renamed to
BenchmarkResult$score() for consistency.
Changed API for (internal) constructors accepting
paradox::ParamSet(). Instead of passing the initial values
separately, the initial values must now be set directly in the
ParamSet.
Deprecated support of automatically creating objects from
strings. Instead, mlr3 provides the following helper
functions intended to ease the creation of objects stored in
dictionaries: tsk(), tgen(),
lrn(), rsmp(), msr().
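A short sketch of the helper functions in use. The dictionary keys and the parameter values are arbitrary examples; additional arguments are assumed to be passed on to the constructed object as hyperparameters or fields.

```r
library(mlr3)

task = tsk("iris")                        # task from mlr_tasks
generator = tgen("2dnormals")             # task generator from mlr_task_generators
learner = lrn("classif.rpart", cp = 0.1)  # learner with a hyperparameter set
resampling = rsmp("cv", folds = 3)        # 3-fold cross-validation
measure = msr("classif.acc")              # classification accuracy
```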
BenchmarkResult now ensures that the stored
ResampleResults are in a persistent order. Thus,
ResampleResults can now be addressed by their position
instead of their hash.
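Addressing a stored result by position might look like the sketch below. The design is an arbitrary example, and the stored results are expected to follow the row order of the design.

```r
library(mlr3)

design = benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("holdout")
)
bmr = benchmark(design)

# Retrieve the first stored ResampleResult by its position
# instead of looking it up via its hash.
rr = bmr$resample_result(1)
print(rr$learner$id)
```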
New field
BenchmarkResult$n_resample_results.
New field BenchmarkResult$hashes.
New method Task$rename().
New S3 generic as_benchmark_result().
Renamed Generator to
TaskGenerator.
Removed the control object mlr_control().
Removed ResampleResult$combine().
Removed BenchmarkResult$best().