
Type: Package
Title: Automatic Stacked Ensemble for Regression Tasks
Version: 1.1.0
Author: Giancarlo Vercellino
Maintainer: Giancarlo Vercellino <giancarlo.vercellino@gmail.com>
Description: Stacked ensemble for regression tasks based on the 'mlr3' framework, with a pipeline for preprocessing numeric and factor features and hyper-parameter tuning using grid or random search.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Depends: R (≥ 4.1)
Imports: mlr3 (≥ 0.12.0), mlr3learners (≥ 0.5.0), mlr3filters (≥ 0.4.2), mlr3pipelines (≥ 0.3.5-1), mlr3viz (≥ 0.5.5), paradox (≥ 1.0.0), mlr3tuning (≥ 0.8.0), bbotk (≥ 0.3.2), tictoc (≥ 1.0.1), forcats (≥ 0.5.1), readr (≥ 2.0.1), lubridate (≥ 1.7.10), purrr (≥ 0.3.4), Metrics (≥ 0.1.4), data.table (≥ 1.14.0), visNetwork (≥ 2.0.9)
Suggests: xgboost (≥ 1.4.1.1), rpart (≥ 4.1-15), ranger (≥ 0.13.1), kknn (≥ 1.3.1), glmnet (≥ 4.1-2), e1071 (≥ 1.7-8), mlr3misc (≥ 0.9.3), FSelectorRcpp (≥ 0.3.8), care (≥ 1.1.10), praznik (≥ 8.0.0), lme4 (≥ 1.1-27.1), nloptr (≥ 1.2.2.2)
URL: https://mlr3.mlr-org.com/
NeedsCompilation: no
Packaged: 2024-06-19 03:51:07 UTC; gianc
Repository: CRAN
Date/Publication: 2024-06-19 10:20:02 UTC
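
The Description field mentions a preprocessing pipeline for numeric and factor features. As an illustration only, the base-R sketch below approximates on a toy data frame what some options roughly do: collapse_char_to, missing_fusion = TRUE, impute_num = "sample", num_preproc = "scale" and fct_preproc = "one-hot". It is not the package's code: sense() builds these steps with mlr3pipelines, and the helper names, the toy data and the order of the steps are assumptions.

```r
# Illustrative base-R approximation of a preprocessing pipeline.
# Helper names (collapse_levels, impute_sample, ...) are hypothetical.
set.seed(42)

df <- data.frame(
  num = c(1.5, NA, 3.0, 4.5, NA, 6.0),
  chr = c("a", "b", "a", "c", "d", "a"),
  stringsAsFactors = FALSE
)

# collapse_char_to: keep at most `max_levels` levels, lumping the rarer ones into "other".
collapse_levels <- function(x, max_levels) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= max_levels) return(factor(x))
  keep <- names(counts)[seq_len(max_levels - 1)]
  factor(ifelse(x %in% keep, x, "other"))
}

# missing_fusion = TRUE: 0/1 indicator of missingness, computed before imputing.
add_missing_indicator <- function(x) as.integer(is.na(x))

# impute_num = "sample": fill gaps with values drawn from the observed ones.
impute_sample <- function(x) {
  miss <- is.na(x)
  observed <- x[!miss]
  x[miss] <- observed[sample.int(length(observed), sum(miss), replace = TRUE)]
  x
}

# num_preproc = "scale" standardizes; "range" rescales to [0, 1].
scale_num <- function(x) as.numeric(scale(x))
range_num <- function(x) (x - min(x)) / (max(x) - min(x))

df$chr <- collapse_levels(df$chr, max_levels = 3)
df$num_missing <- add_missing_indicator(df$num)
df$num <- scale_num(impute_sample(df$num))

# fct_preproc = "one-hot": one 0/1 column per factor level (no intercept).
one_hot <- model.matrix(~ chr - 1, data = df)
processed <- cbind(df[, c("num", "num_missing")], one_hot)
print(processed)
```

The missing indicator is computed before imputation; afterwards the gaps are filled and the information about which values were missing would be lost.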

sense

Description

Stacked ensemble for regression tasks based on the 'mlr3' framework.
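
As a rough illustration of stacking with the default super = "avg", the base-R sketch below fits two base learners on a training split, averages their test predictions and scores everything with MAE (the default metric). It assumes "avg" means an unweighted mean of the base predictions; the toy data, the hand-written k-NN and the choice of learners are illustrative, whereas sense() itself works with mlr3 learners such as glmnet, ranger or kknn.

```r
# Sketch of an averaging stacked ensemble, assuming super = "avg" is a plain mean.
set.seed(42)

n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
toy <- data.frame(x = x, y = y)

# Holdout split with a training share of 0.5.
train_idx <- sample.int(n, size = round(0.5 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Base learner 1: linear regression.
fit_lm <- lm(y ~ x, data = train)
pred_lm <- predict(fit_lm, newdata = test)

# Base learner 2: k-nearest-neighbour regression on one feature, written by hand.
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  vapply(new_x, function(z) {
    nearest <- order(abs(train_x - z))[seq_len(k)]
    mean(train_y[nearest])
  }, numeric(1))
}
pred_knn <- knn_predict(train$x, train$y, test$x, k = 5)

# Super learner "avg": unweighted mean of the base predictions.
pred_avg <- (pred_lm + pred_knn) / 2

mae <- function(actual, predicted) mean(abs(actual - predicted))
scores <- c(lm = mae(test$y, pred_lm),
            knn = mae(test$y, pred_knn),
            avg = mae(test$y, pred_avg))
print(round(scores, 3))
```

Because |y - (a + b)/2| ≤ (|y - a| + |y - b|)/2 for every observation, the MAE of the averaged prediction can never exceed the mean of the two base learners' MAEs, which makes averaging a conservative default super learner.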

Usage

sense(
  df,
  target_feat,
  benchmarking = "all",
  super = "avg",
  algos = c("glmnet", "ranger", "xgboost", "rpart", "kknn", "svm"),
  sampling_rate = 1,
  metric = "mae",
  collapse_char_to = 10,
  num_preproc = "scale",
  fct_preproc = "one-hot",
  impute_num = "sample",
  missing_fusion = FALSE,
  inner = "holdout",
  outer = "holdout",
  folds = 3,
  repeats = 3,
  ratio = 0.5,
  selected_filter = "information_gain",
  selected_n_feats = NULL,
  tuning = "random_search",
  budget = 30,
  resolution = 5,
  n_evals = 30,
  minute_time = 10,
  patience = 0.3,
  min_improve = 0.01,
  java_mem = 64,
  decimals = 2,
  seed = 42
)

Arguments

df

A data frame with features and target.

target_feat

String. Name of the numeric feature for the regression task.

benchmarking

Positive integer or "all". Number of base learners to stack. Default: "all".

super

String. Super learner: "avg" (averaging the base learners' predictions) or one of the available learners. Default: "avg".

algos

String vector. Base learners to consider; available learners are: "glmnet", "ranger", "xgboost", "rpart", "kknn", "svm". Default: all six.

sampling_rate

Positive numeric. Fraction of rows sampled before fitting the stacked ensemble. Default: 1 (all rows).

metric

String. Evaluation metric for outer and inner cross-validation. Default: "mae".

collapse_char_to

Positive integer. Maximum number of levels kept when converting character features to factors. Default: 10.

num_preproc

String. Options for numeric pre-processing: "scale" (standardization) or "range" (rescaling to a fixed range). Default: "scale".

fct_preproc

String. Options for factor pre-processing: "encodeimpact", "encodelmer", "one-hot", "treatment", "poly", "sum", "helmert". Default: "one-hot".

impute_num

String. Imputation method for missing numeric values: "sample" or "hist". Default: "sample". Missing factor values are always imputed with an out-of-range level.

missing_fusion

Logical. Whether to add missing-indicator features. Default: FALSE.

inner

String. Cross-validation inner cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout".

outer

String. Cross-validation outer cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout".

folds

Positive integer. Number of folds used in "cv" and "repeated_cv". Default: 3.

repeats

Positive integer. Number of repetitions used in "subsampling" and "repeated_cv". Default: 3.

ratio

Positive numeric. Proportion of observations used for training in "holdout" and "subsampling". Default: 0.5.

selected_filter

String. Filters available for regression tasks: "carscore", "cmim", "correlation", "find_correlation", "information_gain", "relief", "variance". Default: "information_gain".

selected_n_feats

Positive integer. Number of features to select through the chosen filter. Default: NULL.

tuning

String. Available options are "random_search" and "grid_search". Default: "random_search".

budget

Positive integer. Maximum number of trials during random search. Default: 30.

resolution

Positive integer. Grid resolution for each hyper-parameter. Default: 5.

n_evals

Positive integer. Maximum number of evaluations before termination. Default: 30.

minute_time

Positive integer. Maximum run time in minutes before termination. Default: 10.

patience

Positive numeric. Fraction of evaluations allowed to stagnate before termination. Default: 0.3.

min_improve

Positive numeric. Minimum error improvement required to avoid termination for stagnation. Default: 0.01.

java_mem

Positive integer. Memory allocated to Java. Default: 64.

decimals

Positive integer. Number of decimal places used for predictions. Default: 2.

seed

Positive integer. Random seed for reproducibility. Default: 42.
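
The tuning and termination arguments interact. The base-R sketch below contrasts grid search (resolution points for a hyper-parameter) with random search (budget draws) on a toy one-parameter objective, and adds an early stop that reads patience as a fraction of n_evals and min_improve as the smallest error drop that counts as progress. That reading, the objective and all names are assumptions; sense() delegates tuning to mlr3tuning and bbotk, and the wall-clock limit set by minute_time is left out.

```r
# Hypothetical sketch of grid vs random search with a stagnation-based stop.
set.seed(42)

# Toy validation error as a function of one hyper-parameter (minimum 0.5 at 0.01).
objective <- function(lambda) (log10(lambda) + 2)^2 + 0.5

lower <- 1e-4
upper <- 1

# tuning = "grid_search": `resolution` evenly spaced points on a log scale.
grid_candidates <- function(resolution) {
  10^seq(log10(lower), log10(upper), length.out = resolution)
}

# tuning = "random_search": `budget` log-uniform random draws.
random_candidates <- function(budget) {
  10^runif(budget, log10(lower), log10(upper))
}

# Evaluate candidates in order; stop once `window` consecutive evaluations
# fail to improve the best error by at least `min_improve`.
# window = ceiling(patience * n_evals) is an assumed interpretation.
run_search <- function(candidates, n_evals = 30, patience = 0.3,
                       min_improve = 0.01) {
  window <- ceiling(patience * n_evals)
  best <- Inf
  since_improve <- 0
  n_evaluated <- 0
  for (lambda in head(candidates, n_evals)) {
    err <- objective(lambda)
    n_evaluated <- n_evaluated + 1
    if (best - err >= min_improve) since_improve <- 0
    else since_improve <- since_improve + 1
    best <- min(best, err)
    if (since_improve >= window) break
  }
  list(best_error = best, n_evaluated = n_evaluated)
}

grid_result <- run_search(grid_candidates(resolution = 5))
random_result <- run_search(random_candidates(budget = 30))
str(grid_result)
str(random_result)
```

Grid search spends a fixed, predictable number of evaluations per hyper-parameter, while random search covers the range more irregularly but lets the budget be set independently of the number of hyper-parameters.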

Value

This function returns a list including:

Author(s)

Giancarlo Vercellino giancarlo.vercellino@gmail.com

See Also

Useful links:

Examples

## Not run: 
sense(benchmark, "y", algos = c("glmnet", "rpart"))


## End(Not run)
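
The inner and outer arguments select resampling schemes whose sizes are controlled by folds, repeats and ratio. Since mlr3 performs the resampling internally, the base-R sketch below only illustrates how the four schemes could split 150 rows into training indices. It assumes ratio is the training share, as in mlr3's holdout and subsampling resamplings; resample_indices() is a hypothetical helper.

```r
# Hypothetical sketch: training-row indices for each resampling scheme.
resample_indices <- function(n, method = "holdout", folds = 3, repeats = 3,
                             ratio = 0.5) {
  # One round of k-fold CV: each fold is held out once.
  make_cv <- function() {
    fold_id <- sample(rep_len(seq_len(folds), n))
    lapply(seq_len(folds), function(f) which(fold_id != f))
  }
  switch(method,
    holdout = list(sample.int(n, size = round(ratio * n))),
    cv = make_cv(),
    repeated_cv = unlist(replicate(repeats, make_cv(), simplify = FALSE),
                         recursive = FALSE),
    subsampling = replicate(repeats, sample.int(n, size = round(ratio * n)),
                            simplify = FALSE),
    stop("unknown resampling method: ", method)
  )
}

set.seed(42)
n <- 150  # same number of rows as the benchmark data set
holdout <- resample_indices(n, "holdout", ratio = 0.5)
cv <- resample_indices(n, "cv", folds = 3)
rcv <- resample_indices(n, "repeated_cv", folds = 3, repeats = 3)
sub <- resample_indices(n, "subsampling", repeats = 3, ratio = 0.5)

# Number of train/test iterations per scheme, and training sizes for "cv".
print(sapply(list(holdout = holdout, cv = cv, repeated_cv = rcv,
                  subsampling = sub), length))
print(lengths(cv))
```

With the defaults, "repeated_cv" yields folds × repeats = 9 iterations, which is worth keeping in mind when nesting it inside an outer loop.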


benchmark data set

Description

A data frame for a regression task, generated with the friedman1 problem from the 'mlbench' package.

Usage

benchmark

Format

A data frame with 11 columns and 150 rows.

Source

mlbench, friedman1
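
The friedman1 problem draws ten features uniformly from [0, 1] and builds the target from the first five only, y = 10 sin(π x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + ε with Gaussian noise ε; the other five features are pure noise. The base-R sketch below re-creates data of the same shape. make_friedman1() is a hypothetical helper, the noise standard deviation of 1 is an assumption (it is not documented here), and the column names it produces (x1 to x10, y) may differ from those in benchmark, whose target the example above calls "y".

```r
# Hypothetical base-R re-creation of friedman1-style data.
make_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), nrow = n, ncol = 10)
  colnames(x) <- paste0("x", 1:10)
  # Only x1..x5 enter the target; x6..x10 are irrelevant noise features.
  y <- 10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5] + rnorm(n, sd = sd)
  data.frame(x, y = y)
}

set.seed(42)
sim <- make_friedman1(150)
print(dim(sim))  # 150 rows and 11 columns, matching the documented format
```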
