
FeatureFinder

Richard Davis

2024-12-17

FeatureFinder is designed to produce comprehensive and accurate sets of features for use in modelling, either to build a new model or to enhance and diagnose an existing one. Both uses are available through a single function, FindFeatures. The following sections give an example of each.

Identifying Features for a model target

A typical modelling scenario involves a table consisting of a set of predictors \(\{x_i\}\) and a model target \(y\).

Identifying Features for a model residual

A typical modelling scenario involves a table consisting of a set of predictors \(\{x_i\}\) and a model residual \(r = y - p\), where \(p\) is a prediction from a previously fitted model and \(y\) is the model target.

Finding features

For a model target, we simply define \(r=y\), and for a model residual we use the residual \(r = y - p\). We supply a single table consisting of all predictors together with \(r\) and call the FindFeatures function.
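The construction of \(r\) can be sketched in base R. This is a minimal illustration, not taken from the package: the built-in mtcars data, the linear model and the names dat, tab and p are all assumptions chosen for the example.

```r
# Illustrative data only: mtcars stands in for a modelling table.
dat <- mtcars[, c("mpg", "wt", "hp", "cyl")]
dat$cyl <- factor(dat$cyl)          # a factor-valued predictor

# A previously fitted model supplies the prediction p
# (assumed here to be a simple linear model).
fit <- lm(mpg ~ wt, data = dat)
p <- predict(fit, newdata = dat)

# Residual case: r = y - p.
dat$r <- dat$mpg - p
# Target case would instead be: dat$r <- dat$mpg

# Single table of all predictors together with r.
tab <- dat[, c("wt", "hp", "cyl", "r")]
```

A table shaped like tab, with the predictors and a single column r, is the kind of input the text describes.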

The function generates a decision tree for the entire table, as well as a decision tree for each subset of the table. Subsets are defined by factor-valued columns in the data; these columns can either be user-defined or already included as predictors.
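The idea of one tree for the whole table plus one tree per factor-defined subset can be sketched directly with rpart. This is an illustration of the partitioning idea only, not the internals of FindFeatures; the data, the choice of cyl as the partitioning factor and the minsplit value are assumptions.

```r
library(rpart)

# Illustrative table of predictors plus residual r (assumed data and model).
dat <- mtcars[, c("mpg", "wt", "hp", "cyl")]
dat$cyl <- factor(dat$cyl)
dat$r <- dat$mpg - predict(lm(mpg ~ wt, data = dat))
tab <- dat[, c("wt", "hp", "cyl", "r")]

# Regression tree for the entire table.
tree_all <- rpart(r ~ ., data = tab, method = "anova")

# One regression tree per level of the factor column cyl.
# minsplit is lowered because each subset is small.
subsets <- split(tab, tab$cyl)
trees_by_level <- lapply(subsets, function(sub) {
  rpart(r ~ wt + hp, data = sub, method = "anova",
        control = rpart.control(minsplit = 5))
})

names(trees_by_level)  # one entry per level of cyl
```

Splitting on a single factor keeps the sketch short; subsets defined by several factor columns would follow the same pattern with a different grouping in split().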

Each decision tree is an rpart tree, as shown:

# Display the example tree image, but only if the optional png package is installed
if (requireNamespace("png", quietly = TRUE)) {
  img <- png::readPNG(file.path(getwd(), "../inst/extdata/test/ALL_ALL_medres2.png"))
  grid::grid.raster(img)
}

A summary of the nodes that meet user-specified criteria for residual value and leaf volume is also written to txt files (for example treesAll.txt and allfactors.txt). These files list each significant term with its definition, volume and other parameters, as shown:

In the example, partitioning allows significant leaves to be found within each partition, even though the tree fitted to the full dataset does not yield any. This illustrates the benefit of the partitioning technique.
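Selecting leaves by residual value and volume can be sketched against the frame component of an rpart fit, where leaf rows have var equal to "<leaf>", n is the number of observations and yval is the fitted mean. The helper significant_leaves and its thresholds are hypothetical names for illustration and are not part of FeatureFinder.

```r
library(rpart)

# Hypothetical helper: keep leaves whose mean residual is large enough
# in absolute value and which contain enough observations.
significant_leaves <- function(tree, min_abs_resid, min_n) {
  fr <- tree$frame
  leaves <- fr[fr$var == "<leaf>", c("n", "yval")]
  keep <- abs(leaves$yval) >= min_abs_resid & leaves$n >= min_n
  leaves[keep, , drop = FALSE]
}

# Illustrative data and residual (assumptions for the example).
dat <- mtcars[, c("mpg", "wt", "hp")]
dat$r <- dat$mpg - predict(lm(mpg ~ wt, data = dat))

tree <- rpart(r ~ wt + hp, data = dat, method = "anova",
              control = rpart.control(minsplit = 10))

# Leaves with |mean residual| >= 1 and at least 5 rows.
out <- significant_leaves(tree, min_abs_resid = 1, min_n = 5)
```

Applying the same helper to the full-table tree and to each per-partition tree is one way to compare what the two approaches find.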
