Calculating MDI and MDI-oob with tree.interpreter


At its core, the R package tree.interpreter implements the interpretation algorithm proposed by Saabas (2014) for popular RF packages such as randomForest and ranger. This vignette illustrates how to use it to calculate MDI (Mean Decrease Impurity) and MDI-oob, a debiased MDI feature importance measure proposed by Li et al. (2019).

If you use this package for data analysis, please consider citing it with citation('tree.interpreter').

Saabas’s Prediction Interpretation Algorithm

Let’s start with the interpretation algorithm of Saabas (2014). The idea is to decompose the prediction for a specific sample by following the decision path associated with it.

Define for a tree \(T\), a feature \(k\), and a sample \(X\), the function \(f_{T, k}(X)\) to be

\[ f_{T, k}(X) = \sum_{t \in I(T): v(t)=k} \left\{ \mu_n (t^{\text{left}}) \mathbb{1} \left(X \in R_{t^{\text{left}}}\right) + \mu_n (t^{\text{right}}) \mathbb{1} \left(X \in R_{t^{\text{right}}}\right) - \mu_n (t) \mathbb{1} \left(X \in R_t\right) \right\}, \]

where \(I(T)\) is the set of inner nodes of the tree \(T\), \(v(t)\) is the feature on which the node \(t\) is split, \(t^{\text{left}}\) and \(t^{\text{right}}\) are the left and right children of \(t\), \(R_t\) is the hyper-rectangle in the feature space “occupied” by the node \(t\), \(\mu_n(t)\) is the average response of the samples falling into \(R_t\), and \(\mathbb{1}\) is the indicator function. This is calculated by the function tree.interpreter::featureContribTree.

Intuitively, it calculates the lagged differences of the mean responses of the nodes on the decision path of an individual sample, grouped by the feature on which each node is split. Because these differences telescope along the path, the sum of the mean response of the root node and \(\sum_{k} f_{T, k}(X)\) is exactly the prediction of \(X\) by \(T\).
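
As a small worked illustration (the tree and its numbers are hypothetical, not taken from the package), consider a tree whose root has mean response 10 and is split on feature 1 into a left child with mean 6 and a right child with mean 14; the left child is in turn split on feature 2 into leaves with means 4 and 8. For a sample \(X\) that goes left at the root and then right, only the nodes on its path have non-zero indicators, so

\[ f_{T, 1}(X) = 6 - 10 = -4, \qquad f_{T, 2}(X) = 8 - 6 = 2, \qquad 10 + f_{T, 1}(X) + f_{T, 2}(X) = 8, \]

which is the mean of the leaf that \(X\) falls into, i.e. the prediction of the tree.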

In order to move from a decision tree to a forest, define for a feature \(k\) and a sample \(X\) the function \(f_{k}(X)\) to be

\[ f_{k}(X) = \frac{1}{n_{\text{tree}}} \sum_{s=1}^{n_{\text{tree}}} f_{T_s, k}(X), \]

where the forest is represented by an ensemble of \(n_{\text{tree}}\) trees \(T_1, \dots, T_{n_{\text{tree}}}\). This is sensible because (at least for regression trees) a forest makes a prediction by averaging the predictions of its trees, so all trees naturally have the same weight. It follows that the prediction of \(X\) by the whole forest is exactly the sum of the average root-node response across the forest and \(\sum_{k} f_{k}(X)\). This is calculated by the function tree.interpreter::featureContrib.
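
The step behind this identity can be spelled out using only the definitions above. Writing \(t_0^{(s)}\) for the root node of \(T_s\) (notation introduced here for illustration only), averaging the per-tree decomposition over the forest gives

\[ \frac{1}{n_{\text{tree}}} \sum_{s=1}^{n_{\text{tree}}} \left\{ \mu_n\left(t_0^{(s)}\right) + \sum_{k} f_{T_s, k}(X) \right\} = \frac{1}{n_{\text{tree}}} \sum_{s=1}^{n_{\text{tree}}} \mu_n\left(t_0^{(s)}\right) + \sum_{k} f_{k}(X), \]

where the left-hand side is the average of the tree predictions and the right-hand side follows by swapping the order of the two sums.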

Later, Saabas (2015) released a Python library named treeinterpreter on PyPI, implementing this interpretation algorithm for random forest models fitted with scikit-learn. This R package effectively serves as its R counterpart.

MDI in \(f_{T, k}(X)\)

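Recently, Li et al. (2019) showed that for a tree \(T\) grown on the in-bag sample \(\mathcal{D}^{(T)}\), a subset of the training set \(\mathcal{D}\) with observations \((x_i, y_i)\), the MDI of the feature \(k\) can be written as: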

\[ \frac{1}{|\mathcal{D}^{(T)}|} \sum_{i \in \mathcal{D}^{(T)}} f_{T, k}(x_i) \cdot y_i. \]

You can calculate the MDI for a tree with tree.interpreter::MDITree.
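
To make the formula concrete, here is a minimal base R sketch on a single stump with made-up data. It does not use tree.interpreter; all variable names and numbers are hypothetical. It compares the expression above with the usual definition of MDI for regression, taken here to be the decrease in population variance at the split.

```r
# Toy in-bag sample for a single stump (hypothetical data)
y <- c(1, 3, 5, 7)
goes_left <- c(TRUE, TRUE, FALSE, FALSE)  # child that each sample falls into

# Node means: mu_n(t), mu_n(t^left), mu_n(t^right)
mu_root  <- mean(y)
mu_left  <- mean(y[goes_left])
mu_right <- mean(y[!goes_left])

# f_{T,k}(x_i) for the single split feature: child mean minus parent mean
f <- ifelse(goes_left, mu_left, mu_right) - mu_root

# MDI via the formula above: average of f(x_i) * y_i over the in-bag sample
mdi_formula <- mean(f * y)

# MDI via the impurity decrease at the split (population variance)
pop_var <- function(v) mean((v - mean(v))^2)
mdi_impurity <- pop_var(y) -
  (mean(goes_left) * pop_var(y[goes_left]) +
     mean(!goes_left) * pop_var(y[!goes_left]))

c(formula = mdi_formula, impurity = mdi_impurity)  # both equal 4
```

In this toy case the two computations agree, which is what the identity predicts for a tree with a single split.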

MDI-oob in \(f_{T, k}(X)\)

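Li et al. (2019) also proposed a debiased MDI feature importance measure, called MDI-oob, which evaluates the same expression on the out-of-bag samples \(\mathcal{D} \setminus \mathcal{D}^{(T)}\) instead: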

\[ \frac{1}{|\mathcal{D} \setminus \mathcal{D}^{(T)}|} \sum_{i \in \mathcal{D} \setminus \mathcal{D}^{(T)}} f_{T, k}(x_i) \cdot y_i. \]

You can calculate the MDI-oob for a tree with tree.interpreter::MDIoobTree.
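
The following base R sketch illustrates the difference between the two measures on a single stump. The data are made up, tree.interpreter is not used, and it assumes (as the definition of \(f_{T, k}\) suggests) that the node means come from the in-bag sample while the average is taken over out-of-bag samples routed through the same stump.

```r
# Stump fitted on a hypothetical in-bag sample
y_inbag    <- c(1, 3, 5, 7)
left_inbag <- c(TRUE, TRUE, FALSE, FALSE)

mu_root  <- mean(y_inbag)
mu_left  <- mean(y_inbag[left_inbag])
mu_right <- mean(y_inbag[!left_inbag])

# Contribution of the split feature for samples sent left or right
contrib <- function(goes_left) ifelse(goes_left, mu_left, mu_right) - mu_root

# Hypothetical out-of-bag samples routed through the same stump
y_oob    <- c(3, 5, 4, 6)
left_oob <- c(TRUE, TRUE, FALSE, FALSE)

mdi     <- mean(contrib(left_inbag) * y_inbag)  # 4
mdi_oob <- mean(contrib(left_oob) * y_oob)      # 1
```

Here the split separates the out-of-bag responses less sharply than the in-bag ones, so the out-of-bag average is smaller than the in-bag one.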

MDI and MDI-oob of the forest

The MDI(-oob) of a forest is simply the average MDI(-oob) of all its trees. As remarked by Li et al. (2019), for classification trees the factor response must first be converted to one-hot vectors.
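
One way such an encoding could be built in base R is sketched below. The toy factor is hypothetical, and this is not necessarily how tree.interpreter performs the conversion internally.

```r
# Hypothetical factor response with three classes
y <- factor(c("a", "c", "b", "a"), levels = c("a", "b", "c"))

# One row per sample, one 0/1 column per class
one_hot <- outer(as.integer(y), seq_len(nlevels(y)), "==") * 1
colnames(one_hot) <- levels(y)
one_hot
```

Each class then has its own response column, so the formulas above yield one value per feature and class.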

You can calculate the MDI and MDI-oob for a forest with tree.interpreter::MDI and tree.interpreter::MDIoob, respectively.

Examples

Below we present two examples to demonstrate how to calculate MDI and MDI-oob with tree.interpreter for regression and classification trees.

library(MASS)
library(ranger)
library(tree.interpreter)


Regression

In the first example, we build a random forest on the Boston housing data set, and calculate the MDI/MDI-oob of each feature.

# Setup
set.seed(42L)
rfobj <- ranger(medv ~ ., Boston, keep.inbag = TRUE, importance = 'impurity')
tidy.RF <- tidyRF(rfobj, Boston[, -14], Boston[, 14])

# MDI
t(Boston.MDI <- MDI(tidy.RF, Boston[, -14], Boston[, 14]))
#>              crim        zn    indus      chas      nox       rm      age
#> Response 5.506154 0.9710535 5.119131 0.7282288 6.037479 20.89702 2.448776
#>               dis      rad      tax  ptratio    black    lstat
#> Response 5.182923 1.274512 3.073693 5.798018 1.914309 23.48295
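# Sanity check: ranger's impurity importance, divided by the in-bag sample
# size of a tree, matches the MDI computed above
all.equal(as.vector(Boston.MDI),
          as.vector(importance(rfobj) /
                      sum(rfobj$inbag.counts[[1]])))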
#> [1] TRUE

# MDI-oob
t(MDIoob(tidy.RF, Boston[, -14], Boston[, 14]))
#>              crim        zn    indus      chas      nox       rm       age
#> Response 3.616714 0.8027427 4.854788 0.1319444 4.693773 17.44773 0.7878314
#>               dis       rad      tax  ptratio     black    lstat
#> Response 1.523611 0.9568308 1.971855 4.924334 0.9704446 21.90341


Classification

In the second example, we build a random forest on Anderson’s iris data set, and calculate the MDI/MDI-oob of each feature.

# Setup
set.seed(42L)
rfobj <- ranger(Species ~ ., iris, keep.inbag = TRUE, importance = 'impurity')
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])

# MDI
(iris.MDI <- rowSums(MDI(tidy.RF, iris[, -5], iris[, 5])))
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>   0.06035806   0.01499690   0.27461409   0.31154473
# Sanity check: same comparison against ranger's impurity importance
all.equal(as.vector(iris.MDI),
          as.vector(importance(rfobj) /
                      sum(rfobj$inbag.counts[[1]])))
#> [1] TRUE

# MDI-oob
rowSums(MDIoob(tidy.RF, iris[, -5], iris[, 5]))
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>  0.040298928  0.003348461  0.258391249  0.304162920


References

Li, Xiao, Yu Wang, Sumanta Basu, Karl Kumbier, and Bin Yu. 2019. “A Debiased MDI Feature Importance Measure for Random Forests.” arXiv:1906.10845 [cs, stat], June. https://arxiv.org/abs/1906.10845.

Saabas, Ando. 2014. “Interpreting Random Forests.” https://blog.datadive.net/interpreting-random-forests/.

Saabas, Ando. 2015. “Random Forest Interpretation with Scikit-Learn.” https://blog.datadive.net/random-forest-interpretation-with-scikit-learn/.
