In this tutorial, we show how to use morf to estimate conditional choice probabilities and marginal effects, and to conduct inference about these statistical targets. For illustration purposes, we use the synthetic data set provided in the orf package:
## Load data from orf package.
set.seed(1986)
library(orf)
data(odata)

y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
The morf function constructs a collection of forests, one for each category of y (three in this case). We can then use the forests to predict out-of-sample conditional probabilities via the predict method. By default, predict returns a matrix of predicted probabilities and a vector of predicted class labels (each observation is assigned to the class with the highest predicted probability).
## Training-test split.
train_idx <- sample(seq_len(length(y)), length(y) / 2)

y_tr <- y[train_idx]
X_tr <- X[train_idx, ]

y_test <- y[-train_idx]
X_test <- X[-train_idx, ]
## Fit morf on training sample. Use default settings.
forests <- morf(y_tr, X_tr)
## Summary of data and tuning parameters.
summary(forests)
#> Call:
#> morf(y_tr, X_tr)
#>
#> Data info:
#> Full sample size: 500
#> N. covariates: 5
#> Classes: 1 2 3
#>
#> Relative variable importance:
#> X1 X2 X3 X4
#> 0.410 0.190 0.104 0.295
#>
#> Tuning parameters:
#> N. trees: 2000
#> mtry: 2
#> min.node.size: 5
#> Subsampling scheme: No replacement
#> Honesty: FALSE
#> Honest fraction: 0
## Out-of-sample predictions.
predictions <- predict(forests, X_test)
head(predictions$probabilities)
#> P(Y=1) P(Y=2) P(Y=3)
#> [1,] 0.85140961 0.1185089 0.03008149
#> [2,] 0.70315449 0.2661564 0.03068916
#> [3,] 0.36277830 0.2800366 0.35718506
#> [4,] 0.01424455 0.1056402 0.88011530
#> [5,] 0.11414762 0.4527127 0.43313970
#> [6,] 0.03066153 0.2633758 0.70596271
table(y_test, predictions$classification)
#>
#> y_test 1 2 3
#> 1 104 41 20
#> 2 61 59 46
#> 3 38 45 86
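The classification rule used by predict can be reproduced from the probability matrix itself. A minimal sketch with a toy matrix (base R's max.col picks the column index of the row-wise maximum):

```r
## Toy probability matrix mimicking predictions$probabilities.
probs <- matrix(c(0.70, 0.20, 0.10,
                  0.10, 0.30, 0.60),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("P(Y=1)", "P(Y=2)", "P(Y=3)")))

## Assign each observation to its highest-probability class.
max.col(probs)
#> [1] 1 3
```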
We can also implement honesty, which is necessary for the predictions to be consistent and asymptotically normal. However, honesty generally comes at the cost of a larger mean squared error, so if inference is not of interest we recommend adaptive forests. In the following, we set honesty = TRUE to construct honest forests.
## Honest forests.
honest_forests <- morf(y_tr, X_tr, honesty = TRUE)
honest_predictions <- predict(honest_forests, X_test)
## Compare predictions with adaptive fit.
cbind(head(predictions$probabilities), head(honest_predictions$probabilities))
#> P(Y=1) P(Y=2) P(Y=3) P(Y=1) P(Y=2) P(Y=3)
#> [1,] 0.85140961 0.1185089 0.03008149 0.70950736 0.2211598 0.06933279
#> [2,] 0.70315449 0.2661564 0.03068916 0.57201597 0.2958081 0.13217592
#> [3,] 0.36277830 0.2800366 0.35718506 0.45993504 0.2434703 0.29659463
#> [4,] 0.01424455 0.1056402 0.88011530 0.09120661 0.2322893 0.67650412
#> [5,] 0.11414762 0.4527127 0.43313970 0.13720794 0.4634872 0.39930481
#> [6,] 0.03066153 0.2633758 0.70596271 0.07797359 0.3204103 0.60161614
To extract the weights induced by each forest and estimate the standard errors, we set inference = TRUE. This also requires honesty = TRUE, although in principle standard errors could be estimated for adaptive forests as well. The point of imposing honesty is that it allows us to use the estimated standard errors to construct valid confidence intervals for the conditional probabilities. We stress again that if we only care about prediction performance, we should set honesty = FALSE (the default). As a final remark, notice that the weight extraction considerably slows down the routine.
## Compute standard errors.
honest_forests <- morf(y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)
#> P(Y=1) P(Y=2) P(Y=3)
#> [1,] 0.06114787 0.09194353 0.10702135
#> [2,] 0.13360556 0.07127975 0.06315303
#> [3,] 0.05675549 0.07449825 0.15692682
#> [4,] 0.15461445 0.13627816 0.08301800
#> [5,] 0.18850488 0.11299775 0.01976499
#> [6,] 0.15261776 0.07227305 0.01801186
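Given the standard errors, pointwise confidence intervals follow from the normal approximation. A minimal sketch with toy values standing in for one row of the honest predictions (nothing here constrains the intervals to [0, 1]):

```r
## Toy predicted probabilities and their estimated standard errors.
probs <- c(0.61, 0.29, 0.10)
ses   <- c(0.06, 0.09, 0.11)

## Normal-approximation 95% confidence intervals.
z <- qnorm(0.975)
cbind(lower = probs - z * ses, estimate = probs, upper = probs + z * ses)
```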
The package implements a nonparametric estimator of marginal effects. This is available through the marginal_effects function, which can estimate mean marginal effects, marginal effects at the mean, or marginal effects at the median, depending on the eval argument. In the following, we construct our forests in the training sample and use them to estimate the marginal effects at the mean in the test sample. We use a larger number of trees here to obtain more stable results.
## Fit morf on training sample. Use large number of trees.
forests <- morf(y_tr, X_tr, n.trees = 4000)

## Marginal effects at the mean.
me_atmean <- marginal_effects(forests, data = X_test, eval = "atmean")
summary(me_atmean)
#> Morf marginal effects results
#>
#> Data info:
#> Number of classes: 3
#> Sample size: 500
#>
#> Tuning parameters:
#> Evaluation: atmean
#> Bandwidth: 0.01
#> Number of trees: 4000
#> Honest forests: FALSE
#> Honesty fraction: 0
#>
#> Marginal Effects:
#> P'(Y=1) P'(Y=2) P'(Y=3)
#> X1 2.171 -0.770 -1.401
#> X2 -0.023 -0.060 0.084
#> X3 -0.002 0.003 -0.001
#> X4 -0.067 0.111 -0.044
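Conceptually, a marginal effect at the mean is a derivative of the conditional probability evaluated at the covariate means. The following is a hedged base-R sketch of that idea using central finite differences; prob_fun is a toy stand-in for the forest's estimate of P(Y = m | x), and morf's actual estimator may differ (for instance in how discrete covariates are handled):

```r
## Toy conditional probability model standing in for the forest estimate.
set.seed(1986)
prob_fun <- function(x) plogis(1 - 0.8 * x[1] + 0.3 * x[2])

X_toy <- matrix(rnorm(200), ncol = 2)
x_bar <- colMeans(X_toy)
h <- 0.01  # step size, matching the bandwidth reported by summary()

## Central finite-difference derivative at the covariate means.
me_atmean_sketch <- sapply(seq_along(x_bar), function(j) {
  x_up <- x_dn <- x_bar
  x_up[j] <- x_up[j] + h
  x_dn[j] <- x_dn[j] - h
  (prob_fun(x_up) - prob_fun(x_dn)) / (2 * h)
})
me_atmean_sketch  # negative effect of the first covariate, positive of the second
```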
As before, we can set inference = TRUE
to estimate the
standard errors. Again, this requires the use of honest forests and
considerably slows down the routine.
## Honest forests.
honest_forests <- morf(y, X, n.trees = 4000, honesty = TRUE) # Notice we do not need inference here!

## Compute standard errors.
honest_me_atmean <- marginal_effects(honest_forests, data = X_test, eval = "atmean", inference = TRUE)
honest_me_atmean$standard.errors
#>       P'(Y=1)    P'(Y=2)    P'(Y=3)
#> X1 0.17927181 0.66833784 0.74615354
#> X2 0.12220391 0.13619835 0.08142314
#> X3 0.12813135 0.09401420 0.10277211
#> X4 0.04416793 0.01535275 0.01553786
honest_me_atmean$p.values # These are not corrected for multiple hypothesis testing!
#>       P'(Y=1)   P'(Y=2)     P'(Y=3)
#> X1 0.43854102 0.9097233 0.773589290
#> X2 0.11232883 0.7699091 0.004074591
#> X3 0.01495845 0.4575155 0.018562943
#> X4 0.58402693 0.7230779 0.227728291
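Since these p-values are per-hypothesis, a simple safeguard when screening many effects at once is to adjust them, for example with Holm's step-down method from base R, which controls the family-wise error rate. A minimal sketch using the (rounded) p-values from the table above:

```r
## P-values from the table above, flattened row by row (rounded).
pvals <- c(0.439, 0.910, 0.774,
           0.112, 0.770, 0.004,
           0.015, 0.458, 0.019,
           0.584, 0.723, 0.228)

## Holm-adjusted p-values; adjusted values are never smaller than the raw ones.
round(p.adjust(pvals, method = "holm"), 3)
```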
## LATEX.
print(honest_me_atmean, latex = TRUE)
#> \begingroup
#> \setlength{\tabcolsep}{8pt}
#> \renewcommand{\arraystretch}{1.1}
#> \begin{table}[H]
#> \centering
#> \begin{adjustbox}{width = 0.75\textwidth}
#> \begin{tabular}{@{\extracolsep{5pt}}l c c c}
#> \\[-1.8ex]\hline
#> \hline \\[-1.8ex]
#> & Class 1 & Class 2 & Class 3 \\
#> \addlinespace[2pt]
#> \hline \\[-1.8ex]
#>
#> \texttt{X1} & -0.139 & -0.076 & 0.215 \\
#> & (0.179) & (0.668) & (0.746) \\
#> \texttt{X2} & -0.194 & -0.04 & 0.234 \\
#> & (0.122) & (0.136) & (0.081) \\
#> \texttt{X3} & -0.312 & 0.07 & 0.242 \\
#> & (0.128) & (0.094) & (0.103) \\
#> \texttt{X4} & 0.024 & -0.005 & -0.019 \\
#> & (0.044) & (0.015) & (0.016) \\
#>
#> \addlinespace[3pt]
#> \\[-1.8ex]\hline
#> \hline \\[-1.8ex]
#> \end{tabular}
#> \end{adjustbox}
#> \caption{Marginal effects.}
#> \label{table:morf.marginal.effects}
#> \end{table}
#> \endgroup