Exploring Random Forests with ggRandomForests
John Ehrlinger
2026-04-28
The ggRandomForests package extracts tidy data objects from either randomForestSRC or randomForest fits and feeds them into familiar ggplot2 workflows. This vignette highlights the most common objects (gg_error, gg_variable, and gg_vimp) along with a small helper, quantile_pts, for building balanced conditioning intervals.
Error trajectories with gg_error()
library(randomForest)
set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)
err_df <- ggRandomForests::gg_error(rf_iris, training = TRUE)
head(err_df)
The gg_error() object stores the cumulative OOB error rate for each outcome column plus the ntree counter. When training = TRUE, the function reconstructs the original model frame and appends the in-bag error trajectory (train). Plotting overlays both curves by default:
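As a minimal sketch of that plotting step, the block below refits the iris forest, builds the gg_error object, and calls its plot method. It assumes that randomForest and ggRandomForests are installed and that the plot method returns a ggplot object that accepts extra layers. The theme is only an illustrative choice.

```r
library(randomForest)
library(ggRandomForests)

set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)

# randomForest itself stores the cumulative OOB error, one row per tree
dim(rf_iris$err.rate)

# Tidy version, with the in-bag (train) trajectory appended
err_df <- gg_error(rf_iris, training = TRUE)

# Assumption: plot() on a gg_error object returns a ggplot object
p <- plot(err_df)
p + ggplot2::theme_bw()
```

Comparing rf_iris$err.rate with head(err_df) is a quick way to check that the tidy object carries the same OOB columns as the fit.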
Classes 'gg_variable', 'regression' and 'data.frame': 506 obs. of 2 variables:
$ lstat: num 4.98 9.14 4.03 2.94 5.33 ...
$ yhat : num 29.2 22.5 35.1 36.4 33.4 ...
Because the original training data are recovered from the model call, gg_variable() works even when the forest was trained within helper functions or against a subset() expression. The output keeps the raw predictors plus either a continuous yhat column (regression) or per-class probabilities (yhat.<class> for classification). Plotting a single variable is straightforward:
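For the classification case, a small sketch on the iris data shows where the per-class probability columns end up. The object name var_cls is hypothetical, and the block assumes gg_variable() accepts a randomForest classification fit as described above.

```r
library(randomForest)
library(ggRandomForests)

set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200)

# var_cls is a hypothetical name used only for this illustration
var_cls <- gg_variable(rf_iris)

# List the prediction columns; the text above describes them as yhat.<class>
grep("^yhat", names(var_cls), value = TRUE)
```

Listing the column names before plotting makes it easier to tell which class each probability column refers to.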
plot(var_df, xvar = "lstat")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Survival forests can request multiple horizons using the time argument; non-OOB predictions are available by setting oob = FALSE.
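A rough sketch of the survival case follows, assuming the veteran data shipped with randomForestSRC and two illustrative horizons of 30 and 90 days. The horizons and object names are hypothetical, and the block has not been checked against every package version.

```r
library(randomForestSRC)
library(ggRandomForests)

data(veteran, package = "randomForestSRC")
set.seed(42)
rf_vet <- rfsrc(Surv(time, status) ~ ., data = veteran, ntree = 100)

# OOB predictions at two horizons (days); the horizons are illustrative
var_oob <- gg_variable(rf_vet, time = c(30, 90))

# Full-forest (non-OOB) predictions at the same horizons
var_full <- gg_variable(rf_vet, time = c(30, 90), oob = FALSE)
```

Comparing var_oob with var_full gives a sense of how optimistic the non-OOB predictions are for a given fit.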
If a randomForest object lacks stored importance scores, gg_vimp() tries to compute them on the fly. When the forest truly cannot provide the information (for example when importance = FALSE and the predictors are no longer accessible), the function emits a warning and returns NA placeholders so plots still render.
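To avoid the on-the-fly path, the forest can be grown with importance stored up front. The sketch below assumes the iris data and a hypothetical object name rf_imp. importance = TRUE is a standard randomForest argument.

```r
library(randomForest)
library(ggRandomForests)

set.seed(42)
# Store importance at fit time so gg_vimp() has nothing to recompute
rf_imp <- randomForest(Species ~ ., data = iris, ntree = 200, importance = TRUE)

# Raw importance matrix as kept by randomForest (one row per predictor)
importance(rf_imp)

# Tidy version for plotting
vimp_df <- gg_vimp(rf_imp)
head(vimp_df)
```

The resulting object can then be handed to plot(), assuming the package's plot method for gg_vimp handles this fit.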
Use ?gg_error, ?gg_variable, ?gg_vimp, and ?quantile_pts for additional arguments and examples.
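The idea behind balanced conditioning intervals can be sketched in base R. The block below is not quantile_pts itself, only an illustration of the concept using quantile() and cut(). The simulated vector x and the choice of four groups are assumptions made for the example.

```r
set.seed(42)
x <- rexp(500)   # skewed, continuous stand-in for a conditioning variable
groups <- 4      # illustrative number of intervals

# Breakpoints at evenly spaced quantiles, not evenly spaced values
breaks <- quantile(x, probs = seq(0, 1, length.out = groups + 1))

# include.lowest = TRUE keeps the minimum inside the first interval
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# Each interval holds the same number of observations
table(bins)
```

Equal-width breaks on the same skewed vector would leave the upper intervals nearly empty, which is why quantile-based breakpoints suit conditioning plots better.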
Pair these data objects with your own ggplot2 themes to align with your preferred publication style.