{missRanger} is a multivariate imputation algorithm based on random forests. It is a fast alternative to the famous ‘MissForest’ algorithm (Stekhoven and Buehlmann, 2012), and uses the {ranger} package (Wright and Ziegler, 2017) to fit the random forests. Since version 2.6.0, out-of-sample application is possible.
# From CRAN
install.packages("missRanger")
# Development version
devtools::install_github("mayer79/missRanger")
library(missRanger)
set.seed(3)
iris_NA <- generateNA(iris, p = 0.1)
head(iris_NA)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 5.1 3.5 1.4 0.2 setosa
# 4.9 3.0 1.4 NA setosa
# 4.7 3.2 1.3 0.2 setosa
# 4.6 3.1 1.5 0.2 <NA>
# NA 3.6 1.4 0.2 setosa
# 5.4 3.9 1.7 0.4 <NA>
iris_filled <- missRanger(iris_NA, pmm.k = 5, num.trees = 100)
head(iris_filled)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.2 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
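The out-of-sample application mentioned in the introduction (available since version 2.6.0) might look like the sketch below. It assumes that the arguments `data_only = FALSE` and `keep_forests = TRUE` make `missRanger()` return a fitted object that keeps its forests, and that `predict()` can then impute new rows. Check `?missRanger` and the vignettes for the exact interface of your installed version.

```r
library(missRanger)

set.seed(3)
iris_NA <- generateNA(iris, p = 0.1)

# Assumption: keep_forests = TRUE (together with data_only = FALSE) returns
# a "missRanger" object that stores the forests of the best iteration.
imp <- missRanger(
  iris_NA,
  pmm.k = 5,
  num.trees = 100,
  data_only = FALSE,
  keep_forests = TRUE,
  verbose = 0
)

# Impute rows that are treated as new data (here simply the first six rows).
new_rows <- head(iris_NA)
filled_new <- predict(imp, new_rows)
filled_new
```

Keeping the forests increases memory use, so this is probably only worth enabling when new data really needs to be imputed later.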
The algorithm iterates until the average out-of-bag (OOB) error of the forests stops improving. The missing values are filled by OOB predictions of the best iteration, optionally followed by predictive mean matching (PMM). The PMM step avoids values not present in the original data (like a value 0.3334 in a 0-1 coded variable). Furthermore, PMM raises the variance in the resulting conditional distributions to a more realistic level, a crucial property for multiple imputation.
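The effect of the PMM step can be checked with a small experiment. The sketch below imputes the same data once without PMM (`pmm.k = 0`) and once with `pmm.k = 5`, then looks at whether the imputed values of `Sepal.Length` also occur among the observed ones. The object names (`imp_raw`, `imp_pmm`, `miss`, `observed`) are illustrative only.

```r
library(missRanger)

set.seed(3)
iris_NA <- generateNA(iris, p = 0.1)

# Without PMM: imputations are the raw random forest predictions.
imp_raw <- missRanger(iris_NA, pmm.k = 0, num.trees = 100, seed = 1, verbose = 0)

# With PMM: each prediction is replaced by an observed value drawn from
# the 5 observations with the closest predictions.
imp_pmm <- missRanger(iris_NA, pmm.k = 5, num.trees = 100, seed = 1, verbose = 0)

miss <- is.na(iris_NA$Sepal.Length)
observed <- iris_NA$Sepal.Length[!miss]

# Share of imputed values that also appear in the observed data
mean(imp_raw$Sepal.Length[miss] %in% observed)
mean(imp_pmm$Sepal.Length[miss] %in% observed)
```

With PMM, every imputed value should be one of the observed values, while raw forest predictions are averages and typically fall in between.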
Check out the vignettes for more info, and for how to use missRanger() in multiple imputation.
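As a rough idea of what multiple imputation with missRanger() can look like: create several imputed data sets, fit the same model on each, and pool the results. The sketch below assumes the {mice} package is installed and uses its `pool()` function to combine the fits with Rubin's rules. The number of imputations (10) and the linear model are arbitrary choices for illustration.

```r
library(missRanger)
library(mice)  # assumed to be installed; provides pool()

set.seed(19)
iris_NA <- generateNA(iris, p = 0.1)

# Generate 10 completed data sets. PMM (pmm.k > 0) keeps the variability
# between imputations at a realistic level.
filled <- replicate(
  10,
  missRanger(iris_NA, pmm.k = 5, num.trees = 100, verbose = 0),
  simplify = FALSE
)

# Fit the same model on each completed data set
models <- lapply(filled, function(d) lm(Sepal.Length ~ ., data = d))

# Pool coefficients and standard errors across imputations
pooled <- pool(models)
summary(pooled)
```

The pooled standard errors reflect both the sampling uncertainty and the uncertainty introduced by the imputation.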