Bagged OutlierTrees is an explainable, unsupervised outlier detection method built as an ensemble of the existing OutlierTree procedure (Cortes, 2020). It uses bootstrap aggregating (bagging) to improve robustness. Bagging reduces the possible masking effect and the high variance that results from it (similarly to Isolation Forest), hence the name “Bagged OutlierTrees”.
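The following minimal base-R sketch illustrates why bagging can help against masking. It is not the OutlierTree procedure. As an assumption made purely for illustration, each ensemble member applies a plain z-score rule: it flags values more than z_outlier standard deviations from the mean of its own random subsample. The helper names zscore_flags and bagged_zscore_scores are hypothetical. The argument names ntrees, subsampling_rate and z_outlier only loosely mirror those used in the package example below. The sketch draws subsamples without replacement, whereas classic bootstrap aggregating resamples with replacement.

```r
# Illustrative sketch only: bagging a plain z-score rule, NOT the OutlierTree
# procedure used by bagged.outliertrees. Helper names are hypothetical.

# Flag every value of x lying more than z_outlier standard deviations from the
# mean, where mean and sd are estimated only from the rows listed in idx.
zscore_flags <- function(x, idx, z_outlier) {
  mu <- mean(x[idx])
  s <- sd(x[idx])
  abs(x - mu) > z_outlier * s
}

# Each ensemble member fits the rule on a random subsample drawn without
# replacement; the score of a row is the fraction of members that flag it.
bagged_zscore_scores <- function(x, ntrees = 100, subsampling_rate = 0.75,
                                 z_outlier = 5) {
  n <- length(x)
  m <- floor(subsampling_rate * n)
  flags <- vapply(seq_len(ntrees),
                  function(b) zscore_flags(x, sample.int(n, m), z_outlier),
                  logical(n))
  rowMeans(flags)  # n x ntrees logical matrix -> per-row flag rate
}

# Toy data: 100 inliers spread over [-1, 1] plus a cluster of 5 equal outliers.
x <- c(seq(-1, 1, length.out = 100), rep(1000, 5))

# A single fit on all rows is masked: the cluster inflates the sd so much
# that nothing gets flagged.
single_fit <- zscore_flags(x, seq_along(x), z_outlier = 5)
sum(single_fit)  # 0

# Subsamples that happen to contain only a few cluster points expose it.
set.seed(1)
scores <- bagged_zscore_scores(x)
scores[101:105]
```

In this toy setup, the inliers always score 0. The cluster is flagged only by members whose subsample contained few of its five points, so it receives a nonzero score even though the single full-data fit misses it entirely. The point is qualitative and does not reproduce the package's scoring.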
To learn more about the base procedure, OutlierTree (Cortes, 2020), please refer to <arXiv:2001.00636> and to the corresponding OutlierTree GitHub repository. This repository and its documentation are heavily based on OutlierTree's to ensure consistency and ease of use between the two packages.
You can install the development version of bagged.outliertrees from GitHub with:
# install.packages("devtools")
devtools::install_github("RafaJPSantos/bagged.outliertrees")
This basic example shows how to find outliers in the hypothyroid dataset:
library(bagged.outliertrees)
### example dataset with interesting outliers
data(hypothyroid)
### fit a Bagged OutlierTrees model
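model <- bagged.outliertrees(hypothyroid,
  ntrees = 100,            # number of OutlierTrees in the ensemble
  subsampling_rate = 0.75, # subsampling rate of the training data for each tree
  z_outlier = 5,           # z-score threshold for flagging a value as an outlier
  nthreads = 1             # number of parallel threads
)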
### use the fitted model to find outliers in the training dataset
outliers <- predict(model,
  newdata = hypothyroid,
  min_outlier_score = 0.5, # minimum outlier score for a row to be reported
  nthreads = 1
)
### print the top-5 outliers in human-readable format
print(outliers, outliers_print = 5)
#> Reporting top 5 outliers [out of 28 found]
#>
#> row [1438] - suspicious column: [FTI] - suspicious value: [394.495412844037]
#> distribution: 99.93% <= [294.9661] - [mean: 109.855] - [sd: 30.3889] - [norm. obs: 956]
#>
#>
#> row [623] - suspicious column: [age] - suspicious value: [455]
#> distribution: 99.92% <= [92.03] - [mean: 53.3543] - [sd: 18.9409] - [norm. obs: 956]
#>
#>
#> row [745] - suspicious column: [T4U] - suspicious value: [2.12]
#> distribution: 99.89% <= [1.7222] - [mean: 0.9971] - [sd: 0.1542] - [norm. obs: 700]
#> [age] > [37.5859] (value: 87)
#>
#>
#> row [1425] - suspicious column: [FTI] - suspicious value: [161.290322580645]
#> distribution: 98.70% <= [104.4645] - [mean: 62.5452] - [sd: 17.6197] - [norm. obs: 89]
#> [TT4] <= [99.0122] (value: 50)
#>
#>
#> row [2110] - suspicious column: [FTI] - suspicious value: [2.38095238095238]
#> distribution: 99.10% >= [49.6384] - [mean: 93.4009] - [sd: 15.6965] - [norm. obs: 188]
#> [TT4] <= [112.4091] (value: 2)