quickOutlier is a comprehensive toolkit for Data Mining
in R. It simplifies the process of detecting, visualizing, and treating
anomalies in your datasets using both statistical and machine learning
approaches.
The most common way to find outliers is to look at one variable at a time.
# Create dummy data with one obvious outlier (500)
df <- data.frame(
  id = 1:10,
  revenue = c(10, 12, 11, 10, 12, 11, 13, 10, 500, 11)
)
# Detect using Interquartile Range (IQR)
outliers <- detect_outliers(df, column = "revenue", method = "iqr")
print(outliers)
#>   id revenue  iqr_bounds
#> 9  9     500 [5 - 17.25]
Visual inspection is crucial. quickOutlier provides an
instant ggplot2 visualization to see where your anomalies
fall compared to the distribution.
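The package's own plotting function is not shown here, so the following is only a rough sketch of the same idea using ggplot2 directly. The bounds printed above, [5 - 17.25], are consistent with fences at 3 times the IQR under R's default quantile definition (Q1 = 10.25, Q3 = 12); that multiplier is inferred from the output, not taken from the package documentation. The column name `is_outlier` and the plot styling are assumptions for illustration.

```r
# Same dummy data as in the univariate example
df <- data.frame(
  id = 1:10,
  revenue = c(10, 12, 11, 10, 12, 11, 13, 10, 500, 11)
)

# IQR fences; k = 3 is an assumption inferred from the printed bounds
k <- 3
q <- quantile(df$revenue, c(0.25, 0.75), names = FALSE)
lower <- q[1] - k * (q[2] - q[1])
upper <- q[2] + k * (q[2] - q[1])

# Flag every value that falls outside the fences
df$is_outlier <- df$revenue < lower | df$revenue > upper

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = id, y = revenue, colour = is_outlier)) +
    geom_point(size = 3) +
    geom_hline(yintercept = c(lower, upper), linetype = "dashed") +
    labs(title = "revenue with IQR fences", colour = "Outlier")
  print(p)
}
```

The dashed lines mark the fences, so any point drawn outside them is one the IQR rule would flag.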
Sometimes you don’t want to delete the data, but rather “cap” extreme values at a reasonable limit. This is called Winsorization.
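As a minimal base R sketch of capping (not the package's own treatment function, which is not shown here), extreme values can be clamped to the IQR fences with `pmin()` and `pmax()`. The helper name `cap_at_iqr_fences` and its `k` argument are hypothetical. Note that classical winsorization clamps at fixed percentiles, such as the 5th and 95th, rather than at IQR fences; the fence variant is used here to match the detection example.

```r
# Hypothetical helper: clamp values to [Q1 - k * IQR, Q3 + k * IQR]
cap_at_iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  # pmax raises values below the lower fence, pmin lowers values above the upper one
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

revenue <- c(10, 12, 11, 10, 12, 11, 13, 10, 500, 11)

# k = 3 is an assumption, chosen so the fences are 5 and 17.25 for this data
capped <- cap_at_iqr_fences(revenue, k = 3)
print(capped)
```

Only the value 500 is changed; it is replaced by the upper fence, and every other observation is kept as it was, so the row count stays the same.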
Some outliers are only visible when looking at two variables together (e.g., a person who is short but weighs a lot).
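A common way to score such points is the Mahalanobis distance, which measures how far a row lies from the centre of the data after accounting for the correlation between variables. The following is a minimal base R sketch of that idea, not quickOutlier's implementation. The seed, the 0.975 chi-squared cutoff, and the variable names are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Correlated data plus one point that is ordinary in x but not in y given x
x <- rnorm(50)
pts <- data.frame(x = x, y = 2 * x + rnorm(50, sd = 0.5))
pts <- rbind(pts, data.frame(x = 0, y = 10))

# stats::mahalanobis() returns the squared distance of each row
m <- as.matrix(pts)
d2 <- mahalanobis(m, center = colMeans(m), cov = cov(m))

# For roughly normal data, squared distances are compared against a
# chi-squared quantile with one degree of freedom per column
cutoff <- qchisq(0.975, df = ncol(m))
flagged <- which(d2 > cutoff)
print(pts[flagged, ])
```

Neither coordinate of the added point is extreme on its own, which is why a column-by-column check can miss it while the joint distance does not.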
# Generate data: y correlates with x
df_multi <- data.frame(x = rnorm(50), y = rnorm(50))
df_multi$y <- df_multi$x * 2 + rnorm(50, sd = 0.5)
# Add an anomaly: normal x, but impossible y given x
anomaly <- data.frame(x = 0, y = 10)
df_multi <- rbind(df_multi, anomaly)
# Detect using Mahalanobis Distance
detect_multivariate(df_multi, columns = c("x", "y"))
#>    x  y mahalanobis_dist
#> 51 0 10            43.42
For complex clusters where statistical methods fail, we use the Local Outlier Factor (LOF). This identifies points that are isolated from their local neighbors.
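To make the LOF idea concrete, here is a small base R sketch of the standard computation: each point's local reachability density is compared with that of its k nearest neighbors, and scores well above 1 indicate isolation. This is an illustration only, not the package's code. The function name `lof_scores`, the choice `k = 5`, and the simulated data are assumptions, and the sketch ignores ties and duplicate points.

```r
# Hypothetical helper: LOF score for every row of X
lof_scores <- function(X, k = 5) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 1, k < n)
  D <- as.matrix(dist(X))
  diag(D) <- Inf  # a point is not its own neighbor

  # Row i holds the indices of the k nearest neighbors of point i
  nn <- matrix(
    vapply(seq_len(n), function(i) order(D[i, ])[seq_len(k)], integer(k)),
    nrow = n, ncol = k, byrow = TRUE
  )
  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- vapply(seq_len(n), function(i) D[i, nn[i, k]], numeric(1))

  # Local reachability density: inverse of the mean reachability distance
  lrd <- vapply(seq_len(n), function(i) {
    reach <- pmax(kdist[nn[i, ]], D[i, nn[i, ]])
    1 / mean(reach)
  }, numeric(1))

  # LOF: average neighbor density relative to the point's own density
  vapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i], numeric(1))
}

set.seed(1)  # arbitrary seed
# Two tight clusters and one point stranded between them (row 81)
X <- rbind(
  cbind(rnorm(40, 0, 0.3), rnorm(40, 0, 0.3)),
  cbind(rnorm(40, 5, 0.3), rnorm(40, 5, 0.3)),
  c(2.5, 2.5)
)
scores <- lof_scores(X, k = 5)
print(which.max(scores))
```

Points inside either cluster score close to 1 because their neighbors are about as dense as they are, while the stranded point scores far higher.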
If you have a large dataset, you can scan all numeric columns at once to get a summary report.
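A scan of this kind can be sketched in base R by applying the IQR rule to every numeric column and collecting the counts. The helper name `scan_numeric_outliers`, its `k` argument, and the shape of the returned report are hypothetical and may differ from what quickOutlier produces.

```r
# Hypothetical helper: count IQR outliers in each numeric column
scan_numeric_outliers <- function(data, k = 1.5) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  counts <- vapply(num_cols, function(col) {
    x <- data[[col]]
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence <- k * (q[2] - q[1])
    sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
  }, integer(1))
  data.frame(
    column = num_cols,
    n_outliers = unname(counts),
    pct_outliers = 100 * unname(counts) / nrow(data),
    row.names = NULL
  )
}

# Small example with one non-numeric column, which the scan skips
df_scan <- data.frame(
  id = 1:10,
  revenue = c(10, 12, 11, 10, 12, 11, 13, 10, 500, 11),
  label = letters[1:10]
)
report <- scan_numeric_outliers(df_scan)
print(report)
```

Identifier-like columns such as `id` are numeric too, so in practice it may be worth excluding them before reading the report.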