The fitur package includes several tools for visually inspecting how well a distribution fits a dataset. To start, fictional empirical data are generated below. Typically the data would come from a real-world dataset, such as the time it takes to serve a customer at a bank, the length of stay in an emergency department, or customer arrivals to a queue.
library(fitur)
library(ggplot2)
set.seed(438)
x <- rweibull(10000, shape = 5, scale = 1)
Below is a histogram showing the shape of the distribution; the y-axis is scaled to show probability density.
dt <- data.frame(x)
nbins <- 30
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
g <- ggplot(dt, aes(x)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = nbins, fill = NA, color = "black") +
  theme_bw() +
  theme(panel.grid = element_blank())
g
Three distributions have been chosen below to test against the dataset. Using the fit_univariate function, each distribution is fit to the data, producing a fitted object. The first item in each fit is the probability density function. Each fit is then overplotted on the histogram to see which distribution fits best.
dists <- c('gamma', 'lnorm', 'weibull')
multipleFits <- lapply(dists, fit_univariate, x = x)
## $start.arg
## $start.arg$shape
## [1] 18.97398
##
## $start.arg$rate
## [1] 20.68217
##
##
## $fix.arg
## NULL
##
## $start.arg
## $start.arg$meanlog
## [1] -0.1162831
##
## $start.arg$sdlog
## [1] 0.2560369
##
##
## $fix.arg
## NULL
##
## $start.arg
## $start.arg$shape
## [1] 4.686591
##
## $start.arg$scale
## [1] 1.005784
##
##
## $fix.arg
## NULL
plot_density(x, multipleFits, 30) + theme_bw() +
theme(panel.grid = element_blank())
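For readers who want to see what the density overlay amounts to, here is a minimal base-R sketch. It does not use fitur and is not how plot_density is implemented. As an assumption, the generating parameters (shape = 5, scale = 1) stand in for fitted values, and the variable h is introduced only for illustration.

```r
set.seed(438)
x <- rweibull(10000, shape = 5, scale = 1)

# freq = FALSE puts probability density (not counts) on the y-axis,
# so the bar areas sum to 1 and a density curve can be drawn on the same scale.
h <- hist(x, breaks = 30, freq = FALSE, main = "", xlab = "x")

# Overlay the Weibull density.
# Assumption: shape = 5 and scale = 1 stand in for fitted parameters.
curve(dweibull(x, shape = 5, scale = 1), add = TRUE)
```

Because both the histogram and the curve are on the density scale, a well-fitting distribution traces the tops of the bars.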
The next plot is the quantile-quantile (Q-Q) plot. The plot_qq function takes a numeric vector x of empirical data and sorts it. A range of probabilities is computed and then used to compute comparable quantiles using the q distribution function from each fitted object. A good fit closely follows the line y = x. Note: the Q-Q plot tends to be more sensitive around the tails of the distributions.
plot_qq(x, multipleFits) +
theme_bw() +
theme(panel.grid = element_blank())
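The Q-Q computation described above can be sketched in base R. This is an illustration under stated assumptions, not the plot_qq implementation: the probabilities come from ppoints (fitur may choose them differently), and the generating parameters shape = 5 and scale = 1 stand in for fitted values.

```r
set.seed(438)
x <- rweibull(10000, shape = 5, scale = 1)

# The sorted data are the sample quantiles.
x_sorted <- sort(x)

# One probability per observation, strictly inside (0, 1).
# Assumption: ppoints() is used here for convenience.
probs <- ppoints(length(x_sorted))

# Theoretical quantiles from the Weibull q function.
# Assumption: shape = 5 and scale = 1 stand in for fitted parameters.
theoretical <- qweibull(probs, shape = 5, scale = 1)

plot(theoretical, x_sorted,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1)  # reference line y = x
```

Points that drift away from the reference line at either end indicate a poor fit in the tails.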
The percentile-percentile (P-P) plot rescales the input data to the interval (0, 1] and then calculates the theoretical percentiles for comparison. The plot_pp function takes the same inputs as plot_qq, but it rescales x and then computes the percentiles using the p distribution function of each fitted object. A good fit follows the line y = x. Note: the P-P plot tends to be more sensitive in the middle of the distribution.
plot_pp(x, multipleFits) +
theme_bw() +
theme(panel.grid = element_blank())
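A comparable base-R sketch of the P-P computation follows. It is an illustration only, not the plot_pp implementation. As assumptions, the rescaling to (0, 1] is taken to be rank divided by sample size, and the generating parameters shape = 5 and scale = 1 stand in for fitted values.

```r
set.seed(438)
x <- rweibull(10000, shape = 5, scale = 1)

x_sorted <- sort(x)
n <- length(x_sorted)

# Empirical percentiles in (0, 1]: i / n for the i-th smallest observation.
# Assumption: this is one common way to rescale the data.
empirical <- seq_len(n) / n

# Theoretical percentiles from the Weibull p function.
# Assumption: shape = 5 and scale = 1 stand in for fitted parameters.
theoretical <- pweibull(x_sorted, shape = 5, scale = 1)

plot(theoretical, empirical,
     xlab = "Theoretical percentiles", ylab = "Empirical percentiles")
abline(0, 1)  # reference line y = x
```

Both axes are bounded by 0 and 1, which is why departures from the line are easiest to see in the middle of the distribution.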