Introduction

The estimators package offers functions for parameter estimation in statistics: log-likelihood evaluation, point estimation, asymptotic variance calculation, and estimator comparison. This guide gives an overview of the package’s capabilities.

library(estimators)

Distributions

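There are two popular ways to work with distributions in R. The stats package, which ships with R, provides four functions for each distribution: the density (d), the cumulative distribution function (p), the quantile function (q), and random generation (r).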

shape1 <- 1
shape2 <- 2

dbeta(0.5, shape1, shape2)
#> [1] 1
pbeta(0.5, shape1, shape2)
#> [1] 0.75
qbeta(0.75, shape1, shape2)
#> [1] 0.5
rbeta(2, shape1, shape2)
#> [1] 0.1678726 0.2927314
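The four functions are linked. p is the integral of d, q is the inverse of p, and the empirical distribution of draws from r approaches p as the number of draws grows. The base R sketch below checks these relationships numerically for the Beta(1, 2) distribution. It uses only the stats package and is an illustration, not part of estimators.

```r
shape1 <- 1
shape2 <- 2

# p is the integral of d: P(X <= 0.5) equals the area under the density on [0, 0.5]
area <- integrate(dbeta, lower = 0, upper = 0.5,
                  shape1 = shape1, shape2 = shape2)$value
all.equal(area, pbeta(0.5, shape1, shape2))

# q inverts p: q(p(0.3)) returns 0.3
all.equal(qbeta(pbeta(0.3, shape1, shape2), shape1, shape2), 0.3)

# r draws values whose empirical CDF approaches p as the sample grows
set.seed(1)
draws <- rbeta(1e5, shape1, shape2)
mean(draws <= 0.5)  # close to pbeta(0.5, 1, 2) = 0.75
```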

The S4 equivalent in the distr package represents a distribution as an object and implements the four functions generically: d(D), p(D), q(D), and r(D) each return a function.

library(distr)
D <- Beta(shape1 = shape1, shape2 = shape2)

d(D)(0.5)
#> [1] 1
p(D)(0.5)
#> [1] 0.75
q(D)(0.75)
#> [1] 0.5
r(D)(2)
#> [1] 0.08898852 0.05028710
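The pattern behind the distr interface can be illustrated with a small S4 sketch built on the methods package. A class stores the parameters, and each generic returns a function of x. The class MyBeta and the generics dens and cdf are hypothetical names used only for illustration. They are not part of distr, which defines its own classes and methods.

```r
library(methods)

# Hypothetical class that stores the Beta parameters
setClass("MyBeta", slots = c(shape1 = "numeric", shape2 = "numeric"))

# Hypothetical generics: each returns a function of x, like d(D) and p(D) in distr
setGeneric("dens", function(D) standardGeneric("dens"))
setGeneric("cdf", function(D) standardGeneric("cdf"))

setMethod("dens", "MyBeta", function(D) function(x) dbeta(x, D@shape1, D@shape2))
setMethod("cdf", "MyBeta", function(D) function(q) pbeta(q, D@shape1, D@shape2))

B <- new("MyBeta", shape1 = 1, shape2 = 2)
dens(B)(0.5)  # same as dbeta(0.5, 1, 2)
cdf(B)(0.5)   # same as pbeta(0.5, 1, 2)
```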

New Distributions

The estimators package adds the Dirichlet and the Multivariate Gamma distributions. Since both are multivariate distributions, only the density (d) and random generation (r) functions are implemented.

shape <- 1:3
scale <- 2

set.seed(1)
x1 <- rdirich(100, shape)
x2 <- rmgamma(100, shape, scale)

ddirich(x1[, 1], shape)
#> [1] 7.489563
dmgamma(x2[, 1], shape, scale)
#> [1] 0.1016242
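A Dirichlet sample can be generated by normalizing independent Gamma variables. On the simplex, the density has the closed form Gamma(sum(alpha)) / prod(Gamma(alpha_i)) * prod(x_i^(alpha_i - 1)). The base R sketch below implements both. The names rdirich_sketch and ddirich_sketch are hypothetical, and the code is not the package's implementation. Storing one observation per column is also an assumption, chosen to mirror how x1[, 1] is used above.

```r
# Hypothetical helper: n draws from Dirichlet(shape), one observation per column
rdirich_sketch <- function(n, shape) {
  k <- length(shape)
  # k x n matrix of independent Gamma(shape[i], 1) variables (shape recycled down each column)
  g <- matrix(rgamma(n * k, shape = shape), nrow = k)
  # dividing each column by its sum places it on the simplex
  sweep(g, 2, colSums(g), "/")
}

# Hypothetical helper: Dirichlet density of a single point x in the open simplex
ddirich_sketch <- function(x, shape) {
  exp(lgamma(sum(shape)) - sum(lgamma(shape)) + sum((shape - 1) * log(x)))
}

set.seed(1)
y <- rdirich_sketch(5000, 1:3)
range(colSums(y))  # every column sums to 1
rowMeans(y)        # close to shape / sum(shape) = (1/6, 2/6, 3/6)
ddirich_sketch(c(0.2, 0.3, 0.5), 1:3)  # 60 * 0.3 * 0.5^2 = 4.5
```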

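The same utilities are available in the S4 style of the distr package, through the Dirichlet and MGamma distribution objects. With the same seed, both interfaces give the same densities.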

D1 <- Dirichlet(shape)
D2 <- MGamma(shape, scale)

set.seed(1)
x1 <- r(D1)(100)
x2 <- r(D2)(100)

d(D1)(x1[, 1])
#> [1] 7.489563
d(D2)(x2[, 1])
#> [1] 0.1016242

Parameter Estimation

To illustrate parameter estimation, a random sample of size 100 is drawn from the Beta(1, 2) distribution, and the corresponding distr object is created.

set.seed(1)
x <- rbeta(100, shape1, shape2)
D <- Beta(shape1, shape2)

Likelihood - The ll Functions

The ll functions calculate the log-likelihood of a sample at given parameter values. They come in two versions: a distribution-specific one (llbeta) and an S4 generic one (ll).

llbeta(x, shape1, shape2)
#> [1] 26.56269
ll(x, c(shape1, shape2), D)
#> [1] 26.56269

Note that the S4 methods also accept a character string naming the distribution. The name must match the S4 distribution generator, ignoring case (e.g. “Dirichlet” or “dirichlet”, but not “dirich”).

ll(x, c(shape1, shape2), "beta")
#> [1] 26.56269
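The log-likelihood of an i.i.d. sample is the sum of the log-densities of its observations. The base R sketch below computes it for the Beta distribution in two equivalent ways: with dbeta(..., log = TRUE) and with the explicit density formula. The name loglik_beta is hypothetical. If llbeta uses this standard definition, then loglik_beta applied to the sample x above gives the same value as llbeta.

```r
# Hypothetical helper: Beta log-likelihood of the sample x at (shape1, shape2)
loglik_beta <- function(x, shape1, shape2) {
  sum(dbeta(x, shape1, shape2, log = TRUE))
}

set.seed(1)
a <- 1
b <- 2
x <- rbeta(100, a, b)

ll1 <- loglik_beta(x, a, b)
# The same quantity written out from the density x^(a-1) (1-x)^(b-1) / B(a, b)
ll2 <- sum((a - 1) * log(x) + (b - 1) * log(1 - x) - lbeta(a, b))
all.equal(ll1, ll2)
```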

Point Estimation

Point estimation functions also come in two versions: a distribution-specific one (ebeta) and S4 generic ones (mle, me, and same). Here mle and me stand for the maximum likelihood and moment estimators. For ebeta, the type argument selects the estimator.

ebeta(x, type = "mle")
#>   shape1   shape2 
#> 1.066968 2.466715
ebeta(x, type = "me")
#>   shape1   shape2 
#> 1.074511 2.469756
ebeta(x, type = "same")
#>   shape1   shape2 
#> 1.067768 2.454257

mle(x, D)
#>   shape1   shape2 
#> 1.066968 2.466715
me(x, D)
#>   shape1   shape2 
#> 1.074511 2.469756
same(x, D)
#>   shape1   shape2 
#> 1.067768 2.454257
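The moment and maximum likelihood estimators of the Beta distribution can be sketched in base R. The moment estimator solves the two moment equations in closed form. The maximum likelihood estimator has no closed form, so the sketch maximizes the log-likelihood numerically with optim, starting from the moment estimates. The function names are hypothetical. Using the 1/n sample variance and running BFGS on the log scale are choices of this sketch, not necessarily what ebeta does internally, so the numbers can differ slightly from the output above.

```r
# Hypothetical helper: method of moments for Beta(shape1, shape2)
me_beta_sketch <- function(x) {
  m <- mean(x)
  v <- mean((x - m)^2)        # 1/n sample variance (a choice of this sketch)
  s <- m * (1 - m) / v - 1    # moment estimate of shape1 + shape2
  c(shape1 = m * s, shape2 = (1 - m) * s)
}

# Hypothetical helper: maximum likelihood via numerical optimization
mle_beta_sketch <- function(x) {
  # negative log-likelihood on the log scale, which keeps both shapes positive
  nll <- function(log_par) -sum(dbeta(x, exp(log_par[1]), exp(log_par[2]), log = TRUE))
  fit <- optim(log(me_beta_sketch(x)), nll, method = "BFGS")
  exp(fit$par)
}

set.seed(1)
x <- rbeta(100, 1, 2)
me_beta_sketch(x)
mle_beta_sketch(x)
```

Starting the optimizer from the moment estimates is convenient because they are cheap to compute and usually close to the maximum.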

The general function estim covers all distributions and estimators and is the main function of the package.

estim(x, D, type = "mle")
#>   shape1   shape2 
#> 1.066968 2.466715

As before, the S4 methods also accept the distribution name as a character string, ignoring case.

mle(x, "beta")
#>   shape1   shape2 
#> 1.066968 2.466715
estim(x, "Beta", type = "mle")
#>   shape1   shape2 
#> 1.066968 2.466715
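Accepting a distribution name instead of an object amounts to an exact, case-insensitive lookup of the generator name. The sketch below shows one way to implement such a rule in base R. The helper resolve_distr and the toy generators list are hypothetical and do not reflect the package's internals.

```r
# Toy generators standing in for distribution constructors (hypothetical)
generators <- list(
  Beta = function(...) "a Beta object",
  Dirichlet = function(...) "a Dirichlet object"
)

# Hypothetical helper: exact, case-insensitive lookup by generator name
resolve_distr <- function(name, generators) {
  idx <- match(tolower(name), tolower(names(generators)))
  if (is.na(idx)) stop("unknown distribution: ", name)
  generators[[idx]]
}

resolve_distr("dirichlet", generators)()  # "a Dirichlet object"
resolve_distr("BETA", generators)()       # "a Beta object"
# Partial names are rejected because match() requires an exact match
tryCatch(resolve_distr("dirich", generators), error = conditionMessage)
```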

Asymptotic Variance

The package also covers the asymptotic variance (or variance-covariance matrix) of the estimators. As with point estimation, the implementation is twofold: distribution-specific (vbeta) and S4 generic (avar_mle, avar_me, and avar_same). For vbeta, the type argument selects the estimator.

vbeta(shape1, shape2, type = "mle")
#>          shape1   shape2
#> shape1 1.597168 2.523104
#> shape2 2.523104 7.985838
vbeta(shape1, shape2, type = "me")
#>        shape1 shape2
#> shape1    2.1    3.3
#> shape2    3.3    9.3
vbeta(shape1, shape2, type = "same")
#>          shape1   shape2
#> shape1 1.644934 2.539868
#> shape2 2.539868 8.079736

avar_mle(D)
#>          shape1   shape2
#> shape1 1.597168 2.523104
#> shape2 2.523104 7.985838
avar_me(D)
#>        shape1 shape2
#> shape1    2.1    3.3
#> shape2    3.3    9.3
avar_same(D)
#>          shape1   shape2
#> shape1 1.644934 2.539868
#> shape2 2.539868 8.079736
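The moment-estimator matrix above can be reproduced with the delta method. The moment estimator is a smooth function g of the first two sample moments. Its asymptotic variance is therefore J Sigma J^T, where Sigma is the covariance matrix of (X, X^2) and J is the Jacobian of g. The base R sketch below evaluates this for the Beta distribution. The names beta_moment and avar_me_beta are hypothetical. For shape1 = 1 and shape2 = 2 the sketch gives the entries 2.1, 3.3, and 9.3 shown above for vbeta(..., type = "me") and avar_me(D).

```r
# Raw moments of Beta(a, b): E[X^k] = product over j = 0..k-1 of (a + j) / (a + b + j)
beta_moment <- function(k, a, b) prod((a + 0:(k - 1)) / (a + b + 0:(k - 1)))

# Hypothetical helper: delta-method asymptotic variance of the Beta moment estimator
avar_me_beta <- function(a, b) {
  m <- sapply(1:4, beta_moment, a = a, b = b)
  # Sigma: covariance matrix of (X, X^2)
  Sigma <- matrix(c(m[2] - m[1]^2,      m[3] - m[1] * m[2],
                    m[3] - m[1] * m[2], m[4] - m[2]^2), nrow = 2)
  # g(m1, m2) = (m1 * s, (1 - m1) * s) with s = m1 (1 - m1) / (m2 - m1^2) - 1
  m1 <- m[1]
  v <- m[2] - m1^2
  s <- m1 * (1 - m1) / v - 1
  ds_dm1 <- ((1 - 2 * m1) * v + 2 * m1^2 * (1 - m1)) / v^2
  ds_dm2 <- -m1 * (1 - m1) / v^2
  # Jacobian of g: rows (shape1, shape2), columns (m1, m2); filled column by column
  J <- matrix(c(s + m1 * ds_dm1, -s + (1 - m1) * ds_dm1,
                m1 * ds_dm2,     (1 - m1) * ds_dm2), nrow = 2)
  V <- J %*% Sigma %*% t(J)
  dimnames(V) <- list(c("shape1", "shape2"), c("shape1", "shape2"))
  V
}

avar_me_beta(1, 2)
```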

The general function avar covers all distributions and estimators.

avar(D, type = "mle")
#>          shape1   shape2
#> shape1 1.597168 2.523104
#> shape2 2.523104 7.985838
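For the maximum likelihood estimator, the asymptotic variance is the inverse of the Fisher information matrix. For the Beta distribution, this matrix has a closed form in terms of the trigamma function:

- I11 = trigamma(shape1) - trigamma(shape1 + shape2)
- I22 = trigamma(shape2) - trigamma(shape1 + shape2)
- I12 = -trigamma(shape1 + shape2)

The sketch below inverts this matrix in base R; avar_mle_beta is a hypothetical name. For shape1 = 1 and shape2 = 2, it agrees with the output of avar(D, type = "mle") above to the printed precision.

```r
# Hypothetical helper: inverse Fisher information of Beta(shape1, shape2)
avar_mle_beta <- function(shape1, shape2) {
  t12 <- trigamma(shape1 + shape2)
  fisher <- matrix(c(trigamma(shape1) - t12, -t12,
                     -t12, trigamma(shape2) - t12),
                   nrow = 2,
                   dimnames = list(c("shape1", "shape2"), c("shape1", "shape2")))
  solve(fisher)
}

avar_mle_beta(1, 2)
```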

Estimator Metrics

The estimators can be compared on both finite-sample and asymptotic properties. The package includes the functions small_metrics and large_metrics, where “small” and “large” refer to the small-sample and large-sample settings. small_metrics estimates the bias, variance, and root mean squared error (RMSE) of each estimator by Monte Carlo simulation. large_metrics calculates the asymptotic variance-covariance matrix. The resulting data frames can be plotted with plot_small_metrics and plot_large_metrics, respectively.

Both functions take a distribution object and a parameter list, prm, that specifies which parameter changes and how. The metric of interest is then evaluated as a function of this parameter. Specifically, prm has three elements named “name”, “pos”, and “val”. The first two identify the parameter that changes; pos is NULL for a scalar parameter, as in the Beta example below. The third is a numeric vector holding the values the parameter takes. For example, for the Multivariate Gamma distribution, take D <- MGamma(shape = c(1, 2), scale = 3) and prm <- list(name = "shape", pos = 2, val = seq(1, 1.5, by = 0.1)). The evaluation is then performed for the MGamma distributions with shape parameters (1, 1), (1, 1.1), …, (1, 1.5) and scale 3. The second shape value in D (2) is not used, because it is replaced by the values in val.
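The rule that prm describes can be mimicked with a small base R helper that returns one parameter set per value in val. The helper expand_prm and the representation of parameters as a named list are hypothetical. The sketch only illustrates how name, pos, and val select and replace a parameter, including pos = NULL for a scalar parameter.

```r
# Hypothetical helper: one parameter list per value of prm$val
expand_prm <- function(params, prm) {
  lapply(prm$val, function(v) {
    if (is.null(prm$pos)) {
      params[[prm$name]] <- v            # scalar parameter, e.g. shape1 of Beta
    } else {
      params[[prm$name]][prm$pos] <- v   # one element of a vector parameter
    }
    params
  })
}

# The MGamma example: the second shape value varies from 1 to 1.5, scale stays 3
grid <- expand_prm(list(shape = c(1, 2), scale = 3),
                   list(name = "shape", pos = 2, val = seq(1, 1.5, by = 0.1)))
t(sapply(grid, `[[`, "shape"))  # rows (1, 1), (1, 1.1), ..., (1, 1.5)
```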

The following example computes the small-sample metrics of the Dirichlet estimators for D1, letting the first shape parameter range from 1 to 5, with samples of 20 and 50 observations.

prm <- list(name = "shape",
            pos = 1,
            val = seq(1, 5, by = 0.5))

x <- small_metrics(D1, prm,
                   obs = c(20, 50),
                   est = c("mle", "same", "me"),
                   sam = 5e3,
                   seed = 1)

head(x)
#>   Parameter Observations Estimator Metric     Value
#> 1       1.0           20       mle   Bias 0.1027770
#> 2       1.5           20       mle   Bias 0.1487214
#> 3       2.0           20       mle   Bias 0.2083906
#> 4       2.5           20       mle   Bias 0.2665394
#> 5       3.0           20       mle   Bias 0.3266485
#> 6       3.5           20       mle   Bias 0.3738251

plot_small_metrics(x)
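The Monte Carlo idea behind small_metrics can be sketched in base R: simulate many samples, estimate the parameters on each, and summarize the estimates against the true value. The sketch below does this for the Beta moment estimator. mc_metrics and me_beta are hypothetical helpers. The variance uses a 1/sam divisor so that RMSE^2 = Bias^2 + Variance holds exactly; this is a choice made here, not necessarily the package's convention.

```r
# Hypothetical helper: Monte Carlo bias, variance, and RMSE of an estimator
mc_metrics <- function(estimator, rdist, true, obs, sam) {
  est <- t(replicate(sam, estimator(rdist(obs))))   # sam x p matrix of estimates
  bias <- colMeans(est) - true
  variance <- colMeans(sweep(est, 2, colMeans(est))^2)
  rmse <- sqrt(colMeans(sweep(est, 2, true)^2))
  data.frame(Bias = bias, Variance = variance, RMSE = rmse)
}

# Hypothetical helper: moment estimator of Beta(shape1, shape2)
me_beta <- function(x) {
  m <- mean(x)
  v <- mean((x - m)^2)
  s <- m * (1 - m) / v - 1
  c(shape1 = m * s, shape2 = (1 - m) * s)
}

set.seed(1)
mc_metrics(me_beta, function(n) rbeta(n, 1, 2), true = c(1, 2), obs = 20, sam = 2000)
```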

The following example computes the large-sample metrics of the Beta estimators, letting shape1 range from 1 to 5 while shape2 stays at 2.

prm <- list(name = "shape1",
            pos = NULL,
            val = seq(1, 5, by = 0.1))

x <- large_metrics(D, prm,
                   est = c("mle", "same", "me"))

head(x)
#>      Row    Col Parameter Estimator    Value
#> 1 shape1 shape1       1.0       mle 1.597168
#> 2 shape2 shape1       1.0       mle 2.523104
#> 3 shape1 shape2       1.0       mle 2.523104
#> 4 shape2 shape2       1.0       mle 7.985838
#> 5 shape1 shape1       1.1       mle 1.969699
#> 6 shape2 shape1       1.1       mle 2.826906

plot_large_metrics(x)
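A large-sample table like the one returned by large_metrics can be built in two steps: evaluate an asymptotic variance formula over the grid of parameter values, then stack the matrix entries in long format. The sketch below does this for the Beta maximum likelihood estimator, using the closed-form inverse Fisher information. The helper avar_mle_beta and a column layout that mirrors the output above are assumptions of the sketch. At shape1 = 1 and 1.1, its values match the rows printed above.

```r
# Hypothetical helper: inverse Fisher information of Beta(shape1, shape2)
avar_mle_beta <- function(shape1, shape2) {
  t12 <- trigamma(shape1 + shape2)
  solve(matrix(c(trigamma(shape1) - t12, -t12, -t12, trigamma(shape2) - t12), 2))
}

shape1_values <- seq(1, 5, by = 0.1)   # values taken by shape1; shape2 stays at 2
rows <- lapply(shape1_values, function(a) {
  V <- avar_mle_beta(a, 2)
  data.frame(Row = rep(c("shape1", "shape2"), times = 2),
             Col = rep(c("shape1", "shape2"), each = 2),
             Parameter = a,
             Estimator = "mle",
             Value = as.vector(V))   # column-major order matches Row/Col above
})
metrics <- do.call(rbind, rows)
head(metrics)
```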