The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Recall that to determine confidence interval bounds, we solve the equation \[\begin{align*} F_Y(y_{\rm obs} \vert \theta) - q = 0 \end{align*}\] for \(\theta\). Here, \(Y\) is our chosen statistic (e.g., \(\bar{X}\)), \(F_Y(\cdot)\) is the cumulative distribution function (or cdf) for \(Y\)’s sampling distribution, \(y_{\rm obs}\) is the observed statistic value, and \(q\) is a quantile that we determine given the confidence coefficient \(1-\alpha\) and the type of interval we are constructing (one-sided lower bound, one-sided upper bound, or two-sided).
In this vignette, we examine the situation in which the sampling
distribution cdf is not analytically derivable, but where we
can simulate individual data using code in an existing
R
package (e.g., rnorm()
). In this situation,
we would create a function that, given a simulated dataset, returns the
statistic value, and we would pass that function into
cdfinv.sim()
.
Let’s suppose we sample \(n = 6\) data from a \(\text{Beta}(a,2.8)\) distribution and we wish to compute a 95% lower bound for \(a\), given an observed statistic value.
Given that the beta distribution is a member of the exponential family, it makes sense to determine and apply a sufficient statistic. For a \(\text{Beta}(a,2.8)\) distribution, a sufficient statistic, found via likelihood factorization, is \[\begin{align*} Y = \prod_{i=1}^n X_i \,. \end{align*}\] Using products is generally a sub-optimal choice, given the possiblity of floating-point underflows. Since one-to-one functions of sufficient statistics are also sufficient, we adopt \(Y = -\sum_{i=1}^n \log X_i\) instead. The minus sign is included simply to make the statistic values positive.
In R
, we code this statistic as follows:
Below we show how we can pass the function beta.stat
,
now defined in R
’s global environment, to
cdfinv.sim()
. The first three arguments are standard: the
distribution “name” is beta
(cdfinv()
will
tack a p
on the front), the parameter of interest is
shape1
, and the (assumed) observed statistic value is 10.5.
The next argument specifies that we wish to compute a (95% by default)
lower bound on \(a\) (aka
shape1
), instead of the default two-sided interval. As we
know that \(a > 0\), we specify a
lower parameter bound (lpb
) of 0. The sample size is
nsamp
(here, 6), and stat.func
is the function
we just defined above, beta.stat
. The last argument is an
extra one specifically required for rbeta()
.
require(cdfinv)
cdfinv.sim("beta","shape1",10.5,bound="lower",lpb=0,nsamp=6,stat.func=beta.stat,shape2=2.8)
## $DISTR
## [1] "beta"
##
## $PARAM
## [1] "shape1"
##
## $STAT
## [1] 10.5
##
## $bound
## [1] 0.5018137
##
## $q
## [1] 0.05
The 95% lower bound on \(a\) is
\(\hat{\theta}_L = 0.502\). Note that
because we work with empirical sampling distributions, this value is an
estimate that under default conditions takes \(\sim\) 1-10 CPU seconds to compute on a
typical computer. For greater precision, one should consider increasing
numsim
from its default value of \(10^5\) to \(10^6\) or greater, while being mindful of
the increased time needed to complete the computation.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.