
Encoding:

UTF-8

Type:

Package

Title:

Density Goodness-of-Fit Test

Version:

0.6.0

Author:

Dimitrios Bagkavos [aut, cre]

Maintainer:

Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

Description:

Provides functions for the implementation of a density goodness-of-fit test, based on piecewise approximation of the L2 distance.

Imports:

fGarch, nor1mix, boot, mvtnorm

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

NeedsCompilation:

Packaged:

2023-01-27 19:00:45 UTC; Dimitris

Repository:

CRAN

Date/Publication:

2023-01-27 19:30:02 UTC

Kernel functions

Description

Implements various kernel functions, including boundary, integrated and discrete kernels, for use in the definition of the nonparametric estimates.

Usage

Biweight(x, ...)
Epanechnikov(x, ...)
Triangular(x, ...)
Gaussian(x, ...)
Rectangular(x, ...)
Epanechnikov2(x)

Arguments

x

A vector of data points where the kernel will be evaluated.

...

Further arguments.

Details

Implements the Biweight, Triangular, Gaussian, Rectangular and Epanechnikov (including the alternative version in Epanechnikov2) kernels.
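As a rough illustration of what such a kernel function computes, the following base R sketch evaluates the textbook Epanechnikov kernel 0.75(1 - u^2) on [-1, 1]. The name epan_sketch is hypothetical, and the scaling used by the package's own Epanechnikov and Epanechnikov2 functions may differ.

```r
# Hypothetical stand-in, not the package's Epanechnikov():
# textbook Epanechnikov kernel supported on [-1, 1].
epan_sketch <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Evaluate at a few points: outside the support, at the mode, and inside.
vals <- epan_sketch(c(-2, 0, 0.5))   # values 0, 0.75, 0.5625

# A kernel should integrate to one over its support.
total_mass <- integrate(epan_sketch, -1, 1)$value
```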

Value

The value of the kernel at x

References

Wand and Jones (1996), Kernel Smoothing, Chapman and Hall, London.

Select null distribution

Description

Implements the selection of null distribution; to be used within the implementation of the test statistic S.n

Usage

NDistDens(x, dist, p1, p2)

Arguments

x

A vector of data points - the available sample.

dist

The null distribution.

p1

Argument 1 (vector or object) for the null distribution.

p2

Argument 2 (vector or object) for the null distribution.

Details

Implements the null distribution evaluation at designated points, given the parameters p1 and p2.
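The idea of dispatching on a distribution label and two parameters can be sketched in base R as below. This is only an assumption about the general pattern: null_density_sketch is a hypothetical name, the "uniform" label is invented for illustration, and the set of labels actually accepted by NDistDens is not reproduced here.

```r
# Hypothetical sketch of evaluating a null density chosen by a label.
# "normal" appears in this manual's examples; "uniform" is an invented label.
null_density_sketch <- function(x, dist, p1, p2) {
  switch(dist,
    normal  = dnorm(x, mean = p1, sd = p2),   # p1 = mean, p2 = sd
    uniform = dunif(x, min = p1, max = p2),   # p1 = lower, p2 = upper
    stop("unsupported distribution in this sketch: ", dist)
  )
}

# Density of N(0, 1) at three points.
dens <- null_density_sketch(c(-1, 0, 1), "normal", 0, 1)
```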

Value

A vector containing the density values of the designated distribution

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Density goodness-of-fit test statistic based on discretized L2 distance

Description

Implements the density goodness of fit test statistic \hat{S}_n(h) of Bagkavos, Patil and Wood (2021), based on aggregation of local discrepancies between the fitted parametric density and a nonparametric empirical density estimator.

Usage

S.n(xin, h,  dist, p1, p2)

Arguments

xin

A vector of data points - the available sample.

h

The bandwidth to use, typically the output of hopt.edgeworth.

dist

The null distribution.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

Details

Implements the test statistic used for testing the hypothesis

H_0: f(x) = f_0(x, p1, p2) \;\; vs \;\; H_a: f(x) \neq f_0(x, p1, p2).

This density goodness-of-fit test is based on a discretized approximation of the L2 distance. Assuming that n is the number of observations and g = (max(xin)-min(xin))/n^{-drate} is the number of bins in which the range of the data is split, the test statistic is:

S_n(h) = n \Delta^2 h^{-1/2} {\sum\sum}_{i \neq j} K \{ (X_i-X_j)h^{-1}\} \{Y_i -f_0(X_i) \}\{Y_j -f_0(X_j) \}

where K is the Epanechnikov kernel implemented in this package with the Epanechnikov function. The null model f_0 is specified through the dist argument with parameters passed through the p1 and p2 arguments. The test is implemented either with bandwidth hopt.edgeworth or with bandwidth hopt.be which provide the value of h needed for calculation of S_n(h) and the critical value used to determine acceptance or rejection of the null hypothesis. See the example below for an application to a real world dataset.
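The double sum in the statistic can be sketched in base R as follows. This is not the package's S.n: it assumes, purely for illustration, that the X_i are histogram bin midpoints, the Y_i are the corresponding histogram density heights, Delta is the common bin width, and the number of bins is a fixed hypothetical value. The names epan and sn_sketch are likewise hypothetical.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# ASSUMPTIONS: X_i = bin midpoints, Y_i = histogram density heights,
# Delta = bin width, bins = hypothetical fixed number of bins.
sn_sketch <- function(xin, h, f0, bins = 20) {
  n     <- length(xin)
  br    <- seq(min(xin), max(xin), length.out = bins + 1)
  hh    <- hist(xin, breaks = br, plot = FALSE)
  X     <- hh$mids
  Y     <- hh$density
  Delta <- diff(br)[1]
  r     <- Y - f0(X)                        # local discrepancies Y_i - f_0(X_i)
  Kmat  <- epan(outer(X, X, "-") / h)       # K{(X_i - X_j) / h}
  diag(Kmat) <- 0                           # drop the i == j terms
  n * Delta^2 * h^(-1/2) * sum(Kmat * outer(r, r))
}

set.seed(1)
x   <- rnorm(200)
val <- sn_sketch(x, h = 0.5, f0 = dnorm)
```

Because the binning rule here is invented, the resulting number should not be expected to match the output of S.n.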

Value

A vector with the value of the test statistic as well as the Delta value used for its calculation

Author(s)

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Examples

library(fGarch)
library(boot)
## Not run:
data(EuStockMarkets)
DAX <- as.ts(EuStockMarkets[,"DAX"])
dax <- diff(log(DAX))

# Fit a GARCH(1,1) model to dax returns:
lll<-garchFit(~ garch(1,1), data = as.ts(dax), trace = FALSE, cond.dist ="std")
# define the model innovations, to be used as input to the test statistic
xin<-lll@residuals /lll@sigma.t
# exclude smallest value - only for uniform presentation of results
#(this step can be excluded):
xin = xin[xin!= min(xin)]

#inputs for the test statistic:
#kernel function to use in implementing the statistic
#and functional estimates for optimal h:
kfun<-"epanechnikov"
a.sig<-0.05 #define the significance level
#null hypothesis is that the innovations are normally distributed:
Nulldist<-"normal"

p1<-mean(xin)
p2<- sd(xin)
#Power optimal bandwidth:
h<-hopt.edgeworth(xin,   Nulldist, kfun, p1, p2, a.sig )
h.be <- hopt.be(xin)
# Edgeworth cutoff point:
cutoff<-cutoff.edgeworth(xin,   Nulldist, kfun, p1, p2, a.sig )
# Bootstrap cutoff point:
cutoff.boot<-cutoff.bootstrap(xin, 100,  "permutation", Nulldist, h.be, kfun, p1, p2, a.sig)
# Asympt. Norm. cutoff point:
cutoff.asympt<-cutoff.asymptotic( Nulldist,   p1, p2, a.sig )

TestStatistic<-S.n(xin, h, Nulldist, p1, p2)
TestStatistic.be<-S.n(xin, h.be, Nulldist, p1, p2)

cat("L2 test statistic value with power opt. band:", TestStatistic[1],
"\nL2 test statistic value Berry-Esseen bandwidth:", TestStatistic.be[1],
"\ncritical value asymptotic:", round(cutoff.asympt,3), "critical value bootstrap:",
round(cutoff.boot,3),  "critical value Edgeworth:", round(cutoff,3), "\n")
#L2 test statistic value Edgeworth: 7.257444
#L2 test statistic value Berry-Esseen bandwidth: 10.97069
# critical value Asymptotically Norm.:  1.801847
# critical value Edgeworth: 2.140446
# critical value bootstrap: 6.040048
# L2 test statistic >  critical value on all occasions, hence normality is rejected
## End(Not run)

Goodness-of-Fit test statistic based on discretized L2 distance

Description

Implements the bootstrapped version of the density goodness-of-fit test \hat{S}_n(h) defined in (6) of Bagkavos, Patil and Wood (2021).

Usage

S.n.Boot(xin1, indices, h,  dist, kfun, p1, p2)

Arguments

xin1

A vector of data points to perform bootstrap on.

indices

Indices to use for the bootstrap process.

h

The bandwidth to use, typically the output of hopt.be.

dist

The null distribution.

kfun

The kernel to use in the density estimates used in the bandwidth expression.

p1

Argument 1 (vector or object) for the null distribution.

p2

Argument 2 (vector or object) for the null distribution.

Details

Implements the bootstrap version of the test statistic S.n for use in the cutoff.bootstrap function. This function is typically not to be called directly by the user; it is rather meant to be called indirectly through the cutoff.bootstrap function.
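The indices argument follows the convention of the boot package, whose boot() function calls the supplied statistic as statistic(data, indices) for nonparametric resampling. A minimal sketch of that calling pattern is given below; stat_sketch is a hypothetical stand-in statistic (the sample variance), not S.n.Boot.

```r
library(boot)

set.seed(3)
x <- rnorm(50)

# Stand-in statistic with the (data, indices) signature that boot() expects.
stat_sketch <- function(data, indices) {
  s <- data[indices]   # the resampled observations for this replicate
  var(s)
}

# R bootstrap replicates; b$t holds one statistic value per replicate.
b <- boot(x, stat_sketch, R = 100, sim = "ordinary")
upper <- unname(quantile(b$t, 0.95))
```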

Value

A vector of values of the test statistic.

Author(s)

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Density goodness-of-fit test statistic based on discretized L2 distance

Description

Implements the multivariate (d >=2) density goodness of fit test statistic \hat{S}_n(h) of Bagkavos, Patil and Wood (2021), based on aggregation of local discrepancies between the fitted parametric density and a nonparametric empirical density estimator.

Usage

S.nd(xin, h,  dist, p1, p2)

Arguments

xin

A matrix (n x d) of data points - the available sample with n rows and d columns, each column corresponds to a different coordinate axis.

h

The bandwidth vector to use, typically the output of hopt.be in each coordinate direction.

dist

The null distribution.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

Details

Implements the test statistic used for testing the hypothesis

H_0: f(x) = f_0(x, p1, p2) \;\; vs \;\; H_a: f(x) \neq f_0(x, p1, p2).

S_n(h) = n \Delta^2 {\sum\sum}_{i \neq j} K \{ (X_{i1}-X_{j1})h_1^{-1}, \dots, (X_{id}-X_{jd})h_d^{-1} \} \{Y_i -f_0(X_i) \}\{Y_j -f_0(X_j) \}
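The multivariate kernel weights in the formula above can be sketched as follows, under the assumption (not stated in this manual) that K is a product of univariate Epanechnikov kernels, one per coordinate. The names epan and prod_kernel_weights are hypothetical.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# ASSUMPTION: product kernel across the d coordinates.
# X is an n x d matrix, h a bandwidth vector of length d.
prod_kernel_weights <- function(X, h) {
  W <- matrix(1, nrow(X), nrow(X))
  for (k in seq_len(ncol(X))) {
    # Multiply in K{(X_ik - X_jk) / h_k} for coordinate k.
    W <- W * epan(outer(X[, k], X[, k], "-") / h[k])
  }
  diag(W) <- 0   # the double sum excludes i == j
  W
}

set.seed(2)
X <- matrix(rnorm(40), ncol = 2)
W <- prod_kernel_weights(X, c(0.8, 0.8))
```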

Value

A vector with the value of the test statistic as well as the Delta value used for its calculation

Author(s)

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Examples

library(mvtnorm)
sigma <- matrix(c(4,2,2,3), ncol=2)

x <- rmvnorm(n=100, mean=c(1,2), sigma=sigma)
h.be1 <- hopt.be(x[,1])
h.be2 <- hopt.be(x[,2])
h<-c(h.be1, h.be2)
Nulldist<-"normal"

S.nd(x, h,  Nulldist, c(1,2), sigma)

Asymptotically normal critical value for the goodness-of-fit test statistic `\hat{S}_n(h)` of Bagkavos, Patil and Wood (2021)

Description

Implements an asymptotically normal critical value for testing the goodness-of-fit of a parametrically estimated density with the test statistic S.n.

Usage

cutoff.asymptotic(dist,  p1, p2, sig.lev)

Arguments

dist

The null distribution.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

sig.lev

Significance level of the hypothesis test.

Details

Implements the asymptotic critical value defined in Remark 1, Bagkavos, Patil and Wood (2021), equal to z_\alpha \sigma_{0, \theta_0} where z_\alpha is the 1-\alpha quantile of the normal distribution and

\sigma_{0, \theta_0}^2 = 2 \left (\int K^2(u)\,du \right ) \left (\int f^2_0(x; \theta_0)\,dx \right ).
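The two integrals in this expression can be evaluated numerically. The sketch below does so for a standard normal null density and the textbook Epanechnikov kernel on [-1, 1]; both choices, and the name epan, are assumptions made for illustration, so the result need not agree with the value returned by cutoff.asymptotic.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

alpha <- 0.05

# int K^2(u) du over the kernel support.
int_K2 <- integrate(function(u) epan(u)^2, -1, 1)$value

# int f_0^2(x) dx for the N(0, 1) null density.
int_f2 <- integrate(function(x) dnorm(x)^2, -Inf, Inf, rel.tol = 1e-8)$value

# sigma_{0, theta_0} and the critical value z_alpha * sigma_{0, theta_0}.
sigma0 <- sqrt(2 * int_K2 * int_f2)
crit   <- qnorm(1 - alpha) * sigma0
```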

Value

A scalar, the estimate of the asymptotic critical value at the given significance level.

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Bootstrap critical value for the goodness-of-fit test statistic `\hat{S}_n(h)` of Bagkavos, Patil and Wood (2021)

Description

Implements a bootstrap critical value for testing the goodness-of-fit of a parametrically estimated density with the test statistic S.n.

Usage

cutoff.bootstrap(xin, M,  sim, dist, h.use, kfun, p1, p2, sig.lev)

Arguments

xin

A vector of data points - the available sample.

M

Number of bootstrap replications.

sim

A character string indicating the type of simulation required: "ordinary" (the default), "parametric", "balanced", "permutation", or "antithetic".

dist

The null distribution.

h.use

The test statistic bandwidth, best implemented with hopt.be.

kfun

The kernel to use in the density estimates used in the bandwidth expression.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

sig.lev

Significance level of the hypothesis test.

Details

Implements the bootstrap based finite sample critical value defined in Section 2.6, Bagkavos, Patil and Wood (2021), and calculated as follows:

1. Resample the observations \mathcal{X}=\{X_1, \dots, X_n\} to obtain M bootstrap samples, denoted by \mathcal{X}_m^\ast=\{ X_{1m}^\ast, \dots, X_{nm}^\ast\}, where for each m=1,\ldots , M, \mathcal{X}_m^\ast is sampled randomly, with replacement, from \mathcal{X}. Write \hat{\theta}=\theta(\mathcal{X}) for the estimator of \theta based on the original sample \mathcal{X} and, for each m, define the bootstrap estimator of \theta by \hat{\theta}_m^\ast = \theta(\mathcal{X}_m^\ast), where \theta(\cdot) is the relevant functional for the parameter \theta.

2. For m=1, \ldots , M, use \mathcal{X}_m^\ast =\{X_{1m}^\ast, \dots, X_{nm}^\ast\} and \hat \theta_m^\ast from the previous step to calculate n \Delta^{2d} h^{-d/2} \hat S_{n,m}^\ast(h\rho),m=1, \dots, M.

3. Calculate \ell_\alpha^\ast as the 1-\alpha empirical quantile of the values n \Delta^{2d} h^{-d/2} \hat S_{n,m}^\ast(h\rho), m=1, \dots, M. Then \ell_\alpha^\ast approximately satisfies P^\ast [ n \Delta^{2d} h^{-d/2}\hat S_{n,m}^\ast(h\rho)> \ell_\alpha^\ast ]=\alpha, where P^\ast indicates the bootstrap probability measure conditional on \mathcal{X}.
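The three steps can be sketched in base R as below. The statistic used here is a hypothetical stand-in (a Kolmogorov-type distance to a normal density whose mean and standard deviation are re-estimated on each bootstrap sample), chosen only to show the resample, re-estimate, and quantile pattern; it is not the package's test statistic.

```r
set.seed(42)
x     <- rnorm(80)   # original sample
M     <- 200         # number of bootstrap replications
alpha <- 0.05

# Stand-in statistic (NOT S.n): distance between the empirical CDF and a
# normal CDF with parameters re-estimated from the sample s (step 1's theta*).
stat_sketch <- function(s) {
  max(abs(ecdf(s)(s) - pnorm(s, mean(s), sd(s))))
}

# Steps 1-2: resample with replacement and recompute the statistic M times.
boot_vals <- replicate(M, stat_sketch(sample(x, replace = TRUE)))

# Step 3: the critical value is the 1 - alpha empirical quantile.
crit_boot <- unname(quantile(boot_vals, 1 - alpha))

# Share of bootstrap values above the critical value.
exceed <- mean(boot_vals > crit_boot)
```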

Value

A scalar, the estimate of the bootstrap critical value at the given significance level.

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)

Examples

library(nor1mix)
library(boot)
SampleSize<-80
M<-1000
dist<- "normixt"
kfun<- Epanechnikov
p1 <-MW.nm2
p2 <-1
sig.lev <- 0.05

sim<-"ordinary"
## Not run: 
#Run the following to compare the asymptotic and bootstrap cut-off points on 4 occasions:
for(i in 15:18)
  {
    set.seed(i)
    xin<-rnorMix(SampleSize, p1)
    h.use <- hopt.be(xin)
    l.a.a<-cutoff.asymptotic( dist,   p1, p2, sig.lev )
    l.a.b<- cutoff.bootstrap(xin,  M,  sim, dist, h.use,  kfun, p1, p2, sig.lev)
    #print the result of each iteration:
    cat("Asympt. cut.off= ", l.a.a, "Boot. cut.off= ", l.a.b,  "\n")
   }

## End(Not run)

Critical value based on Edgeworth expansion of the size function for the density goodness-of-fit test `\hat{S}_n(h)` of Bagkavos, Patil and Wood (2021)

Description

Implements the critical value for the density goodness-of-fit test S.n, approximating via an Edgeworth expansion the size function of the test statistic S.n.

Usage

cutoff.edgeworth(xin, dist, kfun, p1, p2, sig.lev)

Arguments

xin

A vector of data points - the available sample.

dist

The null distribution.

kfun

The kernel to use in the density estimates used in the bandwidth expression.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

sig.lev

Significance level of the hypothesis test.

Details

Implements the critical value for the density goodness-of-fit test S.n, approximating via an Edgeworth expansion the size function of the test statistic S.n, given by

l_\alpha = z_\alpha + d_0 \sqrt{h} + d_2(n \sqrt{h})^{-1}

where z_\alpha is the 1-\alpha quantile of the normal distribution and d_0 = d_1 - C_{ H_0} and

d_j = (z_\alpha^2 - 1)c_j, j=1,2

with

c_1 = \frac{4K^{(3)}(0)\mu_2^3 \nu_3}{3\sigma^3}, \; c_2 = \frac{\mu_3^2K^2(0)}{\sigma^3}, \; \mu_i =\int K^i(x)\,dx, i=1,\dots.

and

C_{H_0} = 2\left (E f_0'( \theta_0) \right )^2 \Delta^{-1}, \; \nu_i = E \left \{f^{i}(x)\right \} = \int f^{i+1}(x)\,dx, i=1,\dots

This critical value is the density function equivalent to the critical value estimate obtained in the closely related regression setting in Gao and Gijbels (2008) and is suitable for finite sample implementations of the test.
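Some ingredients of this expansion can be computed numerically. The sketch below evaluates \mu_i and \nu_i for the textbook Epanechnikov kernel and a standard normal density, and assembles l_\alpha from values of d_0 and d_2 that the caller supplies. All function names are hypothetical, and the remaining constants (K^{(3)}(0), \sigma, C_{H_0}) are deliberately left as inputs rather than guessed.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# mu_i = int K^i(x) dx over the kernel support.
mu <- function(i) integrate(function(u) epan(u)^i, -1, 1)$value

# nu_i = int f^(i+1)(x) dx, here for a N(0, 1) density by default.
nu <- function(i, f = dnorm) {
  integrate(function(x) f(x)^(i + 1), -Inf, Inf, rel.tol = 1e-8)$value
}

# l_alpha = z_alpha + d_0 sqrt(h) + d_2 (n sqrt(h))^(-1), with d_0 and d_2
# supplied by the caller.
edgeworth_cutoff_sketch <- function(alpha, d0, d2, n, h) {
  qnorm(1 - alpha) + d0 * sqrt(h) + d2 / (n * sqrt(h))
}

mu2 <- mu(2)
mu3 <- mu(3)
nu3 <- nu(3)
```

With d0 = 0 and d2 = 0 the sketch reduces to the normal quantile z_\alpha, which makes the role of the two correction terms easy to inspect.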

Value

A scalar, the estimate of the critical value at the given significance level.

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)

Berry-Esseen optimal bandwidth for the test statistic `\hat{S}_n(h)`

Description

Implements a bandwidth that is optimal with respect to the Berry-Esseen bound, for the density goodness-of-fit test \hat{S}_n(h) of Bagkavos, Patil and Wood (2021).

Usage

hopt.be(xin)

Arguments

xin

A vector of data points - the available sample.

Details

Implements the Berry-Esseen bound optimal bandwidth defined in (18), Bagkavos, Patil and Wood (2022), given by

h = n^{-1/2} \sqrt{\frac{\hat \nu_p R_4(K)}{\rho_\ast^2 \hat \nu_4 I_0(K)} },

where

\hat \nu_p = n^{-1} \sum_{j=1}^n \hat f(X_j; \hat h_a),

and \hat h_a is the density optimal bandwidth calculated by reference to a parametric distribution, \rho_\ast=1 and

R_4(K)=\int K^4(x)\,dx.
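Two of the quantities in this rule can be illustrated directly. The sketch below computes \hat \nu_p as the average of a kernel density estimate at the sample points and R_4(K) by numerical integration. It assumes the textbook Epanechnikov kernel and uses bw.nrd as the reference bandwidth \hat h_a; both are illustrative choices, and the names epan and kde_at are hypothetical. The remaining terms \hat \nu_4 and I_0(K) are not defined in this manual, so the final bandwidth is not assembled here.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel density estimate evaluated at each point of x0.
kde_at <- function(x0, xin, h) {
  sapply(x0, function(p) mean(epan((p - xin) / h)) / h)
}

set.seed(7)
xin <- rnorm(100)
h_a <- bw.nrd(xin)                 # ASSUMPTION: normal-reference bandwidth

# nu_p hat = n^{-1} sum_j f_hat(X_j; h_a)
nu_p_hat <- mean(kde_at(xin, xin, h_a))

# R_4(K) = int K^4(x) dx
R4 <- integrate(function(u) epan(u)^4, -1, 1)$value
```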

Value

The estimate of the Berry-Esseen optimal bandwidth.

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.

Power-optimal bandwidth for the density goodness-of-fit test `S.n`.

Description

Implements the power-optimal bandwidth for density goodness-of-fit test S.n based on optimization of the test statistic's power function.

Usage

hopt.edgeworth(xin, dist, kfun, p1, p2, sig.lev)

Arguments

xin

A vector of data points - the available sample.

dist

The null distribution.

kfun

The kernel to use in the density estimates used in the bandwidth expression.

p1

Parameter 1 (vector or object) for the null distribution.

p2

Parameter 2 (vector or object) for the null distribution.

sig.lev

Significance level of the hypothesis test.

Details

Implements the power-optimal bandwidth for the test statistic S.n given by

h = \left \{ \frac{\sqrt{2} K^{(3)}(0)}{3R(K)^{3/2}} \frac{\nu_2}{R(f)^{3/2}}\right \}^{-1/2} \left \{ \frac{n \int \Delta_n^2 (x) f^2(x)\,dx}{\sigma^2 \{ 2 \nu_2 R(K)\}^{1/2}} \right \}^{-3/2}.

This bandwidth rule is the density function equivalent of the bandwidth rule obtained in the closely related regression setting in Gao and Gijbels (2008) and is designed to optimize the test's power subject to keeping the size constant.

Value

A scalar, the estimate of the power-optimal bandwidth.

Author(s)

Dimitrios Bagkavos

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)

Kernel Density Estimation

Description

Implements the (classical) kernel density estimator, see (2.2a) in Silverman (1986).

Usage

kde(xin, xout, h, kfun)

Arguments

xin

A vector of data points. Missing values not allowed.

xout

A vector of grid points at which the estimate will be calculated.

h

A scalar, the bandwidth to use in the estimate, e.g. bw.nrd(xin)

kfun

Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular.

Details

The classical kernel density estimator is given by

\hat f(x;h) = n^{-1}\sum_{i=1}^n K_h(x-X_{i})

h is determined by a bandwidth selector such as Silverman's default plug-in rule.
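The estimator can be written in a few lines of base R, reading K_h(u) as K(u/h)/h. The following is a minimal sketch with hypothetical names (kde_sketch, epan_sketch) and the textbook Epanechnikov kernel; it is not the package's kde function.

```r
# Textbook Epanechnikov kernel (stand-in for the package's kernel).
epan_sketch <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# f_hat(x; h) = n^{-1} sum_i K_h(x - X_i), with K_h(u) = K(u / h) / h.
kde_sketch <- function(xin, xout, h, kfun) {
  u <- outer(xout, xin, "-") / h   # one row per grid point, one column per X_i
  rowMeans(kfun(u)) / h
}

set.seed(1)
xs   <- rnorm(100)
grid <- seq(-6, 6, length.out = 601)
est  <- kde_sketch(xs, grid, bw.nrd(xs), epan_sketch)

# Riemann-sum check: the estimate should integrate to roughly one.
area <- sum(est) * (grid[2] - grid[1])
```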

Value

A vector with the density estimates at the designated points xout.

Author(s)

R implementation and documentation: Dimitrios Bagkavos <dimitrios.bagkavos@gmail.com>

References

Silverman (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

Examples

x<-seq(-5, 5,length=100)          #design points where the estimate will be calculated
plot(x, dnorm(x),  type="l", xlab = "x", ylab="density") #plot true density function
SampleSize <- 100
ti<- rnorm(SampleSize)            #draw a random sample from the actual distribution

huse<-bw.nrd(ti)
arg2<-kde(ti, x, huse, Epanechnikov) #Calculate the estimate
lines(x, arg2, lty=2)             #draw the result on the graphics device.
