Title: | Simple and Flexible Tests of Sample Exchangeability |
Version: | 0.1.0 |
Date: | 2023-03-22 |
Maintainer: | Alan Aw <alanaw1@berkeley.edu> |
Description: | Given a multivariate dataset and some knowledge about the dependencies between its features, it is customary to fit a statistical model to the features to infer parameters of interest. Such a procedure implicitly assumes that the sample is exchangeable. This package provides a flexible non-parametric test of this exchangeability assumption, allowing the user to specify the feature dependencies by hand as long as features can be grouped into disjoint independent sets. This package also allows users to test a dual hypothesis, which is, given that the sample is exchangeable, does a proposed grouping of the features into disjoint sets also produce statistically independent sets of features? See Aw, Spence and Song (2023) for the accompanying paper. |
License: | GPL (≥ 3) |
Imports: | Rcpp (≥ 1.0.6), doParallel, foreach, assertthat, testthat, stats, utils |
Suggests: | devtools |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://alanaw1.github.io/flintyR/ |
BugReports: | https://github.com/alanaw1/flintyR/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
NeedsCompilation: | yes |
Packaged: | 2023-03-23 06:56:04 UTC; alanaw |
Depends: | R (≥ 3.5.0) |
Author: | Alan Aw |
Repository: | CRAN |
Date/Publication: | 2023-03-23 07:20:02 UTC |
Simple and Flexible Tests of Sample Exchangeability
Description
Given a multivariate dataset and some knowledge about the dependencies between its features, it is customary to fit a statistical model to the features to infer parameters of interest. Such a procedure implicitly assumes that the sample is exchangeable. This package provides a flexible non-parametric test of this exchangeability assumption, allowing the user to specify the feature dependencies by hand as long as features can be grouped into disjoint independent sets. This package also allows users to test a dual hypothesis, which is, given that the sample is exchangeable, does a proposed grouping of the features into disjoint sets also produce statistically independent sets of features? See Aw, Spence and Song (2023) for the accompanying paper.
Package Content
Index of help topics:
blockGaussian Approximate p-value for Test of Exchangeability (Assuming Large N and P with Block Dependencies) blockLargeP Approximate p-value for Test of Exchangeability (Assuming Large P with Block Dependencies) blockPermute p-value Computation for Test of Exchangeability with Block Dependencies buildForward Map from Indices to Label Pairs buildReverse Map from Label Pairs to Indices cacheBlockPermute1 Resampling Many V Statistics (Version 1) cacheBlockPermute2 Resampling Many V Statistics (Version 2) cachePermute Permutation by Caching Distances distDataLargeP Asymptotic p-value of Exchangeability Using Distance Data distDataPValue A Non-parametric Test of Sample Exchangeability and Feature Independence (Distance List Version) distDataPermute p-value Computation for Test of Exchangeability Using Distance Data flintyR-package Simple and Flexible Tests of Sample Exchangeability getBinVStat V Statistic for Binary Matrices getBlockCov Covariance Computations Between Pairs of Distances (Block Dependencies Case) getChi2Weights Get Chi Square Weights getCov Covariance Computations Between Pairs of Distances (Independent Case) getHammingDistance A Hamming Distance Vector Calculator getLpDistance A l_p^p Distance Vector Calculator getPValue A Non-parametric Test of Sample Exchangeability and Feature Independence getRealVStat V Statistic for Real Matrices hamming_bitwise Fast Bitwise Hamming Distance Vector Computation indGaussian Approximate p-value for Test of Exchangeability (Assuming Large N and P) indLargeP Approximate p-value for Test of Exchangeability (Assuming Large P) lp_distance Fast l_p^p Distance Vector Computation naiveBlockPermute1 Resampling V Statistic (Version 1) naiveBlockPermute2 Resampling V Statistic (Version 2) weightedChi2P Tail Probability for Chi Square Convolution Random Variable
Maintainer
Alan Aw <alanaw1@berkeley.edu>
Author(s)
NA
Approximate p-value for Test of Exchangeability (Assuming Large N and P with Block Dependencies)
Description
Computes the large (N,P)
asymptotic p-value for dataset \mathbf{X}
,
assuming its P
features are independent within specified blocks.
Usage
blockGaussian(X, block_boundaries, block_labels, p)
Arguments
X |
The binary or real matrix on which to perform test of exchangeability |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
Length |
p |
The power |
Details
This is the large N
and large P
asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getBlockCov, getChi2Weights
Value
The asymptotic p-value
Approximate p-value for Test of Exchangeability (Assuming Large P with Block Dependencies)
Description
Computes the large P
asymptotic p-value for dataset \mathbf{X}
,
assuming its P
features are independent within specified blocks.
Usage
blockLargeP(X, block_boundaries, block_labels, p)
Arguments
X |
The binary or real matrix on which to perform test of exchangeability |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
Length |
p |
The power |
Details
This is the large P
asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getChi2Weights, weightedChi2P, getBlockCov
Value
The asymptotic p-value
p-value Computation for Test of Exchangeability with Block Dependencies
Description
Generates a block permutation p-value. Uses a heuristic to decide whether to use distance caching or simple block permutations.
Usage
blockPermute(X, block_boundaries = NULL, block_labels = NULL, nruns, type, p)
Arguments
X |
The binary or real matrix on which to perform permutation resampling |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. Default is NULL. |
block_labels |
Length |
nruns |
The resampling number (use at least 1000) |
type |
Either an unbiased estimate ('unbiased'), or exact ('valid') p-value (see Hemerik and Goeman, 2018), or both ('both'). |
p |
The power p of |
Details
Dependencies: buildForward, buildReverse, cachePermute, cacheBlockPermute1, cacheBlockPermute2, getHammingDistance, getLpDistance, naiveBlockPermute1, naiveBlockPermute2
Value
The block permutation p-value
Map from Indices to Label Pairs
Description
Builds a map from indexes to pairs of labels. This is
for caching distances, to avoid recomputing Hamming distances
especially when dealing with high-dimensional (large P
) matrices.
Usage
buildForward(N)
Arguments
N |
Sample size, i.e., nrow( |
Details
Dependencies: None
Value
N \times N
matrix whose entries record the index
corresponding to the pair of labels (indexed by the matrix dims)
Map from Label Pairs to Indices
Description
Builds a map from pairs of labels to indexes. This is
for caching distances, to avoid recomputing Hamming distances
especially when dealing with high-dimensional (large P
) matrices.
Usage
buildReverse(N)
Arguments
N |
Sample size, i.e., nrow( |
Details
Dependencies: None
Value
N \times N
matrix whose entries record the index
corresponding to the pair of labels (indexed by the matrix dims)
Resampling Many V Statistics (Version 1)
Description
Generates a block permutation distribution of V
statistic.
Precomputes distances and some indexing arrays to quickly
generate samples from the block permutation distribution of the V
statistic of \mathbf{X}
.
Usage
cacheBlockPermute1(X, block_labels, nruns, p)
Arguments
X |
The binary or real matrix on which to perform permutation resampling |
block_labels |
Length |
nruns |
The resampling number (use at least 1000) |
p |
The power |
Details
This version is with block labels specified.
Dependencies: buildForward, buildReverse, cachePermute, getHammingDistance, getLpDistance
Value
A vector of resampled values of the V
statistic
Resampling Many V Statistics (Version 2)
Description
Generates a block permutation distribution of V
statistic.
Precomputes distances and some indexing arrays to quickly
generate samples from the block permutation distribution of the V
statistic of \mathbf{X}
.
Usage
cacheBlockPermute2(X, block_boundaries, nruns, p)
Arguments
X |
The binary or real matrix on which to perform permutation resampling |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts |
nruns |
The resampling number (use at least 1000) |
p |
The power p of |
Details
This version is with block boundaries specified.
Dependencies: buildForward, buildReverse, cachePermute, getHammingDistance, getLpDistance
Value
A vector of resampled values of the V
statistic
Permutation by Caching Distances
Description
What do you do when you have to compute pairwise distances many times, and those damn distances take a long time to compute? Answer: You cache the distances and permute the underlying sample labels!
Usage
cachePermute(dists, forward, reverse)
Arguments
dists |
|
forward |
|
reverse |
|
Details
This function permutes the distances (Hamming, l_p^p
, etc.) within blocks.
Permutations respect the fact that we are actually permuting the
underlying labels. Arguments forward and reverse should be
precomputed using buildForward and buildReverse.
Dependencies: buildForward, buildReverse
Value
A matrix with same dimensions as dists containing the block-permuted pairwise distances
Asymptotic p-value of Exchangeability Using Distance Data
Description
Generates an asymptotic p-value.
Usage
distDataLargeP(dist_list)
Arguments
dist_list |
The list (length |
Details
Generates a weighted convolution of chi-squares distribution of V
statistic
by storing the provided list of distance data as an {N\choose2} \times B
array,
and then using large-P
theory to generate the asymptotic null distribution
against which the p-value of observed V
statistic is computed.
Each element of dist_list should be a N\times N
distance matrix.
Dependencies: buildReverse, getChi2Weights, weightedChi2P
Value
The asymptotic p-value obtained from the weighted convolution of chi-squares distribution.
A Non-parametric Test of Sample Exchangeability and Feature Independence (Distance List Version)
Description
The V test computes the p-value of a multivariate dataset, which
informs the user about one of two decisions: (1) whether the sample is exchangeable
at a given significance level, assuming that the feature dependencies are known;
or (2) whether the features or groups of features are independent at a given significance
level, assuming that the sample is exchangeable. This version takes in a list of
distance matrices recording pairwise distances between individuals across B
independent features. It can be used to test one of two hypotheses: (H1) the
sample is exchangeable, assuming that each feature whose pairwise distance data
is available is statistically independent of any other feature, or (H2) the
B
features whose pairwise distance data is available are independent, assuming
that the sample is exchangeable.
Usage
distDataPValue(dist_list, largeP = FALSE, nruns = 1000, type = "unbiased")
Arguments
dist_list |
The list of distances. |
largeP |
Boolean indicating whether to use large |
nruns |
Resampling number for exact test. Default is 1000. |
type |
Either an unbiased estimate of ('unbiased', default), or valid, but biased estimate of, ('valid') p-value
(see Hemerik and Goeman, 2018), or both ('both'). Default is 'unbiased'. Note that unbiased estimate can return |
Details
Dependencies: distDataLargeP and distDataPermute from auxiliary.R
Value
The p-value to be used to test the null hypothesis of exchangeability.
p-value Computation for Test of Exchangeability Using Distance Data
Description
Generates a block permutation p-value.
Usage
distDataPermute(dist_list, nruns, type)
Arguments
dist_list |
The list (length |
nruns |
The resampling number (use at least 1000) |
type |
Either an unbiased estimate ('unbiased'), or exact ('valid') p-value (see Hemerik and Goeman, 2018), or both ('both'). |
Details
Generates a block permutation distribution of V
statistic by storing
the provided list of distance data as an {N\choose2} \times B
array,
and then permuting the underlying indices of each individual to generate
resampled {N\choose2} \times B
arrays. The observed V
statistic is
also computed from the distance data.
Each element of dist_list should be a N\times N
distance matrix.
Dependencies: buildForward, buildReverse, cachePermute
Value
The p-value obtained from comparing the empirical tail cdf of the observed
V
statistic computed from distance data.
V Statistic for Binary Matrices
Description
Computes V
statistic for a binary matrix \mathbf{X}
, as defined in
Aw, Spence and Song (2023).
Usage
getBinVStat(X)
Arguments
X |
The |
Details
Dependencies: getHammingDistance
Value
V(\mathbf{X})
, the variance of the pairwise Hamming distance between samples
Examples
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5))
getBinVStat(X)
Covariance Computations Between Pairs of Distances (Block Dependencies Case)
Description
Computes covariance matrix entries and associated alpha, beta
and gamma quantities defined in Aw, Spence and Song (2023),
for partitionable features that are grouped into blocks. Uses
precomputation to compute the unique entries of the asymptotic
covariance matrix of the pairwise Hamming distances in O(N^2)
time.
Usage
getBlockCov(X, block_boundaries, block_labels, p)
Arguments
X |
The binary or real matrix |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
Length |
p |
The power |
Details
This is used in the large P
asymptotics of the permutation test.
Dependencies: buildReverse, getHammingDistance, getLpDistance
Value
The three distinct entries of covariance matrix, (\alpha, \beta, \gamma)
Get Chi Square Weights
Description
Computes weights for the asymptotic random variable
from the \alpha, \beta
and \gamma
computed of data array \mathbf{X}
.
Usage
getChi2Weights(alpha, beta, gamma, N)
Arguments
alpha |
covariance matrix entry computed from getCov |
beta |
covariance matrix entry computed from getCov |
gamma |
covariance matrix entry computed from getCov |
N |
The sample size, i.e., nrow(X) where X is the original dataset |
Details
This is used in the large P
asymptotics of the permutation test.
Dependencies: None
Value
The weights (w_1, w_2)
Covariance Computations Between Pairs of Distances (Independent Case)
Description
Computes covariance matrix entries and associated alpha, beta
and gamma quantities defined in Aw, Spence and Song (2023),
assuming the P
features of the dataset \mathbf{X}
are independent.
Usage
getCov(X, p = 2)
Arguments
X |
The binary or real matrix |
p |
The power |
Details
This is used in the large P
asymptotics of the permutation test.
Dependencies: buildReverse, getLpDistance
Value
The three distinct entries of covariance matrix, (\alpha, \beta, \gamma)
A Hamming Distance Vector Calculator
Description
Computes all pairwise Hamming distances for a binary matrix \mathbf{X}
.
Usage
getHammingDistance(X)
Arguments
X |
The |
Details
Dependencies: hamming_bitwise from fast_dist_calc.cpp
Value
A length {N \choose 2}
vector of pairwise Hamming distances
Examples
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5))
getHammingDistance(X)
A l_p^p
Distance Vector Calculator
Description
Computes all pairwise l_p^p
distances for a real matrix \mathbf{X}
,
for a specified choice of Minkowski norm exponent p
.
Usage
getLpDistance(X, p)
Arguments
X |
The |
p |
The power p of |
Details
Dependencies: lp_distance from fast_dist_calc.cpp
Value
A length {N \choose 2}
vector of pairwise l_p^p
distances
Examples
X <- matrix(nrow = 5, ncol = 10, rnorm(50))
getLpDistance(X, p = 2)
A Non-parametric Test of Sample Exchangeability and Feature Independence
Description
The V test computes the p-value of a multivariate dataset \mathbf{X}
, which
informs the user about one of two decisions: (1) whether the sample is exchangeable
at a given significance level, assuming that the feature dependencies are known;
or (2) whether the features or groups of features are independent at a given significance
level, assuming that the sample is exchangeable.
See Aw, Spence and Song (2023) for details.
Usage
getPValue(
X,
block_boundaries = NULL,
block_labels = NULL,
largeP = FALSE,
largeN = FALSE,
nruns = 5000,
type = "unbiased",
p = 2
)
Arguments
X |
The binary or real matrix on which to perform test of exchangeability. |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. Default is NULL. |
block_labels |
Length |
largeP |
Boolean indicating whether to use large |
largeN |
Boolean indicating whether to use large |
nruns |
Resampling number for exact test. Default is 5000. |
type |
Either an unbiased estimate of ('unbiased', default), or valid, but biased estimate of, ('valid') p-value
(see Hemerik and Goeman, 2018), or both ('both'). Default is 'unbiased'. Note that unbiased estimate can return |
p |
The power |
Details
Automatically detects if dataset is binary, and runs the Hamming
distance version of test if so. Otherwise, computes the squared
Euclidean distance between samples and evaluates whether the
variance of Euclidean distances, V
, is atypically large under the
null hypothesis of exchangeability. Note the user may tweak the
choice of power p
if they prefer an l_p^p
distance other than Euclidean.
Under the hood, the variance statistic, V
, is computed efficiently.
Moreover, the user can specify their choice of block permutations,
large P
asymptotics, or large P
and large N
asymptotics. The latter two
return reasonably accurate p-values for moderately large dimensionalities.
User recommendations: When the number of independent blocks B
or number of
independent features P
is at least 50, it is safe to use large P
asymptotics.
If P
or B
is small, however, stick with permutations.
Dependencies: All functions in auxiliary.R
Value
The p-value to be used to test the null hypothesis of exchangeability.
Examples
# Example 1 (get p-value of small matrix with independent features using exact test)
suppressWarnings(require(doParallel))
# registerDoParallel(cores = 2)
X1 <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix, small
getPValue(X1) # perform exact test with 5000 permutations
# should be larger than 0.05
# Example 2 (get p-value of high-dim matrix with independent features using asymptotic test)
X2 <- matrix(nrow = 10, ncol = 1000, rnorm(1e4)) # real matrix, large enough
getPValue(X2, p = 2, largeP = TRUE) # very fast
# should be larger than 0.05
# getPValue(X2, p = 2) # slower, do not run (Output: 0.5764)
# Example 3 (get p-value of high-dim matrix with partitionable features using exact test)
X3 <- matrix(nrow = 10, ncol = 1000, rbinom(1e4, 1, 0.5))
getPValue(X3, block_labels = rep(c(1,2,3,4,5), 200))
# Warning message: # there are features that have zero variation (i.e., all 0s or 1s)
# In getPValue(X3, block_labels = rep(c(1, 2, 3, 4, 5), 200)) :
# There exist columns with all ones or all zeros for binary X.
# Example 4 (get p-value of high-dim matrix with partitionable features using asymptotic test)
## This elaborate example generates binarized versions of time series data.
# Helper function to binarize a marker
# by converting z-scores to {0,1} based on
# standard normal quantiles
binarizeMarker <- function(x, freq, ploidy) {
if (ploidy == 1) {
return((x > qnorm(1-freq)) + 0)
} else if (ploidy == 2) {
if (x <= qnorm((1-freq)^2)) {
return(0)
} else if (x <= qnorm(1-freq^2)) {
return(1)
} else return(2)
} else {
cat("Specify valid ploidy number, 1 or 2")
}
}
getAutoRegArray <- function(B, N, maf_l = 0.38, maf_u = 0.5, rho = 0.5, ploid = 1) {
# get minor allele frequencies by sampling from uniform
mafs <- runif(B, min = maf_l, max = maf_u)
# get AR array
ar_array <- t(replicate(N, arima.sim(n = B, list(ar=rho))))
# theoretical column variance
column_var <- 1/(1-rho^2)
# rescale so that variance per marker is 1
ar_array <- ar_array / sqrt(column_var)
# rescale each column of AR array
for (b in 1:B) {
ar_array[,b] <- sapply(ar_array[,b],
binarizeMarker,
freq = mafs[b],
ploidy = ploid)
}
return(ar_array)
}
## Function to generate the data array with desired number of samples
getExHaplotypes <- function(N) {
array <- do.call("cbind",
lapply(1:50, function(x) {getAutoRegArray(N, B = 20)}))
return(array)
}
## Generate data and run test
X4 <- getExHaplotypes(10)
getPValue(X4, block_boundaries = seq(from = 1, to = 1000, by = 25), largeP = TRUE)
# stopImplicitCluster()
V Statistic for Real Matrices
Description
Computes V
statistic for a real matrix \mathbf{X}
,
where V(\mathbf{X})
= scaled variance of l_p^p
distances between the
row samples of \mathbf{X}
.
Usage
getRealVStat(X, p)
Arguments
X |
The |
p |
The power |
Details
Dependencies: getLpDistance
Value
V(\mathbf{X})
, the variance of the pairwise l_p^p
distance between samples
Examples
X <- matrix(nrow = 5, ncol = 10, rnorm(50))
getRealVStat(X, p = 2)
Fast Bitwise Hamming Distance Vector Computation
Description
Takes in a binary matrix \mathbf{X}
, whose transpose \mathbf{X}^T
has N
rows, and computes a vector recording all
{N \choose 2}
pairwise Hamming distances of \mathbf{X}^T
,
ordered lexicographically.
Usage
hamming_bitwise(X)
Arguments
X |
binary matrix (IntegerMatrix class ) |
Value
vector of Hamming distances (NumericVector class)
Examples
# t(X) = [[1,0], [0,1], [1,1]] --> output = [2,1,1]
Approximate p-value for Test of Exchangeability (Assuming Large N and P)
Description
Computes the large (N,P)
asymptotic p-value for dataset \mathbf{X}
,
assuming its P
features are independent
Usage
indGaussian(X, p = 2)
Arguments
X |
The binary or real matrix on which to perform test of exchangeability |
p |
The power p of |
Details
This is the large N
and large P
asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getCov, getChi2Weights
Value
The asymptotic p-value
Approximate p-value for Test of Exchangeability (Assuming Large P)
Description
Computes the large P
asymptotic p-value for dataset \mathbf{X}
,
assuming its P
features are independent.
Usage
indLargeP(X, p = 2)
Arguments
X |
The binary or real matrix on which to perform test of exchangeability |
p |
The power p of |
Details
This is the large P
asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getChi2Weights, weightedChi2P, getCov
Value
The asymptotic p-value
Fast l_p^p
Distance Vector Computation
Description
Takes in a double matrix \mathbf{X}
, whose transpose \mathbf{X}^T
has N
rows, and computes a vector recording all
{N \choose 2}
pairwise l_p^p
distances of \mathbf{X}^T
,
ordered lexicographically.
Usage
lp_distance(X, p)
Arguments
X |
double matrix (arma::mat class) |
p |
numeric Minkowski power (double class) |
Value
vector of l_p^p
distances (arma::vec class)
Examples
# X = [[0.5,0.5],[0,1],[0.3,0.7]] --> lPVec = [x,y,z]
# with x = (0.5^p + 0.5^p)
Resampling V Statistic (Version 1)
Description
Generates a new array \mathbf{X}'
under the permutation null and then
returns the V
statistic computed for \mathbf{X}'
.
Usage
naiveBlockPermute1(X, block_labels, p)
Arguments
X |
The |
block_labels |
A vector of length |
p |
The power |
Details
This is Version 1, which takes in the block labels. It is suitable in
the most general setting, where the features are grouped by labels.
Given original \mathbf{X}
and a list denoting labels of each feature,
independently permutes the rows within each block of \mathbf{X}
and returns resulting V
.
If block labels are not specified, then features are assumed independent, which
is to say that block_labels is set to 1:ncol(\mathbf{X}
).
Dependencies: getBinVStat, getRealVStat
Value
V(\mathbf{X}')
, where \mathbf{X}'
is a resampled by permutation of entries blockwise
Examples
X <- matrix(nrow = 5, ncol = 10, rnorm(50)) # real matrix example
naiveBlockPermute1(X, block_labels = c(1,1,2,2,3,3,4,4,5,5), p = 2) # use Euclidean distance
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix example
naiveBlockPermute1(X, block_labels = c(1,1,2,2,3,3,4,4,5,5))
Resampling V Statistic (Version 2)
Description
Generates a new array \mathbf{X}'
under the permutation null and then
returns the V
statistic computed for \mathbf{X}'
.
Usage
naiveBlockPermute2(X, block_boundaries, p)
Arguments
X |
The |
block_boundaries |
A vector of length at most P, whose entries indicate positions at which to demarcate blocks |
p |
The power p of |
Details
This is Version 2, which takes in the block boundaries. It is suitable
for use when the features are already arranged such that the block
memberships are determined by index delimiters. Given original \mathbf{X}
and
a list denoting labels of each feature, independently permutes the rows
within each block of \mathbf{X}
and returns resulting V
. If block labels are not specified,
then features are assumed independent, which is to say that block_labels is set to 1:ncol(\mathbf{X}
).
Dependencies: getBinVStat, getRealVStat
Value
V(\mathbf{X}')
, where \mathbf{X}'
is a resampled by permutation of entries blockwise
Examples
X <- matrix(nrow = 5, ncol = 10, rnorm(50)) # real matrix example
naiveBlockPermute2(X, block_boundaries = c(4,7,9), p = 2) # use Euclidean distance
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix example
naiveBlockPermute2(X, block_boundaries = c(4,7,9))
Tail Probability for Chi Square Convolution Random Variable
Description
Computes P(X > val)
where X = w_1 Y + w_2 Z
, where
Y
is chi square distributed with d_1
degrees of freedom,
Z
is chi square distributed with d_2
degrees of freedom,
and w_1
and w_2
are weights with w_2
assumed positive.
The probability is computed using numerical integration of the
densities of the two chi square distributions. (Method: trapezoidal rule)
Usage
weightedChi2P(val, w1, w2, d1, d2)
Arguments
val |
observed statistic |
w1 |
weight of first chi square rv |
w2 |
weight of second chi square rv, assumed positive |
d1 |
degrees of freedom of first chi square rv |
d2 |
degrees of freedom of second chi square rv |
Details
This is used in the large P
asymptotics of the permutation test.
Dependencies: None
Value
1 - CDF = P(X > val)