Type: | Package |
Title: | Dependence Tests for Two Variables |
Version: | 0.2.0 |
Author: | Jeffrey C. Miecznikowski, En-shuo Hsu, Yanhua Chen, Albert Vexler |
Maintainer: | En-shuo Hsu <daviden1013@gmail.com> |
Description: | Provides test statistics, p-values, and confidence intervals based on 9 hypothesis tests for dependence. |
License: | GPL-3 |
LazyData: | TRUE |
Imports: | Rcpp (≥ 0.12.7), methods |
Depends: | R (≥ 3.2.5), parallel, minerva, Hmisc |
LinkingTo: | Rcpp |
RoxygenNote: | 5.0.1 |
NeedsCompilation: | yes |
Packaged: | 2017-01-19 19:54:18 UTC; david_000 |
Repository: | CRAN |
Date/Publication: | 2017-01-20 10:49:22 |
Draw Kendall plot and compute AUK.
Description
This function draws the Kendall plot of two variables. It also provides the index AUK (area under the Kendall plot).
Usage
AUK(x, y, plot = F, main = "Kendall plot", Auxiliary.line = T,
BS.CI = 0, set.seed = FALSE)
Arguments
x |
a numeric vector storing the first variable. |
y |
a numeric vector storing the second variable. |
plot |
a TRUE/FALSE flag indicating whether to generate the Kendall plot. |
main |
a character string giving the title of the plot. |
Auxiliary.line |
a TRUE/FALSE flag indicating whether to draw auxiliary lines. |
BS.CI |
a numeric specifying alpha for the bootstrap confidence interval. When equal to 0, the confidence interval is not computed. |
set.seed |
a TRUE/FALSE flag indicating whether to set the seed. |
Details
AUK is bounded between 0 and 0.75. For positively correlated x and y, say x = y, AUK = 0.75 and the plot follows the concave auxiliary line. For negatively correlated x and y, AUK = 0 and the plot is horizontal on y = 0. For independent x and y, AUK = 0.5 and the Kendall plot lies on the diagonal. Due to possible variable overflow, this function is only suitable for input sizes below 1000. An input size greater than 1000 causes an error.
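As a rough illustration of the y axis of the plot, the sketch below computes sorted H values in base R. It assumes the usual Kendall-plot definition, in which H_i is the proportion of the other points lying weakly below and to the left of point i. The helper name kendall_H is hypothetical and not part of this package, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of testforDEP): sorted H values of a Kendall plot.
# Assumes H_i = #{j != i : x_j <= x_i and y_j <= y_i} / (n - 1).
kendall_H <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  H <- vapply(seq_len(n), function(i) {
    # The count includes point i itself, so subtract 1.
    (sum(x <= x[i] & y <= y[i]) - 1) / (n - 1)
  }, numeric(1))
  sort(H)
}

H_pos <- kendall_H(1:5, 1:5)     # perfectly increasing relation
H_neg <- kendall_H(1:5, -(1:5))  # perfectly decreasing relation
```

Under this definition, the perfectly decreasing case gives H values that are all zero, which matches the horizontal plot on y = 0 described above.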
Value
a list containing a numeric AUK, a numeric vector W.in (x axis of plot), a numeric vector Hi.sort (y axis of plot), and three confidence intervals: normal CI, pivotal CI and percentage CI.
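The three interval types can be sketched with a generic pairs bootstrap. This is a minimal sketch assuming the textbook definitions of the normal, pivotal and percentile (listed above as percentage) intervals. The helper boot_ci, its arguments and the use of cor as the statistic are hypothetical; the package's internal computation may differ.

```r
# Hypothetical helper (not part of testforDEP): bootstrap CIs for a statistic
# of paired data, using textbook definitions of the three interval types.
boot_ci <- function(x, y, stat = cor, alpha = 0.05, B = 1000) {
  n <- length(x)
  est <- stat(x, y)
  boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)  # resample pairs with replacement
    stat(x[idx], y[idx])
  })
  q <- unname(quantile(boot, c(alpha / 2, 1 - alpha / 2)))
  z <- qnorm(1 - alpha / 2)
  list(
    normal     = est + c(-1, 1) * z * sd(boot),       # estimate +/- z * bootstrap SE
    pivotal    = c(2 * est - q[2], 2 * est - q[1]),   # reflected bootstrap quantiles
    percentile = q                                    # raw bootstrap quantiles
  )
}

set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
ci <- boot_ci(x, y)
```

Here alpha plays the role that BS.CI plays in the usage above: alpha = 0.05 requests 95% intervals.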
Author(s)
Jeffrey C. Miecznikowski, En-shuo Hsu, Yanhua Chen, Albert Vexler
References
Vexler, Albert, Xiwei Chen, and Alan D. Hutson. "Dependence and independence: Structure and inference." Statistical methods in medical research (2015): 0962280215594198.
R package "VineCopula": Schepsmeier, Ulf, et al. "Package 'VineCopula'." (2015).
Examples
set.seed(123)
x = runif(100)
y = runif(100)
result = AUK(x, y, plot = TRUE)
result$AUK
#[1] 0.4987523
Empirical Likelihood based test for dependence
Description
Empirical Likelihood based test for dependence. See references.
References
Einmahl, J. H., & McKeague, I. W. (2003). Empirical likelihood based hypothesis testing. Bernoulli, 267-290.
Hoeffding's test for dependence
Description
The test statistic is computed by hoeffd{Hmisc}. See hoeffd. Note that the test statistic D is 30 times the test statistic defined in the original publication.
References
Harrell Jr FE, Dupont MC (2006). "The Hmisc Package." R package version, 3, 0-12.
Kallenberg test for dependence
Description
Includes TS2 and V. See reference.
References
Kallenberg WC, Ledwina T (1999). "Data-Driven Rank Tests for Independence." Journal of the American Statistical Association, 94. doi: 10.1080/01621459.1999.10473844.
Kendall test for dependence
Description
The test statistic is computed by cor.test{stats}. See cor.test. Note that the test statistic returned is the pivot z, which approximately follows a normal distribution.
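For orientation, the z pivot can be obtained directly from cor.test in base R. This is a minimal sketch; the choice of exact = FALSE (which makes cor.test report the normal approximation) and the simulated data are assumptions made for illustration, not a description of how this package calls cor.test internally.

```r
set.seed(123)
x <- rnorm(30)
y <- x + rnorm(30)

# exact = FALSE requests the normal approximation, so the statistic is named "z".
res <- cor.test(x, y, method = "kendall", exact = FALSE)
z_stat <- res$statistic
p_val  <- res$p.value
```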
LSAT dataset
Description
A dataset of average law school admission test (LSAT) scores and grade point averages (GPA) from 82 American law schools that participated in a large study of admission practices.
Usage
data("LSAT")
Format
A data frame with 82 observations on the following 3 variables.
School
a numeric vector of school numbers.
LSAT
a numeric vector of LSAT scores.
GPA
a numeric vector of GPAs.
Details
For details, see the references.
Source
Efron B, Tibshirani RJ (1994). An Introduction to the Bootstrap. CRC Press.
References
Efron B, Tibshirani RJ (1994). An Introduction to the Bootstrap. CRC Press.
MIC test for dependence
Description
The test statistic is computed by mine{minerva}. See mine.
Pearson test for dependence
Description
Pearson test for linear dependence. Note that the test statistic returned is the pivot t, which follows Student's t distribution.
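The t pivot is related to the sample correlation r by t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The sketch below checks this against cor.test in base R; the simulated data are an assumption made for illustration only.

```r
set.seed(123)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)

n <- length(x)
r <- cor(x, y)

# Pivot computed by hand from the sample correlation.
t_manual <- r * sqrt((n - 2) / (1 - r^2))

# Pivot reported by cor.test{stats}.
t_stats <- unname(cor.test(x, y, method = "pearson")$statistic)

same_pivot <- isTRUE(all.equal(t_manual, t_stats))
```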
Spearman test for dependence
Description
The test statistic is computed by cor.test{stats}. See cor.test. Note that the test statistic returned is the pivot t, which approximately follows Student's t distribution. The Spearman test cannot handle ties. Since the bootstrap resamples with replacement, which generates ties, the bootstrap confidence interval does not apply. Setting BS.CI > 0 throws a warning message.
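The effect of resampling on ties can be seen in a few lines of base R. This is only an illustrative sketch with simulated data: a continuous sample has no ties, while a bootstrap resample of the same size almost surely repeats some observations.

```r
set.seed(123)
x <- runif(100)

# Bootstrap resample: draw indices with replacement.
idx <- sample.int(length(x), replace = TRUE)
x_boot <- x[idx]

ties_original <- anyDuplicated(x) > 0       # continuous draws: no repeated values
ties_bootstrap <- anyDuplicated(x_boot) > 0 # repeated indices give repeated values
```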
Vexler's test for dependence
Description
A method based on the empirical likelihood ratio test, published by Dr. Vexler in 2014. See reference.
References
Vexler A, Tsai WM, Hutson AD (2014). "A Simple Density-Based Empirical Likelihood Ratio Test for Independence."
Test dependence for two data
Description
This function computes the test statistic, p-value, and confidence interval for dependence based on classic methods (Pearson, Kendall, Spearman) and modern methods (Vexler, Kallenberg, MIC, Hoeffding, and Empirical Likelihood tests).
Usage
testforDEP(x = NA, y = NA, data = NA, test, p.opt = "MC",
num.MC = 10000, BS.CI = 0, rm.na = FALSE, set.seed = FALSE)
Arguments
x |
a numeric vector storing the first variable. |
y |
a numeric vector storing the second variable. |
data |
(Optional) a data frame storing the data to be tested. |
test |
a character string indicating which test to implement. Must be one of {"PEARSON", "KENDALL", "SPEARMAN", "VEXLER", "TS2", "V", "MIC", "HOEFFD", "EL"} |
p.opt |
a character string specifying whether the p-value is obtained from a distribution, by Monte Carlo simulation, or from a table. Must be "dist", "MC" or "table". |
num.MC |
a numeric giving the number of Monte Carlo simulations. |
BS.CI |
a numeric specifying alpha for the bootstrap confidence interval. When equal to 0, the confidence interval is not computed. |
rm.na |
a TRUE/FALSE flag indicating whether to remove missing data (NA) from the input. |
set.seed |
a TRUE/FALSE flag indicating whether to set the seed for Monte Carlo simulation and bootstrap sampling. |
Details
The arguments "x, y" and "data" are two different ways to input data. When x or y is missing, data is taken as input; supplying x, y and data together leads to an error. Argument data is a two-column numeric data frame. The order of the columns does not affect results.
Since the modern test methods "VEXLER", "TS2", "V", "MIC", "HOEFFD", and "EL" have no continuous probability density function, p.opt = "dist" does not apply to them. For classic methods, argument num.MC is ignored when p.opt is "dist". p.opt = "table" uses interpolation from pre-stored simulated tables; the current version supports this option only for the "VEXLER", "MIC", "HOEFFD" and "EL" tests.
For Vexler, MIC and EL, computation is more time-consuming, so a warning with the estimated execution time is returned when input size > 100. Input size <= 100 is recommended for Monte Carlo p-values. For input size > 100, use the table. num.MC should be an integer between 100 and 10,000 for acceptable computation times.
NA in the input is not acceptable. Set rm.na = TRUE to remove it. For more details, see Pearson, Kendall, Spearman, Vexler, Kallenberg, MIC, Hoeffding, EL.
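The idea behind a Monte Carlo p-value can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the helper mc_p_value is hypothetical, the null distribution is simulated from independent uniform samples, the comparison is two-sided, and the add-one correction is one common convention among several.

```r
# Hypothetical helper (not part of testforDEP): Monte Carlo p-value for a
# dependence statistic, simulating the null from independent uniform samples.
mc_p_value <- function(x, y, stat, num.MC = 1000) {
  n <- length(x)
  obs <- abs(stat(x, y))
  # Statistic under independence, recomputed num.MC times.
  null <- replicate(num.MC, abs(stat(runif(n), runif(n))))
  # Add-one correction keeps the estimate strictly positive.
  (sum(null >= obs) + 1) / (num.MC + 1)
}

# Kendall's tau as an example of a rank-based statistic.
tau <- function(x, y) cor(x, y, method = "kendall")

set.seed(123)
x <- runif(30)
p_dep <- mc_p_value(x, x^2, tau)        # y is a monotone function of x
p_ind <- mc_p_value(x, runif(30), tau)  # y drawn independently of x
```

Increasing num.MC reduces the simulation error of the estimated p-value at the cost of computation time, which is the trade-off behind the recommended range for num.MC above.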
Value
an S4 object of class "testforDEP_result" with slots: test statistic (TS), p-value (p_value) and, if applicable, confidence interval (CI).
Author(s)
Jeffrey C. Miecznikowski, En-shuo Hsu, Yanhua Chen, Albert Vexler
See Also
Technical report: http://sphhp.buffalo.edu/content/dam/sphhp/biostatistics/Documents/techreports/UB-Biostatistics-TR1701.pdf
Examples
set.seed(123)
x = runif(100, 0, 1)
y = runif(100, 0, 1)
testforDEP(x, y, test = "SPEARMAN", p.opt = "MC",
num.MC = 10000, BS.CI = 0, set.seed = TRUE)
#An object of class "testforDEP_result"
#Slot "TS":
#[1] 59.54311
#Slot "p_value":
#[1] 0.6735326
#Slot "CI":
#list()