demo-truh


This vignette provides a quick demo of the truh package. The example considered here is taken from Figure 3 of the paper by Trambak Banerjee, Bhaswar B. Bhattacharya and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020) <DOI: 10.1214/20-AOAS1362>.

We consider a nonparametric two-sample testing problem in which the \(d\)-dimensional baseline (uninfected) sample \(\boldsymbol{U}=(U_1,\ldots,U_n)\) is i.i.d. with cdf \(F_0\) and the \(d\)-dimensional treated (infected) sample \(\boldsymbol{V}=(V_1,\ldots,V_m)\) is i.i.d. with cdf \(G\). We assume that the heterogeneity in the baseline population is reflected by \(K\) subgroups, each with a unimodal distribution, with distinct modes, cdfs \(F_1,\ldots,F_K\) and mixing proportions \(w_1,\ldots,w_K\), so that \[F_0=\sum_{a=1}^{K}w_aF_a~\text{where}~w_a\in(0,1)~\text{and}~\sum_{a=1}^{K}w_a=1. \]
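To make the mixture formula concrete, here is a minimal sketch that evaluates \(F_0(x)=\sum_a w_aF_a(x)\) at a point, assuming Gaussian components \(N(\boldsymbol{\mu}_a,\boldsymbol{I}_2)\) and using the weights and modes of the example below. The helper `mixture_cdf` is hypothetical and is not part of the truh package; with identity covariance the two coordinates are independent, so each \(F_a(x)\) factors into a product of univariate normal cdfs.

```r
# Hypothetical helper (not part of truh): evaluate F0(x) = sum_a w_a F_a(x)
# for Gaussian components F_a = N(mu_a, I_2).
mixture_cdf <- function(x, w, mu) {
  stopifnot(all(w > 0), abs(sum(w) - 1) < 1e-12, nrow(mu) == length(w))
  # With identity covariance the coordinates are independent, so
  # F_a(x) = P(X_1 <= x_1) * P(X_2 <= x_2) for X ~ N(mu_a, I_2)
  Fa <- apply(mu, 1, function(m) prod(pnorm(x - m)))
  sum(w * Fa)
}

w  <- c(0.3, 0.3, 0.4)                      # mixing proportions w_1, w_2, w_3
mu <- rbind(c(0, 0), c(0, -4), c(4, -2))    # modes of F_1, F_2, F_3 (rows)
mixture_cdf(c(0, 0), w, mu)
```

Each component contributes its own cdf value weighted by \(w_a\); far out in the upper-right corner every \(F_a\) is close to 1, so \(F_0\) is close to 1 as well.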

The goal is to test the composite hypothesis \[H_0:G\in\mathcal{F}(F_0)~\text{versus}~H_1:G\notin\mathcal{F}(F_0), \] where \(\mathcal{F}(F_0)\) is the convex hull of \(F_1,\ldots,F_K\), that is, the set of mixtures \(\sum_{a=1}^{K}\lambda_aF_a\) with \(\lambda_a\ge 0\) and \(\sum_{a=1}^{K}\lambda_a=1\). We take \(d=2\), \(n=2000\), \(m=500\) and sample \(U_1,\ldots,U_n\) from \(F_0\), where \[F_0=0.3N(\boldsymbol{0},\boldsymbol{I}_2)+0.3N(\boldsymbol{\mu}_1,\boldsymbol{I}_2)+0.4N(\boldsymbol{\mu}_2,\boldsymbol{I}_2), \] with \(\boldsymbol{\mu}_1=(0,-4)\) and \(\boldsymbol{\mu}_2=(4,-2)\).
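A finite mixture such as \(F_0\) can also be sampled by first drawing a component label \(a\) with probability \(w_a\) and then drawing from \(N(\boldsymbol{\mu}_a,\boldsymbol{I}_2)\). The sketch below uses this label-based approach, which is a swapped-in alternative to the indicator-multiplication construction used in the vignette code; it produces the same distribution but not the same random draws. The helper name `rgaussmix` is hypothetical and not part of truh.

```r
# Hypothetical helper (not part of truh): draw n points from a mixture of
# N(mu_a, I_d) components by first drawing a component label for each point.
rgaussmix <- function(n, w, mu) {
  d <- ncol(mu)
  # label[i] = a with probability w_a
  label <- sample(seq_along(w), size = n, replace = TRUE, prob = w)
  # shift standard normal noise by the mean of each point's component
  X <- matrix(rnorm(n * d), n, d) + mu[label, , drop = FALSE]
  list(X = X, label = label)
}

set.seed(1)
w   <- c(0.3, 0.3, 0.4)
mu  <- rbind(c(0, 0), c(0, -4), c(4, -2))   # modes of F_1, F_2, F_3
sim <- rgaussmix(2000, w, mu)
dim(sim$X)                  # 2000 x 2
table(sim$label) / 2000     # proportions close to 0.3, 0.3, 0.4
```

The label vector also records which subgroup each point came from, which can be handy for colouring plots by component.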

n = 2000
d = 2

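# Sampling the baseline (uninfected) sample U from F0:
# p <= 0.3 -> N(0, I_2); 0.3 < p <= 0.6 -> N((0,-4), I_2); p > 0.6 -> N((4,-2), I_2)
set.seed(1)
p <- runif(n, 0, 1)
set.seed(10)
U <- (p <= 0.3) * matrix(rnorm(d * n), n, d) +
  (p > 0.3 & p <= 0.6) * cbind(matrix(rnorm(n), n, 1),
                               matrix(rnorm(n, -4), n, 1)) +
  (p > 0.6) * cbind(matrix(rnorm(n, 4), n, 1),
                    matrix(rnorm(n, -2), n, 1))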
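A quick sanity check on the simulated baseline sample: the mean of a mixture is the weighted average of the component means, \(E[U]=\sum_a w_a\boldsymbol{\mu}_a\), which for this \(F_0\) equals \(0.3(0,0)+0.3(0,-4)+0.4(4,-2)=(1.6,-2.0)\). The sketch below re-draws \(U\) with the same seeds as the code above (so it is self-contained) and compares the sample mean with this value; it is an illustrative check, not part of the truh procedure.

```r
# Theoretical mean of F0: E[U] = sum_a w_a * mu_a
w  <- c(0.3, 0.3, 0.4)
mu <- rbind(c(0, 0), c(0, -4), c(4, -2))   # rows are the component means
theo_mean <- colSums(w * mu)                # (1.6, -2.0)

# Re-draw U exactly as in the baseline sampling code (same seeds)
n <- 2000; d <- 2
set.seed(1)
p <- runif(n, 0, 1)
set.seed(10)
U <- (p <= 0.3) * matrix(rnorm(d * n), n, d) +
  (p > 0.3 & p <= 0.6) * cbind(matrix(rnorm(n), n, 1), matrix(rnorm(n, -4), n, 1)) +
  (p > 0.6) * cbind(matrix(rnorm(n, 4), n, 1), matrix(rnorm(n, -2), n, 1))

# Sample mean vs. theoretical mean; with n = 2000 the standard error is
# roughly 0.04 to 0.05 per coordinate, so the rows should agree closely
rbind(sample = colMeans(U), theory = theo_mean)
```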

# Setting 1: sampling the treated (infected) sample V1 from N(mu_2, I_2), the third component of F0 (H_0 true)
m = 500
set.seed(50)
V1<-cbind(matrix(rnorm(m,4),m,1),
          matrix(rnorm(m,-2),m,1))

#Scatter plot of the data
grp = c(rep('Baseline', n), rep('Treated', m))
plot(c(U[,1],V1[,1]), c(U[,2],V1[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))
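The same scatter-plot-plus-legend code is repeated for each setting. One way to avoid the repetition is a small wrapper; the function `plot_two_samples` below is hypothetical (not part of truh) and uses only base R graphics. It is demonstrated on small illustrative data rather than on the vignette's samples.

```r
# Hypothetical helper (not part of truh): scatter plot of a baseline sample U
# and a treated sample V, coloured by group, with a matching legend.
plot_two_samples <- function(U, V, main = "") {
  grp <- factor(c(rep("Baseline", nrow(U)), rep("Treated", nrow(V))))
  # a two-column matrix is split into x and y coordinates by plot()
  plot(rbind(U, V), pch = 19, col = as.integer(grp),
       xlab = "X_1", ylab = "X_2", main = main)
  # colour i in the palette corresponds to level i of grp
  legend("topright", legend = levels(grp), pch = 19,
         col = seq_along(levels(grp)))
  invisible(grp)
}

# Illustrative data only
set.seed(1)
U_demo <- matrix(rnorm(200), 100, 2)
V_demo <- matrix(rnorm(100, 3), 50, 2)
g <- plot_two_samples(U_demo, V_demo, main = "Illustration")
```

With the vignette's objects, a call such as `plot_two_samples(U, V1, main = "Setting 1")` would reproduce the plot above.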

# Setting 2: sampling the treated (infected) sample V2 from
# 0.5 N((2,-2), I_2) + 0.5 N((3,3), I_2) (H_1 true)
m = 500
set.seed(20)
q <- runif(m, 0, 1)
set.seed(50)
V2 <- (q <= 0.5) * cbind(matrix(rnorm(m, 2), m, 1),
                         matrix(rnorm(m, -2), m, 1)) +
  (q > 0.5) * cbind(matrix(rnorm(m, 3), m, 1),
                    matrix(rnorm(m, 3), m, 1))

#Scatter plot of the data
plot(c(U[,1],V2[,1]), c(U[,2],V2[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))

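# Setting 3: sampling the treated (infected) sample V3 from
# 0.8 N(0, I_2) + 0.1 N(mu_1, I_2) + 0.1 N(mu_2, I_2) (H_0 true)
m = 500
set.seed(20)
q <- runif(m, 0, 1)
set.seed(50)
V3 <- (q <= 0.8) * matrix(rnorm(d * m), m, d) +
  (q > 0.8 & q <= 0.9) * cbind(matrix(rnorm(m), m, 1),
                               matrix(rnorm(m, -4), m, 1)) +
  (q > 0.9) * cbind(matrix(rnorm(m, 4), m, 1),
                    matrix(rnorm(m, -2), m, 1))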

#Scatter plot of the data
plot(c(U[,1],V3[,1]), c(U[,2],V3[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))
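Because every distribution in this example is a Gaussian location mixture with identity covariance, membership in \(\mathcal{F}(F_0)\) can be read off directly: relying on the identifiability of finite Gaussian mixtures, \(G\) lies in the convex hull exactly when each of its components is one of the baseline components \(N(\boldsymbol{0},\boldsymbol{I}_2)\), \(N(\boldsymbol{\mu}_1,\boldsymbol{I}_2)\), \(N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\) and its weights form a probability vector. The sketch below encodes the three treated distributions from the sampling code and applies that check; the helper `in_hull` is hypothetical and only works for this known-parameter example, whereas truh tests the hypothesis from data.

```r
# Modes of the baseline components F_1, F_2, F_3 (each N(mode, I_2))
baseline_modes <- rbind(c(0, 0), c(0, -4), c(4, -2))

# Treated distributions of the three settings as Gaussian mixtures:
# component means (rows of mu) and mixing weights w
settings <- list(
  setting1 = list(mu = rbind(c(4, -2)), w = 1),
  setting2 = list(mu = rbind(c(2, -2), c(3, 3)), w = c(0.5, 0.5)),
  setting3 = list(mu = rbind(c(0, 0), c(0, -4), c(4, -2)), w = c(0.8, 0.1, 0.1))
)

# Hypothetical check: every component mean of G must match a baseline mode
# and the weights must be nonnegative and sum to 1
in_hull <- function(s, modes) {
  is_mode <- apply(s$mu, 1, function(m)
    any(apply(modes, 1, function(b) all(abs(m - b) < 1e-12))))
  all(is_mode) && all(s$w >= 0) && abs(sum(s$w) - 1) < 1e-12
}

sapply(settings, in_hull, modes = baseline_modes)
# setting1: TRUE, setting2: FALSE, setting3: TRUE
```

So \(H_0\) holds in Settings 1 and 3 and \(H_1\) holds in Setting 2.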

Let us now execute the truh testing procedure for these scenarios. Recall that the goal is to test the composite hypothesis \[H_0:G\in\mathcal{F}(F_0)~\text{versus}~H_1:G\notin\mathcal{F}(F_0). \]

- Setting 1: Here \(G=N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\), which is the third component of \(F_0\). Hence \(G\in\mathcal{F}(F_0)\) and \(H_0\) is true.

library(truh)
truh.1 = truh(V1,U,B=200)
truh.1$pval

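- Setting 2: Here \(G=0.5N((2,-2),\boldsymbol{I}_2)+0.5N((3,3),\boldsymbol{I}_2)\). Neither component is one of the components of \(F_0\), so \(G\notin\mathcal{F}(F_0)\) and \(H_1\) is true.

library(truh)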
truh.2 = truh(V2,U,B=200)
truh.2$pval

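- Setting 3: Here \(G=0.8N(\boldsymbol{0},\boldsymbol{I}_2)+0.1N(\boldsymbol{\mu}_1,\boldsymbol{I}_2)+0.1N(\boldsymbol{\mu}_2,\boldsymbol{I}_2)\), a mixture of the same components as \(F_0\) with different weights. Hence \(G\in\mathcal{F}(F_0)\) and \(H_0\) is true.

library(truh)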
truh.3 = truh(V3,U,B=200)
truh.3$pval
