demo-truh

This vignette provides a quick demo of the truh package. The example considered here is taken from Figure 3 of the paper: Trambak Banerjee, Bhaswar B. Bhattacharya, and Gourab Mukherjee, Ann. Appl. Stat. 14(4): 1777-1805 (December 2020) <DOI: 10.1214/20-AOAS1362>.

We consider a nonparametric two-sample testing problem in which the d-dimensional baseline (or uninfected) sample U = (U₁, …, U_n) consists of i.i.d. observations with cdf F₀ and the d-dimensional treated (infected) sample V = (V₁, …, V_m) consists of i.i.d. observations with cdf G. We assume that the heterogeneity in the baseline population is reflected by K different subgroups, each having a unimodal distribution with a distinct mode, with cdfs F₁, …, F_K and mixing proportions w₁, …, w_K such that $$F_0=\sum_{a=1}^{K}w_aF_a~\text{where}~w_a\in(0,1)~\text{and}~\sum_{a=1}^{K}w_a=1. $$

The goal is to test the following composite hypothesis: H₀ : G ∈ ℱ(F₀) versus H₁ : G ∉ ℱ(F₀), where ℱ(F₀) is the convex hull of F₁, …, F_K. We take d = 2, n = 2000, m = 500 and sample U₁, …, U_n from F₀ where F₀ = 0.3N(0, I₂) + 0.3N(μ₁, I₂) + 0.4N(μ₂, I₂), with μ₁ = (0, −4) and μ₂ = (4, −2).
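For reference, the convex hull in the hypothesis can be written out explicitly. This is the standard definition of a convex hull applied to F₁, …, F_K; the weights λ_a are auxiliary symbols introduced here for illustration and are not used elsewhere in this vignette.

$$\mathcal{F}(F_0)=\Big\{\sum_{a=1}^{K}\lambda_aF_a~:~\lambda_a\ge 0,~\sum_{a=1}^{K}\lambda_a=1\Big\}. $$

In particular, F₀ itself belongs to ℱ(F₀) (take λ_a = w_a), and so does each individual component F_a (take λ_a = 1 and all other weights 0).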

n = 2000
d = 2

#Sampling the baseline (uninfected)
set.seed(1)
p<-runif(n,0,1)
set.seed(10)
U<- (p<=0.3)*matrix(rnorm(d*n),n,d)+
  (p>0.3 & p<=0.6)*cbind(matrix(rnorm(n),n,1),
                matrix(rnorm(n,-4),n,1))+
  (p>0.6)*cbind(matrix(rnorm(n,4),n,1),
          matrix(rnorm(n,-2),n,1))
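As a sanity check on the sampling scheme, the sketch below repeats the baseline draw and compares the empirical mixing proportions and component means with the nominal values 0.3, 0.3, 0.4 and 0, μ₁, μ₂. The names `comp`, `emp_w` and `emp_mu` are introduced here for illustration only and are not part of the truh package.

```r
# Repeat the baseline draw with the same seeds as in the vignette code
n <- 2000
d <- 2
set.seed(1)
p <- runif(n, 0, 1)
set.seed(10)
U <- (p <= 0.3) * matrix(rnorm(d * n), n, d) +
  (p > 0.3 & p <= 0.6) * cbind(matrix(rnorm(n), n, 1),
                               matrix(rnorm(n, -4), n, 1)) +
  (p > 0.6) * cbind(matrix(rnorm(n, 4), n, 1),
                    matrix(rnorm(n, -2), n, 1))

# Component label of each row, using the same thresholds on p
# (intervals are right-closed, matching p <= 0.3 and p <= 0.6)
comp <- cut(p, breaks = c(0, 0.3, 0.6, 1),
            labels = c("F1", "F2", "F3"), include.lowest = TRUE)

# Empirical mixing proportions; nominal values are 0.3, 0.3, 0.4
emp_w <- as.numeric(table(comp)) / n

# Empirical component means (one row per component, one column per
# coordinate); nominal values are (0, 0), (0, -4) and (4, -2)
emp_mu <- apply(U, 2, function(col) tapply(col, comp, mean))

print(round(emp_w, 3))
print(round(emp_mu, 2))
```

The empirical values should be close to, but not exactly equal to, the nominal ones because of sampling variability.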

To sample V₁, …, V_m we consider three settings for G.

Setting 1: G = N(μ₂, I₂) which is the third component cdf of F₀. In this setting clearly G ∈ ℱ(F₀) and the null hypothesis H₀ is true.

# Sampling the treated (infected)
m = 500
set.seed(50)
V1<-cbind(matrix(rnorm(m,4),m,1),
          matrix(rnorm(m,-2),m,1))

#Scatter plot of the data
grp = c(rep('Baseline',n),
                    rep('Treated',m))
plot(c(U[,1],V1[,1]), c(U[,2],V1[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))

Setting 2: G = 0.5N(μ₃, I₂) + 0.5N(μ₄, I₂) where μ₃ = 0.25μ₁ + 0.5μ₂ = (2, −2) and μ₄ = (3, 3), the component means used in the code below. Clearly in this case G ∉ ℱ(F₀).

# Sampling the treated (infected)
m = 500
set.seed(20)
q<-runif(m,0,1)
set.seed(50)
V2<-(q<=0.5)*cbind(matrix(rnorm(m,2),m,1),
          matrix(rnorm(m,-2),m,1))+
  (q>0.5)*cbind(matrix(rnorm(m,3),m,1),
          matrix(rnorm(m,3),m,1))

#Scatter plot of the data
plot(c(U[,1],V2[,1]), c(U[,2],V2[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))

Setting 3: G = 0.8N(0, I₂) + 0.1N(μ₁, I₂) + 0.1N(μ₂, I₂). This is the most interesting setting as here G ∈ ℱ(F₀) but G ≠ F₀ because the mixing weights differ.
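To make the comparison explicit, write F₁ = N(0, I₂), F₂ = N(μ₁, I₂) and F₃ = N(μ₂, I₂). This labelling is an assumption made here for illustration and follows the order of the components of F₀. Then

$$G=0.8F_1+0.1F_2+0.1F_3~\text{whereas}~F_0=0.3F_1+0.3F_2+0.4F_3. $$

The weights (0.8, 0.1, 0.1) are nonnegative and sum to one, so G is a convex combination of F₁, F₂, F₃, yet they differ from the baseline weights (0.3, 0.3, 0.4).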

# Sampling the treated (infected)
m = 500
set.seed(20)
q<-runif(m,0,1)
set.seed(50)
V3<-(q<=0.8)*matrix(rnorm(d*m),m,d)+
  (q>0.8 & q<=0.9)*cbind(matrix(rnorm(m),m,1),
                matrix(rnorm(m,-4),m,1))+
  (q>0.9)*cbind(matrix(rnorm(m,4),m,1),
          matrix(rnorm(m,-2),m,1))

#Scatter plot of the data
plot(c(U[,1],V3[,1]), c(U[,2],V3[,2]),
     pch = 19,
     col = factor(grp),
     xlab = 'X_1',
     ylab = 'X_2')

# Legend
legend("topright",
       legend = levels(factor(grp)),
       pch = 19,
       col = factor(levels(factor(grp))))
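The three scatter plots above repeat the same plotting and legend code. One way to avoid the repetition is a small helper, sketched below. `plot_samples` is a hypothetical function written for this illustration and is not part of truh; the toy matrices `U_toy` and `V_toy` only serve to make the block runnable on its own.

```r
# Hypothetical helper (not part of truh): scatter plot of a baseline
# sample U and a treated sample V, colored by group, with a legend
plot_samples <- function(U, V, main = NULL) {
  grp <- factor(c(rep("Baseline", nrow(U)),
                  rep("Treated", nrow(V))))
  plot(c(U[, 1], V[, 1]), c(U[, 2], V[, 2]),
       pch = 19,
       col = as.integer(grp),   # color 1 = Baseline, color 2 = Treated
       xlab = "X_1",
       ylab = "X_2",
       main = main)
  legend("topright",
         legend = levels(grp),
         pch = 19,
         col = seq_along(levels(grp)))
  invisible(grp)
}

# Toy data, for illustration only
set.seed(1)
U_toy <- matrix(rnorm(200), 100, 2)
V_toy <- matrix(rnorm(100, 4), 50, 2)
grp_toy <- plot_samples(U_toy, V_toy, main = "Toy example")
```

With the objects from this vignette in the workspace, the same helper could be called as `plot_samples(U, V1)`, `plot_samples(U, V2)` and `plot_samples(U, V3)`.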

Let us now execute the truh testing procedure for these scenarios. Recall that the goal is to test the following composite hypothesis: H₀ : G ∈ ℱ(F₀) versus H₁ : G ∉ ℱ(F₀).

Setting 1: Here we know that G ∈ ℱ(F₀), since G is the third component of F₀, and so H₀ is true.

library(truh)
truh.1 = truh(V1,U,B=200)
truh.1$pval

## [1] 0.375

So, truh fails to reject the null hypothesis.

Setting 2: Here we know that G ∉ ℱ(F₀) and so H₀ is false.

library(truh)
truh.2 = truh(V2,U,B=200)
truh.2$pval

## [1] 0

We see that truh rejects the null hypothesis.

Setting 3: Here G ∈ ℱ(F₀) but G ≠ F₀. The null hypothesis H₀ is true in this setting.

library(truh)
truh.3 = truh(V3,U,B=200)
truh.3$pval

## [1] 0.205

In this case, truh makes the correct decision and fails to reject H₀.
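The three decisions can be collected in a short table. The sketch below hard-codes the p-values reported above and compares them with a significance level of 0.05. The level is an assumption made here for illustration, since this vignette does not state one, and the object names are hypothetical.

```r
# Assumed significance level; not specified in this vignette
alpha <- 0.05

# p-values as reported above for the three settings
pvals <- c(setting1 = 0.375, setting2 = 0, setting3 = 0.205)

# Ground truth for each setting, as described in the text
h0_true <- c(setting1 = TRUE, setting2 = FALSE, setting3 = TRUE)

# Reject H0 when the p-value falls below alpha
reject <- pvals < alpha

# A decision is correct when H0 is rejected exactly when it is false
summary_tab <- data.frame(pval = pvals,
                          H0_true = h0_true,
                          reject = reject,
                          correct = reject != h0_true)
print(summary_tab)
```

Under this assumed level, H₀ is rejected only in Setting 2, which agrees with the ground truth in all three settings.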