Package 'truh'

Title: An R package for Two-Sample Nonparametric Testing Under Heterogeneity
Description: This R package implements the TRUH test statistic for two-sample testing under heterogeneity. TRUH accounts for the underlying heterogeneity and imbalance in the samples, and provides a conservative test for the composite null hypothesis that the two samples arise from the same mixture distribution but may differ in their mixing weights. See Banerjee, Bhattacharya, and Mukherjee (2020), Ann. Appl. Stat. 14(4): 1777-1805, <DOI: 10.1214/20-AOAS1362>, for more details.
Authors: Nathan Smith [aut, cre], Trambak Banerjee [aut], Bhaswar Bhattacharya [aut], Gourab Mukherjee [aut]
Maintainer: Nathan Smith <[email protected]>
License: GPL (>=3)
Version: 1.0.0
Built: 2024-11-09 03:53:17 UTC
Source: https://github.com/natesmith07/truh

Help Index


Nearest neighbor computation for the TRUH statistic

Description

For a given d-dimensional vector y, this function finds the nearest neighbor of y in an n×d matrix U.

Usage

nearest(y, U, n, d)

Arguments

y

a d-dimensional vector.

U

an n×d matrix, where n is the sample size and d is the dimension of each sample.

n

the sample size.

d

dimension of each sample.

Value

  1. d1 - nearest neighbor of y in U

  2. d2 - nearest neighbor of d1 in U
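
The two-step lookup described above can be sketched in base R as a brute-force search. This is an illustration only, not the package's implementation: the Euclidean metric, the helper name nearest_sketch, and the returned field names are all assumptions, and the second neighbor is searched among the remaining rows of U.

```r
# Hypothetical helper: brute-force nearest-neighbor search (Euclidean distance assumed).
nearest_sketch <- function(y, U) {
  # Distance from y to every row of U.
  dist_y <- sqrt(rowSums(sweep(U, 2, y)^2))
  i1 <- which.min(dist_y)              # row of U closest to y
  # Distance from that row to every row of U, excluding the row itself.
  dist_1 <- sqrt(rowSums(sweep(U, 2, U[i1, ])^2))
  dist_1[i1] <- Inf
  i2 <- which.min(dist_1)              # row of U closest to row i1
  list(index1 = i1, dist1 = dist_y[i1], index2 = i2, dist2 = dist_1[i2])
}

set.seed(2)
U <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)
set.seed(1)
y <- rnorm(3)
res <- nearest_sketch(y, U)
```

A brute-force scan costs one pass over U per query, which is adequate for small n; a spatial index would be the usual choice for large samples.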

See Also

truh

Examples

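library(truh)
n <- 100  # number of rows (samples) in U
d <- 3    # dimension of each sample
set.seed(1)
y <- rnorm(d)  # query vector of length d
set.seed(2)
U <- matrix(rnorm(n * d), nrow = n, ncol = d)
out <- nearest(y, U, n, d)  # returns d1 and d2 (see Value)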

TRUH test statistic

Description

TRUH test statistic for nonparametric two-sample testing under heterogeneity.

Usage

truh(V, U, B, fc = 1, ncores = 2)

Arguments

V

an m×d matrix, where m is the sample size and d is the dimension of each sample.

U

an n×d matrix, where n is the sample size and d is the dimension of each sample, with m ≪ n.

B

number of bootstrap samples.

fc

fold change constant. The default value is 1. See equation (2.8) of the referenced paper for more details.

ncores

the number of computing cores available. The default value is 2.
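
One way to choose a value for ncores is to query the machine with detectCores() from the base parallel package. The snippet below is a sketch under the assumption that leaving one core free is desirable; the package itself does not prescribe this rule.

```r
# Pick a core count for the ncores argument (a sketch, not a package requirement).
available <- parallel::detectCores()   # may return NA on some platforms
if (is.na(available)) available <- 1L
ncores <- max(1L, available - 1L)      # leave one core free for the session
```

The resulting value could then be passed as the ncores argument of truh.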

Value

  1. teststat - TRUH test statistic.

  2. k.hat - number of clusters detected in the uninfected sample.

  3. pval - the maximum p-value across the detected clusters.

  4. pval_all - p-value for each cluster.

  5. dist.null_all - the approximate bootstrap-based null distribution.
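
The relationship between the per-cluster p-values and the reported maximum can be sketched with made-up numbers. Everything below is hypothetical: the observed statistic, the layout of the null draws (one column per cluster), the upper-tail p-value formula with a +1 correction, and the 0.05 level are assumptions for illustration, not the package's internals. Only the rule that pval is the maximum of the per-cluster p-values comes from the description above.

```r
# Hypothetical inputs: an observed statistic and B bootstrap null draws for each of 3 clusters.
set.seed(1)
B <- 200
teststat <- 2.5
null_draws <- matrix(rnorm(B * 3), nrow = B, ncol = 3)  # one column per cluster

# Assumed p-value convention: share of null draws at least as large as the statistic,
# with a +1 correction so the p-value is never exactly zero.
pval_all <- (1 + colSums(null_draws >= teststat)) / (B + 1)

# The reported p-value is the largest of the per-cluster p-values.
pval <- max(pval_all)
reject <- pval <= 0.05  # reject only if every cluster's p-value is small
```

Taking the maximum means the null is rejected only when the statistic looks extreme relative to every cluster, which is what makes the test conservative for the composite null.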

References

Banerjee, Trambak, Bhaswar B. Bhattacharya, and Gourab Mukherjee. "A nearest-neighbor based nonparametric test for viral remodeling in heterogeneous single-cell proteomic data." The Annals of Applied Statistics 14, no. 4 (2020): 1777-1805.

See Also

nearest

Examples

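library(truh)
n <- 500  # size of the larger sample U
m <- 10   # size of the smaller sample V (m much smaller than n)
d <- 3    # dimension of each sample
set.seed(1)
V <- matrix(rnorm(m * d), nrow = m, ncol = d)
set.seed(2)
U <- matrix(rnorm(n * d), nrow = n, ncol = d)
out <- truh(V, U, B = 100)  # B = 100 bootstrap samples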