Title: | Multiscale Graph Correlation |
---|---|
Description: | Multiscale Graph Correlation (MGC) is a framework developed by Vogelstein et al. (2019) <DOI:10.7554/eLife.41690> that extends global correlation procedures to be multiscale; consequently, MGC tests typically require far fewer samples than existing methods for a wide variety of dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC provides a simple and elegant multiscale characterization of the potentially complex latent geometry underlying the relationship. |
Authors: | Eric Bridgeford [aut, cre], Censheng Shen [aut], Shangsi Wang [aut], Joshua Vogelstein [ths] |
Maintainer: | Eric Bridgeford <[email protected]> |
License: | GPL-2 |
Version: | 2.0.2 |
Built: | 2025-02-19 04:30:41 UTC |
Source: | https://github.com/neurodata/r-mgc |
ConnCompLabel is a one-pass implementation of connected components labelling. Here it is applied to identify disjoint patches within a distribution.
The raster matrix can be a raster of class 'asc'
(adehabitat package), 'RasterLayer' (raster package) or
'SpatialGridDataFrame' (sp package).
ConnCompLabel(mat)
mat |
is a binary matrix of data with 0 representing background and 1 representing environment of interest. NA values are acceptable. The matrix can be a raster of class 'asc' (this & adehabitat package), 'RasterLayer' (raster package) or 'SpatialGridDataFrame' (sp package) |
A matrix of the same dim and class as mat, in which unique components (individual patches) are numbered 1:n, with 0 remaining the background value.
Jeremy VanDerWal [email protected]
Chang, F., C.-J. Chen, and C.-J. Lu. 2004. A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93:206-220.
# define a simple binary matrix
tmat = { matrix(c( 0,0,0,1,0,0,1,1,0,1,
                   0,0,1,0,1,0,0,0,0,0,
                   0,1,NA,1,0,1,0,0,0,1,
                   1,0,1,1,1,0,1,0,0,1,
                   0,1,0,1,0,1,0,0,0,1,
                   0,0,1,0,1,0,0,1,1,0,
                   1,0,0,1,0,0,1,0,0,1,
                   0,1,0,0,0,1,0,0,0,1,
                   0,0,1,1,1,0,0,0,0,1,
                   1,1,1,0,0,0,0,0,0,1),nr=10,byrow=TRUE) }
# do the connected component labelling
ccl.mat = ConnCompLabel(tmat)
ccl.mat
image(t(ccl.mat[10:1,]),col=c('grey',rainbow(length(unique(ccl.mat))-1)))
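To make the idea of patch labelling concrete, here is a minimal base-R sketch that labels patches with a stack-based flood fill. It is not the contour-tracing algorithm of Chang et al. (2004) and not the package's implementation. The function name `label_components`, the use of 8-neighbour connectivity, and the choice to keep NA cells as NA are all assumptions made for illustration.

```r
# Illustrative flood-fill labelling of patches of 1s in a binary matrix.
# Assumptions: 8-neighbour connectivity; NA cells stay NA; 0 stays background.
label_components <- function(mat) {
  nr <- nrow(mat); nc <- ncol(mat)
  labels <- matrix(0L, nr, nc)
  labels[is.na(mat)] <- NA
  current <- 0L
  for (start in which(mat == 1)) {        # linear indices of foreground cells
    if (labels[start] != 0L) next         # already part of a labelled patch
    current <- current + 1L
    labels[start] <- current
    stack <- start
    while (length(stack) > 0) {
      idx <- stack[length(stack)]         # pop the last cell
      stack <- stack[-length(stack)]
      ri <- (idx - 1L) %% nr + 1L         # row of the popped cell
      ci <- (idx - 1L) %/% nr + 1L        # column of the popped cell
      for (dr in -1:1) for (dc in -1:1) {
        rr <- ri + dr; cc <- ci + dc
        if (rr < 1 || rr > nr || cc < 1 || cc > nc) next
        if (isTRUE(mat[rr, cc] == 1) && labels[rr, cc] == 0L) {
          labels[rr, cc] <- current       # claim the neighbour for this patch
          stack <- c(stack, (cc - 1L) * nr + rr)
        }
      }
    }
  }
  labels
}

m <- matrix(c(1, 1, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 1), nrow = 3, byrow = TRUE)
label_components(m)
```

The patch numbering depends on the scan order, so the labels need not match those returned by ConnCompLabel even when the patches themselves agree.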
A function to simulate data with the same mean that spreads as class id increases.
discr.sims.cross( n, d, K, signal.scale = 10, non.scale = 1, mean.scale = 0, rotate = FALSE, class.equal = TRUE, ind = FALSE )
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to |
non.scale |
the scaling for the non-signal dimensions. Defaults to |
mean.scale |
the magnitude of the difference in the means between the two classes.
If a mean scale is requested, |
rotate |
whether to apply a random rotation. Defaults to |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or inequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
Eric Bridgeford
library(mgc)
sim <- discr.sims.cross(100, 3, 2)
A function to simulate multi-class data with an Exponential class-mean trend.
discr.sims.exp( n, d, K, signal.scale = 1, signal.lshift = 1, non.scale = 1, rotate = FALSE, class.equal = TRUE, ind = FALSE )
n |
the number of samples. |
d |
the number of dimensions. The first dimension will be the signal dimension; the remainders noise. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to |
signal.lshift |
the location shift for the signal dimension between the classes. Defaults to |
non.scale |
the scaling for the non-signal dimensions. Defaults to |
rotate |
whether to apply a random rotation. Defaults to |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or inequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
Eric Bridgeford
A function to simulate data with the same mean that spreads as class id increases.
discr.sims.fat_tails( n, d, K, signal.scale = 1, rotate = FALSE, class.equal = TRUE, ind = FALSE )
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to |
rotate |
whether to apply a random rotation. Defaults to |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or inequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
Eric Bridgeford
library(mgc)
sim <- discr.sims.fat_tails(100, 3, 2)
A function to simulate multi-class data with a linear class-mean trend. The signal dimension is the dimension carrying all of the between-class difference, and the non-signal dimensions are noise.
discr.sims.linear( n, d, K, signal.scale = 1, signal.lshift = 1, non.scale = 1, rotate = FALSE, class.equal = TRUE, ind = FALSE )
n |
the number of samples. |
d |
the number of dimensions. The first dimension will be the signal dimension; the remainders noise. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to |
signal.lshift |
the location shift for the signal dimension between the classes. Defaults to |
non.scale |
the scaling for the non-signal dimensions. Defaults to |
rotate |
whether to apply a random rotation. Defaults to |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or inequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
Eric Bridgeford
A function to simulate data with the same mean with radial symmetry as class id increases.
discr.sims.radial( n, d, K, er.scale = 0.1, r = 1, class.equal = TRUE, ind = FALSE )
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
er.scale |
the scaling for the error of the samples. Defaults to |
r |
the radial spacing between each class. Defaults to |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or inequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
Eric Bridgeford
library(mgc)
sim <- discr.sims.radial(100, 3, 2)
A function for computing the discriminability from a distance matrix and a set of associated labels.
discr.stat( X, Y, is.dist = FALSE, dist.xfm = mgc.distance, dist.params = list(method = "euclidean"), dist.return = NULL, remove.isolates = TRUE )
X |
is interpreted as:
|
Y |
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
A list containing the following:
discr |
the discriminability statistic. |
rdf |
the rdfs for each sample. |
For more details see the help vignette:
vignette("discriminability", package = "mgc")
Eric Bridgeford
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
sim <- discr.sims.linear(100, 10, K=2)
X <- sim$X; Y <- sim$Y
discr.stat(X, Y)$discr
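As a rough illustration of what a discriminability statistic measures, the base-R sketch below computes, for every pair of same-class samples, the fraction of other-class samples that are not closer than the same-class partner, and averages those values. This is one common formulation and only a sketch: the name `discr_sketch`, the tie weighting of 0.5, and the way samples without a same-class partner are skipped are assumptions, and the package's own handling may differ.

```r
# Illustrative discriminability from a distance matrix D and labels Y.
# Assumption: ties between a within-class and a between-class distance count 0.5.
discr_sketch <- function(D, Y) {
  rdf <- c()
  for (i in seq_along(Y)) {
    same <- setdiff(which(Y == Y[i]), i)   # other samples of the same class
    other <- which(Y != Y[i])              # samples of every other class
    if (length(same) == 0 || length(other) == 0) next
    for (j in same) {
      closer <- sum(D[i, other] < D[i, j]) + 0.5 * sum(D[i, other] == D[i, j])
      rdf <- c(rdf, 1 - closer / length(other))
    }
  }
  mean(rdf)
}

X <- matrix(c(0, 0.1, 10, 10.1), ncol = 1)   # two tight, well-separated classes
Y <- c(1, 1, 2, 2)
discr_sketch(as.matrix(dist(X)), Y)          # 1 for perfectly separated classes
```

Values near 1 indicate that samples sit closer to members of their own class than to members of other classes.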
A function that performs a one-sample test for whether the discriminability differs from random chance.
discr.test.one_sample( X, Y, is.dist = FALSE, dist.xfm = mgc.distance, dist.params = list(method = "euclidean"), dist.return = NULL, remove.isolates = TRUE, nperm = 500, no_cores = 1 )
X |
is interpreted as:
|
Y |
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
nperm |
the number of permutations to perform. Defaults to |
no_cores |
the number of cores to use for permutation test. Defaults to |
A list containing the following:
stat |
the discriminability of the data. |
null |
the discriminability scores under the null, computed via permutation. |
p.value |
the p-value associated with the permutation test. |
Performs a test of whether an observed discriminability is significantly different from chance, as described in Bridgeford et al. (2019). With $\hat{D}_X$ the sample discriminability of $X$: $H_0: D_X = D_0$ and: $H_A: D_X > D_0$, where $D_0$ is the discriminability that would be observed by random chance.
Eric Bridgeford
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
## Not run: 
require(mgc)
n = 100; d=5
# simulation with a large difference between the classes
# meaning they are more discriminable
sim <- discr.sims.linear(n=n, d=d, K=2, signal.lshift=10)
X <- sim$X; Y <- sim$Y
# p-value is small
discr.test.one_sample(X, Y)$p.value
## End(Not run)
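The logic of a one-sided permutation test like this can be sketched in base R: compute the statistic on the observed labels, recompute it on shuffled labels to get a null distribution, and report the share of null values at least as large as the observed one. The helper names `discr_sketch` and `perm_test_sketch`, the simplified statistic (ties ignored), and the add-one p-value correction are assumptions for illustration, not the package's implementation.

```r
# Simplified discriminability: mean fraction of other-class samples that are
# not strictly closer than a same-class partner (illustrative only).
discr_sketch <- function(D, Y) {
  rdf <- unlist(lapply(seq_along(Y), function(i) {
    same <- setdiff(which(Y == Y[i]), i)
    other <- which(Y != Y[i])
    vapply(same, function(j) 1 - mean(D[i, other] < D[i, j]), numeric(1))
  }))
  mean(rdf)
}

# Permutation test: shuffle the labels to approximate the chance-level null.
perm_test_sketch <- function(D, Y, nperm = 199) {
  stat <- discr_sketch(D, Y)
  null <- replicate(nperm, discr_sketch(D, sample(Y)))
  # add-one correction keeps the p-value strictly positive (an assumption here)
  list(stat = stat, null = null,
       p.value = (sum(null >= stat) + 1) / (nperm + 1))
}

set.seed(1)
X <- matrix(c(rnorm(10, mean = 0), rnorm(10, mean = 10)), ncol = 1)
Y <- rep(1:2, each = 10)
res <- perm_test_sketch(as.matrix(dist(X)), Y)
res$p.value
```

With only `nperm` shuffles the smallest attainable p-value is 1 / (nperm + 1), which is one reason to use more permutations on real data.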
A function that takes two sets of paired data and tests whether the first set is more, less, or not equally discriminable compared with the second.
discr.test.two_sample( X1, X2, Y, dist.xfm = mgc.distance, dist.params = list(method = "euclidian"), dist.return = NULL, remove.isolates = TRUE, nperm = 500, no_cores = 1, alt = "greater" )
X1 |
is interpreted as a |
X2 |
is interpreted as a |
Y |
|
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
nperm |
the number of permutations for the permutation test. Defaults to |
no_cores |
the number of cores to use for the permutations. Defaults to |
alt |
the alternative hypothesis. Can be that first dataset is more discriminable ( |
A list containing the following:
stat |
the observed test statistic, which is the difference in discriminability of X1 vs X2. |
discr |
the discriminabilities for each of the two data sets, as a list. |
null |
the null distribution of the test statistic, computed via permutation. |
p.value |
The p-value associated with the test. |
alt |
The alternative hypothesis for the test. |
A function that performs a two-sample test for whether the discriminability of one dataset differs from that of another, as described in Bridgeford et al. (2019). With $\hat{D}_{X_1}$ the sample discriminability of one approach, and $\hat{D}_{X_2}$ the sample discriminability of another approach: $H_0: D_{X_1} = D_{X_2}$ and: $H_A: D_{X_1} > D_{X_2}$. Also implemented are tests of $<$ and $\neq$.
Eric Bridgeford
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
## Not run: 
require(mgc)
require(MASS)
n = 100; d=5
# generate two subjects truths; true difference btwn
# subject 1 (column 1) and subject 2 (column 2)
mus <- cbind(c(0, 0), c(1, 1))
Sigma <- diag(2)  # dimensions are independent
# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(1:dim(mus)[2], function(k) {mvrnorm(n=50, mus[,k], 0.5*Sigma)}))
X2 <- do.call(rbind, lapply(1:dim(mus)[2], function(k) {mvrnorm(n=50, mus[,k], 2*Sigma)}))
Y <- do.call(c, lapply(1:2, function(i) rep(i, 50)))
# X1 should be more discriminable, as less noise
discr.test.two_sample(X1, X2, Y, alt="greater")$p.value  # p-value is small
## End(Not run)
Transform the distance matrices, with column-wise ranking if needed.
mgc.dist.xfm(X, Y, option = "mgc", optionRk = TRUE)
X |
|
Y |
|
option |
is a string that specifies which global correlation to build upon. Defaults to
|
optionRk |
is a string that specifies whether ranking within column is computed or not. If |
A list containing the following:
A |
|
B |
|
RX |
|
RY |
|
C. Shen
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
Dx <- as.matrix(dist(data$X), nrow=n); Dy <- as.matrix(dist(data$Y), nrow=n)
dt <- mgc.dist.xfm(Dx, Dy)
A function that returns a distance matrix given a collection of observations.
mgc.distance(X, method = "euclidean")
X |
|
method |
the method for computing distances. Defaults to |
a [n x n] distance matrix indicating the pairwise distances between all samples passed in.
Eric Bridgeford
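For orientation, the shape of such a pairwise distance matrix can be reproduced with base R's `dist`. This sketch only shows the [n x n] layout for Euclidean distances; whether `mgc.distance` delegates to `dist` internally is not stated here and should be treated as an assumption.

```r
# Three samples (rows) in two dimensions.
X <- rbind(c(0, 0), c(3, 4), c(6, 8))

# Pairwise Euclidean distances as a full symmetric [n x n] matrix.
D <- as.matrix(dist(X, method = "euclidean"))

dim(D)       # one row and one column per sample
D[1, 2]      # distance between samples 1 and 2 (a 3-4-5 triangle)
```

The diagonal is zero and the matrix is symmetric, which is the form assumed when a distance matrix is passed to the other functions in place of raw observations.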
MGC K Sample Testing provides a wrapper for MGC sample testing under the constraint that the Ys here are categorical labels with K possible sample ids. This function uses a 0-1 loss for the Ys (one-hot encoding).
mgc.ksample(X, Y, mgc.opts = list(), ...)
X |
is interpreted as:
|
Y |
|
mgc.opts |
Arguments to pass to MGC, as a named list. See |
... |
trailing args. |
A list containing the following:
p.value |
P-value of MGC |
stat |
is the sample MGC statistic within |
pLocalCorr |
P-value of the local correlations by double matrix index |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
Eric Bridgeford
Youjin Lee, et al. "Network Dependence Testing via Diffusion Maps and Distance-Based Correlations." ArXiv (2019).
## Not run: 
library(mgc)
library(MASS)
n = 100; d = 2
# simulate 100 samples, where first 50 have mean [0,0] and second 50 have mean [1,1]
Y <- c(replicate(n/2, 0), replicate(n/2, 1))
X <- do.call(rbind, lapply(Y, function(y) {
  return(rnorm(d) + y)
}))
# p value is small
mgc.ksample(X, Y, mgc.opts=list(nperm=100))$p.value
## End(Not run)
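The link between a 0-1 loss on labels and one-hot encoding, mentioned in the description of mgc.ksample, can be checked in base R. In this sketch the label vector is made up for illustration, and nothing here is taken from the package's internals; it only shows that Euclidean distances between one-hot rows equal the 0-1 loss up to a constant factor of sqrt(2).

```r
Y <- c(0, 0, 1, 2)                         # hypothetical class labels
classes <- sort(unique(Y))

# One-hot encoding: one indicator column per class, [n x K].
onehot <- outer(Y, classes, "==") * 1

# 0-1 loss between labels: 0 if the labels match, 1 otherwise.
D01 <- outer(Y, Y, "!=") * 1

# Euclidean distance between one-hot rows.
D_onehot <- as.matrix(dist(onehot))

# Two different one-hot rows differ in exactly two coordinates,
# so their distance is sqrt(2); identical rows are at distance 0.
max(abs(D_onehot - sqrt(2) * D01))
```

Because the two distance matrices differ only by a constant scale, a distance-based test that is invariant to rescaling should treat them equivalently.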
Compute all local correlation coefficients in O(n^2 log n)
mgc.localcorr( X, Y, is.dist.X = FALSE, dist.xfm.X = mgc.distance, dist.params.X = list(method = "euclidean"), dist.return.X = NULL, is.dist.Y = FALSE, dist.xfm.Y = mgc.distance, dist.params.Y = list(method = "euclidean"), dist.return.Y = NULL, option = "mgc" )
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
option |
is a string that specifies which global correlation to build upon. Defaults to
|
A list containing the following:
corr |
consists of all local correlations within [-1,1] by double matrix index |
varX |
contains all local variances for X. |
varY |
contains all local variances for Y. |
C. Shen
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
lcor <- mgc.localcorr(data$X, data$Y)
Driver for MGC Local Correlations
mgc.localcorr.driver(DX, DY, option = "mgc")
DX |
the first distance matrix. |
DY |
the second distance matrix. |
option |
is a string that specifies which global correlation to build upon. Defaults to
|
A list containing the following:
corr |
consists of all local correlations within [-1,1] by double matrix index |
varX |
contains all local variances for X. |
varY |
contains all local variances for Y. |
C. Shen
Sample from the 2-ball in d-dimensions.
mgc.sims.2ball(n, d, r = 1, cov.scale = 0)
n |
the number of samples. |
d |
the number of dimensions. |
r |
the radius of the 2-ball. Defaults to |
cov.scale |
if desired, sample from 2-ball with error sigma. Defaults to |
the points sampled from the ball, as a [n, d] array.
Eric Bridgeford
library(mgc)
# sample 100 points from 3-d 2-ball with radius 2
X <- mgc.sims.2ball(100, 3, 2)
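One standard way to draw points uniformly from the interior of a ball in d dimensions is to pick a uniformly random direction and then scale the radius by U^(1/d). The sketch below shows that technique in base R. The function name `sample_ball` is hypothetical, and it is an assumption that this resembles what mgc.sims.2ball does; the `cov.scale` noise option is not reproduced.

```r
# Illustrative uniform sampling from a d-dimensional ball of radius r.
sample_ball <- function(n, d, r = 1) {
  Z <- matrix(rnorm(n * d), nrow = n)
  U <- Z / sqrt(rowSums(Z^2))        # normalise rows: uniform directions
  radii <- r * runif(n)^(1 / d)      # radius law that gives uniform volume density
  U * radii                          # row i is scaled by radii[i]
}

set.seed(1)
P <- sample_ball(100, 3, r = 2)
range(sqrt(rowSums(P^2)))            # all norms fall within [0, 2]
```

The exponent 1/d matters: drawing the radius uniformly on [0, r] would concentrate points near the centre.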
Sample from the 2-sphere in d-dimensions.
mgc.sims.2sphere(n, d, r, cov.scale = 0)
n |
the number of samples. |
d |
the number of dimensions. |
r |
the radius of the 2-sphere. Defaults to |
cov.scale |
if desired, sample from 2-sphere with error sigma. Defaults to |
the points sampled from the sphere, as a [n, d] array.
Eric Bridgeford
library(mgc)
# sample 100 points from 3-d 2-sphere with radius 2
X <- mgc.sims.2sphere(100, 3, 2)
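Sampling on the surface of a sphere can be sketched in base R by normalising Gaussian vectors, which gives uniformly distributed directions, and multiplying by the radius. The function name `sample_sphere` is hypothetical, and it is an assumption that mgc.sims.2sphere works along these lines; the `cov.scale` noise option is left out.

```r
# Illustrative uniform sampling on the surface of a sphere of radius r in d dimensions.
sample_sphere <- function(n, d, r = 1) {
  Z <- matrix(rnorm(n * d), nrow = n)
  r * Z / sqrt(rowSums(Z^2))         # each row is rescaled to have norm r
}

set.seed(1)
S <- sample_sphere(100, 3, r = 2)
range(sqrt(rowSums(S^2)))            # every point lies at distance r from the origin
```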
A function for generating a cubic simulation.
mgc.sims.cubic( n, d, eps = 80, ind = FALSE, a = -1, b = 1, c.coef = c(-12, 48, 128), s = 1/3 )
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the range of the data matrix. Defaults to |
b |
the upper limit for the range of the data matrix. Defaults to |
c.coef |
the coefficients for the cubic function, where the first value is the first order coefficient, the second value the quadratic coefficient, and the third the cubic coefficient. Defaults to |
s |
the scaling for the center of the cubic. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
, where:
and controls the noise for higher dimensions.
Eric Bridgeford
library(mgc)
result <- mgc.sims.cubic(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating an exponential simulation.
mgc.sims.exp(n, d, eps = 10, ind = FALSE, a = 0, b = 3)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the range of the data matrix. Defaults to |
b |
the upper limit for the range of the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
, where:
and controls the noise for higher dimensions.
Eric Bridgeford
library(mgc)
result <- mgc.sims.exp(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a joint-normal simulation.
mgc.sims.joint(n, d, eps = 0.5)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: ,
is the identity matrix of size
,
is the matrix of ones of size
.
Simulates
points from
, where:
,
and controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.joint(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a linear simulation.
mgc.sims.linear(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the range of the data matrix. Defaults to |
b |
the upper limit for the range of the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
, where:
and controls the noise for higher dimensions.
Eric Bridgeford
library(mgc)
result <- mgc.sims.linear(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a quadratic simulation.
mgc.sims.quad(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the data matrix. Defaults to |
b |
the upper limit for the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
n
points from where:
,
and controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.quad(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a spiral simulation.
mgc.sims.spiral(n, d, eps = 0.4, a = 0, b = 5)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
a |
the lower limit for the data matrix. Defaults to |
b |
the upper limit for the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: a random variable.
Simulates
points from
where:
if
i = d
, and otherwise
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.spiral(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a step function simulation.
mgc.sims.step(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the data matrix. Defaults to |
b |
the upper limit for the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
where:
,
and controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.step(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating an uncorrelated Bernoulli simulation.
mgc.sims.ubern(n, d, eps = 0.5, p = 0.5)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
p |
the Bernoulli probability. |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
where:
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.ubern(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
A function for generating a W-shaped simulation.
mgc.sims.wshape(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to |
ind |
whether to sample x and y independently. Defaults to |
a |
the lower limit for the data matrix. Defaults to |
b |
the upper limit for the data matrix. Defaults to |
a list containing the following:
X |
|
Y |
|
Given: is a weight-vector that scales with the dimensionality.
Simulates
points from
where:
,
,
and controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Eric Bridgeford
library(mgc)
result <- mgc.sims.wshape(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
The main function that computes the MGC measure between two datasets: it first computes all local correlations, then uses the maximal statistic among all local correlations based on thresholding.
mgc.stat( X, Y, is.dist.X = FALSE, dist.xfm.X = mgc.distance, dist.params.X = list(method = "euclidean"), dist.return.X = NULL, is.dist.Y = FALSE, dist.xfm.Y = mgc.distance, dist.params.Y = list(method = "euclidean"), dist.return.Y = NULL, option = "mgc" )
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
option |
is a string that specifies which global correlation to build upon. Defaults to
|
A list containing the following:
stat |
is the sample MGC statistic within |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
option |
specifies which global correlation was used |
C. Shen and Eric Bridgeford
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
mgc.stat.res <- mgc.stat(data$X, data$Y)
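To give a feel for the kind of global correlation that MGC builds upon, the base-R sketch below computes the classical sample distance correlation from double-centred distance matrices. It is only a sketch of a global statistic: the function name `dcor_sketch` is hypothetical, the centring used by the "mgc" option may differ, and no local scales or thresholding are involved.

```r
# Illustrative sample distance correlation (biased, V-statistic form).
dcor_sketch <- function(x, y) {
  center <- function(D) {
    # subtract row means and column means, add back the grand mean
    D - rowMeans(D)[row(D)] - colMeans(D)[col(D)] + mean(D)
  }
  A <- center(as.matrix(dist(x)))
  B <- center(as.matrix(dist(y)))
  dcov2 <- max(mean(A * B), 0)       # guard against tiny negative rounding error
  sqrt(dcov2 / sqrt(mean(A * A) * mean(B * B)))
}

x <- 1:10
dcor_sketch(x, 2 * x + 1)            # exact linear dependence
set.seed(1)
dcor_sketch(x, rnorm(10))            # unrelated noise, typically much lower
```

A global statistic like this uses all pairwise distances at once; the multiscale approach instead evaluates correlations restricted to nearest-neighbour scales and picks among them.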
Test of Dependence using MGC Approach.
mgc.test( X, Y, is.dist.X = FALSE, dist.xfm.X = mgc.distance, dist.params.X = list(method = "euclidean"), dist.return.X = NULL, is.dist.Y = FALSE, dist.xfm.Y = mgc.distance, dist.params.Y = list(method = "euclidean"), dist.return.Y = NULL, nperm = 1000, option = "mgc", no_cores = 1 )
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
nperm |
specifies the number of replicates to use for the permutation test. Defaults to |
option |
is a string that specifies which global correlation to build upon. Defaults to
|
no_cores |
the number of cores to use for the permutations. Defaults to |
A list containing the following:
p.value |
P-value of MGC |
stat |
is the sample MGC statistic within |
p.localCorr |
P-value of the local correlations by double matrix index. |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
option |
specifies which global correlation was used |
A test of independence using the MGC approach, described in Vogelstein et al. (2019). For ,
:
and:
Note that one should avoid reporting positive discoveries by minimizing individual p-values of local correlations, unless corrected for multiple hypotheses.
For details on usage see the help vignette:
vignette("mgc", package = "mgc")
Eric Bridgeford and C. Shen
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
## Not run: 
library(mgc)
n = 100; d = 2
data <- mgc.sims.linear(n, d)
# note: on real data, one would put nperm much higher (at least 100)
# nperm is set to 10 merely for demonstration purposes
result <- mgc.test(data$X, data$Y, nperm=10)
## End(Not run)
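The warning about picking out individual local p-values can be illustrated with base R's `p.adjust`. In this sketch the matrix `p_local` holds made-up values standing in for a matrix of local-correlation p-values; Bonferroni is just one possible correction, chosen here for simplicity rather than because the package prescribes it.

```r
# Hypothetical local p-values indexed by two scales (values invented for illustration).
p_local <- matrix(c(0.001, 0.20, 0.04, 0.50), nrow = 2)

# Bonferroni correction across all entries, reshaped back into a matrix.
p_adj <- matrix(p.adjust(p_local, method = "bonferroni"), nrow = nrow(p_local))

# Entries that remain below 0.05 after correction.
which(p_adj < 0.05, arr.ind = TRUE)
```

Here the uncorrected entry 0.04 would look significant on its own but does not survive the correction, which is the situation the note above cautions against.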