Title: | Causal Batch Effects |
---|---|
Description: | Software for detecting and removing group-level effects from high-dimensional scientific data; combined with additional assumptions, these tools allow for causal conclusions, as described in our manuscripts Bridgeford et al. (2024) <doi:10.1101/2021.09.03.458920> and Bridgeford et al. (2023) <doi:10.48550/arXiv.2307.13868>. Also provides utilities for generating simulations and for balancing covariates across multiple groups/batches of data via matching and propensity trimming for more than two groups. |
Authors: | Eric W. Bridgeford [aut, cre], Michael Powell [ctb], Brian Caffo [ctb], Joshua T. Vogelstein [ctb] |
Maintainer: | Eric W. Bridgeford <[email protected]> |
License: | GPL-3 |
Version: | 1.4.0 |
Built: | 2025-03-20 20:42:41 UTC |
Source: | https://github.com/neurodata/causal_batch |
A function for performing k-way matching using the MatchIt package. It retains samples which have corresponding matches across all other treatment levels.
cb.align.kway_match( Ts, Xs, match.form, reference = NULL, match.args = list(method = "nearest", exact = NULL, replace = FALSE, caliper = 0.1), retain.ratio = 0.05 )
Ts |
|
Xs |
|
match.form |
A formula of columns from |
reference |
the name of the reference/control batch, against which to match. Defaults to NULL. |
match.args |
A named list of arguments for the |
retain.ratio |
If the number of samples retained is less than |
a list, containing the following:
Retained.Ids
an [m] vector consisting of the sample ids of the n original samples that were retained after matching.
Reference
the reference batch.
For more details see the help vignette:
vignette("causal_balancing", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Imaging Neuroscience (2025).
Daniel E. Ho, et al. "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference" JSS (2011).
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=100, err=1/8, unbalancedness=1.5)
cb.align.kway_match(sim$Ts, data.frame(Covar=sim$Xs), "Covar")
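To make the retention rule concrete, here is a minimal base-R sketch of k-way matching on a single numeric covariate. It is not the package's implementation: cb.align.kway_match delegates the matching itself to MatchIt, whereas this sketch uses greedy 1:1 nearest-neighbour matching without replacement. The helper name kway_match_sketch and its convention of expressing the caliper in covariate-SD units are assumptions made for illustration. A reference sample is kept only if it finds an acceptable match in every other group.

```r
# Hypothetical helper (not part of causalBatch): greedy 1:1 nearest-neighbour
# k-way matching on one numeric covariate, without replacement.
# Assumption: the caliper is expressed in units of the covariate's SD.
kway_match_sketch <- function(Ts, x, reference, caliper = 0.1) {
  caliper.abs <- caliper * sd(x)
  others <- setdiff(unique(Ts), reference)
  available <- rep(TRUE, length(Ts))  # samples not yet used as a match
  retained <- integer(0)
  for (i in which(Ts == reference)) {
    matches <- integer(0)
    for (k in others) {
      cand <- which(Ts == k & available)
      if (length(cand) == 0) break
      d <- abs(x[cand] - x[i])
      if (min(d) > caliper.abs) break  # no acceptable match in group k
      matches <- c(matches, cand[which.min(d)])
    }
    # keep the reference sample only if every other group supplied a match
    if (length(matches) == length(others)) {
      available[matches] <- FALSE
      retained <- c(retained, i, matches)
    }
  }
  sort(retained)
}

# Toy data: only the first sample of group "a" has close matches in "b" and "c"
Ts <- c("a", "a", "b", "b", "c", "c")
x <- c(0, 10, 0.01, 20, 0.02, 30)
kway_match_sketch(Ts, x, reference = "a")  # 1 3 5
```

Samples without support in some group (here, the second sample of group "a") are dropped rather than forced into a poor match, which mirrors the intent of retaining only samples with matches across all treatment levels.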
A function for implementing the vector matching procedure, a pre-processing step for causal conditional distance correlation. Uses propensity scores to strategically include/exclude samples from subsequent inference, based on whether (or not) there are samples with similar propensity scores across all treatment levels (conceptually, a k-way "propensity trimming"). It is imperative that this function is used in conjunction with domain expertise to ensure that the covariates are not colliders, and that the system satisfies the strong ignorability condition to derive causal conclusions.
cb.align.vm_trim( Ts, Xs, prop.form = NULL, retain.ratio = 0.05, ddx = FALSE, reference = NULL )
Ts |
|
Xs |
|
prop.form |
a formula specifying a propensity scoring model. Defaults to NULL. |
retain.ratio |
If the number of samples retained is less than |
ddx |
whether to show additional diagnosis messages. Defaults to FALSE. |
reference |
the name of a reference label, against which to align other labels. Defaults to NULL. |
an [m] vector containing the indices of the samples retained after vector matching.
For more details see the help vignette:
vignette("causal_balancing", package = "causalBatch")
Eric W. Bridgeford
Michael J. Lopez, et al. "Estimation of Causal Effects with Multiple Treatments" Statistical Science (2017).
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=100, err=1/8, unbalancedness=3)
cb.align.vm_trim(sim$Ts, sim$Xs)
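The trimming rule described by Lopez et al. (2017) can be illustrated with a two-group sketch in base R. The idea is to estimate propensity scores, then keep only samples whose score for every treatment level lies inside the range observed in all groups. This is a simplified illustration under stated assumptions (logistic regression on a single covariate, exactly two groups, hypothetical helper name vm_trim_sketch). It is not the code behind cb.align.vm_trim, which also handles more than two groups.

```r
# Hypothetical helper (not part of causalBatch): vector-matching-style
# propensity trimming for two groups with a logistic propensity model.
vm_trim_sketch <- function(Ts, x) {
  Ts <- factor(Ts)
  stopifnot(nlevels(Ts) == 2)
  # P(T = second level | x); glm treats the first factor level as "failure"
  p <- fitted(glm(Ts ~ x, family = binomial()))
  ps <- cbind(1 - p, p)  # one propensity column per treatment level
  keep <- rep(TRUE, length(Ts))
  for (k in seq_len(ncol(ps))) {
    # common support: above every group's minimum, below every group's maximum
    lo <- max(tapply(ps[, k], Ts, min))
    hi <- min(tapply(ps[, k], Ts, max))
    keep <- keep & ps[, k] >= lo & ps[, k] <= hi
  }
  which(keep)
}

set.seed(1)
Ts <- rbinom(200, 1, 0.5)
x <- rnorm(200, mean = 2 * Ts)  # group 1 is shifted, so the tails lack overlap
ids <- vm_trim_sketch(Ts, x)
length(ids)
```

With a single covariate and a monotone propensity model, the rule reduces to keeping samples whose covariate lies in the region covered by both groups; the propensity formulation is what lets the same rule extend to several covariates.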
A function for implementing the AIPW conditional ComBat (AIPW cComBat) algorithm. This algorithm allows users to remove batch effects (in each dimension), while adjusting for known confounding variables. It is imperative that this function is used in conjunction with domain expertise (e.g., to ensure that the covariates are not colliders, and that the system could be argued to satisfy the ignorability condition) to derive causal conclusions. See citation for more details as to the conditions under which conclusions derived are causal.
cb.correct.aipw_cComBat( Ys, Ts, Xs, aipw.form, covar.out.form = NULL, retain.ratio = 0.05 )
Ys |
an |
Ts |
|
Xs |
|
aipw.form |
A covariate model, given as a formula. Applies for the estimation of propensities for the AIPW step. |
covar.out.form |
A covariate model, given as a formula. Applies for the outcome regression step of the |
retain.ratio |
If the number of samples retained is less than |
Note: This function is experimental, and has not been tested on real data. It has only been tested with simulated data with binary (0 or 1) exposures.
a list, containing the following:
Ys.corrected
an [m, d] matrix, for the m retained samples in d dimensions, after correction.
Ts
an [m] vector of the labels of the m retained samples, with K < n levels.
Xs
the r covariates/confounding variables for each of the m retained samples.
Model
the fitted batch effect correction model.
Corrected.Ids
the ids to which batch effect correction was applied.
For more details see the help vignette:
vignette("causal_ccombat", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Imaging Neuroscience (2025).
W Evan Johnson, et al. "Adjusting batch effects in microarray expression data using empirical Bayes methods" Biostatistics (2007).
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=100, err=1/8, unbalancedness=2)
cb.correct.aipw_cComBat(sim$Ys, sim$Ts, data.frame(Covar=sim$Xs), "Covar")
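AIPW combines a propensity model with per-group outcome regressions, so that the estimate remains consistent if either model is correctly specified. The base-R sketch below computes the classical AIPW estimate of a mean difference for one outcome dimension and a binary (0 or 1) exposure. It only illustrates the weighting idea; it is not how cb.correct.aipw_cComBat integrates AIPW with ComBat, and the helper name aipw_effect_sketch is hypothetical.

```r
# Hypothetical helper (not part of causalBatch): classical augmented inverse
# probability weighting (AIPW) estimate of E[Y(1)] - E[Y(0)] for one outcome.
aipw_effect_sketch <- function(y, trt, x) {
  d <- data.frame(y = y, trt = trt, x = x)
  # propensity model: P(trt = 1 | x)
  e <- fitted(glm(trt ~ x, family = binomial(), data = d))
  # outcome regressions, fit within each exposure group, predicted for everyone
  m1 <- predict(lm(y ~ x, data = d[d$trt == 1, ]), newdata = d)
  m0 <- predict(lm(y ~ x, data = d[d$trt == 0, ]), newdata = d)
  # AIPW scores: regression prediction plus an inverse-propensity-weighted residual
  psi1 <- m1 + trt * (y - m1) / e
  psi0 <- m0 + (1 - trt) * (y - m0) / (1 - e)
  mean(psi1 - psi0)
}

set.seed(2)
n <- 2000
x <- runif(n, -1, 1)
trt <- rbinom(n, 1, plogis(2 * x))  # exposure depends on the covariate
y <- 1.5 * trt + 2 * x + rnorm(n, sd = 0.25)
aipw_effect_sketch(y, trt, x)  # should be close to the true effect of 1.5
```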
This function applies an Augmented Inverse Probability Weighting (AIPW) ComBat model for batch effect correction to new data.
cb.correct.apply_aipw_cComBat(Ys, Ts, Xs, Model)
Ys |
an |
Ts |
|
Xs |
|
Model |
a list containing the following parameters:
This model is output after fitting with |
Note: This function is experimental, and has not been tested on real data. It has only been tested with simulated data with binary (0 or 1) exposures.
an [n, d] matrix of the batch-effect corrected data.
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=200, err=1/8, unbalancedness=3)
# fit batch effect correction for first 100 samples
cb.fit <- cb.correct.aipw_cComBat(sim$Ys[1:100,,drop=FALSE], sim$Ts[1:100], data.frame(Covar=sim$Xs[1:100,,drop=FALSE]), "Covar")
# apply to all samples
cor.dat <- cb.correct.apply_aipw_cComBat(sim$Ys, sim$Ts, data.frame(Covar=sim$Xs), cb.fit$Model)
ComBat adjusts for batch effects in datasets where the batch covariate is known, using the methodology described in Johnson et al. (2007). It uses either a parametric or a non-parametric empirical Bayes framework to adjust the data for batch effects, and returns an expression matrix that has been corrected for batch effects. The input data are assumed to be cleaned and normalized before batch effect removal.
cb.correct.apply_cComBat(Ys, Ts, Xs, Model)
Ys |
an |
Ts |
|
Xs |
|
Model |
a list containing the following parameters:
This model is output after fitting with |
Note: this code is adapted directly from the ComBat algorithm featured in the 'sva' package.
an [n, d] matrix of the batch-effect corrected data.
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=200, err=1/8, unbalancedness=3)
# fit batch effect correction for first 100 samples
cb.fit <- cb.correct.matching_cComBat(sim$Ys[1:100,,drop=FALSE], sim$Ts[1:100], data.frame(Covar=sim$Xs[1:100,,drop=FALSE]), "Covar")
# apply to all samples
cor.dat <- cb.correct.apply_cComBat(sim$Ys, sim$Ts, data.frame(Covar=sim$Xs), cb.fit$Model)
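At its core, ComBat standardizes each feature, estimates a per-batch location (gamma) and scale (delta), removes them, and maps the data back to the pooled scale; the empirical Bayes step then shrinks the per-batch estimates by borrowing strength across features. The base-R sketch below shows only the location/scale adjustment, with no empirical Bayes shrinkage and no covariates. The helper name combat_ls_sketch is hypothetical, and this is not the sva or causalBatch code.

```r
# Hypothetical helper (not part of sva or causalBatch): ComBat-style
# location/scale batch adjustment WITHOUT empirical Bayes shrinkage or covariates.
combat_ls_sketch <- function(Y, batch) {
  Y <- as.matrix(Y)
  batch <- factor(batch)
  grand.mean <- colMeans(Y)
  # pooled within-batch standard deviation of each feature
  resid <- Y - apply(Y, 2, function(col) ave(col, batch))
  pooled.sd <- sqrt(colSums(resid^2) / (nrow(Y) - nlevels(batch)))
  # standardize every feature to the pooled scale
  Z <- sweep(sweep(Y, 2, grand.mean), 2, pooled.sd, "/")
  for (b in levels(batch)) {
    idx <- batch == b
    Zb <- Z[idx, , drop = FALSE]
    gamma <- colMeans(Zb)      # batch location estimate
    delta <- apply(Zb, 2, sd)  # batch scale estimate
    Z[idx, ] <- sweep(sweep(Zb, 2, gamma), 2, delta, "/")
  }
  # map back to the pooled location and scale
  sweep(sweep(Z, 2, pooled.sd, "*"), 2, grand.mean, "+")
}

set.seed(3)
Y <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
batch <- rep(c("a", "b"), each = 100)
Y[batch == "b", ] <- 2 * Y[batch == "b", ] + 5  # batch "b" is shifted and scaled
adj <- combat_ls_sketch(Y, batch)
```

After this adjustment both batches share the same per-feature mean and standard deviation. Note that, without covariate adjustment, any genuine covariate difference between batches would also be removed, which is the problem the conditional (cComBat) variants in this package are designed to address.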
A function for implementing the matching conditional ComBat (matching cComBat) algorithm. This algorithm allows users to remove batch effects (in each dimension), while adjusting for known confounding variables. It is imperative that this function is used in conjunction with domain expertise (e.g., to ensure that the covariates are not colliders, and that the system could be argued to satisfy the ignorability condition) to derive causal conclusions. See citation for more details as to the conditions under which conclusions derived are causal.
cb.correct.matching_cComBat( Ys, Ts, Xs, match.form, covar.out.form = NULL, prop.form = NULL, reference = NULL, match.args = list(method = "nearest", exact = NULL, replace = FALSE, caliper = 0.1), retain.ratio = 0.05, apply.oos = FALSE )
Ys |
an |
Ts |
|
Xs |
|
match.form |
A formula of columns from |
covar.out.form |
A covariate model, given as a formula. Applies for the outcome regression step of the |
prop.form |
A propensity model, given as a formula. Applies for the estimation of propensities for the propensity trimming step. Defaults to NULL. |
reference |
the name of the reference/control batch, against which to match. Defaults to NULL. |
match.args |
A named list of arguments for the |
retain.ratio |
If the number of samples retained is less than |
apply.oos |
A boolean that indicates whether or not to apply the learned batch effect correction to non-matched samples that are still within a region of covariate support. Defaults to FALSE. |
a list, containing the following:
Ys.corrected
an [m, d] matrix, for the m retained samples in d dimensions, after correction.
Ts
an [m] vector of the labels of the m retained samples, with K < n levels.
Xs
the r covariates/confounding variables for each of the m retained samples.
Model
the fitted batch effect correction model. See ComBat for details.
InSample.Ids
the ids which were used to fit the batch effect correction model.
Corrected.Ids
the ids to which batch effect correction was applied. Differs from InSample.Ids if apply.oos is TRUE.
For more details see the help vignette:
vignette("causal_ccombat", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Imaging Neuroscience (2025).
Daniel E. Ho, et al. "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference" JSS (2011).
W Evan Johnson, et al. "Adjusting batch effects in microarray expression data using empirical Bayes methods" Biostatistics (2007).
Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Zhang Y, Storey JD, Torres LC (2024). sva: Surrogate Variable Analysis. R package version 3.52.0.
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=100, err=1/8, unbalancedness=2)
cb.correct.matching_cComBat(sim$Ys, sim$Ts, data.frame(Covar=sim$Xs), "Covar")
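The apply.oos option separates the samples used to fit the correction (InSample.Ids) from those it is applied to (Corrected.Ids). The base-R sketch below mimics that split with a deliberately simplified, location-only correction: for each feature it regresses the outcome on batch and a covariate using only the fitting samples, then subtracts the estimated batch offsets from every sample. It omits ComBat's scale adjustment and empirical Bayes step, assumes every batch is present among the fitting samples, and the helper name shift_correct_sketch is hypothetical.

```r
# Hypothetical helper (not part of causalBatch): covariate-adjusted,
# location-only batch correction, fit on `fit.ids` and applied to all samples.
shift_correct_sketch <- function(Y, batch, x, fit.ids = seq_along(batch)) {
  Y <- as.matrix(Y)
  batch <- factor(batch)
  out <- Y
  for (j in seq_len(ncol(Y))) {
    d <- data.frame(y = Y[, j], batch = batch, x = x)
    fit <- lm(y ~ batch + x, data = d[fit.ids, ])
    # offsets of each batch relative to the first (reference) level
    offsets <- c(0, coef(fit)[paste0("batch", levels(batch)[-1])])
    out[, j] <- Y[, j] - offsets[as.integer(batch)]
  }
  out
}

set.seed(4)
n <- 300
batch <- rep(c("a", "b"), each = n / 2)
x <- runif(n)
Y <- cbind(2 * x + 3 * (batch == "b") + rnorm(n, sd = 0.1))
# fit on every other sample, then correct all samples
adj <- shift_correct_sketch(Y, batch, x, fit.ids = seq(1, n, by = 2))
```

Because the covariate enters the regression, the covariate effect (here the slope of 2 on x) is preserved while the batch offset is removed; fitting on a subset and applying to all samples is the same pattern as fitting on matched samples and applying out of sample.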
A function for implementing the causal conditional distance correlation (causal cDCorr) algorithm. This algorithm allows users to identify whether a treatment causes changes in an outcome, given assorted covariates/confounding variables. It is imperative that this function is used in conjunction with domain expertise (e.g., to ensure that the covariates are not colliders, and that the system satisfies the strong ignorability condition) to derive causal conclusions. See citation for more details as to the conditions under which conclusions derived are causal.
cb.detect.caus_cdcorr( Ys, Ts, Xs, prop.form = NULL, R = 1000, dist.method = "euclidean", distance = FALSE, width = "scott", seed = 1, num.threads = 1, retain.ratio = 0.05, ddx = FALSE, normalize = TRUE )
Ys |
Either:
|
Ts |
|
Xs |
|
prop.form |
a formula specifying a propensity scoring model. Defaults to NULL. |
R |
the number of repetitions for permutation testing. Defaults to 1000. |
dist.method |
the method used for computing distance matrices. Defaults to "euclidean". |
distance |
a boolean for whether (or not) |
width |
Either:
Defaults to "scott". |
seed |
a random seed to set. Defaults to 1. |
num.threads |
The number of threads for parallel processing (if desired). Defaults to 1. |
retain.ratio |
If the number of samples retained is less than |
ddx |
whether to show additional diagnosis messages. Defaults to FALSE. |
normalize |
whether or not to compute the distance correlation ( |
a list, containing the following:
Test
The outcome of the statistical test, from cdcov.test.
Retained.Ids
The sample indices retained after vector matching, which correspond to the samples on which statistical inference is performed.
For more details see the help vignette:
vignette("causal_cdcorr", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Imaging Neuroscience (2025).
Eric W. Bridgeford, et al. "Learning sources of variability from high-dimensional observational studies" arXiv (2023).
Xueqin Wang, et al. "Conditional Distance Correlation" Journal of the American Statistical Association (2015).
library(causalBatch)
sim <- cb.sims.sim_linear(a=-1, n=100, err=1/8, unbalancedness=3)
cb.detect.caus_cdcorr(sim$Ys, sim$Ts, sim$Xs)
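cb.detect.caus_cdcorr relies on conditional distance correlation, computed by cdcov.test on the vector-matched samples. As a simpler point of reference, the base-R sketch below computes the unconditional sample distance correlation (V-statistic form, from double-centered distance matrices) together with a permutation p-value. It ignores covariates entirely, so it is neither a causal test nor what the package runs, and the helper names dcor_sketch and dcor_perm_test are hypothetical.

```r
# Hypothetical helpers (not part of causalBatch): unconditional distance
# correlation (V-statistic form) and a permutation test of independence.
dcor_sketch <- function(X, Y) {
  # double-center a distance matrix
  dcenter <- function(D) D - outer(rowMeans(D), colMeans(D), "+") + mean(D)
  A <- dcenter(as.matrix(dist(X)))
  B <- dcenter(as.matrix(dist(Y)))
  dvar <- sqrt(mean(A * A) * mean(B * B))
  if (dvar == 0) return(0)
  sqrt(max(mean(A * B), 0) / dvar)
}

dcor_perm_test <- function(X, Y, R = 199, seed = 1) {
  set.seed(seed)
  Y <- as.matrix(Y)
  stat <- dcor_sketch(X, Y)
  # null distribution: break the pairing by permuting the rows of Y
  null <- replicate(R, dcor_sketch(X, Y[sample(nrow(Y)), , drop = FALSE]))
  list(statistic = stat, p.value = (1 + sum(null >= stat)) / (1 + R))
}

set.seed(5)
x <- rnorm(100)
y <- x^2 + rnorm(100, sd = 0.1)  # dependent, yet nearly uncorrelated
dcor_perm_test(x, y)$p.value
```

The R and seed arguments here loosely echo the R and seed arguments of cb.detect.caus_cdcorr; conditioning on covariates, as the package does, is what separates a batch effect from covariate imbalance.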
Simulate Data with Heteroskedastic Conditional Average Treatment Effects
cb.sims.cate.heteroskedastic_sim( n = 100, p = 10, pi = 0.5, balance = 1, eff_sz = 1, covar_eff_sz = 3, alpha = 2, beta = 8, common = 10, a = 2, b = 8, err = 0.5, nbreaks = 200, rotate = TRUE )
n |
number of samples to generate |
p |
number of dimensions for the response variable |
pi |
probability of assignment to treatment group 1 |
balance |
a parameter that governs the similarity between the covariate distributions |
eff_sz |
effect size parameter controlling the heteroskedasticity between groups |
covar_eff_sz |
effect size parameter for the covariate influence |
alpha |
the alpha parameter for beta distribution of control group |
beta |
the beta parameter for beta distribution of control group |
common |
parameter governing the shape of the common sampling distribution |
a |
scaling factor for response variable generation |
b |
scaling factor for sigmoid transformation |
err |
standard deviation for the error term |
nbreaks |
number of points to use for generating the true signal |
rotate |
whether to apply a random rotation to the outcomes. Defaults to TRUE. |
A list containing:
Ys |
response matrix of size [n, p] |
Ts |
treatment assignment vector of size [n] |
Xs |
covariate vector of size [n] |
Eps |
error matrix of size [n, p] |
Ytrue |
true response matrix for evaluation |
Ttrue |
true treatment vector for evaluation |
Xtrue |
true covariate vector for evaluation |
Group.Effect |
the effect size parameter |
Covar.Effect |
the covariate effect size parameter |
R |
Optional argument returned if a rotation is requested for the rotation matrix applied. |
Eric W. Bridgeford
Eric W. Bridgeford, et al. "Learning Sources of Variability from High-Dimensional Observational Studies" arXiv (2025).
library(causalBatch)
sim <- cb.sims.cate.heteroskedastic_sim()
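When rotate = TRUE, the simulation returns the rotation matrix R that was applied to the outcomes. This section does not state how R is drawn; a standard recipe for a random rotation is the QR decomposition of a Gaussian matrix with a sign correction, sketched below in base R. The helper random_rotation_sketch is hypothetical, and the package may generate R differently.

```r
# Hypothetical helper (not part of causalBatch): draw a random p x p rotation
# via the QR decomposition of a standard Gaussian matrix.
random_rotation_sketch <- function(p) {
  qr.g <- qr(matrix(rnorm(p * p), nrow = p))
  Q <- qr.Q(qr.g)
  # sign correction so the result does not depend on QR sign conventions
  Q <- Q %*% diag(sign(diag(qr.R(qr.g))), nrow = p)
  # flip one column if needed so that det(Q) = +1 (a rotation, not a reflection)
  if (det(Q) < 0) Q[, 1] <- -Q[, 1]
  Q
}

set.seed(6)
R <- random_rotation_sketch(10)
Ys <- matrix(rnorm(100 * 10), nrow = 100)  # stand-in outcomes, [n, p]
Ys.rot <- Ys %*% t(R)  # rotate every row; pairwise distances are preserved
```

Because a rotation preserves Euclidean distances between samples, a statistic computed from the Euclidean distance matrix of the outcomes is, in principle, unchanged by this step.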
K-class Sigmoidal CATE Simulation
cb.sims.cate.kclass_sigmoidal_sim( n = 100, p = 10, pi = 0.5, balance = 1, eff_sz = 1, covar_eff_sz = 5, alpha = 2, beta = 8, common = 10, a = 2, b = 8, err = 1, nbreaks = 200, K = 3, rotate = TRUE )
n |
the number of samples. Defaults to 100. |
p |
the number of dimensions. Defaults to 10. |
pi |
the fraction of points which are sampled from a common distribution. Should be a number between |
balance |
a parameter governing the covariate similarity between the two groups. Defaults to 1. |
eff_sz |
the conditional treatment effect size between the different groups, which governs the rotation in radians between the first and second group. Defaults to 1. |
covar_eff_sz |
A parameter which governs the covariate effect size with respect to the outcome. Defaults to 5. |
alpha |
the alpha for sampling the |
beta |
the beta for sampling the |
common |
a parameter which governs the shape of the common sampling distribution. |
a |
the first parameter for the covariate/outcome relationship. Defaults to 2. |
b |
the second parameter for the covariate/outcome relationship. Defaults to 8. |
err |
the level of noise for the simulation. Defaults to 1. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
K |
the number of classes. Defaults to 3. |
rotate |
whether to apply a random rotation to the outcomes. Defaults to TRUE. |
a list, containing the following:
Y |
an |
Ts |
an |
Xs |
an |
Eps |
an |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Group.Effect |
The group effect magnitude. |
Covar.Effect |
The covariate effect magnitude. |
K |
The total number of classes. |
R |
Optional argument returned if a rotation is requested for the rotation matrix applied. |
Eric W. Bridgeford
Eric W. Bridgeford, et al. "Learning Sources of Variability from High-Dimensional Observational Studies" arXiv (2025).
library(causalBatch)
sim <- cb.sims.cate.kclass_sigmoidal_sim()
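The eff_sz argument is described as governing a rotation, in radians, between groups. As a hypothetical illustration of what a rotation by eff_sz radians means (how the package applies it across the K classes is not specified in this section), the base-R snippet below rotates a two-dimensional signal and checks that the angle between the original and rotated signals equals eff_sz. The helper rotation_2d and the signals are assumptions, not the package's generative model.

```r
# Hypothetical illustration (not the package's generative model): a planar
# rotation by `eff_sz` radians applied to a 2-dimensional signal.
rotation_2d <- function(theta) {
  # column-major fill gives [[cos, -sin], [sin, cos]]
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
}

eff_sz <- 1
signal.g1 <- c(1, 0)  # hypothetical first-group signal
signal.g2 <- drop(rotation_2d(eff_sz) %*% signal.g1)

# the angle between the two signals recovers eff_sz (for 0 <= eff_sz <= pi)
angle <- acos(sum(signal.g1 * signal.g2) /
                (sqrt(sum(signal.g1^2)) * sqrt(sum(signal.g2^2))))
angle
```

A larger eff_sz therefore means the groups' signals point in more different directions, while a value of 0 leaves them identical.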
Non-monotone CATE Simulation
cb.sims.cate.nonmonotone_sim( n = 100, p = 10, pi = 0.5, balance = 1, eff_sz = 1, alpha = 2, beta = 8, common = 10, err = 1, nbreaks = 200, rotate = TRUE )
n |
the number of samples. Defaults to 100. |
p |
the number of dimensions. Defaults to 10. |
pi |
the fraction of points which are sampled from a common distribution. Should be a number between |
balance |
a parameter governing the covariate similarity between the two groups. Defaults to 1. |
eff_sz |
the conditional treatment effect size between the different groups, which governs the rotation in radians between the first and second group. Defaults to 1. |
alpha |
the alpha for sampling the |
beta |
the beta for sampling the |
common |
a parameter which governs the shape of the common sampling distribution. |
err |
the level of noise for the simulation. Defaults to 1. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
rotate |
whether to apply a random rotation to the outcomes. Defaults to TRUE. |
covar_eff_sz |
A parameter which governs the covariate effect size with respect to the outcome. Defaults to |
a list, containing the following:
Y |
an |
Ts |
an |
Xs |
an |
Eps |
an |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Group.Effect |
The group effect magnitude. |
R |
Optional argument returned if a rotation is requested for the rotation matrix applied. |
Eric W. Bridgeford
Eric W. Bridgeford, et al. "Learning Sources of Variability from High-Dimensional Observational Studies" arXiv (2025).
library(causalBatch)
sim <- cb.sims.cate.nonmonotone_sim()
Sigmoidal CATE Simulation
cb.sims.cate.sigmoidal_sim( n = 100, p = 10, pi = 0.5, balance = 1, eff_sz = 1, covar_eff_sz = 5, alpha = 2, beta = 8, common = 10, a = 2, b = 8, err = 1, nbreaks = 200, rotate = TRUE )
n |
the number of samples. Defaults to 100. |
p |
the number of dimensions. Defaults to 10. |
pi |
the fraction of points which are sampled from a common distribution. Should be a number between |
balance |
a parameter governing the covariate similarity between the two groups. Defaults to 1. |
eff_sz |
the conditional treatment effect size between the different groups, which governs the rotation in radians between the first and second group. Defaults to 1. |
covar_eff_sz |
A parameter which governs the covariate effect size with respect to the outcome. Defaults to 5. |
alpha |
the alpha for sampling the |
beta |
the beta for sampling the |
common |
a parameter which governs the shape of the common sampling distribution. |
a |
the first parameter for the covariate/outcome relationship. Defaults to 2. |
b |
the second parameter for the covariate/outcome relationship. Defaults to 8. |
err |
the level of noise for the simulation. Defaults to 1. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
a list, containing the following:
Y |
an |
Ts |
an |
Xs |
an |
Eps |
an |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Group.Effect |
The group effect magnitude. |
Covar.Effect |
The covariate effect magnitude. |
Eric W. Bridgeford
Eric W. Bridgeford, et al. "Learning Sources of Variability from High-Dimensional Observational Studies" arXiv (2025).
library(causalBatch)
sim <- cb.sims.cate.sigmoidal_sim()
Impulse Simulation
cb.sims.sim_impulse( n = 100, pi = 0.5, eff_sz = 1, alpha = 2, unbalancedness = 1, err = 1/2, null = FALSE, a = -0.5, b = 1/2, c = 4, nbreaks = 200 )
n |
the number of samples. Defaults to 100. |
pi |
the balance between the classes, where samples will be from group 1 with probability |
eff_sz |
the treatment effect between the different groups. Defaults to 1. |
alpha |
the alpha for the covariate sampling procedure. Defaults to 2. |
unbalancedness |
the level of covariate dissimilarity between the covariates for each of the groups. Defaults to 1. |
err |
the level of noise for the simulation. Defaults to 1/2. |
null |
whether to generate a null simulation. Defaults to FALSE. |
a |
the first parameter for the covariate/outcome relationship. Defaults to -0.5. |
b |
the second parameter for the covariate/outcome relationship. Defaults to 1/2. |
c |
the third parameter for the covariate/outcome relationship. Defaults to 4. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
a list, containing the following:
Ys |
an |
Ts |
an |
Xs |
an |
Eps |
an |
x.bounds |
the theoretical bounds for the covariate values. |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Effect |
The batch effect magnitude. |
Overlap |
the theoretical degree of overlap between the covariate distributions for each of the two groups/batches. |
oracle_fn |
A function for fitting outcomes given covariates. |
An impulse-shaped relationship between the covariate and the outcome. The first dimension of the outcome is:
where is the probability density function for the normal distribution with
mean
and standard deviation
.
where the batch/group labels are:
The beta coefficient for the covariate sampling is:
The covariate values for the first batch are:
and the covariate values for the second batch are:
Note that , or that the covariates are symmetric
about the origin in distribution.
Finally, the error terms are:
For more details see the help vignette:
vignette("causal_simulations", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Biorxiv (2024).
library(causalBatch)
sim <- cb.sims.sim_impulse()
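The returned Overlap value summarizes how much the two groups' covariate distributions share support. As an illustration only, assume the two groups' covariates follow Beta(alpha, beta) and Beta(beta, alpha) with beta = alpha * unbalancedness; this parameterization is an assumption, since the exact sampling formulas are not recoverable from this section. Under it, the overlap coefficient (the integral of the pointwise minimum of the two densities) can be computed numerically in base R. Rescaling the covariates affinely, for example to the returned x.bounds, does not change this coefficient.

```r
# Hypothetical helper (not part of causalBatch). ASSUMPTION: group 1 covariates
# ~ Beta(alpha, beta) and group 2 ~ Beta(beta, alpha), beta = alpha * unbalancedness.
overlap_sketch <- function(alpha = 2, unbalancedness = 1) {
  beta <- alpha * unbalancedness
  # overlap coefficient: integral of the pointwise minimum of the two densities
  integrate(function(u) pmin(dbeta(u, alpha, beta), dbeta(u, beta, alpha)),
            lower = 0, upper = 1)$value
}

ov <- sapply(c(1, 1.5, 2, 3), function(ub) overlap_sketch(alpha = 2, unbalancedness = ub))
ov  # approximately 1, 0.625, 0.375, 0.125
```

Under this assumption, unbalancedness = 1 gives identical covariate distributions (overlap 1), and increasing it shrinks the region where both groups have samples, which is the regime where matching and trimming discard more data.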
Impulse Simulation with Asymmetric Covariates
cb.sims.sim_impulse_asycov( n = 100, pi = 0.5, eff_sz = 1, alpha = 2, unbalancedness = 1, null = FALSE, a = -0.5, b = 1/2, c = 4, err = 1/2, nbreaks = 200 )
n |
the number of samples. Defaults to 100. |
pi |
the balance between the classes, where samples will be from group 1 with probability |
eff_sz |
the treatment effect between the different groups. Defaults to 1. |
alpha |
the alpha for the covariate sampling procedure. Defaults to 2. |
unbalancedness |
the level of covariate dissimilarity between the covariates for each of the groups. Defaults to 1. |
null |
whether to generate a null simulation. Defaults to FALSE. |
a |
the first parameter for the covariate/outcome relationship. Defaults to -0.5. |
b |
the second parameter for the covariate/outcome relationship. Defaults to 1/2. |
c |
the third parameter for the covariate/outcome relationship. Defaults to 4. |
err |
the level of noise for the simulation. Defaults to 1/2. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
a list, containing the following:
Ys |
an |
Ts |
an |
Xs |
an |
Eps |
an |
x.bounds |
the theoretical bounds for the covariate values. |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Effect |
The batch effect magnitude. |
Overlap |
the theoretical degree of overlap between the covariate distributions for each of the two groups/batches. |
oracle_fn |
A function for fitting outcomes given covariates. |
An impulse-shaped relationship between the covariate and the outcome. The first dimension of the outcome is:
where is the probability density function for the normal distribution with
mean
and standard deviation
.
where the batch/group labels are:
The beta coefficient for the covariate sampling is:
The covariate values for the first batch are asymmetric, in that for the first batch:
and the covariate values for the second batch are:
Finally, the error terms are:
For more details see the help vignette:
vignette("causal_simulations", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Biorxiv (2024).
library(causalBatch)
sim <- cb.sims.sim_impulse_asycov()
Linear Simulation
cb.sims.sim_linear( n = 100, pi = 0.5, eff_sz = 1, alpha = 2, unbalancedness = 1, err = 1/2, null = FALSE, a = -2, b = -1, nbreaks = 200 )
n |
the number of samples. Defaults to 100. |
pi |
the balance between the classes, where samples will be from group 1 with probability |
eff_sz |
the treatment effect between the different groups. Defaults to 1. |
alpha |
the alpha for the covariate sampling procedure. Defaults to 2. |
unbalancedness |
the level of covariate dissimilarity between the covariates for each of the groups. Defaults to 1. |
err |
the level of noise for the simulation. Defaults to 1/2. |
null |
whether to generate a null simulation. Defaults to FALSE. |
a |
the first parameter for the covariate/outcome relationship. Defaults to -2. |
b |
the second parameter for the covariate/outcome relationship. Defaults to -1. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
a list, containing the following:
Ys |
an |
Ts |
an |
Xs |
an |
Eps |
an |
x.bounds |
the theoretical bounds for the covariate values. |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Effect |
The batch effect magnitude. |
Overlap |
the theoretical degree of overlap between the covariate distributions for each of the two groups/batches. |
oracle_fn |
A function for fitting outcomes given covariates. |
A linear relationship between the covariate and the outcome. The first dimension of the outcome is:
where the batch/group labels are:
The beta coefficient for the covariate sampling is:
The covariate values for the first batch are:
and the covariate values for the second batch are:
Finally, the error terms are:
For more details see the help vignette:
vignette("causal_simulations", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Biorxiv (2024).
library(causalBatch)
sim <- cb.sims.sim_linear()
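To see why unbalanced covariates complicate batch effect estimation, the base-R sketch below simulates from a hypothetical data-generating process in the spirit of cb.sims.sim_linear. The functional forms (mirror-image Beta covariates rescaled to [-1, 1], an outcome linear in the covariate plus a group shift of eff_sz) are assumptions, since the exact formulas are not recoverable from this section, and the helper name sim_linear_sketch is hypothetical. The point is that a naive group difference mixes the group effect with covariate imbalance, while adjusting for the covariate recovers it when the outcome model is correct.

```r
# HYPOTHETICAL data-generating process in the spirit of cb.sims.sim_linear;
# the functional forms below are assumptions, not the package's formulas.
sim_linear_sketch <- function(n = 100, pi = 0.5, eff_sz = 1, alpha = 2,
                              unbalancedness = 1, err = 1/2, a = -2, b = -1) {
  beta <- alpha * unbalancedness
  Ts <- rbinom(n, 1, pi)
  # covariates on [-1, 1]; the two groups have mirror-image Beta distributions
  Xs <- 2 * ifelse(Ts == 0, rbeta(n, alpha, beta), rbeta(n, beta, alpha)) - 1
  Ys <- a * (Xs + b) + eff_sz * Ts + rnorm(n, sd = err)
  data.frame(Ys = Ys, Ts = Ts, Xs = Xs)
}

set.seed(7)
sim <- sim_linear_sketch(n = 5000, unbalancedness = 3)
# naive group difference mixes the group effect with the covariate imbalance
naive <- mean(sim$Ys[sim$Ts == 1]) - mean(sim$Ys[sim$Ts == 0])
# adjusting for the covariate recovers the (assumed) group effect eff_sz = 1
adjusted <- unname(coef(lm(Ys ~ Ts + Xs, data = sim))["Ts"])
c(naive = naive, adjusted = adjusted)
```

In this sketch the naive difference even has the wrong sign, because the covariate shift between groups (with slope a = -2) outweighs the group effect; the adjustment works here only because the assumed outcome model is linear and correctly specified.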
Sigmoidal Simulation
cb.sims.sim_sigmoid( n = 100, pi = 0.5, eff_sz = 1, alpha = 2, unbalancedness = 1, null = FALSE, a = -4, b = 8, err = 1/2, nbreaks = 200 )
n |
the number of samples. Defaults to 100. |
pi |
the balance between the classes, where samples will be from group 1 with probability |
eff_sz |
the treatment effect between the different groups. Defaults to 1. |
alpha |
the alpha for the covariate sampling procedure. Defaults to 2. |
unbalancedness |
the level of covariate dissimilarity between the covariates for each of the groups. Defaults to 1. |
null |
whether to generate a null simulation. Defaults to FALSE. |
a |
the first parameter for the covariate/outcome relationship. Defaults to -4. |
b |
the second parameter for the covariate/outcome relationship. Defaults to 8. |
err |
the level of noise for the simulation. Defaults to 1/2. |
nbreaks |
the number of breakpoints for computing the expected outcome at a given covariate level for each batch. Defaults to 200. |
a list, containing the following:
Y |
an |
Ts |
an |
Xs |
an |
Eps |
an |
x.bounds |
the theoretical bounds for the covariate values. |
Ytrue |
an |
Ttrue |
an |
Xtrue |
an |
Effect |
The batch effect magnitude. |
Overlap |
the theoretical degree of overlap between the covariate distributions for each of the two groups/batches. |
oracle_fn |
A function for fitting outcomes given covariates. |
A sigmoidal relationship between the covariate and the outcome. The first dimension of the outcome is:
where the batch/group labels are:
The beta coefficient for the covariate sampling is:
The covariate values for the first batch are:
and the covariate values for the second batch are:
Finally, the error terms are:
For more details see the help vignette:
vignette("causal_simulations", package = "causalBatch")
Eric W. Bridgeford
Eric W. Bridgeford, et al. "A Causal Perspective for Batch Effects: When is no answer better than a wrong answer?" Biorxiv (2024).
library(causalBatch)
sim <- cb.sims.sim_sigmoid()
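The nbreaks argument and the returned oracle_fn both concern the expected outcome at a given covariate level. One way to build such an oracle is to evaluate a mean function at nbreaks points spanning the covariate bounds and interpolate between them, as in the base-R sketch below. The sigmoid-shaped mean function and the helper name make_oracle_sketch are hypothetical, and the package may construct oracle_fn differently (for instance, with one curve per batch, which could be built the same way).

```r
# Hypothetical helper (not part of causalBatch): tabulate a mean function on
# `nbreaks` grid points over the covariate bounds and interpolate linearly.
make_oracle_sketch <- function(mean_fn, x.bounds = c(-1, 1), nbreaks = 200) {
  grid <- seq(x.bounds[1], x.bounds[2], length.out = nbreaks)
  approxfun(grid, mean_fn(grid), rule = 2)  # rule = 2: clamp outside the bounds
}

# ASSUMED sigmoid-shaped expected outcome, for illustration only
mean_fn <- function(x) 8 * plogis(-4 * x)
oracle <- make_oracle_sketch(mean_fn)
oracle(c(-0.5, 0, 0.5))
```

More breakpoints make the interpolated oracle closer to the underlying curve, at the cost of evaluating the mean function more often.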