Title: | Multivariate Distance Matrix Regression |
---|---|
Description: | This package allows users to conduct multivariate distance matrix regression using analytic p-values and compute measures of effect size. For details on the method, see McArtor, Lubke, & Bergeman (2017) <https://doi.org/10.1007/s11336-016-9527-8>. |
Authors: | Daniel B. McArtor ([email protected]) [aut, cre] |
Maintainer: | Daniel B. McArtor <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5.1 |
Built: | 2025-02-19 03:28:53 UTC |
Source: | https://github.com/dmcartor/mdmr |
MDMR allows a user to conduct multivariate distance matrix regression
using analytic p-values and measures of effect size described by McArtor et
al. (2017). Analytic p-values are computed using the R package CompQuadForm
(Duchesne & De Micheaux, 2010). It also facilitates the use of MDMR on
samples consisting of (hierarchically) clustered observations.
To access this package's tutorial, type the following line into the console:
vignette('mdmr-vignette')
Three primary functions comprise this package. mdmr regresses a distance
matrix onto a set of predictors. delta computes measures of univariate
effect size in the context of multivariate distance matrix regression.
mixed.mdmr facilitates the use of MDMR on (hierarchically) clustered samples
using an approach analogous to the linear mixed-effects model for univariate
outcomes. The help files of all three functions provide more general
information than the package vignette.
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P.L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
McArtor, D. B. (2017). Extending a distance-based approach to multivariate multiple regression (Doctoral Dissertation).
################################################################
## Conducting MDMR on data comprised of independent observations
################################################################
# Source data
data(mdmrdata)
# Get distance matrix
D <- dist(Y.mdmr, method = 'euclidean')
# Conduct MDMR
mdmr.res <- mdmr(X = X.mdmr, D = D)
summary(mdmr.res)

################################################################
## Conducting MDMR on data comprised of dependent observations
################################################################
# Source data
data("clustmdmrdata")
# Get distance matrix
D <- dist(Y.clust)
# Conduct mixed-MDMR
mixed.res <- mixed.mdmr(~ x1 + x2 + (x1 + x2 | grp),
                        data = X.clust, D = D)
summary(mixed.res)
delta computes permutation-based effect sizes on individual items
comprising the distance matrix outcome used in multivariate distance matrix
regression. It returns the omnibus estimates of delta (i.e. effect size of
the entire design matrix on each outcome) as well as estimates of each
pair-wise effect size (i.e. the effect of each predictor on each outcome
variable, conditional on the rest of the predictors).
delta(X, Y = NULL, dtype = NULL, niter = 10, x.inds = NULL, y.inds = NULL, G = NULL, G.list = NULL, ncores = 1, seed = NULL, plot.res = F, grayscale = F, cex = 1, y.las = 2)
X | A |
Y | Outcome data: |
dtype | Measure of dissimilarity that will be used by |
niter | Number of times to permute each outcome item in the procedure to compute delta. The final result is the average of all |
x.inds | Vector indicating which columns of X should have their conditional effect sizes computed. Default value of |
y.inds | Vector indicating which columns of Y effect sizes should be computed on. Default value of |
G | Gower's centered similarity matrix computed from |
G.list | List of length |
ncores | Integer; if |
seed | Integer; sets seed for the permutations of each variable comprising Y so that results can be replicated. |
plot.res | Logical; Indicates whether or not a heat-map of the results should be plotted. |
grayscale | Logical; Indicates whether or not the heat-map should be plotted in grayscale. |
cex | Multiplier for cex.axis, cex.lab, cex.main, and cex that are passed to the plotted result. |
y.las | Orientation of labels for the outcome items. Defaults to vertical (2). Value of 1 prints horizontal labels, and is only recommended if the multivariate outcome is comprised of few variables. |
See McArtor et al. (2017) for a detailed description of how delta is computed. Note that it is a relative measure of effect, quantifying which effects are strong (high values of delta) and weak (low values of delta) within a single analysis, but estimates of delta cannot be directly compared across different datasets.
There are two options for using this function. The first option is to
specify the predictor matrix X, the outcome matrix Y, the distance type
dtype (supported by "dist" in R), and number of iterations niter. This
option conducts the permutation of each Y-item niter times (to average out
random association in each permutation) and reports the median estimates of
delta over the niter reps.
The second option is to specify X, G, and G.list, a list of G matrices where
the permutation has already been done for each item comprising Y. The names
of the elements in G.list should correspond to the names of the variables
that were permuted. This option is implemented so that delta can be computed
when MDMR is being used in conjunction with distance metrics not supported
by dist.
A data frame whose rows correspond to the omnibus effects and the
effect of each individual predictor (conditional on the rest), and whose
columns correspond to each outcome variable whose effect sizes are being
quantified. If plot.res = TRUE, a heat-map of this data frame is plotted to
make the strongest effects easy to identify. Note that the heat-map is
partitioned into the omnibus effect (first row) and pair-wise effects
(remaining rows), because otherwise the omnibus effect would dominate the
heat-map.
Daniel B. McArtor ([email protected]) [aut, cre]
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
data(mdmrdata)
# --- Method 1 --- #
delta(X.mdmr, Y = Y.mdmr, dtype = "euclidean", niter = 1, seed = 12345)
# --- Method 2 --- #
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
q <- ncol(Y.mdmr)
G.list <- vector(mode = "list", length = q)
# colnames() works for both matrices and data frames
names(G.list) <- colnames(Y.mdmr)
for(i in seq_len(q)) {
  # Permute one outcome item and recompute G
  Y.shuf <- Y.mdmr
  Y.shuf[,i] <- sample(Y.shuf[,i])
  G.list[[i]] <- gower(dist(Y.shuf, method = "euclidean"))
}
delta(X.mdmr, G = G, G.list = G.list)
Compute Gower's centered similarity matrix G, which is the matrix
decomposed by the MDMR test statistic.
gower(d.mat)
d.mat | Symmetric distance matrix (or R distance object) computed from the outcome data to be used in MDMR. |
G | Gower's centered similarity matrix computed from d.mat. |
Daniel B. McArtor ([email protected]) [aut, cre]
Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53(3-4), 325-338.
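As a rough illustration of what the centering involves, the base-R sketch below assumes the standard Gower (1966) construction, G = -1/2 (I - 11'/n) D^2 (I - 11'/n), applied to the element-wise squared distances. The helper name gower_sketch and the simulated data are hypothetical and are not part of the package; gower itself may differ in implementation details.

```r
# Hypothetical helper (not part of the package): Gower-center a distance matrix
# using G = -1/2 * C %*% D^2 %*% C, where C = I - 11'/n.
gower_sketch <- function(d.mat) {
  d <- as.matrix(d.mat)               # accepts a dist object or a symmetric matrix
  n <- nrow(d)
  C <- diag(n) - matrix(1 / n, n, n)  # centering matrix
  -0.5 * C %*% (d^2) %*% C            # d^2 squares the distances element-wise
}

set.seed(1)
Y <- matrix(rnorm(20 * 3), nrow = 20)  # simulated outcome data (assumption)
G <- gower_sketch(dist(Y, method = "euclidean"))

# For Euclidean distances, G equals the cross-product of the column-centered data
Yc <- scale(Y, center = TRUE, scale = FALSE)
all.equal(G, tcrossprod(Yc), check.attributes = FALSE)
```

The final comparison holds for Euclidean distances only; for other dissimilarities G is still double-centered but need not be a cross-product of the raw data.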
mdmr (multivariate distance matrix regression) is used to regress a
distance matrix onto a set of predictors. It returns the test statistic,
pseudo R-square statistic, and analytic p-values for all predictors
jointly and for each predictor individually, conditioned on the rest.
mdmr(X, D = NULL, G = NULL, lambda = NULL, return.lambda = F, start.acc = 1e-20, ncores = 1, perm.p = (nrow(as.matrix(X)) < 200), nperm = 500, seed = NULL)
X | A |
D | Distance matrix computed on the outcome data. Can be either a matrix or an R |
G | Gower's centered similarity matrix computed from |
lambda | Optional argument: Eigenvalues of |
return.lambda | Logical; indicates whether or not the eigenvalues of |
start.acc | Starting accuracy of the Davies (1980) algorithm implemented in the |
ncores | Integer; if |
perm.p | Logical: should permutation-based p-values be computed instead of analytic p-values? Default behavior is |
nperm | Number of permutations to use if permutation-based p-values are to be computed. |
seed | Random seed to use to generate the permutation null distribution. Defaults to a random seed. |
This function is the fastest approach to conducting MDMR. It uses the fastest known computational strategy to compute the MDMR test statistic (see Appendix A of McArtor et al., 2017), and it uses fast, analytic p-values.
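To make the reported quantities concrete, the base-R sketch below computes an omnibus statistic and pseudo R-square naively from the hat matrix H. It assumes the forms tr(HGH) / tr[(I - H)G(I - H)] for the statistic and tr(HGH) / tr(G) for the pseudo R-square; these definitions, the simulated data, and all variable names are assumptions for illustration. This is not the fast strategy of Appendix A and not the package's internal code.

```r
set.seed(3)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))   # simulated predictors (assumption)
Y <- cbind(X %*% c(0.5, -0.3) + rnorm(n),  # first outcome depends on X
           rnorm(n), rnorm(n))             # remaining outcomes are noise

# Gower-centered matrix from Euclidean distances, computed in base R
d <- as.matrix(dist(Y, method = "euclidean"))
C <- diag(n) - matrix(1 / n, n, n)
G <- -0.5 * C %*% (d^2) %*% C

# Hat matrix of the design matrix (intercept plus predictors)
Xd <- cbind(1, X)
H <- Xd %*% solve(crossprod(Xd), t(Xd))
R <- diag(n) - H                           # residual-maker matrix

tr <- function(M) sum(diag(M))
ss.model <- tr(H %*% G %*% H)
ss.resid <- tr(R %*% G %*% R)

stat <- ss.model / ss.resid                # assumed form of the omnibus statistic
pseudo.r2 <- ss.model / tr(G)              # assumed form of the pseudo R-square
```

Under these assumptions and with Euclidean distances, pseudo.r2 coincides with the proportion of the total sum of squares of Y explained by a multivariate linear regression on X.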
The slowest part of conducting MDMR is now the necessary eigendecomposition
of the G matrix, whose computation time is a function of n^3 (n being the
sample size). If MDMR is to be conducted multiple times on the same
distance matrix, it is recommended to compute eigenvalues of G in advance
and pass them to the function rather than computing them every time mdmr is
called, as is the case if the argument lambda is left NULL.
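The recommendation above can be sketched as follows. The data are simulated and the names X1 and X2 are hypothetical; G is built in base R so the block runs without the package, and the calls to mdmr are only attempted if the MDMR package is installed. The arguments G, lambda, and perm.p are taken from the usage shown above.

```r
# Simulated data stand in for a real outcome matrix (assumption for illustration)
set.seed(2)
n <- 60
Y <- matrix(rnorm(n * 4), nrow = n)
d <- as.matrix(dist(Y, method = "euclidean"))

# Gower-center the squared distances in base R (a sketch of what gower() returns)
C <- diag(n) - matrix(1 / n, n, n)
G <- -0.5 * C %*% (d^2) %*% C

# Compute the eigenvalues once; only.values = TRUE skips the eigenvectors
lambda <- eigen(G, symmetric = TRUE, only.values = TRUE)$values

# Reuse lambda across several predictor sets (only runs if MDMR is installed)
if (requireNamespace("MDMR", quietly = TRUE)) {
  X1 <- matrix(rnorm(n * 2), nrow = n, dimnames = list(NULL, c("a", "b")))
  X2 <- matrix(rnorm(n * 3), nrow = n, dimnames = list(NULL, c("c", "d", "e")))
  res1 <- MDMR::mdmr(X = X1, G = G, lambda = lambda, perm.p = FALSE)
  res2 <- MDMR::mdmr(X = X2, G = G, lambda = lambda, perm.p = FALSE)
}
```

The saving grows with the number of models fitted to the same distance matrix, since the eigendecomposition is done once rather than once per call.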
The distance matrix D can be passed to mdmr as either a distance object or a
symmetric matrix.
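For readers unsure of the difference between the two accepted forms, this small base-R sketch builds both from simulated data (the object names D.dist and D.mat are hypothetical) and shows that they carry the same distances.

```r
set.seed(6)
Y <- matrix(rnorm(8 * 2), nrow = 8)      # simulated outcome data (assumption)
D.dist <- dist(Y, method = "manhattan")  # object of class "dist"
D.mat  <- as.matrix(D.dist)              # full symmetric matrix with zero diagonal

class(D.dist)
isSymmetric(D.mat)

# Converting back recovers the same distances
all.equal(as.vector(as.dist(D.mat)), as.vector(D.dist))
```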
An object with six elements and a summary function. Calling
summary(mdmr.res) produces a data frame comprised of:
Statistic | Value of the corresponding MDMR test statistic |
Numer DF | Numerator degrees of freedom for the corresponding effect |
Pseudo R2 | Size of the corresponding effect on the distance matrix |
p-value | The p-value for each effect. |
In addition to the information in the four columns comprising
summary(res), the res object also contains:
p.prec | A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bound of the p-values reported by the |
lambda | A vector of the eigenvalues of |
nperm | Number of permutations used. Will read |
Note that the printed output of summary(res) will truncate p-values
to the smallest trustworthy values, but the object returned by
summary(res) will contain the p-values as computed. The reason for
this truncation differs for analytic and permutation p-values. For an
analytic p-value, if the error bound of the Davies algorithm is larger than
the p-value, the only conclusion that can be drawn with certainty is that
the p-value is smaller than (or equal to) the error bound. For a permutation
test, the estimated p-value will be zero if no permuted test statistics are
greater than the observed statistic, but the zero p-value is only a product
of the finite number of permutations conducted. The only conclusion that can
be drawn is that the p-value is smaller than 1/nperm.
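The truncation logic described above can be illustrated with a short base-R sketch. All values are hypothetical and the code only mirrors the reasoning in the text; it is not the package's printing code.

```r
set.seed(4)
nperm <- 500
perm.stats <- rchisq(nperm, df = 2)  # hypothetical permutation null draws
obs.stat <- max(perm.stats) + 1      # observed statistic beyond every permuted one

# Permutation case: proportion of permuted statistics greater than the observed one
p.raw <- mean(perm.stats > obs.stat)
# A raw value of zero only supports the statement p < 1/nperm
p.perm.report <- if (p.raw < 1 / nperm) sprintf("< %g", 1 / nperm) else format(p.raw)

# Analytic case: hypothetical p-value and error bound
p.analytic <- 1e-12
err.bound <- 1e-10
# If the error bound exceeds the p-value, only p <= bound can be claimed
p.analytic.report <- if (p.analytic < err.bound) {
  sprintf("<= %g", err.bound)
} else {
  format(p.analytic)
}
```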
Daniel B. McArtor ([email protected]) [aut, cre]
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
# --- The following two approaches yield equivalent results --- #
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
res1 <- mdmr(X = X.mdmr, D = D)
summary(res1)
# Approach 2
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
G <- gower(D)
res2 <- mdmr(X = X.mdmr, G = G)
summary(res2)
mixed.mdmr allows users to conduct multivariate distance matrix
regression (MDMR) in the context of a (hierarchically) clustered sample
without inflating Type-I error rates as a result of the violation of the
independence assumption. This is done by invoking a mixed-effects modeling
framework, in which clustering/grouping variables are specified as random
effects and the covariate effects of interest are fixed effects. The input
to mixed.mdmr largely reflects the input of the lmer function from the
package lme4 insofar as the specification of random and fixed effects is
concerned (see Arguments for details). Note that this function simply
controls for the random effects in order to test the fixed effects; it does
not facilitate point estimation or inference on the random effects.
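As a small pre-flight sketch of the lmer-style input, the base-R code below builds two one-sided formulas (one with correlated and one with uncorrelated random effects) and checks that every variable they name exists in a data frame. The data frame dat and the formula names are hypothetical; nothing here calls mixed.mdmr.

```r
# Correlated random intercept and slopes for x1 and x2 within grp
fmla.corr <- ~ x1 + x2 + (x1 + x2 | grp)
# Uncorrelated random effects for the same grouping variable
fmla.uncorr <- ~ x1 + x2 + (x1 + x2 || grp)

# Hypothetical data frame holding the variables the formulas refer to
set.seed(5)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30),
                  grp = factor(rep(1:6, each = 5)))

# A one-sided formula has length 2 (the ~ operator and the right-hand side)
length(fmla.corr) == 2

# Every variable named in each formula should be a column of the data
all(all.vars(fmla.corr) %in% names(dat))
all(all.vars(fmla.uncorr) %in% names(dat))
```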
mixed.mdmr(fmla, data, D = NULL, G = NULL, use.ssd = 1, start.acc = 1e-20, ncores = 1)
fmla | A one-sided linear formula object describing both the fixed-effects and random-effects part of the model, beginning with an ~ operator, which is followed by the terms to include in the model, separated by + operators. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors. Two vertical bars (||) can be used to specify multiple uncorrelated random effects for the same grouping variable. |
data | A mandatory data frame containing the variables named in fmla. |
D | Distance matrix computed on the outcome data. Can be either a matrix or an R |
G | Gower's centered similarity matrix computed from |
use.ssd | The proportion of the total sum of squared distances (SSD) that will be targeted in the modeling process. In the case of non-Euclidean distances, specifying |
start.acc | Starting accuracy of the Davies (1980) algorithm implemented in the |
ncores | Integer; if |
An object with six elements and a summary function. Calling
summary(mixed.mdmr.res) produces a data frame comprised of:
Statistic | Value of the corresponding MDMR test statistic |
Numer DF | Numerator degrees of freedom for the corresponding effect |
p-value | The p-value for each effect. |
In addition to the information in the three columns comprising
summary(res), the res object also contains:
p.prec | A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bound of the p-values reported by the |
Note that the printed output of summary(res) will truncate p-values
to the smallest trustworthy values, but the object returned by
summary(res) will contain the p-values as computed. The reason for
this truncation differs for analytic and permutation p-values. For an
analytic p-value, if the error bound of the Davies algorithm is larger than
the p-value, the only conclusion that can be drawn with certainty is that
the p-value is smaller than (or equal to) the error bound.
Daniel B. McArtor ([email protected]) [aut, cre]
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B. (2017). Extending a distance-based approach to multivariate multiple regression (Doctoral Dissertation).
data("clustmdmrdata")
# Get distance matrix
D <- dist(Y.clust)
# Regular MDMR without the grouping variable
mdmr.res <- mdmr(X = X.clust[,1:2], D = D, perm.p = FALSE)
# Results look significant
summary(mdmr.res)
# Account for grouping variable
mixed.res <- mixed.mdmr(~ x1 + x2 + (x1 + x2 | grp),
                        data = X.clust, D = D)
# Significance was due to the grouping variable
summary(mixed.res)
print method for class mdmr
## S3 method for class 'mdmr'
print(x, ...)
x | Output from |
... | Further arguments passed to or from other methods. |
p-value | Analytic p-values for the omnibus test and each predictor |
Daniel B. McArtor ([email protected]) [aut, cre]
print method for class mixed.mdmr
## S3 method for class 'mixed.mdmr'
print(x, ...)
x | Output from |
... | Further arguments passed to or from other methods. |
p-value | Analytic p-values for the omnibus test and each predictor |
Daniel B. McArtor ([email protected]) [aut, cre]
summary method for class mdmr
## S3 method for class 'mdmr'
summary(object, ...)
object | Output from |
... | Further arguments passed to or from other methods. |
Calling summary(mdmr.res) produces a data frame comprised of:
Statistic | Value of the corresponding MDMR test statistic |
Pseudo R2 | Size of the corresponding effect on the distance matrix |
p-value | The p-value for each effect. |
In addition to the information in the three columns comprising
summary(res), the res object also contains:
p.prec | A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bound of the p-values reported by the |
lambda | A vector of the eigenvalues of |
nperm | Number of permutations used. Will read |
Note that the printed output of summary(res) will truncate p-values
to the smallest trustworthy values, but the object returned by
summary(res) will contain the p-values as computed. The reason for
this truncation differs for analytic and permutation p-values. For an
analytic p-value, if the error bound of the Davies algorithm is larger than
the p-value, the only conclusion that can be drawn with certainty is that
the p-value is smaller than (or equal to) the error bound. For a permutation
test, the estimated p-value will be zero if no permuted test statistics are
greater than the observed statistic, but the zero p-value is only a product
of the finite number of permutations conducted. The only conclusion that can
be drawn is that the p-value is smaller than 1/nperm.
Daniel B. McArtor ([email protected]) [aut, cre]
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B., Lubke, G. H., & Bergeman, C. S. (2017). Extending multivariate distance matrix regression with an effect size measure and the distribution of the test statistic. Psychometrika, 82, 1052-1077.
# --- The following two approaches yield equivalent results --- #
# Approach 1
data(mdmrdata)
D <- dist(Y.mdmr, method = "euclidean")
mdmr.res <- mdmr(X = X.mdmr, D = D)
summary(mdmr.res)
summary method for class mixed.mdmr
## S3 method for class 'mixed.mdmr'
summary(object, ...)
object | Output from |
... | Further arguments passed to or from other methods. |
Calling summary(mdmr.res) produces a data frame comprised of:
Statistic | Value of the corresponding MDMR test statistic |
p-value | The p-value for each effect. |
In addition to the information in the columns comprising
summary(res), the res object also contains:
p.prec | A data.frame reporting the precision of each p-value. If analytic p-values were computed, these are the maximum error bound of the p-values reported by the |
Note that the printed output of summary(res) will truncate p-values
to the smallest trustworthy values, but the object returned by
summary(res) will contain the p-values as computed. The reason for
this truncation differs for analytic and permutation p-values. For an
analytic p-value, if the error bound of the Davies algorithm is larger than
the p-value, the only conclusion that can be drawn with certainty is that
the p-value is smaller than (or equal to) the error bound.
Daniel B. McArtor ([email protected]) [aut, cre]
Davies, R. B. (1980). The Distribution of a Linear Combination of chi-square Random Variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), 323-333.
Duchesne, P., & De Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4), 858-862.
McArtor, D. B. (2017). Extending a distance-based approach to multivariate multiple regression (Doctoral Dissertation).
data("clustmdmrdata")
# Get distance matrix
D <- dist(Y.clust)
# Regular MDMR without the grouping variable
mdmr.res <- mdmr(X = X.clust[,1:2], D = D, perm.p = FALSE)
# Results look significant
summary(mdmr.res)
# Account for grouping variable
mixed.res <- mixed.mdmr(~ x1 + x2 + (x1 + x2 | grp),
                        data = X.clust, D = D)
# Significance was due to the grouping variable
summary(mixed.res)
See mixed.mdmr.
X.clust
An object of class data.frame with 250 rows and 3 columns.
See package vignette by calling vignette("mdmr-vignette").
X.mdmr
An object of class matrix with 500 rows and 3 columns.
See mixed.mdmr.
Y.clust
An object of class data.frame with 250 rows and 12 columns.
See package vignette by calling vignette("mdmr-vignette").
Y.mdmr
An object of class matrix with 500 rows and 10 columns.