Title: | Tools for Multiple Imputation in Multilevel Modeling |
---|---|
Description: | Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets. |
Authors: | Simon Grund [aut,cre], Alexander Robitzsch [aut], Oliver Luedtke [aut] |
Maintainer: | Simon Grund <[email protected]> |
License: | GPL (>=2) |
Version: | 0.4-5 |
Built: | 2024-11-02 03:42:35 UTC |
Source: | https://github.com/simongrund1/mitml |
Provides tools for multiple imputation of missing data in multilevel modeling.
This package includes a user-friendly interface to the algorithms implemented in the R packages pan
and jomo
as well as several functions for visualizing, managing, and analyzing multiply imputed data sets.
The main interface to pan
is the function panImpute
, which allows specifying imputation models for continuous variables with missing data at level 1.
In addition, the function jomoImpute
provides an interface to jomo
, which allows specifying imputation models for both continuous and categorical variables with missing data at levels 1 and 2 as well as single-level data.
The imputations and parameter values are stored in objects of class mitml
.
To obtain the completed (i.e., imputed) data sets, mitmlComplete
is used, producing a list of imputed data sets of class mitml.list
that can be used in further analyses.
Several additional functions allow for convenient analysis of multiply imputed data sets including (bot not limited to) multilevel analyses with lme4
or nlme
and structural equation modeling with lavaan
.
The functions within
, sort
, and subset
can be used to manage and manipulate multiply imputed data sets.
Statistical models are fitted using with
.
Pooled parameter estimates can be extracted with testEstimates
, and model comparisons as well as single- and multi-parameter hypotheses tests can be performed using the functions testModels
and testConstraints
.
In addition, the anova
method provides a simple interface to model comparisons.
Data sets can be imported and exported from or to different statistical software packages.
Currently, mids2mitml.list
, amelia2mitml.list
, jomo2mitml.list
, and long2mitml.list
can be used for importing imputations for other packages in R.
In addition, write.mitmlMplus
, write.mitmlSAV
, and write.mitmlSPSS
export data sets to Mplus and SPSS, respectively.
Finally, the package provides tools for summarizing and visualizing imputation models, which is useful for the assessment of convergence and the reporting of results.
The data sets contained in this package are published under the same license as the package itself. They contain simulated data and may be used by anyone free of charge as long as reference to this package is given.
Authors: Simon Grund, Alexander Robitzsch, Oliver Luedtke
Maintainer: Simon Grund <[email protected]>
amelia
to mitml.list
This function converts a amelia
class object (as produced by the Amelia
package) to mitml.list
. The resulting object may be used in further analyses.
amelia2mitml.list(x)
amelia2mitml.list(x)
x |
An object of class |
A list of imputed data sets with an additional class attribute mitml.list
.
Simon Grund
data(studentratings) require(Amelia) imp <- amelia(x = studentratings[, c("ID", "MathAchiev", "ReadAchiev")], cs = "ID") implist <- amelia2mitml.list(imp)
data(studentratings) require(Amelia) imp <- amelia(x = studentratings[, c("ID", "MathAchiev", "ReadAchiev")], cs = "ID") implist <- amelia2mitml.list(imp)
Performs model comparisons for a series of nested statistical models fitted using with.mitml.list
.
## S3 method for class 'mitml.result' anova(object, ..., method = c("D3", "D4", "D2"), ariv = c("default", "positive", "robust"), data = NULL)
## S3 method for class 'mitml.result' anova(object, ..., method = c("D3", "D4", "D2"), ariv = c("default", "positive", "robust"), data = NULL)
object |
An object of class |
... |
Additional objects of class |
method |
A character string denoting the method used for the model comparison. Can be |
ariv |
A character string denoting how the ARIV is calculated. Can be |
data |
(optional) A list of imputed data sets (see 'Details'). |
This function performs likelihood-based comparisons between multiple statistical models fitted with with.mitml.list
.
If possible, the models are compared using the statistic (Meng & Rubin, 1992).
If this method is unavailable, the
or
statistic is used instead (Chan & Meng, 2019; Li, Meng, Raghunathan, & Rubin, 1991).
This function is essentially a wrapper for testModels
with the advantage that several models can be compared simultaneously.
For a list of supported models and further options for more specific model comparisons, see testModels
.
The ariv
argument affects how the average relative increase in variance is calculated (see also testModels
).
Note that the method can fail if the data to which the model was fitted cannot be found.
In such a case, the
data
argument can be used to specify the list of imputed data sets directly (see also testModels
).
A list containing the results of each model comparison.
A print
method is used for more readable output.
Simon Grund
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79, 103-111.
Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the em algorithm. Journal of the American Statistical Association, 82, 97-105.
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86, 1065-1073.
require(lme4) data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # simple comparison (same as testModels) fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE)) fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE)) anova(fit1, fit0) ## Not run: # multiple comparisons fit2 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1 + ReadDis|ID), REML = FALSE)) anova(fit2, fit1, fit0) ## End(Not run)
require(lme4) data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # simple comparison (same as testModels) fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE)) fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE)) anova(fit1, fit0) ## Not run: # multiple comparisons fit2 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1 + ReadDis|ID), REML = FALSE)) anova(fit2, fit1, fit0) ## End(Not run)
mitml.list
This function adds a mitml.list
class attribute to a list of data frames. The resulting object can be used in further analyses.
as.mitml.list(x)
as.mitml.list(x)
x |
A list of data frames. |
The original list with an additional class attribute mitml.list
.
The list entries are converted into a data.frame
if necessary, in which case a note is printed.
Simon Grund
is.mitml.list
, long2mitml.list
# data frame with 'imputation' indicator dat <- data.frame(imputation = rep(1:10, each = 20), x = rnorm(200)) # split into a list and convert to 'mitml.list' l <- split(dat, dat$imputation) l <- as.mitml.list(l) is.mitml.list(l) # TRUE
# data frame with 'imputation' indicator dat <- data.frame(imputation = rep(1:10, each = 20), x = rnorm(200)) # split into a list and convert to 'mitml.list' l <- split(dat, dat$imputation) l <- as.mitml.list(l) is.mitml.list(l) # TRUE
These functions allow concatenating lists of imputed data sets by data set, row, or column.
## S3 method for class 'mitml.list' c(...) ## S3 method for class 'mitml.list' rbind(...) ## S3 method for class 'mitml.list' cbind(...)
## S3 method for class 'mitml.list' c(...) ## S3 method for class 'mitml.list' rbind(...) ## S3 method for class 'mitml.list' cbind(...)
... |
One or several lists of imputed data sets with class |
The c
, cbind
, and rbind
functions allow combining multiple lists of imputed data sets in different ways.
The c
method concatenates by data set (i.e., by appending additional data sets to the list), rbind
concatenates by row (i.e., appending additional rows to each data set), and cbind
concatenates by column (i.e., by appending additional columns to each data set).
These functions are intended for experienced users and should be used with caution. Appending rows or columns from multiple imputation procedures is usually unsafe unless in special applications (see 'Examples').
A list of imputed data sets with an additional class attribute mitml.list
.
Simon Grund
# Example 1: manual imputation by grouping variable data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp1 <- panImpute(subset(studentratings, FedState == "SH"), formula = fml, n.burn = 1000, n.iter = 100, m = 5) imp2 <- panImpute(subset(studentratings, FedState == "B"), formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist1 <- mitmlComplete(imp1) implist2 <- mitmlComplete(imp2) rbind(implist1, implist2) # Example 2: predicted values from linear model imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) pred <- with(implist1, predict(lm(ReadDis ~ ReadAchiev))) cbind(implist, pred.ReadDis = pred)
# Example 1: manual imputation by grouping variable data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp1 <- panImpute(subset(studentratings, FedState == "SH"), formula = fml, n.burn = 1000, n.iter = 100, m = 5) imp2 <- panImpute(subset(studentratings, FedState == "B"), formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist1 <- mitmlComplete(imp1) implist2 <- mitmlComplete(imp2) rbind(implist1, implist2) # Example 2: predicted values from linear model imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) pred <- with(implist1, predict(lm(ReadDis ~ ReadAchiev))) cbind(implist, pred.ReadDis = pred)
Calculates the mean of a given variable within each cluster, possibly conditioning on an additional grouping variable.
clusterMeans(x, cluster, adj = FALSE, group = NULL)
clusterMeans(x, cluster, adj = FALSE, group = NULL)
x |
A numeric vector for which cluster means should be calculated. Can also be supplied as a character string denoting a variable in the current environment (see 'Details'). |
cluster |
A numeric vector or a factor denoting the cluster membership of each unit in |
adj |
Logical flag indicating if person-adjusted group means should be calculated. If |
group |
(optional) A grouping factor or a variable that can be interpreted as such. If specified, cluster means are calculated separately within the sub-groups defined by |
This function calculates the mean of a variable within each level of a cluster variable.
Any NA
are omitted during calculation.
The three main arguments of the function can also be supplied as (single) character strings, denoting the name of the respective variables in the current environment.
This is especially useful for calculating several cluster means simultaneously, for example using within.mitml.list
(see 'Example 2' below).
A numeric vector with the same length as x
containing the cluster mean for all units.
Simon Grund, Alexander Robitzsch
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: single cluster means # calculate cluster means (for each data set) with(implist, clusterMeans(ReadAchiev, ID)) # ... person-adjusted cluster means with(implist, clusterMeans(ReadAchiev, ID, adj = TRUE)) # ... groupwise cluster means with(implist, clusterMeans(ReadAchiev, ID, group = Sex)) # * Example 2: automated cluster means using 'for' and 'assign' # calculate multiple cluster means within multiply imputed data sets within(implist,{ vars <- c("ReadAchiev", "MathAchiev", "CognAbility") for(i in vars) assign(paste(i, "Mean", sep = "."), clusterMeans(i, ID)) rm(i, vars) })
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: single cluster means # calculate cluster means (for each data set) with(implist, clusterMeans(ReadAchiev, ID)) # ... person-adjusted cluster means with(implist, clusterMeans(ReadAchiev, ID, adj = TRUE)) # ... groupwise cluster means with(implist, clusterMeans(ReadAchiev, ID, group = Sex)) # * Example 2: automated cluster means using 'for' and 'assign' # calculate multiple cluster means within multiply imputed data sets within(implist,{ vars <- c("ReadAchiev", "MathAchiev", "CognAbility") for(i in vars) assign(paste(i, "Mean", sep = "."), clusterMeans(i, ID)) rm(i, vars) })
Functions for extracting results and computing confidence intervals from the pooled parameter estimates computed with testEstimates
.
## S3 method for class 'mitml.testEstimates' coef(object, ...) ## S3 method for class 'mitml.testEstimates' vcov(object, ...) ## S3 method for class 'mitml.testEstimates' confint(object, parm, level = 0.95, ...)
## S3 method for class 'mitml.testEstimates' coef(object, ...) ## S3 method for class 'mitml.testEstimates' vcov(object, ...) ## S3 method for class 'mitml.testEstimates' confint(object, parm, level = 0.95, ...)
object |
An object of class |
parm |
(optional) A reference to the parameters for which to calculate confidence intervals. Can be a character or integer vector denoting names or position of parameters, respectively. If missing, all parameters are considered (the default). |
level |
The confidence level. Default is to |
... |
Not being used. |
These functions can be used to extract information and compute additional results from pooled parameter estimates.
The coef
and vcov
methods extract the pooled parameter estimates and their pooled variance-covariance matrix (with the squared standard errors in the diagonal).
The confint
method computes confidence intervals with the given confidence level for the pooled parameters on the basis of a -distribution, with estimates, standard errors, and degrees of freedom as returned by
testEstimates
.
For coef
: A vector containing the pooled parameter estimates
For vcov
: A matrix containing the pooled variance-covariance matrix of the parameter estimates.
For confint
: A matrix containing the lower and upper bounds of the confidence intervals.
Simon Grund
data(studentratings) fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 500, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit regression model fit <- with(implist, lm(ReadDis ~ 1 + ReadAchiev)) est <- testEstimates(fit) # extract parameter estimates and variance-covariance matrix coef(est) vcov(est) # compute confidence intervals confint(est) # ... with different confidence levels confint(est, level = 0.90) confint(est, level = 0.999)
data(studentratings) fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 500, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit regression model fit <- with(implist, lm(ReadDis ~ 1 + ReadAchiev)) est <- testEstimates(fit) # extract parameter estimates and variance-covariance matrix coef(est) vcov(est) # compute confidence intervals confint(est) # ... with different confidence levels confint(est, level = 0.90) confint(est, level = 0.999)
mitml.list
This function checks if its argument is a list of class mitml.list
.
is.mitml.list(x)
is.mitml.list(x)
x |
An R object. |
TRUE
or FALSE
.
A warning message is displayed if the contents of x
do not appear to be data frames.
Simon Grund
l <- list(data.frame(x = rnorm(20))) l <- as.mitml.list(l) is.mitml.list(l) # TRUE l <- as.list(1:10) is.mitml.list(l) # FALSE class(l) <- "mitml.list" is.mitml.list(l) # TRUE, with a warning
l <- list(data.frame(x = rnorm(20))) l <- as.mitml.list(l) is.mitml.list(l) # TRUE l <- as.list(1:10) is.mitml.list(l) # FALSE class(l) <- "mitml.list" is.mitml.list(l) # TRUE, with a warning
jomo
Performs single- and multilevel imputation for (mixed) continuous and categorical data using the jomo
package
Supports imputation of missing data at level 1 and 2 as well as imputation using random (residual) covariance matrices.
See 'Details' for further information.
jomoImpute(data, type, formula, random.L1 = c("none", "mean", "full"), n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE)
jomoImpute(data, type, formula, random.L1 = c("none", "mean", "full"), n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE)
data |
A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets. |
type |
An integer vector specifying the role of each variable in the imputation model or a list of two vectors specifying a two-level model (see 'Details'). |
formula |
A formula specifying the role of each variable in the imputation model or a list of two formulas specifying a two-level model. The basic model is constructed by |
random.L1 |
A character string denoting if the covariance matrix of residuals should vary across groups and how the values of these matrices are stored (see 'Details'). Can be |
n.burn |
The number of burn-in iterations before any imputations are drawn. Default is 5,000. |
n.iter |
The number of iterations between imputations. Default is 100. |
m |
The number of imputed data sets to generate. Default is 10. |
group |
(optional) A character string denoting the name of an additional grouping variable to be used with the |
prior |
(optional) A list with components |
seed |
(optional) An integer value initializing R's random number generator for reproducible results. Default is to use the global seed. |
save.pred |
(optional) Logical flag indicating if variables derived using |
keep.chains |
(optional) A character string denoting which chains of the MCMC algorithm to store. Can be |
silent |
(optional) Logical flag indicating if console output should be suppressed. Default is |
This function serves as an interface to the jomo
package and supports imputation of single-level and multilevel continuous and categorical data at both level 1 and 2 (see Carpenter & Kenward, 2013; Goldstein et al., 2009).
In order for categorical variables to be detected correctly, these must be formatted as a factor
variables (see 'Examples').
The imputation model can be specified using either the type
or the formula
argument.
The type
interface is designed to provide quick-and-easy imputations using jomo
. The type
argument must be an integer vector denoting the role of each variable in the imputation model:
1
: target variables containing missing data
2
: predictors with fixed effect on all targets (completely observed)
3
: predictors with random effect on all targets (completely observed)
-1
: grouping variable within which the imputation is run separately
-2
: cluster indicator variable
0
: variables not featured in the model
At least one target variable and, for multilevel imputation, the cluster indicator must be specified.
If the cluster indicator is omitted, single-level imputation will be performed.
The intercept is automatically included as both a fixed and (for multilevel models) a random effect.
If a variable of type -1
is found, then separate imputations are performed within each level of that variable.
The formula
argument is intended as a more flexible and feature-rich interface to jomo
.
Specifying the formula
argument is similar to specifying other formulae in R.
Given below is a list of operators that jomoImpute
currently understands:
~
: separates the target (left-hand) and predictor (right-hand) side of the model
+
: adds target or predictor variables to the model
*
: adds an interaction term of two or more predictors
|
: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID
)
I()
: defines functions to be interpreted by model.matrix
If the cluster indicator is omitted, single-level imputation will be performed.
For multilevel imputation, predictors are allowed to have fixed effects, random effects, or both on all target variables.
The intercept is automatically included as both a fixed and (for multilevel models) a random effect.
Both can be suppressed if needed (see panImpute
).
Note that, when specifying random effects other than the intercept, these will not be automatically added as fixed effects and must be included explicitly.
Any predictors defined by I()
will be used for imputation but not included in the data set unless save.pred = TRUE
.
If missing data occur at both level 1 and 2, the imputation model is specified as a list of two formula
s or type
s, respectively.
The first element of this list denotes the model specification for variables at level 1.
The second element denotes the model specification for variables at level 2.
Missing data are imputed jointly at both levels (see 'Examples', see also Carpenter and Kenward, 2013; Goldstein et al., 2009).
It is possible to model the covariance matrix of residuals at level 1 as random across clusters (Yucel, 2011; Carpenter & Kenward, 2013).
The random.L1
argument determines this behavior and how the values of these matrices are stored.
If set to "none"
, a common covariance matrix is assumed across groups (similar to panImpute
).
If set to "mean"
, the covariance matrices are random, but only the average covariance matrix is stored at each iteration.
If set to "full"
, the covariance matrices are random, and all variances and covariances from all clusters are stored.
In order to run separate imputations for each level of an additional grouping variable, the group
argument can be used.
The name of the grouping variable must be given as a character string (i.e., in quotation marks).
The default prior distribution for the covariance matrices in jomoImpute
are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale.
The prior
argument can be used to specify alternative prior distributions.
These must be supplied as a list containing the following components:
Binv
: scale matrix for the covariance matrix of residuals at level 1
Dinv
: scale matrix for the covariance matrix of random effects and residuals at level 2
a
: starting value for the degrees of freedom of random covariance matrices of residuals (only used with random.L1 = "mean"
or random.L1 = "full"
)
Note that jomo
does not allow for the degrees of freedom for the inverse-Wishart prior to be specified by the user.
These are always set to the lowest value possible (largest dispersion) or determined iteratively if the residuals at level 1 are modeled as random (see above).
For single-level imputation, only Binv
is relevant.
In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the keep.chains
argument (see also panImpute
).
This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.
An object of class mitml
, containing the following components:
data |
The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original order ( |
replacement.mat |
A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one one column for each imputed data set. |
index.mat |
A matrix containing the row and column index for each missing value. The index matrix is used to link the missing values in the data set with their corresponding rows in the replacement matrix. |
call |
The matched function call. |
model |
A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, at level 1 and level 2, respectively. |
random.L1 |
A character string denoting the handling of the (random) covariance matrix of residuals at level 1 (see 'Details'). |
prior |
The prior parameters used in the imputation model. |
iter |
A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets. |
par.burnin |
A multi-dimensional array containing the parameters of the imputation model from the burn-in phase. |
par.imputation |
A multi-dimensional array containing the parameters of the imputation model from the imputation phase. |
For objects of class mitml
, methods for the generic functions print
, summary
, and plot
are available to inspect the fitted imputation model.
mitmlComplete
is used for extracting the imputed data sets.
Simon Grund, Alexander Robitzsch, Oliver Luedtke
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Hoboken, NJ: Wiley.
Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173-197.
Yucel, R. M. (2011). Random covariances and mixed-effects models for imputing multivariate multilevel continuous data. Statistical Modelling, 11, 351-370.
panImpute
, mitmlComplete
, summary.mitml
, plot.mitml
# NOTE: The number of iterations in these examples is much lower than it # should be. This is done in order to comply with CRAN policies, and more # iterations are recommended for applications in practice! data(studentratings) data(leadership) # *** # for further examples, see "panImpute" # ?panImpute # *** ................................ # the 'type' interface # # * Example 1.1 (studentratings): 'ReadDis' and 'SES', predicted by 'ReadAchiev' # (random slope) type <- c(-2, 0, 0, 0, 0, 1, 3, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.2 (leadership): all variables (mixed continuous and categorical # data with missing values at level 1 and level 2) type.L1 <- c(-2, 1, 0, 1, 1) # imputation model at level 1 type.L2 <- c(-2, 0, 1, 0, 0) # imputation model at level 2 names(type.L1) <- names(type.L2) <- colnames(leadership) type <- list(type.L1, type.L2) type imp <- jomoImpute(leadership, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.3 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted # with empty model, groupwise for 'FedState' (single-level imputation) type <- c(0, -1, 0, 0, 0, 1, 1, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, group = "FedState", n.burn = 100, n.iter = 10, m = 5) # *** ................................ # the 'formula' interface # # * Example 2.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev' # (random slope) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.2 (studentratings): 'ReadDis' predicted by 'ReadAchiev' and the # the cluster mean of 'ReadAchiev' fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.3 (studentratings): 'ReadDis' predicted by 'ReadAchiev', groupwise # for 'FedState' fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100, n.iter = 10, m = 5) # * Example 2.4 (leadership): all variables (mixed continuous and categorical # data with missing values at level 1 and level 2) fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , COHES ~ 1 ) imp <- jomoImpute(leadership, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.5 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted # with empty model, groupwise for 'FedState' (single-level imputation) fml <- ReadDis + ReadAchiev + SES ~ 1 imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100, n.iter = 10, m = 5)
# NOTE: The number of iterations in these examples is much lower than it # should be. This is done in order to comply with CRAN policies, and more # iterations are recommended for applications in practice! data(studentratings) data(leadership) # *** # for further examples, see "panImpute" # ?panImpute # *** ................................ # the 'type' interface # # * Example 1.1 (studentratings): 'ReadDis' and 'SES', predicted by 'ReadAchiev' # (random slope) type <- c(-2, 0, 0, 0, 0, 1, 3, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.2 (leadership): all variables (mixed continuous and categorical # data with missing values at level 1 and level 2) type.L1 <- c(-2, 1, 0, 1, 1) # imputation model at level 1 type.L2 <- c(-2, 0, 1, 0, 0) # imputation model at level 2 names(type.L1) <- names(type.L2) <- colnames(leadership) type <- list(type.L1, type.L2) type imp <- jomoImpute(leadership, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.3 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted # with empty model, groupwise for 'FedState' (single-level imputation) type <- c(0, -1, 0, 0, 0, 1, 1, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, group = "FedState", n.burn = 100, n.iter = 10, m = 5) # *** ................................ # the 'formula' interface # # * Example 2.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev' # (random slope) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.2 (studentratings): 'ReadDis' predicted by 'ReadAchiev' and the # the cluster mean of 'ReadAchiev' fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.3 (studentratings): 'ReadDis' predicted by 'ReadAchiev', groupwise # for 'FedState' fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100, n.iter = 10, m = 5) # * Example 2.4 (leadership): all variables (mixed continuous and categorical # data with missing values at level 1 and level 2) fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , COHES ~ 1 ) imp <- jomoImpute(leadership, formula = fml, n.burn = 100, n.iter = 10, m = 5) # * Example 2.5 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted # with empty model, groupwise for 'FedState' (single-level imputation) fml <- ReadDis + ReadAchiev + SES ~ 1 imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100, n.iter = 10, m = 5)
Data set containing simulated data for employees nested within organizations, featuring employees' sex, ratings on individual justice orientation and ratings on job satisfaction.
The data set also includes scores for justice climate in each organization (defined at the level of organizations, level 2).
Different organizations are denoted by the variable id
.
The data were simulated based on the results by Liao and Rupp (2005), as well as the secondary analyses of the same data given in Mathieu, Aguinis, Culpepper, and Chen, (2012).
data(justice)
data(justice)
A data frame containing 1400 observations on 4 variables.
Liao, H., & Rupp, D. E. (2005). The impact of justice climate and justice orientation on work outcomes: A cross-level multifoci framework. Journal of Applied Psychology, 90, 242.256.
Mathieu, J. E., Aguinis, H., Culpepper, S. A., & Chen, G. (2012). Understanding and estimating the power to detect cross-level interaction effects in multilevel modeling. Journal of Applied Psychology, 97, 951-966.
Data set based on the data simulated by Paul D. Bliese as described in Klein et al. (2000) with slight modifications. The data set consists of 750 employees, nested within 50 work groups, and includes employees' ratings on negative leadership style, job satisfaction, and workload as well as a measure for each work group's cohesion.
The original data set is available in the multilevel
package and was modified by (a) transforming workload into a categorical variable, (b) transforming cohesion into a group-level variable, and (c) by inducing missing values.
data(leadership)
data(leadership)
A data frame containing 750 observations on 5 variables.
Bliese, P. D. (2013). multilevel: Multilevel functions (Version 2.5) [Computer software]. Retrieved from http://CRAN.R-project.org/package=multilevel
Klein, K. J., Bliese, P. D., Kozlowski, S. W. J., Dansereau, F., Gavin, M. B., Griffin, M. A., ... Bligh, M. C. (2000). Multilevel analytical techniques: Commonalities, differences, and continuing questions. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 512-553). San Francisco, CA: Jossey-Bass.
mitml.list
These functions convert data sets containing multiple imputations in long format to objects of class mitml.list
. The resulting object can be used in further analyses.
long2mitml.list(x, split, exclude = NULL) jomo2mitml.list(x)
long2mitml.list(x, split, exclude = NULL) jomo2mitml.list(x)
x |
A data frame in long format containing multiple imputations (see 'Details'). |
split |
A character string denoting the column in |
exclude |
A vector denoting the values of |
The function long2mitml.list
converts data frames from the long format to mitml.list
(i.e., a list of imputed data sets).
In long format, all imputations are contained in a single data frame, where different imputations are denoted by split
.
This function splits the data frame into a list of imputed data sets according to split
, excluding the values specified by exclude
(see the 'Examples').
The jomo2mitml.list
function is a special case of long2mitml.list
which converts imputations that have been generated with jomo
(see the jomo
package)).
A list of imputed data sets with class mitml.list
.
Simon Grund
data(studentratings) require(jomo) # impute data using jomo (native functions) clus <- studentratings[, "ID"] Y <- studentratings[, c("ReadAchiev", "ReadDis")] imp <- jomo(Y = Y, clus = clus, nburn = 1000, nbetween = 100, nimp = 5) # split imputations impList <- long2mitml.list(imp, split = "Imputation", exclude = 0) impList <- jomo2mitml.list(imp)
data(studentratings) require(jomo) # impute data using jomo (native functions) clus <- studentratings[, "ID"] Y <- studentratings[, c("ReadAchiev", "ReadDis")] imp <- jomo(Y = Y, clus = clus, nburn = 1000, nbetween = 100, nimp = 5) # split imputations impList <- long2mitml.list(imp, split = "Imputation", exclude = 0) impList <- jomo2mitml.list(imp)
mids
to mitml.list
This function converts a mids
class object (as produced by the mice
package) to mitml.list
. The resulting object may be used in further analyses.
mids2mitml.list(x)
mids2mitml.list(x)
x |
An object of class |
A list of imputed data sets with class mitml.list
.
Simon Grund
data(studentratings) # imputation using mice require(mice) imp <- mice(studentratings) implist <- mids2mitml.list(imp)
data(studentratings) # imputation using mice require(mice) imp <- mice(studentratings) implist <- mids2mitml.list(imp)
mitml.list
to mids
This function converts a mitml.list
class object to mids
(as used in the mice
package).
mitml.list2mids(x, data, fill = FALSE, where = NULL)
mitml.list2mids(x, data, fill = FALSE, where = NULL)
x |
A list of imputed data sets with class |
data |
A data frame containing to original (incomplete) data (see 'Details'). |
fill |
A logical flag indicating whether variables in the imputed data that are not in the original data should be added and filled with |
where |
(optional) A data frame or matrix of logicals indicating the location of missing values (see 'Details'). |
This function converts objects of class mitml.list
into mids
objects (as used in the mice
package).
The conversion requires a list of imputed data sets and the original (incomplete) data set.
If the imputed data sets have been appended with new variables (e.g., by within.mitml.list
), the new variables can be added to the original data set by setting fill = TRUE
.
This function is essentially a wrapper around as.mids
that sets the case and imputation identifiers automatically and and passes the where
argument as is (see also the documentation of as.mids
).
An object of class mids
.
Simon Grund
mitmlComplete
, mids2mitml.list
, within.mitml.list
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: simple conversion # convert to 'mids' impmids <- mitml.list2mids(implist, data = studentratings) # * Example 2: conversion with additional variables # compute new variables implist <- within(implist, { M.ReadAchiev <- clusterMeans(ReadAchiev, ID) C.ReadAchiev <- ReadAchiev - M.ReadAchiev }) # convert to 'mids' impmids <- mitml.list2mids(implist, data = studentratings, fill = TRUE)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: simple conversion # convert to 'mids' impmids <- mitml.list2mids(implist, data = studentratings) # * Example 2: conversion with additional variables # compute new variables implist <- within(implist, { M.ReadAchiev <- clusterMeans(ReadAchiev, ID) C.ReadAchiev <- ReadAchiev - M.ReadAchiev }) # convert to 'mids' impmids <- mitml.list2mids(implist, data = studentratings, fill = TRUE)
This function extracts imputed data sets from mitml
class objects as produced by panImpute
and jomoImpute
.
mitmlComplete(x, print = "all", force.list = FALSE)
mitmlComplete(x, print = "all", force.list = FALSE)
x |
An object of class |
print |
Either an integer vector, |
force.list |
(optional) Logical flag indicating if single data sets should be enclosed in a list. Default is |
Usually a list of imputed data with class mitml.list
If only one data set is extracted: a data frame unless force.list = TRUE
.
Simon Grund
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # extract original (incomplete) data set mitmlComplete(imp, print = 0) # extract first imputed data set (returned as mitml.list) mitmlComplete(imp, print = 1, force.list = TRUE) # extract all imputed data sets at once implist <- mitmlComplete(imp, print = "all") ## Not run: # ... alternatives with same results implist <- mitmlComplete(imp, print = 1:5) implist <- mitmlComplete(imp, print = "list") ## End(Not run)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # extract original (incomplete) data set mitmlComplete(imp, print = 0) # extract first imputed data set (returned as mitml.list) mitmlComplete(imp, print = 1, force.list = TRUE) # extract all imputed data sets at once implist <- mitmlComplete(imp, print = "all") ## Not run: # ... alternatives with same results implist <- mitmlComplete(imp, print = 1:5) implist <- mitmlComplete(imp, print = "list") ## End(Not run)
Calculates several measures for the proportion of explained variance in a fitted linear mixed-effects or multilevel model (or a list of fitted models).
multilevelR2(model, print = c("RB1", "RB2", "SB", "MVP"))
multilevelR2(model, print = c("RB1", "RB2", "SB", "MVP"))
model |
Either a fitted linear mixed-effects model as produced by |
print |
A character vector denoting which measures should be calculated (see details). Default is to printing all measures. |
This function calculates several measures of explained variance () for linear-mixed effects models.
It can be used with a single model, as produced by the packages
lme4
or nlme
, or a list of fitted models produced by with.mitml.list
.
In the latter case, the measures are calculated separately for each imputed data set and then averaged across data sets.
Different measures can be requested using the
print
argument.
Specifying RB1
and RB2
returns the explained variance at level 1 and level 2, respectively, according to Raudenbush and Bryk (2002, pp. 74 and 79).
Specifying SB
returns the total variance explained according to Snijders and Bosker (2012, p. 112).
Specifying MVP
returns the total variance explained based on “multilevel variance partitioning” as proposed by LaHuis, Hartman, Hakoyama, and Clark (2014).
A numeric vector containing the measures requested in
print
.
Calculating measures is currently only supported for two-level models with a single cluster variable.
Simon Grund
LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance measures for multilevel models. Organizational Research Methods, 17, 433-451.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.
require(lme4) data(studentratings) fml <- MathAchiev + ReadAchiev + CognAbility ~ 1 + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) fit <- with(implist, lmer(MathAchiev ~ 1 + CognAbility + (1|ID))) multilevelR2(fit)
require(lme4) data(studentratings) fml <- MathAchiev + ReadAchiev + CognAbility ~ 1 + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) fit <- with(implist, lmer(MathAchiev ~ 1 + CognAbility + (1|ID))) multilevelR2(fit)
pan
Performs multiple imputation of multilevel data using the pan
package (Schafer & Yucel, 2002).
Supports imputation of continuous multilevel data with missing values at level 1.
See 'Details' for further information.
panImpute(data, type, formula, n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE)
panImpute(data, type, formula, n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE)
data |
A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets. |
type |
An integer vector specifying the role of each variable in the imputation model (see 'Details'). |
formula |
A formula specifying the role of each variable in the imputation model. The basic model is constructed by |
n.burn |
The number of burn-in iterations before any imputations are drawn. Default is 5,000. |
n.iter |
The number of iterations between imputations. Default is 100. |
m |
The number of imputed data sets to generate. |
group |
(optional) A character string denoting the name of an additional grouping variable to be used with the |
prior |
(optional) A list with components |
seed |
(optional) An integer value initializing |
save.pred |
(optional) Logical flag indicating if variables derived using |
keep.chains |
(optional) A character string denoting which chains of the MCMC algorithm to store. Can be |
silent |
(optional) Logical flag indicating if console output should be suppressed. Default is to |
This function serves as an interface to the pan
package and supports imputation of continuous multilevel data at level 1 (Schafer & Yucel, 2002).
The imputation model can be specified using either the type
or the formula
argument.
The type
interface is designed to provide quick-and-easy imputations using pan
.
The type
argument must be an integer vector denoting the role of each variable in the imputation model:
1
: target variables containing missing data
2
: predictors with fixed effect on all targets (completely observed)
3
: predictors with random effect on all targets (completely observed)
-1
: grouping variable within which the imputation is run separately
-2
: cluster indicator variable
0
: variables not featured in the model
At least one target variable and the cluster indicator must be specified.
The intercept is automatically included as both a fixed and a random effect.
If a variable of type -1
is found, then separate imputations are performed within each level of that variable.
The formula
argument is intended as a more flexible and feature-rich interface to pan
.
Specifying the formula
argument is similar to specifying other formulae in R.
Given below is a list of operators that panImpute
understands:
~
: separates the target (left-hand) and predictor (right-hand) side of the model
+
: adds target or predictor variables to the model
*
: adds an interaction term of two or more predictors
|
: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID
)
I()
: defines functions to be interpreted by model.matrix
Predictors are allowed to have fixed effects, random effects, or both on all target variables.
The intercept is automatically included as both a fixed and a random effect, but it can be suppressed if needed (see 'Examples').
Note that, when specifying random effects other than the intercept, these will not be automatically added as fixed effects and must be included explicitly.
Any predictors defined by I()
will be used for imputation but not included in the data set unless save.pred = TRUE
.
In order to run separate imputations for each level of an additional grouping variable, the group
argument can be used.
The name of the grouping variable must be given as a character string (i.e., in quotation marks).
The default prior distributions for the covariance matrices in panImpute
are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale.
The prior
argument can be used to specify alternative prior distributions.
These must be supplied as a list containing the following components:
a
: degrees of freedom for the covariance matrix of residuals
Binv
: scale matrix for the covariance matrix of residuals
c
: degrees of freedom for the covariance matrix of random effects
Dinv
: scale matrix for the covariance matrix of random effects
A sensible choice for a diffuse non-default prior is to set the degrees of freedom to the lowest value possible, and the scale matrices according to a prior guess of the corresponding covariance matrices (see Schafer & Yucel, 2002).
In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the keep.chains
argument.
If set to "full"
(the default), all chains are saved.
If set to "diagonal"
, only chains pertaining to fixed effects and the diagonal entries of the covariance matrices are saved.
This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.
An object of class mitml
, containing the following components:
data |
The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original row order ( |
replacement.mat |
A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one one column for each imputed data set. |
index.mat |
A matrix containing the row and column index for each missing value. The index matrix is used to link the missing values in the data set with their corresponding rows in the replacement matrix. |
call |
The matched function call. |
model |
A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, respectively. |
random.L1 |
A character string denoting the handling of random residual covariance matrices (not used here; see |
prior |
The prior parameters used in the imputation model. |
iter |
A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets. |
par.burnin |
A multi-dimensional array containing the parameters of the imputation model from the burn-in phase. |
par.imputation |
A multi-dimensional array containing the parameters of the imputation model from the imputation phase. |
For objects of class mitml
, methods for the generic functions print
, summary
, and plot
are available to inspect the fitted imputation model.
mitmlComplete
is used for extracting the imputed data sets.
Simon Grund, Alexander Robitzsch, Oliver Luedtke
Schafer, J. L., and Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437-457.
jomoImpute
, mitmlComplete
, summary.mitml
, plot.mitml
# NOTE: The number of iterations in these examples is much lower than it # should be! This is done in order to comply with CRAN policies, and more # iterations are recommended for applications in practice! data(studentratings) # *** ................................ # the 'type' interface # # * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and # 'CognAbility', with random slope for 'ReadAchiev' type <- c(-2, 0, 0, 0, 0, 0, 3, 1, 2, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState', # and predicted by 'ReadAchiev' type <- c(-2, -1, 0, 0, 0, 0, 2, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # *** ................................ # the 'formula' interface # # * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random intercept) fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept) fml <- ReadDis ~ 0 + ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random slope) fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev', # groupwise for 'FedState' fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, group = "FedState", n.burn = 1000, n.iter = 100, m = 5) # * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev' # including the cluster mean of 'ReadAchiev' as an additional predictor fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... using 'save.pred' to save the calculated cluster means in the data set fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5, save.pred = TRUE) head(mitmlComplete(imp, print = 1))
# NOTE: The number of iterations in these examples is much lower than it # should be! This is done in order to comply with CRAN policies, and more # iterations are recommended for applications in practice! data(studentratings) # *** ................................ # the 'type' interface # # * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and # 'CognAbility', with random slope for 'ReadAchiev' type <- c(-2, 0, 0, 0, 0, 0, 3, 1, 2, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState', # and predicted by 'ReadAchiev' type <- c(-2, -1, 0, 0, 0, 0, 2, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # *** ................................ # the 'formula' interface # # * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random intercept) fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept) fml <- ReadDis ~ 0 + ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random slope) fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev', # groupwise for 'FedState' fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, group = "FedState", n.burn = 1000, n.iter = 100, m = 5) # * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev' # including the cluster mean of 'ReadAchiev' as an additional predictor fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... using 'save.pred' to save the calculated cluster means in the data set fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5, save.pred = TRUE) head(mitmlComplete(imp, print = 1))
Generates diagnostic plots for assessing the convergence and autocorrelation behavior of pan
's and jomo
's MCMC algorithms.
## S3 method for class 'mitml' plot(x, print = c("beta", "beta2", "psi", "sigma"), pos = NULL, group = "all", trace = c("imputation", "burnin", "all"), thin = 1, smooth = 3, n.Rhat = 3, export = c("none", "png", "pdf"), dev.args = list(), ...)
## S3 method for class 'mitml' plot(x, print = c("beta", "beta2", "psi", "sigma"), pos = NULL, group = "all", trace = c("imputation", "burnin", "all"), thin = 1, smooth = 3, n.Rhat = 3, export = c("none", "png", "pdf"), dev.args = list(), ...)
x |
An object of class |
print |
A character vector containing one or several of |
pos |
Either |
group |
Either |
trace |
One of |
thin |
An integer denoting the thinning factor that is applied before plotting. Default is |
smooth |
A numeric value denoting the smoothing factor for the trend line in trace plots. Higher values correspond to less smoothing. Default is |
n.Rhat |
An integer denoting the number of segments of each chain used for calculating the potential scale reduction factor. Default is |
export |
(optional) A character string specifying if plots should be exported to a file. If |
dev.args |
(optional) A named list containing additional arguments that are passed to the graphics device. |
... |
Parameters passed to the plotting functions. |
The plot
method generates a series of plots for the parameters of the imputation model which can be used for diagnostic purposes.
In addition, a short summary of the parameter chain is displayed.
Setting print
to "beta"
, "beta2"
, "psi"
and "sigma"
will plot the fixed effects, the variances and covariances of the random effects, and the variances and covariances of the residuals, respectively.
In this context, "beta2"
refers to the fixed effects for target variables at level 2 and is only used when a two-part model was specified in (jomoImpute
).
Each plotting window contains a trace plot (upper left), an autocorrelation plot (lower left), a kernel density approximation of the posterior distribution (upper right), and a posterior summary (lower right).
The summary includes the following quantities:
EAP
:Expected value a posteriori (i.e., the mean of the parameter chain)
MAP
:Mode a posteriori (i.e., the mode of the parameter chain)
SD
:Standard deviation of the parameter chain
2.5%
:The 2.5% quantile of parameter values
97.5%
:The 97.5% quantile of parameter values
Rhat
:Estimated potential scale reduction factor ()
ACF-k
:Smoothed autocorrelation at lag , where
is the number of iterations between imputations (see
summary.mitml
)
The trace
and smooth
arguments can be used to influence how the trace plot is drawn and what part of the chain should be used for it.
The thin
argument can be used for thinning the chain before plotting, in which case the number of data points is reduced in the trace plot, and the autocorrelation is calculated up to lag (see above).
The
n.Rhat
argument controls the number of segments that are used for calculating the potential scale reduction factor () in each plot (see
summary.mitml
).
Further aguments to the graphics device are supplied using the dev.args
argument.
The plot
function computes and displays diagnostic information primarily for the imputation phase (i.e., for iterations after burn-in).
This is the default in the plot
function and the recommended method for most users.
If trace = "all"
, the full chain is displayed with emphasis on the imputation phase, and the posterior summary is calculated based on only the iterations after burn-in (as recommended).
If trace = "burnin"
, the posterior summary and the trace plots are calculated based on only the burn-on interations, which is generally not sufficient to establish convergence and should be used with caution.
None (invisible NULL
).
The plots are presented on-screen one at a time. To proceed with the next plot, the user may left-click in the plotting window or press the "enter" key while in the R console, depending on the operating system. No plots are displayed when exporting to file.
Simon Grund
panImpute
, jomoImpute
, summary.mitml
## Not run: data(studentratings) # * Example 1: simple imputation fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # plot all parameters (default) plot(imp) # plot fixed effects only plot(imp, print = "beta") # export plots to file (using pdf device) plot(imp, export = "pdf", dev.args = list(width = 9, height = 4, pointsize = 12)) # * Example 2: groupwise imputation fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, group = FedState, n.burn = 1000, n.iter = 100, m = 5) # plot fixed effects for all groups (default for 'group') plot(imp, print = "beta", group = "all") # plot fixed effects for first group only plot(imp, print = "beta", group = 1) ## End(Not run)
## Not run: data(studentratings) # * Example 1: simple imputation fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # plot all parameters (default) plot(imp) # plot fixed effects only plot(imp, print = "beta") # export plots to file (using pdf device) plot(imp, export = "pdf", dev.args = list(width = 9, height = 4, pointsize = 12)) # * Example 2: groupwise imputation fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, group = FedState, n.burn = 1000, n.iter = 100, m = 5) # plot fixed effects for all groups (default for 'group') plot(imp, print = "beta", group = "all") # plot fixed effects for first group only plot(imp, print = "beta", group = 1) ## End(Not run)
mitml
objects from fileThis function loads mitml
class objects from R binary formats (similar to ?load
), usually produced by write.mitml
.
read.mitml(filename)
read.mitml(filename)
filename |
Name of the file to read, to be specified with file extension (e.g., |
An object of class mitml
.
Simon Grund
panImpute
, jomoImpute
, write.mitml
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write 'mitml' object write.mitml(imp, filename = "imputation.R") # read previously saved 'mitml' object old.imp <- read.mitml("imputation.R") class(old.imp) old.imp ## End(Not run)
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write 'mitml' object write.mitml(imp, filename = "imputation.R") # read previously saved 'mitml' object old.imp <- read.mitml("imputation.R") class(old.imp) old.imp ## End(Not run)
Sorts a list of multiply imputed data sets according to an R expression.
## S3 method for class 'mitml.list' sort(x, decreasing = FALSE, by, ...)
## S3 method for class 'mitml.list' sort(x, decreasing = FALSE, by, ...)
x |
A list of imputed data sets with class |
decreasing |
Logical flag indicating if data sets should be sorted in decreasing (i.e., reversed) order. Default is ' |
by |
An R expression or a list of multiple expressions by which to sort the imputed data sets (see 'Examples'). |
... |
Further arguments to ' |
This function sorts a list of imputed data sets according to the R expression given in the by
argument.
The function is based on order
and works in a similar manner.
Note that sorting is performed individually for each data set.
For this reason, the order of cases may differ across data sets if the variables used for sorting contain different values.
A list of imputed data sets with class mitml.list
.
Simon Grund
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: sort by ID sort(implist, by = ID) # * Example 2: sort by combination of variables sort(implist, by = list(FedState, ID, -SES))
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: sort by ID sort(implist, by = ID) # * Example 2: sort by combination of variables sort(implist, by = list(FedState, ID, -SES))
Contains simulated data for students nested within schools, featuring students' ratings of their teachers' behavior (i.e., disciplinary problems in mathematics and reading class) and their general learning environment (school climate) as well as mathematics and reading achievement scores, and scores for socio-economic status and cognitive ability.
In addition, the data set features the ID of 50 different schools (i.e., clusters), the biological sex of all students, and a broad, additional grouping factor. Different amounts of missing data have been inserted into the data set in a completely random fashion.
data(studentratings)
data(studentratings)
A data frame containing 750 observations on 10 variables.
Creates data subsets for a list of multiply imputed data sets.
## S3 method for class 'mitml.list' subset(x, subset, select, ...)
## S3 method for class 'mitml.list' subset(x, subset, select, ...)
x |
A list of imputed data sets with class |
subset |
An R expression by which to subset each data set. |
select |
An R expression by which to select columns. |
... |
Not used. |
This function can be used to create subsets and select variables for a list of multiply imputed data sets according to the R expressions given in the subset
and select
arguments.
The function is based on the subset
function for regular data sets and works in a similar manner.
Note that subsetting is performed individually for each data set.
For this reason, the cases included may differ across data sets if the variables used for subsetting contain different values.
A list of imputed data sets with class mitml.list
.
Simon Grund
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: subset by SES, select variables by name subset(implist, SES < 25, select = c(ID, FedState, Sex, SES, ReadAchiev, ReadDis)) # * Example 2: subset by FedState, select variables by column number subset(implist, FedState == "SH", select = -c(6:7, 9:10)) ## Not run: # * Example 3: subset by ID and Sex subset(implist, ID # * Example 4: select variables by name range subset(implist, select = ID:Sex) ## End(Not run)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: subset by SES, select variables by name subset(implist, SES < 25, select = c(ID, FedState, Sex, SES, ReadAchiev, ReadDis)) # * Example 2: subset by FedState, select variables by column number subset(implist, FedState == "SH", select = -c(6:7, 9:10)) ## Not run: # * Example 3: subset by ID and Sex subset(implist, ID # * Example 4: select variables by name range subset(implist, select = ID:Sex) ## End(Not run)
Provides summary statistics and additional information on imputations in objects of class mitml
.
## S3 method for class 'mitml' summary(object, n.Rhat = 3, goodness.of.appr = FALSE, autocorrelation = FALSE, ...)
## S3 method for class 'mitml' summary(object, n.Rhat = 3, goodness.of.appr = FALSE, autocorrelation = FALSE, ...)
object |
An object of class |
n.Rhat |
(optional) An integer denoting the number of segments used for calculating the potential scale reduction factor. Default is |
goodness.of.appr |
(optional) A logical flag indicating if the goodness of approximation should be printed. Default is |
autocorrelation |
(optional) A logical flag indicating if the autocorrelation should be printed. Default is |
... |
Not used. |
The summary
method calculates summary statistics for objects of class mitml
as produced by panImpute
or jomoImpute
.
The output includes the potential scale reduction factor (PSRF, or ) and (optionally) the goodness of approximation and autocorrelation.
The PSRF is calculated for each parameter of the imputation model and can be used as a convergence diagnostic (Gelman and Rubin, 1992).
Calculation of the PSRFs can be suppressed by setting n.Rhat = NULL
.
The PSRFs are not computed from different chains but by dividing each chain from the imputation phase into a number of segments as denoted by n.Rhat
.
This is slightly different from the original method proposed by Gelman and Rubin.
The goodness of approximation measure indicates what proportion of the posterior standard deviation is due to simulation error.
This is useful for assessing the accuracy of the posterior summaries (e.g., the EAP).
The autocorrelation includes estimates of the autocorrelation in the chains at lag 1 (i.e., for consecutive draws) and for lags and
, where
is the number of iterations between imputations.
For lag
and
, the autocorrelation is slightly smoothed to reduce the influence of noise on the estimates (see
plot.mitml
).
An object of class summary.mitml
.
A print method is used for more readable output.
Simon Grund
Gelman, A., and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-472.
Hoff, P. D. (2009). A first course in Bayesian statistical methods. New York, NY: Springer.
panImpute
, jomoImpute
, plot.mitml
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # print summary summary(imp)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # print summary summary(imp)
Performs hypothesis tests for arbitrary functions of the model parameters using the Delta method.
testConstraints(model, qhat, uhat, constraints, method = c("D1", "D2"), ariv = c("default", "positive"), df.com = NULL)
testConstraints(model, qhat, uhat, constraints, method = c("D1", "D2"), ariv = c("default", "positive"), df.com = NULL)
model |
A list of fitted statistical models as produced by |
qhat |
A matrix or list containing the point estimates of the parameters for each imputed data set (see 'Details'). |
uhat |
An array or list containing the variance-covariance matrix of the parameters for each imputed data set (see 'Details'). |
constraints |
A character vector specifying constraints or functions of the vector of model parameters to be tested (see 'Details'). |
method |
A character string denoting the method by which the test is performed. Can be |
ariv |
A character string denoting how the ARIV is calculated. Can be |
df.com |
(optional) A single number or a numeric vector denoting the complete-data degrees of freedom for the hypothesis test (see 'Details'). Only used if |
This function performs tests of arbitrary functions (or constraints) of the model parameters using similar methods as testModels
.
The function relies on the Delta method (e.g., Casella & Berger, 2002) for testing functions of the parameters and assumes that their sampling distribution is approximately normal.
The parameters can either be extracted automatically from the fitted statistical models (model
) or provided manually as matrices, arrays, or lists (qhat
and uhat
, see 'Examples').
Constraints and other functions of the model parameters are specified in the constraints
argument.
The constraints must be supplied as a character vector, where each string denotes a function or a constraint to be tested (see 'Examples').
The Wald-like tests that are carried out by testConstraints
are pooled across the imputed data sets with the (Li, Raghunathan & Rubin, 1991) or
(Li, Meng, Raghunathan & Rubin, 1991) method, where
operates on the constrained point and variance estimates, and
operates on the Wald-statistics (for additional details, see
testModels
).
The pooled estimates and standard errors reported in the output are always based on .
For , the complete-data degrees of freedom can be adjusted for smaller samples by specifying
df.com
(see testModels
).
This function supports general statistical models that define coef
and vcov
methods (e.g., lm
, glm
, lavaan
and others) as well as multilevel models estimated with lme4
or nlme
and GEEs estimated with geepack
.
The arguments qhat
and uhat
provide a general method for pooling parameter estimates regardless of model type (see 'Examples').
Support for further models may be added in future releases.
The ariv
argument determines how the average relative increase in variance (ARIV) is calculated (see testModels
).
If ariv = "default"
, the default estimators are used.
If ariv = "positive"
, the default estimators are used but constrained to take on strictly positive values.
A list containing the results of the model comparison.
A print
method is used for more readable output.
Simon Grund
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd. Ed.). Pacific Grove, CA: Duxbury.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1, 65-92.
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86, 1065-1073.
data(studentratings) fml <- MathDis + ReadDis + SchClimate ~ (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit simple regression model fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis)) # apply Rubin's rules testEstimates(fit.lm) # * Example 1: test 'identity' function of two parameters (automatic) # test equivalent to model comparison with a restricted model (without 'ReadDis' # and 'MathDis') cons <- c("ReadDis", "MathDis") testConstraints(fit.lm, constraints = cons) # ... adjusting for finite samples testConstraints(fit.lm, constraints = cons, df.com = 749) # ... using D2 testConstraints(fit.lm, constraints = cons, method = "D2") # * Example 2: test for equality of two parameters # tests the hypothesis that the coefficients pertaining to 'ReadDis' and 'MathDis' # are equal (ReadDis = MathDis) cons <- c("ReadDis-MathDis") testConstraints(fit.lm, constraints = cons) # * Example 3: test against a fixed value # tests the hypothesis that the coefficient of "ReadDis" is equal to one # (i.e., 'ReadDis' - 1 == 0) cons <- c("ReadDis-1") testConstraints(fit.lm, constraints = cons) # * Example 4: test 'identity' function of two parameters (qhat, uhat) fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis)) qhat <- sapply(fit.lm, coef) uhat <- sapply(fit.lm, function(x) vcov(x), simplify = "array") cons <- c("ReadDis", "MathDis") testConstraints(qhat = qhat, uhat = uhat, constraints = cons)
data(studentratings) fml <- MathDis + ReadDis + SchClimate ~ (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit simple regression model fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis)) # apply Rubin's rules testEstimates(fit.lm) # * Example 1: test 'identity' function of two parameters (automatic) # test equivalent to model comparison with a restricted model (without 'ReadDis' # and 'MathDis') cons <- c("ReadDis", "MathDis") testConstraints(fit.lm, constraints = cons) # ... adjusting for finite samples testConstraints(fit.lm, constraints = cons, df.com = 749) # ... using D2 testConstraints(fit.lm, constraints = cons, method = "D2") # * Example 2: test for equality of two parameters # tests the hypothesis that the coefficients pertaining to 'ReadDis' and 'MathDis' # are equal (ReadDis = MathDis) cons <- c("ReadDis-MathDis") testConstraints(fit.lm, constraints = cons) # * Example 3: test against a fixed value # tests the hypothesis that the coefficient of "ReadDis" is equal to one # (i.e., 'ReadDis' - 1 == 0) cons <- c("ReadDis-1") testConstraints(fit.lm, constraints = cons) # * Example 4: test 'identity' function of two parameters (qhat, uhat) fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis)) qhat <- sapply(fit.lm, coef) uhat <- sapply(fit.lm, function(x) vcov(x), simplify = "array") cons <- c("ReadDis", "MathDis") testConstraints(qhat = qhat, uhat = uhat, constraints = cons)
Computes final parameter estimates and inferences from multiply imputed data sets.
testEstimates(model, qhat, uhat, extra.pars = FALSE, df.com = NULL, ...)
testEstimates(model, qhat, uhat, extra.pars = FALSE, df.com = NULL, ...)
model |
A list of fitted statistical models as produced by |
qhat |
A matrix or list containing the point estimates of the parameters for each imputed data set (see 'Details'). |
uhat |
(optional) An array, matrix, or list containing the variance estimates (i.e., squared standard errors) of the parameters for each imputed data set (see 'Details'). |
extra.pars |
A logical flag indicating if estimates of additional parameters (e.g., variance components) should be calculated. Default is |
df.com |
(optional) A numeric vector denoting the complete-data degrees of freedom for the hypothesis tests (see 'Details'). |
... |
Not used. |
This function calculates pooled parameter estimates and inferences as suggested by Rubin (1987, "Rubin's rules") for each parameter of the fitted model.
The parameters can either be extracted automatically from the fitted statistical models (model
) or provided manually as matrices, arrays, or lists (qhat
and uhat
, see 'Examples').
Rubin's original method assumes that the complete-data degrees of freedom are infinite, which is reasonable in larger samples.
Alternatively, the degrees of freedom can be adjusted for smaller samples by specifying df.com
(Barnard & Rubin, 1999).
The df.com
argument can either be a single number if the degrees of freedom are equal for all parameters being tested, or a numeric vector with one element per parameter.
Using the extra.pars
argument, pooled estimates for additional parameters can be requested (e.g., variance components).
This option is available for a number of models but may not provide estimates for all parameters in all model types.
In such a case, users may extract the estimates of additional parameters by hand and pool them with the qhat
argument (see 'Examples').
No inferences are calculated for pooled additional parameters.
Currently, the procedure supports automatic extraction of model parameters from models that define coef
and vcov
methods (e.g., lm
, glm
, lavaan
and others) as well as multilevel models estimated with lme4
or nlme
and GEEs estimated with geepack
.
The arguments qhat
and uhat
provide a general method for pooling parameter estimates regardless of model type (see 'Examples').
Support for further models may be added in future releases.
A list containing the pooled parameter and inferences.
A print
method is used for more readable output.
Simon Grund
Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.
with.mitml.list
, confint.mitml.testEstimates
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit multilevel model using lme4 require(lme4) fit.lmer <- with(implist, lmer(SES ~ (1|ID))) # * Example 1: pool estimates of fitted models (automatic) # pooled estimates and inferences separately for each parameter (Rubin's rules) testEstimates(fit.lmer) # ... adjusted df for finite samples testEstimates(fit.lmer, df.com = 49) # ... with additional table for variance components and ICCs testEstimates(fit.lmer, extra.pars = TRUE) # * Example 2: pool estimates using matrices or lists (qhat, uhat) fit.lmer <- with(implist, lmer(SES ~ ReadAchiev + (1|ID))) qhat <- sapply(fit.lmer, fixef) uhat <- sapply(fit.lmer, function(x) diag(vcov(x))) testEstimates(qhat = qhat) testEstimates(qhat = qhat, uhat = uhat)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # fit multilevel model using lme4 require(lme4) fit.lmer <- with(implist, lmer(SES ~ (1|ID))) # * Example 1: pool estimates of fitted models (automatic) # pooled estimates and inferences separately for each parameter (Rubin's rules) testEstimates(fit.lmer) # ... adjusted df for finite samples testEstimates(fit.lmer, df.com = 49) # ... with additional table for variance components and ICCs testEstimates(fit.lmer, extra.pars = TRUE) # * Example 2: pool estimates using matrices or lists (qhat, uhat) fit.lmer <- with(implist, lmer(SES ~ ReadAchiev + (1|ID))) qhat <- sapply(fit.lmer, fixef) uhat <- sapply(fit.lmer, function(x) diag(vcov(x))) testEstimates(qhat = qhat) testEstimates(qhat = qhat, uhat = uhat)
Performs multi-parameter hypothesis tests for a vector of statistical parameters and compares nested statistical models obtained from multiply imputed data sets.
testModels(model, null.model, method = c("D1", "D2", "D3", "D4"), use = c("wald", "likelihood"), ariv = c("default", "positive", "robust"), df.com = NULL, data = NULL)
testModels(model, null.model, method = c("D1", "D2", "D3", "D4"), use = c("wald", "likelihood"), ariv = c("default", "positive", "robust"), df.com = NULL, data = NULL)
model |
A list of fitted statistical models (“full” model) as produced by |
null.model |
A list of fitted statistical models (“restricted” model) as produced by |
method |
A character string denoting the method by which the test is performed. Can be |
use |
A character string denoting Wald- or likelihood-based based tests. Can be either |
ariv |
A character string denoting how the ARIV is calculated. Can be |
df.com |
(optional) A number denoting the complete-data degrees of freedom for the hypothesis test. Only used if |
data |
(optional) A list of imputed data sets (see 'Details'). Only used if |
This function compares two nested statistical models fitted to multiply imputed data sets by pooling Wald-like or likelihood-ratio tests.
Pooling methods for Wald-like tests of multiple parameters were introduced by Rubin (1987) and further developed by Li, Raghunathan and Rubin (1991).
The pooled Wald test is referred to as and can be used by setting
method = "D1"
.
is the multi-parameter equivalent of
testEstimates
, that is, it tests multiple parameters simultaneously.
For , the complete-data degrees of freedom are assumed to be infinite, but they can be adjusted for smaller samples by supplying
df.com
(Reiter, 2007).
An alternative method for Wald-like hypothesis tests was suggested by Li, Meng, Raghunathan and Rubin (1991).
The procedure is called and can be used by setting
method = "D2"
.
calculates the Wald-test directly for each data set and then pools the resulting
values.
The source of these values is specified by the
use
argument.
If use = "wald"
(the default), then a Wald test similar to is performed.
If
use = "likelihood"
, then the two models are compared with a likelihood-ratio test instead.
Pooling methods for likelihood-ration tests were suggested by Meng and Rubin (1992).
This procedure is referred to as and can be used by setting
method = "D3"
.
compares the two models by pooling the likelihood-ratio test across multiply imputed data sets.
Finally, an improved method for pooling likelihood-ratio tests was recommended by Chan & Meng (2019).
This method is referred to as and can be used by setting
method = "D4"
.
also compares models by pooling the likelihood-ratio test but does so in a more general and efficient manner.
The function supports different classes of statistical models depending on which method
is chosen.
supports models that define
coef
and vcov
methods (or similar) for extracting the parameter estimates and their estimated covariance matrix.
can be used for the same models (if
use = "wald"
and models that define a logLik
method (if use = "likelihood"
).
supports linear models, linear mixed-effects models (see Laird, Lange, & Stram, 1987) with an arbitrary cluster structed if estimated with
lme4
or a single cluster if estimated by nlme
, and structural equation models estimated with lavaan
(requires ML estimator, see 'Note').
Finally, supports models that define a
logLik
method but can fail if the data to which the model was fitted cannot be found.
In such a case, users can provide the list of imputed data sets directly by specifying the data
argument or refit with the include.data
argument in with.mitml.list
.
Support for other statistical models may be added in future releases.
The ,
, and
methods support different estimators of the relative increase in variance (ARIV), which can be specified with the
ariv
argument.
If ariv = "default"
, the default estimators are used.
If ariv = "positive"
, the default estimators are used but constrained to take on strictly positive values.
This is useful if the estimated ARIV is negative.
If ariv = "robust"
, which is available only for , the "robust" estimator proposed by Chan & Meng (2019) is used.
This method should be used with caution, because it requires much stronger assumptions and may result in liberal inferences if these assumptions are violated.
A list containing the results of the model comparison.
A print
method is used for more readable output.
The methods ,
, and the likelihood-based
assume that models were fit using maximum likelihood (ML).
Models fit using REML are automatically refit using ML.
Models fit in
'lavaan'
using the MLR estimator or similar techniques that require scaled difference tests are currently not supported.
Simon Grund
Chan, K. W., & Meng, X.-L. (2019). Multiple improvements of multiple imputation likelihood ratio tests. ArXiv:1711.08822 [Math, Stat]. https://arxiv.org/abs/1711.08822
Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the em algorithm. Journal of the American Statistical Association, 82, 97-105.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1, 65-92.
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86, 1065-1073.
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79, 103-111.
Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. Biometrika, 94, 502-508.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.
testEstimates
, testConstraints
, with.mitml.list
, anova.mitml.result
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: multiparameter hypothesis test for 'ReadDis' and 'SES' # This tests the hypothesis that both effects are zero. require(lme4) fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE)) fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE)) # apply Rubin's rules testEstimates(fit1) # multiparameter hypothesis test using D1 (default) testModels(fit1, fit0) # ... adjusting for finite samples testModels(fit1, fit0, df.com = 47) # ... using D2 ("wald", using estimates and covariance-matrix) testModels(fit1, fit0, method = "D2") # ... using D2 ("likelihood", using likelihood-ratio test) testModels(fit1, fit0, method = "D2", use = "likelihood") # ... using D3 (likelihood-ratio test, requires ML fit) testModels(fit1, fit0, method = "D3") # ... using D4 (likelihood-ratio test, requires ML fit) testModels(fit1, fit0, method = "D4")
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: multiparameter hypothesis test for 'ReadDis' and 'SES' # This tests the hypothesis that both effects are zero. require(lme4) fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE)) fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE)) # apply Rubin's rules testEstimates(fit1) # multiparameter hypothesis test using D1 (default) testModels(fit1, fit0) # ... adjusting for finite samples testModels(fit1, fit0, df.com = 47) # ... using D2 ("wald", using estimates and covariance-matrix) testModels(fit1, fit0, method = "D2") # ... using D2 ("likelihood", using likelihood-ratio test) testModels(fit1, fit0, method = "D2", use = "likelihood") # ... using D3 (likelihood-ratio test, requires ML fit) testModels(fit1, fit0, method = "D3") # ... using D4 (likelihood-ratio test, requires ML fit) testModels(fit1, fit0, method = "D4")
The with
and within
methods evaluate R expressions in a list of multiply imputed data sets.
## S3 method for class 'mitml.list' with(data, expr, include.data = FALSE, ...) ## S3 method for class 'mitml.list' within(data, expr, ignore = NULL, ...)
## S3 method for class 'mitml.list' with(data, expr, include.data = FALSE, ...) ## S3 method for class 'mitml.list' within(data, expr, ignore = NULL, ...)
data |
A list of imputed data sets with class |
expr |
An R expression to be evaluated for each data set. |
include.data |
Either a logical flag or a character string denoting how the data are included when |
ignore |
A character vector naming objects that are created but should not be saved (see 'Details'). |
... |
Not used. |
The two functions provide with
and within
methods for objects of class mitml.list
.
They evaluate an R expression repeatedly for each of the imputed data sets but return different values: with
returns the result of the evaluated expression; within
returns the resulting data sets.
The within
function is useful for transforming and computing variables in the imputed data (e.g., centering, calculating cluster means, etc.).
The with
function is useful, for example, for fitting statistical models.
The list of fitted models can be analyzed using testEstimates
, testModels
, testConstraints
, or anova
.
The include.data
argument can be used to include the imputed data sets in the call to fit statistical models (expr
) using with
.
This is useful for fitting models that require that the fitting function be called with a proper data
argument (e.g., lavaan
or nlme
; see 'Examples').
Setting include.data = TRUE
will fit the model with the imputed data sets used as the data
argument.
Setting include.data = "df"
(or similar) will fit the model with the imputed data sets as the df
argument (useful if the function refers to the data by a nonstandard name, such as "df"
).
The ignore
argument can be used to declare objects that are not to be saved in the data sets created by within
.
with
: A list of class mitml.results
containing the evaluated expression for each data set.
within
: A list of class mitml.list
containing the imputed data modified by the evaluated expression.
Simon Grund
mitmlComplete
, anova.mitml.result
, testEstimates
, testModels
, testConstraints
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: data transformation # calculate and save cluster means new1.implist <- within(implist, Means.ReadAchiev <- clusterMeans(ReadAchiev, ID)) # center variables, calculate interaction terms, ignore byproducts new2.implist <- within(implist, { M.SES <- mean(SES) M.CognAbility <- mean(CognAbility) C.SES <- SES - M.SES C.CognAbility <- CognAbility - M.CognAbility SES.CognAbility <- C.SES * C.CognAbility }, ignore = c("M.SES", "M.CognAbility")) # * Example 2: fitting statistical models # fit regression model fit.lm <- with(implist, lm(ReadAchiev ~ ReadDis)) # fit multilevel model with lme4 require(lme4) fit.lmer <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID))) ## Not run: # fit structural equation model with lavaan (with include.data = TRUE) require(lavaan) mod <- "ReadAchiev ~ ReadDis" fit.sem <- with(implist, sem(model = mod, cluster = "ID", estimator = "MLR"), include.data = TRUE) ## End(Not run)
data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: data transformation # calculate and save cluster means new1.implist <- within(implist, Means.ReadAchiev <- clusterMeans(ReadAchiev, ID)) # center variables, calculate interaction terms, ignore byproducts new2.implist <- within(implist, { M.SES <- mean(SES) M.CognAbility <- mean(CognAbility) C.SES <- SES - M.SES C.CognAbility <- CognAbility - M.CognAbility SES.CognAbility <- C.SES * C.CognAbility }, ignore = c("M.SES", "M.CognAbility")) # * Example 2: fitting statistical models # fit regression model fit.lm <- with(implist, lm(ReadAchiev ~ ReadDis)) # fit multilevel model with lme4 require(lme4) fit.lmer <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID))) ## Not run: # fit structural equation model with lavaan (with include.data = TRUE) require(lavaan) mod <- "ReadAchiev ~ ReadDis" fit.sem <- with(implist, sem(model = mod, cluster = "ID", estimator = "MLR"), include.data = TRUE) ## End(Not run)
mitml
objects to fileSaves objects of class mitml
in R binary formats (similar to ?save
).
write.mitml(x, filename, drop = FALSE)
write.mitml(x, filename, drop = FALSE)
x |
An object of class |
filename |
Name of the destination file, specified with file extension (e.g., |
drop |
Logical flag indicating if the parameters of the imputation model should be dropped to reduce file size. Default is |
None (invisible NULL
).
Simon Grund
panImpute
, jomoImpute
, read.mitml
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write full 'mitml' object (default) write.mitml(imp, filename = "imputation.Rdata") # drop parameters of the imputation model write.mitml(imp, filename = "imputation.Rdata", drop = TRUE) ## End(Not run)
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write full 'mitml' object (default) write.mitml(imp, filename = "imputation.Rdata") # drop parameters of the imputation model write.mitml(imp, filename = "imputation.Rdata", drop = TRUE) ## End(Not run)
mitml
objects to Mplus formatSaves objects of class mitml
as a series of text files which can be processed by the statistical software Mplus (Muthen & Muthen, 2012).
write.mitmlMplus(x, filename, suffix = "list", sep = "\t", dec = ".", na.value = -999)
write.mitmlMplus(x, filename, suffix = "list", sep = "\t", dec = ".", na.value = -999)
x |
An object of class |
filename |
File base name for the text files containing the imputed data sets, specified without file extension. |
suffix |
File name suffix for the index file. |
sep |
The field separator. |
dec |
The decimal separator. |
na.value |
A numeric value coding the missing data in the resulting data files. |
The Mplus format for multiply imputed data sets comprises a set of text files, each containing one imputed data set, and an index file containing the names of all data files.
During export, factors and character variables are converted to numeric.
To make this more transparent, write.mitmlMplus
produces a log file which contains information about the data set and the factors that have been converted.
In addition, a basic Mplus input file is generated that can be used for setting up subsequent analysis models.
None (invisible NULL
).
Simon Grund
Muthen, L. K., & Muthen, B. O. (2012). Mplus User's Guide. Seventh Edition. Los Angeles, CA: Muthen & Muthen.
panImpute
, jomoImpute
, mitmlComplete
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write imputation files, index file, and log file write.mitmlMplus(imp, filename = "imputation", suffix = "list", na.value = -999) ## End(Not run)
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write imputation files, index file, and log file write.mitmlMplus(imp, filename = "imputation", suffix = "list", na.value = -999) ## End(Not run)
mitml
objects to native SPSS formatSaves objects of class mitml
in the .sav
format used by the statistical software SPSS (IBM Corp., 2013).
The function serves as a front-end for write_sav
from the haven
package.
write.mitmlSAV(x, filename)
write.mitmlSAV(x, filename)
x |
An object of class |
filename |
Name of the destination file. The file extension ( |
This function exports multiply imputed data sets to a single .sav
file, in which an Imputation_
variable separates the original data and the various imputed data sets.
This allows exporting imputed data directly to the native SPSS format.
Alternatively, write.mitmlSPSS
may be used for creating separate text and SPSS syntax files, which offers more control over the data format.
None (invisible NULL
).
Simon Grund
IBM Corp. (2013). IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp
panImpute
, jomoImpute
, mitmlComplete
, write.mitmlSPSS
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write data file and SPSS syntax write.mitmlSAV(imp, filename = "imputation") ## End(Not run)
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write data file and SPSS syntax write.mitmlSAV(imp, filename = "imputation") ## End(Not run)
mitml
objects to SPSS compatible formatSaves objects of class mitml
as a text and a syntax file which can be processed by the statistical software SPSS (IBM Corp., 2013).
write.mitmlSPSS(x, filename, sep = "\t", dec = ".", na.value = -999, syntax = TRUE, locale = NULL)
write.mitmlSPSS(x, filename, sep = "\t", dec = ".", na.value = -999, syntax = TRUE, locale = NULL)
x |
An object of class |
filename |
File base name of the data and syntax files, specified without file extension. |
sep |
The field separator. |
dec |
The decimal separator. |
na.value |
A numeric value coding the missing data in the resulting data file. |
syntax |
A logical flag indicating if an SPSS syntax file should be generated. This file contains instructions for SPSS for reading in the data file. Default is |
locale |
(optional) A character string specifying the localization to be used in SPSS (e.g., |
In SPSS, multiply imputed data are contained in a single file, in which an Imputation_
variable separates the original data and the various imputed data sets.
During export, factors are converted to numeric, whereas character variables are left “as is”.
By default, write.mitmlSPSS
generates a raw text file containing the data, along with a syntax file containing instructions for SPSS.
This syntax file mimics SPSS's functionality to read text files with sensible settings.
In order to read in the data, the syntax file must be opened and executed using SPSS, or open using the GUI.
Manual changes to the syntax file can be required, for example, if the file path of the data file is not correctly represented in the syntax.
The locale
argument can be used to ensure that SPSS reads the data in the correct locale.
Alternatively, write.mitmlSAV
may be used for exporting directly to the SPSS native .sav
format.
None (invisible NULL
).
Simon Grund
IBM Corp. IBM SPSS Statistics for Windows. Armonk, NY: IBM Corp
panImpute
, jomoImpute
, mitmlComplete
, write.mitmlSAV
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write data file and SPSS syntax write.mitmlSPSS(imp, filename = "imputation", sep = "\t", dec = ".", na.value = -999, locale = "en_US") ## End(Not run)
## Not run: data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write data file and SPSS syntax write.mitmlSPSS(imp, filename = "imputation", sep = "\t", dec = ".", na.value = -999, locale = "en_US") ## End(Not run)