<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rense Nieuwenhuis &#187; lme4</title>
	<atom:link href="http://www.rensenieuwenhuis.nl/tag/lme4/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rensenieuwenhuis.nl</link>
	<description>&#34;The extra-ordinary lies within the curve of normality&#34;</description>
	<lastBuildDate>Thu, 12 Mar 2026 14:58:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.2.2</generator>
	<item>
		<title>influence.ME now supports new lme4 1.0</title>
		<link>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/</link>
		<comments>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/#comments</comments>
		<pubDate>Wed, 21 Aug 2013 09:04:32 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[My Publications]]></category>
		<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[influential data]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1677</guid>
		<description><![CDATA[influence.ME is an R package for detecting influential data in multilevel regression models (or, mixed effects models as they are referred to in the R community). The application of multilevel models has become common practice, ...]]></description>
				<content:encoded><![CDATA[<p>influence.ME is an R package for detecting influential data in multilevel regression models (or, mixed effects models as they are referred to in the R community). The application of multilevel models has become common practice, but the development of diagnostic tools has lagged behind. Hence, we developed influence.ME, which calculates standardized measures of influential data for the point estimates of generalized multilevel models, such as DFBETAS and Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. A paper detailing this package was published in the R Journal (available from the <a href="http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Nieuwenhuis~et~al.pdf">R Journal (.PDF)</a> and from <a href="https://www.researchgate.net/publication/232701348_Influence.ME_tools_for_detecting_influential_data_in_mixed_effects_models">my researchgate.net profile</a>).</p>
<p>influence.ME depends on lme4. As the authors of lme4 have completely revised its inner workings and are currently releasing version 1.0, influence.ME required an update to remain compatible with lme4. I just uploaded version 0.9.3 of influence.ME to CRAN, which will be available soon. This version should work with the new lme4, but if you happen to run into any problems, please contact me. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Influence.ME: Simple Analysis</title>
		<link>http://www.rensenieuwenhuis.nl/influence-me-simple-analysis/</link>
		<comments>http://www.rensenieuwenhuis.nl/influence-me-simple-analysis/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 11:00:19 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[example]]></category>
		<category><![CDATA[influential data]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[mixed effects]]></category>
		<category><![CDATA[multilevel]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1015</guid>
		<description><![CDATA[With the introduction of our new package for influential data influence.ME, I&#8217;m currently writing a manual for the package. This manual will address topics for both the experienced, and the inexperienced users. I will also ...]]></description>
				<content:encoded><![CDATA[<p>With the introduction of our new package for influential data, influence.ME, I&#8217;m currently writing a manual for the package. This manual will address topics for both experienced and inexperienced users. </p>
<p>I will also present much of the content of this manual on my blog. Of course, feel free to comment on it: readers are encouraged to discuss the content of the manual here. All information will be accessible from the <a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">influence.ME website</a> as well. Note that updates to the manual will be made available on <a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">that website</a>, instead of in updates to this blog post. So, please refer to the influence.ME website for the most up-to-date information.</p>
<p>This is the first section on influence.ME, which deals with a very simple analysis of students nested within 23 schools. Only the effect of a single variable measured at the school level is estimated.</p>
<p><span id="more-1015"></span></p>
<h2>A basic example analysis</h2>
<p>The school23 data contains information on the math test performance of 519 students, who are nested within 23 schools. For this example, we are interested in the relationship between class structure (measured at the school level in these data) and students&#8217; performance on a math test. The research question is: to what extent does classroom structure determine students&#8217; math test outcomes? </p>
<p>Initially, we will estimate the effect of class structure on the result of the math performance test, without any further covariates. We do take into account the nesting structure of the data, however, and allow the intercept to be random over schools. This model is estimated using the following syntax, and is assigned to an object we call &#8216;model&#8217;.</p>
<pre>
model <- lmer(math ~  structure + (1 | school.ID), data=school23)
summary(model)
</pre>
<p>The call for a summary of the model results in the output shown below. In this summary, the original model formula is shown, as well as the data on which the model was estimated. Both random and fixed effects are summarized. The amount of intercept variance associated with the nesting of students within schools is considerable (23.9, against a total variance of 23.9 + 81.3 = 105.2). The effect of interest is that of the structure variable, which is -2.343 and not statistically significant by most reasonable standards (t=-1.609).</p>
<pre>
Linear mixed model fit by REML 
Formula: math ~ structure + (1 | school.ID) 
   Data: school23 
  AIC  BIC logLik deviance REMLdev
 3802 3819  -1897     3798    3794
Random effects:
 Groups    Name        Variance Std.Dev.
 school.ID (Intercept) 23.884   4.8871  
 Residual              81.270   9.0150  
Number of obs: 519, groups: school.ID, 23
Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.002      5.853  10.252
structure     -2.343      1.456  -1.609
Correlation of Fixed Effects:
          (Intr)
structure -0.982
</pre>
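<p>As a hedged aside (this is not part of the package output), the share of the total variance located at the school level can be computed directly from the variance components shown above:</p>

```r
# Share of total variance at the school level (intraclass correlation),
# computed from the variance components in the summary above
var.school <- 23.884   # intercept variance over schools
var.resid  <- 81.270   # residual variance
icc <- var.school / (var.school + var.resid)
round(icc, 2)
```

<p>This yields roughly .23, confirming that a considerable share of the variance in math scores is associated with the school level.</p>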
<h3>Iteratively re-estimate model</h3>
<p>Building upon the example model estimated in section 2.1, the first step in the procedure of the influence.ME package is to iteratively exclude the influence of the observations nested within each school separately. This is done using the estex() function. The name estex refers to the ESTimates that are returned while EXcluding the influence of each of the grouping levels separately. Thus, in the case of the math test example, in which students are nested in 23 schools, the estex procedure re-estimates the original model 23 times, each time excluding the influence of one higher level unit (i.e., one school). The function returns the relevant estimates of these 23 re-estimations, which in Figure [fig:Three-steps] are referred to as 'altered estimates'.</p>
<p>The estex() function requires two parameters: the mixed effects model, and the grouping factor whose nested observations are to be evaluated for influence. In the syntax example below, the original object 'model' is specified, and 'school.ID' is the relevant grouping factor; school.ID is the name of the variable that indicated the grouping factor when the original model was specified. The estex() function also works when more than a single grouping factor is present in the model, but only one grouping factor can be addressed at a time. </p>
<p>In the example below, the estimates excluding the influence of the respective grouping levels, as returned by the estex() function, are assigned to an object, in this case called estex.model (the name of this object, however, may be chosen freely by the user). </p>
<pre>
estex.model <- estex(model, "school.ID")
</pre>
<p>Note that in the case of complex mixed models (i.e. models with large numbers of observations, complex nesting structures, and/or many nesting groups) the execution of estex() may take considerable time. The examples based on the school23 data should pose no such problems, however.</p>
<h3>Calculate measures of influence</h3>
<p>The object estex.model containing the altered estimates can be used to calculate several measures of influential data. To determine the Cook's distance, the ME.cook() function is used. In its most basic specification, ME.cook() requires only the object to which the altered estimates returned by estex() were assigned: </p>
<pre>
ME.cook(estex.model) 
</pre>
<p>This basic specification returns a matrix with the rows representing the groups in which the observations are nested, and a single column containing the associated value of Cook's distance. These values can also be assigned to an object for later use. The output below shows the result of the syntax above, representing the Cook's distance associated with each school in the school23 data.</p>
<pre>
              [,1]
6053  2.927552e-02
6327  2.557810e-02
6467  1.402948e-02
7194  3.443392e-05
7472  1.115626e+00
7474  8.142758e-02
7801  3.007558e-04
7829  1.005329e-01
7930  5.525680e-03
24371 4.334659e-03
24725 4.387907e-02
25456 5.644399e-04
25642 1.470130e-02
26537 2.369898e-02
46417 2.204840e-02
47583 1.891108e-02
54344 1.445087e-01
62821 3.593314e-01
68448 2.427028e-02
68493 1.538479e-02
72080 3.471805e-04
72292 6.387956e-03
72991 1.316049e-02
</pre>
<p>Based on the output shown above, the Cook's distance of school number 7472 is the largest. This corresponds very well to what was concluded based on Figure [fig:Bivariate-influence-plots]. For those who prefer to evaluate the Cook's distance based on a visual representation, the ME.cook() function can also plot its output. To do so, an additional parameter is required: plot=TRUE. Additional parameters are allowed as well, which are passed on to the internal dotplot() function (Deepayan Sarkar, 2008) and are used to format the resulting plot. In this case, the example syntax below also specifies the xlab= and ylab= parameters, labelling the two axes. The resulting plot is shown in the figure below. Such plots make it easier to assess the influence a grouped set of observations exerts on the outcomes of an analysis, relative to the influence exerted by other groups of observations. </p>
<p>In this case, it (again) is clear that the observed level of class structure of school number 7472 exerts the highest influence. This is based on the calculated value of Cook's distance, as well as on the fact that this influence clearly exceeds that of the other schools. </p>
<pre>
ME.cook(estex.model, plot=TRUE,
    xlab="Cook's Distance, Class structure",
    ylab="School")
</pre>
<p><a href="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/07/Cook1.jpg"><img src="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/07/Cook1.jpg?w=450" alt="Figure Cook&#039;s Distance" title="Figure Cook&#039;s Distance" class="aligncenter size-full wp-image-1042" data-recalc-dims="1" /></a></p>
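<p>For a programmatic check instead of a visual one, the most influential group can also be picked out with base R. The sketch below assumes the Cook's distances have been extracted into a named vector (cook.d is a hypothetical object name; the names and values are a few of those from the output above):</p>

```r
# Hypothetical named vector holding a few of the Cook's distances shown above
cook.d <- c("7194" = 3.44e-05, "7472" = 1.1156,
            "7474" = 0.0814, "62821" = 0.3593)
most.influential <- names(which.max(cook.d))  # group with the largest distance
most.influential
```

<p>For the school23 data this singles out school number 7472, in line with the plot.</p>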
<h3>Exclude influence, and Repeat</h3>
<p>Based on the analyses and graphs shown in the previous sections, there are strong indications that the observations in school number 7472 exert too much influence on the outcomes of the analysis, and thereby unjustifiably determine the outcomes of these analyses. To definitively decide whether or not the influence of these observations indeed is too large, the value of Cook&#8217;s distance of this school can be compared with a given cut-off value. Regarding Cook&#8217;s distance, it has been argued that observations exceeding a Cook&#8217;s distance of 4/n are too influential (Belsley et al., 1980), and need to be dealt with. In this formula, &#8216;n&#8217; refers to the number of units on which Cook&#8217;s distance was calculated; in the case of mixed effects models, this is the number of groups in which the observations are nested. </p>
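<p>The cut-off itself is a one-line computation. As a hedged sketch (this is not package code), with the 23 schools of the school23 data the 4/n rule works out as follows:</p>

```r
# Cut-off for Cook's distance (Belsley et al., 1980): 4/n,
# with n the number of groups in the grouping factor
n.groups <- 23
cutoff <- 4 / n.groups
round(cutoff, 2)
```

<p>This gives the .17 used below; with fewer (effective) groups, the cut-off rises accordingly.</p>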
<p>The Cook&#8217;s distance of school number 7472 was determined to be 1.12, which readily exceeds the cut-off value of 4/23 &#8776; .17. Thus, it can be concluded that the influence of school number 7472 needs to be excluded from the analysis before the results of that analysis are interpreted. This is done using the function exclude.influence(). This function basically has three parameters: the model from which the influence of some observations is to be excluded, the grouping factor, and the specific level of that grouping factor in which the said observations are nested. The function modifies the original model and returns a new model, which can be checked again for possible influential data.</p>
<p>In the example below, the influence of school number 7472 is excluded from the original regression model, which was assigned to object &#8216;model&#8217; in section 2.1. </p>
<p>The result of the exclude.influence() function again has the form of a mixed effects model and is here assigned to object model.2 (again, this name is to be chosen by the user). </p>
<pre>
model.2 <- exclude.influence(model, "school.ID", "7472")
summary(model.2)
</pre>
<p>Functions that work with &#8216;normal&#8217; mixed effects models estimated with lme4 also work with models that were modified with the exclude.influence() function. Therefore, a summary of model.2 was requested as well, which is shown below. A few things are clear from this output. The estimate of the effect of class structure is now much stronger (-4.55) and statistically significant (t=-2.95). This corresponds to what may have been expected based on the graphical representation of the data in Figure [fig:Bivariate-influence-plots]. Some other changes have been made to the model as well. The original intercept vector (which originally was indicated by (Intercept)) is now replaced by a variable called intercept.alt. This variable is basically an ordinary intercept vector (thus, with a value of 1 for each observation), except for the observations that are nested in the excluded nesting group. For these observations, the intercept.alt variable has score 0. Also, a new variable called estex.7472 is shown. This is a dummy variable, indicating the observations that are nested in school number 7472. One such dummy variable is added to the model for each nesting group whose influence is excluded. Together, these modifications of the model ensure that the observations nested within the excluded nesting group contribute to neither the level nor the variance of the intercept, and do not alter the higher level estimates unjustifiably. </p>
<pre>
Linear mixed model fit by REML 
Formula: math ~ intercept.alt + estex.7472 + structure + 
(0 + intercept.alt | school.ID) - 1 
   Data: ..2 
  AIC  BIC logLik deviance REMLdev
 3792 3814  -1891     3790    3782
Random effects:
 Groups    Name          Variance Std.Dev.
 school.ID intercept.alt 17.874   4.2277  
 Residual                81.301   9.0167  
Number of obs: 519, groups: school.ID, 23

Fixed effects:
              Estimate Std. Error t value
intercept.alt   69.346      6.314  10.983
estex.7472      54.839      3.617  15.163
structure       -4.550      1.545  -2.945

Correlation of Fixed Effects:
           intrc. e.7472
estex.7472  0.843       
structure  -0.987 -0.854
</pre>
<p>As is shown in the procedural schematic in Figure [fig:Three-steps], it is advisable to repeat this procedure until the user is satisfied with the stability of the model, for instance when no group of observations exceeds the cut-off value. To do this in this example, the model.2 object is again input to the estex() function, the results of which are stored in a second altered estimates object which we call estex.model.2:</p>
<pre>
estex.model.2 <- estex(model.2, "school.ID")
ME.cook(estex.model.2, plot=TRUE, 
    xlab="Cook's Distance, Class structure",
    ylab="School", 
    cutoff=.18)
</pre>
<p>Again, ME.cook() is used to calculate the values for Cook's distance, which returns the output shown below. School number 62821 is associated with the largest value for Cook's distance (.37). The cut-off value now differs (slightly) from the previous one, because the number of (effective) groups in which the observations are nested decreased by 1 when the influence of school number 7472 was excluded. Thus, the cut-off value now is 4/22 &#8776; .18. Based on the output below, it can thus be concluded that school number 62821 is influential as well. </p>
<p>Finally, the call for ME.cook() in the syntax example above shows one more distinguishing characteristic. Again plot=TRUE is specified, together with specifications for the labels on both the x and y axes. A plot of the Cook's distances is thus created, shown in Figure [fig:Cook-2]. In addition, the cut-off value of .18 is now indicated as well, using cutoff=.18. As a result, all Cook's distances with a value larger than .18 are indicated differently in the plot, as is the case in Figure [fig:Cook-2] for the two schools numbered 62821 and 7474. Note that the Cook's distance for school number 7472 now equals 0, indicating that this school indeed no longer influences the parameter estimates. </p>
<pre>
              [,1]
6053  2.186203e-03
6327  2.645659e-02
6467  1.326879e-02
7194  1.319258e-02
7472  0.000000e+00
7474  2.273674e-01
7801  1.378937e-03
7829  7.780663e-02
7930  4.728342e-03
24371 8.621802e-03
24725 7.072999e-02
25456 1.985731e-03
25642 2.487072e-02
26537 1.900817e-03
46417 2.409483e-02
47583 7.919332e-02
54344 1.248145e-01
62821 3.706191e-01
68448 1.752182e-01
68493 2.607158e-02
72080 2.669324e-05
72292 1.193296e-02
72991 1.311974e-02
</pre>
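<p>Flagging every group that exceeds the updated cut-off can again be done in base R. The sketch below uses a hypothetical named vector (cook.d2) holding a few of the values shown above, together with the 4/22 cut-off:</p>

```r
# Hypothetical named vector with some Cook's distances from the output above
cook.d2 <- c("7472" = 0, "7474" = 0.2274,
             "62821" = 0.3706, "68448" = 0.1752)
cutoff <- 4 / 22                          # 22 effective groups remain
flagged <- names(cook.d2[cook.d2 > cutoff])
flagged
```

<p>Of these four schools, only 7474 and 62821 exceed the cut-off, matching Figure [fig:Cook-2].</p>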
<p>Further analysis of this example would thus entail excluding the influence of the observations nested within school number 62821, and then rechecking the model by running through the three steps of the procedure again. This is not shown here, to keep the exercise from becoming overly lengthy. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/influence-me-simple-analysis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Presenting influence.ME at useR!</title>
		<link>http://www.rensenieuwenhuis.nl/presenting-influence-me-at-user/</link>
		<comments>http://www.rensenieuwenhuis.nl/presenting-influence-me-at-user/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 09:49:33 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[mixed effects]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[social sciences]]></category>
		<category><![CDATA[useR!]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1028</guid>
		<description><![CDATA[Today I presented influence.ME at the useR! conference in Rennes. Influence.ME is an R package for detecting influential data in mixed models. I developed this package together with Ben Pelzer and Manfred te Grotenhuis. More ...]]></description>
				<content:encoded><![CDATA[<p><img src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/04/logo-influence.jpg?w=450" alt="Logo influence.ME" title="Logo influence.ME" data-recalc-dims="1" /></p>
<p>Today I presented influence.ME at the useR! conference in Rennes. Influence.ME is an R package for detecting influential data in mixed models. I developed this package together with Ben Pelzer and Manfred te Grotenhuis.</p>
<p>More information about influence.ME can be found on <a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">another section of my website</a>.</p>
<p>Below, please find the slides of the presentation.<br />
<a href='http://rensenieuwenhuis.nl/documents/Slides-2009-Rennes-influenceME.pdf'>Presentation Influence.ME at Rennes, useR! 2009</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/presenting-influence-me-at-user/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Influence.ME: don&#8217;t specify the intercept</title>
		<link>http://www.rensenieuwenhuis.nl/influence-me-dont-specify-the-intercept/</link>
		<comments>http://www.rensenieuwenhuis.nl/influence-me-dont-specify-the-intercept/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 11:00:00 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[intercept]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[lmer]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1003</guid>
		<description><![CDATA[Just recently, I was contacted by a researcher who wanted to use influence.ME to obtain model estimates from which iteratively some data was deleted. In his case, observations were nested within an area, but there ...]]></description>
				<content:encoded><![CDATA[<p>Just recently, I was contacted by a researcher who wanted to use influence.ME to obtain model estimates from which iteratively some data was deleted. In his case, observations were nested within an area, but there were very unequal numbers of observations in each area.</p>
<p>Unfortunately, he wasn&#8217;t able to use the influence.ME package on his models. He kindly sent me his data, so I could figure out what went wrong, and it turned out to be a small problem in influence.ME.</p>
<p>The problem was with how the model was specified: the intercept was specified explicitly, next to several (fixed) variables. It turned out that such a model specification is not compatible with the internal changes influence.ME makes to the mixed model. Therefore, I advise users of influence.ME not to explicitly specify the intercept in their lme4 regression models.</p>
<p>I reproduced the problem with the school23 data, which is available in influence.ME. Compare the two model specifications below: in the first the intercept is specified, in the second it isn&#8217;t. The outcomes of both lmer models are identical. However, the first returns a convergence error when used with the estex() function, while the second doesn&#8217;t.</p>
<p>The input:</p>
<pre>
mod <- lmer(math ~ 1 + structure + (1 | school.ID), data=school23)
estex.mod <- estex(mod, "school.ID")

mod <- lmer(math ~ structure + (1 | school.ID), data=school23)
estex.mod <- estex(mod, "school.ID")
</pre>
<p>The output:</p>
<pre>
> mod <- lmer(math ~ 1 + structure + (1 | school.ID), data=school23)
> estex.mod <- estex(mod, "school.ID")
<b>Error in mer_finalize(ans) : Downdated X'X is not positive definite, 3.</b>
>
> mod <- lmer(math ~ structure + (1 | school.ID), data=school23)
> estex.mod <- estex(mod, "school.ID")
</pre>
<p>I will surely investigate whether this can be resolved in a future update, but for now, simply leave the intercept out of your model specification: lmer will add it for you.</p>
<hr />
<p><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">Influence.ME</a> is an <a href="http://www.r-project.org">R</a> package and provides tools for detecting influential data in mixed effects models. <a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">More information can be found here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/influence-me-dont-specify-the-intercept/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>useR! 2009 acceptance: presenting influence.ME</title>
		<link>http://www.rensenieuwenhuis.nl/user-2009-acceptance-presenting-influenceme/</link>
		<comments>http://www.rensenieuwenhuis.nl/user-2009-acceptance-presenting-influenceme/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 10:23:56 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[glmer]]></category>
		<category><![CDATA[influential data]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R-Project]]></category>
		<category><![CDATA[useR! 2009]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=935</guid>
		<description><![CDATA[The organizing committee of the useR! 2009 conference just informed me, that my submission for presenting my extension package influence.ME, has been accepted! Influence.ME is a new R package that I&#8217;m currently developing, with the ...]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/"><img src="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/04/logo-influence.jpg?w=450" alt="Logo influence.ME" title="Logo influence.ME" data-recalc-dims="1" /></a></p>
<p><!--adsense--></p>
<p>The organizing committee of the useR! 2009 conference just informed me that my submission for presenting my extension package influence.ME has been accepted! Influence.ME is a new R package that I&#8217;m currently developing, with the indispensable help of <a href="http://benpelzer.ruhosting.nl/">Ben Pelzer</a> and <a href="http://www.ru.nl/methodenentechnieken/methoden_technieken/medewerkers/vm_medewerkers/manfred_te/">Manfred te Grotenhuis</a>. Although I have not yet introduced influence.ME on this blog, rest assured that I will do so within just a few weeks. Now is the time for celebration!<br />
<span id="more-935"></span></p>
<p><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">Influence.ME</a> is an <a href="http://www.r-project.org">R</a> package that provides a collection of tools for detecting influential data in mixed effects models. Testing for influence with mixed effects models is especially important in Social Science applications, for two reasons. First, models in the Social Sciences are frequently based on large numbers of individuals, while the number of higher level units is often relatively small. Second, the higher level units are often remarkably similar, for instance in the case of neighboring countries. </p>
<p>useR! is a yearly user conference on exciting applications in R. The <a href="http://www2.agrocampus-ouest.fr/math/useR-2009/">useR! 2009 edition</a> will be held in Rennes, France. A great variety of packages, applications, and other developments relating to R will be discussed. I visited the <a href="http://www.rensenieuwenhuis.nl/category/science/user-2008/">useR! 2008 conference</a> last year (in Dortmund, Germany), and found it a highly stimulating environment for those interested in practical applications of statistics using R. </p>
<p>Influence.ME is a project I&#8217;ve been working on for the last few months, together with Ben Pelzer and Manfred te Grotenhuis. I&#8217;m still working &#8211; quite hard! &#8211; to iron out the last quirks, and we have tons of ideas for extending its functionality. I&#8217;m very happy to be able to present the result of this work to an R-minded audience this summer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/user-2009-acceptance-presenting-influenceme/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R-Sessions 32: Forward.lmer: Basic stepwise function for mixed effects in R</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-32/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-32/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 10:59:03 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[forward]]></category>
		<category><![CDATA[hierarchical]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[mixed effects models]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[stepwise]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=897</guid>
		<description><![CDATA[Intended to be a customized solution, it may have grown to be a little more. forward.lmer is an early installment of a full stepwise function for mixed effects regression models in R-Project. I may put ...]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a> </p>
<p>Intended as a customized solution, it may have grown to be a little more. forward.lmer is an early installment of a full stepwise function for mixed effects regression models in R-Project. I may put in some work to extend it, or I may not. Nevertheless, in a &#8216;forward sense of stepwise&#8217;, I think it can be pretty useful as it is. It also has, I think, an interesting take on the stepwise concept.<br />
<!--adsense--><br />
<span id="more-897"></span></p>
<p>Most stepwise functions (as far as I know) take a base model and a set of variables, and then iteratively add and/or subtract some variables, according to various criteria, to arrive at the best fitting regression model. All very interesting, but how to deal with interaction variables? And moreover: most existing functions do not work with mixed effects models ((I use the term &#8216;mixed effects model&#8217; in describing this stepwise function to refer to what is often called hierarchical or multilevel regression models, as well)). </p>
<p>Built around the lme4 package in R, forward.lmer provides a forward stepwise procedure for mixed effects models. It also allows the user not only to enter single variables into models, but to do the same with blocks of variables. This opens up many options: users can add complete interactions at once (i.e. both the original and the multiplicative terms), or add these consecutively. Future development will focus on additional selection criteria for interactions, such as the criterion that at least the multiplicative term needs to be statistically significant. </p>
<p>The user provides a starting model and a set of variables to evaluate. The procedure then updates the starting model with each single variable (or block of variables) added in turn. The resulting models are ordered by their loglikelihood (other criteria, e.g. BIC and AIC, to follow soon), after which the best fitting model is evaluated against one of two criteria. The first criterion is that at least one of the added parameters is statistically significant. The other criterion is that the added parameters jointly improve the model significantly. </p>
<p>There are several parameters to be specified:</p>
<ul>
<li>start.model: The starting model the procedure starts with. This can be a null-model, or a model already containing several variables. All lmer-models (i.e. logistic, poisson, linear) are supported.</li>
<li>blocks: a vector of variable names (as character strings) to be added to the model. Several variables can be concatenated within the same character string, so that they are added as a block of variables instead of as single variables.</li>
<li>max.iter: The maximum number of variables that are evaluated. If max.iter is reached, the procedure stops without adding more variables. </li>
<li>sig.level: This is the p-value against which it is tested whether the new model fits better than a base model. Either sig.level or zt needs to be specified, but not both at once.</li>
<li>zt: This is either the T or Z value that is used to test whether (at least) one of the added variables is statistically significant. T values are used for linear regression, Z values for binary response models.</li>
<li>print.log: Should a log be printed? The log contains information on which variables (and on which criteria) were added in each step.</li>
</ul>
<p>The forward.lmer function returns the best fitting model (according to the given criteria). Of course, one can use this resulting model as a starting model for a new stepwise procedure.</p>
<p><code><br />
forward.lmer <- function(<br />
	start.model, blocks,<br />
	max.iter=1, sig.level=FALSE,<br />
	zt=FALSE, print.log=TRUE)<br />
	{</p>
<p>	# forward.lmer: a function for stepwise regression using lmer mixed effects models<br />
	# Author: Rense Nieuwenhuis</p>
<p>	# Initialising internal variables<br />
	log.step <- 0<br />
	log.LL <- log.p <- log.block <- zt.temp <- log.zt <- NA<br />
	model.basis <- start.model</p>
<p>	# Maximum number of iterations cannot exceed number of blocks<br />
	if (max.iter > length(blocks)) max.iter <- length(blocks)</p>
<p>	# Setting up the outer loop<br />
	for(i in 1:max.iter)<br />
		{</p>
<p>		models <- list()</p>
<p>		# Iteratively updating the model with addition of one block of variable(s)<br />
		# Also: extracting the loglikelihood of each estimated model<br />
		for(j in 1:length(blocks))<br />
			{<br />
			models[[j]] <- update(model.basis, as.formula(paste(". ~ . + ", blocks[j])))<br />
			}</p>
<p>		LL <- unlist(lapply(models, logLik))</p>
<p>		# Ordering the models based on their loglikelihood.<br />
		# Additional selection criteria apply<br />
		for (j in order(LL, decreasing=TRUE))<br />
			{</p>
<p>			##############<br />
			############## Selection based on ANOVA-test<br />
			##############</p>
<p>			if(sig.level != FALSE)<br />
				{<br />
				if(anova(model.basis, models[[j]])[2,7] < sig.level)<br />
					{</p>
<p>					# Writing the logs (the p-value is stored before model.basis<br />
					# is updated, so the new model is not compared with itself)<br />
					log.step <- log.step + 1<br />
					log.block[log.step] <- blocks[j]<br />
					log.p[log.step] <- anova(model.basis, models[[j]])[2,7]</p>
<p>					model.basis <- models[[j]]<br />
					log.LL[log.step] <- as.numeric(logLik(model.basis))</p>
<p>					blocks <- blocks[-j]</p>
<p>					break<br />
					}<br />
				}</p>
<p>			##############<br />
			############## Selection based significance of added variable-block<br />
			##############	</p>
<p>			if(zt != FALSE)<br />
				{<br />
				b.model <- summary(models[[j]])@coefs<br />
				diff.par <- setdiff(rownames(b.model), rownames(summary(model.basis)@coefs))<br />
				if (length(diff.par)==0) break<br />
				sig.par <- FALSE</p>
<p>				for (k in 1:length(diff.par))<br />
					{<br />
					if(abs(b.model[which(rownames(b.model)==diff.par[k]),3]) > zt)<br />
						{<br />
						sig.par <- TRUE<br />
						zt.temp <- b.model[which(rownames(b.model)==diff.par[k]),3]<br />
						break<br />
						}<br />
					}					</p>
<p>				if(sig.par==TRUE)<br />
					{<br />
					model.basis <- models[[j]]</p>
<p>					# Writing the logs<br />
					log.step <- log.step + 1<br />
					log.block[log.step] <- blocks[j]<br />
					log.LL[log.step] <- as.numeric(logLik(model.basis))<br />
					log.zt[log.step] <- zt.temp<br />
					blocks <- blocks[-j]</p>
<p>					break<br />
					}<br />
				}<br />
			}<br />
		}</p>
<p>	## Create and print log (only if at least one block was added)<br />
	if(print.log == TRUE & log.step > 0)<br />
		{<br />
		log.df <- data.frame(log.step=1:log.step, log.block, log.LL, log.p, log.zt)<br />
		print(log.df, digits=4)<br />
		}</p>
<p>	## Return the 'best' fitting model<br />
	return(model.basis)<br />
	} </p>
<p></code></p>
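<p>As a sketch of how the function might be called: the example below is illustrative only, using the Exam data from the mlmRev-package with arbitrarily chosen blocks.</p>
<p><code><br />
library(lme4)<br />
library(mlmRev)<br />
base.model <- lmer(normexam ~ 1 + (1 | school), data=Exam)<br />
# Evaluate three blocks; the third adds a school-level variable and<br />
# a cross-level interaction as a single block<br />
best.model <- forward.lmer(start.model = base.model,<br />
	blocks = c("standLRT", "sex", "schavg + standLRT:schavg"),<br />
	max.iter = 3, sig.level = 0.05, print.log = TRUE)<br />
</code></p>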
<p>As always, you're invited to use this function, or to adapt it and use the result. However, you are required to credit this function and its author. Additionally, since I intend to continue working on this function (perhaps even evolving it into a package on CRAN), I would love to hear about any experiences in using it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-32/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>R-Sessions 31: Combining lmer output in a single table (UPDATED)</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 11:00:38 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[mixed effect models]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=891</guid>
		<description><![CDATA[There are various ways of getting your output from R to your publication draft. Most of them are highly efficient, but unfortunately I couldn&#8217;t find a function that combines the output from several (lmer) models ...]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a><br />
<!--adsense--></p>
<p>There are various ways of getting your output from R to your publication draft. Most of them are highly efficient, but unfortunately I couldn&#8217;t find a function that combines the output from several (lmer) models and presents it in a single table. lmer is the mixed effects model function from the lme4 package. So, I wrote a simple function that does exactly that.<br />
<span id="more-891"></span></p>
<p>I wrote it for a specific purpose, so it is not a general function or anything, but it can easily be adapted for use in other settings. Here it goes:</p>
<p><code><br />
require(lme4)<br />
require(mlmRev)</p>
<p>model.1 <- lmer(normexam ~ 1 + (1 | school), data=Exam)<br />
model.2 <- lmer(normexam ~ standLRT + (1 | school), data=Exam)<br />
model.3 <- lmer(normexam ~ standLRT + sex + (1 | school), data=Exam)<br />
model.4 <- lmer(normexam ~ standLRT + sex + schavg + (1 | school), data=Exam)</p>
<p>model.a <- lmer(use ~ 1 + (1 | district), family=binomial, data=Contraception)<br />
model.b <- lmer(use ~ livch + (1 | district), family=binomial, data=Contraception)<br />
model.c <- lmer(use ~ age + (1 | district), family=binomial, data=Contraception)<br />
model.d <- lmer(use ~ livch + age + (1 | district), family=binomial, data=Contraception)</p>
<p>m1 <- c(model.1, model.2, model.3, model.4)<br />
m2 <- c(model.a, model.b, model.c, model.d)</p>
<p>combine.output.lmer <- function(models, labels=FALSE)<br />
	{</p>
<p>	fix.coef <- lapply(models, function(x) summary(x)@coefs)<br />
	var.coef <- lapply(models, function(x) summary(x)@REmat)<br />
	n.par <- dim(summary(models[[1]])@coefs)[2]</p>
<p>	ifelse(labels==FALSE,<br />
		fix.labels <- colnames(summary(models[[1]])@coefs),<br />
		fix.labels <- labels)</p>
<p>	var.labels <- colnames(var.coef[[1]])</p>
<p>	# Creating table with fixed parameters<br />
	output.coefs <- data.frame(Row.names=row.names(fix.coef[[1]]))<br />
	for (i in 1:length(models))<br />
		{</p>
<p>		a <- fix.coef[[i]]<br />
		colnames(a) <- paste("Model", i, fix.labels)<br />
		output.coefs <- merge(output.coefs, a, by.x=1, by.y=0, all=T, sort=FALSE)</p>
<p>		}<br />
	output.coefs[,1] <- as.character(output.coefs[,1])<br />
	output.coefs[dim(output.coefs)[1]+2, 1] <- "Loglikelihood"<br />
	LL <- unlist(lapply(models, function(x) as.numeric(logLik(x))))<br />
	output.coefs[dim(output.coefs)[1], 1:length(models)*n.par-n.par+2] <- LL</p>
<p>	# Creating table with random parameters<br />
	output.vars <- data.frame(var.coef[[1]])[,1:2]<br />
	for (i in 1:length(models))<br />
		{</p>
<p>		a <- var.coef[[i]]<br />
		colnames(a) <- paste("Model", i, var.labels)<br />
		output.vars <- merge(output.vars, a, by.x=1:2, by.y=1:2, all=T, sort=FALSE)</p>
<p>		}</p>
<p>	# Combining output.coefs and output.vars<br />
	n.cols <- dim(output.coefs)[2]<br />
	n.coefs <- dim(output.coefs)[1]<br />
	n.vars <- dim(output.vars)[1]</p>
<p>	output <- matrix(ncol=n.cols +1 , nrow=n.vars+n.coefs+2)</p>
<p>	output[1:n.coefs, -2] <- as.matrix(output.coefs)<br />
	output[n.coefs+2, 1] <- "Variance Components"<br />
	output[(n.coefs+3) : (n.coefs+n.vars+2), 1:2] <- as.matrix(output.vars[,1:2])<br />
	output[<br />
		(n.coefs+3) : (n.coefs+n.vars+2),<br />
		which(rep(c(1,1,rep(0, n.par-2)),length(models))!=0)+2] <- as.matrix(output.vars[,c(-1,-2)])</p>
<p>	colnames(output) <- c("Parameter", "Random", colnames(output.coefs)[-1])</p>
<p>	return(output)<br />
	}</p>
<p>combined <- combine.output.lmer(m1)<br />
combined <- combine.output.lmer(m2)</p>
<p>combined <- combine.output.lmer(m1, labels=c("appel", "banaan", "grapefruit"))<br />
combined <- combine.output.lmer(m2, labels=c("appel", "peer", "banaan", "grapefruit"))</p>
<p>write.csv(combined, "combined.csv", na=" ")<br />
</code></p>
<p>In this example I estimate two sets of four mixed effects models, which are concatenated into the objects 'm1' and 'm2'. The function itself is called 'combine.output.lmer', and is applied to these objects. The output is a data.frame with the variable names in the first column. Parameters not estimated in a model are indicated by 'NA' in their respective columns. By writing the 'combined' object to an external file with na=" ", the NAs become blank cells and the file can be read into other software, such as OpenOffice Calc or Excel. Use the xtable-package to get it into your LaTeX document. </p>
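<p>As a quick sketch (assuming the xtable package is installed), exporting the combined table to a LaTeX file could look like this:</p>
<p><code><br />
library(xtable)<br />
# 'combined' is the table returned by combine.output.lmer() above<br />
print(xtable(combined), type="latex", file="combined.tex", include.rownames=FALSE)<br />
</code></p>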
<p>UPDATE<br />
I updated and improved the code somewhat, for I wasn't satisfied with the results. Now the code adapts to the number of parameters derived from the models' summary, allows you to add your own column labels, and, most importantly, also reports the random slopes.</p>
<p>Please note: due to the internal matching procedure, errors may occur when the same variable is random 'within' more than one other variable. This is only the case when other variables are random within each nesting factor as well. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>R-Sessions 19: Extractor Functions</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-19-extractor-functions/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-19-extractor-functions/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 10:58:43 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[anova]]></category>
		<category><![CDATA[extractor functions]]></category>
		<category><![CDATA[fixef]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[ranef]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=571</guid>
		<description><![CDATA[<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img src="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg" title="R-Sessions" width="470" /></a>

Unlike most statistical software packages, R often stores the results of an analysis in an object. The advantage of this is that while not all output is shown on the screen at once, it is not necessary to re-estimate the statistical model if different output is required.

This section will show the kind of data that is stored in a multilevel model estimated by R-Project and introduce some functions that make use of these data.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img src="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" title="R-Sessions" data-recalc-dims="1" /></a><br />
<!--adsense--></p>
<p>Unlike most statistical software packages, R often stores the results of an analysis in an object. The advantage of this is that while not all output is shown on the screen at once, it is not necessary to re-estimate the statistical model if different output is required.</p>
<p>This section will show the kind of data that is stored in a multilevel model estimated by R-Project and introduce some functions that make use of these data.<br />
<span id="more-571"></span></p>
<h2>Inside the model</h2>
<p>Let&#8217;s first estimate a simple multilevel model, using the nlme-package. For this section we will use a model we estimated earlier: the education model with a random intercept and a random slope. This time, though, we will assign it to an object called model.01. It is estimated as follows:</p>
<blockquote><p>
require(nlme)<br />
require(mlmRev)</p>
<p>model.01 <- lme(fixed = normexam ~ standLRT, data = Exam,<br />
	random = ~ standLRT | school)
</p></blockquote>
<p>Basically, this results in no output at all, although the activation of the packages creates a little output. Basic results can be obtained by simply calling the object:</p>
<blockquote><p>
model.01
</p></blockquote>
<pre>
> model.01
Linear mixed-effects model fit by REML
  Data: Exam 
  Log-restricted-likelihood: -4663.8
  Fixed: normexam ~ standLRT 
(Intercept)    standLRT 
-0.01164834  0.55653379 

Random effects:
 Formula: ~standLRT | school
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.3034980 (Intr)
standLRT    0.1223499 0.494 
Residual    0.7440699       

Number of Observations: 4059
Number of Groups: 65 
</pre>
<p>This gives a first impression of the estimated model. But there is more. To get an idea of the elements that are actually stored inside the model, we use the names() function, which gives us the names of all the elements of the model. </p>
<blockquote><p>
names(model.01)<br />
model.01$method<br />
model.01$logLik
</p></blockquote>
<p>The output below shows that our model.01 contains seventeen elements. For reasons of space, only some will be described. &#8216;Contrasts&#8217; contains information on the way categorical variables were handled, &#8216;coefficients&#8217; contains the model-parameters, in &#8216;call&#8217; the model formula is stored and in &#8216;data&#8217; even the original data is stored. </p>
<pre>
> names(model.01)
 [1] "modelStruct"  "dims"         "contrasts"    "coefficients"
 [5] "varFix"       "sigma"        "apVar"        "logLik"      
 [9] "numIter"      "groups"       "call"         "terms"       
[13] "method"       "fitted"       "residuals"    "fixDF"       
[17] "data"        
> model.01$method
[1] "REML"
> model.01$logLik
[1] -4663.8
</pre>
<p>In the syntax above two specific elements of the model were requested: the estimation method and the loglikelihood. This is done by sub-setting the model using the $-sign, after which the desired element is placed. The output tells us that model.01 was estimated using restricted maximum likelihood and that the loglikelihood is -4663.8.</p>
<h2>Summary</h2>
<p>As we have seen, all the information we could possibly want is stored inside the model. To present these results concisely, many functions exist that extract some of the elements from the model and display them clearly. The most basic of these extractor functions is probably summary():</p>
<blockquote><p>
summary(model.01)
</p></blockquote>
<pre>
> summary(model.01)
Linear mixed-effects model fit by REML
 Data: Exam 
     AIC     BIC  logLik
  9339.6 9377.45 -4663.8

Random effects:
 Formula: ~standLRT | school
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.3034980 (Intr)
standLRT    0.1223499 0.494 
Residual    0.7440699       

Fixed effects: normexam ~ standLRT 
                 Value  Std.Error   DF   t-value p-value
(Intercept) -0.0116483 0.04010986 3993 -0.290411  0.7715
standLRT     0.5565338 0.02011497 3993 27.667639  0.0000
 Correlation: 
         (Intr)
standLRT 0.365 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.8323045 -0.6316837  0.0339390  0.6834319  3.4562632 

Number of Observations: 4059
Number of Groups: 65 
</pre>
<h2>Anova</h2>
<p>The last extractor function shown here is anova(). This is a very general function that can be used for a great variety of models. When applied to a multilevel model, it provides a basic test of the statistical significance of the model parameters, as shown below. </p>
<blockquote><p>
anova(model.01)</p>
<p>model.02 <- lme(fixed = normexam ~ standLRT, data = Exam,<br />
	random = ~ 1 | school)</p>
<p>anova(model.02,model.01)
</p></blockquote>
<p>In the syntax above an additional model is estimated that is very similar to our model.01, but does not have a random slope. It is stored in the object model.02. This is done to show that it is possible to test whether the random slope model fits the data better than the fixed slope model. The output below shows that this is indeed the case.</p>
<pre>
> anova(model.01)
            numDF denDF  F-value p-value
(Intercept)     1  3993 124.3969  <.0001
standLRT        1  3993 765.4983  <.0001
> 
> model.02 <- lme(fixed = normexam ~ standLRT, data = Exam,
+ 	random = ~ 1 | school)
> 
> anova(model.02,model.01)
         Model df      AIC      BIC    logLik   Test  L.Ratio
model.02     1  4 9376.765 9401.998 -4684.383                
model.01     2  6 9339.600 9377.450 -4663.800 1 vs 2 41.16494
         p-value
model.02        
model.01  <.0001
</pre>
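<p>Two further extractor functions worth a quick mention are fixef() and ranef(), which return the estimated fixed effects and the school-level random effects of model.01, respectively (output not shown here):</p>
<blockquote><p>
fixef(model.01)<br />
ranef(model.01)
</p></blockquote>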
<p>- - -- --- ----- --------</p>
<ul>
<li><b><a href="http://www.rensenieuwenhuis.nl/R-forum/">Discuss this article and pose additional questions in the R-Sessions Forum</a></b></li>
<li><b><a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/extractor-functions/">Find the original article embedded in the manual.</a></b></li>
</ul>
<p>- - -- --- ----- --------<br />
<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/">R-Sessions</a> is a collection of manual chapters for R-Project, maintained on <a href="http://www.rensenieuwenhuis.nl">Curving Normality</a>. All posts are linked to the chapters of the R-Project manual on this site. The manual is free to use, as it is paid for by the advertisements, but please refer to it in work it inspires. Feedback and topic requests are highly appreciated.<br />
-------- ----- --- -- - -</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-19-extractor-functions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R-Sessions 17: Generalized Multilevel {lme4}</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 10:00:40 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[generalised]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[logistic regression]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=552</guid>
		<description><![CDATA[Although all introductions to regression seem to assume normally distributed data, in practice this is often not the case. Many other types of distributions exist: the binomial distribution, the Poisson distribution, the gamma distribution, and so on. The lmer()-function in the lme4-package can easily estimate models based on these distributions. This is done by adding the 'family'-argument to the command syntax, thereby specifying that not a linear multilevel model but a generalized linear model needs to be estimated.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img src="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" " title="R-Sessions" data-recalc-dims="1" /></a><br />
<!--adsense--><br />
Although all introductions to regression seem to assume normally distributed data, in practice this is often not the case. Many other types of distributions exist: the binomial distribution, the Poisson distribution, the gamma distribution, and so on. The lmer()-function in the lme4-package can easily estimate models based on these distributions. This is done by adding the &#8216;family&#8217;-argument to the command syntax, thereby specifying that not a linear multilevel model but a generalized linear model needs to be estimated.<br />
<span id="more-552"></span></p>
<h2>Logistic Multilevel Regression</h2>
<p>Let us say we want to estimate the probability that a student in a specific school succeeds on a test. For this, we can use the Exam data-set in the mlmRev-package, which contains the standardized scores on a test. Here, we&#8217;ll define success on the test as having a standardized score of 0 or larger. This is recoded into a 0-1 variable below, using the <a href="http://www.rensenieuwenhuis.nl/r-project/manual/basics/data-manipulation/">ifelse() function</a>. Using summary(), the recoding is checked. The needed packages are loaded as well, using the library() function.</p>
<blockquote><p>
library(lme4)<br />
library(mlmRev)<br />
names(Exam)</p>
<p>Exam$success <- ifelse(Exam$normexam >= 0,1,0)<br />
summary(Exam$normexam)<br />
summary(Exam$success)
</p></blockquote>
<pre>
> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> library(mlmRev)
> names(Exam)
 [1] "school"   "normexam" "schgend"  "schavg"   "vr"       "intake"  
 [7] "standLRT" "sex"      "type"     "student" 
> 
> Exam$success <- ifelse(Exam$normexam >= 0,1,0)
> summary(Exam$normexam)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-3.6660000 -0.6995000  0.0043220 -0.0001138  0.6788000  3.6660000 
> summary(Exam$success)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  1.0000  0.5122  1.0000  1.0000 
</pre>
<p>In order to properly use the binary &#8216;success&#8217; variable created this way, a logistic regression model needs to be estimated. This is done by specifying the binomial family with the logit as link-function, using family = binomial(link = "logit"). The rest of the specification is exactly the same as for a <a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/model-specification/">normal linear multilevel regression model using the lmer() function</a>.</p>
<blockquote><p>
lmer(success ~ schavg + (1 | school), data=Exam, family=binomial(link = "logit"))
</p></blockquote>
<pre>
> lmer(success~ schavg + (1|school), 
+ 	data=Exam, 
+ 	family=binomial(link = "logit"))
Generalized linear mixed model fit using Laplace 
Formula: success ~ schavg + (1 | school) 
   Data: Exam 
 Family: binomial(logit link)
  AIC  BIC logLik deviance
 5323 5342  -2658     5317
Random effects:
 Groups Name        Variance Std.Dev.
 school (Intercept) 0.23113  0.48076 
number of obs: 4059, groups: school, 65

Estimated scale (compare to  1 )  0.9909287 

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.08605    0.07009   1.228    0.220    
schavg       1.60548    0.21374   7.511 5.86e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Correlation of Fixed Effects:
       (Intr)
schavg 0.072 
</pre>
<p>&#8211; &#8211; &#8212; &#8212; &#8212;&#8211; &#8212;&#8212;&#8211;</p>
<ul>
<li><b><a href="http://www.rensenieuwenhuis.nl/R-forum/">Discuss this article and pose additional questions in the R-Sessions Forum</a></b></li>
<li><b><a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/generalized-multilevel/">Find the original article embedded in the manual.</a></b></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>R-Sessions 16: Multilevel Model Specification (lme4)</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 10:00:47 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[mixed model]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[multilevel regression]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=543</guid>
		<description><![CDATA[<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img src="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg" title="R-Sessions" width="470" /></a>
Multilevel models, or mixed effects models, can easily be estimated in R. Several packages are available. Here, the lmer() function from the lme4-package is described. The specification of several types of models will be shown, using an example data-set. A detailed description of the specification rules is given. Output of the specified models is given, but not described or interpreted.
Please note that this description is very closely related to the description of the <a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/model-specification-nlme/">specification of the lme() function of the nlme-package</a>. The results are similar and here exactly the same possibilities are offered.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a></p>
<p>Multilevel models, or mixed effects models, can easily be estimated in R. Several packages are available. Here, the lmer() function from the lme4-package is described. The specification of several types of models will be shown, using an example data-set. A detailed description of the specification rules is given. Output of the specified models is given, but not described or interpreted.<br />
Please note that this description is very closely related to the description of the <a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/model-specification-nlme/">specification of the lme() function of the nlme-package</a>. The results are similar, and exactly the same possibilities are offered here.<br />
<span id="more-543"></span><br />
In this example, the dependent variable is the standardized result of a student on a specific exam. This variable is called &#8220;normexam&#8221;. In estimating the score on the exam, two levels will be discerned: student and school. On each level, one explanatory variable is present. On individual level, we are taking into account the standardized score of the student on a LR-test (&#8220;standLRT&#8221;). On the school-level, we take into account the average intake-score (&#8220;schavg&#8221;).</p>
<h2>Preparation</h2>
<p>Before analyses can be performed, preparation needs to take place. Using the library() command, two packages are loaded. The lme4-package contains functions for estimation of multilevel or hierarchical regression models. The mlmRev-package contains, amongst many other things, the data we are going to use here. In the output below, we see that R-Project automatically loads the Matrix- and the lattice-packages as well. These are needed for the lme4-package to work properly.<br />
Finally, the names() command is used to examine which variables are contained in the &#8216;Exam&#8217; data.frame.</p>
<blockquote><p>library(lme4)<br />
library(mlmRev)<br />
names(Exam)</p></blockquote>
<pre>&gt;library(lme4)
Loading required package: lme4
Loading required package: Matrix
Loading required package: lattice
[1] TRUE
&gt;library(mlmRev)
Loading required package: mlmRev
[1] TRUE
&gt;names(Exam)
 [1] "school"   "normexam" "schgend"  "schavg"   "vr"       "intake"
 [7] "standLRT" "sex"      "type"     "student"</pre>
<h2>null-model</h2>
<p>The syntax below specifies the simplest multilevel regression model of all: the null-model. Only the levels are defined. Using the lmer-function, the first level (here: students) does not have to be specified. It is assumed that the dependent variable (here: normexam) is on the first level (which it should be).</p>
<p>The model is specified using standard R formulas: first the dependent variable is given, followed by a tilde ( ~ ). The ~ should be read as: &#8220;follows&#8221;, or: &#8220;is defined by&#8221;. Next, the predictors are defined. In this case, only the intercept is specified, by entering a &#8216;1&#8217;. The random elements are then specified between brackets ( ). Inside these brackets we specify the random predictors, followed by a vertical bar ( | ), after which the group-level is specified.</p>
<p>After the model specification, several parameters can be passed to the model. Here, we specify the data that should be used by data=Exam. Another often-used parameter indicates the estimation method. If left unspecified, restricted maximum likelihood (REML) is used. Another option would be method="ML", which calls for full maximum likelihood estimation (note that in lme4 versions 1.0 and later, this is specified as REML=FALSE instead). All this leads to the following model specification:</p>
<blockquote><p>lmer(normexam ~ 1 + (1 | school), data=Exam)</p></blockquote>
<p>This leads to the following output:</p>
<pre>&gt; lmer(normexam ~ 1 + (1 | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ 1 + (1 | school)
   Data: Exam
   AIC   BIC logLik MLdeviance REMLdeviance
 11019 11031  -5507      11011        11015
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 0.17160  0.41425
 Residual             0.84776  0.92074
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.01325    0.05405 -0.2452</pre>
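<p>The variance components in this output allow a quick calculation of the intraclass correlation coefficient (ICC): the share of the total variance that is situated at the school level. A minimal sketch, using the variances reported in the output above (this calculation is an addition to the original text):</p>

```r
# Variance components taken from the null-model output above
var_school   <- 0.17160   # between-school (intercept) variance
var_residual <- 0.84776   # within-school (residual) variance

# Intraclass correlation: proportion of total variance at the school level
icc <- var_school / (var_school + var_residual)
round(icc, 3)  # roughly 0.168
```

<p>Roughly 17% of the variance in exam scores lies between schools, which is a substantial share and justifies the multilevel approach.</p>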
<h2>random intercept, fixed predictor on individual level</h2>
<p>For the next model, we add a predictor on the individual level. We do this by replacing the &#8216;1&#8217; of the previous model with the predictor (here: standLRT). An intercept is always assumed, so it is still estimated here; the explicit &#8216;1&#8217; is only needed when no other predictors are specified. Since we don&#8217;t want the effect of the predictor to vary between groups, the specification of the random part of the model remains identical to the previous model. The same data are used, so we specify data=Exam again.</p>
<blockquote><p>lmer(normexam ~ standLRT + (1 | school), data=Exam)</p></blockquote>
<pre>&gt; lmer(normexam ~ standLRT + (1 | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT + (1 | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9375 9394  -4684       9357         9369
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 0.093839 0.30633
 Residual             0.565865 0.75224
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) 0.002323   0.040354    0.06
standLRT    0.563307   0.012468   45.18

Correlation of Fixed Effects:
         (Intr)
standLRT 0.008</pre>
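<p>Comparing the residual variance of this model (0.565865) to that of the null-model (0.84776) gives an informal impression of how much individual-level variance standLRT accounts for. A sketch of this proportional-reduction-in-variance calculation (an addition to the original text, in the spirit of the Raudenbush &amp; Bryk pseudo-R&#178;):</p>

```r
# Residual (individual-level) variances taken from the two outputs above
resid_null <- 0.84776   # null-model
resid_lrt  <- 0.565865  # model including standLRT

# Proportional reduction in individual-level residual variance
r2_level1 <- (resid_null - resid_lrt) / resid_null
round(r2_level1, 2)  # roughly 0.33
```

<p>So standLRT accounts for roughly a third of the individual-level variance in exam scores.</p>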
<h2>random intercept, random slope</h2>
<p>The next model to be specified is a model with a random intercept and a random slope: a predictor whose effect is allowed to vary between groups. In other words, the effect of the LR-test score on the exam score varies between schools. In order to estimate this model, the &#8216;1&#8217; that indicates the intercept in the random part of the model specification is replaced by the variable of which we want the effect to vary between the groups.</p>
<blockquote><p>lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method="ML")</p></blockquote>
<pre>&gt; lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method="ML")
Linear mixed-effects model fit by maximum likelihood
Formula: normexam ~ standLRT + (standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9327 9358  -4658       9317         9328
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.090406 0.30068
          standLRT    0.014548 0.12062  0.497
 Residual             0.553656 0.74408
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.01151    0.03978  -0.289
standLRT     0.55673    0.01994  27.917

Correlation of Fixed Effects:
         (Intr)
standLRT 0.365</pre>
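<p>After storing the fitted model in an object, the school-specific intercepts and slopes can be inspected. A brief sketch, assuming lme4 and mlmRev are loaded as above; fixef(), ranef() and coef() are standard lme4 accessor functions, and REML=FALSE is the lme4 1.0+ equivalent of the older method="ML":</p>

```r
library(lme4)
library(mlmRev)  # provides the Exam data

# Random intercept and random slope for standLRT, fitted by full ML
model <- lmer(normexam ~ standLRT + (standLRT | school),
              data = Exam, REML = FALSE)

fixef(model)               # average (fixed) intercept and slope
head(ranef(model)$school)  # per-school deviations from the fixed effects
head(coef(model)$school)   # per-school intercepts and slopes combined
```

<p>coef() returns the sum of the fixed effects and the per-school random deviations, so each school gets its own regression line.</p>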
<h2>random intercept, individual and group level predictor</h2>
<p>It is possible to enter variables on the group level as well. Here, we will add a predictor on the school level: the average intake score (&#8220;schavg&#8221;). The lmer-function needs this variable to be of the same length as the individual-level variables. In other words: for every unit on the lowest level, the variable indicating the group-level value (here: the average score on the intake test for every school) should have a value. For this example, this implies that all respondents who attend the same school have the same value on the variable &#8220;schavg&#8221;. We enter this variable into the model in the same way as individual-level variables, leading to the following syntax:</p>
<blockquote><p>lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam)</p></blockquote>
<pre>&gt; lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT + schavg + (1 + standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9336 9374  -4662       9310         9324
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.077189 0.27783
          standLRT    0.015318 0.12377  0.373
 Residual             0.553604 0.74405
number of obs: 4059, groups: school, 65

Fixed effects:
             Estimate Std. Error t value
(Intercept) -0.001422   0.037253  -0.038
standLRT     0.552243   0.020352  27.135
schavg       0.294737   0.107262   2.748

Correlation of Fixed Effects:
         (Intr) stnLRT
standLRT  0.266
schavg    0.089 -0.085</pre>
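<p>When a group-level mean is not yet available in the data, it can be constructed from the individual-level scores. A toy sketch (the data frame and variable names here are hypothetical, not part of the Exam data) using base R&#8217;s ave(), which recycles each group mean to every row of that group:</p>

```r
# Hypothetical miniature data set: two schools, five students
df <- data.frame(school = c(1, 1, 2, 2, 2),
                 intake = c(0.2, 0.4, -0.1, 0.1, 0.3))

# ave() computes the mean per school and repeats it for every student,
# so the group-level variable has one value per row, as lmer() requires
df$schavg <- ave(df$intake, df$school)
df
```

<p>Every student in school 1 now carries the value 0.3, and every student in school 2 the value 0.1, exactly the structure lmer() expects for a group-level predictor.</p>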
<h2>random intercept, cross-level interaction</h2>
<p>Finally, a cross-level interaction is specified. This works the same as any other interaction specified in R. In contrast with many other statistical packages, it is not necessary to calculate separate interaction variables (but you&#8217;re free to do so, of course).<br />
In this example, the cross-level interaction between the individual LR-test score and the school&#8217;s average intake score is specified by entering a model formula containing standLRT * schavg. This leads to the following syntax and output.</p>
<blockquote><p>lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam)</p></blockquote>
<pre>&gt; lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT * schavg + (1 + standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9334 9379  -4660       9303         9320
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.076326 0.27627
          standLRT    0.012240 0.11064  0.357
 Residual             0.553780 0.74416
number of obs: 4059, groups: school, 65

Fixed effects:
                Estimate Std. Error t value
(Intercept)     -0.00709    0.03713  -0.191
standLRT         0.55794    0.01915  29.134
schavg           0.37341    0.11094   3.366
standLRT:schavg  0.16182    0.05773   2.803

Correlation of Fixed Effects:
            (Intr) stnLRT schavg
standLRT     0.236
schavg       0.070 -0.064
stndLRT:sch -0.065  0.087  0.252</pre>
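<p>The * operator in the formula is shorthand: standLRT * schavg expands to both main effects plus their product term. A small sketch using base R&#8217;s model.matrix() on a hypothetical three-row data frame makes this expansion visible:</p>

```r
# Hypothetical miniature data, just to show the formula expansion
toy <- data.frame(standLRT = c(-1, 0, 1),
                  schavg   = c(0.2, 0.2, -0.1))

# The design matrix contains an intercept, both main effects,
# and the standLRT:schavg interaction term
colnames(model.matrix(~ standLRT * schavg, data = toy))
```

<p>This is why the fixed-effects table above lists four rows: the intercept, the two main effects, and standLRT:schavg.</p>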
<p>&#8211; &#8211; &#8212; &#8212; &#8212;&#8211; &#8212;&#8212;&#8211;</p>
<ul>
<li><strong><a href="http://www.rensenieuwenhuis.nl/r-forum/">Discuss this article and pose additional questions in the R-Sessions Forum</a></strong></li>
<li><strong><a href="http://www.rensenieuwenhuis.nl/r-project/manual/multilevel-analysis/model-specification/">Find the original article embedded in the manual.</a></strong></li>
</ul>
<p>&#8211; &#8211; &#8212; &#8212; &#8212;&#8211; &#8212;&#8212;&#8211;<br />
<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/">R-Sessions</a> is a collection of manual chapters for R-Project, maintained on <a href="http://www.rensenieuwenhuis.nl">Curving Normality</a>. All posts are linked to the chapters of the R-Project manual on this site. The manual is free to use, as it is funded by advertising, but please refer to it in work it inspires. Feedback and topic requests are highly appreciated.<br />
&#8212;&#8212;&#8211; &#8212;&#8211; &#8212; &#8212; &#8211; &#8211;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>
