Rense Nieuwenhuis » mixed effects models

Influence.ME: Tools for Detecting Influential Data in Multilevel Regression Models

Rense Nieuwenhuis — Thu, 20 Dec 2012 14:40:11 +0000

Despite the increasing popularity of multilevel regression models, the development of diagnostic tools has lagged behind. In the social sciences, multilevel regression models are typically used to account for the nesting structure of the data, such as students in classes, migrants from origin countries, and individuals in countries. The strength of multilevel models lies in analyzing data on a large number of groups with only a few observations within each group, such as students in classes.

Nevertheless, in the social sciences multilevel models are often used to analyze data on a limited number of groups with a large number of observations per group. A typical example is the analysis of data on individuals nested within countries. By nature, only a limited number of countries exist. In practice, typical country-comparative analyses are based on about 25 countries. With such a small number of groups (e.g. countries), observations on a single group can easily be overly influential on the outcomes. This means that the conclusions based on the multilevel regression model might no longer hold when a single group is removed from the data.

In our recent publication in the R Journal, we introduce influence.ME, software that provides tools for detecting influential data in multilevel regression models (or: in mixed effects models, as these are commonly referred to in statistics). influence.ME is a publicly available R package that evaluates multilevel regression models that were estimated with the lme4.0 package. It calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS and Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. In the paper, the package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested.
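To make the idea of group-level influence concrete, here is a minimal base-R sketch. It is not the influence.ME implementation. It assumes simulated data and uses an ordinary lm() as a stand-in for a mixed effects model, purely to stay free of dependencies. The model is re-estimated once per group with that group left out, and the change in a coefficient is standardized by its standard error in the reduced model, which is the general idea behind DFBETAS.

```r
# Simulated data (an assumption for illustration): 10 groups, 30 observations each
set.seed(1)
d <- data.frame(group = rep(1:10, each = 30), x = rnorm(300))
d$y <- 0.5 * d$x + rnorm(300)

# Give group 10 a very different slope, so that it should stand out
d$y[d$group == 10] <- -2 * d$x[d$group == 10] + rnorm(30)

# Model on all data; lm() is only a stand-in for a mixed effects model
full <- lm(y ~ x, data = d)

# Leave out one group at a time, re-estimate, and standardize the change
# in the slope of x by its standard error in the reduced model
groups <- unique(d$group)
dfbetas.x <- sapply(groups, function(g) {
	reduced <- lm(y ~ x, data = d[d$group != g, ])
	unname((coef(full)["x"] - coef(reduced)["x"]) /
		coef(summary(reduced))["x", "Std. Error"])
})
names(dfbetas.x) <- groups

# One standardized value per group; large absolute values flag influential groups
print(round(dfbetas.x, 2))
```

In this simulation the deviating group should show by far the largest absolute value. With real multilevel data one would use influence.ME itself, which computes such measures while accounting for the nesting structure.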

With this publication, and of course with the software, which has been available for quite some time, we hope to contribute to better use of multilevel regression models. The example and guidelines provided are geared towards applications in the social sciences, but are applicable in all disciplines.

On a final note, the editorial of the R Journal describes how the journal is quickly gaining (academic) recognition:

Thomson Reuters has informed us that The R Journal has been accepted for listing in the Science Citation Index-Expanded (SCIE), including the Web of Science, and the ISI Alerting Service, starting with volume 1, issue 1 (May 2009). This complements the current listings by EBSCO and the Directory of Open Access Journals (DOAJ), and completes a process started by Peter Dalgaard in 2010.

More information on our influence.ME software is available on this website.

Download the paper from the R Journal
Rense Nieuwenhuis, Manfred te Grotenhuis, & Ben Pelzer (2012). Influence.ME: Tools for detecting influential data in mixed effects models. The R Journal, 4(2), 38-47.

R-Sessions 32: Forward.lmer: Basic stepwise function for mixed effects in R

Rense Nieuwenhuis — Fri, 13 Feb 2009 10:59:03 +0000

Intended to be a customized solution, it may have grown to be a little more. forward.lmer is an early installment of a full stepwise function for mixed effects regression models in R-Project. I may put in some work to extend it, or I may not. Nevertheless, in a ‘forward sense of stepwise’, I think it can be pretty useful as it is. Also, it has an interesting take on the stepwise concept, I think.

Most stepwise functions (as far as I know) take a base model and a bunch of variables, and then iteratively add and/or remove variables, according to various criteria, to arrive at the best fitting regression model. All very interesting, but how to deal with interaction variables? And moreover: most existing functions do not work with mixed effects models (I use the term ‘mixed effects model’ here to also cover what are often called hierarchical or multilevel regression models).

Built around the lme4 package in R, forward.lmer provides a forward stepwise procedure for mixed effects models. It allows the user to enter not only single variables into models, but also blocks of variables. This opens up many options: users can add complete interactions at once (i.e. both the original and the multiplicative terms), or add these consecutively. Future development will focus on additional selection criteria for interactions, such as the criterion that at least the multiplicative term needs to be statistically significant.
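As a small illustration of what a 'block' amounts to, the sketch below turns block strings into candidate formulas in the same way the function updates a model: by pasting each block onto ". ~ . +". The variable names (y, age, sex, group) are hypothetical, and only formulas are built here, so no model is estimated.

```r
# Hypothetical starting formula with a random intercept per group
start.formula <- y ~ 1 + (1 | group)

# Two single variables, and one block holding a complete interaction
blocks <- c("age", "sex", "age + sex + age:sex")

# One candidate formula per block
candidates <- lapply(blocks, function(b) {
	update(start.formula, as.formula(paste(". ~ . +", b)))
})

for (f in candidates) print(f)
```

The third candidate adds both main effects and their product term in a single step, whereas the first two add one variable each.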

The user provides a starting model and a set of variables to evaluate. The procedure then updates the starting model with the addition of each variable (or block of variables) in turn. The resulting models are ordered by their log-likelihood (other criteria, namely BIC and AIC, will follow soon) and then evaluated in that order, starting with the best fitting model, against one of two criteria. The first criterion is that at least one of the added parameters is statistically significant. The other criterion is that the addition of the parameters together is statistically significant.
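A single forward step can be sketched in base R as follows. This is a simplified stand-in, not the function itself: it assumes ordinary lm() models on the built-in mtcars data instead of lmer models, and it computes the likelihood ratio test by hand for the 'added together' criterion.

```r
# Stand-in starting model and blocks (assumptions for illustration)
start.model <- lm(mpg ~ 1, data = mtcars)
blocks <- c("wt", "qsec", "wt + hp + wt:hp")
sig.level <- 0.05

# Fit one candidate model per block and collect the log-likelihoods
models <- lapply(blocks, function(b) {
	update(start.model, as.formula(paste(". ~ . +", b)))
})
LL <- sapply(models, function(m) as.numeric(logLik(m)))

# Walk through the candidates from highest to lowest log-likelihood and
# accept the first one that fits significantly better than the start model
chosen <- NA
for (j in order(LL, decreasing = TRUE)) {
	lr <- 2 * (LL[j] - as.numeric(logLik(start.model)))
	df <- attr(logLik(models[[j]]), "df") - attr(logLik(start.model), "df")
	p <- pchisq(lr, df = df, lower.tail = FALSE)
	if (p < sig.level) {
		chosen <- blocks[j]
		break
	}
}
print(chosen)
```

In a full run, the accepted block would be removed from the list and the accepted model would become the starting model of the next iteration.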

There are several parameters to be specified:

start.model: The starting model the procedure starts with. This can be a null-model, or a model already containing several variables. All lmer-models (i.e. logistic, poisson, linear) are supported.
blocks: A vector of variable names (as character strings) to be added to a model. Several variables can be concatenated within the same character string, so that these are added as a block of variables, instead of one variable at a time.
max.iter: The maximum number of iterations, and thus the maximum number of variables (or blocks) that are added. It cannot exceed the number of blocks. If max.iter is reached, the procedure stops without adding more variables.
sig.level: This is the p-value against which it is tested whether the new model fits better than a base model. Either sig.level or zt needs to be specified, but not both at once.
zt: This is either the T or Z value that is used to test whether (at least) one of the added variables is statistically significant. T values are used for linear regression, Z values for binary response models.
print.log: Should a log be printed? The log contains information on which variables (and on which criteria) were added in each step.
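To give a feel for the zt criterion, here is a minimal sketch under two assumptions that are mine, not the function's: the threshold is taken from the normal distribution for a two-sided 5% test (for T values this is only an approximation), and lm() on the built-in mtcars data stands in for an lmer model. As in the function, the newly added parameters are identified by comparing the row names of the two coefficient tables, and the test statistic is read from the third column.

```r
# Two-sided 5% threshold under a normal approximation (an assumption)
zt <- qnorm(1 - 0.05 / 2)

# Stand-in base model and a candidate that adds one block of variables
base <- lm(mpg ~ wt, data = mtcars)
candidate <- update(base, . ~ . + hp + wt:hp)

# Parameters that are in the candidate but not in the base model
coefs <- coef(summary(candidate))
added <- setdiff(rownames(coefs), rownames(coef(summary(base))))

# The criterion is met if at least one added parameter exceeds the threshold
sig.par <- any(abs(coefs[added, 3]) > zt)
print(added)
print(sig.par)
```

A stricter or looser test simply means passing a different value for zt.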

The forward.lmer function returns the best fitting model (according to the given criteria). Of course, one can use this resulting model as a starting model for a new stepwise procedure.

forward.lmer <- function(start.model, blocks, max.iter=1, sig.level=FALSE, zt=FALSE, print.log=TRUE) {

	# forward.lmer: a function for stepwise regression using lmer mixed effects models
	# Author: Rense Nieuwenhuis

	# Initializing internal variables
	log.step <- 0
	log.LL <- log.p <- log.block <- zt.temp <- log.zt <- NA
	model.basis <- start.model

	# Maximum number of iterations cannot exceed number of blocks
	if (max.iter > length(blocks)) max.iter <- length(blocks)

	# Setting up the outer loop
	for(i in 1:max.iter)
		{
		models <- list()

		# Iteratively updating the model with addition of one block of variable(s)
		# Also: extracting the loglikelihood of each estimated model
		for(j in 1:length(blocks))
			{
			models[[j]] <- update(model.basis, as.formula(paste(". ~ . + ", blocks[j])))
			}
		LL <- unlist(lapply(models, logLik))

		# Ordering the models based on their loglikelihood.
		# Additional selection criteria apply
		for (j in order(LL, decreasing=TRUE))
			{
			##############
			############## Selection based on ANOVA-test
			##############
			if(sig.level != FALSE)
				{
				# p-value of the candidate model against the current base model.
				# Stored before model.basis is overwritten, so that the logged value
				# belongs to this comparison. The column is selected by name, since
				# its position differs between lme4 versions.
				p.temp <- anova(model.basis, models[[j]])[2, "Pr(>Chisq)"]

				if(p.temp < sig.level)
					{
					model.basis <- models[[j]]

					# Writing the logs
					log.step <- log.step + 1
					log.block[log.step] <- blocks[j]
					log.LL[log.step] <- as.numeric(logLik(model.basis))
					log.p[log.step] <- p.temp

					blocks <- blocks[-j]
					break
					}
				}

			##############
			############## Selection based on significance of added variable-block
			##############
			if(zt != FALSE)
				{
				# coef(summary(x)) replaces the older summary(x)@coefs slot access,
				# which current lme4 releases no longer support
				b.model <- coef(summary(models[[j]]))
				diff.par <- setdiff(rownames(b.model), rownames(coef(summary(model.basis))))
				if (length(diff.par)==0) break

				# Is at least one of the added parameters statistically significant?
				sig.par <- FALSE
				for (k in 1:length(diff.par))
					{
					if(abs(b.model[which(rownames(b.model)==diff.par[k]),3]) > zt)
						{
						sig.par <- TRUE
						zt.temp <- b.model[which(rownames(b.model)==diff.par[k]),3]
						break
						}
					}

				if(sig.par==TRUE)
					{
					model.basis <- models[[j]]

					# Writing the logs
					log.step <- log.step + 1
					log.block[log.step] <- blocks[j]
					log.LL[log.step] <- as.numeric(logLik(model.basis))
					log.zt[log.step] <- zt.temp

					blocks <- blocks[-j]
					break
					}
				}
			}
		}

	## Create and print log
	log.df <- data.frame(log.step=1:log.step, log.block, log.LL, log.p, log.zt)
	if(print.log == TRUE) print(log.df, digits=4)

	## Return the 'best' fitting model
	return(model.basis)
	}

As always, you're invited to use this function, or to adapt it and use the adapted version. However, you are required to mention this function and its author. Additionally, since I intend to continue working on this function (perhaps even evolve it into a 'package' on CRAN), I would love to hear about any experiences in using it.

R-Sessions 25: Book – Mixed Effects Models in S and S-PLUS (Pinheiro & Bates, 2000)

Rense Nieuwenhuis — Wed, 01 Oct 2008 10:00:51 +0000

Despite the reference to S and S-PLUS in the title of this book, it offers an excellent guide to the nlme-package in R-Project. The reason for this is the close resemblance between R and S. The nlme-package, available in R-Project for estimation of both linear and non-linear multilevel models, is written and maintained by the authors of this book.

The book is not an introduction to R. Basic knowledge of R-Project (or S / S-PLUS) is required to get the most out of it, as well as some knowledge of multilevel theory. Although the book forms a thorough introduction to multilevel modeling, addressing some theory, the mathematics, and of course the estimation and specification in R-Project (or S / S-PLUS), the learning curve it offers is quite steep. The authors do not shy away from applying matrix algebra and specify exactly which estimation procedures are used.

Not only is the specification of basic models described, but many other subjects are brought up as well. A specific grouped-data object is considered, as well as ways to visualize hierarchical data and multilevel models. Heteroscedasticity, often a violation of assumptions, can be easily accommodated in the models, as is described clearly in one of the chapters. Finally, not only linear models are tackled, but non-linear models as well.

All in all, this book is an excellent addition for those who have prior knowledge of both R-Project and multilevel analysis. Using real-data examples and by providing tons of output, the authors succeed in making clear the necessity of the more complex models and thereby invite the reader to invest time in the more fundamental aspects of multilevel analysis.

– – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, as it is paid for by advertisements, but please refer to it in any work it inspires. Feedback and topic requests are highly appreciated.
——– —– — — – –