R-Sessions 16: Multilevel Model Specification (lme4)

Multilevel models, or mixed effects models, can easily be estimated in R. Several packages are available. Here, the lmer() function from the lme4-package is described. The specification of several types of models will be shown, using a fictive example. A detailed description of the specification rules is given. Output of the specified models is given, but not described or interpreted.
Please note that this description is very closely related to the description of the specification of the lme() function of the nlme-package. The results are similar and here exactly the same possibilities are offered.

In this example, the dependent variable is the standardized result of a student on a specific exam. This variable is called “normexam”. In estimating the score on the exam, two levels will be discerned: student and school. On each level, one explanatory variable is present. On individual level, we are taking into account the standardized score of the student on a LR-test (“standLRT”). On the school-level, we take into account the average intake-score (“schavg”).

Preparation

Before analyses can be performed, preparation needs to take place. Using the library() command, two packages are loaded. The lme4-package contains functions for estimation of multilevel or hierarchical regression models. The mlmRev-package contains, amongst many other things, the data we are going to use here. In the output below, we see that R-Project automatically loads the Matrix- and the lattice-packages as well. These are needed for the lme4-package to work properly.
Finally, the names() command is used to examine which variables are contained in the ‘Exam’ data.frame.

library(lme4)
library(mlmRev)
names(Exam)

>library(lme4)
Loading required package: lme4
Loading required package: Matrix
Loading required package: lattice
[1] TRUE
>library(mlmRev)
Loading required package: mlmRev
[1] TRUE
>names(Exam)
 [1] "school"   "normexam" "schgend"  "schavg"   "vr"       "intake"
 [7] "standLRT" "sex"      "type"     "student"

null-model

The syntax below specifies the most simple multilevel regression model of all: the null-model. Only the levels are defined. Using the lmer-function, the first level (here: students) do not have to be specified. It is assumed that the dependent variable (here: normexam) is on the first level (which it should be).

The model is specified using standard R formulas: First the dependent variable is given, followed by a tilde ( ~ ). The ~ should be read as: “follows”, or: “is defined by”. Next, the predictors are defined. In this case, only the intercept is defined by entering a ’1′. Next, the random elements are specified between brackets ( ). Inside these brackets we specify the random predictors, followed by a vertical stripe ( | ), after which the group-level is specified.

After the model specification, several parameters can be given to the model. Here, we specify the data that should be used by data=Exam. Another often used parameter indicates the estimation method. If left unspecified, restricted maximum likelihood (REML) is used. Another option would be: method=”ML”, which calls for full maximum likelihood estimation. All this leads to the following model specification:

lmer(normexam ~ 1 + (1 | school), data=Exam)

This leads to the following output:

> lmer(normexam ~ 1 + (1 | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ 1 + (1 | school)
   Data: Exam
   AIC   BIC logLik MLdeviance REMLdeviance
 11019 11031  -5507      11011        11015
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 0.17160  0.41425
 Residual             0.84776  0.92074
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.01325    0.05405 -0.2452

random intercept, fixed predictor in individual level

For the next model, we add a predictor to the individual level. We do this, by replacing the ’1′ of the previous model by the predictor (here: standLRT). An intercept is always assumed, so it is still estimated here. It only needs to be specified when no other predictors are specified. Since we don’t want the effect of the predictor to vary between groups, the specification of the random part of the model remains identical to the previous model. The same data is used, so we specify data=Exam again.

lmer(normexam ~ standLRT + (1 | school), data=Exam)

> lmer(normexam ~ standLRT + (1 | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT + (1 | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9375 9394  -4684       9357         9369
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 0.093839 0.30633
 Residual             0.565865 0.75224
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) 0.002323   0.040354    0.06
standLRT    0.563307   0.012468   45.18

Correlation of Fixed Effects:
         (Intr)
standLRT 0.008

random intercept, random slope

The next model that will be specified, is a model with a random intercept on individual level and a predictor that is allowed to vary between groups. In other words, the effect of doing homework on the score on a math-test varies between schools. In order to estimate this model, the ’1′ that indicates the intercept in the random part of the model specification is replaced by the variable of which we want the effect to vary between the groups.

lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method=”ML”)

> lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method="ML")
Linear mixed-effects model fit by maximum likelihood
Formula: normexam ~ standLRT + (standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9327 9358  -4658       9317         9328
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.090406 0.30068
          standLRT    0.014548 0.12062  0.497
 Residual             0.553656 0.74408
number of obs: 4059, groups: school, 65

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.01151    0.03978  -0.289
standLRT     0.55673    0.01994  27.917

Correlation of Fixed Effects:
         (Intr)
standLRT 0.365

random intercept, individual and group level predictor

It is possible to enter variables on group level as well. Here, we will add a predictor that indicates the size of the school. The lmer-function needs this variable to be of the same length as variables on individual length. In other words: for every unit on the lowest level, the variable indicating the group level value (here: the average score on the intake-test for every school) should have a value. For this example, this implies that all respondents that attend the same school, have the same value on the variable “schavg”. We enter this variable to the model in the same way as individual level variables, leading to the following syntax:

lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam)

> lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT + schavg + (1 + standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9336 9374  -4662       9310         9324
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.077189 0.27783
          standLRT    0.015318 0.12377  0.373
 Residual             0.553604 0.74405
number of obs: 4059, groups: school, 65

Fixed effects:
             Estimate Std. Error t value
(Intercept) -0.001422   0.037253  -0.038
standLRT     0.552243   0.020352  27.135
schavg       0.294737   0.107262   2.748

Correlation of Fixed Effects:
         (Intr) stnLRT
standLRT  0.266
schavg    0.089 -0.085

random intercept, cross-level interaction

Finally, a cross-level interaction is specified. This basically works the same as any other interaction specified in R. In contrast with many other statistical packages, it is not necessary to calculate separate interaction variables (but you’re free to do so, of course).
In this example, the cross-level interaction between time spend on homework and size of the school can be specified by entering a model formula containing standLRT * schavg. This leads to the following syntax and output.

lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam)

> lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam)
Linear mixed-effects model fit by REML
Formula: normexam ~ standLRT * schavg + (1 + standLRT | school)
   Data: Exam
  AIC  BIC logLik MLdeviance REMLdeviance
 9334 9379  -4660       9303         9320
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 school   (Intercept) 0.076326 0.27627
          standLRT    0.012240 0.11064  0.357
 Residual             0.553780 0.74416
number of obs: 4059, groups: school, 65

Fixed effects:
                Estimate Std. Error t value
(Intercept)     -0.00709    0.03713  -0.191
standLRT         0.55794    0.01915  29.134
schavg           0.37341    0.11094   3.366
standLRT:schavg  0.16182    0.05773   2.803

Correlation of Fixed Effects:
            (Intr) stnLRT schavg
standLRT     0.236
schavg       0.070 -0.064
stndLRT:sch -0.065  0.087  0.252

- – — — —– ——–

- – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, for it is paid by the advertisements, but please refer to it in your work inspired by it. Feedback and topic requests are highly appreciated.
——– —– — — – -

Latest Comments

  1. Kofi Ampaabeng says:

    I found theses sessions to be extremely helpful. I’m surprised there are no comments here. Just to encourage you to keep providing these sessions, I’m writing to say thanks.
    I’m currently studying for my PhD in Economics and decided to switch from Stata to R.

    1. David Yeager says:

      I agree — this was really helpful. I hadn’t found a clear answer to this question anywhere else. Thanks for the post.

      1. Joe Powers says:

        Second that. This is the clearest answer I’ve found in 2 years of trying to wrap my head around this. Thanks.

  2. CJ says:

    Thanks for posting this – I’m struggling my way through the final few months of an ecological PhD with a lot of mixed modelling involved. It really helps to see the examples and explanations for specifying different models.

  3. CMZ says:

    This is one of the best explanations I have seen so far. However, I am having a hard time finding examples of code when there are 2 levels. In my case, units (counts so I using glmer) within households and then households within communities.

    1. d l rogers says:

      Two years later… me too! I need to work with several 2- or 3-level models, and can’t seem to find any instructions for how to specify those in lme4 or nlme. :(

  4. robie says:

    great – very clear and good to include the crossed design. be good to see SPSS syntax equivalents and often people start with spss and move onto R

Leave a Reply