R-Sessions 19: Extractor Functions


Unlike most statistical software packages, R typically stores the results of an analysis in an object. The advantage of this is that, although not all output is shown on the screen at once, it is not necessary to estimate the statistical model again when different output is required.

This session shows what kind of data is stored in a multilevel model estimated in R-Project and introduces some functions that make use of these data.

Inside the model

Let’s first estimate a simple multilevel model using the nlme-package. For this session we will use a model we estimated earlier: the education model with a random intercept and a random slope. This time, though, we will assign it to an object called model.01. It is estimated as follows:

require(nlme)
require(mlmRev)

model.01 <- lme(fixed = normexam ~ standLRT, data = Exam,
                random = ~ standLRT | school)

This results in hardly any output, apart from a few startup messages when the packages are loaded. Basic results can be obtained by simply calling the object:

model.01

> model.01
Linear mixed-effects model fit by REML
  Data: Exam 
  Log-restricted-likelihood: -4663.8
  Fixed: normexam ~ standLRT 
(Intercept)    standLRT 
-0.01164834  0.55653379 

Random effects:
 Formula: ~standLRT | school
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.3034980 (Intr)
standLRT    0.1223499 0.494 
Residual    0.7440699       

Number of Observations: 4059
Number of Groups: 65 

This gives a first impression of the estimated model, but there is more. To get an idea of the elements that are actually stored inside the model, we use the names() function, which returns the names of all elements of the model object.

names(model.01)
model.01$method
model.01$logLik

The output below shows that our model.01 contains seventeen elements. For reasons of space, only some are described here: ‘contrasts’ contains information on how categorical variables were handled, ‘coefficients’ holds the model parameters, ‘call’ stores the model formula, and ‘data’ even contains the original data.

> names(model.01)
 [1] "modelStruct"  "dims"         "contrasts"    "coefficients"
 [5] "varFix"       "sigma"        "apVar"        "logLik"      
 [9] "numIter"      "groups"       "call"         "terms"       
[13] "method"       "fitted"       "residuals"    "fixDF"       
[17] "data"        
> model.01$method
[1] "REML"
> model.01$logLik
[1] -4663.8

In the syntax above, two specific elements of the model were requested: the estimation method and the log-likelihood. This is done by subsetting the model with the $-operator, followed by the name of the desired element. The output tells us that model.01 was estimated using Restricted Maximum Likelihood and that the log-likelihood is -4663.8.
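
Other stored elements can be extracted in the same way. The lines below are a small sketch, assuming the element names shown by names() above; the convenience functions fixef() and ranef() from the nlme-package offer a more formal route to the estimated coefficients.

model.01$call                   # the call that created the model
model.01$coefficients$fixed     # fixed-effect estimates
head(model.01$data)             # first rows of the data stored in the model
fixef(model.01)                 # extractor function for the fixed effects
ranef(model.01)                 # estimated random effects per school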

Summary

As we have seen, all the information we could possibly want is stored inside the model object. To obtain a concise overview, many functions exist that extract elements from the model and present them clearly. The most basic of these extractor functions is probably summary():

summary(model.01)

> summary(model.01)
Linear mixed-effects model fit by REML
 Data: Exam 
     AIC     BIC  logLik
  9339.6 9377.45 -4663.8

Random effects:
 Formula: ~standLRT | school
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.3034980 (Intr)
standLRT    0.1223499 0.494 
Residual    0.7440699       

Fixed effects: normexam ~ standLRT 
                 Value  Std.Error   DF   t-value p-value
(Intercept) -0.0116483 0.04010986 3993 -0.290411  0.7715
standLRT     0.5565338 0.02011497 3993 27.667639  0.0000
 Correlation: 
         (Intr)
standLRT 0.365 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.8323045 -0.6316837  0.0339390  0.6834319  3.4562632 

Number of Observations: 4059
Number of Groups: 65 
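
The result of summary() is itself an object that can be stored and subsetted. As a small sketch (assuming the coefficient table is stored under the name tTable, as in current versions of the nlme-package), the table of fixed effects can be pulled out as follows:

sum.01 <- summary(model.01)   # store the summary as an object
names(sum.01)                 # the summary has elements of its own
sum.01$tTable                 # fixed effects with standard errors and t-values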

Anova

The last extractor function discussed here is anova(). It is a very general function that can be applied to a great variety of models. When applied to a multilevel model, it provides a basic test of the statistical significance of the model terms, as shown below.

anova(model.01)

model.02 <- lme(fixed = normexam ~ standLRT, data = Exam,
                random = ~ 1 | school)

anova(model.02, model.01)

In the syntax above, an additional model is estimated that is very similar to model.01 but does not have a random slope; it is stored in the object model.02. This makes it possible to test whether the random slope model fits the data better than the fixed slope model. The output below shows that this is indeed the case.

> anova(model.01)
            numDF denDF  F-value p-value
(Intercept)     1  3993 124.3969  <.0001
standLRT        1  3993 765.4983  <.0001
> 
> model.02 <- lme(fixed = normexam ~ standLRT, data = Exam,
+ 	random = ~ 1 | school)
> 
> anova(model.02,model.01)
         Model df      AIC      BIC    logLik   Test  L.Ratio
model.02     1  4 9376.765 9401.998 -4684.383                
model.01     2  6 9339.600 9377.450 -4663.800 1 vs 2 41.16494
         p-value
model.02        
model.01  <.0001
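
As a side note, the columns of this comparison table can also be obtained directly with the generic extractor functions logLik(), AIC() and BIC(); the small sketch below should reproduce the values shown above.

logLik(model.01)            # restricted log-likelihood of the random slope model
AIC(model.02, model.01)     # AIC for both models
BIC(model.02, model.01)     # BIC for both models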
