Rense Nieuwenhuis » Influence.ME

Update influence.ME, or why I love the open source community

Rense Nieuwenhuis — Wed, 17 Aug 2016 11:39:28 +0000

The other day, Kevin Darras contacted me about my R package influence.ME. The package didn’t work with the kind of models he wanted to estimate, and Kevin was looking for a solution. He had been able to go ‘under the hood’ of the program code in influence.ME and to program a solution, which he kindly shared with me. After some testing, and some adjustments, the influence.ME package is now updated and uploaded to CRAN, available for anyone to use. That’s well within a week after his first e-mail.

This is why I love the open source community so much. Users can extend influence.ME, and all other R packages, to do things that the package authors and maintainers did not implement, to check procedures, or to fix mistakes. Moreover, in line with the positive attitude towards sharing in the open access community, the improved code was shared back so that other users can benefit.

So, thanks to the help of the community, I am happy to announce an update to influence.ME, with two improvements:

influence.ME now better handles binomial models
influence.ME now supports functions inside the model call; for instance:
model.a <- lmer(math ~ structure + scale(SES) + (1 | school.ID), data=school23)

influence.ME is an extension package for the R statistical software. It provides tools for detecting influential data in multilevel regression models (also known as mixed effects models). It was introduced in the R Journal (Nieuwenhuis, Te Grotenhuis & Pelzer, 2012). influence.ME can be downloaded from CRAN from within the R software.

Nieuwenhuis, R., te Grotenhuis, H. F., & Pelzer, B. J. (2012). Influence.ME: tools for detecting influential data in mixed effects models. R Journal, 4(2), 38–47.

Influence.ME now supports sampling weights

Rense Nieuwenhuis — Thu, 18 Dec 2014 13:34:57 +0000

Influence.ME is an R package that helps detect influential cases in multilevel regression models. It has been around for a while now, and recent changes in lme4 broke the functionality of using influence.ME with sampling weights.

Thanks to a kind contribution of some code by user Jennifer Bufford, influence.ME now should work with multilevel models with sampling weights (and offsets). Version 0.9-5 is now available on CRAN servers around the world.

For more details on influence.ME, see: http://www.rensenieuwenhuis.nl/r-project/influenceme/

influence.ME now supports new lme4 1.0

Rense Nieuwenhuis — Wed, 21 Aug 2013 09:04:32 +0000

influence.ME is an R package for detecting influential data in multilevel regression models (or, mixed effects models as they are referred to in the R community). The application of multilevel models has become common practice, but the development of diagnostic tools has lagged behind. Hence, we developed influence.ME, which calculates standardized measures of influential data for the point estimates of generalized multilevel models, such as DFBETAS, Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. A paper detailing this package was published in the R Journal (available from the R Journal (.PDF) and my researchgate.net profile).

influence.ME depends on lme4. As the authors of lme4 have completely revised the inner workings of lme4 and are currently releasing version 1.0, influence.ME required an update to maintain forward compatibility with lme4. I just uploaded version 0.9.3 of influence.ME to CRAN, which will be available soon. This version should work with the new lme4, but if you happen to run into any problems please contact me.

Influence.ME: Tools for Detecting Influential Data in Multilevel Regression Models

Rense Nieuwenhuis — Thu, 20 Dec 2012 14:40:11 +0000

Despite the increasing popularity of multilevel regression models, the development of diagnostic tools has lagged behind. Typically, in the social sciences multilevel regression models are used to account for the nesting structure of the data, such as students in classes, migrants from origin-countries, and individuals in countries. The strength of multilevel models lies in analyzing data on a large number of groups with only a couple of observations within each group, such as, for instance, students in classes.

Nevertheless, in the social sciences multilevel models are often used to analyze data on a limited number of groups with per group a large number of observations. A typical example would be the analysis of data on individuals nested within countries. By nature, only a limited number of countries exists. In practice, typical country-comparative analyses are based on about 25 countries. With such a small number of groups (e.g. countries), observations on a single group can easily be overly influential to the outcomes. This means that the conclusions based on the multilevel regression model could no longer hold when a single group is removed from the data.

In our recent publication in the R Journal, we introduce influence.ME, software that provides tools for detecting influential data in multilevel regression models (or: in mixed effects models, as these are commonly referred to in statistics). influence.ME is a publicly available R package that evaluates multilevel regression models that were estimated with the lme4.0 package. It calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS, Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. The package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested.

With this publication, and of course with the software that was available for quite some time, we hope to contribute to a better usage of multilevel regression models. The provided example and guidelines were geared towards applications in the social sciences, but are applicable in all disciplines.

On a final note, the editorial of the R Journal describes how this journal is quickly ranking up in the degree of (academic) recognition it receives:

Thomson Reuters has informed us that The R Journal has been accepted for listing in the Science Citation Index-Expanded (SCIE), including the Web of Science, and the ISI Alerting Service, starting with volume 1, issue 1 (May 2009). This complements the current listings by EBSCO and the Directory of Open Access Journals (DOAJ), and completes a process started by Peter Dalgaard in 2010.

More information on our influence.ME software is available on this website.

Download the paper from the R Journal
Rense Nieuwenhuis, Manfred te Grotenhuis, & Ben Pelzer (2012). Influence.ME: tools for detecting influential data in mixed effects models. R Journal, 4(2), 38–47.

Influential Data in Multilevel Regression: What are your strategies?

Rense Nieuwenhuis — Tue, 13 Nov 2012 22:18:41 +0000

The application of multilevel regression models has become common practice in the field of social sciences. Multilevel regression models take into account that observations on individual respondents are nested within higher-level groups such as schools, classrooms, states, and countries.

In the application of multilevel models in country-comparative studies, however, it has long been overlooked that on the country-level only a limited number of observations are available. As a result, measurements on single countries can easily overly influence the regression outcomes.

Diagnostic tools for detecting influential data in multilevel regression are becoming available (including our own influence.ME), but what are your experiences with influential cases in country-comparative (multilevel) studies? How do you deal with influential cases if you encounter them?

influence.ME updated to version 0.9

Rense Nieuwenhuis — Fri, 13 Jul 2012 15:20:57 +0000

Influence.ME is an extension package for R that provides tools for detecting influential data in multilevel regression models. It is developed by Rense Nieuwenhuis (that’s me), Manfred te Grotenhuis, and Ben Pelzer.

Recently, a new version (0.9) was uploaded to CRAN, and should be available now to all users. Several improvements and changes were made. Some of these changes may affect existing users. Therefore, we provide an overview of the improvements and changes to influence.ME:

- To better align with R terminology, the estex() function was renamed to influence().
- Several of the existing functions were rewritten so that they are methods for R generic functions. ME.dfbetas() and ME.cook() were renamed to dfbetas() and cooks.distance(), and all plotting functions were removed from the package and replaced by a plot() function.
- The plot=TRUE parameter of the cooks.distance() and dfbetas() functions is no longer available: plots should be requested using the plot() function.

In addition to these changes, influence.ME now provides several new features:

- The sigtest() function allows users to test whether the level of statistical significance of a parameter estimate is affected by the presence of influential data.
- Users can now also test whether lower-level observations affect the multilevel model outcomes (rather than only evaluating the influence of nested groups of cases).
- Plots of dfbetas / sigtest / pchange can now show values that exceed a cutoff value in a visually distinct way.

As a result of some of these changes, users may need to modify their code (slightly). The new structure of the influence.ME package is much more in line with R standards. Remember that the version number (0.9) indicates that influence.ME is still in beta stages of development, and that changes may take place in its design. Please feel free to contact me with any questions, comments, and / or problems you may have regarding our software.
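As a rough sketch of what updated code might look like after these changes, the calls below use the renamed functions on the school23 data that ships with the package. It assumes that lme4 and influence.ME (version 0.9 or later) are installed; the cut-off of 4/23 and the test value of -1.96 are choices made for this illustration, not package defaults.

```r
library(lme4)
library(influence.ME)

data(school23)
model <- lmer(math ~ structure + (1 | school.ID), data = school23)

# influence() replaces the former estex() function
estex.model <- influence(model, group = "school.ID")

# cooks.distance() and dfbetas() replace ME.cook() and ME.dfbetas()
cook <- cooks.distance(estex.model)
dfb  <- dfbetas(estex.model)

# sigtest(): does leaving out a school move a t-value across -1.96?
sig <- sigtest(estex.model, test = -1.96)

# Plotting is now a separate step; cutoff marks the values beyond 4/23
plot(estex.model, which = "cook", cutoff = 4 / 23, sort = TRUE,
     xlab = "Cook's distance", ylab = "School")
```

Existing scripts that passed plot=TRUE to the old functions would need the separate plot() call shown in the last step.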

Influence.ME: Simple Analysis

Rense Nieuwenhuis — Thu, 16 Jul 2009 11:00:19 +0000

With the introduction of our new package for influential data influence.ME, I’m currently writing a manual for the package. This manual will address topics for both experienced and inexperienced users.

I will also present much of the content of this manual on my blog. Of course, feel free to comment on it, and readers are encouraged to discuss the content of the manual here. All information will be accessible from the influence.ME website as well. Note that updates to the manual will be made available on that website, instead of updating this blog post. So, please refer to the influence.ME website for the most up-to-date information.

This is the first section on influence.ME, which deals with a very simple analysis of students nested within 23 schools. Only the effect of a single variable measured at the school level is estimated.

A basic example analysis

The school23 data contains information on the math test performance of 519 students, who are nested within 23 schools. For this example, we will be interested in the relationship between class structure (in this data measured at the school level) and students’ performance on a math test. The research question is: To what extent does the classroom structure determine the students’ math test outcomes?

Initially, we will estimate the effect of class structure on the result of the math performance test, without any further covariates. We do take into account the nesting structure of the data, however, and allow the intercept to be random over schools. This model is estimated using the following syntax, and is assigned to an object we call ‘model’.

model <- lmer(math ~  structure + (1 | school.ID), data=school23)
summary(model)

The call for a summary of the model results in the output shown below. In this summary, the original model formula is shown, as well as the data on which this model was estimated. Both random and fixed effects are summarized. The amount of intercept variance associated with the nesting structure of students within schools is considerable (23.9 compared with 81.3 + 23.9 = 105.2 in total). The effect of interest is that of the structure variable, which is -2.343 and statistically insignificant by most reasonable standards (t=-1.609).

Linear mixed model fit by REML 
Formula: math ~ structure + (1 | school.ID) 
   Data: school23 
  AIC  BIC logLik deviance REMLdev
 3802 3819  -1897     3798    3794
Random effects:
 Groups    Name        Variance Std.Dev.
 school.ID (Intercept) 23.884   4.8871  
 Residual              81.270   9.0150  
Number of obs: 519, groups: school.ID, 23
Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.002      5.853  10.252
structure     -2.343      1.456  -1.609
Correlation of Fixed Effects:
          (Intr)
structure -0.982
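The share of intercept variance and the t-value discussed above can be recomputed from the numbers in this summary output. A minimal sketch in base R, with the values copied by hand from the output (the variable names are made up for this illustration):

```r
# Variance components copied from the 'Random effects' block
var_school   <- 23.884   # intercept variance between schools
var_residual <- 81.270   # residual variance within schools

total_variance <- var_school + var_residual
icc <- var_school / total_variance   # share of variance at the school level

# Fixed effect of structure copied from the 'Fixed effects' block
estimate  <- -2.343
std_error <- 1.456
t_value   <- estimate / std_error

round(c(total = total_variance, icc = icc, t = t_value), 3)
```

Roughly 23 percent of the total variance sits at the school level, and the ratio of the estimate to its standard error reproduces the reported t-value.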

Iteratively re-estimate model

Building upon the example model estimated in section 2.1, the first step in the procedure of the influence.ME package is to iteratively exclude the influence of the observations nested within each school separately. This is done using the estex() function. The name estex refers to the ESTimates that are returned while EXcluding the influence of each of the grouping levels separately. Thus, in the case of the math test example, in which students are nested in 23 schools, the estex procedure re-estimates the original model 23 times, each time excluding the influence of one higher level unit (i.e., a school). The function returns the relevant estimates of these 23 re-estimations, which in Figure [fig:Three-steps] are referred to as 'altered estimates'.

The estex() function requires the specification of two parameters: the mixed effects model, and the grouping factor for which the influence of the nested observations is to be evaluated. In the syntax example below, the original object 'model' is specified, and 'school.ID' is the relevant grouping factor. school.ID is the name of the variable used to indicate the grouping factor when the original model was specified. The estex() function also works when more than a single grouping factor is present in the model, but only one grouping factor can be addressed at once.

In the example below, the estimates excluding the influence of the respective grouping levels, as returned by the estex() function, are assigned to an object, which in this case is called estex.model (the name of this object, however, is to be chosen arbitrarily by the user).

estex.model <- estex(model, "school.ID")

Note that in the case of complex mixed models (i.e. models with large numbers of observations, complex nesting structures, and/or many nesting groups) the execution of estex() may consume considerable amounts of time. The examples offered by the school23 data should offer no such problems, however.
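To make the idea of iterative re-estimation concrete, here is a simplified sketch in base R. It is not how estex() is implemented: it uses simulated data, an ordinary lm() instead of a mixed model, and it simply drops each school in turn. All object names and simulation settings are assumptions made for this illustration.

```r
set.seed(2009)
n_schools  <- 23
n_students <- 20

# Simulated data: one school-level predictor and a school-level random shift
school_structure <- rnorm(n_schools, mean = 4, sd = 1)
school_effect    <- rnorm(n_schools, sd = 5)
d <- data.frame(
  school.ID = factor(rep(seq_len(n_schools), each = n_students)),
  structure = rep(school_structure, each = n_students)
)
d$math <- 60 - 2 * d$structure +
  rep(school_effect, each = n_students) + rnorm(nrow(d), sd = 9)

# Estimates based on all schools
full_estimates <- coef(lm(math ~ structure, data = d))

# Re-estimate the model once per school, each time leaving that school out
altered_estimates <- t(sapply(levels(d$school.ID), function(school) {
  coef(lm(math ~ structure, data = d[d$school.ID != school, ]))
}))

# How far each set of altered estimates lies from the full-data estimates
shift <- sweep(altered_estimates, 2, full_estimates)
head(shift)
```

The result has one row per school and one column per parameter, which is the same shape of information that the measures of influence in the next section build on.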

Calculate measures of influence

The object estex.model containing the altered estimates can be used to calculate several measures of influential data. To determine the Cook's distance, the ME.cook() function is to be used. In its most basic specification, the ME.cook() function only requires an object to which the altered estimates as returned by the estex() function were assigned:

ME.cook(estex.model)

This basic specification returns a matrix with the rows representing the groups in which the observations are nested, and the single column represents the associated value of Cook's distance. Clearly, these can also be assigned to an object for later modification. The output below shows the result of the syntax above, representing the Cook's distance associated with each school in the school23 data.

              [,1]
6053  2.927552e-02
6327  2.557810e-02
6467  1.402948e-02
7194  3.443392e-05
7472  1.115626e+00
7474  8.142758e-02
7801  3.007558e-04
7829  1.005329e-01
7930  5.525680e-03
24371 4.334659e-03
24725 4.387907e-02
25456 5.644399e-04
25642 1.470130e-02
26537 2.369898e-02
46417 2.204840e-02
47583 1.891108e-02
54344 1.445087e-01
62821 3.593314e-01
68448 2.427028e-02
68493 1.538479e-02
72080 3.471805e-04
72292 6.387956e-03
72991 1.316049e-02

Based on the output shown above, the Cook's distance of school number 7472 is the largest. This corresponds very well to what was concluded based on Figure [fig:Bivariate-influence-plots]. For those who prefer to evaluate the Cook's distance based on a visual representation, the ME.cook() function can also plot its output. To do so, an additional parameter is required, stating plot=TRUE. Additional parameters are allowed as well, which are passed on to the internal dotplot() function (Deepayan Sarkar, 2008) and are used to format the resulting plot. In this case, the example syntax below also specifies the xlab= and ylab= parameters, labelling the two axes. The resulting plot is shown in the figure below. These kinds of plots can be used to more easily assess the influence a grouped set of observations exerts on the outcomes of analyses, relative to the influence exerted by other groups of observations.

In this case, it (again) is clear that the observation of the level of class structure of school number 7472 exerts the highest influence. This is based on the calculated value for Cook's distance, and on the fact that this influence clearly exceeds that of other schools.

ME.cook(estex.model, plot=TRUE,
xlab="Cook's Distance, Class structure",
ylab="School")

Exclude influence, and Repeat

Based on the analyses and graphs shown in the previous sections, there are strong indications that the observations in school number 7472 exert too much influence on the outcomes of the analysis, and thereby unjustifiably determine these outcomes. To decide definitively whether the influence of these observations is indeed too large, the value of Cook’s distance for this school can be compared with a cut-off value. Regarding Cook’s distance, it has been argued that observations exceeding a Cook’s distance of 4/p are too influential (Belsley et al., 1980) and need to be dealt with. In this formula, ‘p’ refers to the number of predictors on which Cook’s distance was calculated. In the case of mixed effects models, this refers to the number of groups in which the observations are nested.

The Cook’s distance of school number 7472 was determined to be 1.12, which readily exceeds the cut-off value of 4/23 = .17. Thus, it can be concluded that the influence of school number 7472 needs to be excluded from the analysis before the results of that analysis are interpreted. This is done using the function exclude.influence(). This function basically has three parameters: the model from which the influence of some observations is to be excluded, the grouping factor, and the specific level of that grouping factor in which the said observations are nested. The function modifies the original model and returns a new model, which can be checked again for possible influential data.

In the example below, the influence of school number 7472 is excluded from the original regression model, which was assigned to object ‘model’ in section 2.1.

The result of the exclude.influence() function again has the form of a mixed effects model and is here assigned to object model.2 (again, this name is to be chosen by the user).
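The comparison with the cut-off value can also be done in code. The sketch below is base R only and assumes the cut-off is 4 divided by the number of schools, which reproduces the .17 quoted for 23 schools; the Cook's distances are copied by hand from the output of ME.cook() shown earlier.

```r
# Cook's distances per school, copied from the ME.cook() output
cook <- c(
  "6053"  = 2.927552e-02, "6327"  = 2.557810e-02, "6467"  = 1.402948e-02,
  "7194"  = 3.443392e-05, "7472"  = 1.115626e+00, "7474"  = 8.142758e-02,
  "7801"  = 3.007558e-04, "7829"  = 1.005329e-01, "7930"  = 5.525680e-03,
  "24371" = 4.334659e-03, "24725" = 4.387907e-02, "25456" = 5.644399e-04,
  "25642" = 1.470130e-02, "26537" = 2.369898e-02, "46417" = 2.204840e-02,
  "47583" = 1.891108e-02, "54344" = 1.445087e-01, "62821" = 3.593314e-01,
  "68448" = 2.427028e-02, "68493" = 1.538479e-02, "72080" = 3.471805e-04,
  "72292" = 6.387956e-03, "72991" = 1.316049e-02
)

n_groups <- length(cook)   # 23 schools
cutoff   <- 4 / n_groups   # assumed rule of thumb: 4 / number of groups

# Schools whose Cook's distance exceeds the cut-off
influential <- names(cook)[cook > cutoff]
influential
```

With these values, schools 7472 and 62821 lie above the cut-off; the text deals with them one at a time, starting with the largest.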

model.2 <- exclude.influence(model, "school.ID", "7472")
summary(model.2)

Functions that work with ‘normal’ mixed effects models estimated with lme4 also work with models that were modified with the exclude.influence() function. So, a summary of model.2 was requested as well, which is shown below. A few things are clear from this output. The estimate of the effect of class structure is now much stronger (-4.55) and statistically significant (t=-2.95). This corresponds to what may have been expected based on the graphical representation of the data in Figure [fig:Bivariate-influence-plots]. Some other changes have been made to the model as well. The original intercept vector (which originally was indicated by (Intercept)) is now replaced by a variable called intercept.alt. This variable is basically an ordinary intercept vector (thus, with a value of 1 for each observation), except for the observations that are nested in the excluded nesting group. For these observations, the intercept.alt variable has score 0. Also, a new variable called estex.7472 is shown. This variable is a dummy variable, indicating the observations that are nested in school number 7472. One such dummy variable is added to the model for each nesting group the influence of which is excluded. Generally, these modifications of the model ensure that the observations nested within the excluded nesting group do not contribute to the estimation of both the level and the variance of the intercept, and do not alter the higher level estimates unjustifiably.

Linear mixed model fit by REML 
Formula: math ~ intercept.alt + estex.7472 + structure + 
(0 + intercept.alt | school.ID) - 1 
   Data: ..2 
  AIC  BIC logLik deviance REMLdev
 3792 3814  -1891     3790    3782
Random effects:
 Groups    Name          Variance Std.Dev.
 school.ID intercept.alt 17.874   4.2277  
 Residual                81.301   9.0167  
Number of obs: 519, groups: school.ID, 23

Fixed effects:
              Estimate Std. Error t value
intercept.alt   69.346      6.314  10.983
estex.7472      54.839      3.617  15.163
structure       -4.550      1.545  -2.945

Correlation of Fixed Effects:
           intrc. e.7472
estex.7472  0.843       
structure  -0.987 -0.854
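The two added variables are easy to construct by hand, which may help to see what the modified model does. The sketch below uses a tiny made-up data frame (six students in three schools, values invented for this illustration) and only builds the variables and the formula as it appears in the output above; it does not fit the model and is not the package's own code.

```r
# Toy data: six students in three schools (made-up values)
d <- data.frame(
  school.ID = c("6053", "6053", "7472", "7472", "7801", "7801"),
  structure = c(4, 4, 2, 2, 5, 5),
  math      = c(55, 61, 49, 52, 47, 50)
)

excluded <- "7472"

# Dummy variable: 1 for students in the excluded school, 0 otherwise
d$estex.7472 <- as.numeric(d$school.ID == excluded)

# Altered intercept: 1 for everyone except students in the excluded school
d$intercept.alt <- 1 - d$estex.7472

# Formula as printed in the summary of model.2 (created here, not estimated)
modified_formula <- math ~ intercept.alt + estex.7472 + structure +
  (0 + intercept.alt | school.ID) - 1

d
```

Every student contributes to exactly one of the two columns, so the excluded school gets its own level through estex.7472 while the remaining schools share the altered intercept.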

As is shown in the procedural schematic in Figure [fig:Three-steps], it is advisable to repeat this procedure to the point that the user is satisfied with the stability of the model, for instance when no group of observations exceeds the cut-off value. To do this in this example, the model.2 object is again input to the estex() function, the results of which are stored in a second altered estimates object which we call estex.model.2:

estex.model.2 <- estex(model.2, "school.ID")
ME.cook(estex.model.2, plot=TRUE, 
    xlab="Cook's Distance, Class structure",
    ylab="School", 
    cutoff=.18)

Again, ME.cook() is used to calculate the values for Cook's distance, which returns the output shown below. School number 62821 is associated with the largest value for Cook's distance (.37). The cut-off value now differs (slightly) from the previous one, because the number of (effective) groups in which the observations are nested decreased by 1 when the influence of school number 7472 was excluded. Thus, the cut-off value now is 4/22 = .18. Based on the output below, it can thus be concluded that school number 62821 is influential as well.

Finally, the call for ME.cook() in the syntax example above shows one more distinguishing characteristic. Again plot=TRUE is specified, together with specifications for labels on both the x and y axes. A plot of the Cook's distances is thus created, shown in Figure [fig:Cook-2]. In addition to this, the cut-off value of .18 is now indicated as well using cutoff=.18. As a result of this, all Cook's distances with a value larger than .18 will be indicated differently in the plot, as is the case in Figure [fig:Cook-2] regarding the two schools numbered 62821 and 7474. Note that the Cook's distance for school number 7472 now equals 0, indeed, indicating that this school now no longer influences the parameter estimates.

              [,1]
6053  2.186203e-03
6327  2.645659e-02
6467  1.326879e-02
7194  1.319258e-02
7472  0.000000e+00
7474  2.273674e-01
7801  1.378937e-03
7829  7.780663e-02
7930  4.728342e-03
24371 8.621802e-03
24725 7.072999e-02
25456 1.985731e-03
25642 2.487072e-02
26537 1.900817e-03
46417 2.409483e-02
47583 7.919332e-02
54344 1.248145e-01
62821 3.706191e-01
68448 1.752182e-01
68493 2.607158e-02
72080 2.669324e-05
72292 1.193296e-02
72991 1.311974e-02
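The second round can be checked in the same way. In this base R sketch the Cook's distances are copied by hand from the output above, the number of effective groups is taken to be the number of schools with a non-zero Cook's distance, and the cut-off is again assumed to be 4 divided by that number.

```r
# Cook's distances after excluding the influence of school 7472
cook.2 <- c(
  "6053"  = 2.186203e-03, "6327"  = 2.645659e-02, "6467"  = 1.326879e-02,
  "7194"  = 1.319258e-02, "7472"  = 0.000000e+00, "7474"  = 2.273674e-01,
  "7801"  = 1.378937e-03, "7829"  = 7.780663e-02, "7930"  = 4.728342e-03,
  "24371" = 8.621802e-03, "24725" = 7.072999e-02, "25456" = 1.985731e-03,
  "25642" = 2.487072e-02, "26537" = 1.900817e-03, "46417" = 2.409483e-02,
  "47583" = 7.919332e-02, "54344" = 1.248145e-01, "62821" = 3.706191e-01,
  "68448" = 1.752182e-01, "68493" = 2.607158e-02, "72080" = 2.669324e-05,
  "72292" = 1.193296e-02, "72991" = 1.311974e-02
)

# School 7472 no longer influences the estimates, so 22 groups remain
effective_groups <- sum(cook.2 > 0)
cutoff.2 <- 4 / effective_groups

# Schools above the new cut-off, largest first
flagged <- sort(cook.2[cook.2 > cutoff.2], decreasing = TRUE)
flagged
```

School 68448 comes close to the cut-off but stays just below it, which is why only two schools are marked in the plot.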

Further analysis of this example would thus entail the exclusion of the influence of observations nested within school number 62821, and then to recheck the model by running through the three steps of the procedure again. This is not shown here, to not make this exercise overly lengthy.

Presenting influence.ME at useR!

Rense Nieuwenhuis — Fri, 10 Jul 2009 09:49:33 +0000

Today I presented influence.ME at the useR! conference in Rennes. Influence.ME is an R package for detecting influential data in mixed models. I developed this package together with Ben Pelzer and Manfred te Grotenhuis.

More information about influence.ME can be found on another section of my website.

Below, please find the slides of the presentation.
Presentation Influence.ME at Rennes, useR! 2009

Influence.ME: don’t specify the intercept

Rense Nieuwenhuis — Thu, 18 Jun 2009 11:00:00 +0000

Just recently, I was contacted by a researcher who wanted to use influence.ME to obtain model estimates from which iteratively some data was deleted. In his case, observations were nested within an area, but there were very unequal numbers of observations in each area.

Unfortunately, he wasn’t able to use the influence.ME package on his models. He kindly sent me his data, so I could figure out what went wrong, and it turned out to be a small problem with influence.ME.

The problem was with how the model was specified: the intercept was specified explicitly, next to several (fixed) variables. It turned out that such a model specification is not compatible with the internal changes made to the mixed model. Therefore, I advise users of influence.ME not to explicitly specify the intercept in their lme4 regression models.

I reproduced the problem with the school23 data, which is available in influence.ME. Compare the two model specifications below: in the first the intercept is specified, in the second it isn’t. The outcomes of both lmer models are identical. However, the first returns a convergence error when used with the estex() function, while the second doesn’t.

The input:
mod <- lmer(math ~ 1 + structure + (1 | school.ID), data=school23)
estex.mod <- estex(mod, "school.ID")

mod <- lmer(math ~ structure + (1 | school.ID), data=school23)
estex.mod <- estex(mod, "school.ID")

The output:
> mod <- lmer(math ~ 1 + structure + (1 | school.ID), data=school23)
> estex.mod <- estex(mod, "school.ID")
Error in mer_finalize(ans) : Downdated X'X is not positive definite, 3.
>
> mod <- lmer(math ~ structure + (1 | school.ID), data=school23)
> estex.mod <- estex(mod, "school.ID")

I will surely investigate whether this can be resolved in a future update, but for now, simply leave the intercept out of your model specification: lmer will add it for you.
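That R adds the intercept on its own can be verified without fitting any model. The base R sketch below compares only the fixed part of the two specifications, on a small made-up data frame; the random part of the lmer formula is left out here for simplicity.

```r
# Made-up data, only needed to build the design matrices
d <- data.frame(
  math      = c(55, 61, 49, 52, 47, 50),
  structure = c(4, 4, 2, 2, 5, 5)
)

explicit <- math ~ 1 + structure   # intercept written out
implicit <- math ~ structure       # intercept left to R

# Both formulas carry an intercept (1 means 'present') ...
attr(terms(explicit), "intercept")
attr(terms(implicit), "intercept")

# ... and both lead to the same design matrix
same_design <- all.equal(model.matrix(explicit, d), model.matrix(implicit, d))
same_design
```

So dropping the explicit 1 changes nothing about the model that is estimated; it only avoids the problem described above.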

Influence.ME is an R package and provides tools for detecting influential data in mixed effects models. More information can be found here.

One outlier and you’re out: Influential data and racial prejudice

Rense Nieuwenhuis — Tue, 16 Jun 2009 11:00:54 +0000

While preparing a presentation on analyzing influential data in mixed effects models myself, my eye fell on an article in which important claims on racial prejudice were refuted. An important aspect of the criticism of existing work is that in one article the main correlation was completely due to a single observation. Solely based on this single observation, the study’s outcomes showed the Implicit Association Test (IAT) to predict overall interaction quality between White and Black people. Removing that single observation (out of 41) from the data removed the complete effect.

With survey research showing declines in “American’s endorsement of prejudice sentiments” (p.568), the question arose whether such declines actually took place, or whether they are an artifact of social desirability determining respondents’ responses to survey questions. Naturally, tests like the Implicit Association Test (IAT) gained considerable attention, for the attractive claim of such tests is to be able to show levels of prejudice that people themselves are unaware of and which do not show when asked about explicitly (e.g. in a survey).

Blanton et al. (2009) decided to test several of the articles on which the strong claims for the predictive validity of the IAT were based. They re-analyzed the (partial) data of two articles. In one of these analyses, of the data on which a 2001 article was based, one of the main findings of the original article had been that high scores on the IAT were associated with worse interaction quality with Black experimenters, compared with White experimenters. I’m not completely sure what this interaction quality entails, but based on the re-study I would say that it is a combination of aspects such as ‘forward leaning’, ‘facing the experimenter’, ‘expressiveness’, ‘smiling’ and making ‘eye contact’.

How can just a single observation dominate the outcomes of a statistical analysis? Unfortunately, the answer to this question is: quite easily, especially when the analysis is based on a small number of observations. In this case, the refuted correlation was between the participants’ IAT score and the way the participants interacted with either Black or White experimenters. It is known that the IAT score is determined by the participant’s age; one single participant was exceptionally old compared to the rest of the test group, and indeed that participant scored very high on the IAT. Also, the quality of the interaction of that participant with the Black experimenter was rated very low. Now the thing is that, while in the rest of the observations no association between IAT score and the quality of the interaction was to be observed, this single observation with extreme scores on both variables completely dominated the outcomes of the study on this aspect. Deletion of this single observation brought the significant correlation of .32 down to non-significance: no association could be inferred between participants’ IAT score and how they interacted with the Black experimenter.

Blanton et al. didn’t write a full article based on just this point, and of course this is not the only criticism of the original article. Other aspects include poor timing of the measures (respondents were probably aware of being tested for discriminatory behavior before the ‘actual’ test took place), inter-rater reliability, and improper statistical analysis due to recoding of the data (due to which the coding of a single rater influenced the findings). Nevertheless, from my current point of view, I’m especially interested in the bias caused by the influential observation.

Of course, McConnell et al. (2009), whose work was criticized, were given the opportunity to respond. Regarding the influence exerted by the outlier, they respond with two arguments. First, they state that Blanton et al. did not study the correct outlier, for although this outlier did have an extreme IAT score, another participant had an even higher score. Their second reaction states that Blanton et al. focused on only one of the outcome measures, and not on all the various measures used in the original study. In their response, they show that deletion of the outlier found by Blanton et al. does not influence the outcomes of the analyses on the other outcome measures.

I find this response curious on two accounts. First, influential data and outliers are not two of a kind. McConnell’s response that the wrong outlier was selected is not necessarily true, for just having an extreme score on one variable is not enough to make an observation influential. Generally, it has to be an outlier and to have leverage (changing the slope of the regression line). If the other outlier (mentioned by McConnell) did have a more extreme score on the IAT variable, but an average score on the behavior-quality variable, it may very well prove not to (overly) influence the outcomes of the study. Secondly, an observation is only influential relative to the specification of the analysis and the variables used in it. So, simple deletion of this single observation to show that it does not influence other analyses (on other outcome measures) is not much of a defense against the initial argument that the observation influenced the outcomes of a specific analysis.
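Both points, that one observation can create a correlation and that an extreme score on a single variable is not enough to do so, can be illustrated with simulated data. The sketch below is base R only and does not use the data of the studies discussed here: it builds 40 observations that are exactly uncorrelated by construction and then adds a 41st observation in two different ways. All numbers are made up for this illustration.

```r
set.seed(41)
n <- 40

# 40 standardized observations with zero correlation by construction
x     <- as.numeric(scale(rnorm(n)))
noise <- rnorm(n)
y     <- as.numeric(scale(resid(lm(noise ~ x))))

cor_base <- cor(x, y)   # essentially zero

# Case 1: the 41st observation is extreme on both variables
cor_both <- cor(c(x, 5), c(y, 5))   # about 0.38

# Case 2: the 41st observation is extreme on x, but average on y
cor_x_only <- cor(c(x, 5), c(y, 0))   # stays at zero

round(c(base = cor_base, both = cor_both, x_only = cor_x_only), 2)
```

In the first case a single observation produces a sizeable correlation out of nothing; in the second case an observation that is just as extreme on one variable leaves the correlation untouched.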

All in all, an interesting debate. There is much more to it in both the articles by Blanton et al. (2009) and McConnell et al. (2009). But still, I find it especially striking to see how careful one should be when analyzing data and making inferences on it. And, of course, I can add a nice example of the impact of influential data to my collection.

References

Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94(3), 567–582. DOI: 10.1037/a0014665

McConnell, A., & Leibold, J. (2009). Weak criticisms and selective evidence: Reply to Blanton et al. (2009). Journal of Applied Psychology, 94(3), 583–589. DOI: 10.1037/a0014649