<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rense Nieuwenhuis &#187; R-Sessions</title>
	<atom:link href="http://www.rensenieuwenhuis.nl/category/r-project/r-sessions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rensenieuwenhuis.nl</link>
	<description>&#34;The extra-ordinary lies within the curve of normality&#34;</description>
	<lastBuildDate>Thu, 12 Mar 2026 14:58:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.2.2</generator>
	<item>
		<title>influence.ME now supports new lme4 1.0</title>
		<link>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/</link>
		<comments>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/#comments</comments>
		<pubDate>Wed, 21 Aug 2013 09:04:32 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[My Publications]]></category>
		<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[influential data]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1677</guid>
		<description><![CDATA[influence.ME is an R package for detecting influential data in multilevel regression models (or, mixed effects models as they are referred to in the R community). The application of multilevel models has become common practice, ...]]></description>
				<content:encoded><![CDATA[<p>influence.ME is an R package for detecting influential data in multilevel regression models (or mixed effects models, as they are referred to in the R community). The application of multilevel models has become common practice, but the development of diagnostic tools has lagged behind. Hence, we developed influence.ME, which calculates standardized measures of influential data for the point estimates of generalized multilevel models, such as DFBETAS and Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. A paper detailing this package was published in the R Journal (available as a <a href="http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Nieuwenhuis~et~al.pdf">.PDF from the R Journal</a> and via <a href="https://www.researchgate.net/publication/232701348_Influence.ME_tools_for_detecting_influential_data_in_mixed_effects_models">my researchgate.net profile</a>).</p>
<p>influence.ME depends on lme4. As the authors of lme4 have completely revised the inner workings of lme4 and are currently releasing version 1.0, influence.ME required an update to maintain forward compatibility with lme4. I just uploaded version 0.9.3 of influence.ME to CRAN, which will be available soon. This version should work with the new lme4, but if you happen to run into any problems please contact me. </p>
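<p>For readers new to the package, a minimal usage sketch (the dataset and grouping factor below are illustrative choices, not from this post; function names follow the influence.ME documentation):</p>

```r
# Minimal sketch of influence.ME usage; sleepstudy ships with lme4.
library(lme4)
library(influence.ME)

model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Re-estimate the model once per group, deleting one Subject at a time
estex <- influence(model, group = "Subject")

dfbetas(estex)         # standardized change in each fixed estimate per deleted group
cooks.distance(estex)  # joint influence of each group on the point estimates
sigtest(estex)         # test for changing levels of significance
```

<p>These are the measures named above: DFBETAS, Cook&#8217;s distance, and the significance test, each computed at the level of the grouping factor.</p>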
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/influence-me-now-works-with-new-lme4-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Influence.ME: Tools for Detecting Influential Data in Multilevel Regression Models</title>
		<link>http://www.rensenieuwenhuis.nl/influence-me-r-journal/</link>
		<comments>http://www.rensenieuwenhuis.nl/influence-me-r-journal/#comments</comments>
		<pubDate>Thu, 20 Dec 2012 14:40:11 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Blogging about Science]]></category>
		<category><![CDATA[Influence.ME]]></category>
		<category><![CDATA[My Publications]]></category>
		<category><![CDATA[Peer Reviewed]]></category>
		<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[comparative research]]></category>
		<category><![CDATA[influential data]]></category>
		<category><![CDATA[mixed effects models]]></category>
		<category><![CDATA[multilevel regression]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1601</guid>
		<description><![CDATA[Despite the increasing popularity of multilevel regression models, the development of diagnostic tools lagged behind. Typically, in the social sciences multilevel regression models are used to account for the nesting structure of the data, such ...]]></description>
				<content:encoded><![CDATA[<p>Despite the increasing popularity of multilevel regression models, the development of diagnostic tools has lagged behind. Typically, in the social sciences multilevel regression models are used to account for the nesting structure of the data, such as students in classes, migrants from origin countries, and individuals in countries. The strength of multilevel models lies in analyzing data on a large number of groups with only a couple of observations within each group.</p>
<p>Nevertheless, in the social sciences multilevel models are often used to analyze data on a limited number of groups, each with a large number of observations. A typical example is the analysis of data on individuals nested within countries. By nature, only a limited number of countries exists; in practice, typical country-comparative analyses are based on about 25 countries. With such a small number of groups (e.g. countries), the observations on a single group can easily be overly influential on the outcomes. This means that the conclusions based on the multilevel regression model may no longer hold when a single group is removed from the data. </p>
<p>In our recent publication in the R Journal, we introduce influence.ME, software that provides tools for detecting influential data in multilevel regression models (or: in mixed effects models, as these are commonly referred to in statistics). influence.ME is a publicly available R package that evaluates multilevel regression models that were estimated with the lme4.0 package. It calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS and Cook’s distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. The package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested. </p>
<p>With this publication, and of course with the software that was available for quite some time, we hope to contribute to a better usage of multilevel regression models. The provided example and guidelines were geared towards applications in the social sciences, but are applicable in all disciplines. </p>
<p>On a final note, the editorial of the R Journal describes how this journal is quickly ranking up in the degree of (academic) recognition it receives:</p>
<blockquote><p>
Thomson Reuters has informed us that The R Journal has been accepted for listing in the Science Citation Index-Expanded (SCIE), including the Web of Science, and the ISI Alerting Service, starting with volume 1, issue 1 (May 2009). This complements the current listings by EBSCO and the Directory of Open Access Journals (DOAJ), and completes a process started by Peter Dalgaard in 2010.
</p></blockquote>
<p><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/">More information on our influence.ME software is available on this website. </a></p>
<p><a href="http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Nieuwenhuis~et~al.pdf">Download the paper from the R Journal</a><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&#038;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&#038;rft.jtitle=R+Journal&#038;rft_id=info%3A%2F&#038;rfr_id=info%3Asid%2Fresearchblogging.org&#038;rft.atitle=Influence.ME%3A+tools+for+detecting+influential+data+in+mixed+effects+models&#038;rft.issn=2073-4859&#038;rft.date=2012&#038;rft.volume=4&#038;rft.issue=2&#038;rft.spage=38&#038;rft.epage=47&#038;rft.artnum=http%3A%2F%2Fjournal.r-project.org%2Farchive%2F2012-2%2FRJournal_2012-2_Nieuwenhuis%7Eet%7Eal.pdf&#038;rft.au=Rense+Nieuwenhuis&#038;rft.au=Manfred+te+Grotenhuis&#038;rft.au=Ben+Pelzer&#038;rfe_dat=bpr3.included=1;bpr3.tags=Social+Science%2CSociology%2C+statistics%2C+multilevel+regression%2C+mixed+effects+models%2C+influential+data">Rense Nieuwenhuis, Manfred te Grotenhuis, &#038; Ben Pelzer (2012). Influence.ME: tools for detecting influential data in mixed effects models <span style="font-style: italic;">R Journal, 4</span> (2), 38-47</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/influence-me-r-journal/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Applied R: Manual for the quantitative social scientist</title>
		<link>http://www.rensenieuwenhuis.nl/applied-r-manualfor-the-quantitative-social-scientist/</link>
		<comments>http://www.rensenieuwenhuis.nl/applied-r-manualfor-the-quantitative-social-scientist/#comments</comments>
		<pubDate>Wed, 23 Mar 2011 10:50:31 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[manual]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1425</guid>
		<description><![CDATA[Applied R for the quantitative social scientist is a manual on R written specifically as an introduction for the quantitative social scientist. In my opinion, R-Project is a magnificent statistical program, ready to be accepted and implemented in the social sciences. The flexibility of this program and the way data are handled gives the user a sense of closeness to and control over the data. I think this inspires users to analyze their data more creatively and sometimes in a more advanced way.]]></description>
				<content:encoded><![CDATA[<p>R-Project is an advanced software package for statistical analysis. Several years ago I wrote an introductory manual for several analyses that can be performed with R. Although several parts of it are available from my blog as the <a href="http://www.rensenieuwenhuis.nl/index-of-the-r-sessions/">R-Sessions</a>, I never publicly published the full document. Now, this changes: for those looking for an applied guide to R-Project, <a href="http://www.rensenieuwenhuis.nl/documents/Applied%20R.pdf">here it is!</a></p>
<p>This manual was written specifically as an introduction for the quantitative social scientist. In my opinion, R-Project is a magnificent statistical program, ready to be accepted and implemented in the social sciences. The flexibility of this program and the way data are handled gives the user a sense of closeness to and control over the data. I think this inspires users to analyze their data more creatively and sometimes in a more advanced way. At present, this manual has a strong focus on multilevel regression techniques. The reason is that in R-Project it is very easy to estimate these types of models, even the more complex variants. The more basic and fundamental aspects of R-Project are introduced as well. All this is done with the needs of the quantitative social scientist in mind.</p>
<p>Of course, this manual is provided without any warranty. Please realize that I wrote it almost four years ago. </p>
<p>I&#8217;d love to hear any feedback for (future) improvements!</p>
<h2>Download:</h2>
<p> <a href="http://www.rensenieuwenhuis.nl/documents/Applied%20R.pdf"> Applied R for the quantitative social scientist</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/applied-r-manualfor-the-quantitative-social-scientist/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Index of the R-Sessions</title>
		<link>http://www.rensenieuwenhuis.nl/index-of-the-r-sessions/</link>
		<comments>http://www.rensenieuwenhuis.nl/index-of-the-r-sessions/#comments</comments>
		<pubDate>Mon, 17 May 2010 10:00:19 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1209</guid>
		<description><![CDATA[The R-Sessions are a series of blog entries on using R. A large part consists of an R-manual I once wrote. Other posts include some tricks I found out, as well as entries detailing functions ...]]></description>
				<content:encoded><![CDATA[<p>The R-Sessions are a series of blog entries on using R. A large part consists of an R-manual I once wrote. Other posts include some tricks I found out, as well as entries detailing functions and packages I wrote for R. The series already comprises over forty posts, so I decided to create an index. It is found below. On a fixed page on this website (<a href="http://www.rensenieuwenhuis.nl/r-project/r-sessions-index/">www.rensenieuwenhuis.nl/r-project/r-sessions-index/</a>) I will continue to update this index with new editions of the R-Sessions.</p>
<p><a href="http://www.rensenieuwenhuis.nl/applied-r-manualfor-the-quantitative-social-scientist/">A .PDF manual containing many of the R-Sessions material is available here.</a></p>
<p><span id="more-1209"></span></p>
<h2>Introducing R</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-introducing-the-r-sessions/">Introducing the R-Sessions</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-01-what-is-r/">What is R?</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-02-why-r-project/">Why R-Project?</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-03-getting-r-project/">Getting R-Project</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-04/">Getting Packages</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-05-getting-help/">Getting Help</a></li>
</ul>
<h2>Data Manipulation</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-06-most-basic-of-all/">Most Basic of All</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-07-data-structure/">Data Structure</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-08-getting-data-into-r/">Getting Data into R</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-09-data-manipulation/">Data Manipulation</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-10-conditionals/">Conditionals</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-11-tables/">Tables</a></li>
</ul>
<h2>Graphics</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-12-basic-graphics/">Basic Graphics</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-13-overlapping-data-points/">Overlapping Data Points</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-14-multiple-graphs/">Multiple Graphs</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-15-intermediate-graphics/">Intermediate Graphics</a></li>
</ul>
<h2>Mixed Models</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/">Multilevel Model Specification: LME4</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-17-generalized-multilevel-lme4/">Generalized Multilevel: LME4</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-18-helper-functions/">Helper Functions</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-19-extractor-functions/">Extractor Functions: NLME</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-21-multilevel-model-specification-nlme/">Multilevel Model Specification: NLME</a></li>
</ul>
<h2>Influence.ME: Tools for detecting influential cases in mixed models</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/introducing-influenceme/">Introducing Influence.ME</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/overview/">Influence.ME Overview</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/influence-me-manual/">Influence.ME Manual</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/influence-me-simple-analysis/">Influence.ME: Simple Analysis</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/influence-me-dont-specify-the-intercept/">Specification of the intercept</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-project/influenceme/change-log/">Influence.ME: Change-log</a></li>
</ul>
<h2>Some small functions I wrote</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-32/">Forward.lmer: Basic stepwise function for mixed effects in R</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/">Combining Output in a Single Table</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/">Select Nested Observations with Equal Number of Occurrences</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-30-visualizing-missing-values/">Visualizing Missing Values</a></li>
</ul>
<h2>Books</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-25-book-mixed-effects-models-in-s-and-s-plus-pinheiro-bates-2000/">Book: Mixed Effects Models in S and S-Plus (Pinheiro &#038; Bates, 2000)</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-24/">Book: An R and S-PLUS Companion to Applied Regression (Fox, 2002)</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-23-book-data-analysis-using-regression-and-multilevelhierarchical-models-gelman-hill-2007/">Book: Data Analysis using Regression and Multilevel / Hierarchical Models (Gelman &#038; Hill, 2007)</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-22-introductory-statistics-with-r-peter-dalgaard-2002/">Book: Introductory Statistics with R (Dalgaard, 2002)</a></li>
</ul>
<h2>Various</h2>
<ul>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-29-running-r-project-twice-on-apple-mac-os-x/">Running R Twice on Apple Mac OS X</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-28-impressive-r-speeds/">Impressive R Speeds</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-27/">Text Editors for R: Textmate</a></li>
<li><a href="http://www.rensenieuwenhuis.nl/r-sessions-26-text-editors-for-r-internal-editor-on-os-x/">Text Editors for R: Internal Editor on OS X</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/index-of-the-r-sessions/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>R Sessions 33: Select (nested) observations with equal number of occurrences</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 10:00:05 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[balanced data]]></category>
		<category><![CDATA[merge]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[subset]]></category>
		<category><![CDATA[unbalanced data]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1107</guid>
		<description><![CDATA[Recently, I was contacted with a question about R code. A researcher friend was working with nested data, which was unbalanced. He was working with data in a &#8216;long&#8217; format: all observations nested within the ...]]></description>
				<content:encoded><![CDATA[<p>Recently, I was contacted with a question about R code. A researcher friend was working with nested data, which was unbalanced. He was working with data in a &#8216;long&#8217; format: all observations nested within the same group had the same identification number. However, the number of observations in each of the groups differed (hence: unbalanced data).</p>
<p>He asked me for a piece of code that creates a subset of the data that <i>is</i> balanced, i.e. all observations that are nested within equally sized groups. Or, as an alternative, all observations nested within groups with at least a minimum number of observations.</p>
<p>I solved it the quick and dirty way; the solution involves creating additional variables, a new data.frame, and merging. It could certainly be done more elegantly, but it works. </p>
<p>So, I share it below:<br />
<span id="more-1107"></span></p>
<p><code><br />
id <- c("a", "b","b", "c","c","c", "d","d","d","d", "e","e","e")<br />
y <-  c(3,4,3,2,4,5,6,5,6,7,5,4,3)<br />
df <- data.frame(id, y) # setting up original data.frame</p>
<p>tab <- data.frame(id=names(table(df$id)), fre=as.vector(table(df$id))) # table of frequencies</p>
<p>df.new <- merge(df, tab, by="id") # merging frequencies-variable</p>
<p>subset(df.new, fre==3) # subsetting<br />
subset(df.new, fre>3)<br />
</code></p>
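<p>As the post concedes, the merge-based approach is not the prettiest; a more compact sketch using base R&#8217;s ave(), which yields the same subsets:</p>

```r
# Compact alternative using ave() from base R (same subsets as the merge approach)
id <- c("a", "b","b", "c","c","c", "d","d","d","d", "e","e","e")
y  <- c(3,4,3,2,4,5,6,5,6,7,5,4,3)
df <- data.frame(id, y)

df$fre <- ave(df$y, df$id, FUN = length)  # group size, repeated per observation

subset(df, fre == 3)  # balanced subset: groups with exactly three observations
subset(df, fre > 3)   # groups with more than three observations
```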
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R-Sessions 32: Forward.lmer: Basic stepwise function for mixed effects in R</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-32/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-32/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 10:59:03 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[forward]]></category>
		<category><![CDATA[hierarchical]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[mixed effects models]]></category>
		<category><![CDATA[multilevel]]></category>
		<category><![CDATA[stepwise]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=897</guid>
		<description><![CDATA[Intended to be a customized solution, it may have grown to be a little more. forward.lmer is an early installment of a full stepwise function for mixed effects regression models in R-Project. I may put ...]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a> </p>
<p>Intended to be a customized solution, it may have grown to be a little more. forward.lmer is an early installment of a full stepwise function for mixed effects regression models in R-Project. I may put in some work to extend it, or I may not. Nevertheless, in a &#8216;forward sense of stepwise&#8217;, I think it can be pretty useful as it is. Also, it has an interesting take on the stepwise concept, I think.<br />
<!--adsense--><br />
<span id="more-897"></span></p>
<p>Most stepwise functions (as far as I know) take a base model and a set of variables, and then iteratively add and/or remove variables, according to various criteria, to arrive at the best fitting regression model. All very interesting, but how to deal with interaction variables? And moreover: most existing functions do not work with mixed effects models ((I use the term &#8216;mixed effects model&#8217; throughout to refer also to what are often called hierarchical or multilevel regression models)). </p>
<p>Built around the lme4 package in R, forward.lmer provides a forward stepwise procedure for mixed effects models. Also, it allows the user not only to enter single variables into models, but also to do the same with blocks of variables. This opens up many options: users can add complete interactions at once (i.e. both the original and the multiplicative terms), or add these consecutively. Future development will focus on additional selection criteria for interactions, such as the criterion that at least the multiplicative term needs to be statistically significant. </p>
<p>The user provides a starting model and a set of variables to evaluate. The procedure then updates the starting model with the addition of every single variable (or block of variables). The models are ordered based on their log-likelihood (other criteria, e.g. BIC and AIC, following soon), after which the best fitting model is evaluated against one of two criteria. The first criterion is that at least one of the added parameters is statistically significant. The other criterion is that the addition of the parameters together is statistically significant. </p>
<p>There are several parameters to be specified:</p>
<ul>
<li>start.model: The starting model the procedure starts with. This can be a null-model, or a model already containing several variables. All lmer-models (i.e. logistic, poisson, linear) are supported.</li>
<li>blocks: a vector of variable names (as character strings) to be added to a model. Several variables can be concatenated within the same character string, so that these are added as a block of variables instead of as single variables.</li>
<li>max.iter: The maximum number of variables that are evaluated. If max.iter is reached, the procedure stops without adding more variables. </li>
<li>sig.level: This is the p-value against which it is tested whether the new model fits better than a base model. Either sig.level or zt needs to be specified, but not both at once.</li>
<li>zt: This is either the T or Z value that is used to test whether (at least) one of the added variables is statistically significant. T values are used for linear regression, Z values for binary response models.</li>
<li>print.log: Should a log be printed? The log contains information on which variables (and on which criteria) were added in each step.</li>
</ul>
<p>The forward.lmer function returns the best fitting model (according to the given criteria). Of course, one can use this resulting model as a starting model for a new stepwise procedure.</p>
<p><code><br />
forward.lmer <- function(<br />
	start.model, blocks,<br />
	max.iter=1, sig.level=FALSE,<br />
	zt=FALSE, print.log=TRUE)<br />
	{</p>
<p>	# forward.lmer: a function for stepwise regression using lmer mixed effects models<br />
	# Author: Rense Nieuwenhuis</p>
<p>	# Initialising internal variables<br />
	log.step <- 0<br />
	log.LL <- log.p <- log.block <- zt.temp <- log.zt <- NA<br />
	model.basis <- start.model</p>
<p>	# Maximum number of iterations cannot exceed number of blocks<br />
	if (max.iter > length(blocks)) max.iter <- length(blocks)</p>
<p>	# Setting up the outer loop<br />
	for(i in 1:max.iter)<br />
		{</p>
<p>		models <- list()</p>
<p>		# Iteratively updating the model with addition of one block of variable(s)<br />
		# Also: extracting the loglikelihood of each estimated model<br />
		for(j in 1:length(blocks))<br />
			{<br />
			models[[j]] <- update(model.basis, as.formula(paste(". ~ . + ", blocks[j])))<br />
			}</p>
<p>		LL <- unlist(lapply(models, logLik))</p>
<p>		# Ordering the models based on their loglikelihood.<br />
		# Additional selection criteria apply<br />
		for (j in order(LL, decreasing=TRUE))<br />
			{</p>
<p>			##############<br />
			############## Selection based on ANOVA-test<br />
			##############</p>
<p>			if(sig.level != FALSE)<br />
				{<br />
				if(anova(model.basis, models[[j]])[2,7] < sig.level)<br />
					{</p>
<p>					model.basis <- models[[j]]</p>
<p>					# Writing the logs<br />
					log.step <- log.step + 1<br />
					log.block[log.step] <- blocks[j]<br />
					log.LL[log.step] <- as.numeric(logLik(model.basis))<br />
					log.p[log.step] <- anova(model.basis, models[[j]])[2,7]</p>
<p>					blocks <- blocks[-j]</p>
<p>					break<br />
					}<br />
				}</p>
<p>			##############<br />
			############## Selection based significance of added variable-block<br />
			##############	</p>
<p>			if(zt != FALSE)<br />
				{<br />
				b.model <- summary(models[[j]])@coefs<br />
				diff.par <- setdiff(rownames(b.model), rownames(summary(model.basis)@coefs))<br />
				if (length(diff.par)==0) break<br />
				sig.par <- FALSE</p>
<p>				for (k in 1:length(diff.par))<br />
					{<br />
					if(abs(b.model[which(rownames(b.model)==diff.par[k]),3]) > zt)<br />
						{<br />
						sig.par <- TRUE<br />
						zt.temp <- b.model[which(rownames(b.model)==diff.par[k]),3]<br />
						break<br />
						}<br />
					}					</p>
<p>				if(sig.par==TRUE)<br />
					{<br />
					model.basis <- models[[j]]</p>
<p>					# Writing the logs<br />
					log.step <- log.step + 1<br />
					log.block[log.step] <- blocks[j]<br />
					log.LL[log.step] <- as.numeric(logLik(model.basis))<br />
					log.zt[log.step] <- zt.temp<br />
					blocks <- blocks[-j]</p>
<p>					break<br />
					}<br />
				}<br />
			}<br />
		}</p>
<p>	## Create and print log<br />
	log.df <- data.frame(log.step=1:log.step, log.block, log.LL, log.p, log.zt)<br />
	if(print.log == TRUE) print(log.df, digits=4)</p>
<p>	## Return the 'best' fitting model<br />
	return(model.basis)<br />
	} </p>
<p></code></p>
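<p>A hypothetical usage sketch, based on the parameters documented above (the Exam data from the mlmRev package are borrowed from another R-Session; note that forward.lmer targets the lme4 version current at the time of writing):</p>

```r
# Hypothetical usage sketch of forward.lmer (parameters as described above)
library(lme4)
library(mlmRev)  # provides the Exam data

base <- lmer(normexam ~ 1 + (1 | school), data = Exam)

best <- forward.lmer(
  start.model = base,
  blocks      = c("standLRT", "sex", "schavg"),  # candidate variables
  max.iter    = 3,
  sig.level   = 0.05,   # ANOVA-based criterion (alternatively, specify zt)
  print.log   = TRUE)
```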
<p>As always, you're invited to use this function, or to adapt it and use that. However, please make mention of this function and its author when you do. Additionally, since I intend to continue working on this function (perhaps even evolving it into a package on CRAN), I would love to hear about any experiences in using it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-32/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>R-Sessions 31: Combining lmer output in a single table (UPDATED)</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 11:00:38 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[lme4]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[mixed effect models]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=891</guid>
		<description><![CDATA[There are various ways of getting your output from R to your publication draft. Most of them are highly efficient, but unfortunately I couldn&#8217;t find a function that combines the output from several (lmer) models ...]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a><br />
<!--adsense--></p>
<p>There are various ways of getting your output from R to your publication draft. Most of them are highly efficient, but unfortunately I couldn&#8217;t find a function that combines the output from several (lmer) models and presents it in a single table. lmer is the mixed effects model function from the lme4 package. So, I wrote a simple function that does exactly that.<br />
<span id="more-891"></span></p>
<p>I wrote it for a specific purpose, so it is not a fully general function, but it can easily be adapted for use in other settings. Here it goes:</p>
<p><code><br />
require(lme4)<br />
require(mlmRev)</p>
<p>model.1 <- lmer(normexam ~ 1 + (1 | school), data=Exam)<br />
model.2 <- lmer(normexam ~ standLRT + (1 | school), data=Exam)<br />
model.3 <- lmer(normexam ~ standLRT + sex + (1 | school), data=Exam)<br />
model.4 <- lmer(normexam ~ standLRT + sex + schavg + (1 | school), data=Exam)</p>
<p>model.a <- lmer(use ~ 1 + (1 | district), family=binomial, data=Contraception)<br />
model.b <- lmer(use ~ livch + (1 | district), family=binomial, data=Contraception)<br />
model.c <- lmer(use ~ age + (1 | district), family=binomial, data=Contraception)<br />
model.d <- lmer(use ~ livch + age + (1 | district), family=binomial, data=Contraception)</p>
<p>m1 <- c(model.1, model.2, model.3, model.4)<br />
m2 <- c(model.a, model.b, model.c, model.d)</p>
<p>combine.output.lmer <- function(models, labels=FALSE)<br />
	{</p>
<p>	fix.coef <- lapply(models, function(x) summary(x)@coefs)<br />
	var.coef <- lapply(models, function(x) summary(x)@REmat)<br />
	n.par <- dim(summary(models[[1]])@coefs)[2]</p>
<p>	ifelse(labels==FALSE,<br />
		fix.labels <- colnames(summary(models[[1]])@coefs),<br />
		fix.labels <- labels)</p>
<p>	var.labels <- colnames(var.coef[[1]])</p>
<p>	# Creating table with fixed parameters<br />
	output.coefs <- data.frame(Row.names=row.names(fix.coef[[1]]))<br />
	for (i in 1:length(models))<br />
		{</p>
<p>		a <- fix.coef[[i]]<br />
		colnames(a) <- paste("Model", i, fix.labels)<br />
		output.coefs <- merge(output.coefs, a, by.x=1, by.y=0, all=T, sort=FALSE)</p>
<p>		}<br />
	output.coefs[,1] <- as.character(output.coefs[,1])<br />
	output.coefs[dim(output.coefs)[1]+2, 1] <- "Loglikelihood"<br />
	LL <- unlist(lapply(models, function(x) as.numeric(logLik(x))))<br />
	output.coefs[dim(output.coefs)[1], 1:length(models)*n.par-n.par+2] <- LL</p>
<p>	# Creating table with random parameters<br />
	output.vars <- data.frame(var.coef[[1]])[,1:2]<br />
	for (i in 1:length(models))<br />
		{</p>
<p>		a <- var.coef[[i]]<br />
		colnames(a) <- paste("Model", i, var.labels)<br />
		output.vars <- merge(output.vars, a, by.x=1:2, by.y=1:2, all=T, sort=FALSE)</p>
<p>		}</p>
<p>	# Combining output.coefs and output.vars<br />
	n.cols <- dim(output.coefs)[2]<br />
	n.coefs <- dim(output.coefs)[1]<br />
	n.vars <- dim(output.vars)[1]</p>
<p>	output <- matrix(ncol=n.cols +1 , nrow=n.vars+n.coefs+2)</p>
<p>	output[1:n.coefs, -2] <- as.matrix(output.coefs)<br />
	output[n.coefs+2, 1] <- "Variance Components"<br />
	output[(n.coefs+3) : (n.coefs+n.vars+2), 1:2] <- as.matrix(output.vars[,1:2])<br />
	output[<br />
		(n.coefs+3) : (n.coefs+n.vars+2),<br />
		which(rep(c(1,1,rep(0, n.par-2)),length(models))!=0)+2] <- as.matrix(output.vars[,c(-1,-2)])</p>
<p>	colnames(output) <- c("Parameter", "Random", colnames(output.coefs)[-1])</p>
<p>	return(output)<br />
	}</p>
<p>combined <- combine.output.lmer(m1)<br />
combined <- combine.output.lmer(m2)</p>
<p>combined <- combine.output.lmer(m1, labels=c("appel", "banaan", "grapefruit"))<br />
combined <- combine.output.lmer(m2, labels=c("appel", "peer", "banaan", "grapefruit"))</p>
<p>write.csv(combined, "combined.csv", na=" ")<br />
</code></p>
<p>In this example I estimate two sets of four mixed effects models, which are concatenated in the objects 'm1' and 'm2'. The function itself is called 'combine.output.lmer' and is applied to those objects. The output is a data.frame with the variable names in the first column. Parameters not estimated in a given model are indicated by 'NA' in their respective columns. By writing the 'combined' object to an external file, the NAs are replaced by blanks and the file can be read into other software, such as OpenOffice Spreadsheet or Excel. Use the xtable package to get it into your LaTeX document. </p>
<p>UPDATE<br />
I updated and improved the code somewhat, for I wasn't satisfied with the results. Now the code adapts to the number of parameters derived from the models' summary, lets you add your own names to the columns, and, most importantly, also reports the random slopes.</p>
<p>Please note: due to the internal matching procedure, errors may occur when the same variable is random 'within' more than one other variable. This is only the case when other variables are random within each nesting factor as well. </p>
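<p>A note for readers on a newer lme4 (version 1.0 or later): the S4 slots used above, summary(x)@coefs and summary(x)@REmat, no longer exist there. A minimal sketch of the newer accessors, using the sleepstudy data that ships with lme4:</p>

```r
# Sketch for lme4 >= 1.0, not the 2009 API used in the post above:
# fixed effects now come from coef(summary()), and the random-effect
# variances from VarCorr().
library(lme4)
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
fix.coef <- coef(summary(model))           # columns: Estimate, Std. Error, t value
var.coef <- as.data.frame(VarCorr(model))  # columns: grp, var1, var2, vcov, sdcor
```

<p>Swapping these accessors into combine.output.lmer should be the bulk of the work needed to port it.</p>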
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-31-combining-lmer-output-in-a-single-table/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>R-Sessions 30: Visualizing missing values</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-30-visualizing-missing-values/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-30-visualizing-missing-values/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 10:00:39 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[GSS]]></category>
		<category><![CDATA[GSS cumulative file]]></category>
		<category><![CDATA[Missing values]]></category>
		<category><![CDATA[R-Project]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=872</guid>
		<description><![CDATA[<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg" alt="" width="470" /></a> 

It always takes some time to get a grip on a new dataset, especially large ones. The code-books are often as indispensable as they are massive, and not always as clear as one would want. Routings, and the resulting strange patterns of missing values, are at times difficult to find.

I found a nice way to plot missing values, using R. Basically, I thought it would be nice to calculate the percentage of missings on each variable, and do so for each year represented in the data. These numbers could be visualized using a levelplot(), which resulted in the graph below.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a> </p>
<p>It always takes some time to get a grip on a new dataset, especially large ones. The code-books are often as indispensable as they are massive, and not always as clear as one would want. Routings, and the resulting strange patterns of missing values, are at times difficult to find.</p>
<p>I found a nice way to plot missing values, using R. Basically, I thought it would be nice to calculate the percentage of missings on each variable, and do so for each year represented in the data. These numbers could be visualized using a levelplot(), which resulted in the graph below.</p>
<p><a href="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/01/missings.jpg"><img src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2009/01/missings.jpg?w=450" alt="missings" title="missings" class="alignnone size-medium wp-image-873" data-recalc-dims="1" /></a><br />
<span id="more-872"></span><br />
In this example I used a small subset of variables from the <a href="http://www.icpsr.umich.edu/cocoon/ICPSR/STUDY/04697.xml">cumulative file of the General Social Survey</a>, which is freely available from the web. I used this syntax:</p>
<p><code><br />
testing.NA <- matrix(ncol=26, nrow=21)<br />
for (i in 1:dim(GSS)[2])<br />
	{<br />
	testing.NA[i,] <- tapply(GSS[[i]], GSS$year, function(x) sum(is.na(x)) / length(x))<br />
	}</p>
<p>dimnames(testing.NA) <- list(<br />
	names(GSS),<br />
	sort(unique(GSS$year)))</p>
<p>library(lattice)</p>
<p>levelplot(testing.NA,<br />
	scales=list(x=list(rot=90)),<br />
	main="Percentage missing values on variables in GSS",<br />
	xlab="Variable",<br />
	ylab="Year")<br />
</code></p>
<p>First, I defined the testing.NA matrix, using the number of years and variables. Then, in a loop, I calculate the proportion of missing values per year, basically using is.na() and length(). I assign dimnames to the matrix and use the levelplot() function from the lattice package to plot the matrix. That's it, easy does it.</p>
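<p>The explicit loop is not required, by the way; a loop-free sketch of the same computation, shown here on a toy stand-in for the GSS subset (the real data is loaded from ICPSR):</p>

```r
# Toy stand-in for the GSS subset used above (hypothetical values):
GSS <- data.frame(
	year  = rep(c(1972, 1973), each = 4),
	abany = c(NA, 1, 2, NA, 1, 2, 1, 2),
	educ  = c(12, NA, 16, 8, 12, 14, NA, NA))

# One sapply over the columns replaces the loop: tapply yields the
# proportion missing per year, sapply binds these into a years-by-variables
# matrix, and t() puts the variables in the rows, as in the loop above.
testing.NA <- t(sapply(GSS, function(v)
	tapply(v, GSS$year, function(x) mean(is.na(x)))))
```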
<p>But: does it help? I think it does. Of course, all this information can be gained from the code-book, and needs to be verified there. However, it does give us some immediate notes on the availability of these variables. For instance, we see that in the first few years the abany variable is missing, whereas the other variables on abortion aren't. When creating scales, this needs to be taken into account, so as not to lose the data on the first few years entirely. The speduc variable (spouse's educational level) has a high number of missings, as does the denom variable. This, however, makes sense: not everybody has a spouse, and the denom variable only applies to Protestants. Finally, this graph gives some pointers on a change in survey strategy from 1988 onwards regarding the items on induced abortion. The proportion of missing values increases sharply at that point, and does so for all abortion-related variables. </p>
<p>This graph does not tell us what exactly happened, but it does provide nice pointers on what to look for when reading the code-book. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-30-visualizing-missing-values/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>R-Sessions 29: Running R-Project twice on Apple Mac OS X</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-29-running-r-project-twice-on-apple-mac-os-x/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-29-running-r-project-twice-on-apple-mac-os-x/#comments</comments>
		<pubDate>Mon, 24 Nov 2008 10:00:40 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[statistical software]]></category>
		<category><![CDATA[twice]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=835</guid>
		<description><![CDATA[Working with statistics can be quite time consuming. As anyone working with relatively advanced models and large amounts of data knows, the waiting especially can be excruciating. Your statistical software is locked up while crunching those numbers, while you'd actually prefer to run some minor procedures, such as post-estimations, testing some loops, or simply displaying the output of a previously estimated model. With Apple's Mac OS X you can now run R-Project twice, making the most of your dual core processor. ]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a><br />
Working with statistics can be quite time consuming. As anyone working with relatively advanced models and large amounts of data knows, the waiting especially can be excruciating. Your statistical software is locked up while crunching those numbers, while you&#8217;d actually prefer to run some minor procedures, such as post-estimations, testing some loops, or simply displaying the output of a previously estimated model. With Apple&#8217;s Mac OS X you can now run R-Project twice, making the most of your dual core processor. <span id="more-835"></span></p>
<p>The procedure is very easy, and it works like a charm. Mind, though, that it obviously drains your computer&#8217;s resources heavily, so the performance of each instance of R-Project decreases at least slightly. For that to change, we would need dual-hard-disk laptops, dual-RAM laptops, and such. Dual laptop-laptops, basically.</p>
<p><img src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/11/r-copy.jpg?resize=500%2C287" alt="" title="Duplicating R-Project to run it twice" class="aligncenter size-full wp-image-837" data-recalc-dims="1" /></p>
<p>Back to running R-Project twice. Just start R-Project as usual. Then go to your applications folder and secondary-click on the R-Project app. Select &#8216;duplicate&#8217;, and there you are: an app named R copy emerges. Start this as usual and start working. </p>
<p>In the image below you see two instances of R-Project running. The first is working on a heavy-weight function that produces some output every hour or so and runs 96 times. In other words: it takes ages. However, it stores the output in an external file, and since each bit of output needs some post-estimation before being interpreted, I can use the second instance to load that data and examine it (not shown).</p>
<p><a href="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/11/2-r.jpg"><img src="http://i0.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/11/2-r.jpg?w=470" alt="" title="R-Project running twice on Mac OS X" class="alignnone size-medium wp-image-836" data-recalc-dims="1" /></a></p>
<p>Although you don&#8217;t need to re-install packages, the one thing I have not (yet) figured out is how to share resources between these two instances of R-Project. Being able to share variables, models, and the like would be great. Ideas, anyone? </p>
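<p>One pragmatic workaround is the file system: a sketch of handing objects from one running instance to the other with save() and load(), assuming both sessions share a working directory:</p>

```r
# Instance 1: save a finished object to disk ...
results <- lm(dist ~ speed, data = cars)
save(results, file = "shared-objects.RData")

# Instance 2: ... and load it in the other session; 'results' reappears
# there under its original name.
rm(results)
load("shared-objects.RData")
coef(results)
```

<p>This does not share live state, of course, but for finished models and data it gets the job done.</p>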
<p>&#8211; &#8211; &#8212; &#8212; &#8212;&#8211; &#8212;&#8212;&#8211;</p>
<ul>
<li><strong><a href="http://www.rensenieuwenhuis.nl/r-forum/topic/r-sessions-29-running-r-project-twice-on-apple-mac-os-x">Discuss this article and pose additional questions in the R-Sessions Forum</a></strong></li>
</ul>
<p>&#8211; &#8211; &#8212; &#8212; &#8212;&#8211; &#8212;&#8212;&#8211;<br />
<a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/">R-Sessions</a> is a collection of manual chapters for R-Project, maintained on <a href="http://www.rensenieuwenhuis.nl">Curving Normality</a>. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, as it is paid for by the advertisements, but please refer to it in work it inspires. Feedback and topic requests are highly appreciated.<br />
&#8212;&#8212;&#8211; &#8212;&#8211; &#8212; &#8212; &#8211; &#8211;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-29-running-r-project-twice-on-apple-mac-os-x/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>R-Sessions 28: Impressive R Speeds</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-28-impressive-r-speeds/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-28-impressive-r-speeds/#comments</comments>
		<pubDate>Thu, 30 Oct 2008 10:00:22 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[macbook]]></category>
		<category><![CDATA[matrix]]></category>
		<category><![CDATA[R-Project]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[system.time]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=778</guid>
		<description><![CDATA[Yesterday, I received my new Apple MacBook. It's running a Core 2 Duo at 2.4 GHz and it's fast. Really fast! I tested it using R-Project, doing some timings on matrix transformations.

Apparently, it's very cool to show off the speed of R-Project on your system. Optimized .DLL files help to speed up R on Windows systems (and possibly other systems as well) with respect to matrix transformations, which has led to enormous speed increases. So, let's perform a speed-test of our own.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.rensenieuwenhuis.nl/archive/category/r-project/r-sessions/"><img title="R-Sessions" src="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/07/r-sessions.jpg?w=470" alt="" data-recalc-dims="1" /></a> </p>
<p>Yesterday, I received my new Apple MacBook. It&#8217;s running a Core 2 Duo at 2.4 GHz and it&#8217;s fast. Really fast!</p>
<p>Apparently, it&#8217;s very cool to show off the speed of R-Project on your system. <a href="http://cran.r-project.org/bin/windows/contrib/ATLAS/">Optimized .DLL files</a> help to speed up R on Windows systems (and possibly other systems as well) with respect to matrix transformations, which has led to enormous speed increases. So, let&#8217;s perform a speed-test of our own.<br />
<span id="more-778"></span></p>
<p>First of all, in the syntax below, the Matrix package is activated, using the require() command. Since we will be creating random data, we set the seed in order to receive the exact same data every time the test is run. This is done with set.seed(). The next line creates a matrix X, which in the last three lines is manipulated in different ways. </p>
<p>To test how long this takes, we enclose the matrix operations in the system.time() function, which clocks each operation.</p>
<p><code><br />
require(Matrix)<br />
set.seed(123)<br />
X <- Matrix(rnorm(1e6), 1000)<br />
system.time(for(i in 1:25) X%*%X)<br />
system.time(for(i in 1:25) solve(X))<br />
system.time(for(i in 1:10) svd(X))<br />
</code></p>
<p>This results in the following output:</p>
<p><code><br />
> X <- Matrix(rnorm(1e6), 1000)<br />
> system.time(for(i in 1:25) X%*%X)<br />
   user  system elapsed<br />
  8.306   0.591   5.031<br />
> system.time(for(i in 1:25) solve(X))<br />
   user  system elapsed<br />
  8.933   1.331   6.684<br />
> system.time(for(i in 1:10) svd(X))<br />
   user  system elapsed<br />
 36.989   3.665  33.384<br />
 </code></p>
<p>WOW! This is the <a href="http://stijnr.socsci.ru.nl/blog/?p=228">fastest I've seen in real life</a>, even faster than some of the desktops I know people currently work with (i.e. my own). I'm sure, however, that it is not the fastest possible, let alone compared with how fast future machines will be. </p>
<p>Additionally, in the near future my MacBook will be upgraded to 4 GB of RAM, so I'm curious to find out whether this will yield an additional speed increase. I expect the additional RAM to help most when estimating binomial mixed effects models, so expect a comparative benchmark on that as well, as soon as the new RAM arrives.</p>
<p>So, in the meantime, you can use this code to do some benchmarks yourself, on various computers. Please post the results here, or discuss them in the R-Sessions Forum.</p>
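<p>To make such benchmarks easier to repeat across machines, the three timings can be wrapped in a small helper; a sketch using base matrices, with n and reps kept small in the call below so it runs quickly (scale them up to match the test above):</p>

```r
# Times the three matrix operations and returns the elapsed seconds.
benchmark.matrix <- function(n = 1000, reps = 25) {
	set.seed(123)
	X <- matrix(rnorm(n * n), n)
	c(multiply = system.time(for (i in 1:reps) X %*% X)[["elapsed"]],
	  solve    = system.time(for (i in 1:reps) solve(X))[["elapsed"]],
	  svd      = system.time(for (i in 1:max(1, reps %/% 2)) svd(X))[["elapsed"]])
}
benchmark.matrix(n = 200, reps = 5)
```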
<p>UPDATE:<br />
I also tested my old Powerbook G4 (1.5 Ghz, 1.25 Gb RAM):<br />
<code><br />
> set.seed(123)<br />
> X <- Matrix(rnorm(1e6), 1000)<br />
> system.time(for(i in 1:25) X%*%X)<br />
   user  system elapsed<br />
 34.661   1.590  47.528<br />
> system.time(for(i in 1:25) solve(X))<br />
   user  system elapsed<br />
 37.184   1.656  51.516<br />
> system.time(for(i in 1:10) svd(X))<br />
   user  system elapsed<br />
247.694  11.258 331.979<br />
</code></p>
<p>- - -- --- ----- --------</p>
<ul>
<li><strong><a href="http://www.rensenieuwenhuis.nl/r-forum/topic/r-sessions-28-impressive-r-speeds">Discuss this article and pose additional questions in the R-Sessions Forum</a></strong></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-28-impressive-r-speeds/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
