<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rense Nieuwenhuis &#187; subset</title>
	<atom:link href="http://www.rensenieuwenhuis.nl/tag/subset/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rensenieuwenhuis.nl</link>
	<description>&#34;The extra-ordinary lies within the curve of normality&#34;</description>
	<lastBuildDate>Thu, 12 Mar 2026 14:58:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.2.2</generator>
	<item>
		<title>R Sessions 33: Select (nested) observations with equal number of occurrences</title>
		<link>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/</link>
		<comments>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 10:00:05 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[R-Project]]></category>
		<category><![CDATA[R-Sessions]]></category>
		<category><![CDATA[balanced data]]></category>
		<category><![CDATA[merge]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[subset]]></category>
		<category><![CDATA[unbalanced data]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=1107</guid>
		<description><![CDATA[Recently, I was contacted with a question about R code. A researcher friend was working with nested data that was unbalanced. The data was in a &#8216;long&#8217; format: all observations nested within the ...]]></description>
				<content:encoded><![CDATA[<p>Recently, I was contacted with a question about R code. A researcher friend was working with nested data that was unbalanced. The data was in a &#8216;long&#8217; format: all observations nested within the same group shared the same identification number, but the number of observations differed between groups (hence: unbalanced data).</p>
<p>He asked me for a piece of code that creates a subset of the data that <i>is</i> balanced, i.e. all observations that are nested within equally sized groups. Or, as an alternative, all observations nested within groups with at least a minimum number of observations.</p>
<p>I solved it the quick and dirty way: the solution involves creating additional variables, building a new data.frame, and merging. It can surely be done more elegantly, but it works.</p>
<p>So, I share it below:<br />
<span id="more-1107"></span></p>
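<p><code><br />
# example data: groups "a" to "e" contain 1, 2, 3, 4 and 3 observations<br />
id <- c("a", "b","b", "c","c","c", "d","d","d","d", "e","e","e")<br />
y <- c(3,4,3,2,4,5,6,5,6,7,5,4,3)<br />
df <- data.frame(id, y) # setting up original data.frame<br />
<br />
tab <- data.frame(id=names(table(df$id)), fre=as.vector(table(df$id))) # table of frequencies per group<br />
<br />
df.new <- merge(df, tab, by="id") # merging the frequencies variable onto each observation<br />
<br />
subset(df.new, fre==3) # balanced subset: groups with exactly 3 observations ("c" and "e")<br />
subset(df.new, fre>3) # groups with more than 3 observations ("d"); use fre>=3 for "at least 3"<br />
</code></p>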
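<p>As one possible shortcut, the frequency table and the merge can be replaced by a single call to base R's <code>ave()</code>, which applies a function within each group and returns one value per row. This is a minimal sketch rather than the only way to do it; the names <code>balanced</code> and <code>at.least.3</code> are made up for illustration:</p>
<p><code><br />
# same example data as above<br />
id <- c("a", "b","b", "c","c","c", "d","d","d","d", "e","e","e")<br />
y <- c(3,4,3,2,4,5,6,5,6,7,5,4,3)<br />
df <- data.frame(id, y)<br />
<br />
# ave() computes length() within each id and repeats it for every row of that group<br />
df$fre <- ave(df$y, df$id, FUN = length)<br />
<br />
balanced <- subset(df, fre == 3) # groups with exactly 3 observations<br />
at.least.3 <- subset(df, fre >= 3) # groups with 3 or more observations<br />
</code></p>
<p>Unlike <code>merge()</code>, which sorts the result by <code>id</code> by default, this keeps the rows in their original order.</p>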
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/r-sessions-33-select-nested-observations-with-equal-number-of-occurences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
