<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rense Nieuwenhuis &#187; lying with statistics</title>
	<atom:link href="http://www.rensenieuwenhuis.nl/tag/lying-with-statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rensenieuwenhuis.nl</link>
	<description>&#34;The extra-ordinary lies within the curve of normality&#34;</description>
	<lastBuildDate>Wed, 03 Jun 2026 14:51:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.2.2</generator>
	<item>
		<title>Newsflash: Lucia de B. gets re-trial!</title>
		<link>http://www.rensenieuwenhuis.nl/newsflash-lucia-de-b-gets-re-trial/</link>
		<comments>http://www.rensenieuwenhuis.nl/newsflash-lucia-de-b-gets-re-trial/#comments</comments>
		<pubDate>Wed, 08 Oct 2008 10:00:37 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Lucia de B.]]></category>
		<category><![CDATA[lying with statistics]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=668</guid>
		<description><![CDATA[Dutch nurse Lucia de B., sentenced to life imprisonment for the murder of 7 infants during her shifts, is now entitled to a re-trial. Why do I write about it here? Because one of the grounds for her conviction was a statistical argument, one that has been thoroughly contested by prominent statisticians, who argue that by the court's line of reasoning <a href="http://www.math.leidenuniv.nl/~gill/hetero2.pdf">one out of every nine nurses</a> would go to jail!

I have written before about this statistical argument, but did so in Dutch. For those interested, I'll give you a short recap, and a nice movie.]]></description>
				<content:encoded><![CDATA[<p>Dutch nurse Lucia de B., sentenced to life imprisonment for the murder of 7 infants during her shifts, is now entitled to a re-trial. Why do I write about it here? Because one of the grounds for her conviction was a statistical argument, one that has been thoroughly contested by prominent statisticians, who argue that by the court&#8217;s line of reasoning <a href="http://www.math.leidenuniv.nl/~gill/hetero2.pdf">one out of every nine nurses</a> would go to jail!</p>
<p>I <a href="http://www.rensenieuwenhuis.nl/archive/zaak-lucia-de-b-wordt-herzien/">have</a> <a href="http://www.rensenieuwenhuis.nl/archive/lucia-de-b-deel-2/">written</a> <a href="http://www.rensenieuwenhuis.nl/archive/hoe-groot-is-de-kans-dat-lucia-de-b-onschuldig-vastzit/">before</a> about this statistical argument, but did so in Dutch. For those interested, I&#8217;ll give you a short recap, and a nice movie.<br />
<span id="more-668"></span></p>
<p>Lucia de B. was convicted of murdering seven children on numerous grounds. Most of these have been contested or have already been refuted. Ton Derksen, a Dutch philosopher of science, even wrote a book discussing many of the court&#8217;s considerations. One of the main arguments was that a statistically highly improbable number of children died during her shifts. There are many arguments against this statement. For instance, after she was linked to one unusual death, investigators specifically searched for other unusual deaths during her shifts. Clearly, such a selective search inflates the number of incidents attributed to her, and thereby the numerator of the abovementioned chance. Later, it was discovered that the &#8216;unusual&#8217; deaths need not have been unusual at all, because the &#8216;unusual&#8217; substance in the infants&#8217; blood had been mixed up with a similarly named, but completely different, substance that is found in infants&#8217; blood very often.</p>
<p>Nevertheless, one of the court&#8217;s main considerations was that the chance that so many infants would die during or shortly after the shifts of Lucia de B. was so low that she had to be guilty. Apparently, the court reasoned that because the probability of these events (deaths during her shifts) was so low, any explanation other than guilt would be highly improbable.</p>
<p>Clearly, something goes horribly wrong here. In the movie below, Peter Donnelly addresses a similar case in a very accessible way. In that case, a woman was convicted of the murder of her two children. These two children had, independently, died from sudden infant death syndrome. Sudden infant death syndrome is rather rare (and a tragedy for the family). As a matter of fact, it is so rare that the chance of it happening to two babies of the same mother was considered extremely small (according to the judge: an extremely small chance times another extremely small chance), and so this mother was sent to jail.</p>
<p>I&#8217;m not going to completely summarise Peter Donnelly&#8217;s arguments, but what it comes down to is that we should interpret the court&#8217;s decision as a &#8216;test&#8217;. And we know that statistical tests can produce two kinds of error: we can erroneously conclude that an event is highly improbable, while in fact it is not. Or we can erroneously conclude that something is not highly improbable, while in fact it is.</p>
<p>What this has to do with the case of the mother who lost two of her babies to sudden infant death syndrome, and correspondingly with the case of Lucia de B., is made clear in the movie below. The basic argument, which also applies to the case of Lucia de B., is that although some events are rather rare for any one person, if there are enough opportunities for the event to occur (lots of mothers have two babies, many nurses work with infants who die), the probability that the event occurs somewhere <i>in the whole population</i> isn&#8217;t that small at all. Watch and see how Peter Donnelly explains this eloquently:</p>
<p><object height="353" width="425"><param name="movie" value="http://www.youtube.com/v/kLmzxmRcUTo"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/kLmzxmRcUTo" type="application/x-shockwave-flash" wmode="transparent" height="353" width="425"></embed></object></p>
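<p>The population-level argument above can be made concrete with a small calculation. The sketch below is only an illustration: the per-person probability and the population size are hypothetical numbers, not figures from either court case, and it assumes that people are affected independently and with the same probability.</p>

```python
def prob_at_least_one(p_single: float, n_people: int) -> float:
    """Probability that a rare event hits at least one of n_people.

    Assumes (for illustration only) that every person has the same
    probability p_single and that people are independent of each other.
    """
    return 1.0 - (1.0 - p_single) ** n_people


# Hypothetical numbers, chosen only to illustrate the argument.
p_single = 1e-6        # chance that one particular person experiences the coincidence
n_people = 1_000_000   # number of people exposed to the same risk

p_any = prob_at_least_one(p_single, n_people)
print(f"one particular person:     {p_single:.6%}")
print(f"someone in the population: {p_any:.1%}")
```

<p>With these made-up inputs, an event that is a one-in-a-million coincidence for any given person still happens to someone in the population more often than not. Seeing it happen once is therefore weak evidence that anything other than chance was involved.</p>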
<p>And by the way, for the statisticians amongst us: the prominent statisticians <a href="http://www.math.leidenuniv.nl/~gill/hetero2.pdf">I mentioned before</a> basically argue that the assumption of homoscedasticity has not been met, which makes matters even worse!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/newsflash-lucia-de-b-gets-re-trial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lying with WordPress statistics</title>
		<link>http://www.rensenieuwenhuis.nl/lying-with-wordpress-statistics/</link>
		<comments>http://www.rensenieuwenhuis.nl/lying-with-wordpress-statistics/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 22:46:55 +0000</pubDate>
		<dc:creator><![CDATA[Rense Nieuwenhuis]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[lying with statistics]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://www.rensenieuwenhuis.nl/?p=376</guid>
		<description><![CDATA[I must admit that I repeatedly feel flattered by the number of page-views on my blog as shown by the WordPress statistics plugin. However, despite the nice graphical representation, they are a little too flattering ...]]></description>
				<content:encoded><![CDATA[
<p>I must admit that I repeatedly feel flattered by the number of page-views on my blog as shown by the <a href="http://wordpress.org/extend/plugins/stats/">WordPress statistics plugin</a>. However, despite the nice graphical representation, the graphs are a little too flattering for the humble number of page-views my blog attracts. A traditional line-graph consists of two axes, conventionally referred to as the x-axis and the y-axis. To put it bluntly: the WordPress statistics plugin gets both axes wrong. <span id="more-376"></span></p>
<h2>Problems with the Y-axis</h2>
<p>Regarding the y-axis, representing the number of page-views on a specific day, the problem lies with the numeric limits of the axis. In other words: the minimum and maximum values on the y-axis differ from one graph to the next, which is especially problematic for the minimum value. When I try to discern a trend in the humble number of page-views on my blog, I&#8217;m often misled by this problem.</p>
<p>Let&#8217;s take a look. Today, I saw the graph represented below on my WordPress Dashboard. This graph clearly shows an increase in the number of page-views since June 5th. At least, so it appears. When we take a closer look at the y-axis, we see that it starts at 10 instead of 0. Is that a big deal? Yes, it is. It means that the graph shows only 80% of the full range from zero to the top of the axis, so every rise and fall looks steeper than it really is.</p>
<p><a href="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/wordpress-graph.tiff"><img class="alignnone size-full wp-image-379" style="vertical-align: middle;" title="wordpress-graph" src="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/wordpress-graph.tiff" alt="" width="475" /></a></p>
<p>About ten minutes later a new day started, which resulted in a totally different graph, as shown below. We still see an increase in the absolute number of page-views, but now the slope of the line seems less steep. Instead of a clear increase, we now see a relatively stable number of page-views of about 30, with an initial dip and a peak at the end of the selected time-period. The reason is that the y-axis now uses completely different limits. This results in a graph which is a lot less optimistic for my blog.</p>
<p><a href="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/grab.jpg"><img class="aligncenter size-full wp-image-381" title="Wordpress stats" src="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/grab.jpg?resize=475%2C242" alt="" data-recalc-dims="1" /></a></p>
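<p>The effect of the axis minimum on the two graphs above can be quantified. The sketch below is a rough illustration, not the plugin&#8217;s own logic: the two page-view counts are hypothetical, and only the axis minimum of 10 is taken from the first graph.</p>

```python
def apparent_ratio(low: float, high: float, axis_min: float) -> float:
    """Ratio of the plotted heights of two values when the y-axis starts at axis_min."""
    if not low > axis_min:
        raise ValueError("the lower value must lie above the axis minimum")
    return (high - axis_min) / (low - axis_min)


# Hypothetical daily page-views; only the axis minimum of 10 comes from the graph above.
low, high = 20, 40

true_ratio = apparent_ratio(low, high, axis_min=0)        # axis starts at zero
truncated_ratio = apparent_ratio(low, high, axis_min=10)  # axis starts at 10

print(f"axis from 0:  the higher point is drawn {true_ratio:.1f} times as high")
print(f"axis from 10: the higher point is drawn {truncated_ratio:.1f} times as high")
```

<p>With these made-up counts, a doubling of page-views is drawn as a tripling once the axis starts at 10. The closer the axis minimum lies to the lowest value, the stronger this exaggeration becomes.</p>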
<h2>Problems with the X-axis</h2>
<p>Regarding the x-axis, representing the dates on which the page-views were registered, the problem is that dates on which no page-views were registered at all are simply left out. This seems to be an even more serious problem. Again, I&#8217;ve added the graph created by WordPress statistics for one of my blog-posts. It shows a recognizable pattern: the most page-views on the day it was published, followed by a steady decline. However, a superficial reading of this graph would lead to the conclusion that readers have found this blog-post once or twice a day ever since.</p>
<p><a href="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/pageviews-mars.tiff"><img class="aligncenter size-full wp-image-377" title="pageviews-mars" src="http://www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/pageviews-mars.tiff" alt="" width="475" /></a></p>
<p>But is that a valid conclusion? No, it is not. A more detailed view of the data shows that there have been many days on which no-one at all viewed this post. Below, I&#8217;ve printed a bar-chart that also shows the days with 0 page-views. Again, it paints a picture that is a lot less optimistic for my blog.</p>
<p><a href="http://i1.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/page-views-mars.jpg"><img class="aligncenter size-medium wp-image-378" title="page-views-mars" src="http://i2.wp.com/www.rensenieuwenhuis.nl/wp-content/uploads/2008/06/page-views-mars-300x199.jpg?w=475" alt="" data-recalc-dims="1" /></a></p>
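<p>Restoring the days without page-views, as in the bar-chart above, is a small data-preparation step. The sketch below shows one way to do it; the function name and the counts are hypothetical and do not come from the plugin or from my actual statistics.</p>

```python
from datetime import date, timedelta


def fill_missing_days(views: dict[date, int]) -> list[tuple[date, int]]:
    """Return one (day, count) pair per calendar day between the first and
    last recorded day, using 0 for days that are absent from views."""
    if not views:
        return []
    first, last = min(views), max(views)
    n_days = (last - first).days + 1
    days = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, views.get(day, 0)) for day in days]


# Hypothetical counts: only days with at least one page-view were recorded.
recorded = {
    date(2008, 6, 1): 12,
    date(2008, 6, 2): 5,
    date(2008, 6, 5): 1,
    date(2008, 6, 9): 2,
}

filled = fill_missing_days(recorded)
for day, count in filled:
    print(day.isoformat(), count)
```

<p>Plotting the filled series instead of the recorded one keeps the spacing on the x-axis proportional to time, so long stretches without any readers stay visible.</p>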
<h2>How bad is it and can it be solved?</h2>
<p>Is this all that bad? Well, it gives the users of WordPress a positive feeling about themselves. But for those who want a realistic overview of the success of their blog, the picture painted by these stats is not all that helpful. The more page-views a blog has, the worse this problem becomes: the plugin adjusts the y-axis to make the absolute variation in the number of page-views look as large as possible, thereby exaggerating the relative differences. The x-axis issue is most serious for posts that are rarely visited.</p>
<p>Should this be adjusted in a future version of the plugin? Well, that&#8217;s a point for discussion. I can imagine a few people being disappointed when their optimistic graphs turn more realistic. Nevertheless, I would suggest that the folks at WordPress at least give users the possibility to define the limits of the axes manually. Or, at the very least, let users choose to show days without any registered page-views on the x-axis and to have the y-axis start at zero.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rensenieuwenhuis.nl/lying-with-wordpress-statistics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
