<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistical Graphics and more</title>
	<atom:link href="http://www.theusrus.de/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theusrus.de/blog</link>
	<description>Statistical Graphics, Data Visualization, Visual Analytics, Data Analysis, Data Mining, User Interfaces - you name it</description>
	<lastBuildDate>Thu, 17 May 2012 08:42:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Fundamentals: What&#8217;s the story?</title>
		<link>http://www.theusrus.de/blog/fundamentals-whats-the-story/</link>
		<comments>http://www.theusrus.de/blog/fundamentals-whats-the-story/#comments</comments>
		<pubDate>Thu, 17 May 2012 08:42:45 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[Fundamentals of Graphical Data Analysis]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1285</guid>
		<description><![CDATA[In an age where &#8220;data is the new oil&#8221; (a controversial claim, worth its own post &#8230;) there is data everywhere, i.e., data is collected more and more automatically, be it by smartphones, cameras, or social networks sucking up people&#8217;s privacy. Having all this data at hand opens up the possibility to visualize things we [...]]]></description>
			<content:encoded><![CDATA[<p>In an age where &#8220;data is the new oil&#8221; (a controversial claim, worth its own post &#8230;) there is data everywhere, i.e., data is collected more and more automatically, be it by smartphones, cameras, or social networks sucking up people&#8217;s privacy. Having all this data at hand opens up the possibility to visualize things we never had a chance to look at before. One (early) example is certainly the &#8220;<a href="http://www.theusrus.de/blog/facebook-map/">Facebook map</a>&#8221;.</p>
<p>Going back to a quote of <strong><span style="color: #800080;">John W. Tukey</span></strong> &#8211; who can be seen as the reviving force of statistical graphics, and thus ultimately of visualization in general &#8211; we can learn a bit about the motivation behind graphical data analysis:</p>
<p><span style="color: #800080;">“&#8230; paradigm of exploratory data analysis</span><br />
<span style="color: #800080;">a) here is the data</span><br />
<span style="color: #800080;">b) what is it trying to tell us; in particular, <strong><span style="color: #800080;">which question does it want us to ask?</span></strong></span><br />
<span style="color: #800080;">c) <strong><span style="color: #800080;">what seems to be going on?</span></strong>”</span></p>
<p>Although there seems to be a &#8220;data first&#8221; aspect in both the classical EDA approach and modern data visualization, we can find a fine distinction regarding the motivation.</p>
<p>Here are two examples which are dominated by their flashy presentation, but fail to ask the relevant questions and can&#8217;t really tell us a story showing what seems to be going on, apart from what we (trivially) knew before.</p>
<p style="text-align: center;"><a href="http://triposo.com/labs/snapshots"><img class="aligncenter" title="Snapshots taken on May 1st" src="http://www.theusRus.de/Blog-files/triposo.png" alt="Snapshots taken on May 1st" width="575" height="323" /></a></p>
<p>This example is taken from the <a href="http://triposo.com/labs/snapshots">triposo</a> website and shows locations of photos taken with smartphones and logged with the triposo trip advisory application. While this is a cool visualization, what is it trying to tell us? From the comment on the website, we can see how badly the &#8220;story&#8221; behind the data fails: &#8220;<em>This is probably the clearest example of all: Labor Day celebrations light up Europe and China in a big way. Who doesn&#8217;t want to take a picture of a nice 1st of May Parade?</em> &#8230;&#8221; At least in the last 25 years, it has been hard to find a single 1st of May Parade in Europe.</p>
<p>It gets even worse when the visualization actually shows things that are not in the data as in the next example from <a href="http://www.villevivante.ch/">villevivante</a>.</p>
<p><a href="http://www.villevivante.ch/"><img class="aligncenter" title="mobile phone traces in Geneva" src="http://www.theusRus.de/Blog-files/villevivante.png" alt="mobile phone traces in Geneva" width="575" height="371" /></a></p>
<p><a href="http://flowingdata.com/2012/03/01/mobile-phone-digital-traces/">Nathan</a> posted this example and finished with &#8220;<em><strong>It&#8217;s hard to say exactly what you&#8217;re seeing here</strong> because it does move so fast, and it probably means more if you live in or near Geneva, but speaking to the video itself, you have your highs and lows during the start and end of days.</em>&#8221; It is not a particular insight that most of us travel into cities to go to work in the morning and move back out to our home at the end of the day. What is interesting, though, is that according to the visualization, people in Geneva do not move along roads, but seem to enter the city like a swarm of bees &#8230;</p>
<p>To summarize, a good visualization should (at least) fulfill these requirements:</p>
<ol>
<li>Be clear about what data was used (especially regarding generalization)</li>
<li>Make sure the visual abstraction does not lead to misinterpretations</li>
<li><strong>Actually tell a story</strong></li>
<li><strong>Answer questions where we didn&#8217;t know the answer already</strong></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/fundamentals-whats-the-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Titanic Disaster: Visualization Insights 100 years later</title>
		<link>http://www.theusrus.de/blog/titanic-disaster-visualization-insights-100-years-later/</link>
		<comments>http://www.theusrus.de/blog/titanic-disaster-visualization-insights-100-years-later/#comments</comments>
		<pubDate>Sun, 15 Apr 2012 10:11:45 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1374</guid>
		<description><![CDATA[Today is the 100th anniversary of the Titanic disaster and it is all over the media &#8211; ranging from movies to serious reportage. Having worked on visualization of categorical data for a long time, the Titanic data became somewhat like &#8220;the mother of all demo datasets&#8221; and may be boring people by now [...]]]></description>
			<content:encoded><![CDATA[<p>Today is the 100th anniversary of the Titanic disaster and it is all over the media &#8211; ranging from movies to serious reportage. Having worked on visualization of categorical data for a long time, the Titanic data became somewhat like &#8220;the mother of all demo datasets&#8221; and may be boring people by now as much as the Iris dataset does for multivariate analyses.</p>
<p>Nonetheless, there are two issues (which can be found in past posts on this blog), which summarize the most important facts on how the disaster was handled:</p>
<p style="text-align: left; padding-left: 30px;">1. &#8220;<a title="Women and Children first" href="http://www.theusrus.de/blog/understanding-mosaic-plots/">Women and Children first!</a>&#8221;</p>
<p style="text-align: center;"><img class="aligncenter" title="Women and Children first!" src="http://www.theusRus.de/Blog-files/Mosaic2a.png" alt="" width="485" height="455" />(each rectangle shows a combination of Class x Age x Sex, and<br />
the proportion of surviving passengers is shown via highlighting)</p>
<p style="text-align: left; padding-left: 30px;">2. Travelling 1st Class makes it easier to get into a <a title="Titanic Life Boat Data" href="http://www.theusrus.de/blog/titanic-disaster-revisited/">life boat</a> &#8230;</p>
<p style="text-align: center;"><a href="http://www.theusRus.de/Blog-files/TitanicBoats.png"><img class="aligncenter alignnone" title="Launch Sequence of Life Boats (Women Highlighted)" src="http://www.theusRus.de/Blog-files/TitanicBoatsSmall.png" alt="" width="575" height="277" /></a> (Launch sequence of life boats &#8211; women highlighted)</p>
<p style="padding-left: 30px; text-align: left;">The shocking result of this visualization is that the first 6 boats were exclusively used for 1st class passengers (although boat No. 1 was not filled at all), and it took another 4 boats until 3rd class passengers were considered to be saved. Starting with the 15th life boat, chaos spread, and the last three boats went afloat almost empty.</p>
<p style="text-align: left;">Looking at the two visualizations shows clearly what fuels the stories (not only) in the Titanic movies. 2nd class males are among those with the lowest survival rate, indicating a very heroic attitude. The boat filling strategy shows a strong social bias, which can only partly be excused by the coincidence of passenger locations and boat locations.</p>
<p style="text-align: left;">(Here is the data on the <a href="http://www.theusRus.de/Blog-files/Titanic.txt">passengers</a> and <a href="http://www.theusRus.de/Blog-files/boats-breakdown.txt">boats</a> used for the <a href="http://www.theusRus.de/Mondrian">visualization</a>)</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/titanic-disaster-visualization-insights-100-years-later/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Happy Statistics</title>
		<link>http://www.theusrus.de/blog/happy-statistics/</link>
		<comments>http://www.theusrus.de/blog/happy-statistics/#comments</comments>
		<pubDate>Wed, 28 Mar 2012 19:05:34 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[References]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1362</guid>
		<description><![CDATA[Some weeks ago I was browsing through the categories of TED Talks and was surprised to find an entry on statistics &#8211; which is regarded as a very boring topic by most people. The talk I chose to watch was by Nic Marks &#8211; as I tried to avoid yet another Hans Rosling talk. [...]]]></description>
			<content:encoded><![CDATA[<p>Some weeks ago I was browsing through the <a href="http://www.ted.com/talks/tags" target="_blank">categories</a> of <a title="TED" href="http://www.ted.com/talks" target="_blank">TED Talks</a> and was surprised to find an entry on <a href="http://www.ted.com/talks/tags/statistics" target="_blank">statistics</a> &#8211; which is regarded as a very boring topic by most people.</p>
<p>The talk I chose to watch was by Nic Marks &#8211; as I tried to avoid yet another Hans Rosling talk.</p>
<p><center><br />
<object width="526" height="374" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="bgColor" value="#ffffff" /><param name="flashvars" value="vu=http://video.ted.com/talk/stream/2010G/Blank/NicMarks_2010G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/NicMarks-2010G.embed_thumbnail.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=944&amp;lang=&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=nic_marks_the_happy_planet_index;year=2010;theme=what_makes_us_happy;event=TEDGlobal+2010;tag=culture;tag=data;tag=economics;tag=global+issues;tag=happiness;tag=statistics;tag=tedbooks;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><param name="src" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" /><param name="pluginspace" value="http://www.macromedia.com/go/getflashplayer" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><embed width="526" height="374" type="application/x-shockwave-flash" src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" allowFullScreen="true" allowScriptAccess="always" wmode="transparent" bgColor="#ffffff" flashvars="vu=http://video.ted.com/talk/stream/2010G/Blank/NicMarks_2010G-320k.mp4&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/NicMarks-2010G.embed_thumbnail.jpg&amp;vw=512&amp;vh=288&amp;ap=0&amp;ti=944&amp;lang=&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=nic_marks_the_happy_planet_index;year=2010;theme=what_makes_us_happy;event=TEDGlobal+2010;tag=culture;tag=data;tag=economics;tag=global+issues;tag=happiness;tag=statistics;tag=tedbooks;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" 
pluginspace="http://www.macromedia.com/go/getflashplayer" allowfullscreen="true" allowscriptaccess="always" /></object></center><br />
Apart from Marks&#8217; sweeping positive charisma, what struck me most was the very relevant question of why we choose measures like GDP or NASDAQ to measure our &#8220;wellbeing&#8221; &#8211; as these KPIs mostly measure how efficiently we destroy our (or others&#8217;) environment or how much we increased the imbalance of wealth within our societies.</p>
<p>After watching the talk I ended up at the <a href="http://www.happyplanetindex.org/" target="_blank">happy planet index</a> site, and found myself flipping through their <a href="http://www.happyplanetindex.org/public-data/files/happy-planet-index-2-0.pdf" target="_blank">report</a>. On page 16, I found Figure 2, which kind of blew me away.</p>
<p><img class="aligncenter" title="College Student's Prospects" src="http://www.theusRus.de/Blog-files/Meaning.png" alt="" width="575" height="373" /></p>
<p>I don&#8217;t want to get into moral preachings here, but take your time and rethink your life and attitude and try to come up with some explanation and projection of what has changed over the last 45 years &#8211; I am curious to see your comments.</p>
<p>Sometimes even very simple statistics make you think hard &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/happy-statistics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>EU Debt Crisis Visualizations: No JMP forward</title>
		<link>http://www.theusrus.de/blog/eu-debt-crisis-visualizations-no-jmp-foreward/</link>
		<comments>http://www.theusrus.de/blog/eu-debt-crisis-visualizations-no-jmp-foreward/#comments</comments>
		<pubDate>Thu, 23 Feb 2012 19:11:33 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[The Good & the Bad]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1355</guid>
		<description><![CDATA[On the JMP blog you find a post which uses the same data source I took for my first attempt to visualize the web of the EU debt. I found it hard to really make a point with this data. Looking at the JMP post tells me I wasn&#8217;t all too bad. Let&#8217;s walk through [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Hellas by Klenze" src="http://www.theusRus.de/Blog-files/Hellas.png" alt="" width="575" height="119" /></p>
<p>On the JMP blog you find a <a href="http://blogs.sas.com/content/jmp/2012/02/14/visualizing-eurozone-debt-crisis-data/" target="_blank">post</a> which uses the same <a href="http://www.bbc.co.uk/news/business-15748696" target="_blank">data source</a> I took for <a href="http://www.theusrus.de/blog/eu-debt-crisis-what-crisis/">my first attempt to visualize the web of the EU debt</a>. I found it hard to really make a point with this data. Looking at the JMP post tells me I wasn&#8217;t all too bad. Let&#8217;s walk through their visualizations:</p>
<ol>
<li><strong>The Map</strong><br />
Certainly necessary for someone in the US &#8211; who knows where Portugal, Spain and Austria are; or was it Australia ..?<br />
But seriously, scaling the very irregular shapes of the countries&#8217; outlines has more problems (on the perceptual side) than benefits.</li>
<li><strong>The Heat Map</strong><br />
This graph reduces the information to a binary indicator of being a creditor or not &#8211; be it 1€ or 1,000,000,000€. The conclusion that Germany and France are in trouble because they lend to many other countries is not convincing, as it only reflects their economic power.</li>
<li><strong>The Tree Map</strong><br />
As tree maps are &#8220;only&#8221; a different representation of a tree, i.e., a hierarchy, why would you visualize a matrix with it? I can&#8217;t really follow the chain reaction of defaults which is interpreted from this tree map. It looks a bit along the lines of &#8220;if all you have is a hammer, every problem looks like a nail&#8221;.</li>
</ol>
<p>One thing that is really not well thought out is the different color schemes between the graphs.</p>
<p>In the end, I think someone really needs to get &#8220;the right&#8221; data and show us &#8220;the right&#8221; visualizations such that we understand what&#8217;s going on &#8211; but hurry up, Greece might be broke by then &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/eu-debt-crisis-visualizations-no-jmp-foreward/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Good &amp; the Bad [2/2012]</title>
		<link>http://www.theusrus.de/blog/the-good-the-bad-22012/</link>
		<comments>http://www.theusrus.de/blog/the-good-the-bad-22012/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 19:58:18 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[The Good & the Bad]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1336</guid>
		<description><![CDATA[Looking for a map of the French departments, I came across this map of the population density of France at department level, which can be found on Wikipedia &#8211; and you may guess: this is this month&#8217;s &#8220;The Bad&#8221;. At first sight there seems to be a contradiction between the apparently continuous color scale (see [...]]]></description>
			<content:encoded><![CDATA[<p>Looking for a map of the French departments, I came across this map of the population density of France at department level, which can be found on <a href="http://en.wikipedia.org/wiki/Departments_of_France" target="_blank">Wikipedia</a> &#8211; and you may guess: this is this month&#8217;s &#8220;The Bad&#8221;.</p>
<p><img class="aligncenter" title="Population Density in France from Wikipedia" src="http://www.theusRus.de/Blog-files/WikiVersion.png" alt="" width="565" height="600" /></p>
<p>At first sight there seems to be a contradiction between the apparently continuous color scale (see <a href="http://gabrielflor.it/" target="_blank">here</a> for some thoughts on choropleth maps) and the map, which does not seem to give any decent insight into the geographical distribution of population density. The answer is twofold.</p>
<p>1. The color scale is not continuous but has a break between green and blue (unless you invert the shades of blue) and blue and yellow. What we would expect &#8211; in less saturated colors &#8211; looks like this:<br />
<img class="alignleft" title="Color Scale" src="http://www.theusRus.de/Blog-files/ColorScale.png" alt="" width="577" height="90" /></p>
<p>2. For a map showing a continuous quantity, we usually would not choose so many different saturated colors.</p>
<p>Let&#8217;s approach &#8220;The Good&#8221; as I still need to convince you that there might be a better version of the map. In a perfect world, choropleth maps look smooth and &#8220;continuous&#8221;. For the map of France we might want to look at the distance to the capital Paris, as France is very centralized. This map uses a monochromatic scale and shows &#8220;the perfect world&#8221; &#8230;</p>
<p><img class="aligncenter" title="A smooth choropleth map" src="http://www.theusRus.de/Blog-files/DistFromParis.png" alt="" width="533" height="494" /></p>
<p>As this one is obviously too trivial, we want to look at the population density as in the above plot (2011 census data from <a href="http://en.wikipedia.org/wiki/List_of_French_departments_by_population" target="_blank">wikipedia</a>). Using a simple linear scale we would end up with this (useless) map, which uses a color scale that ranges from blue (small values) over white (median values) to red (large values):</p>
<p><img class="aligncenter" title="Linear Color Mapping" src="http://www.theusRus.de/Blog-files/FranceLinear.png" alt="" width="533" height="494" /></p>
<p>Except for Paris and three other departments, all regions are unpopulated compared to the capital. The extremely skewed distribution, which is shown in the lower left, explains the dilemma.</p>
<p>Using the same &#8220;trick&#8221; as in the original wiki-map, i.e., cutting off all values above 150, we get a map that is easier to read, but now equalizes all information for areas above 150.</p>
<p><img class="aligncenter" title="Large Values Cut Off" src="http://www.theusRus.de/Blog-files/FranceCutOff.png" alt="" width="533" height="494" /></p>
<p style="text-align: center;"><span style="color: #808080;">(Note, I used the histogram of log(population Density) for the legend) </span></p>
<p style="text-align: left;"><span style="color: #000000;">The result is much better now, but there seem to be too many departments put into a single class.</span></p>
<p>From the data on the log-scale, we already see what would be most desirable, i.e., a distribution of colors, which is close to a normal distribution. Using a non-continuous transformation of the variable we display, we can map the color-shades to be normal, which ends up in the following map, which I would classify as &#8220;The Good&#8221;.</p>
<p><img class="aligncenter" title="A better map for the population density in France" src="http://www.theusRus.de/Blog-files/FranceTheGood.png" alt="" width="533" height="494" /></p>
<p>We now get a fairly good feeling of which regions are highly populated, which ones are close to the median (even with a distinction of being above or below average) and also clearly see the extremely unpopulated departments.</p>
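<p>For readers who prefer code to menus, here is a minimal Python sketch of the three mappings discussed above: the cut-off at 150, the log scale, and a transformation towards normality. The maps in this post were made with Mondrian, so this is not the code behind them. In particular, the rank-based normal-scores transform is only one assumption of what such a &#8220;non-continuous transformation&#8221; could look like, and the densities are made-up numbers, not the census figures.</p>

```python
import math
from statistics import NormalDist


def cut_off(values, limit=150.0):
    """Clamp every value at `limit` (150 is the threshold named in the text)."""
    return [min(v, limit) for v in values]


def log_scale(values):
    """Natural log of each (strictly positive) density."""
    return [math.log(v) for v in values]


def normal_scores(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n stays strictly between 0 and 1, so inv_cdf is defined
        scores[idx] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores


# Made-up densities (inhabitants per square km), not the real data
densities = [15.0, 50.0, 100.0, 21000.0, 300.0]
print(cut_off(densities))
print(log_scale(densities))
print(normal_scores(densities))
```

<p>The scores depend only on the order of the values, so a single extreme department like Paris can no longer squeeze everyone else into one shade. Note that this simple version gives tied values different scores; averaging the ranks of ties would be a natural refinement.</p>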
<p>There is a lot more to say about the do&#8217;s and don&#8217;ts for drawing choropleth maps (which can be found <a href="http://www.interactivegraphics.org" target="_blank">here</a> in Chapter 6). What is even more fun is to play around yourself! Here is the <a href="http://www.theusRus.de/Blog-files/France.zip" target="_blank">data</a> (unzip and load France.txt with Mondrian) and here is the <a href="http://www.theusRus.de/Mondrian" target="_blank">software</a> &#8211; have fun!</p>
<p>(Thanks to <a href="http://www.rosuda.org/~unwin" target="_blank">Antony</a> for providing the map!)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/the-good-the-bad-22012/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>On Twisters and Killer Tornados</title>
		<link>http://www.theusrus.de/blog/on-twisters-and-killer-tornados/</link>
		<comments>http://www.theusrus.de/blog/on-twisters-and-killer-tornados/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 19:07:58 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[References]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1195</guid>
		<description><![CDATA[Given the trouble I got into after my post on the Japan earthquake, I probably should stay put when it comes to looking at data on hazardous events &#8230; More seriously, as statistician (or data analyst in general) we often lack the expertise from the domain expert, who usually collected the data. Today, in a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" title="Tornado" src="http://www.theusRus.de/Blog-files/torn.png" alt="" width="575" height="207" /></p>
<p>Given the trouble I got into after my post on the <a href="http://www.theusrus.de/blog/japan-earthquake-an-exploratory-view/" target="_blank">Japan earthquake</a>, I probably should stay put when it comes to looking at data on hazardous events &#8230;</p>
<p>More seriously, as statisticians (or data analysts in general) we often lack the expertise of the domain expert, who usually collected the data. Today, in a &#8220;data everywhere&#8221; world, we are in the fortunate position to easily access interesting data from various domains, but probably don&#8217;t know much about the background.</p>
<p>Thus I was happy to see the three posts</p>
<ul>
<li><a href="http://drjimmyc.blogspot.com/2011/12/killer-tornado-perspective.html" target="_blank">Killer tornado perspective</a></li>
<li><a href="http://drjimmyc.blogspot.com/2012/01/tornado-reports-year-in-review.html" target="_blank">Tornado reports year in review</a></li>
<li><a href="http://drjimmyc.blogspot.com/2012/01/tornado-days.html" target="_blank">Tornado days</a></li>
</ul>
<p>on Jim&#8217;s blog. As Jim has a BS from SUNYA in Atmospheric Sciences, MS from FSU in Meteorology, and a PhD from ISU in Agricultural Meteorology, I am pretty sure he knows enough about tornados to reason beyond speculations.</p>
<p>You can find the data (5.5MB) <a href="http://www.theusRus.de/Blog-files/torn.csv" target="_blank">here</a> to play around yourself, which was compiled from this <a href="http://www.spc.noaa.gov/wcm/">NOAA website</a>. If you need a tool, you might be happy to use <a href="http://www.theusRus.de/Mondrian" target="_blank">Mondrian</a>.</p>
<p>PS: Jim agreed to write a guest post in the next few weeks, so we might learn a bit more on tornados here soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/on-twisters-and-killer-tornados/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Happy Holidays &#8230;</title>
		<link>http://www.theusrus.de/blog/happy-holidays/</link>
		<comments>http://www.theusrus.de/blog/happy-holidays/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 19:42:47 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1320</guid>
		<description><![CDATA[&#8230; which is usually Merry Christmas around here and some (still too few) Happy Chanukah. I had a good laugh when I saw Andrew&#8217;s reference to this barchart. Maybe this is the right way to teach upper management the concept of uncertainty via confidence intervals, as the concept of mistrust is surely well known in [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; which is usually Merry Christmas around here and some (still too few) Happy Chanukah.</p>
<p>I had a good laugh when I saw Andrew&#8217;s <a href="http://andrewgelman.com/2011/12/suspicious-histograms/" target="_blank">reference</a> to this barchart.</p>
<p><img class="aligncenter" title="Happy Barchart" src="http://www.theusRus.de/Blog-files/HappyBars.png" alt="" width="560" height="440" /></p>
<p>Maybe this is the right way to teach upper management the concept of uncertainty via confidence intervals, as the concept of mistrust is surely well known in these circles.</p>
<p>But to leave you with a bit of Christmas feeling, here is a great version of the ancient Latin Christmas hymn &#8220;Veni, Veni Emmanuel!&#8221;. Although I am not a particular bluegrass fan, I have to admit that this version is far closer to the original intention than what many well-meaning church choirs around will deliver.</p>
<p>-Enjoy!</p>
<p><iframe src="http://www.youtube.com/embed/p9Z-4H39BCM" frameborder="0" width="560" height="315"></iframe></p>
<p>Note: The video was recorded on a Canon 5D Mark II, which gives you (given the right lenses) impressive depth of field effects &#8211; provided you have the money to own a 5D MII.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/happy-holidays/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EU Debt Crisis &#8211; What Crisis?</title>
		<link>http://www.theusrus.de/blog/eu-debt-crisis-what-crisis/</link>
		<comments>http://www.theusrus.de/blog/eu-debt-crisis-what-crisis/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 19:53:56 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[The Good & the Bad]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1314</guid>
		<description><![CDATA[Following the news and trying to understand what is going on in the &#8220;EU debt crisis&#8221; is a hard job and maybe a good visualization can help. At first sight the BBC did it. Eurozone debt web: Who owes what to whom? nicely shows the relations between the most &#8220;interesting&#8221; debtors and creditors in the [...]]]></description>
			<content:encoded><![CDATA[<p>Following the news and trying to understand what is going on in the &#8220;EU debt crisis&#8221; is a hard job and maybe a good visualization can help. At first sight the BBC did it. <a href="http://www.bbc.co.uk/news/business-15748696" target="_blank">Eurozone debt web: Who owes what to whom?</a> nicely shows the relations between the most &#8220;interesting&#8221; debtors and creditors in the EU (spiced up with the US and Japan).</p>
<p><a href="http://www.bbc.co.uk/news/business-15748696"><img class="aligncenter" title="EU Debt Web" src="http://www.theusRus.de/Blog-files/BBCEUDebtWeb.png" alt="" width="560" height="490" /></a></p>
<p>There is also a short explanation of each country&#8217;s situation in relation to its GDP to the right of the graph, but that is full of interpretations and &#8220;insights&#8221; which hardly match the figures in the graph.</p>
<p>After spinning around in the debt web, I keyed in all the <a href="http://www.theusRus.de/Blog-files/EU-Debt.txt">data</a>, and can now create the debt matrix:</p>
<p><a href="http://www.theusRus.de/Blog-files/EUDebtMatrix.png"><img class="aligncenter" title="The EU debt relations in a fluctuation diagram" src="http://www.theusRus.de/Blog-files/EUDebtMatrixSmall.png" alt="" width="551" height="582" /></a></p>
<p>I am not sure how much more I can see now, but at least I see it all at once. Surprisingly (or maybe not surprisingly) the two countries which would trouble me most are not within the Euro-Zone and don&#8217;t seem to be part of any concerns: UK and US.</p>
<p>One last graph which looks at the influence of the highlighted countries, which already called for help and thus have quite some potential of defaulting on their debts:</p>
<p><img class="aligncenter" title="Creditors sorted according to the share of troubled debt." src="http://www.theusRus.de/Blog-files/UnderPressure.png" alt="" width="250" height="378" /></p>
<p>The barchart shows creditors sorted according to the share of troubled debt &#8211; though I don&#8217;t feel enlightened enough to draw any immediate conclusion from this result &#8230; I guess the data only shows a small part of what&#8217;s going on and no matter how we visualize it, we are not really getting more insight into the crisis.</p>
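<p>The ranking behind such a barchart is easy to reproduce. Below is a small Python sketch, assuming the debt matrix is stored as a nested dictionary of creditor and debtor. The function name, the data layout and all numbers are hypothetical illustrations, not the BBC figures and not the way the plot above was produced.</p>

```python
def troubled_share(debt, troubled):
    """Rank creditors by the share of their lending owed by troubled debtors.

    debt[creditor][debtor] is the amount the debtor owes the creditor;
    troubled is the set of debtor names regarded as at risk.
    """
    shares = {}
    for creditor, loans in debt.items():
        total = sum(loans.values())
        risky = sum(amount for debtor, amount in loans.items() if debtor in troubled)
        shares[creditor] = risky / total if total else 0.0
    # Largest share of troubled debt first, as in a sorted barchart
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Invented example: two creditors A and B, two debtors X and Y
example = {"A": {"X": 10.0, "Y": 30.0}, "B": {"X": 5.0, "Y": 5.0}}
print(troubled_share(example, {"X"}))
```

<p>With these invented numbers, creditor B ends up ahead of A, because half of its lending sits with the troubled debtor, although A lent more in absolute terms. That is exactly the kind of distinction a share-based ranking makes visible and a plain total would hide.</p>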
<p>Maybe it takes another post with more / better data &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/eu-debt-crisis-what-crisis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>We know what you like &#8211; do you?</title>
		<link>http://www.theusrus.de/blog/we-know-what-you-like-do-you/</link>
		<comments>http://www.theusrus.de/blog/we-know-what-you-like-do-you/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 19:45:34 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1301</guid>
		<description><![CDATA[It&#8217;s been a while since Georgios sent me the link to this interesting &#8220;psychogram&#8221; of iOS users vs. Android users. In the first place I thought the really bad thing (but maybe also amusing thing) of this &#8220;analysis&#8221; is the fact that some sample has been pushed through some multivariate statistical procedure and generated some [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been a while since Georgios sent me the link to this interesting &#8220;psychogram&#8221; of iOS users vs. Android users.</p>
<p><a href="http://aka-img-2.h-img.com/media/img/blog/droid-vs-ios.png"><img class="aligncenter" title="Android vs iOS Users" src="http://www.theusRus.de/Blog-files/DroidVSiOS.png" alt="" width="496" height="622" /></a><br />
In the first place I thought the really bad thing (but maybe also the amusing thing) about this &#8220;analysis&#8221; is the fact that some sample has been pushed through some multivariate statistical procedure and generated some output &#8211; many opportunities for failure and no idea about significance. While this kind of &#8220;analysis&#8221; (how did you find yourself in the two worlds?) might be somewhat frightening, the really frightening thing is the site which generated the data.</p>
<p><center><br />
<iframe src="http://player.vimeo.com/video/27582879?title=0&amp;byline=0" frameborder="0" width="400" height="225"></iframe></p>
<p></center>Hunch.com is a site which gives you automated recommendations about things you (apparently) like, using some &#8220;psychogram&#8221; questions and sniffing your social network neighborhood. From a statistical or machine learning point of view the task is clear: classification and prediction; from a personal point of view it might feel a bit disconcerting. Each individual, no matter how smart or dumb, is far more nuanced than the few dimensions set up in the model hunch.com might use. In the end, hunch.com does not do this out of pure altruism: they want to sell you stuff you otherwise would not have bought, which makes them put us into categories we probably don&#8217;t fit into.</p>
<p>Statistics can be of great help in many places, but we should not actively hand over our interests to the results of some data mining algorithm.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/we-know-what-you-like-do-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Good &amp; the Bad [12/2011]</title>
		<link>http://www.theusrus.de/blog/the-good-the-bad-122011/</link>
		<comments>http://www.theusrus.de/blog/the-good-the-bad-122011/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 20:40:47 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[The Good & the Bad]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1289</guid>
		<description><![CDATA[This was not meant to be a Good &#38; Bad, but it turned out that the argument is most effective when it goes beyond pure criticism and actually offers an alternative &#8211; so we need a Good. We find this nice illustration of German energy data at the GE visualization site: This kind of visualization is [...]]]></description>
			<content:encoded><![CDATA[<p>This was not meant to be a Good &amp; Bad, but it turned out that the argument is most effective when it goes beyond pure criticism and actually offers an alternative &#8211; so we need a Good.</p>
<p>We find this nice <a href="http://visualization.geblogs.com/visualization/germanenergy/" target="_blank">illustration</a> of German energy data at the GE visualization <a href="http://visualization.geblogs.com/" target="_blank">site</a>:<br />
<a href="http://www.theusRus.de/Blog-files/GermanEnergy.png"><img class="aligncenter" title="GE Vis of German Energy Mix" src="http://www.theusRus.de/Blog-files/GermanEnergySmall.png" alt="" width="491" height="314" /></a>This kind of visualization is quite common now and had its &#8220;initial public offering&#8221; with &#8220;<a href="http://babynamewizard.com/voyager" target="_blank">The Baby Name Wizard</a>&#8221; by Martin Wattenberg. The stacked display has some issues (which can make it &#8220;a Bad&#8221;), and it takes careful construction to make sure it is easily readable (it actually &#8220;only&#8221; needs the right stacking order &#8211; if there is one). What struck me about the graphic above was the fact that none of the bands is actually aligned along some sort of straight base &#8211; typically the x-axis in a plot. As a consequence it is really hard to tell the story behind the data. Most frustratingly, the most recent data jiggles heavily, which makes judging the current trend almost impossible.</p>
<p>It took me a while to get the <a href="http://www.theusRus.de/Blog-files/energymix.txt">data</a> out of the visualization, but you can actually download the whole visualization <a href="http://www.visualizing.org/dl/35136" target="_blank">here</a>. My first attempt to understand the data better was using simple time series which I created by &#8220;misusing&#8221; a parallel coordinate plot:<br />
<a href="http://www.theusRus.de/Blog-files/GermanEnergyTS.png"><img class="aligncenter" title="German Energy Mix as Time Series" src="http://www.theusRus.de/Blog-files/GermanEnergyTSsmall.png" alt="" width="568" height="264" /></a>What we lose is the total, as the series are no longer stacked &#8211; though it was quite hard to judge the total in the original visualization as well. The barchart is used as a reference and shows the most recent distribution. What can we learn from this graph?</p>
<ol>
<li>Well, there was the oil crisis in 1973 &#8211; God knows what would have happened without the crisis stopping this ridiculous greed for oil in the early 70s.</li>
<li>The second oil crisis in 1979 actually had a real impact, as the decline in oil consumption lasted for four years and consumption has since stayed at a lower level &#8211; quite contrary to the crisis in 1973.</li>
<li>Germany abandoned half of the brown coal sources shortly after the reunification.</li>
<li>Nuclear energy stalled in 2000 and is now on a (projected) decline.</li>
<li>Renewable energy sources are the only ones with significant growth, but there is still a long way to go before they supersede oil and gas.</li>
<li>Coal is declining steadily.</li>
</ol>
<p>You can certainly read all these points off the GE visualization, but you would probably need to know these facts beforehand, which is certainly the wrong way round, as a visualization should generate insight and not visualize already existing knowledge.</p>
<p>PS: I tried to find a good stacking order, but after 30 min. of moving series up and down it looked like there is none.</p>
<p>PPS: There is a quite similar post <a href="http://www.theusrus.de/blog/the-good-and-the-bad-chicken-and-egg-problem/">here</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/the-good-the-bad-122011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding Area Based Plots: Mosaic Plots</title>
		<link>http://www.theusrus.de/blog/understanding-mosaic-plots/</link>
		<comments>http://www.theusrus.de/blog/understanding-mosaic-plots/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 20:22:30 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[Fundamentals of Graphical Data Analysis]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Stat. Graphics 101]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=679</guid>
		<description><![CDATA[Mosaic Plots are the swiss army knife of categorical data displays. Whereas bar charts are stuck in their univariate limits, mosaic plots and their variants open up the powerful visualization of multivariate categorical data. But let&#8217;s start with an introductory example. The Titanic data is still the most convincing application of mosaic plots, though many [...]]]></description>
			<content:encoded><![CDATA[<p>Mosaic Plots are the swiss army knife of categorical data displays. Whereas bar charts are stuck in their univariate limits, mosaic plots and their variants open up the powerful visualization of multivariate categorical data.</p>
<p>But let&#8217;s start with an introductory example. The <a href="http://www.theusRus.de/Blog-files/Titanic.txt" target="_blank">Titanic data</a> is still the most convincing application of mosaic plots, though many of us saw this example over and over again &#8211; I will show other examples as well once we are done with it.</p>
<p><img class="aligncenter" title="A simple 2-dim. mosaic plot" src="http://www.theusRus.de/Blog-files/Mosaic1.png" alt="" width="537" height="533" />The example above starts with a simple bar chart of passengers by class at the top left, with all surviving passengers highlighted (I guess everybody is familiar with what happened to the Titanic &#8230;). The top right plot modifies the bar chart such that we can compare the highlighted proportions, i.e., the proportionality of width and height is interchanged, without changing the highlighting direction. We call this plot a spineplot.</p>
<p>With a spineplot, we are almost there for a 2-dim. mosaic plot, shown at the bottom of the graphic above. Now we can derive the general building principle of a mosaic plot. <strong>We start with a blank rectangle and recursively split each tile according to the conditional distribution of the variable to add within that tile</strong>, e.g., we split the whole according to the distribution of class, and each class according to the second variable &#8211; in our case survived.</p>
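<p>The building principle translates almost directly into code. The following is only a minimal Python sketch, not the way Mondrian or any other tool implements it: the function name <code>mosaic</code> and the count table are made up for illustration, the class by survival counts are the ones from the widely distributed Titanic table (assumed to match the linked file), and the gaps usually drawn between tiles are left out.</p>
<pre>
```python
from collections import defaultdict

# Class-by-survival counts as in the widely distributed Titanic table
# (assumed to match the linked Titanic.txt): (class, survived) maps to passengers.
COUNTS = {
    ("1st", "yes"): 203, ("1st", "no"): 122,
    ("2nd", "yes"): 118, ("2nd", "no"): 167,
    ("3rd", "yes"): 178, ("3rd", "no"): 528,
    ("Crew", "yes"): 212, ("Crew", "no"): 673,
}


def mosaic(counts, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, prefix=()):
    """Return {cell: (x, y, width, height)} for a recursive mosaic layout.

    counts maps tuples of category levels to frequencies. At each depth the
    current tile is cut along one axis (x at even depths, y at odd ones) in
    proportion to the conditional distribution of the variable at that depth.
    """
    n_vars = len(next(iter(counts)))
    if depth == n_vars:
        return {prefix: (x, y, w, h)}

    # Totals of the current variable, restricted to the cells inside this tile.
    totals = defaultdict(int)
    for key, n in counts.items():
        if key[:depth] == prefix:
            totals[key[depth]] += n
    tile_total = sum(totals.values())

    tiles = {}
    offset = 0.0
    for level, n in totals.items():
        share = n / tile_total if tile_total else 0.0
        if depth % 2 == 0:
            # Cut along x: the widths encode the conditional shares.
            child = (x + offset * w, y, share * w, h)
        else:
            # Cut along y: the heights encode the conditional shares.
            child = (x, y + offset * h, w, share * h)
        tiles.update(mosaic(counts, *child, depth + 1, prefix + (level,)))
        offset += share
    return tiles


for cell, (cx, cy, cw, ch) in mosaic(COUNTS).items():
    print(cell, f"x={cx:.3f} y={cy:.3f} width={cw:.3f} height={ch:.3f}")
```
</pre>
<p>In this sketch the area of each tile (width times height) equals the relative frequency of its cell, e.g., 203/2201 for surviving first class passengers. Adding further variables to the keys simply adds further levels of recursion.</p>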
<p>Leaving the survival information as highlighting, we can recursively split Class by Age and Gender and get the classical Titanic mosaic plot:</p>
<p><img class="aligncenter" title="The " src="http://www.theusRus.de/Blog-files/Mosaic2.png" alt="" width="485" height="455" /></p>
<p>I guess it won&#8217;t take you long to find the &#8220;Women and Children first!&#8221; in the plot &#8230;</p>
<p>Now it is easy to see the fundamental difference to <a href="http://www.theusrus.de/blog/understanding-tree-maps/" target="_blank">tree maps</a>. Whereas in a tree map, we may split each node according to an individual criterion, the &#8220;tree&#8221; behind a mosaic plot is always fully balanced and the splits on a specific level are always according to the distribution of one fixed variable.</p>
<p>On the highest level, there are basically two general uses of mosaic plots.</p>
<ol>
<li><strong>Conditional Distributions</strong><br />
Looking at a single response (like survival in the above example) or an interaction, conditioned on (or given) a set of variables (class x age x sex)</li>
<li><strong>Structural properties of high-dim. categorical data</strong><br />
Often we need to understand the general structure of a high-dim. categorical dataset in terms of finding empty or very small combinations, the dominating classes, or trends and patterns in the data.<br />
In this case we can make use of the numerous variations of mosaic plots (see, e.g., <a href="http://www.theusrus.de/blog/parallel-sets-vs-mosaic-plots-take-i/" target="_blank">here</a> for a Multiple Barchart), which mostly drop the strict area-proportionality constraint (which we need in 1.) and move to a matrix-like layout (see <a href="http://www.springerlink.com/content/w05p79j36tm6k8q8/" target="_blank">Heike&#8217;s paper</a> for more details, or try them out in <a href="http://www.rosuda.org/Mondrian" target="_blank">Mondrian</a>. See also Alex&#8217;s <a href="http://www.warwick.ac.uk/statsdept/useR-2011/abstracts/pilhoefer.pdf" target="_blank">RMB-plots</a> as the latest contribution to this class of plots.)</li>
</ol>
<p>Let me give you two more examples of mosaic plots. The first is using longitudinal categorical data on <a href="http://www.theusRus.de/Blog-files/Respiratory.txt" target="_blank">respiratory diseases</a>.</p>
<p><img class="aligncenter" title="A longitudinal categorical dataset" src="http://www.theusRus.de/Blog-files/Mosaic3.png" alt="" width="586" height="268" /></p>
<p>For five points in time we see the different development of the disease depending on gender and kind of treatment, with highlighted cases marking patients with a &#8220;good&#8221; status. We see the highest discrimination between the treatments for t(2) for female patients and t(3) for male patients, and a decreasing effect for t(4) for both genders.</p>
<p>I will close with showing Simpson&#8217;s Paradox with the famous <a href="http://www.theusRus.de/Blog-files/Berkeley.txt" target="_blank">Berkeley admission</a> data using mosaic plots:</p>
<p><img class="aligncenter" title="Berkeley Admission Data in mosaic plots" src="http://www.theusRus.de/Blog-files/Mosaic4.png" alt="" width="554" height="665" />The mosaic plot of gender with admitted students highlighted (left) clearly shows that the proportion of admitted applicants is smaller for females than for males. If we split up by department (lower right plot), the share of admitted students is almost completely balanced for departments B-F and even higher for females in department A.</p>
<p>I leave it to the reader to find a neat verbal explanation of what is going on here (as this post is already way too long &#8230;), but so much can be said: it has to do with the proportion of females and males within the different departments.</p>
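<p>For readers who would like to check the numbers behind the two plots, here is a small Python sketch. It assumes the linked file contains the same counts as the UCBAdmissions table shipped with R, which I have typed in below; the helper names <code>rate</code> and <code>overall_rate</code> are made up for illustration.</p>
<pre>
```python
# (admitted, rejected) per department and gender, taken from the
# UCBAdmissions table shipped with R; assumed to match Berkeley.txt.
ADMISSIONS = {
    "A": {"male": (512, 313), "female": (89, 19)},
    "B": {"male": (353, 207), "female": (17, 8)},
    "C": {"male": (120, 205), "female": (202, 391)},
    "D": {"male": (138, 279), "female": (131, 244)},
    "E": {"male": (53, 138), "female": (94, 299)},
    "F": {"male": (22, 351), "female": (24, 317)},
}


def rate(admitted, rejected):
    """Share of admitted applicants."""
    return admitted / (admitted + rejected)


def overall_rate(gender):
    """Admission rate for one gender, pooled over all departments."""
    admitted = sum(dept[gender][0] for dept in ADMISSIONS.values())
    rejected = sum(dept[gender][1] for dept in ADMISSIONS.values())
    return rate(admitted, rejected)


print(f"overall: male {overall_rate('male'):.1%}, female {overall_rate('female'):.1%}")
for name, dept in ADMISSIONS.items():
    applicants = {g: sum(dept[g]) for g in ("male", "female")}
    print(f"dept {name}: male {rate(*dept['male']):.1%} of {applicants['male']}, "
          f"female {rate(*dept['female']):.1%} of {applicants['female']}")
```
</pre>
<p>Printing the number of applicants next to each rate is deliberate: comparing where most women and most men applied with how selective each department is should point towards the explanation.</p>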
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/understanding-mosaic-plots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacOSX Lion: King of OS&#8217;s GUIs</title>
		<link>http://www.theusrus.de/blog/macosx-lion-king-of-os-guis/</link>
		<comments>http://www.theusrus.de/blog/macosx-lion-king-of-os-guis/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 19:25:47 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[User Interface]]></category>

		<guid isPermaLink="false">http://www.theusrus.de/blog/?p=1257</guid>
		<description><![CDATA[Mac OS X Lion is now the 7th incarnation of Apple&#8217;s new operating system. Each of the version upgrades had minor additions to the graphical user interface (GUI). None of the increments really had a big impact on how we used the OS &#8211; at least for me, things like Exposé, Spaces or the [...]]]></description>
			<content:encoded><![CDATA[<p>Mac OS X Lion is now the 7th incarnation of Apple&#8217;s new operating system. Each of the version upgrades had minor additions to the graphical user interface (GUI). None of the increments really had a big impact on how we used the OS &#8211; at least for me, things like Exposé, Spaces or the Dashboard were functions I used once in a while, but they didn&#8217;t really add to my productivity.</p>
<p style="text-align: center;"><img class="aligncenter" title="Mac OS X Lion integrates single UI functions" src="http://www.theusRus.de/Blog-files/MacOSXLion.png" alt="" width="577" height="579" /></p>
<p>With Mission Control, we now have all things in one place, and the desired functionality is only a swipe away. I think this is a good example that we often lack only the last missing link to get to the point where the UI functions fall into place &#8211; all of the single functions were released in previous OS versions, but only now is it completely natural to use them all &#8211; and not just once in a while.</p>
<p>There is certainly the &#8220;<em>one more thing</em>&#8221; regarding UI changes in Lion: <strong>Natural Scrolling</strong>. Just search for the comments you find on the web &#8211; they range from &#8220;<em>Apple&#8217;s &#8216;natural scrolling&#8217; feels horribly unnatural. Here&#8217;s why</em>.&#8221; to &#8220;<em>Wow, Everyone&#8217;s Complaining About &#8220;Natural Scrolling&#8221; In OS X Lion</em>&#8221;. Well, to be honest, it took me a few days to adapt as well, but once you are &#8220;over it&#8221;, it just works fine (even switching back and forth between the scroll wheel on my Win PC at work and my Mac at home). It is amazing how conservative people are regarding the way they use their computer &#8211; even if it is wrong. If we did it wrong for ten years, it has to stay that way &#8230; And there is no doubt about the fact that there is really no physical metaphor behind the direction we used the scroll wheel so far &#8211; someone just programmed it this way and we used it.</p>
<p>Removing the scroll bars seems to be a comparatively small interference with the user&#8217;s expectations &#8211; yet it still has enough potential to stir users up.</p>
<p>To sum up, with Lion we see how little progress we made with UI improvements in the last decades &#8211; but if we really leap forward, we feel the resistive force in the user base &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theusrus.de/blog/macosx-lion-king-of-os-guis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

