Statistical Graphics and more

Understanding Area Based Plots: Tree Maps

Posted on 05/31/2010, 21:49, by martin, under Fundamentals of Graphical Data Analysis, General, Stat. Graphics 101.

Tree layouts are not uncommon in statistics. CART is built upon tree hierarchies, and random forests use these trees extensively. Area-based plots like bar charts or histograms are also well understood by most statisticians. But when it comes to joining the two concepts, which yields treemaps, many statisticians get somewhat lost. One reason is that competing concepts like mosaic plots and trellis displays (aka lattice graphics) tend to get mixed up with treemaps.
This is the first in a series of three posts, one on each of the three concepts, starting with

Tree Maps:

In short, a treemap is nothing but an alternative display of a tree hierarchy. The primary reference for treemaps is Ben Shneiderman’s technical report. Whereas the classical tree usually “just” shows the hierarchy, a treemap usually also encodes a quantitative attribute that is attached to the leaf nodes (and adds up toward the root of the tree).
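The "adds up toward the root" part can be illustrated with a minimal sketch. The nested dictionary below is a made-up hierarchy (not the tree from this post), and `node_size` is a hypothetical helper name: leaves carry the quantitative attribute, and every inner node gets the sum of its children.

```python
# Hypothetical hierarchy: inner nodes are dicts, leaves are numbers (their sizes).
tree = {
    "left": {"a": 6, "b": 6},
    "right": {"c": 4, "d": {"e": 3, "f": 2}},
}


def node_size(node):
    """Size of a node: a leaf's own value, or the sum over all its children."""
    if isinstance(node, dict):
        return sum(node_size(child) for child in node.values())
    return node


# The size of the root is the total of all leaf sizes.
total = node_size(tree)
```

In a treemap, the area of each tile is then made proportional to this size.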

Here is an example of a classification tree:

This tree has 10 terminal nodes, and each node has a size proportional to the number of cases that fall into this category. The corresponding treemap looks like this:

Note that in this tree all splits are binary, which means that in the corresponding treemap the splits always alternate between vertical and horizontal.
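The alternating layout can be sketched in a few lines. This is only an illustration under my own assumptions (nested dictionaries as the tree, hypothetical function names, vertical cuts at even depth), not the code that produced the figures here.

```python
def node_size(node):
    """A leaf's own value, or the sum over the children of an inner node."""
    if isinstance(node, dict):
        return sum(node_size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, depth=0, name="root", out=None):
    """Return (name, x, y, w, h) tiles for all leaves of `node`.

    The split direction alternates with the depth in the tree:
    even depth cuts vertically (children side by side),
    odd depth cuts horizontally (children stacked).
    """
    if out is None:
        out = []
    if not isinstance(node, dict):
        out.append((name, x, y, w, h))
        return out
    total = node_size(node)
    offset = 0.0  # fraction of the parent tile already used up
    for child_name, child in node.items():
        frac = node_size(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h,
                           depth + 1, child_name, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h,
                           depth + 1, child_name, out)
        offset += frac
    return out


# A tiny binary tree (made up for illustration) in a 4 x 2 rectangle.
tiles = slice_and_dice({"L": {"a": 1, "b": 1}, "R": 2}, 0.0, 0.0, 4.0, 2.0)
```

Each tile's area is proportional to the leaf size, and the direction of a cut tells you the depth of the split in the tree.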

Squarified Tree Maps:

When treemaps are used to visualize a recursive partitioning, the number of splits and terminal nodes will be relatively small. But with arbitrary hierarchies, we will end up with potentially very many splits and/or terminal nodes. As all nodes on the same level are split along the same direction, aspect ratios will get extreme. The following sketch shows the problem for 7 terminal nodes of size 6, 6, 4, 3, 2, 2, 1.
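The problem can also be checked numerically. The sketch below assumes a 6 by 4 rectangle (the post does not state the dimensions; I chose them so that the area matches the total size of 24) and cuts all seven nodes along the same direction. The function name is hypothetical.

```python
sizes = [6, 6, 4, 3, 2, 2, 1]


def strip_aspect_ratios(sizes, width, height):
    """Aspect ratio (long side / short side) of each tile when all tiles
    are cut next to each other along the width of the rectangle."""
    total = sum(sizes)
    ratios = []
    for s in sizes:
        w = width * s / total  # every tile keeps the full height
        ratios.append(max(w / height, height / w))
    return ratios


# Assumed rectangle: 6 wide, 4 high.
ratios = strip_aspect_ratios(sizes, 6.0, 4.0)
worst_ratio = max(ratios)  # the node of size 1 is a 0.25 x 4 sliver
```

Under these assumptions the smallest node ends up 16 times as high as it is wide, which is hard to compare visually with the other tiles.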

The classical layout yields extreme aspect ratios, which is why Bruls et al. introduced the idea of “squarification”. At first sight the idea is convincing, as the resulting tiles are far easier to perceive. There is a drawback, though: in the classical layout the switch between horizontal and vertical splits discriminates between hierarchy levels, which is not the case for squarified treemaps. In the above figure all nodes are on the same level, yet interpreting each change between horizontal and vertical splits as a change of level would result in the following hierarchy:

duality of treemaps and tree hierarchies

Thus we need some way to distinguish between “real” splits, which define hierarchies, and “convenience” splits, which are used to improve the aspect ratios of the tiles.
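To make the “convenience” splits concrete, here is a minimal sketch of a greedy squarification in the spirit of Bruls et al. It is my own simplified reading of the idea, not their reference implementation: function names are hypothetical, the 6 by 4 rectangle is an assumption, and the areas are expected to be sorted in decreasing order and to sum to the area of the rectangle.

```python
def worst(row, side):
    """Worst aspect ratio in `row` (a list of areas) when the row is
    laid out along a strip whose length is `side`."""
    s = sum(row)
    return max(max(side * side * r / (s * s), s * s / (side * side * r))
               for r in row)


def layout_row(row, x, y, w, h, tiles):
    """Place `row` along the short side of the free rectangle, append the
    tiles as (x, y, w, h), and return the remaining free rectangle."""
    s = sum(row)
    if w >= h:  # short side is the height: stack the row as a column on the left
        thickness = s / h
        yy = y
        for r in row:
            tiles.append((x, yy, thickness, r / thickness))
            yy += r / thickness
        return x + thickness, y, w - thickness, h
    thickness = s / w  # short side is the width: lay the row across the top
    xx = x
    for r in row:
        tiles.append((xx, y, r / thickness, thickness))
        xx += r / thickness
    return x, y + thickness, w, h - thickness


def squarify(areas, x, y, w, h):
    """Greedy layout: keep adding areas to the current row as long as the
    worst aspect ratio in the row does not get worse."""
    tiles, row, rest = [], [], list(areas)
    while rest:
        side = min(w, h)
        if not row or worst(row + [rest[0]], side) <= worst(row, side):
            row.append(rest.pop(0))
        else:
            x, y, w, h = layout_row(row, x, y, w, h, tiles)
            row = []
    if row:
        layout_row(row, x, y, w, h, tiles)
    return tiles


tiles = squarify([6, 6, 4, 3, 2, 2, 1], 0.0, 0.0, 6.0, 4.0)
worst_ratio = max(max(tw / th, th / tw) for (_, _, tw, th) in tiles)
```

For these seven nodes the worst aspect ratio stays below 3. Note, however, that the direction of the cuts changes several times although all nodes are siblings: every one of these changes is a “convenience” split, and nothing in the resulting tiles marks it as such.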

Cushioned Tree Maps:

One way to achieve this distinction would be to use thicker lines for “real” splits. The more popular version is to use so-called cushioned treemaps. Here is an example, depicting parts of a file system (actually the files of my talks since 1994):

It looks pretty fancy, but only partly cures the problem.

The bottom line is that if you ever have a hard time understanding a treemap, it is most probably due to squarification, which does not properly distinguish between hierarchy splits and squarification splits.

