Pretty Pictures [vs|and|or] Hard Models

It is a common theme when statisticians look at data visualization output: they ask for the model. I am usually not an unconditional friend of building models (especially before you understand the data), but I feel the need for some kind of model in order to make this visualization more than just a nice picture:

I found the chart on Junk Charts, but it was originally published on Wired. Here is what I commented on Kaiser’s blog:

Kaiser,

I think you already phrased the most important issue: “no insights”.

From a statistical point of view, we need to ask what model we expect behind the data. Are all the issues people call in about more or less equally distributed, with only the intensity changing over time? This is certainly too simple, as we already know that people are more likely to complain about noise at night.

That will lead us to a model that has certain *expected* intensities of complaints for certain times over the course of one day, estimated from a larger period of time.

To get insight into what is going on on a particular day, we would then need to plot the differences between the “model day” and the actual data.

This difference is something I keep on preaching to business people: “Don’t be surprised by the data you look at, but be surprised by the deviation of that data from your expectation!” But for an expectation you need to have at least some kind of (naive) model …

Don’t get me wrong: there is a whole lot we can already learn from the raw data, but being alerted to the unexpected would be the real insight, and that would definitely be a perfect showcase for an efficient use of graphics.
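The “model day” idea can be sketched in a few lines of Python. This is only a naive illustration under stated assumptions: the counts are made up, the data layout (a dict of counts per hour and complaint category for each day) is hypothetical, and the expectation is simply the mean over the historical days. The function names `model_day` and `deviations` are invented for this sketch.

```python
from collections import defaultdict
from statistics import mean


def model_day(history):
    """Expected count per (hour, category) slot, averaged over all days.

    `history` maps a day label to a dict of {(hour, category): count}.
    A slot missing on a given day is treated as a count of zero.
    """
    slots = set()
    for counts in history.values():
        slots.update(counts)

    per_slot = defaultdict(list)
    for counts in history.values():
        for slot in slots:
            per_slot[slot].append(counts.get(slot, 0))

    return {slot: mean(values) for slot, values in per_slot.items()}


def deviations(actual, expected):
    """Actual minus expected count for every slot seen in either input."""
    slots = set(actual) | set(expected)
    return {slot: actual.get(slot, 0) - expected.get(slot, 0) for slot in slots}


# Made-up counts, purely for illustration.
history = {
    "day1": {(23, "noise"): 10, (9, "streetlight"): 4},
    "day2": {(23, "noise"): 14, (9, "streetlight"): 6},
}
expected = model_day(history)

today = {(23, "noise"): 30, (9, "streetlight"): 5}
surprise = deviations(today, expected)
```

Plotting `surprise` rather than `today` is what would draw the eye to the unexpected: here the late-night noise complaints stand out, while the streetlight complaints match the expectation. A more careful model would also account for weekday and seasonal effects and for the variability of each slot, not just its mean.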

3 Comments

  1. Joe Mako says:

    In addition to a comparison to a “model day”, I would think the most insight would be gained by looking at how this data changes over time.

    As with many visual displays like this, they do not provide the underlying data, and that is my biggest complaint. Without access to the underlying data in addition to the commentary, I do not fully trust the visualization. A stacked area chart, or a stream graph in this case, is visually misleading, and with access to the data, additional viewpoints could be created.

  2. John says:

    I don’t understand your comment about the timing of model-building: how else do you understand data, except with a model? That is, how is it possible to understand the data without a model, and if you understand the data, why do you need a(nother) model?

  3. martin says:

    Well, much of the structure, errors and artifacts in the data won’t really show up in the model. Only with a thorough exploratory analysis will you be able to create sensible models. Much of the late work of John W. Tukey is concerned with this problem, and it is a good idea to read some of his papers on this topic.
