Choosing the right range


In the summer months, it never seems to get as cool as the daily low in the weather forecast would suggest. Those 65-degree nights always end up hotter and muggier than I expect. Thinking about this a bit, the reason is fairly obvious: the day’s lowest temperature usually occurs in the early morning hours, when virtually nobody is awake, let alone outside to experience it.

With that in mind, the whole concept of the day’s high and low temperature is somewhat flawed. Wouldn’t it make more sense to report the range for the hours when most people are actually awake? Focusing on the highs and lows for the period of, say, 6 am to midnight should cover most of the population, and would be a lot more useful for helping us decide how to dress on a given day. The forecast can still provide the overnight low for those who are curious, but without the emphasis that it receives today.

More generally, the problem here is that you’re starting with a wealth of data, and the easy choice is just to report the full extent of that data set. In the weather example, this means temperatures across the full 24-hour period. However, most people care about only a subset of that range. For this audience, summary figures based on the full scope of the variations can easily be misleading. So, the next time someone asks you to summarize a set of data (whether it’s daily temperatures, sales numbers, or airline delays), be sure that the range you’re studying is the one the audience actually cares about most.
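To make the idea concrete, here is a minimal Python sketch that compares the high and low over the full day with the high and low over waking hours only. The hourly readings are made up for illustration, and the `high_low` helper and the 6 am to midnight window are my own assumptions, not anything a real forecast service provides.

```python
# Hypothetical hourly temperatures (°F) for one summer day, indexed by hour 0-23.
# These numbers are invented purely for illustration.
hourly_temps = [
    68, 67, 66, 65, 65, 66,   # midnight to 5 am
    68, 71, 74, 77, 80, 83,   # 6 am to 11 am
    85, 87, 88, 88, 87, 85,   # noon to 5 pm
    82, 79, 76, 74, 72, 70,   # 6 pm to 11 pm
]


def high_low(temps, start_hour=0, end_hour=24):
    """Return (high, low) for the hours in [start_hour, end_hour)."""
    window = temps[start_hour:end_hour]
    return max(window), min(window)


# Summary over the full 24-hour data set.
full_day = high_low(hourly_temps)

# Summary over the assumed waking window, 6 am to midnight.
waking = high_low(hourly_temps, start_hour=6, end_hour=24)

print("Full 24 hours:   ", full_day)
print("6 am to midnight:", waking)
```

With these made-up readings the high is the same either way, but the reported low moves from 65 to 68 once the overnight hours are excluded. The same pattern applies to any data set: pick the window first, then summarize.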