Saturday, October 30, 2010

Statistics, stories, and their tensions

The New York Times's Opinionator featured an article, John Allen Paulos' "Stories vs. Statistics", that caught my attention. I'll quote at length, if you'll forgive me for that.

[T]he notions of probability and statistics are not alien to storytelling. From the earliest of recorded histories there were glimmerings of these concepts, which were reflected in everyday words and stories. Consider the notions of central tendency -- average, median, mode, to name a few. They most certainly grew out of workaday activities and led to words such as (in English) “usual,” “typical,” “customary,” “most,” “standard,” “expected,” “normal,” “ordinary,” “medium,” “commonplace,” “so-so,” and so on. The same is true about the notions of statistical variation -- standard deviation, variance, and the like. Words such as “unusual,” “peculiar,” “strange,” “original,” “extreme,” “special,” “unlike,” “deviant,” “dissimilar” and “different” come to mind. It is hard to imagine even prehistoric humans not possessing some sort of rudimentary idea of the typical or of the unusual. Any situation or entity -- storms, animals, rocks -- that recurred again and again would, it seems, lead naturally to these notions. These and other fundamentally scientific concepts have in one way or another been embedded in the very idea of what a story is -- an event distinctive enough to merit retelling -- from cave paintings to “Gilgamesh” to “The Canterbury Tales,” onward.

The idea of probability itself is present in such words as “chance,” “likelihood,” “fate,” “odds,” “gods,” “fortune,” “luck,” “happenstance,” “random,” and many others. A mere acceptance of the idea of alternative possibilities almost entails some notion of probability, since some alternatives will come to be judged more likely than others. Likewise, the idea of sampling is implicit in words like “instance,” “case,” “example,” “cross-section,” “specimen” and “swatch,” and that of correlation is reflected in “connection,” “relation,” “linkage,” “conjunction,” “dependence” and the ever too ready “cause.” Even hypothesis testing and Bayesian analysis possess linguistic echoes in common phrases and ideas that are an integral part of human cognition and storytelling. With regard to informal statistics we’re a bit like Moliere’s character who was shocked to find that he’d been speaking prose his whole life.

Despite the naturalness of these notions, however, there is a tension between stories and statistics, and one under-appreciated contrast between them is simply the mindset with which we approach them. In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled. A drily named distinction from formal statistics is relevant: we’re said to commit a Type I error when we observe something that is not really there and a Type II error when we fail to observe something that is there. There is no way to always avoid both types, and we have different error thresholds in different endeavors, but the type of error people feel more comfortable with may be telling. It gives some indication of their intellectual personality type, on which side of the two cultures (or maybe two coutures) divide they’re most comfortable.

People who love to be entertained and beguiled or who particularly wish to avoid making a Type II error might be more apt to prefer stories to statistics. Those who don’t particularly like being entertained or beguiled or who fear the prospect of making a Type I error might be more apt to prefer statistics to stories. The distinction is not unrelated to that between those (61.389% of us) who view numbers in a story as providing rhetorical decoration and those who view them as providing clarifying information.
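
The trade-off Paulos describes between the two error types can be made concrete with a small sketch. Everything numeric here is an assumption chosen for illustration, not something from the article: a measurement is taken to be normally distributed around 0 when nothing is there and around 2 when an effect is real, and we "see" an effect whenever the measurement exceeds a threshold. The names `nothing_there`, `effect_there` and `error_rates` are hypothetical.

```python
from statistics import NormalDist

# Illustrative assumptions: the measurement is N(0, 1) when nothing is there
# and N(2, 1) when a real effect is present.
nothing_there = NormalDist(mu=0.0, sigma=1.0)
effect_there = NormalDist(mu=2.0, sigma=1.0)


def error_rates(threshold):
    """Return (Type I rate, Type II rate) when an effect is claimed above `threshold`."""
    type_1 = 1.0 - nothing_there.cdf(threshold)  # seeing something that is not there
    type_2 = effect_there.cdf(threshold)         # missing something that is there
    return type_1, type_2


# Raising the threshold trades Type I errors for Type II errors.
for threshold in (0.5, 1.0, 1.5, 2.0):
    t1, t2 = error_rates(threshold)
    print(f"threshold={threshold:.1f}  Type I={t1:.3f}  Type II={t2:.3f}")
```

Under these assumptions, a stricter threshold lowers the Type I rate and raises the Type II rate, so no single threshold drives both to zero. Where you set it is a statement about which mistake bothers you more.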

The so-called “conjunction fallacy” suggests another difference between stories and statistics. After reading a novel, it can sometimes seem odd to say that the characters in it don’t exist. The more details there are about them in a story, the more plausible the account often seems. More plausible, but less probable. In fact, the more details there are in a story, the less likely it is that the conjunction of all of them is true. Congressman Smith is known to be cash-strapped and lecherous. Which is more likely? Smith took a bribe from a lobbyist or Smith took a bribe from a lobbyist, has taken money before, and spends it on luxurious “fact-finding” trips with various pretty young interns. Despite the coherent story the second alternative begins to flesh out, the first alternative is more likely. For any statements, A, B, and C, the probability of A is always greater than the probability of A, B, and C together since whenever A, B, and C all occur, A occurs, but not vice versa.
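
The closing claim of that passage is easy to check by brute force. The sketch below uses made-up probabilities for the three statements about the hypothetical Congressman Smith and, purely for simplicity, assumes they are independent; the names `p_bribe`, `p_prior`, `p_trips` and `prob` are mine, not Paulos'. Strictly speaking the inequality is P(A and B and C) ≤ P(A), with equality only in the edge case where B and C are certain to hold whenever A does.

```python
from fractions import Fraction
from itertools import product

# Made-up, illustrative probabilities (assumptions, not data).
p_bribe = Fraction(3, 10)  # A: Smith took a bribe from a lobbyist
p_prior = Fraction(1, 2)   # B: Smith has taken money before
p_trips = Fraction(2, 5)   # C: Smith spends it on "fact-finding" trips


def prob(event):
    """Add up the probability of every outcome (a, b, c) for which event(a, b, c) is true.

    Independence of A, B and C is assumed only to keep the weights simple.
    """
    total = Fraction(0)
    for a, b, c in product([True, False], repeat=3):
        weight = (
            (p_bribe if a else 1 - p_bribe)
            * (p_prior if b else 1 - p_prior)
            * (p_trips if c else 1 - p_trips)
        )
        if event(a, b, c):
            total += weight
    return total


p_a_only = prob(lambda a, b, c: a)           # the bare claim
p_all = prob(lambda a, b, c: a and b and c)  # the richer, more "plausible" story

print("P(A)           =", p_a_only)
print("P(A and B and C) =", p_all)
```

Every outcome counted toward the detailed story is also counted toward the bare claim, which is why the second number can never exceed the first, whatever probabilities you plug in and whether or not the statements are independent.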

It's worth noting that Demography Matters' bloggers are concerned with bridging the gap between statistics and narratives, trying to produce narratives of what's going on in our world that are firmly based on statistics--anecdotes illuminate and animate, yes, but we try to limit anecdotes to those highly specific roles. With my specific educational background--English and Anthropology were my two majors--I may be more inclined to emphasize the power of the narrative. Stories provide useful and--I'd argue--necessary framing, very often inspiring statistical research, explaining it, and showing how and why these statistics are used. Certainly Eurabia or hyperaged societies or explanations for low fertility can't be examined--proven, disproven, whatever--without these narratives. But then, I quite agree you also need empirical evidence. Obviously.

Striking that balance can be difficult. What are your thoughts on the issue?


Greg said...

Stories are paramount. Einstein is reputed to have said something along the lines of "if you don't have a theory, you don't know what to look for": without General Relativity, the bending of light from a distant star around the sun would have been put down to instrument error - if it was noticed at all. Similar things could be said about the microbe theory of disease, the circulation of the blood, the oxygen theory of combustion, etc., etc.

Without the story, there's nothing to direct your attention.

The special thing about scientific stories is that they are framed in a way such that they can be tested. This is where the difficult, subtle and fragile art of statistics comes in -- after observation, logic and simple mathematics have had their turn.

Colin Reid said...

We do use statistics informally, but experiments have found that the way people typically process statistical information is profoundly at odds with statisticians' models of probability or of evidence-gathering. For instance, people can judge the statement 'A occurs and B occurs' as being more likely than 'A occurs'.

So in a way, the problem with statistics is not that they lie, but that our 'intuitive' thought processes are almost bound to mislead us when we hear them. Obviously, with education/training one can learn to think differently, but those who have received and taken on board this training are a small minority even in modern societies.