Wednesday, April 13, 2005

Lying with Numbers

If at first it doesn't fit, fit, fit again.
--John McPhee

Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.
--Aaron Levenstein

There are two kinds of statistics, the kind you look up and the kind you make up.
--Rex Stout

If you can't tell by now... this post is going to be about statistics. As part of my job I review the medical literature and critically evaluate articles, including their statistical methods and analysis. I've known for a long time about the article discussed here at Slate. The Slate article does a really good job of describing the errors in the Lancet study that tried to estimate the number of Iraqi civilians killed since the start of the war. The author is tough on the Lancet authors, but here and there he also cuts them some slack. Go read the Slate article, then come back to talk about confidence intervals. I'll wait... haven't you heard? I'm just up here in Rhode Island killing time anyway.

Here's a point I wanted to expand upon:
The report's authors derive this figure [100,000] by estimating how many Iraqis died in a 14-month period before the U.S. invasion, conducting surveys on how many died in a similar period after the invasion began (more on those surveys later), and subtracting the difference. That difference—the number of "extra" deaths in the post-invasion period—signifies the war's toll. That number is 98,000. But read the passage that cites the calculation more fully:

We estimate there were 98,000 extra deaths (95% CI 8000-194 000) during the post-war period.

Readers who are accustomed to perusing statistical documents know what the set of numbers in the parentheses means. For the other 99.9 percent of you, I'll spell it out in plain English—which, disturbingly, the study never does. It means that the authors are 95 percent confident that the war-caused deaths totaled some number between 8,000 and 194,000. (The number cited in plain language—98,000—is roughly at the halfway point in this absurdly vast range.)

I did not particularly like the way he defined a confidence interval. The way I would have described it to my students is this: if the survey were repeated many times and an interval computed the same way each time, about 95% of those intervals would contain the true value [i.e., the number of dead Iraqis]. For this study, that interval runs from 8,000 to 194,000. Here's the kicker: the values in that range are not all equally plausible, and the ones near the 98,000 estimate fit the data best, but with an interval this wide the data cannot rule out a toll anywhere between 8,000 and 194,000. To hand the press the point estimate as the number, without the range around it, is so dishonest for a supposed scientist... it is truly beyond the pale.
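For anyone who wants to see what "95% confident" means in practice, here is a minimal sketch in Python. It is not the Lancet authors' method. It assumes made-up, normally distributed data with a known true mean, and it uses the textbook normal approximation (z = 1.96) for the interval. The function names and all the parameter values are my own, chosen only for illustration.

```python
import random
import statistics


def mean_ci(sample, z=1.96):
    """Normal-approximation 95% interval for the mean of `sample`."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean
    return m - z * se, m + z * se


def coverage(true_mean=10.0, sd=5.0, n=50, trials=2000, seed=0):
    """Fraction of simulated studies whose interval contains the true mean.

    All parameters are hypothetical; they only set up the toy simulation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated "study": draw n observations and build its interval.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = mean_ci(sample)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

Run it and the printed fraction should land close to 0.95: about 95 out of every 100 simulated studies produce an interval that captures the true value. Any single interval either contains the truth or it doesn't, and you never get to know which.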

Now to continue with the definition of confidence intervals:
The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter (see precision). A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.
Now that we all know the true meaning of a confidence interval, I'm sure we can all agree that this study is complete and total crap.
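
To put the quoted point about interval width into numbers, here is a rough Python sketch. The first function just does arithmetic on the reported bounds (8,000 and 194,000). The "implied standard error" it returns is only meaningful if you assume a symmetric normal-approximation interval, which is my assumption and not something the study states. The second function uses the textbook formula for the half-width of an interval around a mean, with a hypothetical standard deviation, to show how slowly the width shrinks as data is added.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the normal distribution (approximation)


def summarize_interval(lower, upper, z=Z_95):
    """Basic arithmetic on a reported interval."""
    half_width = (upper - lower) / 2
    return {
        "midpoint": (lower + upper) / 2,
        "half_width": half_width,
        # Assumption: symmetric normal-approximation interval.
        "implied_se": half_width / z,
        "upper_to_lower_ratio": upper / lower,
    }


def half_width_for_n(sd, n, z=Z_95):
    """Half-width of a normal-approximation interval for a mean of n observations."""
    return z * sd / math.sqrt(n)


if __name__ == "__main__":
    print(summarize_interval(8_000, 194_000))
    # Hypothetical sd of 10: quadrupling the sample only halves the width.
    print(half_width_for_n(10.0, 100), half_width_for_n(10.0, 400))
```

The upper bound is more than 24 times the lower bound, and the exact midpoint of the range is 101,000, so the 98,000 figure is only roughly the halfway point. The second function is why "collect more data" is expensive advice: under these assumptions you need four times the sample to cut the interval width in half.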

I had a very, very low opinion of the Lancet before this article, and now... maybe they can go and work with the United Nations to help them cook their books. Oh wait... That's already been uncovered as well.

Hat tip to Beautiful Atrocities for the reminder.