September 8, 2004

Pre-emptive Strike: Anticipating the Counterarguments to Hausmann and Rigobon

I've been reading and re-reading Hausmann and Rigobon's piece, trying to find holes in it, and I'm finding it hard. The most obvious counterargument doesn't seem to work... and the conspiracy theories needed to explain away their findings seem particularly far-fetched.

Obvious Counterargument 1: JIJO - Junk In, Junk Out.
Chavistas have held that the opposition-run exit polls were very badly conducted, and that it's a mistake to take them seriously. The first part of the Hausmann/Rigobon analysis is based partly on exit-poll results. And as everyone knows, if you start with junk data, you end up with junk results. JIJO.

You have to work through the logic of Hausmann and Rigobon's argument to see off this challenge. What their paper argues is emphatically not that the exit-poll results fail to match the election results, but rather that the error in the exit polls is positively correlated with the error in the Nov. 2003 signature collection drive. Since these two estimators are independent of one another - their sources of error are unrelated - the only way to account for a positive correlation in their errors is fraud.

What the hell does all of that mean?

Well, start with the Nov. 2003 signature count. In a sense, the signatures are predictors of the number of SI votes in a subsequent election. If, for instance, you collected 1000 signatures at a signing center in November, you would expect a similar number of SI votes at that voting center this year. The correlation will not be perfect, but it's something: you would not expect to get 200 SI votes in such a center, and neither would you expect 2000.

Then take the exit polls. Again, you would expect them to be correlated - imperfectly - with the ultimate election results. There will be error, as there always is with any estimator, but the poll gives you an idea of voting intentions.

Now, what Hausmann and Rigobon have done is take these two estimators and analyze them together. And what they find is quite interesting: in some voting centers, the two estimators work quite well. But in other centers, the estimators are wrong - they predict more SI votes than CNE eventually reported. What's more, in a given center, when one estimator is wrong, the other one also tends to get that center wrong - and both in the same direction, of overestimating the number of SI votes.

Or, in their words, "in those places where the signatures are proportionally more wrong in predicting more Si votes than those obtained, the exit polls do the same."

So, say CNE reports only 800 SI votes in our hypothetical voting center that had 1000 signatures back in November. Well, OK, that might be possible: maybe 200 voters there decided to become chavistas. But imagine that the Sumate or PJ exit poll also estimates 1000 SI votes for that center, or 950, or 1100. Then you have a strange situation: it's not immediately clear why both estimators would make the same mistake in the same direction. "Because both measures are independent," they explain, "what they have in common is the fraud."
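For readers who like to see this kind of thing run, here's a toy simulation of the argument. To be clear, this is my own sketch, not Hausmann and Rigobon's model, and every number in it is made up. It assumes that both the November signatures and the exit poll are unbiased but noisy readings of the true SI vote in each center, with noise that is independent across the two, and that in a hypothetical share of centers (`fraud_share`) between 10% and 30% of the SI votes go missing from the reported count. Each estimator's "error" is its prediction minus the reported count.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_centers(n_centers=2000, fraud_share=0.0, seed=1):
    """Hypothetical toy model (all numbers invented). Returns the correlation
    between signature errors and exit-poll errors, plus the mean of each error."""
    rng = random.Random(seed)
    sig_err, poll_err = [], []
    for _ in range(n_centers):
        true_si = rng.randint(300, 1500)          # SI votes actually cast
        signatures = true_si + rng.gauss(0, 80)   # noisy predictor no. 1
        exit_poll = true_si + rng.gauss(0, 80)    # noisy predictor no. 2, independent noise
        # Assumed fraud: in some centers, 10-30% of SI votes go missing from the count
        if rng.random() < fraud_share:
            missing = rng.uniform(0.1, 0.3) * true_si
        else:
            missing = 0.0
        reported = true_si - missing
        # "Error" = prediction minus what was reported
        sig_err.append(signatures - reported)
        poll_err.append(exit_poll - reported)
    return (pearson(sig_err, poll_err),
            sum(sig_err) / n_centers,
            sum(poll_err) / n_centers)

for share in (0.0, 0.3):
    r, mean_sig, mean_poll = simulate_centers(fraud_share=share)
    print(f"fraud_share={share}: corr={r:.2f}, mean errors {mean_sig:.0f} / {mean_poll:.0f}")
```

With `fraud_share=0.0` the two error series come out essentially uncorrelated. Once some centers lose votes, the same missing-vote term lands in both errors at once, so the correlation turns clearly positive and both estimators overshoot the reported SI count on average. In this toy world, that shared term is what "what they have in common is the fraud" is pointing at.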

Of course, if this had happened in just one or two voting centers, you could put it down as a fluke. But Hausmann and Rigobon find it happening systematically across a whole swathe of voting centers.

They peg the chances of this pattern stemming from a random coincidence at less than 1%.
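What does "less than 1%" mean in practice? One standard way to put a number on "could this be a coincidence?" is a permutation test: shuffle one set of errors across centers, recompute the correlation many times, and see how often chance alone does as well as the real pairing. I don't know that this is the exact test Hausmann and Rigobon ran; it's only an illustration of the idea, and it runs on synthetic data (a made-up shared "missing votes" term in 30 of 100 hypothetical centers), not on any real figures.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(sig_err, poll_err, n_perm=2000, seed=0):
    """One-sided permutation test: how often does a random re-pairing of the
    two error lists give a correlation at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = pearson(sig_err, poll_err)
    shuffled = list(poll_err)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between a center's two errors
        if pearson(sig_err, shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic, made-up data: a shared "missing votes" term in 30 of 100
# hypothetical centers, plus independent noise in each estimator's error.
rng = random.Random(42)
shared = [rng.uniform(50, 200) if i < 30 else 0.0 for i in range(100)]
sig_err = [s + rng.gauss(0, 60) for s in shared]
poll_err = [s + rng.gauss(0, 60) for s in shared]

r, p = permutation_p_value(sig_err, poll_err)
print(f"observed correlation {r:.2f}, permutation p-value {p:.4f}")
```

If the two error lists really were unrelated, shuffling them would make no systematic difference and the p-value would typically be large; a tiny p-value says the real pairing is doing something random pairings almost never do.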

Now, what this means is that if the Sumate and PJ exit polls are wrong, they're wrong in a really funny and strange way. They're more wrong for some voting centers than for others. You could say that Sumate and PJ just happened to have some competent exit pollsters and some incompetent ones, so it makes sense that some got the results more wrong than others. But what you can't explain with that argument is why those incompetent pollsters ended up being sent specifically to the voting centers where the number of SI votes also happened to be lower than one would expect from the November 2003 signature patterns.
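The incompetent-pollster story can be checked the same way. In this sketch (again my own toy model with invented numbers, and a hypothetical function name) the count is clean, and a share of centers gets a sloppy pollster whose estimate is both much noisier and biased 15% toward SI. The key assumption is that pollsters are assigned without regard to the signature errors.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def incompetent_pollster_scenario(n_centers=2000, share_sloppy=0.4, seed=7):
    """Hypothetical toy model: clean count, but some centers get a sloppy,
    SI-leaning pollster. Returns the correlation of the two error series."""
    rng = random.Random(seed)
    sig_err, poll_err = [], []
    for _ in range(n_centers):
        true_si = rng.randint(300, 1500)
        reported = true_si                       # clean count: nothing goes missing
        signatures = true_si + rng.gauss(0, 80)
        if rng.random() < share_sloppy:
            # sloppy pollster: overstates SI by 15% and is very noisy
            exit_poll = 1.15 * true_si + rng.gauss(0, 200)
        else:
            exit_poll = true_si + rng.gauss(0, 60)
        sig_err.append(signatures - reported)
        poll_err.append(exit_poll - reported)
    return pearson(sig_err, poll_err)

print(f"sloppy pollsters, clean count: corr={incompetent_pollster_scenario():.3f}")
```

In this setup the exit-poll errors swing wildly from center to center, yet their correlation with the signature errors stays near zero. Sloppy pollsters make the poll worse, but they can't make it wrong in the same centers where the signatures are wrong unless something links the two.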

I can't see any easy way for CNE to explain this situation, other than by asserting a wildly implausible conspiracy stretching all the way back to November 2003, in which the opposition selectively manipulated both the signature counts and the exit-poll results for some voting centers but not others, just so that, many months later, two fancy US-based professors could keep the reasonable-doubt hypothesis going. Now THAT'S UFO stuff!

So I can't see how JIJO applies to this analysis: what's at stake here is neither whether the Nov. 2003 signature collection was clean and valid nor whether the exit polls were well conducted. The mystery in need of explanation is why both of these estimators would be more wrong in some voting centers than in others, why both should coincidentally be more wrong in the same set of voting centers, and why they should both be wrong in the same direction: predicting more SI votes than CNE eventually reported.

Hausmann and Rigobon can think of one neat, parsimonious little hypothesis to explain all of these questions: fraud.

If you can come up with a better one, do let us know...

Note: Statistics is not my forte, so it's perfectly possible I've screwed something up in this post. Don't be shy about letting me know if I did, so I can correct it.
