Thursday, October 6, 2016

Margin of Error

There are a lot of “what ifs” in political polling. For those on either side of the aisle who are smug about their likelihood of winning whichever public office is being contested based on poll results, perhaps there is a Dewey vs. Truman reality they need to face (a 1948 Chicago Tribune headline incorrectly announced Thomas Dewey as the presidential winner). Do folks in some communities and some demographics actually tell the truth when asked whom they support? In some strata, where Hillary generally prevails, telling anyone you are a Trump supporter would make you an instant pariah, and as I have blogged, there are regions where the words “Hillary Clinton” are the functional equivalent of a four-letter vituperation. And there are lots of additional variables that can make a very big difference.
In an Op-Ed piece in the October 5th New York Times, David Rothschild (a Microsoft Research economist) and Sharad Goel (a Stanford assistant professor) examine statistical realities in the American political arena. The general assumption is that such polling has an expected error rate of plus or minus three percentage points. Rothschild and Goel analyze what that assumption really means: “As anyone who follows election polling can tell you, when you survey 1,000 people, the margin of error is plus or minus three percentage points. This roughly means that 95 percent of the time, the survey estimate should be within three percentage points of the true answer.
“If 54 percent of people support Hillary Clinton, the survey estimate might be as high as 57 percent or as low as 51 percent, but it is unlikely to be 49 percent. This truism of modern polling, heralded as one of the great success stories of statistics, is included in textbooks and taught in college classes, including our own.” Except, Rothschild and Goel observe, when you really drill down on the statistical history of political polling accuracy, that relatively small three-percentage-point variation may actually be a whole lot more like a whopping seven-percentage-point differential. They cite their recent, highly credible research to sustain that higher number:
“In a new paper with Andrew Gelman [professor of statistics and political science and director of the Applied Statistics Center at Columbia University] and Houshmand Shirani-Mehr [a Stanford graduate student], we examined 4,221 late-campaign polls — every public poll we could find — for 608 state-level presidential, Senate and governor’s races between 1998 and 2014. Comparing those polls’ results with actual electoral results, we find the historical margin of error is plus or minus six to seven percentage points. (Yes, that’s an error range of 12 to 14 points, not the typically reported 6 or 7.)
“What explains this big gap between the stated and observed error rates? All the polls we analyzed were conducted during the final three weeks of the campaign, by which time any changes in aggregate public opinion were very small, so the disparity is not simply driven by changes in sentiment as an election progresses.
“Let’s start with the stated margin of error, which captures sampling variation: error that occurs because surveys are based on only a subset of the full population of likely voters. Even if this sample of respondents is selected randomly from the full population, it is not a perfect representation of attitudes in the full population. That’s where the usual 3 percent error rate comes from.
“But the stated margin of error misses other important forms of error. Frame error occurs when there is a mismatch between the people who are possibly included in the poll (the sampling frame) and the true target population.
“For example, for phone-based surveys, people without phones would never be included in any sample. Of particular import for election surveys, the sampling frame includes many adults who are not likely to vote. Pollsters try to correct for this by using likely-voter screens — typically asking respondents if they will vote — but this screen itself can introduce error that can at times be larger than the bias it was intended to correct.
“And then there is nonresponse error, when the likelihood of responding to a survey is systematically related to how one would have answered the survey. For example, as another one of our papers shows, supporters of the trailing candidate are less likely to respond to surveys, biasing the result in favor of the more popular politician.
“A similar effect probably explains part of Mrs. Clinton’s recent dip in the polls, as Democrats became less enthusiastic about answering surveys when she appeared to be struggling. With nonresponse rates exceeding 90 percent for election surveys, this is a growing concern.
“Finally, there is error in the analysis phase. In one example, as Nate Cohn showed in an Upshot article, four pollsters arrived at different estimates even when starting from the same raw polling data.”
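For readers curious where the familiar “plus or minus three points” actually comes from, it is just the textbook formula for sampling variation in a simple random sample, the first (and only) source of error the stated margin captures. A minimal sketch in Python (the function name and defaults are my own, not the authors’); note that it says nothing about frame error, nonresponse error, or analysis error, which is exactly the op-ed’s point:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error from sampling variation alone.

    n: sample size
    p: assumed true proportion (0.5 gives the widest, most conservative margin)
    z: normal critical value for 95% confidence
    """
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll: roughly 0.031, i.e. about plus or minus 3 points.
print(round(margin_of_error(1000) * 100, 1))
```

To shrink that stated margin from three points to one would take roughly nine times the sample size (about 9,000 respondents), which is why pollsters settle for 1,000; and none of that extra effort would touch the frame, nonresponse, and analysis errors that push the observed historical margin toward six or seven points.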
So what’s the lesson here? If you think your candidate is so far ahead in the polls that your vote doesn’t matter… think again. VOTE!
I’m Peter Dekom, and conventional wisdom is not always reality… and so ask questions, lots and lots of questions.
