fivethirtyeight.com’s forecast bothers me – now I know why
It’s the era of “big data” (humans have always used all the data at their disposal, but that’s another blog post), and I think I have found a big problem with one of big data’s biggest reporting hubs: 538.
They are cherry-picking the polls they use.
fivethirtyeight.com is Nate Silver’s site, and it does a pretty good job of reporting and analyzing news from a “data science” perspective. Again, journalism has always been about ‘data’, but I digress.
538’s “Election Forecast” is a well-made site that gives users three different ‘views’ of 538’s projections, depending on how each one is made: http://projects.fivethirtyeight.com/2016-election-forecast/florida/
Here are the categories:
- Polls Plus – 538’s proprietary model, which weights (adjusts) poll results according to factors 538 chose, like how a state voted historically or how well 538 rates the pollster. More relevant factors get more ‘weight’, meaning they have a bigger influence on the final projection.
- Polls Only – Just a poll of polls, no ‘weight’ from 538 data analysts
- Now-cast – Highly chaotic, based on flash polls and other sources of quick polling
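To show what ‘weight’ does to a poll of polls, here is a minimal sketch comparing a plain average with a weighted average. The poll labels, margins and weights are all hypothetical, made up for illustration; 538’s actual model is proprietary and far more involved than this.

```python
# Hypothetical polls for one state: (label, margin, weight).
# Margin is in points, positive = Trump lead. All values are made up
# for illustration and are NOT 538's data or weighting method.
polls = [
    ("Poll A", 2.0, 1.0),
    ("Poll B", -1.0, 0.5),
    ("Poll C", 4.0, 1.5),
]

def simple_average(polls):
    """Unweighted 'poll of polls': every poll counts equally."""
    return sum(margin for _, margin, _ in polls) / len(polls)

def weighted_average(polls):
    """Each poll's margin counts in proportion to its weight."""
    total_weight = sum(weight for _, _, weight in polls)
    return sum(margin * weight for _, margin, weight in polls) / total_weight

print(f"Simple average:   {simple_average(polls):+.2f}")
print(f"Weighted average: {weighted_average(polls):+.2f}")
```

With these made-up numbers the plain average is about Trump +1.67, while giving Poll C the most weight pulls the result to Trump +2.50. Same polls, different answer, purely because of how someone chose the weights.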
How does 538 decide what polls to include in their ‘poll of polls’?
That’s the question, of course. Big Data nerds will never admit this, but no matter what, *the data analyst* makes subjective decisions about *how* to analyze the data. It’s far from unbiased. Numbers don’t lie, but people interpreting the numbers are apt to ‘lie’ or screw up just like any other human.
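Even the simplest choice, whether to include a poll at all, moves the result. Here is a rough sketch using hypothetical margins for four existing polls plus one new poll showing Trump +6; none of these numbers come from 538.

```python
# Hypothetical margins already in the average (points, positive = Trump lead).
existing_margins = [1.0, -2.0, 0.0, 3.0]
# A single new poll showing Trump +6 (hypothetical inclusion decision).
new_poll_margin = 6.0

def average(margins):
    """Plain, unweighted average of poll margins."""
    return sum(margins) / len(margins)

without_new = average(existing_margins)
with_new = average(existing_margins + [new_poll_margin])

print(f"Without the new poll: {without_new:+.2f}")
print(f"With the new poll:    {with_new:+.2f}")
print(f"Shift from one inclusion decision: {with_new - without_new:+.2f}")
```

Adding one poll moves this toy average from Trump +0.5 to Trump +1.6. Give that poll a high weight and the shift gets even bigger.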
So why do I think 538 might be cherry-picking polls?
In the pic above, a highly weighted poll by Siena College showing Trump at +6 is listed as “new”.
However, browse over to the Siena College site and you’ll see they have been polling since at least March. Here’s the full list of their polls: https://www.siena.edu/news-events/news-archive/category/sri-political
Why start including Siena College now and not before?
I’m 100% sure they have an answer, and there’s a good chance it actually explains their choice here.
However, this is a lesson in understanding polls. They are complex, but they can be understood, and their flaws are rooted in the same thing all system flaws are: human choice.