Is data science doomed with Trump being elected?

Ever since Trump won the election, the question I have faced most from family and friends is “is predictive analytics dead?” I was also asked whether Watson would have picked the correct winner. The more savvy doubts were about how Clinton missed the trends in places like Wisconsin and Michigan.

Here are my thoughts, and please treat them as my personal opinions only, as always!
To begin with, the analytics was not all wrong; it did many things right. It also did many things wrong. Rather than saying data science is dead, I think the picture is just cloudy, and some work needs to be done to make it less cloudy.
The thing we forget most about data science is that it is all about odds. When Nate Silver said Trump had a 35% chance of winning, he meant exactly that! Clinton having roughly a 2/3 chance of winning should not have been read as “Clinton will win”. I face this problem every day with my clients too, across all kinds of predictive scenarios. In most cases it is not as binary as we would like it to be.
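To make the “it’s all about odds” point concrete, here is a minimal Python sketch that simulates many independent elections in which the trailing candidate has the 35% chance mentioned above. The function name, trial count and seed are illustrative assumptions, not anything a real forecaster ran.

```python
import random


def simulate_upsets(p_underdog: float, trials: int, seed: int = 0) -> float:
    """Return the fraction of simulated elections won by the candidate
    who was given probability p_underdog of winning."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    wins = sum(1 for _ in range(trials) if rng.random() < p_underdog)
    return wins / trials


share = simulate_upsets(0.35, trials=100_000)
print(f"The 35% candidate won {share:.1%} of simulated elections")
```

Roughly one run in three ends with the “unlikely” winner, which is exactly what a 35% forecast promises; a single upset does not, by itself, prove the forecast was wrong.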

That said, the predictive models had all given significant odds to Clinton, and we now know something was wrong with them. So yes, data science applied to politics should absolutely take a significant share of the blame for what it missed.

For one, all analytics about people are hard. I wrote about it a few weeks ago here.

Models are based on history, plus assumptions that give them context. In this business it’s not uncommon for calibration to go out of whack, usually because the context changes while the model keeps depending on old assumptions. Since all public analysis of this election trended the same way, I think we can safely say that “establishment thinking” about polls needs an overhaul.
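One common way to spot calibration going out of whack is a calibration table: group forecasts into probability bins and compare the average forecast in each bin with how often the event actually happened. The sketch below is hypothetical: `simulate_drift` fakes a model whose real-world outcome rates have shifted by a fixed `drift` after the context changed, purely to show what the table looks like when old assumptions stop holding.

```python
import random


def calibration_table(forecasts, outcomes, n_bins=5):
    """Group forecasts into equal-width probability bins and return
    (mean forecast, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table


def simulate_drift(n, drift, seed=0):
    """Hypothetical model: it forecasts p, but after the context changed
    the true probability is p + drift (clipped to [0, 1])."""
    rng = random.Random(seed)
    forecasts, outcomes = [], []
    for _ in range(n):
        p = rng.random()
        true_p = min(max(p + drift, 0.0), 1.0)
        forecasts.append(p)
        outcomes.append(1 if rng.random() < true_p else 0)
    return forecasts, outcomes


for drift in (0.0, 0.1):
    print(f"drift = {drift}")
    for mean_p, rate, count in calibration_table(*simulate_drift(50_000, drift)):
        print(f"  forecast {mean_p:.2f}  observed {rate:.2f}  (n={count})")
```

With no drift the two columns match; with drift, each bin’s observed rate sits about 0.1 above its forecast (a bit less in the top bin, where clipping at 1.0 kicks in). A systematic gap like that is the signal that the model’s assumptions no longer fit the context.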

Then there is the polling data itself and the biases (selection bias, confirmation bias, etc.) that come with it. I often post Twitter polls to get a pulse on topics I care about, and I have to keep selection bias in mind when I look at the results. The people who collected and analyzed the data should have been far more careful about bias.

Pollsters need to know the markets they are polling. Respondents don’t always literally say what they mean. This is nothing new: any kind of market research runs into this, and there are ways to work around it. Whenever I have collected and analyzed data on foreign markets using people who are technical experts but largely ignorant of those markets, I have had poor results. I have a feeling that a lot of polling was “lazy” this election season. For example, if your call list only has landline numbers, you won’t know what I have to say (I haven’t had a landline for quite some time, and I am hardly alone in that).
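The landline example is a textbook case of selection bias, and a small simulation shows why it cannot be fixed just by calling more people. This is a minimal sketch on an entirely hypothetical electorate: the 40% landline share and the 55% vs 40% support levels are made-up numbers chosen only to illustrate the effect, not real 2016 figures.

```python
import random
from dataclasses import dataclass


@dataclass
class Voter:
    has_landline: bool
    supports_a: bool


def make_population(n, landline_share, support_landline, support_cell_only, seed=0):
    """Hypothetical electorate where landline owners lean differently from cell-only voters."""
    rng = random.Random(seed)
    voters = []
    for _ in range(n):
        landline = rng.random() < landline_share
        p = support_landline if landline else support_cell_only
        voters.append(Voter(landline, rng.random() < p))
    return voters


def support_share(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(v.supports_a for v in voters) / len(voters)


def landline_only_poll(voters, sample_size, seed=1):
    """Simple random sample drawn only from voters on a landline call list."""
    rng = random.Random(seed)
    call_list = [v for v in voters if v.has_landline]
    return support_share(rng.sample(call_list, sample_size))


population = make_population(200_000, landline_share=0.4,
                             support_landline=0.55, support_cell_only=0.40)
print(f"True support:       {support_share(population):.1%}")
print(f"Landline-only poll: {landline_only_poll(population, 2_000):.1%}")
```

In this setup true support is about 0.4 × 55% + 0.6 × 40% = 46%, while the landline-only poll lands near 55%. Raising `sample_size` only tightens the estimate around the landline voters’ number; the error comes from who is on the call list, so a bigger sample from the wrong frame does not help.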

Weather forecasting is something we are all familiar with, since it has been around for a long time. However, our ability to predict accurately beyond the next week or ten days is actually not that high. Small events can change the weather in a big way. If we extend that thought to how the sex tapes and the FBI actions came back to back, we can probably have some sympathy for the statisticians who had to deal with the data.

Even if all the models had worked well, late events, like the FBI director’s two letters to Congress, don’t leave much room to act on what the model tells you. We were recently working on a predictive maintenance solution at a client. The maintenance VP was very clear that if all I could give him was a two-day window for failure prediction, there wasn’t a whole lot he could do to avoid downtime. While I don’t know for sure, I wouldn’t mind making a small bet that the analytics used by the Clinton campaign highlighted the issues in Michigan and Wisconsin; it was just too late to do anything about them.

I am sure I am missing several other aspects, and some technical aspects are probably too boring for most of my usual readers, but I think I have given a fair idea of my thoughts on this topic. I am sure you will add more, or correct me, in the comments.

Some changes in the polling and prediction industry are needed, but we should try NOT to throw the baby out with the bathwater. And while I am the biggest fan of Watson, I don’t really know for sure whether it would have done better. Knowing what went wrong this time, I am sure this industry will use that to its advantage and reclaim its position quite quickly.

Parting thought: for all my pals who think AI will take over the world soon, it is worth noting that for the foreseeable future these models will need significant human help to be useful. It’s man AND machine, and we should stop obsessing about man VS machine.

2 thoughts on “Is data science doomed with Trump being elected?”

Nice analysis.

What I found most interesting is that respondents’ answers in pre-election polling and exit polling did not always statistically align with the results in some states. I found this article interesting http://www.politico.com/story/2016/11/how-did-everyone-get-2016-wrong-presidential-election-231036 … mostly because it surmises that many respondents were embarrassed to tell the truth. I guess we need to factor in a polarization effect when a candidate or candidates are so divisive.

I would also be curious to study the enthusiasm effect as another leading indicator. I have no idea if this is factual, but the media routinely highlighted that Trump rallies drew crowds in the 10,000 range while Clinton rallies drew crowds in the 1,000 range. To me that could be an indicator of strong voter turnout, if it were true.

I also think that the media fails to understand the ramifications of the term “margin of error”. I noticed that many swing state polls had the candidates virtually tied with a large margin of error just before the election. Something like Trump 48% and Clinton 49% with a +/- 5% margin of error. That’s a big gap that should translate to “we have no idea who will win that state”.


Great analysis.
Despite best efforts, the accuracy of poll predictions in India has been around 70%. It becomes even more difficult in a tight contest, where even minor variations in a survey can make a big difference.
It also requires an objective assessment from people not affected by the outcome. The media had bet substantially on HRC and were looking at everything through this bias (and it was extremely visible).

From India, Subramanian Swamy, R Vaidyanathan and MD Nalapat had been clearly outlining that Trump had a better chance of winning due to the current political and economic situation (though at times they described it as a close call), and they all said he would win just before the election and reiterated it before counting. I feel expert judgement / intuition can’t be ignored despite the best predictive analytics, as humans have to make the final call and calibrate responses in critical situations.
