Ever since Trump won the election, the question I have faced most from family and friends is “Is predictive analytics dead?” I was also asked whether Watson would have picked the correct winner. The more savvy doubts were about how Clinton missed the trends in places like Wisconsin and Michigan.
Here are my thoughts, and as always, please treat them as my personal opinions only!
To begin with, the analytics was not all wrong: it did many things right and many things wrong. Rather than saying data science is dead, I think the picture is simply cloudy, and some work needs to be done to make it less cloudy.
The thing we forget most about data science is that it is all about odds. When Nate Silver said Trump had a 35% chance of winning, he meant exactly that! A roughly 2/3 chance of winning for Clinton should not have been interpreted as “Clinton will win”! I face this problem every day with my clients too, across all kinds of predictive scenarios. It’s not the binary thing we would like it to be in most cases.
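To make the point about odds concrete, here is a minimal sketch that replays a hypothetical election many times. It is not how any real forecaster's model works; the function name, the trial count, and the seed are my own illustrative assumptions, and only the 35% figure comes from the paragraph above.

```python
import random


def simulate_underdog_wins(p_underdog=0.35, n_trials=100_000, seed=42):
    """Replay a hypothetical election n_trials times.

    Each replay is one independent draw in which the underdog wins with
    probability p_underdog. Returns the share of replays the underdog won.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = sum(1 for _ in range(n_trials) if rng.random() < p_underdog)
    return wins / n_trials


share = simulate_underdog_wins()
print(f"Underdog won {share:.1%} of simulated elections")
```

The share lands close to 35%, or roughly one replay in three. A forecast like that is not a call for the favorite; it says an upset should be expected fairly often.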
That said, the predictive models had all given Clinton significant odds, and now we know something was wrong with them. So yes, data science on politics should absolutely take significant blame for what it missed.
First, all analytics about people is hard. I wrote about it a few weeks ago here.
Models are based on history, plus assumptions that give them context. It’s not uncommon in this business for calibration to go out of whack, usually because the context changes but the model continues to depend on old assumptions. Since all public analysis of this election trended the same way, I guess we can safely say that “establishment thinking” about polls needs an overhaul.
Then there is the actual data itself, which comes from polls, and the biases (selection bias, confirmation bias, etc.) that come with it. I often post Twitter polls to get a pulse on topics I care about, and I have to keep selection bias in mind when I look at the results. The people who collected and analyzed the polling data should have been far more careful about bias.
Pollsters need to know the markets they are polling. Respondents don’t always literally say what they mean. This is nothing new: any kind of market research runs into this scenario, and there are ways to get around it. Whenever I have done collection and analysis on foreign markets using folks who are technical experts but largely ignorant of those markets, I have had poor results. I have a feeling that a lot of polling was “lazy” this election season. For example, if your call list only has landline numbers, you won’t know what I have to say (I haven’t had a landline for quite some time, and I am hardly alone in that).
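The landline example can be sketched as a small simulation. Every number here is invented for illustration (the share of voters with landlines, and the support levels in each group); nothing is taken from real 2016 polling data, and the function names are hypothetical.

```python
import random


def build_population(n=200_000, landline_share=0.40,
                     support_landline=0.55, support_mobile_only=0.42,
                     seed=7):
    """Create (has_landline, supports_a) pairs for a made-up electorate.

    All rates are illustrative assumptions, chosen so that landline
    owners lean differently from mobile-only voters.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        has_landline = rng.random() < landline_share
        p = support_landline if has_landline else support_mobile_only
        population.append((has_landline, rng.random() < p))
    return population


def poll(population, sample_size, landline_only, seed=11):
    """Estimate support for candidate A from a simple random sample.

    With landline_only=True, the sampling frame excludes every voter
    without a landline, which is where the selection bias comes from.
    """
    rng = random.Random(seed)
    frame = [v for v in population if v[0]] if landline_only else population
    sample = rng.sample(frame, sample_size)
    return sum(1 for _, supports_a in sample if supports_a) / sample_size


population = build_population()
true_support = sum(1 for _, s in population if s) / len(population)
landline_estimate = poll(population, 2000, landline_only=True)
full_estimate = poll(population, 2000, landline_only=False)

print(f"True support:        {true_support:.1%}")
print(f"Landline-only poll:  {landline_estimate:.1%}")
print(f"Full-frame poll:     {full_estimate:.1%}")
```

Under these assumptions the landline-only poll overstates support by several points no matter how many people it calls, because the error comes from who is on the list and not from the sample size.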
Weather forecasting is something we are all familiar with, since it has been around for a long time. However, our ability to predict accurately beyond the next week or ten days is actually not that high. Little events can change the weather big time. If we extend that thought to how the sex tapes and FBI actions all came back to back, we can probably have some sympathy for the statisticians who had to deal with the data.
Even if all the models worked well, late-breaking events, like the FBI director’s two notes to Congress, don’t leave a lot of room to actually act on what the model tells you. We were recently working on a predictive maintenance solution at a client. The maintenance VP was very clear that if all I can give him is a two-day window with a failure prediction, there isn’t a whole lot he can do to avoid downtime. While I don’t know for sure, I wouldn’t mind making a small bet that the analytics used by the Clinton campaign did highlight the issues in Michigan and Wisconsin, and that it was simply too late to do anything about them.
I am sure I am missing several other aspects, and some technical aspects are probably too boring for most of my usual readers, but I think I have given a fair idea of my thoughts on this topic. I am sure you will add more, or correct me, in the comments.
Some changes in the polling and predictions industry are needed, but we need to try NOT to throw the baby out with the bathwater. And while I am the biggest fan of Watson, I don’t really know for sure if it would have done better. Knowing what went wrong this time, I am sure this industry will use it to its advantage and reclaim its position quite quickly.
Parting thought: for all my pals who think AI will take over the world soon, it might be worth noting that for the foreseeable future these models will need significant human help to be useful. It’s man AND machine, and we should stop obsessing about man VS machine.
