I had a lengthy conversation with an old client yesterday about his plans to start some cool new projects. That is what triggered this blog post.
Most of you already know that I am not too big on categorizing this topic into predictive, prescriptive, cognitive and so on. I never cared about the old “is it analytics, is it BI, is it just reporting” debate either. All I care about is whether data can be used to solve problems somehow. So when I say advanced analytics, all I mean is using data in ways slightly more sophisticated than reporting past performance.
On the software side, volume is no longer as big a challenge as it once was. Storage (and RAM and CPU and …) is getting cheaper and compression is getting better. It’s still not trivial, though. For example, when a semiconductor chip is produced, one optical scan can produce 10 to 30 TB of data per wafer. Even at current prices, storing all of it is expensive.
The hard parts are still velocity and variety. Everyone can eventually get to the same result, but competitive advantage goes only to the first few who see it. Even within that set, only a small number can actually act on the information quickly. Now if the raw data hits you really fast, there are real challenges.
Software (whether it is an app or a database) is hardly ever optimized for reading and writing at the same time when the incoming data is variable. If you need to put a lot of data into your system at high speed, as in some IoT scenarios, there are databases that are optimized for it. But those databases need extra fittings to make that data available for analysis in real time. There are others that can do sophisticated analysis, but they don’t always allow data to be put in at the speed it arrives. Essentially, a lot of compromises and data duplication are still daily struggles for many of us. Granted, it is getting better, but it’s not there yet.
At many of my customers, even after all the software puzzles are solved, we hit a wall on network bandwidth. Cloud is sexy and all, but every hop takes a toll.
After we figure out the right way to put all the data into the places we need, the analysis pieces start. Between COTS and open source, there is no shortage of software that can do the job. But the idea of democratized advanced analytics is still a distant dream.
There are many aspects to this problem:
1. There just isn’t enough talent that understands statistics (err, data science I mean) to begin with.
2. Generic data scientists won’t cut it. Sales data from an automobile company cannot be analyzed the same way as sales data from an aerospace company. Industry knowledge is key. That makes it even harder to get the right people on the job, and hence you need teams that have both data scientists and industry experts for the foreseeable future.
3. There is hardly any consistency in legal matters across the world. Now we also need lawyers on such teams to make this work in a way that no one goes to jail 😉
4. Legal does not always mean ethical (there is a surprise). So now we need an ethicist (such people do exist) to help answer some questions on what data and what analysis is ethical. Then you might need an MBA to figure out the solution with all the constraints applied 😉
5. Even if you have all these people and all the right software, you still need to convince the customer that it’s a big production and it comes at a price, although for all the right reasons. Then you need to explain why it is not a good idea to get two data scientists from the body shop to create a model.
6. Even if the customer sees the value and spends the money, after the team shows the model it could look really simple, and the customer will again ask “why didn’t I just hire two junior data scientists to get it done?” (The sausage making, that is, fitting the models, is not fun to watch unless you are a data scientist yourself. It usually is darn tiring grunt work.)
7. Neither models nor data stay static. Unforeseen things can come up, and there might not be a way to predict meaningfully using past data and analysis. For example, look up why Nate Silver could not predict Donald Trump becoming the GOP nominee.
8. Every prediction comes with caveats, some trivial and some complex. Trivial ones can usually be ignored and an automatic action triggered (for example, ABS kicking in on a car when some conditions are met). The complex ones need significant explanation, and that is not easy unless the recipient of the information understands some basic statistics. There are software vendors who claim their wares can make predictive analytics available to lay users. What they don’t do is explain what caveats apply when those users see the results of their analysis.
9. Like with everything else, things go wrong all the time in analysis too. Complex analysis is really difficult to debug today.
10. Even if all these challenges are overcome, and you tell the customer there is a 90% chance of door 1 being the one to open to find the pot of gold, it could still be that door 2 was the right answer that time. So now you have to explain why that happens.
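To make that last point concrete, here is a rough sketch of the door example in Python. It assumes the 90% figure is perfectly calibrated, which is the best case for any model. The function name, the number of trials and the seed are all made up for illustration.

```python
import random


def simulate_door_picks(p_door1=0.9, trials=10_000, seed=42):
    """Simulate many gold hunts where door 1 truly hides the gold
    with probability p_door1 (assumption: the model is perfectly calibrated).

    Returns (times door 1 was right, times door 2 was right).
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same counts
    door1_hits = sum(1 for _ in range(trials) if rng.random() < p_door1)
    return door1_hits, trials - door1_hits


door1, door2 = simulate_door_picks()
print(f"door 1 was right {door1} times, door 2 was right {door2} times")
```

Even with a model that is exactly right about the odds, door 2 still wins roughly one time in ten. A customer who only sees that one time will conclude the model is broken, and that is the conversation you end up having.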
I can go on, but I am sure you get the rough idea already. If not, buy me a beer and I will give you some examples of real-life situations I have dealt with 🙂