Big data is the talk of the town in social media, and has picked up some interest amongst customers too. I had a series of big data conversations this week with customers, colleagues and friends, and thought I would share some here. As always – these are just my personal ramblings, not my employer’s views.
In social media – “big” usually means close to petabytes or at least several tens of TB rushing at you from all over the place. At customer sites, the expectation seems to be much more modest – 50 to 100 TB is considered excessively big data even for some very large customers I know.
The cost of big data is bigger on all fronts compared to the status quo volumes (and velocity and all other factors) of data in most shops. Storage is cheaper than it was a few years ago, but it is not free – and when you talk about petabytes, it needs a LOT of storage. Then there are the multiples needed for HA, DR, archiving and so on. All of this needs more data center space, cooling, power and so on.
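To put rough numbers on those multiples, here is a back-of-the-envelope sketch in Python. The multipliers (three copies for HA, one DR copy, half a copy for archive) are purely illustrative assumptions of mine, not figures from any customer or vendor, and the function name is hypothetical.

```python
def raw_storage_tb(usable_tb, ha_copies=3, dr_copies=1, archive_fraction=0.5):
    """Estimate raw storage (in TB) needed to hold `usable_tb` of data.

    All multipliers are hypothetical defaults for illustration only;
    plug in whatever your own HA/DR/archive policy actually requires.
    """
    if usable_tb < 0:
        raise ValueError("usable_tb must be non-negative")
    primary = usable_tb * ha_copies          # live copies kept for availability
    dr = usable_tb * dr_copies               # copies at the disaster recovery site
    archive = usable_tb * archive_fraction   # portion retained in the archive tier
    return primary + dr + archive


# Compare a "modest" 100 TB shop with a 1 PB (1000 TB) one.
for usable in (100, 1000):
    print(f"{usable} TB usable -> {raw_storage_tb(usable):.0f} TB raw")
```

Under these assumed multipliers, every usable terabyte turns into 4.5 TB of raw storage before you count a single rack, cooling unit or power bill.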
What about the quality of data? As we know – poor quality is a big problem in all kinds of data-related work. Quality becomes a bigger problem when volume and speed increase. Existing tools may be stretched to deal with that kind of data. But assuming tools can somehow do this – there is still the question of the human effort needed to fix data. A lot of data projects fail to deliver value because no one on the business side owns the data. Big data will most probably make this problem worse, unless software improves by leaps and bounds in short order to make data quality a non-issue. How many of us will hold our breath on that?
What about security? Even with just 2 TB of structured data – there are companies that struggle to make sure everything is secure, everyone is kept honest, and all the legal compliance is ensured. I have seen the amount of trouble they go through when the status quo changes (like an M&A, or even the introduction of a small CRM system). Most of them are not equipped to deal with more data unless they beef up with more sophisticated governance, and probably more staff.
Some companies love BYOD and others do not. The ones who do not frequently worry about support cost and security. Imagine the effort they will have to go through if BYOD happens in their companies and they have to protect far more data than they are used to.
We are right now in the middle of a small POC for a customer – and the data in the data warehouse is minuscule compared to what “Big Data” can be. We are talking about only something like 150 to 200 million lines per cube. The data comes back at lightning speed from the database to the app server. But the user did not see this speed on his iPad, connected over Starbucks Wi-Fi via VPN. He did see some improvement, but not enough for a big WOW. And every drill down needs a round trip that chokes up the network yet again. Essentially, the bottleneck moved from the DB/app server side to the network/client side. These networks will need serious upgrades in capacity to cope with big data. And the mobile software should be smart enough to use the processing power and memory of the device to avoid using bandwidth when it is not required. Carriers will probably need big upgrades too, and if big data catches on – we should start seeing different types of data plans from them, unlike the rates we see now when we buy tablets and smart phones.
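As a rough illustration of what “using the device” could mean, here is a minimal Python sketch. It assumes the client pulls one pre-aggregated slice from the server and then answers drill downs from local memory instead of making a round trip each time. The class, the fake fetch function and the sample rows are all hypothetical, made up for this example; they are not taken from the POC or from any product.

```python
from collections import defaultdict


class DrillDownCache:
    """Keeps one pre-aggregated result set on the device and drills down locally."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that hits the server (hypothetical)
        self._rows = None        # filled on first use
        self.round_trips = 0     # how many times we actually went to the server

    def _load(self):
        # Go over the network only once; later calls reuse the in-memory rows.
        if self._rows is None:
            self._rows = list(self._fetch())
            self.round_trips += 1
        return self._rows

    def totals(self, group_by, **filters):
        """Sum 'sales' per value of `group_by`, restricted to rows matching `filters`."""
        out = defaultdict(float)
        for row in self._load():
            if all(row[key] == value for key, value in filters.items()):
                out[row[group_by]] += row["sales"]
        return dict(out)


def fake_server_fetch():
    # Stand-in for the slow VPN call; a real client would query the app server here.
    return [
        {"region": "East", "city": "Boston", "sales": 120.0},
        {"region": "East", "city": "New York", "sales": 300.0},
        {"region": "West", "city": "Seattle", "sales": 200.0},
    ]


cache = DrillDownCache(fake_server_fetch)
by_region = cache.totals("region")                  # top-level view
east_by_city = cache.totals("city", region="East")  # drill down, no new round trip
print(by_region, east_by_city, cache.round_trips)
```

This only helps when the slice is small enough to fit on the device, which 150 million lines clearly is not, so the server would still have to aggregate first. The point is simply that the second and third clicks need not cross the network again.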
Then there is the cost of licensing – and licensing models are still evolving. But if licenses are tied to the quantity of data that is processed or stored, that adds up quickly. And even with sophisticated software – you need smart data analysts who can use it to generate value. These analysts – or architects, scientists, artists, artisans or whatever it is they are called this week – don’t come in big numbers, and they won’t be cheap either. And long term – I am not sure these skills are given enough importance in universities.
The other side of the equation – the more important side – is the value that big data delivers. There is definitely value in big data – significant value – for sure. But it is not value that gets delivered overnight, and it takes significant investment before any benefits are reaped. And this value will not be spread evenly across industries, or even across companies within an industry. So it is a decision that needs to be taken carefully. Given the cost, the insights from big data have to be not just “big” but “BIIIIGGGG” for the investment to be worthwhile. And just because it “can” deliver value does not mean it “will” – it is no secret that several companies could not even make good use of the much smaller quantities of structured data readily available to them all these years.
Several CXOs I have spoken to are willing to dip their toes in despite the cost. And they are all trying to find out where they can gain competitive advantage by jumping in. Several are interested in a cloud offering for big data – mostly from a cost point of view. This is an area where SIs, SW vendors, analysts et al need to do a better job, in my opinion. There seems to be – from my limited visibility – a serious shortage of specific use cases to help companies make a business decision. There are a few – in healthcare, for example – where compelling arguments were made, and customers and vendors are partnering effectively. Given the investment needed for big data, evolutionary change might not look appealing to buyers. It needs to be revolutionary. And as my ex-manager used to tell me – almost every project that pays for itself will get funded, irrespective of the economy.
PS: If big data catches on big time, then we can seriously expect a boom for the tech stocks across the board since several companies will benefit from the vendor side. The economy – at least in history books – will probably thank big data for the good that it did 🙂
There is now a lot more data than ever before with the coming of age of sensors, public networks (e.g. Unique ID), GPS and other technologies. Consequently, storing, analyzing and managing petabytes of data is becoming important. However, traditional data-management techniques are unable to handle this volume and the new forms of data. Paradigms such as stream processing, complex-event processing, and map-reduce will provide operational alternatives. Working with these paradigms will require re-training and re-tooling. I hope SAP HANA will do better at handling petabytes of data.