It rained heavily in Chandler yesterday, and our front yard looked a whole lot nicer than its usual dusty self. That somehow made me think of the new look and feel of the world of data I live in as a professional.
For those of us who grew up implementing big data warehousing projects, it should not really be a shock to look back and realize that most DW projects started because the business had an analytics problem, but in the end 90% of the effort was spent on the plumbing – the management of data (ETL, data modeling and so on) – and only 10% on actual analytics (or, in many cases, even just basic reporting).
This is true not just in design and build – it's the case with supporting and maintaining the data warehouse too. Companies have spent countless dollars on DW implementations, and no one is truly happy about it. Yet no one I know has any plans of fully replacing their DW implementation either (which, of course, is the right thing to do).
Along came “big data”, promising to make life better for everyone and setting very high expectations. The vast majority of customer executives that I speak to think of big data as the answer to their analytics problems. Even amongst the CIO community, very few realize that most of the conversation they have heard is about the data management aspects (the 3V model is familiar to everyone, and it's about data management, not analytics). So in the past few years, I have seen several of my clients jump into big data initiatives to accelerate the realization of their analytics needs.
The fall from grace is rather rapid – mostly because of unrealistic expectations. To begin with, the minimum requirement for big data projects in many cases is to meet the SLAs of the existing data warehouses and data marts. It doesn't take too long to realize that ain't gonna happen.
Then comes the déjà vu realization that big data projects also need most of their time spent in ETL, just like data warehouses did in the past. This usually leads to a quick reduction in project scope – typically by eliminating some sources of data that are more complex or less clean – and of course this means the analytics is compromised too.
Finally, the reality that “data lakes need a lot of curation” kicks in. No company has enough manpower to curate all the data it needs for analysis. At some point, the data lake just becomes a data dump, with the idea that “curation can wait while we figure out what we need to analyze.” That is rarely practical – data scientists won't always know the context of the data unless an expert curated it beforehand. And the world doesn't have enough data scientists today to have them spend most of their time on data cleansing.
Until AI/cognitive capabilities take the stress of curation away, I think analytics will continue to get shortchanged, and the promise of big data (and specifically data lakes) delivering powerful analytics for business users will not exactly work as advertised.
It's not all gloomy though. Customers who start small, with well-defined analytics requirements, have already started realizing benefits from their big data investments. They don't take a “build it, and they will come” approach. They build intelligently as requirements come up and plan for more comprehensive solutions down the road. They value business flexibility and agility over technical elegance. Many of them have taken the time to formulate a strategy and a roadmap for what they want to do – leading with analytics that satisfy specific business requirements and working back to data management, not the other way around.
Of course we need both – but it's time we put the horse (analytics) in front of the cart (data management).