Who killed Hadoop?

Yesterday evening, while flying from PHX to JFK, I had a chance to read this excellent blog by Arun Murthy. If you have not read it yet – please read it first before you read my rant below. As always – these are strictly my personal views.

silhouette photo of elephant during golden hour — Photo by RENATO CONTI on Pexels.com

First – I think Arun is probably the best person to write a blog like this and needs to be congratulated for doing so. He has been part of the Hadoop story from the very beginning and continues to play a significant part in shaping its evolution. It also felt like it came straight from the heart – and extra points for starting lists from 0 and not 1 🙂

My own point of view was developed over the last couple of decades as a data geek who worked on a range of technologies in data management as well as analytics for a wide variety of clients across the world.

Data is largely an unsolved problem in the large enterprise world. Just when you think you have it under control, you realize that the problem got bigger and more complex. And you also realize the tech has improved and now you have more options on how to solve the bigger problems. This is one reason why most techies eventually use philosophy as a framework for explaining the evolutionary nature of their work.

It is hard to predict which way an exploratory project will go. This is great for developers as it gets their creative juices flowing. But that is not how enterprise CIOs think. They value high doses of stability and predictability, and very low doses of complexity. They know very well that the pundits will use terms like “legacy thinking” to shame them all the time. But their world comes with flat or declining budgets and there are always a lot of lights to be kept on. Within all those constraints – the good CIOs try to foster as much innovation as possible. And of all the innovation they have sponsored in the last decade or so – Hadoop definitely was top of the list.

The world was ready for Hadoop. Classic data warehousing had been pushed to its limits. Data warehouses became data dumps. The cost of maintaining those data warehouses started driving everyone nuts. And Hadoop promised a solution for all these and more. Plus the open source nature gave all the geeks even more incentive to introduce it in their shops.

In my opinion, four things led to Hadoop’s alleged demise:

Too many options for clients to choose from
Unskilled people implementing it
Multiple changes in market positioning
High operational complexity

Pretty soon – everyone ran into challenges. MapReduce was no longer sufficient to do most of what enterprises wanted to do. No worries there – Spark and others came along just at the right time and took over. The world realized that you just cannot run away from SQL even if you criticize it heavily. So many different SQL-on-Hadoop projects came into being, and they did not always work the way traditional IT shops expected. IT shops are not used to having tremendous choice in solving problems. When Cloudera and Hortonworks proposed different solutions to a problem – be it SQL, be it security or whatever – it became very confusing for the people who were trying to implement a long-term solution in their shops. In short – “Hadoop is a philosophy” started getting interpreted as “there are no real best practices here – just keep experimenting” by a lot of clients. Just to keep it brief, I am skipping the divergent direction MapR took – and that story did not end well either.

Then came the question of skills. For enterprises to adopt a technology faster – you need a lot of people with that skill. Much like how the SAP market got flooded with poor skills when ERP was hot – the Hadoop market did too. That had a direct effect on the quality of implementations. Many clients are still struggling with tech debt caused by using developers and architects who did not have good fundamentals in data management. Net net – hardly anyone replaced any data warehouses, and data lakes became the new data swamps. To be fair – the growth of classic data warehouses has been significantly curtailed since Hadoop became mainstream.

Arun has already explained the “What is Hadoop?” question in great detail. So I will skip that entirely.

What also did not help was that the positioning of Hadoop companies changed over time – perhaps to sustain the insane valuations in private markets. It swung from data management to analytics and ML to managing everything in the cloud. When you try to do everything – even if it is a great problem to solve – it is hard to execute to perfection, and it confuses clients a great deal.

The last point that made it difficult for Hadoop was operational complexity. Data management is a lot of fun for developers. But the moment it is in production – rock-solid operations are what keep it going. Even for RDBMS-based systems – DBAs and other ops experts with all the mature tooling still spend significant time managing their landscapes. Hadoop (to be fair – most NoSQL databases too) did not prioritize ops sufficiently. In my view at least – this was perhaps the biggest miss and one I think Cloudera and others should urgently address. Every client I know would be grateful if managing Hadoop were significantly simplified – especially across on-prem and cloud.

All this said – I don’t think Hadoop is dead, or that it will die. It will continue to evolve, and the world of data management needs that innovation and open source communities to thrive. But if the four points I raised are not addressed – I seriously doubt Hadoop will reach its potential any time soon.