Vast amounts of time and resources are spent in the corporate world on what is called “root cause analysis” (RCA for short). I have done my fair share: like most of you reading this, I have created countless spreadsheets and PPT decks. From my time in engineering college to today, I have read a lot of books, white papers and blogs on this topic. I have asked my team to do root cause analysis countless times – at least occasionally in a mindless fashion.
And I have to say this – both skill and luck are needed to do a decent job. It’s more art than science – and it will take a lot to convince me otherwise.
Here is something I ran into at the beginning of my career. An incoming high-value sales order EDI document failed to post in the ERP system. It was the second time in one week that this had happened, and the customer ordered a root cause analysis – and, being at the bottom of the hierarchy, I was the one who finally had to do it. It took me all of 10 minutes to find that the translation system had the wrong mapping for a field or two. It took a day to work with that vendor to change the map. We didn’t see the problem for a good while after that. My ability to analyze was praised – and although I didn’t get a raise, I did get some lovely emails and all that.
And then at a party a few days later, I was introduced to a VP at the customer. He thanked me profusely and said the business impact of missing the order twice was pretty high. He also asked me if I thought EDI was the wrong medium for an order of that high a value. His other question was whether the contract with my employer had any SLAs based on business impact. I had never thought of that angle – and in all honesty I told him I couldn’t answer it. My boss picked it up from there – but 15 years or so later, I haven’t forgotten the incident.
1. There is seldom “A” root cause
I – and many others – assume there is one clean reason why something (bad) happened. There is never one reason. It’s always a combination of reasons. (I can’t resist the temptation of saying it is more like a nested JSON structure in a document database than the neat rows and columns of a relational table.)
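To make the JSON analogy a little more concrete, here is a minimal sketch in Python. It is loosely modeled on the EDI story above, but the specific contributing causes and the `leaf_causes` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical incident record: causes nest inside causes, like a JSON document.
# None of these entries come from a real post-mortem.
incident = {
    "event": "sales order failed to post",
    "causes": [
        {
            "event": "wrong field mapping in translation system",
            "causes": [
                {"event": "map not retested after a change", "causes": []},
            ],
        },
        {
            "event": "no alert on failed high-value orders",
            "causes": [],
        },
    ],
}


def leaf_causes(node):
    """Return every cause that has no recorded cause beneath it."""
    if not node["causes"]:
        return [node["event"]]
    leaves = []
    for child in node["causes"]:
        leaves.extend(leaf_causes(child))
    return leaves


print(leaf_causes(incident))
```

Even this toy record ends in two leaves, not one. A flat table with a single "root cause" column would have to drop one of them.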
2. It’s not repeatable for the most part
The way I do root cause analysis is not how someone else does it. More often than not, you will get different causes if you ask two people to do an RCA.
And it will cost an arm and a leg to come up with MECE answers (MECE is what consultants learn in their bootcamps: mutually exclusive, collectively exhaustive). And even at that cost, it might not be MECE after all.
3. Asking “Why” 5 times doesn’t solve it either
Asking “Why” is necessary – but as far as I can jog my memory, I can’t remember a single incident where I asked it exactly 5 times to get to the root cause(s). The number is arbitrary; on occasion it might happen to be 5.
To begin with, you can’t ask the right “why” questions about things you don’t know at a deep enough level, even if all the data is available. When programs fail and we debug and look at logs, it makes sense only if we correctly understand what we are looking at. A C programmer might not be the best person to analyze a Unix dump.
Asking too many “why” questions is also counterproductive from a time and budget perspective – you can only bark up so many trees before the cows come home.
4. You can’t always determine cause and effect
What we find as the root cause might itself be the effect of another cause. That is almost always the case. It’s like a recursive function that can’t get out of itself.
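The recursion analogy can be sketched in Python too. The `caused_by` table below is entirely hypothetical, and so is the `ask_why` helper; the only point is that each "root cause" has a cause of its own, so in practice the chain ends where the time or budget runs out rather than at some natural bottom.

```python
# Hypothetical chain: each effect maps to the cause found for it.
caused_by = {
    "slow application": "inefficient query",
    "inefficient query": "missing index",
    "missing index": "rushed schema design",
    "rushed schema design": "unrealistic deadline",
    "unrealistic deadline": "slow application",  # causes can loop back on themselves
}


def ask_why(effect, budget):
    """Follow the chain of causes until the budget of 'why' questions runs out."""
    if budget == 0 or effect not in caused_by:
        return [effect]
    return [effect] + ask_why(caused_by[effect], budget - 1)


chain = ask_why("slow application", budget=5)
print(" <- ".join(chain))
print("declared root cause:", chain[-1])
```

In this toy model, stopping after three questions declares "rushed schema design" the root cause, while asking five times lands right back on "slow application". The answer depends on where you choose to stop.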
Over a few beers, a friend and I were once discussing the root cause of the poor performance of an application we had coded. I will spare you the many colorful details (all very logical, if I may say so) – but by the time we left the bar, it was crystal clear that the reason for the subpar performance of the code could be traced back to some traumatic childhood experiences of a common friend.
Parting comment – set the right expectations before you dive into root cause analysis. Make peace with the fact that, most probably, you can only treat some symptoms, and that over time, with luck on your side, you will solve enough symptoms and some causes to make things work at a decent level.
RCA is a dish that always needs some salt – sometimes a grain, and at other times a pound. I will leave you with that.