The cost of precision in BI


What percentage of decisions need precise data right upfront? My guess is less than 10%, probably even lower.

A big decision for an average person is purchasing a home. Having gone through that exercise in two countries – and knowing many people who have done it before and after me – I am convinced that the data needed to make that decision did not require much precision. Based on your financial position, you could judge affordability within a plus-or-minus range of some amount. Another factor was the school district rating – how many of us would care whether the score was “9 out of 10” or “10 out of 10”?

Decision making is progressive – you find a “cluster” based on some characteristics like location, price and so on. As you narrow down, you cluster again – flooring, yard size and the like come into play (but again, you don’t need super precision – if I am looking for a 2500 sqft house, I won’t overlook a 2400 sqft house just because it didn’t precisely meet my criteria). Then comes the ultimate short list that needs some precision, and a final decision that needs excellent precision, since that is the amount you pay the seller.
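
To make that concrete, here is a minimal sketch in Python – the listings and helper names are hypothetical – contrasting the exact-match filtering that schema-driven BI tends to enforce with the tolerance-based narrowing people actually use:

```python
# Hypothetical house listings as (square_feet, price) pairs.
listings = [(2400, 410_000), (2500, 435_000), (2510, 450_000), (1800, 300_000)]

def exact_match(listings, target_sqft):
    """The rigid-schema mindset: only rows that hit the target exactly survive."""
    return [l for l in listings if l[0] == target_sqft]

def within_tolerance(listings, target_sqft, tolerance=0.10):
    """How people actually narrow down: anything within ~10% stays in the cluster."""
    return [l for l in listings if abs(l[0] - target_sqft) <= target_sqft * tolerance]

print(exact_match(listings, 2500))       # [(2500, 435000)] - the 2400 sqft house vanishes
print(within_tolerance(listings, 2500))  # keeps the 2400, 2500 and 2510 sqft houses
```

The exact query silently drops the 2400 sqft house; the tolerance query keeps the whole cluster and leaves the demand for precision to the final shortlist.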

Some version of this process is followed in all decision making, including sales, marketing, purchasing and so on. Apart from legal and financial, I think almost no business function needs the kind of upfront precision it has chased since the beginning of time.

Look at an average BI project in the enterprise world – 90% of the time is spent on plumbing data: designing schemas, defining exception workflows, writing transformations and so on. The remaining 10% is used to make reports useful to their audience. This is unavoidable because BI is very static in nature – even what is called ad-hoc analysis is limited by the schemas in the back end. In short, even the best BI solutions cannot mimic how human beings make decisions.

The quest for extreme upfront precision is what works against BI being useful – ironic as that might sound. And BI has no chance of being seriously disrupted until it stops expecting tightly defined schemas on the back end and high precision right upfront in all cases.

Context is far more valuable than precision. That is how we eventually make decisions in real life. And context changes with time, which means BI has no chance of keeping up given its hard dependency on static things. The BI world needs to think in terms of real-world entities, not arbitrarily defined data models.

The good news is that technology and data science have progressed enough to do that in (more or less) repeatable and cost-effective ways. The bad news is that the world of BI won’t go to the promised land without blood-curdling shrieks, kicking and screaming.

Keep calm – our world of BI is changing, hopefully for the better.

Published by Vijay Vijayasankar

Son/Husband/Dad/Dog Lover/Engineer. Follow me on twitter @vijayasankarv. These blogs are all my personal views - and not in any way related to my employer or past employers.

9 thoughts on “The cost of precision in BI”

  1. It is the question of battle axe vs. scalpel. While the battle axe will cut everything, it will do it with no precision. In opposition is the scalpel, which takes only exactly what you target, but you may miss lots of unidentified targets that might be real. In statistics these are Type I errors – false positives – and Type II errors – false negatives.

    I find that for BI and most business situations, I want to start with a query or system that has too many false positives and then apply tests after that to weed out the false ones. Missing a positive condition that a narrower test would skip over is too expensive. So I design a series of tests with ever-increasing accuracy.

    Starting with a wide dragnet and running a series of increasingly accurate tests is one of the real advantages of big data, data reduction frameworks, and in-memory databases. It is great seeing technology bring these capabilities to fruition.
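
    A minimal sketch of that pattern – a wide, cheap dragnet first, then sharper (and costlier) tests applied only to the survivors; the records and predicates here are purely hypothetical:

    ```python
    # Cascade of filters: start wide (high recall, lots of false positives),
    # then run increasingly accurate tests on the shrinking candidate set.
    records = [{"id": i, "score": (i * 7) % 100} for i in range(1000)]  # hypothetical data

    def wide_dragnet(r):
        return r["score"] > 10       # battle axe: keeps almost everything

    def narrower_test(r):
        return r["score"] > 60       # sharper check, run on a much smaller set

    def precise_test(r):
        return r["score"] > 90       # scalpel: applied only to the final few

    candidates = records
    for test in (wide_dragnet, narrower_test, precise_test):
        candidates = [r for r in candidates if test(r)]
        print(f"{test.__name__}: {len(candidates)} candidates remain")
    ```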


  2. Yes, cases of middle-wares hitting home runs are jaw-dropping {-; Freezing what’s really necessary has always been #firstworldproblems with BI; 2nd came self-service, when people soon got bored w/cursors; 3rd we worried about pooling unstructured analytics w/structured; 4th MDM, which in most cases gets administrative w/LOBs; 5th indexing & making silos using archival concepts to reduce d/space. Moore’s law proved products/tech can grow so exponentially that we can leave data governance worries behind & “play” ad-hocly on reports with infinitely huge KPIs. What I see from your blog would be the next wave – irrelevancy optimization. Most social apps today exist because they SWOT`ed these & tried to gamify these irrelevancies! Say, 300 KPIs in VBAP are in some way irrelevant for a supplier but not for PoS; unless products can prove to make ‘meaningful’ signal from the data deluge, BI might get rusty; and perhaps the best reason devs quote is that *data* is not killing creativity, but the way we tell stories is! And yes, we can’t blame RFP writers, as they focus on u/cases for their SSoTs as opposed to what gets published to a CXO. I really see a challenge ahead wherein some unsupervised algorithms can help determine inference, but it’s not too far off. However, the debate about whether these optimization *irrelevancies* are a consequence of data availability, a cause of it, or unrelated is possibly far from conclusion; and since storage costs are also quite a problem, a deal gets iced today. Storing a data chunk on an MPP costs, say, $1; it’s almost $0.05 on a few open-source Hive instances, but customers aren’t ready to wait the few minutes it takes for data to get scooped up for a query! Forgoing the fact that storing data is hard, M/learning scripts help with canning the data and plotting close fits, but the end question, as earlier, mostly aligns w/budget constraints. PS: there are parody cases, like one where I sat ROTFL when an MNC started designing a stupid landscape like Netezzas > Hadoop > HANA for an FMCG client. As rightly pointed out, it’s time we get pragmatic, not dogmatic!


  3. I think you’re right, but at this point a lot of that precision is based on the culture of the organization. If leaders aren’t prepared to stick to their guns when the numbers are “close enough,” proving that a single number on a spreadsheet is questionable can be enough to thwart really useful decisions based on nearly correct data.


    1. Culture is key – I agree, Jamie.
      However, that won’t get fixed if we don’t have tools that can be used if the culture were to change. I hope this doesn’t turn into a chicken-and-egg situation, but it well could.


      1. Great article, and as always, I concur and nod all the way. How important is it to be off by a few thousand dollars on a trend dashboard ranging in the billions… a rounding error… yet we had to spend countless hours and carry 10 decimal points…

        On the other side, accountants would say that when it is 6 cents off, it could be one record off by -5 million and another off by +5 million plus 6 cents…

        Have to say I feel it is wrong to be chasing the sesame seed… but I’m on the fence in the end…

