Big Data Solutions – Do Questions Matter?

I have Ray Wang to thank for this post. Of late, I have had a serious case of writer's block. I just can't find a topic interesting enough to write about, and consequently have become a rather irregular blogger – at least compared to last year. Anyway – back to the topic of this post.

Ray tweeted this a few minutes ago

A lot of BI blueprinting sessions from my consulting career flashed through my mind when I saw that. A key principle of good BI system design is finding out upfront most of the questions a user would ask the system, and then designing a solution around them. Unfortunately this is both a blessing and a curse – while we can really optimize for fast and accurate responses to predefined questions, this also curtails our ability to change our mind and ask different questions. More experienced BI experts will anticipate other questions that users “may” ask and leave some room in the design to cater for them, but that is clearly not a scalable way to do things.

Somehow, users were also trained along the way to accept some lack of flexibility in BI systems. While the complaints never went away fully, most users think by now that it is normal for the BI team to ask for some time to change the data models, create new reports and so on. It is a sort of “marriage of convenience” if you will – with tradeoffs understood by both sides.

So when we let go of “ordinary” data and embrace “big” data – what should change? I think we should use the big data momentum to make BI systems more intelligent than the rudimentary things they are capable of doing today. And this intelligence should be applied with some business savvy. In other words, both the “B” and the “I” of BI need some serious tweaking.

In my opinion, what should change right away is the expectation that business users state most of their potential questions upfront, at design time of the system. Or more clearly – that expectation should be significantly lowered, and business users should be allowed to ask more ad-hoc questions than they have done so far. Of course we can never guarantee full flexibility – so some subjectivity is necessary on where we draw the line. It is just that the line should be drawn much farther out than where it is drawn today.

Accuracy of results for ad-hoc questions is not enough – the results should come back in a predictable and short time frame too. Ideally, every question should come back with an answer (or a heads-up to the user that this is going to take longer) within a predefined time frame (say, 3 to 5 seconds or less).
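To make the "answer or heads-up within a fixed time frame" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: `answer_within_budget` and `slow_query` are hypothetical names, and a real BI engine would need cancellation, queuing and much more.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary choice for this sketch.
_POOL = ThreadPoolExecutor(max_workers=4)

def answer_within_budget(run_query, budget_seconds=3.0):
    """Run a query and wait at most budget_seconds for it.

    Returns ("answer", result) if the query finished in time, otherwise
    ("pending", future) so the caller can tell the user it will take longer
    and pick up the result later via future.result().
    """
    future = _POOL.submit(run_query)
    try:
        return "answer", future.result(timeout=budget_seconds)
    except FutureTimeout:
        return "pending", future

def slow_query():
    """Hypothetical stand-in for a query that takes longer than the budget."""
    time.sleep(0.3)
    return "done"
```

Calling `answer_within_budget(slow_query, budget_seconds=0.05)` gives back the "pending" status right away, while the query keeps running in the background. The user gets a predictable response time either way.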

Then there is the question of how users ask these questions. SQL or NoSQL – query languages do not provide democratic access to data. People should be allowed to ask questions in English (or whatever language they use for business). Some training might be needed for the system and for the users to understand the restrictions – but no user should be constrained by the need to know how things work behind the scenes. A minority of people should have the skills to educate the computer – the rest of us should not be burdened with that. Instead, the computers should be smart enough to answer whatever questions users ask.

There are very seldom exact answers to questions in business (or life) – even apparently simple questions like “what is my margin in North America?” are ambiguous. Most clients I have had have many different meanings for “margin”, “North America” and “my” within their organization. In real life, if these questions are asked of a human analyst, she will ask you follow-up questions to clarify, and then provide an answer with the necessary caveats. Why can’t systems do that? Wouldn’t the life of users be vastly improved if systems answered questions like humans do, in a way humans understand? Of course, with more speed than humans 🙂
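As a rough sketch of what "asking follow-up questions" could look like, the Python snippet below checks a question against a glossary of business terms and asks for clarification whenever a term has more than one definition in use. The glossary contents and the function name are made up for illustration, and simple substring matching is a stand-in for real language understanding.

```python
# Hypothetical glossary: each business term maps to the definitions in use
# across the organization. The entries are invented for this example.
GLOSSARY = {
    "margin": ["gross margin", "operating margin", "net margin"],
    "north america": ["US and Canada", "US, Canada and Mexico"],
    "revenue": ["recognized revenue"],
}

def follow_up_questions(question, glossary=GLOSSARY):
    """Return one clarifying question per ambiguous term found in the question."""
    text = question.lower()
    follow_ups = []
    for term, meanings in glossary.items():
        # A term is ambiguous only if more than one definition is in use.
        if term in text and len(meanings) > 1:
            options = " / ".join(meanings)
            follow_ups.append(f'By "{term}", do you mean: {options}?')
    return follow_ups
```

For “What is my margin in North America?” this returns two follow-up questions, one for “margin” and one for “north america”. A question that only uses unambiguous terms gets an empty list, and the system could go straight to the answer.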

Big data or otherwise, there is always an issue of trust in the data from the user’s perspective. Most analysts spend nearly as much time explaining how they arrived at their results as they take for compiling and analyzing the data. The system goes through all the computation anyway – even today, in the non big data world. Why can’t our BI systems explain to the user how they arrived at the result, all the way from source to target or backwards? Wouldn’t that increase productivity a lot?

When users ask questions, they will usually also combine the answers with external data (Google, spreadsheets etc.) before they make a final decision. Would it be possible for a BI system to present some useful contextual data for the question from the internet and intranet, and allow the user to choose and combine what they need?

And one last thing – if the system is intelligent enough to find answers, why can’t it have the smarts to also figure out the best possible presentation for the results? Today, we mostly have to predefine what the output looks like visually. Why put that load on users? Can’t systems be smart enough to look at the question and the answers and figure out the best way to represent them to the user? This is not a “big data” problem – this should have been the case all along, but somehow never quite happened in a mainstream kind of way.
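Even a crude rule of thumb gets part of the way there. The sketch below, in Python, picks a presentation purely from the shape of the result. The function name, the (label, value) result format and the thresholds are all my own assumptions, not how any particular BI tool works.

```python
from datetime import date

def suggest_presentation(rows):
    """Pick a display for a result set given as a list of (label, value) pairs."""
    if len(rows) == 0:
        return "message: no data"
    if len(rows) == 1:
        # A single value is best shown as a plain number.
        return "single number"
    labels = [label for label, _ in rows]
    if all(isinstance(label, date) for label in labels):
        # Values over time read best as a trend.
        return "line chart"
    if len(rows) <= 12:
        # A handful of categories can be compared side by side.
        return "bar chart"
    # Too many categories for a readable chart: fall back to a table.
    return "table"
```

A smarter system would also look at the question itself (a "trend" question versus a "breakdown" question), but the point is that the user should not have to make this choice upfront.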

This is by no means an exhaustive list – I left out plenty of things like collaboration, predictive responses, closed loop BI and so on. I left them out not because they are unimportant, but only because of the boredom factor. These types of things are already happening to some extent, and hopefully will catch on more as time progresses.

So there you have it – it's my birthday wishlist. And thanks again, Ray, for that much needed spark to blog again 🙂


Published by Vijay Vijayasankar

Son/Husband/Dad/Dog Lover/Engineer. Follow me on Twitter @vijayasankarv. These blogs are all my personal views, and are not in any way related to my employer or past employers.

4 thoughts on “Big Data Solutions – Do Questions Matter?”

  1. Hi Vijay –
    Agree with your analysis.

    I believe one of the challenges users face with “Big Data” is “what questions to ask”.
    Recently, I was talking to several smart MDs with a machine learning background (a dangerous combination). In their view, they have a plethora of clinical data, demographics data, waveform data (from monitors), etc. Now, can the system tell them what might be the right questions to ask? According to them, the moment you define the question yourself, you become biased to an extent, and “confirmation bias” might set in during further exploration.



  2. Vijay – This is a very interesting thought process.

    If the objective is to improve the speed and quality of the decision, the correct information needs to get to the decision maker. In my opinion, this is moving more and more towards dynamic input (ref “google” and “spreadsheets” in your article above). One of the ways to make this happen is to decouple the process of creating these inputs (today's BI world) from the process of making the decision. This allows for improved data quality, improved availability of inputs, as well as the presentation piece you mentioned above 🙂

    For reference, I am including a link to the article I wrote on this, in case you are interested.

