As always, just my personal views here – nothing official about it.
The world of data is roughly split into two: preparation of data (master data management, cleansing, quality, integration, etc.) and use of data (analytics, transactions, etc.). I think one of the biggest reasons that companies never get a grip on their data is that these two sides of data are somehow treated differently.
The whole idea of preparing data exists because the data is going to be used for something, like analysis or transactions. When do we know for sure that data is bad? When we cannot do transactions and analysis meaningfully with it.
How do we define quality of data? One way is to do so in somewhat technical terms, like "20% of the data is duplicated" or "15% of zip codes do not have valid values". This means nothing, really, to the people who have to use the data. If the same problem is presented as “you will incur $500K of return orders because of wrong addresses”, a rational decision can be made about whether to spend the time, effort and money to fix it.
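To make the contrast concrete, here is a minimal sketch of turning technical quality metrics into a dollar figure. Everything in it is an assumption for illustration: the sample records, the US-style zip pattern, the helper names (`duplicate_rate`, `invalid_zip_rate`, `expected_return_cost`), and the simplification that every order shipped to an invalid address comes back as a return.

```python
import re

# Hypothetical customer records; all values are made up for illustration.
records = [
    {"id": 1, "name": "Ann Lee", "zip": "85001"},
    {"id": 2, "name": "Ann Lee", "zip": "85001"},   # duplicate of id 1
    {"id": 3, "name": "Raj Rao", "zip": "8500"},    # too short to be valid
    {"id": 4, "name": "Mia Chen", "zip": "10001"},
    {"id": 5, "name": "Tom Diaz", "zip": ""},       # missing
]

# Assumed format: US ZIP or ZIP+4. Other countries need other rules.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def duplicate_rate(rows):
    """Share of rows that repeat an earlier (name, zip) pair."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = (row["name"].lower(), row["zip"])
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


def invalid_zip_rate(rows):
    """Share of rows whose zip does not match the assumed format."""
    bad = sum(1 for row in rows if not ZIP_PATTERN.match(row["zip"]))
    return bad / len(rows)


def expected_return_cost(orders_per_year, invalid_rate, cost_per_return):
    """Business view: assumes every order to a bad address is returned."""
    return orders_per_year * invalid_rate * cost_per_return


if __name__ == "__main__":
    # The technical view of the problem.
    print(f"duplicates:   {duplicate_rate(records):.0%}")
    print(f"invalid zips: {invalid_zip_rate(records):.0%}")
    # The same problem expressed in money (order volume and cost are assumed).
    cost = expected_return_cost(50_000, invalid_zip_rate(records), 25.0)
    print(f"estimated return cost: ${cost:,.0f}")
```

The percentages and the dollar estimate come from the same records. Only the last number is something a business owner can weigh against the cost of fixing the data.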
How do we find this $500K impact? By analytics, of course. What do we analyze to find it? The transactions that happened in the past, and those that might happen in the future. In essence, data prep, analytics and transactions all exist in real life without hard boundaries. The boundaries were drawn by IT, by vendors and by analysts. And because these boundaries were drawn, the various parts of the data world are busy thinking about how to create the next best visualization tool or the next best data cleansing engine, instead of thinking about how to create the next best holistic decision making system. We brought the curse of incrementalism on ourselves – no one else is to blame.
With every passing day, more information is created outside our firewalls than inside them. But look at the data models used in our systems: they are mostly defined by in-house systems. So why exactly are we surprised that it is like pulling teeth to consolidate external and internal data to make decisions? Rigid data models based on tables and fields are not going away – transactions and some operational analytics still need them. But we need to start dealing with real world data as it exists: as entities that have evolving relationships with other entities. That needs to be a part of both the preparation of data and the use of data.
Does data need to be fixed before we make use of it? I think we should move beyond a binary yes/no answer. In real life that is not how we make decisions. So why should our systems behave differently?
Case in point: if a CEO asks HR “who is the best sales person in the company today?”, an HR system can probably answer “That would be Mike”. Then an HR analyst will need to look at the information to come up with a more realistic answer: “It is really a toss up between Mike and Sara”. Why don’t we have a system that gives answers like “50% sure it is Sara and 60% sure it is Mike”?
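As a rough sketch of what a non-binary answer could look like, the snippet below lets each performance metric "vote" for whoever leads it and reports the share of votes as a confidence. The metric names, the values, the third salesperson and the voting rule itself are all assumptions made up for illustration, not a description of how any real HR system works. A real system would need a far better model of confidence.

```python
from collections import Counter

# Hypothetical sales metrics; every name and number here is made up.
metrics = {
    "revenue":          {"Mike": 1_200_000, "Sara": 1_100_000, "Lee": 700_000},
    "new_customers":    {"Mike": 14,        "Sara": 19,        "Lee": 9},
    "renewal_rate":     {"Mike": 0.81,      "Sara": 0.88,      "Lee": 0.80},
    "quota_attainment": {"Mike": 1.15,      "Sara": 1.05,      "Lee": 0.90},
}


def answer_with_confidence(metrics):
    """Return (name, confidence) pairs, highest confidence first.

    Assumed rule: each metric casts one vote for the person who leads it,
    and confidence is that person's share of all votes.
    """
    votes = Counter(max(scores, key=scores.get) for scores in metrics.values())
    total = sum(votes.values())
    ranked = [(name, count / total) for name, count in votes.items()]
    # Sort by confidence (descending), then by name for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, confidence in answer_with_confidence(metrics):
        print(f"{confidence:.0%} sure it is {name}")
```

With these made-up numbers Mike and Sara each lead two of the four metrics, so the system reports the toss up instead of hiding it behind a single name.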
HR probably needed to check with Sales colleagues to find this answer. Decision making is collaborative in real life – you can’t predefine rigid workflows for every possible question that an executive might ask. Yet we see collaboration and data technologies evolve rapidly on parallel tracks without much convergence. Shouldn’t this change?
No system can fix all data problems, but the approach to solving them is still stuck in the past for the most part. Humans don’t magically know all the answers either. But over time, humans can see patterns in data and make intelligent guesses. Machines can do that too in many cases, and can crunch more data than humans can. It is time machines did a little more of the heavy lifting before asking humans for help.
In pockets, all these things are happening. Technology is not the biggest hindrance to making the world of data more like real life. It is the approach of creating these artificial boundaries that stands in the way the most.
President Reagan famously said “Tear down this wall” in a different context. I would like to borrow his words and say “Tear down these data walls, please!”