Big data – You can start small, but start today!

I am back at Houston Airport after an excellent customer event where I gave a keynote on big data. One of the side conversations today was about where to start big data programs. Jon Reed had nudged me a couple of days ago on the same topic. So here are a few things that come to mind.

I am a big fan of starting small but starting soon. Technology lets you scale as you go, so don't wait for the most amazing, earth-shattering use case. You will need some time to get a first-hand feel for how things work in big data land.

There are many ways to skin the animal that big data is. If you want the easiest one, my suggestion is to start with corporate data and then work your way toward bringing real-world entities into the mix to see how they work together usefully.

My favorite way of finding a big data problem, which allegedly is influenced by an interest in corporate finance in business school back in the day, is to start with the financial data you report to the street. This is a summary representation of what is happening in your company, and if you compare it across periods you can spot what needs the most attention.
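Here is a minimal sketch of that period-over-period comparison. The line items and figures are made up for illustration, and the function name `rank_line_items_by_change` is my own; real reported statements have far more lines and need the cross-checks mentioned below.

```python
def rank_line_items_by_change(previous, current):
    """Return (line_item, pct_change) pairs, largest absolute change first."""
    changes = []
    for item, prev_value in previous.items():
        if item not in current or prev_value == 0:
            continue  # skip items that cannot be compared across periods
        pct = (current[item] - prev_value) / abs(prev_value) * 100
        changes.append((item, round(pct, 1)))
    return sorted(changes, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical figures (in millions) for two reporting periods.
q1 = {"revenue": 500.0, "accounts_receivable": 80.0, "inventory": 120.0}
q2 = {"revenue": 520.0, "accounts_receivable": 112.0, "inventory": 114.0}

ranked = rank_line_items_by_change(q1, q2)
for item, pct in ranked:
    print(f"{item}: {pct:+.1f}%")
```

With these invented numbers, receivables growing much faster than revenue is the item that stands out, which is the kind of signal worth verifying before digging further.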

Let's say Accounts Receivable (AR) is the one you spotted, and of course verified against other sources of data to make sure it is worth exploring. Now, what makes up AR? It is all those customers who bought your wares and haven't paid you yet. What do you know about those customers? Have they paid you on time before? How is their credit rating? Did the collections department make a hundred calls before they paid the last invoice? Answering this needs more data: granular data that you will probably need to load from somewhere else.

What is happening with those customers in social media? Not just Twitter and Facebook: what do financial analysts think of them?

Can you combine the AR and collections info with social media input to assess your chance of getting paid? How about checking whether the customer is holding payments because of poor service? Have you tried analyzing the trouble tickets and service documents to see if there are any trends?
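To make the idea of combining these sources concrete, here is a rough sketch of a payment-risk score. Everything in it is an assumption for illustration: the field names, the caps, and especially the weights (40/20/20/20) are invented, not derived from any real data. In practice you would let the data and the business users tell you which signals matter.

```python
from dataclasses import dataclass


@dataclass
class CustomerSignals:
    name: str
    late_payment_ratio: float        # share of past invoices paid late, 0..1
    collection_calls_last_invoice: int
    open_trouble_tickets: int
    negative_mention_ratio: float    # share of negative mentions, 0..1


def payment_risk_score(c):
    """Blend the signals into a 0-100 score using made-up weights."""
    score = 40 * c.late_payment_ratio
    score += 20 * min(c.collection_calls_last_invoice, 10) / 10  # cap at 10 calls
    score += 20 * min(c.open_trouble_tickets, 5) / 5             # cap at 5 tickets
    score += 20 * c.negative_mention_ratio
    return round(score, 1)


# Two hypothetical customers.
steady = CustomerSignals("Steady Corp", 0.1, 1, 0, 0.05)
shaky = CustomerSignals("Shaky Inc", 0.6, 8, 4, 0.5)

by_risk = sorted([steady, shaky], key=payment_risk_score, reverse=True)
for c in by_risk:
    print(c.name, payment_risk_score(c))
```

A score like this is only a starting point for a conversation. Note that a high count of trouble tickets may mean the customer is withholding payment over poor service, which calls for a very different response than a credit problem.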

You could keep narrowing down until you find something you didn't know before. Once you have figured out what influence the other data have on the AR info you started with, you can map it back to transaction processing. Should you discount more? Should you put a billing block on the customer? Can you have an API that makes these decisions while a transaction is in process?
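As a sketch of what such a decision point might look like, here is a small function that a transaction could call in-flight. The function name, the 0-100 risk score input, the thresholds, and the action labels are all hypothetical; a real version would sit behind an API and use rules agreed with the business.

```python
def decide_order_action(risk_score, has_open_service_issue):
    """Pick an action for an in-flight order from a 0-100 payment-risk score.

    Thresholds and action names are illustrative assumptions only.
    """
    if has_open_service_issue:
        # Payment may be held because of poor service: fix that first.
        return "escalate_service"
    if risk_score >= 60:
        return "billing_block"
    if risk_score >= 30:
        # Nudge faster payment instead of blocking the customer.
        return "offer_early_payment_discount"
    return "proceed"


print(decide_order_action(66.0, False))
print(decide_order_action(66.0, True))
print(decide_order_action(35.0, False))
print(decide_order_action(7.0, False))
```

The ordering of the checks is a design choice: the service check comes first so that a customer with a legitimate complaint is not blocked automatically.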

That is a full closed loop: moving from a small set of data to a bigger set of data, only to find what tiny insights you can apply in the context of a given transaction. And once you have solved that, cast the net wider and keep going after all the opportunities you can find.

The trick here is to not waste time chasing every fork in the road. You will get a lot of false signals that you need to smartly move past. Fail fast and move on. And if everything looks OK on AR, try another trend in your financial statements that stands out.

Another way I attempt this is to talk to the person in charge and to the people at the farthest end of the business process, ignoring the layers in the middle: for example, the VP of Sales and the shipping clerk, store manager, or salesperson. More often than not, their perspectives will be different. Start from the summary data and see if you can relate the source data from each of the others to it to see the impact. As you add more and more sources, you will probably run into interesting trends, either in the impact on the summary or between the sources themselves. Stop, analyze, and ask the business users for possible explanations. If you smell a rat, don't assume something is wrong. But if the actual business users smell a rat, you usually have enough validation to dig deeper.

There are many caveats here. Organizational and political motivations can easily derail you if you are dealing with data from multiple sources. Constantly verify data against multiple sources, both systems and people. It is as much an art as it is a science.

Also keep in mind that data can be interpreted in many ways to suit someone's needs. And just because you see a correlation doesn't mean you can assume something is good or bad. This is why I keep repeating that data interpretation needs to be verified by people in the business who understand the context.

And always remember to close the loop. At the end of the day, data is good only if you can act on it. So make sure that what the data tells you is factored into your business process: maybe you need to enhance a transaction, maybe you need to empower a call center agent way more than today, maybe you need fewer approvals in your workflow. You haven't done it justice until the loop is closed. Even then, you will need to keep tweaking it, as the world around us changes faster than we can keep up.

As technology improves and more machine learning and AI become mainstream, the effort to do all of this will surely come down. But then, the complexity of the problems we attempt to solve in mainstream business will also be orders of magnitude bigger. So: start small, but start today.

Got to run – my flight is boarding 🙂


Published by Vijay Vijayasankar

Son/Husband/Dad/Dog Lover/Engineer. Follow me on Twitter @vijayasankarv. These blogs are all my personal views, and not in any way related to my employer or past employers.
