Big data – You can start small, but start today !


I am back at Houston Airport after an excellent customer event where I did a keynote on big data . One of the side conversations that happened today was on where to start big data programs . Jon Reed had nudged me a couple of days ago on the same topic . So here are a few things that come to mind .

I am a big fan of starting small but starting soon . Technology lets you scale as you go . So don’t wait for the most amazing earth-shattering use case . You will need some time to get a first-hand feel of how things work in big data land .

There are many ways to skin the animal that big data is – but if you want the easiest way to do it , my suggestion is to start with corporate data and work your way into bringing real world entities into the mix to see how they work together usefully .

My favorite way of finding a big data problem – which allegedly is influenced by an interest in corporate finance in b school back in the day – is to start with your financial data reported to the street . This is a summary representation of what is happening in your company and if you compare it across periods – you can spot what needs most attention .
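To make the "compare it across periods" step concrete, here is a minimal sketch. The line items and figures are made up for illustration, and the function name `largest_relative_changes` is my own placeholder; real reported statements would need far more care (restatements, seasonality and so on).

```python
def largest_relative_changes(previous, current, top_n=3):
    """Rank line items by the size of their relative change between two periods."""
    changes = []
    for item, prev_value in previous.items():
        if item not in current or prev_value == 0:
            continue  # skip items that cannot be compared
        change = (current[item] - prev_value) / abs(prev_value)
        changes.append((item, change))
    # Biggest movers first, regardless of direction.
    changes.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return changes[:top_n]


# Hypothetical summary figures (in millions) for two consecutive quarters.
q1 = {"revenue": 100.0, "accounts_receivable": 20.0, "inventory": 15.0, "cash": 30.0}
q2 = {"revenue": 104.0, "accounts_receivable": 29.0, "inventory": 15.5, "cash": 28.0}

top_movers = largest_relative_changes(q1, q2)
for item, change in top_movers:
    print(f"{item}: {change:+.1%}")
```

With these invented numbers, accounts receivable moves far more than revenue does, which is the kind of signal worth verifying against other sources before digging in.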

Let’s say Accounts Receivable is the one you spotted , and of course verified with other sources of data to make sure it is worth exploring . Now – what makes AR ? It is all those customers who bought your wares who haven’t paid you yet . What do you know about those customers ? Have they paid you in time before ? How is their credit rating ? Did the collections department make a hundred calls before they paid the last invoice ? This needs more data – granular data that you probably need to load from somewhere else .

What is happening with those customers in social media ? Not just twitter and FB – what do financial analysts think of them ?

Can you combine the AR and collections info with social media input to assess the chance of you getting paid ? How about seeing if the customer is holding payments because of poor service ? Have you tried to analyze the trouble tickets and service documents and see any trends ?
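As a rough sketch of what combining these sources could look like, the snippet below folds payment history, collections effort and open trouble tickets into one score. Everything here is an assumption for illustration: the field names, the caps and the weights are arbitrary placeholders rather than calibrated values, and a social media or analyst sentiment signal could be added in the same way.

```python
from dataclasses import dataclass


@dataclass
class CustomerSignals:
    name: str
    open_ar: float         # unpaid invoice amount
    avg_days_late: float   # from payment history
    collection_calls: int  # calls made before the last invoice was paid
    open_tickets: int      # unresolved service tickets


def payment_risk_score(c: CustomerSignals) -> float:
    """Return a rough 0..1 score; higher suggests a higher chance of late payment.

    The caps (60 days, 20 calls, 10 tickets) and the weights are placeholders.
    """
    lateness = min(c.avg_days_late / 60.0, 1.0)
    calls = min(c.collection_calls / 20.0, 1.0)
    tickets = min(c.open_tickets / 10.0, 1.0)
    return 0.5 * lateness + 0.3 * calls + 0.2 * tickets


# Hypothetical customers.
customers = [
    CustomerSignals("Acme", 120000.0, 45.0, 12, 8),
    CustomerSignals("Globex", 300000.0, 5.0, 1, 0),
]

# Rank by money at risk: score times the open receivable.
ranked = sorted(customers, key=lambda c: payment_risk_score(c) * c.open_ar, reverse=True)
for c in ranked:
    print(f"{c.name}: score={payment_risk_score(c):.2f}, open AR={c.open_ar:,.0f}")
```

A score like this is only a starting point for a conversation with the people in collections and service, who know the context behind each number.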

You could keep narrowing down till you find something you didn’t know before . Once you have figured out what influence other data has on the AR info you started with – you can map it to the transaction processing . Should you discount more ? Should you put a billing block on the customer ? Can you have an API that can make these decisions while a transaction is in process ?

That is a full closed loop – moving from a small set of data to a bigger set of data , only to find what tiny insights you can find in the context of a given transaction . And once you have solved that – cast the net wider , and keep going after all the opportunities you can find .

The trick here is to not waste time chasing every fork in the road . You will get a lot of false signals that you need to smartly move past . Fail fast and move on . And if everything looks ok on AR , try another part of your financial statement whose trend stands out .

Another way I attempt this is to talk to the person in charge and the persons who are at the farthest end of the business process . Ignore the layers in the middle . Like the VP of Sales and the shipping clerk / store manager / sales person etc . More often than not – their perspectives will be different . Start from summary data and see if you can relate the source data from each of the others to see impact . As you add more and more sources – you probably will run into interesting trends , either with the impact on summary or between sources themselves . Stop and analyze and ask the business users for possible explanations . If you smell a rat , don’t assume something is wrong . But if the actual business users smell a rat – you have enough validation usually to dig deeper.

There are many caveats here . Organizational and Political motivations can easily derail you if you are dealing with data from multiple sources . Constantly verify data with multiple sources – systems and people . It is as much an art as it is a science .

Also keep in mind that data can be interpreted in many ways to suit someone’s needs . And just because you see correlation doesn’t mean you can assume something is good or bad . This is why I repeat myself all the time that data interpretation needs to be verified by people in the business who understand the context .

And always remember to close the loop – at the end of the day , data is good only if you can act on it . So make sure that what the data tells you is factored into your business process – maybe you need to enhance a transaction , maybe you need to empower a call center agent way more than today , maybe you need fewer approvals in your workflow . You haven’t done justice till the loop is closed . Even then – you will need to keep tweaking it as the world around us changes faster than we can keep up .

As technology improves and machine learning and AI become mainstream , the effort to do all of this will come down for sure . But then, the complexity of problems we will attempt to solve in mainstream business will also be orders of magnitude bigger . So – start small , but start today .

Got to run – my flight is boarding 🙂

Holding a tiger’s tail


Where I grew up , our elders used to advise us that we should never hold a tiger’s tail . The story is that if you do catch hold of a tiger’s tail – you will have to go round and round with the tiger pretty much forever . People will cheer for you as long as you don’t let go – the moment you let go , people will cheer even more because they get to watch the tiger eat you .

Whenever a big company CEO steps down, and I start reading the commentary in press and social media – I fondly remember the proverbial tiger and its tail .

Look at Steve Ballmer . He grew the company’s revenue and profit , yet the market cap nosedived and the share price stopped swinging up . And now, the world wants Bill Gates to come back and do his magic . No one particularly worries about trivial thoughts like “if Gates indeed had a bright idea, he could have implemented it at MS any time Ballmer has been CEO”. Personally , I think the world needs Gates to continue his humanitarian pursuits way more than running MS . But I digress, the point is – Ballmer held the tiger’s tail and no glory came with it .

Look at IBM – Sam Palmisano figured out that revenue didn’t matter all that much , investors just wanted steady growth in EPS . He boldly announced a plan for $20 EPS by 2015 . And he didn’t miss a beat . Then he retired, Rometty took over and continued down the path at the pace Palmisano set . And she did miss a couple of beats and voila – there she is holding the tiger’s tail . I hope she finds a few good ways to get the great company to grow again .

Then there is Apple . Steve Jobs took it almost to the stratosphere and made it extremely highly valued . Three back to back successes with iPod , iPhone and iPad . And profit margins out of this world . Top line and bottom line in excellent shape . And then he passed the baton to Tim Cook . Cook didn’t miss a beat on operations – and the company still has awesome financials . Yet, since the next iThingy hasn’t come yet – there he is holding the tiger’s tail too.

Pretty much the same story at HP, Intel and almost everywhere else . Michael Dell tried to avoid the tiger by going private – but let’s see how that plays out .

It doesn’t matter whether the company is in good financial shape or bad . Everything that had an upward trend at any point has to keep going up – forever . Not one thing , not some things , not the majority of things – everything . Otherwise CEOs have to say upfront what exactly will go up – and then they usually get a pass for everything else . But then if they slip on the one thing they promised will go up – the tiger eats them immediately . That is the deal .

The tiger will maul and eat them in public – not in some closed room . And the public that watches them will be liberal with unsolicited advice throughout the process – forever . No breaks – no time outs .

With this well known background , I am amazed how many people still raise their hands to be counted when a CEO search is going on . More power to them !

Big Data Deployment – Planning is everything


When data becomes big – and it gives pretty cool insights of high value , the big hurdle facing customers will be their own deployment challenges. Where should this magic solution live ? And how exactly will we find out ?

Several factors play into this – and I am just mentioning a few that came to mind first.

1. Hardware is cheap-ish

Cheap is relative . When you have a million dollars to spend in a hard economy , would you buy hardware or will you do something else like hire more sales people , spend more on marketing etc ?

Would you buy or would you rent ? Or will you start small by renting and then buy when you need a scale that makes renting uneconomical ?

If you buy , are you going to buy cheap servers and live with extra redundancy? Or would you rather invest in fewer industrial strength servers with great HA and DR ?
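The rent-or-buy question above comes down to a break-even calculation. Here is a minimal sketch, assuming a one-time purchase cost, a flat monthly cost of ownership (power, space, support) and a flat monthly rental fee. All figures and the function name `break_even_months` are hypothetical, and real pricing is rarely this flat.

```python
import math


def break_even_months(purchase_cost, monthly_own_cost, monthly_rent):
    """First month at which the cumulative cost of buying is no higher than renting.

    Returns None if renting never costs more per month than owning.
    """
    monthly_saving = monthly_rent - monthly_own_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(purchase_cost / monthly_saving)


# Hypothetical numbers: 600k to buy, 10k a month to run, 35k a month to rent.
months = break_even_months(600_000, 10_000, 35_000)
print(f"Buying catches up with renting after {months} months")
```

If the break-even point is further out than you expect the hardware or the technology choice to last, starting small by renting looks like the safer bet.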

2. Skills , or lack thereof

Even if you have cheap hardware lying around , do you have skills and manpower to install and patch on all the machines ? Is it cheaper to hire/train internally or should you hire a consulting company to do your big data technical work ?

What about business users ? If big data tells them something new – are they empowered to act on it ? Or will a real time insight need a batch mode committee of people to act on ?

Does the business user have enough training to understand the context of what big data solution tells them ?

What is the minimum usability requirement ? (Not everyone is a data scientist – and the majority of use cases will need stupid simple usability , ideally with little to no training )

3. Ever improving technology

Big data technology is benefiting from rapid innovation from the open source world and commercial vendors . How much appetite do you have for keeping up with the fast evolution of technology ?

Tactically , when will you replicate and when will you federate ?

4. Quantifying the value

Investments are worth it only when value is greater than cost over a reasonable period of time . Cost is a straightforward calculation and so is OPEX vs CAPEX . But do you have the ability to quantify value and benchmark against the best in the industry ?
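The "value greater than cost over a reasonable period" test can be sketched as a simple payback calculation. This assumes you can already put a monthly number on the value, which is exactly the hard part. The figures and function names below are hypothetical.

```python
def cumulative_net_value(monthly_value, monthly_opex, capex, months):
    """Net value after each month: benefits so far minus capex and opex so far."""
    net = -capex
    series = []
    for _ in range(months):
        net += monthly_value - monthly_opex
        series.append(net)
    return series


def payback_month(series):
    """1-based index of the first month with non-negative net value, else None."""
    for month, net in enumerate(series, start=1):
        if net >= 0:
            return month
    return None


# Hypothetical: 300k upfront, 20k a month to run, 50k a month of quantified value.
series = cumulative_net_value(50_000, 20_000, 300_000, 24)
print("Payback month:", payback_month(series))
print("Net value after 24 months:", series[-1])
```

The cost inputs are the easy side of this. The monthly value figure is the one that needs benchmarking and honest validation with the business.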

How does this play with existing strategies on BYOD , security and everything else that you have a strategy for ? Can they all work together ?

5. Platform and Applications

What will you buy and what will you build ? Do you have guidelines on deciding what factors will make you ask a vendor to create an app for you (and others) as opposed to building it yourself ? Do you have criteria for evaluating all the platform options ? Do you expect ERP like security for big data or will you relax it ?

6. Legal , ethical and privacy stuff

Are you aware of what the government thinks of your data ? Do you have ideas on how best to keep your big data solution legal and ethical ? Have you considered opt-in and opt-out scenarios for users ?

In short – there are a large number of deployment considerations for big data . The options available are increasing and improving almost every day . So definitely a good first step is to spend some time deciding on your big data strategy – while remaining pragmatic that your strategy will evolve over time , and probably at a rate faster than the BI strategies of the past .

In my opinion , accelerated value from big data is possible only if all or part of the solution is cloud based . A customer should not have to worry about the deep mechanics of big data – they should be focused on the quality of insights . The mechanics of this should be offloaded to a vendor you TRUST to partner with . Big data comes with big responsibilities – so choose the partner wisely and for long term .

Such a vendor should be able to shield customers from a lot of the flux – and at a cost that is cheaper than if you tried to do it yourself . Of course a 100% cloud deployment is not practical for many reasons as made obvious in the discussion above – but the vast majority of big data landscapes will need to be cloud based if value realization has to happen at a big scale . So like it or not – plenty of hybrid solutions will crop up to support big data .

So what is the end game ? Wish I knew – but I do have a dream . A network of big data is my vision of an end game . A network where data is shared across a huge ecosystem where people collaborate securely on data without everyone having to keep a redundant copy and build custom solutions on top . Of course not all data can be shared – but in almost all industries I am familiar with , not even 5% of data of common interest is shared freely . Let’s see how long it takes before such a network will show up in our lives – or maybe it never will , and I will have to find a new dream 🙂