Big Data: Platform vs Applications


 

It does not matter what vendors say about big data – at the end of the day, customers need to adopt it and expand their use of it for big data to live up to its promise. If they don’t adopt – big data will just go the same way as many other passing fads that had the potential to change the world, but never quite did.

 

Big data is just data – we should not EVER forget that! And customers have had varying degrees of success with managing and using data in the past. So everything that applies to data – like ingestion, acceleration, quality, analytics etc. – also applies to big data. What is different is just the degree of magnitude, complexity and predictability.

I like to think of big data the way I think of crude oil.

[Image courtesy of SAP]

Crude is very valuable – but not in its native form. A lot of specialized processing, storage etc. is needed before it shows up at the gas station where we empty our wallets to fill the tanks of our cars. We don’t need to worry about what happens to the oil till it reaches the gas station – we just need it in a form that we can use.

Big data should also be thought about along these lines. It may be huge, fast and furious – but a user does not have to care about that. Management of big data should not be a headache for the user.

So clearly, we need two things to make big data click.

1. We need a platform that can do the heavy lifting and shifting and all that.

2. We need applications that users can make use of without ever knowing a thing about the platform behind them.

Let’s talk about platforms first. The 3V (or 4V or 16V) model needs to be kept in mind when we think of a platform. Heavy lifting is a given, and it needs to be done without breaking the bank. I strongly believe that the success of any platform is defined by the number of apps built on it, and the number of developers making a living out of it. If that ecosystem stickiness does not happen – the rest does not matter, and such a platform should not continue to exist.

1. We need to get a lot of data into the platform

The price of storage is generally coming down – and not all data is needed all the time. So the smart thing to do is to put data in a storage tier where the price-performance tradeoff is ok for you. If you don’t need the data in sub-second time, you probably don’t need to store it in the most expensive storage tier. Platforms should be able to intelligently figure out where data should reside, with the idea that a human administrator can tweak it as needed.
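To make the tiering idea a bit more concrete, here is a minimal sketch in Python. It assumes each dataset declares the slowest access time its users can live with, and the platform picks the cheapest tier that still meets it. The tier names, latencies and relative costs are made up for illustration, and `choose_tier` and `pinned_tier` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tiers: (name, typical access latency in seconds, relative cost per TB).
# All numbers are illustrative assumptions, not real pricing.
TIERS = [
    ("memory", 0.001, 100.0),
    ("ssd", 0.05, 10.0),
    ("disk", 1.0, 3.0),
    ("archive", 3600.0, 1.0),
]


@dataclass
class Dataset:
    name: str
    max_latency_s: float                # slowest access time users can tolerate
    pinned_tier: Optional[str] = None   # manual override set by an administrator


def choose_tier(ds: Dataset) -> str:
    """Pick the cheapest tier that still meets the dataset's latency need."""
    if ds.pinned_tier is not None:
        # The human administrator's tweak always wins.
        return ds.pinned_tier
    candidates = [t for t in TIERS if t[1] <= ds.max_latency_s]
    if not candidates:
        # Nothing is fast enough, so fall back to the fastest tier (first in TIERS).
        return TIERS[0][0]
    return min(candidates, key=lambda t: t[2])[0]
```

With these assumed numbers, a dataset that tolerates a tenth of a second lands on "ssd", while one that can wait a day lands on "archive". A real platform would presumably look at access patterns too, not just a declared latency.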

2. Platform should have data quality and governance abilities

Most data warehouses have at least 3X or 4X duplication of data – and this applies to transactional and master data. This might be ok when we are talking about a few TB of data. But when it is in the tens of petabytes, this is a serious issue to be dealt with. Big data will magnify data quality issues if not taken care of adequately.

3. Platform should have the ability to ingest data at various speeds

When data is coming in fast and furious – the platform should be able to deal with it. Some of the high speed data might need immediate action – and this needs to be treated differently from other data that comes at the same speed, but doesn’t need to be acted upon in real time. For example – stock market data loses relevance in many cases if you don’t act on it right away, but social media data can probably wait a little bit before someone makes sense of it and reacts. Another use case might need some social media data to also be responded to immediately. So the platform should be able to respond to all such use cases. Also, not all data needs to be ACID. Eventual consistency is probably ok for the vast majority of data.
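Here is a rough sketch of that routing idea in Python. The point it tries to show is that urgency comes from the use case, not from the speed or the source of the data. The `URGENT` rule set, the event fields and the two in-memory containers are all assumptions for illustration. A real platform would use a stream processor and a bulk store instead.

```python
from collections import deque

# Hypothetical rules: which (source, use_case) pairs need immediate action.
URGENT = {
    ("stock_ticks", "trading"),
    ("social_media", "crisis_response"),
}

realtime_queue = deque()   # events that must be acted on right away
batch_buffer = []          # events analyzed later; eventual consistency is fine here


def ingest(event: dict) -> str:
    """Route an incoming event to the real-time or the batch path."""
    key = (event["source"], event["use_case"])
    if key in URGENT:
        realtime_queue.append(event)
        return "realtime"
    batch_buffer.append(event)
    return "batch"
```

Note that the same social media feed can end up on either path, depending on what the data is being used for.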

4. Platform should support different analytics requirements

Speed of response, type of analysis, degree of precision etc. differ in each big data use case for analytics. A platform needs to be able to deal with all of these variations.

5. Platform should be able to evolve as technology improves

As better techniques, technology etc. come up, the platform should be able to make use of them without disrupting users. This is especially true for big data given the speed at which innovation is happening in hardware, software and academia. It is a non-trivial challenge – and the primary reason I believe that big data and cloud need to converge quickly.

6. Platform should have a mix of commodity resiliency and enterprise resiliency

Some parts of data need high availability and disaster recovery (say, billing data), but some others might not need it (say, click streams). So the platform should be able to provide appropriate resiliency according to the use case. HA and DR are not enough – similar principles apply to security, encryption etc.
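One way to picture this is a per-dataset policy table. The sketch below is only an assumption about how such a thing could look. The profile names, the fields and the replica counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResiliencyPolicy:
    replicas: int              # how many copies of the data are kept
    disaster_recovery: bool    # whether a copy is kept at a second site
    encrypted_at_rest: bool


# Hypothetical profiles; the values are illustrative, not recommendations.
ENTERPRISE = ResiliencyPolicy(replicas=3, disaster_recovery=True, encrypted_at_rest=True)
COMMODITY = ResiliencyPolicy(replicas=1, disaster_recovery=False, encrypted_at_rest=False)

POLICIES = {
    "billing": ENTERPRISE,
    "clickstream": COMMODITY,
}


def policy_for(dataset: str) -> ResiliencyPolicy:
    """Look up a dataset's policy; unknown datasets get the stricter profile."""
    return POLICIES.get(dataset, ENTERPRISE)
```

Defaulting unknown data to the stricter profile is a design choice I made for the sketch: it costs more, but nothing important goes unprotected by accident.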

7. Platforms should allow both read and write operations in an optimized fashion

When people think of big data – they mostly think of the read part, as in analytics. While this is close to reality, we should not forget that analysis is useful only when we can act on it. And acting on it usually needs the platform to do some writes as well. This should be accomplished without forcing the user to jump from one application to another.

8. Platforms should enable ease of building applications and extensions

All platforms should have this developer friendliness in mind – but when it comes to a big data platform, it is not just technology friendliness that will cut it. These platforms also need to be data scientist friendly. While there is some overlap between technology developers and data scientists today – for the most part, these are distinct skills now and will take time to converge.

Of course it is not an exhaustive list, but hopefully I have hit most of the important aspects. So, let’s move on to Applications.

Applications are the make or break of adoption. Applications are what users touch and feel and relate to. And hence, for big data to catch on – we do need to shield the majority of users from the complexity of the platform side.

1. The characteristics that make a good app do not change just because an app is built on a big data platform.

2. Apps should aim to provide precision and context – not one or the other. For example, you need to know exactly how much is the amount to be collected from a customer for a sale. But this needs to be put in the context of other useful information like historical payment behavior, other large deals pending with the customer, social sentiment about the customer and so on.

3. Apps should be extensible as the business environment evolves, just as the platform should evolve when technology changes. This is also the main reason why big data needs both platform and applications and not one or the other.

4. Apps should be easy to deploy and consume. If big data eventually does not catch on – my bet will be on deployment difficulties as the root cause. And of course it is yet another reason why I like the idea of big data and cloud converging.

Ok, so that was way more than what I wanted to blog. But two back-to-back meetings got cancelled and I just took the liberty to make full use of that 🙂

What on earth do you mean by CONTEXT?


I have to thank Frank Scavo for making me think harder about what context means. I and several people I know use the term liberally, and perhaps not very consistently.

Here is my hypothesis –

The answer to every question has a core (which has great precision) and a context (less precise, but without it the core cannot be meaningfully interpreted).

1. Additional questions may be needed to get context

If all I ask you on the phone is “should I turn right or left to reach your office”, you probably will ask me something in return like “are you coming from north or south”. Right or left is a precise answer, but what is on my right might be on your left. Without this extra information – you cannot help me with a precise answer.

2. You can infer all or part of the context from historical information

Maybe you know from your morning commute that I could never be driving from the south side on that street, given that side of the road is blocked for construction. So you can give me a precise left or right answer without asking me anything further.

3. Context can change with time

Perhaps turning right will be the shortest distance to your office, yet you might ask me to turn left since you know the rush hour traffic going on now will slow me down. If I had asked you two hours later – you could have given me the exact opposite answer, and still be correct.

4. Multiple things together might be needed to provide context

It is very seldom that one extra bit of information is all you need to make a determination. When I called you during rush hour, if it was raining – you might have asked me to take a left turn so that I will get covered parking and a shuttle to ride to your office. On a sunny day, you could have pointed me to an open lot from where I could have walked a short distance to reach you.

5. Context is progressively determined

As the number of influencing factors increases – you have to determine trade-offs progressively to arrive at a useful context. You might know exactly all the right questions to ask to give me the best answer, but if you were pressed for time – you could have told me an answer without considering the entire context. It would have been precise, but probably of limited use to me.

6. Context is user dependent

If I reached your assistant instead of you, she probably would need a whole different context to be provided before she could tell me which way to turn. She might have never taken the route you take to work, and hence might not have seen that southbound traffic is closed off. She might not have realized it is raining outside, given she was in meetings all day.

If I am your vendor and you know I am coming there to make a pitch that you have limited interest in – you probably won’t think through all the contextual information. If I am your customer – maybe you will go out of your way to tell me not just to turn right, but also that the particular turn comes 100 yards from the big grocery store I will find on my right.

7. More information does not always lead to better context

If I overloaded you with information – you probably could not have figured out all the trade-offs in the few seconds you have before responding. Your best answer might not be optimal. And if you take very long to respond, I might pass the place to make the turn and then have to track back – making it needlessly harder for both of us.

8. Context may be more useful than precision

Instead of giving me a precise left or right answer, you might tell me to park in front of the big train station and wait for your company shuttle to pick me up. That was not the precise answer to my question – but it still was more useful to me.

This was just a simple question with only two possibilities as precise answers. Think of a question in a business scenario. “How are our top customers doing?” is a common question that you can hear at a company. However, you can’t answer that question in any meaningful way without plenty of context.

The eventual precise answer is “good” or “bad”. What makes the question difficult is that it could mean a lot of different things.
1. What is a top customer? Most volume? Most sales? Most profit? Longest history with the company? Most visible in the industry? Largest market cap?
2. Who is asking? The CMO and the CFO might not have the same idea of what makes a top customer.
3. How many should you consider as top customers amongst all your customers?
And so on.
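The three questions above map quite directly onto parameters an information system would have to collect before it can answer. Here is a small Python sketch of that. The customer records are made-up data, and the role-to-metric mapping in `DEFAULT_METRIC` is purely my assumption about what a CMO or CFO might mean by “top”.

```python
# Made-up customer records, for illustration only.
customers = [
    {"name": "Acme", "sales": 900, "profit": 50, "years": 2},
    {"name": "Globex", "sales": 400, "profit": 120, "years": 10},
    {"name": "Initech", "sales": 650, "profit": 90, "years": 5},
]

# Hypothetical mapping: which measure each role means by "top".
DEFAULT_METRIC = {"CMO": "sales", "CFO": "profit"}


def top_customers(customers, asked_by, how_many=2, metric=None):
    """Rank customers by the metric implied by who is asking, unless one is given."""
    metric = metric or DEFAULT_METRIC[asked_by]
    ranked = sorted(customers, key=lambda c: c[metric], reverse=True)
    return [c["name"] for c in ranked[:how_many]]
```

With this data, the CMO's top two are Acme and Initech, while the CFO's top two are Globex and Initech. Same question, same data, two different and equally “precise” answers.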

Information systems in the majority of companies do not have the ability to collect the context of a question. And hence they may or may not give useful answers without a human user doing most of the thinking and combining various “precise” answers to find a “useful” answer.

That is a long winded way of saying “context is what makes precision useful”.

Ok, I am done – let me know if this makes any sense at all, and more importantly whether it resonates with your idea of what context means.

Should “talent” move on to manager/leader roles?


My pal Chris Paine wrote an excellent post http://www.wombling.com/hr/should-you-be-a-manager/ which was in response to a rant on “talent” I posted a couple of days ago. He definitely got me thinking on two counts. Why did my post get more attention than usual? And should “talent” become managers?

For the first part, I think I have a simple answer. It has nothing to do with the brilliance of my writing – several people have similar views and frustrations about “man vs machine”, and hence identified with the post.

I do not consider myself “talent” – I am convinced the world at large won’t miss me if I disappeared from the face of the earth tomorrow. My family (and dogs) will, my close friends probably will – but that is about it. My employer (and past employers too, if I had continued to work there) will carry on with hardly a roadblock. I am not special – and I generally think the vast majority of us are not that special when it comes to the context of jobs/careers. However, I do know many at my current job, and in most past jobs, who will qualify as “special”. They will be missed, even if they can be replaced.

The fact that they will be missed should not be confused with them being irreplaceable. Most people can and should be replaced. And this is by far the criterion that differentiates “talent” that can lead from those that cannot.

As individual contributors – it is a no-brainer what talent can do, and what they want to do. But what makes them successful as managers and leaders is very different. As a leader, they have to learn to let go of a lot of things that made them special as an individual contributor. That is not easy. In fact it is pretty darn hard.

As a leader, you cannot let go of your primary skills, for two reasons:
1. You might well have to switch back to being an individual contributor because you cannot stand being a manager, even if you are good at it.
2. To hire and retain talent, you need to be at a certain intellectual level to relate to them.

However, you cannot let your own ideas take center stage and shadow the ideas of your team. You need a heightened sense of self awareness to pull this off. And you have to deal with your own frustrations and your team’s frustrations while shielding the team and organization from each other. Did I say it is pretty darned hard?

And like every other leader – you have to be a cheerleader for your team, you need to be their biggest fan, you need to do their PR, you occasionally will even need to be their mom if the situation warrants it, and of course you will need to kick their butts too as needed.

Remember I mentioned being replaceable? That is key – mark my words. If you don’t have a trusted wing-man, you are doomed. You won’t go any place worth going. What is worse – you will spend the rest of your life bitching and moaning that your strategy is perfect, but there is no one to execute. I have no sympathy for such leaders. It is almost always something they can change. Your job does not stop after defining strategy – you need to make sure your team has someone who will drive execution, if you are not going to drive execution yourself. And you should constantly be looking for ways to remove obstacles for the people driving execution. A strategy that cannot be successfully executed is a bad strategy. This is true whether you are the leader of the lowest level team in the food chain, or if you are the person running the company itself.

A common pitfall for “talent who choose to be managers” is that they value loyalty way over performance. This is the one area where I think non-talent managers have a slight advantage. I have a hypothesis about this. Talent can out-think others by a few moves. So when they make a decision that they have to explain to others – either they need a team that is on their intellectual wavelength, or they need a team that is super loyal to them and will rush to conquer the hill without questioning. If neither condition is met, it will frustrate the leader to no end. And since humans like the path of least resistance – they just tend to value loyalty more than performance. Over time, they inadvertently surround themselves with people who won’t stand up to them. From that point – it is a high speed race to the bottom. This is plain to everyone else, who keep away to avoid becoming roadkill, but not to the leader and his immediate team.

Absolute power corrupts absolutely – and when that power is vested with “talent who chose to be managers” – the evil effect is a few multiples stronger.

Does that mean talent should never become managers? No – I do think the world can use some more “talent turned manager” people. Why? Because when they put their hearts into it – we will see a type of management/leadership excellence that is not commonly seen. There are a few things that could enhance their chance of success.
1. Have a support system in place that will challenge them on merit, not loyalty.
2. Let them experience management sooner in their careers – it takes time to develop people management skills. Leading 5 people is not the same as leading 5000 people through many managers reporting to you. Don’t let them wake up one day at the deep end – ease them in.
3. Along the way, keep the option to return to individual contributor roles if management does not work out.

That’s it – what do you think, Chris? Over to you!

PS: After a long time, I typed a blog on my PC – not iPhone 🙂