Will open source win over proprietary LLMs for enterprises?


As always – these are just my personal thoughts.

One of the questions most of my clients are debating is whether open source LLMs will ever catch up with and overtake the closed source models. It is a question worthy of some debate given there are serious investment decisions to be made and long term implications of all sorts to be considered.

GPT-4 is considered state of the art on code generation, language understanding and other important things where there are decent benchmarks – and it’s closed source. That’s probably the big reason why there is even a debate on whether companies should bet on open source at all.

My current thinking is that open source will win the enterprise in the fairly near future.

Consumer and enterprise requirements are very different. A good example is my own preferences. For my day job – I have been using a MacBook for a long time. It does everything I need – and it is easy to switch to watching a YouTube video or playing a game. I also code – though no longer for a living. I have a marked preference for Linux in enterprise applications (which is what I focus on). I see no conflict between using macOS and Linux for two different needs that I have. I am not sure how well the analogy extends – but I think the enterprise world will not make it a requirement to standardize on whatever becomes the standard in the consumer world.

One big limitation today is cost – which is largely driven by the GPU hunger of training these models. Not a lot of companies will spend $100M on training a model. But I have to imagine that two things will happen:

1. The cost of compute will come down rapidly, making it affordable for open source models to catch up

2. AI research will figure out a less expensive way to train models, which in turn makes it viable for open source models to compete fair and square

In any case – there are companies like Meta in the open source camp who have the ability to spend $100M on their models.

Also – assuming the near future is largely about sticking to the transformer architecture but making much bigger models, I think LLMs will hit a plateau anyway. The kind of historical jumps in capability that we saw until GPT-4 came out – it’s hard to imagine that continuing for autoregressive LLMs perpetually into the future. That’s another reason why I think open source will catch up just fine.

The enterprise world is full of narrow use cases. What’s important there are things like accuracy, compliance, security and so on. While enterprises will absolutely benefit from multi-modal AI – there is plenty of value that can be extracted with just text based models. Given that – I would think that an open source model that is fine tuned (and most probably with RAG) is the more viable approach for most companies. Also, given GDPR-like requirements – they will need workarounds for how to implement things like the “right to be forgotten”, “data localization” etc.
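To make the fine-tune-plus-RAG idea concrete, here is a minimal sketch of the retrieve-then-generate pattern. Everything in it is illustrative: retrieval is naive keyword overlap rather than embeddings with a vector store, the documents are made up, and in a real system the assembled prompt would be sent to an LLM.

```python
# Minimal sketch of the retrieve-then-generate (RAG) pattern.
# Retrieval here is naive keyword overlap; a real system would use
# embeddings and a vector store, and the prompt would go to an LLM.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model's answer in the retrieved enterprise documents."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

# Hypothetical enterprise policy snippets standing in for a document store.
docs = [
    "Expense reports must be filed within 30 days.",
    "The VPN requires two-factor authentication.",
    "Annual leave requests go through the HR portal.",
]
query = "How do I file an expense report?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

The appeal for enterprises is that the model itself stays generic while the grounding data stays inside the company – which is also where GDPR workarounds live, since deleting a document from the store is far easier than removing it from model weights.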

So what is the future of closed source models in the enterprise?

I can imagine enterprise software vendors creating their own models from scratch at some point, mostly for regulatory reasons, and then for business model reasons – they might choose to keep them closed source. That’s the same pattern as, say, a large bank creating a purpose built model from scratch. When compute becomes cheap – a lot more experiments will happen, and it will be fascinating to see what models emerge.

I think the debate about open vs proprietary models will be settled within the next year or two – and as of now I am betting on open source being the winner. But that might not even be the biggest issue for enterprises.

I think the biggest problem will be scaling LLM based solutions for an enterprise use case. A POC is easy enough to do – including fine tuning and RAG. But a 4 week POC usually cannot be scaled to an enterprise use case in a straightforward way. I need to organize my thoughts on that – but I plan to write about it soon.

How do you handle stress at work?


I spoke with about 50 colleagues yesterday – quick calls to thank them for all they do and to wish them happy holidays. In many ways, 2023 was a stressful year for many of us and it came up in various forms in our conversations. Some of them asked me how I handle work pressure. After a couple of coffees in the morning, I thought I should offer some perspective on this topic.

1. Have safety valves – do not stew in pressure

You need some proven ways – note the plural; one might not be enough – to destress. For me those are long walks, calling one of my trusted friends, and Carnatic music. There is a six-mile circuit that I reserve for such walks and it invariably helps calm me down when I need it.

2. Set boundary conditions when you are constrained on time for high-risk decisions

I have a mental model that tags the consequences of my decisions two ways – what’s the chance of the risk happening and what’s the impact if it happens. Stress gets in the way only on decisions where the risks have a high chance of materializing and the impact is high if it does play out.

Pressure is inevitable when you have to make such decisions with imperfect information within limited time. If I have to take such a decision – I insist on reasonable boundary conditions like “ok let’s do this now – but here is how we will check if it works and we will stop in a month if it doesn’t trend like our hypothesis”

3. Form follows function

I prefer a rigorous debate over a well structured document as the basis of a decision. I do like documenting decisions once we make them, though.

Similarly – some metrics have a way of hurting decisions as much as they help. People lose sight of the principle behind the metric and sometimes become slaves to ratios and the like when making decisions. So I prefer asking a lot of first principles based questions before I make decisions – and it helps minimize stress because it makes everyone involved think logically.

4. WHY is the critical question, WHAT will follow

Leaders create stress unintentionally when they ask the team something without explaining why they are asking. The more senior you are – the more the risk of you creating unnecessary stress in the organization.

If you have taken the time to hire, train and manage performance of your team – you should have the confidence that if you explain a problem, your team can find solutions. Your job is to find the right question to ask and to explain why solving it is important. If you don’t have the confidence in your team – shift left and solve for the quality of your team as quickly as you can.

5. Eliminate and simplify

Not all problems need a solution with high quality and they don’t all warrant similar effort. Even a given problem can usually be decomposed to find what’s the critical part and what can wait. Eliminating the noise and simplifying the problem statement goes a long way in eliminating stress.

Don’t assume that the person asking has thought through all aspects before asking you. So you should feel free to validate whether what you think is the crux of the issue is what they think too. I am always grateful when I am challenged by my team when I ask a question. I am very comfortable asking clarifying questions to my boss as well.

6. Know the audience when you communicate

Even the best decisions can lead to additional stress if you don’t think through the answer from the recipient’s point of view. If I have to convey the same information to both the executives in my team and to new hires – I often will need to say it differently. Similarly, the response to a client might look different when it is addressed to the buyer vs someone who is the actual user.

I am convinced that more stress is caused by poor communication than poor decisions.

7. Keep shifting left and make as many things a routine as possible

I used to train dogs for high-level competition. Before we started training – my dog and I would go through a certain routine that got us both into the right frame of mind. Routine helps minimize stress and keeps us focused. Elite athletes all have routines they follow.

I have routines in my personal life – I wake up at the same time most days, make an espresso, do a two minute training session with my dog, solve the daily Wordle and then go for my walk. It helps me get into the right frame of mind for the rest of the day.

At work – processes don’t die a natural death. That’s a real curse. They don’t even evolve very much, and people start hiding behind processes as a safety blanket. Even when they criticise these “bureaucratic processes” – they take it for granted that they are sacrosanct. A massive amount of stress happens because people don’t kill irrelevant processes. Your job as a leader is to make sure that every process gets critically evaluated, and to eliminate or work around the ones that don’t add value.

Similarly – the moment you have a stable solution, make it a routine so that people don’t need to waste their time thinking about it deeply every time. All I would caution against is “premature optimization” kicking in. Especially in large firms – there is a tendency to institute process upfront in a top down way. It almost never works – the better idea is to experiment, evolve and then standardize the process.

The shift left often ends up with hard questions about your skills as a leader and whether you have the right team around you. Don’t sit and stew on those – make changes when you are convinced that there is a problem. Bad news doesn’t turn for the better with time without interventions.

8. Learn from everyone around you

There is no monopoly on good ideas. If the marketing team has a good process for recruiting – shamelessly steal it and adapt it for engineering teams. It is much better for one person to solve it and others to adapt than for all functional heads to stress over solving common problems from scratch.

I am sure there are more things I should call out – but it’s time to drive to the gym, so I am not going to stress over the rest for now 🙂

Why I don’t worry about AGI … yet


The recent OpenAI drama triggered two debates on social media – one on corporate governance and the other on AGI. I was quite surprised – and amused – by the number of people who have jumped to the conclusion that AGI is already here or very close to being here.

I don’t think AGI is a near term thing at all. Also, to be clear – I am a big fan of AI, but I don’t think at all that AI needs to work exactly like a human (or better than a human) to be of massive value to our everyday life. Similarly, I don’t think we should sit around waiting for AGI to put some safeguards in place – less sophisticated AI still has massive potential to cause harm because of the ease of global distribution of software.

There are a few reasons why I don’t think we will get to AGI by doing more of what we already do – like building bigger foundational models, using even more compute, having even more training data and so on.

To begin with – the basic idea of building an AI solution is to feed it a lot of data. For example – for language models, training on all of Wikipedia is a common first step. And that’s not nearly enough – on top of it, these models are fed millions more tokens. Compare that to how a well educated human learns – no one reads the entire Wikipedia to get a PhD. Humans learn from a small amount of data. A high school English teacher teaches critical thinking and analytical writing often based on just one book. We can then extend that to every other source of information we encounter later without needing explicit lessons. When we read a new book – we don’t need to think through every book we have read to form concepts. We are far more efficient learners than machines – but the way machines are taught doesn’t mimic how humans learn.

One counterargument is that a machine has a cold start while a human has the advantage of a long evolutionary history, and hence some information is already present in our genes/brains. But even if that’s true – humans still didn’t have access to as much information as the machine readily has. Basically – we assimilate and store information differently from machines, and access it differently when we need it.

Humans can get started quickly with very little information. My daughter when she was three years old could recognize animals at the zoo based on the cartoons she had watched. She never confused a bear for something else because the red shirt on Winnie the Pooh was missing on the live bear 🙂 . She knew dogs and cats are animals – and naturally figured out that elephants and lions are animals too.

Also, humans can abstract information across modalities without special training. Whether I see a sketch of a car, an actual car parked on the street or a car moving in a high speed chase in a movie – I know it’s a car and how it generally works. When I throw a ball up and it comes down – I can relate to the concept of gravity from my middle school lesson even though the example used was of an apple falling on Newton’s head. GenAI has started becoming multi-modal – but not in the way humans are. This is of course a simplistic way of looking at how a human thinks and acts – we have not yet quite figured out the details of how human brains work.

How do we find answers when we are faced with a question? Let’s say you ask me what’s 121 squared. I don’t know it off the top of my head – but I know how to calculate it, and I also know how to approximate it without a precise calculation. But if you ask me what’s 12 squared, I already know it off the top of my head. AI only knows the latter way as far as I can tell. An orchestration of several computing techniques could potentially solve these kinds of problems – but learning from a sequence of tokens alone probably won’t get us there.
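The recall-versus-compute distinction above can be sketched in a few lines. This is only an illustration of the idea – a memorized table for the small squares, a step-by-step procedure for the rest, and a rough estimate when precision isn’t needed – not a claim about how any model is built.

```python
# Illustration of the lookup-vs-compute distinction:
# small squares are "just known" (a memorized table), larger ones are
# derived by a procedure, and an estimate can stand in for precision.

MEMORIZED = {n: n * n for n in range(1, 13)}  # times tables up to 12

def square(n: int) -> int:
    if n in MEMORIZED:  # recall: 12 squared is simply remembered
        return MEMORIZED[n]
    # compute: (a + b)^2 = a^2 + 2ab + b^2, e.g. 121 = 120 + 1
    a, b = (n // 10) * 10, n % 10
    return a * a + 2 * a * b + b * b

def approx_square(n: int) -> int:
    """Round to the nearest ten and square that - a quick estimate."""
    r = round(n, -1)
    return r * r

print(square(12))          # 144, straight from the memorized table
print(square(121))         # 14641, derived step by step
print(approx_square(121))  # 14400, close enough without calculating
```

A human flips between all three modes effortlessly; the point of the paragraph is that a model trained purely on token sequences has no native equivalent of choosing a procedure over a memorized answer.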

One last point on what “general” means in the context of intelligence. There are some things that a computer can do faster and more efficiently than a human can. If we can draw a boundary around the problem – like a game such as chess or Go – a computer has a higher chance of figuring out optimal answers than we do.

Where humans excel is in generalizing as context changes. As AI research makes breakthroughs in how machines plan, set goals and think about objectives – I am sure we will see massive advances. And at that point – perhaps AGI might become more of a reality. I am not an AI researcher – I am just a curious observer. I will happily change my mind as I get more information. But for now – I am not worried about AGI becoming a thing in the near future.