As always – these are just my personal thoughts.
One of the questions most of my clients are debating is whether open source LLMs will ever catch up with and overtake the closed source models. It is a question worthy of some debate given there are serious investment decisions to be made and long-term implications of all sorts to be considered.
GPT-4 is considered state of the art on code generation, language understanding and other important things where there are decent benchmarks – and it’s closed source. That’s probably the big reason why there is even a debate on whether companies should bet on open source at all.
My current thinking is that open source will win the enterprise in the fairly near future.
Consumer and enterprise requirements are very different. My own preferences are a good example. For my day job – I have been using a MacBook for a long time. It does everything I need – and it is easy to switch to watching a YouTube video or playing a game. I also code – though no longer for a living. I have a marked preference for Linux in enterprise applications (which is what I focus on). I see no conflict between using macOS and Linux for two different needs that I have. I am not sure how well the analogy extends – but I think the enterprise world will not make it a requirement to standardize on whatever becomes the standard in the consumer world.
One big limitation today is cost – which is largely about the GPU hunger of training these models. Not a lot of companies will spend $100M on training a model. But I have to imagine that two things will happen:
1. The cost of compute will come down rapidly, making it affordable for open source models to catch up
2. AI research will figure out less expensive ways to train models, which in turn will make it viable for open source models to compete fair and square
In any case – there are companies like Meta in the open source camp that have the ability to spend $100M on their models.
Also – assuming the near future is largely about sticking to the transformer architecture but making much bigger models, I think LLMs will hit a plateau anyway. The kind of historical jumps in capability that we saw until GPT-4 came out – it’s hard to imagine that continuing for autoregressive LLMs perpetually into the future. That’s another reason why I think open source will catch up just fine.
The enterprise world is one full of narrow use cases. What’s important there are things like accuracy, compliance, security and so on. While enterprises will absolutely benefit from multimodal AI – there is plenty of value that can be extracted with just text-based models. Given that – I would think that an open source model that is fine-tuned (and most probably combined with RAG) is the more viable approach for most companies. Also, given GDPR-like requirements, they will need workarounds for things like the “right to be forgotten”, data localization and so on.
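To make that concrete, here is a minimal sketch of what that fine-tuned-plus-RAG pattern could look like – assuming a small open source embedding model from sentence-transformers for retrieval, a handful of made-up documents, and a locally hosted open source LLM behind a call that I leave out. It’s a sketch of the pattern, not a real implementation.

```python
# Minimal sketch of the "open source model + RAG" pattern described above.
# Assumptions (illustrative only): sentence-transformers for embeddings, an
# in-memory document store, and a fine-tuned open source LLM hosted in-house
# behind a generation call that is left out of scope here.
from sentence_transformers import SentenceTransformer, util

# Enterprise documents stay inside the company's own infrastructure, which is
# what makes GDPR-style controls (erasure, localization) easier to reason about.
documents = [
    "Refunds are processed within 14 days of the return being received.",
    "Data residency: EU customer records are stored in Frankfurt only.",
    "Support tickets are escalated after 48 hours without a response.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small open source embedding model
doc_vectors = embedder.encode(documents, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the question."""
    q_vec = embedder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_vec, doc_vectors)[0]
    ranked = scores.argsort(descending=True)[:top_k]
    return [documents[int(i)] for i in ranked]

def build_prompt(question: str) -> str:
    """Ground the model's answer in retrieved company documents."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# The prompt would then go to the fine-tuned open source LLM running in-house;
# that inference call is the piece deliberately omitted from this sketch.
print(build_prompt("Where are EU customer records stored?"))
```

The point of the pattern is that the proprietary data lives in your own retrieval store – so something like the “right to be forgotten” largely becomes a delete in that store rather than a retraining exercise.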
So what is the future of closed source models in the enterprise?
I can imagine enterprise software vendors creating their own models from scratch at some point – mostly for regulatory reasons – and then, for business model reasons, choosing to keep them closed source. That’s the same pattern as, say, a large bank creating a purpose-built model from scratch. When compute becomes cheap, a lot more experiments will happen and it will be fascinating to see what models emerge.
I think the debate about open vs proprietary models will be settled within the next year or two – and as of now I am betting on open source being the winner. But that might not even be the biggest issue for enterprises.
I think the biggest problem will be scaling LLM-based solutions for enterprise use cases. A POC is easy enough to do – including fine-tuning and RAG. But a 4-week POC usually cannot be scaled to an enterprise use case in a straightforward way. I need to organize my thoughts on that – but I plan to write about it hopefully soon.
P.S. I should probably have clarified that the “guardrails” don’t necessarily keep dangerous/unwanted content out of those systems; rather, they make it more difficult (though alas not impossible) for that content to be surfaced in responses and generated content.
Vijay, good stuff and important questions. I tend to agree with you about the promise of open source, but there are some caveats. A big one: will open source AI be able to keep up with a more stringent regulatory environment? Early versions of the EU AI Act raised some big concerns there, re: would only the deepest pockets be able to keep up with regulations? We’ll see. Of course some supposedly “open source” AI like Meta’s Llama has pretty damn deep pockets to keep up.
On the other hand I’d like to see a semantic debate on what constitutes “open source AI.” There is heated debate on whether Llama can be considered open source. In my view without some transparency into how these models are trained, how can they be considered open source? (Meta now refuses to explain how the latest version of Llama (Llama2) was trained, for example – is that a viable position for open source?) And can these open source LLMs be used without fear of the same type of lawsuits OpenAI and Microsoft are getting hit with? If you don’t know what’s in the training data, your legal team might not be too thrilled.
I think open source LLMs will benefit from the particular strengths they bring to specific use cases. Example: Hugging Face’s Bloom has been trained on 46 languages, and other LLMs have other specialty characteristics. I think the big open source appeal is to companies that have enough AI/data science/IT skills to embed open source AI into their own apps/processes. This is where open source licensing is very appealing.
But I think many companies will start by getting their gen AI from software vendors – either the familiar names or new startups. In those cases the appeal is that you just need clean data, to be infused via RAG or other approaches. For example, I did a use case with a smaller customer bot startup that is live in a production setting with a customer-facing bot. Their architecture allows them to choose between a number of LLMs with different strengths for different use cases. The customer doesn’t have to worry about that aspect. Other large enterprise software vendors have taken similar approaches. This means fewer concerns over open source and any questions on IP or liability – leave that to the external vendor 🙂
One final aspect is that not all these open source LLMs have the same “guardrails” as ChatGPT – the imperfect, reinforcement learning-based guardrails that keep hateful and dangerous content out of those systems. Enterprises will need to be careful when evaluating open source LLMs that may not have such guardrails in place. Those guardrails are imperfect and sometimes counterproductive, but it’s a point to be aware of. At any rate I think you are right that open source LLMs will be a major factor.