If the hype is to be believed, GenAI is the answer to every question these days, isn’t it? We use it liberally every day even though we know that hallucinations are largely unavoidable.
Two of the most popular use cases in my line of work are:
1. Code generation
2. Text summarization
None of us who work with GenAI position it as an “autopilot”. We know it needs humans in charge. That’s why GenAI solutions are usually called “copilots” and “assistants”: they have fundamental accuracy and reliability issues that humans need to catch, correct, and sometimes just ignore. A model could generate code that is syntactically correct but doesn’t do what it was intended to do. It could write a creative summary that makes up invalid data, like nonexistent citations.
To be fair, humans make up stuff all the time too. We often act without thinking things through in any detail, but if asked to explain our actions, we can usually come up with a convincing explanation. For example, I just returned from my walk. I could have taken a dozen different routes today, and I have no clue why I chose the one I did. But if you asked me, I could come up with a decent answer like “this is the route that has the least traffic at this time of day”. That answer is believable to most people, and they have no reason to suspect that I made it up after the fact.
If someone wanted to spend the time, it would be relatively straightforward to verify whether my explanation was factual, and it would take far less creativity to check it than it took me to invent it.
If we extend this concept back to GenAI, it’s not a stretch to see how it’s a lot more efficient (and quite valuable) to use these tools to validate code and text summaries than to create them in the first place. It takes hardly any creativity to check whether a citation is valid, compared to fabricating a fake citation that comes across as realistic. Similarly, it’s a lot easier to create a comprehensive set of test cases for a given code base than to write the best code to solve a given problem.
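To make the asymmetry concrete, here is a minimal sketch of what checking a citation deterministically can look like. It assumes citations carry DOIs and queries the doi.org handle lookup API; the function name and regex are illustrative, not a prescribed method:

```python
import json
import re
import urllib.error
import urllib.request

# Well-formed DOIs start with "10.", a registrant code, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def citation_resolves(doi: str, timeout: float = 5.0) -> bool:
    """Return True if the DOI looks well-formed and is registered.

    Validation here is a bounded, deterministic task: match a known
    pattern, then ask the resolver's lookup API. No creativity needed.
    """
    if not DOI_PATTERN.match(doi):
        return False
    url = f"https://doi.org/api/handles/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
        return data.get("responseCode") == 1  # 1 means the handle exists
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False

print(citation_resolves("10.1038/nature14539"))   # a real DOI should resolve
print(citation_resolves("10.9999/totally.fake"))  # a fabricated one should not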
When I explained this idea to a friend last week in India, the pushback I got was that he didn’t think a system that does a less-than-stellar job of creating code could be that good at testing it.
I think this lack of trust is a bit misplaced, for two reasons:
1. Let’s say we are building a plane with the best engineers on the planet, half of them doing the build and the other half doing the testing. Would we be satisfied with only build-qualified engineers as testers? Or would we ask for engineers who are test specialists? And in any case, would we trust the plane until a pilot actually flies it? The ability to build a great plane doesn’t translate directly into the ability to test it thoroughly, or vice versa. And there is no necessary constraint that the same person be an expert in both.
2. AI is a lot more useful when boundary conditions are known, which is the case when all you need to do is validate specific things. In fact, you can use plenty of deterministic techniques, and generally improve computing efficiency, when the problem has specific boundaries. (See the sketch after this list.)
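As one illustration of validation within known boundaries, here is a minimal sketch that exhaustively checks a sort routine (a stand-in for model-generated code) over a small, bounded input domain. The function names and the choice of domain are mine, purely for illustration:

```python
import itertools

def generated_sort(xs):
    """Stand-in for code that a copilot might have produced."""
    return sorted(xs)  # pretend this body was model-generated

def validate_sort(fn, max_len=6, domain=(-2, -1, 0, 1, 2)):
    """Exhaustively check fn on every input up to max_len.

    Bounded inputs make the check deterministic and complete: the
    properties (same elements, ordered output) either hold across
    the whole domain or we get a concrete counterexample.
    """
    for n in range(max_len + 1):
        for xs in itertools.product(domain, repeat=n):
            out = fn(list(xs))
            if sorted(out) != sorted(xs):                 # same multiset?
                return f"element mismatch on {list(xs)}"
            if any(a > b for a, b in zip(out, out[1:])):  # ordered?
                return f"not sorted on {list(xs)}"
    return f"ok: all inputs up to length {max_len} pass"

print(validate_sort(generated_sort))
```

Within the stated boundaries the check is complete: no judgment calls, no hallucinations, just a pass or a counterexample.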
I absolutely think that over time GenAI will largely overcome its deficiencies in accuracy and reliability. But we don’t have to wait for that to make it useful, and validation use cases might be one such high-value pattern.
I am curious to hear your thoughts on this. Please leave a comment if you can.