I spent half this week at CES 2017 in Las Vegas!
To say the least, it puts the “enterprise” side shows to shame in the number of people it attracts, the variety of solutions it offers, and how boldly it thinks about the future. It did not take long to see that the future is all about AI – and how expansive the definition of AI has become.
There were bots of all flavors there – but voice was the major interaction medium, and it was hard to walk the floor without hearing “hey Alexa” type conversations. I also noticed a lot of VR and AR. I walked away thinking voice will rule the consumer world for a while, and between VR and AR, I will bet on AR having more widespread use. While VR-based video games are indeed cool, having to put something on your head to use technology makes me wonder how many people will actually use it. Like 3D televisions, which need special glasses – hardly anyone I know uses them.
The generation of products using AI that I saw (admittedly I only saw a small fraction of the HUGE show) barely scratched the surface of what is possible. If I think of what I saw with my engineering hat on, it is something like this:
- Human voice or text waking up the AI service (“hey Jane”)
- A natural language based request (“When is my next meeting?”)
- Voice-to-text translation as needed
- Intent and entity extraction (me, my calendar, current time, read entry)
- Passing it to a structured API (calendar.read) and getting a response
- Converting the output to a string (“your next meeting is in 2 hours with Joe”)
- Text-to-voice translation
- Keeping the context for the next question (“is there a bridge number or should I call Joe’s cell?”)
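The steps above can be sketched as a toy pipeline. Everything here – the `extract_intent` parser, the canned calendar response – is a hypothetical stand-in, not any real assistant SDK:

```python
# Toy sketch of the wake-word -> intent -> API -> response loop.
# All names are hypothetical; a real assistant would use ASR/NLU services.

def extract_intent(utterance: str) -> dict:
    """Very naive intent/entity extraction for one hardcoded request."""
    if "next meeting" in utterance.lower():
        return {"intent": "calendar.read", "entity": "next_meeting"}
    return {"intent": "unknown"}

def call_api(intent: dict) -> dict:
    """Stand-in for a structured API call (e.g. calendar.read)."""
    if intent["intent"] == "calendar.read":
        return {"in_hours": 2, "with": "Joe"}  # canned calendar response
    return {}

def to_text(result: dict) -> str:
    """Convert the structured response back into a spoken string."""
    if result:
        return f"your next meeting is in {result['in_hours']} hours with {result['with']}"
    return "sorry, I did not understand that"

def handle(utterance: str) -> str:
    return to_text(call_api(extract_intent(utterance)))

print(handle("When is my next meeting"))
# -> your next meeting is in 2 hours with Joe
```

The interesting engineering is all hidden inside those three functions – which is exactly where the issues below start to show up.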
This is easy stuff in general – there are plenty of APIs out there, and many are RESTful. You can pass parameters and make them do things – read a calendar, switch a light on, or pay off a credit card. If you are a developer, all you need is imagination to make cool stuff happen. How fun is that!
Well – there are also some issues to take care of. Here are five things I could think of during the hour I spent in the middle seat (last row, next to the toilet) flying from Vegas back home.
Take security, for example – you might not want guests to voice-control every device in your house (which might not be the worst thing they could do, but you know…). Most of the gadgets I saw had very limited security features. In many cases it was also not clear what happens to data security and privacy. A consistent privacy/security layer becomes all the more important for all APIs in an AI-driven world.
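One simple shape such a layer could take is a per-speaker permission check in front of every device command. This is a minimal sketch, assuming speaker identification already happened somewhere upstream; the roles and command names are made up:

```python
# Sketch: per-speaker authorization in front of voice-controlled devices.
# Roles and command names are hypothetical; speaker ID is assumed upstream.

PERMISSIONS = {
    "owner": {"lights.on", "door.unlock", "payments.pay"},
    "guest": {"lights.on"},  # guests can only toggle lights
}

def authorize(speaker_role: str, command: str) -> bool:
    """Allow a command only if the speaker's role grants it."""
    return command in PERMISSIONS.get(speaker_role, set())

assert authorize("guest", "lights.on")
assert not authorize("guest", "door.unlock")  # guests can't unlock doors
```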
Then there is natural language itself. NLP will get commoditized very quickly. Entity and intent extraction are not exactly trivial – but it is largely a solvable problem and will continue to get better. The trouble is that APIs don’t take natural language as input – we still need to pass unstructured>structured>unstructured back and forth to make this work. That is just not elegant – and it is not efficient even when compute becomes negligibly cheap. Not sure how quickly it will happen, but I am betting that commonly used APIs will all have two ways of functioning in the future – an NLP input for human interaction, and a binary input for machine-to-machine interaction (to avoid any need to translate when two machines talk to each other). Perhaps this might even be how the elusive API standardization finally happens 🙂
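A dual-mode API could look something like this sketch – one structured entry point for machines, and an NLP entry point that parses and delegates to it. The function names and the trivial string-matching “parser” are my own illustrations:

```python
# Sketch: one API, two entry points. Names and parsing are hypothetical;
# real NLU would sit behind the NLP entry point.

def read_calendar(user: str, when: str) -> dict:
    """Structured, machine-to-machine entry point."""
    return {"user": user, "when": when, "meeting": "with Joe"}

def read_calendar_nlp(utterance: str) -> dict:
    """NLP entry point: parse the request, then call the structured one."""
    if "next meeting" in utterance.lower():
        return read_calendar(user="me", when="next")
    raise ValueError("could not parse request")

# Machines call read_calendar() directly; humans (via voice) hit
# read_calendar_nlp(), so nothing has to translate twice in between.
```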
If all – or most – APIs have an easy NLP interface, it also becomes easy to interoperate. For example – if I scream “I am hungry” at my fridge, it should be able to find all the APIs behind the scenes, give me some options, place an order, and pay for it. My car or microwave should be able to do the same, and I should not have to hand-code every possible combination. In the future, APIs should be able to use each other as needed, and my entry point should not matter as much in getting the result I need.
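One way to avoid hand-coding every combination is a shared capability registry that any entry point can resolve against. A minimal sketch, with made-up capability names:

```python
# Sketch: a shared capability registry, so the fridge, car, or microwave
# can all resolve "I am hungry" without pairwise integrations.
# Capability names and handlers are hypothetical.

REGISTRY = {}

def register(capability: str, fn):
    """Any API can publish a capability into the shared registry."""
    REGISTRY[capability] = fn

def resolve(capability: str, *args):
    """Any entry point can look up and invoke a capability."""
    return REGISTRY[capability](*args)

register("food.suggest", lambda: ["pizza", "salad"])
register("food.order", lambda dish: f"ordered {dish}")

def handle_hunger():
    # The entry point doesn't matter: this works the same from any device.
    options = resolve("food.suggest")
    return resolve("food.order", options[0])

assert handle_hunger() == "ordered pizza"
```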
Human assistants get better with time. If an executive always flies American, when she tells her assistant to book a flight, the assistant does not ask back every time “which airline do you prefer?” or “should I also book a car service to take you to the meeting when you land?”. None of the virtual assistants – or conversational widgets of any kind – that I saw this week demonstrated any significant “learning” capability. While I might enjoy having a smart device today since it is a big improvement over my normal devices, I will absolutely tire of it if it does not get smarter over time. My fridge should not just be able to order milk – it should learn from all the other smart fridges and take cues from other data like weather. In the future, “learning” should be a standard functionality for all APIs – ideally unsupervised.
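Even the simplest version of “learning” – defaulting to the most frequent past choice instead of asking every time – would be a big step up from what I saw. A toy sketch, with a frequency count standing in for real preference learning:

```python
# Sketch: learn a default from past choices instead of asking every time.
# A frequency count over history is a crude stand-in for real learning.

from collections import Counter
from typing import Optional

class BookingAssistant:
    def __init__(self):
        self.history = Counter()

    def record(self, airline: str):
        """Remember each booking the user actually made."""
        self.history[airline] += 1

    def suggest(self) -> Optional[str]:
        """Default to the most frequently chosen airline, if any."""
        if not self.history:
            return None  # no history yet: the assistant has to ask
        return self.history.most_common(1)[0][0]

a = BookingAssistant()
for airline in ["American", "American", "Delta"]:
    a.record(airline)
assert a.suggest() == "American"  # stops asking after it sees a pattern
```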
The general trend I saw at CES was about “ordering” a machine to do something. No doubt that is cool. What I did not see – and where I think AI could really help – is machines “servicing” humans and other machines. For example – let’s say I scream “I am hungry” at my fridge. The fridge has some food in it that I like, and all I need is to put it in the oven. So the fridge tells the oven to start pre-heating – and gets no response in return! Telling me “the oven is dead” is a good start – but the intelligent fridge should be able to place a service order for the oven, as well as offer me the option to order a pizza to keep me alive for now. In the future, APIs should be able to diagnose (and ideally self-heal) themselves and other APIs – as well as change orchestration when a standard workflow is disrupted.
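That fridge/oven scenario is really just a try/except with a fallback plan. A minimal sketch – the oven, service-order, and pizza calls are all hypothetical stand-ins:

```python
# Sketch: re-orchestrating when a step in the workflow fails.
# The oven, service-order, and pizza APIs are hypothetical stand-ins.

def preheat_oven():
    raise ConnectionError("oven not responding")  # the oven is dead

def file_service_order(device: str) -> str:
    return f"service order filed for {device}"

def order_pizza() -> str:
    return "pizza ordered"

def handle_hunger() -> list:
    """Try the standard workflow; on failure, diagnose and re-plan."""
    actions = []
    try:
        preheat_oven()
        actions.append("heating your leftovers")
    except ConnectionError:
        # Don't just report the failure: fix it AND keep the human fed.
        actions.append(file_service_order("oven"))
        actions.append(order_pizza())
    return actions

print(handle_hunger())
# -> ['service order filed for oven', 'pizza ordered']
```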