Your chatbot is leveling up, but who’s keeping the data up to date for our LLMs?
Cambridge Dictionary’s word of the year has been announced as hallucinate and, no, sadly, it’s not symbolic of the unfathomable rise in the cost of our weekly food shops. AI is here and the chatbot world is advancing – and is doing so faster than you can utter: “Can I speak to a real person?”
Thanks to the charge led by the likes of Google and Microsoft, the days of frustrating, unhelpful answers from old customer-facing bots, with their static, glitchy preprogrammed responses, will soon be gone.
On team Microsoft, OpenAI’s GPT LLMs kick-started a revolution, amplified by Microsoft’s Copilot technology. While other vendors scrambled in the chat boom, Google was quick to follow, bringing its generative AI workflow to the table on its Vertex AI platform with PaLM and Codey.
But, as the word of the year suggests, there is still some way to go before the untrustworthy reputation that has followed AI’s rise to fame is cut adrift and, you guessed it, it’s all down to data – having the right, best-quality data with a reasonable shelf life, and the most mindful tools to cook it up into something digestible for users.
Chatting with bots in real life
For far too long, we’ve been forced into teeth-pulling exchanges with unhelpful, though try-hard, chatbots. But at a recent Chatbot Summit London event, the big vendors gathered to show just how far the technology has come since the days of the outcast Clippy.
Microsoft demonstrated how the cruise line Holland America can create chatbots (aka Power Virtual Agents) with no code in sight, by leveraging generative AI with Copilot guiding the way. The bots automatically generate answers to customer questions, forming responses from the firm’s real-time data sources across its public website and the documents uploaded to its AI capabilities page. Everything from ship plan layouts to the latest destination port activities is available in seconds for end users.
Integrating with custom action plugins, still with no need for any hard coding from Holland America, the bots can even use the GPT-powered conversation engine to build an action plan for making customers a last-minute booking to the Bahamas. It goes as far as looping in a weather plugin to offer up the forecast.
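Under the hood, this kind of plugin orchestration is essentially a tool-calling loop: the conversation engine emits a plan naming which registered action to run and with what arguments, and the platform executes each step. A minimal Python sketch of the pattern (all names here are hypothetical – this is not Microsoft’s actual plugin API):

```python
# Minimal sketch of a plugin-style action plan (hypothetical names,
# not Microsoft's actual Power Virtual Agents / Copilot plugin API).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plugin:
    name: str
    description: str
    run: Callable[[dict], str]

def get_weather(args: dict) -> str:
    # Stand-in for a real weather-service call for the port city.
    return f"Forecast for {args['city']}: sunny, 29C"

def book_cruise(args: dict) -> str:
    # Stand-in for a booking-system integration.
    return f"Held a cabin to {args['destination']} departing {args['date']}"

PLUGINS = {p.name: p for p in (
    Plugin("weather", "Get a destination forecast", get_weather),
    Plugin("booking", "Make a last-minute booking", book_cruise),
)}

def run_action_plan(plan: list[dict]) -> list[str]:
    """Execute each step of the plan the conversation engine produced."""
    return [PLUGINS[step["plugin"]].run(step["args"]) for step in plan]

# A plan the engine might emit for "book me a last-minute trip to the Bahamas":
plan = [
    {"plugin": "booking", "args": {"destination": "the Bahamas", "date": "2024-06-01"}},
    {"plugin": "weather", "args": {"city": "Nassau"}},
]
for result in run_action_plan(plan):
    print(result)
```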
The bot answer changes are automatic, because it’s software as a service, and are customer-facing in minutes. There’s no infrastructure to host – Henry Jammes, Microsoft
Henry Jammes, conversational AI principal PM, Microsoft, says, “There’s no LLM model training needed. There’s no need to try to anticipate every conceivable user question and build a reply for it by hand and there is no need to continue to keep syncing content with your knowledge sources. So, if your store opening hours or return policy change and the website or knowledge source gets updated, the bot answer changes are automatic, because it’s software as a service, and are customer-facing in minutes. There’s no infrastructure to manage or host.
“We actually have companies who have built internal bots in a matter of weeks, just by pointing the bots to their SharePoint sites, uploading their employee handbook, policy manuals. They authored a few topics around the key topics that they wanted to manage. In the future, you can imagine authenticating a user and dynamically expanding the corpus with their own specific contents, like their own insurance policies, their billing statements, contracts, etc. and enable multi-turn chat for your data.”
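What makes the “no syncing” claim work is retrieval-augmented generation: rather than baking knowledge into model weights, the bot fetches the current version of a page or document at answer time, so an updated source means an updated answer with no retraining. A rough Python sketch of the pattern (toy lexical retrieval stands in for the real thing; products like this handle crawling, chunking and ranking internally):

```python
# Rough sketch of retrieval-augmented answering over a small corpus.
# Toy word-overlap retrieval; production systems use embeddings/search.

CORPUS = {
    "returns.html": "Items can be returned within 30 days with a receipt.",
    "hours.html": "Store opening hours: Mon-Sat 9am-6pm, Sun closed.",
}

def retrieve(question: str, corpus: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the question; return the top k."""
    words = set(question.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(question: str) -> str:
    sources = retrieve(question, CORPUS)
    context = "\n".join(f"[{name}] {text}" for name, text in sources)
    # An LLM call would go here, instructed to answer *only* from context,
    # so editing hours.html changes the bot's answer automatically.
    return f"Answering from:\n{context}"

print(answer("What are your opening hours?"))
```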
Microsoft now boasts over 100 Copilots, and Jammes states that within just three years, a third of work experiences will be conversational AI-enabled, with 80 percent of customer interaction fronted by digital assistants and virtual agents. He also estimates that 750 million apps will need to be built by 2025, with 50 percent of digital work expected to be automated using the technology available today.
All you want is not a gibbering chatbot to tell you a ‘step one, step two’, but rather ‘give me the right person, right away to understand’ – Kevin Lee, BT Group
Elsewhere, BT Group’s chief digital officer, Kevin Lee, shares the telecom’s journey to building AIMEE, an in-app messaging and automated assistant powered by generative AI, developed through BT Group’s partnership with Google Cloud and available to its EE customers via Android app and web.
Lee gives the example of how BT Group’s chatbot has developed, growing from a “limited skillset” at its 2021 launch, to digesting over 1.5 million live customer transcripts, to then powering over 3.7 million customer conversations in 2023, with the goal of reaching beyond 400 million in the coming years.
Speaking of one customer request in particular, Lee shares the narrative of how AIMEE would respond to a message from a customer about her son’s “smashed” phone, overcoming the archetypal over-baked and robotic niceties. For this situation, Lee says, it was important that AIMEE was “able to do a context switch”, to realize that there was “a complex, highly emotive context to ‘smash my phone’ – who would be smiling? You’d be angry, quite annoyed, quite frustrated. All you want is not a gibbering chatbot to tell you a ‘step one, step two’, but rather ‘give me the right person, right away to understand’.”
In this case, AIMEE gathers just enough information from the customer to book them a same-day appointment at the store nearest their zip code.
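The “context switch” Lee describes is, at bottom, an escalation policy: classify the emotional register of a message and route to a person plus a concrete action, rather than scripted steps. A hedged sketch of such a policy (AIMEE’s actual implementation is not public; the keyword heuristic stands in for a trained classifier):

```python
# Hedged sketch of an emotion-aware escalation policy, loosely modeled
# on the behavior Lee describes. Not AIMEE's real implementation.

DISTRESS_CUES = {"smashed", "broken", "furious", "angry", "urgent"}

def route(message: str) -> str:
    words = set(message.lower().split())
    if words & DISTRESS_CUES:
        # Skip scripted troubleshooting: gather the minimum needed
        # (location) and hand over to a person with a same-day slot.
        return "escalate: book same-day store appointment + human agent"
    return "self-serve: walk through a step-by-step fix"

print(route("My son smashed his phone this morning"))
```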
The likes of AIMEE surely show just how far chatbot technology has come, as so much of human interaction, especially in business settings, comes down to knowing how the user wants to be approached.
One of the problems with a chatbot is that it’s static. I don’t think you can, at least not yet, design them to really target the recipient they’re actually talking to, this individual – Elizabeth Stokoe PhD, London School of Economics and Political Science and Loughborough University
As Elizabeth Stokoe PhD, researcher and professor of social interaction at the London School of Economics and Political Science and Loughborough University, shares, it’s about looking beyond the facts and figures to collect the human data that no one has actually tracked very much – the interesting small things that make a difference to how engaged a person is and how willing they are to talk.
“I think one of the problems with things like chatbots is that each of the things that it’s doing is kind of static,” Stokoe explains. “I don’t think you can, at least not yet, design them enough to really target the recipient that they’re actually talking to, this individual, and take the evidence of what you just said and tweak its responses again and again, where everything is agile, because that’s how humans do it. So, for me, my optimistic self is saying we need to leverage more of the stuff that we know from conversational research… we learn from those principles and put them into a much better, agile conversational product.”
Stokoe’s research has led her to analyze recordings of real interactions, from sorting car breakdowns and dinner reservations to more life-threatening instances of suicide crisis negotiation and domestic abuse.
“Some of these things are great in theory, but as soon as you start to look at them actually working in practice, you start to see that it’s not even whether or not these chatbots and digital twins can actually do it, it’s more that I’m not sure that we’re looking at humans enough to see what they’re actually doing to make those products really effective,” says Stokoe.
“An even more powerful example of this is I’ve been doing some work on the victims of domestic violence calling 999 when maybe the perpetrator is co-present with them in the building. The victim cannot ask for the police, but they have to sound like they’re genuinely calling the police and not just a nuisance call. Somehow conveying to the call taker without using any of your obvious words that they actually are genuinely in trouble. People do it amazingly well. But if you get ChatGPT to roleplay the police dispatcher, it just treats the caller as a hoax caller and it just cannot pick up on the nuance.”
It also comes down to how effectively the bots’ finer details can be tested, traced and updated, and what an enterprise measures as a quality outcome. Guardrails for data protection are already being put in place, and domain-specific LLMs are helping companies keep their data private. The big vendors are also ensuring customer data has citations attached so its source can be traced, with Microsoft and Google saying they are investing large sums in governance and responsible AI to check for harmful content, non-inclusive language and hallucinations.
But, though the data is locked in, who is responsible for making sure the AI algorithms are pulling up-to-date and unproblematic data? Plus, going beyond the saying “garbage in, garbage out”, what happens when something nuanced needs to change within the inner system workings, deeper than just updating opening times or FAQs?
Being factual is not enough. It needs to be factual to your own documents and data – Yariv Adan, Google
Who is responsible for keeping the LLM data up to date?
As Google’s senior director of product management and lead for cloud conversational AI, Yariv Adan, explains: “Being factual is not enough. It needs to be factual to your own documents and data. We actually even have specialized models for specific industries – for example, for medical or for security – that actually understand the domain deeply. We’re hosting the model and the entire stack around it: our fine-tuning, our adapter models, all the debugging tools. We’re of course connecting it to search and other technologies so that, for every statement, they validate that it is true and can actually give a citation of sorts to support it, inside or outside your organization.”
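Adan’s “citation of sorts” describes a grounding check: after generation, each claim in a draft answer is matched back against the retrieved sources, and anything unsupported is flagged rather than shown. A simplified Python sketch of the idea (the word-overlap score is a stand-in for the trained grounding models vendors actually use, which are not public):

```python
# Simplified post-hoc grounding check: every sentence in a draft answer
# must be attributable to a source snippet, or it gets flagged.

SOURCES = {
    "policy.pdf#p3": "Premiums are billed monthly on the first business day.",
    "faq.html": "Claims are usually processed within five working days.",
}

def support_score(sentence: str, snippet: str) -> float:
    s, t = set(sentence.lower().split()), set(snippet.lower().split())
    return len(s & t) / max(len(s), 1)

def attach_citations(draft: str, threshold: float = 0.4) -> list[str]:
    checked = []
    for sentence in filter(None, (x.strip() for x in draft.split("."))):
        best_id, best = max(
            ((sid, support_score(sentence, text)) for sid, text in SOURCES.items()),
            key=lambda pair: pair[1],
        )
        if best >= threshold:
            checked.append(f"{sentence}. [{best_id}]")
        else:
            checked.append(f"{sentence}. [UNSUPPORTED - suppress or regenerate]")
    return checked

draft = ("Premiums are billed monthly on the first business day. "
         "Refunds arrive instantly")
for line in attach_citations(draft):
    print(line)
```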
One generative AI risk specialist, who wished to remain unnamed, urged companies to check and keep hold of the copyright wording in any AI dealings, with the likes of Microsoft and Adobe saying they will handle any legal proceedings that arise from the use of the technology.
Self-regulation has never worked in human history. When you go about dealing mindfully and carefully with new technology, I think this is something that we need to think of now – Max Ahrens PhD, University of Oxford and Alan Turing Institute
Max Ahrens PhD, NLP researcher, University of Oxford and Alan Turing Institute, says it all comes down to ensuring alignment: “Alignment is about making sure AI has the same preferences as we do. Right now, we’re at a stage where AI has no consciousness. It’s just replicating patterns.
“If technology develops the way it has over the last four years, we will get to a point where we have quite high levels of autonomous agency and automation, and we need to think about having some checks and balances in place, especially for the financial sector or any type of highly regulated and highly sensitive data. We do need third parties to do the assessment, to say: alright, we shake the model, we do the stress tests, we try to shed a light on the situation, and we say whether this model passed or not – because you should never mark your own homework. OpenAI shouldn’t say ‘we did the test, it’s fine’.
“Self-regulation has never worked in human history. When you go about dealing mindfully and carefully with new technology, I think this is something that we need to think of now, as we’re actually in the process of scaling this out. What counts as good, what counts as bad – that conversation needs to happen now.”
For Ahrens, the answer is to have an independent third party test the more in-depth AI models, so that they’re certified and allowed to operate in society, avoiding the risk of an unsuitable or outdated use case.
“You might even think of a certain insurance around this,” he says. “The same way there’s professional liability insurance for humans if they do something wrong on the job, you just want to make sure that the risk that is inherently there – no system is perfect – is allocated correctly.”
Admin tasks might still be bots’ best bet
The models and data fueling these chatbots have no doubt come a long way, but where the technology is still most useful, for now, is in handling simple administrative tasks – many of which are consistent across sectors and carry less risk for this new tech.
Google’s Adan explains, “A lot of the flows the bots do are repetitive everywhere; they are definitely among the most repetitive across industries. Everyone needs to authenticate. Many industries need to deal with bills, check statuses, make payments or manage calendars. So, all of these are already built for you in very safe, controlled and robust ways, and you don’t need to build that.”
The ability to create custom tasks is also on offer, and Adan gives the example of ordering pizza or coffee. In this instance, with limited possibilities and outcomes, there is less need to start training and aligning from scratch. Plus, as Adan reveals, “these bots are much faster and more accurate than any human” at such tasks.
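Constrained tasks like this are classically modeled as slot-filling: the bot only needs to collect a fixed set of parameters with known values, so there is little to train or align. A toy Python sketch of the pattern (illustrative only, not the actual Dialogflow/Vertex AI API):

```python
# Toy slot-filling flow for a constrained task (pizza ordering).
# Illustrative only - not Google's actual conversational AI API.

SLOTS = {
    "size": {"small", "medium", "large"},
    "topping": {"margherita", "pepperoni", "veggie"},
}

def fill_slots(utterances: list[str]) -> dict[str, str]:
    """Scan user turns for valid values until every slot is filled."""
    order: dict[str, str] = {}
    for turn in utterances:
        words = turn.lower().split()
        for slot, allowed in SLOTS.items():
            if slot not in order:
                match = next((w for w in words if w in allowed), None)
                if match:
                    order[slot] = match
        if len(order) == len(SLOTS):
            break  # All parameters collected; hand off to fulfillment.
    return order

print(fill_slots(["I'd like a pizza", "a large one please", "pepperoni"]))
# {'size': 'large', 'topping': 'pepperoni'}
```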
For Ahrens, it’s a similar story: “I think the first wave will not be pure automation and replacement; it will be augmentation. Most of what you see – lots of the services – is actually there to augment a human, to take the tedious tasks away so they have more time to do meaningful work, which makes me very, very hopeful that this entire evolution/revolution of generative AI will come to the betterment of humanity. Now it’s for us, specifically for the enterprise, for corporates, to author generative AI strategies, to make that holistic and to think about it globally.
“We used to have very small, very intelligent models where we put in this prior knowledge, and then we just said, ‘oh no, it’s the transformer model – no prior knowledge needed – you just learn from data’. And I think our next step will be where we need to instill our values: a constitutional AI where we design the alignment first.”
The future will no doubt see ongoing advances in this already much-improved technology. But when it comes to choosing the best provider now, BT Group’s Lee says it comes down to doing your research and spreading out your options.
Lee explains: “What we have done over the last six months as a company is to go out, with our egos left at the doorstep, and talk to all our partners – not one but multiple LLM partners out there in the world. Why? Because there is no one LLM today where you can put all your eggs in that basket. Sometimes you can get into this tunnel vision in a good way – to build that technical depth, building, briefing and iterating – but only when you look at the bigger world do you see how it is changing.”
As the tools businesses use to harness data continue to develop, it’s becoming increasingly important to also improve the tools used to monitor and measure successful outcomes. Yes, as Lee says, you can’t afford to put all your eggs in one basket, but also, vendors, organizations and regulators must ensure they can assess these GenAI ingredients against the same criteria so that proper decisions can be made on costs, responsible AI and governance.
In our current appetite for advancement, today’s best practice can quickly be out of date in six months’ time – and the one thing that’s consistent here? No one wants to ingest something past its use-by date.