For the past few months, we at Sanlam Studios have been hard at work developing a fully generative conversational ‘Credit Coach’ that will be able to assist people with all things related to credit. We’ve been through several internal iterations, making leaps and bounds in usability and functionality, and our internal testing went well.
We've passed the relevant penetration and security testing, we've set up all the right cloud environments, and now we're ready to go live. We've been surprised by an all too common reaction to our go-live plans - namely, that taking something fully generative to the public, under a well-known and trusted brand, is somewhat crazy.
While developing this agent, we have learned that most corporate use cases being explored in the space are either internally facing or highly ‘menu-driven’, which results in far more conversational control.
Internally Facing AI Agents
As the name suggests, these are agents designed to assist a company's existing employees in doing their jobs. This includes agents behaving like “co-pilots”, agents that draft proposed social media replies, and agents that act as a source of immediate knowledge in a customer-servicing environment.
It is not a trivial undertaking to build an effective internal agent, but the stakes are naturally lower, for a few reasons.
The users of an internal agent can be expected to engage with it in good faith, and they should have enough existing knowledge to ask effective questions. It’s unlikely that an internal user of a system is going to try to engineer inappropriate answers for the LOLs.
If the AI agent does make a mistake, or hallucinates, there is still a ‘human’ filter the output must pass through before the error has any real-world consequences. Skilled human users can be expected to catch the vast majority of faux pas that emerge from the AI agent.
Menu-Driven Chatbots
When it comes to client-facing chatbots, the existing standard is a fully menu-driven conversation architecture. This means that a user is presented with a set of distinct options, and their choice dictates the next set of options presented to them.
Any of us who have experienced these bots in the wild can agree that the experience is at best underwhelming, and at worst infuriating. This is especially true if you, as a user, have a non-standard query or request.
These chatbots cannot reply in a human-like way and lack the ability to understand and handle any ‘free text’ request you may make. On the plus side, however, these chatbots cannot hallucinate, cannot give you incorrect information and will always talk in a sanitised manner. There’s no doubt that these types of bots are ‘safer’ to deploy.
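For readers who like to see what that looks like under the hood, here is a deliberately minimal sketch of a menu-driven bot. The menu contents and function names are invented purely for illustration; the point is that every possible path through the conversation is hard-coded in advance.

```python
# A minimal, hypothetical menu-driven conversation tree: every node offers a
# fixed set of options, and the user's pick determines the next node. There is
# no free-text understanding anywhere, and therefore nothing to hallucinate.

MENU = {
    "start": {
        "prompt": "How can we help you today?",
        "options": {
            "1": ("Check my balance", "balance"),
            "2": ("Report a lost card", "lost_card"),
            "3": ("Speak to an agent", "handover"),
        },
    },
    "balance": {
        "prompt": "Your balance is available in the app. Anything else?",
        "options": {"1": ("Back to main menu", "start"), "2": ("No, thanks", "end")},
    },
    "lost_card": {
        "prompt": "We've blocked your card. Anything else?",
        "options": {"1": ("Back to main menu", "start"), "2": ("No, thanks", "end")},
    },
    "handover": {"prompt": "Connecting you to a human agent...", "options": {}},
    "end": {"prompt": "Goodbye!", "options": {}},
}


def run_menu_bot() -> None:
    node = "start"
    while MENU[node]["options"]:
        print(MENU[node]["prompt"])
        for key, (label, _) in MENU[node]["options"].items():
            print(f"  {key}. {label}")
        choice = input("> ").strip()
        # Anything outside the fixed options is simply rejected -- the bot
        # cannot go wrong, but it also cannot handle a non-standard query.
        if choice in MENU[node]["options"]:
            node = MENU[node]["options"][choice][1]
        else:
            print("Sorry, please choose one of the numbered options.")
    print(MENU[node]["prompt"])
```

Nothing generative happens anywhere in that loop, which is exactly why the experience feels so constrained when your question doesn’t fit one of the numbered options.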
Why are we launching ours?
We know that this AI agent we’ve built is not going to be perfect. We understand that there are some real risks we take with something fully generative, including:
Creating hallucinations
Going off topic into areas it really shouldn’t
Being susceptible to creative prompt engineering
Becoming abusive or crossing ethical boundaries
Dealing with private information
Ambling into the domain of regulated financial advice
Being unable to understand what a user is asking
We have put in extensive efforts to manage these risks, but we also understand that they cannot be eliminated altogether. In this space, we believe that a goal of “perfect” means that you will never launch anything. Ever.
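To make “managing these risks” a little more concrete, here is a purely illustrative sketch of one common pattern: a post-generation guardrail that screens a drafted reply before it reaches the user. The keyword lists, the PII pattern and the function names below are invented for this example and are not a description of our actual safeguards, which are considerably more involved.

```python
import re

# Hypothetical keyword lists, for illustration only.
OFF_TOPIC_TERMS = ["lottery", "crypto tips", "medical"]
ADVICE_TERMS = ["you should invest", "i recommend buying"]

# A very naive stand-in for a real PII detector: any run of 13 digits
# (the length of a South African ID number).
PII_PATTERN = re.compile(r"\b\d{13}\b")


def passes_guardrails(draft_reply: str) -> bool:
    """Return True if the drafted reply may be sent to the user."""
    text = draft_reply.lower()
    if any(term in text for term in OFF_TOPIC_TERMS):
        return False  # drifting outside the credit domain
    if any(term in text for term in ADVICE_TERMS):
        return False  # edging towards regulated financial advice
    if PII_PATTERN.search(draft_reply):
        return False  # echoing something that looks like personal data
    return True


def respond(draft_reply: str) -> str:
    if passes_guardrails(draft_reply):
        return draft_reply
    return "I can't help with that, but I'm happy to talk about your credit questions."
```

In reality, checks like these are layered and probabilistic, and none of them are perfect - which is why the risks listed above can be managed but never driven to zero.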
Our internal heuristic has become that the benchmark for our AI agent is a human being. Fortunately, people are fallible. They can give incorrect information, they can become emotional and say things in the moment that they shouldn’t, they can accidentally give advice, and they can be manipulated.
This logic does make us feel better, but one aspect is still bugging us - and that is speed. Our agent can take over 30 seconds to return an answer to a complex question, and thanks to ChatGPT we’re all used to much faster responses. A human agent couldn’t compete with the speed of our AI agent, and yet it still feels slow. So if you experience that…we know. We know. And trust us, it’s necessary given the cognitive tasks we’re asking of our agent. And we’re working on it.
What do we hope to achieve?
In designing and building our AI Credit Coach we have made dozens of conscious and unconscious assumptions - predominantly around how we expect people to interact with the coach and how we want the coach to respond in different scenarios.
However, the reality is that we simply do not know how people will really use the coach, what kinds of questions they may ask, how they will talk, or how they will respond to the coach’s replies. To succeed in this space we absolutely need to know this, so that future designs and iterations can adapt to real-world usage and drive better consumer experiences and outcomes.
There is only one way to learn this lesson, and that is by letting the agent loose, with our hopes held high but our expectations muted.
People may hate talking to it, may land up in linguistic dead ends, and may not be able to achieve their goals with it. We hope not, but if and when this happens, we need to know about it.
We have built an extensive monitoring and assessment capability, which will be the topic of a later article. Being able to see how people and our coach interact with each other will be crucial to learning and improving rapidly in this space.
So in advance, we apologise for any coach misbehaviour. Please understand, it’s just a child.