Evaluating what makes a great chatbot – objective and subjective factors

As the need for automation increases, are chatbots living up to their potential? In his second insight piece on the use of AI chatbots, Kaveer Beharee, CEO at Ubiquity AI, looks at how to evaluate chatbots and intelligent conversational agents for commercial use, and why communication, as much as technical expertise, will be key to wider adoption.

The mass adoption of artificial intelligence (AI) technologies hinges on one attribute more than any other: usefulness.

The branch of AI that stands out most in this respect is AI chatbots. In a post-pandemic world, they can bring companies and governments closer to their consumers and constituents and help solve complex problems by fulfilling core tasks, such as addressing customer queries, executing transactions, and providing timely and reliable access to real-time information.

Fulfilling consumer and constituent queries accurately, delivering a great user experience, keeping services reliable and actually resolving people's needs all proved really difficult during the various Covid-19 lockdowns, judging by the countless stories of clogged call centres and of people unable to get proper advice or help.

As different countries restore their services and move towards a ‘new normal’, chatbots present a significant opportunity for governments and companies. Chatbot technology offers a pertinent way to improve and automate engagement functions and to enhance relationships, while also dealing with social distancing and the other challenges that all governments and businesses are having to navigate.

Commercial chatbots have been around for a few decades now, but despite surveys from Oracle and other tech providers showing significant commercial interest in chatbots, market adoption is low. While well-designed chatbots can increase engagement and customer reach, save money and execute complex tasks, poorly designed chatbots can be extremely frustrating and present major risks, including reputational risk.

What makes a good chatbot a good chatbot?

One of the most important frameworks that savvy developers use to measure the efficacy of their bots is PARADISE (PARAdigm for DIalogue System Evaluation). This is a general framework for evaluating spoken dialogue agents, which has been discussed and explored in many theses and post-doctoral studies such as this one from AT&T Labs, titled PARADISE: A Framework for Evaluating Spoken Dialogue Agents.

The PARADISE framework has two broad parts:

  • The first part seeks to objectively measure task efficacy – maximising task success relative to dialogue costs.
  • The second part focuses on subjective user ratings around ease of use, friendliness of the chatbot, how naturally the chatbot comes across, clarity and the user’s propensity to use the chatbot again.
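
As a rough illustration of the first, objective part: the PARADISE paper describes performance as a weighted task-success measure minus a weighted sum of dialogue costs, with each measure normalised so that they can be compared. The Python sketch below follows that shape under some loud assumptions. The dialogue data, the weights and the function names are hypothetical, a plain per-dialogue success score stands in for the paper's task-success statistic, and the step where the weights are estimated from user-satisfaction ratings is not shown.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalise values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing distinguishes the dialogues.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def paradise_performance(task_success, costs, alpha, weights):
    """Score each dialogue as alpha * N(success) - sum_i w_i * N(cost_i).

    task_success: one success score per dialogue (assumed 0..1 here).
    costs: dict mapping a cost name to one value per dialogue.
    alpha, weights: hypothetical weights chosen by hand for this sketch.
    """
    norm_success = z_scores(task_success)
    norm_costs = {name: z_scores(vals) for name, vals in costs.items()}
    scores = []
    for d in range(len(task_success)):
        penalty = sum(weights[name] * norm_costs[name][d] for name in costs)
        scores.append(alpha * norm_success[d] - penalty)
    return scores


# Hypothetical log of four chatbot conversations.
task_success = [1.0, 0.8, 0.4, 0.9]
costs = {
    "turns": [6, 9, 15, 7],      # number of dialogue turns
    "reprompts": [0, 1, 4, 1],   # times the bot had to ask again
}
scores = paradise_performance(
    task_success, costs, alpha=0.5, weights={"turns": 0.3, "reprompts": 0.2}
)
best = scores.index(max(scores))
worst = scores.index(min(scores))
print(f"best dialogue: {best}, worst dialogue: {worst}")
```

Because every measure is normalised, the scores are relative: they rank the conversations in a sample against each other rather than giving an absolute grade.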

The first part of the framework (the objective measure) fundamentally probes the business case supporting the need (or lack thereof) for a chatbot. The fundamental question here is: does the chatbot fulfil tasks more effectively and more cheaply than our current processes?

I am happy to stick my neck out and state that in almost all cases, well-designed chatbots will be cheaper than other direct engagement channels. Chatbot costs refer to both efficiency costs, including the system costs of successfully executing a task, and qualitative costs, relating to aspects such as incorrect responses, re-prompts and reputational costs.
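
One simple way to put the "more cheaply" question into numbers is to compare channels on cost per resolved query rather than cost per conversation, so that failed conversations count against the chatbot. The sketch below is only an illustration: the function name, the idea of charging an escalation cost for every unresolved query, and all of the figures are assumptions, and reputational costs are left out because they are hard to price.

```python
def cost_per_resolved_query(queries, resolution_rate, cost_per_query,
                            escalation_cost=0.0):
    """Total channel cost divided by the number of queries actually resolved.

    queries: number of queries handled by the channel.
    resolution_rate: fraction of queries resolved without escalation (0..1).
    cost_per_query: assumed handling cost of one query on this channel.
    escalation_cost: assumed extra cost when an unresolved query is passed on.
    """
    resolved = queries * resolution_rate
    if resolved == 0:
        raise ValueError("no queries were resolved, so the cost is undefined")
    unresolved = queries - resolved
    total_cost = queries * cost_per_query + unresolved * escalation_cost
    return total_cost / resolved


# Hypothetical figures: a cheap bot that resolves half of its queries,
# compared with a call centre that resolves every query at a higher unit cost.
bot = cost_per_resolved_query(1000, 0.5, 1.0, escalation_cost=4.0)
agents = cost_per_resolved_query(1000, 1.0, 5.0)
print(f"bot: {bot:.2f} per resolved query, agents: {agents:.2f}")
```

With these made-up numbers the poorly performing bot ends up more expensive per resolved query than the call centre, which is the trap the qualitative costs describe.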

However, a chatbot’s ability to effectively fulfil tasks is a complex issue and arguably the main reason companies are not rushing their chatbots out to market. During our lockdown, I was very pleased to find that two of my service providers, my mobile carrier and my health insurer, had both rolled out chatbots.

I used their chatbots just once and won’t be using them again any time soon. Neither bot could fulfil the reasonably standard queries they were supposedly designed for, and I found it much easier and less frustrating just to pick up the phone and speak to an agent on the other end of the line.

So, what makes a good chatbot a good chatbot? There is no doubt that task efficacy and cost are the main drivers for commerce, but objective factors are only part of what makes a successful chatbot for the commercial market.

What makes a chatbot work for humans?

This leads me to the second part of the PARADISE framework, which centres around usability. For any chatbot designed to replace or augment human roles, retention on the platform should be a company’s top priority.

In my experience as a chatbot developer, the first evaluation, which is objective and task-oriented, is reasonably simple. You design a chatbot for a task, be it transactional, informational, etc., and develop that capability until the bot is proficiently executing those tasks.

The subjective measures are markedly more complex to bed down.

The first significant hurdle in maximising usability is ease of communication. The main consideration here is: is your chatbot designed to adapt to your users, or are your users required to adapt to the chatbot?

In almost all cases I have come across, chatbots are built around the latter. This is problematic for the future of chatbots. If a chatbot cannot engage effectively and pleasantly, it will simply not be appealing.

This underscores the need for digital communication professionals who understand how to design chatbot conversations and communications that work for the human at the other end of the line. With the rise of AI technology, I predict there will be a marked increase in demand for human-machine conversation design skills, especially once companies and developers realise that the future of chatbots hinges not on technical expertise but on communication science and on people proficient in natural language technologies.

Technical communicators, in particular, are well suited to the following critical chatbot development tasks:

  1. Chatbot training – although more sophisticated developers use highly automated deep learning to train their chatbots on language, most chatbots are trained using supervised machine learning or intent tagging (matching statements to an AI model’s intent files).
  2. Conversation architecture – while this is a reasonably new concept, communicators will play a critical role in developing a chatbot’s conversational architecture. Although this is a technical consideration, it is also a usability consideration, and it likewise seeks to maximise efficacy at the least possible cost.
  3. Information retrieval – anticipating and enhancing information retrieval from natural language user inputs.
  4. Evaluation – evaluating the chatbot’s efficacy.
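
To make the idea of intent tagging in the first task more concrete, here is a deliberately tiny Python sketch: each intent is tagged with a few example statements, and an incoming message is matched to the intent whose example shares the most words with it. Everything here is hypothetical, including the intent names, the example statements and the threshold. Production chatbots would normally use a trained classifier rather than word overlap, but the communicator's job of writing and tagging good example statements is the same.

```python
import re

# Hypothetical "intent file": intent name -> tagged example statements.
INTENTS = {
    "check_balance": [
        "what is my balance",
        "how much airtime do i have left",
    ],
    "report_fault": [
        "my phone has no signal",
        "the network is down",
    ],
}


def tokens(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, intents=INTENTS, threshold=0.3):
    """Return (intent, score) for the best word-overlap match.

    The score is the Jaccard similarity between the utterance and the
    closest tagged example. Below the threshold the bot falls back
    instead of guessing, which is where a human handover would go.
    """
    words = tokens(utterance)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in intents.items():
        for example in examples:
            example_words = tokens(example)
            union = words | example_words
            score = len(words & example_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return "fallback", best_score
    return best_intent, best_score


for message in ["How much balance do I have?", "Tell me a joke"]:
    print(message, "->", match_intent(message))
```

The fallback branch is a design choice worth noting: a bot that admits it has not understood is usually less frustrating than one that confidently answers the wrong question.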

I advise any business seeking to adopt cost-cutting AI technologies, such as chatbots and intelligent conversational agents, to carry out a full evaluation before implementation. In my experience, the most successful chatbots are the ones whose teams bring the often-missing skills of natural language processing and linguistics to the table, and don’t stop at machine learning and technical development.


Kaveer Beharee is a management consultant and the CEO and founder of Ubiquity AI – a tech start-up specialising in robotic process automation (RPA) based on AI chatbots that trigger processes through natural language conversations with customers and other stakeholders.

You can connect with him via Instagram (@kaveerbeharee), Twitter (@ubiquityAI) or LinkedIn.

Firehead is the strategic partner for Ubiquity AI in Europe. Get in touch with Ubiquity for more information.

Image: (CC)  Tumisu/Pixabay


CJ Walker
