Chatbot Rank is an annual study of customer experience with chatbots that Markswebb has conducted since 2021, coinciding with the rise of virtual assistants as a primary channel for client communication. Four waves of research have been completed so far; the latest Chatbot Rank, released on November 26, focuses on chatbots in mobile banking. This article delves into the extensive analytics underpinning the ranking of banking chatbots.
This is one of Markswebb's most exciting studies. The pandemic and the rapid advancement of AI technologies present complex and unconventional challenges for chatbot development teams, and for us, they pose equally intricate research tasks.
A key outcome of the chatbot research, as with many of our projects, is the ranking. On one hand, rankings are beneficial: they provide a clear, quantitative measure of how effectively and efficiently each chatbot meets client needs, highlighting areas for improvement in the customer experience. However, there is another side. A public ranking inevitably reveals that some perform better than others. This leads to questions about potential biases, calculation methods, the source of the figures, and the overall methodology for comparing chatbots.
Each Markswebb study begins with a foundational list of participants selected based on our comprehensive market insights. The goal is to include the most popular chatbots that shape the general perception of customer experience quality in these services. For the Chatbot Rank 2024, this list features major banks ranked by the volume of personal deposits and loans, along with the top three performers from the 2023 study.
Any bank or fintech company operating in Eastern Europe and using chatbots to automate customer inquiries can join the study independently to assess its chatbot's competitive position.
Each study has a "freeze date" - a fixed point after which updates to digital services are not considered. This ensures equal conditions for all participants. For the Chatbot Rank 2024, the freeze date was September 16 at 10:00.
The study helps product teams address questions such as:
The Chatbot Rank primarily evaluates customer inquiries related to card products, current accounts, and deposits. It also encompasses tasks beyond product usage, such as managing personal data, connected services, complaints, and negative feedback - essentially, the most frequent types of interactions between users and banks via mobile apps.
The study’s objective is to assess the quality of the customer experience when using chatbots in mobile banking, with a particular focus on user perceptions of conversational interfaces.
From a methodological standpoint, our chatbot study aims to solve two key tasks.
First, we establish principles for an ideal conversational user interface (CUI) and evaluate the participants against them. This set of principles is designed to comprehensively encompass all potential interactions between the customer and the chatbot.
Second, we define the scope of inquiries a chatbot should be able to handle in the current context (i.e., during the research period).
Markswebb's approach to chatbot evaluation is versatile and applicable beyond the banking sector. The principles and rules for effective communication and interface usability are universal. The block of intents evaluated can be adapted based on the market being analyzed.
We understand how valuable your time is, and the work of Markswebb analysts is focused on saving it for product teams. Each study represents thousands of hours invested by our researchers, distilled into clear, actionable analytics.
The evaluation system was developed using the following process:
Measuring the impact of CUI principles on user attitudes toward chatbots. Each principle is visualized through two dialogue cards simulating interactions: one where the principle is adhered to (control group) and one where it is not (test group).
Unmoderated testing on a selected sample includes:
An annual review and enhancement process ensures the system remains relevant. The update results in the creation of a research checklist that includes:
Now, let’s delve into the core of the system - the foundational CUI principles.
The evaluation system consists of three blocks, collectively providing a score between 0 and 100. Each block contributes a specific weight to the overall rating.
The evaluation examines a chatbot through three key aspects, enabling chatbots from different banks to be compared not just in aggregate but feature by feature. Each block of the system is explored in detail below.
To understand the methodology, here’s a glossary of terms:
Weight: 50% of the overall score
This block evaluates how effectively a chatbot can resolve user tasks. It assesses the feasibility of intents within the chatbot and evaluates the approach to solving tasks. The most user-friendly solutions—such as offering personalized product suggestions directly in the chat instead of redirecting users to a catalog—are rated higher. Task completion success is measured using predefined criteria.
For each intent group, the evaluation considers the optimal responses for the specific task. For instance, in navigation tasks, the chatbot should interpret the user’s intent and guide them to the appropriate service section instead of merely providing information in the chat. Conversely, for complaints, transferring the user to a human operator has less impact on the chatbot’s score, as negative feedback is traditionally handled by live support agents.
Queries with multiple intents should not hinder task completion. Users expect to communicate freely without focusing on precision or brevity, and their queries may include several sub-questions simultaneously. In such cases, the chatbot’s key responsibility is to recognize all aspects of the query and provide responses to each.
Failure to handle complex queries may force users to rephrase their questions or, in the worst-case scenario, turn to a live operator, deeming the chatbot ineffective. Therefore, it is crucial for chatbots to possess sufficient functionality to interpret and process complex requests, ensuring users can access the necessary information without unnecessary effort.
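To make this concrete, here is a minimal sketch of multi-intent handling in plain Python. The intent names, keywords, and canned answers are hypothetical illustrations, not the study's test set or any bank's actual implementation; a production chatbot would use an NLU model rather than keyword lists.

```python
# Illustrative sketch of a naive multi-intent handler.
# Intent names, keywords, and answers are hypothetical examples,
# not part of the Markswebb methodology or any real chatbot.

INTENT_KEYWORDS = {
    "card_limit": ["limit", "withdrawal limit"],
    "card_block": ["block", "freeze"],
    "deposit_rate": ["deposit rate", "interest on deposit"],
}

CANNED_ANSWERS = {
    "card_limit": "Your daily withdrawal limit is shown in Card -> Settings -> Limits.",
    "card_block": "You can block the card instantly in Card -> Settings -> Block card.",
    "deposit_rate": "Current deposit rates are listed in Products -> Deposits.",
}


def detect_intents(query: str) -> list[str]:
    """Return every intent whose keywords appear in the query."""
    q = query.lower()
    return [intent for intent, words in INTENT_KEYWORDS.items()
            if any(w in q for w in words)]


def answer(query: str) -> str:
    """Answer each detected intent; escalate if nothing was recognized."""
    intents = detect_intents(query)
    if not intents:
        return "Let me connect you with an operator."
    return "\n".join(CANNED_ANSWERS[i] for i in intents)


if __name__ == "__main__":
    # A compound query containing two sub-questions.
    print(answer("What is my withdrawal limit and how do I block the card?"))
```

Even in this toy version, the behavior the study checks for is visible: every recognized sub-question gets its own answer, and unrecognized queries are escalated rather than ignored.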
Take a look at how evaluation results have evolved over the years:
Weight: 45% of the overall score
This block evaluates the quality and efficiency of the chatbot’s interaction with users. It focuses on the chatbot’s ability to understand, respond appropriately, and clearly convey information.
The evaluation is based on a set of principles and rules that are universal and independent of the market where the evaluation system is applied. Each rule is verified using specific criteria, with at least one verification criterion assigned to each rule.
Example task from the "Effectively respond to negativity" group:
The chatbot considers the user’s emotional state and demonstrates care when needed. This principle applies to critical situations, such as when a user is upset and ready to file a complaint or faces an urgent issue requiring immediate action, like card blocking. The chatbot must respond appropriately to negative statements, ensuring such feedback is acknowledged rather than ignored.
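To illustrate the structure described above (principles broken down into rules, each verified by at least one criterion), here is a minimal sketch of how a checklist entry could be represented. The schema, field names, and sample criteria are our assumptions, not the actual Markswebb checklist.

```python
from dataclasses import dataclass, field

# Illustrative schema only: field names and the sample entries are assumptions,
# not the real Markswebb checklist.

@dataclass
class Criterion:
    text: str                    # what the researcher verifies in the transcript
    passed: bool | None = None   # filled in during evaluation

@dataclass
class Rule:
    text: str
    criteria: list[Criterion] = field(default_factory=list)  # at least one per rule

@dataclass
class Principle:
    name: str
    rules: list[Rule] = field(default_factory=list)

# Example entry loosely based on the "Effectively respond to negativity" group.
negativity = Principle(
    name="Effectively respond to negativity",
    rules=[
        Rule(
            text="Negative feedback must be acknowledged, not ignored",
            criteria=[
                Criterion("The bot responds to a complaint with an empathetic message"),
                Criterion("The bot offers a concrete next step (e.g. operator hand-off)"),
            ],
        )
    ],
)
```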
Weight: 5% of the overall score
This block evaluates the chatbot's development in terms of interface usability. Interface convenience refers to the accessibility of features that enable comfortable viewing, input, and export of information within the chatbot—extending beyond its communicative capabilities. It includes UI and UX features as well as the chatbot’s integration with the company's systems.
Example task from the "Information export convenience" group:
Users can save or share chat transcripts or files sent by the chatbot (e.g., clickable phone numbers within the chat or the ability to copy a message from the bot or a consultant).
Interface convenience is assessed using specific criteria, focusing on the overall user experience with the chatbot rather than on individual intents. These criteria are universal and apply across all markets where the evaluation system is implemented.
Effectiveness is measured by whether the chatbot offers the most efficient solution. The chatbot must recognize if it can address the user's query and, if not, escalate the dialogue to a human operator. Rules under this principle include avoiding repetitive responses to the same question within a single conversation.
This principle covers language and feedback rules, ensuring the chatbot’s responses and behavior mimic those of a human consultant. One rule involves proactively updating the user on the status or resolution of their request.
Users dislike rephrasing queries to fit a chatbot's rigid “mechanical” language. Adaptive chatbots enhance user comfort by understanding queries with typos and allowing users to revisit any step in task completion.
Answers should use plain, jargon-free language. This principle includes rules such as keeping greeting messages, operator transfer notifications, and clarification questions brief.
Helpful answers should exclude unnecessary details, ensuring users don’t need to seek additional sources. Rules under this principle include addressing each part of a multi-question query individually.
Some services conceal that users are speaking to a bot to save resources, which can frustrate users when discovered later. This principle includes rules such as clearly indicating when an operator joins the conversation.
This principle emphasizes not requesting information the bank already has or was previously provided by the user and retaining conversational context after a pause.
Ensuring a seamless user experience across communication channels is key. Rules include automatically transferring to an operator if a query cannot be understood, notifying the user about wait times, and offering assistance if an operator is unavailable.
Acknowledging emotional cues is crucial for building rapport. The chatbot should recognize negative emotions and respond empathetically, particularly in critical situations. The principle includes a rule that chatbots must not ignore negative feedback but instead show understanding and readiness to act.
We’ve covered Markswebb's 9 CUI principles in a dedicated article — read more for detailed insights.
This section includes:
User queries are grouped into thematic blocks. For example, one block focuses on questions requiring personal information specific to the user, which the bank already possesses. An example of such a query is: "What is the interest-free period on my credit card?"
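As an illustration, thematic blocks can be thought of as a simple mapping from block name to test queries. Only the query quoted above comes from the study; the block names beyond those mentioned in the text and the additional queries are hypothetical placeholders.

```python
# Hypothetical grouping of user queries into thematic blocks.
# Only the first query is quoted from the article; the rest,
# and the block names not named in the text, are illustrative.

INTENT_BLOCKS = {
    "Providing personalized information": [
        "What is the interest-free period on my credit card?",   # from the text
        "How much do I need to pay on my loan this month?",
    ],
    "Providing non-personalized information": [
        "What is the fee for SWIFT transfers?",
        "What documents do I need to open a deposit?",
    ],
    "Navigation": [
        "Where in the app can I transfer money from card to card?",
    ],
}

for block, queries in INTENT_BLOCKS.items():
    print(f"{block}: {len(queries)} test queries")
```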
Chatbot performance is assessed by identifying the best solution and filtering out less effective ones. Each block has its own set of response types. For instance, in the block "Providing non-personalized information," possible chatbot responses include:
This section exclusively evaluates the application of CUI principles. It is filled out after completing the scenario-based evaluation, allowing researchers to leverage their experience with the chatbot. Most CUI rules can be assessed based on existing chat transcripts, with only a few requiring additional queries.
This section is designed so that chatbots are not penalized outright for lacking certain features; it includes only a few verification criteria.
Here’s an example of a checklist fragment for chatbot evaluation using the Markswebb assessment system:
During the checklist evaluation, we document how the chatbot responds to user queries and whether it adheres to the CUI principles throughout the interaction.
For each scenario, we test the chatbot’s response. If the chatbot fails to understand the query on the first attempt, we make up to two additional attempts, working through three pre-prepared phrasings (the same for all banks, in the same order). This year, we also track which attempt yields a relevant response. If the chatbot fails to respond after three attempts, the "Task resolution by the chatbot" block is marked as "No."
In cases where the chatbot offers multiple solutions (e.g., providing both a hyperlink and a navigation path), we select the most effective one. For example, if the chatbot provides a hyperlink to the relevant section and describes the navigation path, the "Yes" value is assigned to the criterion: "The chatbot provided a hyperlink to the section," as this is the superior option.
Once the checklist is completed, we calculate the score for each chatbot.
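As a rough sketch of how the three weighted blocks could roll up into the final 0-100 score: the block weights (50/45/5) come from the methodology above, while the assumption that each block is first normalized to a share of its maximum, and the sample inputs, are ours.

```python
# Sketch of the final score roll-up. Weights (50/45/5) are taken from the
# methodology described above; the normalization of each block to a 0..1
# share of its maximum, and the sample inputs, are our assumptions.

BLOCK_WEIGHTS = {
    "task_resolution": 50,       # "Task resolution by the chatbot"
    "cui_principles": 45,        # communication quality (CUI principles)
    "interface_convenience": 5,  # UI/UX features
}

def overall_score(block_shares: dict[str, float]) -> float:
    """block_shares: fraction of the maximum achieved in each block (0..1)."""
    assert set(block_shares) == set(BLOCK_WEIGHTS)
    return sum(BLOCK_WEIGHTS[b] * share for b, share in block_shares.items())

# Example: a chatbot that earns 70% of the scenario points, satisfies 80% of
# CUI criteria and 50% of interface criteria scores 73.5 out of 100.
print(overall_score({"task_resolution": 0.7,
                     "cui_principles": 0.8,
                     "interface_convenience": 0.5}))
```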
Additionally, we save screenshots of all interactions documented during the checklist evaluation. This creates a robust repository of customer journeys and best practices, highlighting the most effective implementations. A selection of these solutions is presented in the summary report, while the full collection of implementation references for individual chatbot elements is available in the comprehensive report.
We’ve reviewed the Markswebb evaluation system—its structure and functionality. It’s essential to note that its effectiveness lies in its adaptability to market dynamics. As we’ve mentioned, chatbots are one of the fastest-evolving sectors in fintech, and with each research wave, we adjust the evaluation system to reflect new trends. The 2024 wave introduces three major changes.
The evaluation process remains the same: the chatbot is tested with Query 1. If it fails to understand or redirects the user to an operator, Query 2 and Query 3 are used until a relevant response is obtained.
We now factor in how many attempts it took for the chatbot to provide a relevant response:
Example:
Testing the intent: "Find where in the app you can transfer money from card to card" using these queries:
The chatbot failed to understand the first query but responded to the second, describing the navigation path. This implementation achieves 60% of the maximum score for the intent. However, since the chatbot only responded to the second query (requiring two attempts), the score is reduced: 60% × 0.8 = 48%.
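The attempt penalty can be expressed as a multiplier on the intent’s base score. In the sketch below, only the second-attempt coefficient (0.8) is confirmed by the worked example above; the third-attempt coefficient is a placeholder we assume for illustration.

```python
# Sketch of the attempt-based adjustment. Only the 0.8 coefficient for a
# second-attempt answer is taken from the worked example; the third-attempt
# coefficient is an illustrative placeholder.

ATTEMPT_MULTIPLIER = {1: 1.0, 2: 0.8, 3: 0.6}  # 3rd-attempt value is assumed

def intent_score(base_share: float, attempt: int | None) -> float:
    """base_share: fraction of the intent's maximum earned by the response type.
    attempt: which pre-prepared phrasing finally produced a relevant answer,
    or None if all three attempts failed."""
    if attempt is None:
        return 0.0
    return base_share * ATTEMPT_MULTIPLIER[attempt]

# The example from the text: a navigation-path answer worth 60% of the
# maximum, obtained only on the second phrasing -> 0.6 * 0.8 = 0.48.
print(intent_score(0.60, attempt=2))  # 0.48
```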
To improve the objectivity of query formulation, we used ChatGPT to generate randomized variations of queries for the Chatbot Rank 2024 study. This method reduces subjectivity and introduces diversity akin to real-life user interactions with chatbots.
How it works:
This approach mimics realistic customer interactions and ensures more varied and robust testing.
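The article does not disclose the exact prompts or model settings, so the snippet below is only a hedged illustration of how query variations could be generated with the OpenAI Python client; the model name, prompt wording, and number of variants are our assumptions.

```python
# Illustrative only: prompt wording, model choice, and variant count are our
# assumptions; requires an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate_query_variants(intent: str, n: int = 3) -> list[str]:
    """Ask the model for n colloquial phrasings of one test intent."""
    prompt = (
        f"Write {n} different ways a banking app user might ask a chatbot "
        f"to do the following, one per line, informal tone, typos allowed: {intent}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.strip().splitlines()
    return [line.strip() for line in lines if line.strip()][:n]

if __name__ == "__main__":
    for q in generate_query_variants("transfer money from card to card"):
        print(q)
```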
Chatbot Rank offers a clear understanding of how mobile banking chatbots influence the digital customer experience, highlighting the current state and future trends. The study provides precise analytical insights and actionable recommendations based on rigorous data, enabling banks to make informed decisions and accelerate progress toward their desired customer experience goals.
In summary, we empower product teams to drive improvements and create solutions that better align with business objectives and the expectations of modern customers.
Markswebb’s research is transparent at every stage - we’re here to answer all your questions and welcome your feedback. Join our social communities and subscribe to our channels to stay connected.
We respond to all messages as soon as possible.
We’ve helped dozens of financial services evolve successfully and are eager to show that our expertise can be applied in other industries and around the world. Have a look at our success stories!