Imagine this: You’re excited to roll out a funky new enhancement to your conversational AI system. But shortly after deployment, customers start complaining about issues they never encountered before. As you dig deeper, you realize that your new code has unintentionally broken something that was previously stable. How could this have been prevented? The answer lies in robust regression testing.
What is Regression Testing?
Regression testing involves re-running previously developed tests to ensure that existing functionalities still work as expected after new code changes or additions. This practice helps verify that recent updates haven’t negatively impacted the existing software. Just to be clear, this applies to bots whether they’re LLM-powered, using more traditional NLU, or taking a hybrid approach. The exact tools and techniques may vary, but the principles (and the reasons you need to test) remain the same. Regression testing can be performed manually or through automation, and as you add new tests to accompany each code update, you gradually build up a comprehensive test suite over time.
Why is Regression Testing Essential?
1. Ensuring Product Quality
Regression testing is crucial for maintaining high product quality. Incremental changes, especially in complex systems like those involving Natural Language Processing (NLP), can sometimes lead to unexpected outcomes. For example, modifications in training data might cause an NLP model to suddenly misinterpret user intents it previously handled well. By running thorough regression tests, you can identify and address these drops in performance early.
2. Cost-Effective Maintenance
Manual testing can be incredibly time-consuming and prone to human error, making it impractical for regular and thorough checks. Automated regression testing reduces maintenance costs by enabling frequent, reliable testing without the need for extensive human intervention.
What Should Regression Testing Cover?
Regression testing should cover various aspects of your conversational AI system:
- Intent Recognition: Ensuring the bot accurately identifies user intents.
- Behavioural Paths: Verifying all possible conversational paths, including:
- UI features (e.g., display of images, accessibility of links, rendering of buttons)
- API integrations
- Handover processes to human agents
- Expected Responses: Testing static text responses is straightforward, but bot responses may mix static text with personalised variables, which also need checking. Where generative AI is used to produce responses, you can use a similarly trained LLM to check that the bot output aligns with your desired logic.
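The ideas above can be sketched in a few lines of Python. This is a minimal, hypothetical example: `classify_intent` and `get_reply` are stubs standing in for whatever API your bot platform actually exposes, and the case list plays the role of a regression suite covering several phrasings of the same intent.

```python
# Minimal regression-test sketch. `classify_intent` and `get_reply` are
# hypothetical stand-ins for a real NLU/LLM service and response generator.

def classify_intent(utterance: str) -> str:
    """Stub intent classifier standing in for the real NLU/LLM service."""
    if "delivery" in utterance.lower():
        return "delivery_status"
    if "refund" in utterance.lower():
        return "request_refund"
    return "fallback"

def get_reply(intent: str, first_name: str) -> str:
    """Stub response generator mixing static text with personalised variables."""
    replies = {
        "delivery_status": f"Hi {first_name}, let me check your delivery.",
        "request_refund": f"Hi {first_name}, I can help with your refund.",
    }
    return replies.get(intent, "Sorry, I didn't understand that.")

# Regression cases: several phrasings that must keep mapping to the same intent.
CASES = [
    ("Where is my delivery?", "delivery_status"),
    ("Is my delivery delayed?", "delivery_status"),
    ("I want a refund", "request_refund"),
]

def run_regression():
    failures = []
    for utterance, expected_intent in CASES:
        actual = classify_intent(utterance)
        if actual != expected_intent:
            failures.append((utterance, expected_intent, actual))
    return failures

print(run_regression())  # prints [] -- every case still passes
```

After any change to the classifier or its training data, re-running `run_regression()` immediately reveals which previously working phrasings have broken.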
How to Build Effective Regression Tests
Tests can be built manually or using automated tools. For example, tools like Cyara can automatically explore all possible bot paths based on button inputs. However, for more conversational dialogues, such as those involving open-ended questions, manual specification is often required. For complex integrations, such as those involving websites or user account details, custom-coded tests might be necessary.
At their simplest, test scripts define the expected response a bot should give to a particular user query. Automated testing lets you cover the many ways people phrase a query, as well as bot answers that vary with context (such as the date, customer location, or loyalty level). In this example, a test script checks whether an identified user (the test includes parameters for their “firstName” and “id”) gets the right answer when they ask if their delivery is delayed. By specifying all the parameters used in the business logic that builds the answer, AND simulating the “spoofTodayDate” date alongside the “deliveryDetails” parameter, we can customise the test for complicated real-world scenarios, such as when a delivery has been shifted from its normal schedule and is now due in three days.
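A test script along those lines might be expressed as structured data. The field names below (“firstName”, “id”, “spoofTodayDate”, “deliveryDetails”) come from the scenario described above; the values, the input phrasing, and the expected response text are illustrative assumptions, not the output of any particular tool.

```python
import json

# Hypothetical test script: an identified user asks about a delayed delivery,
# with "today" spoofed so the date-dependent answer is reproducible.
test_script = {
    "name": "delayed_delivery_identified_user",
    "parameters": {
        "firstName": "Sam",                 # identified-user details
        "id": "12345",
        "spoofTodayDate": "2024-06-10",     # freeze "today" for the test
        "deliveryDetails": {
            "originalDate": "2024-06-11",
            "revisedDate": "2024-06-13",    # shifted: now due in three days
        },
    },
    "userInput": "Is my delivery delayed?",
    "expectedResponse": (
        "Hi Sam, your delivery has been rescheduled and is now due in 3 days."
    ),
}

print(json.dumps(test_script, indent=2))
```

Keeping scripts as plain data like this makes it easy to add variations (different phrasings, dates, or user attributes) without touching the test runner itself.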
When we run the test, we get results showing the response the bot provided for each input. If a bot response does not match the expected text in the test script, the test fails.
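The run-and-compare step can be sketched as follows. `ask_bot` is a stand-in for your platform’s API call (in reality it would hit the bot’s endpoint with the scripted parameters); here it returns a canned reply so the comparison logic can be shown end to end.

```python
# Sketch of a test runner: send each scripted input to the bot, compare the
# reply to the expected text, and report pass/fail. `ask_bot` is hypothetical.

def ask_bot(user_input: str, parameters: dict) -> str:
    """Stub bot call; in reality this would call the bot's REST endpoint."""
    days = 3  # would be derived from spoofTodayDate and deliveryDetails
    name = parameters["firstName"]
    return f"Hi {name}, your delivery has been rescheduled and is now due in {days} days."

def run_test(script: dict) -> dict:
    actual = ask_bot(script["userInput"], script["parameters"])
    passed = actual == script["expectedResponse"]
    return {"test": script["name"], "passed": passed, "actual": actual}

result = run_test({
    "name": "delayed_delivery_identified_user",
    "parameters": {"firstName": "Sam", "id": "12345",
                   "spoofTodayDate": "2024-06-10",
                   "deliveryDetails": {"revisedDate": "2024-06-13"}},
    "userInput": "Is my delivery delayed?",
    "expectedResponse": "Hi Sam, your delivery has been rescheduled and is now due in 3 days.",
})
print(result["passed"])  # True
```

An exact string comparison works for static responses; for generated text you would swap the equality check for a semantic check (for instance, an LLM-based judge, as mentioned earlier).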
Best Practices for Regression Testing
- Incorporate Testing in Development: Every new feature should be accompanied by automated tests covering as much of the functionality as possible. Simple additions like new FAQs may need minimal testing, while complex flows with branching behaviour (for example for different user groups, or incorporating parameters such as cutoff logic or other conditional decision points) will require multiple tests.
- Test Before Deployment: Before deploying new changes, conduct comprehensive tests in development and test environments. Peer testing and user acceptance testing (UAT) from stakeholders can provide additional assurance. However, while your (human) testers might be able to test your new code reasonably thoroughly, they won’t be able to verify that there haven’t been any other undesired effects on all your other intents or flows. That’s why you should always ensure that your regression tests pass before moving to production.
- Monitor Continuously: You can also use automated tests for live monitoring. For example, you can set up alerts for when a feature stops working, or detect that an API is no longer returning expected responses (e.g. article lookup failures or certificate expiry).
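The monitoring idea in the last bullet can be sketched like this. Everything here is a hypothetical placeholder: `fetch` simulates the API call you would probe, and `alert` stands in for whatever paging or chat notification you use.

```python
# Sketch of live monitoring: periodically probe a critical integration and
# raise an alert when it stops behaving as expected. All names are illustrative.

def check_article_lookup(fetch) -> bool:
    """Probe the article-lookup integration; `fetch` simulates the API call."""
    response = fetch("article-123")
    # A healthy response must be a dict containing a non-empty "title".
    return isinstance(response, dict) and bool(response.get("title"))

def alert(message: str):
    # In production this might page on-call staff or post to a chat channel.
    print(f"ALERT: {message}")

def monitor_once(fetch) -> bool:
    if not check_article_lookup(fetch):
        alert("Article lookup is no longer returning expected responses")
        return False
    return True

# Simulated healthy and broken API responses:
healthy = lambda _id: {"title": "Returns policy"}
broken = lambda _id: {"error": "certificate expired"}

print(monitor_once(healthy))  # True: no alert
print(monitor_once(broken))   # False: alert fired
```

Run on a schedule (e.g. via cron or your CI system), a probe like this turns your regression checks into an early-warning system for production.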
Benefits of Regression Testing
Automated regression testing can simulate a wide range of variables, providing confidence in your system’s performance under diverse conditions. It ensures that even scenarios difficult for humans to test, such as date-specific behaviours or complex user-account interactions, are thoroughly checked.
Risks of Neglecting Regression Testing
Failing to perform regular regression testing can lead to poor user experiences that risk alienating your users, damage your company’s reputation, and incur higher costs due to more frequent or severe issues needing escalation. We helped a banking customer who previously had no regression testing or regular performance measurement; by implementing both, they were able to discover (and fix!) scenarios where the wrong behaviour had been surfaced to customers for more than a year!
Comprehensive Testing Strategy
Effective regression testing is even stronger when part of a broader testing strategy, including:
- Security Testing: Verifying the system’s defences against threats.
- Robotic Process Automation (RPA) Testing: Verifying automated workflows.
- A/B Testing: Comparing different versions to optimize performance.
- Load Testing: Assessing system performance under heavy loads.
- Full UX/User Journey Testing: Ensuring seamless interaction with the bot and other channels like websites or phone systems.
Incorporating robust regression testing practices into your conversational AI development process is essential for maintaining a high-quality, reliable product. By proactively identifying and addressing potential issues, you can enhance user satisfaction, protect your brand’s reputation, and reduce maintenance costs. Remember, a new feature isn’t complete until it has been thoroughly tested and proven to work seamlessly with your existing codebase!