Use A/B testing to compare two versions of your agent side by side with real traffic. Instead of promoting a change to all callers at once, you split traffic between the current live version (control) and a variant, then compare results to decide which performs better.
A/B testing is available for projects using the standard deployment pipeline. You must have at least one live deployment to start a test.

How it works

When you start an A/B test, live traffic is split between two agent deployments:
  • Control (A) — your current live version
  • Variant (B) — a previous or alternative version you want to compare
Callers are assigned to a group at the start of their conversation and stay in that group for the entire call. Assignment is based on a hash of the caller's identifier, so the same caller gets the same experience if they call back while the test is running.
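The sticky assignment described above can be sketched as deterministic hash-based bucketing. This is an illustrative model, not the platform's actual implementation: the function name, the use of SHA-256, and the `test_id` salt are all assumptions.

```python
import hashlib

def assign_group(caller_id: str, test_id: str, variant_pct: int) -> str:
    """Deterministically bucket a caller into control (A) or variant (B).

    Hashing caller_id with the test_id means the same caller always lands
    in the same group for a given test, so repeat callers keep a consistent
    experience. (Sketch only; the platform's hashing scheme is not documented.)
    """
    digest = hashlib.sha256(f"{test_id}:{caller_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # uniform value in 0..99
    return "B" if bucket < variant_pct else "A"

# Repeat calls from the same caller land in the same group:
first = assign_group("+15551234567", "test-42", 20)
second = assign_group("+15551234567", "test-42", 20)
assert first == second
```

Because the hash includes the test identifier, assignments from one test do not bias the next: a caller who landed in B last time has a fresh, independent assignment in a new test.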

Start an A/B test

  1. Go to Deployments: navigate to Deployments > Environments in the sidebar.
  2. Select a variant: choose the version you want to test against the current live deployment. This becomes the variant (B group).
  3. Configure the traffic split: set the percentage of traffic routed to each version. Both versions run simultaneously on the live environment.
  4. Start the test: confirm and launch. Traffic begins splitting immediately.
You cannot promote a new version to live while an A/B test is active. Stop the test first, then promote.

Review results

During and after a test, you can compare performance between the two groups:
  1. Go to Analytics > Conversations to review calls from the test.
  2. Filter conversations by A/B test group to see which version each caller experienced.
  3. Compare key metrics across groups — containment rate, handoff rate, CSAT, and conversation length.
Each conversation is tagged with the test group it belongs to, so you can identify which version handled each call.
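Since each conversation is tagged with its group, per-group metrics reduce to a simple aggregation. A minimal sketch, assuming hypothetical record fields (`group`, `contained`, `duration_s`) rather than the platform's actual export schema:

```python
from collections import defaultdict

# Hypothetical conversation records tagged with their A/B group;
# field names are illustrative, not the platform's export format.
conversations = [
    {"group": "A", "contained": True,  "duration_s": 180},
    {"group": "A", "contained": False, "duration_s": 320},
    {"group": "B", "contained": True,  "duration_s": 150},
    {"group": "B", "contained": True,  "duration_s": 200},
]

def summarize(convs):
    """Compute containment rate and average duration per test group."""
    totals = defaultdict(lambda: {"calls": 0, "contained": 0, "seconds": 0})
    for c in convs:
        t = totals[c["group"]]
        t["calls"] += 1
        t["contained"] += c["contained"]
        t["seconds"] += c["duration_s"]
    return {
        group: {
            "containment_rate": t["contained"] / t["calls"],
            "avg_duration_s": t["seconds"] / t["calls"],
        }
        for group, t in totals.items()
    }
```

On the sample records above, group A contains 1 of 2 calls (rate 0.5) while group B contains 2 of 2 (rate 1.0); the same pattern applies to handoff rate or CSAT by swapping the aggregated field.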

Stop a test

  1. Go to Deployments > Environments.
  2. Select Stop A/B test on the active test.
  3. All traffic returns to the current live version.
After stopping, review the results to decide whether to promote the variant to live or discard it.

Best practices

  • Run tests long enough to get meaningful data — short tests with few calls can produce misleading results. Aim for a statistically significant sample before making decisions.
  • Test one change at a time — if you change multiple things between versions, you won’t know which change drove the difference.
  • Monitor during the test — check conversation review regularly to catch issues early. If the variant is performing significantly worse, stop the test.
  • Use test sets before A/B testing — validate your changes with the test suite before exposing them to live traffic. A/B testing is for measuring impact, not catching bugs.
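The "statistically significant sample" guidance above can be made concrete with a two-proportion z-test on a rate metric such as containment. This is a rough sketch in pure Python, not a stats-library implementation, and the thresholds are conventional assumptions:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in rates between groups A and B.

    Returns (z, p_value). A p-value well below 0.05 suggests the observed
    gap is unlikely to be noise; with few calls, p stays high even for
    large apparent differences, which is why short tests mislead.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For example, 40/100 contained calls in A versus 60/100 in B yields p below 0.05, while the same 40% vs 60% split over only 10 calls per group does not.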

Environments

Understand the deployment pipeline and how versions move through environments.

Compare versions

Review differences between versions before starting a test.

Conversations

Review individual conversations and filter by A/B test group.
Last modified on April 10, 2026