Bandit vs A/B Test Simulator

Watch a multi-armed bandit beat a fixed even split, live.

An A/B test splits traffic evenly the entire run, even after a clear loser emerges. A multi-armed bandit shifts traffic toward winners as evidence accumulates — exploring less the more confident it gets. Thompson sampling does this by drawing from each variant's Beta posterior, naturally exploring uncertain options and exploiting confident ones. Set the “true” click-through rates below and watch both strategies play out on the same traffic.
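
To make "exploring less the more confident it gets" concrete, here is a minimal sketch of how a variant's Beta posterior narrows as data comes in. It assumes a Beta(1, 1) starting prior; the function name and the sample counts are hypothetical, chosen only for illustration. A narrower posterior produces samples that cluster near the observed CTR, so a variant that looks weak is rarely drawn as the winner.

```python
import math

def beta_posterior_summary(clicks, impressions):
    """Mean and standard deviation of a Beta posterior, assuming a Beta(1, 1) prior."""
    alpha = 1 + clicks                    # prior 1 + observed clicks
    beta = 1 + (impressions - clicks)     # prior 1 + observed non-clicks
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)

# Hypothetical counts with the same observed CTR (5%) at three sample sizes.
for clicks, impressions in [(1, 20), (10, 200), (100, 2000)]:
    mean, sd = beta_posterior_summary(clicks, impressions)
    print(f"{impressions:>5} impressions: mean={mean:.4f}, sd={sd:.4f}")
```

The standard deviation shrinks as impressions grow, which is what lets Thompson sampling reduce exploration without an explicit schedule.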

Variants

The bandit and A/B test don't know these — only you do.
[Inputs: true CTR (%) for variants A, B, C, and D]

A/B test

10,000 impressions

Equal 25/25/25/25 split, all the way through

Variant          Traffic   Impr.   Clicks   CTR
A · Headline A   25.0%     2,500   47       1.88%
B · Headline B   25.0%     2,500   73       2.92%
C · Headline C   25.0%     2,500   90       3.60%
D · Headline D   25.0%     2,500   126      5.04%

Total clicks: 336
Overall CTR: 3.36%
$ wasted: $164

Thompson sampling bandit

10,000 impressions

Shifts traffic toward variants the data favors

Variant          Traffic   Impr.   Clicks   CTR
A · Headline A   3.4%      339     8        2.36%
B · Headline B   2.8%      276     5        1.81%
C · Headline C   29.9%     2,990   135      4.52%
D · Headline D   63.9%     6,395   323      5.05%

Total clicks: 471
Overall CTR: 4.71%
$ wasted: $29

A/B test allocation over time

% of impressions sent to each variant, over the run

[Chart: share of impressions per variant, 0% to 100%, over 0 to 10,000 impressions. Series: A · Headline A, B · Headline B, C · Headline C, D · Headline D]

Bandit allocation over time

% of impressions sent to each variant, over the run

[Chart: share of impressions per variant, 0% to 100%, over 0 to 10,000 impressions. Series: A · Headline A, B · Headline B, C · Headline C, D · Headline D]

Cumulative budget wasted ($)

Money “left on the table” vs always serving the optimal variant. Lower is better.

[Chart: cumulative wasted budget, $0 to $165, over 0 to 10,000 impressions. Series: A/B test, Bandit]

Verdict

The bandit saved you $135 (82%) compared to the A/B test.

It did that by routing 64% of impressions to D · Headline D, the variant with the highest true CTR (5.0%). The A/B test sent it only 25%, by design. As a result, the A/B test wasted about 5.7× as much as the bandit ($164 vs $29).
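
The verdict figures can be checked by hand. The sketch below recomputes them from the click totals shown in the result tables, using the displayed best true CTR of 5.0%. The revenue of $1 per click is an assumption: the page does not state it, but it is the value at which the displayed dollar amounts line up. The function name is hypothetical.

```python
def regret_dollars(best_ctr, impressions, clicks, revenue_per_click=1.0):
    """Dollars left on the table versus always serving the best variant.

    revenue_per_click=1.0 is an assumed value, not stated on the page.
    """
    return (best_ctr * impressions - clicks) * revenue_per_click

ab_waste = regret_dollars(0.05, 10_000, 336)      # A/B test: 336 clicks
bandit_waste = regret_dollars(0.05, 10_000, 471)  # bandit: 471 clicks

saved = ab_waste - bandit_waste
print(f"A/B test wasted ${ab_waste:.0f}")                    # $164
print(f"Bandit wasted   ${bandit_waste:.0f}")                # $29
print(f"Saved ${saved:.0f} ({saved / ab_waste:.0%})")        # $135 (82%)
print(f"Waste ratio: {ab_waste / bandit_waste:.1f}x")        # 5.7x
```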

This is exactly what TrafficLoopback automates on your owned traffic — no manual significance tests, no fixed splits, just continuous reallocation toward whatever is actually working.

How this works

  • Each impression is a Bernoulli trial: a click happens with probability equal to that variant's true CTR. Both strategies see the same kind of noisy world.
  • The A/B test gives every variant the same share of traffic for the whole run. Total clicks = sum of Bernoulli draws.
  • The bandit starts each variant at a Beta(1, 1) prior. Before every impression, it draws a sample from each variant's posterior and serves the one with the highest sample. Then it updates the served variant only: alpha += 1 on a click, beta += 1 on no click.
  • Regret = (best true CTR × impressions) − clicks actually earned. Multiply by revenue per click and you get the dollars left on the table by serving losing variants.
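
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator's actual source: the function names are hypothetical, the true CTRs in the demo are made-up values, the even split is implemented as a simple round-robin, and each strategy draws its own random clicks.

```python
import random

def run_ab_test(true_ctrs, impressions, rng):
    """Fixed even split (round-robin is an assumption about how it is dealt)."""
    n = len(true_ctrs)
    shown, clicks = [0] * n, [0] * n
    for t in range(impressions):
        arm = t % n
        shown[arm] += 1
        # Bernoulli trial: click with probability equal to the true CTR.
        clicks[arm] += 1 if rng.random() < true_ctrs[arm] else 0
    return shown, clicks

def run_thompson(true_ctrs, impressions, rng):
    """Thompson sampling with a Beta(1, 1) prior on every variant."""
    n = len(true_ctrs)
    alpha, beta = [1] * n, [1] * n
    shown, clicks = [0] * n, [0] * n
    for _ in range(impressions):
        # Draw one sample per variant and serve the highest draw.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        clicked = 1 if rng.random() < true_ctrs[arm] else 0
        shown[arm] += 1
        clicks[arm] += clicked
        alpha[arm] += clicked        # alpha += 1 on a click
        beta[arm] += 1 - clicked     # beta += 1 on no click
    return shown, clicks

def regret(true_ctrs, impressions, clicks):
    """(best true CTR x impressions) - clicks actually earned."""
    return max(true_ctrs) * impressions - sum(clicks)

# Hypothetical true CTRs for variants A-D; not taken from the page.
ctrs = [0.02, 0.03, 0.04, 0.05]
rng = random.Random(42)
for name, run in [("A/B test", run_ab_test), ("Bandit", run_thompson)]:
    shown, clicks = run(ctrs, 10_000, rng)
    print(name, "impressions:", shown, "clicks:", sum(clicks),
          "regret (clicks):", round(regret(ctrs, 10_000, clicks), 1))
```

Results vary with the random seed, and on a single run the gap between two close variants can be noisy. Multiply the regret by revenue per click to express it in dollars.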

Skip the math. Let a bandit run your tests for you.

TrafficLoopback runs Thompson-sampling bandits on your owned traffic, so you find winning ad creatives without manual significance tests. Free during beta.

Start free in beta