A controlled experiment comparing two versions of a design to determine which performs better against a defined metric. Users are randomly split between version A and version B, and the results are checked for statistical significance before any conclusion is drawn.
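As a rough illustration of the two mechanics in that definition (random assignment and a significance check), here is a minimal Python sketch. It assumes a binary conversion metric and uses a pooled two-proportion z-test, which is one common choice rather than the only valid one. The names `assign_variant`, `two_proportion_z_test`, and the experiment key `"checkout-label"` are hypothetical and chosen only for this example.

```python
import hashlib
from math import sqrt
from statistics import NormalDist


def assign_variant(user_id: str, experiment: str = "checkout-label") -> str:
    """Bucket a user into 'A' or 'B' (roughly 50/50).

    Hashing the user ID keeps the assignment stable across visits while
    still behaving like a random split across the user base.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Happens when nobody (or everybody) converted in both variants.
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the threshold fixed before the test started (0.05 is a common convention) is what "statistically significant" usually means in this context.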
Common contexts
- Testing two checkout button labels to see which drives more completions
- Comparing a tabbed layout versus a single-scroll page for a pricing section
- Running headline variants on a landing page to improve sign-up rate
Use when
Use A/B testing when you have a single, clearly defined hypothesis and enough traffic to reach statistical significance within a reasonable timeframe. As a rough rule of thumb that means 1,000+ unique visitors per variant per week, though the real requirement depends on the baseline conversion rate and the size of the effect you want to detect. It's most effective after qualitative research has already identified the problem and given you a specific, directional hypothesis.
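To judge whether traffic is sufficient, the required sample size can be estimated before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions. The function names, the 5% baseline, and the 20% relative lift are assumptions picked for illustration, not values taken from any particular product.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in EACH variant to detect the given lift.

    baseline      -- current conversion rate, e.g. 0.05 for 5%
    relative_lift -- smallest lift worth detecting, e.g. 0.2 for +20%
    alpha         -- two-sided false-positive rate
    power         -- probability of detecting the lift if it is real
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_to_run(n_per_variant: int, weekly_visitors_per_variant: int) -> int:
    """Whole weeks needed to collect the required sample in each variant."""
    return ceil(n_per_variant / weekly_visitors_per_variant)


# Hypothetical scenario: 5% baseline, hoping to detect a 20% relative lift.
n = sample_size_per_variant(baseline=0.05, relative_lift=0.2)
print(n, "visitors per variant;", weeks_to_run(n, 1000), "weeks at 1,000/week")
```

Under these assumptions the estimate lands on the order of 8,000 visitors per variant, so a site at the 1,000-per-week threshold would need roughly two months. Smaller expected lifts or lower baseline rates push the number up quickly.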
Avoid when
Avoid A/B testing when traffic is too low: the experiment can run for months and still produce inconclusive results that look like data but carry none of the certainty. It also wastes engineering and design time when the change is too small to meaningfully affect the user experience either way.
If you can't articulate why you expect variant B to win before the test starts, you're not running an experiment; you're guessing with extra steps.
Real-world examples
- Google tested 41 shades of blue for its toolbar links in 2009 and picked the top performer, which reportedly added an estimated $200M in annual revenue.
- Amazon runs thousands of simultaneous A/B experiments, including CTA copy, product image layout, and review placement.
- Netflix continuously A/B tests thumbnail artwork for titles, finding that personalised images can increase click-through rates by over 20%.