What Is an A/B Test?
An A/B test (also called a split test) is a controlled experiment in which two versions of a web page, email, button, headline, or any digital element are shown simultaneously to separate, randomly assigned groups of users. By measuring which version drives more conversions, you get statistically grounded evidence to guide product and marketing decisions — instead of deferring to the HiPPO (Highest Paid Person's Opinion).
A typical A/B test has a Control (A) — your current version — and a Variant (B) — the challenger version with a specific change. Traffic is split evenly, usually 50/50, and you measure the primary metric (often conversion rate) for both groups. When the difference is large enough relative to random chance, you declare a winner.
How to Calculate Statistical Significance
Statistical significance is calculated via a two-proportion Z-test. Given two conversion rates p₁ (control) and p₂ (variant), sample sizes n₁ and n₂, and the pooled rate p̄ (total conversions divided by total visitors):

Z = (p₂ − p₁) / √( p̄(1 − p̄) × (1/n₁ + 1/n₂) )

The p-value is the probability of observing a |Z| at least this large if the two versions truly converted at the same rate.
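The Z-test above can be sketched in a few lines using only the Python standard library (the conversion counts in the example are illustrative, not from the calculator):

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion Z-test.

    conv_a / conv_b: conversions in each group; n_a / n_b: visitors.
    Returns (z, p_value)."""
    p1, p2 = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)              # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal CDF (expressed via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 1,000 / 50,000 conversions (2.0%) vs 1,100 / 50,000 (2.2%)
z, p = two_proportion_z_test(1000, 50000, 1100, 50000)    # z ≈ 2.21, p ≈ 0.027
```

With these counts the difference clears the 95% confidence bar (p < 0.05), so the variant would be declared the winner.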
A p-value below 0.05 means that, if there were truly no difference between the versions, a gap this large would appear less than 5% of the time — conventionally reported as 95% confidence. This calculator also reports Bayesian probability (P(variant beats control)) via Monte Carlo simulation, which many practitioners find more intuitive.
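The Bayesian probability can be sketched by drawing from Beta posteriors, as in this minimal standard-library version (a uniform Beta(1, 1) prior and 20,000 draws are assumptions; the calculator's exact prior and simulation count may differ):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, sims=20_000, seed=42):
    """P(variant beats control) via Monte Carlo draws from Beta posteriors.

    With a uniform Beta(1, 1) prior, each group's posterior conversion
    rate is Beta(conversions + 1, non-conversions + 1)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(sims):
        a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        if b > a:
            wins += 1
    return wins / sims

# Same illustrative counts: 2.0% vs 2.2% CVR on 50,000 visitors each
p_beats = prob_b_beats_a(1000, 50000, 1100, 50000)        # ≈ 0.98–0.99
```

A readout like "98% probability the variant beats control" is often easier to act on than a p-value of 0.027, which is why many teams prefer the Bayesian framing.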
Bayesian vs Frequentist A/B Testing: Which Should You Use?
Both approaches have advantages. Here's a practical comparison:

| | Frequentist | Bayesian |
|---|---|---|
| Question answered | "Given no effect, how likely is this data?" (p-value) | "What is the probability B beats A?" |
| Sample size | Fixed in advance | More flexible, but still needs adequate data |
| Peeking sensitivity | High; early stopping inflates false positives | Lower, though not immune |
| Interpretation | Indirect and often misread | Intuitive for business decisions |
| Priors | None | Can encode prior knowledge (e.g. Beta priors) |
What Is a Minimum Detectable Effect (MDE)?
The Minimum Detectable Effect (MDE) is the smallest relative improvement in your primary metric that your test is statistically powered to detect. Real effects smaller than the MDE are likely to go undetected at your sample size, even if they exist.
MDE is directly tied to sample size: a smaller MDE requires a larger sample. Most product teams set MDE at 10–20% relative improvement. Setting it at 1–2% would require millions of visitors per variant, which is rarely feasible outside very high-traffic properties.
Practical tip: Start by asking “What lift would make this change worth shipping?” If a change only matters at 20%+ uplift, set your MDE to 20% and run a smaller, faster test.
How Long Should You Run an A/B Test?
Run your test until you collect the required sample size — calculated from your baseline CVR, MDE, and confidence target. Do not stop when you see significance (that's “peeking,” which inflates false positives by 2–5×).
Additional rules of thumb:
- Minimum 7 days, even if you hit sample size faster. Day-of-week effects are real — Monday visitors convert differently than weekend visitors.
- Maximum 4–6 weeks. Beyond this, seasonal drift, competitor activity, and novelty effects contaminate results.
- Avoid running during anomalies: product launches, PR spikes, major sale events — these bias your traffic composition.
- Use the Duration calculator in the Reverse Calculator tab to get exact days based on your daily visitor count and MDE.
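The rules of thumb above can be combined into a simple duration estimate (a sketch, not the Reverse Calculator's exact logic; the 8,200-visitor figure is illustrative, roughly a 5% baseline with a 20% MDE):

```python
import math

def estimate_duration_days(sample_per_variant, daily_visitors, variants=2,
                           min_days=7, max_days=42):
    """Estimated run time in days, applying the 7-day floor and the
    ~6-week cap described above. Assumes an even traffic split."""
    days = math.ceil(sample_per_variant * variants / daily_visitors)
    days = max(days, min_days)                     # never below one full week
    if days > max_days:
        return days, "warning: exceeds 6 weeks, consider a larger MDE"
    return days, "ok"

days, status = estimate_duration_days(sample_per_variant=8200,
                                      daily_visitors=1500)   # → 11 days, "ok"
```

Even a test that would hit its sample size in two days should run the full week so that weekday and weekend visitors are both represented.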
A/B Test Sample Size Calculator: How Many Visitors Do You Need?
Sample size is calculated using the Fleiss formula, which accounts for your baseline conversion rate p₁, the variant rate implied by your MDE (p₂ = p₁ × (1 + MDE)), significance level (α), and statistical power (1 − β). Without the continuity correction, the per-variant sample size is:

n = (z_α/2 + z_β)² × [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)²
Example required sample sizes per variant (95% confidence, 80% power):
| Baseline CVR | 5% MDE | 10% MDE | 20% MDE |
|---|---|---|---|
| 1% | 637,000 | 163,100 | 42,700 |
| 2% | 315,200 | 80,700 | 21,100 |
| 3% | 207,900 | 53,200 | 13,900 |
| 5% | 122,100 | 31,200 | 8,200 |
| 10% | 57,800 | 14,700 | 3,800 |
Per variant. Double for total visitors in a 50/50 split A/B test.
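The per-variant figures can be reproduced with the normal-approximation form of the formula (a sketch omitting Fleiss's continuity correction, which would add slightly more; z-scores are hard-coded for the common settings):

```python
from math import ceil

def required_sample_per_variant(baseline_cvr, mde_rel, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided two-proportion test.

    mde_rel is the relative lift to detect, e.g. 0.10 for +10%."""
    z_alpha = {0.05: 1.95996, 0.01: 2.57583}[alpha]   # two-sided
    z_beta = {0.80: 0.84162, 0.90: 1.28155}[power]
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + mde_rel)                 # variant rate at the MDE
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# 3% baseline, 10% relative MDE, 95% confidence, 80% power
n = required_sample_per_variant(0.03, 0.10)           # ≈ 53,200 per variant
```

Note how the denominator is the squared absolute difference: halving the MDE roughly quadruples the required sample, which is why 1–2% MDEs are rarely practical.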
Common A/B Testing Mistakes
- Peeking: stopping the test as soon as the dashboard first shows significance, before the required sample size is reached.
- Running underpowered tests: choosing an MDE your traffic cannot support.
- Stopping before a full week, letting day-of-week effects skew the result.
- Testing during launches, PR spikes, or major sale events, which bias traffic composition.
- Splitting traffic across too many variants when each variant needs its own full sample.
A/B Testing Benchmarks by Industry
Typical conversion rates and expected test outcomes by industry:
| Industry | Avg CVR | Typical Uplift | Min Sample |
|---|---|---|---|
| E-commerce | 2–4% | 5–20% | ~5K/variant |
| SaaS (trial signup) | 3–7% | 10–30% | ~4K/variant |
| B2B Lead Gen | 1–3% | 15–40% | ~8K/variant |
| Media / Content | 3–6% (click) | 5–15% | ~6K/variant |
| Mobile App | 10–25% (engagement) | 5–15% | ~2K/variant |
| Fintech / Lending | 1–3% | 10–25% | ~8K/variant |
Benchmarks based on industry analyses. Your actual numbers depend on traffic quality, offer, audience, and change type.
Frequently Asked Questions
What is A/B testing?
A/B testing is a controlled experiment where two versions of a digital element are shown to random user groups to determine which drives more conversions. Statistical analysis determines if the difference is real or due to chance.
What is the difference between Bayesian and frequentist A/B testing?
Frequentist uses p-values: "Given no effect, how likely is this data?" Bayesian directly estimates: "What is the probability variant B beats control?" Bayesian is more intuitive for business decisions. This calculator supports both.
What is statistical significance in A/B testing?
Statistical significance means the observed difference is unlikely to be random. At 95% confidence (p < 0.05), a change with no real effect has only a 5% chance of showing up as a false positive. Most A/B testing best practices recommend 95% as the minimum threshold.
How many visitors do I need for an A/B test?
Use the Reverse Calculator tab: enter your baseline CVR and desired MDE. For example, detecting a 10% lift from a 3% baseline CVR requires ~53,200 visitors per variant at 95% confidence, 80% power.
What is a minimum detectable effect (MDE)?
The MDE is the smallest relative improvement your test is statistically powered to detect. A 10% MDE at a 3% CVR means you're testing for a change from 3.0% → 3.3% CVR. Smaller MDE = larger required sample size.
What is peeking in A/B testing?
Peeking means stopping a test early when you see significance before collecting the required sample size. This inflates false positives dramatically — checking 5 times during a test can raise your false positive rate from 5% to 22%.
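The inflation from peeking can be demonstrated by simulating A/A tests, where both groups have the identical true conversion rate, and checking for significance at several interim looks (a sketch with assumed parameters; the exact inflation depends on the look schedule and traffic):

```python
import random
from math import sqrt

def simulated_false_positive_rate(n_per_group=10_000, looks=5, sims=5_000,
                                  cvr=0.05, z_crit=1.96, seed=7):
    """Fraction of simulated A/A tests (no real difference) that ever
    cross z_crit across `looks` evenly spaced interim checks."""
    rng = random.Random(seed)
    step = n_per_group // looks
    # Normal approximation to Binomial(step, cvr) conversions per look
    mu, sigma = step * cvr, sqrt(step * cvr * (1 - cvr))
    false_positives = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        for _ in range(looks):
            conv_a += max(0, round(rng.gauss(mu, sigma)))
            conv_b += max(0, round(rng.gauss(mu, sigma)))
            n += step
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(pooled * (1 - pooled) * (2 / n))
            if se > 0 and abs(conv_a - conv_b) / n / se > z_crit:
                false_positives += 1   # declared "significant" at a peek
                break
    return false_positives / sims

peeking_fpr = simulated_false_positive_rate(looks=5)       # well above 5%
single_look_fpr = simulated_false_positive_rate(looks=1)   # ≈ 5%, as designed
```

Analyzing once at the full sample keeps the false positive rate near the nominal 5%; repeatedly checking the same data multiplies the chances of a lucky streak crossing the threshold.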
Can I test more than 2 variants?
Yes. This calculator supports A/B, A/B/C (3-way), and A/B/C/D (4-way) tests. Each variant is compared against the Control (A). Note that each variant needs its own independent sample size, so 4-way tests require 4× the traffic.
How is revenue impact calculated?
Revenue impact = (variant CVR - control CVR) / control CVR × monthly revenue × 12. If you provide an Average Order Value, it uses visitor-based calculation: monthly visitors × CVR difference × AOV × 12.
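Both revenue formulas can be sketched in one helper (the function name and example figures are illustrative, not the calculator's internals):

```python
def annual_revenue_impact(control_cvr, variant_cvr, monthly_revenue=None,
                          monthly_visitors=None, aov=None):
    """Projected annual revenue impact, using the two formulas above.

    With AOV and visitors: monthly visitors x CVR difference x AOV x 12.
    Otherwise: relative lift x monthly revenue x 12."""
    if aov is not None and monthly_visitors is not None:
        return monthly_visitors * (variant_cvr - control_cvr) * aov * 12
    return (variant_cvr - control_cvr) / control_cvr * monthly_revenue * 12

# Visitor-based: 2.0% -> 2.2% CVR, 50,000 monthly visitors, $80 AOV
impact = annual_revenue_impact(0.020, 0.022,
                               monthly_visitors=50_000, aov=80)   # → 96,000
```

A 0.2 percentage-point lift sounds small, but at 50,000 monthly visitors and an $80 order value it projects to roughly $96,000 per year, which is why even modest winning variants are worth shipping.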
Last updated: March 2026. Statistical formulas based on standard two-proportion Z-test and Beta distribution Bayesian analysis. Sample size via Fleiss formula.