How many observations do you need? Power analysis finds the Goldilocks number—large enough to detect real effects, small enough not to waste resources.
In any study—clinical trial, A/B test, or marketing experiment—you estimate population parameters using a sample. The core question: how many observations do you need?
Too small: You lack statistical power to detect real effects. Time, money, and ethical capital are wasted on inconclusive results.
Too large: You waste resources and may detect statistically significant but practically meaningless differences.
Power analysis balances four interconnected parameters. Adjusting one changes the required sample size:
Significance level (α): the Type I error risk, the probability of a false positive.
Standard: 0.05 (a 5% risk). Lower α → larger sample needed.
Power (1 − β): the probability of detecting a real effect when one exists.
Standard: 0.80 or 0.90. Higher power → larger sample needed.
Effect size (Cohen's d): the minimum clinically important difference (MCID), the smallest difference that matters practically.
Smaller difference to detect → much larger sample needed.
Standard deviation (σ): the noise level, the natural fluctuation in the data.
More noise → larger sample needed to separate signal from chance.
For a two-sample t-test with equal group sizes, the required sample size per group is:

n = 2(t_{1−α/2} + t_{1−β})² / d²

where the t-quantiles use df = 2n − 2, so n appears on both sides and the equation must be solved iteratively.
Since Cohen’s d = δ / σ, a smaller raw difference (δ) or a larger standard deviation (σ) both shrink d, which increases the required sample size. This is why noisy data and small effects are so expensive to study.
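The 1/d² scaling is easy to see in code. Here is a minimal sketch of the formula using the normal (Z) approximation, which needs only the Python standard library; the function name n_per_group_z is ours, not from any library:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_z(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample test, using the
    normal (Z) approximation. Slightly underestimates the exact
    t-based answer (see the next section)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z / d) ** 2)

print(n_per_group_z(0.50))  # 63 per group
print(n_per_group_z(0.25))  # 252 per group
```

Halving the effect size roughly quadruples the required sample, exactly the "small effects are expensive" point above.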
Many introductory texts present the sample size formula using Z-scores from the normal distribution. That is an approximation. Here is why the t-distribution is the correct choice—and what changes.
The normal (Z) distribution assumes you know the true population standard deviation σ. In practice, you never do—you estimate it from the sample. That extra uncertainty means the test statistic follows a t-distribution, not a normal.
The t-distribution is wider than the normal, especially for small samples. This means t-critical values are larger than Z-critical values, which means you need more observations to achieve the same power.
The t-distribution’s shape depends on degrees of freedom (df). For a two-sample t-test with n observations per group, df = 2n − 2.
As df grows (larger samples), the t-distribution converges to the normal. At df ≥ 120, the difference is negligible. At df = 10, it matters a lot.
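The convergence is easy to check numerically. A quick sketch, assuming SciPy is available:

```python
from scipy.stats import norm, t

# Two-sided alpha = 0.05: compare the t-critical value at several df
# against the fixed Z-critical value it converges to.
z_crit = norm.ppf(0.975)  # ≈ 1.960
t_crits = {df: t.ppf(0.975, df) for df in (10, 30, 120)}
for df, tc in t_crits.items():
    print(f"df={df}: t-critical {tc:.3f} vs z-critical {z_crit:.3f}")
# df=10 → 2.228, df=30 → 2.042, df=120 → 1.980
```

At df = 10 the t-critical value is ~14% larger than the Z-critical value; by df = 120 the gap has shrunk to about 1%.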
At n=10 per group, using Z instead of t underestimates the required sample size by ~7%. At n=50, the error drops to ~1%. Our calculator uses the exact t-distribution with iterative convergence, so it is accurate at all sample sizes.
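The iterative t-based search can be sketched as follows, assuming SciPy. The function name n_per_group_t is illustrative, not the page's actual implementation:

```python
from math import ceil, sqrt
from scipy.stats import norm, t, nct

def n_per_group_t(d, alpha=0.05, power=0.80):
    """Exact per-group n for a two-sided, two-sample t-test,
    iterating on the noncentral t-distribution."""
    # Start from the normal (Z) approximation, then step upward.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n = max(2, ceil(2 * (z / d) ** 2))
    while True:
        df = 2 * n - 2
        t_crit = t.ppf(1 - alpha / 2, df)
        ncp = d * sqrt(n / 2)  # noncentrality parameter
        achieved = 1 - nct.cdf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)
        if achieved >= power:
            return n
        n += 1

print(n_per_group_t(0.5))  # 64 per group (the Z approximation gives 63)
```

Because power increases monotonically with n, stepping up from the Z-approximation starting point always terminates.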
Need to look up a critical value by hand? Use our reference tables:
Set your parameters below. The sample size updates automatically using the exact t-distribution with iterative convergence.
This chart shows how the required sample size changes across a range of effect sizes, holding your current α and power constant. The red dot marks your current d.
Power is the probability of rejecting the null hypothesis when the alternative is true. The visualization below shows two t-distributions (not normal—notice the heavier tails):
Blue curve: the null t-distribution (H0: no effect).
Green curve: the alternative t-distribution (HA: true effect = δ).
Red dashed line: the t-critical value. Anything to the right is rejected.
Green shaded area: this is power—the probability of correctly rejecting H0.
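The green shaded area can be reproduced numerically. A sketch with SciPy, using illustrative values (n = 30 per group, d = 0.5) rather than the page's slider settings:

```python
from math import sqrt
from scipy.stats import t, nct

n, d, alpha = 30, 0.5, 0.05     # illustrative inputs
df = 2 * n - 2
t_crit = t.ppf(1 - alpha / 2, df)  # the red dashed line
ncp = d * sqrt(n / 2)              # center of the green (alternative) curve
# Power = area of the alternative distribution beyond the critical values.
power = 1 - nct.cdf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)
print(round(power, 3))
```

With these inputs the study is underpowered (power well below 0.80), which is exactly what the shaded area makes visible.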