How to derive the sample size formula for A/B testing from scratch?
Simple math. Math is sometimes self-explanatory : )
We assume throughout that the sample sizes are large enough for the sample means to be approximately normally distributed, by the Central Limit Theorem.
Derive the sample size from the power calculation
Let X_i and Y_i be the observed values of the metric of interest in the test and control groups, respectively, and let X and Y denote the sample means of X_i and Y_i. Although we assume that X_i and Y_i are themselves normal to keep the derivation simple, this is not strictly necessary. What matters is the normality of X and Y, which the Central Limit Theorem ensures for large samples.
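To make the end result of a power-based derivation concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the text above: a two-sided test, equal group sizes, and a common known standard deviation sigma for X_i and Y_i. Under those assumptions the usual result is n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2 per group, where delta is the smallest difference in means worth detecting. The function name and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample Z test.

    Assumes equal group sizes and a common, known standard deviation
    `sigma` (assumptions of this sketch, not taken from the article).
    `delta` is the minimum detectable difference in means.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    std_normal = NormalDist()                      # standard normal N(0, 1)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # critical value of the test
    z_power = std_normal.inv_cdf(power)            # quantile for the target power
    n = 2 * sigma ** 2 * (z_alpha + z_power) ** 2 / delta ** 2
    return ceil(n)                                 # round up to a whole unit


if __name__ == "__main__":
    # Detect a shift of 0.1 standard deviations at alpha = 0.05, power = 0.8.
    print(sample_size_per_group(delta=0.1, sigma=1.0))
```

Because n scales with 1 / delta^2, halving the detectable difference roughly quadruples the required sample size per group.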
Appendix
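When we are interested in count metrics such as daily active users, it is straightforward to define the Z statistic as the difference between the two group means divided by the standard error of that difference: Z = (X - Y) / sqrt(Var(X) + Var(Y)).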
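As a minimal sketch, assuming the usual two-sample form (the difference in group means over the standard error of that difference, with independent groups), the statistic could be computed in Python as follows. The function name and the use of sample variances as plug-in estimates are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def two_sample_z(x, y):
    """Return (z, two-sided p-value) for the difference in means of x and y.

    Assumes the two groups are independent and large enough for the
    normal approximation; sample variances stand in for the true ones.
    """
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (fmean(x) - fmean(y)) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    test_group = [2, 3, 4, 5, 6]      # toy data, far too small in practice
    control_group = [1, 2, 3, 4, 5]
    print(two_sample_z(test_group, control_group))
```

The toy lists only illustrate the arithmetic; with real data each group would hold one value per randomization unit.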
Now, let’s consider a special case when the metric is a ratio.