A beginner’s guide to understanding A/B test performance through Monte Carlo simulations | by Ida Johnsson, PhD | Aug, 2023

Ida Johnsson, PhD
Towards Data Science

This tutorial explores how covariates influence the precision of an A/B test in a randomized experiment. A properly randomized A/B test estimates the lift by comparing the average outcome in the treatment and control groups. However, features other than the treatment also influence the outcome, and how strongly they do so determines the statistical properties of the test. For instance, omitting influential features from the lift calculation can produce a highly imprecise estimate of the lift, even though the estimate converges to the true value as the sample size increases.
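
To make this concrete, here is a minimal Monte Carlo sketch in Python. It is not the author's code. It assumes a hypothetical data generating process with one standard normal covariate, coin-flip treatment assignment, and a true lift of 1.0; the names `simulate_lift`, `lift_distribution`, and `covariate_effect` are made up for illustration. The lift is estimated as a plain difference in means, so the covariate is deliberately left out of the calculation.

```python
import numpy as np

def simulate_lift(n, true_lift, covariate_effect, rng):
    """Simulate one randomized experiment and return the difference-in-means lift."""
    treated = rng.integers(0, 2, size=n).astype(bool)  # coin-flip assignment
    x = rng.normal(size=n)                              # covariate ignored by the estimator
    noise = rng.normal(size=n)
    # Assumed DGP: outcome = lift * treatment + covariate effect * x + noise
    y = true_lift * treated + covariate_effect * x + noise
    return y[treated].mean() - y[~treated].mean()

def lift_distribution(covariate_effect, n=2000, true_lift=1.0, n_sims=500, seed=0):
    """Repeat the experiment n_sims times; return mean and standard deviation of the estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.array(
        [simulate_lift(n, true_lift, covariate_effect, rng) for _ in range(n_sims)]
    )
    return estimates.mean(), estimates.std(ddof=1)

for effect in (0.0, 5.0):
    mean, sd = lift_distribution(effect)
    print(f"covariate_effect={effect}: mean estimate={mean:.3f}, sd={sd:.3f}")
```

Under these assumptions, the average estimate should stay close to the true lift in both cases, while the spread of the estimates should be several times larger when the omitted covariate has a strong effect on the outcome.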

You will learn what the RMSE, bias, and size of a test are, and you will study the performance of an A/B test by generating simulated data and running Monte Carlo experiments. This kind of work shows how the properties of the Data Generating Process (DGP) influence A/B test performance, and that understanding carries over to running A/B tests on real-world data. First, we discuss some basic statistical properties of an estimator.

Root Mean Square Error (RMSE)

RMSE is a frequently used measure of the differences between the values predicted by a model or an estimator and the observed values. It’s the square root of the average squared difference between prediction and actual observation. The formula for RMSE is:

RMSE = sqrt[(1/n) * Σ(actual – prediction)²]

RMSE gives a relatively high weight to large errors because the errors are squared before they are averaged. This makes RMSE most useful when large errors are particularly undesirable.
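
As a small illustration of the formula, the sketch below computes RMSE with NumPy for two made-up sets of predictions. The function name `rmse` and the numbers are assumptions chosen for this example. Both sets of predictions have the same average absolute error of 1, but one spreads the error evenly and the other concentrates it in a single large miss.

```python
import numpy as np

def rmse(actual, prediction):
    """Square root of the mean squared difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.sqrt(np.mean((actual - prediction) ** 2)))

actual = [10.0, 12.0, 14.0, 16.0]
evenly_off = [11.0, 13.0, 15.0, 17.0]    # every prediction is off by 1
one_big_miss = [10.0, 12.0, 14.0, 20.0]  # three exact predictions, one off by 4

print(rmse(actual, evenly_off))    # 1.0
print(rmse(actual, one_big_miss))  # 2.0
```

The single large miss doubles the RMSE even though the average absolute error is unchanged, which is the extra weight on large errors described above.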


Bias

In statistics, the bias of an estimator is the difference between the estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise, it is said to be biased. In other words, bias occurs when an estimator or algorithm is consistently wrong in the same direction because it fails to capture the true underlying relationship.

For instance, if you are trying to predict house prices based on features of the house, and your predictions are consistently $100,000 below the actual price, your model is biased.
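
The house price example can be sketched numerically. The snippet below is an illustration on simulated data, not real prices: the price range, the noise level, and the helper name `estimated_bias` are all assumptions. It builds predictions that sit $100,000 below the actual price on average, then estimates the bias as the average of prediction minus actual.

```python
import numpy as np

def estimated_bias(estimates, truth):
    """Average signed error: mean of (estimate - truth)."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(estimates - truth))

rng = np.random.default_rng(1)

# Hypothetical house prices and a model that underpredicts by $100,000 on average
actual_price = rng.uniform(300_000, 900_000, size=10_000)
predicted_price = actual_price - 100_000 + rng.normal(0, 20_000, size=10_000)

print(round(estimated_bias(predicted_price, actual_price)))
```

The printed value should be close to -100,000: the individual errors vary, but they do not average out to zero, which is what distinguishes bias from random noise.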



This post originally appeared on TechToday.