A little-known technique for turning quantile regression predictions into a probability distribution.
When we train regression models, we obtain point predictions. In practice, however, we often also want to estimate the uncertainty associated with each prediction. To do so, we treat the value we are trying to predict as a random variable and aim to estimate its distribution.
There are many methods for estimating uncertainty around predictions, such as variance estimation, Bayesian methods, and conformal prediction. Quantile regression is one of these well-known methods.
Quantile regression consists of fitting one model for each quantile you are interested in. This is done with an asymmetric loss function known as the pinball loss. Quantile regression is simple, easy to understand, and readily available in high-performance libraries such as LightGBM. However, it has some issues:
- There is no guarantee that the order of the quantiles will be correct. For example, your prediction for the 50% quantile could be greater than the one you get for the 60% quantile, which is absurd. This problem is often called quantile crossing.
- To obtain an estimate of the entire distribution, you need to train many models. For instance, if you need an estimate for every percentile from 1% to 99%, you have to train 99 models.
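To make the pinball loss concrete, here is a minimal NumPy sketch. The function name `pinball_loss` and the toy data are illustrative assumptions, not taken from LightGBM or any other library. It weights under-predictions by q and over-predictions by 1 - q, then checks on simulated data that the constant minimizing the loss lands close to the empirical quantile.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (diff > 0) cost q per unit,
    # over-predictions (diff < 0) cost (1 - q) per unit.
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


# Toy check on simulated data (an assumption made for illustration):
# scan constant predictions and keep the one with the lowest loss.
rng = np.random.default_rng(0)
y = rng.normal(size=5_000)
candidates = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, c, 0.9) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
empirical = float(np.quantile(y, 0.9))
```

Here `best` and `empirical` should nearly coincide, which is why minimizing this loss yields a quantile estimate rather than a mean.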
Here’s how quantile matching can help.
The goal of quantile matching is to fit a distribution function to a sample of quantile estimates. We can frame this as a regression problem, so the curve doesn’t have to pass exactly through the quantiles. Instead, it should be “as close as possible” while keeping the properties that make it a distribution function.
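As a rough illustration of the idea, the sketch below fits a normal distribution to a handful of quantile estimates by least squares. The choice of the normal family, the helper name `fit_normal_to_quantiles`, and the numbers are all assumptions made for this example; the same approach applies to other parametric families. Because the normal quantile function is Q(p) = mu + sigma * z_p, the fit reduces to an ordinary linear regression of the estimates on the standard normal quantiles z_p.

```python
from statistics import NormalDist

import numpy as np


def fit_normal_to_quantiles(levels, values):
    """Least-squares fit of a normal quantile function to (level, value) pairs."""
    # Standard normal quantiles z_p for each level p.
    z = np.array([NormalDist().inv_cdf(p) for p in levels])
    # Q(p) = mu + sigma * z_p is linear in (mu, sigma).
    design = np.column_stack([np.ones_like(z), z])
    target = np.asarray(values, dtype=float)
    (mu, sigma), *_ = np.linalg.lstsq(design, target, rcond=None)
    if sigma <= 0:
        raise ValueError("fitted scale is not positive; check the quantile estimates")
    return float(mu), float(sigma)


# Hypothetical quantile-regression outputs; the 50% and 60% estimates cross.
levels = [0.1, 0.3, 0.5, 0.6, 0.9]
values = [7.4, 9.1, 10.3, 10.1, 12.6]

mu, sigma = fit_normal_to_quantiles(levels, values)
fitted = NormalDist(mu, sigma)
# Quantiles read off the fitted distribution are increasing by construction.
smoothed = [fitted.inv_cdf(p) for p in levels]
```

The fitted curve does not pass through every input point, but it is a valid distribution: any quantile can be read from `fitted.inv_cdf`, and the crossing in the raw estimates disappears.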
Specifically, we are interested in estimating the inverse cumulative distribution function: given a…
This post originally appeared on TechToday.