Beyond the Bell Curve: An Introduction to the t-distribution | by Egor Howell | Sep, 2023


Discover the origins, theory and uses behind the famous t-distribution

Egor Howell
Towards Data Science

The t-distribution is a continuous probability distribution that closely resembles the normal distribution, but differs from it in two key ways:

  • Heavier tails: More of its probability mass sits in the extremes (higher kurtosis), so it is more likely than the normal distribution to produce values far from its mean.
  • One parameter: The t-distribution has only one parameter, the degrees of freedom, because it is used when the population’s variance is unknown.
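The heavier tails can be checked with a small simulation. The sketch below is an illustration rather than anything from the original article: it assumes the standard construction of a t-distributed value as a standard normal draw divided by the square root of a scaled chi-squared draw, and the helper names (`sample_t`, `tail_fraction`), the seed, the choice of 3 degrees of freedom and the threshold of 3 are all arbitrary.

```python
import math
import random


def sample_t(nu, rng):
    """Draw one t-distributed value with nu degrees of freedom.

    Assumes the standard construction T = Z / sqrt(V / nu), where Z is a
    standard normal draw and V is the sum of nu squared standard normal
    draws (a chi-squared variable with nu degrees of freedom).
    """
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(v / nu)


def tail_fraction(draws, threshold):
    """Fraction of draws whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in draws) / len(draws)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n_draws = 20_000
t_draws = [sample_t(3, rng) for _ in range(n_draws)]
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]

t_tail = tail_fraction(t_draws, 3.0)
normal_tail = tail_fraction(normal_draws, 3.0)
print(f"share of |x| > 3, t with 3 df:     {t_tail:.4f}")
print(f"share of |x| > 3, standard normal: {normal_tail:.4f}")
```

With only 3 degrees of freedom, the share of draws beyond 3 should come out many times larger for the t-distribution than for the normal.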

The t-distribution is also known as “Student’s t-distribution.” Its inventor, the English statistician William Sealy Gosset, published it under the pseudonym “Student” to keep his identity anonymous, and the name stuck.

Let’s go over some theory behind the distribution to build some mathematical intuition.

Origin

The t-distribution arises from modelling normally distributed data when the population’s variance is unknown.

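For example, say we sample n data points, x_1, …, x_n, from a normal distribution. The mean and variance of this sample are, respectively:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Where:

  • x̄ is the sample mean.
  • s is the sample standard deviation.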

Combining the above two equations, we can construct the following random variable:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Here μ is the population mean and t is the t-statistic, which follows a t-distribution with n − 1 degrees of freedom.
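To make the t-statistic concrete, here is a minimal sketch that computes it with Python’s standard library. The function name `t_statistic`, the sample values and the hypothesised mean of 5.0 are invented for illustration; `statistics.stdev` is used because it divides by n − 1, matching the sample standard deviation s.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Return (x_bar - mu) / (s / sqrt(n)) for a sample and a population mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical measurements, invented for this illustration.
sample = [4.8, 5.1, 5.4, 4.9, 5.3]
t = t_statistic(sample, mu=5.0)
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```

The population variance never appears in the calculation, which is the situation the t-distribution is designed for.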

See here for a more thorough derivation.

Probability Density Function

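As stated above, the t-distribution is parameterised by only one value, the degrees of freedom ν, and its probability density function is:

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where Γ is the gamma function.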
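The density can also be evaluated directly. The sketch below assumes the standard closed form of the t-distribution’s density, a ratio of gamma functions multiplied by (1 + t²/ν) raised to the power −(ν + 1)/2, and uses only the standard library. The function names `t_pdf` and `normal_pdf` and the chosen values of ν are illustrative.

```python
import math


def t_pdf(t, nu):
    """Density of the t-distribution with nu > 0 degrees of freedom at t."""
    # Work on the log scale so that large nu does not overflow the gamma function.
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


for nu in (1, 5, 30, 1000):
    print(f"nu={nu:>4}: f(0)={t_pdf(0.0, nu):.4f}  f(3)={t_pdf(3.0, nu):.4f}")
print(f"normal : f(0)={normal_pdf(0.0):.4f}  f(3)={normal_pdf(3.0):.4f}")
```

As ν grows, the printed values should approach those of the standard normal, while small ν gives a lower peak at 0 and more density out at 3.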




This post originally appeared on TechToday.