Beyond the Bell Curve: An Introduction to the t-distribution | by Egor Howell | Sep, 2023

Discover the origins, theory and uses behind the famous t-distribution

Egor Howell
Towards Data Science
Photo by lil artsy: https://www.pexels.com/photo/person-about-to-catch-four-dices-1111597/

The t-distribution is a continuous distribution that is very similar to the normal distribution, but it has the following key differences:

  • Heavier tails: More of its probability mass is located at the extremes (higher kurtosis). This means that it is more likely to produce values far from its mean.
  • One parameter: The t-distribution has only one parameter, the degrees of freedom, as it is used when we do not know the population’s variance.
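To make the "heavier tails" point concrete, here is a minimal simulation sketch using only the Python standard library. It assumes the standard construction of a t-variate as a standard normal divided by the square root of a scaled chi-squared variable. That construction is not covered in this article. The degrees of freedom (5), the cutoff (3) and the helper names `sample_t` and `tail_fraction` are illustrative choices, not anything prescribed here.

```python
import math
import random


def sample_t(df, rng):
    """Draw one t-distributed value as Z / sqrt(V / df) (assumed construction)."""
    z = rng.gauss(0.0, 1.0)              # standard normal draw
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
    return z / math.sqrt(v / df)


def tail_fraction(values, threshold):
    """Fraction of values whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in values) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_draws = 100_000
df = 5                   # illustrative degrees of freedom
threshold = 3.0          # illustrative "far from the mean" cutoff

t_draws = [sample_t(df, rng) for _ in range(n_draws)]
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]

t_tail = tail_fraction(t_draws, threshold)
normal_tail = tail_fraction(normal_draws, threshold)

print(f"Share of t draws (df={df}) beyond ±{threshold}: {t_tail:.4f}")
print(f"Share of normal draws beyond ±{threshold}: {normal_tail:.4f}")
```

With these settings the t-distribution should place a noticeably larger share of its draws beyond the cutoff than the normal distribution does. The gap shrinks as the degrees of freedom grow.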

An interesting fact about the t-distribution is that it is sometimes referred to as the “Student’s t-distribution.” This is because the inventor of the distribution, William Sealy Gosset, an English statistician, published it under the pseudonym “Student” to keep his identity anonymous, which led to the name “Student’s t-distribution.”

Let’s go over some theory behind the distribution to gain some mathematical intuition.

Origin

The t-distribution originates from the idea of modelling normally distributed data without knowing the population variance of that data.

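For example, say we sample n data points from a normal distribution. The mean and variance of this sample are then, respectively:

$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

$$
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
$$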

Where:

  • x̄ is the sample mean.
  • s is the sample standard deviation.

Combining the sample mean and the sample standard deviation, we can construct the following random variable:

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
$$

Here μ is the population mean and t is the t-statistic, which follows the t-distribution with n − 1 degrees of freedom!
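As a small worked sketch, the t-statistic can be computed by hand as the sample mean minus μ, divided by s/√n. The sample values and the population mean of 5.0 below are made up purely for illustration, and `t_statistic` is a hypothetical helper name.

```python
import math
import statistics


def t_statistic(sample, population_mean):
    """Compute t = (x_bar - mu) / (s / sqrt(n)) for a sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)  # sample mean
    s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)
    return (x_bar - population_mean) / (s / math.sqrt(n))


# Made-up measurements and an assumed population mean, for illustration only.
sample = [5.1, 4.9, 5.6, 5.2, 5.4, 5.0]
mu = 5.0

t_value = t_statistic(sample, mu)
dof = len(sample) - 1  # degrees of freedom of the matching t-distribution

print(f"t = {t_value:.4f} with {dof} degrees of freedom")
```

Note that `statistics.stdev` already divides by n − 1, which matches the sample variance used in the construction of the t-statistic.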

See here for a more thorough derivation.

Probability Density

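As stated above, the t-distribution is parameterised by only one value, the degrees of freedom, ν, and its probability density function looks like this:

$$
f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\pi \nu}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}
$$

Here Γ is the gamma function.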
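The density can also be sketched in a few lines of standard-library Python. The function below implements the standard form of the t-distribution density, written out in its docstring. It works on the log scale through `math.lgamma` so that large ν does not overflow. The name `t_pdf` and the evaluation points are illustrative assumptions.

```python
import math
from statistics import NormalDist


def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom at x.

    f(x) = Gamma((nu + 1) / 2) / (sqrt(pi * nu) * Gamma(nu / 2))
           * (1 + x^2 / nu) ** (-(nu + 1) / 2)
    """
    # Log of the normalising constant, computed with log-gamma for stability.
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(math.pi * nu)
    )
    # Log of the kernel (1 + x^2 / nu) ** (-(nu + 1) / 2).
    log_kernel = -(nu + 1) / 2 * math.log1p(x * x / nu)
    return math.exp(log_norm + log_kernel)


standard_normal = NormalDist()  # mean 0, standard deviation 1

# Compare the peak and a tail point against the standard normal density.
for nu in (1, 5, 30, 1000):
    print(
        f"nu={nu:>4}: f(0)={t_pdf(0.0, nu):.5f}, f(4)={t_pdf(4.0, nu):.6f}"
    )
print(
    f"normal : f(0)={standard_normal.pdf(0.0):.5f}, "
    f"f(4)={standard_normal.pdf(4.0):.6f}"
)
```

For small ν the peak at zero is lower and the density at 4 is higher than for the normal distribution. As ν grows, both values move towards the normal ones.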
