Multinomial Naive Bayes Classifier

A complete worked example for text classification

Yoann Mocquin
Towards Data Science

In this post, we are going to try to understand how the multinomial Naive Bayes classifier works and walk through worked examples.

What we’ll see:

  • What the multinomial distribution is: as opposed to Gaussian Naive Bayes classifiers, which rely on an assumed Gaussian distribution, multinomial Naive Bayes classifiers rely on the multinomial distribution.
  • The general recipe to create classifiers that rely on Bayes' theorem, together with the naive assumption that the input features are independent of each other given a target class.
  • How a multinomial classifier is "fitted" by computing/estimating the multinomial probabilities for each class, using the smoothing trick to handle empty features.
  • How the probabilities of a new sample are computed, using the log-space trick to avoid underflow.
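To make the overall workflow concrete before going through each step, here is a minimal end-to-end sketch (not from the original article) that assumes scikit-learn is available, with a made-up toy dataset of short reviews:

```python
# Minimal sketch: word counts + multinomial Naive Bayes with scikit-learn.
# The texts and labels below are invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "great movie loved it",
    "terrible plot waste of time",
    "loved the acting great film",
    "boring and terrible",
]
labels = [1, 0, 1, 0]  # 1 = positive review, 0 = negative review

# Turn each text into a vector of word counts (the multinomial "outcomes")
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit the classifier; alpha=1.0 is the (Laplace) smoothing mentioned above
clf = MultinomialNB(alpha=1.0)
clf.fit(X, labels)

# Predict the class of a new review
new_review = vectorizer.transform(["great acting but boring plot"])
print(clf.predict(new_review))
print(clf.predict_proba(new_review))
```

The rest of the article unpacks what happens inside these few lines.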

All images by author.

If you are already familiar with the multinomial distribution, you can move on to the next part.

Representation of two multinomial distributions (each with 10 parameters). They represent the probability that a given word appears in a text review.

The first important step to understand the Multinomial Naive Bayes classifier is to understand what a multinomial distribution is.

In simple words, it represents the probabilities of an experiment that has a finite number of possible outcomes and is repeated N times: for example, rolling a die with 6 faces, say 10 times, and counting the number of times each face appears. Another example is counting the number of occurrences of each word of a vocabulary in a text.
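As a quick illustration of the dice example (a sketch added here, assuming NumPy), a single multinomial draw can be simulated like this:

```python
# Roll a fair 6-sided die 10 times and count how many times each face appears.
import numpy as np

rng = np.random.default_rng(0)
p = np.full(6, 1 / 6)               # probability of each face
counts = rng.multinomial(n=10, pvals=p)
print(counts)                       # e.g. [1 2 3 1 2 1]; the counts always sum to 10
```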

You can also see the multinomial distribution as an extension of the binomial distribution: instead of tossing a coin with 2 possible outcomes (binomial), you roll a die with 6 outcomes (multinomial). As with the binomial distribution, the probabilities of all the possible outcomes must sum to 1. So we could have:
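For example, for a fair 6-sided die (an illustrative parameter vector, since the original figure is not reproduced here):

$$
p = (p_1, p_2, \dots, p_6) = \left(\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\right),
\qquad \sum_{k=1}^{6} p_k = 1.
$$

Any other vector of non-negative values that sums to 1, such as (0.1, 0.1, 0.1, 0.2, 0.2, 0.3) for a biased die, is also a valid set of parameters.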
