Multinomial Naive Bayes Classifier

A complete worked example

Yoann Mocquin · Towards Data Science · Mar 2024

In this new post, we are going to try to understand how the multinomial naive Bayes classifier works and provide working examples with scikit-learn.

What we’ll see:

  • What the multinomial distribution is: as opposed to Gaussian naive Bayes classifiers, which rely on an assumed Gaussian distribution, multinomial naive Bayes classifiers rely on the multinomial distribution.
  • The general approach to creating classifiers that rely on Bayes' theorem, together with the naive assumption that the input features are independent of each other given a target class.
  • How a multinomial classifier is “fitted” by learning/estimating the multinomial probabilities for each class, using the smoothing trick to handle empty features.
  • How the probabilities of a new sample are computed, using the log-space trick to avoid underflow (a minimal scikit-learn sketch follows this list).
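As a quick preview of where we are headed, here is a minimal sketch of the scikit-learn usage. The word counts below are a made-up toy example, not data from this article; the `alpha` parameter is the smoothing mentioned above, and `predict_log_proba` returns class probabilities computed in log-space.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word counts: each row is a document, each column a word of a 3-word vocabulary.
X = np.array([[3, 0, 1],
              [0, 2, 4],
              [2, 1, 0]])
y = np.array([0, 1, 0])  # target class of each document

clf = MultinomialNB(alpha=1.0)  # alpha > 0 smooths words never seen in a class
clf.fit(X, y)

new_doc = np.array([[1, 0, 2]])
print(clf.predict(new_doc))            # predicted class
print(clf.predict_log_proba(new_doc))  # log-probability of each class
```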

All images by author.

If you are already familiar with the multinomial distribution, you can move on to the next part.

Representation of 2 multinomial distributions (with 10 parameters). These represent the probability that a given word appears in a text review.

The first important step to understand the Multinomial Naive Bayes classifier is to understand what a multinomial distribution is.

In simple words, the multinomial distribution represents the probabilities of the outcomes of an experiment that has a finite number of possible outcomes and is repeated N times: for example, rolling a die with 6 faces, say, 10 times and counting the number of times each face appears. Another example is counting the number of occurrences of each word of a vocabulary in a text.
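To make this concrete, here is a small sketch of a single multinomial draw with numpy (the fair die and the random seed are illustrative choices, not from the article):

```python
import numpy as np

# Hypothetical fair 6-sided die rolled N = 10 times.
rng = np.random.default_rng(0)
p = np.full(6, 1 / 6)                   # probability of each face (sums to 1)
counts = rng.multinomial(n=10, pvals=p)

print(counts)        # e.g. [2 1 3 1 2 1] -> number of times each face appeared
print(counts.sum())  # always 10, the total number of rolls
```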

You can also see the multinomial distribution as an extension of the binomial distribution: instead of tossing a coin with 2 possible outcomes (binomial), you roll a die with 6 possible outcomes (multinomial). As with the binomial distribution, the probabilities of all the possible outcomes must sum to 1. So we could have, for instance:
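For a hypothetical biased 6-sided die (the particular values below are just an illustration), the probability vector could be:

p = (p1, p2, p3, p4, p5, p6) = (0.10, 0.15, 0.20, 0.25, 0.20, 0.10), with 0.10 + 0.15 + 0.20 + 0.25 + 0.20 + 0.10 = 1.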
