Naive Bayes

Overview

Naive Bayes assumes conditional independence between every pair of features given the value of the class variable. This assumption is why the method is called "naive".

Bayes' theorem states:

P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
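
As a quick worked example with made-up numbers: suppose P(\text{spam}) = 0.4, P(\text{"free"} \mid \text{spam}) = 0.5 and P(\text{"free"} \mid \text{ham}) = 0.1. Expanding the evidence term over both classes gives:

P(\text{spam} \mid \text{"free"}) = \frac{0.4 \cdot 0.5}{0.4 \cdot 0.5 + 0.6 \cdot 0.1} = \frac{0.20}{0.26} \approx 0.77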

Assuming conditional independence, i.e. P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y) for all i, this simplifies to:

P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}

Since P(x_1, \dots, x_n) is constant for a given input, we can use the following classification rule:

\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)
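
The rule above can be sketched directly for binary (0/1) features; the toy "spam" data and feature names below are made up for illustration, and log-probabilities are used so the product does not underflow:

```python
import math

def train(X, y, classes, n_features, alpha=1.0):
    """Estimate P(y) and P(x_i = 1 | y) with Laplace smoothing (alpha)."""
    priors, likelihoods = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            # add-alpha smoothing over the two outcomes of a binary feature
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """argmax over classes of log P(y) + sum_i log P(x_i | y)."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for xi, p in zip(x, likelihoods[c]):
            score += math.log(p if xi == 1 else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: features = [message contains "free", message contains "meeting"]
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
y = ["spam", "spam", "ham", "ham", "spam"]
priors, likelihoods = train(X, y, ["spam", "ham"], n_features=2)
print(predict([1, 0], priors, likelihoods))  # prints "spam"
```

Working in log-space is the standard trick here: with many features the raw product of probabilities quickly becomes too small to represent.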

Advantages

  • NB classifiers have worked quite well in many real-world applications, notably document classification, sentiment analysis and spam filtering.

  • They require a small amount of training data to estimate the necessary parameters.

  • They can be extremely fast compared to more sophisticated methods.

Disadvantages

  • Although NB is often a decent classifier, it is known to be a bad probability estimator, so its output probabilities should not be taken too seriously.
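
One way to see why the probabilities are poorly calibrated is a toy (entirely hypothetical) setup where the same informative feature is repeated k times. Real correlated features violate the independence assumption in the same way: each copy multiplies its likelihood in again, pushing the posterior toward 0 or 1 even though no new information was added:

```python
def posterior(prior, lik_pos, lik_neg, k):
    """P(y=1 | a feature repeated k times) under the naive independence assumption."""
    num = prior * lik_pos ** k
    den = num + (1 - prior) * lik_neg ** k
    return num / den

# Hypothetical numbers: a 50/50 prior and a feature with likelihoods 0.7 / 0.3.
for k in (1, 2, 5, 10):
    print(k, round(posterior(0.5, 0.7, 0.3, k), 4))
# The posterior climbs from 0.7 toward 1.0 as k grows.
```

The predicted class can stay correct while the reported probability becomes wildly overconfident, which is exactly the "decent classifier, bad estimator" behavior.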

Types of NB classifiers