Naive Bayes

Overview

Naive Bayes assumes conditional independence between every pair of features given the value of the class variable. This assumption is why the method is called "naive".

Bayes' theorem states:

P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
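
As a quick worked example with made-up numbers: suppose P(\text{spam}) = 0.4, P(\text{"free"} \mid \text{spam}) = 0.5 and P(\text{"free"} \mid \text{ham}) = 0.1. Expanding the evidence term over both classes gives:

P(\text{spam} \mid \text{"free"}) = \frac{0.4 \cdot 0.5}{0.4 \cdot 0.5 + 0.6 \cdot 0.1} = \frac{0.20}{0.26} \approx 0.77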

Assuming conditional independence, i.e. P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y) for all i, this simplifies to:

P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}

Since P(x_1, \dots, x_n) is constant for a given input, we can use the following classification rule:

\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)
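
The rule above can be sketched directly for binary (0/1) features; the toy "spam" data and feature names below are made up for illustration, and log-probabilities are used so the product does not underflow:

```python
import math

def train(X, y, classes, n_features, alpha=1.0):
    """Estimate P(y) and P(x_i = 1 | y) with Laplace smoothing (alpha)."""
    priors, likelihoods = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            # add-alpha smoothing over the two outcomes of a binary feature
            (sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_features)
        ]
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """argmax over classes of log P(y) + sum_i log P(x_i | y)."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for xi, p in zip(x, likelihoods[c]):
            score += math.log(p if xi == 1 else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: features = [message contains "free", message contains "meeting"]
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
y = ["spam", "spam", "ham", "ham", "spam"]
priors, likelihoods = train(X, y, ["spam", "ham"], n_features=2)
print(predict([1, 0], priors, likelihoods))  # prints "spam"
```

Working in log-space is the standard trick here: with many features the raw product of probabilities quickly becomes too small to represent.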

Advantages

  • NB classifiers have worked quite well in many real-world applications, notably document classification, sentiment analysis and spam filtering.

  • They require a small amount of training data to estimate the necessary parameters.

  • They can be extremely fast compared to more sophisticated methods.

Disadvantages

  • Although NB is often a decent classifier, it is known to be a bad probability estimator, so its output probabilities should not be taken too seriously.
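
One way to see why the probabilities are poorly calibrated is a toy (entirely hypothetical) setup where the same informative feature is repeated k times. Real correlated features violate the independence assumption in the same way: each copy multiplies its likelihood in again, pushing the posterior toward 0 or 1 even though no new information was added:

```python
def posterior(prior, lik_pos, lik_neg, k):
    """P(y=1 | a feature repeated k times) under the naive independence assumption."""
    num = prior * lik_pos ** k
    den = num + (1 - prior) * lik_neg ** k
    return num / den

# Hypothetical numbers: a 50/50 prior and a feature with likelihoods 0.7 / 0.3.
for k in (1, 2, 5, 10):
    print(k, round(posterior(0.5, 0.7, 0.3, k), 4))
# The posterior climbs from 0.7 toward 1.0 as k grows.
```

The predicted class can stay correct while the reported probability becomes wildly overconfident, which is exactly the "decent classifier, bad estimator" behavior.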

Types of NB classifiers