Anomaly detection methods

Depending on whether labels are available, there are two ways to approach an anomaly detection problem.

As an unbalanced classification problem

When each observation comes with a label indicating whether it is an anomaly, the task can be treated as a binary classification problem, and any classifier can be used. The main difficulty is that anomalies are by definition rare events, so you will have to deal with class imbalance.
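One common way to handle the imbalance is to reweight the classes during training. The sketch below assumes scikit-learn is available and uses synthetic data; the class sizes, the choice of logistic regression, and the variable names are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Synthetic data (assumption): 980 normal points around 0, 20 anomalies around 3.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(980, 2))
X_anomaly = rng.normal(loc=3.0, scale=1.0, size=(20, 2))
X = np.vstack([X_normal, X_anomaly])
y = np.concatenate([np.zeros(980, dtype=int), np.ones(20, dtype=int)])

# class_weight="balanced" weights each class inversely to its frequency,
# so errors on the rare anomaly class cost more during training.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Recall on the anomaly class is usually more informative than accuracy here.
# It is computed on the training data only to keep the sketch short.
anomaly_recall = recall_score(y, clf.predict(X))
print(f"recall on the anomaly class: {anomaly_recall:.2f}")
```

In practice the recall would be measured on a held-out set rather than on the training data.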

How to deal with imbalanced datasets?

As an unsupervised problem

In this scenario, we are given a set of points without class labels. Some of them are anomalies and some are not, but we don't know which is which. The goal is to operationalize the intuitive idea that anomalies differ from the typical data point.

Outliers

Basic methods

Z-Score

Z-score
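A z-score measures how many standard deviations a value lies from the mean, and a common rule of thumb flags values whose absolute z-score exceeds 3. A minimal sketch using only the standard library, where the `zscore_outliers` helper, the threshold, and the sample data are illustrative assumptions:

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Twelve readings close to 10 and one suspicious reading.
data = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 10.2, 25.0]
outliers = zscore_outliers(data)
print(outliers)
```

Because the outlier itself inflates both the mean and the standard deviation, this rule can miss anomalies in small samples.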

IQR

IQR
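The interquartile range (IQR) rule flags values that fall more than 1.5 times the IQR below the first quartile or above the third quartile. A minimal standard-library sketch, where the `iqr_outliers` helper, the multiplier `k`, and the sample data are illustrative assumptions:

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Twelve readings close to 10 and one suspicious reading.
data = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 10.2, 25.0]
outliers = iqr_outliers(data)
print(outliers)
```

Quartiles are far less sensitive to extreme values than the mean and standard deviation. Note that libraries differ slightly in how they interpolate quartiles, so the fences may not match exactly across tools.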

Multivariate anomaly detection

Clustering

DBSCAN clustering

DBSCAN
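DBSCAN groups points that lie in dense regions and labels points that belong to no dense region as noise, which makes the noise label a natural anomaly flag. The sketch below assumes scikit-learn is available; the synthetic data and the `eps` and `min_samples` values are illustrative assumptions that would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two dense synthetic clusters plus two isolated points (rows 200 and 201).
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
isolated = np.array([[2.5, 2.5], [10.0, -5.0]])
X = np.vstack([cluster_a, cluster_b, isolated])

# fit_predict returns a cluster index per point; -1 marks noise.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
noise_idx = np.where(labels == -1)[0]
print("points labelled as noise:", noise_idx)
```

Points on the sparse fringe of a cluster can also end up labelled as noise, so the flagged set depends strongly on `eps` and `min_samples`.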

Other clustering techniques

These clustering techniques may be used to detect instances that are far away from clusters.

kMeans
Gaussian Mixture Model
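As a rough illustration of the distance-to-cluster idea with k-means, each point can be scored by its distance to the nearest centroid, with the largest distances flagged. The sketch assumes scikit-learn is available; the synthetic data, the number of clusters, and the 99th-percentile cut-off are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two synthetic clusters plus one point far from both (row 200).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2)),
    [[2.5, 2.5]],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# transform() returns the distance to every centroid; keep the nearest one.
dist_to_centroid = kmeans.transform(X).min(axis=1)

# Flag the points whose distance exceeds the 99th percentile (assumed cut-off).
threshold = np.percentile(dist_to_centroid, 99)
anomaly_idx = np.where(dist_to_centroid > threshold)[0]
print("flagged rows:", anomaly_idx)
```

A percentile cut-off always flags a fixed share of the data, so it ranks points rather than proving that any of them is anomalous. With a Gaussian Mixture Model, the analogous score would be a low log-likelihood under the fitted model instead of a large distance.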

Tree-based approach

Isolation Forest

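Isolation Forest is an unsupervised learning algorithm that belongs to the family of decision-tree ensembles. Rather than building a profile of normal points and regions, it isolates anomalies explicitly: each data point receives a score based on how few random splits are needed to separate it from the rest of the data, and anomalies typically need fewer splits than normal points.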
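A minimal sketch of this scoring with scikit-learn's `IsolationForest`, assuming the library is available; the synthetic data and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 300 synthetic "normal" points plus two far-away points (rows 300 and 301).
X_normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_odd = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([X_normal, X_odd])

forest = IsolationForest(
    n_estimators=100, contamination="auto", random_state=0
).fit(X)

scores = forest.score_samples(X)  # lower score = easier to isolate = more anomalous
labels = forest.predict(X)        # -1 = anomaly, 1 = normal

print("rows flagged as anomalies:", np.where(labels == -1)[0])
```

With `contamination="auto"` some ordinary points on the edge of the distribution may also be flagged; ranking by `scores` and inspecting the lowest values is often more useful than relying on the hard labels alone.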


Robust Random Cut Forest (RRCF)
