# Anomaly detection methods

{% hint style="info" %}
*Sources & credit:*

* [*What machine learning technique is usually used to solve anomaly detection? (Quora)*](https://www.quora.com/What-machine-learning-technique-is-usually-used-to-solve-anomaly-detection)
* [*How to use machine learning for anomaly detection and condition monitoring (Vegard Flovik)*](https://towardsdatascience.com/how-to-use-machine-learning-for-anomaly-detection-and-condition-monitoring-6742f82900d7)
* [*5 Ways to Detect Outliers/Anomalies That Every Data Scientist Should Know (Will Badr)*](https://towardsdatascience.com/5-ways-to-detect-outliers-that-every-data-scientist-should-know-python-code-70a54335a623)
* [*A Brief Overview of Outlier Detection Techniques (Sergio Santoyo)*](https://towardsdatascience.com/a-brief-overview-of-outlier-detection-techniques-1e0b2c19e561)
* [*Intuitively Understanding Variational Autoencoders (Irhum Shafkat)*](https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf)
* [*DBSCAN: What is it? When to use it? How to use it?*](https://medium.com/@elutins/dbscan-what-is-it-when-to-use-it-how-to-use-it-8bd506293818)
* [*Best clustering algorithms for anomaly detection*](https://towardsdatascience.com/best-clustering-algorithms-for-anomaly-detection-d5b7412537c8)
  {% endhint %}

Depending on whether **labels are available**, there are two ways to approach an anomaly detection problem.

## As an unbalanced classification problem

When each observation comes with a label indicating whether it is an anomaly or not, the task can be treated as a **binary classification problem**, so any classifier can be used. The main difficulty is that **anomalies are by definition rare events**, so the classes will be heavily imbalanced and this has to be handled explicitly.

{% content-ref url="/pages/-LpPeg\_xPD8s8k1gNVrF" %}
[How to deal with imbalanced datasets?](/iron-data-science-notebook/ml-datascience/frequent-questions/how-to-deal-with-imbalanced-datasets.md)
{% endcontent-ref %}
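One common way to handle the imbalance is to weight each class inversely to its frequency. The snippet below is a minimal sketch of that idea using only the standard library; the function name `balanced_class_weights` and the example labels are hypothetical, and the formula `n_samples / (n_classes * class_count)` is assumed here because it is a widely used heuristic, not because the sources above prescribe it.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 marks an anomaly, 0 a normal observation.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)  # the rare class (1) receives the larger weight
```

Weights like these can then be passed to a classifier that supports per-class weighting, so that misclassifying an anomaly costs more than misclassifying a normal point.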

## As an unsupervised problem

In this scenario, we are given a **set of points without class labels**. Some of them are anomalies and some aren’t, but we don’t know which is which. The goal here is to turn the intuitive idea that anomalies differ from the typical data point into a concrete, measurable criterion.

{% content-ref url="/pages/-LpZkbu2vyI\_0tRCDZMH" %}
[Outliers](/iron-data-science-notebook/ml-datascience/statistics/outliers.md)
{% endcontent-ref %}

{% hint style="warning" %}
Under construction
{% endhint %}

### Basic methods

#### Z-Score

{% content-ref url="/pages/-LruqjjFeK9L7FfeyehY" %}
[Z-score](/iron-data-science-notebook/ml-datascience/statistics/z-score.md)
{% endcontent-ref %}
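As a rough illustration, a z-score rule flags values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes a threshold of 3 and uses the population standard deviation; both choices, the function name `zscore_outliers`, and the sample data are assumptions made for the example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical sample: 20 ordinary readings plus one extreme value.
values = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11] * 2 + [50]
outliers = zscore_outliers(values)
print(outliers)
```

Note that an extreme value also inflates the standard deviation it is measured against, so with very few observations the threshold may need to be lowered.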

#### IQR

{% content-ref url="/pages/-LruqWgMqFkOsDOmd6PR" %}
[IQR](/iron-data-science-notebook/ml-datascience/statistics/iqr.md)
{% endcontent-ref %}
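A minimal sketch of the IQR rule follows: values outside `[Q1 - k * IQR, Q3 + k * IQR]` are flagged. The multiplier `k = 1.5` is the conventional default and is assumed here, as are the function name `iqr_outliers` and the sample data. Quartiles are computed with `statistics.quantiles`, whose default interpolation method may differ slightly from other tools.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: 20 ordinary readings plus one extreme value.
values = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11] * 2 + [50]
outliers = iqr_outliers(values)
print(outliers)
```

Because the quartiles are barely affected by a single extreme value, this rule tends to be more robust than the z-score on small samples.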

### Multivariate anomaly detection

### Clustering

#### DBSCAN clustering

{% content-ref url="/pages/-LrFl7\_TAU6TVKDRN-fp" %}
[DBSCAN](/iron-data-science-notebook/ml-datascience/machine-learning-algorithms/unsupervised-learning/clustering/dbscan.md)
{% endcontent-ref %}

#### Other clustering techniques

These clustering techniques can also be used for anomaly detection, by flagging the instances that lie far away from every cluster.

{% content-ref url="/pages/-LrAIr3kElc-\_RG5edB0" %}
[kMeans](/iron-data-science-notebook/ml-datascience/machine-learning-algorithms/unsupervised-learning/clustering/kmeans.md)
{% endcontent-ref %}

{% content-ref url="/pages/-LpEqnsseO1f7azm-MI7" %}
[Gaussian Mixture Model](/iron-data-science-notebook/ml-datascience/machine-learning-algorithms/unsupervised-learning/clustering/gaussian-mixture-model.md)
{% endcontent-ref %}
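The idea of flagging instances that are far from every cluster can be sketched as follows. The example assumes the cluster centroids are already known (for instance from a previously fitted k-means model); the centroids, points, threshold, and the function name `far_from_clusters` are all hypothetical.

```python
import math


def far_from_clusters(points, centroids, threshold):
    """Return points whose distance to the nearest centroid exceeds `threshold`."""
    flagged = []
    for point in points:
        nearest = min(math.dist(point, c) for c in centroids)
        if nearest > threshold:
            flagged.append(point)
    return flagged


# Hypothetical centroids, e.g. taken from a fitted k-means model.
centroids = [(0.0, 0.0), (5.0, 5.0)]
points = [(0.1, -0.2), (4.8, 5.1), (0.3, 0.2), (9.0, 0.0)]
anomalies = far_from_clusters(points, centroids, threshold=1.0)
print(anomalies)
```

In practice the threshold is a tuning choice, for example a high percentile of the distances observed on data believed to be normal. With a Gaussian Mixture Model, a low likelihood under the fitted mixture can play the same role as a large distance.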

### Tree-based approach

#### Isolation Forest

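Isolation Forest is an unsupervised learning algorithm that belongs to the **ensemble decision trees** family. Instead of profiling normal points and regions, it **explicitly isolates anomalies**: each tree recursively splits the data on randomly chosen features and split values, and points that end up isolated after only a few splits are more likely to be anomalies. The outcome is **a score assigned to each data point**.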

{% hint style="success" %}
This algorithm **works well with high-dimensional datasets** and has proven to be an effective way of detecting anomalies.
{% endhint %}
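To make the isolation idea concrete, here is a simplified sketch rather than a full Isolation Forest: it skips subsampling and the usual score normalisation, and simply measures how many random axis-aligned splits are needed, on average, to isolate a given point. Shorter average paths suggest an anomaly. The function names, the number of trees, the depth limit, and the data are assumptions made for illustration.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=10):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))  # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth  # cannot split further on this feature
    split = rng.uniform(lo, hi)  # pick a random split value
    # Keep only the side of the split that contains `point`.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees (lower = more anomalous)."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical data: a tight cluster in the unit square plus one distant point.
gen = random.Random(42)
cluster = [(gen.random(), gen.random()) for _ in range(50)]
outlier = (10.0, 10.0)
data = cluster + [outlier]

outlier_depth = mean_path_length(outlier, data)
inlier_depth = mean_path_length(cluster[0], data)
print(outlier_depth, inlier_depth)  # the outlier should be isolated much sooner
```

For real use, a library implementation such as scikit-learn's `IsolationForest` is a better starting point than this sketch.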

#### Robust Random Cut Forest (RRCF)

{% embed url="<https://youtu.be/yx1vf3uapX8>" %}

