The iron ML notebook

Anomaly detection methods

Last updated 3 years ago

Sources & credit: see the list at the end of this page.

Depending on whether labels are available, there are two ways to approach an anomaly detection problem.

As an unbalanced classification problem

When we are given a set of observations with labels that indicate whether each point is an anomaly or not, the task can be treated as a binary classification problem, so any classifier can be used. The main difficulty is that anomalies are by definition rare events, so the classifier has to cope with class imbalance.
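As a rough illustration of the labelled case, the sketch below trains the same classifier with and without class weighting on synthetic data. The data, the choice of scikit-learn's `LogisticRegression`, and the use of `class_weight="balanced"` are assumptions made for this example; class weighting is only one of several ways to handle imbalance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Synthetic, illustrative data: 980 normal points and 20 anomalies (2% of the set).
X_normal = rng.normal(loc=0.0, scale=1.0, size=(980, 2))
X_anomaly = rng.normal(loc=3.0, scale=1.0, size=(20, 2))
X = np.vstack([X_normal, X_anomaly])
y = np.concatenate([np.zeros(980, dtype=int), np.ones(20, dtype=int)])

# Same model, with and without re-weighting the rare class.
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Recall on the anomaly class (measured on the training data, for brevity only).
recall_plain = recall_score(y, plain.predict(X))
recall_weighted = recall_score(y, weighted.predict(X))
print(f"recall without weighting: {recall_plain:.2f}")
print(f"recall with weighting:    {recall_weighted:.2f}")
```

Weighting the rare class typically raises recall on anomalies at the cost of more false alarms, so precision should be checked as well, ideally on a held-out set.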

As an unsupervised problem

In this scenario, we are given a set of points without class labels. Some of them are anomalies and some are not, but we do not know which is which. The goal is to operationalize the intuitive idea that anomalies differ from the typical data point.

Under construction

Basic methods

Z-Score
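A minimal sketch of Z-score based detection, assuming roughly normally distributed data and the conventional cutoff of 3 standard deviations. The function name `zscore_outliers` and the sample readings are hypothetical.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Illustrative sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9, 10.0, 25.0]
print(zscore_outliers(readings))
```

Note that the outlier itself inflates both the mean and the standard deviation, so on small samples this rule can miss anomalies that look obvious by eye.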

IQR
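A minimal sketch of the IQR rule, assuming the common convention of flagging points more than 1.5 times the interquartile range outside the quartiles. The function name `iqr_outliers` and the sample readings are hypothetical; quartiles are computed with the standard library's default method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Illustrative sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9, 10.0, 25.0]
print(iqr_outliers(readings))
```

Because quartiles barely move when a few extreme values are present, this rule tends to be more robust than one based on the mean and standard deviation.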

Multivariate anomaly detection

Clustering

DBSCAN clustering
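One way to use DBSCAN for anomaly detection is to treat the points it labels as noise (label `-1`) as anomalies. The sketch below uses scikit-learn's `DBSCAN` on synthetic data; the data and the values of `eps` and `min_samples` are assumptions chosen for this example and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two dense synthetic clusters plus two isolated points (indices 200 and 201).
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2))
isolated = np.array([[2.5, 2.5], [10.0, -3.0]])
X = np.vstack([cluster_a, cluster_b, isolated])

# Points that do not belong to any dense region receive the label -1.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
anomaly_idx = np.flatnonzero(labels == -1)
print("indices flagged as noise:", anomaly_idx)
```

Points on the sparse edge of a cluster may also be labelled as noise, so the flagged set is usually a list of candidates to review, not a final verdict.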

Other clustering techniques

Other clustering techniques, such as kMeans or a Gaussian Mixture Model, can also be used to detect instances that lie far away from every cluster.
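The idea of flagging points far from any cluster can be sketched with kMeans by measuring each point's distance to its nearest centroid. The synthetic data, the number of clusters, and the 99th-percentile cutoff are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two synthetic clusters plus two far-away points (indices 200 and 201).
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2))
isolated = np.array([[2.5, 2.5], [10.0, -3.0]])
X = np.vstack([cluster_a, cluster_b, isolated])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# transform() returns the distance to every centroid; keep the nearest one.
dist_to_nearest = km.transform(X).min(axis=1)

# Flag the points whose distance is above the 99th percentile (arbitrary cutoff).
threshold = np.percentile(dist_to_nearest, 99)
anomaly_idx = np.flatnonzero(dist_to_nearest > threshold)
print("indices flagged as far from any cluster:", anomaly_idx)
```

A percentile cutoff always flags a fixed fraction of the data, so in practice the threshold is usually chosen from domain knowledge or by inspecting the distance distribution.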

Tree-based approach

Isolation Forest

Isolation Forest is an unsupervised learning algorithm that belongs to the family of decision-tree ensembles. Instead of building a profile of normal points and regions, it explicitly isolates anomalies and assigns an anomaly score to each data point. Anomalies tend to be separated from the rest of the data after fewer random splits, so a short average path length in the trees indicates an anomalous point.

The algorithm handles high-dimensional datasets well and has proven to be an effective way of detecting anomalies.
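A minimal sketch using scikit-learn's `IsolationForest` on synthetic data. The data and the parameter values (`n_estimators`, `contamination`, `random_state`) are assumptions for this example; `contamination` in particular encodes a guess about the fraction of anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 300 synthetic normal points plus three far-away points (indices 300 to 302).
X_normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_far = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_normal, X_far])

forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
forest.fit(X)

pred = forest.predict(X)          # +1 for inliers, -1 for outliers
scores = forest.score_samples(X)  # the lower the score, the more anomalous
anomaly_idx = np.flatnonzero(pred == -1)
print("indices flagged as anomalies:", anomaly_idx)
```

The scores from `score_samples` can be ranked directly when a hard inlier/outlier label is not needed.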

Robust Random Cut Forest (RCF)

What machine learning technique is usually used to solve anomaly detection? (Quora)
How to use machine learning for anomaly detection and condition monitoring (Vegard Flovik)
5 Ways to Detect Outliers/Anomalies That Every Data Scientist Should Know (Will Badr)
A Brief Overview of Outlier Detection Techniques (Sergio Santoyo)
Intuitively Understanding Variational Autoencoders (Irhum Shafkat)
DBSCAN: What is it? When to use it? How to use it?
Best clustering algorithms for anomaly detection