The iron ML notebook

Clustering metrics

Last updated 5 years ago

Sources: see the list at the end of this page.

These metrics evaluate how good a clustering structure is without needing any external information, such as ground-truth labels. Clustering evaluation metrics belong to one of these types:

  • Intercluster distance (how far apart different clusters are)

  • Intracluster distance (how close together the points of the same cluster are)

  • Hybrid (combines both)

Inertia

  • Also known as the within-cluster sum-of-squares criterion.

  • Measures how far the points within a cluster are from that cluster's centroid.

  • The range of the score is $[0, +\infty)$. Lower is better.

$$
\sum_{i=0}^{n}\min_{\mu_j \in C}\left(\lVert x_i - \mu_j \rVert^2\right)
$$

where $\mu_j$ is the centroid of each cluster and $x_i$ is a data point.
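
The inertia formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `inertia` and the toy `points` and `centroids` are assumptions made up for this example, and the centroids are taken as given instead of being fitted.

```python
def inertia(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance from p to every centroid; keep the smallest.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
        )
    return total


# Toy data (hypothetical): two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# One centroid per group versus a single centroid for everything.
score_two = inertia(points, [(0.0, 1.0), (10.0, 1.0)])
score_one = inertia(points, [(5.0, 1.0)])

print(score_two, score_one)
```

On this toy data the two-centroid layout gives a much lower inertia than the single centroid. Note that inertia generally keeps decreasing as centroids are added, so it is more useful for comparing clusterings with the same number of clusters than for choosing that number on its own.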

Silhouette score

  • It gives information about both the inter-cluster and the intra-cluster distances, so it is a hybrid metric.

  • For each instance, it compares how close the instance is to the other instances in its own cluster with how far it is from the instances of the nearest other cluster.

  • The range of the score is $[-1, 1]$. Higher is better.
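
As a rough sketch of how the score is computed: for each instance, let $a$ be its mean distance to the other instances in its own cluster and $b$ its mean distance to the instances of the nearest other cluster. The instance's silhouette coefficient is $(b - a) / \max(a, b)$, and the score is the mean over all instances. The Python below assumes Euclidean distance and Python 3.8+ (for `math.dist`). The function name, the toy data, and the choice to score singleton clusters as 0 are assumptions made for this example.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Assumed convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups

print(good, bad)
```

Here the labeling that follows the two groups scores close to 1, while the labeling that cuts across them scores below 0, which matches the "higher is better" reading of the range.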

  • K-Means Clustering: From A to Z (Towards Data Science)

  • Clustering performance evaluation (Scikit Learn)

  • The Silhouette Loss Function: Metric Learning with a Cluster Validity Index

  • ML | Inter-cluster and Intra-cluster distances (Geeks for Geeks)