Clustering metrics

Sources:

K-Means Clustering: From A to Z (Towards Data Science)
Clustering performance evaluation (scikit-learn)
The Silhouette Loss Function: Metric Learning with a Cluster Validity Index
ML | Inter-cluster and Intra-cluster distances (GeeksforGeeks)

These metrics evaluate how good the clustering structure is without needing external information such as ground-truth labels. Clustering evaluation metrics usually belong to one of these types:

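Intercluster distance: how far apart different clusters are (larger means better separated)
Intracluster distance: how close together the points of the same cluster are (smaller means more compact)
Hybrid: combines both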
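The two basic distance types can be made concrete with a small NumPy sketch. Each has several common definitions (for example single, complete, or average linkage between clusters); this sketch assumes the average pairwise distance for intracluster distance and the distance between centroids for intercluster distance. The helper names are hypothetical.

```python
import numpy as np

def intra_cluster_distance(points):
    """Mean pairwise Euclidean distance inside one cluster (smaller = more compact)."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    if n < 2:
        return 0.0  # a single point has no spread
    # Full pairwise distance matrix, shape (n, n); the diagonal is 0
    d = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2))
    # Each unordered pair appears twice off the diagonal, so divide by n * (n - 1)
    return float(d.sum() / (n * (n - 1)))

def inter_cluster_distance(points_a, points_b):
    """Euclidean distance between the two cluster centroids (larger = better separated)."""
    ca = np.asarray(points_a, dtype=float).mean(axis=0)
    cb = np.asarray(points_b, dtype=float).mean(axis=0)
    return float(np.linalg.norm(ca - cb))

A = [[0.0, 0.0], [0.0, 2.0]]
B = [[10.0, 0.0], [10.0, 2.0]]
print(intra_cluster_distance(A))     # 2.0
print(inter_cluster_distance(A, B))  # 10.0
```

A good clustering has small intracluster and large intercluster distances; hybrid metrics fold both into one number.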

Inertia

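Also called the within-cluster sum-of-squares criterion.
Measures how compact the clusters are: it sums, over all points, the squared distance between each point and its closest centroid.
The score ranges over $[0, +\infty)$, and lower is better. Note that inertia keeps decreasing as the number of clusters grows (it reaches 0 when every point is its own cluster), so on its own it cannot be used to choose the number of clusters.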

$$\sum_{i=0}^{n}\min_{\mu_j \in C}\left(\lVert x_i - \mu_j \rVert^2\right)$$

where $x_i$ is a data point, $C$ is the set of cluster centroids and $\mu_j$ is the centroid of cluster $j$.
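The inertia formula translates almost directly into NumPy: compute every point-to-centroid squared distance, keep the minimum for each point, and sum. This is a minimal sketch; `inertia` is a hypothetical helper name. In scikit-learn, a fitted `KMeans` model exposes the same quantity as its `inertia_` attribute.

```python
import numpy as np

def inertia(X: np.ndarray, centroids: np.ndarray) -> float:
    """Within-cluster sum of squares: each point contributes the squared
    Euclidean distance to its closest centroid."""
    # Squared distances from every point to every centroid, shape (n_points, n_centroids)
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Minimum over centroids for each point, then sum over points
    return float(sq_dists.min(axis=1).sum())

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(inertia(X, centroids))  # 4.0 (each point is at distance 1 from its centroid)
print(inertia(X, X))          # 0.0 (one centroid per point)
```

The second call illustrates why a lower inertia is not automatically a better clustering: with one centroid per point the score is 0.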

Silhouette score

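It is a hybrid metric: it uses both the inter-cluster and the intra-cluster distances.
For each sample, let $a$ be the mean distance to the other points of its own cluster and $b$ the mean distance to the points of the nearest other cluster. The sample's coefficient is $s = \frac{b - a}{\max(a, b)}$, and the silhouette score is the mean of $s$ over all samples.
The score ranges over $[-1, 1]$, and higher is better: values near 1 mean dense, well-separated clusters, values near 0 mean overlapping clusters, and negative values suggest samples assigned to the wrong cluster.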
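A from-scratch sketch of the mean silhouette coefficient, assuming Euclidean distance. For each sample, $a$ is its mean distance to the other members of its own cluster, $b$ is its smallest mean distance to the members of any other cluster, and the coefficient is $(b - a) / \max(a, b)$. The function name `silhouette` is hypothetical; scikit-learn provides this metric as `sklearn.metrics.silhouette_score(X, labels)`. Samples in singleton clusters get a coefficient of 0, following scikit-learn's convention.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if not 2 <= len(clusters) <= len(X) - 1:
        raise ValueError("need 2 <= number of clusters <= n_samples - 1")
    # Pairwise Euclidean distances, shape (n_samples, n_samples)
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster: coefficient stays 0
        # a: mean distance to the other members of i's cluster (d(i, i) is 0)
        a = dists[i, same].sum() / (n_same - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(round(silhouette(X, [0, 0, 1, 1]), 3))  # 0.9    (compact, well separated)
print(round(silhouette(X, [0, 1, 0, 1]), 3))  # -0.448 (clusters straddle the gap)
```

The same four points score close to 1 when grouped by proximity and below 0 when each cluster spans the gap, matching the interpretation of the score's range.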


Last updated 5 years ago
