Clustering metrics

These metrics evaluate how good the clustering structure is without needing external information such as ground-truth labels. A clustering evaluation metric belongs to one of these types:

  • Intercluster distance

  • Intracluster distance

  • Hybrid (combines both)

Inertia

  • Also called the within-cluster sum-of-squares criterion.

  • Measures how far the points within a cluster are from that cluster's centroid.

  • The range of the score is $[0, +\infty)$. Lower is better.

$$\sum_{i=0}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)$$

where $\mu_j$ is the centroid of each cluster and $x_i$ is a data point.
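The sum above can be computed directly. The following is a minimal sketch in plain Python, assuming points and centroids are given as tuples of floats; the helper names `squared_distance` and `inertia` and the toy data are illustrative, not part of any library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def inertia(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid."""
    return sum(
        min(squared_distance(x, mu) for mu in centroids)
        for x in points
    )


# Toy data (assumed for illustration): two tight groups, one centroid each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

# Every point lies at squared distance 1 from its nearest centroid.
print(inertia(points, centroids))  # 4.0
```

Placing a centroid on every point drives the value to 0, which is why inertia alone cannot be used to pick the number of clusters: it only decreases as clusters are added.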

Silhouette score

  • It gives information about both the inter-cluster distances and the intra-cluster distances.

  • Tells how far the instances in one cluster are from the instances of another cluster, relative to how close they are to the instances of their own cluster.

  • The range of the score is $[-1, 1]$. Higher is better.
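The text does not spell out a formula for the silhouette score, so the sketch below assumes the usual definition: for each point, $a$ is the mean distance to the other points of its own cluster, $b$ is the mean distance to the points of the nearest other cluster, and the point's score is $(b - a) / \max(a, b)$. The overall score is the mean over all points. The function name `mean_silhouette`, the choice of Euclidean distance, and the convention of scoring single-point clusters as 0 are assumptions made for this illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean of the per-point silhouette values (b - a) / max(a, b)."""
    # Group the points by cluster label.
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a single-point cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        # The distance from p to itself is 0, so summing over the whole
        # cluster and dividing by (size - 1) gives the right mean.
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumed for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

On this toy data the labelling that matches the two groups scores close to 0.8, while the labelling that cuts across them scores below 0, matching the "higher is better" reading of the range.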
