Metrics

In regression

TODO

In classification

The confusion matrix

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes.

  • True positive

    • Predicted 1 ⇒ Actual 1

  • False positive

    • Predicted 1 ⇒ Actual 0

    • This is a Type I error

  • False negative

    • Predicted 0 ⇒ Actual 1

    • This is a Type II error

  • True negative

    • Predicted 0 ⇒ Actual 0
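The four counts above can be tallied directly from paired labels. A minimal sketch (the function name `confusion_counts` is an illustrative choice, not a standard API):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted 1, actual 1
        elif p == positive:
            fp += 1          # predicted 1, actual 0 (Type I error)
        elif t == positive:
            fn += 1          # predicted 0, actual 1 (Type II error)
        else:
            tn += 1          # predicted 0, actual 0
    return tp, fp, fn, tn

# Example: six predictions against ground truth
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```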

Cost matrix

A cost matrix (error matrix) is also useful when specific classification errors are more severe than others. The classifier then tries to avoid classification errors with a high error weight; the trade-off of avoiding 'expensive' classification errors is an increased number of 'cheap' classification errors.
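A small sketch of how such weighting works. The weights below are hypothetical (a false negative costs five times a false positive), chosen only to illustrate the trade-off:

```python
# Illustrative cost matrix keyed by (actual, predicted).
# Weights are hypothetical: a FN is 5x more expensive than a FP.
COST = {
    (1, 0): 5.0,  # false negative: expensive error
    (0, 1): 1.0,  # false positive: cheap error
    (1, 1): 0.0,  # true positive: no cost
    (0, 0): 0.0,  # true negative: no cost
}

def total_cost(y_true, y_pred):
    """Sum the error weight of every prediction."""
    return sum(COST[(t, p)] for t, p in zip(y_true, y_pred))

# One missed positive plus two false alarms...
print(total_cost([1, 0, 0], [0, 1, 1]))  # 7.0
# ...costs more than three false alarms under these weights.
print(total_cost([0, 0, 0], [1, 1, 1]))  # 3.0
```

A classifier trained against these weights would shift its decision threshold toward predicting 1, accepting more cheap false positives to avoid expensive false negatives.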

Accuracy

Of all samples, it measures the proportion that were correctly classified.

\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{Total}}

Precision

Of all the predicted positives, it measures the proportion that are actually positive.

It is a good measure when the cost of a FP is high, for instance in email spam detection (flagging a legitimate email as spam is costly).
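In the same notation as the other formulas here, precision is the share of predicted positives that are true positives:

```latex
\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
```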

Recall

Of all the actual positives, it measures the proportion correctly classified as positive. It is also called Sensitivity or True Positive Rate (TPR).

It is a good metric for selecting the best model when there is a high cost associated with FN, for instance in fraud detection or sick-patient detection.
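In the same notation, recall is the share of actual positives that the model catches:

```latex
\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
```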

Maximizing recall is equivalent to minimizing false negatives.

Specificity

Of all the actual negative values, it measures the proportion correctly classified as negative.

\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}

False Positive Rate (FPR)

Of all the actual negative values, it measures the proportion misclassified as positive. It is the complement of specificity.

\text{FPR} = 1 - \text{Specificity} = \frac{\text{FP}}{\text{TN} + \text{FP}}

F-Score

The F1-score may be a better measure when we need to balance Precision and Recall and there is an uneven class distribution (a large number of actual negatives).

\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
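All of the metrics above derive from the four confusion-matrix counts. A minimal sketch (the function name `classification_metrics` is illustrative, not a standard API):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics above from raw confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity / TPR
    specificity = tn / (tn + fp)
    fpr = 1 - specificity              # equals FP / (TN + FP)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "fpr": fpr, "f1": f1}

m = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(m["accuracy"])   # 0.8
print(m["precision"])  # 0.8
print(m["f1"])         # 0.8
```

In practice these are also available ready-made, e.g. in scikit-learn's `sklearn.metrics` module.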
