Metrics
In regression
TODO
In classification
The confusion matrix
A confusion matrix is a performance measurement for machine-learning classification where the output can be two or more classes. For a binary problem it has the four entries defined below (see the code sketch after these definitions).
True positive
Predicted: 1, Actual: 1
False positive
Predicted: 1, Actual: 0
This is a Type I error.
False negative
Predicted: 0, Actual: 1
This is a Type II error.
True negative
Predicted: 0, Actual: 0
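As a minimal sketch, these four counts can be read off scikit-learn's `confusion_matrix`; the label vectors here are made up for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, rows are actual classes and columns are
# predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```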
Cost matrix
A cost matrix assigns a weight to each kind of classification error and is useful when specific classification errors are more severe than others. A cost-sensitive classifier tries to avoid the errors with a high weight; the trade-off of avoiding 'expensive' classification errors is an increased number of 'cheap' ones.
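A minimal sketch of applying a cost matrix at evaluation time, assuming the same `[[TN, FP], [FN, TP]]` layout as scikit-learn's confusion matrix; the cost values are made up for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Hypothetical cost matrix: a false negative (missing a positive)
# is 5x more expensive than a false positive; correct answers cost 0.
#                 predicted 0  predicted 1
costs = np.array([[0.0,        1.0],   # actual 0
                  [5.0,        0.0]])  # actual 1

cm = confusion_matrix(y_true, y_pred)

# Element-wise product weighs each cell of the confusion matrix
# by its error cost, then sums to a single evaluation score.
total_cost = (cm * costs).sum()
print(f"Total misclassification cost: {total_cost}")
```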
Accuracy
From all the samples, it measures the proportion correctly classified.
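In terms of confusion-matrix counts:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$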
Precision
From all the predicted positives, it measures the proportion that is actually positive.
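In terms of confusion-matrix counts:

$$\text{Precision} = \frac{TP}{TP + FP}$$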
It is a good measure when the cost of a false positive (FP) is high, for instance in email spam detection, where flagging a legitimate email as spam is costly.
Recall
From all the actual positives, it measures the proportion classified as positive. It is also called Sensitivity or True Positive Rate (TPR).
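In terms of confusion-matrix counts:

$$\text{Recall} = \frac{TP}{TP + FN}$$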
It is a good metric for selecting a model when there is a high cost associated with a false negative (FN), for instance in fraud detection or sick-patient detection.
Optimizing for recall means minimizing false negatives.
Specificity
From all the actual negatives, it measures the proportion classified as negative. It is also called the True Negative Rate (TNR).
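In terms of confusion-matrix counts:

$$\text{Specificity} = \frac{TN}{TN + FP}$$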
False Positive Rate (FPR)
From all the actual negatives, it measures the proportion misclassified as positive. It is the complement of specificity.
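In terms of confusion-matrix counts:

$$\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}$$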
F-Score
The F1-score is the harmonic mean of Precision and Recall. It might be a better measure than accuracy when we need a balance between Precision and Recall AND there is an uneven class distribution (a large number of actual negatives).
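In terms of the two metrics it combines:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

As a minimal sketch, all of the metrics above can be computed with scikit-learn, reusing the made-up labels from the confusion-matrix example; specificity and FPR are derived from the matrix counts, since scikit-learn has no dedicated functions for them:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Precision:  ", precision_score(y_true, y_pred))
print("Recall:     ", recall_score(y_true, y_pred))
print("Specificity:", tn / (tn + fp))
print("FPR:        ", fp / (fp + tn))
print("F1-score:   ", f1_score(y_true, y_pred))
```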