The iron ML notebook

How does a ROC curve work?


Last updated 3 years ago



Overview

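  • ROC (Receiver Operating Characteristic) is a probability curve: it shows how a classifier's true positive rate and false positive rate change as the decision threshold varies.

  • AUC (Area Under the Curve) represents the degree of separability, i.e. how well the model distinguishes between the classes. Equivalently, it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

  • AUC-ROC is a performance measure for classification problems that summarizes performance across all threshold settings.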

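  • Range: [0, 1]

    • Best: 1 (perfect separation)

    • Worst: 0.5 (no better than random guessing)

    • Inverse: 0 (predictions perfectly reversed; swapping the predicted classes would give an AUC of 1)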

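A ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis, with one point per threshold. TPR = TP / (TP + FN) is the recall (sensitivity), and FPR = FP / (FP + TN) = 1 - specificity is the probability of a false alarm.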
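As a minimal sketch in plain Python (no libraries; the helper name `roc_points` is hypothetical), the curve can be traced by sweeping the threshold over every distinct score and recording one (FPR, TPR) pair per threshold. It assumes 0/1 labels, at least one example of each class, and that a higher score means "more likely positive".

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest.

    Hypothetical helper: y_true holds 0/1 labels, and a sample is predicted
    positive when its score is >= the threshold.
    """
    pos = sum(y_true)            # number of actual positives
    neg = len(y_true) - pos      # number of actual negatives
    # +inf first so the curve starts at (0, 0); then every distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr = tp / pos  # recall / sensitivity
        fpr = fp / neg  # false-alarm rate = 1 - specificity
        points.append((fpr, tpr))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each lower threshold can only add predicted positives, so both rates move monotonically from (0, 0) toward (1, 1).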

Understanding the probability curves

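When the score distributions of the two classes overlap, every threshold produces some Type I errors (false positives) and Type II errors (false negatives). Moving the threshold trades one for the other: we can reduce one type of error only by accepting more of the other. The common default threshold of 0.5 is often described as giving equal weight to the sensitivity and specificity of the model.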

In the ideal situation, the score distributions of the positive and negative classes do not overlap at all, so some threshold separates them perfectly and AUC = 1.

When AUC is approximately 0.5, the two distributions overlap completely and the model has no capacity to discriminate between the positive and negative classes.

When AUC is approximately 0, the model is reversing the classes: it predicts the negative class as positive and vice versa.
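The three regimes above can be checked with a small sketch that computes AUC as the probability that a random positive outscores a random negative (ties count as half). The helper name `auc_by_ranking` and the toy scores are illustrative assumptions, not taken from the sources.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise count equals the area under the
    empirical ROC curve; it is O(P * N), which is fine for small illustrations.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [0, 0, 0, 1, 1, 1]
separated = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]  # no overlap between classes
identical = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # complete overlap
inverted = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # classes reversed

print(auc_by_ranking(y, separated))  # 1.0
print(auc_by_ranking(y, identical))  # 0.5
print(auc_by_ranking(y, inverted))   # 0.0
print(auc_by_ranking(y, [1 - s for s in inverted]))  # flipping scores -> 1.0
```

The last line shows why an AUC near 0 still carries information: reversing the model's decisions turns it into a near-perfect classifier.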

To make this clear:

  • Smaller values on the x-axis of the plot indicate lower false positives and higher true negatives.

  • Larger values on the y-axis of the plot indicate higher true positives and lower false negatives.

Dealing with multiclass models

Using the One-vs-All (also called One-vs-Rest) approach, we can plot N ROC curves for N classes: each curve treats one class as positive and all the remaining classes as negative.
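A minimal One-vs-All sketch, assuming the model outputs one score per class for each sample. The helper names, the toy labels and probabilities, and the choice of an unweighted (macro) mean to combine the per-class AUCs are all illustrative assumptions.

```python
def binary_auc(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, prob_rows, classes):
    """Return {class: AUC}, treating each class in turn as positive vs. all others.

    prob_rows[i][k] is the model's score for classes[k] on sample i.
    """
    per_class = {}
    for k, c in enumerate(classes):
        y_binary = [1 if label == c else 0 for label in labels]  # c vs. rest
        scores = [row[k] for row in prob_rows]                   # column k only
        per_class[c] = binary_auc(y_binary, scores)
    return per_class


classes = ["cat", "dog", "bird"]
labels = ["cat", "dog", "bird", "cat", "dog", "bird"]
prob_rows = [
    [0.70, 0.20, 0.10],
    [0.30, 0.60, 0.10],
    [0.20, 0.20, 0.60],
    [0.25, 0.65, 0.10],  # a cat the model mistakes for a dog
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]

per_class = one_vs_rest_auc(labels, prob_rows, classes)
macro_auc = sum(per_class.values()) / len(per_class)  # unweighted mean over classes
print(per_class)  # {'cat': 0.75, 'dog': 0.875, 'bird': 1.0}
print(macro_auc)  # 0.875
```

The misclassified cat lowers the AUC of both the "cat" and the "dog" curves, which is the kind of per-class detail a single averaged number hides.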

When we decrease the threshold, we get more positive predictions, which increases the recall (sensitivity) and decreases the specificity. Similarly, when we increase the threshold, we get more negative predictions, giving higher specificity and lower recall.
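To see this trade-off numerically, here is a small sketch with made-up labels and scores (the helper `recall_and_specificity` is hypothetical). It counts the confusion-matrix cells at three thresholds.

```python
def recall_and_specificity(y_true, scores, threshold):
    """Predict positive when score >= threshold; return (recall, specificity)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif not predicted_positive:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9]
for t in (0.35, 0.5, 0.65):
    r, s = recall_and_specificity(y_true, scores, t)
    print(f"threshold={t:.2f}  recall={r:.2f}  specificity={s:.2f}")
# threshold=0.35  recall=1.00  specificity=0.50
# threshold=0.50  recall=0.75  specificity=0.75
# threshold=0.65  recall=0.50  specificity=1.00
```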

Finally, we can summarize a model's ROC curve with a single number by calculating the total Area Under the Curve (AUC).
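With a finite set of thresholds the ROC curve is a polyline, so its area is commonly computed with the trapezoidal rule. Assuming the ROC points (FPR_0, TPR_0) = (0, 0), ..., (FPR_m, TPR_m) = (1, 1) are sorted by increasing FPR:

```latex
\mathrm{AUC} = \int_0^1 \mathrm{TPR}\, d(\mathrm{FPR})
\approx \sum_{i=1}^{m} \left(\mathrm{FPR}_i - \mathrm{FPR}_{i-1}\right)
\frac{\mathrm{TPR}_i + \mathrm{TPR}_{i-1}}{2}
```

For example, the points (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1), (1, 1) give 0 + 0.25 + 0 + 0.5 = 0.75.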

Sources:

  • Understanding AUC-ROC curve (Sarang Narkhede)

  • How and When to Use ROC Curves and Precision-Recall Curves for Classification in Python (Jason Brownlee)

  • Beyond Accuracy: Precision and Recall (Will Koehrsen)