The iron ML notebook

YOLO

Last updated 8 months ago

YOLO: Algorithm for Object Detection Explained [+Examples] (V7Labs)

A comprehensive review of YOLO architectures in Computer Vision: From YOLOV1 to YOLOV8 and YOLO-NAS (arXiv)

You Only Look Once (YOLO) achieved SoTA results among real-time detectors in 2015. A family of detectors has emerged since then.

The YOLO architecture is simple:

  • 24 convolutional layers followed by two fully connected (FC) layers.

  • The first 20 convolutional layers are pre-trained on ImageNet with half-resolution images (224 × 224).

  • The network is then trained on detection with full-resolution images (448 × 448).

  • YOLO divides the input image into an S × S grid.

  • Each grid cell predicts B bounding boxes, each with a confidence score, plus one set of C conditional class probabilities shared by all boxes in that cell.

  • Each bbox prediction consists of five values: Pc, bx, by, bh, bw (Pc is the confidence score; bx, by are the box center; bh, bw are its height and width).

  • It uses 1 × 1 Conv layers to reduce the number of feature maps and keep the number of parameters relatively low.

  • The output of YOLO is a tensor of size S × S × (B × 5 + C), optionally followed by non-maximum suppression (NMS) to remove duplicate detections.

  • In the original YOLO paper, the authors used

    • the PASCAL VOC dataset that contains 20 classes (C = 20)

    • a grid of 7 × 7 (S = 7)

    • and 2 bounding boxes per grid cell (B = 2), which gives a 7 × 7 × 30 output tensor

  • At inference time, YOLO applies NMS (non-maximum suppression) to discard overlapping duplicate detections of the same object, which improves accuracy.
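
The output size and the NMS step from the list above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names `output_shape`, `iou` and `nms`, the corner box format `(x1, y1, x2, y2)` and the 0.5 IoU threshold are all assumptions chosen for the example.

```python
def output_shape(S, B, C):
    """Shape of the YOLO output tensor: S x S x (B * 5 + C)."""
    return (S, S, B * 5 + C)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS (assumed threshold 0.5): visit boxes from highest to
    lowest score and keep a box only if it does not overlap an already
    kept box by more than the threshold. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# PASCAL VOC setting from the paper: S = 7, B = 2, C = 20
print(output_shape(7, 2, 20))  # (7, 7, 30)

# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In practice NMS is usually run separately for each class, so that overlapping boxes of different classes do not suppress each other; the sketch above handles a single class.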

Pros & Cons

  • Much faster than the object detectors that existed at the time, allowing real-time performance.

  • However, the localization error was larger than that of SoTA methods such as Fast R-CNN.

    • It could only detect at most two objects of the same class in the grid cell, limiting its ability to predict nearby objects.

    • It struggled to predict objects with aspect ratios not seen in the training data.

    • It learned from coarse object features due to the down-sampling layers.
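
The nearby-objects limitation follows from how objects map onto the grid: the cell that contains an object's center is the one responsible for predicting it. The sketch below illustrates this under some assumptions: `responsible_cell` is a hypothetical helper, centers are given as normalized coordinates in [0, 1], and the sample centers are made up.

```python
from collections import Counter


def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell containing a normalized center.

    cx and cy are in [0, 1], relative to image width and height.
    """
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col


B = 2  # boxes per cell in the original paper

# Three objects clustered near the image center plus one far away
centers = [(0.50, 0.50), (0.52, 0.55), (0.55, 0.51), (0.10, 0.90)]
cells = Counter(responsible_cell(cx, cy) for cx, cy in centers)

# Cells that hold more object centers than they have boxes to predict
crowded = {cell: n for cell, n in cells.items() if n > B}
print(crowded)  # {(3, 3): 3}
```

Here three object centers land in cell (3, 3), but that cell can output only B = 2 boxes, so at least one of the objects cannot be detected.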
