Faster R-CNN

Main contributions with respect to Fast R-CNN:

Region Proposal Network (RPN): a neural network that generates region proposals from a set of anchor boxes.
Anchor boxes: reference boxes, each defined by a combination of scale and aspect ratio.
Convolutional weights are shared between the RPN and the Fast R-CNN.

The Faster R-CNN works as follows:

The RPN generates region proposals. Because the RPN is a trainable network, it can be adapted to each detection scenario, and it reuses the convolutional layers of the Fast R-CNN network.
For each region proposal in the image, a fixed-length feature vector is extracted using the ROI Pooling layer.
The extracted feature vectors are then classified using the Fast R-CNN.
The class scores of the detected objects are returned along with their bounding boxes.
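
To make the second step concrete, here is a minimal NumPy sketch of how a pooling layer can turn regions of different sizes into feature vectors of the same length. The function name `roi_max_pool`, the `(H, W, C)` layout, and the 2x2 output grid are assumptions chosen for illustration; real implementations differ in how they quantize bin boundaries.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of an (H, W, C) feature map into a fixed grid.

    roi is (x1, y1, x2, y2) in feature-map coordinates, with x2 and y2
    exclusive. Hypothetical helper, not a library function.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    out_h, out_w = output_size
    # Split the region's rows and columns into roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.zeros((out_h, out_w, region.shape[2]), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Make sure every bin covers at least one cell.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    # Flatten to a fixed-length vector of out_h * out_w * C values.
    return pooled.reshape(-1)

# Two regions of different sizes yield vectors of the same length.
fmap = np.arange(4 * 6, dtype=float).reshape(4, 6, 1)
vec_a = roi_max_pool(fmap, (0, 0, 4, 4))
vec_b = roi_max_pool(fmap, (1, 0, 6, 3))
```

Both `vec_a` and `vec_b` have 4 elements even though the regions are 4x4 and 3x5 cells, which is what lets a fixed-size classifier consume them.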

How are anchor boxes created?

The RPN works on the feature map output by the last convolutional layer shared with the Fast R-CNN. A sliding window of size NxN moves over the feature map. For each window position, several candidate region proposals are generated. These candidates are not the final proposals, because they are later filtered based on their “objectness score”.
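
The filtering step can be sketched as follows. This is only an illustration: the helper name `keep_top_proposals` and the `top_n` and `min_score` values are assumptions, and practical pipelines typically also apply non-maximum suppression, which is omitted here.

```python
import numpy as np

def keep_top_proposals(boxes, scores, top_n=2, min_score=0.5):
    """Keep at most top_n boxes whose objectness score is >= min_score.

    Hypothetical helper: boxes is (N, 4), scores is (N,).
    """
    order = np.argsort(-scores)                  # highest score first
    order = order[scores[order] >= min_score]    # drop low-scoring candidates
    order = order[:top_n]                        # cap the number of proposals
    return boxes[order], scores[order]

boxes = np.array([[0, 0, 10, 10],
                  [5, 5, 20, 20],
                  [2, 2, 12, 12],
                  [8, 8, 30, 30]], dtype=float)
scores = np.array([0.9, 0.2, 0.7, 0.6])
kept_boxes, kept_scores = keep_top_proposals(boxes, scores)
```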

Anchor boxes are normally defined by two parameters. Combining the values of these parameters yields K anchor boxes:

Scale
Aspect Ratio
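
As a sketch of how the two parameters combine, the snippet below builds one anchor per (scale, aspect ratio) pair. The function name `make_anchors` is hypothetical, and the conventions used (ratio = height / width, area = scale squared, boxes centred at the origin) are assumptions. The three scales and three ratios match the values commonly cited for Faster R-CNN, giving K = 9.

```python
import numpy as np

def make_anchors(scales, ratios):
    """Return a (K, 4) array of anchors (x1, y1, x2, y2) centred at (0, 0).

    Assumed conventions: ratio = height / width, box area = scale ** 2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
K = len(anchors)  # 3 scales x 3 ratios = 9 anchors
```

To place anchors over the image, the same K boxes would be shifted to the centre of every sliding-window position.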

For each window, K proposals are generated and a fixed-size feature vector is extracted. This vector is then fed into two fully connected (FC) layers:

The cls layer acts as a binary classifier (object vs. background) and outputs an objectness score for every region proposal.
The reg layer returns a 4-D vector that defines the bounding box of each region proposal.
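
The two layers can be sketched with plain matrix products. Everything here is illustrative: the weights are random rather than trained, the feature size `D = 256` and `K = 9` are assumed values, and the cls layer is modelled as one sigmoid score per anchor, which is one possible way to express a binary object/background classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 256, 9  # assumed feature length and number of anchors per window

def rpn_head(window_features, w_cls, b_cls, w_reg, b_reg):
    """Apply the cls and reg layers to one window's feature vector."""
    logits = window_features @ w_cls + b_cls           # shape (K,)
    objectness = 1.0 / (1.0 + np.exp(-logits))         # sigmoid -> (0, 1)
    boxes = (window_features @ w_reg + b_reg).reshape(K, 4)  # 4 values per anchor
    return objectness, boxes

# Random (untrained) parameters, for shape checking only.
features = rng.standard_normal(D)
w_cls, b_cls = rng.standard_normal((D, K)) * 0.01, np.zeros(K)
w_reg, b_reg = rng.standard_normal((D, 4 * K)) * 0.01, np.zeros(4 * K)

objectness, boxes = rpn_head(features, w_cls, b_cls, w_reg, b_reg)
```

Each window therefore produces K objectness scores and K four-element box vectors, one pair per anchor.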

Based on the objectness score, Faster R-CNN assigns each region proposal to one of two classes: object or background.
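
For training, the target class of each anchor has to come from the ground truth. A common rule, as described for Faster R-CNN, labels an anchor positive when its IoU with a ground-truth box exceeds 0.7 and negative when its IoU with every ground-truth box is below 0.3, ignoring the rest. The sketch below implements only that simplified rule; the helper names and the `-1` "ignore" label are assumptions.

```python
import numpy as np

def iou(box, gt_boxes):
    """IoU between one box and an (M, 4) array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], gt_boxes[:, 0])
    y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2])
    y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_box + area_gt - inter)

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (object), 0 (background) or -1 (ignored during training)."""
    best = iou(anchor, gt_boxes).max()
    if best > pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1

gt = np.array([[0.0, 0.0, 10.0, 10.0]])
labels = [
    label_anchor(np.array([0.0, 0.0, 10.0, 10.0]), gt),    # full overlap
    label_anchor(np.array([20.0, 20.0, 30.0, 30.0]), gt),  # no overlap
    label_anchor(np.array([0.0, 0.0, 10.0, 5.0]), gt),     # IoU = 0.5
]
```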

