← ML Research Wiki / 1804.02767

YOLOv3: An Incremental Improvement

Joseph Redmon (University of Washington), Ali Farhadi (University of Washington), 2018

Paper Information
arXiv ID
1804.02767
Venue
arXiv.org
Domain
Computer vision
SOTA Claim
Yes
Code
Reproducibility
7/10

Abstract

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP_50 in 51 ms on a Titan X, compared to 57.5 AP_50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at

Summary

This paper presents YOLOv3, an updated version of the YOLO object detection system. The authors describe a series of small design changes and a newly trained network that is somewhat larger than its predecessor but more accurate, while remaining fast. At a resolution of 320×320, YOLOv3 runs in 22 ms at 28.2 mean Average Precision (mAP), which the authors report as matching SSD in accuracy at roughly three times the speed. The paper outlines the architecture, including a new feature extractor named Darknet-53, which adds shortcut (residual) connections and is reported to reach classification accuracy comparable to ResNet-101 and ResNet-152 with fewer floating point operations and higher speed. YOLOv3 predicts bounding boxes at three scales and uses separate loss terms for box coordinates, objectness, and class predictions, with classes predicted by independent logistic classifiers rather than a softmax. In comparisons with other detectors, YOLOv3 is similar to RetinaNet on the IoU 0.5 metric (57.9 versus 57.5) while running 3.8× faster. The authors note that, unlike earlier YOLO versions, YOLOv3 does relatively well on small objects but comparatively worse on medium and large ones, and that its accuracy drops as the IoU threshold increases. They also describe several ideas they tried that did not help. The paper concludes with a reflection on the ethical implications of deploying object detection technologies.
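The multi-scale prediction described in the summary can be made concrete by working out the shape of each detection output. The following is a minimal sketch, not the authors' code. It assumes the COCO configuration of 3 boxes per scale and 80 classes, and strides of 32, 16, and 8 for the three scales; the function name `yolo_output_shapes` is hypothetical.

```python
def yolo_output_shapes(input_size, num_classes=80, boxes_per_scale=3,
                       strides=(32, 16, 8)):
    """Return (grid, grid, channels) for each detection scale.

    Assumption: each box carries 4 coordinates, 1 objectness score,
    and one score per class, so channels = boxes * (4 + 1 + classes).
    """
    if any(input_size % s for s in strides):
        raise ValueError("input_size should be divisible by every stride")
    channels = boxes_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


if __name__ == "__main__":
    for shape in yolo_output_shapes(320):
        print(shape)
```

Under these assumptions a 320×320 input yields grids of 10, 20, and 40 cells per side, each with 255 channels. The finest grid is the one that contributes most of the small-object predictions.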

Methods

This paper employs the following methods:

  • YOLO
  • Darknet-53
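The name Darknet-53 refers to the layer count of the backbone. The sketch below tallies that count from a stage layout recalled from the paper's architecture table, so the `STAGES` list should be treated as an assumption to verify against the paper. The function names are hypothetical.

```python
# Assumed stage layout: (filters after downsampling, number of residual blocks).
# Each stage starts with one 3x3 stride-2 convolution, followed by residual
# blocks that each contain a 1x1 convolution and a 3x3 convolution.
STAGES = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]


def count_darknet53_layers(stages=STAGES):
    """Return (convolutional layers, convolutional + final connected layer)."""
    conv = 1  # initial 3x3 convolution with 32 filters
    for _filters, repeats in stages:
        conv += 1            # stride-2 downsampling convolution
        conv += 2 * repeats  # two convolutions per residual block
    return conv, conv + 1    # +1 for the final connected layer


def total_stride(stages=STAGES):
    """Overall downsampling factor: one stride-2 convolution per stage."""
    return 2 ** len(stages)


if __name__ == "__main__":
    print(count_darknet53_layers())
    print(total_stride())
```

With this layout the tally comes to 52 convolutional layers plus one connected layer, and the backbone downsamples the input by a factor of 32.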

Models Used

  • YOLOv3
  • Darknet-53
  • ResNet-101
  • ResNet-152
  • RetinaNet

Datasets

The following datasets were used in this research:

  • COCO
  • Open Images Dataset

Evaluation Metrics

  • mAP
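  • AP_50 (AP at IoU threshold 0.5)
  • AP_75 (AP at IoU threshold 0.75)
  • AP_S (AP on small objects)
  • AP_M (AP on medium objects)
  • AP_L (AP on large objects)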
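The threshold-based metrics above depend on intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows only that overlap test, assuming boxes given as `(x1, y1, x2, y2)` corner coordinates. It is not the full COCO evaluation procedure, and the names `iou` and `is_match` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold):
    """A prediction counts as correct at this threshold if IoU >= threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    pred, truth = (2, 0, 12, 10), (0, 0, 10, 10)
    print(iou(pred, truth))
    print(is_match(pred, truth, 0.5), is_match(pred, truth, 0.75))
```

In this example the prediction overlaps the ground truth with an IoU of about 0.67, so it counts as a match at the 0.5 threshold but not at 0.75. A detector whose boxes are slightly misaligned can therefore score well at IoU 0.5 and noticeably worse at stricter thresholds.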

Results

  • YOLOv3 runs in 22 ms at 320×320 resolution with 28.2 mAP
  • 57.9 AP_50 on COCO in 51 ms on a Titan X
  • 3.8× faster than RetinaNet at similar accuracy (RetinaNet: 57.5 AP_50 in 198 ms)
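The speed comparison follows directly from the inference times quoted in the abstract. Here is a small sketch of the arithmetic; the helper names `speedup` and `frames_per_second` are hypothetical.

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency, ignoring batching."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Times from the abstract: RetinaNet 198 ms, YOLOv3 51 ms.
    print(f"speedup vs RetinaNet: {speedup(198, 51):.2f}x")
    # YOLOv3 at 320x320 takes 22 ms per image.
    print(f"throughput at 22 ms: {frames_per_second(22):.1f} FPS")
```

The ratio 198 / 51 is about 3.88, which the abstract states as 3.8×. A latency of 22 ms corresponds to roughly 45 images per second.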

Limitations

The authors identified the following limitations:

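  • Accuracy drops as the IoU threshold increases, indicating difficulty aligning boxes tightly with objects
  • Comparatively weaker performance on medium and large objects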

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: Titan X

Keywords

YOLOv3, object detection, convolutional neural networks, Darknet-53, multi-scale prediction
