YOLOv3: An Incremental Improvement

Joseph Redmon University of Washington, Ali Farhadi University of Washington (2018)

Paper Information

arXiv ID

1804.02767

Venue

arXiv.org

Domain

Computer vision

SOTA Claim

Yes

Code

Available

Reproducibility

7/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP 50 in 51 ms on a Titan X, compared to 57.5 AP 50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at

Summary

This paper presents YOLOv3, an updated version of the YOLO object detection system. The authors describe a series of small design changes aimed at improving accuracy while keeping the model fast, at the cost of a somewhat larger network. At a resolution of 320×320, YOLOv3 is reported to run in 22 ms with a mean Average Precision (mAP) of 28.2. The paper outlines YOLOv3's architecture, including a new feature extractor named 'Darknet-53', which incorporates shortcut connections and is reported to perform comparably to the ResNet-101 and ResNet-152 baselines while using fewer operations. YOLOv3 predicts bounding boxes at multiple scales and uses separate losses for bounding box and class predictions. Additionally, the paper discusses limitations in detecting smaller objects and compares YOLOv3 with other detection models: at 320×320 it is as accurate as SSD but three times faster, and it reaches an AP 50 similar to RetinaNet's (57.9 versus 57.5) while running 3.8× faster. The authors also describe ideas they tried that did not yield positive results. The paper concludes with a reflection on the ethical implications of deploying object detection technologies.
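To make the multi-scale prediction concrete, the following is a minimal sketch of the output tensor sizes. It assumes the COCO setup described in the paper (3 boxes per grid cell at each of 3 scales, with 4 box offsets, 1 objectness score, and 80 class scores per box) and further assumes the three scales correspond to strides of 32, 16, and 8. The function names are illustrative and not taken from the Darknet code.

```python
def yolov3_output_shapes(input_size, num_classes=80, boxes_per_cell=3,
                         strides=(32, 16, 8)):
    """Return the (grid, grid, channels) shape of each prediction scale.

    Assumption: each scale downsamples the input by one of `strides`.
    """
    if any(input_size % s for s in strides):
        raise ValueError("input_size must be divisible by every stride")
    # Per box: 4 box offsets + 1 objectness score + one score per class.
    channels = boxes_per_cell * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


def total_boxes(shapes, boxes_per_cell=3):
    """Count every candidate box predicted across all scales."""
    return sum(h * w * boxes_per_cell for h, w, _ in shapes)


if __name__ == "__main__":
    shapes = yolov3_output_shapes(320)
    print(shapes)
    print(total_boxes(shapes))
```

Under these assumptions, a larger input produces finer grids at every scale, which is one way to see why higher resolutions cost more inference time.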

Methods

This paper employs the following methods:

YOLO
Darknet-53

Models Used

YOLOv3
Darknet-53
ResNet-101
ResNet-152
RetinaNet

Datasets

The following datasets were used in this research:

COCO
Open Images Dataset

Evaluation Metrics

mAP
AP 50
AP 75
AP S
AP M
AP L
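
The AP 50 and AP 75 metrics listed above depend on intersection over union (IOU) between a predicted box and a ground-truth box: a detection counts as correct only if its IOU reaches 0.5 or 0.75, respectively. The sketch below illustrates that matching rule for a single pair of boxes. The `(x1, y1, x2, y2)` corner format and the helper names are assumptions made for illustration; a full AP computation also ranks detections by confidence and integrates a precision-recall curve, which is not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """True if the prediction overlaps the ground truth enough at `threshold`."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)
    gt = (2.0, 0.0, 12.0, 10.0)
    print(iou(pred, gt))
    print(is_match(pred, gt, 0.5), is_match(pred, gt, 0.75))
```

The same prediction can count as a hit at the 0.5 threshold and a miss at 0.75, which is why a detector's AP 50 and AP 75 can differ substantially.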

Results

YOLOv3 runs in 22 ms at 320×320 resolution with 28.2 mAP
AP 50 of 57.9 on the COCO dataset in 51 ms on a Titan X
3.8× faster than RetinaNet (57.5 AP 50 in 198 ms) at similar AP 50
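
The speed comparison can be checked directly from the timings quoted in the abstract (198 ms for RetinaNet, 51 ms and 22 ms for YOLOv3). This is only a small arithmetic sketch; the helper names are illustrative, and the frames-per-second figure is simply the reciprocal of the quoted latency rather than a number reported in the abstract.

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency, ignoring batching."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Timings as quoted in the abstract.
    retinanet_ms = 198.0
    yolov3_ms = 51.0
    print(f"speedup over RetinaNet: {speedup(retinanet_ms, yolov3_ms):.2f}x")
    print(f"throughput at 22 ms: {frames_per_second(22.0):.1f} images/s")
```

The ratio comes out slightly above the quoted 3.8×, consistent with the abstract truncating rather than rounding the figure.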

Limitations

The authors identified the following limitations:

Not specified

Technical Requirements

Number of GPUs: 1
GPU Type: Titan X

Keywords

YOLOv3
object detection
convolutional neural networks
Darknet-53
multi-scale prediction

External Resources

Funding: Office of Naval Research, Google
References: 20
Influential Citations: 2532
