Joseph Redmon (University of Washington), Ali Farhadi (University of Washington), 2018
This paper presents YOLOv3, an updated version of the YOLO object detection system. The authors describe a series of design changes aimed at improving accuracy while keeping the model fast, at the cost of a slightly larger network. At an input resolution of 320×320, YOLOv3 is reported to run in 22 ms with a mean Average Precision (mAP) of 28.2. The paper outlines YOLOv3's architecture, including a new feature extractor named 'Darknet-53', which incorporates shortcut connections and is reported to reach comparable or better accuracy with fewer operations than existing feature extractors. YOLOv3 predicts bounding boxes at three scales and uses separate loss terms for box coordinates, objectness, and class predictions. Additionally, the paper discusses how detection performance varies with object size and compares YOLOv3 with other detection models: at 320×320 it is reported to be about as accurate as SSD while running roughly three times faster, and it is similar to RetinaNet on the AP50 metric while being considerably faster. The authors also describe ideas they tried that did not yield positive results. The paper concludes with a reflection on the ethical implications of deploying object detection technologies.
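To make the multi-scale prediction concrete, here is a minimal sketch of the output tensor shapes, assuming three boxes per grid cell, the 80 COCO classes, and detection heads at strides 32, 16, and 8. The function names and the stride values are illustrative assumptions and are not taken from the summary above; each box is assumed to carry 4 coordinates, 1 objectness score, and one score per class.

```python
def yolo_v3_output_shapes(input_size, num_classes=80, boxes_per_cell=3,
                          strides=(32, 16, 8)):
    """Return the (rows, cols, depth) shape of each detection scale.

    Assumption: the network downsamples by the given strides, so the
    input size must be divisible by every stride.
    """
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(f"input_size must be a multiple of {stride}")
    # Each box carries 4 coordinates + 1 objectness score + class scores.
    depth = boxes_per_cell * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]


def total_predicted_boxes(input_size, boxes_per_cell=3, strides=(32, 16, 8)):
    """Count every box predicted across all scales, before any filtering."""
    return sum((input_size // s) ** 2 * boxes_per_cell for s in strides)


if __name__ == "__main__":
    for shape in yolo_v3_output_shapes(320):
        print(shape)
    print(total_predicted_boxes(320))
```

Under these assumptions a 320×320 input yields grids of 10×10, 20×20, and 40×40 cells. The finer grids are what give the detector more candidate boxes for small objects.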