← ML Research Wiki / 2304.00501

A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS

Published as a journal paper at Machine Learning and Knowledge Extraction

Juan R Terven (Universidad Autónoma de Querétaro, Facultad de Informática; Instituto Politecnico), Diana M Cordova-Esparza (Universidad Autónoma de Querétaro, Facultad de Informática) (2023)

Paper Information
arXiv ID
2304.00501
Venue
Machine Learning and Knowledge Extraction
Domain
Computer Vision
SOTA Claim
Yes
Reproducibility
4/10

Abstract

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

Summary

This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection architecture from YOLOv1 to YOLOv8 and YOLO-NAS. It describes the innovations and contributions in each iteration, emphasizing the balance between speed and accuracy in real-time object detection applications like robotics and autonomous vehicles. The paper outlines the major architectural changes, training techniques, and metrics used for evaluation throughout the YOLO family, focusing particularly on the Average Precision (AP) metric. Applications within diverse fields such as agriculture, security, medical diagnostics, and traffic management are highlighted, showcasing the versatility of YOLO models. The discussion also covers limitations, expected trends in research, and the future directions for YOLO architecture, including potential expansions into new domains.

Methods

This paper reviews the following methods:

  • YOLO
  • YOLOv1
  • YOLOv2
  • YOLOv3
  • YOLOv4
  • YOLOv5
  • YOLOv6
  • YOLOv7
  • YOLOv8
  • YOLO-NAS

Models Used

  • YOLO
  • YOLOv1
  • YOLOv2
  • YOLOv3
  • YOLOv4
  • YOLOv5
  • YOLOv6
  • YOLOv7
  • YOLOv8
  • YOLO-NAS

Datasets

The paper refers to the following datasets:

  • PASCAL VOC 2007
  • PASCAL VOC 2012
  • Microsoft COCO
  • Objects365

Evaluation Metrics

  • Average Precision (AP)
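
Since AP is the one metric this page lists, a small worked sketch may help show what it measures. The code below is an illustration, not the paper's evaluation code: the function name `average_precision` and its inputs are hypothetical, it handles a single class, it assumes each detection has already been marked as a true or false positive (for example by an IoU threshold against ground truth), and it uses all-point interpolation of the precision-recall curve. COCO-style evaluation additionally averages over classes and several IoU thresholds, which this sketch omits.

```python
from typing import List, Tuple


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Single-class AP as the area under the interpolated precision-recall curve.

    detections: (confidence, is_true_positive) pairs. Matching detections to
    ground-truth boxes is assumed to have been done beforehand.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_pos = 0
    false_pos = 0
    for _, is_tp in ranked:
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)

    # Precision envelope: each point takes the best precision found at any
    # equal or higher recall, which removes the zigzag in the raw curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    prev_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, with two ground-truth objects and three detections ranked as true positive, false positive, true positive, the interpolated precision is 1.0 up to recall 0.5 and 2/3 from there to recall 1.0, so `average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)` evaluates to 0.5 * 1.0 + 0.5 * (2/3), roughly 0.833.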

Results

  • Increased Average Precision (AP) across YOLO versions
  • Significant improvements in speed and accuracy over iterations
  • Diverse applications in various fields such as agriculture, security, and healthcare
  • Introduction of novel architectures and training techniques enhancing real-time detection capabilities

Limitations

The authors identified the following limitations:

  • Trade-offs between speed and accuracy
  • Localization errors with overlapping objects or small objects
  • Dependence on dataset quality for training and evaluation

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

  • YOLO
  • Object Detection
  • Deep Learning
  • Convolutional Neural Networks
  • Transformers
  • Real-time Detection
  • Neural Architecture Search
