A COMPREHENSIVE REVIEW OF YOLO ARCHITECTURES IN COMPUTER VISION: FROM YOLOV1 TO YOLOV8 AND YOLO-NAS

Published as a journal paper at Machine Learning and Knowledge Extraction

Juan R Terven Universidad Autónoma de Querétaro Facultad de Informática, Instituto Politecnico Universidad Autónoma de Querétaro Facultad de Informática, Diana M Cordova-Esparza Universidad Autónoma de Querétaro Facultad de Informática (2023)

Paper Information

arXiv ID

2304.00501

Venue

Machine Learning and Knowledge Extraction

Domain

Computer Vision

SOTA Claim

Yes

Reproducibility

4/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.

Summary

This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection architecture from YOLOv1 to YOLOv8 and YOLO-NAS. It describes the innovations and contributions in each iteration, emphasizing the balance between speed and accuracy in real-time object detection applications like robotics and autonomous vehicles. The paper outlines the major architectural changes, training techniques, and metrics used for evaluation throughout the YOLO family, focusing particularly on the Average Precision (AP) metric. Applications within diverse fields such as agriculture, security, medical diagnostics, and traffic management are highlighted, showcasing the versatility of YOLO models. The discussion also covers limitations, expected trends in research, and the future directions for YOLO architecture, including potential expansions into new domains.

Methods

This paper reviews the following methods:

YOLO
YOLOv1
YOLOv2
YOLOv3
YOLOv4
YOLOv5
YOLOv6
YOLOv7
YOLOv8
YOLO-NAS

Models Used

YOLO
YOLOv1
YOLOv2
YOLOv3
YOLOv4
YOLOv5
YOLOv6
YOLOv7
YOLOv8
YOLO-NAS

Datasets

The following datasets were used in this research:

PASCAL VOC 2007
PASCAL VOC 2012
Microsoft COCO
Objects365

Evaluation Metrics

Average Precision (AP)
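
To make the AP metric concrete, the following is a minimal sketch of how AP can be computed for a single class on a single image, assuming boxes in `(x1, y1, x2, y2)` format and a fixed IoU threshold of 0.5. The function names `iou` and `average_precision` are illustrative and do not come from the paper. Benchmark evaluations such as COCO additionally average over classes and over several IoU thresholds, which this sketch omits.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Area under the precision-recall curve (all-point interpolation).

    Simplification: one class, one image, and each detection is matched
    to the best still-unmatched ground-truth box.
    """
    if not ground_truth:
        return 0.0

    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Cumulative precision and recall at each rank.
    precisions, recalls = [], []
    tp = 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision monotonically non-increasing (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
    print(average_precision(dets, gt))
```

In the usage example, the second-ranked detection is a false positive, so it lowers precision at full recall and the resulting AP falls below 1.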

Results

Increased Average Precision (AP) across YOLO versions
Significant improvements in speed and accuracy over iterations
Diverse applications in various fields such as agriculture, security, and healthcare
Introduction of novel architectures and training techniques enhancing real-time detection capabilities

Limitations

The authors identified the following limitations:

Trade-offs between speed and accuracy
Localization errors with overlapping objects or small objects
Dependence on dataset quality for training and evaluation
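
The limitation around overlapping objects is connected to post-processing: YOLO-style detectors typically apply non-maximum suppression (NMS) to discard duplicate boxes, and a greedy NMS step can also discard a correct box for a neighbouring object that overlaps heavily. The following is a minimal sketch of greedy NMS, assuming boxes in `(x1, y1, x2, y2)` format and an illustrative IoU threshold of 0.5. The names `iou` and `nms` are hypothetical and are not taken from the paper or from any YOLO codebase.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Detection = Tuple[float, Box]            # (confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    kept: List[Detection] = []
    # Visit detections from most to least confident.
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep the box only if it does not overlap any already-kept box too much.
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


if __name__ == "__main__":
    dets = [
        (0.9, (0, 0, 10, 10)),
        (0.8, (1, 1, 11, 11)),    # heavy overlap with the first box
        (0.7, (20, 20, 30, 30)),  # separate object
    ]
    print(nms(dets))
```

In the usage example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, whether it is a duplicate or a genuinely distinct object. That ambiguity is why crowded scenes are difficult for this kind of post-processing.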

Technical Requirements

Number of GPUs: None specified
GPU Type: None specified

Keywords

YOLO
Object Detection
Deep Learning
Convolutional Neural Networks
Transformers
Real-time Detection
Neural Architecture Search

External Resources

Funding: Not specified
References: 160
Influential Citations: 44
