
YOLOv10: Real-Time End-to-End Object Detection

Ao Wang (School of Software), Hui Chen (BNRist), Lihao Liu (School of Software), Kai Chen (School of Software), Zijia Lin (School of Software), Jungong Han (Department of Automation), Guiguang Ding (School of Software), Tsinghua University, 2024. Contact: [email protected], [email protected], [email protected], [email protected]

Paper Information
arXiv ID
2405.14458
Venue
Neural Information Processing Systems
Domain
computer vision
SOTA Claim
Yes
Code
https://github.com/THU-MIG/yolov10
Reproducibility
7/10

Abstract

For example, our YOLOv10-S is 1.8× faster than RT-DETR-R18 at similar AP on COCO, while using 2.8× fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46% less latency and 25% fewer parameters at the same performance. Code and models are available at https://github.com/THU-MIG/yolov10.

Summary

The paper introduces YOLOv10, a new generation of real-time end-to-end object detectors that improves on previous YOLO versions by addressing two bottlenecks: the reliance on non-maximum suppression (NMS) for post-processing and inefficiencies in the model architecture. It proposes a dual assignment strategy for NMS-free training, in which a one-to-many head provides rich supervision during training while a one-to-one head enables NMS-free inference, improving performance while reducing inference latency. It further applies a holistic efficiency-accuracy driven design that lightens the classification head, decouples spatial and channel downsampling, and introduces rank-guided block design, large-kernel convolutions, and partial self-attention. Experiments show significant improvements in accuracy and latency over prior models such as YOLOv9 and RT-DETR across various model scales, validating the proposed methods.
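At the core of NMS-free training is a consistent matching metric shared by both heads, m(α, β) = s · p^α · IoU(b̂, b)^β, where s is the spatial prior, p the classification score, and b̂, b the predicted and ground-truth boxes. The sketch below is a minimal PyTorch illustration of this metric and of the one-to-one selection that makes NMS unnecessary at inference; the function name, random inputs, and top-k value are illustrative assumptions, while the α, β defaults follow the paper's one-to-many settings.

```python
import torch

def matching_metric(p, iou, s, alpha=0.5, beta=6.0):
    """Consistent matching metric m = s * p**alpha * IoU**beta.

    p:   (N,) classification scores for the target class
    iou: (N,) IoU between each predicted box and the ground-truth box
    s:   (N,) spatial prior (1.0 if the anchor point lies inside the
         instance, else 0.0)

    alpha/beta follow the paper's one-to-many defaults; reusing the same
    metric for both heads keeps their supervision consistent.
    """
    return s * p.pow(alpha) * iou.pow(beta)

# Hypothetical predictions for a single ground-truth object.
p = torch.rand(100)
iou = torch.rand(100)
s = (torch.rand(100) > 0.5).float()

m = matching_metric(p, iou, s)
topk = m.topk(10).indices   # one-to-many head: top-k positives give rich supervision
best = m.argmax()           # one-to-one head: one positive, so inference needs no NMS
```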

Methods

This paper employs the following methods:

  • NMS-free training
  • dual assignments
  • holistic efficiency-accuracy driven design
  • lightweight classification head
  • spatial-channel decoupled downsampling (sketched after this list)
  • rank-guided block design
  • large-kernel convolution
  • partial self-attention
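
As a concrete illustration of one item above, the following is a minimal PyTorch sketch of spatial-channel decoupled downsampling: a pointwise (1×1) convolution first adjusts the channel count, then a stride-2 depthwise convolution halves the spatial resolution, replacing a single dense 3×3 stride-2 convolution that would do both at once. The module name and the BatchNorm/SiLU choices are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DecoupledDownsample(nn.Module):
    """Spatial-channel decoupled downsampling (illustrative sketch)."""

    def __init__(self, c_in, c_out):
        super().__init__()
        # Pointwise conv handles the channel transformation only.
        self.pw = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        # Depthwise stride-2 conv handles the spatial downsampling only.
        self.dw = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2,
                            padding=1, groups=c_out, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.dw(self.pw(x))))

x = torch.randn(1, 64, 80, 80)
print(DecoupledDownsample(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```

Separating the two roles cuts the parameter count and FLOPs relative to a dense 3×3 stride-2 convolution while retaining information during downsampling.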

Models Used

  • YOLOv10-N
  • YOLOv10-S
  • YOLOv10-M
  • YOLOv10-B
  • YOLOv10-L
  • YOLOv10-X
  • YOLOv9-C
  • RT-DETR-R18
  • RT-DETR-R101

Datasets

The following datasets were used in this research:

  • COCO

Evaluation Metrics

  • AP (COCO-style average precision, averaged over IoU thresholds 0.50:0.95; see the evaluation sketch below)
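
A common way to compute this metric from a detector's JSON predictions is pycocotools, as in the sketch below; the file paths are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Standard COCO AP protocol: AP is averaged over IoU thresholds 0.50:0.95.
coco_gt = COCO("annotations/instances_val2017.json")        # ground truth
coco_dt = coco_gt.loadRes("yolov10_predictions.json")       # detections in COCO format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, and size-specific APs
```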

Results

  • YOLOv10-S is 1.8× faster than RT-DETR-R18 under similar AP on COCO (a latency-timing sketch follows this list)
  • YOLOv10-B has 46% less latency and 25% fewer parameters than YOLOv9-C for the same performance
  • YOLOv10-S / X are 1.8× / 1.3× faster than RT-DETR-R18 / R101 under similar performance
  • YOLOv10 exhibits highly efficient parameter utilization
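
For context, the paper reports end-to-end latency measured with TensorRT on a T4 GPU. The plain-PyTorch timing loop below is only a rough, illustrative way to compare relative speeds; absolute numbers will differ, and the tiny convolution in the usage line merely stands in for a loaded YOLOv10 checkpoint.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, img_size=640, warmup=50, iters=200, device="cuda"):
    """Rough per-image latency in milliseconds (requires a CUDA device)."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    for _ in range(warmup):          # warm up kernels and the allocator
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # wait for all queued GPU work to finish
    return (time.perf_counter() - start) / iters * 1e3

# Usage with any detector; a tiny conv stands in so the sketch runs
# without downloading weights.
print(f"{measure_latency(torch.nn.Conv2d(3, 16, 3)):.2f} ms")
```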

Limitations

The authors identified the following limitations:

  • On small models, NMS-free training still shows an accuracy gap relative to the original one-to-many training with NMS
  • Further exploration is needed to close this gap in future versions

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: NVIDIA GeForce RTX 3090

Keywords

YOLOv10, real-time object detection, end-to-end detection, NMS-free training, model efficiency

External Resources

  • Code and models: https://github.com/THU-MIG/yolov10