Ao Wang (School of Software, Tsinghua University), Hui Chen (BNRist, Tsinghua University), Lihao Liu (School of Software, Tsinghua University), Kai Chen (School of Software, Tsinghua University), Zijia Lin (School of Software, Tsinghua University), Jungong Han (Department of Automation, Tsinghua University), Guiguang Ding (School of Software, Tsinghua University). 2024.
The paper introduces YOLOv10, a new generation of real-time, end-to-end object detectors that improves on previous YOLO versions in two respects: the reliance on non-maximum suppression (NMS) for post-processing and inefficiencies in the model architecture. For the first, it proposes a dual assignment strategy for NMS-free training, which keeps detection performance competitive while removing the NMS step and its contribution to inference latency. For the second, it applies a holistic efficiency-accuracy driven design: the classification head and downsampling strategy are redesigned to cut computational cost, and large-kernel convolutions and partial self-attention are added to improve accuracy. Experiments across model scales show that YOLOv10 achieves a better trade-off between accuracy and latency than prior models such as YOLOv9 and RT-DETR, which supports the proposed methods.
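For readers unfamiliar with the post-processing step that NMS-free training removes, the following is a minimal sketch of classic greedy NMS in plain Python. It is not code from the paper: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative default, not a value from the paper
) -> List[int]:
    """Return indices of boxes kept by greedy NMS, highest score first."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

The loop compares each candidate against every box already kept, so its cost grows with the number of predictions and it depends on a hand-tuned threshold. A detector trained to emit one prediction per object can skip this step and simply filter its outputs by score, which is the motivation for end-to-end, NMS-free designs.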