
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang (Institute of Information Science, Academia Sinica, Taiwan; National Taipei University of Technology, Taiwan), I-Hau Yeh (National Taipei University of Technology, Taiwan), Hong-Yuan Mark Liao (Institute of Information Science, Academia Sinica, Taiwan; National Taipei University of Technology, Taiwan, Department of Information and Computer Engineering) (2024)

Paper Information
arXiv ID
2402.13616
Venue
European Conference on Computer Vision
Domain
Computer vision, Deep learning
SOTA Claim
Yes
Code
https://github.com/WongKinYiu/yolov9
Reproducibility
8/10

Abstract

Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate architecture that can facilitate acquisition of enough information for prediction has to be designed. Existing methods ignore the fact that when input data undergoes layer-by-layer feature extraction and spatial transformation, a large amount of information is lost. This paper delves into the important issues of data loss when data is transmitted through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate the objective function, so that reliable gradient information can be obtained to update network weights. In addition, a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), based on gradient path planning, is designed. GELAN's architecture confirms that PGI attains superior results on lightweight models. We verified the proposed GELAN and PGI on MS COCO dataset-based object detection. The results show that GELAN uses only conventional convolution operators yet achieves better parameter utilization than state-of-the-art methods developed based on depth-wise convolution. PGI can be used for a variety of models, from lightweight to large. It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained on large datasets; the comparison results are shown in Figure 1. The source code is at: https://github.com/WongKinYiu/yolov9.
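The information bottleneck and reversible-function arguments mentioned above can be summarized compactly. Roughly following the paper's notation, with f_theta and g_phi denoting successive network transformations and r_psi a reversible function with inverse v_zeta, mutual information with the input can only shrink through ordinary layers but is preserved through reversible ones:

```latex
% Mutual information can only decrease through successive (lossy) transformations:
I(X, X) \ge I\bigl(X, f_{\theta}(X)\bigr) \ge I\bigl(X, g_{\phi}(f_{\theta}(X))\bigr)

% whereas a reversible function r_{\psi} (with inverse v_{\zeta}) preserves it:
X = v_{\zeta}\bigl(r_{\psi}(X)\bigr), \qquad
I(X, X) = I\bigl(X, r_{\psi}(X)\bigr) = I\bigl(X, v_{\zeta}(r_{\psi}(X))\bigr)
```

PGI's auxiliary reversible branch exploits the second property: it keeps a path to (nearly) complete input information available for computing the loss, so the main branch receives reliable gradients without paying the inference-time cost of a fully reversible architecture.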

Summary

This paper presents the YOLOv9 object detection framework, which introduces programmable gradient information (PGI) to enhance information retention in deep networks, addressing issues of information loss and unreliable gradients. It proposes a new lightweight architecture, Generalized Efficient Layer Aggregation Network (GELAN), which utilizes conventional convolution techniques to improve parameter efficiency over state-of-the-art depth-wise convolution approaches. The effectiveness of PGI and GELAN is demonstrated through experiments on the MS COCO dataset, where YOLOv9 outperforms existing real-time object detection methods in terms of accuracy and computational efficiency. Key contributions include a theoretical analysis of current architectures, the introduction of PGI allowing the application of auxiliary supervision to various network sizes, and the development of GELAN to combine speed and accuracy. Results indicate that YOLOv9 achieves superior performance with reduced parameters and computational costs compared to previous models.
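As a rough illustration of how PGI-style auxiliary supervision could be wired during training (this is not the actual YOLOv9 implementation; the module names, branch placement, and loss weighting below are assumptions made for the sketch), an auxiliary branch contributes an extra loss term during training and is simply discarded at inference, so it adds no deployment cost:

```python
# Illustrative sketch of PGI-style auxiliary supervision (hypothetical modules,
# not the YOLOv9 repository's actual API).
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in backbone returning an intermediate and a final feature map."""
    def __init__(self, ch=16):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.SiLU())
        self.stage2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.SiLU())

    def forward(self, x):
        mid = self.stage1(x)
        return mid, self.stage2(mid)

class PGITrainingModel(nn.Module):
    """Main head plus an auxiliary branch; the auxiliary branch only supplies
    extra gradient signal during training and is dropped at inference."""
    def __init__(self, num_outputs=80, ch=16):
        super().__init__()
        self.backbone = TinyBackbone(ch)
        self.main_head = nn.Conv2d(ch * 2, num_outputs, 1)
        # Auxiliary branch operates on the shallower (less processed) features.
        self.aux_branch = nn.Sequential(
            nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.SiLU(),
            nn.Conv2d(ch * 2, num_outputs, 1))

    def forward(self, x):
        mid, out = self.backbone(x)
        main_pred = self.main_head(out)
        if self.training:
            return main_pred, self.aux_branch(mid)
        return main_pred  # the auxiliary branch is free at inference time

def training_step(model, images, targets, criterion, aux_weight=0.25):
    """Combine the main loss with the auxiliary-branch loss."""
    main_pred, aux_pred = model(images)
    return criterion(main_pred, targets) + aux_weight * criterion(aux_pred, targets)

# Dummy usage with dense placeholder targets, just to show the shapes line up.
model = PGITrainingModel(num_outputs=80)
images = torch.randn(2, 3, 64, 64)
targets = torch.randn(2, 80, 16, 16)
loss = training_step(model, images, targets, nn.MSELoss())
loss.backward()
```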

Methods

This paper employs the following methods:

  • Programmable Gradient Information (PGI)
  • Generalized Efficient Layer Aggregation Network (GELAN) (a structural sketch follows this list)
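The exact GELAN configurations live in the repository; the general shape of one computational block can be illustrated with a short, hypothetical PyTorch sketch: a CSPNet-style channel split, an ELAN-style stack of plain convolutional blocks whose intermediate outputs are all retained, and a 1x1 transition convolution that aggregates them. The module and channel sizes below are illustrative assumptions, not the actual YOLOv9 code.

```python
# Hypothetical GELAN-style block: CSP split + ELAN stacking + transition conv.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Plain conv + BN + SiLU; GELAN relies on conventional convolutions only."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, 1, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class GELANBlock(nn.Module):
    def __init__(self, c_in, c_out, c_hidden=64, n_blocks=2):
        super().__init__()
        self.stem = nn.Conv2d(c_in, 2 * c_hidden, 1)
        self.blocks = nn.ModuleList(conv_block(c_hidden, c_hidden)
                                    for _ in range(n_blocks))
        self.transition = nn.Conv2d((2 + n_blocks) * c_hidden, c_out, 1)

    def forward(self, x):
        y1, y2 = self.stem(x).chunk(2, dim=1)   # CSPNet-style split
        outs = [y1, y2]
        for block in self.blocks:                # ELAN-style stacking
            outs.append(block(outs[-1]))
        return self.transition(torch.cat(outs, dim=1))  # aggregate every branch

# Quick shape check.
x = torch.randn(1, 32, 64, 64)
print(GELANBlock(32, 128)(x).shape)  # torch.Size([1, 128, 64, 64])
```

The "generalized" part of GELAN is that the stacked computational blocks are not fixed; the plain conv_block above could be swapped for any block (e.g., a CSP-style block) without changing the overall aggregation pattern.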

Models Used

  • YOLOv9
  • YOLOv7
  • RT-DETR

Datasets

The following datasets were used in this research:

  • MS COCO

Evaluation Metrics

  • AP (COCO average precision; an evaluation sketch follows)
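AP here is the standard MS COCO average precision, averaged over IoU thresholds 0.50:0.95. A minimal sketch of how it is typically computed with pycocotools is shown below; the annotation and detection file paths are placeholders, and the detections are assumed to have already been exported in COCO's result JSON format.

```python
# Minimal COCO-style AP evaluation sketch (file paths are placeholders).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")          # model detections, COCO format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, and small/medium/large breakdowns
print("AP (IoU=0.50:0.95):", evaluator.stats[0])
```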

Results

  • YOLOv9 surpasses existing real-time object detectors in all aspects
  • GELAN improves parameter utilization compared to depth-wise convolution-based designs
  • YOLOv9 improves accuracy while using 49% fewer parameters and 43% less computation

Limitations

The authors identified the following limitations:

  • Not specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

YOLOv9, Programmable Gradient Information, GELAN, lightweight models, reversible architectures, deep neural networks

Papers Using Similar Methods

External Resources