
Depth Anything V2

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao (2024)

Paper Information
arXiv ID: 2406.09414
Venue: Neural Information Processing Systems
Domain: Not specified

Abstract

Figure 1: Depth Anything V2 significantly outperforms V1 [89] in robustness and fine-grained details. Compared with SD-based models [31, 25], it enjoys faster inference speed, fewer parameters, and higher depth accuracy.

Summary

This paper introduces Depth Anything V2, a foundation model for monocular depth estimation (MDE) that aims to produce robust and fine-grained depth predictions in complex scenes. To achieve this, the authors train a capable teacher model on synthetic images with precise labels and then transfer its capability to student models through large-scale pseudo-labeled real images. They critically analyze the limitations of conventional real datasets, such as label noise and missing detail, arguing that synthetic data provides better supervision in these respects. The paper also presents a new evaluation benchmark, DA-2K, designed to overcome the limitations of existing benchmarks by providing high-resolution images with precise sparse depth annotations. The results show significant improvements over previous models, confirming the effectiveness of the proposed training strategy and the importance of integrating large-scale unlabeled real data in MDE.
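To make the training recipe concrete, the sketch below illustrates the teacher-student pipeline in PyTorch-style code: a teacher trained on synthetic data pseudo-labels unlabeled real images, and a student is then trained on those pseudo-labels. The network, loss, and data here are simplified placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the DINOv2 encoder + DPT decoder used in the paper
# (hypothetical architecture, kept small so the sketch runs anywhere).
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x).squeeze(1)  # (B, H, W) relative depth

def affine_invariant_loss(pred, target, eps=1e-6):
    # Scale-and-shift-invariant comparison in the spirit of MiDaS-style
    # training: normalize each map by its median and mean absolute deviation.
    # A simplified stand-in for the paper's actual loss terms.
    def norm(d):
        t = d.flatten(1)
        med = t.median(dim=1, keepdim=True).values
        s = (t - med).abs().mean(dim=1, keepdim=True) + eps
        return (t - med) / s
    return (norm(pred) - norm(target)).abs().mean()

# Stage 1: the teacher would be trained on synthetic images with exact labels
# (training loop omitted; here it is simply randomly initialized).
teacher = TinyDepthNet().eval()

# Stage 2: pseudo-label unlabeled real images with the frozen teacher.
real_images = torch.rand(4, 3, 64, 64)            # stand-in for real photos
with torch.no_grad():
    pseudo_depth = teacher(real_images)

# Stage 3: distill into a student by supervising it with the pseudo-labels.
student = TinyDepthNet()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss = affine_invariant_loss(student(real_images), pseudo_depth)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```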

Methods

This paper employs the following methods:

  • Monocular Depth Estimation (MDE)
  • Discriminative Models
  • Generative Models
  • Knowledge Distillation

Models Used

  • Depth Anything V1
  • DINOv2
  • Marigold
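For reference, released Depth Anything V2 checkpoints can typically be run through the Hugging Face transformers depth-estimation pipeline. The model identifier below is an assumption and should be checked against the official release; the rest uses only the standard pipeline API.

```python
from PIL import Image
import numpy as np
from transformers import pipeline

# Depth-estimation pipeline; the checkpoint id is an assumption (check the
# official release for the exact identifier and available model sizes).
depth = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
)

image = Image.open("example.jpg")       # any RGB image
result = depth(image)                   # {"predicted_depth": tensor, "depth": PIL image}
depth_map = np.array(result["depth"])   # 8-bit visualization of relative depth
print(depth_map.shape, depth_map.dtype)
```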

Datasets

The following datasets were used in this research:

  • Hypersim
  • Virtual KITTI
  • ImageNet-21K
  • Objects365
  • Open Images V7
  • Places365
  • BDD100K
  • Google Landmarks
  • SA-1B
  • DIML

Evaluation Metrics

  • AbsRel
  • δ1
  • F1-score
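AbsRel and δ1 are the standard depth metrics: AbsRel is the mean absolute relative error |pred − gt| / gt, and δ1 is the fraction of pixels whose prediction/ground-truth ratio (in either direction) is below 1.25. A minimal NumPy sketch, assuming predictions have already been scale-and-shift aligned to the ground truth:

```python
import numpy as np

def absrel_and_delta1(pred, gt, valid=None):
    # AbsRel = mean(|pred - gt| / gt); delta1 = fraction of pixels with
    # max(pred/gt, gt/pred) < 1.25, both over valid ground-truth pixels.
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0
    p, g = pred[valid], gt[valid]
    absrel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return absrel, delta1

# Toy example: predictions within +/-10% of ground truth.
gt = np.random.uniform(0.5, 10.0, size=(480, 640))
pred = gt * np.random.uniform(0.9, 1.1, size=gt.shape)
print(absrel_and_delta1(pred, gt))
```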

Results

  • Depth Anything V2 outperforms previous models in depth accuracy and inference speed.
  • Achieved a competitive score of 83.6% in the Transparent Surface Challenge.
  • Significantly better performance on the proposed DA-2K evaluation benchmark compared to other models.
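DA-2K is scored with sparse point pairs: for each annotated pair, a prediction counts as correct if it orders the two points' depths the same way as the human label. The sketch below computes that pairwise accuracy; the pair format and the larger-means-closer convention are illustrative assumptions, not the benchmark's actual file layout.

```python
import numpy as np

def pairwise_accuracy(depth_map, pairs):
    # pairs: list of ((y1, x1), (y2, x2), closer) where closer is 0 if the
    # first point is nearer to the camera, 1 if the second is. Assumes the
    # depth_map stores relative depth where larger values mean closer
    # (disparity-like); flip the comparison for metric depth.
    correct = 0
    for (y1, x1), (y2, x2), closer in pairs:
        pred_closer = 0 if depth_map[y1, x1] > depth_map[y2, x2] else 1
        correct += int(pred_closer == closer)
    return correct / len(pairs)

# Toy example: a smoothly varying depth map and two annotated pairs.
depth_map = np.linspace(1.0, 0.0, 100 * 100).reshape(100, 100)
pairs = [((10, 10), (90, 90), 0), ((50, 50), (5, 5), 1)]
print(pairwise_accuracy(depth_map, pairs))  # 1.0 for this toy case
```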

Limitations

The authors identified the following limitations:

  • Heavy computational burden due to the use of 62M unlabeled images.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
