Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao (TikTok, ZJU, 2024)
This paper introduces Depth Anything, a foundation model for monocular depth estimation (MDE) that produces accurate depth from any image under diverse conditions by leveraging large-scale unlabeled data. The authors advocate monocular unlabeled images because they are cheap to collect and cover a broad range of scenes. The model is trained with a self-training setup, in which unlabeled images receive pseudo labels from a pre-trained MDE teacher, jointly with supervised learning on labeled images. Two ingredients make the unlabeled data effective: challenging the student with harder optimization targets by strongly perturbing the pseudo-labeled images, and inheriting semantic priors from a frozen pre-trained encoder through feature alignment. Empirical results show that Depth Anything significantly outperforms MiDaS and other models in zero-shot evaluation across diverse datasets, and that it also serves as a strong backbone for metric depth estimation and semantic segmentation.
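The training recipe summarized above (teacher pseudo-labels, a student trained on perturbed unlabeled images jointly with labeled data, and a semantic feature-alignment term) can be illustrated with a minimal PyTorch-style step. This is a hedged sketch, not the authors' implementation: the `(depth, features)` output of `student`, the names `teacher`, `frozen_encoder`, and `strong_aug`, the tolerance margin value, and the unweighted sum of the three loss terms are all assumptions for illustration.

```python
# Hedged sketch of one joint training step, assuming PyTorch.
import torch
import torch.nn.functional as F


def affine_invariant_loss(pred, target, eps=1e-6):
    """MiDaS-style scale-and-shift-invariant loss: normalize each depth map by
    its median and mean absolute deviation before comparing."""
    def normalize(d):
        d = d.flatten(1)                                   # (B, H*W)
        median = d.median(dim=1, keepdim=True).values
        scale = (d - median).abs().mean(dim=1, keepdim=True).clamp_min(eps)
        return (d - median) / scale
    return F.l1_loss(normalize(pred), normalize(target))


def feature_alignment_loss(student_feat, frozen_feat, alpha=0.85):
    """Cosine alignment of student features to a frozen semantic encoder.
    Pixels whose similarity already exceeds the tolerance margin `alpha`
    are ignored (the margin value here is an assumption)."""
    cos = F.cosine_similarity(student_feat, frozen_feat, dim=1)  # (B, h, w)
    mask = (cos < alpha).float()
    return ((1.0 - cos) * mask).sum() / mask.sum().clamp_min(1.0)


def train_step(student, teacher, frozen_encoder, strong_aug, opt, labeled, unlabeled):
    """Supervised loss on labeled images + pseudo-label loss on strongly
    perturbed unlabeled images + semantic feature alignment."""
    img_l, depth_l = labeled
    img_u = unlabeled

    # 1) A frozen pre-trained MDE teacher pseudo-labels the clean unlabeled images.
    with torch.no_grad():
        pseudo_depth = teacher(img_u)

    # 2) The student only sees a strongly perturbed view, which makes the
    #    pseudo-label targets harder to fit (photometric perturbation assumed
    #    here, so the clean-image pseudo labels stay geometrically valid).
    img_u_aug = strong_aug(img_u)

    pred_l, _ = student(img_l)
    pred_u, feat_u = student(img_u_aug)

    # 3) A frozen semantic encoder (e.g. DINOv2) provides the alignment target.
    with torch.no_grad():
        sem_feat = frozen_encoder(img_u_aug)

    loss = (affine_invariant_loss(pred_l, depth_l)
            + affine_invariant_loss(pred_u, pseudo_depth)
            + feature_alignment_loss(feat_u, sem_feat))

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.detach()
```

Spatial perturbations such as CutMix would additionally require mixing the pseudo-depth targets with the same boxes; a sketch of that is given after the methods list below.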
This paper employs the following methods: self-training, in which a pre-trained MDE teacher generates pseudo depth labels for unlabeled images; joint training of the student on labeled and pseudo-labeled data; strong perturbations of the unlabeled images (color distortion and CutMix) to create harder optimization targets; and a feature-alignment loss that transfers semantic priors from a frozen pre-trained encoder (DINOv2), as sketched below.
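As a concrete example of how the harder optimization targets can be constructed, the sketch below applies CutMix jointly to a batch of unlabeled images and their teacher-produced pseudo-depth maps so the perturbed input and its target stay consistent. This is a minimal illustration assuming PyTorch; the function name, the shared-box sampling, and the Beta parameter are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of CutMix over (image, pseudo-depth) pairs.
import torch


def cutmix_with_pseudo_depth(images, pseudo_depth, beta=0.5):
    """Paste a random rectangle from a shuffled copy of the batch into every
    image, and apply the same rectangle to the pseudo-depth maps."""
    b, _, h, w = images.shape            # images: (B, 3, H, W), pseudo_depth: (B, H, W)
    perm = torch.randperm(b)

    # Sample one box for the whole batch (kept simple for the sketch).
    lam = torch.distributions.Beta(beta, beta).sample().item()
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(0, h, (1,)).item(), torch.randint(0, w, (1,)).item()
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed_img, mixed_depth = images.clone(), pseudo_depth.clone()
    mixed_img[:, :, y0:y1, x0:x1] = images[perm, :, y0:y1, x0:x1]
    mixed_depth[:, y0:y1, x0:x1] = pseudo_depth[perm, y0:y1, x0:x1]
    return mixed_img, mixed_depth
```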
The following datasets were used in this research: roughly 1.5M labeled images drawn from six public depth datasets and about 62M unlabeled images collected from eight public datasets (including SA-1B, Open Images, BDD100K, Google Landmarks, ImageNet-21K, LSUN, Objects365, and Places365); zero-shot relative depth estimation is evaluated on six unseen datasets: KITTI, NYUv2, Sintel, DDAD, ETH3D, and DIODE.
The authors identified the following limitations: