VGG-Sound

Multi-modal Classification Benchmark

Performance Over Time

Showing 2 results | Metric: Top-1 Accuracy

Model	Paper	Top-1 Accuracy (%)	Date
CAV-MAE (Audio-Visual)	Contrastive Audio-Visual Masked Autoencoder	65.90	2022-10-02
UAVM	UAVM: Towards Unifying Audio and Visual Models	65.80	2022-07-29

Rank	Model	Paper	Top-1 Accuracy (%)	Date	Code
1	CAV-MAE (Audio-Visual)	Contrastive Audio-Visual Masked Autoencoder	65.90	2022-10-02	yuangongnd/cav-mae
2	UAVM	UAVM: Towards Unifying Audio and Visual Models	65.80	2022-07-29	YuanGongND/uavm
