VGG-Sound consists of more than 210k videos covering 310 audio classes.
Variants: VGG-Sound, VGGSound
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Multi-modal Classification | CAV-MAE (Audio-Visual) | Contrastive Audio-Visual Masked Autoencoder | 2022-10-02 |
Multi-modal Classification | UAVM | UAVM: Towards Unifying Audio and … | 2022-07-29 |
Recent papers with results on this dataset: