UnAV-100

Name: UnAV-100
Published: 2023-03-22
License: https://creativecommons.org/licenses/by/4.0/

Dataset Information

Modalities

Videos, Audio

Languages

English

Introduced

2023


Homepage

Official Website


Overview

The existing audio-visual event localization (AVE) task deals with manually trimmed videos that each contain only a single event instance. This setting is unrealistic, because natural videos often contain numerous audio-visual events of different categories. To better match real-life applications, we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events covering 100 event categories. Each video contains 2.8 audio-visual events on average, and the events are often related to each other and may co-occur, as in real-life scenes. We believe that UnAV-100, with its realistic complexity, can promote the exploration of comprehensive audio-visual video understanding.
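In the dense setting described above, an untrimmed video carries a list of labeled, possibly overlapping time segments rather than a single event. The minimal sketch below assumes a hypothetical schema (an `Event` with `label`, `start`, `end` in seconds) and made-up labels; it is not the dataset's actual annotation format. It shows how co-occurring events in one video can be found by sorting segments by start time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One audio-visual event as a labeled segment in seconds (hypothetical schema)."""
    label: str
    start: float
    end: float


def co_occurring_pairs(events):
    """Return (label, label) pairs for events whose time spans overlap."""
    pairs = []
    ordered = sorted(events, key=lambda e: e.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Events are sorted by start, so once one starts after `a` ends,
            # no later event can overlap `a` either.
            if b.start >= a.end:
                break
            pairs.append((a.label, b.label))
    return pairs


# Hypothetical annotations for one untrimmed video
video_events = [
    Event("people speaking", 0.0, 12.5),
    Event("dog barking", 8.0, 10.0),
    Event("car passing by", 20.0, 26.0),
]
print(co_occurring_pairs(video_events))  # [('people speaking', 'dog barking')]
```

Segments that merely touch (one ends exactly where the next starts) are treated as non-overlapping in this sketch.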

Variants: UnAV-100

Associated Benchmarks

This dataset is used in 1 benchmark:

audio-visual event localization - Metrics: mAP, [email protected]
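For temporal localization, mAP is commonly computed by matching predicted segments to ground-truth segments via temporal IoU (tIoU) at a threshold, computing average precision (AP) per category, and averaging over categories. The sketch below is a simplified, non-interpolated variant with hypothetical function names (`tiou`, `average_precision`, `mean_ap`) and tuple layouts; the benchmark's official evaluation code, its tIoU thresholds, and its AP interpolation may differ.

```python
def tiou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, threshold):
    """Non-interpolated AP for one category (simplified sketch).

    preds: list of (video_id, start, end, score)
    gts:   list of (video_id, start, end)
    Predictions are matched greedily in descending score order; each
    ground-truth segment can be matched at most once.
    """
    if not gts:
        return 0.0
    matched = set()
    tp_count = 0
    precision_sum = 0.0
    ranked = sorted(preds, key=lambda p: p[3], reverse=True)
    for rank, (vid, start, end, _score) in enumerate(ranked, start=1):
        best_j, best_iou = None, threshold
        for j, (gvid, gstart, gend) in enumerate(gts):
            if gvid != vid or j in matched:
                continue
            iou = tiou((start, end), (gstart, gend))
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp_count += 1
            precision_sum += tp_count / rank  # precision at this true positive
    # Missed ground truths contribute zero, so divide by the total count.
    return precision_sum / len(gts)


def mean_ap(preds_by_class, gts_by_class, threshold):
    """Mean of per-category AP over categories that have ground truth."""
    aps = [
        average_precision(preds_by_class.get(label, []), gts, threshold)
        for label, gts in gts_by_class.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical example: a false positive outranks the true positive
gts = {"dog barking": [("v1", 8.0, 10.0)]}
preds = {"dog barking": [("v1", 20.0, 30.0, 0.95), ("v1", 7.5, 10.0, 0.9)]}
print(mean_ap(preds, gts, threshold=0.5))  # 0.5
```

Averaging this score over several tIoU thresholds is one common way to obtain a single summary mAP, but which thresholds this benchmark uses is not stated on this page.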

Recent Benchmark Submissions

Task	Model	Paper	Date
audio-visual event localization	UnAV	Dense-Localizing Audio-Visual Events in Untrimmed …	2023-03-22
audio-visual event localization	ActionFormer	ActionFormer: Localizing Moments of Actions …	2022-02-16

Research Papers

Recent papers with results on this dataset:

External Links:

UnAV-100
