AVA

Name: AVA
Published: 2018-01-01
License: CC BY 4.0

Atomic Visual Actions

Dataset Information

Modalities: Videos
Introduced: 2018
License: CC BY 4.0

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the video clips has been exhaustively annotated by human annotators, and together they represent a rich variety of scenes, recording conditions, and expressions of human activity. There are annotations for:

Kinetics (AVA-Kinetics) - a crossover between AVA and Kinetics. To provide localized action labels on a wider variety of visual scenes, the authors add AVA action labels to videos from Kinetics-700, nearly doubling the total number of annotations and increasing the number of unique videos by over 500x.
Actions (AVA Actions) - densely annotates 80 atomic visual actions in 430 15-minute movie clips. Actions are localized in space and time, resulting in 1.62M action labels, and a single person frequently carries multiple labels.
Spoken Activity (AVA ActiveSpeaker, AVA Speech) - AVA ActiveSpeaker associates speaking activity with a visible face in the AVA v1.0 videos, resulting in 3.65 million labeled frames across ~39K face tracks. AVA Speech densely annotates audio-based speech activity in the AVA v1.0 videos and explicitly labels 3 background noise conditions, resulting in ~46K labeled segments spanning 45 hours of data.
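
To make "localized in space and time, with multiple labels per person" concrete, here is a minimal Python sketch of how such annotations could be represented and grouped. The column layout (video id, timestamp, normalized box corners, action id, person id), the `ActionLabel` type, and the sample rows are illustrative assumptions and are not taken from this page; check the files shipped with the dataset for the actual schema.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionLabel:
    """One action label for one person at one timestamp (hypothetical layout)."""
    video_id: str
    timestamp: float   # seconds into the video
    box: tuple         # (x1, y1, x2, y2), assumed normalized to [0, 1]
    action_id: int
    person_id: int


def parse_rows(text):
    """Parse comma-separated rows in the assumed 8-column layout."""
    labels = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        vid, ts, x1, y1, x2, y2, action, person = row
        box = (float(x1), float(y1), float(x2), float(y2))
        labels.append(ActionLabel(vid, float(ts), box, int(action), int(person)))
    return labels


def group_by_person(labels):
    """Collect the set of action ids per (video, timestamp, person)."""
    grouped = defaultdict(set)
    for label in labels:
        grouped[(label.video_id, label.timestamp, label.person_id)].add(label.action_id)
    return dict(grouped)


# Made-up sample rows: person 0 carries two labels at the same timestamp.
SAMPLE = """\
vid_a,902,0.1,0.2,0.5,0.9,12,0
vid_a,902,0.1,0.2,0.5,0.9,79,0
vid_a,902,0.6,0.1,0.9,0.8,11,1
"""

if __name__ == "__main__":
    for key, actions in group_by_person(parse_rows(SAMPLE)).items():
        print(key, sorted(actions))
```

Grouping by (video, timestamp, person) rather than by row keeps the multi-label structure explicit: each person box maps to a set of action ids, not a single class.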

Variants: AVA v2.1, AVA-ActiveSpeaker, AVA-LAEO, AVA-Speech, AVA v2.2, AVA-Kinetics

Associated Benchmarks

This dataset is used in 1 benchmark:

Node Classification - Metrics: mAP
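
For readers unfamiliar with the mAP metric listed above, the following Python sketch computes a plain, non-interpolated average precision per class and averages it across classes. It is a generic illustration under stated assumptions, not the benchmark's official evaluation code, whose protocol may differ; the function names and the toy scores are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidences; labels: 1 for positive, 0 for negative.
    AP here is the mean of the precision measured at each true positive's rank.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """per_class maps a class name to a (scores, labels) pair."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy predictions for two hypothetical classes.
    toy = {
        "speaking": ([0.9, 0.8, 0.3], [1, 0, 1]),
        "not_speaking": ([0.7, 0.2], [0, 1]),
    }
    print(round(mean_average_precision(toy), 4))
```

Averaging per-class AP gives every class equal weight regardless of how many examples it has, which is why mAP is often preferred over accuracy on imbalanced label sets.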

Recent Benchmark Submissions

Task	Model	Paper	Date
Node Classification	ASDNet [ASDNet_ICCV2021]	Learning Long-Term Spatial-Temporal Graphs for …	2022-07-15
Node Classification	TalkNet [tao2021someone]	Learning Long-Term Spatial-Temporal Graphs for …	2022-07-15
Node Classification	UniCon [zhang2021unicon]	Learning Long-Term Spatial-Temporal Graphs for …	2022-07-15
Node Classification	MAAS-TAN [MAAS2021]	Learning Long-Term Spatial-Temporal Graphs for …	2022-07-15

Research Papers

Recent papers with results on this dataset:

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (2022)
