ActivityNet

Dataset Information

Modalities: Videos
Introduced: 2015
License: Unknown
Homepage:
Overview

The ActivityNet dataset contains 200 activity classes and a total of 849 hours of video collected from YouTube. ActivityNet is the largest benchmark for temporal activity detection to date in terms of both the number of activity categories and the number of videos, making the task particularly challenging. Version 1.3 of the dataset contains 19,994 untrimmed videos, divided into disjoint training, validation, and testing subsets in a 2:1:1 ratio. On average, each activity category has 137 untrimmed videos, and each video contains 1.41 activity instances annotated with temporal boundaries. The ground-truth annotations of the test videos are not public.
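The per-subset statistics above can be computed directly from the dataset's annotation file. The sketch below assumes the commonly distributed ActivityNet v1.3 JSON layout: a top-level "database" mapping video IDs to entries with a "subset" field ("training"/"validation"/"testing") and a list of "annotations", each carrying an activity "label" and a temporal "segment" [start_sec, end_sec]. The field names and the inline sample are assumptions for illustration, not an official schema.

```python
from collections import Counter

# Tiny synthetic stand-in for the (assumed) ActivityNet v1.3
# annotation JSON; the real file holds ~20k video entries.
sample = {
    "database": {
        "vid_001": {
            "subset": "training",
            "duration": 120.0,
            "annotations": [
                {"label": "Surfing", "segment": [4.2, 57.9]},
                {"label": "Surfing", "segment": [63.0, 110.5]},
            ],
        },
        "vid_002": {
            "subset": "validation",
            "duration": 88.0,
            "annotations": [{"label": "Archery", "segment": [10.0, 80.0]}],
        },
        # Test-set entries ship without public ground truth.
        "vid_003": {"subset": "testing", "duration": 45.0, "annotations": []},
    }
}

def subset_stats(db):
    """Count videos and annotated activity instances per subset."""
    videos, instances = Counter(), Counter()
    for entry in db.values():
        videos[entry["subset"]] += 1
        instances[entry["subset"]] += len(entry["annotations"])
    return videos, instances

videos, instances = subset_stats(sample["database"])
print(dict(videos))     # {'training': 1, 'validation': 1, 'testing': 1}
print(dict(instances))  # {'training': 2, 'validation': 1, 'testing': 0}
```

On the full v1.3 file, the same loop would recover the 2:1:1 subset split and the average of 1.41 annotated instances per video quoted above.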

Source: Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection

Variants: ActivityNet, ActivityNet-1.3, ActivityNet-1.2, ActivityNet-GZSL (cls), ActivityNet-GZSL (main)

Associated Benchmarks

This dataset is used in 6 benchmarks.

Recent Benchmark Submissions

| Task | Model | Paper | Date |
| --- | --- | --- | --- |
| Zero-Shot Video Retrieval | GRAM | Gramian Multimodal Representation Learning and … | 2024-12-16 |
| Video Retrieval | GRAM | Gramian Multimodal Representation Learning and … | 2024-12-16 |
| Video Retrieval | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22 |
| Zero-Shot Video Retrieval | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22 |
| Zero-Shot Video Retrieval | InternVideo2-1B | InternVideo2: Scaling Foundation Models for … | 2024-03-22 |
| Action Recognition | InternVideo2-6B | InternVideo2: Scaling Foundation Models for … | 2024-03-22 |
| Video Retrieval | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging … | 2024-03-20 |
| Zero-Shot Video Retrieval | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token merging … | 2024-03-20 |
| Visual Question Answering (VQA) | BLIP-2 T5 | Open-ended VQA benchmarking of Vision-Language … | 2024-02-11 |
| Video Retrieval | RTQ | RTQ: Rethinking Video-language Understanding Based … | 2023-12-01 |
| Video Retrieval | TESTA (ViT-B/16) | TESTA: Temporal-Spatial Token Aggregation for … | 2023-10-29 |
| Zero-Shot Video Retrieval | LanguageBind (ViT-L/14) | LanguageBind: Extending Video-Language Pretraining to … | 2023-10-03 |
| Zero-Shot Video Retrieval | LanguageBind (ViT-H/14) | LanguageBind: Extending Video-Language Pretraining to … | 2023-10-03 |
| Zero-Shot Video Retrieval | BT-Adapter | BT-Adapter: Video Conversation is Feasible … | 2023-09-27 |
| Video Retrieval | DMAE (ViT-B/32) | Dual-Modal Attention-Enhanced Text-Video Retrieval with … | 2023-09-20 |
| Video Retrieval | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15 |
| Video Retrieval | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29 |
| Video Retrieval | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model … | 2023-04-17 |
| Zero-Shot Video Retrieval | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video … | 2023-03-28 |
| Video Retrieval | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video … | 2023-03-28 |

Research Papers

Recent papers with results on this dataset: