YouTube-8M

Dataset Information
Modalities: Videos
Introduced: 2016

Overview

The YouTube-8M dataset is a large-scale video dataset of more than 7 million videos, labeled with 4716 classes by the YouTube video annotation system. The dataset is split into three parts: a training set, a validation set, and a test set. Every class has at least 100 videos in the training set. Features extracted from the videos with state-of-the-art pre-trained models are released for public use. Each video has both an audio and a visual modality. Based on the visual content, the videos are grouped into 24 topics, such as sports, games, and arts & entertainment.
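To make the label and feature layout concrete, here is a minimal Python sketch of how one video's annotations and features might be turned into a training example. It assumes each video record is a dict with a `labels` list of class indices plus per-video `mean_rgb` (1024-d, visual) and `mean_audio` (128-d, audio) vectors; these field names and sizes reflect our understanding of the released video-level features and should be checked against the official data description. All helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

NUM_CLASSES = 4716  # number of classes given in the overview above

def multi_hot(label_ids: Sequence[int], num_classes: int = NUM_CLASSES) -> List[int]:
    """Convert a video's label indices into a 0/1 target vector (a video may carry several labels)."""
    target = [0] * num_classes
    for idx in label_ids:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} is outside [0, {num_classes})")
        target[idx] = 1
    return target

def fuse_features(visual: Sequence[float], audio: Sequence[float]) -> List[float]:
    """Early fusion of the two modalities: concatenate the visual and audio vectors."""
    return list(visual) + list(audio)

def classes_below_minimum(videos: Sequence[Dict], num_classes: int = NUM_CLASSES,
                          minimum: int = 100) -> List[int]:
    """List class indices that have fewer than `minimum` training videos."""
    # set() so a label repeated on one video is counted only once
    counts = Counter(idx for video in videos for idx in set(video["labels"]))
    return [c for c in range(num_classes) if counts[c] < minimum]

if __name__ == "__main__":
    # Hypothetical record; field names and dimensions are assumptions.
    video = {"labels": [0, 17], "mean_rgb": [0.0] * 1024, "mean_audio": [0.0] * 128}
    x = fuse_features(video["mean_rgb"], video["mean_audio"])
    y = multi_hot(video["labels"])
    print(len(x), sum(y))  # 1152 2
```

Concatenation is the simplest way to combine the two modalities; the submissions listed below use more elaborate models on top of such inputs. `classes_below_minimum` can serve as a sanity check of the "at least 100 training videos per class" property on whatever subset you download.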

Source: Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA

Variants: YouTube-8M

Associated Benchmarks

This dataset is used in 2 benchmarks: video classification and video prediction.

Recent Benchmark Submissions

Task                  Model                                Paper                                                 Date
Video Classification  DCGN (self-attention graph pooling)  Hierarchical Video Frame Sequence Representation …    2019-06-02
Video Classification  Hierarchical LSTM with MoE           Efficient Video Classification Using Fewer …          2019-02-27
Video Prediction      SDCNet                               SDCNet: Video Prediction Using Spatially-Displaced …  2018-09-01
Video Classification  Mixture-of-2-Experts                 YouTube-8M: A Large-Scale Video Classification …      2016-09-27
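Two of the classification submissions above use a mixture of experts (MoE) as the classifier. The sketch below shows a per-class MoE in plain Python under our own assumptions: for each class, a softmax gate over H experts plus one extra "dummy" state that always predicts 0, and logistic (sigmoid) experts, so that p(y=1|x) = sum over h of p(h|x) * sigmoid(w_h . x). This follows our reading of the MoE baseline in the YouTube-8M paper and its starter code, not a confirmed implementation; the function name and weight layout are hypothetical, and no training loop is shown.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def moe_class_probability(x: Sequence[float],
                          gate_weights: Sequence[Sequence[float]],
                          expert_weights: Sequence[Sequence[float]]) -> float:
    """P(class present | x) for a single class.

    gate_weights: H + 1 weight vectors (the last one is the assumed dummy state).
    expert_weights: H weight vectors, one logistic expert each.
    """
    if len(gate_weights) != len(expert_weights) + 1:
        raise ValueError("need exactly one more gate vector than experts")
    gate = softmax([dot(w, x) for w in gate_weights])
    # zip stops after H experts, so the dummy state's mass contributes 0.
    return sum(g * sigmoid(dot(w, x)) for g, w in zip(gate, expert_weights))
```

For a "Mixture-of-2-Experts" model, H = 2 and the function is applied once per class. With all-zero weights the gate is uniform over three states and each expert outputs 0.5, so the result is 2 × (1/3) × 0.5 = 1/3; shifting gate mass onto the dummy state pushes the class probability toward 0.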
