The YouTube-8M dataset is a large-scale video dataset that includes more than 7 million videos, labeled with 4716 classes by the annotation system. The dataset consists of three parts: a training set, a validation set, and a test set. In the training set, each class contains at least 100 training videos. Features of these videos are extracted with popular state-of-the-art pre-trained models and released for public use. Each video contains both audio and visual modalities. Based on their visual information, the videos are divided into 24 topics, such as sports, games, and arts & entertainment.
Source: Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA
Variants: YouTube-8M
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Video Classification | DCGN (self-attention graph pooling) | Hierarchical Video Frame Sequence Representation … | 2019-06-02 |
Video Classification | Hierarchical LSTM with MoE | Efficient Video Classification Using Fewer … | 2019-02-27 |
Video Prediction | SDCNet | SDCNet: Video Prediction Using Spatially-Displaced … | 2018-09-01 |
Video Classification | Mixture-of-2-Experts | YouTube-8M: A Large-Scale Video Classification … | 2016-09-27 |
Recent papers with results on this dataset: