Kinetics-700

Dataset Information
Modalities: Videos
Introduced: 2019

Overview

Kinetics-700 is a video dataset of about 650,000 clips covering 700 human action classes. The videos include human-object interactions, such as playing instruments, and human-human interactions, such as shaking hands and hugging. Each action class has at least 600 video clips. Each clip is annotated with a single action class and lasts approximately 10 seconds.
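As an illustration of the one-label-per-clip structure described above, the following is a minimal Python sketch that parses annotation rows and counts clips per action class. The column names (label, youtube_id, time_start, time_end, split) are an assumption modeled on commonly distributed Kinetics annotation CSVs, and the sample rows, IDs, and helper names are hypothetical, not taken from the dataset.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """One annotated clip: a single action label plus a time window."""
    label: str
    youtube_id: str
    time_start: int  # seconds from the start of the source video
    time_end: int    # seconds from the start of the source video
    split: str

    @property
    def duration(self) -> int:
        return self.time_end - self.time_start


def load_clips(handle):
    """Read clips from a CSV file object (assumed column names, see above)."""
    reader = csv.DictReader(handle)
    return [
        Clip(
            label=row["label"],
            youtube_id=row["youtube_id"],
            time_start=int(row["time_start"]),
            time_end=int(row["time_end"]),
            split=row["split"],
        )
        for row in reader
    ]


def clips_per_class(clips):
    """Count how many clips carry each action label."""
    return Counter(clip.label for clip in clips)


# Hypothetical sample rows; the IDs are placeholders, not real videos.
SAMPLE_CSV = """label,youtube_id,time_start,time_end,split
shaking hands,AAAAAAAAAAA,12,22,train
shaking hands,BBBBBBBBBBB,0,10,train
hugging,CCCCCCCCCCC,31,41,validate
"""

if __name__ == "__main__":
    clips = load_clips(io.StringIO(SAMPLE_CSV))
    for label, count in sorted(clips_per_class(clips).items()):
        print(f"{label}: {count} clip(s)")
```

Running the same counting step over a full annotation file would be one way to check the per-class clip counts and the roughly 10-second clip length against the figures quoted in the overview.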

Variants: Kinetics-700

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

Task              Model                    Paper                                         Date
Video Generation  DiT-XL/2 + CVAE-FT-SE    Improving the Diffusability of Autoencoders   2025-02-20
Image Clustering  TURTLE (CLIP + DINOv2)   Let Go of Your Labels …                       2024-06-11

Research Papers

Recent papers with results on this dataset: