UCF101

UCF101 Human Actions dataset

Dataset Information

Modalities: Videos
Languages: Korean
Introduced: 2012
License: MIT
Homepage: Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

The UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips classified into 101 action categories. The categories fall into 5 types: Body motion, Human-human interactions, Human-object interactions, Playing musical instruments, and Sports. The clips total over 27 hours of video. All videos were collected from YouTube and have a fixed frame rate of 25 FPS at a resolution of 320 × 240.

Source: Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification
Image Source: https://www.crcv.ucf.edu/data/UCF101.php
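
The figures quoted in the overview imply a few per-clip averages that are handy when budgeting storage or sampling frames. The sketch below is plain arithmetic on those figures. Because the total is stated only as "over 27 hours", the derived values are lower bounds, and the 3-bytes-per-pixel frame size is an assumption (8-bit RGB after decoding), not something the dataset page states.

```python
# Figures quoted in the overview. TOTAL_HOURS is a lower bound ("over 27 hours").
NUM_CLIPS = 13_320
NUM_CLASSES = 101
TOTAL_HOURS = 27
FPS = 25
WIDTH, HEIGHT = 320, 240

total_seconds = TOTAL_HOURS * 3600

# Average clip length in seconds and in frames (at least these values).
mean_clip_seconds = total_seconds / NUM_CLIPS
mean_clip_frames = mean_clip_seconds * FPS

# Average number of clips per action category.
mean_clips_per_class = NUM_CLIPS / NUM_CLASSES

# Total frame count across the dataset (at least this many).
total_frames = total_seconds * FPS

# Assumption: one decoded frame stored as 8-bit RGB, 3 bytes per pixel.
raw_frame_bytes = WIDTH * HEIGHT * 3

print(f"mean clip length: {mean_clip_seconds:.2f} s ({mean_clip_frames:.0f} frames)")
print(f"mean clips per class: {mean_clips_per_class:.1f}")
print(f"total frames: {total_frames:,}")
print(f"raw bytes per frame: {raw_frame_bytes:,}")
```

These are averages only; individual clips and categories will differ.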

Variants: UCF101-skeleton, UCF101-MiTv2, UCF-101 Zero-shot, 256x256, class-conditional, UCF101 (finetuned), UCF 101, UCF101, UCF-101 16 frames, Unconditional, Single GPU, UCF-101 16 frames, 64x64, Unconditional, UCF-101 16 frames, 128x128, Unconditional, UCF-101

Associated Benchmarks

This dataset is used in 8 benchmarks:

Few-Shot Learning - Metrics: Harmonic mean
Zero-Shot Learning - Metrics: Accuracy
Image Clustering - Metrics: Accuracy, ARI, NMI
Action Recognition - Metrics: 3-fold Accuracy, Accuracy, Accuracy 20%Test
Video Frame Interpolation - Metrics: PSNR, SSIM, PSNR (sRGB), LPIPS
Zero-Shot Action Recognition - Metrics: Top-1 Accuracy, Top-5 Accuracy
Prompt Engineering - Metrics: Harmonic mean
Action Recognition In Videos - Metrics: 3-fold Accuracy
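
Several of the metrics listed above are simple to compute by hand. The following is a minimal sketch using only the Python standard library. It rests on assumptions that this page does not confirm: that "Harmonic mean" refers to the harmonic mean of base-class and novel-class accuracy (as is common in prompt-learning papers), that "3-fold Accuracy" is the mean accuracy over three train/test splits, and that PSNR is computed on 8-bit pixel values. All function names are illustrative and do not come from any benchmark toolkit.

```python
import math


def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of two accuracies (assumed: base and novel classes)."""
    if base_acc + novel_acc == 0:
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def mean_over_splits(split_accuracies):
    """Average accuracy over train/test splits (assumed meaning of 3-fold)."""
    return sum(split_accuracies) / len(split_accuracies)


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


# Toy usage with made-up numbers (not results from any paper).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
print(harmonic_mean(80.0, 60.0))
print(mean_over_splits([90.0, 92.0, 94.0]))
```

Individual leaderboard entries may define these metrics differently, so check each paper's evaluation protocol before comparing numbers.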

Recent Benchmark Submissions

Task	Model	Paper	Date
Prompt Engineering	MMRL++	MMRL++: Parameter-Efficient and Interaction-Aware Representation …	2025-05-15
Action Recognition	CA2ST(B/16)	CA^2ST: Cross-Attention in Audio, Space, …	2025-03-30
Prompt Engineering	MMRL	MMRL: Multi-Modal Representation Learning for …	2025-03-11
Prompt Engineering	HPT++	HPT++: Hierarchically Prompting Vision-Language Models …	2024-08-27
Video Frame Interpolation	VFIMamba	VFIMamba: Video Frame Interpolation with …	2024-07-02
Image Clustering	TURTLE (CLIP + DINOv2)	Let Go of Your Labels …	2024-06-11
Zero-Shot Action Recognition	TC-CLIP	Leveraging Temporal Contextualization for Video …	2024-04-15
Zero-Shot Learning	ZLaP*	Label Propagation for Zero-shot Classification …	2024-04-05
Zero-Shot Learning	ZLaP	Label Propagation for Zero-shot Classification …	2024-04-05
Prompt Engineering	ProMetaR	Prompt Learning via Meta-Regularization	2024-04-01
Action Recognition	FTP-UniFormerV2-L/14	Enhancing Video Transformers for Action …	2024-03-24
Prompt Engineering	PromptKD	PromptKD: Unsupervised Prompt Distillation for …	2024-03-05
Zero-Shot Action Recognition	EZ-CLIP	EZ-CLIP: Efficient Zeroshot Video Action …	2023-12-13
Prompt Engineering	HPT	Learning Hierarchical Prompt with Structured …	2023-12-11
Zero-Shot Action Recognition	OST	OST: Refining Text Knowledge with …	2023-11-30
Action Recognition	OmniVec	OmniVec: Learning robust representations with …	2023-11-07
Action Recognition	AMD(ViT-B/16)	Asymmetric Masked Distillation for Pre-Training …	2023-11-06
Image Clustering	TAC	Image Clustering with External Guidance	2023-10-18
Action Recognition	ZeroI2V ViT-L/14	ZeroI2V: Zero-Cost Adaptation of Pre-trained …	2023-10-02
Prompt Engineering	DePT	DePT: Decoupled Prompt Tuning	2023-09-14

Research Papers

Recent papers with results on this dataset:
