UCF101

Zero-Shot Action Recognition Benchmark

Performance Over Time

Showing 27 results. Metric: Top-1 Accuracy (%).
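Top-1 accuracy, the metric reported here, is the share of test videos whose highest-scoring predicted class matches the ground-truth label. The sketch below is a minimal illustration of that definition, not the evaluation code of any listed paper. The function name `top1_accuracy` and the toy scores and labels are hypothetical, and how the per-class scores are produced (for example, by comparing video and class-name embeddings) is left out as an assumption.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy as a percentage.

    scores[i][c] is the model's score for class c on video i (hypothetical input);
    labels[i] is the ground-truth class index of video i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    correct = 0
    for row, label in zip(scores, labels):
        # The prediction is the index of the highest-scoring class.
        predicted = max(range(len(row)), key=lambda c: row[c])
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data (made up for illustration): 4 videos, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.6, 0.3, 0.1],  # predicted class 0
    [0.2, 0.2, 0.6],  # predicted class 2
    [0.5, 0.4, 0.1],  # predicted class 0
]
toy_labels = [1, 0, 2, 1]
toy_accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"Top-1 accuracy: {toy_accuracy:.2f}")
```

On the toy data three of the four predictions match their labels, so the function returns 75.0. The leaderboard values are read the same way, as percentages of correctly classified test videos.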

Top Performing Models

| Rank | Model | Paper | Top-1 Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | OTI (ViT-L/14) | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | 92.80 | 2023-08-14 | sweetorangezhuyan/mm2023_oti |
| 2 | IMP-MoE-L | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | 91.50 | 2023-05-10 | - |
| 3 | MOV (ViT-L/14) | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 87.10 | 2022-07-15 | - |
| 4 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 86.60 | 2022-12-09 | - |
| 5 | BIKE | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 86.60 | 2022-12-31 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM |
| 6 | Text4Vis | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 85.80 | 2022-07-04 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM |
| 7 | TC-CLIP | Leveraging Temporal Contextualization for Video Action Recognition | 85.40 | 2024-04-15 | naver-ai/tc-clip, naver-ai/dawin |
| 8 | EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP at Scale | 83.10 | 2023-03-27 | baaivision/eva, PaddlePaddle/PaddleMIX, Yui010206/CREMA, jaehong31/raccoon |
| 9 | MOV (ViT-B/16) | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 82.60 | 2022-07-15 | - |
| 10 | OST | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | 79.70 | 2023-11-30 | tomchen-ctj/OST |

All Papers (27)