Kinetics

Zero-Shot Action Recognition Benchmark

Metric: Top-1 Accuracy (16 results)

Top Performing Models

| Rank | Model | Paper | Top-1 Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | TC-CLIP | Leveraging Temporal Contextualization for Video Action Recognition | 78.10 | 2024-04-15 | naver-ai/tc-clip, naver-ai/dawin |
| 2 | IMP-MoE-L | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | 76.80 | 2023-05-10 | - |
| 3 | OST | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | 75.10 | 2023-11-30 | tomchen-ctj/OST |
| 4 | MAXI | MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | 71.60 | 2023-03-15 | wlin-at/maxi |
| 5 | OTI (ViT-L/14) | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | 70.60 | 2023-08-14 | sweetorangezhuyan/mm2023_oti |
| 6 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 70.10 | 2022-12-09 | - |
| 7 | Text4Vis | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 68.90 | 2022-07-04 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM |
| 8 | BIKE | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 68.50 | 2022-12-31 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM |
| 9 | X-CLIP | Expanding Language-Image Pretrained Models for General Video Recognition | 65.20 | 2022-08-04 | microsoft/videox, microsoft/VideoX |
| 10 | LanguageBind | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 64.10 | 2023-10-03 | PKU-YuanGroup/Video-LLaVA, PKU-YuanGroup/MoE-LLaVA, pku-yuangroup/languagebind |
