HMDB51

Zero-Shot Action Recognition Benchmark

Performance Over Time

24 results. Metric: Top-1 Accuracy (%)
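
Top-1 accuracy is the share of test clips whose highest-scoring class matches the ground-truth label. The sketch below illustrates that calculation only. It assumes the per-class scores for each clip have already been computed (for instance, video-text similarities against the unseen class names). The function name `top1_accuracy` and its argument layout are hypothetical and not taken from any paper listed here.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy as a percentage.

    scores: list of per-clip score lists, one score per candidate class
            (assumed layout, for illustration only).
    labels: list of ground-truth class indices, one per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one clip")

    correct = 0
    for clip_scores, label in zip(scores, labels):
        # Index of the highest-scoring class is the top-1 prediction.
        predicted = max(range(len(clip_scores)), key=clip_scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(scores)


if __name__ == "__main__":
    # Four toy clips, two candidate classes; three predictions are correct.
    toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    toy_labels = [1, 0, 0, 0]
    print(top1_accuracy(toy_scores, toy_labels))
```

Individual papers may differ in evaluation protocol (for example, which splits are used or how results are averaged), so reported figures are not necessarily produced by exactly this procedure.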

Top Performing Models

Rank | Model | Paper | Top-1 Accuracy (%) | Date | Code
1 | MSQNet | Actor-agnostic Multi-label Action Recognition with Multi-modal Query | 69.43 | 2023-07-20 | mondalanindya/msqnet
2 | MOV (ViT-L/14) | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 64.70 | 2022-07-15 | -
3 | OTI (ViT-L/14) | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | 64.00 | 2023-08-14 | sweetorangezhuyan/mm2023_oti
4 | BIKE | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 61.40 | 2022-12-31 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM
5 | MOV (ViT-B/16) | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 60.80 | 2022-07-15 | -
6 | IMP-MoE-L | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | 59.10 | 2023-05-10 | -
7 | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 58.70 | 2022-12-09 | -
8 | Text4Vis | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 58.40 | 2022-07-04 | whwu95/Cap4Video, whwu95/text4vis, whwu95/GPT4Vis, whwu95/BIKE, whwu95/ATM
9 | TC-CLIP | Leveraging Temporal Contextualization for Video Action Recognition | 56.00 | 2024-04-15 | naver-ai/tc-clip, naver-ai/dawin
10 | OST | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | 55.90 | 2023-11-30 | tomchen-ctj/OST

All Papers (24)