| FTP-UniFormerV2-L/14 | Enhancing Video Transformers for Action Understan… | 99.70 | 2024-03-24 |
| VideoMAE V2-g | VideoMAE V2: Scaling Video Masked Autoencoders wi… | 99.60 | 2023-03-29 |
| OmniVec | OmniVec: Learning robust representations with cro… | 99.60 | 2023-11-07 |
| BIKE | Bidirectional Cross-Modal Knowledge Exploration f… | 98.80 | 2022-12-31 |
| SMART | SMART Frame Selection for Action Recognition | 98.64 | 2020-12-19 |
| PERF-Net (multi-distilled S3D) | PERF-Net: Pose Empowered RGB-Flow Net | 98.60 | 2020-09-28 |
| OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | Omni-sourced Webly-supervised Learning for Video … | 98.60 | 2020-03-29 |
| ZeroI2V ViT-L/14 | ZeroI2V: Zero-Cost Adaptation of Pre-trained Tran… | 98.60 | 2023-10-02 |
| LGD-3D Two-stream | Learning Spatio-Temporal Representation with Loca… | 98.20 | 2019-06-13 |
| Text4Vis | Revisiting Classifier: Transferring Vision-Langua… | 98.20 | 2022-07-04 |
| Two-Stream I3D (Imagenet+Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 98.00 | 2017-05-22 |
| HATNet (32 frames) | Large Scale Holistic Video Understanding | 97.80 | 2019-04-25 |
| Two-Stream I3D (Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 97.80 | 2017-05-22 |
| D3D + D3D | D3D: Distilled 3D Networks for Video Action Recog… | 97.60 | 2018-12-19 |
| BQN | Busy-Quiet Video Disentangling for Video Classifi… | 97.60 | 2021-03-29 |
| CCS + TSN (ImageNet+Kinetics pretrained) | Cooperative Cross-Stream Network for Discriminati… | 97.40 | 2019-08-27 |
| R[2+1]D-TwoStream (Kinetics pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 97.30 | 2017-11-30 |
| CA2ST(B/16) | CA^2ST: Cross-Attention in Audio, Space, and Time… | 97.20 | 2025-03-30 |
| Hidden Two-Stream | Hidden Two-Stream Convolutional Networks for Acti… | 97.10 | 2017-04-02 |
| D3D (Kinetics-600 pretraining) | D3D: Distilled 3D Networks for Video Action Recog… | 97.10 | 2018-12-19 |
| AMD(ViT-B/16) | Asymmetric Masked Distillation for Pre-Training S… | 97.10 | 2023-11-06 |
| D3D (Kinetics-400 pretraining) | D3D: Distilled 3D Networks for Video Action Recog… | 97.00 | 2018-12-19 |
| LGD-3D RGB | Learning Spatio-Temporal Representation with Loca… | 97.00 | 2019-06-13 |
| STAM-32 (ImageNet/Kinetics pretraining) | An Image is Worth 16x16 Words, What is a Video Wo… | 97.00 | 2021-03-25 |
| FASTER32 | FASTER Recurrent Networks for Efficient Video Cla… | 96.90 | 2019-06-10 |
| R[2+1]D-RGB (Kinetics pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 96.80 | 2017-11-30 |
| S3D-G (ImageNet, Kinetics-400 pretrained) | Rethinking Spatiotemporal Feature Learning: Speed… | 96.80 | 2017-12-13 |
| LGD-3D Flow | Learning Spatio-Temporal Representation with Loca… | 96.80 | 2019-06-13 |
| Flow-I3D (Imagenet+Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 96.70 | 2017-05-22 |
| VidTr-L | VidTr: Video Transformer Without Convolutions | 96.70 | 2021-04-23 |
| CMA iter1-S | Two-Stream Video Classification with Cross-Modali… | 96.50 | 2019-08-01 |
| Flow-I3D (Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 96.50 | 2017-05-22 |
| I3D RGB + DMC-Net (I3D) | DMC-Net: Generating Discriminative Motion Cues fo… | 96.50 | 2019-01-11 |
| A2-Net (ResNet-50) | $A^2$-Nets: Double Attention Networks | 96.40 | 2018-10-27 |
| MF-Net, RGB only (ImageNet+Kinetics pretrained) | Multi-Fiber Networks for Video Recognition | 96.00 | 2018-07-30 |
| Optical Flow Guided Feature | Optical Flow Guided Feature: A Fast and Robust Mo… | 96.00 | 2017-11-29 |
| Prob-Distill | Attention Distillation for Learning Video Represe… | 95.70 | 2019-04-05 |
| RGB-I3D (Imagenet+Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 95.60 | 2017-05-22 |
| R[2+1]D-Flow (Kinetics pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 95.50 | 2017-11-30 |
| TVNet+IDT | End-to-End Learning of Motion Representation for … | 95.40 | 2018-04-02 |
| TesNet (ImageNet pretrained) | Learning spatio-temporal representations with tem… | 95.20 | 2020-02-11 |
| RGB-I3D (Kinetics pre-training) | Quo Vadis, Action Recognition? A New Model and th… | 95.10 | 2017-05-22 |
| R[2+1]D-TwoStream (Sports-1M pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 95.00 | 2017-11-30 |
| X3D MobileNet-V3 LGD-GC | LIGAR: Lightweight General-purpose Action Recogni… | 94.85 | 2021-08-30 |
| ST-ResNet + IDT | Spatiotemporal Residual Networks for Video Action… | 94.60 | 2016-11-07 |
| ResNeXt-101 (64f) | Can Spatiotemporal 3D CNNs Retrace the History of… | 94.50 | 2017-11-27 |
| TSN+TSM | Temporal-Spatial Mapping for Action Recognition | 94.30 | 2018-09-11 |
| ARTNet w/ TSN | Appearance-and-Relation Networks for Video Classi… | 94.30 | 2017-11-24 |
| Temporal Segment Networks | Temporal Segment Networks: Towards Good Practices… | 94.20 | 2016-08-02 |
| TS-LSTM | TS-LSTM and Temporal-Inception: Exploiting Spatio… | 94.10 | 2017-03-30 |
| SVT | Self-supervised Video Transformer | 93.70 | 2021-12-02 |
| R[2+1]D-RGB (Sports-1M pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 93.60 | 2017-11-30 |
| Two-stream I3D | Quo Vadis, Action Recognition? A New Model and th… | 93.40 | 2017-05-22 |
| R[2+1]D-Flow (Sports-1M pretrained) | A Closer Look at Spatiotemporal Convolutions for … | 93.30 | 2017-11-30 |
| VIMPAC | VIMPAC: Video Pre-Training via Masked Token Predi… | 92.70 | 2021-06-21 |
| S:VGG-16, T:VGG-16 (ImageNet pretrain) | Convolutional Two-Stream Network Fusion for Video… | 92.50 | 2016-04-22 |
| DMC-Net (I3D) | DMC-Net: Generating Discriminative Motion Cues fo… | 92.30 | 2019-01-11 |
| two-in-one two stream | Dance with Flow: Two-in-One Stream Action Detecti… | 92.00 | 2019-04-01 |
| LTC | Long-term Temporal Convolutions for Action Recogn… | 91.70 | 2016-04-15 |
| TDD + IDT | Action Recognition with Trajectory-Pooled Deep-Co… | 91.50 | 2015-05-19 |
| Very deep two-stream ConvNet | Towards Good Practices for Very Deep Two-Stream C… | 91.40 | 2015-07-08 |
| 3D ResNeXt-101 + Confidence Distillation | Efficient Action Recognition Using Confidence Dis… | 91.20 | 2021-09-05 |
| Two-stream+LSTM | Beyond Short Snippets: Deep Networks for Video Cl… | 88.60 | 2015-03-31 |
| P3D (ImageNet + Sports1M) | Learning Spatio-Temporal Representation with Pseu… | 88.60 | 2017-11-28 |
| Two-Stream (ImageNet pretrained) | Two-Stream Convolutional Networks for Action Reco… | 88.00 | 2014-06-09 |
| MV-CNN | Real-time Action Recognition with Enhanced Motion… | 86.40 | 2016-04-26 |
| Dynamics 2 for DenseNet-201 Transformer | Video Action Recognition Collaborative Learning w… | 86.10 | 2023-02-17 |
| R(2+1)D-18 (DistInit pretraining) | DistInit: Learning Video Representations Without … | 85.80 | 2019-01-26 |
| Res3D | ConvNet Architecture Search for Spatiotemporal Fe… | 85.80 | 2017-08-16 |
| ActionFlowNet | ActionFlowNet: Learning Motion Representation for… | 83.90 | 2016-12-09 |
| C3D | Learning Spatiotemporal Features with 3D Convolut… | 82.30 | 2014-12-02 |
| HalluciNet (ResNet-50) | HalluciNet-ing Spatiotemporal Representations Usi… | 79.83 | 2019-12-10 |
| R[2+1]D (VideoMoCo) | VideoMoCo: Contrastive Video Representation Learn… | 78.70 | 2021-03-10 |
| 3D-ResNet-18 (VideoMoCo) | VideoMoCo: Contrastive Video Representation Learn… | 74.10 | 2021-03-10 |
| R3D-18 | Federated Self-supervised Learning for Video Unde… | 73.16 | 2022-07-05 |
| CD-UAR | Towards Universal Representation for Unseen Actio… | 42.50 | 2018-03-22 |