Text4Vis (w/ ViT-L) | Revisiting Classifier: Transferring Vision-Langua… | 96.90 | 2022-07-04
BIKE | Bidirectional Cross-Modal Knowledge Exploration f… | 96.10 | 2022-12-31
InternVideo2-6B | InternVideo2: Scaling Foundation Models for Multi… | 95.90 | 2024-03-22
NSNet (w/ Swin-L) | NSNet: Non-saliency Suppression Sampler for Effic… | 94.30 | 2022-07-21
TSQNet (w/ Swin-L) | Temporal Saliency Query Network for Efficient Vid… | 93.70 | 2022-07-21
DSANet (w/ 3D ResNet50) | DSANet: Dynamic Segment Aggregation Network for V… | 90.50 | 2021-05-25
MARL (w/ SEResNeXt-152) | Multi-Agent Reinforcement Learning Based Frame Sa… | 90.05 | 2019-07-31
ListenToLook | Listen to Look: Action Recognition by Previewing … | 89.90 | 2019-12-10
DSN | Dynamic Sampling Networks for Efficient Action Re… | 87.90 | 2020-06-28
SMART | SMART Frame Selection for Action Recognition | 84.40 | 2020-12-19
Ada3D | 2D or not 2D? Adaptive 3D Convolution Selection f… | 84.00 | 2020-12-29
RRA | Fine-grained Video Categorization with Redundancy… | 83.40 | 2018-10-26
P3D | Learning Spatio-Temporal Representation with Pseu… | 78.90 | 2017-11-28
VGG19 + 393K webcam images | Do Less and Achieve More: Training CNNs for Actio… | 53.80 | 2015-12-22
CD-UAR | Towards Universal Representation for Unseen Actio… | 53.80 | 2018-03-22
VGG19 | Do Less and Achieve More: Training CNNs for Actio… | 52.30 | 2015-12-22