| LLaVAction | LLaVAction: evaluating and training multi-modal l… | 58.30 | 2025-03-24 |
| TIM | TIM: A Time Interval Machine for Audio-Visual Act… | 56.40 | 2024-04-08 |
| Avion (ViT-L) | Training a Large Video Model on a Single Machine … | 54.40 | 2023-09-28 |
| M&M (WTS 60M) | M&M Mix: A Multimodal Multiview Transformer Ensem… | 53.60 | 2022-06-20 |
| LVMAE | Extending Video Masked Autoencoders to 128 frames | 52.10 | 2024-11-20 |
| TAdaFormer-L/14 | Temporally-Adaptive Models for Efficient Video Un… | 51.80 | 2023-08-10 |
| LaViLa (TimeSformer-L) | Learning Video Representations from Large Languag… | 51.00 | 2022-12-08 |
| MTV-B (WTS 60M) | Multiview Transformers for Video Recognition | 50.50 | 2022-01-12 |
| OMNIVORE (Swin-B, finetuned) | Omnivore: A Single Model for Many Visual Modaliti… | 49.90 | 2022-01-20 |
| CAST(ViT-B/16) | CAST: Cross-Attention in Space and Time for Video… | 49.30 | 2023-11-30 |
| TAdaConvNeXtV2-S | Temporally-Adaptive Models for Efficient Video Un… | 48.90 | 2023-08-10 |
| MeMViT-24 | MeMViT: Memory-Augmented Multiscale Vision Transf… | 48.40 | 2022-01-20 |
| MoViNet-A6 | MoViNets: Mobile Video Networks for Efficient Vid… | 47.70 | 2021-03-21 |
| ORViT Mformer-L (ORViT blocks) | Object-Region Video Transformers | 45.70 | 2021-10-13 |
| TempAgg | Technical Report: Temporal Aggregate Representati… | 45.26 | 2021-06-06 |
| MoViNet-A5 | MoViNets: Mobile Video Networks for Efficient Vid… | 44.50 | 2021-03-21 |
| Mformer-HR | Keeping Your Eye on the Ball: Trajectory Attentio… | 44.50 | 2021-06-09 |
| GSF | Gate-Shift-Fuse for Video Action Recognition | 44.48 | 2022-03-16 |
| MoViNet-A4 | MoViNets: Mobile Video Networks for Efficient Vid… | 44.40 | 2021-03-21 |
| Mformer-L | Keeping Your Eye on the Ball: Trajectory Attentio… | 44.10 | 2021-06-09 |
| ViViT-L/16x2 Fact. encoder | ViViT: A Video Vision Transformer | 44.00 | 2021-03-29 |
| MBT | Attention Bottlenecks for Multimodal Fusion | 43.40 | 2021-06-30 |
| Mformer | Keeping Your Eye on the Ball: Trajectory Attentio… | 43.10 | 2021-06-09 |
| MoViNet-A2 | MoViNets: Mobile Video Networks for Efficient Vid… | 41.20 | 2021-03-21 |
| TSM | Rescaling Egocentric Vision | 37.39 | 2020-06-23 |
| SlowFast | Rescaling Egocentric Vision | 36.81 | 2020-06-23 |
| MoViNet-A0 | MoViNets: Mobile Video Networks for Efficient Vid… | 36.80 | 2021-03-21 |
| TBN | Rescaling Egocentric Vision | 35.55 | 2020-06-23 |
| TRN | Rescaling Egocentric Vision | 35.28 | 2020-06-23 |
| TSN | Rescaling Egocentric Vision | 33.57 | 2020-06-23 |