| Model | Paper | Score | Date |
|---|---|---|---|
| ExpansionNet v2 | Exploiting Multiple Sequence Lengths in Fast End … | 143.70 | 2022-08-13 |
| M2 Transformer | Meshed-Memory Transformer for Image Captioning | 131.20 | 2019-12-17 |
| RDN | Reflective Decoding Network for Image Captioning | 125.20 | 2019-08-30 |
| Lyrics | Lyrics: Boosting Fine-grained Language-Vision Ali… | 121.10 | 2023-12-08 |
| Flamingo (80B; 4-shot) | Retrieval-Augmented Multimodal Language Modeling | 103.00 | 2022-11-22 |
| RA-CM3 (2.7B) | Retrieval-Augmented Multimodal Language Modeling | 89.10 | 2022-11-22 |
| Flamingo (3B; 4-shot) | Retrieval-Augmented Multimodal Language Modeling | 85.00 | 2022-11-22 |
| Parti | Retrieval-Augmented Multimodal Language Modeling | 83.90 | 2022-11-22 |
| Vanilla CM3 | Retrieval-Augmented Multimodal Language Modeling | 71.90 | 2022-11-22 |
| X-LXMERT | Retrieval-Augmented Multimodal Language Modeling | 55.80 | 2022-11-22 |
| minDALL-E | Retrieval-Augmented Multimodal Language Modeling | 48.00 | 2022-11-22 |
| UNIMO-large | UNIMO: Towards Unified-Modal Understanding and Ge… | 39.60 | 2020-12-31 |
| ruDALL-E-XL | Retrieval-Augmented Multimodal Language Modeling | 38.70 | 2022-11-22 |
| Bit Diffusion (20 steps) | Analog Bits: Generating Discrete Data using Diffu… | 34.70 | 2022-08-08 |
| NIC (ResNet-50, CutMix) | CutMix: Regularization Strategy to Train Strong C… | 24.90 | 2019-05-13 |
| DALL-E | Retrieval-Augmented Multimodal Language Modeling | 20.20 | 2022-11-22 |