| BLIP-2 ViT-G (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training… | 98.10 | 2023-01-30 |
| BLIP-2 ViT-L (zero-shot, 1K test set) | BLIP-2: Bootstrapping Language-Image Pre-training… | 97.60 | 2023-01-30 |
| MaMMUT (ours) | MaMMUT: A Simple Architecture for Joint Learning … | 96.00 | 2023-03-29 |
| HADA | HADA: A Graph-based Amalgamation Framework in Ima… | 95.94 | 2023-01-11 |
| ALBEF | HADA: A Graph-based Amalgamation Framework in Ima… | 95.30 | 2023-01-11 |
| UNITER | HADA: A Graph-based Amalgamation Framework in Ima… | 94.08 | 2023-01-11 |
| LGSGM | A Deep Local and Global Scene-Graph Matching for … | 84.10 | 2021-06-04 |
| GSMN | A Deep Local and Global Scene-Graph Matching for … | 82.30 | 2021-06-04 |
| VisualSparta | VisualSparta: An Embarrassingly Simple Approach t… | 82.00 | 2021-01-01 |