| Oscar | Oscar: Object-Semantics Aligned Pre-training for … | 99.80 | 2020-04-13 |
| BLIP-2 (ViT-G, fine-tuned) | BLIP-2: Bootstrapping Language-Image Pre-training… | 98.50 | 2023-01-30 |
| ONE-PEACE (ViT-G, w/o ranking) | ONE-PEACE: Exploring One General Representation M… | 98.30 | 2023-05-18 |
| BLIP-2 (ViT-L, fine-tuned) | BLIP-2: Bootstrapping Language-Image Pre-training… | 98.00 | 2023-01-30 |
| Unicoder-VL | Unicoder-VL: A Universal Encoder for Vision and L… | 97.20 | 2019-08-16 |
| IAIS | Learning Relation Alignment for Calibrated Cross-… | 94.48 | 2021-05-28 |
| CLIP (zero-shot) | Learning Transferable Visual Models From Natural … | 88.10 | 2021-02-26 |
| DVSA | Deep Visual-Semantic Alignments for Generating Im… | 74.80 | 2014-12-07 |
| FLAVA (ViT-B, zero-shot) | FLAVA: A Foundational Language And Vision Alignme… | 42.74 | 2021-12-08 |