📊 6 results | 📏 Metric: Accuracy (Private, %)
Rank | Model | Paper | Accuracy (Private, %) | Date | Code |
---|---|---|---|---|---|
1 | CoCa | CoCa: Contrastive Captioners are Image-Text Foundation Models | 77.60 | 2022-05-04 | 📦 mlfoundations/open_clip 📦 facebookresearch/multimodal 📦 lucidrains/CoCa-pytorch |
2 | BASIC | Combined Scaling for Zero-shot Transfer Learning | 76.10 | 2021-11-19 | - |
3 | EVA-CLIP-18B | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | 74.70 | 2024-02-06 | 📦 baaivision/EVA |
4 | InternVL-C | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | 73.90 | 2023-12-21 | 📦 opengvlab/internvl 📦 opengvlab/internvl-mmdetseg |
5 | EVA-CLIP-E/14+ | EVA-CLIP: Improved Training Techniques for CLIP at Scale | 71.60 | 2023-03-27 | 📦 baaivision/eva 📦 PaddlePaddle/PaddleMIX 📦 Yui010206/CREMA 📦 jaehong31/raccoon |
6 | AltCLIP | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | 58.70 | 2022-11-12 | 📦 flagai-open/flagai 📦 pwc-1/Paper-8 |
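The ranking above can be sanity-checked programmatically. A minimal sketch (the tuples below simply transcribe the Rank, Model, and Accuracy columns of the table; nothing else is assumed):

```python
# Leaderboard rows transcribed from the table above: (rank, model, accuracy_private_pct)
LEADERBOARD = [
    (1, "CoCa", 77.60),
    (2, "BASIC", 76.10),
    (3, "EVA-CLIP-18B", 74.70),
    (4, "InternVL-C", 73.90),
    (5, "EVA-CLIP-E/14+", 71.60),
    (6, "AltCLIP", 58.70),
]

def is_consistent(rows):
    """Ranks run 1..N and accuracy is non-increasing down the table."""
    ranks = [rank for rank, _, _ in rows]
    accs = [acc for _, _, acc in rows]
    return ranks == list(range(1, len(rows) + 1)) and accs == sorted(accs, reverse=True)

print(is_consistent(LEADERBOARD))  # → True
```

Note the large gap (12.9 points) between fifth and sixth place, versus gaps of at most 1.9 points among the top five.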