Showing 5 results | Metric: R1
Rank | Model | Paper | R1 | Date | Code |
---|---|---|---|---|---|
1 | PaCE | PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts | 15.20 | 2023-05-24 | AlibabaResearch/DAMO-ConvAI |
2 | VLMo | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | 11.50 | 2021-11-03 | microsoft/unilm, ylsung/vl-merging |
3 | ViLT | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | 11.50 | 2021-02-05 | huggingface/transformers, dandelin/vilt, glamor-usc/climb |
4 | SCAN | Stacked Cross Attention for Image-Text Matching | 10.40 | 2018-03-21 | kuanghuei/SCAN, MysteryVaibhav/SCAN, adlnlp/attention_vl |
5 | DE++ | PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling | 9.00 | 2021-07-06 | - |