COCO-CN is a bilingual image description dataset that enriches MS-COCO with manually written Chinese sentences and tags. The dataset supports multiple tasks, including image tagging, captioning, and retrieval, all in a cross-lingual setting.
Source: COCO-CN
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Image Retrieval | CN-CLIP (RN50) | Chinese CLIP: Contrastive Vision-Language Pretraining … | 2022-11-02 |
Image Retrieval | CN-CLIP (ViT-L/14@336px) | Chinese CLIP: Contrastive Vision-Language Pretraining … | 2022-11-02 |
Image Retrieval | CN-CLIP (ViT-L/14) | Chinese CLIP: Contrastive Vision-Language Pretraining … | 2022-11-02 |
Image Retrieval | CN-CLIP (ViT-B/16) | Chinese CLIP: Contrastive Vision-Language Pretraining … | 2022-11-02 |
Image Retrieval | CN-CLIP (ViT-H/14) | Chinese CLIP: Contrastive Vision-Language Pretraining … | 2022-11-02 |
Image Retrieval | R2D2 (ViT-B) | CCMB: A Large-scale Chinese Cross-modal … | 2022-05-08 |
Image Retrieval | R2D2 (ViT-L/14) | CCMB: A Large-scale Chinese Cross-modal … | 2022-05-08 |
Image Retrieval | Wukong (ViT-L/14) | Wukong: A 100 Million Large-scale … | 2022-02-14 |
Image Retrieval | Wukong (ViT-B/32) | Wukong: A 100 Million Large-scale … | 2022-02-14 |
Recent papers with results on this dataset: