CC3M-TagMask provides tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations are the words that best describe the relationship between an image and its paired text, and so capture the semantic connection between each pair's visual and textual elements.
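As a rough illustration of how tag annotations of this kind could be used to score a tag predictor, here is a minimal Python sketch. The `TagRecord` layout, the `tag_f1` helper, and the example caption and tags are all hypothetical; they are not the dataset's actual file format, and the set-based F1 shown is not necessarily the metric used by the benchmarks listed here.

```python
from dataclasses import dataclass


@dataclass
class TagRecord:
    """Hypothetical record: one caption plus its annotated tags."""
    caption: str
    tags: frozenset  # assumed: words describing the image-text relationship


def tag_f1(predicted, gold):
    """Return (precision, recall, F1) for one predicted tag set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example; not taken from the dataset.
record = TagRecord(
    caption="a dog runs across the beach at sunset",
    tags=frozenset({"dog", "beach", "sunset"}),
)
precision, recall, f1 = tag_f1({"dog", "beach", "runs"}, record.tags)
```

Treating tags as an unordered set keeps the comparison independent of word order in the caption; a real evaluation would also need to settle normalization choices such as casing and lemmatization.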
Variants: CC3M-TagMask
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Semantic Segmentation | TTD (MaskCLIP) | TTD: Text-Tag Self-Distillation Enhancing Image-Text … | 2024-03-30 |
Semantic Segmentation | TTD (TCL) | TTD: Text-Tag Self-Distillation Enhancing Image-Text … | 2024-03-30 |
Multi-Label Text Classification | TTD (w/ fine-tuning) | TTD: Text-Tag Self-Distillation Enhancing Image-Text … | 2024-03-30 |
Multi-Label Text Classification | TTD (w/o fine-tuning) | TTD: Text-Tag Self-Distillation Enhancing Image-Text … | 2024-03-30 |
Multi-Label Text Classification | Qwen-72B | Qwen Technical Report | 2023-09-28 |
Semantic Segmentation | TCL | Learning to Generate Text-grounded Mask … | 2022-12-01 |
Semantic Segmentation | MaskCLIP | Extract Free Dense Labels from … | 2021-12-02 |
Multi-Label Text Classification | NLTK | NLTK: The Natural Language Toolkit | 2002-05-17 |
Recent papers with results on this dataset: