Image Retrieval from Contextual Descriptions
Given 10 minimally contrastive (highly similar) images and a complex description for one of them, the task is to retrieve the correct image.
Most of the images are sourced from videos, and both the descriptions and the retrievals come from humans.
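As a rough illustration of the task format, the sketch below scores each candidate image against the description and returns the index of the best match. It assumes the description and the images have already been mapped to fixed-length embedding vectors by some encoder, which the dataset itself does not provide. The function names (`cosine`, `retrieve`, `accuracy`) and the toy vectors are hypothetical, and cosine-similarity ranking is a generic baseline chosen for illustration, not the method of any model listed on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(description_vec, image_vecs):
    """Return the index of the candidate image most similar to the description.

    In this dataset each example has 10 candidates, but the function
    works for any non-empty list of candidate vectors.
    """
    scores = [cosine(description_vec, v) for v in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, targets):
    """Fraction of examples where the retrieved index equals the target index."""
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)


# Toy example with 3 candidates and made-up 2-dimensional embeddings.
description = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
predicted_index = retrieve(description, candidates)
```

Because the candidates are minimally contrastive, their embeddings would typically lie close together in practice, so the margin between the top scores is likely to be small; retrieval accuracy over the set of examples is the natural way to summarise performance.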
Variants: ImageCoDe
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Image Retrieval | ContextualCLIP | Image Retrieval from Contextual Descriptions | 2022-03-29 |
Recent papers with results on this dataset: