PhotoChat is the first dataset that casts light on photo-sharing behavior in online messaging. It contains 12k dialogues, each paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task, which predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task, which retrieves the most relevant photo given the dialogue context.
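The two tasks above can be framed as a binary classification over the dialogue so far and a ranking over candidate photos. The sketch below is purely illustrative, assuming toy keyword features and hand-made embeddings; the function names, trigger words, and vectors are hypothetical, not the dataset's official baselines.

```python
import math

def predict_share_intent(dialogue_turns, trigger_words=("photo", "picture", "pic")):
    """Toy intent predictor: flags whether the last turn hints that a
    photo will be shared next (hypothetical keyword heuristic)."""
    last_turn = dialogue_turns[-1].lower()
    return any(word in last_turn for word in trigger_words)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_photo(context_embedding, photo_embeddings):
    """Toy retrieval: rank candidate photos by cosine similarity to a
    dialogue-context embedding and return the best-scoring photo id."""
    return max(photo_embeddings, key=lambda pid: cosine(context_embedding, photo_embeddings[pid]))

# Usage with made-up data:
turns = ["How was the hike?", "Amazing! Want to see a photo?"]
print(predict_share_intent(turns))  # True

photos = {"sunset.jpg": [0.9, 0.1], "dog.jpg": [0.1, 0.9]}
print(retrieve_photo([0.8, 0.2], photos))  # sunset.jpg
```

Real systems replace the keyword heuristic with a trained classifier and the hand-made vectors with learned image-text embeddings (as in the models listed in the benchmark table), but the task interfaces stay the same.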
Variants: PhotoChat
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Image Retrieval | PaCE | PaCE: Unified Multi-modal Dialogue Pre-training … | 2023-05-24 |
| Image Retrieval | VLMo | VLMo: Unified Vision-Language Pre-Training with … | 2021-11-03 |
| Image Retrieval | DE++ | PhotoChat: A Human-Human Dialogue Dataset … | 2021-07-06 |
| Image Retrieval | ViLT | ViLT: Vision-and-Language Transformer Without Convolution … | 2021-02-05 |
| Image Retrieval | SCAN | Stacked Cross Attention for Image-Text … | 2018-03-21 |
Recent papers with results on this dataset: