CQADupStack

Dataset Information
Modalities: Texts
License: Unknown

Overview

CQADupStack is a benchmark dataset for community question-answering research. It contains threads from twelve StackExchange subforums, annotated with duplicate-question information. Predefined training and test splits are provided for both retrieval and classification experiments, so that results from different studies using the dataset are directly comparable. It also comes with a script for manipulating the data in various ways.
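As a rough illustration of how duplicate annotations can serve as relevance labels in a retrieval experiment, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `posts` and `duplicates` dictionaries, the word-overlap ranker, and the function names are hypothetical. They do not reflect the dataset's actual file format, its official splits, or the API of the script that ships with it.

```python
from typing import Dict, List, Set

# Hypothetical toy data, NOT the real CQADupStack schema or content.
posts: Dict[str, str] = {
    "q1": "how do i redirect http to https",
    "q2": "redirect http traffic to https",
    "q3": "best way to compress images for the web",
    "q4": "force https redirect on apache",
}

# Duplicate annotations: query id -> ids of earlier questions it duplicates.
duplicates: Dict[str, Set[str]] = {"q4": {"q1", "q2"}}


def rank_by_overlap(query_id: str) -> List[str]:
    """Rank all other posts by Jaccard word overlap with the query post."""
    query_words = set(posts[query_id].split())

    def score(pid: str) -> float:
        words = set(posts[pid].split())
        return len(query_words & words) / len(query_words | words)

    candidates = [pid for pid in posts if pid != query_id]
    # Highest score first; ties are broken by post id for determinism.
    return sorted(candidates, key=lambda pid: (-score(pid), pid))


def recall_at_k(query_id: str, k: int) -> float:
    """Fraction of the annotated duplicates that appear in the top-k results."""
    relevant = duplicates[query_id]
    retrieved = set(rank_by_overlap(query_id)[:k])
    return len(relevant & retrieved) / len(relevant)


if __name__ == "__main__":
    print(rank_by_overlap("q4"))
    print(recall_at_k("q4", k=2))
```

In a real experiment the word-overlap ranker would be replaced by the retrieval model under evaluation, and the queries and candidates would come from the predefined splits rather than from a hand-written dictionary.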

Variants: CQADupStack

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task                   Model         Paper                                                  Date
Information Retrieval  SGPT-BE-5.8B  SGPT: GPT Sentence Embeddings for …                    2022-02-17
Information Retrieval  TSDAE         TSDAE: Using Transformer-based Sequential Denoising …  2021-04-14

Research Papers

Recent papers with results on this dataset: