CQADupStack is a benchmark dataset for community question-answering research. It contains threads from twelve StackExchange subforums, annotated with duplicate question information. Predefined training and test splits are provided for both retrieval and classification experiments, so that results from different studies using the dataset can be compared directly. It also comes with a script for manipulating the data in various ways.
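To illustrate how duplicate annotations can serve both experiment types mentioned above, here is a minimal sketch on toy data. The post ids, the `threads` dict layout, and all function names are hypothetical and do not reflect the dataset's actual schema or its accompanying script. The sketch assumes each post lists the earlier posts it duplicates, derives relevance judgments for retrieval and labeled pairs for classification, and scores one ranking with average precision.

```python
from itertools import combinations

# Hypothetical toy data: post id -> ids of earlier posts it duplicates.
# This layout is an assumption for illustration, not the real file format.
threads = {
    "q1": [],
    "q2": ["q1"],
    "q3": [],
    "q4": ["q1", "q3"],
}


def retrieval_qrels(threads):
    """Map each post that has duplicates to the set of posts relevant to it."""
    return {qid: set(dups) for qid, dups in threads.items() if dups}


def classification_pairs(threads):
    """Label every unordered pair of posts: 1 if annotated duplicates, else 0."""
    pairs = []
    for a, b in combinations(sorted(threads), 2):
        label = int(a in threads[b] or b in threads[a])
        pairs.append((a, b, label))
    return pairs


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


qrels = retrieval_qrels(threads)
pairs = classification_pairs(threads)
# Score a made-up ranking returned for query "q4".
ap_q4 = average_precision(["q3", "q2", "q1"], qrels["q4"])
```

In the retrieval view, only posts with at least one annotated duplicate act as queries. In the classification view, every pair of posts gets a binary label, which typically yields far more negative pairs than positive ones.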
Variants: CQADupStack
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Information Retrieval | SGPT-BE-5.8B | SGPT: GPT Sentence Embeddings for … | 2022-02-17 |
Information Retrieval | TSDAE | TSDAE: Using Transformer-based Sequential Denoising … | 2021-04-14 |