CQADupStack

Name: CQADupStack
License: Unknown

Dataset Information

Modalities

Texts

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

CQADupStack is a benchmark dataset for community question-answering research. It contains threads from twelve StackExchange subforums, annotated with duplicate question information. Predefined training and test splits are provided for both retrieval and classification experiments, so that results from different studies using the dataset are directly comparable. It also comes with a script for manipulating the data in various ways.

Variants: CQADupStack

Associated Benchmarks

This dataset is used in 1 benchmark:

Information Retrieval - Metrics: mAP@100
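
For readers unfamiliar with the metric, the following is a minimal sketch of how mAP@100 could be computed for a duplicate-question retrieval run. It is not the benchmark's official scorer. The function names, the input layout (query ID mapped to a ranked list of candidate IDs and to a set of known duplicate IDs), and the choice to normalise by min(k, number of relevant items) are all assumptions made for illustration; evaluation toolkits differ on the normalisation, so scores may not match published numbers exactly.

```python
from typing import Dict, List, Set


def average_precision_at_k(ranked_ids: List[str],
                           relevant_ids: Set[str],
                           k: int = 100) -> float:
    """AP@k for a single query.

    Assumes ranked_ids contains no repeated IDs. Normalising by
    min(k, number of relevant items) is one common convention,
    chosen here as an assumption.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Walk the top-k results and accumulate precision at each relevant hit.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant_ids))


def mean_average_precision_at_k(rankings: Dict[str, List[str]],
                                relevant: Dict[str, Set[str]],
                                k: int = 100) -> float:
    """Mean of AP@k over all queries present in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision_at_k(ranked, relevant.get(query_id, set()), k)
        for query_id, ranked in rankings.items()
    )
    return total / len(rankings)


if __name__ == "__main__":
    # Toy data with made-up IDs, purely for illustration.
    rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d4"]}
    relevant = {"q1": {"d1", "d3"}, "q2": {"d4"}}
    print(mean_average_precision_at_k(rankings, relevant, k=100))
```

In this toy run, query q1 finds its duplicates at ranks 1 and 3 and query q2 finds its single duplicate at rank 2; the score is the mean of the two per-query average precisions.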

Recent Benchmark Submissions

Task	Model	Paper	Date
Information Retrieval	SGPT-BE-5.8B	SGPT: GPT Sentence Embeddings for …	2022-02-17
Information Retrieval	TSDAE	TSDAE: Using Transformer-based Sequential Denoising …	2021-04-14

Research Papers

Recent papers with results on this dataset:

External Links:

CQADupStack
