Douban

Douban Conversation Corpus

Dataset Information
Modalities
Texts
Languages
Chinese
Introduced
2016
License
Unknown

Overview

We release the Douban Conversation Corpus, comprising a training set, a development set, and a test set for retrieval-based chatbots. The statistics of the corpus are shown in the following table.

Statistic                            Train   Val     Test
Session-response pairs               1M      50k     10k
Avg. positive responses per session  1       1       1.18
Fleiss' kappa                        N/A     N/A     0.41
Min turns per session                3       3       3
Max turns per session                98      91      45
Avg. turns per session               6.69    6.75    5.95
Avg. words per utterance             18.56   18.50   20.74

The test data contains 1,000 dialogue contexts, and for each context we create 10 candidate responses. We recruited three labelers to judge whether each candidate is a proper response to the session. A proper response is one that can naturally reply to the message given the context. Each pair received three labels, and the majority label was taken as the final decision.
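The labeling procedure described above (three binary judgments per pair, a majority vote, and an agreement score such as the Fleiss' kappa reported in the table) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `majority_label` and `fleiss_kappa` and the toy `ratings` data are hypothetical, and the kappa computation follows the standard textbook formula for a fixed number of raters per item.

```python
from collections import Counter
from typing import Sequence


def majority_label(labels: Sequence[int]) -> int:
    """Return the majority among an odd number of binary (0/1) judgments."""
    if len(labels) % 2 == 0:
        raise ValueError("an odd number of labels is needed to avoid ties")
    return Counter(labels).most_common(1)[0][0]


def fleiss_kappa(ratings: Sequence[Sequence[int]],
                 categories: Sequence[int] = (0, 1)) -> float:
    """Fleiss' kappa for items that are each rated by the same number of raters.

    Undefined (division by zero) when every rating falls in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of raters who assigned category j to item i
    counts = [[list(row).count(c) for c in categories] for row in ratings]
    # Observed agreement for each item, averaged over all items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Share of all ratings per category, and the resulting chance agreement
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 context-response pairs, 3 labelers each
# (1 = proper response, 0 = not a proper response).
ratings = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
final_labels = [majority_label(r) for r in ratings]
kappa = fleiss_kappa(ratings)
```

On this toy data the majority vote yields one final label per pair, and `kappa` summarizes how far the three labelers agree beyond chance. The actual corpus labels are not reproduced here.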



As far as we know, this is the first human-labeled test set for retrieval-based chatbots. The entire corpus is available at https://www.dropbox.com/s/90t0qtji9ow20ca/DoubanConversaionCorpus.zip?dl=0

Variants: Douban, Douban Monti

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task                               Model            Paper                                                        Date
Recommendation Systems             ∞-AE             Infinite Recommendation Networks: A Data-Centric …           2022-06-03
Conversational Response Selection  Uni-Encoder      Uni-Encoder: A Fast and Accurate …                           2021-06-02
Conversational Response Selection  Uni-Enc+BERT-FP  Uni-Encoder: A Fast and Accurate …                           2021-06-02
Recommendation Systems             FedGNN           FedGNN: Federated Graph Neural Network …                     2021-02-09
Conversational Response Selection  SA-BERT+HCL      Dialogue Response Selection with Hierarchical …              2020-12-29
Conversational Response Selection  UMS_BERT+        Do Response Selection Models Really …                        2020-09-10
Conversational Response Selection  SA-BERT          Speaker-Aware BERT for Multi-Turn Response …                 2020-04-07
Conversational Response Selection  BERT             An Effective Domain Adaptive Post-Training …                 2019-08-13
Conversational Response Selection  Poly-encoder     Poly-encoders: Transformer Architectures and Pre-training …  2019-04-22
Recommendation Systems             DGRec            Session-based Social Recommendation via Dynamic …            2019-02-25
Link Prediction                    HSRL (DW)        Learning Topological Representation for Networks …           2019-02-15
Link Prediction                    Event2vec        Representation Learning for Heterogeneous Information …      2019-01-29
Conversational Response Selection  IMN              Interactive Matching Network for Multi-Turn …                2019-01-07
Conversational Response Selection  DUA              Modeling Multi-turn Conversation with Deep …                 2018-06-24
Conversational Response Selection  SMN              Sequential Matching Network: A New …                         2016-12-06
Recommendation Systems             U-CFN            Hybrid Recommender System based on …                         2016-06-24
Recommendation Systems             I-CFN            Hybrid Recommender System based on …                         2016-06-24

Research Papers

Recent papers with results on this dataset: