LCSTS

Dataset Information
Modalities: Texts
Languages: Chinese

Overview

LCSTS is a large-scale Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo and released to the public. The corpus consists of over 2 million real Chinese short texts, each with a short summary given by the author of the text. The dataset authors also manually tagged the relevance of 10,666 short summaries to their corresponding short texts.
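
As a rough illustration of how text-summary pairs with manual relevance tags might be handled, the following minimal sketch models one pair and filters by relevance. The class name `SummaryPair`, its field names, the helper `filter_by_relevance`, and the integer score with a threshold of 3 are all assumptions made for this example; they do not reflect the official file format or scoring scheme of the dataset, and the records are toy placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SummaryPair:
    """One short text with its author-written summary (hypothetical field names)."""
    text: str
    summary: str
    # Assumed integer score, higher meaning more relevant; None if not manually tagged.
    relevance: Optional[int] = None


def filter_by_relevance(pairs: Iterable[SummaryPair], min_score: int) -> List[SummaryPair]:
    """Keep only manually tagged pairs whose relevance is at least min_score."""
    return [p for p in pairs if p.relevance is not None and p.relevance >= min_score]


# Toy placeholder records, not real LCSTS entries.
toy_pairs = [
    SummaryPair("text A", "summary A", relevance=5),
    SummaryPair("text B", "summary B", relevance=2),
    SummaryPair("text C", "summary C"),  # untagged pair
]

# The threshold is an illustrative choice, not a value taken from the dataset.
kept = filter_by_relevance(toy_pairs, min_score=3)
print([p.summary for p in kept])
```

Untagged pairs are dropped by the filter here, which would suit building a small evaluation subset from the manually tagged portion; a training pipeline would more likely keep them.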

Source: LCSTS: A Large Scale Chinese Short Text Summarization Dataset

Variants: LCSTS

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Text Generation | BART (TextBox 2.0) | TextBox 2.0: A Text Generation … | 2022-12-26
Text Summarization | LSTM-seq2seq | LCSTS: A Large Scale Chinese … | 2015-06-19

Research Papers

Recent papers with results on this dataset: