LCSTS

Name: LCSTS
License: Custom (research-only)

Dataset Information

Modalities

Texts

Languages

Chinese

License

Custom (research-only)

Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

LCSTS is a large-scale Chinese short text summarization dataset, constructed from the Chinese microblogging website Sina Weibo and released to the public. The corpus consists of over 2 million real Chinese short texts, each paired with a short summary written by the author of the text. The authors also manually tagged the relevance of 10,666 short summaries to their corresponding short texts.

Source: LCSTS: A Large Scale Chinese Short Text Summarization Dataset

Variants: LCSTS

Associated Benchmarks

This dataset is used in 2 benchmarks:

Text Generation - Metrics: ROUGE-L
Text Summarization - Metrics: ROUGE-1
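Both benchmarks score system summaries with ROUGE. The sketch below shows how ROUGE-1 (clipped unigram overlap) and ROUGE-L (longest common subsequence) F-scores can be computed for a single summary pair. It is a minimal illustration under stated assumptions, not the official evaluation script: it tokenizes Chinese text into individual characters (an assumption; check the tokenization used by the paper you compare against), uses the balanced F1 score rather than a weighted F-measure, and the function names `rouge_1_f`, `lcs_length`, and `rouge_l_f` are hypothetical.

```python
from collections import Counter


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Balanced F1 from an overlap count (assumption: beta = 1)."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 over characters (assumed character-level tokenization)."""
    cand, ref = list(candidate), list(reference)
    # Counter intersection keeps the minimum count of each character,
    # so repeated characters are clipped to their count in the other text.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over characters: LCS length as the overlap count."""
    cand, ref = list(candidate), list(reference)
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    reference = "我爱北京天安门"
    candidate = "我爱北京"
    print(round(rouge_1_f(candidate, reference), 4))  # 0.7273 (= 8/11)
    print(round(rouge_l_f(candidate, reference), 4))  # 0.7273 (= 8/11)
```

The two metrics differ on word order: a candidate containing the right characters in the wrong order (for example `"京北爱我"` against `"我爱北京"`) gets a perfect ROUGE-1 score of 1.0 but a ROUGE-L score of only 0.25, because the longest common subsequence has length 1.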

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Generation	BART (TextBox 2.0)	TextBox 2.0: A Text Generation …	2022-12-26
Text Summarization	LSTM-seq2seq	LCSTS: A Large Scale Chinese …	2015-06-19

Research Papers

Recent papers with results on this dataset:

External Links:

LCSTS
