SST-2

Dataset Information
Introduced: 2013
License: Unknown

Overview

The Stanford Sentiment Treebank is a corpus of fully labeled parse trees that supports a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. The sentences were parsed with the Stanford parser, yielding a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

Binary classification experiments on full sentences group negative and somewhat negative sentences into one class and somewhat positive and positive sentences into the other, with neutral sentences discarded. This setting is referred to as SST-2 or SST binary.
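The binary setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the official preprocessing: the integer encoding of the five sentiment classes (0 to 4), the function names `to_binary_label` and `binarize`, and the toy sentences are all assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed five-way integer encoding (hypothetical; chosen for this example):
# 0 = negative, 1 = somewhat negative, 2 = neutral,
# 3 = somewhat positive, 4 = positive
NEUTRAL = 2


def to_binary_label(fine_label: int) -> Optional[int]:
    """Map a five-way label to 0 (negative) or 1 (positive); None for neutral."""
    if not 0 <= fine_label <= 4:
        raise ValueError(f"expected a label in 0..4, got {fine_label}")
    if fine_label == NEUTRAL:
        return None  # neutral sentences are discarded in the binary setting
    return 0 if fine_label < NEUTRAL else 1


def binarize(examples: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep only non-neutral sentences and attach their binary labels."""
    result = []
    for sentence, fine_label in examples:
        binary = to_binary_label(fine_label)
        if binary is not None:
            result.append((sentence, binary))
    return result


# Toy sentences invented for illustration; they are not taken from the corpus.
toy_examples = [
    ("a dull and lifeless film", 0),
    ("somewhat slow in places", 1),
    ("the film runs two hours", 2),
    ("a fairly charming story", 3),
    ("an absolute delight", 4),
]
binary_examples = binarize(toy_examples)
```

Returning `None` for the neutral class keeps the label mapping separate from the filtering step, so the same function can be reused to count how many sentences the binary setting drops.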

Variants: SST-2 Binary classification, SST-2 Binary classification Dev, SST2, sst2-es-mt

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Classification | OPT-1.3B | Achieving Dimension-Free Communication in Federated … | 2024-05-24
Classification | OPT-125M | Achieving Dimension-Free Communication in Federated … | 2024-05-24
Text Classification | DeBERTa | Transformers are Short Text Classifiers: … | 2022-11-30
Text Classification | BERT | Transformers are Short Text Classifiers: … | 2022-11-30

Research Papers

Recent papers with results on this dataset: