SICK

Sentences Involving Compositional Knowledge

Dataset Information
Modalities
Texts
Languages
English
Introduced
2014
License
Homepage

Overview

The Sentences Involving Compositional Knowledge (SICK) dataset is a benchmark for compositional distributional semantics. It contains a large number of sentence pairs that are rich in lexical, syntactic and semantic phenomena. Each pair is annotated along two dimensions: relatedness and entailment. The relatedness score ranges from 1 to 5 and is evaluated with Pearson’s r; the entailment relation is categorical, with three labels: entailment, contradiction, and neutral. There are 4439 pairs in the train split, 495 in the trial split (used for development), and 4906 in the test split, for 9840 pairs in total. The sentences are drawn from image and video caption datasets and then paired up algorithmically.

Source: Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
Image Source: https://www.researchgate.net/figure/Example-of-SICK-dataset-sentence-expansion-process-14_fig1_344863619
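
As a rough illustration of the two evaluation dimensions described in the overview, the sketch below computes Pearson's r for relatedness scores and plain accuracy for entailment labels. The function names, the toy scores, and the uppercase label strings are assumptions made for this example only; accuracy is assumed here as the entailment metric because the overview does not name one.

```python
import math


def pearson_r(gold, pred):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    # Sum of products of deviations (numerator) and sums of squared
    # deviations (denominator terms).
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_g * var_p)


def accuracy(gold_labels, pred_labels):
    """Fraction of pairs whose predicted label matches the gold label."""
    if len(gold_labels) != len(pred_labels) or not gold_labels:
        raise ValueError("need two equal-length, non-empty label sequences")
    correct = sum(g == p for g, p in zip(gold_labels, pred_labels))
    return correct / len(gold_labels)


# Hypothetical toy data, not taken from SICK itself.
gold_scores = [1.0, 2.5, 3.0, 4.2, 5.0]   # relatedness in the 1-5 range
pred_scores = [1.2, 2.0, 3.5, 4.0, 4.8]
gold_labels = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
pred_labels = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "NEUTRAL"]

print(f"Pearson r: {pearson_r(gold_scores, pred_scores):.3f}")
print(f"Accuracy:  {accuracy(gold_labels, pred_labels):.2f}")
```

In practice a library routine such as `scipy.stats.pearsonr` would typically replace the hand-written correlation; the explicit version is shown only to make the formula visible.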

Variants: SICK, SICK-R

Associated Benchmarks

This dataset is used in 4 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Tabular Data Generation | Binary Diffusion | Tabular Data Generation using Binary … | 2024-09-20
Semantic Textual Similarity | Rematch | Rematch: Robust and Efficient Matching … | 2024-04-02
Semantic Textual Similarity | PromptEOL+CSE+LLaMA-30B | Scaling Sentence Embeddings with Large … | 2023-07-31
Semantic Textual Similarity | PromptEOL+CSE+OPT-13B | Scaling Sentence Embeddings with Large … | 2023-07-31
Semantic Textual Similarity | PromptEOL+CSE+OPT-2.7B | Scaling Sentence Embeddings with Large … | 2023-07-31
Tabular Data Generation | GReaT | Language Models are Realistic Tabular … | 2022-10-12
Tabular Data Generation | Distill-GReaT | Language Models are Realistic Tabular … | 2022-10-12
Semantic Textual Similarity | PromCSE-RoBERTa-large (0.355B) | Improved Universal Sentence Embeddings with … | 2022-03-14
Semantic Textual Similarity | Trans-Encoder-RoBERTa-large-cross (unsup.) | Trans-Encoder: Unsupervised sentence-pair modelling through … | 2021-09-27
Semantic Textual Similarity | Trans-Encoder-BERT-large-cross (unsup.) | Trans-Encoder: Unsupervised sentence-pair modelling through … | 2021-09-27
Semantic Textual Similarity | Trans-Encoder-BERT-large-bi (unsup.) | Trans-Encoder: Unsupervised sentence-pair modelling through … | 2021-09-27
Semantic Textual Similarity | Trans-Encoder-BERT-base-cross (unsup.) | Trans-Encoder: Unsupervised sentence-pair modelling through … | 2021-09-27
Semantic Textual Similarity | Trans-Encoder-BERT-base-bi (unsup.) | Trans-Encoder: Unsupervised sentence-pair modelling through … | 2021-09-27
Natural Language Inference | NeuralLog | NeuralLog: Natural Language Inference with … | 2021-05-29
Semantic Textual Similarity | SimCSE-RoBERTalarge | SimCSE: Simple Contrastive Learning of … | 2021-04-18
Semantic Textual Similarity | Mirror-BERT-base (unsup.) | Fast, Effective, and Self-Supervised: Transforming … | 2021-04-16
Semantic Textual Similarity | Mirror-RoBERTa-base (unsup.) | Fast, Effective, and Self-Supervised: Transforming … | 2021-04-16
Semantic Textual Similarity | Dino (STSb/̄🦕) | Generating Datasets with Pretrained Language … | 2021-04-15
Semantic Textual Similarity | Dino (STS/̄🦕) | Generating Datasets with Pretrained Language … | 2021-04-15
Semantic Textual Similarity | BERTbase-flow (NLI) | On the Sentence Embeddings from … | 2020-11-02

Research Papers

Recent papers with results on this dataset: