RUSSE

Name: RUSSE
Published: 2018-03-15
License: MIT License

Russian Words in Context (based on RUSSE)

Dataset Information

Modalities: Texts
Languages: Russian
Introduced: 2018
License: MIT License

Overview

WiC: The Word-in-Context Dataset. A reliable benchmark for the evaluation of context-sensitive word embeddings.

Depending on its context, an ambiguous word can refer to multiple, potentially unrelated, meanings. Mainstream static word embeddings, such as Word2vec and GloVe, are unable to reflect this dynamic semantic nature. Contextualised word embeddings are an attempt at addressing this limitation by computing dynamic representations for words which can adapt based on context.

The Russian SuperGLUE task borrows its original data from the RUSSE project's Word Sense Induction and Disambiguation shared task (2018).

Task Type

Reading Comprehension. Binary Classification: true/false

Example

{
  "idx" : 8,
  "word" : "дорожка",
  "sentence1" : "Бурые ковровые дорожки заглушали шаги",
  "sentence2" : "Приятели решили выпить на дорожку в местном баре",
  "start1" : 15,
  "end1" : 23,
  "start2" : 26,
  "end2" : 34,
  "label" : false,
  "gold_sense1" : 1,
  "gold_sense2" : 2
}
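
The record above can be read with a few lines of Python. This is a minimal sketch, assuming the `start`/`end` fields are character offsets into the corresponding sentence. In this particular record each slice runs one character past the word, so the sketch strips surrounding whitespace. The helper name `target_span` is illustrative and not part of any official tooling.

```python
import json

# The example record from above, as a JSON string.
RECORD = """
{
  "idx": 8,
  "word": "дорожка",
  "sentence1": "Бурые ковровые дорожки заглушали шаги",
  "sentence2": "Приятели решили выпить на дорожку в местном баре",
  "start1": 15, "end1": 23,
  "start2": 26, "end2": 34,
  "label": false,
  "gold_sense1": 1,
  "gold_sense2": 2
}
"""


def target_span(example: dict, which: int) -> str:
    """Return the occurrence of the target word in sentence `which` (1 or 2).

    Assumes start/end are character offsets; strips whitespace because the
    slice in this record ends one character past the word.
    """
    sentence = example[f"sentence{which}"]
    start, end = example[f"start{which}"], example[f"end{which}"]
    return sentence[start:end].strip()


example = json.loads(RECORD)
span1 = target_span(example, 1)  # inflected form of the word in sentence 1
span2 = target_span(example, 2)  # inflected form of the word in sentence 2

# In this record the label agrees with comparing the two gold sense ids.
same_sense = example["gold_sense1"] == example["gold_sense2"]
```

For this record the extracted spans are the inflected forms "дорожки" and "дорожку", and `same_sense` matches the `label` field (`false`).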

How did we collect data?

All text examples were taken from the original RUSSE dataset, which was collected by Russian Semantic Evaluation at ACL SIGSLAV. Human assessment was carried out on Yandex.Toloka.

In version 2, we manually collected a test set in the same format.

Variants: RUSSE

Associated Benchmarks

This dataset is used in 1 benchmark:

Word Sense Disambiguation - Metrics: Accuracy
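
For a binary true/false task, accuracy is the share of examples whose predicted label matches the gold label. The sketch below shows that computation; the function name `accuracy` and the toy labels are assumptions made for illustration and do not come from the dataset or the benchmark's evaluation code.

```python
from typing import Sequence


def accuracy(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels, invented purely for illustration (not taken from the dataset).
gold_labels = [False, True, True, False]
predictions = [False, True, False, False]
score = accuracy(gold_labels, predictions)  # 3 of 4 predictions match
```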

Recent Benchmark Submissions

Task	Model	Paper	Date
Word Sense Disambiguation	heuristic majority	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Word Sense Disambiguation	majority_class	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Word Sense Disambiguation	Random weighted	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Word Sense Disambiguation	Human Benchmark	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
Word Sense Disambiguation	Baseline TF-IDF1.1	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
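
A majority-class baseline, like the `majority_class` entry in the table, generally predicts the most frequent training label for every test example. The sketch below shows that generic idea only; it is not necessarily the procedure used in the listed paper, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def fit_majority_class(train_labels: Sequence[bool]) -> bool:
    """Return the most frequent label among the training labels."""
    if not train_labels:
        raise ValueError("need at least one training label")
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(majority: bool, n_examples: int) -> List[bool]:
    """Predict the same (majority) label for every example."""
    return [majority] * n_examples


# Toy labels, invented purely for illustration (not taken from the dataset).
train_labels = [False, False, True, False, True]
test_labels = [False, True, False]

majority = fit_majority_class(train_labels)
predictions = predict_majority(majority, len(test_labels))

# Accuracy of the constant prediction on the toy test labels.
baseline_accuracy = sum(
    p == g for p, g in zip(predictions, test_labels)
) / len(test_labels)
```

A baseline of this kind ignores the sentences entirely, so its accuracy simply reflects the label balance of the evaluation set.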
