RuCoS

Name: RuCoS
Published: 2020-06-11
License: MIT License

Russian Reading Comprehension with Commonsense Reasoning

Dataset Information

Modalities: Texts
Languages: Russian
Introduced: 2020
License: MIT License

Overview

Russian Reading Comprehension with Commonsense Reasoning (RuCoS) is a large-scale reading comprehension dataset that requires commonsense reasoning. RuCoS consists of queries automatically generated from Russian-language news articles (the example below comes from Lenta). Each query contains a @placeholder token, and its answer is a text span, one of the entities marked in the summarizing passage of the corresponding article. The goal of RuCoS is to evaluate a machine's ability to apply commonsense reasoning in reading comprehension.

Example

  {'source': 'Lenta',
   'passage': {
          'text':
          'Мать двух мальчиков, брошенных отцом в московском аэропорту Шереметьево, забрала их. Об этом сообщили ТАСС в пресс-службе министерства образования и науки Хабаровского края. Сейчас младший ребенок посещает детский сад, а старший ходит в школу. В учебных заведениях с ними по необходимости работают штатные психологи. Также министерство социальной защиты населения рассматривает вопрос о бесплатном оздоровлении детей в летнее время. Через несколько дней после того, как Виктор Гаврилов бросил своих детей в аэропорту, он явился с повинной к следователям в городе Батайске Ростовской области.\n@context\nБросившего детей в Шереметьево отца задержали за насилие над женой\n@context\nРоссиянина заподозрили в истязании брошенных в Шереметьево детей\n@context\nОставивший двоих детей в Шереметьево россиянин сам пришел к следователям',
          'entities': [
              {'start': 60, 'end': 71, 'text': 'Шереметьево'},
              {'start': 102, 'end': 106, 'text': 'ТАСС'},
              {'start': 155, 'end': 172, 'text': 'Хабаровского края'},
              {'start': 470, 'end': 485, 'text': 'Виктор Гаврилов'},
              {'start': 563, 'end': 571, 'text': 'Батайске'},
              {'start': 572, 'end': 590, 'text': 'Ростовской области'},
              {'start': 620, 'end': 631, 'text': 'Шереметьево'},
              {'start': 725, 'end': 736, 'text': 'Шереметьево'},
              {'start': 778, 'end': 789, 'text': 'Шереметьево'}
          ]
   },
   'qas': [
       {
           'query': '26 января @placeholder бросил сыновей в возрасте пяти и семи лет в Шереметьево.',
           'answers': [
               {'start': 470, 'end': 485, 'text': 'Виктор Гаврилов'}
           ],
           'idx': 0
       }
   ],
   'idx': 0
  }
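
The record layout above can be exercised with a short Python sketch. It assumes that `start` and `end` are character offsets into the passage text with an exclusive end, which is consistent with the spans listed in the example (`end - start` equals the length of each entity string). The function names and the toy English record are hypothetical and are not part of any official RuCoS tooling.

```python
def candidate_entities(record):
    """Return the distinct entity strings marked in the passage, in order of first appearance."""
    text = record["passage"]["text"]
    seen = []
    for span in record["passage"]["entities"]:
        # Assumption: the end offset is exclusive, so a plain slice recovers the entity.
        surface = text[span["start"]:span["end"]]
        if surface not in seen:
            seen.append(surface)
    return seen


def fill_placeholder(query, entity):
    """Substitute one candidate entity for the @placeholder token."""
    return query.replace("@placeholder", entity)


# Toy record in the same layout (hypothetical English text, not taken from RuCoS).
toy = {
    "passage": {
        "text": "Anna moved to Riga. Later Anna met Boris.",
        "entities": [
            {"start": 0, "end": 4},
            {"start": 14, "end": 18},
            {"start": 26, "end": 30},
            {"start": 35, "end": 40},
        ],
    },
    "qas": [
        {
            "query": "@placeholder met Boris in Riga.",
            "answers": [{"start": 26, "end": 30}],
        }
    ],
}

candidates = candidate_entities(toy)
filled = [fill_placeholder(toy["qas"][0]["query"], c) for c in candidates]
```

A model would then score each filled query against the passage and return the best-fitting entity. Repeated mentions of the same entity, such as the four occurrences of Шереметьево in the example, collapse into a single candidate.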

How did we collect data?

All text examples were collected from open news sources and then automatically filtered with QA systems to keep trivially answerable questions out of the dataset. The texts were then filtered by the IPM (instances per million) frequency of the words they contain and, finally, manually reviewed.
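
The IPM step can be illustrated with a rough Python sketch. The description above does not say which reference corpus, threshold, or filtering direction was used, so everything here is an assumption: the sketch computes IPM from a toy token list and keeps a text only if every word reaches a hypothetical `min_ipm` value. The function names are illustrative as well.

```python
from collections import Counter


def ipm_table(corpus_tokens):
    """Map each word to its frequency in instances per million tokens."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def passes_ipm_filter(text_tokens, ipm, min_ipm):
    """Keep a text only if every word reaches the (hypothetical) min_ipm threshold."""
    # Words missing from the reference table are treated as having zero frequency.
    return all(ipm.get(token, 0.0) >= min_ipm for token in text_tokens)


# Toy reference corpus: 'мать' occurs 3 times out of 4 tokens, 'дети' once.
ipm = ipm_table(["мать", "мать", "мать", "дети"])

keep_common = passes_ipm_filter(["мать"], ipm, min_ipm=300_000)
keep_mixed = passes_ipm_filter(["мать", "дети"], ipm, min_ipm=300_000)
```

In practice the table would come from a large reference corpus or a published frequency dictionary, and the texts would be lemmatized first, since Russian word forms vary heavily by case and number.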

Variants: RuCoS

Associated Benchmarks

This dataset is used in 1 benchmark:

Common Sense Reasoning - Metrics: Average F1, EM
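
The listing names the metrics but does not define them. A minimal sketch follows, assuming the usual span-answer definitions: EM is an exact string match after light normalization, and F1 is the harmonic mean of token-level precision and recall, each taken against the best-matching gold answer. The normalization rules and function names are assumptions, and the official scorer may differ.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_query(prediction, gold_answers):
    """Score against the best-matching gold answer, since a query may list several."""
    em = max(exact_match(prediction, g) for g in gold_answers)
    f1 = max(token_f1(prediction, g) for g in gold_answers)
    return em, f1


partial_em, partial_f1 = score_query("Гаврилов", ["Виктор Гаврилов"])
full_em, full_f1 = score_query("виктор гаврилов", ["Виктор Гаврилов"])
```

Averaging the per-query values over the whole evaluation set gives the dataset-level EM and F1. A prediction that covers only part of the gold span, as in the first call, earns no EM credit but a non-zero F1.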

Recent Benchmark Submissions

Task	Model	Paper	Date
Common Sense Reasoning	heuristic majority	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Common Sense Reasoning	Random weighted	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Common Sense Reasoning	majority_class	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Common Sense Reasoning	Human Benchmark	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
Common Sense Reasoning	Baseline TF-IDF1.1	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
Common Sense Reasoning	MT5 Large	mT5: A massively multilingual pre-trained …	2020-10-22
