LiDiRus

Linguistic Diagnostic for Russian

Dataset Information
Modalities: Texts
Languages: Russian
Introduced: 2020

Overview

LiDiRus is a diagnostic dataset that covers a wide range of linguistic phenomena while letting you evaluate systems on a simple textual entailment recognition task. More details are available on the diagnostics page.

Task Type

RTE (Recognizing Textual Entailment): sentence pair classification with two labels, Entailment and Not Entailment.

Example

{
    'sentence1': "Кошка сидела на коврике.",
    'sentence2': "Кошка не сидела на коврике.",
    'label': 'not_entailment',
    'knowledge': '',
    'lexical-semantics': '',
    'logic': 'Negation',
    'predicate-argument-structure': ''
}
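
As a rough illustration of how a record with this schema might be consumed, the Python sketch below reads the linguistic annotations from a record and scores a toy predictor. It is not part of the dataset tooling. The `entailment` label string, the second record, and the helper names `annotated_phenomena`, `negation_heuristic`, and `accuracy` are assumptions made for this example, and plain accuracy is used only for simplicity.

```python
# Annotation fields taken from the example record above.
CATEGORY_FIELDS = (
    "knowledge",
    "lexical-semantics",
    "logic",
    "predicate-argument-structure",
)

records = [
    {   # the example record shown above
        "sentence1": "Кошка сидела на коврике.",
        "sentence2": "Кошка не сидела на коврике.",
        "label": "not_entailment",
        "knowledge": "",
        "lexical-semantics": "",
        "logic": "Negation",
        "predicate-argument-structure": "",
    },
    {   # hypothetical record, invented for illustration only;
        # the 'entailment' label string is an assumption
        "sentence1": "Кошка сидела на коврике.",
        "sentence2": "Кошка сидела.",
        "label": "entailment",
        "knowledge": "",
        "lexical-semantics": "",
        "logic": "",
        "predicate-argument-structure": "",
    },
]


def annotated_phenomena(record):
    """Return only the annotation fields that are non-empty for this record."""
    return {f: record[f] for f in CATEGORY_FIELDS if record.get(f)}


def negation_heuristic(sentence1, sentence2):
    """Toy predictor (hypothetical): flag a pair as not_entailment when
    sentence2 contains the particle 'не' and sentence1 does not."""
    words1 = sentence1.lower().split()
    words2 = sentence2.lower().split()
    adds_negation = "не" in words2 and "не" not in words1
    return "not_entailment" if adds_negation else "entailment"


def accuracy(records, predict):
    """Fraction of records whose predicted label matches the gold label."""
    correct = sum(
        predict(r["sentence1"], r["sentence2"]) == r["label"] for r in records
    )
    return correct / len(records)


if __name__ == "__main__":
    for r in records:
        print(r["label"], annotated_phenomena(r))
    print("accuracy:", accuracy(records, negation_heuristic))
```

Grouping predictions by the non-empty annotation fields is what makes the dataset diagnostic: the same helper can be used to report a score per phenomenon rather than a single overall number.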

How did we collect data?

All text examples were manually translated and adapted from the English SuperGLUE Diagnostics.

Variants: LiDiRus

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Natural Language Inference | heuristic majority | Unreasonable Effectiveness of Rule-Based Heuristics … | 2021-05-03
Natural Language Inference | Random weighted | Unreasonable Effectiveness of Rule-Based Heuristics … | 2021-05-03
Natural Language Inference | majority_class | Unreasonable Effectiveness of Rule-Based Heuristics … | 2021-05-03
Natural Language Inference | Human Benchmark | RussianSuperGLUE: A Russian Language Understanding … | 2020-10-29
Natural Language Inference | Baseline TF-IDF1.1 | RussianSuperGLUE: A Russian Language Understanding … | 2020-10-29
Natural Language Inference | MT5 Large | mT5: A massively multilingual pre-trained … | 2020-10-22

Research Papers

Recent papers with results on this dataset: