LiDiRus

Name: LiDiRus
Published: 2020-10-29
License: MIT License

Linguistic Diagnostic for Russian

Dataset Information

Modalities

Texts

Languages

Russian

Introduced

2020


Homepage

Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

LiDiRus is a diagnostic dataset that covers a wide range of linguistic phenomena while letting you evaluate systems on a simple textual entailment recognition test. See the diagnostics documentation for more details.

Task Type

RTE (Recognizing Textual Entailment): sentence-pair classification with two labels:
- Entailment
- Not Entailment

Example

{
    'sentence1': "Кошка сидела на коврике.",     # "The cat sat on the mat."
    'sentence2': "Кошка не сидела на коврике.",  # "The cat did not sit on the mat."
    'label': 'not_entailment',
    'knowledge': '',
    'lexical-semantics': '',
    'logic': 'Negation',
    'predicate-argument-structure': ''
}
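Each record carries four diagnostic category fields alongside the label, which is what makes per-phenomenon error analysis possible. The sketch below groups records by (category, tag) so that scores can later be computed per phenomenon. It assumes, as in the example record, that each category field holds a single tag string or '' when untagged; the function name `group_by_phenomenon` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Category fields as they appear in the example record (assumed to be the full set).
CATEGORY_FIELDS = ("knowledge", "lexical-semantics", "logic", "predicate-argument-structure")


def group_by_phenomenon(records: List[dict]) -> Dict[Tuple[str, str], List[int]]:
    """Map (category, tag) to the indices of the records carrying that tag.

    Assumption: each category field holds one tag string, or '' when untagged.
    """
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        for field in CATEGORY_FIELDS:
            tag = rec.get(field, "")
            if tag:  # empty string means the record is not tagged in this category
                groups[(field, tag)].append(i)
    return dict(groups)


records = [
    {"label": "not_entailment", "knowledge": "", "lexical-semantics": "",
     "logic": "Negation", "predicate-argument-structure": ""},
    # Hypothetical second record, for illustration only.
    {"label": "entailment", "lexical-semantics": "Quantifiers", "logic": "Negation"},
]
print(group_by_phenomenon(records))
```

Per-group scores can then be obtained by indexing gold labels and predictions with each group's indices and applying the benchmark metric to each subset.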

How did we collect data?

All examples were manually translated and adapted from the English SuperGLUE Diagnostics dataset.

Variants: LiDiRus

Associated Benchmarks

This dataset is used in one benchmark:

Natural Language Inference (metric: MCC, Matthews correlation coefficient)
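MCC summarizes the binary confusion matrix in a single score from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction): MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it from string labels. It assumes the two labels are 'entailment' and 'not_entailment' as in the example record, treats any label other than `positive` as negative, and follows the common convention (also used by scikit-learn's `matthews_corrcoef`) of returning 0 when the denominator is zero.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[str], y_pred: Sequence[str],
                      positive: str = "entailment") -> float:
    """Binary Matthews correlation coefficient over string labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A zero denominator means some marginal is empty, e.g. a classifier that
    # always predicts the same label; report 0.0 by convention.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


gold = ["entailment", "entailment", "not_entailment", "not_entailment"]
pred = ["entailment", "not_entailment", "not_entailment", "not_entailment"]
print(matthews_corrcoef(gold, pred))
```

Because the formula is symmetric under swapping the positive and negative classes, the choice of `positive` does not change the score. Unlike accuracy, MCC does not reward always predicting the more frequent label, which is why it suits a diagnostic set whose label distribution may be unbalanced.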

Recent Benchmark Submissions

Task	Model	Paper	Date
Natural Language Inference	heuristic majority	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Natural Language Inference	Random weighted	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Natural Language Inference	majority_class	Unreasonable Effectiveness of Rule-Based Heuristics …	2021-05-03
Natural Language Inference	Human Benchmark	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
Natural Language Inference	Baseline TF-IDF1.1	RussianSuperGLUE: A Russian Language Understanding …	2020-10-29
Natural Language Inference	MT5 Large	mT5: A massively multilingual pre-trained …	2020-10-22
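Several of the submissions above are trivial reference points. Judging only by their names, `majority_class` most likely predicts the most frequent label for every example, and `Random weighted` samples labels in proportion to their frequency; the papers' exact procedures may differ, and `heuristic majority` is not reproduced here. The sketch below implements those two readings. The `train_labels` input is an assumption: LiDiRus itself is a diagnostic set, so the label distribution would have to come from some separate training data, which this page does not specify.

```python
import random
from collections import Counter
from typing import List, Sequence


def majority_baseline(train_labels: Sequence[str], n: int) -> List[str]:
    """Predict the most frequent training label for all n test examples."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n


def random_weighted_baseline(train_labels: Sequence[str], n: int,
                             seed: int = 0) -> List[str]:
    """Sample n labels independently, weighted by training label frequency."""
    counts = Counter(train_labels)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return rng.choices(labels, weights=weights, k=n)


train = ["not_entailment"] * 3 + ["entailment"]  # hypothetical label distribution
print(majority_baseline(train, 4))
print(random_weighted_baseline(train, 5))
```

Baselines like these bound what a model must beat before its diagnostic scores say anything about linguistic competence.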

Research Papers

Recent papers with results on this dataset:
