MedNLI

Medical Natural Language Inference

Dataset Information
Modalities: Texts, Medical
License: Unknown
Homepage

Overview

The MedNLI dataset consists of sentence pairs developed by physicians from the Past Medical History section of MIMIC-III clinical notes and annotated as Definitely True, Maybe True, or Definitely False. The dataset contains 11,232 training, 1,395 development, and 1,422 test instances, providing a natural language inference (NLI) task grounded in patients' medical histories.
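The split sizes and three-way labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the official loader: the JSON Lines layout and the field names `sentence1`, `sentence2`, and `gold_label` are assumptions based on common NLI releases, the mapping from the annotation wording to entailment/neutral/contradiction follows the usual NLI convention, and the sample record is invented for illustration and is not taken from MIMIC-III.

```python
import json
from dataclasses import dataclass

# Split sizes exactly as stated in the overview.
SPLIT_SIZES = {"train": 11_232, "dev": 1_395, "test": 1_422}

# Assumed mapping from the annotation wording to conventional NLI labels.
ANNOTATION_TO_NLI = {
    "Definitely True": "entailment",
    "Maybe True": "neutral",
    "Definitely False": "contradiction",
}


@dataclass(frozen=True)
class NLIExample:
    premise: str      # sentence drawn from the clinical note
    hypothesis: str   # sentence written by the annotator
    label: str        # one of the values in ANNOTATION_TO_NLI


def parse_jsonl(text):
    """Parse JSON Lines text into NLIExample objects.

    The field names used here are assumptions about the file layout.
    """
    examples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        examples.append(
            NLIExample(
                premise=record["sentence1"],
                hypothesis=record["sentence2"],
                label=record["gold_label"],
            )
        )
    return examples


def split_fractions(sizes):
    """Return each split's share of the total number of instances."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


# Invented record for illustration only; not real patient data.
SAMPLE = json.dumps(
    {
        "sentence1": "History of hypertension, managed with medication.",
        "sentence2": "The patient has high blood pressure.",
        "gold_label": ANNOTATION_TO_NLI["Definitely True"],
    }
)

if __name__ == "__main__":
    print(sum(SPLIT_SIZES.values()))
    print(split_fractions(SPLIT_SIZES))
    print(parse_jsonl(SAMPLE))
```

The three splits add up to 14,049 sentence pairs, so the division is roughly 80/10/10 between training, development, and test.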

Source: MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
Image Source: https://arxiv.org/abs/1904.02181

Variants: MedNLI

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Natural Language Inference | ClinicalMosaic | Patient Trajectory Prediction: Integrating Clinical … | 2025-02-25
Natural Language Inference | BiomedGPT-B | BiomedGPT: A Generalist Vision-Language Foundation … | 2023-05-26
Few-Shot Learning | CoT-T5-11B (1024 Shot) | The CoT Collection: Improving Zero-shot … | 2023-05-23
Natural Language Inference | SciFive-large | SciFive: a text-to-text transformer model … | 2021-05-28
Natural Language Inference | CharacterBERT (base, medical) | CharacterBERT: Reconciling ELMo and BERT … | 2020-10-20
Natural Language Inference | NCBI_BERT(base) (P+M) | Transfer Learning in Biomedical Natural … | 2019-06-13

Research Papers

Recent papers with results on this dataset: