MIMIC-III

The Medical Information Mart for Intensive Care III

Dataset Information
Modalities: Medical, Tabular
Languages: English
Introduced: 2016
License: MIT

Overview

The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, publicly available collection of medical records. Each record includes ICD-9 codes, which identify the diagnoses made and the procedures performed. The codes are hierarchical: each top-level code is divided into sub-codes, which often capture specific circumstantial details. The dataset consists of 112,000 clinical reports (average length 709.3 tokens) and 1,159 top-level ICD-9 codes, and each report is assigned 7.6 codes on average. The data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
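The code hierarchy and label statistics described above can be illustrated with a short Python sketch. It makes several assumptions. Diagnosis codes are stored as strings without a decimal point, for example "4019" for 401.9. A top-level code is the 3-character ICD-9-CM category, or 4 characters for E codes. The sketch covers diagnosis codes only. The helper names and toy reports are hypothetical, and it does not reproduce whatever preprocessing produced the 112,000-report and 1,159-code figures.

```python
from __future__ import annotations


def icd9_dx_category(code: str) -> str:
    """Roll an ICD-9-CM diagnosis sub-code up to its top-level category.

    Assumption: codes carry no decimal point (e.g. "4019" for 401.9).
    Numeric and V codes have 3-character categories; E codes have 4.
    """
    code = code.strip().upper()
    return code[:4] if code.startswith("E") else code[:3]


def to_categories(codes: list[str]) -> set[str]:
    """Collapse one report's sub-codes into its set of distinct top-level codes."""
    return {icd9_dx_category(c) for c in codes}


def average_codes_per_report(reports: dict[str, list[str]]) -> float:
    """Label cardinality: mean number of distinct top-level codes per report."""
    sizes = [len(to_categories(codes)) for codes in reports.values()]
    return sum(sizes) / len(sizes)


def multi_hot(codes: list[str], label_space: list[str]) -> list[int]:
    """Encode one report's top-level codes as a 0/1 vector over label_space."""
    cats = to_categories(codes)
    return [1 if label in cats else 0 for label in label_space]


# Hypothetical toy reports (not real MIMIC-III records).
reports = {
    "r1": ["4019", "4280", "42731"],
    "r2": ["25000", "4019", "V5867"],
    "r3": ["E8497", "41401", "41400"],  # 414.01 and 414.00 share category 414
}
label_space = sorted(set().union(*(to_categories(c) for c in reports.values())))

print(label_space)                            # ['250', '401', '414', '427', '428', 'E849', 'V58']
print(average_codes_per_report(reports))      # 2.6666666666666665
print(multi_hot(reports["r1"], label_space))  # [0, 1, 0, 1, 1, 0, 0]
```

Multi-hot vectors over a fixed label space are a common target format for multi-label code prediction. Collapsing sub-codes before counting is why two distinct sub-codes in "r3" contribute only one top-level label.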

The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

Source: MIT Laboratory for Computational Physiology

Variants: MIMIC-III

Associated Benchmarks

This dataset is used in 7 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Medical Code Prediction | GKI-ICD | A General Knowledge Injection Framework … | 2025-05-24
Medical Code Prediction | PLM-CA | An Unsupervised Approach to Achieve … | 2024-06-13
Multivariate Time Series Forecasting | FLD | Functional Latent Dynamics for Irregularly … | 2024-05-06
Multivariate Time Series Forecasting | GraFITi | Forecasting Irregularly Sampled Time Series … | 2023-05-22
Medical Code Prediction | MSMN+KEPTLongformer | Knowledge Injected Prompt Based Fine-tuning … | 2022-10-07
Length-of-Stay prediction | EHR-Graph Transformer | Unsupervised Pre-Training on Patient Population … | 2022-03-23
Length-of-Stay prediction | EHR-Graph Transformer (pre-trained) | Unsupervised Pre-Training on Patient Population … | 2022-03-23
Medical Code Prediction | MSMN | Code Synonyms Do Matter: Multiple … | 2022-03-03
Multivariate Time Series Forecasting | Neural Flows | Neural Flows: Efficient Alternative to … | 2021-10-25
Medical Code Prediction | RAC | Read, Attend, and Code: Pushing … | 2021-07-10
Mortality Prediction | ELECTRA (pretrained) | MeDAL: Medical Abbreviation Disambiguation Dataset … | 2020-12-27
Mortality Prediction | ELECTRA (from scratch) | MeDAL: Medical Abbreviation Disambiguation Dataset … | 2020-12-27
Mortality Prediction | LSTM (pretrained) | MeDAL: Medical Abbreviation Disambiguation Dataset … | 2020-12-27
Mortality Prediction | LSTM+SA (from scratch) | MeDAL: Medical Abbreviation Disambiguation Dataset … | 2020-12-27
Mortality Prediction | LSTM+SA (pretrained) | MeDAL: Medical Abbreviation Disambiguation Dataset … | 2020-12-27
Medical Code Prediction | HAN | Explainable Automated Coding of Clinical … | 2020-10-29
Multi-Label Text Classification | HAN | Explainable Automated Coding of Clinical … | 2020-10-29
Multi-Label Text Classification | HLAN | Explainable Automated Coding of Clinical … | 2020-10-29
Multi-Label Classification Of Biomedical Texts | Convolutional Neural Network with per-label Attention | Predicting Multiple ICD-10 Codes from … | 2020-07-29
Medical Code Prediction | LAAT | A Label Attention Model for … | 2020-07-13
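The table does not state which metrics these submissions report. Multi-label ICD coding work on MIMIC-III commonly reports micro-averaged F1 among other metrics, so here is a minimal Python sketch of that computation over multi-hot label matrices. It is a generic illustration, not the evaluation script of any listed paper. The function name and toy matrices are hypothetical.

```python
from __future__ import annotations


def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    TP/FP/FN are pooled over every (report, code) cell of the multi-hot
    matrices, then F1 = 2*TP / (2*TP + FP + FN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of reports")
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("every row must cover the same label space")
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1  # code assigned and predicted
            elif p:
                fp += 1  # predicted but not assigned
            elif t:
                fn += 1  # assigned but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 2 reports, 5 codes.
y_true = [[0, 1, 0, 1, 1], [1, 0, 0, 0, 1]]
y_pred = [[0, 1, 1, 1, 0], [1, 0, 0, 0, 1]]
print(micro_f1(y_true, y_pred))  # 0.8  (TP=4, FP=1, FN=1)
```

Micro-averaging weights every code assignment equally, so frequent codes dominate the score. Macro-averaged F1, which averages per-code scores, is often reported alongside it to reflect performance on rare codes.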

Research Papers

Recent papers with results on this dataset: