MedSecId

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2022
License: Unknown

Overview

Section identification is the process of demarcating and labeling the sections of a document. Such sections help readers search for information and place specific topics in context. The goal of this work is to segment clinical documentation in the medical domain into sections. The primary contribution is MedSecId, a publicly available set of 2,002 fully annotated medical notes from MIMIC-III. We include several baselines, source code, a pretrained model, and an analysis of the data that uses principal component analysis to show a relationship between medical concepts across sections.
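
To make the task concrete, the following is a minimal sketch of section identification using a naive header rule. It is not one of the baselines shipped with MedSecId. The header pattern, the `split_sections` helper, and the sample note are all hypothetical and written for illustration only; no MIMIC-III text is used.

```python
import re

# Assumption: a section header is a short capitalized phrase followed by a colon,
# e.g. "History of Present Illness:". Real clinical notes are far less regular.
HEADER = re.compile(r"^([A-Z][A-Za-z /]{2,40}):[ \t]*(.*)$")


def split_sections(note: str) -> list[tuple[str, str]]:
    """Split a note into (label, text) pairs using the naive header rule."""
    sections = []
    label, body = "preamble", []
    for raw_line in note.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            # Close the previous section before starting a new one.
            if body or label != "preamble":
                sections.append((label, " ".join(body)))
            label = match.group(1).strip().lower()
            body = [match.group(2)] if match.group(2) else []
        elif line:
            body.append(line)
    if body or label != "preamble":
        sections.append((label, " ".join(body)))
    return sections


# Synthetic example note (invented for this sketch).
note = """Chief Complaint: chest pain
History of Present Illness:
Patient reports pain for two days.
Worse on exertion.
Medications: aspirin
"""

sections = split_sections(note)
for section_label, section_text in sections:
    print(f"{section_label}: {section_text}")
```

A rule like this treats any short capitalized phrase ending in a colon as a header, so a body line such as "Note: ..." would be mislabeled. Failures of that kind are one reason annotated data and learned models are useful for this task.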

Variants: MedSecId

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Classification | GPT-4 | LLM-Based Section Identifiers Excel on … | 2024-04-25

Research Papers

Recent papers with results on this dataset: