Ohsumed

Dataset Information
Modalities: Texts
Languages: English
License: Unknown

Overview

Ohsumed contains medical abstracts from the MeSH categories for the year 1991. In [Joachims, 1997], the first 20,000 documents were used, split into 10,000 for training and 10,000 for testing. The task was to classify abstracts into the 23 cardiovascular disease categories. After restricting the collection to this category subset, the number of unique abstracts is 13,929 (6,286 for training and 7,643 for testing). Because current computers can easily handle larger collections, we make available all 34,389 cardiovascular disease abstracts out of the 50,216 medical abstracts from 1991.

Source: http://disi.unitn.it/moschitti/corpora.htm
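The overview reports counts of unique abstracts, which suggests that one abstract can be filed under more than one category. The sketch below shows one way such counts could be checked after downloading the corpus. It assumes a layout with one folder per category and one file per abstract, named by its document number. That layout and the helper names `load_split` and `summarize` are assumptions made for illustration, not something the source confirms.

```python
from collections import defaultdict
from pathlib import Path


def load_split(split_dir):
    """Map each document id to the set of category folders it appears in.

    Assumed (hypothetical) layout: split_dir/<category>/<document id>.
    """
    labels = defaultdict(set)
    for category_dir in sorted(Path(split_dir).iterdir()):
        if not category_dir.is_dir():
            continue
        for doc_path in category_dir.iterdir():
            if doc_path.is_file():
                # The same file name under two folders counts as one document
                # carrying two labels.
                labels[doc_path.name].add(category_dir.name)
    return dict(labels)


def summarize(labels):
    """Return (unique documents, category assignments, multi-label documents)."""
    unique = len(labels)
    assignments = sum(len(cats) for cats in labels.values())
    multi = sum(1 for cats in labels.values() if len(cats) > 1)
    return unique, assignments, multi
```

Calling `summarize(load_split(path))` on a training or test folder would give the unique-document count to compare against the figures quoted above. Any mismatch would more likely point to a different directory layout than to an error in the published numbers.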

Variants: Ohsumed

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

| Task | Model | Paper | Date |
| --- | --- | --- | --- |
| Text Classification | RoBERTaGCN | BertGCN: Transductive Text Classification by … | 2021-05-12 |
| Text Classification | REL-RWMD k-NN | Speeding up Word Mover's Distance … | 2019-12-01 |
| Text Classification | Our Model* | Text Level Graph Neural Network … | 2019-10-06 |
| Text Classification | GraphStar | Graph Star Net for Generalized … | 2019-06-21 |
| Text Classification | ApproxRepSet | Rep the Set: Neural Networks … | 2019-04-03 |
| Text Classification | SGC | Simplifying Graph Convolutional Networks | 2019-02-19 |
| Text Classification | SGCN | Simplifying Graph Convolutional Networks | 2019-02-19 |
| Text Classification | Text GCN | Graph Convolutional Networks for Text … | 2018-09-15 |
| Text Classification | CNN+Lowercased | On the Role of Text … | 2017-07-06 |

Research Papers

Recent papers with results on this dataset: