Ohsumed includes medical abstracts from the MeSH categories of the year 1991. [Joachims, 1997] used the first 20,000 documents, split into 10,000 for training and 10,000 for testing. The specific task was to categorize abstracts into the 23 cardiovascular disease categories. After selecting this category subset, the number of unique abstracts is 13,929 (6,286 for training and 7,643 for testing). Because current computers can easily handle a larger number of documents, we make available all 34,389 cardiovascular disease abstracts out of the 50,216 medical abstracts from 1991.
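The split described above (take the first 20,000 documents, divide them 10,000/10,000, then keep only unique abstracts in the target categories) can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing script: the `Abstract` record, its field names, and the `split_and_filter` helper are hypothetical, and the on-disk layout of the dataset is not specified here, so loading is left out.

```python
from typing import FrozenSet, List, NamedTuple, Sequence, Tuple


class Abstract(NamedTuple):
    """Hypothetical in-memory record for one abstract (not an official schema)."""
    doc_id: str                 # assumed unique identifier of the abstract
    categories: FrozenSet[str]  # assumed set of category labels for the abstract
    text: str


def split_and_filter(
    docs: Sequence[Abstract],
    target_categories: FrozenSet[str],
    n_train: int = 10_000,
    n_test: int = 10_000,
) -> Tuple[List[Abstract], List[Abstract]]:
    """Split the first n_train + n_test documents by position, keeping only
    unique abstracts that carry at least one target category."""
    head = docs[: n_train + n_test]
    seen = set()
    train: List[Abstract] = []
    test: List[Abstract] = []
    for position, doc in enumerate(head):
        # Skip repeated abstracts and abstracts outside the category subset.
        if doc.doc_id in seen or not (doc.categories & target_categories):
            continue
        seen.add(doc.doc_id)
        # Documents in the first n_train positions go to training, the rest to testing.
        (train if position < n_train else test).append(doc)
    return train, test
```

Applied to the real collection with the 23 cardiovascular disease categories as `target_categories`, a procedure of this kind would be expected to yield subsets comparable to the 6,286 training and 7,643 test abstracts reported above, although the exact deduplication rule used originally is an assumption here.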
Variants: Ohsumed
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Text Classification | RoBERTaGCN | BertGCN: Transductive Text Classification by … | 2021-05-12 |
Text Classification | REL-RWMD k-NN | Speeding up Word Mover's Distance … | 2019-12-01 |
Text Classification | Our Model* | Text Level Graph Neural Network … | 2019-10-06 |
Text Classification | GraphStar | Graph Star Net for Generalized … | 2019-06-21 |
Text Classification | ApproxRepSet | Rep the Set: Neural Networks … | 2019-04-03 |
Text Classification | SGC | Simplifying Graph Convolutional Networks | 2019-02-19 |
Text Classification | SGCN | Simplifying Graph Convolutional Networks | 2019-02-19 |
Text Classification | Text GCN | Graph Convolutional Networks for Text … | 2018-09-15 |
Text Classification | CNN+Lowercased | On the Role of Text … | 2017-07-06 |
Recent papers with results on this dataset: