LoT-insts

Long-Tailed Institution Names

Dataset Information
Languages: English
Introduced: 2023
License: Unknown

Overview

LoT-insts contains over 25k classes whose frequencies are naturally long-tail distributed. Its test set comprises four subsets: many-, medium-, and few-shot sets, as well as a zero-shot open set. To the best of our knowledge, this is the first natural language dataset that focuses on this long-tailed and open classification problem.
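
As a rough illustration of how the four test subsets relate to class frequency, the sketch below buckets test labels by how often each class appears in the training data. The thresholds (MANY_MIN, FEW_MAX), the function name bucket_classes, and the toy institution names are all assumptions made for this example; the cut-offs actually used to build LoT-insts are not stated here.

```python
from collections import Counter

# Hypothetical thresholds: the real LoT-insts cut-offs are not given in this page.
MANY_MIN = 20  # at least this many training examples -> many-shot
FEW_MAX = 5    # fewer than this many training examples -> few-shot


def bucket_classes(train_labels, test_labels):
    """Group test labels into many/medium/few/zero-shot sets by training frequency."""
    counts = Counter(train_labels)
    buckets = {"many": set(), "medium": set(), "few": set(), "zero": set()}
    for label in set(test_labels):
        n = counts.get(label, 0)
        if n == 0:
            # Class never seen in training: belongs to the zero-shot open set.
            buckets["zero"].add(label)
        elif n < FEW_MAX:
            buckets["few"].add(label)
        elif n < MANY_MIN:
            buckets["medium"].add(label)
        else:
            buckets["many"].add(label)
    return buckets


# Toy data with made-up institution names (not taken from the dataset).
train = ["MIT"] * 25 + ["Tsinghua University"] * 8 + ["Small College"] * 2
test = ["MIT", "Tsinghua University", "Small College", "Unseen Institute"]
buckets = bucket_classes(train, test)
```

Reporting accuracy separately for each bucket is a common way to see whether a classifier trades tail-class performance for head-class performance; the zero-shot bucket additionally requires a model that can reject or match classes it never saw in training.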

Variants: Lot-insts

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Text Classification | Character-BERT+RS | Text Classification in the Wild: … | 2023-02-19
Text Classification | CD-V1 | Text Classification in the Wild: … | 2023-02-19
Text Classification | sCool | Text Classification in the Wild: … | 2023-02-19
Text Classification | FastText | Text Classification in the Wild: … | 2023-02-19
Text Classification | Naive Bayes | Text Classification in the Wild: … | 2023-02-19
Long-tail Learning | Character-BERT+RS | Text Classification in the Wild: … | 2023-02-19

Research Papers

Recent papers with results on this dataset: