RAFT

Real-world Annotated Few-shot Tasks

Dataset Information
Introduced: 2021
License: Unknown
Homepage

Overview

The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment.

RAFT is a few-shot classification benchmark that tests language models:

  • across multiple domains (literature reviews, medical data, tweets, customer interactions, etc.)
  • on economically valuable classification tasks (someone inherently cares about the task)
  • with evaluation that mirrors deployment (50 labeled examples per task, information retrieval allowed, hidden test set)

Description from: https://raft.elicit.org/


Variants: RAFT

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                         | Model               | Paper                                                | Date
Few-Shot Text Classification | T-Few               | Few-Shot Parameter-Efficient Fine-Tuning is Better … | 2022-05-11
Few-Shot Text Classification | Human (crowdsourced) | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | GPT-3               | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | AdaBoost            | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | GPT-Neo             | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | GPT-2               | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | BART MNLI zero-shot | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | Plurality-class     | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
Few-Shot Text Classification | GPT-3 zero-shot     | RAFT: A Real-World Few-Shot Text …                   | 2021-09-28
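
The Plurality-class entry among the submissions is the simplest kind of baseline: it predicts the most frequent label seen in the labeled training examples for every test item. The Python sketch below illustrates that idea under stated assumptions; the function names, the label strings, and the 32/18 label split are hypothetical and are not taken from the RAFT data or its evaluation code.

```python
from collections import Counter


def plurality_class(train_labels):
    """Return the most frequent label among the labeled training examples."""
    if not train_labels:
        raise ValueError("need at least one labeled example")
    # most_common(1) returns [(label, count)]; on ties, the label that
    # appeared first in train_labels wins.
    return Counter(train_labels).most_common(1)[0][0]


def predict_all(train_labels, n_test):
    """Predict the plurality label for each of n_test unlabeled items."""
    label = plurality_class(train_labels)
    return [label] * n_test


# Hypothetical toy task with 50 labeled examples (made-up label names).
train_labels = ["complaint"] * 32 + ["no complaint"] * 18
predictions = predict_all(train_labels, n_test=5)
print(predictions[0])
```

Because this baseline ignores the input text entirely, it mainly serves as a floor: a model that does not beat it on a task has learned little beyond the label distribution.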

Research Papers

Recent papers with results on this dataset: