Natural Questions

Question Answering Benchmark

Performance Over Time

Showing 46 results. Metric: EM (exact match).

Top Performing Models

| Rank | Model | Paper | EM | Date | Code |
|---|---|---|---|---|---|
| 1 | Atlas (full, Wiki-dec-2018 index) | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 64.00 | 2022-08-05 | facebookresearch/atlas, thunlp/clueanchor |
| 2 | Atlas (full, Wiki-dec-2021+CC index) | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 60.40 | 2022-08-05 | facebookresearch/atlas, thunlp/clueanchor |
| 3 | DPA-RAG | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | 59.19 | 2024-06-26 | dongguanting/dpa-rag |
| 4 | FiE | FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | 58.40 | 2022-11-18 | - |
| 5 | R2-D2 (full) | R2-D2: A Modular Baseline for Open-Domain Question Answering | 55.90 | 2021-09-08 | KNOT-FIT-BUT/R2-D2 |
| 6 | ReAtt | Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | 54.70 | 2022-12-05 | jzbjyb/reatt |
| 7 | FiD-KD (full) | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | 54.70 | 2020-07-02 | jhyuklee/DensePhrases, princeton-nlp/DensePhrases, facebookresearch/FiD |
| 8 | RankRAG-llama3-70b (Zero-Shot, KILT) | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 54.20 | 2024-07-02 | - |
| 9 | EMDR^2 | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | 52.50 | 2021-06-09 | DevSinghSachan/emdr2, DevSinghSachan/art |
| 10 | FID (full) | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | 51.40 | 2020-07-02 | jhyuklee/DensePhrases, princeton-nlp/DensePhrases, facebookresearch/FiD |

All Papers (46)

| Paper | Year | Model |
|---|---|---|
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024 | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) |
| Language Models are Few-Shot Learners | 2020 | GPT-3 175B (Few-Shot, k=64) |