TriviaQA

Question Answering Benchmark

Performance Over Time

Showing 51 results. Metric: EM (exact match).

Top Performing Models

| Rank | Model | Paper | EM | Date | Code |
|---|---|---|---|---|---|
| 1 | RankRAG-llama3-70b (Zero-Shot, KILT) | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 86.50 | 2024-07-02 | - |
| 2 | PaLM 2-L (one-shot) | PaLM 2 Technical Report | 86.10 | 2023-05-17 | eternityyw/tram-benchmark |
| 3 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 85.60 | 2024-01-18 | - |
| 4 | LLaMA 2 70B (one-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Models | 85.00 | 2023-07-18 | facebookresearch/llama, llamafamily/llama-chinese, flagalpha/llama2-chinese |
| 5 | GPT-4-0613 (Zero-shot) | GPT-4 Technical Report | 84.80 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models |
| 6 | SpanBERT | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 83.60 | 2019-07-24 | facebookresearch/SpanBERT, mandarjoshi90/coref, zixinzeng-jennifer/spanbert_trans |
| 7 | RankRAG-llama3-8b (Zero-Shot, KILT) | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 82.90 | 2024-07-02 | - |
| 8 | PaLM 2-M (one-shot) | PaLM 2 Technical Report | 81.70 | 2023-05-17 | eternityyw/tram-benchmark |
| 9 | PaLM-540B (Few-Shot) | PaLM: Scaling Language Modeling with Pathways | 81.40 | 2022-04-05 | lucidrains/CoCa-pytorch, lucidrains/PaLM-pytorch, google/paxml |
| 10 | PaLM-540B (One-Shot) | PaLM: Scaling Language Modeling with Pathways | 81.40 | 2022-04-05 | lucidrains/CoCa-pytorch, lucidrains/PaLM-pytorch, google/paxml |

All Papers (51)

ChatQA: Surpassing GPT-4 on Conversational QA and RAG (2024): ChatQA-1.5-llama3-70b (Zero-Shot, KILT)

ChatQA: Surpassing GPT-4 on Conversational QA and RAG (2024): ChatQA-1.5-llama3-8B (Zero-Shot, KILT)

Language Models are Few-Shot Learners (2020): GPT-3 175B (Few-Shot)

ChatQA: Surpassing GPT-4 on Conversational QA and RAG (2024): ChatQA-1.5-llama3-70b (Zero-Shot, DPR)