| Model | Paper | Score | Date |
|---|---|---|---|
| RankRAG-llama3-70b (Zero-Shot, KILT) | RankRAG: Unifying Context Ranking with Retrieval-… | 86.50 | 2024-07-02 |
| PaLM 2-L (one-shot) | PaLM 2 Technical Report | 86.10 | 2023-05-17 |
| ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | ChatQA: Surpassing GPT-4 on Conversational QA and… | 85.60 | 2024-01-18 |
| LLaMA 2 70B (one-shot) | Llama 2: Open Foundation and Fine-Tuned Chat Mode… | 85.00 | 2023-07-18 |
| GPT-4-0613 (Zero-shot) | GPT-4 Technical Report | 84.80 | 2023-03-15 |
| SpanBERT | SpanBERT: Improving Pre-training by Representing … | 83.60 | 2019-07-24 |
| RankRAG-llama3-8b (Zero-Shot, KILT) | RankRAG: Unifying Context Ranking with Retrieval-… | 82.90 | 2024-07-02 |
| PaLM 2-M (one-shot) | PaLM 2 Technical Report | 81.70 | 2023-05-17 |
| PaLM-540B (Few-Shot) | PaLM: Scaling Language Modeling with Pathways | 81.40 | 2022-04-05 |
| PaLM-540B (One-Shot) | PaLM: Scaling Language Modeling with Pathways | 81.40 | 2022-04-05 |
| ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | ChatQA: Surpassing GPT-4 on Conversational QA and… | 81.00 | 2024-01-18 |
| BigBird-etc | Big Bird: Transformers for Longer Sequences | 80.90 | 2020-07-28 |
| DPA-RAG | Understand What LLM Needs: Dual Preference Alignm… | 80.10 | 2024-06-26 |
| GaC(Qwen2-72B-Instruct + Llama-3-70B-Instruct) | Breaking the Ceiling of the LLM Community by Trea… | 79.29 | 2024-06-18 |
| LinkBERT (large) | LinkBERT: Pretraining Language Models with Docume… | 78.20 | 2022-03-29 |
| DyREX | DyREx: Dynamic Query Representation for Extractiv… | 77.37 | 2022-10-26 |
| code-davinci-002 175B + REPLUG LSR (Few-Shot) | REPLUG: Retrieval-Augmented Black-Box Language Mo… | 77.30 | 2023-01-30 |
| PaLM-540B (Zero-Shot) | PaLM: Scaling Language Modeling with Pathways | 76.90 | 2022-04-05 |
| code-davinci-002 175B + REPLUG (Few-Shot) | REPLUG: Retrieval-Augmented Black-Box Language Mo… | 76.80 | 2023-01-30 |
| GLaM 62B/64E (One-shot) | GLaM: Efficient Scaling of Language Models with M… | 75.80 | 2021-12-13 |
| GLaM 62B/64E (Few-shot) | GLaM: Efficient Scaling of Language Models with M… | 75.80 | 2021-12-13 |
| RA-DIT (Zero-Shot) | RA-DIT: Retrieval-Augmented Dual Instruction Tuni… | 75.40 | 2023-10-02 |
| PaLM 2-S (one-shot) | PaLM 2 Technical Report | 75.20 | 2023-05-17 |
| Search-o1 | Search-o1: Agentic Search-Enhanced Large Reasonin… | 74.10 | 2025-01-09 |
| LLaMA 65B (few-shot, k=64) | LLaMA: Open and Efficient Foundation Language Mod… | 73.00 | 2023-02-27 |
| FiE+PAQ | FiE: Building a Global Probability Space by Lever… | 72.60 | 2022-11-18 |
| LLaMA 65B (few-shot, k=5) | LLaMA: Open and Efficient Foundation Language Mod… | 72.60 | 2023-02-27 |
| RankRAG-llama3-70b (Zero-Shot, DPR) | RankRAG: Unifying Context Ranking with Retrieval-… | 72.60 | 2024-07-02 |
| FiD+Distil | Distilling Knowledge from Reader to Retriever for… | 72.10 | 2020-12-08 |
| LLaMA 65B (one-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 71.60 | 2023-02-27 |
| EMDR2 | End-to-End Training of Multi-Document Reader and … | 71.40 | 2021-06-09 |
| S-Norm | Simple and Effective Multi-Paragraph Reading Comp… | 71.32 | 2017-10-29 |
| GLaM 62B/64E (Zero-shot) | GLaM: Efficient Scaling of Language Models with M… | 71.30 | 2021-12-13 |
| GPT-3 175B (Few-Shot) | Language Models are Few-Shot Learners | 71.20 | 2020-05-28 |
| UnitedQA (Hybrid reader) | UnitedQA: A Hybrid Approach for Open Domain Quest… | 70.30 | 2021-01-01 |
| Mistral 7B (5-shot) | Mistral 7B | 69.90 | 2023-10-10 |
| ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | ChatQA: Surpassing GPT-4 on Conversational QA and… | 69.00 | 2024-01-18 |
| LLaMA 65B (zero-shot) | LLaMA: Open and Efficient Foundation Language Mod… | 68.20 | 2023-02-27 |
| Fusion-in-Decoder (large) | Leveraging Passage Retrieval with Generative Mode… | 67.60 | 2020-07-02 |
| TOME-2 | Mention Memory: incorporating textual knowledge i… | 65.80 | 2021-10-12 |
| Shakti-LLM (2.5B) | SHAKTI: A 2.5 Billion Parameter Small Language Mo… | 58.20 | 2024-10-15 |
| Branch-Train-MiX 4x7B (sampling top-2 experts) | Branch-Train-MiX: Mixing Expert LLMs into a Mixtu… | 57.10 | 2024-03-12 |
| DPR | Dense Passage Retrieval for Open-Domain Question … | 56.80 | 2020-04-10 |
| Reading Twice for NLU | Dynamic Integration of Background Knowledge in Ne… | 56.73 | 2017-06-08 |
| FLAN 137B (zero-shot) | Finetuned Language Models Are Zero-Shot Learners | 56.70 | 2021-09-03 |
| RAG | Retrieval-Augmented Generation for Knowledge-Inte… | 56.10 | 2020-05-22 |
| Mnemonic Reader | Reinforced Mnemonic Reader for Machine Reading Co… | 52.85 | 2017-05-08 |
| MEMEN | MEMEN: Multi-layer Embedding with Memory Networks… | 46.90 | 2017-07-28 |
| ReasonBERTR | ReasonBERT: Pre-trained to Reason with Distant Su… | 45.50 | 2021-09-10 |
| ORQA | Latent Retrieval for Weakly Supervised Open Domai… | 45.00 | 2019-06-01 |
| ReasonBERTB | ReasonBERT: Pre-trained to Reason with Distant Su… | 37.20 | 2021-09-10 |