WebQuestions

Question Answering Benchmark

Performance Over Time

📊 Showing 36 results | 📏 Metric: EM
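
EM here stands for exact match: the percentage of questions for which the predicted answer string equals one of the gold answers. The page does not say how answers are normalized before comparison, and the papers listed may differ on this point. The sketch below assumes the common SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the function names are hypothetical and only illustrate how a score on this 0-100 scale could be computed.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization; individual papers may use other rules."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace standalone English articles with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their gold answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot score an empty prediction list")
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Made-up toy data, not taken from WebQuestions.
    preds = ["The Beatles", "Paris, France"]
    golds = [["Beatles"], ["Paris"]]
    print(em_score(preds, golds))
```

Because each question can have several acceptable answers, the sketch counts a prediction as correct when it matches any one of them after normalization.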

Top Performing Models

Rank	Model	Paper	EM	Date	Code
1	CoA	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	70.70	2024-03-26	📦 MAGICS-LAB/Chain-of-Actions
2	CoA w/o actions	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	64.70	2024-03-26	📦 MAGICS-LAB/Chain-of-Actions
3	DSP	DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines	59.40	2023-10-05	📦 stanfordnlp/dsp 📦 stanfordnlp/dspy 📦 codelion/optillm
4	DSP	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	59.40	2024-03-26	📦 MAGICS-LAB/Chain-of-Actions
5	FiE+PAQ 📚	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	56.30	2022-11-18	-
6	FiE	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	52.40	2022-11-18	-
7	FiDO	FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference	51.10	2022-12-15	-
8	RAG	Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	45.20	2020-05-22	📦 huggingface/transformers 📦 assafelovic/gpt-researcher 📦 deepset-ai/haystack
9	Few-shot	Language Models are Few-Shot Learners	44.70	2020-05-28	📦 ggml-org/llama.cpp 📦 ggerganov/llama.cpp 📦 karpathy/llm.c
10	Few-shot	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	44.70	2024-03-26	📦 MAGICS-LAB/Chain-of-Actions

All Papers (36)

Model	Paper	EM	Date	Code
CoA	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	70.70	2024-03-26	MAGICS-LAB/Chain-of-Actions
CoA w/o actions	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	64.70	2024-03-26	MAGICS-LAB/Chain-of-Actions
DSP	DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines	59.40	2023-10-05	stanfordnlp/dsp stanfordnlp/dspy codelion/optillm
DSP	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	59.40	2024-03-26	MAGICS-LAB/Chain-of-Actions
FiE+PAQ	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	56.30	2022-11-18	-
FiE	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	52.40	2022-11-18	-
FiDO	FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference	51.10	2022-12-15	-
RAG	Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	45.20	2020-05-22	huggingface/transformers assafelovic/gpt-researcher
Few-shot	Language Models are Few-Shot Learners	44.70	2020-05-28	ggml-org/llama.cpp ggerganov/llama.cpp
Few-shot	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	44.70	2024-03-26	MAGICS-LAB/Chain-of-Actions
PaLM-540B (Few-Shot)	PaLM: Scaling Language Modeling with Pathways	43.50	2022-04-05	lucidrains/CoCa-pytorch lucidrains/PaLM-pytorch
Zero-shot	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	43.00	2024-03-26	MAGICS-LAB/Chain-of-Actions
T5.1.1-XXL+SSM	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	42.80	2019-10-23	huggingface/transformers PaddlePaddle/PaddleNLP
CoT	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	42.50	2022-01-28	microsoft/guidance guidance-ai/guidance
CoT	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	42.50	2024-03-26	MAGICS-LAB/Chain-of-Actions
DPR	Dense Passage Retrieval for Open-Domain Question Answering	42.40	2020-04-10	huggingface/transformers deepset-ai/haystack
GPT-3-175B (Few-Shot)	Language Models are Few-Shot Learners	41.50	2020-05-28	ggml-org/llama.cpp ggerganov/llama.cpp
REALM	REALM: Retrieval-Augmented Language Model Pre-Training	40.70	2020-02-10	deepset-ai/haystack google-research/language
React	ReAct: Synergizing Reasoning and Acting in Language Models	38.30	2022-10-06	ysymyth/ReAct thudm/agenttuning
React	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	38.30	2024-03-26	MAGICS-LAB/Chain-of-Actions
ORQA	Latent Retrieval for Weakly Supervised Open Domain Question Answering	36.40	2019-06-01	google-research/language mia-workshop/mia-shared-task-2022 okanvk/Question-Answering-Project
Self-Ask	Measuring and Narrowing the Compositionality Gap in Language Models	31.10	2022-10-07	ofirpress/self-ask
Self-Ask	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	31.10	2024-03-26	MAGICS-LAB/Chain-of-Actions
PaLM 2-L (one-shot)	PaLM 2 Technical Report	28.20	2023-05-17	eternityyw/tram-benchmark
PaLM 2-M (one-shot)	PaLM 2 Technical Report	26.90	2023-05-17	eternityyw/tram-benchmark
ToT	Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models	26.30	2024-03-26	MAGICS-LAB/Chain-of-Actions
ToT	Tree of Thoughts: Deliberate Problem Solving with Large Language Models	26.30	2023-05-17	ysymyth/tree-of-thought-llm princeton-nlp/tree-of-thought-llm
GPT-3-175B (One-Shot)	Language Models are Few-Shot Learners	25.30	2020-05-28	ggml-org/llama.cpp ggerganov/llama.cpp
PaLM-540B (One-Shot)	PaLM: Scaling Language Modeling with Pathways	22.60	2022-04-05	lucidrains/CoCa-pytorch lucidrains/PaLM-pytorch
PaLM 2-S (one-shot)	PaLM 2 Technical Report	21.80	2023-05-17	eternityyw/tram-benchmark
GLaM 62B/64E (Zero-Shot)	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts	15.50	2021-12-13	-
GPT-3-175B (Zero-Shot)	Language Models are Few-Shot Learners	14.40	2020-05-28	ggml-org/llama.cpp ggerganov/llama.cpp
PaLM-540B (Zero-Shot)	PaLM: Scaling Language Modeling with Pathways	10.60	2022-04-05	lucidrains/CoCa-pytorch lucidrains/PaLM-pytorch
Memory Networks (ensemble)	Large-scale Simple Question Answering with Memory Networks	-	2015-06-05	facebookresearch/ParlAI au1khan/FactQA aukhanee/FactQA
Subgraph embeddings	Question Answering with Subgraph Embeddings	-	2014-06-14	gmtt/CSCI590
Weakly Supervised Embeddings	Open Question Answering with Weakly Supervised Embedding Models	-	2014-04-16	-
