BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)

Text-To-SQL Benchmark

Performance Over Time

Showing 16 results. Metric: Execution Accuracy % (Test)
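For readers unfamiliar with the metric, execution accuracy is usually described as the percentage of questions for which the predicted SQL, when executed, returns the same result set as the reference SQL. The following is a minimal sketch of that idea, not the official BIRD evaluation script. The helper names (`execution_match`, `execution_accuracy`) and the toy `singer` table are hypothetical, and the assumptions that results are compared as unordered sets of rows and that a failing query counts as a miss are simplifications chosen for illustration.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        # Assumption: a predicted query that fails to run counts as a miss.
        return False
    gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database (hypothetical data, not taken from BIRD).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
""")

pairs = [
    # Different SQL text, same rows: counts as correct.
    ("SELECT name FROM singer WHERE age > 40",
     "SELECT name FROM singer WHERE age >= 45"),
    # Runs, but returns extra rows: incorrect.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 40"),
    # Misspelled column, so execution fails: incorrect.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Equivalent aggregates: correct.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(id) FROM singer"),
]

toy_accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {toy_accuracy:.2f}%")
```

Because the comparison is on executed results rather than on SQL text, two differently written queries can both be scored as correct, which is why the first and last toy pairs count as matches.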

Top Performing Models

Rank	Model	Paper	Execution Accuracy % (Test)	Date	Code
1	Human Performance	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	92.96	2023-05-04	bird-bench/mini_dev
2	XiYan-SQL	A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	73.34	2024-11-13	XGenerationLab/XiYan-SQL, xgenerationlab/xiyan_mcp_server, XGenerationLab/M-Schema, xgenerationlab/xiyan-dbdescgen, XGenerationLab/XiYan-DateResolver
3	CHASE-SQL + Gemini	CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL	73.14	2024-10-02	-
4	Distillery + GPT-4o	The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models	67.21	2024-08-14	-
5	MSc-SQL	MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation	65.60	2024-10-16	layer6ai-labs/msc-sql
6	CHESS	CHESS: Contextual Harnessing for Efficient SQL Synthesis	65.00	2024-05-27	shayantalaei/chess, yeounoh/lc_nl2sql
7	MAC-SQL + GPT-4	MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL	57.56	2023-12-18	wbbeyourself/mac-sql
8	DAIL-SQL + GPT-4	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	54.76	2023-08-29	beachwang/dail-sql
9	DIN-SQL + GPT-4	DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction	50.72	2023-04-21	mohammadrezapourreza/few-shot-nl2sql-with-prompting
10	DELLM + MAC-SQL	Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM	48.92	2024-02-18	Rcrossmeister/Knowledge-to-SQL

All Papers (16)

Model	Paper	Year	Code
Human Performance	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023	bird-bench/mini_dev
XiYan-SQL	A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	2024	XGenerationLab/XiYan-SQL, xgenerationlab/xiyan_mcp_server
CHASE-SQL + Gemini	CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL	2024	-
Distillery + GPT-4o	The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models	2024	-
MSc-SQL	MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation	2024	layer6ai-labs/msc-sql
CHESS	CHESS: Contextual Harnessing for Efficient SQL Synthesis	2024	shayantalaei/chess, yeounoh/lc_nl2sql
MAC-SQL + GPT-4	MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL	2023	wbbeyourself/mac-sql
DAIL-SQL + GPT-4	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	2023	beachwang/dail-sql
DIN-SQL + GPT-4	DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction	2023	mohammadrezapourreza/few-shot-nl2sql-with-prompting
DELLM + MAC-SQL	Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM	2024	Rcrossmeister/Knowledge-to-SQL
GPT-4 (Baseline)	Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	2023	CurryTang/Graph-LLM, trais-lab/llm-structured-data
Claude-2 (Baseline)	Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	2023	CurryTang/Graph-LLM, trais-lab/llm-structured-data
ChatGPT (Baseline)	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023	bird-bench/mini_dev
CoT + ChatGPT	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023	bird-bench/mini_dev
Codex (Baseline)	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023	bird-bench/mini_dev
Palm-2 (Baseline)	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023	bird-bench/mini_dev

Model	Paper	Execution Accuracy % (Test)	Date
Human Performance	Can LLM Already Serve as A Database Interface? A …	92.96	2023-05-04
XiYan-SQL	A Preview of XiYan-SQL: A Multi-Generator Ensembl…	73.34	2024-11-13
CHASE-SQL + Gemini	CHASE-SQL: Multi-Path Reasoning and Preference Op…	73.14	2024-10-02
Distillery + GPT-4o	The Death of Schema Linking? Text-to-SQL in the A…	67.21	2024-08-14
MSc-SQL	MSc-SQL: Multi-Sample Critiquing Small Language M…	65.60	2024-10-16
CHESS	CHESS: Contextual Harnessing for Efficient SQL Sy…	65.00	2024-05-27
MAC-SQL + GPT-4	MAC-SQL: A Multi-Agent Collaborative Framework fo…	57.56	2023-12-18
DAIL-SQL + GPT-4	Text-to-SQL Empowered by Large Language Models: A…	54.76	2023-08-29
DIN-SQL + GPT-4	DIN-SQL: Decomposed In-Context Learning of Text-t…	50.72	2023-04-21
DELLM + MAC-SQL	Knowledge-to-SQL: Enhancing SQL Generation with D…	48.92	2024-02-18
GPT-4 (Baseline)	Can LLMs Effectively Leverage Graph Structural In…	46.35	2023-09-28
Claude-2 (Baseline)	Can LLMs Effectively Leverage Graph Structural In…	42.70	2023-09-28
ChatGPT (Baseline)	Can LLM Already Serve as A Database Interface? A …	37.22	2023-05-04
CoT + ChatGPT	Can LLM Already Serve as A Database Interface? A …	36.64	2023-05-04
Codex (Baseline)	Can LLM Already Serve as A Database Interface? A …	34.35	2023-05-04
Palm-2 (Baseline)	Can LLM Already Serve as A Database Interface? A …	27.38	2023-05-04
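As a worked reading of the table above, the sketch below computes two quantities the leaderboard implies but does not state: the gap between the best listed system and human performance, and the sequence of entries that set a new best score over time. The scores and dates are copied from the table; the names `ROWS`, `HUMAN`, and `running_best` are hypothetical and introduced only for this illustration.

```python
# Rows copied from the table above: (model, execution accuracy % on test, date).
ROWS = [
    ("Human Performance", 92.96, "2023-05-04"),
    ("XiYan-SQL", 73.34, "2024-11-13"),
    ("CHASE-SQL + Gemini", 73.14, "2024-10-02"),
    ("Distillery + GPT-4o", 67.21, "2024-08-14"),
    ("MSc-SQL", 65.60, "2024-10-16"),
    ("CHESS", 65.00, "2024-05-27"),
    ("MAC-SQL + GPT-4", 57.56, "2023-12-18"),
    ("DAIL-SQL + GPT-4", 54.76, "2023-08-29"),
    ("DIN-SQL + GPT-4", 50.72, "2023-04-21"),
    ("DELLM + MAC-SQL", 48.92, "2024-02-18"),
    ("GPT-4 (Baseline)", 46.35, "2023-09-28"),
    ("Claude-2 (Baseline)", 42.70, "2023-09-28"),
    ("ChatGPT (Baseline)", 37.22, "2023-05-04"),
    ("CoT + ChatGPT", 36.64, "2023-05-04"),
    ("Codex (Baseline)", 34.35, "2023-05-04"),
    ("Palm-2 (Baseline)", 27.38, "2023-05-04"),
]

HUMAN = "Human Performance"


def running_best(rows):
    """Return the system entries that set a new best score, in date order."""
    systems = [row for row in rows if row[0] != HUMAN]
    # ISO 8601 dates sort chronologically when compared as strings.
    systems.sort(key=lambda row: row[2])
    records, best = [], float("-inf")
    for model, score, date in systems:
        if score > best:
            best = score
            records.append((date, model, score))
    return records


human_score = next(score for model, score, _ in ROWS if model == HUMAN)
best_model, best_score, _ = max(
    (row for row in ROWS if row[0] != HUMAN), key=lambda row: row[1]
)
gap_to_human = round(human_score - best_score, 2)
records = running_best(ROWS)

print(f"Best system: {best_model} ({best_score:.2f})")
print(f"Gap to human performance: {gap_to_human:.2f} points")
for date, model, score in records:
    print(f"{date}  {score:5.2f}  {model}")
```

On these rows the best listed system is XiYan-SQL, 19.62 points below the human figure, and seven entries set a new best score between 2023-04-21 and 2024-11-13.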