BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)

Text-To-SQL Benchmark

Performance Over Time

Metric: Execution Accuracy % (Test). Showing 16 results.

Top Performing Models

| Rank | Model | Paper | Execution Accuracy % (Test) | Date | Code |
|------|-------|-------|-----------------------------|------|------|
| 1 | Human Performance | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 92.96 | 2023-05-04 | bird-bench/mini_dev |
| 2 | XiYan-SQL | A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | 73.34 | 2024-11-13 | XGenerationLab/XiYan-SQL, xgenerationlab/xiyan_mcp_server, XGenerationLab/M-Schema, xgenerationlab/xiyan-dbdescgen, XGenerationLab/XiYan-DateResolver |
| 3 | CHASE-SQL + Gemini | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL | 73.14 | 2024-10-02 | - |
| 4 | Distillery + GPT-4o | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | 67.21 | 2024-08-14 | - |
| 5 | MSc-SQL | MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation | 65.60 | 2024-10-16 | layer6ai-labs/msc-sql |
| 6 | CHESS | CHESS: Contextual Harnessing for Efficient SQL Synthesis | 65.00 | 2024-05-27 | shayantalaei/chess, yeounoh/lc_nl2sql |
| 7 | MAC-SQL + GPT-4 | MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL | 57.56 | 2023-12-18 | wbbeyourself/mac-sql |
| 8 | DAIL-SQL + GPT-4 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 54.76 | 2023-08-29 | beachwang/dail-sql |
| 9 | DIN-SQL + GPT-4 | DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction | 50.72 | 2023-04-21 | mohammadrezapourreza/few-shot-nl2sql-with-prompting |
| 10 | DELLM + MAC-SQL | Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM | 48.92 | 2024-02-18 | Rcrossmeister/Knowledge-to-SQL |
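
For readers unfamiliar with the Execution Accuracy metric reported above, the following is a minimal sketch of how such a score can be computed: a predicted query counts as correct when executing it returns the same set of rows as the gold query. The function names, the toy `singer` table, and the example queries are hypothetical and only illustrate the idea; the official BIRD evaluator may differ in details such as timeouts and per-database handling.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of result rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a predicted query that fails to execute counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as sets, so row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    correct = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 30), (2, 'Ben', 41), (3, 'Cy', 25);
    """
)

pairs = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Executes, but returns extra rows: counted as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Because the comparison is on execution results rather than SQL text, two differently written queries can both be counted as correct, which is why the metric is reported per test question rather than by string match.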

All Papers (16)