
PeerQA

answerability prediction Benchmark

Performance Over Time

📊 Showing 5 results | 📏 Metric: Macro F1

Top Performing Models

| Rank | Model | Paper | Macro F1 | Date | Code |
|------|-------|-------|----------|------|------|
| 1 | Mistral-IT-v02-7B-32k | Mistral 7B | 0.47 | 2023-10-10 | 📦 mistralai/mistral-src · 📦 facebookresearch/fairseq2 · 📦 mgmalek/efficient_cross_entropy |
| 2 | GPT-3.5-Turbo-0613-16k | Language Models are Few-Shot Learners | 0.33 | 2020-05-28 | 📦 ggml-org/llama.cpp · 📦 ggerganov/llama.cpp · 📦 karpathy/llm.c |
| 3 | Llama-3-IT-8B-8k | The Llama 3 Herd of Models | 0.31 | 2024-07-31 | 📦 zhuzilin/ring-flash-attention · 📦 wenet-e2e/west · 📦 zechenli03/sensorllm · 📦 ziye2chen/LLMs-for-Mathematical-Analysis · 📦 willemsenbram/mention-detection-vgd |
| 4 | GPT-4o-2024-08-06 | GPT-4 Technical Report | 0.31 | 2023-03-15 | 📦 openai/evals · 📦 shmsw25/factscore · 📦 unispac/visual-adversarial-examples-jailbreak-large-language-models |
| 5 | Llama-3-IT-8B-32k | The Llama 3 Herd of Models | 0.29 | 2024-07-31 | 📦 zhuzilin/ring-flash-attention · 📦 wenet-e2e/west · 📦 zechenli03/sensorllm · 📦 ziye2chen/LLMs-for-Mathematical-Analysis · 📦 willemsenbram/mention-detection-vgd |
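The leaderboard scores models by Macro F1, i.e. the unweighted mean of the per-class F1 scores, so the minority class (e.g. unanswerable questions) counts as much as the majority class. A minimal sketch of the metric, assuming binary answerable/unanswerable labels (the label encoding here is illustrative, not taken from the PeerQA evaluation code):

```python
# Macro F1: compute F1 for each class separately, then average
# without weighting by class frequency.

def macro_f1(y_true, y_pred, labels=(0, 1)):
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Example: 1 = answerable, 0 = unanswerable
gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
print(macro_f1(gold, pred))  # → 0.5833333333333333
```

This matches `sklearn.metrics.f1_score(..., average="macro")` for binary labels, and explains why a model that simply predicts "answerable" for everything scores poorly despite high accuracy on an imbalanced set.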

All Papers (5)

Language Models are Few-Shot Learners (2020) · GPT-3.5-Turbo-0613-16k