PeerQA

Question Answering Benchmark

Performance Over Time

Showing 5 results. Metric: Prometheus-2 Answer Correctness

Top Performing Models

Rank | Model | Paper | Prometheus-2 Answer Correctness | Paper Date | Code
1 | GPT-3.5-Turbo-0613-16k | Language Models are Few-Shot Learners | 0.24 | 2020-05-28 | ggml-org/llama.cpp, ggerganov/llama.cpp, karpathy/llm.c
2 | Llama-3-IT-8B-8k | The Llama 3 Herd of Models | 0.23 | 2024-07-31 | zhuzilin/ring-flash-attention, wenet-e2e/west, zechenli03/sensorllm, ziye2chen/LLMs-for-Mathematical-Analysis, willemsenbram/mention-detection-vgd
3 | Llama-3-IT-8B-32k | The Llama 3 Herd of Models | 0.23 | 2024-07-31 | zhuzilin/ring-flash-attention, wenet-e2e/west, zechenli03/sensorllm, ziye2chen/LLMs-for-Mathematical-Analysis, willemsenbram/mention-detection-vgd
4 | GPT-4o-2024-08-06-128k | GPT-4 Technical Report | 0.23 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
5 | Mistral-v02-7B-32k | Mistral 7B | 0.19 | 2023-10-10 | mistralai/mistral-src, facebookresearch/fairseq2, mgmalek/efficient_cross_entropy

All Papers (5)

Language Models are Few-Shot Learners

2020
GPT-3.5-Turbo-0613-16k