
PeerQA

answerability prediction Benchmark

Performance Over Time

📊 Showing 5 results | 📏 Metric: Macro F1

Top Performing Models

| Rank | Model | Paper | Macro F1 | Date | Code |
|------|-------|-------|----------|------|------|
| 1 | Mistral-IT-v02-7B-32k | Mistral 7B | 0.47 | 2023-10-10 | 📦 mistralai/mistral-src · 📦 facebookresearch/fairseq2 · 📦 mgmalek/efficient_cross_entropy |
| 2 | GPT-3.5-Turbo-0613-16k | Language Models are Few-Shot Learners | 0.33 | 2020-05-28 | 📦 ggml-org/llama.cpp · 📦 ggerganov/llama.cpp · 📦 karpathy/llm.c |
| 3 | Llama-3-IT-8B-8k | The Llama 3 Herd of Models | 0.31 | 2024-07-31 | 📦 zhuzilin/ring-flash-attention · 📦 wenet-e2e/west · 📦 zechenli03/sensorllm · 📦 ziye2chen/LLMs-for-Mathematical-Analysis · 📦 willemsenbram/mention-detection-vgd |
| 4 | GPT-4o-2024-08-06 | GPT-4 Technical Report | 0.31 | 2023-03-15 | 📦 openai/evals · 📦 shmsw25/factscore · 📦 unispac/visual-adversarial-examples-jailbreak-large-language-models |
| 5 | Llama-3-IT-8B-32k | The Llama 3 Herd of Models | 0.29 | 2024-07-31 | 📦 zhuzilin/ring-flash-attention · 📦 wenet-e2e/west · 📦 zechenli03/sensorllm · 📦 ziye2chen/LLMs-for-Mathematical-Analysis · 📦 willemsenbram/mention-detection-vgd |
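The leaderboard scores models by Macro F1, i.e. the unweighted mean of the per-class F1 scores, so the minority class (e.g. unanswerable questions) counts as much as the majority class. A minimal sketch of the metric, assuming binary answerable/unanswerable labels (the label encoding here is illustrative, not taken from the PeerQA evaluation code):

```python
# Macro F1: compute F1 for each class separately, then average
# without weighting by class frequency.

def macro_f1(y_true, y_pred, labels=(0, 1)):
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Example: 1 = answerable, 0 = unanswerable
gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
print(macro_f1(gold, pred))  # → 0.5833333333333333
```

This matches `sklearn.metrics.f1_score(..., average="macro")` for binary labels, and explains why a model that simply predicts "answerable" for everything scores poorly despite high accuracy on an imbalanced set.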

All Papers (5)

Language Models are Few-Shot Learners (2020) · GPT-3.5-Turbo-0613-16k