EQ-Bench

Dataset Information
Modalities: Ranking
Languages: English
Introduced: 2023
License: MIT

Overview

This dataset contains benchmark scores for EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). It assesses how well LLMs understand complex emotions and social interactions by asking them to predict the intensity of the emotional states of characters in a dialogue.

The benchmark discriminates effectively between a wide range of models. The authors report that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks such as MMLU (Hendrycks et al., 2020) (r = 0.97), suggesting that it may capture similar aspects of broad intelligence. It produces highly repeatable results from a set of 60 English-language questions. Open-source code for an automated benchmarking pipeline is available at https://github.com/EQ-bench/EQ-Bench, and a leaderboard is available at https://www.eqbench.com.
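To make the task format concrete, here is a hypothetical, simplified scorer for intensity predictions. It assumes each question asks for 0 to 10 intensity ratings of a few named emotions and treats a rating's error as its absolute distance from the reference rating. The function names, emotion labels, and 0 to 100 rescaling are illustrative assumptions, not taken from the EQ-Bench code; the official scoring is defined in the repository and may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified scorer for illustration only; this is NOT the
# official EQ-Bench scoring. Assumption: each question names a few emotions,
# each with a reference intensity on a 0-10 scale, and the model rates the
# same emotions on the same scale.
Ratings = Dict[str, float]


def question_score(predicted: Ratings, reference: Ratings) -> float:
    """Score one question in [0, 10]; 10 means every intensity matches the reference."""
    if not reference or set(predicted) != set(reference):
        raise ValueError("predicted and reference must rate the same, non-empty set of emotions")
    for value in list(predicted.values()) + list(reference.values()):
        if not 0 <= value <= 10:
            raise ValueError("intensities must lie on the 0-10 scale")
    # Mean absolute deviation across emotions, in intensity units (0 to 10).
    mean_abs_diff = sum(abs(predicted[e] - reference[e]) for e in reference) / len(reference)
    return 10.0 - mean_abs_diff


def benchmark_score(results: List[Tuple[Ratings, Ratings]]) -> float:
    """Average the per-question scores and rescale to 0-100."""
    if not results:
        raise ValueError("need at least one (predicted, reference) pair")
    scores = [question_score(pred, ref) for pred, ref in results]
    return 100.0 * sum(scores) / (10.0 * len(scores))


# Hypothetical emotions and intensities, invented for this example.
reference = {"Surprised": 7, "Confused": 2, "Angry": 0, "Relieved": 5}
predicted = {"Surprised": 6, "Confused": 3, "Angry": 0, "Relieved": 5}
print(question_score(predicted, reference))  # 9.5
print(benchmark_score([(predicted, reference), (reference, reference)]))  # 97.5
```

Using mean absolute deviation rewards a model for matching the magnitude of each emotion, not just the ordering of emotions; a rank-based comparison would be an alternative design with different trade-offs.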

Variants: EQ-Bench

Associated Benchmarks

This dataset is used in one benchmark, the Emotional Intelligence task.

Recent Benchmark Submissions

Task                    Model                           Paper                                            Date
Emotional Intelligence  OpenAI gpt-4-0613               EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  migtissera/SynthIA-70B-v1.5     EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  OpenAI gpt-4-0314               EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Qwen/Qwen-72B-Chat              EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Anthropic Claude2               EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  meta-llama/Llama-2-70b-chat-hf  EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  01-ai/Yi-34B-Chat               EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  OpenAI gpt-3.5-0613             EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  OpenAI gpt-3.5-turbo-0301       EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Open-Orca/Mistral-7B-OpenOrca   EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Qwen/Qwen-14B-Chat              EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  OpenAI text-davinci-003         EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Intel/neural-chat-7b-v3-1       EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  OpenAI text-davinci-002         EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  openchat/openchat 3.5           EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  lmsys/vicuna-33b-v1.3           EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  meta-llama/Llama-2-13b-chat-hf  EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  lmsys/vicuna-13b-v1.1           EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  meta-llama/Llama-2-7b-chat-hf   EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11
Emotional Intelligence  Koala 13B                       EQ-Bench: An Emotional Intelligence Benchmark …  2023-12-11

Research Papers

Recent papers with results on this dataset: