Spider-Agent + o1-preview | Spider 2.0: Evaluating Language Models on Real-Wo… | 17.03 | 2024-11-12
Spider-Agent + GPT-4o | Spider 2.0: Evaluating Language Models on Real-Wo… | 10.13 | 2024-11-12
Spider-Agent + Claude-3.5-Sonnet | Spider 2.0: Evaluating Language Models on Real-Wo… | 9.02 | 2024-11-12
Spider-Agent + GPT-4 | Spider 2.0: Evaluating Language Models on Real-Wo… | 8.86 | 2024-11-12
Spider-Agent + Qwen2.5-72B | Spider 2.0: Evaluating Language Models on Real-Wo… | 6.17 | 2024-11-12
Spider-Agent + DeepSeek-V2.5 | Spider 2.0: Evaluating Language Models on Real-Wo… | 5.22 | 2024-11-12
Spider-Agent + Gemini-Pro-1.5 | Spider 2.0: Evaluating Language Models on Real-Wo… | 2.53 | 2024-11-12
Spider-Agent + Llama-3.1-405B | Spider 2.0: Evaluating Language Models on Real-Wo… | 2.21 | 2024-11-12