A test-driven benchmark that challenges LLMs to write JavaScript React applications.
Variants: WebApp1K-React
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Code Generation | o1-preview | A Case Study of Web … | 2024-09-19 |
Code Generation | o1-mini | A Case Study of Web … | 2024-09-19 |
Code Generation | deepseek-v2.5 | A Case Study of Web … | 2024-09-19 |
Code Generation | claude-3.5-sonnet | Insights from Benchmarking Frontier Language … | 2024-09-08 |
Code Generation | llama-v3p1-405b-instruct | Insights from Benchmarking Frontier Language … | 2024-09-08 |
Code Generation | mistral-large-2 | Insights from Benchmarking Frontier Language … | 2024-09-08 |
Code Generation | deepseek-coder-v2-instruct | Insights from Benchmarking Frontier Language … | 2024-09-08 |
Code Generation | gpt-4o-2024-08-06 | Insights from Benchmarking Frontier Language … | 2024-09-08 |
Recent papers with results on this dataset: