A test-driven benchmark that challenges LLMs to write long JavaScript React applications.
Variants: WebApp1k-Duo-React
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Code Generation | claude-3-5-sonnet | A Case Study of Web … | 2024-09-19 |
Code Generation | o1-mini | A Case Study of Web … | 2024-09-19 |
Code Generation | o1-preview | A Case Study of Web … | 2024-09-19 |
Code Generation | gpt-4o-2024-08-06 | A Case Study of Web … | 2024-09-19 |
Code Generation | deepseek-v2.5 | A Case Study of Web … | 2024-09-19 |
Code Generation | mistral-large-2 | A Case Study of Web … | 2024-09-19 |