CMU CoNaLa, the Code/Natural Language Challenge
CoNaLa, the Code/Natural Language Challenge, is a joint project of the Carnegie Mellon University NeuLab and STRUDEL labs. Its purpose is to test the generation of code snippets from natural language. The data comes from StackOverflow questions. There are 2,379 training and 500 test examples that were manually annotated; every example pairs a natural-language intent with its corresponding Python snippet. In addition to the manually annotated dataset, there are 598,237 automatically mined intent-snippet pairs. These examples are similar to the hand-annotated ones, except that each also carries a probability that the pair is valid.
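The intent-snippet structure described above can be sketched as follows. This is a minimal illustration, not official loading code: the field names (`intent`, `rewritten_intent`, `snippet`, `question_id`, `prob`) and the sample records are assumptions about the JSON layout, so check the files in the official release before relying on them.

```python
import json

# A hand-annotated example pairs a natural-language intent with a Python
# snippet (field names are an assumption about the release format).
annotated = json.loads("""
{"intent": "How do I check if a list is empty?",
 "rewritten_intent": "check if a list `a` is empty",
 "snippet": "if not a:\\n    pass",
 "question_id": 53513}
""")

# Mined pairs additionally carry a probability that the pairing is valid;
# these two records are invented for illustration.
mined = [
    {"intent": "convert a string to an int", "snippet": "int(s)", "prob": 0.92},
    {"intent": "sort a list", "snippet": "print(x)", "prob": 0.11},
]

def filter_mined(pairs, threshold=0.5):
    """Keep only mined pairs whose validity probability clears a threshold."""
    return [p for p in pairs if p["prob"] >= threshold]

print(annotated["rewritten_intent"])
print([p["snippet"] for p in filter_mined(mined)])
```

Filtering mined pairs by their validity probability, as above, is a common way to trade corpus size for quality when the mined data is used to augment the 2,379 annotated training examples.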
Source: CoNaLa dataset Homepage
Variants: CoNaLa
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Code Generation | TranX + BERT w/mined | The impact of lexical and … | 2022-02-28 |
| Code Generation | BART w/mined | Reading StackOverflow Encourages Cheating: Adding … | 2021-06-08 |
| Code Generation | BART Base | Reading StackOverflow Encourages Cheating: Adding … | 2021-06-08 |
| Code Generation | BERT + TAE | Code Generation from Natural Language … | 2021-01-01 |
| Code Generation | External Knowledge With API | Incorporating External Knowledge through Pre-training … | 2020-04-20 |
| Code Generation | External Knowledge With API + Reranking | Incorporating External Knowledge through Pre-training … | 2020-04-20 |
| Code Generation | TranX | TRANX: A Transition-based Neural Abstract … | 2018-10-05 |