CoNaLa

CMU CoNaLa, the Code/Natural Language Challenge

Dataset Information
Modalities: Texts
Languages: English, Chinese
Introduced: 2018
License: Unknown

Overview

CMU CoNaLa, the Code/Natural Language Challenge dataset, is a joint project of the NeuLab and Strudel labs at Carnegie Mellon University. It is designed to test the generation of code snippets from natural language. The data is drawn from StackOverflow questions. The manually annotated portion contains 2,379 training and 500 test examples, and each example pairs a natural language intent with its corresponding Python snippet. In addition to the manually annotated data, there are 598,237 mined intent-snippet pairs. These are similar to the hand-annotated examples, except that each mined pair also carries a probability that it is a valid intent-snippet pair.
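The mined pairs differ from the annotated ones mainly by their validity probability, so a common first step is to keep only the pairs above some confidence level. The following is a minimal sketch in Python of that filtering step. It assumes the mined data is stored as JSON Lines, with one object per line containing `intent`, `snippet`, and `prob` fields. These field names, the hypothetical `load_mined_pairs` helper, the 0.5 threshold, and the sample records are all illustrative assumptions, not the dataset's official loader.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IntentSnippetPair:
    """One natural language intent paired with a Python snippet."""
    intent: str
    snippet: str
    prob: float  # assumed: probability that the mined pair is valid


def load_mined_pairs(lines: Iterable[str], min_prob: float = 0.5) -> List[IntentSnippetPair]:
    """Parse JSON Lines records and keep pairs whose probability is >= min_prob.

    Assumption: each non-empty line is a JSON object with
    "intent", "snippet" and "prob" keys.
    """
    pairs: List[IntentSnippetPair] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        prob = float(record["prob"])
        if prob >= min_prob:
            pairs.append(IntentSnippetPair(record["intent"], record["snippet"], prob))
    return pairs


# Hypothetical sample records, written for illustration only.
sample = [
    '{"intent": "reverse a list `x`", "snippet": "x[::-1]", "prob": 0.91}',
    '{"intent": "sort dict `d` by value", "snippet": "sorted(d.items(), key=lambda kv: kv[1])", "prob": 0.32}',
    "",
]

kept = load_mined_pairs(sample, min_prob=0.5)
for pair in kept:
    print(f"{pair.prob:.2f}  {pair.intent!r} -> {pair.snippet}")
```

Here only the first sample record passes the threshold. Passing an open file object instead of a list works the same way, since files iterate line by line; the right threshold is a trade-off between the amount of mined data kept and its noise level.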

Source: CoNaLa dataset Homepage

Variants: CoNaLa

Associated Benchmarks

This dataset is used in one benchmark: Code Generation.

Recent Benchmark Submissions

Task | Model | Paper | Date
Code Generation | TranX + BERT w/mined | The impact of lexical and … | 2022-02-28
Code Generation | BART W/ Mined | Reading StackOverflow Encourages Cheating: Adding … | 2021-06-08
Code Generation | BART Base | Reading StackOverflow Encourages Cheating: Adding … | 2021-06-08
Code Generation | BERT + TAE | Code Generation from Natural Language … | 2021-01-01
Code Generation | External Knowledge With API | Incorporating External Knowledge through Pre-training … | 2020-04-20
Code Generation | External Knowledge With API + Reranking | Incorporating External Knowledge through Pre-training … | 2020-04-20
Code Generation | TranX | TRANX: A Transition-based Neural Abstract … | 2018-10-05
