The CodeSearchNet Corpus is a large dataset of functions, with associated documentation, written in Go, Java, JavaScript, PHP, Python, and Ruby and collected from open source projects on GitHub. The CodeSearchNet Corpus includes:
* Six million methods overall
* Two million of which have associated documentation (docstrings, JavaDoc, and more)
* Metadata that indicates the original location (repository or line number, for example) where the data was found
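As a rough illustration of how the documented subset and the location metadata might be used, the sketch below parses a few records in JSON Lines form and keeps only those with documentation. The field names (`repo`, `path`, `func_name`, `code`, `docstring`) and the sample records are assumptions made for this example, not a specification of the corpus files.

```python
import json

# Hypothetical sample records. The field names are assumed for illustration
# and may differ from the fields in the actual corpus files.
SAMPLE_JSONL = """\
{"repo": "example/repo", "path": "src/util.py", "func_name": "add", "code": "def add(a, b):\\n    return a + b", "docstring": "Add two numbers."}
{"repo": "example/repo", "path": "src/util.py", "func_name": "noop", "code": "def noop():\\n    pass", "docstring": ""}
"""


def iter_records(lines):
    """Yield one dict per non-empty JSON Lines entry."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def has_documentation(record):
    """Return True when the record carries a non-empty docstring."""
    return bool(record.get("docstring", "").strip())


records = list(iter_records(SAMPLE_JSONL.splitlines()))
documented = [r for r in records if has_documentation(r)]

# Print the original location of each documented function.
for r in documented:
    print(f'{r["repo"]}:{r["path"]} -> {r["func_name"]}')
```

To read a real file instead of the in-memory sample, the same `iter_records` helper can be given an open file handle, since it only needs an iterable of lines.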
Source: https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/
Variants: CodeSearchNet, CodeSearchNet - Python, CodeSearchNet - Java, CodeSearchNet - Go, CodeSearchNet - Php, CodeSearchNet - Ruby, CodeSearchNet - JavaScript
This dataset is used in 3 benchmarks.