CoDesc

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2021

Overview

CoDesc is a large parallel dataset of 4.2 million Java source code samples paired with natural-language descriptions, drawn from code search and code summarization studies.

Variants: CoDesc

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Source Code Summarization | Transformer | CoDesc: A Large Code-Description Parallel … | 2021-05-29
Code Search | Self-attention | CoDesc: A Large Code-Description Parallel … | 2021-05-29
Code Search | NBOW | CoDesc: A Large Code-Description Parallel … | 2021-05-29
Code Search | RNN | CoDesc: A Large Code-Description Parallel … | 2021-05-29

Research Papers

Recent papers with results on this dataset: