Code Information Retrieval Benchmark
The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. It includes 10 curated code datasets covering 8 retrieval tasks across 7 domains, for a total of two million documents. It also provides a simple, shared Python framework, installable via pip, and uses the same data schema as benchmarks such as MTEB and BEIR, which makes cross-benchmark evaluation straightforward.
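To make the shared schema concrete, here is a minimal sketch of the corpus / queries / relevance-judgment layout commonly used by BEIR-style benchmarks, with a toy retriever and a recall@k metric. Every ID, text, and function name below is invented for illustration; this is not the CoIR package's API, and the token-overlap scorer is only a stand-in for a real retrieval model.

```python
import re
from typing import Dict, List

# Toy data in a BEIR-style layout (all IDs and texts are made up).
# corpus:  doc_id -> {"title", "text"}
# queries: query_id -> query text
# qrels:   query_id -> {doc_id: relevance score}
corpus: Dict[str, Dict[str, str]] = {
    "d1": {"title": "", "text": "def add(a, b): return a + b"},
    "d2": {"title": "", "text": "def read_file(path): return open(path).read()"},
    "d3": {"title": "", "text": "def reverse_string(s): return s[::-1]"},
}
queries: Dict[str, str] = {
    "q1": "add two numbers",
    "q2": "reverse a string",
}
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"d1": 1},
    "q2": {"d3": 1},
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query: str, docs: Dict[str, Dict[str, str]], k: int) -> List[str]:
    """Rank documents by token overlap with the query (a stand-in scorer)."""
    q_tokens = tokenize(query)
    scored = [
        (len(q_tokens & tokenize(doc["title"] + " " + doc["text"])), doc_id)
        for doc_id, doc in docs.items()
    ]
    # Highest score first; ties are broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def recall_at_k(k: int) -> float:
    """Average fraction of relevant documents found in the top-k results."""
    total = 0.0
    for qid, text in queries.items():
        relevant = {d for d, rel in qrels[qid].items() if rel > 0}
        retrieved = set(search(text, corpus, k))
        total += len(relevant & retrieved) / len(relevant)
    return total / len(queries)


if __name__ == "__main__":
    print(f"recall@1 = {recall_at_k(1):.2f}")
```

Because the three dictionaries are independent of the retriever, the same evaluation loop could in principle be pointed at any dataset that follows this layout, which is the practical benefit of a shared schema.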
Variants: CoIR
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Code Search | Voyage-code-002 | CoIR: A Comprehensive Benchmark for … | 2024-07-03 |
Recent papers with results on this dataset: