CoIR

Code Information Retrieval Benchmark

Dataset Information
Introduced: 2024

Overview

The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. It includes 10 curated code datasets covering 8 retrieval tasks across 7 domains, for a total of two million documents. It also provides a simple, common Python framework, installable via pip, and shares the same data schema as benchmarks such as MTEB and BEIR, which makes cross-benchmark evaluation straightforward.
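To make the shared schema concrete, here is a minimal sketch of the corpus/queries/qrels layout used by BEIR-style benchmarks, together with a toy nDCG@k computation. The documents, ids, token-overlap scorer, and helper names (`tokenize`, `retrieve`, `ndcg_at_k`) are illustrative assumptions. None of them come from the CoIR package, whose actual API is not shown here.

```python
import math
import re

# BEIR-style layout: a corpus, a set of queries, and relevance judgments (qrels).
# All ids and texts below are made up for illustration.
corpus = {
    "d1": {"title": "", "text": "def add(a, b): return a + b"},
    "d2": {"title": "", "text": "def read_file(path): return open(path).read()"},
    "d3": {"title": "", "text": "def reverse(s): return s[::-1]"},
}
queries = {
    "q1": "add two numbers a b",
    "q2": "reverse a string s",
}
qrels = {
    "q1": {"d1": 1},  # query id -> {document id: relevance grade}
    "q2": {"d3": 1},
}


def tokenize(text):
    """Crude tokenizer: lowercase, then keep alphanumeric/underscore runs."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(query, docs, k=2):
    """Rank documents by token overlap with the query (a stand-in for a real model)."""
    q_tokens = tokenize(query)
    scores = {doc_id: len(q_tokens & tokenize(doc["text"])) for doc_id, doc in docs.items()}
    # Sort by descending score, breaking ties by document id for determinism.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:k]


def ndcg_at_k(ranked, rels, k):
    """nDCG@k with linear gain and a log2 position discount."""
    dcg = sum(rels.get(doc_id, 0) / math.log2(i + 2) for i, doc_id in enumerate(ranked[:k]))
    ideal = sorted(rels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


run = {qid: retrieve(text, corpus, k=2) for qid, text in queries.items()}
mean_ndcg = sum(ndcg_at_k(run[qid], qrels[qid], k=2) for qid in queries) / len(queries)
print(run, mean_ndcg)
```

Because the three dictionaries follow the same shape across benchmarks, a retriever that fills `run` for one of them can be scored on another without changing the evaluation code.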

Variants: CoIR

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task         Model            Paper                                  Date
Code Search  Voyage-code-002  CoIR: A Comprehensive Benchmark for …  2024-07-03

Research Papers

Recent papers with results on this dataset: