CoIR

Name: CoIR
Published: 2024-07-03
License: Apache-2.0

Code Information Retrieval Benchmark

Overview

The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. It includes 10 curated code datasets covering 8 retrieval tasks across 7 domains, for a total of about two million documents. It also provides a simple Python framework, installable via pip, and shares its data schema with benchmarks such as MTEB and BEIR, which makes cross-benchmark evaluation straightforward.
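As a rough illustration of what a BEIR-style schema looks like in practice, the sketch below lays out a tiny retrieval dataset as three mappings: a corpus, a set of queries, and relevance judgments (qrels). All identifiers and texts are invented for illustration, and the helper `validate_dataset` is hypothetical; it is not part of the CoIR package.

```python
# Toy dataset in a BEIR-style layout: corpus, queries, qrels.
# All ids and texts below are invented for illustration only.
corpus = {
    "d1": {"title": "", "text": "def add(a, b):\n    return a + b"},
    "d2": {"title": "", "text": "def read_file(path):\n    return open(path).read()"},
}
queries = {"q1": "function that sums two numbers"}
# qrels maps query id -> {document id -> relevance label}
qrels = {"q1": {"d1": 1}}


def validate_dataset(corpus, queries, qrels):
    """Hypothetical helper: check that qrels only reference known ids."""
    for qid, judged in qrels.items():
        if qid not in queries:
            raise KeyError(f"unknown query id: {qid}")
        for doc_id in judged:
            if doc_id not in corpus:
                raise KeyError(f"unknown document id: {doc_id}")
    return True
```

Because the three mappings are plain dictionaries keyed by id, a dataset in this shape can in principle be passed to any evaluation tool that expects the same layout.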

Variants: CoIR

Associated Benchmarks

This dataset is used in 1 benchmark:

Code Search - Metrics: nDCG@10
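For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@10 can be computed for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount; some implementations use exponential gain instead, so this is not necessarily the exact formula behind the leaderboard numbers. The function names are illustrative.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the system returned them.
    relevance: mapping of document id -> graded relevance label.
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

A benchmark score is then typically the mean of this per-query value over all queries in a task.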

Recent Benchmark Submissions

Task	Model	Paper	Date
Code Search	Voyage-code-002	CoIR: A Comprehensive Benchmark for …	2024-07-03

Research Papers

Recent papers with results on this dataset:

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models (2024)
