DaReCzech

Dataset for text relevance ranking in Czech

Dataset Information
Modalities: Texts
Languages: Czech
Introduced: 2021
License: Unknown

Overview


DaReCzech is a dataset for text relevance ranking in Czech. It consists of more than 1.6M annotated query-document pairs, which makes it one of the largest available datasets for this task.
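To make "relevance ranking over annotated query-document pairs" concrete, here is a minimal Python sketch of the general task: pairs are grouped by query, each group is sorted by a model's relevance score, and the ranking is checked with precision@k against the annotated labels. Everything in it is a hypothetical illustration. The tuple layout, the example queries and document IDs, the label values, the model scores, and the 0.5 relevance threshold are assumptions, not the dataset's actual file schema or its official evaluation protocol.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (query, document_id, relevance_label, model_score).
# The real DaReCzech files may use different fields and label scales.
Pair = Tuple[str, str, float, float]


def rank_by_query(pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Group query-document pairs by query and sort each group by model score, highest first."""
    groups: Dict[str, List[Pair]] = defaultdict(list)
    for pair in pairs:
        groups[pair[0]].append(pair)
    return {query: sorted(group, key=lambda p: p[3], reverse=True)
            for query, group in groups.items()}


def precision_at_k(ranked: List[Pair], k: int, threshold: float = 0.5) -> float:
    """Share of the top-k positions holding a document labeled at least `threshold`.

    The threshold that turns graded labels into relevant/non-relevant is a
    hypothetical choice for this sketch.
    """
    top = ranked[:k]
    return sum(1 for p in top if p[2] >= threshold) / k


# Hypothetical example data: labels and scores are made up for illustration.
example_pairs: List[Pair] = [
    ("praha počasí", "doc_a", 1.0, 0.91),
    ("praha počasí", "doc_b", 0.0, 0.75),
    ("praha počasí", "doc_c", 0.66, 0.40),
    ("vlak brno", "doc_d", 0.33, 0.88),
]

ranked = rank_by_query(example_pairs)
print([p[1] for p in ranked["praha počasí"]])      # ['doc_a', 'doc_b', 'doc_c']
print(precision_at_k(ranked["praha počasí"], k=2))  # 0.5
```

Dividing by `k` follows the usual definition of precision@k, so a query with fewer than `k` candidate documents is penalized. An evaluation on the real data would need to follow whatever convention the dataset's authors specify.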

Obtaining the Annotated Data

Please first read the disclaimer, which contains the terms of use. If you agree to them, send an email to [email protected], and a link to the dataset will be sent to you.

Variants: DaReCzech

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task              Model                                    Paper                                Date
Document Ranking  Query-doc RobeCzech (Roberta-base)       Siamese BERT-based Model for Web …   2021-12-03
Document Ranking  Query-doc Small-E-Czech (Electra-small)  Siamese BERT-based Model for Web …   2021-12-03
Document Ranking  Siamese Small-E-Czech (Electra-small)    Siamese BERT-based Model for Web …   2021-12-03

Research Papers

Recent papers with results on this dataset: