Abt-Buy

Dataset Information
Modalities: Tabular
Languages: English
Introduced: 2010

Overview

The Abt-Buy dataset for entity resolution is derived from the online retailers Abt.com and Buy.com. It contains 1081 entities from Abt.com and 1092 entities from Buy.com, together with a gold standard (perfect mapping) of 1097 matching record pairs between the two data sources. The two sources share three attributes: product name, product description, and product price.
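
As a rough illustration of how a matcher is scored against such a gold standard, the sketch below compares predicted record pairs with gold pairs and reports precision, recall, and F1. The toy records, the token-based Jaccard matcher, and the threshold are assumptions made for illustration only; they are not part of the dataset or of any published method. Only the entity and match counts come from the description above.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (abt_id, buy_id)

# Counts taken from the dataset description.
N_ABT, N_BUY, N_MATCHES = 1081, 1092, 1097
# Size of the full cross product of candidate pairs.
N_CANDIDATES = N_ABT * N_BUY


def tokens(text: str) -> Set[str]:
    """Lower-case whitespace tokenisation of a product name."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(abt: Dict[int, str], buy: Dict[int, str], threshold: float) -> Set[Pair]:
    """Naive matcher: compare every Abt name with every Buy name."""
    return {
        (i, j)
        for i, name_a in abt.items()
        for j, name_b in buy.items()
        if jaccard(name_a, name_b) >= threshold
    }


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Pair-level precision, recall, and F1 against the gold mapping."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy records standing in for the real product names.
abt = {1: "Sony Bravia 40 LCD TV", 2: "Canon PowerShot Digital Camera"}
buy = {
    10: "Sony Bravia 40 inch LCD TV",
    11: "Canon PowerShot Camera Black",
    12: "Apple iPod Nano",
}
gold = {(1, 10), (2, 11)}

predicted = match(abt, buy, threshold=0.5)
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

With 1081 and 1092 entities, the full cross product holds 1,180,452 candidate pairs, of which only 1097 are matches. This imbalance is one reason pair-level F1, rather than accuracy, is a natural metric for this kind of task.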

The dataset was initially published in the repository of the Database Group of the University of Leipzig:
https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution

To make results reproducible and the performance of different matchers on the Abt-Buy matching task comparable, the dataset was split into fixed train, validation, and test sets.
The fixed splits are provided in the CompERBench repository:

http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
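
For comparisons with published results, the split files from the repository above should be used as provided. The sketch below only illustrates why a fixed assignment is reproducible: each labelled pair is mapped to a split by hashing its identifiers, so the same pair always lands in the same set. The hash-based scheme, the 60/20/20 proportions, and the identifier format are assumptions for illustration, not a description of how the published splits were produced.

```python
import hashlib
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (abt_id, buy_id); the ID format is hypothetical


def assign_split(pair: Pair, train_pct: int = 60, valid_pct: int = 20) -> str:
    """Deterministically map a record pair to 'train', 'valid', or 'test'.

    The assignment depends only on the pair's identifiers, so it does not
    change between runs or machines.
    """
    key = f"{pair[0]}|{pair[1]}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


def split_pairs(pairs: List[Pair]) -> Dict[str, List[Pair]]:
    """Group pairs by their deterministic split assignment."""
    splits: Dict[str, List[Pair]] = {"train": [], "valid": [], "test": []}
    for pair in pairs:
        splits[assign_split(pair)].append(pair)
    return splits


# Hypothetical labelled pairs standing in for the benchmark's pair files.
pairs = [(f"abt_{i}", f"buy_{i}") for i in range(50)]
splits = split_pairs(pairs)
```

Because the assignment is a pure function of the pair, two matchers evaluated on `splits["test"]` see exactly the same pairs, which is the property a fixed split is meant to guarantee.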

Variants: Abt-Buy

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Entity Resolution | Meta-Llama-3.1-8B-Instruct_fine_tuned | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | Meta-Llama-3.1-8B-Instruct | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | gpt-4o-mini-2024-07-18_fine_tuned | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | gpt-4o-2024-08-06 | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | Meta-Llama-3.1-70B-Instruct | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | gpt-4o-mini-2024-07-18 | Fine-tuning Large Language Models for … | 2024-09-12
Entity Resolution | gpt4-0613_zeroshot | Entity Matching using Large Language … | 2023-10-17
Entity Resolution | RoBERTa-SupCon | Supervised Contrastive Learning for Product … | 2022-02-04
Entity Resolution | Ditto | Deep Entity Matching with Pre-Trained … | 2020-04-01

Research Papers

Recent papers with results on this dataset: