Amazon-Google

Dataset Information
Modalities: Tabular
Languages: English
Introduced: 2010

Overview

The Amazon-Google dataset for entity resolution is derived from the online retailer Amazon.com and from Google's product search service, accessed through the Google Base Data API. It contains 1363 entities from Amazon.com and 3226 Google products, together with a gold standard (perfect mapping) of 1300 matching record pairs between the two data sources. The attributes common to both sources are product name, product description, manufacturer, and price.
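
To give a sense of scale, the counts above can be turned into the size of the full comparison space. The sketch below is illustrative only: the `Product` record type, the `name_jaccard` helper and the two sample records are hypothetical and are not part of the dataset or its tooling. Only the three counts are taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Counts quoted in the overview above.
N_AMAZON = 1363
N_GOOGLE = 3226
N_MATCHES = 1300


@dataclass
class Product:
    """Hypothetical record type holding the four common attributes."""
    name: str
    description: str
    manufacturer: str
    price: Optional[float]  # assumed optional; a price may be absent


def name_jaccard(a: Product, b: Product) -> float:
    """Token-level Jaccard similarity of two product names (a toy baseline)."""
    tokens_a = set(a.name.lower().split())
    tokens_b = set(b.name.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Every Amazon record could in principle be compared with every Google record.
candidate_pairs = N_AMAZON * N_GOOGLE
# Share of those pairs that the gold standard marks as matches.
match_ratio = N_MATCHES / candidate_pairs

# Made-up records, for illustration only.
amazon_item = Product("Adobe Photoshop CS3", "", "adobe", 649.0)
google_item = Product("adobe photoshop cs3 upgrade", "", "", 199.0)
similarity = name_jaccard(amazon_item, google_item)

print(f"candidate pairs: {candidate_pairs}")
print(f"match ratio: {match_ratio:.6f}")
print(f"name similarity: {similarity:.2f}")
```

The 1300 gold-standard matches are a very small fraction of the 1363 × 3226 = 4,397,038 possible pairs, which is why matchers on this task typically have to cope with heavy class imbalance.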

The dataset was initially published in the repository of the Database Group of the University of Leipzig: https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolution

To make results reproducible and the performance of different matchers comparable on the Amazon-Google matching task, the dataset was split into fixed train, validation and test sets. The fixed splits are provided in the CompERBench repository:

http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
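
A fixed test set is useful because every matcher can be scored against the same gold pairs. The sketch below shows one common way to do that, with precision, recall and F1 computed over sets of predicted and gold matching pairs. It assumes pairs are represented as (Amazon id, Google id) tuples; the identifiers and the `precision_recall_f1` helper are hypothetical, and the repository's own file format and evaluation protocol may differ.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (amazon_id, google_id); hypothetical identifiers


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Score predicted matching pairs against gold-standard matching pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: three gold matches, three predictions, two of them correct.
gold_pairs = {("a1", "g1"), ("a2", "g2"), ("a3", "g3")}
predicted_pairs = {("a1", "g1"), ("a2", "g9"), ("a3", "g3")}

precision, recall, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring on the shared test split, rather than on a split chosen per paper, is what makes numbers from different matchers directly comparable.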

Variants: Amazon-Google

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task               Model                                  Paper                                          Date
Entity Resolution  gpt-4o-2024-08-06                      Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  gpt-4o-mini-2024-07-18                 Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  gpt-4o-mini-2024-07-18_fine_tuned      Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  Meta-Llama-3.1-70B-Instruct            Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  Meta-Llama-3.1-8B-Instruct_fine_tuned  Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  Meta-Llama-3.1-8B-Instruct             Fine-tuning Large Language Models for …        2024-09-12
Entity Resolution  gpt4-0613_fewshot-10                   Entity Matching using Large Language …         2023-10-17
Entity Resolution  text-davinci-002_fewshot-10            Can Foundation Models Wrangle Your …           2022-05-20
Entity Resolution  text-davinci-002_zeroshot              Can Foundation Models Wrangle Your …           2022-05-20
Entity Resolution  RoBERTa-SupCon                         Supervised Contrastive Learning for Product …  2022-02-04
Entity Resolution  CorDEL-Sum                             CorDEL: A Contrastive Deep Learning …          2020-09-15
Entity Resolution  Ditto                                  Deep Entity Matching with Pre-Trained …        2020-04-01

Research Papers

Recent papers with results on this dataset: