WDC-PAVE

Web Data Commons - Product Attribute Value Extraction

Dataset Information
Modalities: Texts
Languages: English
Introduced: 2024
License: Unknown

Overview

The dataset contains 1,420 human-annotated product offers, systematically selected from the Web Data Commons Product Matching Corpus, with 24,582 annotated attribute-value pairs. This makes it a useful resource for both product attribute-value extraction and product matching.
The normalized gold standard contains the standardized attribute-value pairs.
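
To make the structure concrete, the sketch below models a single annotated offer as a Python dictionary and counts its attribute-value pairs. The field names (`title`, `attributes`), the helper `count_pairs`, and the example offer are hypothetical illustrations, not the dataset's actual schema or contents. Only the totals of 1,420 offers and 24,582 pairs come from the description above.

```python
# Hypothetical record layout; the real WDC-PAVE files may use different field names.
example_offer = {
    "title": "Acme 500GB portable hard drive USB 3.0",  # made-up offer text
    "attributes": {  # attribute -> annotated value (illustrative only)
        "Brand": "Acme",
        "Capacity": "500GB",
        "Interface": "USB 3.0",
    },
}


def count_pairs(offer):
    """Return the number of annotated attribute-value pairs in one offer."""
    return len(offer["attributes"])


# Totals stated in the dataset description.
NUM_OFFERS = 1420
NUM_PAIRS = 24582

# Average number of annotated pairs per offer implied by those totals.
avg_pairs_per_offer = NUM_PAIRS / NUM_OFFERS

print(count_pairs(example_offer))      # 3
print(round(avg_pairs_per_offer, 1))   # 17.3
```

Under these assumptions, the stated totals imply roughly 17 annotated attribute-value pairs per offer on average.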

Variants: WDC-PAVE

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task                        Model                                          Paper                            Date
Attribute Value Extraction  GPT-4_10_example_values_&_10_demonstrations    Using LLMs for the Extraction …  2024-03-04
Attribute Value Extraction  GPT-3.5_10_example_values_&_10_demonstrations  Using LLMs for the Extraction …  2024-03-04
Attribute Value Extraction  AVEQA                                          Using LLMs for the Extraction …  2024-03-04
Attribute Value Extraction  MAVEQA                                         Using LLMs for the Extraction …  2024-03-04
Attribute Value Extraction  SU-OpenTag                                     Using LLMs for the Extraction …  2024-03-04
