Electronics

Dataset Information
Modalities: Graphs
Languages: English
Introduced: 2022
License

Overview

This data was collected by performing a breadth-first search on the user-product-review graph until termination, so it is a fairly comprehensive collection of English-language product data. We split the full dataset into top-level categories (e.g., Books, Movies, Music). We do this mainly for practical reasons: it allows each model and dataset to fit in memory on a single machine (our largest experiment requires around 64 GB of RAM and 2-3 days to run). Note that splitting the data in this way has little impact on performance, because few links cross top-level categories and the hierarchical nature of our model means that few parameters are shared across categories.
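The breadth-first crawl described above can be sketched as follows. This is a minimal illustration, not the authors' crawler: the function name `bfs_crawl`, the `neighbors` callable, and the toy graph with its node IDs are all assumptions made up for the example.

```python
from collections import deque


def bfs_crawl(seed, neighbors):
    """Return every node reachable from `seed`, in breadth-first order.

    `neighbors` is a hypothetical callable returning the nodes adjacent to
    a node (the products a user reviewed, or the users who reviewed a product).
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    # "Until termination": stop only when the frontier is empty.
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Toy user-product-review graph (invented IDs, for illustration only).
TOY_GRAPH = {
    "user:a": ["product:1", "product:2"],
    "user:b": ["product:2", "product:3"],
    "product:1": ["user:a"],
    "product:2": ["user:a", "user:b"],
    "product:3": ["user:b"],
    "user:c": ["product:9"],  # separate component, unreachable from user:a
    "product:9": ["user:c"],
}

crawled = bfs_crawl("user:a", lambda n: TOY_GRAPH.get(n, []))
```

A crawl of this kind only covers the connected component containing the seed, which is why the resulting collection is described as fairly comprehensive rather than complete.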

To obtain ground truth for pairs of substitutable and complementary products, we also crawl four types of graphs from Amazon:

  1. 'Users who viewed x also viewed y'; 91M edges.

  2. 'Users who viewed x eventually bought y'; 8.18M edges.

  3. 'Users who bought x also bought y'; 133M edges.

  4. 'Users frequently bought x and y together'; 4.6M edges.

We refer to edges of types 1 and 2 as substitutes and edges of types 3 and 4 as complements, though our experiments focus on 'also viewed' and 'also bought' links, since these form the vast majority of the dataset. Note the minor differences between certain edge types: for example, edges of type 4 indicate that two items were purchased as part of a single basket, rather than across sessions.
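As a rough sketch, the mapping from the four edge types to substitute and complement labels could be encoded as below. The names `EdgeType`, `RELATION`, `label_edges`, and `EDGE_COUNTS_MILLIONS` are hypothetical, as are the item IDs; only the type numbering and the edge counts come from the list above.

```python
from enum import Enum


class EdgeType(Enum):
    ALSO_VIEWED = 1         # 'Users who viewed x also viewed y'
    VIEWED_THEN_BOUGHT = 2  # 'Users who viewed x eventually bought y'
    ALSO_BOUGHT = 3         # 'Users who bought x also bought y'
    BOUGHT_TOGETHER = 4     # 'Users frequently bought x and y together'


# Types 1 and 2 are treated as substitutes, types 3 and 4 as complements.
RELATION = {
    EdgeType.ALSO_VIEWED: "substitute",
    EdgeType.VIEWED_THEN_BOUGHT: "substitute",
    EdgeType.ALSO_BOUGHT: "complement",
    EdgeType.BOUGHT_TOGETHER: "complement",
}

# Edge counts in millions, as reported in the list above.
EDGE_COUNTS_MILLIONS = {
    EdgeType.ALSO_VIEWED: 91.0,
    EdgeType.VIEWED_THEN_BOUGHT: 8.18,
    EdgeType.ALSO_BOUGHT: 133.0,
    EdgeType.BOUGHT_TOGETHER: 4.6,
}


def label_edges(edges):
    """Attach a substitute/complement label to (x, y, edge_type) triples."""
    return [(x, y, RELATION[edge_type]) for x, y, edge_type in edges]


# Invented item IDs, for illustration only.
toy_edges = [
    ("item_a", "item_b", EdgeType.ALSO_VIEWED),
    ("item_a", "item_c", EdgeType.BOUGHT_TOGETHER),
]
labeled = label_edges(toy_edges)

# Share of all edges covered by the two link types used in the experiments.
focus = (EDGE_COUNTS_MILLIONS[EdgeType.ALSO_VIEWED]
         + EDGE_COUNTS_MILLIONS[EdgeType.ALSO_BOUGHT])
focus_share = focus / sum(EDGE_COUNTS_MILLIONS.values())
```

By the counts listed above, 'also viewed' and 'also bought' links together account for about 94.6% of all edges, which is consistent with the choice to focus on them.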

Variants: Electronics

Associated Benchmarks

This dataset is used in 1 benchmark:
