Multi-relational Graph Dataset for Amazon Fraudulent Account Detection
Amazon-Fraud is a multi-relational graph dataset built on the Amazon review dataset. It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.
# Nodes | %Fraud Nodes (Class=1) |
---|---|
11,944 | 9.5 |
Relation | # Edges |
---|---|
U-P-U | 175,608 |
U-S-U | 3,566,479 |
U-V-U | 1,036,737 |
All | 4,398,392 |
The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users who reviewed at least one product in common; 2) U-S-U: it connects users who gave at least one identical star rating within one week; 3) U-V-U: it connects users whose mutual review text similarity (measured by TF-IDF) is in the top 5% among all users.
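The labeling rule and the U-P-U relation described above can be illustrated with a minimal Python sketch. This is not the authors' preprocessing code: the function names `label_user` and `build_upu_edges`, the toy review tuples, and the choice to leave users between the two thresholds (or with no votes) unlabeled are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes, low=0.2, high=0.8):
    """Return 1 (fraud), 0 (benign), or None (unlabeled) from the helpful-vote ratio.

    Thresholds follow the description: > 80% helpful is benign, < 20% is fraudulent.
    Assumption: users in between, or with no votes at all, are left unlabeled.
    """
    if total_votes == 0:
        return None
    ratio = helpful_votes / total_votes
    if ratio > high:
        return 0
    if ratio < low:
        return 1
    return None


def build_upu_edges(reviews):
    """Build undirected U-P-U edges from (user_id, product_id) pairs.

    Two users are connected if they reviewed at least one product in common.
    Each edge is stored once as a sorted (u, v) tuple.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Hypothetical toy data, not taken from the real dataset.
toy_reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
toy_edges = build_upu_edges(toy_reviews)
toy_labels = [label_user(9, 10), label_user(1, 10), label_user(5, 10)]
```

In this toy example `u1` shares product `p1` with `u2` and product `p2` with `u3`, so two edges are produced, while `u2` and `u3` stay unconnected. Grouping users by product first avoids comparing every pair of users directly, although popular products still generate a number of edges quadratic in their reviewer count.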
To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.
Variants: Amazon-Fraud
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Fraud Detection | GTAN | Semi-supervised Credit Card Fraud Detection … | 2024-12-24 |
Node Classification | GTAN | Semi-supervised Credit Card Fraud Detection … | 2024-12-24 |
Fraud Detection | RioGNN | Reinforced Neighborhood Selection Guided Multi-Relational … | 2021-04-16 |
Node Classification | RioGNN | Reinforced Neighborhood Selection Guided Multi-Relational … | 2021-04-16 |
Fraud Detection | CARE-GNN | Enhancing Graph Neural Network-based Fraud … | 2020-08-19 |
Node Classification | CARE-GNN | Enhancing Graph Neural Network-based Fraud … | 2020-08-19 |