Amazon-Fraud

Name: Amazon-Fraud
Published: 2020-08-19
License: Apache-2.0

Multi-relational Graph Dataset for Amazon Fraudulent Account Detection

Dataset Information

Modalities: Graphs
Languages: English
Introduced: 2020
License: Apache-2.0
Homepage: Official Website

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset. It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

Dataset Statistics

# Nodes	% Fraud Nodes (Class=1)
11,944	9.5

Relation	# Edges
U-P-U	175,608
U-S-U	3,566,479
U-V-U	1,036,737
All	4,398,392
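
The per-relation counts above add up to more than the "All" row. A quick arithmetic check is sketched below. The interpretation in the comments (that "All" counts an edge once even when it appears under several relations) is an assumption and is not stated on this page; the variable names are illustrative only.

```python
# Edge counts copied from the table above.
edges_per_relation = {
    "U-P-U": 175_608,
    "U-S-U": 3_566_479,
    "U-V-U": 1_036_737,
}
all_edges = 4_398_392

# Sum of the three relation-specific counts.
per_relation_total = sum(edges_per_relation.values())

# Assumption: the gap corresponds to edges counted under more than one
# relation, which a merged ("All") graph would store only once.
overlap = per_relation_total - all_edges

print(per_relation_total)  # 4778824
print(overlap)             # 380432
```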

Graph Construction

The Amazon dataset includes product reviews under the Musical Instruments category. Following a prior paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from a prior paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within one week; 3) U-V-U: it connects users with top 5% mutual review text similarities (measured by TF-IDF) among all users.
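
The labeling rule and the U-P-U relation described above can be illustrated with a minimal sketch. This is not the dataset authors' code: the function names `label_user` and `build_upu_edges`, the toy review records, and the handling of users with zero votes are all assumptions made for illustration. The U-S-U and U-V-U relations are not covered here.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 0 (benign), 1 (fraudulent), or None (left unlabeled)."""
    if total_votes == 0:
        return None  # assumption: users without votes get no label
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0  # more than 80% helpful votes: benign
    if ratio < 0.2:
        return 1  # less than 20% helpful votes: fraudulent
    return None  # in between: not covered by the labeling rule


def build_upu_edges(reviews):
    """Connect users who reviewed at least one same product.

    reviews: iterable of (user_id, product_id) pairs (hypothetical format).
    Returns a set of undirected edges stored as sorted (u, v) tuples.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        # Every pair of users sharing this product gets one edge.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Toy data, invented for this example.
toy_reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
toy_edges = build_upu_edges(toy_reviews)
print(sorted(toy_edges))  # [('u1', 'u2'), ('u1', 'u3')]
```

Storing each edge as a sorted tuple in a set keeps the graph undirected and avoids duplicate edges when two users share several products.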

To download the dataset, please visit the GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

Variants: Amazon-Fraud

Associated Benchmarks

This dataset is used in 2 benchmarks:

Fraud Detection - Metrics: AUC-ROC, Average Precision, F1 Macro, G-mean
Node Classification - Metrics: AUC-ROC
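
Two of the metrics listed above, F1 Macro and G-mean, are less common than AUC-ROC, so a small sketch of how they are usually computed for a binary task may help. This page does not define them; the code assumes the standard definitions (G-mean as the geometric mean of the true positive rate and true negative rate, F1 Macro as the unweighted mean of per-class F1). The helper names and the toy labels are hypothetical.

```python
import math


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of the F1 scores of class 0 and class 1."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


def g_mean(y_true, y_pred):
    """Geometric mean of true positive rate and true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Toy labels (1 = fraud, 0 = benign), invented for this example.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]

print(f1_macro(y_true, y_pred))          # 0.625
print(round(g_mean(y_true, y_pred), 4))  # 0.6124
```

Both metrics weight the minority (fraud) class as heavily as the majority class, which is why they tend to be reported alongside AUC-ROC on imbalanced datasets such as this one.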

Recent Benchmark Submissions

Task	Model	Paper	Date
Fraud Detection	GTAN	Semi-supervised Credit Card Fraud Detection …	2024-12-24
Node Classification	GTAN	Semi-supervised Credit Card Fraud Detection …	2024-12-24
Fraud Detection	RioGNN	Reinforced Neighborhood Selection Guided Multi-Relational …	2021-04-16
Node Classification	RioGNN	Reinforced Neighborhood Selection Guided Multi-Relational …	2021-04-16
Fraud Detection	CARE-GNN	Enhancing Graph Neural Network-based Fraud …	2020-08-19
Node Classification	CARE-GNN	Enhancing Graph Neural Network-based Fraud …	2020-08-19

Research Papers

Recent papers with results on this dataset:
