Yelp-Fraud

Name: Yelp-Fraud
Published: 2020-08-19
License: Apache-2.0

Multi-relational Graph Dataset for Yelp Spam Review Detection

Dataset Information

Modalities: Graphs
Languages: English
Introduced: 2020
License: Apache-2.0

Overview

Yelp-Fraud is a multi-relational graph dataset built on the Yelp spam review dataset. It can be used to evaluate graph-based node classification, fraud detection, and anomaly detection models.

Dataset Statistics

# Nodes	% Fraud Nodes (Class=1)
45,954	14.5

Relation	# Edges
R-U-R	49,315
R-T-R	573,616
R-S-R	3,402,743
All	3,846,979
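
As a quick sanity check on the tables above, the sketch below estimates the number of fraud nodes and compares the per-relation edge counts with the "All" row. The figures are copied from the tables. Reading the gap between the sum and "All" as edges shared by more than one relation is an assumption, not something stated here.

```python
# Figures copied from the Dataset Statistics tables.
num_nodes = 45_954
fraud_pct = 14.5  # percentage of nodes with Class=1 (rounded in the table)

edges = {"R-U-R": 49_315, "R-T-R": 573_616, "R-S-R": 3_402_743}
all_edges = 3_846_979

# The percentage is rounded, so this is only an estimate of the fraud-node count.
approx_fraud_nodes = round(num_nodes * fraud_pct / 100)

# Sum of the per-relation counts versus the "All" row.
relation_sum = sum(edges.values())

# Assumed interpretation: edges that appear in more than one relation
# are counted once in "All", which would explain the difference.
overlap = relation_sum - all_edges

print(approx_fraud_nodes, relation_sum, overlap)
```

The per-relation counts add up to more than the "All" row, so the three relations should not be treated as disjoint edge sets when merging them.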

Graph Construction

The Yelp spam review dataset includes hotel and restaurant reviews that Yelp has either filtered (spam) or recommended (legitimate). We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Previous studies show that opinion fraudsters are connected through users, products, review text, and time, so we take reviews as the nodes of the graph and design three relations:

1) R-U-R: connects reviews posted by the same user;
2) R-S-R: connects reviews under the same product with the same star rating (1-5 stars);
3) R-T-R: connects two reviews under the same product posted in the same month.
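
The three relations described above can be illustrated on a handful of made-up reviews. This is a minimal sketch, not the authors' construction code: the toy records, the `build_relation` helper, and the choice of undirected edges without self-loops are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy reviews: (review_id, user, product, stars, month).
reviews = [
    (0, "u1", "p1", 5, "2012-03"),
    (1, "u1", "p2", 1, "2012-04"),
    (2, "u2", "p1", 5, "2012-03"),
    (3, "u3", "p1", 2, "2012-03"),
    (4, "u2", "p2", 1, "2012-07"),
]


def build_relation(reviews, key):
    """Connect every pair of reviews that share the same grouping key."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review[0])
    edges = set()
    for members in groups.values():
        # Undirected edges stored as (smaller_id, larger_id); no self-loops.
        edges.update(combinations(sorted(members), 2))
    return edges


relations = {
    # Same user.
    "R-U-R": build_relation(reviews, lambda r: r[1]),
    # Same product and same star rating.
    "R-S-R": build_relation(reviews, lambda r: (r[2], r[3])),
    # Same product and same month.
    "R-T-R": build_relation(reviews, lambda r: (r[2], r[4])),
}

# Merging the relations keeps each review pair once, even if several relations link it.
merged = set().union(*relations.values())

for name, edge_set in relations.items():
    print(name, sorted(edge_set))
print("merged", len(merged))
```

In this toy example reviews 0 and 2 are linked by both R-S-R and R-T-R, so the merged edge set is smaller than the sum of the three relation sizes.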

To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

Variants: Yelp-Fraud

Associated Benchmarks

This dataset is used in 2 benchmarks:

Fraud Detection - Metrics: AUC-ROC, Average Precision, F1 Macro, G-mean
Node Classification - Metrics: AUC-ROC
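
For readers unfamiliar with the metrics listed above, the following pure-Python sketch computes them for a binary task using their common textbook definitions. Those definitions are assumed here; individual benchmark submissions may compute them differently, for example through a library or with a different decision threshold. The labels, scores, and 0.5 threshold are made up for illustration.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the fraud class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_macro(y_true, y_pred):
    """Unweighted mean of the F1 scores of the two classes."""
    tp, tn, fp, fn = confusion(y_true, y_pred)

    def f1(hit, false_pos, false_neg):
        denom = 2 * hit + false_pos + false_neg
        return 2 * hit / denom if denom else 0.0

    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2


def g_mean(y_true, y_pred):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (no tie handling)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Made-up labels and scores; 1 marks a fraud review.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(auc_roc(y_true, scores), average_precision(y_true, scores))
print(f1_macro(y_true, y_pred), g_mean(y_true, y_pred))
```

AUC-ROC and Average Precision are computed from the raw scores, while F1 Macro and G-mean depend on the chosen threshold. With only 14.5% fraud nodes, the threshold-dependent metrics are sensitive to how that threshold is set.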

Recent Benchmark Submissions

Task	Model	Paper	Date
Fraud Detection	GTAN	Semi-supervised Credit Card Fraud Detection …	2024-12-24
Node Classification	GTAN	Semi-supervised Credit Card Fraud Detection …	2024-12-24
Fraud Detection	JA-GNN	Financial Fraud Detection using Jump-Attentive …	2024-11-07
Fraud Detection	BOLT-GRAPH	BOLT: An Automated Deep Learning …	2023-03-30
Node Classification	BOLT-GRAPH	BOLT: An Automated Deep Learning …	2023-03-30
Node Classification	RioGNN	Reinforced Neighborhood Selection Guided Multi-Relational …	2021-04-16
Fraud Detection	RioGNN	Reinforced Neighborhood Selection Guided Multi-Relational …	2021-04-16
Fraud Detection	GAT+JK	New Benchmarks for Learning on …	2021-04-03
Node Classification	GAT+JK	New Benchmarks for Learning on …	2021-04-03
Node Classification	CARE-GNN	Enhancing Graph Neural Network-based Fraud …	2020-08-19
Fraud Detection	CARE-GNN	Enhancing Graph Neural Network-based Fraud …	2020-08-19
