Yelp-Fraud

Multi-relational Graph Dataset for Yelp Spam Review Detection

Dataset Information
Modalities: Graphs
Languages: English
Introduced: 2020

Overview

Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

  • Dataset Statistics

# Nodes    % Fraud Nodes (Class=1)
45,954     14.5

Relation   # Edges
R-U-R      49,315
R-T-R      573,616
R-S-R      3,402,743
All        3,846,979

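
The statistics above can be turned into a few derived quantities. The sketch below is only arithmetic on the published numbers; the variable names are illustrative. The fraud-node count is approximate because the percentage is rounded, and reading the gap between the per-relation sum and the "All" count as review pairs linked by more than one relation is an assumption, not something this page states.

```python
# Published statistics for Yelp-Fraud (copied from the table above).
num_nodes = 45_954
fraud_pct = 14.5  # rounded percentage, so derived counts are approximate
edges = {"R-U-R": 49_315, "R-T-R": 573_616, "R-S-R": 3_402_743}
all_edges = 3_846_979

# Approximate number of fraud (Class=1) nodes.
approx_fraud_nodes = round(num_nodes * fraud_pct / 100)

# Sum of the per-relation edge counts versus the reported "All" count.
relation_sum = sum(edges.values())
gap = relation_sum - all_edges  # ASSUMPTION: pairs linked by several relations

# Rough negative-to-positive ratio, useful as a class weight for imbalance.
neg_to_pos_ratio = (100 - fraud_pct) / fraud_pct

print(approx_fraud_nodes, relation_sum, gap, round(neg_to_pos_ratio, 2))
```

The ratio of roughly 5.9 legitimate reviews per fraud review is one reason models on this dataset are usually evaluated with imbalance-aware metrics rather than plain accuracy.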
  • Graph Construction

The Yelp spam review dataset includes hotel and restaurant reviews that Yelp has either filtered (spam) or recommended (legitimate). Spam review detection on Yelp-Fraud is therefore a binary classification task. We take the 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Previous studies show that opinion fraudsters are connected through users, products, review text, and time, so we take reviews as nodes in the graph and design three relations:
1) R-U-R: connects reviews posted by the same user;
2) R-S-R: connects reviews under the same product with the same star rating (1-5 stars);
3) R-T-R: connects two reviews under the same product posted in the same month.
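
The three relations can be sketched as a grouping step followed by pairwise linking. This is a minimal illustration on made-up review records, not the authors' construction code: the record layout, the `build_relation` helper, and the sample values are all hypothetical, and edges are treated as undirected with no self-loops.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical review records: (review_id, user, product, stars, "YYYY-MM").
reviews = [
    (0, "u1", "p1", 5, "2012-03"),
    (1, "u1", "p2", 1, "2012-03"),
    (2, "u2", "p1", 5, "2012-04"),
    (3, "u3", "p1", 4, "2012-03"),
    (4, "u3", "p2", 1, "2012-07"),
]


def build_relation(reviews, key):
    """Connect every pair of reviews that share the same grouping key."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review[0])
    edges = set()
    for ids in groups.values():
        # Each unordered pair within a group becomes one undirected edge.
        edges.update(combinations(sorted(ids), 2))
    return edges


relations = {
    # Same user.
    "R-U-R": build_relation(reviews, lambda r: r[1]),
    # Same product and same star rating.
    "R-S-R": build_relation(reviews, lambda r: (r[2], r[3])),
    # Same product and same posting month.
    "R-T-R": build_relation(reviews, lambda r: (r[2], r[4])),
}

for name, edge_set in relations.items():
    print(name, sorted(edge_set))
```

Grouping first keeps the pairing local to each user or product bucket, which matters at full scale because the R-S-R relation alone contains millions of edges.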

To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

Variants: Yelp-Fraud

Associated Benchmarks

This dataset is used in 2 benchmarks: Fraud Detection and Node Classification.

Recent Benchmark Submissions

Task                 Model       Paper                                                            Date
Fraud Detection      GTAN        Semi-supervised Credit Card Fraud Detection …                    2024-12-24
Node Classification  GTAN        Semi-supervised Credit Card Fraud Detection …                    2024-12-24
Fraud Detection      JA-GNN      Financial Fraud Detection using Jump-Attentive …                 2024-11-07
Fraud Detection      BOLT-GRAPH  BOLT: An Automated Deep Learning …                               2023-03-30
Node Classification  BOLT-GRAPH  BOLT: An Automated Deep Learning …                               2023-03-30
Node Classification  RioGNN      Reinforced Neighborhood Selection Guided Multi-Relational …      2021-04-16
Fraud Detection      RioGNN      Reinforced Neighborhood Selection Guided Multi-Relational …      2021-04-16
Fraud Detection      GAT+JK      New Benchmarks for Learning on …                                 2021-04-03
Node Classification  GAT+JK      New Benchmarks for Learning on …                                 2021-04-03
Node Classification  CARE-GNN    Enhancing Graph Neural Network-based Fraud …                     2020-08-19
Fraud Detection      CARE-GNN    Enhancing Graph Neural Network-based Fraud …                     2020-08-19

Research Papers

Recent papers with results on this dataset: