Reddit

Name: Reddit
Published: 2017-01-01
License: Unknown

Dataset Information

Modalities: Graphs
Languages: English
Introduced: 2017
License: Unknown

Overview

The Reddit dataset is a graph dataset built from Reddit posts made in September 2014. Each node is a post, and its label is the community, or “subreddit”, that the post belongs to. Posts from 50 large communities were sampled to build a post-to-post graph in which two posts are connected if the same user commented on both. In total the dataset contains 232,965 posts with an average degree of 492. Posts from the first 20 days are used for training and the remaining days for testing (with 30% used for validation). Node features are built from off-the-shelf 300-dimensional GloVe CommonCrawl word vectors.
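
The construction described above can be illustrated with a minimal sketch. It is not the original preprocessing code: the `(user, post_id)` input format, the function names, the `post_day` mapping, and the toy data are all assumptions made for illustration, and "the first 20 days" is read here as day of the month <= 20. The validation carve-out is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(comments):
    """Connect two posts when at least one user commented on both.

    `comments` is an iterable of (user, post_id) pairs (a hypothetical
    input format). Returns a set of undirected edges stored as sorted
    (post_a, post_b) tuples.
    """
    posts_by_user = defaultdict(set)
    for user, post_id in comments:
        posts_by_user[user].add(post_id)

    edges = set()
    for posts in posts_by_user.values():
        # Every pair of posts sharing this commenter becomes an edge;
        # the set removes duplicates created by multiple shared users.
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges


def average_degree(num_nodes, edges):
    """Average degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / num_nodes


def split_by_day(post_day, last_train_day=20):
    """Split posts into training and held-out sets by day of the month."""
    train = {p for p, day in post_day.items() if day <= last_train_day}
    held_out = set(post_day) - train
    return train, held_out


# Toy data, invented for illustration only.
comments = [
    ("alice", "p1"), ("alice", "p2"), ("alice", "p3"),
    ("bob", "p2"), ("bob", "p3"),
    ("carol", "p4"),
]
post_day = {"p1": 3, "p2": 12, "p3": 25, "p4": 28}

edges = build_post_graph(comments)
train_posts, held_out_posts = split_by_day(post_day)
```

If the reported average degree is read as 2|E|/|V|, then 232,965 posts with an average degree of 492 imply roughly 232,965 × 492 / 2 ≈ 57 million undirected edges. Note that the pairwise expansion is quadratic in the number of posts per user, so a very active commenter contributes a large share of the edges.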

Source: https://arxiv.org/pdf/1706.02216.pdf
Image Source: https://minimaxir.com/2016/05/reddit-graph/

Variants: The Reddit Ethereum Dataset, The Reddit COVID Dataset, The Reddit Climate Change Dataset, Ten Million Reddit Answers, squadshifts reddit, Six Months of GME on Reddit, Reddit TIFU, Reddit /r/WallStreetBets data for August of 2021, Reddit /r/NoNewNormal dataset, Reddit Norm Violations, Reddit (multi-ref), REDDIT-MULTI-5k, REDDIT-MULTI-12K, Reddit Ideology Database, Reddit Engagement Dataset, Reddit C-SSRS, Reddit cryptocurrency data for August 2021, Reddit Corpus, Reddit Conversation Corpus, REDDIT-BINARY, REDDIT-B, REDDIT-5K, REDDIT-12K, Pushshift Reddit, PolyAI Reddit, One Year of Doge on Reddit, One Million Reddit Questions, One Million Reddit Jokes, One Million Reddit Confessions, Legal Advice Reddit, Five Years of AAPL on Reddit, FigLang 2020 Reddit Dataset, CodeSwitch-Reddit, lmqg/qg_squadshifts, Reddit

Associated Benchmarks

This dataset is used in 1 benchmark:

Node Classification - Metrics: Accuracy, Micro-F1
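
Both metrics can be computed from predicted and true labels. The sketch below assumes each post has exactly one subreddit label, as described in the overview; in that single-label setting every error counts as one false positive and one false negative, so micro-averaged F1 equals accuracy. The function names and the toy labels are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of nodes whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but incorrectly
            fn[t] += 1  # true class t was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    return 2 * total_tp / (2 * total_tp + total_fp + total_fn)


# Toy labels, invented for illustration only.
y_true = ["aww", "science", "aww", "gaming", "science"]
y_pred = ["aww", "aww", "aww", "gaming", "science"]

acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
```

On this toy input both values come out to 0.8, which is why results reported as Accuracy and as Micro-F1 are directly comparable when every node has a single label.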

Recent Benchmark Submissions

Task	Model	Paper	Date
Node Classification	CoFree-GNN	Communication-Free Distributed GNN Training with …	2023-08-06
Node Classification	EnGCN	A Comprehensive Study on Large-Scale …	2022-10-14
Node Classification	BNS-GCN	BNS-GCN: Efficient Full-Graph Training of …	2022-03-21
Node Classification	PCAPass + XGBoost	Dimensionality Reduction Meets Message Passing …	2022-02-01
Node Classification	shaDow-SAGE	Decoupling the Depth and Scope …	2022-01-19
Node Classification	shaDow-GAT	Decoupling the Depth and Scope …	2022-01-19
Node Classification	VQ-GNN (SAGE-Mean)	VQ-GNN: A Universal Framework to …	2021-10-27
Node Classification	TGCL+ResNet	Deeper-GXX: Deepening Arbitrary GNNs	2021-10-26
Node Classification	GRACE	Deep Graph Contrastive Representation Learning	2020-06-07
Node Classification	SIGN	SIGN: Scalable Inception Graph Neural …	2020-04-23
Node Classification	JKNet+DropEdge	DropEdge: Towards Deep Graph Convolutional …	2019-07-25
Node Classification	GraphSAINT	GraphSAINT: Graph Sampling Based Inductive …	2019-07-10
Node Classification	ASGCN	Adaptive Sampling Towards Fast Graph …	2018-09-14
Node Classification	FastGCN	FastGCN: Fast Learning with Graph …	2018-01-30
Node Classification	GraphSAGE	Inductive Representation Learning on Large …	2017-06-07
