arXiv

Name: arXiv
License: Unknown

Overview

For nearly 30 years, arXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus offers significant, but sometimes overwhelming, depth.

In these times of unique global challenges, efficient extraction of insights from data is essential. To help make arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles with relevant features such as article titles, authors, categories, abstracts, full-text PDFs, and more.
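The metadata features listed above lend themselves to simple aggregate analyses. The sketch below assumes the metadata is distributed as JSON lines (one JSON object per line) and that the `categories` field is a single space-separated string such as `"cs.CL cs.LG"`; both are assumptions about the file layout rather than a documented schema, and the helper names `iter_records` and `category_counts` are hypothetical. It counts how often each category label appears, a first step toward trend analysis or category prediction.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record per non-empty JSON line (assumed JSON-lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def category_counts(records: Iterable[dict]) -> Counter:
    """Count category labels across records.

    Assumes `categories` is one space-separated string, e.g. "cs.CL cs.LG".
    Records without the field contribute nothing.
    """
    counts: Counter = Counter()
    for rec in records:
        counts.update(rec.get("categories", "").split())
    return counts


# Hypothetical sample records illustrating the assumed schema (not real arXiv entries).
sample = [
    '{"id": "0001", "title": "A", "authors": "X", "categories": "cs.CL cs.LG", "abstract": "..."}',
    '{"id": "0002", "title": "B", "authors": "Y", "categories": "cs.LG stat.ML", "abstract": "..."}',
    "",
]
counts = category_counts(iter_records(sample))
print(counts.most_common(1))  # [('cs.LG', 2)]
```

Against a local copy of the metadata, the same helpers would be called on an open file handle, for example `with open(path) as f: category_counts(iter_records(f))`, where `path` is whatever the downloaded file is named. Reading line by line avoids loading all 1.7 million records into memory at once.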

We hope to enable new use cases and the exploration of richer machine learning techniques that combine multi-modal features, for applications such as trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction, and semantic search interfaces.

Variants: arXiv

Associated Benchmarks

This dataset is used in one benchmark:

Text Summarization - Metrics: ROUGE-1, ROUGE-2, ROUGE-L
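The ROUGE metrics named above compare a candidate summary with a reference summary: ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams, and ROUGE-L uses the longest common subsequence (LCS) of the two token sequences. The sketch below is a simplified illustration that assumes lowercase whitespace tokenization and reports the F1 score (harmonic mean of precision and recall). Published benchmark numbers are normally produced with reference scoring packages that apply their own tokenization and optional stemming, so this is not the scorer behind the submission listed below, and its values will not match reported results exactly.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, ref_total: int, cand_total: int) -> float:
    """Harmonic mean of recall (overlap/ref_total) and precision (overlap/cand_total)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    recall = overlap / ref_total
    precision = overlap / cand_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference: str, candidate: str, n: int) -> float:
    """ROUGE-N F1 with clipped n-gram matches (simplified whitespace tokenization)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum count per n-gram
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via a two-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """Sentence-level ROUGE-L F1 based on the LCS of the two token sequences."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n(reference, candidate, 1), 4))  # 0.8333
print(round(rouge_n(reference, candidate, 2), 4))  # 0.6
print(round(rouge_l(reference, candidate), 4))     # 0.8333
```

In the example, five of six unigrams match (F1 = 5/6), three of five bigrams match (F1 = 0.6), and the LCS "the cat on the mat" has length 5 (F1 = 5/6). The LCS table keeps only two rows, so its memory grows with the candidate length rather than with the product of both lengths.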

Recent Benchmark Submissions

Task	Model	Paper	Date
Text Summarization	BigBird-Pegasus	Big Bird: Transformers for Longer …	2020-07-28

Research Papers

Recent papers with results on this dataset:

Big Bird: Transformers for Longer Sequences (2020)
