FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …
Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …
ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected …
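A knowledge graph with labeled edges can be stored as a map from each node to its outgoing (relation, target) pairs. Below is a minimal sketch. The triples are illustrative only: they follow ConceptNet's relation-naming style (IsA, CapableOf, UsedFor) but are not taken from the actual dataset. `build_graph` and `neighbors` are hypothetical helpers.

```python
from collections import defaultdict

# Illustrative (source, relation, target) triples, not real ConceptNet data.
triples = [
    ("dog", "IsA", "animal"),
    ("dog", "CapableOf", "bark"),
    ("knife", "UsedFor", "cutting"),
]

def build_graph(edges):
    """Map each source node to a list of (relation, target) labeled edges."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph

def neighbors(graph, node, relation=None):
    """Return targets reachable from `node`, optionally filtered by edge label."""
    # dict.get does not insert missing keys into the defaultdict.
    return [t for r, t in graph.get(node, []) if relation is None or r == relation]

graph = build_graph(triples)
print(neighbors(graph, "dog"))         # ['animal', 'bark']
print(neighbors(graph, "dog", "IsA"))  # ['animal']
```

Keeping the relation label on every edge is what lets a query ask "what is a dog?" separately from "what can a dog do?".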
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment …
AG News (AG’s News Corpus) is a subset of AG's corpus of news articles constructed by assembling titles and description …
The BTAD (beanTech Anomaly Detection) dataset is a real-world industrial anomaly dataset. The dataset contains a total of 2830 …
The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …
COCO-OOC goes beyond standard object detection to ask the question: Which objects are out-of-context (OOC)? Given an image with a …
The Avenue dataset contains 16 training and 21 testing video clips. The videos were captured on the CUHK campus avenue with 30652 …
Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …
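These figures imply a perfectly balanced label distribution and a compact raw size. Here is a quick arithmetic check, assuming one unsigned byte per grayscale pixel:

```python
# Numbers from the Fashion-MNIST description above.
total_images = 70_000
num_classes = 10
height, width = 28, 28

per_class = total_images // num_classes      # balanced: same count per class
pixels_per_image = height * width            # one grayscale value per pixel
raw_bytes = total_images * pixels_per_image  # assuming 1 byte (uint8) per pixel

print(per_class)         # 7000
print(pixels_per_image)  # 784
print(f"{raw_bytes:,}")  # 54,880,000
```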
Fishyscapes is a public benchmark for uncertainty estimation in a real-world task of semantic segmentation for urban driving. It evaluates …
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given …
The HyperKvasir dataset contains 110,079 images and 374 videos capturing anatomical landmarks as well as pathological and normal findings. A total …
An abnormal activity dataset for research use that contains 483,566 annotated frames. Source: [Multi-timescale Trajectory Prediction for Abnormal Human Activity …
The Industrial Textile Defect Detection (ITDD) dataset includes 1885 industrial textile images categorized into 4 categories: cotton fabric, dyed fabric, …
InsPLAD is a Dataset for Power Line Asset Inspection containing 10,607 high-resolution colour images captured by Unmanned Aerial Vehicles (UAVs). It contains 17 …
This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held …
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred …
Includes 5,824 fundus images, each labeled glaucoma-positive (2,392) or glaucoma-negative (3,432). Source: [Attention Based Glaucoma Detection: A …
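The stated class counts add up to the stated total, and the split is moderately imbalanced. The check below also computes the accuracy of a majority-class baseline that always predicts "negative". This baseline is a common reference point, not something the dataset itself defines.

```python
positive, negative = 2_392, 3_432
total = positive + negative
assert total == 5_824  # matches the stated total

pos_rate = positive / total
print(f"positive rate: {pos_rate:.1%}")  # positive rate: 41.1%

# Majority-class baseline: always predicting "negative" is right this often.
print(f"majority baseline: {negative / total:.1%}")  # majority baseline: 58.9%
```

A classifier on this data should clearly beat 58.9% accuracy before it is considered informative.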
Lost and Found is a novel lost-cargo image sequence dataset comprising more than two thousand frames with pixelwise annotations of …
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the …
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …
MPDD is a dataset aimed at benchmarking visual defect detection methods in industrial metal parts manufacturing. It consists of more …
MVTec 3D Anomaly Detection Dataset (MVTec 3D-AD) is a comprehensive 3D dataset for the task of unsupervised anomaly detection and …
MVTec Logical Constraints Anomaly Detection (MVTec LOCO AD) dataset is intended for the evaluation of unsupervised anomaly localization algorithms. The …
The Musk dataset describes a set of molecules, and the objective is to distinguish musks from non-musks. This dataset describes …
Outliers or anomalies are instances that do not conform to the norm of a dataset. Outlier detection is an important …
The Multi-pose Anomaly Detection (MAD) dataset represents the first attempt to evaluate the performance of pose-agnostic anomaly detection. The MAD …
This dataset contains images of unusual dangers which can be encountered by a vehicle on the road – animals, rocks, …
A dataset for time-series anomaly detection.
Street View House Numbers (SVHN) is a digit classification benchmark dataset that contains 600,000 32×32 RGB images of printed digits …
The Shanghaitech dataset is a large-scale crowd counting dataset. It consists of 1198 annotated crowd images. The dataset is divided …
The ShanghaiTech Campus dataset has 13 scenes with complex light conditions and camera angles. It contains 130 abnormal events and …
Street Scene is a dataset for video anomaly detection. Street Scene consists of 46 training and 35 testing high resolution …
The TII-SSRC-23 dataset offers a comprehensive collection of network traffic patterns, meticulously compiled to support the development and research of …
Thyroid is a dataset for detection of thyroid diseases, in which patients diagnosed with hypothyroid or subnormal are anomalies against …
UBnormal is a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing data sets, …
The UCF-Crime dataset is a large-scale dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world …
The UCR Anomaly Archive is a collection of 250 univariate time series collected in human medicine, biology, meteorology and industry. …
The UCSD Anomaly Detection Dataset was acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways. The crowd …
Five datasets used in the NeurTraL-AD paper: RacketSports (RS). Accelerometer and gyroscope recording of players playing four different racket sports. Each …
The code to create the dataset is available here. The dataset used in the paper is available on github - …
The VisA dataset contains 12 subsets corresponding to 12 different objects as shown in the above figure. There are 10,821 …
WFDD is a dataset for benchmarking anomaly detection methods with a focus on textile inspection. It includes 4101 woven fabric …
voraus-AD contains machine data of a collaborative robot, which moves a can by performing an industrial pick-and-place task. The samples …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
ApolloCar3DT is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with …
A new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families containing round …
Electrophysiological data from implanted electrodes in the human brain are rare, and therefore scientific access to it has remained somewhat …
The Infant Health and Development Program (IHDP) is a randomized controlled study designed to evaluate the effect of home visit …
The Jobs dataset by LaLonde [36] is a widely used benchmark in the causal inference community, where the treatment is …
Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records …
In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-1M Insect …
The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to …
BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …
This dataset is a combination of the following three datasets: figshare, the SARTAJ dataset and Br35H. This dataset contains 7022 …
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …
A common corruptions dataset for CIFAR-10.
Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it …
Data was collected for normal bearings, single-point drive end and fan end defects. Data was collected at 12,000 samples/second and …
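At the stated rate of 12,000 samples/second, recordings are often cut into fixed-length windows before feature extraction or classification. Below is a minimal pure-Python framing sketch. The 1024-sample window, the 512-sample hop, `frame_signal`, and the random stand-in signal are all hypothetical choices, not part of the dataset.

```python
import random

SAMPLE_RATE_HZ = 12_000  # from the description above

def frame_signal(signal, window, hop):
    """Split a 1-D signal into overlapping windows of `window` samples."""
    if len(signal) < window:
        return []
    n_frames = 1 + (len(signal) - window) // hop
    return [signal[i * hop : i * hop + window] for i in range(n_frames)]

# Hypothetical stand-in for one second of a single vibration channel.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE_HZ)]

frames = frame_signal(signal, window=1024, hop=512)
print(len(frames), len(frames[0]))                          # 22 1024
print(f"{1024 / SAMPLE_RATE_HZ * 1000:.1f} ms per window")  # 85.3 ms per window
```

A 50% overlap (hop = window / 2) doubles the number of training windows from the same recording, at the cost of correlated samples.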
The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia …
We construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across …
A public data set of walking full-body kinematics and kinetics in individuals with Parkinson’s disease
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile …
The HRF dataset is a dataset for retinal vessel segmentation which comprises 45 images and is organized as 15 subsets. …
The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel …
The goal of ISIC 2019 is to classify dermoscopic images into nine diagnostic categories. 25,331 images are available for training across …
This dataset was presented as part of the ICLR 2023 paper *A framework for benchmarking Class-out-of-distribution detection and its application* …
In this work, we introduce the In-Diagram Logic (InDL) dataset, an innovative resource crafted to rigorously evaluate the …
This data set comprises 22 fundus images with their corresponding manual annotations for the blood vessels, separated as arteries and …
The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. …
The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each …
The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are …
The MixedWM38 dataset (WaferMap) has more than 38,000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …
Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. …
A large real-world event-based dataset for object classification. Source: HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification
The N-ImageNet dataset is an event-camera counterpart for the ImageNet dataset. The dataset is obtained by moving an event camera …
The RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation or classification of arteries …
The RSSCN7 dataset contains satellite images acquired from Google Earth, which were originally collected for remote sensing scene classification. We …
The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and …
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. …
This dataset is based on the Spiking Heidelberg Digits (SHD) dataset. Sample inputs consist of two spike encoded digits sampled …
The SPOTS-10 dataset is an extensive collection of grayscale images showcasing diverse patterns found in ten animal species. Specifically, SPOTS-10 …
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the …
Sentiment140 is a dataset that allows you to discover the sentiment of a brand, product, or topic on Twitter. Source: …
This dataset consists of computer-generated images for gas leakage segmentation. It features diverse backgrounds, interfering foreground objects, and precise ground …
arXiv: https://arxiv.org/abs/2304.11708 (accepted at the 29th International Congress on Sound and Vibration, ICSV29). The drone has been used for various …
Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, …
Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and …
Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, …
The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a …
WiC is a benchmark for the evaluation of context-sensitive word embeddings. WiC is framed as a binary classification task. Each …
An enlarged dataset for understanding how image backgrounds affect computer vision ML models, covering the following topics: Blur Background …
Criteo contains 7 days of click-through data, which is widely used for CTR prediction benchmarking. There are 26 anonymous categorical …
A click-through prediction dataset; for more information, please see the Kaggle page.
The task is to predict the chances of a user listening to a song repetitively after the first observable listening …
The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, …
The iPinYou Global RTB (Real-Time Bidding) Bidding Algorithm Competition was organized by iPinYou from April 1st, 2013 to December 31st, 2013. The …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and …
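Because the friendship network is undirected, a natural in-memory representation stores every edge in both directions. Below is a minimal adjacency-set sketch. The user IDs are hypothetical, `build_undirected` is an illustrative helper, and loading the real Gowalla files is not shown.

```python
from collections import defaultdict

def build_undirected(edges):
    """Store each friendship in both directions so lookups are symmetric."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical user IDs; the repeated (1, 2) edge collapses into one.
friendships = [(1, 2), (2, 3), (1, 2)]
adj = build_undirected(friendships)

print(sorted(adj[2]))  # [1, 3]
# Each undirected edge appears in two neighbor sets, hence the division by 2.
num_edges = sum(len(n) for n in adj.values()) // 2
print(num_edges)       # 2
```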
The Yelp2018 dataset is adopted from the 2018 edition of the yelp challenge. Wherein local businesses like restaurants and bars …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation …
MassSpecGym provides three challenges for benchmarking the discovery and identification of new molecules from MS/MS spectra: - 💥 **De novo …
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …
USPS is a digit dataset automatically scanned from envelopes by the U.S. Postal Service containing a total of 9,298 16×16 …
The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness. CIFAKE is a dataset that …
The DFDC (Deepfake Detection Challenge) is a dataset for deepfake detection consisting of more than 100,000 videos. The DFDC dataset …
FaceForensics is a video dataset consisting of more than 500,000 frames containing faces from 1004 videos that can be used …
FaceForensics++ is a forensics dataset consisting of 1000 original video sequences that have been manipulated with four automated face manipulation …
FakeAVCeleb is a novel Audio-Video Deepfake dataset that contains not only deepfake videos but also the corresponding synthesized cloned audio. …
Localized Audio Visual DeepFake Dataset (LAV-DF). Paper: Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
The EMOTIC dataset, named after EMOTions In Context, is a database of images with people in real environments, annotated with …
1000 songs have been selected from the Free Music Archive (FMA). The excerpts which were annotated are available in the same …
Fer2013 contains approximately 30,000 grayscale facial images of different expressions with size restricted to 48×48, and the main labels of …
The MSP-Podcast corpus contains speech segments from podcast recordings which are perceptually annotated using crowdsourcing. The collection of this corpus …
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains …
The SEED dataset contains subjects' EEG signals recorded while they were watching film clips. The film clips are carefully selected so …
Ethics (per ethics) dataset is created to test the knowledge of the basic concepts of morality. The task is to …
AVeriTeC (Automated Verification of Textual Claims) is a dataset of 4568 real-world claims covering fact-checks by 50 different organizations. Each …
A new face annotation dataset with balanced distribution between genders and ethnic origins. Source: [SensitiveNets: Learning Agnostic Representations with Application …
MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 …
The UTKFace dataset is a large-scale face dataset with a long age span (ranging from 0 to 116 years old). The …
This dataset provides simulated flood inundation maps of Abu Dhabi's coast under 174 different shoreline protection scenarios. The maps were …
A realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in …
JARVIS-DFT is a repository of density functional theory based calculation data for materials.
The Materials Project is a collection of chemical compounds labelled with different attributes. The labelling is performed by different simulations, …
This is a large-scale dataset of quantum-mechanically calculated properties (DFT level) of crystalline materials for graph representation learning that contains …
QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node …
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred …
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
ARKitScenes is an RGB-D dataset captured with the widely available Apple LiDAR scanner. Along with the per-frame raw data (Wide …
A binarized version of MNIST. Source: Binarized MNIST
The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …
CLEVR (Compositional Language and Elementary Visual Reasoning) is a synthetic Visual Question Answering dataset. It contains images of 3D-rendered objects; …
CelebFaces Attributes dataset contains 202,599 face images of the size 178×218 from 10,177 celebrities, each annotated with 40 binary labels …
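The 40 binary labels per image can be held as a 0/1 vector. Below is a minimal parsing sketch. It assumes a whitespace-separated attribute file in which each row is a filename followed by 40 values encoded as 1 or -1. That format is an assumption here: it matches the commonly distributed attribute list, but the description above does not state it. `parse_attr_line` is a hypothetical helper.

```python
NUM_ATTRS = 40  # binary attributes per CelebA image, per the description

def parse_attr_line(line: str, num_attrs: int = NUM_ATTRS):
    """Parse 'filename v1 ... v40' (each v assumed to be 1 or -1) into 0/1 labels."""
    parts = line.split()
    filename, values = parts[0], parts[1:]
    if len(values) != num_attrs:
        raise ValueError(f"expected {num_attrs} values, got {len(values)}")
    # Map the assumed {-1, 1} encoding to {0, 1}.
    labels = [1 if int(v) == 1 else 0 for v in values]
    return filename, labels

# Hypothetical row: first attribute positive, the remaining 39 negative.
row = "000001.jpg " + " ".join(["1"] + ["-1"] * 39)
name, labels = parse_attr_line(row)
print(name, sum(labels))  # 000001.jpg 1
```

Applied to all 202,599 images, this yields a 202,599 × 40 label matrix suitable for multi-label training.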
The CelebA-HQ dataset is a high-quality version of CelebA that consists of 30,000 images at 1024×1024 resolution. Source: [IntroVAE: Introspective …
Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense …
Flickr-Faces-HQ (FFHQ) consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity …
Fashion-MNIST is a dataset comprising 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …
The Large-scale Scene Understanding (LSUN) challenge aims to provide a different benchmark for large-scale scene classification and understanding. The LSUN …
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …
MetFaces is an image dataset of human faces extracted from works of art. The dataset consists of 1336 high-quality PNG …
Samples from NASA's Perseverance rover and a set of GAN-generated synthetic images from Neural Mars.
The ObjectsRoom dataset is based on the MuJoCo environment used by the Generative Query Network [4] and is a multi-object …
RC-49 is a benchmark dataset for generating images conditional on a continuous scalar variable. It is made by rendering 49 …
The Replica Dataset is a dataset of high quality reconstructions of a variety of indoor spaces. Each reconstruction has clean …
This is a dataset of 306,006 galaxies whose coordinates are taken from the Sloan Digital Sky Survey Data Release 7 …
The STL-10 is an image dataset derived from ImageNet and popularly used to evaluate algorithms of unsupervised feature learning or …
A simulation-based dataset featuring 20,000 stack configurations composed of a variety of elementary geometric primitives richly annotated regarding semantics and …
The Stacked MNIST dataset is derived from the standard MNIST dataset with an increased number of discrete modes. 240,000 RGB …
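The description is cut off before the construction details. A common construction, assumed here rather than stated above, stacks three randomly drawn MNIST digits into the R, G and B channels. Each image then falls into one of 10 × 10 × 10 = 1,000 digit-triple modes. Below is a minimal pure-Python sketch of that idea. `stack_digits` and the constant stand-in images are hypothetical.

```python
import random

SIDE = 28  # MNIST images are 28x28

def stack_digits(images, labels, rng):
    """Stack three randomly chosen digits as the R, G and B channels.

    `images` holds 28x28 grayscale images as nested lists and `labels` their
    digit classes. Returns a 28x28x3 image, its mode index in 0..999, and
    the indices of the three source images.
    """
    idx = [rng.randrange(len(images)) for _ in range(3)]
    rgb = [[[images[i][r][c] for i in idx] for c in range(SIDE)]
           for r in range(SIDE)]
    d_r, d_g, d_b = (labels[i] for i in idx)
    mode = 100 * d_r + 10 * d_g + d_b  # 10 ** 3 = 1000 possible modes
    return rgb, mode, idx

# Hypothetical stand-in data: image d is a constant image filled with value d.
images = [[[d] * SIDE for _ in range(SIDE)] for d in range(10)]
labels = list(range(10))
rgb, mode, idx = stack_digits(images, labels, random.Random(0))
print(len(rgb), len(rgb[0]), len(rgb[0][0]))  # 28 28 3
```

Counting how many of the 1,000 modes a generative model reproduces is what makes this construction useful for measuring mode collapse.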
The Stanford Cars dataset consists of 196 classes of cars with a total of 16,185 images, taken from the rear. …
The Stanford Dogs dataset contains 20,580 images of 120 classes of dogs from around the world, which are divided into …
A dense-text image benchmark for evaluating large generative models' ability to generate text.
Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained …
ViZDoom is an AI research platform based on the classical First Person Shooter game Doom. The most popular game mode …
WISE is the first benchmark specifically designed for World Knowledge-Informed Semantic Evaluation. WISE moves beyond simple word-pixel mapping by challenging models …
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the …
Ultra-high definition benchmark (UHDBench) includes 2293 images at 2k resolution sourced from the ground-truth test sets of HRSOD, LIU4k, UAVid, …
The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
SyntaxGym, adapted for interventional interpretability.
CelebFaces Attributes dataset contains 202,599 face images of the size 178×218 from 10,177 celebrities, each annotated with 40 binary labels …
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, worms, backdoors, and fuzzers. The dataset contains …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
A large-scale hierarchical dataset of diverse student activities collected by Santa, a multi-platform self-study solution equipped with artificial intelligence tutoring …
2000 HUB5 English Evaluation Transcripts was developed by the Linguistic Data Consortium (LDC) and consists of transcripts of 40 English …
Arxiv HEP-TH (high energy physics theory) citation graph is from the e-print arXiv and covers all the citations within a …
The Books3 dataset emerged as part of a broader effort to train AI models for natural language understanding and generation. …
C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It was based on Common Crawl dataset: https://commoncrawl.org. …
The Curation Corpus is a collection of 40,000 professionally-written summaries of news articles, with links to the articles themselves. Source: …
Free Law Project is a leading nonprofit organization that aims to make the legal ecosystem more equitable and competitive through …
The Hutter Prize Wikipedia dataset, also known as enwiki8, is a byte-level dataset consisting of the first 100 million bytes …
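Byte-level modeling treats every byte as a token with 256 possible values, so a multi-byte UTF-8 character becomes several tokens. Below is a minimal splitting sketch. The 90M/5M/5M train/validation/test split in the comments is a commonly used convention, assumed here rather than stated above. `byte_level_splits` is a hypothetical helper, and a tiny in-memory string stands in for the real file.

```python
def byte_level_splits(data: bytes, n_train: int, n_valid: int):
    """Cut raw bytes into contiguous train/valid/test segments.

    Byte-level modeling needs no tokenizer: each byte is a token in 0..255.
    """
    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:]
    return train, valid, test

# Assumed convention for the real 100M-byte file: n_train=90_000_000,
# n_valid=5_000_000, with the remaining 5_000_000 bytes used for test.
# A tiny in-memory stand-in is used here instead of the actual dump.
sample = "café <text>wiki</text>".encode("utf-8")
train, valid, test = byte_level_splits(sample, n_train=12, n_valid=5)

print(len(sample))       # 23 ("é" takes two bytes)
print(list(sample[:5]))  # [99, 97, 102, 195, 169]
```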
The LAMBADA (LAnguage Modeling Broadened to Account for Discourse Aspects) benchmark is an open-ended cloze task which consists of about …
OpenWebText is an open-source recreation of the WebText corpus. The text is web content extracted from URLs shared on Reddit …
PhilPapers is a resource for the philosophical community: 1. PhilPapers: It's an …
A collection of 385,705 scientific abstracts about Cognitive Control and their GPT-3 embeddings.
The SALMon dataset and benchmark was introduced in the paper "A Suite for Acoustic Language Model Evaluation", with the goal …
The Pile is an 825 GiB diverse, open-source language modelling data set that consists of 22 smaller, high-quality datasets …
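Because the size is given in GiB (binary units), the corpus is somewhat larger than 825 GB in decimal units. A quick conversion check:

```python
GIB = 2**30  # gibibyte (binary unit)
GB = 10**9   # gigabyte (decimal unit)

pile_bytes = 825 * GIB
print(f"{pile_bytes:,} bytes")     # 885,837,004,800 bytes
print(f"{pile_bytes / GB:.1f} GB")  # 885.8 GB
```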
We introduced a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled …
A new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families containing round …
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good …
This is the BIG-bench version of our language-based movie recommendation dataset (see https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/movie_recommendation). GPT-2 achieves 48.8% accuracy, where chance is 25%.
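Under uniform random guessing over k options, chance accuracy is 1/k. The stated 25% therefore corresponds to k = 4 options; this is an inference, not stated explicitly. One common way to compare a score against chance is to normalize accuracy by the headroom above chance. The helpers below are hypothetical and shown only as a sketch.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over `num_options` choices."""
    return 1.0 / num_options

def lift_over_chance(accuracy: float, num_options: int) -> float:
    """Accuracy rescaled so 0 means chance level and 1 means perfect."""
    chance = chance_accuracy(num_options)
    return (accuracy - chance) / (1.0 - chance)

print(chance_accuracy(4))                    # 0.25
print(round(lift_over_chance(0.488, 4), 3))  # 0.317
```

On this scale, GPT-2's 48.8% covers roughly a third of the distance from chance to a perfect score.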
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
This dataset is a benchmark for complex reasoning abilities in large language models, drawing on United Kingdom Linguistics Olympiad problems …
RuWorldTree is a QA dataset with multiple-choice elementary-level science questions, which evaluate the understanding of core science facts. Motivation: The …
The Winograd schema challenge composes tasks with syntactic ambiguity, which can be resolved with logic and reasoning. Motivation: The dataset …
MassSpecGym provides three challenges for benchmarking the discovery and identification of new molecules from MS/MS spectra: - 💥 **De novo …
The Microsoft Malware Classification Challenge was announced in 2015 along with a publication of a huge dataset of nearly 0.5 …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the …
The QNLI (Question-answering NLI) dataset is a Natural Language Inference dataset automatically derived from the Stanford Question Answering Dataset v1.1 …
The Maximum Unbiased Validation (MUV) dataset is a benchmark dataset selected from PubChem BioAssay. It was created by applying a …
MoleculeNet is a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and …
PCBA dataset 11 is a collection of high-quality dose-response data, formulated as a multitask learning benchmark from 128 high-throughput screening …
QM7 dataset is a subset of the GDB-13 database. GDB-13 contains nearly 1 billion stable and synthetically accessible organic molecules. …
QM8 dataset is a collection of molecular data used for studying quantum mechanical calculations of electronic spectra and excited state …
QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …
SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and …
The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense …
The ClinTox dataset compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons. The …
MassSpecGym provides three challenges for benchmarking the discovery and identification of new molecules from MS/MS spectra: - 💥 **De novo …
AudioSet is an audio event dataset, which consists of over 2M human-annotated 10-second video clips. These clips are collected from …
Consists of more than 210k videos for 310 audio classes. Source: VGGSound: A Large-scale Audio-Visual Dataset
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
D4RL is a collection of environments for offline reinforcement learning. These environments include Maze2D, AntMaze, Adroit, Gym, Flow, FrankaKitchen and …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
4D-DRESS is the first real-world 4D dataset of human clothing, capturing 64 human outfits in more than 520 motion sequences. …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
AviationQA is introduced in the paper There is No Big Brother or Small Brother: Knowledge Infusion in Language Models …
BIG-Bench Hard (BBH) is a subset of the BIG-Bench, a diverse evaluation suite for language models. BBH focuses on a …
BLURB is a collection of resources for biomedical natural language processing. In general domains such as newswire and the Web, …
The Bamboogle dataset is a collection of questions that was constructed to investigate the ability of language models to perform …
BioASQ is a question answering dataset. Instances in the BioASQ dataset are composed of a question (Q), human-annotated answers (A), …
BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are …
The COmmonsense Dataset Adversarially-authored by Humans (CODAH) is an evaluation set for commonsense question-answering in the sentence completion style of …
The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing progress in open-domain commonsense causal reasoning. …
CaseHOLD (Case Holdings On Legal Decisions) is a law dataset comprising over 53,000 multiple-choice questions to identify the …
The dataset covers Hindi and Tamil, collected without the use of translation. It provides a realistic information-seeking task with questions …
CheGeKa is a Jeopardy!-like Russian QA dataset collected from the official Russian quiz database ChGK. Motivation: The task can be …
CliCR is a dataset for domain-specific reading comprehension, comprising around 100,000 cloze queries constructed from clinical case …
CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure …
A filtered version of CronQuestions that better demonstrates a model's inference ability on complex temporal questions.
ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set …
ConditionalQA is a Question Answering (QA) dataset that contains complex questions with conditional answers, i.e. the answers are only applicable …
ConvFinQA is a dataset designed to study the chain of numerical reasoning in conversational question answering. The dataset contains 3892 …
CRONQUESTIONS, a temporal KGQA dataset, consists of two parts: a KG with temporal annotations, and a set of natural language …
Discrete Reasoning Over Paragraphs (DROP) is a crowdsourced, adversarially-created, 96k-question benchmark, in which a system must resolve references in a …
DaNetQA is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and …
DuoRC contains 186,089 unique question-answer pairs created from a collection of 7680 pairs of movie plots where each pair in …
The EgoTaskQA benchmark contains 40K balanced question-answer pairs selected from 368K programmatically generated questions over 2K egocentric videos. It …
FEVER is a publicly available dataset for fact extraction and verification against textual sources. It consists of 185,445 claims manually …
A French Native Reading Comprehension dataset of questions and answers on a set of Wikipedia articles that consists of 25,000+ …
FairytaleQA is a dataset focusing on narrative comprehension of kindergarten to eighth-grade students. Annotated by educational experts based on an …
FinQA is a new large-scale dataset of question-answering pairs over financial reports, written by financial experts. The dataset contains 8,281 …
GraphQuestions is a characteristic-rich dataset designed for factoid question answering. The dataset aims to provide a systematic way of constructing …
HellaSwag is a challenge dataset for evaluating commonsense NLI that is especially hard for state-of-the-art models, though its questions are …
HotpotQA is a question answering dataset collected on the English Wikipedia, containing about 113K crowd-sourced questions that are constructed to …
A new large-scale question-answering dataset that requires reasoning on heterogeneous information. Each question is aligned with a Wikipedia table and …
JaQuAD (Japanese Question Answering Dataset) is a question answering dataset in Japanese that consists of 39,696 extractive question-answer pairs on …
A large-scale dataset for Complex KBQA. Source: [KQA Pro: A Large-Scale Dataset with Interpretable Programs and Accurate SPARQLs for Complex …
MMLU (Massive Multitask Language Understanding) is a new benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively …
The MRQA (Machine Reading for Question Answering) dataset is a dataset for evaluating the generalization capabilities of reading comprehension systems. …
The MS MARCO (Microsoft MAchine Reading Comprehension) is a collection of datasets focused on deep learning in search. The first …
MapEval-Textual contains 300 question-answer pairs. The task is to answer each question by fetching the necessary information through external map APIs.
MapEval-Textual contains 300 context-question-answer triplets. The necessary geo-spatial information is provided in the context. The task is to answer each question …
The code for this dataset generates mathematical question-answer pairs from a range of question types at roughly school-level difficulty. This …
Multiple-choice question answering based on the United States Medical Licensing Examination (USMLE). The dataset is collected from the professional …
The MetaQA dataset consists of a movie ontology derived from the WikiMovies Dataset and three sets of question-answer pairs written …
A machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialog. Molweni's source samples come from the Ubuntu Chat …
MultiQ is a multi-hop QA dataset for Russian, suitable for general open-domain question answering, information retrieval, and reading comprehension tasks. …
MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and multi-sentence questions, i.e., questions that can be answered by …
MULTITQ is a large-scale dataset featuring ample relevant facts and multiple temporal granularities.
NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about the causal …
The NarrativeQA dataset includes a list of documents with Wikipedia summaries, links to full stories, and questions and answers. Source: …
The Natural Questions corpus is a question answering dataset containing 307,373 training examples, 7,830 development examples, and 7,842 test examples. …
The NewsQA dataset is a crowd-sourced machine reading comprehension dataset of 120,000 question-answer pairs. * Documents are CNN news articles. …
The Open Table-and-Text Question Answering (OTT-QA) dataset contains open questions which require retrieving tables and text from the web to …
OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject. …
PIQA is a dataset for commonsense reasoning, and was created to investigate the physical knowledge of existing models in NLP. …
We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which …
PopQA is an open-domain QA dataset with 14k QA pairs with fine-grained Wikidata entity ID, Wikipedia page views, and relationship …
PubChemQA consists of molecules and their corresponding textual descriptions from PubChem. It contains a single type of question, i.e., please …
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after coronary …
QASPER is a dataset for question answering on scientific research papers. It consists of 5,049 questions over 1,585 Natural Language …
Question Answering in Context is a large-scale dataset that consists of around 14K crowdsourced Question Answering dialogs with 98K question-answer …
QuALITY (Question Answering with Long Input Texts, Yes!) is a multiple-choice question answering dataset for long document comprehension. The dataset …
Quora Question Pairs (QQP) dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary …
The ReAding Comprehension dataset from Examinations (RACE) is a machine reading comprehension dataset consisting of 27,933 passages and 97,867 …
Logical reasoning is an important ability to examine, analyze, and critically evaluate arguments as they occur in ordinary language as …
RecipeQA is a dataset for multimodal comprehension of cooking recipes. It consists of over 36K question-answer pairs automatically generated from …
RuOpenBookQA is a QA dataset with multiple-choice elementary-level science questions which probe the understanding of core science facts. Motivation: RuOpenBookQA …
SCDE is a human-created sentence cloze dataset, collected from public school English examinations in China. The task requires a model …
Social Interaction QA (SIQA) is a question-answering benchmark for testing social commonsense intelligence. Contrary to many prior benchmarks that focus …
SQA3D is a dataset for embodied scene understanding, where an agent needs to comprehend the scene it is situated in from an …
The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct …
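In extractive QA datasets of the SQuAD kind, an answer is stored as a character offset into the passage together with the answer text. The Python sketch below assumes the SQuAD v1.1 JSON field names (`context`, and `answers` entries with `text` and `answer_start`); the record is a hypothetical toy example, not taken from SQuAD. It recovers the span and checks it against the stored text.

```python
def extract_answer(context: str, answer: dict) -> str:
    """Slice the answer span out of the passage using its character offset."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    span = context[start:end]
    # In well-formed data the sliced span matches the stored answer text exactly.
    if span != answer["text"]:
        raise ValueError(f"offset mismatch: {span!r} != {answer['text']!r}")
    return span


# Hypothetical toy record (not from SQuAD) using the v1.1 field layout.
record = {
    "context": "The Eiffel Tower was completed in 1889 in Paris.",
    "answers": [{"text": "1889", "answer_start": 34}],
}
print(extract_answer(record["context"], record["answers"][0]))  # prints 1889
```

Raising on a mismatch rather than silently returning the slice makes misaligned offsets (for example, after whitespace normalization) visible early.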
Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate …
A large-scale analogue of Stanford SQuAD in the Russian language, which is a valuable resource that has not been …
The data was collected from the “Mental Health” forum, a forum dedicated to people suffering from schizophrenia and other mental disorders. Relevant posts …
SimpleQuestions is a large-scale factoid question answering dataset. It consists of 108,442 natural language questions, each paired with a corresponding …
A Benchmark for Robust Multi-Hop Spatial Reasoning in Texts
Representation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. …
StrategyQA is a question answering benchmark where the required reasoning steps are implicit in the question, and should be inferred …
TAT-QA (Tabular And Textual dataset for Question Answering) is a large-scale QA dataset, aiming to stimulate progress of QA research …
Existing benchmarks for temporal QA focus on a single information source (either a KB or a text corpus), and include …
TempQA-WD is a benchmark dataset for temporal reasoning designed to encourage research in extending the present approaches to target a …
Here, we take a key step in this direction and release a new benchmark, TempQuestions, containing 1,271 questions, that are …
Question answering over knowledge graphs (KG-QA) is a vital topic in IR. Questions with temporal intent are a special class …
Torque is an English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships. Source: …
Text Retrieval Conference Question Answering (TrecQA) is a dataset created from the TREC-8 (1999) to TREC-13 (2004) Question Answering tracks. …
TriviaQA is a realistic text-based question answering dataset which includes 950K question-answer pairs from 662K documents collected from Wikipedia and …
TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises …
As social media, on which much news and many real-time events are reported, becomes increasingly popular, developing automated question answering …
UniProtQA consists of proteins and textual queries about their functions and properties. The dataset is constructed from UniProt, and consists …
The WebQuestions dataset is a question answering dataset using Freebase as the knowledge base and contains 6,642 question-answer pairs. It …
The WebQuestionsSP dataset is released as part of our ACL-2016 paper “The Value of Semantic Parse Labeling for Knowledge Base …
WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs, which are collected from 6.5K …
WikiHop is a multi-hop question-answering dataset. The query of WikiHop is constructed with entities and relations from WikiData, while supporting …
The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain …
WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further …
WikiTableQuestions is a question answering dataset over semi-structured tables. It is comprised of question-answer pairs on HTML tables, and was …
We aim to improve the bAbI benchmark as a means of developing intelligent dialogue agents. To this end, we propose …
This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links …
This dataset is a subset of the Amazon reviews dataset that contains fashion-related products.
This dataset is a subset of the Amazon reviews dataset that contains men's products.
This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014. This …
The Ciao dataset contains rating information given by users to items, and also contains item category information. The data comes …
Delicious: This data set contains tagged web pages retrieved from the website delicious.com. Source: [Text segmentation on multilabel documents: …
We release the Douban Conversation Corpus, comprising a training data set, a development set and a test set for retrieval based …
The Epinions dataset is built from a who-trust-whom online social network of a general consumer review site Epinions.com. Members of …
Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and …
The Pinterest dataset contains more than 1 million images associated with Pinterest users who have “pinned” them. Source: https://openaccess.thecvf.com/content_iccv_2015/papers/Geng_Learning_Image_and_ICCV_2015_paper.pdf
This dataset contains 21,889 outfits from polyvore.com, in which 17,316 are for training, 1,497 for validation and 3,076 for testing. …
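The split sizes in this entry can be cross-checked with a few lines of Python. The sketch below uses only the counts quoted above (the dictionary name is illustrative), confirms that the three splits add up to the stated 21,889 outfits, and prints each split's share.

```python
# Outfit counts per split, as quoted for the polyvore.com dataset.
POLYVORE_SPLITS = {"train": 17_316, "validation": 1_497, "test": 3_076}
STATED_TOTAL = 21_889

total = sum(POLYVORE_SPLITS.values())
assert total == STATED_TOTAL  # the three splits account for every outfit

for name, count in POLYVORE_SPLITS.items():
    # Share of all outfits held by this split, to one decimal place.
    print(f"{name}: {count} outfits ({count / total:.1%})")
```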
ReDial (Recommendation Dialogues) is an annotated dataset of dialogues, where users recommend movies to each other. The dataset consists of …
The WeChat dataset for fake news detection contains more than 20k news articles, each labelled as fake or not.
The Yelp Dataset is a valuable resource for academic research, teaching, and learning. It provides a rich collection of real-world …
The Yelp2018 dataset is adopted from the 2018 edition of the Yelp challenge, in which local businesses such as restaurants and bars …
GraspNet-1Billion provides large-scale training data and a standard evaluation platform for the task of general robotic grasping. The dataset contains …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
The Flickr30k dataset contains 31,000 images collected from Flickr, together with 5 reference sentences provided by human annotators. Source: [Guiding …
FlickrStyle10K is collected and built on Flickr30K image caption dataset. The original FlickrStyle10K dataset has 10,000 pairs of images and …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
AG News (AG’s News Corpus) is a subdataset of AG's corpus of news articles constructed by assembling titles and description …
The CIFAR-10 database (Canadian Institute For Advanced Research database) is a large collection of natural color images. It has a …
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists …
The Corpus of Linguistic Acceptability (CoLA) consists of 10657 sentences from 23 linguistics publications, expertly annotated for acceptability (grammaticality) by …
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has …
UNSW-NB15 is a network intrusion dataset. It contains nine different attack types, including DoS, worms, backdoors, and fuzzers. The dataset contains …
A table is a compact and efficient form for summarizing and presenting correlative information in handwritten and printed archival documents, scientific …
STDW is a diverse large-scale dataset for table detection with more than seven thousand samples containing a wide variety of …
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, …
Median house prices for California districts derived from the 1990 census. Context: This is the dataset used in …
What do the instances in this dataset represent? The instances represent records of hospitalized patients diagnosed with diabetes. **Are there recommended …
The Sentences Involving Compositional Knowledge (SICK) dataset is a dataset for compositional distributional semantics. It includes a large number of …
A tour and travels company wants to predict whether a customer will churn, based on the indicators given below. …
DeepFashion is a dataset containing around 800K diverse fashion images with their rich annotations (46 categories, 1,000 descriptive attributes, bounding …
Office-Home is a benchmark dataset for domain adaptation which contains 4 domains where each domain consists of 65 categories. The …
According to the WHO World Report on Vision 2019, the number of visually impaired people worldwide is estimated to be …
MGTAB is the first standardized graph-based benchmark for stance and bot detection. MGTAB contains 10,199 expert-annotated users and 7 types …
The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by …
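Because this entry numbers columns from 1, mapping "columns 2-22" onto zero-based indexing is an easy place to slip. The Python sketch below assumes that column 1 holds the class label (the entry's numbering leaves it outside the feature range); the function name `split_row` and the synthetic row are hypothetical and exist only to show the slicing.

```python
def split_row(row):
    """Split one record into (label, kinematic features, remaining columns).

    Assumes column 1 (index 0) is the class label, so the 1-based
    "columns 2-22" become the 0-based slice [1:22].
    """
    label = row[0]
    kinematic = row[1:22]   # the 21 kinematic features
    remaining = row[22:]    # whatever follows column 22
    return label, kinematic, remaining


# Synthetic row whose values equal their 1-based column numbers;
# its length (29) is arbitrary and only chosen for illustration.
row = [float(c) for c in range(1, 30)]
label, kinematic, remaining = split_row(row)
print(kinematic[0], kinematic[-1], len(kinematic))  # prints 2.0 22.0 21
```

Filling the row with its own column numbers makes the off-by-one check visual: the first and last kinematic values should read 2 and 22.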
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
MP20 (Xie et al., 2022) contains 45,231 metastable crystal structures from the Materials Project (Jain et al., 2013), each with …
GEOM-DRUGS is a dataset of 430,000 large organic molecules of up to 180 atoms from [Axelrod and Gómez-Bombarelli, Nature Scientific …
QM9 provides quantum chemical properties (at DFT level) for a relevant, consistent, and comprehensive chemical space of small organic molecules. …
AnoShift is a large-scale anomaly detection benchmark, which focuses on splitting the test data based on its temporal distance to …
The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair” etc.) and a background category that …
This is a synthetic dataset for defect detection on textured surfaces. It was originally created for a competition at the …
Fashion-MNIST is a dataset comprising of 28×28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per …
The dataset is constructed from images of defective production items that were provided and annotated by Kolektor Group d.o.o. The …
KolektorSDD2 is a surface-defect detection dataset with over 3000 images containing several types of defects, obtained while addressing a real-world …
The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources, including …
The Reuters-21578 dataset is a collection of documents with news articles. The original corpus has 10,369 documents and a vocabulary …
The Soil Moisture Active Passive (SMAP) dataset is a dataset of soil samples and telemetry information using the Mars rover by …
TIMo (Time-of-Flight Indoor Monitoring) is a dataset of infrared and depth videos intended for use in anomaly detection and …
The code to create the dataset is available here. The dataset used in the paper is available on GitHub: …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
Datasets are listed in the repository's readme file. This one is extra and yields 20K+ items after filtering with a …
The dataset consists of two versions: $X_1$ with $P_3$ and $X_1$ without $P_3$, where $P_3$ represents a set of random …
This dataset contains meteorological observations (temperature) at the land-based weather stations located in the United States, collected from the Online …
SEVIR is an annotated, curated and spatio-temporally aligned dataset containing over 10,000 weather events that each consist of 384 km …
The Shifts Dataset is a dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has …
The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their …
Median house prices for California districts derived from the 1990 census. Context: This is the dataset used in …
In this dataset we added [Company Name, Car Model, Car Type, Fuel Type, Transmission, Engine (cc), Mileage, Kms_driven, Buyers, Horsepower …
Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age …
This dataset contains demographic and personal health information for individuals, along with the corresponding medical insurance charges billed to them. …