FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …
Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …
The Human3.6M dataset is one of the largest motion capture datasets, which consists of 3.6 million human poses and corresponding …
LRS3-TED is a multi-modal dataset for visual and audio-visual speech recognition. It includes face tracks from over 400 hours of …
A database with 2,000 videos captured by surveillance cameras in real-world scenes. Source: [RWF-2000: An Open Large Scale Video Database …
The Stanford 40 Action Dataset contains images of humans performing 40 actions. In each image, we provide a bounding box …
ApolloCar3DT is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with …
A new multilingual language model benchmark composed of 40+ languages spanning several scripts and linguistic families, containing around …
CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine …
A Ball-Collision Dataset (ABCD) serves as a comprehensive benchmark for investigating the interaction dynamics of moving objects within 3D environments. …
BEHAVE is a full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along …
The nuScenes dataset is a large-scale autonomous driving dataset. The dataset has 3D bounding boxes for 1000 scenes collected in …
We introduce a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks, termed Open6DOR.
CALVIN (Composing Actions from Language and Vision) is an open-source simulated benchmark for learning long-horizon language-conditioned robot manipulation tasks.
RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research …
Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as …
The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents. …
The SheetCopilot dataset contains 28 evaluation workbooks and 221 spreadsheet manipulation tasks that are applied to these workbooks. These tasks …
GraspNet-1Billion provides large-scale training data and a standard evaluation platform for the task of general robotic grasping. The dataset contains …
We release expert-made scribble annotations for the medical ACDC dataset [1]. The released data must be considered as extending the …
The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level objects and object parts labels. …
AI-TOD comes with 700,621 object instances for eight categories across 28,036 aerial images. Compared to existing object detection datasets in …
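The two AI-TOD totals imply an average annotation density per image. A minimal sketch, using only the instance and image counts quoted above; the helper name `mean_instances_per_image` is hypothetical and not part of any AI-TOD tooling:

```python
def mean_instances_per_image(num_instances: int, num_images: int) -> float:
    """Average number of annotated instances per image (hypothetical helper)."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_instances / num_images


# Totals quoted in the AI-TOD description.
AI_TOD_INSTANCES = 700_621
AI_TOD_IMAGES = 28_036

avg = mean_instances_per_image(AI_TOD_INSTANCES, AI_TOD_IMAGES)
print(f"{avg:.2f} instances per image")  # prints 24.99 instances per image
```

This is only a mean of roughly 25 objects per image; the per-image distribution is not given in the description and may be far from uniform.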
The AIRS (Aerial Imagery for Roof Segmentation) dataset provides a wide coverage of aerial imagery with 7.5 cm resolution and …
ATLANTIS is a benchmark for semantic segmentation of waterbody images. This dataset covers a wide range of natural waterbodies such …
ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China …
A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to …
The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that …
The dataset includes annotations for burned area delineation and land cover segmentation, with a focus on European soil. The dataset …
The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to …
The Common Objects in COntext-stuff (COCO-stuff) dataset is a dataset for scene understanding tasks like semantic segmentation, object detection and …
The dataset contains two subsets of synthetic, semantically segmented road-scene images, which have been created for developing and applying the …
CamVid (Cambridge-driving Labeled Video Database) is a road/driving scene understanding database which was originally captured as five video sequences with …
Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense …
Detecting vehicles and representing their position and orientation in three-dimensional space is a key technology for autonomous driving. …
The training and validation data are subsets of the training split of the Cityscapes dataset. The test set is taken …
DADA-seg is a pixel-wise annotated accident dataset, which contains a variety of critical scenarios from traffic accidents. It is used …
DDD17 has over 12 h of a 346×260 pixel DAVIS sensor recording highway and city driving in daytime, evening, night, …
DELIVER is an arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. Aside from this, the dataset is …
The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout …
DSEC is a stereo camera dataset in driving scenarios that contains data from two monochrome event cameras and two global …
Dark Zurich is an image dataset containing a total of 8779 images captured at nighttime, twilight, and daytime, along with …
DensePASS is a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole-to-Panoramic transfer …
From DroneDeploy: We've collected a dataset of aerial orthomosaics and elevation images. These have been annotated into 6 different classes: …
Cholecystectomy is a very common abdominal surgical procedure almost ubiquitously performed with a laparoscopic approach, hence guided by an endoscopic …
The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French …
FMB contains 1500 well-registered infrared and visible image pairs with 14 annotated pixel-level categories. Also, it covers a wide range …
The dataset consists of 96 terrain-corrected (Level-1T) scenes from Landsat 8 OLI and TIRS, covering diverse biomes. This variety supports …
The dataset was created using high-resolution (8 m) satellite imagery from the Gaofen series (Gaofen-2 and Gaofen-6), captured in 2019 …
FoodSeg103 is a new food image dataset containing 7,118 images. Images are annotated with 104 ingredient classes and each image …
This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. The dataset can be used …
The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and …
HAM10000 is a dataset of 10000 training images for detecting pigmented skin lesions. The authors collected dermatoscopic images from different …
This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South …
For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. …
The INRIA Aerial Image Labeling dataset comprises 360 RGB tiles of 5000×5000 px with a spatial resolution of 30 cm/px …
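Tile size and spatial resolution together fix the ground footprint of each INRIA tile. Assuming square pixels and that 30 cm/px is the ground distance covered by one pixel side, a short calculation (variable names are illustrative) gives the area per tile and in total:

```python
# Figures quoted in the INRIA Aerial Image Labeling description.
NUM_TILES = 360
TILE_SIDE_PX = 5000
GSD_CM_PER_PX = 30  # spatial resolution, assumed to apply to both axes

# Ground length of one tile side: pixels * cm/px, converted to metres.
tile_side_m = TILE_SIDE_PX * GSD_CM_PER_PX / 100   # 1500.0 m
tile_area_km2 = (tile_side_m ** 2) / 1_000_000      # 2.25 km^2
total_area_km2 = NUM_TILES * tile_area_km2          # 810.0 km^2

print(f"tile side: {tile_side_m:.0f} m")
print(f"area per tile: {tile_area_km2:.2f} km^2")
print(f"total area: {total_area_km2:.1f} km^2")
```

The total assumes every tile is counted once and that tiles do not overlap, which the truncated description does not state.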
The data set contains 38 patches (of the same size), each consisting of a true orthophoto (TOP) extracted from a …
The data set contains 33 patches (of different sizes), each consisting of a true orthophoto (TOP) extracted from a larger …
Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two …
KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations. It is the successor of the popular …
The dataset consists of annotated frames containing GI procedure tools such as snares, balloons and biopsy forceps. Besides the images, …
LaRS is the largest and most diverse panoptic maritime obstacle detection dataset. Highlights: * Diverse scenes from manual capture, public …
The Multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four …
MUSES offers 2500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types …
The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside …
The Mila Simulated Floods Dataset is a 1.5 km² virtual world built with the Unity3D game engine, including urban, suburban and …
MixedWM38 Dataset (WaferMap) has more than 38000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …
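The "38" in the MixedWM38 name appears to correspond to the number of pattern classes listed: one normal pattern plus the single and mixed defect patterns. A trivial check, using only the counts given in the description:

```python
# Pattern class counts as listed in the MixedWM38 description.
pattern_counts = {
    "normal": 1,
    "single_defect": 8,
    "mixed_defect": 29,
}

# Total number of distinct pattern classes.
total_patterns = sum(pattern_counts.values())
print(total_patterns)  # prints 38
```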
X-ray images in this data set have been acquired from the tuberculosis control program of the Department of Health and Human …
Nighttime Driving is a dataset of road scenes consisting of 35,000 images ranging from daytime to twilight time and to …
OpenEDS (Open Eye Dataset) is a large-scale dataset of eye images captured using a virtual-reality (VR) head-mounted display …
The PASCAL Context dataset is an extension of the PASCAL VOC 2010 detection challenge, and it contains pixel-wise labels for …
The PASCAL Visual Object Classes (VOC) 2012 dataset contains 20 object categories including vehicles, household, animals, and other: aeroplane, bicycle, …
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: Person: person …
PASCAL VOC 2011 is an image segmentation dataset. It contains around 2,223 images for training, consisting of 5,034 objects. Testing …
PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is …
Extension of the PASTIS benchmark with radar and optical image time series.
The PETRAW dataset is composed of 150 sequences of peg transfer training sessions. The objective of the peg transfer session …
The increasing incidence of melanoma has recently promoted the development of computer-aided diagnosis systems for the classification of dermoscopic images. …
This dataset for the semantic segmentation of potholes and cracks on the road surface was assembled from 5 other datasets …
https://paperswithcode.com/sota/semantic-segmentation-on-isprs-potsdam
A Video Dataset for Visual Perception and Autonomous Navigation in Unstructured Environments. Website: http://rugd.vision/ The RUGD dataset focuses on semantic …
The Replica Dataset is a dataset of high-quality reconstructions of a variety of indoor spaces. Each reconstruction has clean …
The Stanford 3D Indoor Scene Dataset (S3DIS) contains 6 large-scale indoor areas with 271 rooms. Each point in the …
The SBCoseg dataset includes 889 groups of images and each group consists of 18 images with a common object, leading …
The STARE (Structured Analysis of the Retina) dataset is a dataset for retinal vessel segmentation. It contains 20 equal-sized (700×605) …
The SWIMSEG dataset contains 1013 images of sky/cloud patches, along with their corresponding binary segmentation maps. The ground truth annotation …
The SWINSEG dataset contains 115 nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The ground …
The SWINySEG dataset contains 6768 daytime and nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The …
The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and …
ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled …
Semantic3D is a point cloud dataset of scanned outdoor scenes with over 3 billion points. It contains 15 training and …
The SemanticPOSS dataset for 3D semantic segmentation contains 2988 diverse and complex LiDAR scans with a large number of dynamic instances. …
ShapeNet is a large-scale repository for 3D CAD models developed by researchers from Stanford University, Princeton University and the …
SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data comprises 382,534 building footprints, …
Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs created by professional designers with a variety of ground …
A large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, …
UAVid is a high-resolution UAV semantic segmentation dataset, proposed as a complement to existing datasets, which brings new challenges, including large-scale variation, moving …
UPLight is an underwater RGB-Polarization multimodal semantic segmentation dataset with 12 typical underwater semantic classes.
Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to …
WildDash is a benchmark whose evaluation method uses meta-information to calculate the robustness of a given algorithm …
Research on semantic segmentation of traffic scenes using color and polarization information (including training and testing sets).
iSAID contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The images of iSAID are the same as …
RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance …
Room-Across-Room (RxR) is a multilingual dataset for Vision-and-Language Navigation (VLN) for Matterport3D environments. In contrast to related datasets such as …
Touchdown is a corpus for executing navigation instructions and resolving spatial descriptions in visual real-world environments. The task is to …
7,672 human-written natural language navigation instructions for routes in OpenStreetMap with a focus on visual landmarks. Validated in Street …
AI2-Thor is an interactive environment for embodied AI. It contains four types of scenes, including kitchen, living room, bedroom and …
R2R is a dataset for visually-grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated …
EuRoC MAV is a visual-inertial dataset collected on-board a Micro Aerial Vehicle (MAV). The dataset contains stereo images, synchronized IMU …