Machine Learning Benchmarks

Browse 119 benchmarks across 22 tasks
Browse by Category

1 Image, 2*2 Stitchi

FQL-Driving

FQL-driving

1 result
Metrics: 0..5sec

10-shot image generation

FQL-Driving

FQL-driving

1 result
Metrics: 0-shot MRR

FlyingThings3D

FlyingThings3D is a synthetic dataset for optical flow, disparity and scene flow estimation. It consists of everyday objects flying along …

1 result
Metrics: 0..5sec

MEAD

Multi-view Emotional Audio-visual Dataset

1 result
Metrics: 12k

Music21

Music21 is an untrimmed video dataset crawled by keyword query from YouTube. It contains music performances belonging to 21 categories. …

1 result
Metrics: 0..5sec

3D Absolute Human Pose Estimation

Human3.6M

The Human3.6M dataset is one of the largest motion capture datasets, which consists of 3.6 million human poses and corresponding …

4 results
Metrics: MRPE, Average MPJPE (mm), PA-MPJPE

Active Speaker Detection

LRS3-TED

LRS3-TED is a multi-modal dataset for visual and audio-visual speech recognition. It includes face tracks from over 400 hours of …

1 result
Metrics: Accuracy

Activity Recognition

RWF-2000

A database with 2,000 videos captured by surveillance cameras in real-world scenes. Source: [RWF-2000: An Open Large Scale Video Database …

4 results
Metrics: Accuracy

Stanford40

The Stanford 40 Action Dataset contains images of humans performing 40 actions. In each image, we provide a bounding box …

2 results
Metrics: Top-3 Accuracy (%)

Autonomous Driving

ApolloCar3D

ApolloCar3D is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with …

1 result
Metrics: A3DP

Autonomous Vehicles

ApolloCar3D

ApolloCar3D is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with …

1 result
Metrics: A3DP

Benchmarking

Wiki-40B

A new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families containing round …

1 result
Metrics: Perplexity

CARLA longest6

CARLA

CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine …

19 results
Metrics: Driving Score, Route Completion, Infraction Score

Collision Avoidance

A Ball Collision Dataset (ABCD)

A Ball-Collision Dataset (ABCD) serves as a comprehensive benchmark for investigating the interaction dynamics of moving objects within 3D environments. …

1 result
Metrics: Accuracy (L:R) - T1

Contact Detection

BEHAVE

BEHAVE is a full body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits along …

4 results
Metrics: Precision, Recall

Motion Planning

nuScenes

The nuScenes dataset is a large-scale autonomous driving dataset. The dataset has 3D bounding boxes for 1000 scenes collected in …

1 result
Metrics: Collision, L2

Object Rearrangement

Open6DOR V2

We introduce a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks, termed Open6DOR.

4 results
Metrics: 6-DoF, pos-level1, pos-level0, rot-level0, rot-level1, rot-level2

Robot Manipulation

CALVIN

CALVIN (Composing Actions from Language and Vision) is an open-source simulated benchmark for learning long-horizon, language-conditioned robot manipulation tasks.

19 results
Metrics: avg. sequence length (D to D)

RLBench

RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research …

16 results
Metrics: Succ. Rate (18 tasks, 100 demo/task), Succ. Rate (18 tasks, 10 demo/task), Training Time (V100 x 8 x day), Training Time (A100 x hour), Succ. Rate (10 tasks, 100 demos/task), Succ. Rate (74 tasks, 100 demos/task), Inference Speed (fps), Input Image Size

SimplerEnv-Google Robot

Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as …

9 results
Metrics: Visual Matching, Visual Matching-Pick Coke Can, Visual Matching-Move Near, Visual Matching-Open/Close Drawer, Variant Aggregation, Variant Aggregation-Pick Coke Can, Variant Aggregation-Move Near, Variant Aggregation-Open/Close Drawer

SimplerEnv-Widow X

Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as …

7 results
Metrics: Average, Put Spoon on Towel, Put Carrot on Plate, Stack Green Block on Yellow Block, Put Eggplant in Yellow Basket

Robot Task Planning

PackIt

The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents. …

4 results
Metrics: Average Reward

SheetCopilot

The SheetCopilot dataset contains 28 evaluation workbooks and 221 spreadsheet manipulation tasks that are applied to these workbooks. These tasks …

2 results
Metrics: Pass@1

Robotic Grasping

GraspNet-1Billion

GraspNet-1Billion provides large-scale training data and a standard evaluation platform for the task of general robotic grasping. The dataset contains …

5 results
Metrics: mAP, AP_seen, AP_similar, AP_novel

NBMOD

NBMOD is a dataset created for researching the task of specific object grasp detection by robots in noisy …

1 result
Metrics: Acc

Semantic Segmentation

ACDC Scribbles

We release expert-made scribble annotations for the medical ACDC dataset [1]. The released data must be considered as extending the …

6 results
Metrics: Dice (Average)

ADE20K

The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level objects and object parts labels. …

229 results
Metrics: Validation mIoU, Test Score, Params (M), GFLOPs (512 x 512), GFLOPs, Mean IoU (class)

AI-TOD

AI-TOD comes with 700,621 object instances for eight categories across 28,036 aerial images. Compared to existing object detection datasets in …

2 results
Metrics: Dice

AIRS

The AIRS (Aerial Imagery for Roof Segmentation) dataset provides a wide coverage of aerial imagery with 7.5 cm resolution and …

1 result
Metrics: IoU

ATLANTIS

ATLANTIS is a benchmark for semantic segmentation of waterbody images. This dataset covers a wide range of natural waterbodies such …

1 result
Metrics: A-acc, A-mIoU, Accuracy, mIoU

ApolloScape

ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China …

2 results
Metrics: mIoU

BIG

A high-resolution semantic segmentation dataset with 50 validation and 100 test objects. Image resolution in BIG ranges from 2048×1600 to …

4 results
Metrics: mBA, IoU

CC3M-TagMask

The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that …

4 results
Metrics: mIoU

CEMS-W

The dataset includes annotations for burned area delineation and land cover segmentation, with a focus on European soil. The dataset …

3 results
Metrics: mIoU

COCO (Common Objects in Context)

The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to …

9 results
Metrics: mIoU

COCO-Stuff

The Common Objects in COntext-stuff (COCO-stuff) dataset is a dataset for scene understanding tasks like semantic segmentation, object detection and …

1 result
Metrics: F.W. IU, Per-Class Accuracy, Pixel Accuracy, mIoU

Cam2BEV

The dataset contains two subsets of synthetic, semantically segmented road-scene images, which have been created for developing and applying the …

1 result
Metrics: Mean IoU

CamVid

CamVid (Cambridge-driving Labeled Video Database) is a road/driving scene understanding database which was originally captured as five video sequences with …

20 results
Metrics: Mean IoU, Global Accuracy

Cityscapes

Cityscapes is a large-scale database which focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense …

2 results
Metrics: mIoU, Pixel Accuracy

Cityscapes 3D

Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. …

1 result
Metrics: mIoU

Cityscapes VIPriors subset

The training and validation data are subsets of the training split of the Cityscapes dataset. The test set is taken …

1 result
Metrics: Accuracy, mIoU

DADA-seg

DADA-seg is a pixel-wise annotated accident dataset, which contains a variety of critical scenarios from traffic accidents. It is used …

27 results
Metrics: mIoU

DDD17

DDD17 has over 12 h of a 346x260 pixel DAVIS sensor recording highway and city driving in daytime, evening, night, …

9 results
Metrics: mIoU

DELIVER

DELIVER is an arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. Aside from this, the dataset is …

9 results
Metrics: mIoU, test mIoU

DIVA-HisDB

The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout …

2 results
Metrics: Mean IoU (class)

DSEC

DSEC is a stereo camera dataset in driving scenarios that contains data from two monochrome event cameras and two global …

9 results
Metrics: mIoU

Dark Zurich

Dark Zurich is an image dataset containing a total of 8779 images captured at nighttime, twilight, and daytime, along with …

14 results
Metrics: mIoU

DensePASS

DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole-to-Panoramic transfer …

35 results
Metrics: mIoU

DroneDeploy

From DroneDeploy: We've collected a dataset of aerial orthomosaics and elevation images. These have been annotated into 6 different classes: …

1 result
Metrics: Mean IoU (test), Mean IoU (val)

Endoscapes

Cholecystectomy is a very common abdominal surgical procedure almost ubiquitously performed with a laparoscopic approach, hence guided by an endoscopic …

2 results
Metrics: Mean F1

FLAIR (French Land cover from Aerospace ImageRy)

The French National Institute of Geographical and Forest Information (IGN) has the mission to document and measure land-cover on French …

4 results
Metrics: mIoU

FMB Dataset

FMB contains 1500 well-registered infrared and visible image pairs with 14 annotated pixel-level categories. Also, it covers a wide range …

13 results
Metrics: mIoU

Fine-Grained Cloud Segmentation Dataset

The dataset consists of 96 terrain-corrected (Level-1T) scenes from Landsat 8 OLI and TIRS, covering diverse biomes. This variety supports …

3 results
Metrics: mIoU

Fine-Grained Grass Segmentation Dataset

The dataset was created using high-resolution (8 m) satellite imagery from the Gaofen series (Gaofen-2 and Gaofen-6), captured in 2019 …

9 results
Metrics: mIoU

FoodSeg103

FoodSeg103 is a new food image dataset containing 7,118 images. Images are annotated with 104 ingredient classes and each image …

7 results
Metrics: mIoU

Forward-Looking Sonar Marine Debris Datasets

This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. The dataset can be used …

1 result
Metrics: mIOU

Freiburg Forest

The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and …

2 results
Metrics: Mean IoU

HAM10000

HAM10000 is a dataset of 10000 training images for detecting pigmented skin lesions. The authors collected dermatoscopic images from different …

1 result
Metrics: Average Dice, Average IOU

HERA RFI Detection

This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South …

2 results
Metrics: AUPRC, AUROC, F1

Hypersim

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. …

5 results
Metrics: mIoU, mIoU (test)

INRIA Aerial Image Labeling

The INRIA Aerial Image Labeling dataset is comprised of 360 RGB tiles of 5000×5000 px with a spatial resolution of 30 cm/px …

6 results
Metrics: IoU, mIOU

ISPRS Potsdam

The data set contains 38 patches (of the same size), each consisting of a true orthophoto (TOP) extracted from a …

17 results
Metrics: Overall Accuracy, Mean F1, Mean IoU

ISPRS Vaihingen

The data set contains 33 patches (of different sizes), each consisting of a true orthophoto (TOP) extracted from a larger …

10 results
Metrics: Overall Accuracy, Average F1, Category mIoU

ImageNet-S

Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two …

20 results
Metrics: mIoU (val), mIoU (test)

KITTI-360

KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations. It is the successor of the popular …

14 results
Metrics: mIoU

Kvasir-Instrument

Consists of annotated frames containing GI procedure tools such as snares, balloons, and biopsy forceps. Besides the images, …

2 results
Metrics: DSC, mIoU

LOFAR RFI Detection

This dataset contains simulated and expert-labelled spectrograms from two radio telescopes: the Hydrogen Epoch of Reionization Array (HERA) in South …

2 results
Metrics: AUPRC, AUROC, F1

LaRS

LaRS is the largest and most diverse panoptic maritime obstacle detection dataset. Highlights: * Diverse scenes from manual capture, public …

20 results
Metrics: Q, F1, μ, mIoU

LoveDA

1. 5987 high spatial resolution (0.3 m) remote sensing images from Nanjing, Changzhou, and Wuhan
2. Focus on different geographical …

16 results
Metrics: Category mIoU

MCubeS

Multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four …

21 results
Metrics: mIoU

MCubeS (P)

Multimodal material segmentation (MCubeS) dataset contains 500 sets of images from 42 street scenes. Each scene has images for four …

8 results
Metrics: mIoU

MUSES: MUlti-SEnsor Semantic perception dataset

MUSES offers 2500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types …

2 results
Metrics: mIoU

Matterport3D

The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside …

4 results
Metrics: Test mIoU, Validation mIoU

Mila Simulated Floods

Mila Simulated Floods Dataset is a 1.5 square km virtual world using the Unity3D game engine including urban, suburban and …

1 result
Metrics: mIoU

MixedWM38

MixedWM38 Dataset(WaferMap) has more than 38000 wafer maps, including 1 normal pattern, 8 single defect patterns, and 29 mixed defect …

1 result
Metrics: Dice, Mean IoU

Montgomery County X-ray Set

X-ray images in this data set have been acquired from the tuberculosis control program of the Department of Health and Human …

3 results
Metrics: F1-score

Nighttime Driving

Nighttime Driving is a dataset of road scenes consisting of 35,000 images ranging from daytime to twilight time and to …

12 results
Metrics: mIoU

OpenEDS

OpenEDS (Open Eye Dataset) is a large scale data set of eye-images captured using a virtual-reality (VR) head mounted display …

1 result
Metrics: mIOU

PASCAL Context

The PASCAL Context dataset is an extension of the PASCAL VOC 2010 detection challenge, and it contains pixel-wise labels for …

62 results
Metrics: mIoU, Mean Accuracy, Pixel Accuracy

PASCAL VOC

The PASCAL Visual Object Classes (VOC) 2012 dataset contains 20 object categories including vehicles, household, animals, and other: aeroplane, bicycle, …

1 result
Metrics: mIoU

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: Person: person …

2 results
Metrics: Mean IoU

PASCAL VOC 2011

PASCAL VOC 2011 is an image segmentation dataset. It contains around 2,223 images for training, consisting of 5,034 objects. Testing …

1 result
Metrics: Mean IoU

PASCAL VOC 2012 test

SCC Data Set

51 results
Metrics: Mean IoU, FLOPS, Params

PASTIS

PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite image time series. It is …

3 results
Metrics: Mean IoU (test), Number of Params, Overall Accuracy

PASTIS-R

Extension of the PASTIS benchmark with radar and optical image time series.

1 result
Metrics: IoU

PETRAW

PETRAW data set was composed of 150 sequences of peg transfer training sessions. The objective of the peg transfer session …

4 results
Metrics: Mean IoU (class)

PH2

The increasing incidence of melanoma has recently promoted the development of computer-aided diagnosis systems for the classification of dermoscopic images. …

2 results
Metrics: Average Dice, Average IOU

Pothole Mix

This dataset for the semantic segmentation of potholes and cracks on the road surface was assembled from 5 other datasets …

7 results
Metrics: Test Dice Multiclass, Test mIoU, Validation Dice Multiclass, Validation mIoU

Potsdam

https://paperswithcode.com/sota/semantic-segmentation-on-isprs-potsdam

3 results
Metrics: mIoU

RUGD

A Video Dataset for Visual Perception and Autonomous Navigation in Unstructured Environments. Website: http://rugd.vision/ The RUGD dataset focuses on semantic …

1 result
Metrics: AIOU, mIoU

Replica

The Replica Dataset is a dataset of high quality reconstructions of a variety of indoor spaces. Each reconstruction has clean …

5 results
Metrics: mIoU

S3DIS

The Stanford 3D Indoor Scene Dataset (S3DIS) dataset contains 6 large-scale indoor areas with 271 rooms. Each point in the …

50 results
Metrics: Mean IoU, mAcc, oAcc, FLOPs, Number of params, mIoU, Params (M)

SBCoseg

The SBCoseg dataset includes 889 groups of images and each group consists of 18 images with a common object, leading …

1 result
Metrics: Jaccard

STARE

The STARE (Structured Analysis of the Retina) dataset is a dataset for retinal vessel segmentation. It contains 20 equal-sized (700×605) …

1 result
Metrics: AUC

SWIMSEG

The SWIMSEG dataset contains 1013 images of sky/cloud patches, along with their corresponding binary segmentation maps. The ground truth annotation …

1 result
Metrics: Average Precision, Average Recall, F1-Score, MCC, Mean IoU

SWINSEG

The SWINSEG dataset contains 115 nighttime images of sky/cloud patches along with their corresponding binary ground truth maps. The ground …

1 result
Metrics: Average Precision, Average Recall, F1-Score, MCC, Mean IoU

SWINySEG

The SWINySEG dataset contains 6768 daytime- and nighttime-images of sky/cloud patches along with their corresponding binary ground truth maps. The …

1 result
Metrics: Average Precision, Average Recall, F1-Score, MCC, Mean IoU

SYNTHIA

The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and …

2 results
Metrics: mIoU

ScanNet

ScanNet is an instance-level indoor RGB-D dataset that includes both 2D and 3D data. It is a collection of labeled …

44 results
Metrics: val mIoU, test mIoU

Semantic3D

Semantic3D is a point cloud dataset of scanned outdoor scenes with over 3 billion points. It contains 15 training and …

13 results
Metrics: mIoU, oAcc

SemanticPOSS

The SemanticPOSS dataset for 3D semantic segmentation contains 2988 varied and complicated LiDAR scans with a large quantity of dynamic instances. …

1 result
Metrics: Mean IoU

ShapeNet

ShapeNet is a large scale repository for 3D CAD models developed by researchers from Stanford University, Princeton University and the …

4 results
Metrics: Mean IoU

SpaceNet 1

SpaceNet 1: Building Detection v1 is a dataset for building footprint detection. The data is comprised of 382,534 building footprints, …

10 results
Metrics: Mean IoU

Structured3D

Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs (a) created by professional designers with a variety of ground …

4 results
Metrics: Test mIoU, Validation mIoU

Trans10K

A large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, …

14 results
Metrics: mIoU, GFLOPs

UAVid

UAVid is a high-resolution UAV semantic segmentation dataset as a complement, which brings new challenges, including large scale variation, moving …

6 results
Metrics: Mean IoU

UPLight

UPLight is an underwater RGB-Polarization multimodal semantic segmentation dataset with 12 typical underwater semantic classes.

6 results
Metrics: mIoU

VDD

Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to …

7 results
Metrics: mIoU

WildDash

WildDash is a benchmark evaluation method that uses meta-information to calculate the robustness of a given algorithm …

1 result
Metrics: Mean IoU

ZJU-RGB-P

Research on semantic segmentation of traffic scenes using color and polarization information (including training and testing sets).

13 results
Metrics: mIoU, Frame (fps)

iSAID

iSAID contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The images of iSAID are the same as …

15 results
Metrics: mIoU

Skill Generalization

RGB-Stacking

RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance …

2 results
Metrics: Group 1, Group 2, Group 3, Group 4, Group 5, Average

Skill Mastery

RGB-Stacking

RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance …

2 results
Metrics: Average, Group 1, Group 2, Group 3, Group 4, Group 5

Vision and Language Navigation

RxR

Room-Across-Room (RxR) is a multilingual dataset for Vision-and-Language Navigation (VLN) for Matterport3D environments. In contrast to related datasets such as …

6 results
Metrics: ndtw

Touchdown Dataset

Touchdown is a corpus for executing navigation instructions and resolving spatial descriptions in visual real-world environments. The task is to …

12 results
Metrics: Task Completion (TC)

map2seq

7,672 human written natural language navigation instructions for routes in OpenStreetMap with a focus on visual landmarks. Validated in Street …

5 results
Metrics: Task Completion (TC)

Visual Navigation

AI2-THOR

AI2-Thor is an interactive environment for embodied AI. It contains four types of scenes, including kitchen, living room, bedroom and …

2 results
Metrics: SPL (All), SPL (L≥5), Success Rate (All), Success Rate (L≥5)

R2R

R2R is a dataset for visually-grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated …

11 results
Metrics: spl

Visual Odometry

EuRoC MAV

EuRoC MAV is a visual-inertial dataset collected on-board a Micro Aerial Vehicle (MAV). The dataset contains stereo images, synchronized IMU …

1 result
Metrics: Relative Position Error Translation [cm]