Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays (Georgia Tech, UBC, MIT, CMU) (2023)

Paper Information
arXiv ID: 2301.00493
Venue: NeurIPS Datasets and Benchmarks
Domain: autonomous driving
SOTA Claim: Yes
Reproducibility: 8/10

Abstract

We introduce Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. It is the largest collection of lidar sensor data to date and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario carries its own HD map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.

In the last two years, the Argoverse team has hosted six competitions on 3D tracking, stereo depth estimation, and motion forecasting, and we maintain evaluation servers and leaderboards for these tasks.

Design Principles

1. Bigger isn't always better. Self-driving vehicles capture a flood of sensor data that is logistically difficult to work with; sensor datasets run to several terabytes even when compressed. If standard benchmarks grow further, we risk alienating much of the academic community and leaving progress to well-resourced industry groups. For this reason, we match but do not exceed the scale of sensor data in nuScenes [4] and Waymo Open [45].

2. Make every instance count. Much of driving is boring. Datasets should focus on the difficult, interesting scenarios where current forecasting and perception systems struggle. We therefore mine for especially crowded, dynamic, and kinematically unusual scenarios.

3. Diversity matters. Training on data from wintertime Detroit is not sufficient for detecting objects in Miami, which has 15 times the frequency of motorcycles and mopeds. Behaviors differ as well, so learned pedestrian motion behavior might not generalize. Accordingly, each of our datasets is drawn from six diverse cities (Austin, Detroit, Miami, Palo Alto, Pittsburgh, and Washington D.C.) and from different seasons, from snowy to sunny.

4. Map the world. HD maps are powerful priors for perception and forecasting. Learning-based methods that found clever ways to encode map information [31] performed well in Argoverse competitions. For this reason, we augment our HD map representation with 3D lane geometry, paint markings, crosswalks, higher-resolution ground height, and more.

5. Self-supervise. Other machine learning domains have seen enormous success from self-supervised learning in recent years. Large-scale lidar data from dynamic scenes, paired with HD maps, could lead to better representations than current supervised approaches. For this reason, we build the largest dataset of lidar sensor data.

6. Fight the heavy tail. Passenger vehicles are common, so we can assess forecasting and detection accuracy for cars. With existing datasets, however, we cannot assess forecasting accuracy for buses and motorcycles, with their distinct behaviors, nor can we evaluate stroller and wheelchair detection. We therefore introduce the largest taxonomy to date for sensor and forecasting datasets and ensure enough samples of rare objects to train and evaluate models.

Contributions

With these guidelines in mind, we built the three Argoverse 2 (AV2) datasets. We highlight some of their contributions below.

1. The 1,000-scenario Sensor Dataset has the largest self-driving taxonomy to date: 30 categories, 26 of which contain at least 6,000 cuboids, enabling training and testing over a diverse taxonomy. The dataset also includes stereo imagery, unlike other recent self-driving datasets.

2. The 20,000-scenario Lidar Dataset is the largest dataset for self-supervised learning on lidar. The only similar dataset, the concurrently developed ONCE [36], does not include HD maps.

3. The 250,000-scenario Motion Forecasting Dataset has the largest taxonomy (5 types of dynamic actors and 5 types of static actors) and covers the largest mapped area of any such dataset.

We believe these datasets will support research into problems such as 3D detection, 3D tracking, monocular and stereo depth estimation, motion forecasting, visual odometry, pose estimation, lane detection, map automation, self-supervised learning, structure from motion, scene flow, optical flow, time-to-contact estimation, and point cloud forecasting.

Related Work

The last few years have seen rapid progress in self-driving perception and forecasting research, catalyzed by many high-quality datasets.

Sensor datasets and 3D object detection and tracking. New sensor datasets for 3D object detection [4, 45, 39, 40, 24, 33, 18, 14, 41, 36] have led to influential detection methods such as CenterPoint.

Summary

The Argoverse 2 (AV2) paper presents a suite of three datasets aimed at advancing research in self-driving perception and forecasting. The Sensor Dataset comprises 1,000 sequences of multimodal data, pairing high-resolution imagery and lidar point clouds with 3D cuboid annotations for 26 object categories. The Lidar Dataset features 20,000 sequences of unlabeled lidar point clouds to support self-supervised learning. The Motion Forecasting Dataset consists of 250,000 challenging scenarios in which models must predict the future motion of 'scored actors' in each scene. Designed to fill gaps in existing datasets, AV2 emphasizes diversity across cities and weather conditions and ships a high-definition map with every scenario. The datasets are intended for a range of tasks such as 3D tracking, motion forecasting, and self-supervised learning.
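
To make the scenario structure described above concrete, here is a minimal Python sketch of the kind of record the Motion Forecasting Dataset implies: per-actor track histories of location, heading, and velocity, a category label, and a flag marking scored actors. All class and field names here are illustrative assumptions, not the official AV2 schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ActorCategory(Enum):
    # Illustrative labels only; AV2's taxonomy spans 5 dynamic and 5 static actor types.
    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"
    MOTORCYCLIST = "motorcyclist"
    CYCLIST = "cyclist"
    BUS = "bus"


@dataclass
class TrackState:
    """One observed timestep of an actor's track history."""
    position_xy: Tuple[float, float]  # map-frame location in meters
    heading_rad: float                # yaw angle in radians
    velocity_xy: Tuple[float, float]  # velocity in meters per second


@dataclass
class ActorTrack:
    track_id: str
    category: ActorCategory
    history: List[TrackState]  # observed past states
    is_scored: bool            # True if this actor's future motion is evaluated


@dataclass
class Scenario:
    """One scenario: actor tracks plus a pointer to its local HD map."""
    scenario_id: str
    city: str                  # one of the six AV2 cities
    tracks: List[ActorTrack] = field(default_factory=list)
    map_path: str = ""         # per-scenario HD map with 3D lane/crosswalk geometry
```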

Methods

This paper employs the following methods:

  • 3D detection
  • 3D tracking
  • motion forecasting
  • self-supervised learning

Models Used

  • CenterPoint

Datasets

This research introduces and uses the following datasets (a data-loading sketch follows the list):

  • Sensor Dataset
  • Lidar Dataset
  • Motion Forecasting Dataset
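
As a rough illustration of working with the Motion Forecasting Dataset once downloaded, the sketch below walks the scenario files and groups observations per actor. The directory layout, per-scenario parquet format, and column names (track_id, position_x, position_y) are assumptions for illustration; the official av2 devkit defines the real schema and loaders.

```python
from pathlib import Path

import pandas as pd

# Hypothetical local path to the extracted Motion Forecasting training split.
DATA_ROOT = Path("~/datasets/av2/motion_forecasting/train").expanduser()


def iter_scenarios(root: Path):
    """Yield (scenario_id, DataFrame) pairs, one per scenario file.

    Column names used downstream (track_id, position_x, position_y) are
    illustrative guesses, not the official schema.
    """
    for scenario_file in sorted(root.rglob("*.parquet")):
        yield scenario_file.stem, pd.read_parquet(scenario_file)


for scenario_id, df in iter_scenarios(DATA_ROOT):
    # Recover per-actor track histories by grouping on the track identifier.
    for track_id, track in df.groupby("track_id"):
        xy = track[["position_x", "position_y"]].to_numpy()
        print(scenario_id, track_id, xy.shape)  # (num_timesteps, 2)
    break  # inspect only the first scenario
```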

Evaluation Metrics

  • mean IoU
  • L1 norm
  • Chamfer distance
  • minADE
  • minFDE
  • Miss Rate (see the metric sketch below)
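
A minimal NumPy sketch of how several of the metrics above are typically computed: minADE and minFDE over K trajectory hypotheses, Miss Rate with the 2.0 m final-displacement threshold used on the Argoverse leaderboards, and a generic symmetric Chamfer distance for point cloud forecasting. This follows the standard definitions rather than the benchmark's exact implementation.

```python
import numpy as np


def min_ade(pred: np.ndarray, gt: np.ndarray) -> float:
    """minADE: lowest average L2 error over K hypotheses.
    pred: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth."""
    errors = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T)
    return float(errors.mean(axis=1).min())


def min_fde(pred: np.ndarray, gt: np.ndarray) -> float:
    """minFDE: lowest L2 error at the final timestep over K hypotheses."""
    return float(np.linalg.norm(pred[:, -1] - gt[-1], axis=-1).min())


def miss_rate(preds, gts, threshold: float = 2.0) -> float:
    """Fraction of actors whose best final-step error exceeds the threshold
    (2.0 m is the displacement threshold used on the Argoverse leaderboards)."""
    return float(np.mean([min_fde(p, g) > threshold for p, g in zip(preds, gts)]))


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds a: (N, 3) and b: (M, 3).
    The O(N*M) pairwise version is fine for a sketch."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


# Toy usage: 6 hypotheses over 30 future timesteps.
rng = np.random.default_rng(0)
pred = rng.normal(size=(6, 30, 2)).cumsum(axis=1)
gt = rng.normal(size=(30, 2)).cumsum(axis=0)
print(min_ade(pred, gt), min_fde(pred, gt), miss_rate([pred], [gt]))
```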

Results

  • Introduced three new datasets that improve upon the original Argoverse releases
  • The Sensor Dataset has the largest self-driving taxonomy to date, with 30 categories
  • The Lidar Dataset is the largest collection of lidar sensor data and supports self-supervised learning
  • The Motion Forecasting Dataset raises the difficulty and realism of forecasting benchmarks

Limitations

The authors identified the following limitations:

  • None explicitly specified

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

self-driving, datasets, perception, forecasting, HD maps, lidar, sensor data

Papers Using Similar Methods

External Resources

  • https://github.com/argoverse/argoverse-api