For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. Hypersim is a photorealistic synthetic dataset for holistic indoor scene understanding. It contains 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.
Source: https://github.com/apple/ml-hypersim
Variants: Hypersim
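The released scenes ship their images and per-pixel ground truth as HDF5 files. Below is a minimal sketch of reading one frame with `h5py`; the directory layout and file names follow the conventions documented in the ml-hypersim repository, but the concrete paths (`ai_001_001`, `cam_00`, frame `0000`) are illustrative assumptions, not guaranteed to exist in every download.

```python
# Minimal sketch: load the color image, depth, and semantic labels for one
# Hypersim frame. Paths below are assumptions based on the repository's
# documented layout (scene/images/scene_cam_XX_{final,geometry}_hdf5/...).
import h5py
import numpy as np

scene_dir = "ai_001_001/images"   # assumed location of one downloaded scene
cam, frame = "cam_00", "0000"     # camera trajectory and frame id (illustrative)

def read_hdf5(path):
    """Each Hypersim HDF5 file stores a single array under the key 'dataset'."""
    with h5py.File(path, "r") as f:
        return np.array(f["dataset"])

# HDR color image from the final rendering pass (float32, H x W x 3)
rgb = read_hdf5(f"{scene_dir}/scene_{cam}_final_hdf5/frame.{frame}.color.hdf5")

# Per-pixel ground truth from the geometry pass
depth = read_hdf5(f"{scene_dir}/scene_{cam}_geometry_hdf5/frame.{frame}.depth_meters.hdf5")
semantic = read_hdf5(f"{scene_dir}/scene_{cam}_geometry_hdf5/frame.{frame}.semantic.hdf5")

print(rgb.shape, depth.shape, semantic.shape)
```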
This dataset is used in 3 benchmarks (Semantic Segmentation, 3D Semantic Segmentation, and Panoptic Segmentation):
| Task | Model | Paper | Date |
|---|---|---|---|
| Semantic Segmentation | EMSANet (2x ResNet-34 NBt1D) | PanopticNDT: Efficient and Robust Panoptic … | 2023-09-24 |
| 3D Semantic Segmentation | PanopticNDT (10cm) | PanopticNDT: Efficient and Robust Panoptic … | 2023-09-24 |
| 3D Semantic Segmentation | SemanticNDT (10cm) | PanopticNDT: Efficient and Robust Panoptic … | 2023-09-24 |
| Panoptic Segmentation | EMSANet (2x ResNet-34 NBt1D) | PanopticNDT: Efficient and Robust Panoptic … | 2023-09-24 |
| Semantic Segmentation | MoCo-v3 (ViT-B) | MultiMAE: Multi-modal Multi-task Masked Autoencoders | 2022-04-04 |
| Semantic Segmentation | MultiMAE (ViT-B) | MultiMAE: Multi-modal Multi-task Masked Autoencoders | 2022-04-04 |
| Semantic Segmentation | MAE (ViT-B) | MultiMAE: Multi-modal Multi-task Masked Autoencoders | 2022-04-04 |
| Semantic Segmentation | DINO (ViT-B) | MultiMAE: Multi-modal Multi-task Masked Autoencoders | 2022-04-04 |
Recent papers with results on this dataset: