4D-OR comprises 6734 scenes recorded at one frame per second by six calibrated RGB-D Kinect sensors mounted on the ceiling of the operating room (OR), providing synchronized RGB and depth images. We provide fused point cloud sequences of entire scenes, along with automatically annotated 6D human poses and 3D bounding boxes for OR objects. Furthermore, we provide semantic scene graph (SSG) annotations for each step of the surgery, together with the clinical roles of all humans in the scene, e.g., nurse, head surgeon, anesthesiologist.
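To make the SSG annotations concrete, the following is a minimal sketch of how a single annotated scene could be held in memory as subject-predicate-object triplets plus a role per human. The class names, entity identifiers (`human_0`, `patient`), and predicates (`operating`, `assisting`) are hypothetical illustrations, not the dataset's actual label set or file format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of a scene graph: subject -predicate-> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Scene graph for one recorded scene (hypothetical in-memory layout)."""
    frame_index: int
    # Maps a human entity id to its clinical role, e.g. "nurse".
    roles: dict[str, str] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def relations_of(self, entity: str) -> list[Relation]:
        """Return every relation in which the entity is subject or object."""
        return [r for r in self.relations if entity in (r.subject, r.obj)]


# Illustrative values only; the real annotations define their own labels.
graph = SceneGraph(
    frame_index=0,
    roles={"human_0": "head surgeon", "human_1": "nurse"},
    relations=[
        Relation("human_0", "operating", "patient"),
        Relation("human_1", "assisting", "human_0"),
    ],
)

for rel in graph.relations_of("human_0"):
    print(rel.subject, rel.predicate, rel.obj)
```

Keeping roles separate from relations reflects the description above, where clinical roles are attached to humans rather than to the edges between entities.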
Variants: 4D-OR
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Scene Graph Generation | MM2SG | MM-OR: A Large Multimodal Operating … | 2025-03-04 |
2D Panoptic Segmentation | MM-OR | MM-OR: A Large Multimodal Operating … | 2025-03-04 |
Scene Graph Generation | ORacle | ORacle: Large Vision-Language Models for … | 2024-04-10 |
Scene Graph Generation | LABRAD-OR | LABRAD-OR: Lightweight Memory Scene Graphs … | 2023-03-23 |
Scene Graph Generation | Pix2SG | Location-Free Scene Graph Generation | 2023-03-20 |
Scene Graph Generation | 4D-OR baseline | 4D-OR: Semantic Scene Graphs for … | 2022-03-22 |
Recent papers with results on this dataset: