Real scenes, Interaction, Contact and Humans
Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for “Real scenes, Interaction, Contact and Humans.” RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high-resolution 3D scene scans. A key feature of RICH is that it also contains accurate vertex-level contact labels on the body.
Variants: RICH
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
3D Human Pose Estimation | UPose3D (eval only) | UPose3D: Uncertainty-Aware 3D Human Pose … | 2024-04-23 |
3D Human Pose Estimation | SkelFormer (HRNet - eval only) | SkelFormer: Markerless 3D Pose and … | 2024-04-19 |
3D Human Pose Estimation | WHAM (ViT) | WHAM: Reconstructing World-grounded Humans with … | 2023-12-12 |
3D Human Pose Estimation | IPMAN-R | 3D Human Pose Estimation via … | 2023-03-31 |