The UT-Interaction dataset contains videos of continuous executions of 6 classes of human-human interactions: shake-hands, point, hug, push, kick, and punch. Ground-truth labels for these interactions are provided, including time intervals and bounding boxes. There are a total of 20 video sequences, each around 1 minute long. Each video contains at least one execution per interaction, resulting in 8 executions of human activities per video on average. Several participants with more than 15 different clothing conditions appear in the videos. The videos were recorded at a resolution of 720×480 and 30 fps, and the height of a person in the video is about 200 pixels.
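As a quick sanity check on a local copy, the sketch below enumerates the six interaction classes and reads basic clip properties (resolution, frame rate, frame count) with OpenCV. The directory name `ut_interaction` and the `.avi` extension are assumptions about how the download is laid out, not something the dataset page specifies; adjust the paths to your setup.

```python
# Minimal sketch for inspecting UT-Interaction clips with OpenCV.
# The directory layout and file extension below are assumptions; adapt as needed.
import glob
import os

import cv2  # pip install opencv-python

# The 6 interaction classes listed in the dataset description.
CLASSES = ["shake-hands", "point", "hug", "push", "kick", "punch"]

def describe_video(path: str) -> dict:
    """Return basic properties of a video file (expected: ~720x480 at 30 fps)."""
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {path}")
    info = {
        "path": path,
        "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        "fps": cap.get(cv2.CAP_PROP_FPS),
        "frames": int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
    }
    cap.release()
    return info

if __name__ == "__main__":
    # Hypothetical local path; the actual archive structure may differ.
    for video_path in sorted(glob.glob(os.path.join("ut_interaction", "*.avi"))):
        print(describe_video(video_path))
```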
Source: https://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html
Image Source: https://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html
Variants: UT-Interaction
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Human Interaction Recognition | LSTM-IRN_fc1 (inter+intra) | Interaction Relational Network for Mutual … | 2019-10-11 |
Recent papers with results on this dataset: