While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. We identify the coarse boundary annotations of the popular YouTube-VIS dataset as a major limiting factor. To benchmark high-quality mask predictions for VIS, we introduce the HQ-YTVIS dataset and the Tube-Boundary AP metric (ECCV 2022). HQ-YTVIS consists of a manually re-annotated test set and automatically refined training data, providing training, validation, and testing splits to facilitate future development of VIS methods aimed at higher mask quality.
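For intuition on what boundary-sensitive evaluation measures, below is a minimal sketch assuming Tube-Boundary AP builds on per-frame Boundary IoU (Cheng et al., 2021) averaged across the frames of a video tube. The function names, the `dilation_ratio` default, and the frame-averaging step are illustrative assumptions, not the official implementation; the exact matching and AP computation are defined in the VMT codebase.

```python
import numpy as np
import cv2


def boundary_iou(gt_mask, pred_mask, dilation_ratio=0.02):
    """Per-frame Boundary IoU: IoU restricted to thin bands around
    each mask's contour. NOTE: illustrative sketch, not the official metric."""
    h, w = gt_mask.shape
    # Band width scales with the image diagonal, as in Boundary IoU.
    d = max(1, int(round(dilation_ratio * np.sqrt(h ** 2 + w ** 2))))

    def boundary_region(mask):
        # Erode the mask and subtract, keeping a d-pixel-wide inner band.
        kernel = np.ones((3, 3), np.uint8)
        eroded = cv2.erode(mask.astype(np.uint8), kernel, iterations=d)
        return mask.astype(bool) & ~eroded.astype(bool)

    gt_b, pred_b = boundary_region(gt_mask), boundary_region(pred_mask)
    inter = (gt_b & pred_b).sum()
    union = (gt_b | pred_b).sum()
    return 1.0 if union == 0 else inter / union


def tube_boundary_iou(gt_tube, pred_tube):
    """Assumed tube-level score: mean per-frame Boundary IoU over a
    video tube (a list of binary masks, one per frame)."""
    scores = [boundary_iou(g, p) for g, p in zip(gt_tube, pred_tube)]
    return float(np.mean(scores))
```

Because the boundary bands are narrow, small contour errors that barely affect standard mask IoU cause large drops here, which is what makes the metric sensitive to the annotation quality HQ-YTVIS targets.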
Variants: HQ-YTVIS
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Video Instance Segmentation | VMT (Swin-L) | Video Mask Transfiner for High-Quality … | 2022-07-28 |
| Video Instance Segmentation | VMT (R101) | Video Mask Transfiner for High-Quality … | 2022-07-28 |
| Video Instance Segmentation | VMT (R50) | Video Mask Transfiner for High-Quality … | 2022-07-28 |
| Video Instance Segmentation | SeqFormer (Swin-L) | SeqFormer: Sequential Transformer for Video … | 2021-12-15 |