There are now many popular object detectors based on deep learning that can analyze images and extract the locations and class labels of the objects they contain. For image time series (i.e., video or sequences of stills), tracking objects over time and preserving object identity can improve object detection performance, and is necessary for many downstream tasks, including classifying and predicting behaviors and estimating total abundances. Here we present yasmot, a lightweight and flexible object tracker that can process the output from popular object detectors and track objects over time from either monoscopic or stereoscopic camera configurations. In addition, it includes functionality to generate consensus detections from ensembles of object detectors.
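As a rough illustration of the ensemble-consensus idea, the sketch below merges detections from several detectors by grouping same-label boxes that overlap strongly and averaging each group. The function names, the IoU-based grouping, and the 0.5 threshold are assumptions for illustration only, not yasmot's actual algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def consensus(detections_per_model, iou_thresh=0.5):
    """Group same-label boxes from different detectors that overlap above
    iou_thresh, then average each group's coordinates and scores.
    Each input element is one detector's list of (box, score, label)."""
    groups = []
    for dets in detections_per_model:
        for box, score, label in dets:
            for g in groups:
                gbox, _, glabel = g[0]
                if glabel == label and iou(gbox, box) >= iou_thresh:
                    g.append((box, score, label))
                    break
            else:  # no sufficiently overlapping group: start a new one
                groups.append([(box, score, label)])
    result = []
    for g in groups:
        boxes = [b for b, _, _ in g]
        avg_box = tuple(sum(c) / len(c) for c in zip(*boxes))
        avg_score = sum(s for _, s, _ in g) / len(g)
        result.append((avg_box, avg_score, g[0][2]))
    return result

# Example: two detectors agree on one object.
dets_a = [((10, 10, 50, 50), 0.9, "fish")]
dets_b = [((12, 11, 52, 49), 0.8, "fish")]
print(consensus([dets_a, dets_b]))
# [((11.0, 10.5, 51.0, 49.5), 0.85, 'fish')]
```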
The paper presents YASMOT, a lightweight and flexible multi-object tracker that processes the output of popular object detectors such as RetinaNet and YOLO to track objects in image time series from monocular or stereoscopic camera configurations. It aims to improve object detection performance through tracking, including linking observations between left and right cameras and generating consensus detections from multiple detectors. Key features include pairing detections via Gaussian distances, parameters to control sensitivity, interpolation to handle missing detections, and the ability to process output from multiple detectors; a sketch of the pairing step follows.
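This is a minimal, hypothetical sketch of Gaussian-distance pairing between consecutive frames, assuming detections are given as normalized (cx, cy, w, h) boxes: a Gaussian kernel over position and size differences scores each candidate pair, the sigma parameters stand in for the sensitivity controls mentioned above, and a greedy pass links the best-scoring pairs. yasmot's actual scoring and assignment may differ.

```python
import math

def gaussian_distance(d1, d2, sigma_pos=0.05, sigma_size=0.05):
    """Similarity in (0, 1] between detections (cx, cy, w, h):
    product of Gaussians over center displacement and size change."""
    pos = math.exp(-((d1[0] - d2[0]) ** 2 + (d1[1] - d2[1]) ** 2)
                   / (2 * sigma_pos ** 2))
    size = math.exp(-((d1[2] - d2[2]) ** 2 + (d1[3] - d2[3]) ** 2)
                    / (2 * sigma_size ** 2))
    return pos * size

def pair_frames(prev, curr, min_sim=1e-3):
    """Greedily link detections in prev to curr by descending similarity,
    using each detection at most once."""
    candidates = sorted(
        ((gaussian_distance(p, c), i, j)
         for i, p in enumerate(prev)
         for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr, links = set(), set(), []
    for sim, i, j in candidates:
        if sim < min_sim:  # remaining pairs are too unlikely to be the same object
            break
        if i not in used_prev and j not in used_curr:
            links.append((i, j, sim))
            used_prev.add(i)
            used_curr.add(j)
    return links  # unmatched detections start or end tracks

# Example: one object moving slightly between frames.
frame0 = [(0.30, 0.40, 0.10, 0.08)]
frame1 = [(0.31, 0.41, 0.10, 0.08)]
print(pair_frames(frame0, frame1))  # [(0, 0, ~0.96)]
```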
This paper employs the following methods:
- Pairing detections across frames using Gaussian distances
- Linking observations between left and right stereo cameras
- Interpolation to handle missing detections
- Consensus detection from ensembles of object detectors
The following datasets were used in this research:
- None specified
The paper reports the following key findings:
- YASMOT improves tracking performance for object detection in video sequences and stereo images
The authors identified the following limitations:
- Generalizing tracking across multiple frames may reduce performance and incur high computational cost
The compute resources were reported as follows:
- Number of GPUs: None specified
- GPU Type: None specified
- Compute Requirements: None specified