
YASMOT: Yet another stereo image multi-object tracker

(2025)

Paper Information
arXiv ID

2506.17186

Abstract

There now exist many popular object detectors based on deep learning that can analyze images and extract locations and class labels for occurrences of objects. For image time series (i.e., video or sequences of stills), tracking objects over time and preserving object identity can help to improve object detection performance, and is necessary for many downstream tasks, including classifying and predicting behaviors, and estimating total abundances. Here we present yasmot, a lightweight and flexible object tracker that can process the output from popular object detectors and track objects over time from either monoscopic or stereoscopic camera configurations. In addition, it includes functionality to generate consensus detections from ensembles of object detectors.

Summary

The paper presents YASMOT, a lightweight and flexible multi-object tracker designed to process the outputs of popular object detectors such as RetinaNet and YOLO and to track objects in image time series from monoscopic or stereoscopic camera configurations. Tracking preserves object identity over time, which can improve object detection performance; the tracker also links observations between the left and right cameras of a stereo rig and can generate consensus detections from ensembles of detectors. Key features include Gaussian distances for detection pairing, tunable sensitivity parameters, interpolation of missing detections, and support for combining the outputs of multiple detectors.
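
yasmot pairs detections across frames (and across left/right views) by scoring how well two bounding boxes agree, using Gaussian distances rather than a hard overlap threshold. The snippet below is a minimal sketch of one plausible Gaussian-style affinity; the box parameterization, the `sigma` parameter, and the exact weighting are illustrative assumptions, not yasmot's actual formula.

```python
import numpy as np

def gaussian_affinity(box_a, box_b, sigma=0.5):
    """Toy Gaussian-style affinity between two detections.

    Boxes are (cx, cy, w, h) in normalized image coordinates. Positional
    differences are scaled by the mean box extent, so the score decays
    smoothly with distance instead of cutting off at a fixed IoU
    threshold. The weighting used by yasmot itself may differ.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    scale = 0.25 * (aw + ah + bw + bh)        # mean of the four box extents
    d2 = ((ax - bx) ** 2 + (ay - by) ** 2) / (scale ** 2 + 1e-9)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))
```

A smooth score of this kind also gives a natural knob for controlling sensitivity: a larger `sigma` tolerates bigger displacements between frames, while a smaller one makes pairing stricter.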

Methods

This paper employs the following methods:

  • Hungarian algorithm
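
The Hungarian algorithm turns pairwise affinities into an optimal one-to-one assignment between detections in consecutive frames (or between the left and right camera views). Below is a minimal sketch using SciPy's `linear_sum_assignment`; the cost construction, the `min_score` cutoff, and the `affinity` callback are assumptions for illustration, not yasmot's actual code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev_boxes, curr_boxes, affinity, min_score=0.1):
    """Match two sets of detections with the Hungarian algorithm.

    Builds a cost matrix from pairwise affinities (higher affinity means
    lower cost) and solves the optimal one-to-one assignment. Pairs whose
    affinity falls below ``min_score`` are left unmatched, i.e. they are
    candidates for new tracks or for interpolation of a missed detection.
    """
    cost = np.zeros((len(prev_boxes), len(curr_boxes)))
    for i, a in enumerate(prev_boxes):
        for j, b in enumerate(curr_boxes):
            cost[i, j] = -affinity(a, b)      # minimize negative affinity
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if -cost[i, j] >= min_score]
```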

Models Used

  • RetinaNet
  • YOLO
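
Since yasmot can fuse the outputs of an ensemble of detectors (e.g., RetinaNet and YOLO run on the same frames) into consensus detections, a simple fusion rule is sketched below. The confidence-weighted averaging is one plausible choice shown for illustration only; the paper's actual consensus rule may differ.

```python
import numpy as np

def consensus_detection(matched_boxes, matched_scores):
    """Fuse detections of the same object reported by several detectors.

    ``matched_boxes`` holds one (cx, cy, w, h) box per detector, already
    paired as referring to the same object; ``matched_scores`` are their
    confidences. The consensus box is the confidence-weighted mean of the
    boxes, and the consensus score is the mean confidence.
    """
    boxes = np.asarray(matched_boxes, dtype=float)
    weights = np.asarray(matched_scores, dtype=float)
    weights = weights / weights.sum()
    fused_box = (boxes * weights[:, None]).sum(axis=0)
    fused_score = float(np.mean(matched_scores))
    return fused_box, fused_score
```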

Datasets

The following datasets were used in this research:

  • None specified

Evaluation Metrics

  • None specified

Results

  • Tracking with YASMOT preserves object identity across frames, which can improve object detection performance in video sequences and stereo image pairs

Limitations

The authors identified the following limitations:

  • Generalizing tracking to link detections across multiple frames can incur high computational cost, which may reduce performance

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified
  • Compute Requirements: None specified
