SFSORT: Scene Features-based Simple Online Real-Time Tracker

Read original: arXiv:2404.07553 - Published 4/12/2024 by M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki


Overview

This paper introduces SFSORT (Scene Features-based Simple Online Real-Time Tracker), a computationally efficient multi-object tracking algorithm designed to run at high speeds on a CPU.
SFSORT follows a tracking-by-detection approach: it links object detections across frames rather than relying on complex motion models or appearance features.
The key innovations in SFSORT are a bounding-box cost function, the Bounding Box Similarity Index, which lets it drop the Kalman filter, and the use of scene-level features to improve association and track post-processing. Together these enable real-time performance on resource-constrained devices.

Plain English Explanation

SFSORT is a new way to track multiple objects in real time that is designed to be efficient and easy to use. Instead of relying on complex algorithms to predict the movement of objects, SFSORT simply matches up the objects that are detected in each video frame. It does this mainly by comparing the location and size of each new detection with those of the objects it is already tracking, and it uses features of the overall scene to make those matches more reliable, rather than trying to model the detailed motion of each individual object.

This makes SFSORT much faster and simpler to run than other multi-object tracking methods. It can operate in real-time, even on devices with limited computing power. This could be useful for applications like self-driving cars, security cameras, or robotics, where you need to quickly identify and track multiple objects in a scene.

The key innovation in SFSORT is how it uses these scene-level features to associate detections across frames, rather than trying to build detailed models of each object's movement or appearance. This allows it to be very efficient, while still maintaining good tracking performance.

Technical Explanation

SFSORT is a real-time multi-object tracking algorithm that takes a tracking-by-detection approach. Rather than using complex motion models or appearance features to track objects, it relies on object detections and simple scene-level features for data association.

The core components of SFSORT are:

Object detection: SFSORT uses a pre-trained object detector to identify objects in each video frame.
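Scene feature extraction: It compares simple bounding-box attributes, such as the location, size, and aspect ratio of each detected object, and also uses features of the overall scene to improve association and track post-processing.
Data association: SFSORT uses these features to associate detections across frames, linking them into consistent tracks. It does this through a greedy matching algorithm that pairs detections based on spatial and size similarity, scored by a cost function the authors call the Bounding Box Similarity Index, which replaces the Kalman filter used by many comparable trackers.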
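The association step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `box_similarity` score below is a hypothetical stand-in (IoU penalised by a normalised center-distance term and a relative size-difference term), not the paper's Bounding Box Similarity Index, whose exact formula is not given here. The greedy, best-score-first pairing follows the description above; the actual SFSORT code may use a different assignment solver and thresholds, and `min_similarity=0.3` is an arbitrary illustrative value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: top-left (x1, y1), bottom-right (x2, y2), in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0 when they do not overlap)."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def _rel_diff(p: float, q: float) -> float:
    # Relative difference of two positive lengths, in [0, 1); 0 means equal.
    return abs(p - q) / max(p, q, 1e-9)


def box_similarity(a: Box, b: Box) -> float:
    """HYPOTHETICAL geometry-only score; NOT the paper's Bounding Box Similarity Index.

    IoU minus a normalised center-distance term minus a size-mismatch term.
    Higher is more similar; 1.0 means identical boxes.
    """
    (ax, ay), (bx, by) = a.center, b.center
    # Squared diagonal of the smallest box enclosing both boxes.
    ex = max(a.x2, b.x2) - min(a.x1, b.x1)
    ey = max(a.y2, b.y2) - min(a.y1, b.y1)
    diag_sq = ex * ex + ey * ey
    center_term = ((ax - bx) ** 2 + (ay - by) ** 2) / diag_sq if diag_sq > 0 else 0.0
    size_term = 0.5 * (_rel_diff(a.width, b.width) + _rel_diff(a.height, b.height))
    return iou(a, b) - center_term - size_term


def greedy_associate(tracks: list[Box], detections: list[Box], min_similarity: float = 0.3):
    """Pair each track with at most one detection, highest similarity first.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    candidates = [
        (box_similarity(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    ]
    candidates = [c for c in candidates if c[0] >= min_similarity]
    candidates.sort(reverse=True)  # best-scoring pairs first

    used_tracks: set[int] = set()
    used_dets: set[int] = set()
    matches: list[tuple[int, int]] = []
    for _, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue  # one side of the pair was already taken by a better match
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)

    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    tracks = [Box(0, 0, 10, 20), Box(50, 50, 70, 90)]
    detections = [Box(52, 51, 72, 91), Box(1, 1, 11, 21), Box(200, 200, 210, 210)]
    print(greedy_associate(tracks, detections))  # → ([(1, 0), (0, 1)], [], [2])
```

In this example each track is paired with the detection that barely moved, while the far-away box (index 2) stays unmatched; a tracker would typically treat such a leftover detection as a candidate for a new track. Greedy pairing is cheap and simple, but unlike an optimal assignment solver it can lock in a locally best pair that blocks a better global matching.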

This approach allows SFSORT to operate online and in real time with low computational complexity. The authors report a HOTA of 61.7% at 2242 Hz on the MOT17 dataset and a HOTA of 60.9% at 304 Hz on the MOT20 dataset, measured on a 2.2 GHz Intel Xeon CPU.
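As a rough sense of scale, a processing rate f corresponds to a per-frame time t = 1/f. Applying this to the rates in the paper's abstract (2242 Hz on MOT17 and 304 Hz on MOT20) and comparing with the frame budget of a 30 FPS video stream gives the figures below. Note that the abstract does not say whether these rates include the object detector's own runtime.

```latex
t = \frac{1}{f}, \qquad
t_{\text{MOT17}} = \frac{1}{2242\ \text{Hz}} \approx 0.446\ \text{ms}, \qquad
t_{\text{MOT20}} = \frac{1}{304\ \text{Hz}} \approx 3.29\ \text{ms}, \qquad
t_{30\,\text{FPS}} = \frac{1}{30\ \text{Hz}} \approx 33.3\ \text{ms}
```

Even the slower MOT20 figure would leave about 30 ms of a 30 FPS frame budget for detection and any other processing.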

Critical Analysis

The main strength of SFSORT is its computational efficiency and simplicity, which allow it to operate in real-time on resource-constrained devices. By avoiding complex motion models or appearance features, it sidesteps many of the challenges that plague more sophisticated multi-object trackers.

However, this simplicity also comes with some potential limitations. SFSORT may struggle in situations with heavy occlusion, drastic size changes, or rapid object motion, where its basic scene feature-based association could break down. The authors acknowledge these limitations and suggest that SFSORT is best suited for applications where real-time performance is a priority over absolute tracking accuracy.
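A small Python sketch illustrates one reason rapid motion is hard for association based on box geometry without a motion model: once an object moves farther than its own width between frames, its old and new boxes no longer overlap, so intersection over union (IoU) drops to zero. This isolates plain IoU only; the paper's Bounding Box Similarity Index may include terms that soften this effect, and the box size and displacements below are made-up values.

```python
from __future__ import annotations


def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """IoU of two (x1, y1, x2, y2) boxes; 0.0 when they do not overlap."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 20x40 px object, shifted horizontally by dx pixels between frames.
prev_box = (100.0, 100.0, 120.0, 140.0)
for dx in (0, 5, 10, 20, 30):
    curr_box = (prev_box[0] + dx, prev_box[1], prev_box[2] + dx, prev_box[3])
    print(f"dx={dx:>2} px  IoU={iou(prev_box, curr_box):.3f}")
# dx= 0 → 1.000, dx= 5 → 0.600, dx=10 → 0.333, dx=20 → 0.000, dx=30 → 0.000
```

Under this metric, a tracker with no prediction step sees no overlap evidence linking the two boxes once the shift reaches the box width, which is why the limitation above matters most for small or fast-moving objects.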

Additionally, SFSORT's reliance on a pre-trained object detector means that its performance is ultimately bounded by the capabilities of that underlying model. If the detector misses or misidentifies an object, SFSORT cannot correct that error on its own.

Further research could explore ways to make SFSORT more robust, perhaps by incorporating some limited motion modeling or appearance features, while still maintaining its computational efficiency. Integrating it with more advanced detectors or exploring self-supervised detection approaches could also be promising directions.

Conclusion

SFSORT is a highly efficient multi-object tracking algorithm that sacrifices some accuracy in favor of real-time performance and simplicity. By relying on simple bounding-box geometry and scene-level features rather than complex object models, it reaches processing rates of several hundred to over two thousand frames per second on a single 2.2 GHz Intel Xeon CPU, depending on the dataset, making it a compelling choice for applications that require fast tracking on constrained hardware.

While SFSORT may not be the most accurate tracker available, its combination of speed, simplicity, and competitive performance on standard benchmarks suggests that it could be a valuable tool in a variety of computer vision and robotics applications where real-time multi-object tracking is a critical requirement.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

SFSORT: Scene Features-based Simple Online Real-Time Tracker

M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki

This paper introduces SFSORT, the world's fastest multi-object tracking system based on experiments conducted on MOT Challenge datasets. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similarity Index, this work eliminates the Kalman Filter, leading to reduced computational requirements. Additionally, this paper demonstrates the impact of scene features on enhancing object-track association and improving track post-processing. Using a 2.2 GHz Intel Xeon CPU, the proposed method achieves an HOTA of 61.7% with a processing speed of 2242 Hz on the MOT17 dataset and an HOTA of 60.9% with a processing speed of 304 Hz on the MOT20 dataset. The tracker's source code, fine-tuned object detection model, and tutorials are available at https://github.com/gitmehrdad/SFSORT.

4/12/2024

FeatureSORT: Essential Features for Effective Tracking

Hamidreza Hashempoor, Rosemary Koikara, Yu Dong Hwang

In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple while being effective. We provide multiple feature modules, each of which represents a particular type of appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our tracker significantly enhances online tracking accuracy. Additionally, we propose the incorporation of a stronger detector and also provide advanced post-processing methods that further elevate the tracker's performance. During real-time operation, we establish a measurement-to-track association distance function that includes IoU, direction, color, style, and ReID feature similarity information, where each metric is calculated separately. With the design of our feature-related distance function, it is possible to track objects through longer periods of occlusion while keeping the number of identity switches comparatively low. Extensive experimental evaluation demonstrates notable improvement in tracking accuracy and reliability, as evidenced by reduced identity switches and enhanced occlusion handling. These advancements not only contribute to the state of the art in object tracking but also open new avenues for future research and practical applications demanding high precision and reliability.

7/8/2024

Deep HM-SORT: Enhancing Multi-Object Tracking in Sports with Deep Features, Harmonic Mean, and Expansion IOU

Matias Gran-Henriksen, Hans Andreas Lindgaard, Gabriel Kiss, Frank Lindseth

This paper introduces Deep HM-SORT, a novel online multi-object tracking algorithm specifically designed to enhance the tracking of athletes in sports scenarios. Traditional multi-object tracking methods often struggle with sports environments due to the similar appearances of players, irregular and unpredictable movements, and significant camera motion. Deep HM-SORT addresses these challenges by integrating deep features, harmonic mean, and Expansion IOU. By leveraging the harmonic mean, our method effectively balances appearance and motion cues, significantly reducing ID-swaps. Additionally, our approach retains all tracklets indefinitely, improving the re-identification of players who leave and re-enter the frame. Experimental results demonstrate that Deep HM-SORT achieves state-of-the-art performance on two large-scale public benchmarks, SportsMOT and SoccerNet Tracking Challenge 2023. Specifically, our method achieves 80.1 HOTA on the SportsMOT dataset and 85.4 HOTA on the SoccerNet-Tracking dataset, outperforming existing trackers in key metrics such as HOTA, IDF1, AssA, and MOTA. This robust solution provides enhanced accuracy and reliability for automated sports analytics, offering significant improvements over previous methods without introducing additional computational cost.

6/19/2024

Engineering an Efficient Object Tracker for Non-Linear Motion

Momir Adžemović, Predrag Tadić, Andrija Petrović, Mladen Nikolić

The goal of multi-object tracking is to detect and track all objects in a scene while maintaining unique identifiers for each, by associating their bounding boxes across video frames. This association relies on matching motion and appearance patterns of detected objects. This task is especially hard in case of scenarios involving dynamic and non-linear motion patterns. In this paper, we introduce DeepMoveSORT, a novel, carefully engineered multi-object tracker designed specifically for such scenarios. In addition to standard methods of appearance-based association, we improve motion-based association by employing deep learnable filters (instead of the most commonly used Kalman filter) and a rich set of newly proposed heuristics. Our improvements to motion-based association methods are severalfold. First, we propose a new transformer-based filter architecture, TransFilter, which uses an object's motion history for both motion prediction and noise filtering. We further enhance the filter's performance by careful handling of its motion history and accounting for camera motion. Second, we propose a set of heuristics that exploit cues from the position, shape, and confidence of detected bounding boxes to improve association performance. Our experimental evaluation demonstrates that DeepMoveSORT outperforms existing trackers in scenarios featuring non-linear motion, surpassing state-of-the-art results on three such datasets. We also perform a thorough ablation study to evaluate the contributions of different tracker components which we proposed. Based on our study, we conclude that using a learnable filter instead of the Kalman filter, along with appearance-based association is key to achieving strong general tracking performance.

7/2/2024