Engineering an Efficient Object Tracker for Non-Linear Motion

2407.00738

Published 7/2/2024 by Momir Adžemović, Predrag Tadić, Andrija Petrović, Mladen Nikolić

Abstract

The goal of multi-object tracking is to detect and track all objects in a scene while maintaining unique identifiers for each, by associating their bounding boxes across video frames. This association relies on matching motion and appearance patterns of detected objects. This task is especially hard in case of scenarios involving dynamic and non-linear motion patterns. In this paper, we introduce DeepMoveSORT, a novel, carefully engineered multi-object tracker designed specifically for such scenarios. In addition to standard methods of appearance-based association, we improve motion-based association by employing deep learnable filters (instead of the most commonly used Kalman filter) and a rich set of newly proposed heuristics. Our improvements to motion-based association methods are severalfold. First, we propose a new transformer-based filter architecture, TransFilter, which uses an object's motion history for both motion prediction and noise filtering. We further enhance the filter's performance by careful handling of its motion history and accounting for camera motion. Second, we propose a set of heuristics that exploit cues from the position, shape, and confidence of detected bounding boxes to improve association performance. Our experimental evaluation demonstrates that DeepMoveSORT outperforms existing trackers in scenarios featuring non-linear motion, surpassing state-of-the-art results on three such datasets. We also perform a thorough ablation study to evaluate the contributions of different tracker components which we proposed. Based on our study, we conclude that using a learnable filter instead of the Kalman filter, along with appearance-based association is key to achieving strong general tracking performance.

Overview

This paper introduces DeepMoveSORT, a multi-object tracker designed to handle non-linear motion effectively.
The tracker replaces the commonly used Kalman filter with a learnable transformer-based filter (TransFilter) and adds association heuristics that use the position, shape, and confidence of detected bounding boxes.
The authors report that the tracker surpasses state-of-the-art results on three datasets featuring non-linear motion, and they include an ablation study of its components.

Plain English Explanation

In this paper, the researchers present a new object tracking system that is particularly well-suited for situations where the objects being tracked are moving in unpredictable, non-linear ways. Object tracking is an important task in computer vision, with applications ranging from autonomous vehicles to security systems. However, traditional tracking methods often struggle when the objects exhibit complex, non-linear motion patterns.

To address this challenge, the researchers have developed a tracker that uses a learned motion model, built on an object's recent motion history, to anticipate where the object will be next. This helps the tracker stay locked on to its targets even when they make abrupt changes in direction or speed. The tracker also applies a set of heuristics, based on where each detected box is, what shape it has, and how confident the detector is, to decide which detection belongs to which track.

Through experiments on standard benchmarks, the authors demonstrate that their tracker outperforms existing methods, particularly in scenarios with non-linear motion. This is a significant advancement, as it means the tracker can be deployed in a wider range of real-world applications where the targets may be moving in unpredictable ways.

Technical Explanation

The core of the proposed tracker is TransFilter, a transformer-based learnable filter that takes the place of the Kalman filter. It uses an object's motion history for both motion prediction and noise filtering, so it can forecast object locations even when their motion is non-linear. According to the abstract, the filter's performance improves further with careful handling of the motion history and with compensation for camera motion.
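To make the motivation concrete, the following is a minimal sketch of why a linear motion assumption breaks down on non-linear trajectories. It is not the paper's filter: `constant_velocity_predict` is a hypothetical stand-in for the constant-velocity assumption that Kalman-filter trackers commonly use, and the straight and circular tracks are synthetic data invented for illustration.

```python
import math


def constant_velocity_predict(history):
    """Extrapolate the next (x, y) position from the last two observed positions."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def straight_track(n_steps, vx=1.0, vy=0.5):
    """Synthetic linear motion: constant velocity per frame (illustrative values)."""
    return [(k * vx, k * vy) for k in range(n_steps)]


def circular_track(n_steps, radius=10.0, step_angle=0.5):
    """Synthetic non-linear motion: a point moving along a circle (illustrative values)."""
    return [
        (radius * math.cos(k * step_angle), radius * math.sin(k * step_angle))
        for k in range(n_steps)
    ]


def mean_prediction_error(track):
    """Average distance between the one-step-ahead prediction and the true position."""
    errors = []
    for t in range(2, len(track)):
        px, py = constant_velocity_predict(track[:t])
        tx, ty = track[t]
        errors.append(math.hypot(px - tx, py - ty))
    return sum(errors) / len(errors)


linear_error = mean_prediction_error(straight_track(20))
circular_error = mean_prediction_error(circular_track(20))
print(f"linear motion error:   {linear_error:.3f}")
print(f"circular motion error: {circular_error:.3f}")
```

The linear extrapolation is exact on the straight track but misses on every frame of the circular one, which is the gap a filter learned from motion history is meant to close.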

To associate detected objects with their corresponding tracks, the researchers propose a set of heuristics that exploit cues from the position, shape, and confidence of detected bounding boxes. These heuristics refine motion-based association beyond what the filter's predictions provide on their own.
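Associating detections with tracks is commonly framed as matching predicted boxes to detected boxes by overlap. The sketch below shows that general idea only, under stated assumptions: it uses intersection over union (IoU) as the sole cue and a greedy matcher instead of an optimal assignment solver, to stay dependency-free. The function names, the `min_iou` threshold, and the example boxes are hypothetical and are not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(predicted_boxes, detected_boxes, min_iou=0.3):
    """Match tracks to detections greedily, highest IoU first.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    candidates = [
        (iou(p, d), ti, di)
        for ti, p in enumerate(predicted_boxes)
        for di, d in enumerate(detected_boxes)
    ]
    candidates.sort(reverse=True)

    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in candidates:
        if score < min_iou:
            break  # remaining candidates overlap too little to be trusted
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)

    unmatched_tracks = [t for t in range(len(predicted_boxes)) if t not in used_tracks]
    unmatched_dets = [d for d in range(len(detected_boxes)) if d not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets


# Two tracks predicted by the motion model, three detections in the new frame.
predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(21, 19, 31, 29), (1, 1, 11, 11), (50, 50, 60, 60)]
matches, lost_tracks, new_detections = greedy_associate(predicted, detected)
print(matches, lost_tracks, new_detections)
```

Unmatched detections would typically start new tracks and unmatched tracks would be kept alive for a few frames, which is how objects entering the frame and short occlusions are usually handled in this style of tracker.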

The tracker also uses standard appearance-based association alongside its motion cues. Matching on appearance helps maintain consistent object identities throughout the sequence, for example when an object temporarily disappears from view and then reappears.
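Appearance cues are usually compared through embedding vectors, one per object crop. The following is a rough sketch of that idea, assuming cosine similarity and a fixed threshold; the embeddings, the track ids, the `min_similarity` value, and the function names are all made up for illustration and say nothing about the appearance model the authors actually use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_by_appearance(lost_tracks, detection_embedding, min_similarity=0.8):
    """Return the id of the most similar lost track, or None if none is similar enough."""
    best_id, best_sim = None, min_similarity
    for track_id, embedding in lost_tracks.items():
        sim = cosine_similarity(embedding, detection_embedding)
        if sim >= best_sim:
            best_id, best_sim = track_id, sim
    return best_id


# Hypothetical stored embeddings for two tracks that went out of view.
lost = {7: [0.9, 0.1, 0.0], 12: [0.0, 0.2, 0.95]}

same_object = match_by_appearance(lost, [0.85, 0.15, 0.05])
unknown_object = match_by_appearance(lost, [0.5, 0.5, 0.5])
print(same_object, unknown_object)
```

A detection that resembles a stored track closely enough inherits its identity; otherwise it is treated as a new object.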

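The authors evaluate their tracker on datasets with non-linear motion patterns and report surpassing state-of-the-art results on three of them. They also run an ablation study to measure the contribution of each proposed component, and conclude that using a learnable filter instead of the Kalman filter, together with appearance-based association, is key to strong general tracking performance.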

Critical Analysis

The paper presents a well-designed and thoroughly evaluated object tracking system that addresses the important challenge of handling non-linear motion. The authors have made several technical innovations, such as the transformer-based TransFilter and the bounding-box association heuristics, which appear to be effective in improving tracking performance.

However, the paper does not provide a detailed analysis of the computational complexity and real-time performance of the proposed tracker. This information would be helpful for assessing the practical viability of the system, especially for applications that require low-latency processing, such as autonomous vehicles.

Additionally, the paper does not explore the potential limitations of the tracker, such as how it might perform in crowded scenes with heavy occlusions or in the presence of rapidly changing lighting conditions. Further research in these areas could help identify the boundaries of the tracker's capabilities and guide future improvements.

Conclusion

This paper introduces DeepMoveSORT, a multi-object tracker specifically engineered to handle non-linear motion patterns. By combining a learnable transformer-based filter with association heuristics and appearance-based matching, the tracker surpasses state-of-the-art results on three non-linear motion datasets, making it a promising solution for real-world applications where targets may exhibit complex, unpredictable movements.

The innovations presented in this work contribute to the ongoing advancements in multi-object tracking, and the insights gained could inspire further research into more robust and efficient tracking algorithms. As computer vision systems are deployed in more challenging environments, the ability to reliably track objects with non-linear motion will only grow in importance.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking

Xudong Han, Nobuyuki Oishi, Yueying Tian, Elif Ucurum, Rupert Young, Chris Chatwin, Philip Birch

Many Multi-Object Tracking (MOT) approaches exploit motion information to associate all the detected objects across frames. However, many methods that rely on filtering-based algorithms, such as the Kalman Filter, often work well in linear motion scenarios but struggle to accurately predict the locations of objects undergoing complex and non-linear movements. To tackle these scenarios, we propose a motion-based MOT approach with an enhanced temporal motion predictor, ETTrack. Specifically, the motion predictor integrates a transformer model and a Temporal Convolutional Network (TCN) to capture short-term and long-term motion patterns, and it predicts the future motion of individual objects based on the historical motion information. Additionally, we propose a novel Momentum Correction Loss function that provides additional information regarding the motion direction of objects during training. This allows the motion predictor to rapidly adapt to motion variations and more accurately predict future motion. Our experimental results demonstrate that ETTrack achieves a competitive performance compared with state-of-the-art trackers on DanceTrack and SportsMOT, scoring 56.4% and 74.4% in HOTA metrics, respectively.

5/27/2024

cs.CV

Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking

Linh Van Ma, Tran Thien Dat Nguyen, Ba-Ngu Vo, Hyunsung Jang, Moongu Jeon

We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras, which automatically initiates/terminates tracks as well as resolves track appearance-reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured but only the camera matrices of reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that utilizes all these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade-off some of these functionalities for speed. To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and subsequently reduces the number of terms. Specifically, we exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density to realize the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects on the camera planes realizes the occlusion handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements and robustness when camera configurations change on-the-fly, compared to existing multi-view MOT solutions. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.

5/30/2024

cs.CV cs.IT

Vision-based Discovery of Nonlinear Dynamics for 3D Moving Target

Zitong Zhang, Yang Liu, Hao Sun

Data-driven discovery of governing equations has kindled significant interests in many science and engineering areas. Existing studies primarily focus on uncovering equations that govern nonlinear dynamics based on direct measurement of the system states (e.g., trajectories). Limited efforts have been placed on distilling governing laws of dynamics directly from videos for moving targets in a 3D space. To this end, we propose a vision-based approach to automatically uncover governing equations of nonlinear dynamics for 3D moving targets via raw videos recorded by a set of cameras. The approach is composed of three key blocks: (1) a target tracking module that extracts plane pixel motions of the moving target in each video, (2) a Rodrigues' rotation formula-based coordinate transformation learning module that reconstructs the 3D coordinates with respect to a predefined reference point, and (3) a spline-enhanced library-based sparse regressor that uncovers the underlying governing law of dynamics. This framework is capable of effectively handling the challenges associated with measurement data, e.g., noise in the video, imprecise tracking of the target that causes data missing, etc. The efficacy of our method has been demonstrated through multiple sets of synthetic videos considering different nonlinear dynamics.

4/30/2024

cs.CV cs.AI

LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking with Point Clouds

Zhenrong Zhang, Jianan Liu, Yuxuan Xia, Tao Huang, Qing-Long Han, Hongbin Liu

Online multi-object tracking (MOT) plays a pivotal role in autonomous systems. The state-of-the-art approaches usually employ a tracking-by-detection method, and data association plays a critical role. This paper proposes a learning and graph-optimized (LEGO) modular tracker to improve data association performance in the existing literature. The proposed LEGO tracker integrates graph optimization and self-attention mechanisms, which efficiently formulate the association score map, facilitating the accurate and efficient matching of objects across time frames. To further enhance the state update process, the Kalman filter is added to ensure consistent tracking by incorporating temporal coherence in the object states. Our proposed method utilizing LiDAR alone has shown exceptional performance compared to other online tracking approaches, including LiDAR-based and LiDAR-camera fusion-based methods. LEGO ranked 1st at the time of submitting results to KITTI object tracking evaluation ranking board and remains 2nd at the time of submitting this paper, among all online trackers in the KITTI MOT benchmark for cars.

5/28/2024

cs.CV cs.AI