Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

Read original: arXiv:2407.08872 - Published 9/4/2024 by Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

Introduction

This paper introduces an approach to visual multi-object tracking with re-identification and occlusion handling. The key idea is to use Labeled Random Finite Sets (LRFS) to represent and reason about multiple objects in a scene. LRFS provide a principled way to handle challenges such as object occlusion, appearance changes, and a varying number of objects over time.

Related Work

The paper situates its work in the context of several related research areas, including:

Track Initialization and Re-identification for 3D Multi-View Tracking: This work explores techniques for initializing and maintaining object tracks in 3D multi-camera setups.
Offline Tracking and Object Permanence: Research on leveraging object permanence to improve tracking performance, especially in occluded scenarios.
Robust 3D Multi-Object Tracking: Techniques for robust 3D multi-object tracking, which is a key challenge addressed by the current paper.
Language-Driven Resamplable Continuous Representation: The use of continuous representations, like those provided by LRFS, to reason about object states and their dynamics.
Training-Free Spatial-Aware Sparse Tracking: Approaches for efficient and effective multi-object tracking that do not require extensive training.

Plain English Explanation

The paper proposes a new way to track multiple objects in visual scenes, such as footage from surveillance cameras or from cameras on autonomous vehicles. The key idea is to use a mathematical framework called Labeled Random Finite Sets (LRFS) to represent the objects and their locations.

LRFS allow the system to handle challenges like objects becoming temporarily hidden (occluded), changes in how the objects look over time, and the number of objects in the scene changing. This is important for real-world applications where these issues are common.

The LRFS-based approach outperforms existing methods on standard benchmarks, showing its effectiveness at multi-object tracking in complex scenarios. By using this principled mathematical representation, the system can reason about the objects and their relationships more effectively than previous techniques.

Technical Explanation

The paper introduces an LRFS-based framework for visual multi-object tracking. LRFS provide a way to represent a variable number of objects, each with its own label and state (e.g., position, velocity, appearance), and to reason about their dynamics and interactions.
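
As a rough illustration of the representation described above, a labeled multi-object state can be sketched as a mapping from distinct labels to per-object states. The names `ObjectState`, `with_birth`, and `with_death` are hypothetical, and the (birth frame, index) label convention is an assumption borrowed from common practice in the LRFS literature rather than something stated in this summary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed label convention: (frame in which the object was born, index among that frame's births).
Label = Tuple[int, int]


@dataclass(frozen=True)
class ObjectState:
    position: Tuple[float, float]
    velocity: Tuple[float, float]


# A labeled multi-object state: a finite set of states with distinct labels,
# modeled here as a dict keyed by label.
LabeledSet = Dict[Label, ObjectState]


def with_birth(scene: LabeledSet, label: Label, state: ObjectState) -> LabeledSet:
    """Return a copy of the scene with a new object; labels must stay distinct."""
    if label in scene:
        raise ValueError(f"label {label} already in use")
    updated = dict(scene)
    updated[label] = state
    return updated


def with_death(scene: LabeledSet, label: Label) -> LabeledSet:
    """Return a copy of the scene without the object carrying this label."""
    return {l: s for l, s in scene.items() if l != label}


# The number of objects changes over time while each label stays unique.
scene: LabeledSet = {}
scene = with_birth(scene, (1, 0), ObjectState((10.0, 20.0), (1.0, 0.0)))
scene = with_birth(scene, (1, 1), ObjectState((50.0, 5.0), (0.0, -1.0)))
scene = with_death(scene, (1, 0))
```

Keying by label is what lets a tracker report trajectories: the sequence of states that share one label across frames forms one object's track.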

The tracking system operates in two main stages: 1) object detection and 2) multi-object tracking. In the detection stage, the system uses a deep neural network to identify the objects and their states in each frame. The tracking stage then uses the LRFS representation to maintain consistent object identities over time, handle occlusions, and update the object states as the scene evolves.
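
To make the idea of keeping identities consistent across a detection gap more concrete, the sketch below matches a new detection's appearance feature against tracks that have gone undetected. It is a simple greedy cosine-similarity matcher chosen for illustration, not the Bayesian recursion used in the paper; the function names, the feature vectors, and the 0.8 threshold are all assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Label = Tuple[int, int]  # hypothetical (birth frame, index) label


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reidentify(
    detection_feature: Sequence[float],
    dormant_tracks: Dict[Label, Sequence[float]],
    threshold: float = 0.8,  # assumed value, would need tuning in practice
) -> Optional[Label]:
    """Return the label of the most similar undetected track, or None to start a new track."""
    best_label: Optional[Label] = None
    best_score = threshold
    for label, feature in dormant_tracks.items():
        score = cosine_similarity(detection_feature, feature)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Two tracks that have been undetected for a while, with stored appearance features.
dormant = {(3, 0): [1.0, 0.0], (5, 1): [0.0, 1.0]}
match = reidentify([0.9, 0.1], dormant)      # close to the first stored feature
no_match = reidentify([1.0, 1.0], dormant)   # ambiguous, falls below the threshold
```

When no stored feature is similar enough, the function returns `None`, which a caller would treat as a newly appearing object rather than a reappearing one.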

The key technical contributions include:

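Formulating visual multi-object tracking as LRFS filtering, in which a single Bayesian recursion covers appearance, disappearance, reappearance, and occlusion
A modeling method that exploits object features to re-identify reappearing objects while keeping complexity linear in the number of detections
A fuzzy detection model that accounts for the overlapping areas between tracks and their sizes to improve occlusion handling
A fast version of the filter that further reduces computation time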
Extensive experiments on challenging benchmarks demonstrating state-of-the-art performance
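
For orientation, the generic multi-object Bayes recursion that LRFS filters approximate is usually written in two steps, prediction and update. The form below is the standard textbook statement, not the paper's specific equations, and the symbols are notation introduced here: X_k is the labeled multi-object state at time k, Z_k the set of detections, f the multi-object transition density, g the multi-object likelihood, and the integrals are set integrals.

```latex
% Prediction: propagate the previous multi-object density through the dynamics
\boldsymbol{\pi}_{k|k-1}(X_k)
  = \int f_{k|k-1}(X_k \mid X_{k-1}) \,
         \boldsymbol{\pi}_{k-1}(X_{k-1}) \, \delta X_{k-1}

% Update: condition the predicted density on the detections Z_k
\boldsymbol{\pi}_{k}(X_k \mid Z_k)
  = \frac{g_k(Z_k \mid X_k) \, \boldsymbol{\pi}_{k|k-1}(X_k)}
         {\int g_k(Z_k \mid X) \, \boldsymbol{\pi}_{k|k-1}(X) \, \delta X}
```

Object births and deaths enter through f, while missed detections, false alarms, and occlusion effects enter through g.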

Critical Analysis

The paper presents a compelling approach to the challenging problem of visual multi-object tracking. The use of LRFS is a principled and powerful way to reason about the variable number of objects and their complex dynamics, including occlusions and appearance changes.

However, some potential limitations of the LRFS-based approach remain open. For example, although the authors report linear complexity in the number of detections and a faster variant of the filter, the cost of LRFS estimation may still limit scalability to large scenes with many objects. Additionally, the reliance on deep neural networks for object detection could make the system vulnerable to adversarial attacks or domain shift.

Further research could explore ways to improve the efficiency and robustness of the LRFS-based tracking, perhaps by integrating it with other complementary techniques like training-free sparse tracking. Additionally, evaluating the approach on a broader range of real-world scenarios, such as crowded urban environments or complex industrial settings, could provide valuable insights into its practical limitations and potential areas for improvement.

Conclusion

This paper presents a novel LRFS-based framework for visual multi-object tracking that can effectively handle challenges like occlusions, appearance changes, and varying object counts. By formulating the tracking problem in this principled mathematical way, the system demonstrates state-of-the-art performance on standard benchmarks.

The LRFS-based approach offers a promising direction for advancing the field of multi-object tracking, with potential applications in areas like autonomous vehicles, surveillance, and robotics. Further research to address the identified limitations and expand the approach's real-world applicability could lead to significant advancements in our ability to reliably perceive and reason about complex dynamic environments.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. However, in practice, existing numerical approximations cause reappearing objects to be initialized as new tracks, especially after long periods of being undetected. In occlusion handling, the filter's efficacy is dictated by trade-offs between the sophistication of the occlusion model and computational demand. Our contribution is a novel modeling method that exploits object features to address reappearing objects whilst maintaining a linear complexity in the number of detections. Moreover, to improve the filter's occlusion handling, we propose a fuzzy detection model that takes into consideration the overlapping areas between tracks and their sizes. We also develop a fast version of the filter to further reduce the computational time. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.

9/4/2024
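
The abstract above mentions a detection model that takes into account the overlapping areas between tracks and their sizes. The sketch below shows one simple way such an idea could be expressed: a track's detection probability drops as a larger fraction of its bounding box is covered by another box. This is a hypothetical illustration, not the paper's fuzzy detection model; the function names, the linear mapping, and the `p_visible` and `p_floor` values are assumptions, and depth ordering is ignored.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes (0.0 if disjoint)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def detection_probability(
    track_box: Box,
    other_boxes: Iterable[Box],
    p_visible: float = 0.95,  # assumed probability for a fully visible object
    p_floor: float = 0.05,    # assumed probability for a fully covered object
) -> float:
    """Interpolate between p_visible and p_floor by the largest covered fraction."""
    area = box_area(track_box)
    if area <= 0.0:
        return p_floor
    # Dividing by the track's own area makes small objects more sensitive to overlap.
    covered = max(
        (intersection_area(track_box, other) / area for other in other_boxes),
        default=0.0,
    )
    return p_floor + (p_visible - p_floor) * (1.0 - covered)


unoccluded = detection_probability((0, 0, 10, 10), [])
half_covered = detection_probability((0, 0, 10, 10), [(5, 0, 15, 10)])
fully_covered = detection_probability((0, 0, 10, 10), [(-1, -1, 11, 11)])
```

A lower detection probability tells a filter that a missed detection is expected for that track, so the track is less likely to be terminated during an occlusion.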

Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking

Linh Van Ma, Tran Thien Dat Nguyen, Ba-Ngu Vo, Hyunsung Jang, Moongu Jeon

We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras, which automatically initiates/terminates tracks as well as resolves track appearance-reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured but only the camera matrices of reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that utilizes all these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade-off some of these functionalities for speed. To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and subsequently reduces the number of terms. Specifically, we exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density to realize the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects on the camera planes realizes the occlusion handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements and robustness when camera configurations change on-the-fly, compared to existing multi-view MOT solutions. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.

5/30/2024

FACT: Feature Adaptive Continual-learning Tracker for Multiple Object Tracking

Rongzihan Song, Zhenyu Weng, Huiping Zhuang, Jinchang Ren, Yongming Chen, Zhiping Lin

Multiple object tracking (MOT) involves identifying multiple targets and assigning them corresponding IDs within a video sequence, where occlusions are often encountered. Recent methods address occlusions using appearance cues through online learning techniques to improve adaptivity or offline learning techniques to utilize temporal information from videos. However, most existing online learning-based MOT methods are unable to learn from all past tracking information to improve adaptivity on long-term occlusions while maintaining real-time tracking speed. On the other hand, temporal information-based offline learning methods maintain a long-term memory to store past tracking information, but this approach restricts them to use only local past information during tracking. To address these challenges, we propose a new MOT framework called the Feature Adaptive Continual-learning Tracker (FACT), which enables real-time tracking and feature learning for targets by utilizing all past tracking information. We demonstrate that the framework can be integrated with various state-of-the-art feature-based trackers, thereby improving their tracking ability. Specifically, we develop the feature adaptive continual-learning (FAC) module, a neural network that can be trained online to learn features adaptively using all past tracking information during tracking. Moreover, we also introduce a two-stage association module specifically designed for the proposed continual learning-based tracking. Extensive experiment results demonstrate that the proposed method achieves state-of-the-art online tracking performance on MOT17 and MOT20 benchmarks. The code will be released upon acceptance.

9/14/2024

Offline Tracking with Object Permanence

Xianzhong Liu, Holger Caesar

To reduce the expensive labor cost for manual labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion scenarios in the datasets are common yet underexplored in offline auto labeling. In this work, we propose an offline tracking model that focuses on occluded object tracks. It leverages the concept of object permanence which means objects continue to exist even if they are not observed anymore. The model contains three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes the fragmented tracks. The Re-ID module and the track completion module use the vectorized map as one of the inputs to refine the tracking results with occlusion. The model can effectively recover the occluded object trajectories. It achieves state-of-the-art performance in 3D multi-object tracking by significantly improving the original online tracking result, showing its potential to be applied in offline auto labeling as a useful plugin to improve tracking by recovering occlusions.

5/7/2024