Towards Effective Multi-Moving-Camera Tracking: A New Dataset and Lightweight Link Model

2312.11035

Published 4/24/2024 by Yanting Zhang, Shuanghong Wang, Qingxiang Wang, Cairong Yan, Rui Fan

Abstract

Ensuring driving safety for autonomous vehicles has become increasingly crucial, highlighting the need for systematic tracking of on-road pedestrians. Most vehicles are equipped with visual sensors, however, the large-scale visual data has not been well studied yet. Multi-target multi-camera (MTMC) tracking systems are composed of two modules: single-camera tracking (SCT) and inter-camera tracking (ICT). To reliably coordinate between them, MTMC tracking has been a very complicated task, while tracking across multiple moving cameras makes it even more challenging. In this paper, we focus on multi-target multi-moving-camera (MTMMC) tracking, which is attracting increasing attention from the research community. Observing there are few datasets for MTMMC tracking, we collect a new dataset, called Multi-Moving-Camera Track (MMCT), which contains sequences under various driving scenarios. To address the common problems of identity switch easily faced by most existing SCT trackers, especially for moving cameras due to ego-motion between the camera and targets, a lightweight appearance-free global link model, called Linker, is proposed to mitigate the identity switch by associating two disjoint tracklets of the same target into a complete trajectory within the same camera. Incorporated with Linker, existing SCT trackers generally obtain a significant improvement. Moreover, to alleviate the impact of the image style variations caused by different cameras, a color transfer module is effectively incorporated to extract cross-camera consistent appearance features for pedestrian association across moving cameras for ICT, resulting in a much improved MTMMC tracking system, which can constitute a step further towards coordinated mining of multiple moving cameras. The project page is available at https://dhu-mmct.github.io/.

Overview

This paper presents a new dataset and a lightweight global link model for multi-target multi-moving-camera (MTMMC) pedestrian tracking.
The dataset, called Multi-Moving-Camera Track (MMCT), contains sequences recorded under various driving scenarios and addresses the shortage of datasets for tracking across moving cameras.
The global link model, called Linker, is appearance-free: it reduces identity switches by joining two disjoint tracklets of the same target into one complete trajectory within a single camera. A separate color transfer module supports pedestrian association across cameras.

Plain English Explanation

The paper tackles the problem of tracking multiple pedestrians across multiple moving cameras, such as cameras mounted on vehicles. This is harder than tracking with fixed cameras because both the cameras and the pedestrians move, and that relative motion makes trackers more likely to lose a person or confuse one person with another.

To address this, the researchers collected a new dataset called Multi-Moving-Camera Track (MMCT), which contains sequences recorded from multiple moving cameras under various driving scenarios. They built it because few datasets exist for this setting.

The researchers also developed a lightweight model called Linker. Single-camera trackers often break one person's path into several short fragments (tracklets) and assign them different identities, a failure known as an identity switch. Linker decides whether two disjoint tracklets in the same camera belong to the same person and, if so, joins them into one trajectory. It does this without using appearance features. To match people across different cameras, the system additionally uses a color transfer module that reduces the differences in image style between cameras.

By combining the new dataset and the global link model, the researchers aim to advance the state of the art in multi-target multi-moving camera pedestrian tracking, which has important applications in areas like surveillance, traffic monitoring, and autonomous vehicles.

Technical Explanation

The paper introduces Multi-Moving-Camera Track (MMCT), a dataset for multi-target multi-moving-camera (MTMMC) pedestrian tracking. It contains sequences captured by multiple moving cameras under various driving scenarios, and it is motivated by the scarcity of datasets for this task.

The tracking system follows the usual multi-target multi-camera (MTMC) structure of two modules: single-camera tracking (SCT) and inter-camera tracking (ICT). On the SCT side, existing trackers suffer from identity switches, especially with moving cameras, because of ego-motion between the camera and the targets. The researchers propose Linker, a lightweight, appearance-free global link model that associates two disjoint tracklets of the same target into a complete trajectory within the same camera. According to the abstract, existing SCT trackers generally improve significantly when combined with Linker.
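The idea of linking disjoint tracklets without appearance features can be illustrated with a small sketch. This is not the paper's Linker, which is a learned model whose details are not given in the abstract. As a stand-in, the sketch assumes a hand-written rule: extrapolate the earlier tracklet at constant velocity and link it to a later tracklet that starts close to the predicted position. The names `Tracklet`, `link_cost`, `link_tracklets`, and the thresholds `max_gap` and `max_cost` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    track_id: int
    frames: list[int]                    # ascending frame indices
    centers: list[tuple[float, float]]   # (x, y) box center for each frame


def link_cost(a: Tracklet, b: Tracklet, max_gap: int = 30) -> float:
    """Hypothetical motion-only cost for linking the end of `a` to the start of `b`."""
    gap = b.frames[0] - a.frames[-1]
    if gap <= 0 or gap > max_gap:
        return float("inf")  # b must start after a ends, within the allowed gap
    if len(a.centers) >= 2:
        # Constant-velocity estimate from the last two observations of a.
        dt = a.frames[-1] - a.frames[-2]
        vx = (a.centers[-1][0] - a.centers[-2][0]) / dt
        vy = (a.centers[-1][1] - a.centers[-2][1]) / dt
    else:
        vx = vy = 0.0
    px = a.centers[-1][0] + vx * gap
    py = a.centers[-1][1] + vy * gap
    bx, by = b.centers[0]
    return ((px - bx) ** 2 + (py - by) ** 2) ** 0.5


def link_tracklets(tracklets: list[Tracklet], max_gap: int = 30,
                   max_cost: float = 50.0) -> dict[int, int]:
    """Greedily link tracklets; returns a map from each track id to its merged id."""
    candidates = []
    for a in tracklets:
        for b in tracklets:
            if a.track_id != b.track_id:
                cost = link_cost(a, b, max_gap)
                if cost <= max_cost:
                    candidates.append((cost, a.track_id, b.track_id))
    candidates.sort()  # cheapest links first

    parent = {t.track_id: t.track_id for t in tracklets}

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    has_successor, has_predecessor = set(), set()
    for _, a_id, b_id in candidates:
        # Each tracklet gets at most one successor and one predecessor.
        if a_id in has_successor or b_id in has_predecessor:
            continue
        has_successor.add(a_id)
        has_predecessor.add(b_id)
        parent[find(b_id)] = find(a_id)
    return {tid: find(tid) for tid in parent}


if __name__ == "__main__":
    t1 = Tracklet(1, [0, 1, 2], [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)])
    t2 = Tracklet(2, [5, 6], [(50.0, 0.0), (60.0, 0.0)])
    t3 = Tracklet(3, [0, 1], [(500.0, 500.0), (501.0, 500.0)])
    print(link_tracklets([t1, t2, t3]))
```

In the example, tracklet 2 starts where tracklet 1 would be after a three-frame gap, so the two are merged under one identity, while tracklet 3 is too far away to link. A learned link model would replace `link_cost` with a score predicted from the tracklets' motion.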

On the ICT side, the abstract describes a color transfer module that alleviates the image style variations caused by different cameras. With it, the system extracts appearance features that stay consistent across cameras and uses them to associate pedestrians across moving cameras.
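The abstract does not say how the paper's color transfer module works. One common and simple form of color transfer is to match the per-channel mean and standard deviation of one image to those of a reference image, which gives a rough picture of how style differences between cameras can be reduced before appearance features are extracted. The sketch below assumes that form, works directly in RGB on a flat list of pixels for brevity, and uses hypothetical names (`channel_stats`, `transfer_color`). Real pipelines would typically use array libraries and may work in a different color space.

```python
from statistics import fmean, pstdev

Pixel = tuple[float, float, float]  # (R, G, B) values in [0, 255]


def channel_stats(pixels: list[Pixel]) -> list[tuple[float, float]]:
    """Return (mean, standard deviation) for each of the three channels."""
    stats = []
    for c in range(3):
        values = [p[c] for p in pixels]
        stats.append((fmean(values), pstdev(values)))
    return stats


def transfer_color(source: list[Pixel], reference: list[Pixel],
                   eps: float = 1e-6) -> list[Pixel]:
    """Shift and scale each source channel to match the reference statistics."""
    src_stats = channel_stats(source)
    ref_stats = channel_stats(reference)
    result = []
    for pixel in source:
        adjusted = []
        for c in range(3):
            s_mean, s_std = src_stats[c]
            r_mean, r_std = ref_stats[c]
            value = (pixel[c] - s_mean) * (r_std / (s_std + eps)) + r_mean
            adjusted.append(min(255.0, max(0.0, value)))  # keep values in range
        result.append(tuple(adjusted))
    return result


if __name__ == "__main__":
    camera_a = [(10.0, 20.0, 30.0), (30.0, 40.0, 50.0)]
    camera_b = [(100.0, 100.0, 100.0), (120.0, 120.0, 120.0)]
    print(transfer_color(camera_a, camera_b))
```

After the transfer, the pixels from `camera_a` have approximately the same channel means and spreads as those from `camera_b`, so features computed on both are less affected by camera-specific color casts.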

Critical Analysis

The paper addresses multi-target multi-moving-camera pedestrian tracking on three fronts: a new dataset (MMCT), a lightweight link model (Linker) that repairs identity switches within a camera, and a color transfer module for association across cameras. The abstract reports that existing single-camera trackers generally improve when Linker is added.

However, the paper does not discuss the potential limitations or drawbacks of the proposed approach. For example, it is not clear how well the global link model would perform in scenarios with a large number of cameras or in environments with significant occlusions or other challenging conditions. Additionally, the paper does not provide details on the computational complexity or real-time performance of the algorithm, which may be important considerations for practical applications.

Further research could explore the robustness and scalability of the global link model, as well as investigate ways to optimize its performance for real-world deployment. Additionally, a more thorough analysis of the dataset and its diversity of scenarios could help identify areas for improvement or additional research directions.

Conclusion

The paper presents an approach to multi-target multi-moving-camera pedestrian tracking built on a new dataset, MMCT, and a lightweight appearance-free link model, Linker, complemented by a color transfer module for cross-camera association. The authors describe the resulting system as a step towards coordinated mining of multiple moving cameras.

While the paper does not address the potential limitations of the proposed approach, the research represents an important step forward in addressing the challenges of multi-camera pedestrian tracking. The insights and techniques developed in this work could have important implications for a wide range of applications, from surveillance and traffic monitoring to autonomous vehicles and robotics.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model

Yuqiang Lin, Sam Lockyer, Adrian Evans, Markus Zarbock, Nic Zhang

Multi-Target Multi-Camera Tracking (MTMCT) has broad applications and forms the basis for numerous future city-wide systems (e.g. traffic management, crash detection, etc.). However, the challenge of matching vehicle trajectories across different cameras based solely on feature extraction poses significant difficulties. This article introduces an innovative multi-camera vehicle tracking system that utilizes a self-supervised camera link model. In contrast to related works that rely on manual spatial-temporal annotations, our model automatically extracts crucial multi-camera relationships for vehicle matching. The camera link is established through a pre-matching process that evaluates feature similarities, pair numbers, and time variance for high-quality tracks. This process calculates the probability of spatial linkage for all camera combinations, selecting the highest scoring pairs to create camera links. Our approach significantly improves deployment times by eliminating the need for human annotation, offering substantial improvements in efficiency and cost-effectiveness when it comes to real-world application. This pairing process supports cross camera matching by setting spatial-temporal constraints, reducing the searching space for potential vehicle matches. According to our experimental results, the proposed method achieves a new state-of-the-art among automatic camera-link based methods in CityFlow V2 benchmarks with 61.07% IDF1 Score.

5/21/2024

cs.CV cs.AI

MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting, which limits their ability to model real-world dynamics and generalize to diverse camera configurations. To address this issue, we present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments - campus and factory - across various time, weather, and season conditions. This dataset provides a challenging test-bed for studying multi-camera tracking under diverse real-world complexities and includes an additional input modality of spatially aligned and temporally synchronized RGB and thermal cameras, which enhances the accuracy of multi-camera tracking. MTMMC is a super-set of existing datasets, benefiting independent fields such as person detection, re-identification, and multiple object tracking. We provide baselines and new learning setups on this dataset and set the reference scores for future studies. The datasets, models, and test server will be made publicly available.

4/1/2024

cs.CV

MotionMaster: Training-free Camera Motion Transfer For Video Generation

Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computation resources due to the large amount of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving objects region based on the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions together, enabling our model a more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.

5/2/2024

cs.CV

MAML MOT: Multiple Object Tracking based on Meta-Learning

Jiayi Chen, Chunhua Deng

With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.

5/28/2024

cs.CV cs.AI