TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos

Read original: arXiv:2404.13868 - Published 4/23/2024 by Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi, Yoshinari Kameda, Keisuke Fujii

Overview

• The paper introduces the TeamTrack dataset, which is designed for multi-sport, multi-object tracking in full-pitch videos.
• The dataset aims to enable research on real-world sports analytics, player tracking, and team strategy analysis.
• The paper describes the data collection process, annotation methodology, and dataset statistics.

Plain English Explanation

The TeamTrack dataset is a new resource for researchers working on tracking multiple players or objects in sports videos. Instead of focusing on a single sport, it covers several team sports, including soccer, basketball, and handball. This matters because it supports the development of more flexible and robust tracking systems that can handle the diverse player movements and interactions found in different sports.

The key innovation of the TeamTrack dataset is that it provides full-pitch videos, meaning the entire playing field is visible. This is a more realistic and challenging scenario compared to typical sports tracking datasets that only show a cropped or zoomed-in view. By having the full context of the playing field, researchers can develop algorithms that can better understand the overall tactics and strategies of the teams.

The dataset also includes detailed annotations, such as the positions and identities of all players throughout the video. This labeling allows researchers to train and evaluate their tracking models to see how well they can follow individual players as the game unfolds. Overall, the TeamTrack dataset aims to advance the state-of-the-art in sports analytics and player tracking by providing a large-scale, multi-sport benchmark.

Technical Explanation

The TeamTrack dataset contains over 1,000 full-pitch videos across several team sports, including soccer, basketball, and handball. Each video is annotated with the 2D positions and identities of all players throughout the sequence.

To collect the dataset, the authors used a multi-camera setup to capture the full playing field from multiple angles. They then developed a semi-automatic annotation pipeline to label the players' positions and IDs. This involved using off-the-shelf object detectors and trackers, followed by manual correction and verification.
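The "detect, track, then correct by hand" idea can be illustrated with a minimal sketch. The summary does not say which detectors or trackers were used, so the code below is not the authors' pipeline: it swaps in a generic greedy IoU matcher, and the names `iou`, `propagate_ids`, and the `min_iou` threshold are hypothetical. It carries player IDs from one frame to the next so that an annotator only has to fix the mistakes.

```python
from typing import Dict, List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_ids(
    prev: Dict[int, Box], detections: List[Box], min_iou: float = 0.3
) -> Dict[int, Box]:
    """Greedily assign each new detection the ID of its best-overlapping track.

    Detections that overlap no existing track well enough get a fresh ID.
    This is a generic stand-in, not the tracker used for TeamTrack.
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        (
            (iou(box, det), track_id, j)
            for track_id, box in prev.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    assigned: Dict[int, Box] = {}
    used_tracks, used_dets = set(), set()
    for score, track_id, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_id in used_tracks or j in used_dets:
            continue
        assigned[track_id] = detections[j]
        used_tracks.add(track_id)
        used_dets.add(j)
    # Unmatched detections start new tracks.
    next_id = max(prev, default=0) + 1
    for j, det in enumerate(detections):
        if j not in used_dets:
            assigned[next_id] = det
            next_id += 1
    return assigned


# Two known players, three detections in the next frame (made-up numbers).
previous = {1: (0.0, 0.0, 10.0, 20.0), 2: (50.0, 0.0, 60.0, 20.0)}
current = [(52.0, 0.0, 62.0, 20.0), (1.0, 0.0, 11.0, 20.0), (100.0, 0.0, 110.0, 20.0)]
labels = propagate_ids(previous, current)
```

In a semi-automatic workflow, output like `labels` would be treated as a draft: a human annotator reviews it and corrects swapped or missing identities, which is where overlap-based matching tends to fail when players cross paths.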

The dataset is split into training, validation, and test sets, ensuring fair evaluation of tracking algorithms. The authors provide baseline results using several popular multi-object tracking methods, demonstrating the challenges of the dataset. For example, the best-performing tracker achieves only around 60% MOTA (a standard metric for multi-object tracking) on the soccer videos, leaving significant room for improvement.
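For readers unfamiliar with the metric, MOTA (Multiple Object Tracking Accuracy) is conventionally defined as 1 minus the ratio of total errors (false negatives, false positives, and identity switches) to the total number of ground-truth objects, summed over all frames. The sketch below computes it from per-frame error counts. The `FrameErrors` class and the numbers are hypothetical and chosen only to illustrate what a score of about 60% means; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameErrors:
    """Per-frame counts after matching predictions to ground truth."""
    num_gt: int            # ground-truth objects in the frame
    false_negatives: int   # ground-truth objects the tracker missed
    false_positives: int   # predicted boxes with no ground-truth match
    id_switches: int       # matched objects whose track ID changed


def mota(frames: Iterable[FrameErrors]) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all sums taken over frames."""
    frames = list(frames)
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    total_errors = sum(
        f.false_negatives + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Made-up counts: 40 ground-truth objects and 16 errors over two frames.
example = [
    FrameErrors(num_gt=20, false_negatives=4, false_positives=3, id_switches=1),
    FrameErrors(num_gt=20, false_negatives=5, false_positives=2, id_switches=1),
]
score = mota(example)
print(f"MOTA = {score:.2f}")  # prints "MOTA = 0.60"
```

Because errors are counted against the number of ground-truth objects, MOTA can go negative when a tracker makes more errors than there are objects to track.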

Critical Analysis

The TeamTrack dataset represents an important step forward in sports analytics research by providing a large-scale, multi-sport benchmark for multi-object tracking. The inclusion of full-pitch videos, rather than just cropped or zoomed-in views, is a key strength as it better reflects the real-world challenges faced by teams and coaches.

However, the dataset also has some limitations. For instance, the annotations are primarily focused on player positions and identities, but lack additional contextual information such as player attributes, team strategies, or game events. Incorporating such data could enable more advanced applications, such as team strategy analysis or depth-aware multi-object tracking.

Additionally, while the dataset covers a diverse range of sports, the number of videos per sport is relatively small compared to the scale of professional leagues and competitions. Expanding the dataset with more videos and annotations could further improve its utility for training and evaluating zero-shot or multi-modal tracking algorithms.

Conclusion

The TeamTrack dataset represents a significant contribution to the field of sports analytics and computer vision. By providing a large-scale, multi-sport benchmark for multi-object tracking in full-pitch videos, the dataset opens up new opportunities for researchers to develop more robust and versatile tracking systems. These advancements could lead to improved player and team performance analysis, better-informed coaching decisions, and enhanced fan engagement with professional sports.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos

Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi, Yoshinari Kameda, Keisuke Fujii

Multi-object tracking (MOT) is a critical and challenging task in computer vision, particularly in situations involving objects with similar appearances but diverse movements, as seen in team sports. Current methods, largely reliant on object detection and appearance, often fail to track targets in such complex scenarios accurately. This limitation is further exacerbated by the lack of comprehensive and diverse datasets covering the full view of sports pitches. Addressing these issues, we introduce TeamTrack, a pioneering benchmark dataset specifically designed for MOT in sports. TeamTrack is an extensive collection of full-pitch video data from various sports, including soccer, basketball, and handball. Furthermore, we perform a comprehensive analysis and benchmarking effort to underscore TeamTrack's utility and potential impact. Our work signifies a crucial step forward, promising to elevate the precision and effectiveness of MOT in complex, dynamic settings such as team sports. The dataset, project code and competition are released at: https://atomscott.github.io/TeamTrack/.

4/23/2024

Beyond MOT: Semantic Multi-Object Tracking

Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang

Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos. Yet, knowing merely "where" is insufficient in many crucial applications. In comparison, semantic understanding such as fine-grained behaviors, interactions, and overall summarized captions (i.e., "what") from videos, associated with "where", is highly-desired for comprehensive video analysis. Thus motivated, we introduce Semantic Multi-Object Tracking (SMOT), that aims to estimate object trajectories and meanwhile understand semantic details of associated trajectories including instance captions, instance interactions, and overall video captions, integrating "where" and "what" for tracking. In order to foster the exploration of SMOT, we propose BenSMOT, a large-scale Benchmark for Semantic MOT. Specifically, BenSMOT comprises 3,292 videos with 151K frames, covering various scenarios for semantic tracking of humans. BenSMOT provides annotations for the trajectories of targets, along with associated instance captions in natural language, instance interactions, and overall caption for each video sequence. To our best knowledge, BenSMOT is the first publicly available benchmark for SMOT. Besides, to encourage future research, we present a novel tracker named SMOTer, which is specially designed and end-to-end trained for SMOT, showing promising performance. By releasing BenSMOT, we expect to go beyond conventional MOT by predicting "where" and "what" for SMOT, opening up a new direction in tracking for video understanding. We will release BenSMOT and SMOTer at https://github.com/Nathan-Li123/SMOTer.

7/30/2024

ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking

Xudong Han, Nobuyuki Oishi, Yueying Tian, Elif Ucurum, Rupert Young, Chris Chatwin, Philip Birch

Many Multi-Object Tracking (MOT) approaches exploit motion information to associate all the detected objects across frames. However, many methods that rely on filtering-based algorithms, such as the Kalman Filter, often work well in linear motion scenarios but struggle to accurately predict the locations of objects undergoing complex and non-linear movements. To tackle these scenarios, we propose a motion-based MOT approach with an enhanced temporal motion predictor, ETTrack. Specifically, the motion predictor integrates a transformer model and a Temporal Convolutional Network (TCN) to capture short-term and long-term motion patterns, and it predicts the future motion of individual objects based on the historical motion information. Additionally, we propose a novel Momentum Correction Loss function that provides additional information regarding the motion direction of objects during training. This allows the motion predictor to rapidly adapt to motion variations and more accurately predict future motion. Our experimental results demonstrate that ETTrack achieves a competitive performance compared with state-of-the-art trackers on DanceTrack and SportsMOT, scoring 56.4% and 74.4% in HOTA metrics, respectively.

5/27/2024

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Sijia Chen, En Yu, Jinyang Li, Wenbing Tao

Multiple Object Tracking (MOT) is a critical area within computer vision, with a broad spectrum of practical implementations. Current research has primarily focused on the development of tracking algorithms and enhancement of post-processing techniques. Yet, there has been a lack of thorough examination concerning the nature of tracking data itself. In this study, we pioneer an exploration into the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets. We note a significant imbalance in the distribution of trajectory lengths across different pedestrians, a phenomenon we refer to as "pedestrians trajectory long-tail distribution". Addressing this challenge, we introduce a bespoke strategy designed to mitigate the effects of this skewed distribution. Specifically, we propose two data augmentation strategies, including Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA), designed for viewpoint states and the Group Softmax (GS) module for Re-ID. SVA is to backtrack and predict the pedestrian trajectory of tail classes, and DVA is to use a diffusion model to change the background of the scene. GS divides the pedestrians into unrelated groups and performs softmax operation on each group individually. Our proposed strategies can be integrated into numerous existing tracking systems, and extensive experimentation validates the efficacy of our method in reducing the influence of long-tail distribution on multi-object tracking performance. The code is available at https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT.

5/27/2024