TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection

Read original: arXiv:2305.16580 - Published 8/28/2024 by Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen


Overview

Pedestrian detection is crucial for computer vision and traffic safety
Existing methods relying solely on RGB images struggle in low-light conditions
Recent multispectral approaches combining thermal images perform better
However, these methods don't address the issue of false positives from noisy fused feature maps

Plain English Explanation

Pedestrian detection is an important task in computer vision, as it helps ensure the safety of people on roads. Existing methods that only use regular color (RGB) images often have trouble detecting pedestrians in low-light conditions, because these images don't have enough useful information in the dark.

To address this, researchers have started using multispectral approaches that combine RGB images with thermal images. Thermal images can provide additional information that helps identify pedestrians even when it's dark. This has led to better performance in detecting pedestrians.

However, one issue these multispectral methods haven't focused on is the problem of false positives. False positives happen when the system incorrectly identifies something as a pedestrian when it's not. The noisy way the features from the RGB and thermal images are combined can contribute to these false positives.
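To make the idea of a false positive concrete, here is a minimal sketch of how detections are commonly scored against ground-truth boxes. It is not taken from the paper: the function names `iou` and `count_detections`, the greedy matching rule, and the 0.5 overlap threshold are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detections(preds, truths, iou_thresh=0.5):
    """Greedily match predictions to ground truth.

    preds  : list of (score, box) pairs (hypothetical detector output)
    truths : list of ground-truth pedestrian boxes
    Returns (true_positives, false_positives, missed_pedestrians).
    """
    unmatched = list(truths)
    tp = fp = 0
    # Visit the most confident predictions first.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_thresh:
            unmatched.remove(best)  # each pedestrian can be matched once
            tp += 1
        else:
            fp += 1  # a detection with no matching pedestrian
    return tp, fp, len(unmatched)


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0.9, (0, 0, 10, 10)),    # overlaps the first pedestrian
    (0.8, (50, 50, 60, 60)),  # fires on background
    (0.6, (21, 21, 30, 30)),  # overlaps the second pedestrian
]
tp, fp, missed = count_detections(preds, truths)
print(f"true positives={tp}, false positives={fp}, missed={missed}")
```

In this toy example the second prediction lands on background, so it is counted as a false positive even though both pedestrians are found.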

Technical Explanation

The paper proposes a novel target-aware fusion strategy for multispectral pedestrian detection, called TFDet. TFDet aims to address the problem of false positives in multispectral pedestrian detection.

The key innovation in TFDet is its approach to fusing the features from the RGB and thermal images. Instead of simply combining the two sets of features, which can produce noisy fused feature maps, TFDet uses a target-aware fusion strategy that enhances the contrast between the features corresponding to pedestrians and those of the background. This helps reduce the false positives caused by the noisy fused feature maps.
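The sketch below illustrates the general idea of raising feature contrast between target and background regions. It is not the authors' implementation: the additive fusion, the function names (`fuse_add`, `target_aware_reweight`, `contrast_ratio`), and the hand-made target probability map are all assumptions for illustration. In a real detector the target map would have to be estimated by the model rather than supplied by hand.

```python
import numpy as np


def fuse_add(rgb_feat, thermal_feat):
    """Naive fusion (assumed baseline): element-wise sum of two (C, H, W) maps."""
    return rgb_feat + thermal_feat


def target_aware_reweight(fused, target_prob):
    """Scale each spatial location by an (H, W) target probability in [0, 1].

    Locations likely to contain a pedestrian keep most of their response,
    while background locations are suppressed.
    """
    return fused * target_prob[None, :, :]


def contrast_ratio(feat, target_mask):
    """Mean absolute response inside the target region divided by the
    mean absolute response outside it."""
    energy = np.abs(feat).mean(axis=0)  # (H, W)
    return energy[target_mask].mean() / energy[~target_mask].mean()


rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(8, 16, 16))
thermal_feat = rng.normal(size=(8, 16, 16))

# Hypothetical pedestrian region; a real system would not know this in advance.
target_mask = np.zeros((16, 16), dtype=bool)
target_mask[4:12, 6:10] = True
target_prob = np.where(target_mask, 0.9, 0.1)

fused = fuse_add(rgb_feat, thermal_feat)
enhanced = target_aware_reweight(fused, target_prob)

before = contrast_ratio(fused, target_mask)
after = contrast_ratio(enhanced, target_mask)
print(f"contrast before: {before:.3f}, after: {after:.3f}")
```

Because the target region is scaled by 0.9 and the background by 0.1, the ratio grows by a factor of nine in this toy setup. The point is only that suppressing background responses leaves fewer strong activations for the detection head to mistake for pedestrians.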

TFDet achieves state-of-the-art performance on two benchmark datasets for multispectral pedestrian detection: KAIST and LLVIP. It also outperforms previous approaches on two multispectral object detection benchmarks, FLIR and M3FD.

Importantly, TFDet maintains comparable inference efficiency to previous methods while providing significantly better detection performance, especially in low-light conditions. This is a key advancement for ensuring road safety.

Critical Analysis

The paper provides a thorough analysis of the impact of false positives on multispectral pedestrian detection performance and proposes a novel solution to address this issue. The target-aware fusion strategy seems to be an effective way to enhance the contrast between pedestrians and the background, leading to a reduction in false positives.

However, the paper doesn't delve into the potential limitations of the proposed approach. For example, it's unclear how well TFDet would perform in more complex or crowded scenes, or how it might handle occlusions or other challenging scenarios. Additionally, while the paper reports inference efficiency comparable to previous methods, it's less clear how much of the overall computational cost comes from the target-aware fusion process itself.

Further research could explore the generalization of the TFDet approach to other multispectral object detection tasks, as well as investigate ways to optimize the fusion process for even better efficiency without sacrificing performance.

Conclusion

This paper presents a significant advancement in multispectral pedestrian detection by addressing the issue of false positives. The proposed TFDet approach, with its target-aware fusion strategy, achieves state-of-the-art performance on benchmark datasets while maintaining comparable inference efficiency. This is a crucial step forward in ensuring the safety of pedestrians on roads, especially in low-light conditions. The insights gained from this research could inspire further developments in multispectral object detection and contribute to the broader goal of improved computer vision for real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection

Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen

Pedestrian detection plays a critical role in computer vision as it contributes to ensuring traffic safety. Existing methods that rely solely on RGB images suffer from performance degradation under low-light conditions due to the lack of useful information. To address this issue, recent multispectral detection approaches have combined thermal images to provide complementary information and have obtained enhanced performances. Nevertheless, few approaches focus on the negative effects of false positives caused by noisy fused feature maps. Different from them, we comprehensively analyze the impacts of false positives on the detection performance and find that enhancing feature contrast can significantly reduce these false positives. In this paper, we propose a novel target-aware fusion strategy for multispectral pedestrian detection, named TFDet. TFDet achieves state-of-the-art performance on two multispectral pedestrian benchmarks, KAIST and LLVIP. TFDet can easily extend to multi-class object detection scenarios. It outperforms the previous best approaches on two multispectral object detection benchmarks, FLIR and M3FD. Importantly, TFDet has comparable inference efficiency to the previous approaches, and has remarkably good detection performance even under low-light conditions, which is a significant advancement for ensuring road safety.

8/28/2024

MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in certain cases (e.g., thermal-obscured pedestrians), particularly due to the modality bias learned from statistically biased datasets. In this paper, we investigate how to mitigate modality bias in multispectral pedestrian detection using Large Language Models (LLMs). Accordingly, we design a Multispectral Chain-of-Thought (MSCoT) prompting strategy, which prompts the LLM to perform multispectral pedestrian detection. Moreover, we propose a novel Multispectral Chain-of-Thought Detection (MSCoTDet) framework that integrates MSCoT prompting into multispectral pedestrian detection. To this end, we design a Language-driven Multi-modal Fusion (LMF) strategy that enables fusing the outputs of MSCoT prompting with the detection results of vision-based multispectral pedestrian detection models. Extensive experiments validate that MSCoTDet effectively mitigates modality biases and improves multispectral pedestrian detection.

5/30/2024

Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework

Eliraz Orfaig, Inna Stainvas, Igal Bilik

Vision-based autonomous driving requires reliable and efficient object detection. This work proposes a DiffusionDet-based framework that exploits data fusion from the monocular camera and depth sensor to provide the RGB and depth (RGB-D) data. Within this framework, ground truth bounding boxes are randomly reshaped as part of the training phase, allowing the model to learn the reverse diffusion process of noise addition. The system methodically enhances a randomly generated set of boxes at the inference stage, guiding them toward accurate final detections. By integrating the textural and color features from RGB images with the spatial depth information from the LiDAR sensors, the proposed framework employs a feature fusion that substantially enhances object detection of automotive targets. The 2.3 AP gain in detecting automotive targets is achieved through comprehensive experiments using the KITTI dataset. Specifically, the improved performance of the proposed approach in detecting small objects is demonstrated.

6/6/2024

Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

Xue Zhang, Si-Yuan Cao, Fang Wang, Runmin Zhang, Zhe Wu, Xiaohan Zhang, Xiaokai Bai, Hui-Liang Shen

Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. This conflict is increasingly aggressive, as recent works solely pursue higher performance rather than both performance and efficiency. In this paper, we address this issue by improving the performance of efficient single-branch structures. We revisit the reasons causing the performance gap between these structures. For the first time, we reveal the information interference problem in the naive early-fusion strategy adopted by previous single-branch structures. Besides, we find that the domain gap between multispectral images, and weak feature representation of the single-branch structure are also key obstacles for performance. Focusing on these three problems, we propose corresponding solutions, including a novel shape-priority early-fusion strategy, a weakly supervised learning method, and a core knowledge distillation technique. Experiments demonstrate that single-branch networks equipped with these three contributions achieve significant performance enhancements while retaining high efficiency. Our code will be available at https://github.com/XueZ-phd/Efficient-RGB-T-Early-Fusion-Detection.

5/28/2024