QUEST: Query Stream for Practical Cooperative Perception

Read original: arXiv:2308.01804 - Published 5/24/2024 by Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

Overview

This paper proposes a novel concept called "query cooperation" to enable flexible and interpretable feature interaction between agents in a cooperative perception system.
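The authors develop a framework called QUEST in which query streams flow between agents: cross-agent queries are fused for instances that both agents perceive (co-aware instances) and added as complements for instances that an agent does not perceive on its own (individually unaware instances).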
The paper evaluates QUEST using a real-world dataset for camera-based vehicle-infrastructure perception, demonstrating its effectiveness and advantages in terms of transmission flexibility and robustness to packet dropout.

Plain English Explanation

In cooperative perception systems, multiple agents (e.g., vehicles or infrastructure sensors) work together to enhance their individual perception capabilities. Existing cooperation paradigms are either interpretable (where agents share their final results) or flexible (where agents share their intermediate features).

The authors of this paper introduce a new concept called query cooperation, which aims to strike a balance between interpretability and flexibility. The key idea is that agents can share their queries with each other, rather than just their final results or intermediate features.

To implement this concept, the researchers developed a framework called QUEST. In QUEST, agents can pass their queries to other agents, who can then use these queries to enhance their own perception. This process involves two main steps:

Fusion for co-aware instances: When two agents both detect the same instance, their queries for that instance are fused, which refines the estimate of that instance.
Complementation for unaware instances: When an agent has no query for an instance (for example, because the instance is occluded or out of range), a query received from another agent fills the gap, so the agent perceives a more complete scene.

The authors tested QUEST using a real-world dataset for camera-based vehicle-infrastructure perception, a common application of cooperative perception. The results show that QUEST is effective and has advantages in terms of transmission flexibility and robustness to packet loss, which are important considerations for practical deployment.

Technical Explanation

The paper proposes a query cooperation paradigm for cooperative perception. Agents exchange instance-level queries, which keeps the interaction interpretable (each query corresponds to one object instance) while retaining the flexibility of feature-level interaction. The authors develop a framework called QUEST to implement this concept.

In QUEST, the query streams from different agents are allowed to flow and interact with each other. This interaction happens in two ways:

Co-aware Instances: For instances that are perceived by multiple agents, the cross-agent queries are fused to enhance the representation of these instances.
Individually Unaware Instances: For instances that an agent does not perceive on its own, queries received from other agents complement its local queries, providing a more complete scene understanding.
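
The two interaction modes above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Query` class, the `interact` function, the distance-based matching, the `match_radius` parameter, and the element-wise mean used for fusion are all assumptions chosen for illustration, and the sketch assumes the received queries have already been expressed in the local agent's coordinate frame. QUEST itself uses learned modules for these steps.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Query:
    """Hypothetical instance-level query: a 3D reference point plus a feature vector."""
    ref_point: tuple[float, float, float]  # assumed to be in a shared coordinate frame
    feature: list[float]


def interact(local: list[Query], received: list[Query], match_radius: float = 1.0) -> list[Query]:
    """Fuse received queries into matching local ones; append the unmatched ones."""
    # Copy the local queries so the caller's list is left unchanged.
    merged = [Query(q.ref_point, list(q.feature)) for q in local]
    n_local = len(merged)
    for r in received:
        # Nearest local query by reference-point distance (stand-in for a learned matcher).
        best = min(
            range(n_local),
            key=lambda i: dist(merged[i].ref_point, r.ref_point),
            default=None,
        )
        if best is not None and dist(merged[best].ref_point, r.ref_point) <= match_radius:
            # Co-aware instance: fuse the features (element-wise mean as a placeholder).
            merged[best].feature = [(a + b) / 2 for a, b in zip(merged[best].feature, r.feature)]
        else:
            # Individually unaware instance: complement the local set with the received query.
            merged.append(r)
    return merged


vehicle = [Query((10.0, 2.0, 0.0), [1.0, 0.0])]
infra = [
    Query((10.3, 2.1, 0.0), [0.0, 1.0]),   # close to the vehicle's query, so it is fused
    Query((40.0, -5.0, 0.0), [0.5, 0.5]),  # no local counterpart, so it is appended
]
out = interact(vehicle, infra)
print(len(out))        # 2
print(out[0].feature)  # [0.5, 0.5]
```

Here the first infrastructure query lands within `match_radius` of the vehicle's query and is fused, while the second has no local counterpart and is added to the vehicle's query set.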

The authors evaluate QUEST on the real-world DAIR-V2X-Seq dataset for camera-based vehicle-infrastructure perception. The results demonstrate the effectiveness of QUEST and reveal the advantages of the query cooperation paradigm in terms of transmission flexibility and robustness to packet dropout, both of which are critical for the practical deployment of cooperative perception systems in autonomous driving applications.
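
To make "transmission flexibility" and "robustness to packet dropout" concrete, here is a small sketch of why instance-level queries lend themselves to both. Everything in it is hypothetical: the dictionary layout, the confidence-based `select_for_transmission` rule, the `budget` parameter, and the independent per-query drop model in `simulate_dropout` are illustrative assumptions, not the selection rule or evaluation protocol used in the paper.

```python
import random


def select_for_transmission(queries, budget):
    """Keep the `budget` highest-confidence queries (hypothetical selection rule)."""
    return sorted(queries, key=lambda q: q["score"], reverse=True)[:budget]


def simulate_dropout(queries, drop_rate, rng):
    """Drop each query independently with probability `drop_rate` (assumed loss model)."""
    return [q for q in queries if rng.random() >= drop_rate]


# Five infrastructure-side queries with made-up confidence scores.
infra_queries = [{"id": i, "score": s} for i, s in enumerate([0.9, 0.2, 0.75, 0.4, 0.6])]

# Flexibility: the sender can shrink the payload to fit a bandwidth budget.
sent = select_for_transmission(infra_queries, budget=3)
print([q["id"] for q in sent])  # [0, 2, 4]

# Robustness: whatever subset survives the channel is still a valid set of queries.
received = simulate_dropout(sent, drop_rate=0.3, rng=random.Random(0))
print(len(received), "of", len(sent), "queries arrived")
```

Because each query stands for one instance, losing some of them removes only the corresponding cooperative cues; the receiver can still fuse or complement with whichever queries arrive.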

Critical Analysis

The paper presents a promising approach to cooperative perception that addresses the trade-off between interpretability and flexibility. The query cooperation concept and the QUEST framework are clearly motivated, and the reported results on DAIR-V2X-Seq support the claims about transmission flexibility and robustness to packet dropout.

However, the paper could have provided more details on the specific mechanisms of query interaction and the fusion/complementation processes. Additionally, the researchers could have explored the scalability of the QUEST framework as the number of agents increases, as well as the computational overhead and latency implications.

Furthermore, the paper does not discuss the potential security and privacy implications of sharing queries between agents, which could be an important consideration for real-world deployments. The authors could have also addressed the robustness of QUEST to malicious or faulty agents in the cooperative perception system.

Despite these potential areas for improvement, the overall contribution of this paper is significant, as it introduces a new paradigm for cooperative perception that could pave the way for more advanced and practical applications, particularly in the field of autonomous driving.

Conclusion

This paper presents a novel query cooperation concept and a corresponding QUEST framework for cooperative perception systems. The key idea is to enable flexible and interpretable feature interaction between agents by allowing them to share their queries, rather than just their final results or intermediate features.

The experimental results on a real-world dataset for camera-based vehicle-infrastructure perception demonstrate the effectiveness of QUEST and its advantages in terms of transmission flexibility and robustness to packet dropout. This work represents an important step forward in the field of cooperative perception, with the potential to enhance the performance and practical deployment of autonomous driving systems and other applications that rely on collaborative sensing.

Related Papers

QUEST: Query Stream for Practical Cooperative Perception

Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

Cooperative perception can effectively enhance individual perception performance by providing additional viewpoint and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable instance-level flexible feature interaction. To specifically explain the concept, we propose a cooperative perception framework, termed QUEST, which let query stream flow among agents. The cross-agent queries are interacted via fusion for co-aware instances and complementation for individual unaware instances. Taking camera-based vehicle-infrastructure perception as a typical practical application scene, the experimental results on the real-world dataset, DAIR-V2X-Seq, demonstrate the effectiveness of QUEST and further reveal the advantage of the query cooperation paradigm on transmission flexibility and robustness to packet dropout. We hope our work can further facilitate the cross-agent representation interaction for better cooperative perception in practice.

5/24/2024

Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

Jiaru Zhong, Haibao Yu, Tianyi Zhu, Jiahui Xu, Wenxian Yang, Zaiqing Nie, Chao Sun

Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating both infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. However, cooperative perception still faces numerous challenges, including limited communication bandwidth and practical communication interruptions. In this paper, we propose CTCE, a novel framework for cooperative 3D object detection. This framework transmits queries with temporal contexts enhancement, effectively balancing transmission efficiency and performance to accommodate real-world communication conditions. Additionally, we propose a temporal-guided fusion module to further improve performance. The roadside temporal enhancement and vehicle-side spatial-temporal fusion together constitute a multi-level temporal contexts integration mechanism, fully leveraging temporal information to enhance performance. Furthermore, a motion-aware reconstruction module is introduced to recover lost roadside queries due to communication interruptions. Experimental results on V2X-Seq and V2X-Sim datasets demonstrate that CTCE outperforms the baseline QUEST, achieving improvements of 3.8% and 1.3% in mAP, respectively. Experiments under communication interruption conditions validate CTCE's robustness to communication interruptions.

8/21/2024

Situation-aware Autonomous Driving Decision Making with Cooperative Perception on Demand

Wei Liu

This paper investigates the impact of cooperative perception on autonomous driving decision making on urban roads. The extended perception range contributed by the cooperative perception can be properly leveraged to address the implicit dependencies within the vehicles, thereby the vehicle decision making performance can be improved. Meanwhile, we acknowledge the inherent limitation of wireless communication and propose a Cooperative Perception on Demand (CPoD) strategy, where the cooperative perception will only be activated when the extended perception range is necessary for proper situation-awareness. The situation-aware decision making with CPoD is modeled as a Partially Observable Markov Decision Process (POMDP) and solved in an online manner. The evaluation results demonstrate that the proposed approach can function safely and efficiently for autonomous driving on urban roads.

9/4/2024

Semantic Communication for Cooperative Perception using HARQ

Yucheng Sheng, Le Liang, Hao Ye, Shi Jin, Geoffrey Ye Li

Cooperative perception, offering a wider field of view than standalone perception, is becoming increasingly crucial in autonomous driving. This perception is enabled through vehicle-to-vehicle (V2V) communication, allowing connected automated vehicles (CAVs) to exchange sensor data, such as light detection and ranging (LiDAR) point clouds, thereby enhancing the collective understanding of the environment. In this paper, we leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework that employs intermediate fusion. To counter the challenges posed by time-varying multipath fading, our approach incorporates the use of orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies. Furthermore, recognizing the necessity for reliable transmission, especially in the low SNR scenarios, we introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeated request (HARQ). Simulation results show that our model surpasses the traditional separate source-channel coding methods in perception performance, both with and without HARQ. Additionally, in terms of throughput, our proposed HARQ schemes demonstrate superior efficiency to the conventional coding approaches.

9/17/2024