Optimizing traffic signs and lights visibility for the teleoperation of autonomous vehicles through ROI compression

2404.02481

Published 4/4/2024 by I. Dror, O. Hadar

Abstract

Autonomous vehicles are a promising solution to traffic congestion, air pollution, accidents, and wasted time and resources. However, remote driver intervention may be necessary in extreme situations to ensure safe roadside parking or complete remote takeover. In such cases, high-quality real-time video streaming is crucial for practical remote driving. In a preliminary study, we presented a region of interest (ROI) HEVC compression scheme in which the image was segmented into two categories, ROI and background. Allocating more bandwidth to the ROI improved the visibility of the classes that are essential for driving, while the background was transmitted at lower quality. However, migrating bandwidth to the large ROI portion of the image doesn't substantially improve the quality of traffic signs and lights. This work categorizes each region as background, weak ROI, or strong ROI. The simulation-based approach uses a photo-realistic driving scenario database created with the Cognata self-driving car simulation platform. We use semantic segmentation to categorize the compression quality of a Coding Tree Unit (CTU) according to each pixel class. A background CTU can contain only sky, trees, vegetation, or building classes. Classes essential for remote driving include roads, road marks, cars, and pedestrians and, most importantly, traffic signs and traffic lights. We apply thresholds to decide if the number of pixels in a CTU of a particular category is enough to declare it as belonging to the strong or weak ROI. Then, we allocate the bandwidth according to the CTU categories. Our results show that the perceptual quality of traffic signs, especially textual signs, and of traffic lights improves significantly, by up to 5.5 dB, compared to the two-category background and foreground partition, while the weak ROI classes at least retain their original quality.

Overview

Autonomous vehicles offer solutions to common issues like traffic congestion, pollution, accidents, and wasted time and resources.
Remote driver intervention may be necessary for extreme situations to ensure safe roadside parking or complete remote takeover.
High-quality real-time video streaming is crucial for practical remote driving in these cases.

Plain English Explanation

Self-driving cars have a lot of potential benefits: they could help reduce traffic jams, air pollution, crashes, and time wasted commuting. However, there may be rare situations where a human driver needs to take control remotely. For this to work well, the remote driver needs a high-quality live video feed from the car.

Previous research has shown that compressing the video feed using a technique that gives more bandwidth to the most important parts of the image (like the road, other vehicles, and pedestrians) can improve the visibility of these crucial elements for driving. But this approach didn't do enough to make sure traffic signs and lights were clearly visible.

This new study looks at an improved way to compress the video feed, dividing the image into background, "weak" important areas, and "strong" important areas. The strong areas, which include traffic signs and signals, get the most bandwidth to ensure they stay high-quality. This helps the remote driver clearly see these vital pieces of information.

Technical Explanation

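The researchers tested their compression approach on a photo-realistic driving scenario database created with the Cognata self-driving car simulation platform. Using semantic segmentation, they assigned each HEVC Coding Tree Unit (CTU) to one of three categories: background (sky, trees, vegetation, and buildings), weak ROI (roads, road marks, cars, and pedestrians), or strong ROI (traffic signs and traffic lights). A CTU is declared weak or strong ROI only when the number of its pixels in that category passes a threshold.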
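The thresholding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class names, the threshold values, and the rule that strong ROI takes precedence over weak ROI are all hypothetical, since the summary gives none of them. The 64x64 CTU size is the largest that HEVC permits.

```python
from collections import Counter

# Hypothetical class names; the paper's actual label set may differ.
STRONG_CLASSES = {"traffic_sign", "traffic_light"}
WEAK_CLASSES = {"road", "road_mark", "car", "pedestrian"}

CTU_SIZE = 64  # HEVC allows CTUs of up to 64x64 luma samples


def classify_ctu(labels, strong_thresh=0.02, weak_thresh=0.10):
    """Classify one CTU from the class names of its pixels.

    The thresholds are fractions of the CTU's pixels and are
    illustrative assumptions, not values taken from the paper.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return "background"
    strong = sum(counts[c] for c in STRONG_CLASSES)
    weak = sum(counts[c] for c in WEAK_CLASSES)
    # Assumed precedence: check strong ROI first, so a few sign
    # pixels are enough to promote the whole CTU.
    if strong / total >= strong_thresh:
        return "strong_roi"
    if weak / total >= weak_thresh:
        return "weak_roi"
    return "background"


def classify_frame(label_map, ctu_size=CTU_SIZE):
    """Return a grid of CTU categories for a per-pixel label map.

    label_map is a list of rows, each row a list of class names.
    CTUs on the right and bottom edges may be partial.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    grid = []
    for top in range(0, height, ctu_size):
        row = []
        for left in range(0, width, ctu_size):
            pixels = [
                label_map[y][x]
                for y in range(top, min(top + ctu_size, height))
                for x in range(left, min(left + ctu_size, width))
            ]
            row.append(classify_ctu(pixels))
        grid.append(row)
    return grid
```

Setting the strong threshold well below the weak one reflects the observation that signs and lights occupy few pixels. Whether the authors chose their thresholds this way is not stated in the summary.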

They then allocated bandwidth according to these CTU categories, giving the most to the strong ROI. Compared with the earlier two-category (ROI and background) partition, the perceptual quality of traffic signs, especially textual signs, and of traffic lights improved by up to 5.5 dB, while the weak ROI classes at least retained their original quality.
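One common way to steer bandwidth inside an HEVC frame is to vary the quantization parameter (QP) per block. A lower QP means finer quantization and more bits, and the quantizer step size doubles for every increase of 6 in QP. The sketch below assumes that mechanism. The summary does not say how the authors actually allocate bandwidth, and the base QP and offsets are hypothetical values chosen only to show the idea.

```python
# Hypothetical values; the paper's actual rate allocation is not
# described in this summary.
BASE_QP = 32
QP_OFFSETS = {
    "strong_roi": -8,   # finest quantization for signs and lights
    "weak_roi": -2,     # slightly better than the base quality
    "background": 6,    # coarser quantization to free up bits
}

QP_MIN, QP_MAX = 0, 51  # valid QP range for 8-bit HEVC


def clamp_qp(qp):
    """Keep a QP inside the range HEVC accepts for 8-bit video."""
    return max(QP_MIN, min(QP_MAX, qp))


def qp_map(category_grid, base_qp=BASE_QP, offsets=QP_OFFSETS):
    """Turn a grid of CTU categories into a grid of per-CTU QPs."""
    return [
        [clamp_qp(base_qp + offsets[category]) for category in row]
        for row in category_grid
    ]
```

For example, `qp_map([["background", "weak_roi", "strong_roi"]])` returns `[[38, 30, 24]]`. In practice the offsets would have to be tuned so that the total bitrate stays within the available channel.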

Critical Analysis

The paper provides a promising approach to improving remote driving capabilities for autonomous vehicles, addressing a key limitation of previous compression methods. However, the research is still at a simulation-based stage, so further real-world testing would be necessary to validate the findings.

Additionally, the paper does not delve into potential privacy or security concerns that may arise from transmitting high-quality video feeds from autonomous vehicles to remote operators. These are important considerations that should be explored in future research.

Conclusion

This study presents an enhanced video compression technique for autonomous vehicle remote driving that prioritizes the visibility of critical elements like traffic signs and signals. By intelligently allocating bandwidth, it can significantly improve the remote driver's ability to safely intervene in extreme situations. While still in the simulation stage, this work represents an important step forward in enabling the safe and practical deployment of autonomous vehicles.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Real-time Lane-wise Traffic Monitoring in Optimal ROIs

Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automated solution is desired. This paper introduces a novel system that learns the locations of highway lanes and traffic directions from these camera feeds automatically. It collects real-time, lane-specific traffic data continuously, even adjusting for changes in camera angle or zoom. This facilitates efficient traffic analysis, decision-making, and improved highway safety.

4/24/2024

cs.CV eess.IV

Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

Jixiang Luo, Yan Wang, Hongwei Qin

Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially at below $0.2bpp$. Besides, deformation on human faces and text is unacceptable for visual quality assessment, and the problem becomes more prominent on small faces and text. To solve this problem, we combine the advantage of MSE-based models and generative models by utilizing region of interest (ROI). We propose Hierarchical-ROI (H-ROI), to split images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization by non-linear mapping within the channel dimension to constrain the bit rate while maintaining the visual quality. Exhaustive experiments demonstrate that our methods achieve better visual quality on small faces and text with lower bit rates, e.g., $0.7X$ bits of HiFiC and $0.5X$ bits of BPG.

5/24/2024

eess.IV cs.CV

Audio-Visual Traffic Light State Detection for Urban Robots

Sagar Gupta, Akansel Cosgun

We present a multimodal traffic light state detection using vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of the visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes, identified by established vision-based detectors. The fusion method aggregates features across multiple frames in a given timeframe, increasing robustness and adaptability. Results show that our approach effectively addresses the challenge of visual occlusion and surpasses the performance of single-modality solutions when the robot is in motion. This study serves as a proof of concept, highlighting the significant, yet often overlooked, potential of multi-modal perception in robotics.

5/1/2024

cs.RO

MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and use multi-task heads to delineate road centerlines, boundary lines, pedestrian crossings, and other areas. However, these algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded. Therefore, in this competition, we not only used multi-perspective images as input but also incorporated SD maps to address this issue. We employed map encoder pre-training to enhance the network's geometric encoding capabilities and utilized YOLOX to improve traffic element detection precision. Additionally, for area detection, we innovatively introduced LDTR and auxiliary tasks to achieve higher precision. As a result, our final OLUS score is 0.58.

6/17/2024

cs.CV