Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation

Read original: arXiv:2406.07202 - Published 6/12/2024 by Diwei Sheng, Giles Hamilton-Fletcher, Mahya Beheshti, Chen Feng, John-Ross Rizzo

Overview

This research paper investigates the efficacy of curb segmentation using state-of-the-art foundation models.
Curbs are critical for safe navigation, especially for people with blindness or low vision, but current models struggle to accurately identify them.
The researchers introduce a new large-scale dataset to benchmark leading foundation models and propose solutions to improve curb segmentation performance.

Plain English Explanation

Curbs are the raised edges along sidewalks that help separate pedestrian and vehicle traffic. They are an important feature for safe navigation, especially for people with blindness or low vision, as they provide a physical boundary. However, accurately identifying curbs can be challenging, particularly for artificial intelligence (AI) systems.

The researchers in this paper looked at how well the latest AI foundation models, which are powerful general-purpose models, can perform the task of curb segmentation. Foundation models are trained on large amounts of data and can be adapted to various tasks, but the researchers found they still struggle with curb segmentation.

The researchers created a new, large dataset of curb images to test the performance of these foundation models. They found that the models had high false-positive rates, meaning they often incorrectly identified non-curb areas as curbs. The models also had trouble distinguishing curbs from similar-looking structures like sidewalks.

Additionally, the best-performing model took around 3.7 seconds to process each image, which is too slow to provide real-time assistance for navigation. The researchers proposed solutions such as filtered bounding box selection to make curb segmentation more accurate.
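To get a feel for why 3.7 seconds per image is too slow, a rough back-of-the-envelope calculation helps. The sketch below assumes a walking speed of 1.4 m/s, which is a commonly quoted figure and is not taken from the paper. The function name is illustrative only.

```python
def distance_per_inference(latency_s: float, walking_speed_m_s: float = 1.4) -> float:
    """Distance in metres a pedestrian covers while one image is being processed.

    The default walking speed is an assumption, not a value from the paper.
    """
    return latency_s * walking_speed_m_s


latency_s = 3.70  # average inference time reported for the best-performing model

print(f"Updates per second: {1.0 / latency_s:.2f}")
print(f"Distance walked per update: {distance_per_inference(latency_s):.2f} m")
```

Under that assumption, a person would walk roughly five metres between two consecutive curb updates, which is far more than the distance at which a curb warning is useful.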

Overall, the research highlights the need for specialized datasets and tailored model training to address the unique challenges of navigation for people with blindness or low vision. While foundation models are flexible, they still have limitations when applied to specific, safety-critical tasks like identifying curbs.

Technical Explanation

The researchers investigated the performance of state-of-the-art foundation models on the task of curb segmentation. Curbs are critical spatial features that delineate safe pedestrian zones from potentially hazardous vehicle traffic, especially for people with blindness or low vision.

To benchmark the models, the researchers introduced the largest curb segmentation dataset to date. They evaluated leading foundation models, including Segformer, Swin Transformer, and HRNet, on this new dataset. The results showed that these models face significant challenges in accurately identifying curbs, with high false-positive rates (up to 95%) and poor performance distinguishing curbs from similar-looking structures like sidewalks.
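The summary does not say how the false-positive rate was computed, so the following is only a sketch of one plausible reading. It compares a predicted binary curb mask with a ground-truth mask pixel by pixel and reports the share of predicted curb pixels that are wrong, alongside the conventional false-positive rate and the intersection over union (IoU). The function name and the toy masks are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = curb pixel, 0 = background


def mask_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-level error metrics for a binary segmentation mask."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        # Share of predicted curb pixels that are not curb.
        "false_discovery_rate": ratio(fp, tp + fp),
        # Share of true background pixels labelled as curb.
        "false_positive_rate": ratio(fp, fp + tn),
        # Overlap between prediction and ground truth.
        "iou": ratio(tp, tp + fp + fn),
    }


# Toy example: the model labels both the curb strip (row 2) and the
# neighbouring sidewalk strip (row 3) as curb.
truth = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

metrics = mask_metrics(pred, truth)
print(metrics)
```

In this toy case half of the predicted curb pixels are actually sidewalk, which is the kind of confusion the paper reports, although at a much milder level than the 95% figure.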

Additionally, the best-performing model took an average of 3.70 seconds to process each image, which is too slow to provide real-time assistance for navigation. To address the accuracy problem, the researchers proposed solutions such as filtered bounding box selection, which yields more accurate curb segmentation.
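The summary does not describe how the bounding boxes are filtered, so the sketch below is not the authors' method. It only illustrates the general idea with hypothetical criteria: keep candidate boxes whose detector confidence is high enough and whose shape is wide and thin, as a curb appears in some street-level camera views, and pass only those boxes on to a segmentation model. All names and thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned candidate box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float  # detector confidence in [0, 1]

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def filter_boxes(
    boxes: List[Box],
    min_score: float = 0.5,         # assumed confidence threshold
    min_aspect_ratio: float = 3.0,  # assumed minimum width / height
) -> List[Box]:
    """Keep confident, wide-and-thin boxes, sorted by descending score."""
    kept = []
    for box in boxes:
        if box.width <= 0 or box.height <= 0:
            continue  # skip degenerate boxes
        if box.score < min_score:
            continue  # skip low-confidence candidates
        if box.width / box.height < min_aspect_ratio:
            continue  # skip boxes that are not strip-shaped
        kept.append(box)
    return sorted(kept, key=lambda b: b.score, reverse=True)


candidates = [
    Box(0, 300, 640, 340, score=0.82),   # wide, thin, confident: kept
    Box(100, 50, 300, 400, score=0.90),  # tall region: dropped
    Box(0, 350, 640, 380, score=0.20),   # low confidence: dropped
]

kept = filter_boxes(candidates)
print(kept)
```

Discarding implausible boxes before segmentation is one way such a filter could lower the number of non-curb regions that end up labelled as curbs.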

Critical Analysis

While the researchers demonstrated the limitations of current foundation models for the specific task of curb segmentation, they acknowledge that these models are still flexible and powerful general-purpose tools. The performance issues highlighted in the paper may be addressable through further fine-tuning, specialized dataset curation, and architectural modifications.

However, the researchers rightly emphasize the critical need for tailored solutions to address the unique challenges of navigation assistance for people with blindness or low vision. The safety-critical nature of this application requires robust and reliable performance, which may not be easily achievable with off-the-shelf foundation models.

The paper also raises broader questions about the suitability of foundation models for specialized, safety-critical tasks. While these models have shown impressive versatility, the research suggests that domain-specific constraints and requirements may necessitate more targeted model development and training approaches.

Conclusion

This research underscores the challenges of using state-of-the-art foundation models for the specific task of curb segmentation, which is crucial for safe navigation, particularly for people with blindness or low vision. The researchers' introduction of a large-scale curb segmentation dataset and their proposed solutions highlight the need for specialized datasets and tailored model training to address the unique requirements of this application.

The findings of this paper have important implications for the development of assistive technologies and the broader application of foundation models in safety-critical domains. They suggest that while these powerful general-purpose models can be flexible and adaptable, certain tasks may require more focused and domain-specific approaches to achieve the necessary performance and reliability.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation

Diwei Sheng, Giles Hamilton-Fletcher, Mahya Beheshti, Chen Feng, John-Ross Rizzo

Curbs serve as vital borders that delineate safe pedestrian zones from potential vehicular traffic hazards. Curbs also represent a primary spatial hazard during dynamic navigation with significant stumbling potential. Such vulnerabilities are particularly exacerbated for persons with blindness and low vision (PBLV). Accurate visual-based discrimination of curbs is paramount for assistive technologies that aid PBLV with safe navigation in urban environments. Herein, we investigate the efficacy of curb segmentation for foundation models. We introduce the largest curb segmentation dataset to date to benchmark leading foundation models. Our results show that state-of-the-art foundation models face significant challenges in curb segmentation. This is due to their high false-positive rates (up to 95%) with poor performance distinguishing curbs from curb-like objects or non-curb areas, such as sidewalks. In addition, the best-performing model averaged a 3.70-second inference time, underscoring problems in providing real-time assistance. In response, we propose solutions including filtered bounding box selections to achieve more accurate curb segmentation. Overall, despite the immediate flexibility of foundation models, their application for practical assistive technology applications still requires refinement. This research highlights the critical need for specialized datasets and tailored model training to address navigation challenges for PBLV and underscores implicit weaknesses in foundation models.

6/12/2024

Multi-faceted Sensory Substitution for Curb Alerting: A Pilot Investigation in Persons with Blindness and Low Vision

Ligao Ruan, Giles Hamilton-Fletcher, Mahya Beheshti, Todd E Hudson, Maurizio Porfiri, JR Rizzo

Curbs, the edges of a raised sidewalk at the point where it meets a street, are crucial in urban environments, where they help delineate safe pedestrian zones from dangerous vehicular lanes. However, curbs themselves are significant navigation hazards, particularly for people who are blind or have low vision (pBLV). The challenges faced by pBLV in detecting and properly orientating themselves for these abrupt elevation changes can lead to falls and serious injuries. Despite recent advancements in assistive technologies, the detection and early warning of curbs remains a largely unsolved challenge. This paper aims to tackle this gap by introducing a novel, multi-faceted sensory substitution approach hosted on a smart wearable; the platform leverages an RGB camera and an embedded system to capture and segment curbs in real time and provide early warning and orientation information. The system utilizes the YOLO (You Only Look Once) v8 segmentation model, trained on our custom curb dataset, for the camera input. The output of the system consists of adaptive auditory beeps, abstract sonification, and speech, conveying information about the relative distance and orientation of curbs. Through human-subjects experimentation, we demonstrate the effectiveness of the system as compared to the white cane. Results show that our system can provide advanced warning through a larger safety window than the cane, while offering nearly identical curb orientation information.

8/29/2024

CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation

Guoyang Zhao, Fulong Ma, Weiqing Qi, Yuxuan Liu, Ming Liu

Curb detection is a crucial function in intelligent driving, essential for determining drivable areas on the road. However, the complexity of road environments makes curb detection challenging. This paper introduces CurbNet, a novel framework for curb detection utilizing point cloud segmentation. To address the lack of comprehensive curb datasets with 3D annotations, we have developed the 3D-Curb dataset based on SemanticKITTI, currently the largest and most diverse collection of curb point clouds. Recognizing that the primary characteristic of curbs is height variation, our approach leverages spatially rich 3D point clouds for training. To tackle the challenges posed by the uneven distribution of curb features on the xy-plane and their dependence on high-frequency features along the z-axis, we introduce the Multi-Scale and Channel Attention (MSCA) module, a customized solution designed to optimize detection performance. Additionally, we propose an adaptive weighted loss function group specifically formulated to counteract the imbalance in the distribution of curb point clouds relative to other categories. Extensive experiments conducted on 2 major datasets demonstrate that our method surpasses existing benchmarks set by leading curb detection and point cloud segmentation models. Through the post-processing refinement of the detection results, we have significantly reduced noise in curb detection, thereby improving precision by 4.5 points. Similarly, our tolerance experiments also achieved state-of-the-art results. Furthermore, real-world experiments and dataset analyses mutually validate each other, reinforcing CurbNet's superior detection capability and robust generalizability. The project website is available at: https://github.com/guoyangzhao/CurbNet/.

5/31/2024

Fine-tuning vision foundation model for crack segmentation in civil infrastructures

Kang Ge, Chen Wang, Yutao Guo, Yansong Tang, Zhenzhong Hu, Hongbing Chen

Large-scale foundation models have become the mainstream deep learning method, while in civil engineering, the scale of AI models is strictly limited. In this work, a vision foundation model is introduced for crack segmentation. Two parameter-efficient fine-tuning methods, adapter and low-rank adaptation, are adopted to fine-tune the foundation model in semantic segmentation: the Segment Anything Model (SAM). The fine-tuned CrackSAM shows excellent performance on different scenes and materials. To test the zero-shot performance of the proposed method, two unique datasets related to road and exterior wall cracks are collected, annotated and open-sourced, for a total of 810 images. Comparative experiments are conducted with twelve mature semantic segmentation models. On datasets with artificial noise and previously unseen datasets, the performance of CrackSAM far exceeds that of all state-of-the-art models. CrackSAM exhibits remarkable superiority, particularly under challenging conditions such as dim lighting, shadows, road markings, construction joints, and other interference factors. These cross-scenario results demonstrate the outstanding zero-shot capability of foundation models and provide new ideas for developing vision models in civil engineering.

4/24/2024