Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision






Published 4/11/2024 by Matías Mattamala, Jonas Frey, Piotr Libera, Nived Chebrolu, Georg Martius, Cesar Cadena, Marco Hutter, Maurice Fallon



Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we present Wild Visual Navigation (WVN), an online self-supervised learning system for visual traversability estimation. The system is able to continuously adapt from a short human demonstration in the field, only using onboard sensing and computing. One of the key ideas to achieve this is the use of high-dimensional features from pre-trained self-supervised models, which implicitly encode semantic information that massively simplifies the learning task. Further, the development of an online scheme for supervision generation enables concurrent training and inference of the learned model in the wild. We demonstrate our approach through diverse real-world deployments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex, previously unseen outdoor terrains. Code: https://bit.ly/498b0CV - Project page: https://bit.ly/3M6nMHH



Overview

  • This paper presents "Wild Visual Navigation (WVN)", an online self-supervised learning system for visual traversability estimation in natural environments like forests and grasslands.
  • The key idea is to use high-dimensional features from pre-trained self-supervised models, which can implicitly encode semantic information to simplify the learning task.
  • The system can continuously adapt from a short human demonstration in the field, using only onboard sensing and computing.
  • An online scheme for supervision generation enables concurrent training and inference of the learned model in the wild.
  • The system can bootstrap traversable terrain segmentation in less than 5 minutes of in-field training, enabling robots to navigate complex, previously unseen outdoor terrains.

Plain English Explanation

Navigating robots through natural environments like forests and grasslands can be challenging because the robots can mistakenly perceive things like tall grass, twigs, or bushes as solid obstacles. To address this, the researchers developed a system called "Wild Visual Navigation (WVN)" that can help robots better understand what is traversable terrain in these complex outdoor settings.

The key innovation is the use of high-dimensional features from machine learning models pre-trained on large amounts of general data. These features implicitly capture semantic information about the environment, which makes it much easier for the robot to learn what is traversable and what is not. The system adapts quickly to a new environment from a short human demonstration, and the robot then continues to refine its understanding as it moves through the environment.

This allows the robot to bootstrap its understanding of the terrain in less than 5 minutes of in-field training, and then navigate through complex, previously unseen outdoor areas that would be difficult for a traditional navigation system to handle. The researchers demonstrate the system in real-world deployments in forests, parks, and grasslands, showing that it can help robots navigate natural environments safely and effectively.

Technical Explanation

The core of the WVN system is a deep neural network that is trained to estimate the traversability of the terrain from visual inputs. To simplify this learning task, the system leverages high-dimensional visual features extracted from pre-trained self-supervised models. These features implicitly encode semantic information about the environment, which makes it much easier for the neural network to learn what terrain is navigable.
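The idea of learning on top of frozen pre-trained features can be illustrated with a minimal sketch. It is not the WVN implementation: the `TraversabilityHead` class, the logistic-regression model, and the `fake_features` helper (two noisy clusters standing in for real pre-trained embeddings) are all assumptions made for illustration. The point is only that when the features already separate terrain types, a very small model trained for a few steps is enough.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TraversabilityHead:
    """Hypothetical small head: logistic regression over frozen feature vectors."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, feat):
        # Traversability score in (0, 1) for one feature vector.
        return sigmoid(sum(w * x for w, x in zip(self.w, feat)) + self.b)

    def train_step(self, feats, labels):
        # One gradient-descent step on the mean binary cross-entropy.
        n = len(feats)
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = self.predict(f) - y
            for i, x in enumerate(f):
                grad_w[i] += err * x / n
            grad_b += err / n
        self.w = [w - self.lr * g for w, g in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b


def fake_features(rng, traversable, dim=8):
    # Assumption: stand-in for pre-trained embeddings, drawn from two noisy clusters.
    center = 1.0 if traversable else -1.0
    return [center + rng.gauss(0.0, 0.5) for _ in range(dim)]


rng = random.Random(0)
feats = [fake_features(rng, i % 2 == 0) for i in range(40)]
labels = [1.0 if i % 2 == 0 else 0.0 for i in range(40)]

head = TraversabilityHead(dim=8)
for _ in range(50):
    head.train_step(feats, labels)

# Scores for two unseen samples, one from each cluster.
grass_score = head.predict(fake_features(rng, True))
rock_score = head.predict(fake_features(rng, False))
```

In a real system the feature vectors would come from a pre-trained vision model rather than from `fake_features`, and the head could be a small neural network instead of a linear model.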

The system is designed to adapt continuously from a short human demonstration in the field. An online scheme for supervision generation lets the robot train the traversability model and run inference with it concurrently as it moves through the environment. This allows the robot to quickly bootstrap its understanding of the terrain, requiring less than 5 minutes of in-field training time.
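Concurrent training and inference can be sketched as a single loop that, for every incoming frame, first predicts with the current weights, then stores new supervision samples, then takes one training step. This is a simplified sketch under stated assumptions, not the paper's pipeline: the `run_online` function, the replay buffer, the linear model, and the `make_frame` simulator (which labels a grass-like segment 1.0 and a rock-like segment 0.0) are all hypothetical.

```python
import math
import random
from collections import deque


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, feat):
    return sigmoid(sum(w * x for w, x in zip(weights, feat)))


def sgd_step(weights, batch, lr=0.3):
    # One gradient step on the mean binary cross-entropy over the batch.
    grads = [0.0] * len(weights)
    for feat, label in batch:
        err = predict(weights, feat) - label
        for i, x in enumerate(feat):
            grads[i] += err * x / len(batch)
    return [w - lr * g for w, g in zip(weights, grads)]


def run_online(frames, dim, batch_size=8, buffer_size=64, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)  # bounded memory of supervision samples
    weights = [0.0] * dim
    history = []
    for feats, supervision in frames:
        # 1) Inference with the current weights.
        history.append([predict(weights, f) for f in feats])
        # 2) Supervision generation (assumed): segment index -> score in [0, 1].
        for idx, score in supervision.items():
            buffer.append((feats[idx], score))
        # 3) One training step on a random batch from the buffer.
        if buffer:
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            weights = sgd_step(weights, batch)
    return weights, history


def make_frame(rng, dim=6):
    # Assumption: each frame has one grass-like and one rock-like segment.
    grass = [1.0 + rng.gauss(0.0, 0.3) for _ in range(dim)]
    rock = [-1.0 + rng.gauss(0.0, 0.3) for _ in range(dim)]
    return [grass, rock], {0: 1.0, 1: 0.0}


rng = random.Random(1)
frames = [make_frame(rng) for _ in range(30)]
weights, history = run_online(frames, dim=6)
first, last = history[0], history[-1]
```

Here `first` holds the untrained predictions for the first frame and `last` the predictions for the final frame, so comparing them shows how the estimates change while the robot keeps operating. A bounded buffer is one simple way to keep memory use fixed on onboard hardware.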

The researchers evaluated WVN through diverse real-world deployments in forests, parks, and grasslands. They show that the system can enable robots to effectively navigate complex, previously unseen outdoor terrains, overcoming the challenges posed by false perceptions of rigid obstacles from tall vegetation. This represents an important step forward for making robots more capable of autonomous navigation in natural environments.

Critical Analysis

The paper presents a compelling approach to addressing a key challenge in robotic navigation through natural environments. By leveraging pre-trained visual features, the system is able to significantly simplify the learning task, allowing for rapid adaptation to new settings.

However, the paper does not provide a detailed analysis of the system's performance, such as quantitative metrics or comparisons to existing methods. The evaluation is primarily qualitative, showing the system's ability to navigate through various outdoor environments. More rigorous experimentation and benchmarking would help to better understand the system's strengths, weaknesses, and limitations.

Additionally, the paper does not address potential issues around robustness and generalization. It's unclear how well the system would perform in more extreme or unseen environments, or how it would handle changes in weather, lighting, or other environmental factors. Exploring these aspects would be valuable for understanding the practical deployment capabilities of the WVN system.

Overall, the paper presents an interesting and potentially impactful approach to visual traversability estimation, but additional research and evaluation would be needed to fully assess its capabilities and limitations. Readers are encouraged to think critically about the research and form their own opinions, as the field of robotics and navigation continues to evolve rapidly.


Conclusion

The "Wild Visual Navigation (WVN)" system presented in this paper offers a promising approach to enabling robots to navigate effectively through complex natural environments like forests and grasslands. By leveraging high-dimensional visual features from pre-trained self-supervised models, the system can quickly adapt to new settings and learn to accurately estimate terrain traversability, overcoming the challenges posed by false perceptions of rigid obstacles.

The real-world deployments demonstrated in the paper suggest that this technology could have significant practical applications, particularly for tasks that require robots to operate autonomously in unstructured outdoor environments. As the field of robotic navigation continues to evolve, innovations like WVN will be crucial for expanding the capabilities of robots and enabling them to interact with and navigate through the natural world in more seamless and effective ways.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Pascal Roth, Julian Nubert, Fan Yang, Mayank Mittal, Marco Hutter





Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling a highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.




ForestTrav: Accurate, Efficient and Deployable Forest Traversability Estimation for Autonomous Ground Vehicles

Fabio Ruetz, Nicholas Lawrance, Emili Hernández, Paulo Borges, Thierry Peynot





Autonomous navigation in unstructured vegetated environments remains an open challenge. To successfully operate in these settings, ground vehicles must assess the traversability of the environment and determine which vegetation is pliable enough to push through. In this work, we propose a novel method that combines a high-fidelity and feature-rich 3D voxel representation while leveraging the structural context and sparseness of SCNN's to assess Traversability Estimation (TE) in densely vegetated environments. The proposed method is thoroughly evaluated on an accurately-labeled real-world data set that we provide to the community. It is shown to outperform state-of-the-art methods by a significant margin (0.59 vs. 0.39 MCC score at 0.1m voxel resolution) in challenging scenes and to generalize to unseen environments. In addition, the method is economical in the amount of training data and training time required: a model is trained in minutes on a desktop computer. We show that by exploiting the context of the environment, our method can use different feature combinations with only limited performance variations. For example, our approach can be used with lidar-only features, whilst still assessing complex vegetated environments accurately, which was not demonstrated previously in the literature in such environments. In addition, we propose an approach to assess a traversability estimator's sensitivity to information quality and show our method's sensitivity is low.



Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring

Matthew Gadd, Daniele De Martini, Luke Pitt, Wayne Tubby, Matthew Towlson, Chris Prahacs, Oliver Bartlett, John Jackson, Man Qi, Paul Newman, Andrew Hector, Roberto Salguero-Gómez, Nick Hawes





We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of that control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings.



Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang





Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VLN models trained with panoramic observation, perform better but are difficult to deploy on most monocular robots. For this case, we propose a sim-to-real transfer approach to endow the monocular robots with panoramic traversability perception and panoramic semantic understanding, thus smoothly transferring the high-performance panoramic VLN models to the common monocular robots. In this work, the semantic traversable map is proposed to predict agent-centric navigable waypoints, and the novel view representations of these navigable waypoints are predicted through the 3D feature fields. These methods broaden the limited field of view of the monocular robots and significantly improve navigation performance in the real world. Our VLN system outperforms previous SOTA monocular VLN methods in R2R-CE and RxR-CE benchmarks within the simulation environments and is also validated in real-world environments, providing a practical and high-performance solution for real-world VLN.

