Visuo-Tactile based Predictive Cross Modal Perception for Object Exploration in Robotics

2405.12634

Published 5/24/2024 by Anirvan Dutta, Etienne Burdet, Mohsen Kaboli

Abstract

Autonomously exploring the unknown physical properties of novel objects such as stiffness, mass, center of mass, friction coefficient, and shape is crucial for autonomous robotic systems operating continuously in unstructured environments. We introduce a novel visuo-tactile based predictive cross-modal perception framework where initial visual observations (shape) aid in obtaining an initial prior over the object properties (mass). The initial prior improves the efficiency of the object property estimation, which is autonomously inferred via interactive non-prehensile pushing and using a dual filtering approach. The inferred properties are then used to enhance the predictive capability of the cross-modal function efficiently by using a human-inspired "surprise" formulation. We evaluated our proposed framework in the real-robotic scenario, demonstrating superior performance.

Overview

Robots operating in unstructured environments need to be able to explore and understand the physical properties of novel objects, such as stiffness, mass, center of mass, friction, and shape.
This paper introduces a new framework that uses both visual and tactile information to efficiently estimate these object properties.
The framework uses an initial visual observation to get a prior estimate of the object's mass, which helps improve the efficiency of the subsequent interactive exploration and property estimation.
The inferred properties are then used to enhance the system's ability to predict the object's behavior, inspired by the human concept of "surprise".
The framework is evaluated in a real-world robotic scenario and demonstrates superior performance.

Plain English Explanation

Robots that operate in unpredictable, real-world environments need to be able to understand the physical properties of the objects they encounter, such as how stiff or heavy they are, where their center of mass is, how much friction they have, and their overall shape. This paper describes a new system that helps robots do this more efficiently.

The system uses both visual and touch-based (tactile) information to estimate these object properties. It starts by taking an initial look at the object's shape, which gives it a rough idea of the object's mass. This initial estimate helps the system become more efficient at the next step, which is actively interacting with the object through gentle pushing motions to learn more about its properties.

As the robot pushes on the object, it uses what the paper calls a dual filtering approach to infer the object's actual mass, stiffness, friction, and other characteristics. Importantly, the system also learns from this interaction in a way that's inspired by how humans become "surprised" when they encounter something unexpected: the properties it infers by pushing are fed back to improve the visual prediction. This helps the system get better at predicting how the object will behave in the future.
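To make the "surprise" idea concrete, here is a minimal sketch. This summary does not give the paper's exact formulation, so the sketch assumes one common choice: Bayesian surprise, measured as the KL divergence between the belief after pushing (posterior) and the belief predicted from vision (prior), both treated as Gaussians. The function names, the sample objects, and the threshold are all hypothetical.

```python
import math

def gaussian_kl(post_mean, post_var, prior_mean, prior_var):
    """KL divergence KL(posterior || prior) between two 1-D Gaussians.

    Used here as an assumed stand-in for "surprise": it is zero when the
    belief after pushing matches what vision predicted, and grows as the
    two beliefs disagree.
    """
    return 0.5 * (
        math.log(prior_var / post_var)
        + (post_var + (post_mean - prior_mean) ** 2) / prior_var
        - 1.0
    )

def select_surprising(samples, threshold):
    """Return the names of objects whose surprise exceeds the threshold.

    Each sample is (name, prior_mean, prior_var, post_mean, post_var).
    Only these objects would be used to retrain the visual predictor.
    """
    selected = []
    for name, prior_mean, prior_var, post_mean, post_var in samples:
        surprise = gaussian_kl(post_mean, post_var, prior_mean, prior_var)
        if surprise > threshold:
            selected.append(name)
    return selected

if __name__ == "__main__":
    # Hypothetical mass beliefs in kg: vision guessed 0.5 kg for both objects.
    samples = [
        ("box", 0.5, 0.04, 0.52, 0.01),     # pushing roughly confirms the guess
        ("bottle", 0.5, 0.04, 1.20, 0.01),  # pushing contradicts the guess
    ]
    print(select_surprising(samples, threshold=1.0))
```

Gating updates this way means the predictor is only retrained on interactions that taught it something, which is one plausible reading of why the abstract calls the update efficient.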

The researchers tested this framework in a real-world robotic scenario and found that it performed very well, better than previous approaches. This could be a big help for robots that need to continuously explore and understand their environment to operate effectively.

Technical Explanation

This paper presents a novel visuo-tactile based predictive cross-modal perception framework for autonomously exploring the unknown physical properties of novel objects. The framework uses an initial visual observation of an object's shape to obtain a prior estimate of its mass, which helps improve the efficiency of the subsequent interactive exploration and property estimation.
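The efficiency gain from a visual prior can be illustrated with a simple Bayesian sketch. It is not the paper's cross-modal function: it assumes a Gaussian belief over mass, a hypothetical prior built from object volume and an assumed density range, and independent noisy mass measurements from each push. All names and numbers are illustrative.

```python
import math

def visual_mass_prior(volume_m3, density_mean, density_std):
    """Hypothetical visual prior: mass = volume * density, density uncertain."""
    mean = volume_m3 * density_mean
    var = (volume_m3 * density_std) ** 2
    return mean, var

def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """One conjugate Bayesian update of a Gaussian belief with a Gaussian measurement."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var

def pushes_needed(prior_var, obs_var, target_var):
    """Measurements needed before the posterior variance reaches target_var.

    After n measurements the posterior precision is 1/prior_var + n/obs_var,
    so we solve for the smallest n that meets the target.
    """
    needed = (1.0 / target_var - 1.0 / prior_var) * obs_var
    return max(0, math.ceil(needed))

if __name__ == "__main__":
    # Assumed numbers: a 1-litre object with density 500 +/- 120 kg/m^3.
    mean, var = visual_mass_prior(0.001, 500.0, 120.0)
    obs_var = 0.09      # each push measures mass with std 0.3 kg (assumed)
    target_var = 0.01   # stop when the mass std is 0.1 kg
    print("with visual prior:", pushes_needed(var, obs_var, target_var))
    print("with vague prior: ", pushes_needed(1.0, obs_var, target_var))
```

Under these assumed numbers the informed prior reaches the target uncertainty with fewer pushes than the vague one, which is the sense in which a prior over mass can make interactive estimation more efficient.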

The system uses a dual filtering approach to infer the object's mass, stiffness, center of mass, and friction coefficient through non-prehensile pushing interactions. The inferred properties are then used to enhance the predictive capability of the cross-modal function, using a formulation inspired by the human concept of "surprise".
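As a rough illustration of dual filtering in general, the sketch below runs two coupled scalar Kalman filters on a toy problem: one tracks the object's state (velocity) and the other tracks an unknown parameter (inverse mass). It is not the paper's filter. The 1-D frictionless pushing model, the known constant push force, the noise levels, and every function name are assumptions made for the example.

```python
import random

def simulate_push(mass, force, dt, steps, noise_std, seed=0):
    """Simulate noisy velocity readings of a block pushed with a constant force."""
    rng = random.Random(seed)
    forces = [force] * steps
    true_v = [k * dt * force / mass for k in range(steps + 1)]
    measured_v = [v + rng.gauss(0.0, noise_std) for v in true_v]
    return forces, measured_v

def dual_filter(forces, measured_v, dt, mass_guess, inv_mass_var,
                meas_var=1e-4, state_noise=1e-6, param_noise=1e-8):
    """Jointly estimate velocity (state) and mass (parameter) with two filters.

    Assumed model: v[k+1] = v[k] + dt * force[k] / mass, measured with noise.
    Returns (estimated mass, estimated final velocity).
    """
    v, v_var = measured_v[0], meas_var   # state belief
    inv_m = 1.0 / mass_guess             # parameter belief (inverse mass)
    for f, z in zip(forces, measured_v[1:]):
        h = dt * f  # sensitivity of the next velocity to inverse mass

        # Parameter filter: random-walk prediction, then correction
        # using the velocity prediction error.
        inv_mass_var += param_noise
        innovation = z - (v + h * inv_m)
        s = h * h * inv_mass_var + v_var + state_noise + meas_var
        k_param = inv_mass_var * h / s
        inv_m += k_param * innovation
        inv_mass_var *= (1.0 - k_param * h)

        # State filter: predict with the freshly updated parameter, then correct.
        v_pred = v + h * inv_m
        v_pred_var = v_var + state_noise
        k_state = v_pred_var / (v_pred_var + meas_var)
        v = v_pred + k_state * (z - v_pred)
        v_var = (1.0 - k_state) * v_pred_var
    return 1.0 / inv_m, v

if __name__ == "__main__":
    forces, vs = simulate_push(mass=0.8, force=1.0, dt=0.05, steps=400,
                               noise_std=0.01, seed=1)
    mass, _ = dual_filter(forces, vs, dt=0.05, mass_guess=0.5, inv_mass_var=1.0)
    print(f"estimated mass: {mass:.3f} kg (simulated true mass: 0.8 kg)")
```

Here `mass_guess` and `inv_mass_var` play the role of the initial prior: a guess closer to the truth with a smaller variance starts the parameter filter nearer its answer. Splitting the problem into two small filters keeps each update cheap, at the cost of ignoring the correlation between state and parameter errors.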

The authors evaluated their proposed framework in a real-world robotic scenario, demonstrating superior performance compared to previous approaches. The work is related to other research on active tactile exploration of unknown objects and on integrating visuo-tactile sensing with haptic feedback for teleoperated manipulation (see Related Papers).

Critical Analysis

The paper presents a compelling approach to autonomous exploration and property estimation of novel objects, which is an important capability for robots operating in unstructured environments. The use of both visual and tactile information, along with the "surprise" formulation, seems promising for improving the system's predictive abilities.

However, the paper does not delve deeply into the limitations of the approach. For example, it's unclear how the system would handle highly deformable or complex-shaped objects, or how robust it is to noisy or incomplete sensory information. Additionally, the experiments were conducted in a relatively constrained real-world scenario, so further testing in more diverse environments would be helpful to fully assess the framework's performance.

It would also be interesting to see how this approach compares to other recent developments in multi-modal perception and interactive learning for robotics, such as end-to-end learning frameworks or reinforcement learning-based exploration strategies.

Overall, this research represents an important step forward in enabling robots to better understand and interact with their environment, but there are still opportunities for further refinement and exploration.

Conclusion

This paper introduces a novel visuo-tactile based predictive cross-modal perception framework that allows robots to autonomously explore and estimate the physical properties of unknown objects. By using an initial visual observation to obtain a prior estimate of the object's mass, the system can more efficiently infer other key properties like stiffness, friction, and center of mass through interactive exploration.

The inferred properties are then used to enhance the system's predictive capabilities, inspired by the human concept of "surprise". Tested in a real-world robotic scenario, the framework demonstrated superior performance compared to previous approaches.

This work represents an important advancement in enabling robots to better understand and interact with their environment, which is crucial for autonomous systems operating in unstructured settings. While the paper identifies some potential avenues for further research, the core ideas introduced here could have significant implications for the field of robotic perception and manipulation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-modal perception for soft robotic interactions using generative models

Enrico Donato, Egidio Falotico, Thomas George Thuruthel

Perception is essential for the active interaction of physical agents with the external environment. The integration of multiple sensory modalities, such as touch and vision, enhances this perceptual process, creating a more comprehensive and robust understanding of the world. Such fusion is particularly useful for highly deformable bodies such as soft robots. Developing a compact, yet comprehensive state representation from multi-sensory inputs can pave the way for the development of complex control strategies. This paper introduces a perception model that harmonizes data from diverse modalities to build a holistic state representation and assimilate essential information. The model relies on the causality between sensory input and robotic actions, employing a generative model to efficiently compress fused information and predict the next observation. We present, for the first time, a study on how touch can be predicted from vision and proprioception on soft robots, the importance of the cross-modal generation and why this is essential for soft robotic interactions in unstructured environments.

4/8/2024

cs.RO cs.AI cs.LG

Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

Shoujie Li, Haixin Yu, Wenbo Ding, Houde Liu, Linqi Ye, Chongkun Xia, Xueqian Wang, Xiao-Ping Zhang

The accurate detection and grasping of transparent objects are challenging but of significance to robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and variant light conditions is proposed, including the grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with a Gaussian distribution based data annotation is proposed. Besides, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. In tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central location based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent objects classification, which improves the classification accuracy by 34%. The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.

6/11/2024

cs.RO cs.AI

Integrating Visuo-tactile Sensing with Haptic Feedback for Teleoperated Robot Manipulation

Noah Becker, Erik Gattung, Kay Hansel, Tim Schneider, Yaonan Zhu, Yasuhisa Hasegawa, Jan Peters

Telerobotics enables humans to overcome spatial constraints and allows them to physically interact with the environment in remote locations. However, the sensory feedback provided by the system to the operator is often purely visual, limiting the operator's dexterity in manipulation tasks. In this work, we address this issue by equipping the robot's end-effector with high-resolution visuotactile GelSight sensors. Using low-cost MANUS-Gloves, we provide the operator with haptic feedback about forces acting at the points of contact in the form of vibration signals. We propose two different methods for estimating these forces; one based on estimating the movement of markers on the sensor surface and one deep-learning approach. Additionally, we integrate our system into a virtual-reality teleoperation pipeline in which a human operator controls both arms of a Tiago robot while receiving visual and haptic feedback. We believe that integrating haptic feedback is a crucial step for dexterous manipulation in teleoperated robotic systems.

5/1/2024

cs.RO

AcTExplore: Active Tactile Exploration of Unknown Objects

Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermuller, Yiannis Aloimonos

Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes. Project Webpage: https://prg.cs.umd.edu/AcTExplore

6/24/2024

cs.RO cs.CV