MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation

Read original: arXiv:2310.16917 - Published 9/6/2024 by Kelin Yu, Yunhai Han, Qixian Wang, Vaibhav Saxena, Danfei Xu, Ye Zhao

Overview

Tactile sensing is crucial for fine-grained, contact-rich manipulation tasks like insertion and assembly.
Prior research has shown the potential of learning tactile-guided policies from teleoperated demonstration data.
However, human users often rely on visual feedback to control the robot, creating a gap between the sensing modality used for control (visual) and the modality of interest (tactile).
To bridge this gap, the paper introduces MimicTouch, a framework for learning policies directly from demonstrations provided by human users with their hands.

Plain English Explanation

The paper introduces MimicTouch, a framework that aims to close the gap between the sense a human demonstrator relies on and the sense the robot needs to learn from. In many contact-rich manipulation tasks, such as inserting a part into an assembly or delicately maneuvering objects, tactile feedback is crucial for the robot to succeed. However, when humans provide demonstrations by teleoperating the robot, they often rely on visual feedback to control it rather than on their sense of touch.

The MimicTouch framework addresses this issue by letting the robot learn directly from the tactile feedback and control strategies of human demonstrators who perform the task with their own hands. The key innovations are:

A human tactile data collection system that captures multi-modal tactile data from the human's hand movements, which can be used to understand the human's tactile-guided control strategy.
An imitation learning-based framework for the robot to learn the human's tactile-guided control strategy from the collected data.
An online residual reinforcement learning (RL) framework to bridge the embodiment gap between the human hand and the robot gripper.

By leveraging the human's tactile-guided control strategy, the MimicTouch framework aims to enable the robot to perform contact-rich manipulation tasks more effectively.

Technical Explanation

The key technical innovations of the MimicTouch framework are:

Human Tactile Data Collection System: The authors developed a system that records a multi-modal tactile dataset while human users perform the task with their own hands. This dataset is used to learn the human's tactile-guided control strategy.
Imitation Learning-based Framework: The authors used an imitation learning approach to train the robot to mimic the human's tactile-guided control strategy. They designed an architecture that takes the human's tactile data as input and learns to predict the corresponding control commands for the robot.
Online Residual Reinforcement Learning: To bridge the embodiment gap between the human hand and the robot gripper, the authors employed an online residual RL framework. This allowed the robot to fine-tune the imitation-learned policy by interacting with the environment and receiving tactile feedback, further improving its performance on the target task.
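The last two components can be illustrated with a minimal sketch. This is not the authors' implementation: linear models stand in for the learned networks, the demonstration data is synthetic, and every name (`fit_base_policy`, `residual_policy`, `combined_action`, `RESIDUAL_SCALE`) is hypothetical. It only shows the general pattern of fitting a base policy by imitation and then adding a small, bounded correction that an online RL algorithm would train.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: encoded tactile features in, motion command out.
OBS_DIM, ACT_DIM, N_DEMOS = 8, 3, 200
RESIDUAL_SCALE = 0.1  # assumed bound on how far the residual may shift an action


def fit_base_policy(demo_obs, demo_act):
    """Behavior cloning in its simplest form: least-squares regression
    from demonstrated observations to demonstrated actions."""
    weights, *_ = np.linalg.lstsq(demo_obs, demo_act, rcond=None)
    return weights


def base_policy(obs, w_base):
    """Action proposed by the imitation-learned policy."""
    return obs @ w_base


def residual_policy(obs, w_res, scale=RESIDUAL_SCALE):
    """Bounded correction; w_res is what an online RL algorithm would update."""
    return scale * np.tanh(obs @ w_res)


def combined_action(obs, w_base, w_res):
    """Final command sent to the robot: base action plus residual."""
    return base_policy(obs, w_base) + residual_policy(obs, w_res)


# Synthetic stand-in for human demonstrations (not real tactile data).
true_w = rng.normal(size=(OBS_DIM, ACT_DIM))
demo_obs = rng.normal(size=(N_DEMOS, OBS_DIM))
demo_act = demo_obs @ true_w + 0.01 * rng.normal(size=(N_DEMOS, ACT_DIM))

w_base = fit_base_policy(demo_obs, demo_act)
w_res = np.zeros((OBS_DIM, ACT_DIM))  # untrained residual contributes nothing

obs = rng.normal(size=(OBS_DIM,))
action = combined_action(obs, w_base, w_res)
```

Bounding the residual with `tanh` is one common design choice, assumed here rather than taken from the paper: it keeps the robot close to the demonstrated behavior while the correction is still being learned on the real system.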

Through comprehensive experiments, the authors show that the human's tactile-guided control strategy is effective for solving contact-rich manipulation tasks.

Critical Analysis

The MimicTouch framework presents an innovative approach to addressing the gap between visual and tactile sensing in robotic manipulation tasks. By learning directly from human tactile demonstrations, the framework aims to enable more effective contact-rich manipulation.

One potential limitation of the approach is the need for a specialized data collection system to capture the human's tactile data. This may limit the scalability and accessibility of the framework, as it may require specialized hardware and setup.

Additionally, imitation alone does not remove the embodiment gap between the human hand and the robot gripper; the framework depends on online residual RL to close it. This suggests that further research may be needed to transfer the learned tactile-guided control strategy to the robot without additional on-robot training.

It would also be interesting to see how the MimicTouch framework performs on a wider range of manipulation tasks, beyond the specific ones evaluated in the paper. Assessing its generalizability and robustness would be valuable for understanding the broader applicability of the approach.

Conclusion

The MimicTouch framework presents a novel approach to leveraging human tactile demonstrations for learning effective robotic manipulation policies. By bridging the gap between visual and tactile sensing, the framework aims to enable robots to better perform contact-rich tasks, such as insertion and assembly.

The framework's three components, the human tactile data collection system, the imitation-learned policy, and online residual RL, together address this challenge. The results reported in the paper suggest that learning from human tactile demonstrations is a promising direction for contact-rich robotic manipulation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation

Kelin Yu, Yunhai Han, Qixian Wang, Vaibhav Saxena, Danfei Xu, Ye Zhao

Tactile sensing is critical to fine-grained, contact-rich manipulation tasks, such as insertion and assembly. Prior research has shown the possibility of learning tactile-guided policies from teleoperated demonstration data. However, to provide the demonstration, human users often rely on visual feedback to control the robot. This creates a gap between the sensing modality used for controlling the robot (visual) and the modality of interest (tactile). To bridge this gap, we introduce MimicTouch, a novel framework for learning policies directly from demonstrations provided by human users with their hands. The key innovations are i) a human tactile data collection system which collects a multi-modal tactile dataset for learning the human's tactile-guided control strategy, ii) an imitation learning-based framework for learning the human's tactile-guided control strategy through such data, and iii) an online residual RL framework to bridge the embodiment gap between the human hand and the robot gripper. Through comprehensive experiments, we highlight the efficacy of utilizing the human's tactile-guided control strategy to resolve contact-rich manipulation tasks. The project website is at https://sites.google.com/view/MimicTouch.

9/6/2024

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor

Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek

Contact-rich tasks continue to present a variety of challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (slipping/sliding) between the end-effector and object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door opening tasks with a variety of observation and method configurations to study the utility of our proposed improvements and multimodal visuotactile sensing. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

6/27/2024

Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

8/1/2024

Learning Tactile Insertion in the Real World

Daniel Palenicek, Theo Gruner, Tim Schneider, Alina Bohm, Janis Lenz, Inga Pfenning, Eric Kramer, Jan Peters

Humans have exceptional tactile sensing capabilities, which they can leverage to solve challenging, partially observable tasks that cannot be solved from visual observation alone. Research in tactile sensing attempts to unlock this new input modality for robots. Lately, these sensors have become cheaper and, thus, widely available. At the same time, the question of how to integrate them into control loops is still an active area of research, with central challenges being partial observability and the contact-rich nature of manipulation tasks. In this study, we propose to use Reinforcement Learning to learn an end-to-end policy, mapping directly from tactile sensor readings to actions. Specifically, we use Dreamer-v3 on a challenging, partially observable robotic insertion task with a Franka Research 3, both in simulation and on a real system. For the real setup, we built a robotic platform capable of resetting itself fully autonomously, allowing for extensive training runs without human supervision. Our preliminary results indicate that Dreamer is capable of utilizing tactile inputs to solve robotic manipulation tasks in simulation and reality. Furthermore, we find that providing the robot with tactile feedback generally improves task performance, though, in our setup, we do not yet include other sensing modalities. In the future, we plan to utilize our platform to evaluate a wide range of other Reinforcement Learning algorithms on tactile tasks.

8/1/2024