Gaze-Based Intention Recognition for Human-Robot Collaboration

Read original: arXiv:2405.07570 - Published 5/14/2024 by Valerio Belcamino, Miwa Takase, Mariya Kilina, Alessandro Carfì, Akira Shimada, Sota Shimizu, Fulvio Mastrogiovanni

Overview

This paper explores intent recognition in human-robot collaborative assembly scenarios, specifically for the interactive assembly of a wooden stool.
The researchers compared two approaches: using wearable sensors to track the user's hand movements, and using eye tracking to monitor the user's gaze.
The goal is to improve synchronization between the human and robot during assembly by estimating whether the user is idle or actively working.
The researchers tested these methods with 10 participants and collected data on their effectiveness and user preferences.

Plain English Explanation

The researchers wanted to develop a way for a robot and a human to work together smoothly when assembling a piece of furniture, like a wooden stool. Specifically, they looked at how the robot could recognize what the human was doing, such as whether they were waiting for the next piece or actively putting parts together.

To do this, they tested two different technologies:

Wearable sensors on the human's hands and wrists to track their movements
Eye tracking using a head-mounted display to see where the human was looking

The idea is that by understanding the human's current state or "intent," the robot could better coordinate its actions and retrieve the next piece at the right time. The researchers had 10 people try out both systems and asked for their feedback on which one they preferred and how well each one worked.

The key benefit of recognizing the human's intent is that the robot and human can work together more seamlessly during assembly. This could make the task more efficient and feel more natural for the human.

Technical Explanation

The paper explores intent recognition in the context of human-robot collaborative assembly, using the example of building a wooden stool. The researchers compared two distinct approaches for detecting the user's current state (e.g., idle, active):

A wearable sensing module that uses raw data from 9-axis Inertial Measurement Units (IMUs) on the user's wrists and hands as input to a Long Short-Term Memory (LSTM) network.
An eye tracking system that relies on a Head Mounted Display (HMD) and the Unreal Engine.
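As a rough illustration of how input for the wearable module could be prepared, the sketch below slices a stream of IMU readings into fixed-length windows of the kind an LSTM typically consumes. The four-sensor count comes from the paper's abstract; the window length, the stride, and the function name `make_windows` are assumptions made for illustration, not details reported by the authors.

```python
from typing import List, Sequence

NUM_IMUS = 4       # from the abstract: four IMUs on the wrists and hands
AXES_PER_IMU = 9   # 9-axis: 3 accelerometer + 3 gyroscope + 3 magnetometer
FEATURES = NUM_IMUS * AXES_PER_IMU  # 36 channels per time step


def make_windows(
    samples: Sequence[Sequence[float]], window: int, stride: int
) -> List[List[Sequence[float]]]:
    """Split per-time-step feature vectors into overlapping windows.

    `window` and `stride` are hypothetical hyperparameters; the summary
    does not state the values used in the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for row in samples:
        if len(row) != FEATURES:
            raise ValueError(f"expected {FEATURES} channels, got {len(row)}")
    last_start = len(samples) - window
    # Each window is a (window x FEATURES) sequence for a recurrent model.
    return [list(samples[s:s + window]) for s in range(0, last_start + 1, stride)]


# Synthetic stream: 100 time steps of zeros standing in for real sensor data.
stream = [[0.0] * FEATURES for _ in range(100)]
windows = make_windows(stream, window=50, stride=25)
```

With 100 time steps, a window of 50, and a stride of 25, this produces three windows, each holding 50 rows of 36 channels.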

Both of these sensing modalities are integrated into a flexible planning architecture based on Hierarchical Task Networks. At runtime, the system aims to ensure better synchronization between the human and robot during the assembly process.
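To give a sense of what a Hierarchical Task Network does, the sketch below expands a compound task into an ordered list of primitive steps. The task names, the `wait_until_idle` step, and the simplification to one method per task with no preconditions are all hypothetical; the summary does not describe the authors' actual planning domain.

```python
from typing import Dict, List

# Hypothetical decomposition methods: compound task -> ordered subtasks.
# Any task that does not appear as a key is treated as primitive.
METHODS: Dict[str, List[str]] = {
    "assemble_stool": ["mount_legs", "mount_seat"],
    "mount_legs": ["wait_until_idle", "robot_fetch_legs", "human_attach_legs"],
    "mount_seat": ["wait_until_idle", "robot_fetch_seat", "human_attach_seat"],
}


def decompose(task: str) -> List[str]:
    """Depth-first expansion of a task into primitive steps."""
    if task not in METHODS:
        return [task]  # primitive task: executed as-is
    plan: List[str] = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask))
    return plan


plan = decompose("assemble_stool")
```

In this toy domain the robot's fetch steps are gated by `wait_until_idle`, which is where an idle-state estimate from either sensing modality would plug in.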

The researchers tested the two methods with 10 participants, who each tried both options in a counterbalanced order. They collected both explicit feedback (user experience questionnaires) and implicit metrics (classification time, overall assembly time) to evaluate the effectiveness and user preference of the two approaches.
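Counterbalancing can be sketched as alternating which condition each participant experiences first, so that order effects roughly cancel out across the group. The condition labels and the function name `assign_orders` below are assumptions for illustration; the summary does not say how the authors implemented the assignment.

```python
from typing import Dict, Sequence, Tuple


def assign_orders(
    participant_ids: Sequence[int],
    conditions: Tuple[str, str] = ("wearable", "gaze"),
) -> Dict[int, Tuple[str, str]]:
    """Alternate the starting condition across participants."""
    first, second = conditions
    orders: Dict[int, Tuple[str, str]] = {}
    for index, pid in enumerate(participant_ids):
        # Even positions start with the first condition, odd with the second.
        orders[pid] = (first, second) if index % 2 == 0 else (second, first)
    return orders


orders = assign_orders(range(1, 11))  # 10 participants, as in the study
wearable_first = sum(1 for order in orders.values() if order[0] == "wearable")
```

With 10 participants, five start with the wearable condition and five with the gaze condition.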

The results indicate that the two methods can achieve comparable performance in terms of both efficacy and user preference. The paper suggests that future work could explore combining the two approaches to enable recognition of more complex user activities and better anticipate their actions, similar to HOI4ABot and MoveTouch.

Critical Analysis

The paper provides a thorough evaluation of two distinct approaches for intent recognition in human-robot collaborative assembly. By testing the methods with human participants and collecting both explicit and implicit feedback, the researchers were able to gain valuable insights into the strengths and limitations of each approach.

One potential limitation of the study is the focus on a relatively simple assembly task (a wooden stool). While this allowed for a controlled comparison of the two techniques, it's unclear how well the methods would scale to more complex assembly scenarios with a greater variety of user actions and potential for confusion.

Additionally, the paper does not delve deeply into the specific performance characteristics of the LSTM network and eye tracking system used in the study. More details on the training, accuracy, and robustness of these components would help readers better understand the practical feasibility of deploying such systems in real-world applications.

Further research could also explore integrating the wearable sensors and eye tracking into a multimodal approach, as suggested by the authors. This could potentially lead to more reliable and comprehensive intent recognition capabilities.

Overall, the paper presents a thoughtful comparison of two promising approaches for intent recognition in human-robot collaboration, highlighting the tradeoffs and opportunities for future development in this important area of research.

Conclusion

This paper explores the use of wearable sensors and eye tracking to improve the intent recognition capabilities of a robot collaborating with a human on an assembly task. The researchers found that both methods can achieve comparable performance in terms of effectiveness and user preference, suggesting that they could be viable solutions for enhancing synchronization between the human and robot during the assembly process.

The findings of this study have the potential to contribute to the development of more natural and efficient human-robot collaboration, which could have applications in a variety of industries, from manufacturing to healthcare. By better understanding the human's current state and anticipating their actions, robots can adapt their behavior to provide more meaningful assistance and create a more seamless, intuitive working relationship.

Future research in this area could explore combining the two sensing modalities, as well as scaling the techniques to handle more complex assembly tasks. Continued advancements in intent recognition could pave the way for increasingly sophisticated human-robot teamwork and unlock new possibilities for collaborative problem-solving.

Related Papers

Gaze-Based Intention Recognition for Human-Robot Collaboration

Valerio Belcamino, Miwa Takase, Mariya Kilina, Alessandro Carfì, Akira Shimada, Sota Shimizu, Fulvio Mastrogiovanni

This work aims to tackle the intent recognition problem in Human-Robot Collaborative assembly scenarios. Precisely, we consider an interactive assembly of a wooden stool where the robot fetches the pieces in the correct order and the human builds the parts following the instruction manual. The intent recognition is limited to the idle state estimation and it is needed to ensure a better synchronization between the two agents. We carried out a comparison between two distinct solutions involving wearable sensors and eye tracking integrated into the perception pipeline of a flexible planning architecture based on Hierarchical Task Networks. At runtime, the wearable sensing module exploits the raw measurements from four 9-axis Inertial Measurement Units positioned on the wrists and hands of the user as an input for a Long Short-Term Memory Network. On the other hand, the eye tracking relies on a Head Mounted Display and Unreal Engine. We tested the effectiveness of the two approaches with 10 participants, each of whom explored both options in alternate order. We collected explicit metrics about the attractiveness and efficiency of the two techniques through User Experience Questionnaires as well as implicit criteria regarding the classification time and the overall assembly time. The results of our work show that the two methods can reach comparable performances both in terms of effectiveness and user preference. Future development could aim at joining the two approaches to allow the recognition of more complex activities and to anticipate the user actions.

5/14/2024

iCub Detecting Gazed Objects: A Pipeline Estimating Human Attention

Shiva Hanifi, Elisa Maiettini, Maria Lombardi, Lorenzo Natale

This research report explores the role of eye gaze in human-robot interactions and proposes a learning system for detecting objects gazed at by humans using solely visual feedback. The system leverages face detection, human attention prediction, and online object detection, and it allows the robot to perceive and interpret human gaze accurately, paving the way for establishing joint attention with human partners. Additionally, a novel dataset collected with the humanoid robot iCub is introduced, comprising over 22,000 images from ten participants gazing at different annotated objects. This dataset serves as a benchmark for the field of human gaze estimation in table-top human-robot interaction (HRI) contexts. In this work, we use it to evaluate the performance of the proposed pipeline and examine the performance of each component. Furthermore, the developed system is deployed on the iCub, and a supplementary video showcases its functionality. The results demonstrate the potential of the proposed approach as a first step to enhance social awareness and responsiveness in social robotics, as well as improve assistance and support in collaborative scenarios, promoting efficient human-robot collaboration.

5/10/2024

Constraint-Aware Intent Estimation for Dynamic Human-Robot Object Co-Manipulation

Yifei Simon Shao, Tianyu Li, Shafagh Keyvanian, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa

Constraint-aware estimation of human intent is essential for robots to physically collaborate and interact with humans. Further, to achieve fluid collaboration in dynamic tasks intent estimation should be achieved in real-time. In this paper, we present a framework that combines online estimation and control to facilitate robots in interpreting human intentions, and dynamically adjust their actions to assist in dynamic object co-manipulation tasks while considering both robot and human constraints. Central to our approach is the adoption of a Dynamic Systems (DS) model to represent human intent. Such a low-dimensional parameterized model, along with human manipulability and robot kinematic constraints, enables us to predict intent using a particle filter solely based on past motion data and tracking errors. For safe assistive control, we propose a variable impedance controller that adapts the robot's impedance to offer assistance based on the intent estimation confidence from the DS particle filter. We validate our framework on a challenging real-world human-robot co-manipulation task and present promising results over baselines. Our framework represents a significant step forward in physical human-robot collaboration (pHRC), ensuring that robot cooperative interactions with humans are both feasible and effective.

9/4/2024

Predicting the Intention to Interact with a Service Robot: the Role of Gaze Cues

Simone Arreghini, Gabriele Abbate, Alessandro Giusti, Antonio Paolillo

For a service robot, it is crucial to perceive as early as possible that an approaching person intends to interact: in this case, it can proactively enact friendly behaviors that lead to an improved user experience. We solve this perception task with a sequence-to-sequence classifier of a potential user intention to interact, which can be trained in a self-supervised way. Our main contribution is a study of the benefit of features representing the person's gaze in this context. Extensive experiments on a novel dataset show that the inclusion of gaze cues significantly improves the classifier performance (AUROC increases from 84.5% to 91.2%); the distance at which an accurate classification can be achieved improves from 2.4 m to 3.2 m. We also quantify the system's ability to adapt to new environments without external supervision. Qualitative experiments show practical applications with a waiter robot.

4/3/2024