Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction

2402.04768

Published 4/9/2024 by Esteve Valls Mascaro, Yashuai Yan, Dongheui Lee

Abstract

Integrating robots into populated environments is a complex challenge that requires an understanding of human social dynamics. In this work, we propose to model social motion forecasting in a shared human-robot representation space, which allows us to synthesize robot motions that interact with humans in social scenarios despite not observing any robot during motion training. We develop a transformer-based architecture called ECHO, which operates in this shared space to predict the future motions of the agents encountered in social scenarios. Contrary to prior works, we reformulate the social motion problem as the refinement of the predicted individual motions based on the surrounding agents, which facilitates training while allowing for single-motion forecasting when only one human is in the scene. We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin while being efficient and running in real time. Additionally, our qualitative results showcase the effectiveness of our approach in generating human-robot interaction behaviors that can be controlled via text commands. Webpage: https://evm7.github.io/ECHO/
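
The two-stage formulation in the abstract (forecast each agent individually, then refine that forecast based on the surrounding agents) can be illustrated with a small sketch. This is not the ECHO architecture: the constant-velocity forecaster, the single softmax-attention step, the blending weight, and every function name below are assumptions chosen only to make the idea concrete.

```python
from math import exp, sqrt

Motion = list[list[float]]  # frames x dims for one agent (hypothetical layout)


def forecast_individual(past: Motion, horizon: int) -> Motion:
    """Stand-in for a learned single-agent forecaster (assumption).

    Extrapolates the last observed velocity; needs at least two past frames.
    """
    last, prev = past[-1], past[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[x + v * step for x, v in zip(last, velocity)]
            for step in range(1, horizon + 1)]


def refine_social(individual: list[Motion], weight: float = 0.1) -> list[Motion]:
    """Stand-in refinement: each agent attends to the other agents' forecasts.

    With a single agent there is nothing to attend to, so the individual
    forecast is returned unchanged.
    """
    if len(individual) == 1:
        return [[frame[:] for frame in individual[0]]]
    # Flatten each agent's forecast into one vector.
    flat = [[x for frame in motion for x in frame] for motion in individual]
    scale = sqrt(len(flat[0]))
    refined = []
    for i, query in enumerate(flat):
        others = [j for j in range(len(flat)) if j != i]
        # Scaled dot-product scores against every other agent.
        scores = [sum(q * k for q, k in zip(query, flat[j])) / scale
                  for j in others]
        top = max(scores)
        weights = [exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        context = [sum(w / total * flat[j][d] for w, j in zip(weights, others))
                   for d in range(len(query))]
        # Nudge the individual forecast toward the attended context.
        mixed = [x + weight * (c - x) for x, c in zip(query, context)]
        dims = len(individual[i][0])
        refined.append([mixed[t * dims:(t + 1) * dims]
                        for t in range(len(individual[i]))])
    return refined


def forecast(pasts: list[Motion], horizon: int) -> list[Motion]:
    """Individual forecast for every agent, followed by social refinement."""
    return refine_social([forecast_individual(p, horizon) for p in pasts])
```

Calling `forecast` with a single agent returns the individual forecast unchanged, which mirrors the abstract's point that this formulation still supports single-motion forecasting when only one human is in the scene.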

Overview

This paper addresses robot behavior generation in social scenarios by forecasting motion in a representation space shared between humans and robots.
It introduces ECHO, a transformer-based architecture that first predicts each agent's individual motion and then refines that prediction based on the surrounding agents.
The authors report state-of-the-art, real-time results on multi-person and human-robot motion forecasting, and show qualitatively that the generated robot behaviors can be controlled via text commands.

Plain English Explanation

Robots that share space with people need to anticipate how those people will move and respond appropriately. The paper's approach is to describe human and robot motion in a single shared representation space, so that a model trained on motion data containing no robots can still be used to synthesize robot motion that interacts with humans.

The forecasting model, ECHO, splits the problem in two. It first predicts how each agent would move individually, then refines that prediction by taking the surrounding agents into account. According to the abstract, this also allows the model to forecast motion when only one human is in the scene.

Technical Explanation

ECHO is a transformer-based architecture that operates in the shared human-robot representation space and predicts the future motions of the agents encountered in a social scenario.

Contrary to prior works, the authors reformulate social motion forecasting as the refinement of predicted individual motions based on the surrounding agents. They state that this reformulation facilitates training and allows single-motion forecasting when only one human is present. Because forecasting takes place in the shared space, the model can synthesize robot motions that interact with humans even though no robot is observed during motion training.

The model is evaluated on multi-person and human-robot motion forecasting tasks. The abstract reports state-of-the-art performance by a large margin with efficient, real-time inference, along with qualitative results in which the generated human-robot interaction behaviors are controlled via text commands.

Critical Analysis

Treating social forecasting as individual forecasting followed by refinement is a practical design choice: per the abstract, it facilitates training and lets the same model handle scenes with only one human. Working in a shared human-robot space also means that no robot has to be observed during motion training.

However, the abstract does not give the figures behind its claim of state-of-the-art performance "by a large margin", nor does it name the benchmarks used. Readers should consult the full paper for the datasets, metrics, and baselines.

Additionally, the evidence for text-controlled human-robot interaction behaviors is described as qualitative. The abstract does not say whether these behaviors were also evaluated quantitatively, so that point is best checked against the original paper.

Conclusion

This paper proposes modeling social motion forecasting in a shared human-robot representation space and introduces ECHO, a transformer-based architecture that refines individual motion predictions based on the surrounding agents. The authors report state-of-the-art, real-time results on multi-person and human-robot motion forecasting.

The approach also supports generating human-robot interaction behaviors that can be controlled via text commands, which the authors demonstrate through qualitative results. Further details are available on the project webpage: https://evm7.github.io/ECHO/

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Modeling social interaction dynamics using temporal graph networks

J. Taery Kim, Archit Naik, Isuru Jayarathne, Sehoon Ha, Jouh Yeong Chew

Integrating intelligent systems, such as robots, into dynamic group settings poses challenges due to the mutual influence of human behaviors and internal states. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose employing an adapted Temporal Graph Network to comprehensively represent social interaction dynamics while enabling its practical implementation. Our method incorporates temporal multi-modal behavioral data including gaze interaction, voice activity and environmental context. This representation of social interaction dynamics is trained as a link prediction problem using annotated gaze interaction data. The F1-score outperformed the baseline model by 37.0%. This improvement is consistent for a secondary task of next speaker prediction, which achieves an improvement of 29.0%. Our contributions are two-fold, including a model for representing social interaction dynamics which can be used for many downstream human-robot interaction tasks like human state inference and next speaker prediction. More importantly, this is achieved using a more concise yet efficient message passing method, significantly reducing it from 768 to 14 elements, while outperforming the baseline model.

4/11/2024

cs.HC cs.SI

HumanPlus: Humanoid Shadowing and Imitation from Humans

Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, i.e. shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/

6/18/2024

cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

Robust Human Motion Forecasting using Transformer-based Model

Esteve Valls Mascaro, Shuo Ma, Hyemin Ahn, Dongheui Lee

Comprehending human motion is a fundamental challenge for developing Human-Robot Collaborative applications. Computer vision researchers have addressed this field by focusing only on reducing prediction error, without taking into account the requirements for implementing these models in robots. In this paper, we propose a new Transformer-based model that simultaneously handles real-time 3D human motion forecasting in the short and long term. Our 2-Channel Transformer (2CH-TR) efficiently exploits the spatio-temporal information of a shortly observed sequence (400ms) and achieves accuracy competitive with the current state-of-the-art. 2CH-TR stands out for the efficient performance of the Transformer, being lighter and faster than its competitors. In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment. Our experimental results show that the proposed 2CH-TR outperforms the ST-Transformer, another state-of-the-art Transformer-based model, in terms of reconstruction and prediction under the same input prefix conditions. Our model reduces the mean squared error of ST-Transformer by 8.89% in short-term prediction and by 2.57% in long-term prediction on the Human3.6M dataset with a 400ms input prefix. Webpage: https://evm7.github.io/2CHTR-page/

4/9/2024

cs.CV

ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space

Yashuai Yan, Esteve Valls Mascaro, Dongheui Lee

This paper introduces a novel deep-learning approach for human-to-robot motion retargeting, enabling robots to mimic human poses accurately. Contrary to prior deep-learning-based works, our method does not require paired human-to-robot data, which facilitates its translation to new robots. First, we construct a shared latent space between humans and robots via adaptive contrastive learning that takes advantage of a proposed cross-domain similarity metric between the human and robot poses. Additionally, we propose a consistency term to build a common latent space that captures the similarity of the poses with precision while allowing direct robot motion control from the latent space. For instance, we can generate in-between motion through simple linear interpolation between two projected human poses. We conduct a comprehensive evaluation of robot control from diverse modalities (i.e., texts, RGB videos, and key poses), which facilitates robot control for non-expert users. Our model outperforms existing works regarding human-to-robot retargeting in terms of efficiency and precision. Finally, we implemented our method in a real robot with self-collision avoidance through a whole-body controller to showcase the effectiveness of our approach. More information on our website https://evm7.github.io/UnsH2R/

4/9/2024

cs.RO cs.AI