EgoPet: Egomotion and Interaction Data from an Animal's Perspective

2404.09991

Published 4/16/2024 by Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

cs.RO cs.CV


Abstract

Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource to robotic quadruped locomotion, showing that models trained from EgoPet outperform those trained from prior datasets.


Overview

The paper introduces EgoPet, a dataset of egocentric video recorded from the perspective of pet animals.
Many clips show the animal moving through its environment (egomotion) while it interacts with other agents. Existing video datasets rarely contain both at once.
The authors define two benchmark tasks that capture animal behavior, plus a third that tests EgoPet as a pretraining resource for robotic quadruped locomotion.

Plain English Explanation

The EgoPet dataset provides a unique window into the world as seen by pet animals. Its videos are recorded from the animal's own point of view, so they show how the animal moves through its surroundings and how it interacts with people and other animals. The dataset aims to enable new insights into animal behavior, how animals perceive their environments, and how they interact with others.

Traditionally, studies of animal behavior and cognition have relied on observing animals from the outside. EgoPet reverses this approach by letting researchers see the world through the eyes of the animal itself. This can offer clues about what the animal attends to and how it decides where to go.

The core data is first-person video in which egomotion and interaction often happen at the same time. Analyzing it can shed light on how animals navigate their environments, how they perceive and respond to people and other animals, and how their egocentric viewpoint shapes their understanding of the world.

Overall, EgoPet represents a new approach to studying animal behavior. It has the potential to drive new discoveries about our animal companions and to help close the gap between the capabilities of animals and AI systems.

Technical Explanation

EgoPet consists of egocentric video captured from pets' own perspective; the abstract calls it "pet egomotion imagery". This viewpoint differs sharply from existing egocentric datasets, which capture the perspective of humans or vehicles.

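The videos cover multiple animals in a variety of settings, such as homes and outdoor environments, and include interactions with owners and other animals. The authors build three benchmarks on this footage. Two are in-domain tasks that capture animal behavior: Visual Interaction Prediction (VIP) and Locomotion Prediction (LP). The third, Vision to Proprioception Prediction (VPP), assesses EgoPet as a pretraining resource for robotic quadruped locomotion. On that third benchmark, models pretrained on EgoPet outperform those pretrained on prior datasets.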
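One way to frame interaction prediction as a learning problem is to slide a fixed-length window over each clip and label each window by whether it overlaps an annotated interaction. The sketch below is an illustrative assumption, not EgoPet's actual annotation schema or the authors' code. The `Interaction` record (start and end in seconds plus a target name) and the `window_labels` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interaction:
    """Hypothetical annotation: the pet interacts with `target` from start_s to end_s."""
    start_s: float
    end_s: float
    target: str  # e.g. "person"; not used for the binary label below


def window_labels(
    duration_s: float,
    interactions: List[Interaction],
    window_s: float = 2.0,
    stride_s: float = 1.0,
) -> List[Tuple[float, float, bool]]:
    """Slide a fixed window over a clip; mark windows that overlap any interaction."""
    if window_s <= 0 or stride_s <= 0:
        raise ValueError("window_s and stride_s must be positive")
    labels = []
    k = 0
    # Compute each start as k * stride_s to avoid accumulating float error.
    while k * stride_s + window_s <= duration_s:
        start = k * stride_s
        end = start + window_s
        # Two intervals overlap when each one starts before the other ends.
        hit = any(i.start_s < end and start < i.end_s for i in interactions)
        labels.append((start, end, hit))
        k += 1
    return labels


# Usage: a 6-second clip with one interaction between 2.5 s and 3.5 s.
clip = [Interaction(2.5, 3.5, "person")]
print([hit for _, _, hit in window_labels(6.0, clip)])
# → [False, True, True, True, False]
```

Counting any overlap as positive is one design choice. Requiring the interaction to cover a minimum fraction of the window would be an equally reasonable, stricter alternative.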

Some of the key capabilities the EgoPet dataset supports include:

Detecting when the animal is interacting with another agent, using only its egocentric video
Predicting the animal's upcoming movement from its recent egocentric view, as a window onto navigation and exploration strategies
Pretraining visual representations that transfer to robotic quadruped locomotion
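Predicting where an animal moves next can be posed as regression from a short history of positions to a future displacement. The sketch below assumes hypothetical per-frame 2D positions of the pet, for example as estimated by a visual odometry or SLAM system run over the video. Both that input and the `make_locomotion_pairs` helper are illustrative assumptions, not the benchmark's actual definition.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def make_locomotion_pairs(
    positions: List[Point], context: int, horizon: int
) -> List[Tuple[List[Point], Point]]:
    """Build (history, future displacement) pairs from a hypothetical 2D trajectory.

    Each history holds `context` positions expressed relative to the last
    observed frame. The target is the displacement `horizon` frames later,
    measured from that same frame.
    """
    if context < 1 or horizon < 1:
        raise ValueError("context and horizon must be positive")
    pairs = []
    for t in range(context - 1, len(positions) - horizon):
        ox, oy = positions[t]  # last observed position, used as the origin
        history = [(x - ox, y - oy) for x, y in positions[t - context + 1 : t + 1]]
        fx, fy = positions[t + horizon]
        pairs.append((history, (fx - ox, fy - oy)))
    return pairs


# Usage: a short trajectory that moves right, then veers up.
track = [(0, 0), (1, 0), (2, 0), (3, 1)]
for history, target in make_locomotion_pairs(track, context=2, horizon=1):
    print(history, target)
```

Expressing positions relative to the last observed frame makes the targets translation-invariant, a common choice in trajectory prediction. Predicting absolute coordinates would also work, but it ties the model to each clip's coordinate frame.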

By providing this data from the animal's point of view, the researchers aim to reduce the gap between the capabilities of animals and AI systems. The dataset supports research on animal behavior as well as work on robots, such as legged quadrupeds, that perceive and move more like animals.

Critical Analysis

The EgoPet dataset represents an innovative and promising approach to studying animal behavior and cognition. By capturing data from the animal's own perspective, it has the potential to reveal new insights that may be difficult to obtain through traditional observational methods.

However, it's important to acknowledge some potential limitations of this type of data. For example, the camera worn by the animal may influence its natural behavior to some degree. The dataset is also necessarily limited to the specific animals and environments that appear in the footage.

Additionally, interpreting and analyzing egocentric animal video may require specialized techniques and models, which could present technical challenges for researchers. There may also be ethical considerations around animal-borne cameras and their potential impact on the animals' wellbeing.

Despite these limitations, EgoPet opens up new avenues for research on animal behavior, human-animal interaction, and learning-based robotics, particularly quadruped locomotion. As the field evolves, researchers will need to address the methodological and ethical challenges in a thoughtful and responsible manner.

Conclusion

EgoPet offers a new way to study animal behavior from the animal's own perspective. Its egocentric pet videos pair egomotion with interactions between the animal and other agents, a combination that existing video datasets rarely provide.

This dataset has the potential to drive new discoveries about animal perception, decision-making, and interaction with humans and the environment. On the quadruped locomotion benchmark, models pretrained on EgoPet outperform those pretrained on prior datasets, which makes it a promising pretraining resource for legged robots.

As research on animal cognition and embodied AI continues, EgoPet and similar datasets are likely to play a growing role in two areas: understanding how animals perceive and act, and building AI systems that narrow the gap with them.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for details.
