Spatial and Surface Correspondence Field for Interaction Transfer

Read original: arXiv:2405.03221 - Published 5/7/2024 by Zeyu Huang, Honghao Xu, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

Overview

  • Introduces a new method for the task of interaction transfer
  • Given an example interaction between a source object and an agent, the method can automatically infer both surface and spatial relationships for the agent and target objects within the same category
  • Yields more accurate and valid transfers compared to existing methods

Plain English Explanation

This paper presents a new approach for transferring an interaction from one object to another object of the same category. For example, given a human interacting with one chair, the method can automatically work out how that interaction would look with a differently shaped chair; the same idea applies to a hand interacting with different mugs.

The key idea is to describe the example interaction with a combined representation of the spatial relationships and the surface relationships between the agent (e.g., the human) and the source object (e.g., the original chair). This representation is then mapped onto a new target object (e.g., another chair) by finding the corresponding spatial and surface relationships.

This allows the method to handle larger variations in geometry and topology between the source and target objects than previous approaches can. The experiments on transferring human-chair and hand-mug interactions show that the method significantly outperforms existing state-of-the-art techniques.

Technical Explanation

The method characterizes the example interaction using a combined spatial and surface representation. It then maps the agent points and object points related to this representation into the target object space using a learned spatial and surface correspondence field, which represents objects as deformed and rotated signed distance fields.
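To make the idea of "deformed and rotated signed distance fields" more concrete, here is a minimal sketch in Python. It is not the paper's learned field: the unit-sphere template, the per-axis scaling used as the "deformation", and all function names (`to_template`, `from_template`, `correspond`) are assumptions chosen for illustration. The point is only that if every object is described as a warped copy of a shared template, a point near the source object can be carried to the target object by passing through template space.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to an origin-centred sphere (negative inside). Toy template shape."""
    return np.linalg.norm(points, axis=-1) - radius

def rotation_z(angle):
    """3x3 rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_template(points, rotation, scale):
    """Object space -> template space: undo the rotation, then undo the scaling.
    Hypothetical stand-in for a learned correspondence field."""
    return (points @ rotation) / scale  # row-vector form of R^T p

def from_template(points, rotation, scale):
    """Template space -> object space: apply the scaling, then the rotation."""
    return (points * scale) @ rotation.T  # row-vector form of R (s * p)

def object_sdf(points, rotation, scale):
    """Implicit value of the warped object: the template SDF at the un-warped point.
    The zero level set is the object surface; for non-uniform scale the value
    is no longer a true Euclidean distance."""
    return sphere_sdf(to_template(points, rotation, scale))

def correspond(points, source, target):
    """Carry points from the source object's space to the target object's space."""
    return from_template(to_template(points, *source), *target)

# Two "objects of the same category": differently rotated and stretched templates.
source = (rotation_z(0.3), np.array([1.0, 2.0, 0.5]))
target = (rotation_z(-1.0), np.array([2.0, 1.0, 1.0]))

# One point on the source surface and one point in the space around it.
on_surface = from_template(np.array([[0.0, 0.6, 0.8]]), *source)
in_space = np.array([[1.5, -0.5, 0.7]])

mapped_surface = correspond(on_surface, source, target)
mapped_space = correspond(in_space, source, target)
```

In this toy setting, a surface point stays on the surface after mapping, and an off-surface point keeps its template-space value, which loosely mirrors how one field can serve both surface and spatial correspondence.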

With the corresponded points, an optimization is performed under the constraints of the spatial and surface interaction representation, as well as additional regularization terms. This allows the method to transfer the interaction accurately, even when there are large differences in the geometry and topology between the source and target shapes.
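The structure of such an optimization can be sketched with a deliberately simplified energy. Everything below is an assumption for illustration: the paper's actual variables, terms, and weights are not given in this summary, so the sketch optimizes only a translation `t` of the agent, with a spatial term (agent points should land on their corresponded positions), a surface term (contact points should lie on the tangent plane of their corresponded surface points), and a regularizer on the size of `t`. Because this toy energy is quadratic in `t`, it can be solved in closed form.

```python
import numpy as np

def fit_translation(agent_pts, spatial_targets, contact_idx,
                    surface_pts, surface_normals,
                    w_surface=1.0, w_reg=0.0):
    """Minimise over t (hypothetical toy energy, not the paper's):
         sum_i ||a_i + t - c_i||^2                  (spatial term)
       + w_surface * sum_j (n_j . (a_j + t - s_j))^2 (surface term)
       + w_reg * ||t||^2                             (regularisation)
    Setting the gradient to zero gives a 3x3 linear system A t = b."""
    n = len(agent_pts)
    A = (n + w_reg) * np.eye(3)
    b = (spatial_targets - agent_pts).sum(axis=0)
    for i, s, normal in zip(contact_idx, surface_pts, surface_normals):
        P = np.outer(normal, normal)  # projector onto the normal direction
        A += w_surface * P
        b += w_surface * P @ (s - agent_pts[i])
    return np.linalg.solve(A, b)

# Synthetic data: the corresponded targets are the agent points shifted by true_t.
rng = np.random.default_rng(0)
agent = rng.normal(size=(6, 3))
true_t = np.array([0.2, -0.1, 0.5])
targets = agent + true_t

contact_idx = [0, 1]                      # agent points assumed to touch the object
surface_pts = targets[contact_idx]        # their corresponded surface points
normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])     # assumed unit surface normals

t_free = fit_translation(agent, targets, contact_idx, surface_pts, normals)
t_reg = fit_translation(agent, targets, contact_idx, surface_pts, normals, w_reg=10.0)
```

With consistent constraints and no regularization the known shift is recovered, while a large `w_reg` pulls the solution toward zero. A real pose optimization would typically involve rotations or articulated parameters and an iterative solver rather than a single linear solve.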

The experiments on human-chair and hand-mug interaction transfer tasks demonstrate the effectiveness of this approach, showing significant improvements over the state-of-the-art methods.

Critical Analysis

The paper provides a detailed explanation of the method and thorough experimental validation. However, it would be interesting to see how the approach handles more complex, multi-part objects or interactions involving multiple agents, as the current experiments are limited to human-chair and hand-mug transfers with a single agent and a single object.

Additionally, the paper does not discuss the computational complexity or runtime performance of the method, which could be an important consideration for real-world applications. Further research could also explore the robustness of the method to noise or variations in the input data.

Overall, the proposed interaction transfer approach represents a significant advancement in the field and could have important applications in areas like robotic manipulation, animation, and virtual reality.

Conclusion

This paper introduces a novel method for interaction transfer that can automatically infer both surface and spatial relationships between an agent and a target object of the same category, enabling more accurate and valid transfers under larger geometry and topology variations.

The key technical contributions are the combined spatial and surface representation of interactions, and the learned correspondence field that maps the source interaction to the target object space. The experimental results demonstrate the effectiveness of this approach, which outperforms existing state-of-the-art methods.

This work could have important implications for applications like robotic manipulation, animation, and virtual reality, where the ability to transfer realistic interactions between different objects is crucial.




Related Papers

Spatial and Surface Correspondence Field for Interaction Transfer

Zeyu Huang, Honghao Xu, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combined spatial and surface representation. We correspond the agent points and object points related to the representation to the target object space using a learned spatial and surface correspondence field, which represents objects as deformed and rotated signed distance fields. With the corresponded points, an optimization is performed under the constraints of our spatial and surface interaction representation and additional regularization. Experiments conducted on human-chair and hand-mug interaction transfer tasks show that our approach can handle larger geometry and topology variations between source and target shapes, significantly outperforming state-of-the-art methods.

5/7/2024

GEARS: Local Geometry-aware Hand-object Interaction Synthesis

Keyang Zhou, Bharat Lal Bhatnagar, Jan Eric Lenssen, Gerard Pons-Moll

Generating realistic hand motion sequences in interaction with objects has gained increasing attention with the growing interest in digital humans. Prior work has illustrated the effectiveness of employing occupancy-based or distance-based virtual sensors to extract hand-object interaction features. Nonetheless, these methods show limited generalizability across object categories, shapes and sizes. We hypothesize that this is due to two reasons: 1) the limited expressiveness of employed virtual sensors, and 2) scarcity of available training data. To tackle this challenge, we introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. The sensor queries for object surface points in the neighbourhood of each hand joint. As an important step towards mitigating the learning complexity, we transform the points from global frame to hand template frame and use a shared module to process sensor features of each individual joint. This is followed by a spatio-temporal transformer network aimed at capturing correlation among the joints in different dimensions. Moreover, we devise simple heuristic rules to augment the limited training sequences with vast static hand grasping samples. This leads to a broader spectrum of grasping types observed during training, in turn enhancing our model's generalization capability. We evaluate on two public datasets, GRAB and InterCap, where our method shows superiority over baselines both quantitatively and perceptually.

5/14/2024

Improving Detection in Aerial Images by Capturing Inter-Object Relationships

Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng

In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, the detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach to capture these inter-object relationships to refine classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals to be processed by a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvement on three benchmarks including DOTA-v1.0, DOTA-v1.5, and HRSC 2016, especially ranking first on both DOTA-v1.5 and HRSC 2016. Specifically, our new method has an increase of 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016, respectively, compared to the baselines.

4/8/2024

InterTrack: Tracking Human Object Interaction without Object Templates

Xianghui Xie, Jan Eric Lenssen, Gerard Pons-Moll

Tracking human object interaction from videos is important to understand human behavior from the rapidly growing stream of video data. Previous video-based methods require predefined object templates while single-image-based methods are template-free but lack temporal consistency. In this paper, we present a method to track human object interaction without any object shape templates. We decompose the 4D tracking problem into per-frame pose tracking and canonical shape optimization. We first apply a single-view reconstruction method to obtain temporally-inconsistent per-frame interaction reconstructions. Then, for the human, we propose an efficient autoencoder to predict SMPL vertices directly from the per-frame reconstructions, introducing temporally consistent correspondence. For the object, we introduce a pose estimator that leverages temporal information to predict smooth object rotations under occlusions. To train our model, we propose a method to generate synthetic interaction videos and synthesize in total 10 hour videos of 8.5k sequences with full 3D ground truth. Experiments on BEHAVE and InterCap show that our method significantly outperforms previous template-based video tracking and single-frame reconstruction methods. Our proposed synthetic video dataset also allows training video-based methods that generalize to real-world videos. Our code and dataset will be publicly released.

8/27/2024