Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space

Read original: arXiv:2403.11469 - Published 7/24/2024 by Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu

Overview

The paper proposes MotionS, a generative motion stylization pipeline that synthesizes diverse, stylized motion for characters with different skeleton structures, driven by cross-modality style prompts.
It introduces a "canonical motion space" in which motion for different skeleton structures can be stylized within one unified framework.
Style is embedded in a latent space built with the pre-trained CLIP model, and a diffusion model generates and stylizes the motion, which allows cross-structure motion generation.

Plain English Explanation

The paper presents a way to take the "style" of one motion sequence and apply it to another motion sequence. For example, you could take the smooth, graceful movements of a ballet dancer and apply them to a character in a video game, making their animations look more fluid and elegant.

The key idea is to create a "canonical motion space" - a way to represent different motion styles in a standardized format. The method uses a neural network to learn how to translate between this canonical space and the specific motion styles. This allows the system to transfer motion styles across different characters or structures, like applying a human running animation to a four-legged animal.

Technical Explanation

The paper stylizes motion within a "Canonical Motion Space" (CMS) that is shared across skeletons. According to the abstract, two topology-encoded tokens are learned to capture the canonical and the character-specific skeleton topologies, which lets the model shift between the two.

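The pipeline embeds motion style into a cross-modality latent space built on the pre-trained CLIP model, so a style can be described by prompts in more than one modality. A topology-shifted stylization diffusion model then generates motion content for the target skeleton and stylizes it in the CMS. Conceptually, motion is mapped into the shared space, stylized there, and mapped back to the specific skeleton.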
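The three-step flow described above (map into the canonical space, apply the style, map back to a specific skeleton) can be illustrated with a toy sketch. This is not the paper's model: the joint-to-slot tables, the one-scalar-per-joint pose format, and the additive style offset are all assumptions made for illustration, standing in for the learned networks.

```python
from typing import List

Pose = List[float]    # one scalar feature per joint (simplifying assumption)
Motion = List[Pose]   # a sequence of poses over time

# Hypothetical joint-to-slot tables: every joint of a skeleton is assigned to
# one slot of a shared 3-slot canonical layout (0 = spine, 1 = front, 2 = hind).
NUM_SLOTS = 3
HUMAN_SLOTS = [0, 1, 1, 2, 2]          # toy 5-joint biped
QUADRUPED_SLOTS = [0, 0, 1, 1, 2, 2]   # toy 6-joint quadruped


def encode(motion: Motion, slots: List[int]) -> Motion:
    """Average the joints that share a slot, giving one canonical pose per frame."""
    canonical = []
    for pose in motion:
        sums = [0.0] * NUM_SLOTS
        counts = [0] * NUM_SLOTS
        for value, slot in zip(pose, slots):
            sums[slot] += value
            counts[slot] += 1
        # Slots with no assigned joint stay at 0.0 instead of dividing by zero.
        canonical.append([s / c if c else 0.0 for s, c in zip(sums, counts)])
    return canonical


def stylize(canonical: Motion, style_offset: List[float], strength: float = 1.0) -> Motion:
    """Shift every canonical frame by a style offset (a stand-in for a learned stylizer)."""
    return [[v + strength * o for v, o in zip(frame, style_offset)] for frame in canonical]


def decode(canonical: Motion, slots: List[int]) -> Motion:
    """Copy each canonical slot back to every joint assigned to it."""
    return [[frame[slot] for slot in slots] for frame in canonical]


human_walk = [[0.0, 1.0, 3.0, 2.0, 4.0]]                 # one frame, five joints
canonical = encode(human_walk, HUMAN_SLOTS)              # [[0.0, 2.0, 3.0]]
styled = stylize(canonical, [0.5, 0.0, -1.0])            # [[0.5, 2.0, 2.0]]
print(decode(styled, QUADRUPED_SLOTS))                   # [[0.5, 0.5, 2.0, 2.0, 2.0, 2.0]]
```

Because the style is applied only to the canonical frames, the same `stylize` call serves any skeleton that has a slot table, which is the property the canonical space is meant to provide.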

The network is trained on a dataset of motion capture data spanning a variety of motion styles. It learns to encode the input motions into the CMS representation, and then decode this representation to generate new motions with a desired style.
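As a rough illustration of what "learning to encode and then decode" means, the sketch below fits a one-weight encoder and a one-weight decoder by gradient descent on reconstruction error. It is a deliberately tiny stand-in, not the paper's training procedure: the scalar samples, the starting weights, the learning rate, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def train_autoencoder(samples: List[float], lr: float = 0.05, steps: int = 200) -> Tuple[float, float]:
    """Fit z = w_enc * x and x_hat = w_dec * z by minimizing mean squared error."""
    w_enc, w_dec = 0.5, 0.5  # arbitrary starting weights (assumption)
    n = len(samples)
    for _ in range(steps):
        grad_enc = 0.0
        grad_dec = 0.0
        for x in samples:
            z = w_enc * x              # encode into the latent value
            error = w_dec * z - x      # reconstruction error for this sample
            grad_dec += 2.0 * error * z / n
            grad_enc += 2.0 * error * w_dec * x / n
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


def reconstruction_error(samples: List[float], w_enc: float, w_dec: float) -> float:
    """Mean squared difference between each sample and its decoded reconstruction."""
    return sum((w_dec * w_enc * x - x) ** 2 for x in samples) / len(samples)


samples = [0.5, -1.0, 1.5, 2.0]   # toy stand-in for motion features
w_enc, w_dec = train_autoencoder(samples)
print(reconstruction_error(samples, w_enc, w_dec))
```

After training, the product `w_enc * w_dec` approaches 1, meaning that decoding an encoded sample returns it nearly unchanged. A real motion model replaces the two scalars with neural networks and the samples with motion sequences.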

Key technical contributions include the CMS formulation, the neural network architecture for style transfer, and experiments demonstrating the ability to transfer styles across different motion structures (e.g. from human to quadruped).

Critical Analysis

The paper demonstrates compelling results for motion style transfer, but there are a few potential limitations:

The method relies on having a diverse dataset of motion capture data spanning many styles. In practice, such comprehensive datasets may not always be available.
Although the pipeline targets cross-structure characters, transferring styles across vastly different structures (e.g. human to bird) may be more challenging.
The abstract reports both qualitative and quantitative comparisons against state-of-the-art methods but does not name the metrics, so it is unclear from the abstract how well they capture the fidelity and authenticity of the generated motions.

Further research could explore ways to address these limitations, such as few-shot style transfer for settings where style data is scarce. Additionally, incorporating physical and dynamic constraints could help produce more realistic and natural-looking motions.

Conclusion

This paper presents a novel approach to generative motion stylization, introducing the concept of a "Canonical Motion Space" to enable cross-style and cross-structure motion generation. The results demonstrate the potential for this technique to enhance the expressiveness and diversity of animated motions, with applications in areas like video games, virtual reality, and digital content creation.

While the method has some limitations, the core ideas and technical contributions are significant and could inspire further advances in motion synthesis and style transfer. As AI-powered animation continues to evolve, techniques like this one could help create more engaging and lifelike virtual experiences.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space

Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu

Stylized motion breathes life into characters. However, the fixed skeleton structure and style representation hinder existing data-driven motion synthesis methods from generating stylized motion for various characters. In this work, we propose a generative motion stylization pipeline, named MotionS, for synthesizing diverse and stylized motion on cross-structure characters using cross-modality style prompts. Our key insight is to embed motion style into a cross-modality latent space and perceive the cross-structure skeleton topologies, allowing for motion stylization within a canonical motion space. Specifically, the large-scale Contrastive-Language-Image-Pre-training (CLIP) model is leveraged to construct the cross-modality latent space, enabling flexible style representation within it. Additionally, two topology-encoded tokens are learned to capture the canonical and specific skeleton topologies, facilitating cross-structure topology shifting. Subsequently, the topology-shifted stylization diffusion is designed to generate motion content for the particular skeleton and stylize it in the shifted canonical motion space using multi-modality style descriptions. Through an extensive set of examples, we demonstrate the flexibility and generalizability of our pipeline across various characters and style descriptions. Qualitative and quantitative comparisons show the superiority of our pipeline over state-of-the-arts, consistently delivering high-quality stylized motion across a broad spectrum of skeletal structures.

7/24/2024

SMooDi: Stylized Motion Diffusion Model

Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang

We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-motion model for stylization. Specifically, we propose style guidance to ensure that the generated motion closely matches the reference style, alongside a lightweight style adaptor that directs the motion towards the desired style while ensuring realism. Experiments across various applications demonstrate that our proposed framework outperforms existing methods in stylized motion generation.

7/18/2024

On-the-fly Learning to Transfer Motion Style with Diffusion Models: A Semantic Guidance Approach

Lei Hu, Zihao Zhang, Yongjing Ye, Yiwen Xu, Shihong Xia

3D Human motion style transfer is a fundamental problem in computer graphic and animation processing. Existing AdaIN-based methods necessitate datasets with balanced style distribution and content/style labels to train the clustered latent space. However, we may encounter a single unseen style example in practical scenarios, but not in sufficient quantity to constitute a style cluster for AdaIN-based methods. Therefore, in this paper, we propose a novel two-stage framework for few-shot style transfer learning based on the diffusion model. Specifically, in the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior so that it can cope with various content motion inputs. In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer. The key idea is regarding the reverse process of diffusion as a motion-style translation process since the motion styles can be viewed as special motion variations. During the fine-tuning for style transfer, a simple yet effective semantic-guided style transfer loss coordinated with style example reconstruction loss is introduced to supervise the style transfer in CLIP semantic space. The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications.

8/9/2024

WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds

Peizhuo Li, Sebastian Starke, Yuting Ye, Olga Sorkine-Hornung

We present a new approach for understanding the periodicity structure and semantics of motion datasets, independently of the morphology and skeletal structure of characters. Unlike existing methods using an overly sparse high-dimensional latent, we propose a phase manifold consisting of multiple closed curves, each corresponding to a latent amplitude. With our proposed vector quantized periodic autoencoder, we learn a shared phase manifold for multiple characters, such as a human and a dog, without any supervision. This is achieved by exploiting the discrete structure and a shallow network as bottlenecks, such that semantically similar motions are clustered into the same curve of the manifold, and the motions within the same component are aligned temporally by the phase variable. In combination with an improved motion matching framework, we demonstrate the manifold's capability of timing and semantics alignment in several applications, including motion retrieval, transfer and stylization. Code and pre-trained models for this paper are available at https://peizhuoli.github.io/walkthedog.

7/30/2024