TMElyralab

Models by this creator

🎲

lyraChatGLM

TMElyralab

Total Score

108

lyraChatGLM is the fastest available version of the ChatGLM-6B model. It has achieved a 300x acceleration over the original model through various optimizations. The model uses the original ChatGLM-6B weights released by THUDM and is designed to run on NVIDIA GPUs with Ampere or Volta architecture, such as the A100, A10, and V100. The maximum batch size supported by lyraChatGLM is 256 on the A100, a significant improvement over the original model.

Model inputs and outputs

Inputs
- Text prompts for conversational interactions

Outputs
- Responses to the provided text prompts, generated in a conversational style

Capabilities

lyraChatGLM has been further optimized to reach speeds of up to 9000 tokens/s on the A100 and 3900 tokens/s on the V100, around 5.5x faster than the latest official version. Memory usage has also been optimized, allowing a batch size of up to 256 on the A100.

What can I use it for?

The maintainer's description indicates that lyraChatGLM is suitable for a wide range of conversational AI applications. Its high performance and low memory requirements make it an attractive option for deploying large language models in production. Companies or individuals working on chatbots, virtual assistants, or other conversational AI projects may find lyraChatGLM a valuable tool.

Things to try

One interesting aspect of lyraChatGLM is its support for INT8 weight-only post-training quantization (PTQ). This allows further memory and performance optimizations, which can be useful for deploying the model on lower-end hardware or in resource-constrained environments.
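lyraChatGLM ships its own accelerated runtime, so its loader and call signatures differ from the stock release; as a stand-in, the sketch below drives the original THUDM ChatGLM-6B weights through the standard transformers interface to show the prompt-in / response-out contract that lyraChatGLM accelerates. This is not the lyraChatGLM API; consult the TMElyralab repository for the actual accelerated loader.

```python
# Stand-in sketch: the stock ChatGLM-6B conversational interface via transformers.
# lyraChatGLM exposes the same prompt/response contract through its own faster
# engine; see the TMElyralab/lyraChatGLM repository for the accelerated loader.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# Single-turn chat: the model returns a response plus the updated history.
response, history = model.chat(tokenizer, "Briefly introduce yourself.", history=[])
print(response)

# Multi-turn chat reuses the history returned by the previous call.
response, history = model.chat(tokenizer, "Now say it in one sentence.", history=history)
print(response)
```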

Updated 5/28/2024

🔄

MuseV

TMElyralab

Total Score

83

MuseV is a diffusion-based virtual human video generation framework developed by TMElyralab. It supports infinite-length, high-fidelity virtual human video generation using a novel Visual Conditioned Parallel Denoising scheme. The model is compatible with the Stable Diffusion ecosystem, including base models, LoRAs, and ControlNets, and supports multi-reference-image techniques such as IPAdapter, ReferenceOnly, ReferenceNet, and IPAdapterFaceID. Similar models like I2VGen-XL and text-to-video-ms-1.7b from Ali-ViLab also focus on high-quality video generation, but MuseV is designed specifically for virtual human video generation.

Model inputs and outputs

Inputs
- Image: the model can take an image as input and generate a virtual human video based on it.
- Text: the model can generate a virtual human video from a text prompt describing the desired content.
- Video: the model can take a video as input and generate a new virtual human video based on it.

Outputs
- Virtual human video: a video that matches the input image, text, or video.

Capabilities

MuseV can generate virtual human videos of unlimited length with high fidelity. Its parallel denoising scheme lets it generate long videos without the artifacts or discontinuities typical of other video generation models. Compatibility with the Stable Diffusion ecosystem enables versatile applications, such as conditioning on various visual cues or adapting the model to specific domains through techniques like LoRA.

What can I use it for?

MuseV can be useful for applications such as virtual character animation, interactive virtual experiences, and content creation for games, films, or marketing. Its ability to generate high-quality virtual human videos is particularly valuable in industries like entertainment, gaming, and advertising, where realistic virtual characters are in high demand.

Things to try

One interesting aspect of MuseV is its ability to generate virtual human videos of unbounded length, which is useful for long-form virtual experiences or narratives. Exploring the model's compatibility with Stable Diffusion techniques like LoRA and ControlNet could also lead to interesting customizations for specific use cases.
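MuseV provides its own pipeline and inference scripts, so the sketch below is not the MuseV API; it only shows, with vanilla diffusers, the kind of Stable Diffusion base-model + ControlNet + LoRA stack the description says MuseV is compatible with. The checkpoint names and the LoRA path are illustrative assumptions.

```python
# Illustrative diffusers sketch of an SD base model combined with a ControlNet
# and an optional LoRA -- the ecosystem components MuseV is compatible with.
# This is NOT MuseV's own pipeline; see the TMElyralab/MuseV repo for that.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Pose-conditioning ControlNet on top of a Stable Diffusion 1.5 base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Optional domain adaptation via a LoRA (path is a placeholder).
# pipe.load_lora_weights("path/to/your_character_lora.safetensors")

pose_image = load_image("pose_reference.png")  # an OpenPose skeleton image
frame = pipe(
    "a virtual human dancing in a studio, high quality",
    image=pose_image,
    num_inference_steps=30,
).images[0]
frame.save("frame.png")
```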

Updated 5/28/2024

⛏️

MusePose

TMElyralab

Total Score

60

MusePose is an image-to-video generation framework for virtual human characters. It can generate dance videos of a human character in a reference image under a given pose sequence. The model builds upon previous work such as AnimateAnyone and Moore-AnimateAnyone, with several key improvements. The maintainers, TMElyralab, have released the model and pretrained checkpoints, and plan to continue enhancing it with an improved "pose align" algorithm and model architecture.

Model inputs and outputs

Inputs
- A reference image of a human character
- A sequence of poses to drive the character's movement

Outputs
- A video of the human character in the reference image performing the specified poses

Capabilities

MusePose can generate high-quality dance videos of a virtual human character, exceeding the performance of many existing open-source models in this domain. The "pose align" algorithm lets users align arbitrary dance videos to arbitrary reference images, significantly improving inference performance and usability.

What can I use it for?

MusePose can be used to create virtual dance performances, animated videos, and other applications where a human character needs to be generated and driven by a sequence of poses. This could be useful for game development, film and TV production, social media content creation, and more. By combining MusePose with other models like MuseV and MuseTalk, the community can work toward generating fully animated, interactive virtual humans.

Things to try

One interesting aspect of MusePose is the ability to align arbitrary dance videos to reference images, which allows creative mixing and matching of different dance styles and character models. Exploring the limits of the model's pose generation, such as more complex or dynamic movements, could also lead to new and compelling virtual human animations.
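MusePose's own scripts consume a reference image plus a pose sequence; the sketch below is not part of MusePose. It only shows one plausible way to turn a driving dance video into per-frame OpenPose skeleton images using the off-the-shelf controlnet_aux detector, and the file paths are placeholders.

```python
# Illustrative preprocessing only: extract per-frame OpenPose skeletons from a
# driving dance video. MusePose itself would then consume a reference image
# plus this pose sequence; see the TMElyralab/MusePose repo for its scripts.
import os

import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

cap = cv2.VideoCapture("dance_driving_video.mp4")  # placeholder path
pose_frames = []
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    pose_frames.append(detector(frame_rgb))  # PIL image of the pose skeleton
cap.release()

os.makedirs("poses", exist_ok=True)
for i, pose in enumerate(pose_frames):
    pose.save(f"poses/frame_{i:05d}.png")
```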

Updated 7/8/2024

👀

MuseTalk

TMElyralab

Total Score

56

MuseTalk is a real-time, high-quality audio-driven lip-syncing model developed by TMElyralab. It can be applied to input videos, such as those generated by MuseV, to create a complete virtual human solution. The model is trained in the latent space of ft-mse-vae and can modify an unseen face according to the input audio, operating on a face region of 256 x 256. MuseTalk supports audio in various languages, including Chinese, English, and Japanese, and can run in real time at 30 fps+ on an NVIDIA Tesla V100 GPU.

Model inputs and outputs

Inputs
- Audio in various languages (e.g., Chinese, English, Japanese)
- A face region of size 256 x 256

Outputs
- A modified face region with lip movements synchronized to the input audio

Capabilities

MuseTalk can generate realistic lip-synced animations in real time, making it a powerful tool for creating virtual human experiences. The model supports modification of the center point of the face region, which significantly affects the generation results. A checkpoint trained on the HDTF dataset is also available.

What can I use it for?

MuseTalk can be used to bring static images or videos to life by animating the subjects' lips in sync with the audio. This is particularly useful for creating virtual avatars, dubbing videos, or enhancing the realism of computer-generated characters. The model's real-time capabilities make it suitable for live applications, such as virtual presentations or interactive experiences.

Things to try

Experiment with MuseTalk by animating the lips of various subjects, from famous portraits to your own photos. Try adjusting the center point of the face region to see how it affects the generation results. You can also integrate MuseTalk with other virtual human solutions, such as MuseV, for a complete virtual human pipeline.
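MuseTalk's inference is driven by its repository scripts; the sketch below is not that API. It only illustrates preparing the two inputs the description mentions, an audio waveform and a 256 x 256 face crop. The face bounding box, file paths, and sample rate are placeholder assumptions.

```python
# Illustrative input preparation only: load the driving audio and cut a
# 256 x 256 face region from a video frame, the two inputs MuseTalk works on.
# The bounding box is a hard-coded placeholder; MuseTalk's own scripts handle
# face detection, cropping, and the actual lip-sync generation.
import cv2
import librosa

# Driving audio (any supported language); 16 kHz mono is a common choice.
audio, sample_rate = librosa.load("speech.wav", sr=16000, mono=True)

# Grab one frame from the source video and crop the face region.
cap = cv2.VideoCapture("talking_head.mp4")
ok, frame = cap.read()
cap.release()
assert ok, "could not read a frame from the video"

x, y, w, h = 120, 80, 300, 300               # placeholder face bounding box
face_crop = frame[y:y + h, x:x + w]
face_256 = cv2.resize(face_crop, (256, 256))  # the size MuseTalk operates on

cv2.imwrite("face_256.png", face_256)
print(f"audio samples: {audio.shape[0]}, sample rate: {sample_rate}")
```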

Updated 7/31/2024
