Prompt2Fashion: An automatically generated fashion dataset

Original paper: arXiv:2409.06442, published 9/16/2024 by Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou


Overview

Prompt2Fashion is a new fashion dataset generated automatically from text prompts
The dataset contains outfit images tailored to different occasions, styles, and body types, along with captions and other metadata
It was created to support research in multimodal fashion AI and personalized generation of fashion imagery

Plain English Explanation

Prompt2Fashion is a dataset of fashion images that were generated automatically from text descriptions, or "prompts." This means the researchers used AI systems to create the images based on the prompts, rather than having humans take and curate the photos.

The dataset includes not just the generated images, but also the text captions and other metadata associated with each one. This makes it a useful resource for researchers working on multimodal fashion AI - systems that can understand and generate both visual and textual content related to fashion.

The researchers created Prompt2Fashion to help advance the field of automatically generating fashion imagery using AI. Rather than having to manually create or curate a large dataset, they were able to generate diverse fashion images at scale by using text prompts as the input.

Technical Explanation

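The Prompt2Fashion dataset was built with existing large language models (LLMs) rather than newly trained ones. The researchers used several LLMs and prompting strategies to generate personalized outfit descriptions for the occasions, styles, and body types that users specify, aiming to meet the requirements of both expert and non-expert users. In closely related work by the same authors, these strategies include zero-shot and few-shot prompting, Chain-of-Thought (CoT), and Retrieval-Augmented Generation (RAG).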

Next, the researchers passed these descriptions to a text-to-image generation model to create the corresponding outfit images; the authors' related work uses a Stable Diffusion model for this step. Each image is paired with the text that produced it and other metadata. The dataset is publicly available on GitHub at https://github.com/georgiarg/Prompt2Fashion.
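The prompting step can be illustrated with a small sketch. It assumes that each user requirement is a combination of occasion, style, and body type, and it contrasts zero-shot prompting (instruction only) with few-shot prompting (instruction plus worked examples). The instruction wording, the attribute values, and the function names `build_request`, `zero_shot_prompt`, `few_shot_prompt`, and `all_requests` are hypothetical. This is not the authors' code.

```python
from itertools import product

# Hypothetical instruction text; the prompts actually used for Prompt2Fashion may differ.
INSTRUCTION = (
    "Describe a complete outfit (garments, colors, fabrics, accessories) "
    "that satisfies the requirements below."
)


def build_request(occasion: str, style: str, body_type: str) -> str:
    """Format one user requirement as a single line of text."""
    return f"Occasion: {occasion}; Style: {style}; Body type: {body_type}"


def zero_shot_prompt(request: str) -> str:
    """Zero-shot: the instruction and the request, with no worked examples."""
    return f"{INSTRUCTION}\n\n{request}\nOutfit:"


def few_shot_prompt(request: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: (request, outfit description) demonstrations precede the new request."""
    shots = "\n\n".join(f"{req}\nOutfit: {desc}" for req, desc in examples)
    return f"{INSTRUCTION}\n\n{shots}\n\n{request}\nOutfit:"


def all_requests(occasions: list[str], styles: list[str], body_types: list[str]) -> list[str]:
    """Enumerate every occasion/style/body-type combination."""
    return [build_request(o, s, b) for o, s, b in product(occasions, styles, body_types)]


if __name__ == "__main__":
    # Hypothetical attribute values, chosen only for illustration.
    requests = all_requests(["wedding", "job interview"], ["minimalist", "bohemian"], ["petite", "tall"])
    demo = [(build_request("beach party", "casual", "curvy"),
             "A flowing linen maxi dress in sand beige with flat leather sandals.")]
    print(few_shot_prompt(requests[0], demo))
```

In a full pipeline, each prompt would go to an LLM, and the returned description would then go to a text-to-image model. Those model calls are left out here because the exact models and APIs are not specified in this summary.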

The researchers assessed the generated outfits through qualitative analysis. So far, the evaluation has been carried out by non-expert human subjects and has provided fine-grained insights into the quality and relevance of the generations. The authors report personalized outfits with high aesthetic quality, detail, and relevance to users' requirements. They also argue that expert knowledge is important for evaluating artistic AI-generated datasets such as this one.
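One simple way to compare prompting strategies using such human judgments is to average ratings per strategy and criterion. The sketch below assumes each participant scores outputs on a 1 to 5 scale for named criteria such as "quality" and "relevance". The scale, the criterion names, and the helpers `summarize_ratings` and `rank_strategies` are hypothetical and do not describe the authors' actual evaluation protocol.

```python
from collections import defaultdict
from statistics import mean

# Assumed rating scale; the paper's questionnaire may use different criteria and scales.
MIN_SCORE, MAX_SCORE = 1, 5


def summarize_ratings(ratings):
    """Average human ratings per (prompting strategy, criterion).

    `ratings` is an iterable of (strategy, criterion, score) tuples,
    one tuple per answer given by a participant.
    """
    buckets = defaultdict(list)
    for strategy, criterion, score in ratings:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} outside {MIN_SCORE}-{MAX_SCORE}")
        buckets[(strategy, criterion)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def rank_strategies(summary, criterion):
    """Return strategies sorted from highest to lowest mean score on one criterion."""
    scored = [(strategy, avg) for (strategy, crit), avg in summary.items() if crit == criterion]
    return [strategy for strategy, _ in sorted(scored, key=lambda item: item[1], reverse=True)]
```

Averaging per criterion keeps aesthetic quality and relevance separate, which matches the fine-grained kind of feedback described above. Ratings from expert evaluators could be aggregated the same way and compared against the non-expert averages.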

Critical Analysis

The Prompt2Fashion dataset represents an innovative approach to creating large-scale fashion datasets, but it also has some potential limitations. Since the images are generated from text prompts rather than real-world photos, they may not fully capture the nuances and complexities of actual fashion items and outfits.

Additionally, the text prompts used to generate the images could reflect biases or stereotypes present in the language models' training data, which could leave the dataset with unrealistic or unrepresentative fashion imagery. The abstract does not describe a dedicated bias analysis, so this remains an open question for future work.

Another potential concern is the environmental impact of the computation required to generate such a dataset. While this approach may be more efficient than manual curation, the energy use and carbon footprint of running large language and image generation models should be carefully considered.

Despite these caveats, Prompt2Fashion represents an exciting advancement in the field of automatically generating fashion imagery and multimodal fashion AI. As the technology continues to evolve, it will be important for researchers to address potential biases and environmental concerns while leveraging the scalability and diversity that this approach enables.

Conclusion

The Prompt2Fashion dataset represents a significant advancement in the field of automatically generated fashion imagery and multimodal fashion AI. By leveraging large language models and text-to-image generation, the researchers were able to create a diverse and scalable dataset of fashion-related images, captions, and metadata.

This resource has the potential to support further work on personalized and inclusive fashion generation across occasions, styles, and body types, and on how AI-generated artistic datasets should be evaluated. However, the research community will need to carefully consider and address potential biases and environmental concerns as this technology continues to evolve.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Prompt2Fashion: An automatically generated fashion dataset

Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou

Despite the rapid evolution and increasing efficacy of language and vision generative models, there remains a lack of comprehensive datasets that bridge the gap between personalized fashion needs and AI-driven design, limiting the potential for truly inclusive and customized fashion solutions. In this work, we leverage generative models to automatically construct a fashion image dataset tailored to various occasions, styles, and body types as instructed by users. We use different Large Language Models (LLMs) and prompting strategies to offer personalized outfits of high aesthetic quality, detail, and relevance to both expert and non-expert users' requirements, as demonstrated by qualitative analysis. Up until now the evaluation of the generated outfits has been conducted by non-expert human subjects. Despite the provided fine-grained insights on the quality and relevance of generation, we extend the discussion on the importance of expert knowledge for the evaluation of artistic AI-generated datasets such as this one. Our dataset is publicly available on GitHub at https://github.com/georgiarg/Prompt2Fashion.

9/16/2024

Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models

Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou

The advent of artificial intelligence has contributed to a groundbreaking transformation of the fashion industry, redefining creativity and innovation in unprecedented ways. This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation. Emphasizing adaptability in AI-driven fashion creativity, we depart from traditional approaches and focus on prompting techniques, such as zero-shot and few-shot learning, as well as Chain-of-Thought (CoT), which results in a variety of colors and textures, enhancing the diversity of the outputs. Central to our methodology is Retrieval-Augmented Generation (RAG), enriching models with insights from fashion sources to ensure contemporary representations. Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles. Among the participants, RAG and few-shot learning techniques are preferred for their ability to produce more relevant and appealing fashion descriptions. Our code is provided at https://github.com/georgiarg/AutoFashion.

7/23/2024

Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

Matthew Zheng, Enis Simsar, Hidir Yesiltepe, Federico Tombari, Joel Simon, Pinar Yanardag

Text-to-image models are becoming increasingly popular, revolutionizing the landscape of digital art creation by enabling highly detailed and creative visual content generation. These models have been widely employed across various domains, particularly in art generation, where they facilitate a broad spectrum of creative expression and democratize access to artistic creation. In this paper, we introduce STYLEBREEDER, a comprehensive dataset of 6.8M images and 1.8M prompts generated by 95K users on Artbreeder, a platform that has emerged as a significant hub for creative exploration with over 13M users. We introduce a series of tasks with this dataset aimed at identifying diverse artistic styles, generating personalized content, and recommending styles based on user interests. By documenting unique, user-generated styles that transcend conventional categories like 'cyberpunk' or 'Picasso,' we explore the potential for unique, crowd-sourced styles that could provide deep insights into the collective creative psyche of users worldwide. We also evaluate different personalization methods to enhance artistic expression and introduce a style atlas, making these models available in LoRA format for public use. Our research demonstrates the potential of text-to-image diffusion models to uncover and promote unique artistic expressions, further democratizing AI in art and fostering a more diverse and inclusive artistic community. The dataset, code and models are available at https://stylebreeder.github.io under a Public Domain (CC0) license.

6/24/2024

UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation

Xiangyu Zhao, Yuehan Zhang, Wenlong Zhang, Xiao-Ming Wu

The fashion domain encompasses a variety of real-world multimodal tasks, including multimodal retrieval and multimodal generation. The rapid advancements in artificial intelligence generated content, particularly in technologies like large language models for text generation and diffusion models for visual generation, have sparked widespread research interest in applying these multimodal models in the fashion domain. However, tasks involving embeddings, such as image-to-text or text-to-image retrieval, have been largely overlooked from this perspective due to the diverse nature of the multimodal fashion domain. Moreover, current research on multi-task single models lacks focus on image generation. In this work, we present UniFashion, a unified framework that simultaneously tackles the challenges of multimodal generation and retrieval tasks within the fashion domain, integrating image generation with retrieval tasks and text generation tasks. UniFashion unifies embedding and generative tasks by integrating a diffusion model and LLM, enabling controllable and high-fidelity generation. Our model significantly outperforms previous single-task state-of-the-art models across diverse fashion tasks, and can be readily adapted to manage complex vision-language tasks. This work demonstrates the potential learning synergy between multimodal generation and retrieval, offering a promising direction for future research in the fashion domain. The source code is available at https://github.com/xiangyu-mm/UniFashion.

8/22/2024