MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Original paper: arXiv:2405.07090, published 5/14/2024 by Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen


Overview

This paper introduces a novel approach to automatically mine user interface (UI) data from Android apps using Large Language Models (LLMs).
Existing UI datasets are often outdated and noisy, with mismatches in their visual representation, which makes it hard to model UI understanding in the wild.
The proposed method leverages LLMs to mimic human-like exploration and employs best practices in UI noise filtering and human annotation to ensure dataset quality.
The result is MUD, a large dataset of about 18,000 human-annotated UIs from 3,300 apps. The authors show its usefulness on two common UI modeling tasks: element detection and UI retrieval.

Plain English Explanation

Designing good mobile user interfaces (UIs) is crucial, but it requires high-quality data to train the AI models that help with this task. Unfortunately, existing UI datasets are often outdated and messy, making it hard to accurately model how people interact with UIs in the real world.

To address this, the researchers in this paper developed a new way to automatically collect UI data from Android apps. They used large language models (LLMs), which are AI systems trained on vast amounts of text data, to mimic how a human would explore and interact with different apps. This allowed them to gather a large dataset of UI screenshots, while also using techniques to filter out low-quality or noisy data.

The end result is a new dataset called MUD, which contains about 18,000 UI screenshots from 3,300 different apps. Humans have carefully annotated the dataset to ensure it is high-quality and useful for training AI models. The researchers demonstrate how MUD can be used for two common UI-related tasks: detecting individual UI elements and retrieving similar UIs.

Overall, this work provides a new, modern, and well-curated dataset that can help advance research into understanding and modeling mobile user interfaces, which is crucial for building better and more usable mobile apps.

Technical Explanation

The key innovation in this paper is the use of LLMs to automate the process of gathering UI data from Android apps. Traditionally, creating high-quality UI datasets has been a labor-intensive task, requiring manual exploration and annotation of apps.

To address this, the researchers developed a system that uses LLMs to mimic human-like app exploration. LLMs are pretrained on large text corpora, so they can interpret the natural-language content of a screen and decide where to navigate next, much as a human user would. This automated exploration is combined with UI noise-filtering techniques that remove low-quality or irrelevant screenshots, so that the resulting dataset (MUD) is clean and useful for downstream tasks.
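
The exploration loop described above can be pictured with a toy sketch. It is not the authors' implementation. The `Screen` type, the prompt wording, and the `choose_action` callback are all hypothetical. `choose_action` stands in for a real LLM call, and the dictionary of screens stands in for a real Android device driven by an automation framework. The idea it shows is the key one: describe the current screen in natural language, let a language model pick the next element to tap, and record each newly reached UI.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Screen:
    """Hypothetical snapshot of one app screen: an id plus its clickable element labels."""
    screen_id: str
    elements: list[str]
    # Toy app model: element label -> id of the screen reached by tapping it.
    transitions: dict[str, str] = field(default_factory=dict)


def build_prompt(screen: Screen, visited: set[str]) -> str:
    """Describe the current screen in natural language for an LLM (hypothetical format)."""
    options = "\n".join(f"{i}: {label}" for i, label in enumerate(screen.elements))
    return (
        "You are exploring an Android app like a human user. "
        f"Screens seen so far: {len(visited)}.\n"
        f"Clickable elements:\n{options}\n"
        "Reply with the number of the element to tap."
    )


def explore(app: dict[str, Screen], start: str,
            choose_action: Callable[[str], int], max_steps: int = 20) -> list[str]:
    """Walk the app, asking `choose_action` (an LLM stand-in) which element to tap next."""
    visited: list[str] = []
    current = app[start]
    for _ in range(max_steps):
        if current.screen_id not in visited:
            visited.append(current.screen_id)  # record a newly mined UI
        if not current.elements:
            break  # dead end: nothing left to tap
        reply = choose_action(build_prompt(current, set(visited)))
        # Guard against out-of-range replies from the model.
        label = current.elements[reply % len(current.elements)]
        # Unknown transitions keep us on the same screen.
        current = app[current.transitions.get(label, current.screen_id)]
    return visited
```

In practice the callback would send the prompt to a language model and parse the number it returns. A scripted callback such as `lambda prompt: 0` is enough to exercise the loop offline.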

The dataset curation process also includes a final validation step where human annotators review the collected UI screenshots and provide additional metadata, such as labels for individual UI elements. This human-in-the-loop approach helps ensure the dataset meets high standards of quality and accuracy.
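
The hand-off from automatic noise filtering to human validation can be sketched as a triage step. The sketch assumes each screenshot comes with a view hierarchy of element bounding boxes. The validity rule and the 0.8 threshold are hypothetical heuristics chosen for illustration, not the filters used in the paper. UIs that pass are queued for human review rather than accepted outright, mirroring the final validation step described above.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One UI element from a view hierarchy; bounds are (left, top, right, bottom) in pixels."""
    label: str
    bounds: tuple[int, int, int, int]


def is_valid(el: Element, width: int, height: int) -> bool:
    """Hypothetical check: keep boxes with positive area that lie fully inside the screen."""
    left, top, right, bottom = el.bounds
    return 0 <= left < right <= width and 0 <= top < bottom <= height


def triage(elements: list[Element], width: int, height: int,
           min_valid_ratio: float = 0.8) -> tuple[str, list[Element]]:
    """Drop invalid elements, then decide whether the whole UI is worth a human's time.

    Returns ("discard", kept) when the hierarchy is empty or too many boxes are invalid
    (a sign the structure does not match the screenshot), else ("human_review", kept).
    The 0.8 threshold is an assumption for illustration.
    """
    kept = [el for el in elements if is_valid(el, width, height)]
    # `not elements` short-circuits, so the division never sees an empty list.
    if not elements or len(kept) / len(elements) < min_valid_ratio:
        return "discard", kept
    return "human_review", kept
```

Routing survivors to annotators, instead of trusting the heuristic alone, keeps the automatic filter cheap and conservative. The human pass then catches what it misses.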

The researchers demonstrate the usefulness of the MUD dataset through experiments on two common UI modeling tasks: element detection and UI retrieval. They show that models trained on MUD outperform those trained on existing datasets, highlighting the value of having a large, modern, and well-curated dataset for advancing research in this area.
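
Element detection is commonly scored by matching predicted boxes to annotated ones using intersection over union (IoU). The sketch below uses a greedy matcher at an IoU threshold of 0.5 to compute precision and recall. The threshold, the greedy strategy, and the assumption that predictions arrive sorted by confidence are common conventions adopted here. They are not necessarily the paper's evaluation protocol.

```python
Box = tuple[float, float, float, float]  # (left, top, right, bottom)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: list[Box], truths: list[Box],
                     threshold: float = 0.5) -> tuple[float, float]:
    """Greedily match each prediction (assumed sorted by confidence) to its best unmatched truth."""
    matched: set[int] = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, threshold
        for j, t in enumerate(truths):
            if j in matched:
                continue  # each ground-truth box can be matched at most once
            score = iou(p, t)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j >= 0:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall
```

For example, one correct box and one spurious box against two annotated elements gives a precision of 0.5 and a recall of 0.5.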

Critical Analysis

One potential limitation of this work is the reliance on LLMs, which can be opaque and difficult to interpret. While the researchers demonstrate the effectiveness of their LLM-based approach, there may be concerns about the reproducibility and transparency of the data collection process. It would be valuable to see further analysis or ablation studies to better understand the contribution of the LLM component compared to other aspects of the system.

Additionally, the paper does not provide a detailed analysis of the types of UI elements or app categories represented in the MUD dataset. It would be helpful to understand the diversity and distribution of the data, as this could impact the generalizability of the models trained on this dataset.

Finally, while the researchers mention the potential for MUD to support multimodal UI understanding and large-scale data manipulation, these applications are not explored in depth. Further research could investigate how MUD can be leveraged for these and other emerging areas of UI-related research.

Conclusion

This paper presents a novel approach to automatically mining high-quality UI data from Android apps using large language models. The resulting MUD dataset, containing about 18,000 human-annotated UI screenshots from 3,300 apps, has the potential to serve as a valuable resource for advancing research in mobile UI understanding and modeling. By addressing the challenges of outdated and noisy UI datasets, this work lays the foundation for developing more robust and effective AI-powered tools for designing and optimizing mobile user interfaces.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, these require a high-quality UI dataset. Existing datasets are often outdated, collected years ago, and are frequently noisy with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ the best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLMs-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset MUD of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.

5/14/2024

UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

Jason Wu, Eldon Schoop, Alan Leung, Titus Barik, Jeffrey P. Bigham, Jeffrey Nichols

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset using an original model, applying automated tools to aggressively filter, score, and de-duplicate the data into a refined higher quality dataset. The original LLM is improved by finetuning on this refined dataset. We applied our approach to several open-source LLMs and compared the resulting performance to baseline models with both automated metrics and human preferences. Our evaluation shows the resulting models outperform all other downloadable baselines and approach the performance of larger proprietary models.

6/13/2024


Predicting the usability of mobile applications using AI tools: the rise of large user interface models, opportunities, and challenges

Abdallah Namoun, Ahmed Alrehaili, Zaib Un Nisa, Hani Almoamari, Ali Tufail

This article proposes the so-called large user interface models (LUIMs) to enable the generation of user interfaces and prediction of usability using artificial intelligence in the context of mobile applications.

5/8/2024

UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset

Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven experienced designers. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.

8/15/2024