SmartFlow: Robotic Process Automation using LLMs

Read original: arXiv:2405.12842 - Published 5/22/2024 by Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig, Gautam Shroff

Overview

Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities.
Traditional RPA systems rely on pixel-level encoding, which makes them brittle: a workflow can break when the user interface or the input data changes.
The paper presents SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) and deep learning-based image understanding to adapt to new scenarios without human intervention.

Plain English Explanation

SmartFlow is an AI-powered system designed to automate a wide range of business processes, such as form filling, customer service, invoice processing, and back-office operations. Unlike traditional RPA systems that rely on rigid pixel-level encoding, SmartFlow uses a combination of computer vision and natural language processing to understand the graphical user interface (GUI) and generate the necessary actions to complete a task.

The key innovation in SmartFlow is its ability to adapt to changes in the user interface and variations in input data without requiring human intervention. This is achieved by leveraging pre-trained large language models (LLMs) and deep learning-based image understanding.
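To make the contrast with pixel-level encoding concrete, here is a minimal sketch, not taken from the paper, of why a recorded coordinate stops working after a layout change while a lookup by visible label still succeeds. The `Element` type, the two layouts, and both helper functions are hypothetical names invented for illustration, and exact-coordinate matching is a deliberate simplification of how a replayed click behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    label: str  # visible text on or next to the control
    x: int
    y: int


# Two hypothetical layouts of the same form; only the positions differ.
LAYOUT_V1 = [Element("Name", 100, 50), Element("Email", 100, 90), Element("Submit", 100, 140)]
LAYOUT_V2 = [Element("Email", 300, 40), Element("Name", 300, 80), Element("Submit", 520, 80)]


def element_at(layout, x, y):
    """Return the element located exactly at (x, y), or None if nothing is there."""
    for el in layout:
        if (el.x, el.y) == (x, y):
            return el
    return None


def find_by_label(layout, label):
    """Return the first element whose visible label matches, ignoring case."""
    for el in layout:
        if el.label.lower() == label.lower():
            return el
    return None


# A recorded script stores the coordinates of "Submit" as captured on LAYOUT_V1.
recorded_click = (100, 140)

# Replaying the coordinates on the new layout hits nothing.
print(element_at(LAYOUT_V2, *recorded_click))

# Looking the control up by its label still finds it at its new position.
print(find_by_label(LAYOUT_V2, "Submit"))
```

The label lookup stands in for the much richer perception step that SmartFlow performs; the point is only that an action addressed by what an element is, rather than where it was, survives a rearranged screen.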

Instead of simply recording and replaying a series of mouse clicks and keyboard inputs, SmartFlow first perceives the visible elements on the GUI and converts them into a textual representation. This information is then used by the LLMs to generate a sequence of actions that are executed by a scripting engine to complete the assigned task.
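The step of turning perceived GUI elements into text for an LLM could look roughly like the sketch below. The summary does not specify the actual format SmartFlow uses, so the `UIElement` fields, the line format, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    kind: str        # hypothetical element type, e.g. "textbox" or "button"
    label: str       # visible text associated with the element
    x: int
    y: int
    value: str = ""  # current content, if any


def screen_to_text(elements):
    """Serialize elements in reading order: top to bottom, then left to right."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    lines = []
    for i, e in enumerate(ordered, start=1):
        value = f' value="{e.value}"' if e.value else ""
        lines.append(f'[{i}] {e.kind} "{e.label}"{value}')
    return "\n".join(lines)


def build_prompt(task, elements):
    """Combine the task description and the screen text into one LLM prompt."""
    return (
        "You are automating a GUI.\n"
        f"Task: {task}\n"
        "Visible elements:\n"
        f"{screen_to_text(elements)}\n"
        "Reply with one action per line."
    )


# Elements as a detector might return them, in no particular order.
screen = [
    UIElement("button", "Submit", 100, 140),
    UIElement("textbox", "Name", 100, 50),
    UIElement("textbox", "Email", 100, 90, value="a@example.com"),
]
print(build_prompt("Enter the name 'Alice' and submit the form", screen))
```

Sorting by position before serializing gives the model a stable, human-like reading order regardless of the order in which the vision component reports the elements.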

This approach applies the task-solving capabilities of LLMs to screen-based work, letting SmartFlow automate a wider range of business processes than traditional RPA systems. By automating more screen-based workflows, SmartFlow can help organizations increase productivity and efficiency.

Technical Explanation

The key components of SmartFlow are:

Computer Vision: SmartFlow uses deep learning-based image understanding to perceive and extract textual information from the graphical user interface (GUI).
Natural Language Processing: The textual information extracted from the GUI is then processed by pre-trained large language models (LLMs) to generate a sequence of actions required to complete the task.
Scripting Engine: The actions generated by the LLMs are executed by a scripting engine to automate the business process.
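The hand-off from the LLM to the scripting engine can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the `CLICK`/`TYPE` action syntax, the `FakeScreen` driver, and the `fake_llm` stub (which returns a canned reply instead of calling a model) are all hypothetical.

```python
import re

# Hypothetical action grammar: CLICK("label") or TYPE("label", "text").
ACTION_RE = re.compile(r'^(CLICK|TYPE)\("([^"]+)"(?:,\s*"([^"]*)")?\)$')


def parse_actions(llm_output):
    """Parse one action per line and reject anything outside the grammar."""
    actions = []
    for raw in llm_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ACTION_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized action: {line!r}")
        verb, target, text = match.groups()
        if (verb == "TYPE") != (text is not None):
            raise ValueError(f"wrong arguments for {verb}: {line!r}")
        actions.append((verb, target, text))
    return actions


class FakeScreen:
    """In-memory stand-in for a real GUI driver."""

    def __init__(self, fields, buttons):
        self.fields = {name: "" for name in fields}
        self.buttons = set(buttons)
        self.clicked = []

    def type(self, label, text):
        if label not in self.fields:
            raise KeyError(label)
        self.fields[label] = text

    def click(self, label):
        if label not in self.buttons:
            raise KeyError(label)
        self.clicked.append(label)


def run(actions, screen):
    """Execute parsed actions in order against the screen driver."""
    for verb, target, text in actions:
        if verb == "TYPE":
            screen.type(target, text)
        else:
            screen.click(target)


def fake_llm(prompt):
    """Stub that returns a canned action sequence instead of calling a model."""
    return 'TYPE("Name", "Alice")\nTYPE("Email", "alice@example.com")\nCLICK("Submit")'


screen = FakeScreen(fields=["Name", "Email"], buttons=["Submit"])
run(parse_actions(fake_llm("task and screen text go here")), screen)
print(screen.fields, screen.clicked)
```

Validating the model's output against a small grammar before executing it is one reasonable design choice here: a malformed or unexpected line raises an error instead of reaching the GUI.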

To assess the effectiveness of SmartFlow, the researchers developed a dataset that includes a set of generic enterprise applications with diverse layouts. This dataset is being released for research use.

The evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications, allowing it to automate a wide range of business processes.

Critical Analysis

The researchers acknowledge that while SmartFlow represents a significant advancement in RPA technology, there are still some limitations and areas for further research:

Handling Complex Workflows: The paper focuses on automating relatively simple business processes, and it remains to be seen how well SmartFlow can handle more complex, multi-step workflows.
Contextual Understanding: The current system relies primarily on textual information and may struggle with tasks that require deeper contextual understanding or reasoning about the overall purpose and logic of the application.
Generalization to New Applications: The evaluation was performed on a limited set of enterprise applications, and further research is needed to assess SmartFlow's ability to generalize to a broader range of software systems.

Additionally, it would be interesting to see how SmartFlow compares to other AI-powered automation solutions and LLM-facilitated workflows in terms of performance, adaptability, and real-world deployment.

Conclusion

SmartFlow advances Robotic Process Automation (RPA) by combining pre-trained large language models with deep learning-based image understanding to automate a wide range of business processes. This lets SmartFlow adapt to changes in user interfaces and input data without human intervention, making it more flexible and scalable than traditional RPA systems.

The public release of the evaluation dataset and the demonstrated robustness of SmartFlow across diverse enterprise applications suggest that this technology has the potential to enhance productivity and efficiency for organizations by automating an even larger fraction of screen-based workflows.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

SmartFlow: Robotic Process Automation using LLMs

Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig, Gautam Shroff

Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at https://smartflow-4c5a0a.webflow.io/.

5/22/2024

FlowMind: Automatic Workflow Generation with LLMs

Zhen Zeng, William Watson, Nicole Cho, Saba Rahimi, Shayleen Reynolds, Tucker Balch, Manuela Veloso

The rapidly evolving field of Robotic Process Automation (RPA) has made significant strides in automating repetitive processes, yet its effectiveness diminishes in scenarios requiring spontaneous or unpredictable tasks demanded by users. This paper introduces a novel approach, FlowMind, leveraging the capabilities of Large Language Models (LLMs) such as Generative Pretrained Transformer (GPT), to address this limitation and create an automatic workflow generation system. In FlowMind, we propose a generic prompt recipe for a lecture that helps ground LLM reasoning with reliable Application Programming Interfaces (APIs). With this, FlowMind not only mitigates the common issue of hallucinations in LLMs, but also eliminates direct interaction between LLMs and proprietary data or code, thus ensuring the integrity and confidentiality of information - a cornerstone in financial services. FlowMind further simplifies user interaction by presenting high-level descriptions of auto-generated workflows, enabling users to inspect and provide feedback effectively. We also introduce NCEN-QA, a new dataset in finance for benchmarking question-answering tasks from N-CEN reports on funds. We used NCEN-QA to evaluate the performance of workflows generated by FlowMind against baseline and ablation variants of FlowMind. We demonstrate the success of FlowMind, the importance of each component in the proposed lecture recipe, and the effectiveness of user interaction and feedback in FlowMind.

4/23/2024

AutoFlow: Automated Workflow Generation for Large Language Model Agents

Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages the ability of LLM as well as external tools for complex-task solving. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usually used to guide the working mechanism of agents. However, manually designing the workflows requires considerable efforts and domain knowledge, making it difficult to develop and deploy agents on massive scales. To address these issues, we propose AutoFlow, a framework designed to automatically generate workflows for agents to solve complex tasks. AutoFlow takes natural language program as the format of agent workflow and employs a workflow optimization procedure to iteratively optimize the workflow quality. Besides, this work offers two workflow generation methods: fine-tuning-based and in-context-based methods, making the AutoFlow framework applicable to both open-source and closed-source LLMs. Experimental results show that our framework can produce robust and reliable agent workflows. We believe that the automatic generation and interpretation of workflows in natural language represent a promising paradigm for solving complex tasks, particularly with the rapid development of LLMs. The source code of this work is available at https://github.com/agiresearch/AutoFlow.

7/19/2024

Optimizing Structured Data Processing through Robotic Process Automation

Vivek Bhardwaj, Ajit Noonia, Sandeep Chaurasia, Mukesh Kumar, Abdulnaser Rashid, Mohamed Tahar Ben Othman

Robotic Process Automation (RPA) has emerged as a game-changing technology in data extraction, revolutionizing the way organizations process and analyze large volumes of documents such as invoices, purchase orders, and payment advices. This study investigates the use of RPA for structured data extraction and evaluates its advantages over manual processes. By comparing human-performed tasks with those executed by RPA software bots, we assess efficiency and accuracy in data extraction from invoices, focusing on the effectiveness of the RPA system. Through four distinct scenarios involving varying numbers of invoices, we measure efficiency in terms of time and effort required for task completion, as well as accuracy by comparing error rates between manual and RPA processes. Our findings highlight the significant efficiency gains achieved by RPA, with bots completing tasks in significantly less time compared to manual efforts across all cases. Moreover, the RPA system consistently achieves perfect accuracy, mitigating the risk of errors and enhancing process reliability. These results underscore the transformative potential of RPA in optimizing operational efficiency, reducing human labor costs, and improving overall business performance.

8/28/2024