Bridging Domain Knowledge and Process Discovery Using Large Language Models

Read original: arXiv:2408.17316 - Published 9/2/2024 by Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst

Overview

Automated process discovery methods often overlook valuable domain knowledge.
This paper leverages Large Language Models (LLMs) to integrate domain knowledge directly into process discovery.
The framework creates a bridge between natural language process knowledge and the discovery of robust process models.
A case study with the UWV employee insurance agency demonstrates the practical benefits and effectiveness of the approach.

Plain English Explanation

When organizations analyze and improve their internal processes, they often use automated process discovery methods. These methods uncover patterns from recorded data about how a process actually runs. However, they tend to miss important domain knowledge: what subject matter experts know about how the process should work, and what detailed process documentation describes.

This research paper proposes a way to incorporate that domain knowledge directly into process discovery. The key is using large language models, AI systems that can understand and generate human-like text. Using rules derived from these language models, the researchers guide the automated construction of process models so that the models align with both the recorded process executions and the domain expertise.

This creates a bridge between the natural-language knowledge that people have about a process and the formal process models used for further analysis and improvement. The researchers demonstrated the practical benefits of the approach in a case study with the UWV employee insurance agency, showing how it can enhance standard process discovery techniques.

Technical Explanation

The paper presents a framework that leverages large language models (LLMs) to enhance automated process discovery. LLMs are used to extract rules and constraints from domain knowledge, which are then integrated directly into the model construction process.

The key steps are:

1. Domain Knowledge Extraction: LLM-based prompts are used to elicit relevant process knowledge from subject matter experts and documentation.
2. Rule Derivation: The extracted knowledge is used to derive a set of rules and constraints that should guide the process model.
3. Model Construction: An automated discovery algorithm incorporates the LLM-derived rules to construct a process model that aligns with both the data and the domain knowledge.
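
The first two steps, plus a check of the resulting rules against recorded behavior, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the rule kinds (precedence, response, not_coexistence), the JSON layout of the LLM reply, the activity names, and the helper names parse_rules and satisfies are all hypothetical, and the LLM call itself is replaced by a hard-coded string.

```python
import json
from dataclasses import dataclass

# Hypothetical rule vocabulary; the summary does not specify the rule language.
RULE_KINDS = {"precedence", "response", "not_coexistence"}


@dataclass(frozen=True)
class Rule:
    kind: str
    a: str
    b: str


def parse_rules(llm_output: str, known_activities: set[str]) -> list[Rule]:
    """Turn a JSON list (assumed LLM reply format) into validated rules."""
    rules = []
    for item in json.loads(llm_output):
        rule = Rule(item["kind"], item["a"], item["b"])
        # Drop rules with unknown kinds or activities that never occur in the log.
        if rule.kind in RULE_KINDS and {rule.a, rule.b} <= known_activities:
            rules.append(rule)
    return rules


def satisfies(trace: list[str], rule: Rule) -> bool:
    """Check one recorded trace against one rule."""
    if rule.kind == "precedence":
        # b may occur only after a has occurred at least once.
        seen_a = False
        for activity in trace:
            if activity == rule.a:
                seen_a = True
            elif activity == rule.b and not seen_a:
                return False
        return True
    if rule.kind == "response":
        # Every a must eventually be followed by some b.
        pending = False
        for activity in trace:
            if activity == rule.a:
                pending = True
            elif activity == rule.b:
                pending = False
        return not pending
    if rule.kind == "not_coexistence":
        # a and b must not both occur in the same trace.
        return not (rule.a in trace and rule.b in trace)
    raise ValueError(f"unknown rule kind: {rule.kind}")


# Stand-in for an LLM reply; a real system would obtain this from a prompt.
llm_output = json.dumps([
    {"kind": "precedence", "a": "receive claim", "b": "pay claim"},
    {"kind": "not_coexistence", "a": "pay claim", "b": "reject claim"},
    {"kind": "response", "a": "receive claim", "b": "archive claim"},
])

# Toy event log: each trace is the ordered list of activities in one case.
log = [
    ["receive claim", "check claim", "pay claim"],
    ["receive claim", "check claim", "reject claim"],
    ["pay claim", "receive claim"],
]

known = {activity for trace in log for activity in trace}
rules = parse_rules(llm_output, known)  # the "archive claim" rule is dropped
violations = {rule: sum(not satisfies(t, rule) for t in log) for rule in rules}
```

Validating the parsed rules against the activity names found in the log is a design choice made here to guard against rules that mention activities the data never records; whether the paper's framework does something similar is not stated in this summary.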

This approach creates a feedback loop between the natural language process knowledge and the formal process models, allowing domain expertise to directly shape the discovered models. The researchers evaluated their framework through a case study with the UWV employee insurance agency, demonstrating its practical benefits and effectiveness.
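
One way to picture how domain rules can shape a discovered model is a toy selection step in which candidate model fragments that contradict a rule are discarded before the data decides among the rest. The sketch below is an assumption-laden illustration and not the algorithm from the paper: candidates are simple two-part "sequence cuts" (every activity in the left part happens before every activity in the right part), rules are (a, b) pairs meaning "a must precede b", and the function names dfg, deviation_cost, allowed, and best_cut are made up for this example.

```python
from collections import Counter


def dfg(log):
    """Count directly-follows pairs (x, y): y occurs right after x in a trace."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def deviation_cost(cut, counts):
    """Observed directly-follows steps that run backwards across the cut."""
    left, right = cut
    return sum(n for (x, y), n in counts.items() if x in right and y in left)


def allowed(cut, precedence_rules):
    """A cut contradicts 'a precedes b' if it places b before a."""
    left, right = cut
    return not any(b in left and a in right for a, b in precedence_rules)


def best_cut(candidates, counts, precedence_rules):
    """Pick the rule-compliant candidate with the fewest deviations, if any."""
    feasible = [c for c in candidates if allowed(c, precedence_rules)]
    return min(feasible, key=lambda c: deviation_cost(c, counts), default=None)


# Toy log in which the recorded order slightly favors "pay" before "check".
log = [["pay", "check"]] * 3 + [["check", "pay"]] * 2
counts = dfg(log)

candidates = [
    (frozenset({"check"}), frozenset({"pay"})),  # check, then pay
    (frozenset({"pay"}), frozenset({"check"})),  # pay, then check
]

data_only = best_cut(candidates, counts, [])
with_rule = best_cut(candidates, counts, [("check", "pay")])
```

With no rules, the data alone selects "pay, then check" because it has fewer backward steps in this log; adding the hypothetical rule that checking must precede payment removes that candidate and leaves "check, then pay".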

Critical Analysis

The paper makes a compelling case for the value of integrating domain knowledge into automated process discovery. By leveraging LLMs, the framework provides a systematic way to bridge the gap between informal process knowledge and the formal models used for analysis.

One limitation discussed is the reliance on subject matter experts to provide the initial domain knowledge, which could introduce biases or incomplete information. Further research could explore ways to extract relevant knowledge more automatically from a broader set of sources.

Additionally, the case study focuses on a single organizational context. More evaluations across diverse domains would help establish the generalizability of the approach. Potential challenges around scaling the framework to larger, more complex processes could also be investigated.

Overall, the paper presents an innovative solution to a significant challenge in process mining and analysis. By combining the strengths of automated discovery and domain expertise, it offers a promising path forward for organizations looking to gain deeper insights from their process data.

Conclusion

This research paper introduces a novel framework that leverages large language models to enhance automated process discovery. By integrating domain knowledge directly into the model construction process, the approach creates a bridge between natural language process expertise and the formal models used for analysis and improvement.

The demonstrated case study highlights the practical benefits of this approach, showing how it can produce process models that are better aligned with both real-world data and the insights of subject matter experts. As organizations increasingly rely on data-driven process analysis, this work represents an important step forward in unlocking the full value of their process knowledge.

Further research to address the identified limitations and expand the approach to additional contexts could lead to even more transformative advances in the field of process mining and analysis.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Bridging Domain Knowledge and Process Discovery Using Large Language Models

Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst

Discovering good process models is essential for different process analysis tasks such as conformance checking and process improvements. Automated process discovery methods often overlook valuable domain knowledge. This knowledge, including insights from domain experts and detailed process documentation, remains largely untapped during process discovery. This paper leverages Large Language Models (LLMs) to integrate such knowledge directly into process discovery. We use rules derived from LLMs to guide model construction, ensuring alignment with both domain knowledge and actual process executions. By integrating LLMs, we create a bridge between process knowledge expressed in natural language and the discovery of robust process models, advancing process discovery methodologies significantly. To showcase the usability of our framework, we conducted a case study with the UWV employee insurance agency, demonstrating its practical benefits and effectiveness.

9/2/2024

Leveraging Large Language Models for Enhanced Process Model Comprehension

Humam Kourani, Alessandro Berti, Jasmin Henrich, Wolfgang Kratsch, Robin Weidlich, Chiao-Yun Li, Ahmad Arslan, Daniel Schuster, Wil M. P. van der Aalst

In Business Process Management (BPM), effectively comprehending process models is crucial yet poses significant challenges, particularly as organizations scale and processes become more complex. This paper introduces a novel framework utilizing the advanced capabilities of Large Language Models (LLMs) to enhance the interpretability of complex process models. We present different methods for abstracting business process models into a format accessible to LLMs, and we implement advanced prompting strategies specifically designed to optimize LLM performance within our framework. Additionally, we present a tool, AIPA, that implements our proposed framework and allows for conversational process querying. We evaluate our framework and tool by i) an automatic evaluation comparing different LLMs, model abstractions, and prompting strategies and ii) a user study designed to assess AIPA's effectiveness comprehensively. Results demonstrate our framework's ability to improve the accessibility and interpretability of process models, pioneering new pathways for integrating AI technologies into the BPM field.

8/22/2024

LLM4PM: A case study on using Large Language Models for Process Modeling in Enterprise Organizations

Clara Ziche, Giovanni Apruzzese

We investigate the potential of using Large Language Models (LLM) to support process model creation in organizational contexts. Specifically, we carry out a case study wherein we develop and test an LLM-based chatbot, PRODIGY (PROcess moDellIng Guidance for You), in a multinational company, the Hilti Group. We are particularly interested in understanding how LLM can aid (human) modellers in creating process flow diagrams. To this purpose, we first conduct a preliminary user study (n=10) with professional process modellers from Hilti, inquiring for various pain-points they encounter in their daily routines. Then, we use their responses to design and implement PRODIGY. Finally, we evaluate PRODIGY by letting our user study's participants use PRODIGY, and then ask for their opinion on the pros and cons of PRODIGY. We coalesce our results in actionable takeaways. Through our research, we showcase the first practical application of LLM for process modelling in the real world, shedding light on how industries can leverage LLM to enhance their Business Process Management activities.

7/26/2024

A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models

Julian Neuberger, Lars Ackermann, Han van der Aa, Stefan Jablonski

Over the past decade, extensive research efforts have been dedicated to the extraction of information from textual process descriptions. Despite the remarkable progress witnessed in natural language processing (NLP), information extraction within the Business Process Management domain remains predominantly reliant on rule-based systems and machine learning methodologies. Data scarcity has so far prevented the successful application of deep learning techniques. However, the rapid progress in generative large language models (LLMs) makes it possible to solve many NLP tasks with very high quality without the need for extensive data. Therefore, we systematically investigate the potential of LLMs for extracting information from textual process descriptions, targeting the detection of process elements such as activities and actors, and relations between them. Using a heuristic algorithm, we demonstrate the suitability of the extracted information for process model generation. Based on a novel prompting strategy, we show that LLMs are able to outperform state-of-the-art machine learning approaches with absolute performance improvements of up to 8% $F_1$ score across three different datasets. We evaluate our prompting strategy on eight different LLMs, showing it is universally applicable, while also analyzing the impact of certain prompt parts on extraction quality. The number of example texts, the specificity of definitions, and the rigour of format instructions are identified as key for improving the accuracy of extracted information. Our code, prompts, and data are publicly available.

7/29/2024