Stochastic Directly-Follows Process Discovery Using Grammatical Inference

Read original: arXiv:2312.05433 - Published 4/17/2024 by Hanan Alkhammash, Artem Polyvyanyy, Alistair Moffat

🤯

Overview

The paper focuses on the challenge of process discovery, which is the task of constructing a simple model that describes an unknown process based on a collection of execution traces.
The researchers propose a new approach for discovering sound Directly-Follows Graphs that is grounded in grammatical inference over the input traces.
The approach aims to promote the discovery of small graphs that accurately represent the input traces and their frequencies.

Plain English Explanation

Imagine you have a bunch of recordings of a process being carried out, but you don't know the details of the actual process. The challenge is to create a simple model that captures the key steps and how they're sequenced, based only on the available recordings.

The researchers developed a new method to address this challenge. Their approach looks at the order in which different actions occur in the recordings, and uses a technique called grammatical inference to build a graph-based model that represents the process.

The key goals are to create a model that is as simple as possible, while still accurately describing the recordings and how frequently different sequences of actions occur. To achieve this, the researchers designed a genetic algorithm that helps the model converge to the most interesting and representative structure.

Experiments on real-world data showed that this new approach can produce smaller models that better capture the details and patterns in the input recordings, compared to existing techniques. This allows for more meaningful reasoning about the underlying process based on the model.

Technical Explanation

The paper proposes a new approach for discovering Directly-Follows Graphs from a collection of process execution traces. Directly-Follows Graphs are a type of model where nodes represent observable actions, and directed arcs indicate possible execution order relationships between the actions.

The key innovation is grounding the discovery process in grammatical inference over the input traces. This allows the approach to construct small graphs that accurately represent the traces and their frequencies. To achieve this, the researchers designed a genetic algorithm that guides the inference parameters towards the discovery of interesting models.

Experiments on real-world datasets demonstrate that the new approach can construct smaller models that better capture the details and patterns in the input traces, compared to state-of-the-art techniques. This enables more meaningful reasoning over the frequencies of the encoded traces, due to the stochastic semantics of the proposed action graphs.

Critical Analysis

The paper presents a novel and promising approach for process discovery, with several strengths. The use of grammatical inference and genetic algorithms is a thoughtful and principled way to balance model simplicity and accuracy. The experimental results on real-world data are compelling, showing clear improvements over existing methods.

However, the paper also acknowledges some limitations. The approach is designed for discovering Directly-Follows Graphs, which may not be expressive enough to capture all types of processes. Additionally, the genetic algorithm introduces additional hyperparameters that require careful tuning.

Further research could explore extending the approach to discover more expressive process models, as well as investigating ways to make the hyperparameter tuning more robust and automated. Comparisons to a broader range of existing techniques, including those that leverage additional information beyond just the trace data, could also provide valuable insights.

Overall, this work represents an important step forward in the challenge of process discovery, with promising directions for continued development and application.

Conclusion

This paper presents a new approach for constructing simple yet accurate models of unknown processes based on collections of execution traces. By grounding the discovery process in grammatical inference and using a genetic algorithm to optimize the model structure, the researchers were able to produce smaller graphs that better capture the details and patterns in the input data.

These advancements open up new possibilities for meaningful reasoning about the underlying processes, which could have valuable applications in domains like business operations, healthcare, and software engineering. As the field continues to evolve, further innovations in process discovery are likely to yield important insights and practical benefits.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🤯

Stochastic Directly-Follows Process Discovery Using Grammatical Inference

Hanan Alkhammash, Artem Polyvyanyy, Alistair Moffat

Starting with a collection of traces generated by process executions, process discovery is the task of constructing a simple model that describes the process, where simplicity is often measured in terms of model size. The challenge of process discovery is that the process of interest is unknown, and that while the input traces constitute positive examples of process executions, no negative examples are available. Many commercial tools discover Directly-Follows Graphs, in which nodes represent the observable actions of the process, and directed arcs indicate execution order possibilities over the actions. We propose a new approach for discovering sound Directly-Follows Graphs that is grounded in grammatical inference over the input traces. To promote the discovery of small graphs that also describe the process accurately we design and evaluate a genetic algorithm that supports the convergence of the inference parameters to the areas that lead to the discovery of interesting models. Experiments over real-world datasets confirm that our new approach can construct smaller models that represent the input traces and their frequencies more accurately than the state-of-the-art technique. Reasoning over the frequencies of encoded traces also becomes possible, due to the stochastic semantics of the action graphs we propose, which, for the first time, are interpreted as models that describe the stochastic languages of action traces.

4/17/2024

🚀

The WHY in Business Processes: Discovery of Causal Execution Dependencies

Fabiana Fournier, Lior Limonad, Inna Skarbovsky, Yuval David

Unraveling the causal relationships among the execution of process activities is a crucial element in predicting the consequences of process interventions and making informed decisions regarding process improvements. Process discovery algorithms exploit time precedence as their main source of model derivation. Hence, a causal view can supplement process discovery, being a new perspective in which relations reflect genuine cause-effect dependencies among the tasks. This calls for faithful new techniques to discover the causal execution dependencies among the tasks in the process. To this end, our work offers a systematic approach to the unveiling of the causal business process by leveraging an existing causal discovery algorithm over activity timing. In addition, this work delves into a set of conditions under which process mining discovery algorithms generate a model that is incongruent with the causal business process model, and shows how the latter model can be methodologically employed for a sound analysis of the process. Our methodology searches for such discrepancies between the two models in the context of three causal patterns, and derives a new view in which these inconsistencies are annotated over the mined process model. We demonstrate our methodology employing two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique. We apply it on a synthesized dataset and on two open benchmark data sets.

5/17/2024

Probabilistic Forecasting with Stochastic Interpolants and Follmer Processes

Yifan Chen, Mark Goldstein, Mengjian Hua, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden

We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a generative model between an arbitrary base distribution and the target. We design a fictitious, non-physical stochastic dynamics that takes as initial condition the current system state and produces as output a sample from the target conditional distribution in finite time and without bias. This process therefore maps a point mass centered at the current state onto a probabilistic ensemble of forecasts. We prove that the drift coefficient entering the stochastic differential equation (SDE) achieving this task is non-singular, and that it can be learned efficiently by square loss regression over the time-series data. We show that the drift and the diffusion coefficients of this SDE can be adjusted after training, and that a specific choice that minimizes the impact of the estimation error gives a Follmer process. We highlight the utility of our approach on several complex, high-dimensional forecasting problems, including stochastically forced Navier-Stokes and video prediction on the KTH and CLEVRER datasets.

8/29/2024

🤷

Sample, estimate, aggregate: A recipe for causal discovery foundation models

Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola

Causal discovery, the task of inferring causal structure from data, promises to accelerate scientific research, inform policy making, and more. However, causal discovery algorithms over larger sets of variables tend to be brittle against misspecification or when data are limited. To mitigate these challenges, we train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables, along with other statistical hints like inverse covariance. Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets. Theoretically, we show that this model is well-specified, in the sense that it can recover a causal graph consistent with graphs over subsets. Empirically, we train the model to be robust to erroneous estimates using diverse synthetic data. Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift, and can be adapted at low cost to different discovery algorithms or choice of statistics.

5/24/2024