A Scalable and Near-Optimal Conformance Checking Approach for Long Traces

Read original: arXiv:2406.05439 - Published 6/11/2024 by Eli Bogdanov, Izack Cohen, Avigdor Gal

A Scalable and Near-Optimal Conformance Checking Approach for Long Traces

Overview

This paper presents a scalable and near-optimal approach for conformance checking of long event logs, which are common in process mining and business process management.
The authors develop an algorithm that efficiently aligns observed process executions (traces) with a reference process model, even for traces that are thousands of events long.
The proposed technique relies on a divide-and-conquer strategy and novel algorithmic techniques to achieve significant performance improvements over existing methods.

Plain English Explanation

When analyzing business processes, researchers and companies often have access to long logs of events that describe how a process was executed over time. Conformance checking is the task of comparing these observed event logs against a reference process model to understand how well the real-world execution matches the expected behavior.

However, conformance checking can be computationally challenging, especially for event logs with thousands of recorded activities. The authors of this paper have developed a new algorithm that can efficiently perform this analysis, even on very long traces of events.

Their key insight is to break the problem into smaller, more manageable pieces, and then stitch the results back together in a smart way. This divide-and-conquer strategy, combined with other novel algorithmic techniques, allows their approach to scale much better than previous methods. As a result, businesses and researchers can now gain deeper insights into their processes by analyzing much richer event data.

Technical Explanation

The paper introduces a novel conformance checking algorithm called Divide-and-Conquer Alignment (DCA). DCA works by recursively partitioning a long event trace into smaller, overlapping segments, aligning each segment independently, and then stitching the local alignments together to form a global alignment.

The authors prove that DCA is guaranteed to find an alignment that is within a small constant factor of the optimal global alignment. This near-optimality property, combined with significant empirical speedups over state-of-the-art techniques, make DCA a compelling approach for conformance checking of real-world business processes.

Key innovations in DCA include:

An efficient algorithm for computing local alignments within each trace segment
A novel technique for merging local alignments into a global alignment
Heuristics for intelligently partitioning traces to balance computational load

The paper evaluates DCA on a range of benchmark datasets, demonstrating its scalability and accuracy compared to existing conformance checking methods. The results show that DCA can handle event logs with millions of events, orders of magnitude larger than what was previously possible.

Critical Analysis

The authors thoroughly discuss the limitations and potential issues with their approach. For example, they note that DCA may not be well-suited for processes with highly interdependent activities, as the divide-and-conquer strategy could miss important dependencies.

Additionally, the authors acknowledge that their near-optimality guarantee relies on certain assumptions about the structure of the process model, which may not always hold in practice. They encourage further research to relax these assumptions and extend the applicability of the DCA algorithm.

One area that could benefit from more discussion is the impact of noisy or incomplete event data, which is common in real-world business applications. The paper focuses on the algorithmic performance, but does not explore how DCA might handle common challenges like missing events or imprecise timestamps.

Overall, the paper presents a compelling and well-executed approach to a significant problem in process mining. The authors have made a substantial contribution by developing a scalable conformance checking technique that maintains strong theoretical guarantees. Further research to address the noted limitations could further strengthen the practical applicability of this work.

Conclusion

This paper introduces a novel algorithm called Divide-and-Conquer Alignment (DCA) that enables efficient conformance checking of long event logs against reference process models. By recursively partitioning traces, computing local alignments, and then stitching them together, DCA achieves significant performance improvements over existing methods.

The authors have shown that DCA can handle event logs with millions of events, orders of magnitude larger than what was previously possible. This advance unlocks the potential for businesses and researchers to gain deeper insights into their processes by analyzing richer data.

While the approach has some limitations, the authors have made a substantial contribution to the field of process mining. DCA represents an important step forward in scaling conformance checking to meet the growing demands of real-world business applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

A Scalable and Near-Optimal Conformance Checking Approach for Long Traces

Eli Bogdanov, Izack Cohen, Avigdor Gal

Long traces and large event logs that originate from sensors and prediction models are becoming more common in our data-rich world. In such circumstances, conformance checking, a key task in process mining, can become computationally infeasible due to the exponential complexity of finding an optimal alignment. This paper introduces a novel sliding window approach to address these scalability challenges while preserving the interpretability of alignment-based methods. By breaking down traces into manageable subtraces and iteratively aligning each with the process model, our method significantly reduces the search space. The approach uses global information that captures structural properties of the trace and the process model to make informed alignment decisions, discarding unpromising alignments even if they are optimal for a local subtrace. This improves the overall accuracy of the results. Experimental evaluations demonstrate that the proposed method consistently finds optimal alignments in most cases and highlight its scalability. This is further supported by a theoretical complexity analysis, which shows the reduced growth of the search space compared to other common conformance checking methods. This work provides a valuable contribution towards efficient conformance checking for large-scale process mining applications.

6/11/2024

Object-Centric Conformance Alignments with Synchronization (Extended Version)

Alessandro Gianola, Marco Montali, Sarah Winkler

Real-world processes operate on objects that are inter-dependent. To accurately reflect the nature of such processes, object-centric process mining techniques are needed, notably conformance checking. However, while the object-centric perspective has recently gained traction, few concrete process mining techniques have been presented so far. Moreover, existing approaches are severely limited in their abilities to keep track of object identity and object dependencies. Consequently, serious problems in logs remain undetected. In this paper, we present a new formalism that combines the key modelling features of two existing approaches, in particular the ability of object-centric Petri nets to capture one-to-many relations and the one of Petri nets with identifiers to compare and synchronize objects based on their identity. We call the resulting formalism 'object-centric Petri nets with identifiers', and define alignments and the conformance checking task for this setting. We propose a conformance checking approach for such nets based on an encoding in satisfiability modulo theories (SMT), and illustrate how it can be effectively used to overcome shortcomings of earlier work. To assess its practicality, we perform an evaluation on data from the literature.

4/8/2024

Process Variant Analysis Across Continuous Features: A Novel Framework

Ali Norouzifar, Majid Rafiei, Marcus Dees, Wil van der Aalst

Extracted event data from information systems often contain a variety of process executions making the data complex and difficult to comprehend. Unlike current research which only identifies the variability over time, we focus on other dimensions that may play a role in the performance of the process. This research addresses the challenge of effectively segmenting cases within operational processes based on continuous features, such as duration of cases, and evaluated risk score of cases, which are often overlooked in traditional process analysis. We present a novel approach employing a sliding window technique combined with the earth mover's distance to detect changes in control flow behavior over continuous dimensions. This approach enables case segmentation, hierarchical merging of similar segments, and pairwise comparison of them, providing a comprehensive perspective on process behavior. We validate our methodology through a real-life case study in collaboration with UWV, the Dutch employee insurance agency, demonstrating its practical applicability. This research contributes to the field by aiding organizations in improving process efficiency, pinpointing abnormal behaviors, and providing valuable inputs for process comparison, and outcome prediction.

6/10/2024

Conformance Checking of Fuzzy Logs against Declarative Temporal Specifications

Ivan Donadello, Paolo Felli, Craig Innes, Fabrizio Maria Maggi, Marco Montali

Traditional conformance checking tasks assume that event data provide a faithful and complete representation of the actual process executions. This assumption has been recently questioned: more and more often events are not traced explicitly, but are instead indirectly obtained as the result of event recognition pipelines, and thus inherently come with uncertainty. In this work, differently from the typical probabilistic interpretation of uncertainty, we consider the relevant case where uncertainty refers to which activity is actually conducted, under a fuzzy semantics. In this novel setting, we consider the problem of checking whether fuzzy event data conform with declarative temporal rules specified as Declare patterns or, more generally, as formulae of linear temporal logic over finite traces (LTLf). This requires to relax the assumption that at each instant only one activity is executed, and to correspondingly redefine boolean operators of the logic with a fuzzy semantics. Specifically, we provide a threefold contribution. First, we define a fuzzy counterpart of LTLf tailored to our purpose. Second, we cast conformance checking over fuzzy logs as a verification problem in this logic. Third, we provide a proof-of-concept, efficient implementation based on the PyTorch Python library, suited to check conformance of multiple fuzzy traces at once.

6/19/2024