Conformance Checking
Business process conformance checking (or conformance checking for short) is a family of process mining techniques used to compare a process model with an event log of the same process. It is used to check whether the actual execution of a business process, as recorded in the event log, conforms to the model and vice versa. For instance, a process model may indicate that purchase orders of more than one million euros require two checks; analysis of the event log will show whether this rule is followed or not. Another example is checking the so-called "four-eyes" principle, which states that particular activities should not be executed by one and the same person. By scanning the event log using a model specifying these requirements, one can discover potential cases of fraud. Hence, conformance checking may be used to detect, locate and explain deviations, and to measure the severity of these deviations.


Overview

Conformance checking techniques take as input a process model and an event log and return a set of differences between the behavior captured in the process model and the behavior captured in the event log. These differences may be represented visually (e.g., overlaid on top of the process model) or textually as lists of natural language statements (e.g., "activity x is executed multiple times in the log, but this is not allowed according to the model"). Some techniques may also produce a normalized measure (between 0 and 1) indicating to what extent the process model and the event log match each other. The interpretation of non-conformance depends on the purpose of the model:
* If the model is intended to be descriptive, discrepancies between model and log indicate that the model needs to be improved to capture reality better.
* If the model is normative, then such discrepancies may be interpreted in two ways: they may expose undesirable deviations (i.e., conformance checking signals the need for better control of the process), or they may reveal desirable deviations (i.e., workers may deviate to serve the customers better or to handle circumstances not foreseen by the process model).


Techniques

The purpose of conformance checking is to identify two types of discrepancies:
* Unfitting log behavior: behavior observed in the log that is not allowed by the model.
* Additional model behavior: behavior allowed in the model but never observed in the log.

There are broadly three families of techniques for detecting unfitting log behavior: replay, trace alignment and behavioral alignment. In ''replay'' techniques, each trace is replayed against the process model one event at a time. When a replay error is detected, it is reported and a local correction is made to resume the replay procedure. The local correction may be, for example, to skip/ignore a task in the process model or to skip/ignore an event in the log. A general limitation of replay methods is that error recovery is performed locally each time an error is encountered. Hence, these methods might not identify the minimum number of errors that can explain the unfitting log behavior. This limitation is addressed by ''trace alignment'' techniques. These techniques identify, for each trace in the log, the closest corresponding trace that can be parsed by the model. Trace alignment techniques also compute an alignment showing the points of divergence between these two traces. The output is a set of pairs of aligned traces. Each pair shows a trace in the log that does not exactly match a trace in the model, together with the corresponding closest trace(s) produced by the model. Trace alignment techniques do not explicitly handle concurrent tasks or cyclic behavior (repetition of tasks). If, for example, four tasks can occur only in a fixed order in the process model (e.g., A, B, C, D) but they can occur concurrently in the log (i.e., in any order), this difference cannot be directly detected by trace alignment, because it cannot be observed at the level of individual traces.

Other methods to identify additional model behavior are based on negative events. These methods start by enhancing the traces in the log by inserting fake (negative) events into all or some traces of the log. A negative event is inserted after a given prefix of a trace if this event is never observed preceded by that prefix anywhere in the log. For example, if event C is never observed after the prefix AB, then C can be inserted as a negative event after AB. Thereafter, the log enhanced with negative events is replayed against the process model. If the process model can replay the negative events, it means that there is behavior captured in the process model that is not captured in the log (since the negative events correspond to behavior that is never observed in the log).
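
The prefix-based insertion rule can be illustrated with a small, self-contained sketch (plain Python, not tied to any particular process mining tool; the function names are ours). Before each recorded event, it inserts as negative events all activities that are never observed directly following the current prefix anywhere in the log.

```python
# Illustrative sketch: enhance an event log with negative events.
# An activity is inserted as a negative event after a prefix if it is
# never observed directly following that prefix anywhere in the log.

from typing import List, Tuple

Trace = Tuple[str, ...]

def observed_continuations(log: List[Trace]) -> dict:
    """Map each prefix occurring in the log to the set of activities
    that directly follow it in at least one trace."""
    continuations = {}
    for trace in log:
        for i in range(len(trace)):
            continuations.setdefault(trace[:i], set()).add(trace[i])
    return continuations

def insert_negative_events(log: List[Trace]) -> List[List[Tuple[str, bool]]]:
    """Return traces enhanced with negative events.
    Each event is a pair (activity, is_negative)."""
    alphabet = {a for trace in log for a in trace}
    continuations = observed_continuations(log)
    enhanced = []
    for trace in log:
        new_trace = []
        for i, event in enumerate(trace):
            prefix = trace[:i]
            # Activities never seen after this prefix become negative events.
            for candidate in sorted(alphabet - continuations.get(prefix, set())):
                new_trace.append((candidate, True))
            new_trace.append((event, False))
        enhanced.append(new_trace)
    return enhanced

if __name__ == "__main__":
    log = [("a", "b", "c"), ("a", "c", "b")]
    for t in insert_negative_events(log):
        print(t)
```

Replaying the enhanced traces against the model then reveals additional model behavior whenever the model is able to execute one of the inserted negative events.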


Notable algorithms


Comparing footprint matrices

Footprint matrices capture the causal dependencies between pairs of activities in an event log, e.g., that activity a is followed by activity b in some trace while activity b is never followed by activity a. To capture this kind of dependency, a set of ''ordering relations'' is defined. Let ''L'' be an event log over the set ''A'' of all activities, and let a, b be two activities in ''A'':
* a >''L'' b if and only if there is a trace σ in ''L'' in which a is directly followed by b.
* a →''L'' b if and only if a >''L'' b and not b >''L'' a.
* a #''L'' b if and only if not a >''L'' b and not b >''L'' a.
* a ∥''L'' b if and only if a >''L'' b and b >''L'' a.

For a process model, such a matrix can also be derived from the execution sequences obtained by playing out the model. Based on the footprint matrices, one can reason that if an event log conforms with the process model under consideration, the two footprint matrices representing the log and the model are identical, i.e., the behavior recorded in the model (in this case the causal dependencies) appears at least once in the event log.

''Example'': Let ''L'' be an event log over the activities a, b, c and d, and let ''M'' be a model of ''L''. Assume that the two footprint matrices are identical except that, in the footprint matrix of model ''M'', the pattern (a, d) is allowed to occur; this causes a deviation in comparison with the event log. The fitness between the event log and the model is computed as 1-\frac{\text{number of differing cells}}{\text{total number of cells}}. In this example, the cells (a, d) and (d, a) differ, so the fitness is 1-\frac{2}{16} = 0.875.
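
A minimal sketch of this comparison in Python is given below (the helper names are ours; the model footprint is computed from a hypothetical set of play-out traces chosen so that, as in the example above, the only extra behavior of the model is that a may be directly followed by d).

```python
# Illustrative sketch: build footprint matrices from sets of traces and
# compare them cell by cell. The model matrix is assumed to come from
# traces obtained by playing out the model.

from itertools import product

def directly_follows(traces):
    """All pairs (a, b) such that a is directly followed by b in some trace."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs

def footprint(traces, activities):
    """Footprint matrix as a dict mapping (a, b) to '->', '<-', '#' or '||'."""
    df = directly_follows(traces)
    matrix = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in df, (b, a) in df
        if ab and not ba:
            matrix[(a, b)] = "->"
        elif ba and not ab:
            matrix[(a, b)] = "<-"
        elif ab and ba:
            matrix[(a, b)] = "||"
        else:
            matrix[(a, b)] = "#"
    return matrix

def footprint_fitness(log_traces, model_traces, activities):
    """1 minus the fraction of cells on which the two matrices disagree."""
    fp_log = footprint(log_traces, activities)
    fp_model = footprint(model_traces, activities)
    cells = list(product(activities, repeat=2))
    different = sum(1 for cell in cells if fp_log[cell] != fp_model[cell])
    return 1 - different / len(cells)

if __name__ == "__main__":
    activities = ["a", "b", "c", "d"]
    log_traces = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
    # Hypothetical play-out traces: the model additionally allows a directly
    # followed by d, which the log never shows.
    model_traces = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "d")]
    print(footprint_fitness(log_traces, model_traces, activities))
```

With these hypothetical traces the two matrices differ only in the cells (a, d) and (d, a), so the script prints 0.875, matching the example above.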


Token-replay technique

Token-based replay is a technique that uses four counters (produced tokens, consumed tokens, missing tokens and remaining tokens) to compute the fitness of an observed trace with respect to a given process model in Petri net notation. These four counters record the status of tokens while a trace is replayed on the Petri net. When a token is produced by a transition, the ''produced tokens'' counter is increased by 1. When a token is consumed to fire a transition, the ''consumed tokens'' counter is increased by 1. When a token that is needed to fire a transition is missing, the ''missing tokens'' counter is increased by 1. The ''remaining tokens'' counter records the total number of tokens left in the net after the replay of the trace is complete. The trace conforms with the process model if and only if there are no missing tokens during the replay and no remaining tokens at the end. The fitness between an event log and a process model is computed as \frac{1}{2}\biggl(1-\frac{m}{c}\biggr) + \frac{1}{2}\biggl(1-\frac{r}{p}\biggr) where ''m'' is the number of missing tokens, ''c'' is the number of consumed tokens, ''r'' is the number of remaining tokens, and ''p'' is the number of produced tokens.
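
The four counters can be illustrated with the following sketch, which uses our own simplified encoding of a Petri net (a dictionary of labelled transitions with input and output place lists, plus one source and one sink place). As is usual in token-based replay, the token placed in the source place at the start is counted as produced and the token removed from the sink place at the end as consumed.

```python
# Illustrative sketch of token-based replay with the counters
# p (produced), c (consumed), m (missing) and r (remaining).
# Assumes every event maps to exactly one transition label in the net.

def token_replay(trace, transitions, initial_place, final_place):
    """Replay one trace on the net and return the counters and fitness."""
    marking = {}                      # place -> number of tokens
    p = c = m = 0

    def produce(place, k=1):
        nonlocal p
        marking[place] = marking.get(place, 0) + k
        p += k

    def consume(place, k=1):
        nonlocal c, m
        available = marking.get(place, 0)
        if available < k:             # token missing: record and create it
            m += k - available
            marking[place] = k
        marking[place] -= k
        c += k

    produce(initial_place)            # environment puts a token in the source
    for activity in trace:
        t = transitions[activity]
        for place in t["in"]:
            consume(place)
        for place in t["out"]:
            produce(place)
    consume(final_place)              # environment removes the sink token
    r = sum(marking.values())         # tokens still left in the net
    fitness = 0.5 * (1 - m / c) + 0.5 * (1 - r / p)
    return p, c, m, r, fitness

if __name__ == "__main__":
    # Sequential net: start -> a -> p1 -> b -> end
    net = {
        "a": {"in": ["start"], "out": ["p1"]},
        "b": {"in": ["p1"], "out": ["end"]},
    }
    print(token_replay(("a", "b"), net, "start", "end"))   # perfectly fitting
    print(token_replay(("b",), net, "start", "end"))       # unfitting trace
```

For this sequential net, the trace ⟨a, b⟩ yields no missing and no remaining tokens and hence fitness 1, while the trace ⟨b⟩ yields one missing and one remaining token and fitness 0.5.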


Alignments

Although the token-replay technique is efficient and easy to understand, it is designed for the Petri net notation and does not indicate a suitable model path for unfitting cases. Alignments were introduced to address these limitations; they are considered a highly accurate conformance checking technique and can be applied to any process modeling notation (van der Aalst, Adriansyah & van Dongen, 2012). The idea is that the algorithm performs an exhaustive search to find the optimal alignment between the observed trace and the process model. Hence, it is guaranteed to find the model run that corresponds most closely to the observed trace.
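
A simplified sketch of the underlying search is shown below. For brevity, the model is represented by an explicit, finite set of runs and the optimal alignment with each run is computed by dynamic programming with unit costs for log-only and model-only moves; production implementations instead perform an A*-style search over the synchronous product of trace and model, which handles loops and concurrency without enumerating runs.

```python
# Illustrative sketch: optimal alignment between an observed trace and a
# model given as a finite set of runs. Moves are ('sync', x) for matching
# steps, ('log', x) for log-only moves and ('model', x) for model-only moves.

def align(trace, run):
    """Optimal alignment (cost, list of moves) between a trace and one run."""
    n, k = len(trace), len(run)
    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[None] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(k + 1):
            if i < n and cost[i][j] + 1 < cost[i + 1][j]:          # log-only move
                cost[i + 1][j] = cost[i][j] + 1
                back[i + 1][j] = (i, j, ("log", trace[i]))
            if j < k and cost[i][j] + 1 < cost[i][j + 1]:          # model-only move
                cost[i][j + 1] = cost[i][j] + 1
                back[i][j + 1] = (i, j, ("model", run[j]))
            if i < n and j < k and trace[i] == run[j] and cost[i][j] < cost[i + 1][j + 1]:
                cost[i + 1][j + 1] = cost[i][j]                    # synchronous move
                back[i + 1][j + 1] = (i, j, ("sync", trace[i]))
    moves, i, j = [], n, k
    while (i, j) != (0, 0):                                        # reconstruct moves
        i, j, move = back[i][j]
        moves.append(move)
    return cost[n][k], list(reversed(moves))

def optimal_alignment(trace, model_runs):
    """Pick the model run whose alignment with the trace has minimal cost."""
    return min((align(trace, run) for run in model_runs), key=lambda x: x[0])

if __name__ == "__main__":
    model_runs = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
    print(optimal_alignment(("a", "b", "d"), model_runs))
```

For the trace ⟨a, b, d⟩ and a model that allows a, then b and c in either order, then d, the sketch returns cost 1 with a single model-only move on c, i.e., the closest model run is ⟨a, b, c, d⟩.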


References

van der Aalst, Wil; Adriansyah, Arya; van Dongen, Boudewijn (2012). "Replaying history on process models for conformance checking and performance analysis". ''WIREs Data Mining and Knowledge Discovery''. 2 (2): 182–192. doi:10.1002/widm.1045. ISSN 1942-4787. http://dx.doi.org/10.1002/widm.1045