Quick viewing(Text Mode)

Efficient Transition Adjacency Relation Computation for Process Model Similarity 3

Efficient Transition Adjacency Relation Computation for Process Model Similarity 3

IEEE TRANSACTIONS ON SERVICES , . X, NO. X, X 2018 1 Efficient Transition Adjacency Relation Computation for Process Model Similarity

Jisheng Pei, Lijie Wen, Xiaojun Ye, and Akhil Kumar

Abstract—Many activities in business process management, such as process retrieval, process mining and process integration, need to determine the similarity between business processes. Along with many other relational behavior semantics, Transition Adjacency Relation (abbr. ) has been proposed as a kind of behavioral gene of process models and a useful perspective for process similarity measurement. In this article we explain why it is still relevant and necessary to improve TAR or pTAR (i.e., projected TAR) computation efficiency and put forward a novel approach for TAR computation based on Petri unfolding. This approach not only improves the efficiency of TAR computation, but also enables the long-expected combined usage of TAR and Behavior Profiles (abbr. BP) in process model similarity estimation.

Index Terms—Transition Adjacency Relation, Petri Net, Unfolding, Process Model Similarity. !

1 INTRODUCTION ing and similarity measurement, researchers have noticed Since its first introduction [1], Transition Adjacency Relation the limitations to its expressive powers [9]. It has been (TAR) has been applied in a wide range of business process discussed in several works that while some process models model analysis scenarios such as similarity measurement [2– may have identical behavior profiles but produce different 6], process model retrieval [7, 8], and process mining [9, 10]. output traces. Different from conformance checking between a process As a result, a comprehensive relational semantics called model and an event log [11], a process similarity algo- 4C Spectrum has been proposed in [13] to make up for the rithm based on TAR (TAR similarity in short) takes two missing expressive power of BP. Nevertheless, although 4C process models as the input and computes a similarity Spectrum is strong in terms of expressive power, it is a value ranging from 0.0 to 1.0 between them. Furthermore, heavy-weight solution and contains hundreds of complex TAR similarity is also different from workflow inheritance relations (resulted from the combination of co-occurrence techniques [12], which also takes two process models as the and causal/concurrent patterns), which is costly to compute input but tries to determine whether one process model is and difficult for users to grasp the insight intuitively. As a inherited from another. TAR in a Petri net simply defines result, [13] has addressed the computation of only two of these relations, leaving most of the 4C Spectrum relations the set of ordered pairs ht1, t2i that can be executed one after another, but despite of its simplicity, the set of TARs unavailable for application this stage. reveal a useful perspective of the behaviors of the business The above developments of relational semantics moti- process model. Later on, TAR was extended to projected vate us to look for a light-weight solution to significantly Transition Adjacency Relation (pTAR) [2], which neglects improve current relational semantics. In [6], it has been the silent transitions so that users may selectively focus on suggested that similarity estimation might benefit a lot from a set of business-relevant transitions, and neglect the rest the combined usage of BP and TAR. Also, it is worthy of by marking them as silent transitions. However, both the notice that there are only 4 types of relations in total even if arXiv:1512.00428v6 [cs.OH] 10 Mar 2020 original TAR computation algorithm [1] and pTAR compu- consider BP together with TAR. However, this idea has not tation algorithm [2] rely on exploiting the state-space of the been broadly applied yet because TAR computation based process model by constructing its reachability graph (RG), on reachability graphs is extremely low in efficiency, and which leaves the state-explosion caused by concurrencies as users may not want to sacrifice the high efficiency of BP an unsolved problem. even though considering TAR additionally may result in On the other hand, Behavior Profiles (BP) [3] have been improvement of similarity estimation quality. To illustrate accepted in recent years as a most widely used relational this, we will first demonstrate the advantage of TAR over semantics for business process analysis. It provides great BP. Motivated by this, we will go deeper into the issue of descriptive power for process behaviors and can be effi- how to improve TAR computation efficiency. ciently computed with unfolding [4]). However, despite its In the above figure, the nets in Fig. 1a and Fig. 1b various applications in scenarios such as consistency check- have identical BP (we refer readers to the original work of [3] for definitions of BP), that is, for both nets we have T1 → {T2,T3,T4} → T5 and {T2,T3,T4} are mutually • L. Wen is the corresponding author with School of Software, Tsinghua University, Beijing 100084, China. in k. Also, for both nets in Fig. 1c and Fig. 1d it holds E-mail: [email protected] T1 → T2 → T3. However, these nets clearly have different • J. Pei, X. Ye are with Tsinghua University and A. Kumar is with Penn behaviors. T2,T3,T4 in Fig. 1a form a parallel structure, State University. whereas in Fig. 1b they form a loop. 2 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

preferable relational semantics for enhancement should be efficiently computable by re-using information from the CFP structure built previously. As a consequence, we are motivated to put TAR’s com- putation in the context of unfolding to save the marginal cost and establish the potential of TAR as such a desir- able addition to BP. This article explores unfolding based TAR/pTAR computation in three stages. First, we confirm (a) Parallel Structure that TAR computation can generally be reduced to the classic coverability problem and therefore solvable using the ‘co’ relations among conditions in the CFP , but the reduc- tion of pTAR computation to coverability problems is not always trivial (Sect. 3). By re-using the causal information in the CFP substantially, the computation of both TAR (b) Loop Structure and pTAR can be further accelerated using a handful of derivation rules (Sect. 4). However, the incompleteness of unfolding information in CFP caused by cut-off events still (c) Sequential Structure restricts the applicability of these rules sometimes. There- fore, in Sect. 5, we describe our findings about a novel kind of finite extension of CFP, in which adequate information is preserved, so that TAR/pTAR can be derived from this extension completely using rules similar to those in Sect. 4. An overview of our work is summarized in Fig. 2. We (d) Sequential Structure with Skips set out to exploit all information available in CFP for TAR computation to the best of our knowledge, and achieved Fig. 1: (a) and (b) (as well as (c) and (d)) have identical BPs the leading performance among existing methods. With this but different behaviors work, we attempt to pave the way for further researches of the computation of TAR/pTAR and its application in business process behavior analysis, especially through com- Compared to BP, which describes the global ordering binations with other behavior abstractions such as BP. between transitions, Transition Adjacency Relation (TAR) This paper is an extended version of our previous captures more ‘local’ features among transitions that occur work [15], but with significant amount of new contributions immediately one after another. This makes it a potential en- to the problem of transition adjacency relation (TAR) com- hancement to BP based behavior description. For example, putation. A new of cut-off criteria for constructing finite in Fig. 1a, T4 may occur immediately after T2 (denoted as prefixes of Petri net unfolding has been provided in this T2

Causal relations between events are substantially re-used

Built CFP Other Abstractions Only co relations between conditions (e.g. BP) are re-used (for coverability check)

Original Petri Net General TAR Computation (Sect. 3)

Reduce TAR/pTAR Check to Coverability Coverability Pairs of transitions Potential No Derived TARs Applications via Can causal relations Yes Accelerated TAR/pTAR Combinations be utilized? Computation

How to fully exploit and Pruning redundant Acceleration Based on Cut-off Handling with continued unfolding How to deal with re-use CFP information Causal Relation Patterns Continued Unfolding with redirected CFP cut-off events? for acceleration? Accelerated TAR Computation (Sect. 4 & 5)

Fig. 2: An overview of our research in unfolding-based TAR computation

relation). Denote X = P ∪ T , for a node x ∈ X, paths s · t1... · x1 and s · t2... · x2 starting at the same place •x = {y ∈ X|(y, x) ∈ F }, x• = {y ∈ X|(x, y) ∈ F }. s, and such that t1 6= t2. Finally, x and y are in concurrency relation, denoted by x co y, if neither x < y nor y < x nor Definition 2.2 (Petri net semantics). Let N = (P,T,F ) be a x#y. Petri net, then Definition 2.3 (Occurrence net). An occurrence net is a net • M : P → N is a marking of N, N is the set of non- O = (C,E,G), in which places are called conditions (C), negative integers. M denotes all possible markings of transitions are called events (E), and: N. M(p) denotes the number of tokens in place p. • O is acyclic, or equivalently, the causal relation is a • A transition t is enabled in M, iff ∀p ∈ •t : M(p) ≥ 1. partial order. If t is enabled at M then it can be fired, and then a • ∀c ∈ C : | • c| ≤ 1, and for all x ∈ C ∪ E it holds new marking M 0 is reachable from M by firing t, ¬(x#x) and the set {y ∈ C ∪ E|y < x} is finite. t 0 0 denoted by M −→ M , such that M = (M\• t) ∪ t•. • For an occurrence net O = (C,E,G), Min(O) de- • A net system is a pair S = (N,M0), where N is a net notes the set of minimal elements of C ∪ E with and M0 is the initial marking of N. respect to <. • A firing sequence σ = ht1, ..., tn−1i leads N from t1 t2 The relation between a net system S = (N,M0) with marking M1 to marking Mn such that M1 −→ M2 −→ tn−1 σ N = (P,T,F ) and an occurrence net O = (C,E,G) is M ...M −−−→ M , also denoted as M −→ M . 3 n−1 n 1 n defined as a homomorphism h : C ∪ E 7−→ P ∪ T such We use σ(i) to denote the ith transition in sequence that h(C) ⊆ P and h(E) ⊆ T . σ. A branching process of S = (N,M0) is a tuple π = (O, h) • S(N,M ) M For a net system 0 , is a reachable mark- with O = (C,E,G) being an occurrence net and h being ing if from initial marking M0 there exists a firing σ a homomorphism from O to S. The maximal branching sequence σ such that M0 −→ M. process of S is called unfolding [17]. The unfolding of a net • A net system is bounded if it has a finite number of system can be truncated once all markings of the original reachable markings. net system and all enabled transitions are represented. This • S(N,M0) is 1-safe iff for each reachable marking M, yields the complete finite prefix (abbr. CFP). M(p) ≤1 for every place p, and is free- iff Definition 2.4 (Complete Finite Prefix). Let S = (N,M ) ∀t , t ∈ T : •t ∩ •t 6= ∅ ⇒ •t = •t . 0 1 2 1 2 1 2 be a system and π = (O, h) a branching process with N = (P,T,F ) O = (C,E,G) The unfolding of a Petri net is derived from its related and . 0 Occurrence Net, whose definition is based on the concepts • A set of events E ⊆ E is a configuration, iff ∀e, f ∈ of causal, conflict, and concurrency relations between Petri net E0 : ¬(e#f) and ∀e ∈ E0 : f < e ⇒ f ∈ E0. The nodes [17]. In a Petri net, two nodes x and y are in causal local configuration [e] for an event e ∈ E is defined relation, denoted by x < y, if the net contains a path with at as {x ∈ E|x < e ∨ x = e}. least one arc leading from x to y. Furthermore, x and y are • A set of conditions X ⊆ C is called co-set, iff for all 0 in conflict relation, denoted by x#y, if the net contains two distinct c1, c2 ∈ X it holds c1 co c2. If X is maximal 4 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

T2 P1 P4 P5 T T1 T3 4 P T P0 2 P6 P7 7 P8 P11 T T T 0 5 6 T11 T8 P P 3 T9 9 P T10 10 Fig. 4: Illustration of TAR and pTAR (a) Original Net

Mark([T7-11])= {P1,P3,P8}, thus T7-11 is a cut-off event, and T8-10 is its corresponding event. T2-6 is a cut-off event caused by looping, whose corresponding event is T0-1.

A reachability graph (RG) is prone to encounter state- explosions with concurrent behaviors (because it enumer- ates all possible execution traces), whereas the CFP of a Petri net only ‘outlines’ the net’s behavior and is much more (b) Reachability Graph compact, as illustrated in Fig. 3b and 3c. In this paper, we adopt Esparza’s algorithm [17] for CFP construction, which can generate more compact CFPs T10 -7 T9- 3 compared to the original CFP construction technique. We T2 -6 T4 -9 T11 -12 also refer readers to [17] for more examples on unfolding. T -1 T -2 0 1 T3- 5 T8- 10 Let us now present the definition of Transition Adjacency Relation (TAR) and Projected Transition Adjacency Relation T - 4 5 T6- 8 T7- 11 (pTAR): (c) Corresponding CFP of (a) Definition 2.5 (Transition Adjacency Relation). Let S = (N,M ) be a net system. Let t , t be two transitions of Fig. 3: CFP as a more compact representation for Petri net 0 1 2 S. We say that t , t are in Transition Adjacency Relation, behavior analysis 1 2 denoted as t1

3.1 Reduction of TAR Computation to Coverability non-trivial, because there can be numerous possible firing Problem paths between a pair of transitions. First we show that the computation of TAR can be reduced to the classic coverability problem, that is, checking if a 4 EXPLOITING MORE CFPRELATIONS FOR AC- certain set of places can be ‘covered’ by some reachable CELERATIONS marking of a Petri net. We prove this as follows. Theorem 3.1. For two transitions t1, t2 in a bounded net Although being complete and correct, Algorithm 3.1 in- system: t1

Now we consider the case of (2)t1 • ∩ • t2 6= ∅. To solve this case, we introduce the concept of Max-Event Set. (b) An intuitive illustration of Max-Event Adjacent structure Definition 4.1 (Max-Event Set). Let E0 be a subset of the events in a CFP. We call the set of maximal events in E0 with respect to causal relation as its max-event set, 0 0 denoted as Max(E0) = {e ∈ E0|∀e ∈ E0 : e 6< e }. Given this definition, we have discovered the following results (see [19]).

Lemma 4.1. Let C be a configuration. Let Max(C) = (c) T2 and T4 are both in TAR with T5

{em1 , ..., emk }. Then (1) C\{e} is a configuration if and only if e ∈ Max(C).

(2) [em1 ] ∪ ... ∪ [emk ] = C. Lemma 4.2. Let X be a co-set in a CFP. Let Max(•X) =

{em1 , ..., emk }. Then C = [em1 ] ∪ ... ∪ [emk ] is a configu- ration, and X ⊆ Cut(C), Max(C) = Max(•X). Lemma 4.3. e ∈ Max(C) ⇒ •e ⊆ Cut(C\{e}). (d) T4-5 may ‘re-use’ the MEA structures between T2-4 and T5-6

In case (2), when t1 • ∩ • t2 6= ∅, we capture the intuition Fig. 5: Applying Max-Event Adjacent Structure for TAR of ‘non-blocking’ by introducing the following concept of Derivation Max-Event Adjacent (MEA). Definition 4.2 (Max-Event Adjacent). For two events e1 and e2 in a CFP such that e1 < e2, e1 is said to be Max- Moreover, it is easy to prove that Max([e2]\{e2}) ⊆ Event Adjacent (MEA) with e2, denoted as e1 . e2, iff •(•e2). From e1 . e2, we know e1 ∈ Max(•(•e2)), and 0 e1 ∈ Max(•(•e2)). e1 . e2 can be equivalently defined therefore e1 ∈ Max([e2]\{e2}) = Max(C ). Consequently 00 0 as ∀c ∈ e1• : c < e2 ⇒ c ∈ •e2 (more convenient and C = C \{e1} is also a configuration. And by Lemma 4.3 it 00 efficient for implementation). holds that: (b) •e1 ⊆ Cut(C ). Let the marking of π corresponding to Cut(C00), Cut(C0) Intuitively, e1 . e2 means e1 is directly connected to 0 and Cut(C) be denoted as M1,M1,M2 respectively. By (a) e , but with an additional restriction that no third event e1 e2 2 and (b), it holds that M −→ M 0 −→ M and therefore ‘blocks’ any between them. For example, in Fig. 5b, 1 1 2 h(e ) < h(e ). which shows the CFP of the net in Fig. 5a, we have 1 tar 2  •(•T -5) = {T -1,T -3}, but only T -3 is MEA with T -5. 3 0 2 2 3 However, MEA based TAR checking requires that the T -1 is not MEA with T -5 because T -1 < T -3 and con- 0 3 0 2 whole MEA structure is preserved in the CFP. In Fig. 5d, as sequently T -1 6∈ Max(•(•T -5)) (intuitively, T -3 ‘blocks’ 0 3 2 T -5 is a cut-off event, Thm. 4.2 cannot be directly applied. one of the paths between T -1 and T -5). Likewise, we have 4 0 3 To deal with such cases, we provide the following result: T2-4 .T4-6 and ¬(T1-2 .T4-6). MEA (.) is a sufficient condition to detect TARs in case Theorem 4.3. Let π be the CFP of a bounded net system. Let 0 (2), as we can prove using Lemma 4.1 and Lemma 4.3 that e1 be a cut-off event and e1 its corresponding event of π 0 0 for any bounded Petri net’s CFP: and h(e1•) = h(e1•). Then e1 . e2 ⇒ h(e1)

0 t1 t2 applying Lemma 4.3 we know that Mark([e1]) −→ M1 −→ e1 < eik (eik < e1 does not hold because e1 can be added tk 0 e e < e e ... −→ Mark([e2]\{e2}). From Mark([e1]) = Mark([e1]) before ik ). Similarly, we know that 1 2, because 1 co t1 t2 tk e2 must not hold (otherwise, we would have t1

p < q. As e1 .s e2 it holds that ei1 = e1 and Given the definition of MEA structures, in cases when {h(ei2 ), h(ei3 ), ..., h(ein )} are all silent transitions. By t1 • ∩ • t2 6= ∅, we may quickly confirm t1

(a) T7

Fig. 6: A challenging cut-off event example that MEA rule does not cover

transitions for pTAR computation and avoid unnecessary extensions.

P1

T2-z 5.1 Continued Unfolding After Cut-off Events T1-x TINV-y P2 Definition 5.1 (Local Silent Extension). Let β be the full unfolding of a net system S. The local silent extensions of P3 β with respect to an event e1, denoted as LSE(β, e1), is defined as a set of events {e = (t, B)}, in which for each

Tc-u P1 event e = (t, B), B is a co-set of conditions in β and t is a transitions of S, h(B) = •t such that T2-w T -v P corr 2 (1) ∀c ∈ B :(e1 .s c) ∨ (e1 co c), and ∃c ∈ B : e1 .s c (2) β contains no event e satisfying h(e) = t and •e = B P ... 3 Intuitively, LSE(β, e1) extends the original branching process β by only adding any event that is in silent adja- Fig. 7: The handling of cut-off event ‘chains’ is a major cency relation with e1. By recursively doing this, we get an challenge extended CFP in which we shall find all the silent adjacency structures we need, that is: Definition 5.2 (Silently Continued Branching Process). Let π be the CFP of a net system S and e1 an event in π. The silently continued branching processes after e1, denoted as Algorithm 4.1 Accelerated TAR Derivation (IMPROVED) β ∈ L(e1) are a set of branching processes of S such that: 1: function IMPROVED(S, π) - π ∈ L(e) 2: T AR ← ∅ 0 0 0 - ∀β : β ∈ L(e1) and e = (t, B) ∈ LSE(β , e1), 3: for each e1 ∈ π do β = β0 ∪ {e} ∪ e• ∈ L(e) . 4: for each e2 :(e1 . e2) ∨ (e1 co e2) ∨ (e1 .s e2) do 5: T AR ← T AR ∪ {hh(e1), h(e2)i} 0 Intuitively, silently continued branching processes are 6: for each cut-off event e and its corr. event e do 0 derived by recursively extending the original CFP behind e 7: if h(e•) = h(e •) then 1 0 using LSE. We call the maximal silently continued branch- 8: for each e2 : e . e2 do T AR ← T AR ∪ ing process w.r.t. a cut-off event e1 in CFP its silently contin- {hh(e), h(e2)i} ued unfolding, denoted as βe , which can be constructed with 9: for each t1, t2 ∈ S do 1 Algorithm 5.1. 10: /*we can further restrict to t1, t2 : t1 • ∩ • t2 6= ∅ if we do not compute pTAR*/ Similar to the original unfolding algorithm [17], Algo- rithm 5.1 does not necessarily terminate. This suggest we 11: if hh(e1), h(e2)i 6∈ T AR then need to introduce a new kind of termination condition for 12: Reduce t1

0 Algorithm 5.1 Deriving silently continued unfolding after Consider EM = {e|e1 < e < c, c ∈ B}. If EM = ∅, e1 then in the same manner of (a) we can conclude that both 1: function SILENTLY-CONTINUED-UNFOLDING(S, π, e1) condition (1), (2) and (3) of Def. 4.3 will be satisfied at Line 0 2: β ← π e β 0 e1 3 of Algorithm 5.1, and ik will be added in e if it is not 0 1 3: lse ← LSE(βe1 , e1) already in π.Otherwise, if EM = {e|e1 < e < c, c ∈ B} 4: while lse 6= ∅ do E ⊆ {e0 , ..., e0 } is not empty, then it must hold that M i1 ik−1 . 5: e = (t, B) lse c ∈ Cut(C0 ⊕ {e0 , e0 , ..., e0 }) take an event out of and add it to (This is because 1 i1 ik−1 , and in β (c, e) p t C0 ⊕ {e0 , e0 , ..., e0 } {e0 , ..., e0 } e1 with a condition for every output place of . 1 i1 ik−1 , i1 ik−1 are the only events 6: lse ← LSE(β , e ) C ⊕ {e0 } {h(e0 ), ..., h(e0 )} e1 1 added after 1 .) As i1 ik−1 are all silent 7: endwhile transitions, condition (2) for LSE also holds. 8: end function C0 ⊕ {e0 , e0 , ..., e0 } π If 1 i1 ik−1 are already contained in in e0 lse the beginning, then ik can be added to at Line 3 and eventually into βe0 at Line 5. Otherwise, by our induction 0 0 1 0 0 β, there exists an event e1 : h(e1) = h(e1) in π, such that hypothesis, the events in {ei , ..., ei −1} which are not in π 0 0 0 0 1 k in βe , there is an event e : h(e ) = h(e2) and e . e . β 0 1 2 2 1 2 will be added into e1 by Line 5. Immediately after the last one of such events are added into βe0 at Line 5, condition Proof. Let E = {e : e < e < e }, 1 ≤ k ≤ n 1 ik 1 ik 2 LSE e0 lse (3) for is satisfied, and ik will be added to and and restrict, without loss of generality, that eip < eiq ⇒ eventually into βe0 at Line 5. p < q. Since [e ] is a configuration, by recursively applying 1 2 By (a) and (b), by induction we know that Lemma 4.1, we know that C0 ⊕ {e0 , e0 , ..., e0 } π 1 i1 ik is either contained in , or can be added to β 0 in the loop of Line 4-8. C = [e2]\{e2}\{eik }\{eik−1 }\...\{ei1 }\{e1} e1 is a configuration. Consequently, it is easy to prove C0⊕{e0 , e0 , ..., e0 }⊕{e0 } As we have previously shown, 1 i1 ik 2 that C ⊕ {e1},C ⊕ {e1, ei1 },...,C ⊕ {e1, ei1 , ..., ein } and is also a configuration. By the similar reasoning 0 0 0 0 0 C ⊕ {e1, ei1 , ..., ein , e2} = [e2] are all configurations. C ⊕ {e , e , ..., e } ⊕ {e } in (b), we know that 1 i1 ik 2 β 0 will eventually be contained in e1 , and that By Property 4.1, we know that in π there exists a config- e0 < e0 , {e|e0 < e < e0 } ⊆ {e0 , ..., e0 } 1 2 1 2 i1 ik . As uration C0 such that Mark(C0) = Mark(C) and C0 does not {h(e0 ), ..., h(e0 )} i1 ik are all silent transitions, we have contain any cut-off events. 0 0 e1 .s e2.  As Mark(C0) = Mark(C), ⇑ C0 is isomorphic with 2 0 ⇑ C. Let I1 be the isomorphism between C and C . Let 0 2 0 0 e1 = I1 (e1). Since C ⊕ {e1} is a configuration, C ⊕ {e1} is also a configuration. And as C0 does not contain any cut-off 5.2 Determine When to Stop a Silently Continued Un- 0 0 folding event, C ∪ {e1} is contained in π. Being similar with Esparza’s original unfolding algorithm e0 = I2(i ) (1 ≤ k ≤ n) e0 = I2(e ) C ⊕ Let ik 1 k , and 2 1 2 . Since [17], Algorithm 5.1 does not necessarily terminate. To {e , e , ..., e , e } C0⊕{e0 , e0 , ..., e0 , e0 } 1 i1 ik 2 is a configuration, 1 i1 ik 2 avoid state-explosion and infinite unfolding, we should stop should also be a configuration in β (but not necessarily fully silently continued unfolding when enough information has contained in π). For Algorithm 5.1 , we now prove that C0 ⊕ been unfolded for complete pTAR computation. In order to 0 0 0 0 0 0 {e , e , ..., e , e } β 0 C ∪ {e } ⊆ π ⊆ 1 i1 in 2 are in e . First, because 1 do this, we introduce a new type of cut-off events in silently 0 1 0 β 0 e β 0 {e } e1 , 1 is contained in e1 . For events in ik , we prove by continued unfolding as follows: induction: Definition 5.3. (Local Cut-off Events in Silently Continued (a) When k = 1, because (C ∪ {e1}) ⊕ {ei } is a con- 1 Unfolding) Given two events ea, eb in βe \π, let Ea = figuration, (C ∪ {e0 }) ⊕ {e0 } is also a configuration. By 1 1 i1 {e ∈ [e ]: e 6∈ π}, E = {e ∈ [e ]: e 6∈ π}. We say that e < e e0 < e0 a b b isomorphism, as 1 i1 , it holds that 1 i1 . Therefore 0 0 0 eb is in the set of local cut-off events in silently continued B = •ei ⊆ Cut(C ⊕ {e1}) ⊆ βe0 and ∃c ∈ B : e1 < c 1 1 unfolding βe \π, denoted as cut-offe , if (condition (1) of Def. 4.3). For any c in B, we have 1 1 0 {e|e1 < e < c} = ∅, so condition (2) of Def. 4.3 holds (1) Min(•Ea) ⊆ Min(•Eb), β 0 trivially. Therefore, at Line 3 of Algorithm 5.1, if e1 does (2) h((Ea •\• Ea) ∪ (Min(•Eb)\Min(•Ea))) = h(Eb • e0 = (h(e0 ),B) not already contain i1 i1 , then condition (3) is \• Eb), e0 lse satisfied as well, and i1 will be added to and eventually (3) h(ea) is a silent transition, and β 0 inserted into e1 at Line 5. (4) |[Ea]| < |[Eb]|,. k > 1 C0 ⊕ {e0 , e0 , ..., e0 } (b) For any , assume that 1 i1 ik−1 0 β 0 e π can eventually be contained in e1 . If ik is in , then the Using Def. 5.3, we can truncate a silently continued induction hypothesis holds trivially. Next we consider the unfolding to acquire its ‘complete finite prefix’ (denoted e0 π F in π β case of ik not being . as e1 or e1 ) of e1 . Here, ‘complete’ means any pTAR B = •e C0 ⊕ {e0 , e0 , ..., e0 } ⊕ . β Let ik . By isomorphism, 1 i1 ik−1 detectable with s in e1 can be discovered from the prefix {e0 } B ⊆ Cut(C0 ⊕ . ik is also a configuration, so we have with s as well, and ‘finite’ simply means the size of the {e0 , e0 , ..., e0 }) e < e e0 < e0 1 i1 ik−1 . As 1 ik , it holds that 1 ik and prefix is finite. 0 consequently ∃c ∈ B : e1 < c. Therefore condition (1) of Given the definition of our new type of cut-off events, Def. 4.3 holds. the algorithm for constructing the ‘complete finite prefix’ 10 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018 Algorithm 5.2 Finite Prefix of Continued Unfolding for eik ∈ Max([e2]\{e2, ein , ..., eik+1 })(k ≥ 1). In other pTAR Derivation words, from M1 = Mark[e1], there exists a firing h(e ) h(e ) h(e ) 1: function FINITEPREFIXFORPTAR(S, π, e1) i1 i2 in sequence M1 −−−−→ M2 −−−−→ ... −−−−→ Mn+1, 2: F ine1 ← π and h(e2) is enabled at Mn+1. As ei1 = e1 and 3: lse ← LSE(π, e1) {h(ei2 ), h(ei3 ), ..., h(ein )} are all silent transitions, we 4: while lse 6= ∅ do have h(e1) = t1

13: lse ← LSE(F ine1 , e1) distinct h(B) are also finite. Let n be the number of different 14: end if h(B). 15: endwhile Given an event e of F ine1 \π, define the depth of e as the 16: return pTAR length of a longest chain of events g1 < g2 < ... < e such 17: end function that g1, g2, ..., e are all in F ine1 \π; the depth of e is denoted by d(e). Moreover, the following results hold.

with respect to the silently continued unfolding of an event (1) For every event e of F ine1 \π, d(e) ≤ n + 1, where n e1 can be provided as follows. is the number of different h(B). For every chain of events g1 < g2 < ... < gn+1, let Ei = Theorem 5.2. For a pair of transitions t1, t2 in a net system S, [gi]\π, and we know that i < j ⇒ Min(•Ei) ⊆ Min(•Ej). t1

σ = ht1, ti1 , ti2 , ..., tin , t2i, where ti1 , ti2 , ..., tin are silent Min(•En+1)\Min(•Ej) from both sides, we get h((Ei • t t t t1 i1 i2 in t2 \• Ei) ∪ (Min(•Ej)\Min(•Ei))) = h(Ej •\• Ej). Since tasks, i.e. Ms −→ M1 −−→ M2 −−→...−−→ Mn+1 −→ Mn+2. g < g E ⊂ E |E | < |E | In the full unfolding Unf of the net system S, there is a con- i j, we have i j and therefore i j , so if h(gi) is a silent transition, then gj should be recognized as a figuration C in β for which it holds that Cut(C) = Ms. Con- local cut-off event in βe1 . Should Algorithm 5.2 generate gj, sequently, a sequence of events E = he1, ei1 , ei2 , ..., ein , e2i: then it has generated gi before and gj is recognized as a local h(eik ) = tik (1 ≤ k ≤ n), h(e1) = t1, h(e2) = t2 can be cut-off event of F ing1 , too. If h(gi) is not a silent transition, added to C. For each event in {eik } that occurs after e1, then Algorithm 5.2 will not generate gi+1, ..., gn+1 if it has because C ∪ E is a configuration, we have ¬(e1#eik ), and generated gi before, because as e1 < gi, e1 .s gi+1 cannot therefore it holds either e1 co ei or e1 < ei (ei < e1 k k k hold. does not hold because e1 can be added before eik ). Similarly, (2) For every event e of F ine , the sets •e and e• are we know that e1 < e2, because e1 co e2 must not hold 1 finite. And for every k ≥ 0, F ine contains only finitely (otherwise, we would have t1

(⇐) Let E1 = {eik ∈ [e2]|e1 ≤ eik < Theorem 5.4. F in e . e e2}={ei1 , ei2 , ..., ein , n = |E1|}, where eip < eiq ⇒ e1 is complete in the sense that if 1 s 2 β e ∈ F in p < q. As e1 .s e2 it holds that ei1 = e1 and in the full unfolding , then 2 e1 . {h(e ), h(e ), ..., h(e )} are all silent transitions. By i2 i3 in Proof. We denote the original net system, its full unfold- [e ]\{e } Lemma X and Lemma Z, 2 2 is a configuration, and ing and complete finite prefix as S, βŁπ respectively. For a h(e2) is enabled at Mn+1 = Mark([e2]\{e2}). pair of events ea, eb ∈ β : ea .s eb, by Thm. 5.1 we have Consider ein . It holds that ∀e ∈ [e2]\{e2}, ein 6< e, ∃e1 ∈ π, e2 ∈ βe1 : h(e1) = h(ea), h(e2) = h(eb), e1 .s e2, otherwise it contradicts with eip < eiq ⇒ p < q. Therefore in which βe1 is the silently continued unfolding after e1. If ein ∈ Max([e2]\{e2}). Again, by Lemma X and Lemma [e2] is not already contained by πe1 , it must contain some Z, [e2]\{e2, ein } is a configuration, and h(ein ) is enabled cut-off event ec ⊆cut-offe1 in βe1 . For such a cut-off event h(ein ) 0 at Mn = Mark([e2]\{e2, ein } and Mn −−−−→ Mn+1. ec, let Ec = [ec]\π, E1 = [e2]\π\Ec. Furthermore let ec 0 0 By recursively applying the above process, we have be the corresponding event of ec, Ec = {[ec]\π}. Obviously, PEI et al.: EFFICIENT TRANSITION ADJACENCY RELATION COMPUTATION FOR PROCESS MODEL SIMILARITY 11

2 0 there is an isomorphic mapping I1 between ⇑ [ec] and ⇑ [ec] silent continued unfoldings might overlap with existing 2 such that I1 (E2\[ec]) ⊆ βe1 , in which there is an event structures inside the original CFP. These overlapped parts 0 2 0 0 e2 = I1 (e2) and e1 .s e2, h(e2) = h(e2). do not offer new information for pTAR derivation, and incur According to the condition (4) of Def. 5.3, we have serious amount of computational costs (see the following 0 0 |Ec| = |[ec]\π| < |[ec]\π| = |Ec|. On the other hand, for figure). 0 0 0 2 E2 = [e2]\π = Ec ∪ E1 and E2 = [e2]\π ⊆ Ec ∪ I1 (E1), we In such cases, we may re-use existing structures in CFP 0 0 0 have |E2| < |E2| because |Ec| < |Ec|. If [e2] is still not fully as an isomorphism of the behavior after cut-off events. Next, contained in πe1 , then we may repeat the above procedure we show that this can be realized by redirecting the post-arcs 00 00 0 00 and find e2 : h(e2 ) = h(e2) = h(e2), e1 .s e2 in βe1 (noticing of some cut-off events. The basic purpose is to ‘link’ the cut- 00 00 condition (3) of Def. 5.3), and for E2 = [e2 ]\π we still off events back to the existing nodes in the CFP. Actually, we 00 0 have |E2 | < |E2| < |E2|. Since |E| ≥ 0, this procedure may do this when the cut-off events have identical ‘context’ can not be repeated infinitely and must terminate at some with its corresponding event, as the following definition event et such that e1 .s et, h(et) = h(e2),Et = [et]\π is shows. already contained in πe1 . Therefore in Algorithm 5.2, πe1 is Definition 5.5 (Contextual Isomorphism and Redirected complete, i.e. for some event e2 in βe1 , if e1 .s e2, it must CFP). In an unfolding prefix π, let e be a cut-off event 0 0 hold that ∃et ∈ πe1 : e1 .s et, h(et) = h(e2)  and e its corresponding event such that e is not a cut- off event. Let γ : e• → Cut([e0]) denote an injective mapping from e• to the conditions in Cut([e0]). If there 5.3 Prevent Unnecessary Extensions exists a mapping γ such that Based on Thm. 5.4, by building finite prefixes of the silently (1) h(c) = h(γ(c)), c ∈ e•, and continued unfoldings after each event in the original CFP, (2) ∀c ∈ e•, c0 6∈ (e • ∪γ(e•)) : c0 co c ⇔ c0 co γ(c). we can derive all pTARs using only silent adjacency and 0 co relations. We use the following definition to denote the we say that e is contextually isomorphic with e under the γ 0 ‘sum’ of all silently continued CFPs. mapping γ (denoted as e ∼ e ). Definition 5.4 (Merge of Silently Extended CFPs). Let E = Note that the above properties are not implied by the {ei} be the set of events of a complete finite prefix. Then definition of cut-off events, which only guarantees Cut([e]) the merged CFP of the locally extended CFPs after these and Cut([e0]) are mapped to identical set of places. How- unfinished events is denoted as: ever, we notice they can be applied for most of the cut-off [ µ(π) = F in . events in the practical models. ei When a cut-off event e is contextually isomorphic with e ∈E i some e0, we can ‘link’ e towards the conditions in Cut([e0]) By the above results, it is easy to see that every pair according to the γ mapping so as to reuse the existing struc- of transitions in pTAR correspond to at least one pair of ture after Cut([e0]). For every condition c in e•, we redirect events in µ(π) that are in silent adjacency. And therefore the the following e from their original target condition c to pTAR computation algorithm based on silently continued γ(c), and we call these arcs redirected arcs. After that, we unfolding can be given as Algorithm 5.3. remove the original conditions in e• from the CFP. Note that the redirected arcs introduced should not be taken as Algorithm 5.3 Deriving all pTARs based on silently contin- the same as the original arcs in the unfolding, otherwise, ued unfolding they will disrupt the restriction of branching process that 1: function UNFOLDLOCAL(S, π) | • c| ≤ 1, and it may also disrupt the acyclic property 2: pT AR ← ∅ of the original causal relation ‘<’. Therefore the redirected 3: Extended ← ∅ arcs are used separately for pTAR derivation only. We use 4: for each e1, e2 ∈ π : e1 .s e2 do pre(c) to denote the set of starting events whose post-arcs 5: pT AR ← pT AR ∪ hh(e1), h(e2)i are redirected to a condition c, which is defined as: γ 6: Cinit ← cut-off events of π pre(c) = {e ∈ π|∃c0 ∈ e•, e0 ∈ π : e ∼ e0 ∧ γ(c0) = c} 7: for each ec ∈ Cinit do 8: for each e1 ∈ π : e1 .s ec do Intuitively, we could call the events in pre(c), whose 9: if e1 6∈ Extended then post-arcs are redirected, as ‘soft’ cut-off events. We denote 10: Extended ← Extended ∪ {e1} the redirected CFP of the original CFP π as (π\Ec•, pre),

11: F ine1 = where Ec is the set of ‘soft’ cut-off events and pre indicates FINITEPREFIXFORPTAR(S, π, e1) the redirected arc relations.

12: for each eb ∈ F ine1 \π : ea .s eb do In the following example, we could see that the trun- 13: pT AR ← pT AR ∪ hh(ea), h(eb)i cated behaviors after cut-off event T2-4 and the two silent

14: π ← π ∪ F ine1 events are consecutively continued after the conditions 15: end if which redirected arcs point to. We call such events soft cut- 16: return pT AR off events because we can find their succeeding behaviors by 17: end function contextual isomorphism without performing further local extensions. However, the above algorithm is only a baseline result, Note that contextual isomorphism differs from the orig- and can be further improved. In fact, we notice that the inal isomorphism, i.e. the property mentioned and used 12 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

INV_3-8

TT_33--7 INV_2-6

TT22--5 T0 TT00--01 TT33--14

T1 T2 T3 T1-2 INV_2-4 Original Original Net CFP TT33--99

Local Silent T2-3 Extensions INV_3-12

Locally Extended Full Unfolding CFP

Fig. 8: Merged continued CFPs can be as large as the reachability graph.

by McMillan and Esparza in [18] and [17]. The previous Algorithm 5.4 Derive All pTARs from a Net System Using isomorphism cannot guarantee the partial silent adjacency Redirected CFPs structures starting from e can be immediately continued 1: function DERIVEPTARS(S, π0) after ⇑ [e0], since other cut-off events in ⇑ [e0] might have 2: pT AR ← ∅ prevented the unfolding of corresponding continued silent 3: Extended ← ∅ adjacency structures in the original CFP. 4: π ← π0 Compared to the previous isomorphism [17], contex- 5: for each cut-off event ec ∈ π0 do 0 0 tual isomorphism puts additional constraints on the co-sets 6: if 6 ∃e : e ∼ e then hard-cut-off← hard-cut-off 0 ‘near’ the cut-off events. With these additional constraints, it ∪{e } 0 0 ensures that we can look for the isomorphic silent adjacency 7: else for each ec ∈ π : ec ∼ ec do structures. Thus, less local extensions are required. 8: for each c ∈ ec• do π ← π∪{hec, γ(c)i} 9: for each e , e ∈ π : e . e do Definition 5.6 (Redirected Silent Adjacency). For an ordered 1 2 1 s 2 10: pT AR ← pT AR ∪ hh(e1), h(e2)i pair of nodes hn1, n2i in a redirected CFP, they are in 0 11: for each ec ∈ hard-cut-off do redirected silent adjacency, denoted as n1 . n2, if and s 12: for each e ∈ π : e . e do only if one of the following conditions holds: 1 1 s c 13: if e1 6∈ Extended then (1) n1 ∈ •n2, or 14: Extended ← Extended ∪ {e1} 0 F in = (2) n2 is a condition, ∃e ∈ •n2 ∪ pre(n2): n1 .s e and 15: e1 h(e) is a silent transition, or FINITEPREFIXFORPTAR(S, π, e1) 0 e , e : e ∈ F in \π, e .0 e (3) n2 is an event, and ∀c ∈ •n2 : n1 .s c ∨ n1 co c. 16: for each a b b e1 a s b do 0 It is easy to see that the above definition of .s is equiv- 17: pT AR ← pT AR ∪ hh(ea), h(eb)i alent with the original .s within the original CFP π. The 18: π ← π ∪ F ine1 major difference lies in (2), for not only the original causal 19: end if relations but also the redirected arcs represented by pre(c) 20: return pT AR are taken into account while backtracking. In redirected 21: end function CFPs, we need only the .s in Algorithm 5.1, 5.2 in 0 the last section with .s and we can get the pTAR derivation algorithms for redirected CFPs. So a pTAR computing algo- (1) Consider the set of events Eext = [e2]\π and its rithm using redirected CFP can now be given as Algorithm minimal conditions Cmin = Min(•Ec). Without loss of

5.4, for which the original net S its CFP π0 are assumed to generality, let Eext = {em1 , em2 , ..., emk , e2}, where emi < 0 be given as input. emj → i < j. Since Cmin ⊆ π\Ec•, then since .s ⊆ .s, we have Eext ⊆ µ(π\Ec•) and therefore the proposition holds. Theorem 5.5. For two transitions t1, t2 t1

T3 T6 T0 T7 T1 T5

T2 T4

(a) A net with multiple cut-off events

T4-7 T5-9 T1-2 T7-10

T2-4 T6-8 T0-1

T3-3

(b) Behaviors after ‘soft’ cut-off events can be consecutively continued.

Fig. 9: Illustration of ‘soft’ cut-off events

e .0 e0 E0 C E π0 is easy to prove that 1 s mi . If ext does not contain any events of min not included in M are redirected in , the 0 0 cut-off event, then it will correspond to the full sequence of new conditions they link to still form a co-set Cmin in π that 0 Eext, and the proposition already holds. Otherwise, with- is isomorphic with Cmin and ∀c ∈ Cmin : e1 . c. Starting E0 = {e0 , ..., e0 } C0 e out loss of generality, suppose ext m1 mp . Let from min, we can find an event 2 either through local 00 0 0 0 00 0 Cmin = (Cmin\• Eext) ∪ Eext•. If Cmin ⊆ (π\Ec•), then silent extensions in µ(π ) (by Proposition X) or as existing 00 0 0 0 0 by the same reasoning in Thm. 5.4 and (1), after Cmin, a event in π , such that e2 ∈ µ(π ) and e1 .s e2. {e0 , ..., e0 }, h(e0 ) = t sequence with the form of mp+1 2 2 2, can (2) Suppose some emi in EM ∩ π has been redirected 0 be found by performing local silent extensions such that in π . Then since either emi < emi+1 or emi co emi+1 , by 0 0 0 00 e1 .s e2, e2 ∈ µ(π\Ec•). If Cmin 6⊆ (π\Ec•), we could iterate property (2) of Def. X, we know that in the redirect CFP C00 E π0 e0 : h(e0 ) = h(e ) (2) again from min. Since ext is finite, the iteration cannot , some event mi+1 mi+1 mi+1 and obviously be repeated infinitely, so we will eventually arrive at some e . e0 e0 1 s mi+1 . If mi+1 is not a cut-off event, then we can find in 0 0 0 0 e ∈ µ(π\Ec•). π e e e . e 2 an mi+2 which is isomorphic with mi+2 and 1 s mi+2 . Let Ep = {e|e ∈ •c, c ∈ Cmin}. Since ∀c ∈ Cmin : e1 .s c, This can be iterated until we reach some event isomorphic e e0 we know that ∀ep ∈ Ep : e1 .s c. with 2, which proves the proposition, or some mj that is a e0 Let Er ⊆ •Cmin be the set of events that are redirected cut-off event. If mj is not redirected, then by the property 0 in the pre-events of the conditions in Cmin. Given property of contextual equivalence and Prop. X, an event e2 after e0 (2) of context equivalence, even if some pre-events of Cmin mj can be discovered through local silent extensions such 0 e0 ∈ µ(π0) e . e0 e0 not included in EM are redirected in π , the new conditions that 2 and 1 s 2. If mj is redirected, then we 0 0 they link to still form a co-set Cmin in π that is isomorphic can iterate the process of (2). This iteration cannot continue 0 0 with Cmin and ∀c ∈ Cmin : e1 . c. Starting from Cmin, we infinitely because EM is finite, and eventually we will arrive 0 0 0 0 can find an event e2 either through local silent extensions in at ∃e2 ∈ µ(π ) and e1 .s e2, h(e1) = t1 and h(e2) = t2. 0 0 0 0 µ(π ) (by Proposition X) or as existing event in π , such that ⇐) Given that e1 .s e2, consider the co-set B0 = •e2. As 0 0 0 0 e2 ∈ µ(π ) and e1 .s e2 ∀c ∈ •e2 : c co e1 or e1 .s c, let B0co = {c ∈ B0 co e1}, 0 Since only the events in EM ∩ π can be redirected, and B0prec = {c ∈ B0|e1 .s c}. Obviously B0 = B0co ∪ B0prec . 0 0 0 only the events in EM \π would affected, the case of EM ⊆ Take any condition c from B0prec , since e1 .s c , it follows 0 0 0 π is trivial. We now consider the case when EM \π 6= ∅. that ∃ek ∈ •c ∪ prec(c ): e1 .s ek} and h(ek) is a silent (1) If none of the events in EM ∩π has been redirected in transition. 0 0 π . Consider the set of conditions Cmin = Min(•(EM \π)) If ek < c , then let B1 = (B0\ek•) ∪ •ek. We have ek 0 in the original π, and from e1.se2 we have ∀c ∈ Cmin : e1.c. co B0\ek• (obviously ek < c cannot hold, and for any 0 Given property (2) of context equivalence, even if some pre- c ∈ B0\ek•, if ek#c ∨ c < ek, it will lead to a contradiction 14 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

with the fact that B0 is a co-set). We can prove that •ek Core I5 [email protected], 4G DDR3@1333MHz, and Windows 7 00 0 are in co with B0\ek• (for any c ∈ •e, c ∈ B0\ek•, Enterprise OS. For simplicity, we focus on the TAR com- any one of c00#c0, c00 < c0 or c0 < c00 would lead to the puting performance in this experiment, as many existing 0 contradiction with the fact that ek co c ) and consequently algorithms do not support pTAR computation. But still, = (B0\ek•) ∪ •ek is a co-set. this could demonstrate the performance improvement of 0 −1 If ek ∈ prec(c ) then let B1 = (B0\pre (ek)) ∪ •ek. our accelerated method (IMPROVE) against the baseline In the original π, by the property (2) of Def. 5.5, as ek• coverability based method (GENERAL) and other existing −1 have isomorphic context with pre (ek), we can simi- methods, since clearly the improvement in pTAR computing −1 larly prove that ek are in co with B0\pre (ek). Conse- performance, if there is any, will only be greater than that of −1 quently, •ek are in co with B0\pre (ek) and therefore TAR. −1 B1 = (B0\pre (ek)) ∪ •ek is a co-set. In both cases B1 6.1.1 Real-world Data is a co-set Similarly, we could have B1co = {c ∈ B1 co e1}, 0 B1prec = {c ∈ B1|e1 .s c}. Table 2 summarizes the features of our experiment datasets In both cases, we could generate from B0 a new co- [20] consisting of 5 libraries, from which we extract all the set B1 such that ek is enabled at some reachable marking bounded models for evaluation. The table also shows the Mk that contains h(B1), and the firing of h(ek) from that degree of concurrency found in the process models, i.e. reachable marking will lead to a reachable marking Mk+1 the maximum and average number of tokens that occur that contains B0 and therefore enables h(e2), which gives in a single reachable non-error state of the process. Since h(ek) RG-based algorithms may encounter state-explosions when Mk −−−→ Mk+1, and h(e2) is enabled at Mk+1. 0 0 applied on nets with high concurrencies, we count the target Again, since e1 .s ek, we have ∀c ∈ •ek : c co e1 or e1 .s c. net as “Time Limit Exceeded for RG” if it fails to construct Therefore, as B1\B0 = •ek, we have ∀c ∈ •B1 : c co e1 or 0 a RG within acceptable limit, as shown in the last row e1 .s c. Again, we can partition B1 as B1co = {c ∈ B1 co e1} 0 of the table. and B1prec = {c ∈ B1|e1 .s c} (B1 = B1co ∪ B1prec ). Take First, we compare the time costs (see Fig. 10a) of our any condition c from B1 , and we can repeat the above prec algorithm GENERAL (Algorithm 3.1) which is based on iterations to find the next event ek−1 such that h(ek−1) is a h(ek−1) h(ek) coverability reduction against Jin’s approach (JIN) based silent transition and Mk−1 −−−−−→ Mk −−−→ Mk+1. Since on the primitive unfolding derivation rules in [8] (which µ(π) is finite, we could arrive with finite steps at e1 and does not support arbitrary bounded nets since it assumes h(e1) h(ek) M1 −−−→ ...Mk −−−→ Mk+1 and h(e2) is enabled at Mk+1. free-choice WF-nets), and two RG-based algorithms (pure Therefore we have h(e1)

and emi < emj → i < j, we can easily prove that Bk−1 = of unfoldings alone (CFP-BUILD) in each library are also 0 (Bk\Bk) ∪ •emk is a co-set, and h(emk ) is ‘enabled’ at any shown in Fig. 10a. It can be observed that as nets in B2, 0 reachable marking Mk containing h(Bk−1). And since Bk ⊆ B3 and B4 contain more concurrent structures, GENERAL emk •, the firing h(emk ) performs significantly better than all RG-based algorithms 0 In case (2) let B0 = {c ∈ B0|emk ∈ pre(c)}. By while only being a bit slower than RG in A2, M1, which property (2) of Def. X, we can also easily prove that can be explained by the lacking of concurrent structures 0 Bk−1 = (Bk\Bk) ∪ •emk is a co-set, and h(ek) is ‘enabled’ in these two libraries. By comparing their overall time cost at any reachable marking containing h(Bk−1). (TOTAL), we observe that GENERAL is around 11.98 times  and 9.78 times faster than RG and ZHA respectively. We also see that whereas JIN only supports sound and free-choice In Algorithm 5.4, since we can re-use existing CFP nets, our GENERAL achieves a much greater generality structures to observe behaviors after soft cut-off events, (supports any bounded net system) at the expense of only a significant amount of computational costs can be reduced, bit more overhead and shows comparable performance with as will be demonstrated with detailed performance analysis JIN (only 1.1% more time cost in total). in the experiment section. Furthermore, we observe the acceleration effect of the Algorithm 4.1 (IMPROVED), which exploits the relations in CFP further than GENERAL (Fig. 10b). We compute the 6 PERFORMANCE EVALUATION overhead of IMPROVED and GENERAL, i.e., their total 6.1 TAR Computing Performance TAR derivation time cost minus CFP-BUILD time, which costs the same for both algorithms. In Fig. 10b, the total We conduct experiments to evaluate the effectiveness of overheads for GENERAL to derive TAR from CFPs are our proposed CFP-based algorithms using both industrial around 7 times larger than IMPROVED, which indicates (see [20] for further details) and artificial process model that exploiting more CFP relations results in significant datasets. We have implemented our algorithms based on acceleration. the open source BPM tool jBPT1 and all our programs and the process model datasets have been made available on- 6.1.2 Scalability Evaluation line2. All experiments were run on a PC with Intel Dual We compare the scalability of GENERAL, IMPROVED with 1. http://code.google.com/p/jbpt existing JIN, ZHA and RG on artificial Petri nets with 2. http://laudms.thss.tsinghua.edu.cn/trac/Test/raw- growing concurrency structures. In Test A, each net is a attachment/wiki/Share/CTAR. single parallel structure with exactly one transition on each PEI et al.: EFFICIENT TRANSITION ADJACENCY RELATION COMPUTATION FOR PROCESS MODEL SIMILARITY 15 TABLE 1: Features of the five process model libraries used in experiments

A2 B2 B3 B4 M1 Num. of Bounded Nets 256 392 338 272 27 Avg. Transitions/Places 28/44 29/43 27/41 28/41 48/45 Avg./Max Concurrency 2/13 8/14 16/66 14/33 2/4 Time-Limit Exceeded for RG 0 39 36 24 3

7 9x10 CFP-BUILD 7 8x10 GENERAL 7x107 IMPROVED JIN 7 6x10 ZHA 5x107 RG 4x107 3x107 2x107

TimeCost (nano-sec) 1x107 0 0 5 10 15 20 25 30 35 Num. of Concurrent Branches (a) Test-A: Breadth-Growth (a) Time cost comparison on bounded nets 7 9x10 CFP-BUILD 7 8x10 GENERAL 14000 Overhead-of-IMPROVED 7x107 IMPROVED JIN Overhead-of-GENERAL 7 12000 6x10 ZHA 5x107 RG 10000 4x107 8000 3x107 2x107 6000 TimeCost (nano-sec) 1x107 4000 0 0 2 4 6 8 10 12 Num. Transitions on Each Branch Overhead Time Cost (msec) Overhead Time 2000 (b) Test-B: Depth-Growth 0 A2 B2 B3 B4 M1 Total Libraries Fig. 11: Scalability evaluation of TAR derivation algorithms (b) Overhead of GENERAL & IMPROVED

Fig. 10: Efficiency evaluation of proposed TAR derivation of silent tasks. These data-sets are prepared by marking algorithms the same IBM industrial process data-set [20] with varying percentages of silent transitions. We prepare data-sets with five levels of silent task percentage, namely, 10, 30, 50, 70 and branch, and we observe how the increase in the number of 90 percent. Fig. 12a shows the total time cost to derive pTAR concurrent branches affects TAR derivation time. In Test B, from these data-sets under varying percentages of silent we generate concurrent branches of a fixed number (i.e. 5) transitions, using the three algorithms we have discussed. and observe the effect of the number of transitions on each In the next figures, RG stands for the original reachability- branch on the performance of algorithms. graph-based pTAR computation algorithm [2], CU and RCU In both tests, CFP-based algorithms (GENERAL, IM- denote our algorithm based on locally continued unfolding PROVED, JIN) scale much better than RG-based algorithms and its accelerated version using redirected CFP. Fig. 12b, (RG, ZHA). But JIN does not support non-free-choice nets 12c and 12d shows the detailed time cost spent on each or pTAR whereas GENERAL and IMPROVED do. Among library by the three algorithms respectively. CFP-based algorithms, IMPROVED scales the best with As can be observed in the above figures, RCU performs the growing size of concurrency structures, which is more the best among all three algorithms we have tested (RG, efficient than GENERAL in both tests and has comparable CU and its accelerated version RCU). Moreover, we could or even better performance than JIN. see that by using the redirected CFP and avoid unnecessary local silent extensions, the time cost of RCU is significantly less than CU. 6.2 pTAR Computing Performance To meet their needs for analysis, users may treat various sets 7 COMBINING TAR AND BP FOR SIMILARITY- of transitions as ‘projected’ transitions (and the rest silent transitions). Therefore, the percentage of silent transitions BASED PROCESS MODEL QUERYING may differ among scenarios. To observe the performance The implications of the above results are not only about of pTAR derivation algorithms under this diversity, we the improved efficiency of TAR computation. As Behavior run our algorithms on data-sets with different percentages Profiles (BP) are also efficiently computable through Petri 16 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

net unfolding [4], we can easily enhance BP information with additional TAR information by re-using the informa- tion contained in the unfolding structure (and vice versa). Therefore, the long-expected combined usage of TAR and BP for process model similarity estimation [6] could now be 700 Total-RCU realized with low overhead because of the above efficient 600 Total-CU algorithms. Total-RG 500 In Sect. 1, we have demonstrated the advantage of TAR 400 over BP in terms of the discriminative power for various process model behaviors (i.e. loops, parallel branches and 300 silent transitions). In this section, we will conduct exper- 200

TimeCost(sec) iments using industrial benchmark datasets to further il- 100 lustrate the implications of our proposal for process model 0 similarity estimation. 10 30 50 70 90 Percentage of Transitions Being Silent Our evaluation is based on the scenario of process model (a) Total Time Cost of Each Method retrieval. The problem of process model retrieval is defined as: given some query process model p and a repository of process models C, find in C process models that are 300 RCU-b4 RCU-a2 most similar with p, and rank them in order of similarity. 250 RCU-b3 RCU-b2 For benchmark dataset, we used the SAP reference model 200 RCU-m1 repository as in [5], which contains 100 process models in 150 total, among which there are 10 query models. The dataset comes with ground-truth tags provided by experts, suggest- 100 ing which models are similar to which query models by TimeCost(sec) 50 boolean values. For the similarity estimation between tasks 0 with different activity descriptions, we used the edit 10 30 50 70 90 distance threshold as in [5] to determine whether two tasks Percentage of Transitions Being Silent (b) Detailed Performance of RCU with different descriptions points to the same activity. For similarity estimation, we use the classic definition of Jaccard coefficient, by which similarity from the perspective 300 CU-b4 CU-a2 of TAR and BP can be given respectively as follows. 250 CU-b3 CU-b2 Definition 7.1 (TAR Similarity). For net system N1 = CU-m1 200 (P1,T1,F1) and N2 = (P2,T2,F2) (with initial markings 150 of M1,M2 respectively), let TS1 and TS2 be the TAR set of the two nets, then the TAR similarity between N1 and 100 N TimeCost(sec) 2 can be given as follows: 50 |TS1 ∩ TS2| 0 simT ((N1,M1), (N2,M2)) = 10 30 50 70 90 |TS1 ∪ TS2| Percentage of Transitions Being Silent (c) Detailed Performance of CU Definition 7.2 (BP Similarity). For net system N1 = (P1,T1,F1) and N2 = (P2,T2,F2)(with initial markings 300 RG-b4 of M1,M2 respectively), B1 and B2 be the BP set of the RG-a2 250 RG-b3 two nets, then the BP similarity between N1 and N2 can RG-b2 be given as follows: 200 RG-m1 |B1 ∩ B2| 150 simBP ((N1,M1), (N2,M2)) = |B1 ∪ B2| 100 TimeCost(sec) 50 Consequently, the ranking functions for process models C p ∈ P 0 in given a query model can be given as follows 10 30 50 70 90 Percentage of Transitions Being Silent Definition 7.3 (TAR Ranking). For some query model pp ∈ (d) Detailed Performance of RG P and some target model pc in repository C, the TAR ranking of pc is given as Fig. 12: pTAR time cost under varying percentage of silent 0 0 transitions RT (pp, pc) = |{pc ∈ C|simT (pp, pc) ≥ simT (pp, pc)}|

Definition 7.4 (BP Ranking). Similarly, for some query model pp ∈ P and some target model pc in repository C, the BP ranking of pc is given as 0 0 RBP (pp, pc) = |{pc ∈ C|simBP (pp, pc) ≥ simBP (pp, pc)}| PEI et al.: EFFICIENT TRANSITION ADJACENCY RELATION COMPUTATION FOR PROCESS MODEL SIMILARITY 17 1 BP Only (RBP) Enterprise Modelling and Information Systems Architec- TAR Only (RT) tures (EMISA), 2012, pp. 151–164. 0.8 Combined (RTBP) [3] M. Weidlich, J. Mendling, and M. Weske. “Efficient 0.6 consistency measurement based on behavioral profiles of process models,” IEEE Transactions on Software Engi- 0.4

Precision neering, vol. 37, no. 3, pp. 410–429, 2011. [4] M. Weidlich, F. Elliger, and M. Weske. “Generalised 0.2 computation of behavioural profiles based on petri- 0 net unfoldings,” in Proceedings of the 7th International 0 0.2 0.4 0.6 0.8 1 Workshop on Web Services and Formal Methods (WS-FM), Recall 2010, pp. 101–115. Fig. 13: Recall-precision performance of different similarity [5] R. Dijkman, M. Dumas, B. F. van Dongen, R. Ka¨arik,¨ measures and J. Mendling. “Similarity of business process models: Metrics and evaluation,” Information Systems, vol. 36, no. 2, pp. 498–516, 2011. We may use the product of TAR ranking and BP ranking [6] M. Becker, and R. Laue. ”A comparative survey of to combine their influence to the similarity ranking result as business process similarity measures,” Computers in follows Industry, vol. 63, no. 2, pp. 148–167, 2012. Definition 7.5 (TAR-BP Ranking). For some query model [7] M. Dumas, L. Garcia-Banuelos, and R. Dijkman. “Sim- pp ∈ P and some target model in pc ∈ C, the TAR-BP ilarity Search of Business Process Models,” IEEE Data ranking of pc can be given as Engineering Bulletin, vol. 32, no. 3, pp. 23–28, 2009. R (p , p ) = |{p0 ∈ C|R (p , p0 ) · R (p , p0 ) ≤ [8] T. Jin, J. Wang, and L. Wen. “Efficient retrieval of similar TBP p c c T p c BP p c workflow models based on behavior,” in Proceedings RT (pp, pc) · RBP (pp, pc)}| of the 14th Asia-Pacific Web Conference (APWeb), 2012, pp. 677–684. With the above similarity ranking functions, we evaluate [9] L. Wen, J. Wang, W. M. P. van der Aalst, B. Huang, and the process model retrieval performance when (1) only TAR J. Sun. “Mining process models with prime invisible (2) only BP and (2) both TAR and BP are considered. In Fig. tasks,” Data & Knowledge Engineering, vol. 69, no. 10, 13 we compared the precision-recall performance of these pp. 999–1021, 2010. three cases. It can be observed that the combined usage [10] W. M. P. van der Aalst. Process Mining: Data Science of TAR and BP could improve the overall performance of in Action. Springer, 2016. process model retrieval, which agrees with our previous [11] A. Rozinat, and W. M. P. van der Aalst. “Conformance discussions in Sect. 1. checking of processes based on monitoring real behav- ior,” Information Systems, vol. 33, no. 1, pp. 64–95, 2008. 8 CONCLUSION [12] W. M. P. van der Aalst, and T. Basten. “Inheritance of In this article we proposed comprehensive strategies for workflows: an approach to tackling problems related to the computation of TAR and pTAR based on Petri net change,” Theoretical Computer Science, vol. 270, no. 1–2, unfoldings. These strategies attempt to exploit as much pp. 125–203, 2002. information in unfoldings as possible, by translating causal [13] A. Polyvyanyy, M. Weidlich, R. Conforti, M. La Rosa, patterns and co-relation patterns into TAR/pTAR results. A. H. M. ter Hofstede. “The 4C Spectrum of Funda- The fundamental challenge of unfolding based TAR/pTAR mental Behavioral Relations for Concurrent Systems,” computation is the handling of cut-off events, which may in Proceedings of the 35th International Conference on truncate information needed for TAR/pTAR confirmation. Applications and Theory of Petri Nets and Concurrency Novel techniques to derive locally continued unfolding and (Petri Nets), 2014, pp. 210–232. redirected CFP for efficient recovery of necessary informa- [14] J. Esparza, and C. Schroeter. “Unfolding Based Al- tion after cut-off events are proposed, proved and evaluated gorithms for the Reachability Problem,” Fundamenta in this article. Experiments show that the proposed strate- Informaticae, vol. 47, no. 3–4, pp. 231–245, 2001. gies achieve significant performance improvement over [15] J. Pei, L. Wen, X. Ye, A. Kumar, and Z. Lin. “Transition existing methods based on reachability graphs. This will Adjacency Relation Computation Based on Unfolding: support further researches and more scalable applications Potentials and Challenges,” in Proceedings of the Con- of TAR/pTAR in various business process analytic tasks. federated International Conferences: CoopIS, C&TC, and ODBASE (OTM), 2016, pp. 61–79. [16] T. Murata. “Petri nets: Properties, analysis and applica- REFERENCES tions,” Proceedings of the IEEE, vol. 77, no. 4, pp. 541– [1] H. Zha, J. Wang, L. Wen, C. Wang, and J. Sun. “A 580, 1989. workflow net similarity measure based on transition [17] J. Esparza, S. Romer, and W. Vogler. “An improvement adjacency relations,” Computers in Industry, vol. 61, of McMillan’s unfolding algorithm,” Formal Methods in no.5, pp. 463–471, 2010. System Design, vol. 20, no. 3, pp. 285–310, 2002. [2] J. Prescher, J. Mendling, and M. Weidlich. “The Pro- [18] K. L. McMillan, and D. K. Probst. ”A technique of state jected TAR and its Application to Conformance Check- space search based on unfolding,” Formal Methods in ing,” in Proceedings of the 5th International Workshop on System Design, vol. 6, no. 1, pp. 45–65, 1995. 18 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, X 2018

[19] J. Pei, L. Wen, X. Ye. “Computation of Transition Ad- jacency Relations Based on Complete Prefix Unfolding (Technical Report),” arXiv:1512.00428v3, 2017. [20] D. Fahland, C. Favre, B. Jobstmann, J. Koehler, N. Lohmann, H. Volzer,¨ and K. Wolf. “Instantaneous soundness checking of industrial business process models,” in Proceedings of the 7th International Conference on Business Process Management (BPM), 2009, pp. 278– 293.

Jisheng Pei received his BS degree in Com- puter Software and his PhD degree in Computer Science from Tsinghua University. His research interests include the processing and analysis methods of data provenance, process mining, similarity measurements of process traces and models, as well as behavior analytics and ser- vice computing.

Lijie Wen received the BS degree, the Ph.D. degree in Computer Science and Technology from Tsinghua University, Beijing, China, in 2000 and 2007 respectively. He is currently an asso- ciate professor at School of Software, Tsinghua University. His research interests are focused on process mining, process data management (e.g., log completeness, trace clustering, pro- cess similarity, process indexing and retrieval), and lifecycle management of workflow for big data analysis.

Xiaojun Ye received the BS degree in mechan- ical engineering from Northwest Polytechnical University, Xian, China, in 1987 and the Ph.D. degree in information engineering from INSA Lyon, France, in 1994. Currently, he is a profes- sor at School of Software, Tsinghua University, Beijing, China. His research interests include cloud data management, data security and pri- vacy, and database system testing.

Akhil Kumar has a PhD from the University of California, Berkeley, and is a full professor of in- formation systems at the Smeal College of Busi- ness, Pennsylvania State University. His current research interests are in business process man- agement (BPM) and workflow systems, process mining, supply chain and business analytics, and health IT. He is a senior member of IEEE and an associate editor for ACM Transactions on Man- agement Information System. He served as a co-chair for the International Business Process Management Conference in 2017. He has been a principal investigator for National Science Foundation and also received support from IBM, HP, and other organizations for his work.