A Framework to Support Behavioral Design Pattern Detection from Software Execution Data

Cong Liu¹, Boudewijn van Dongen¹, Nour Assy¹ and Wil M. P. van der Aalst²,¹
¹Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600MB Eindhoven, The Netherlands
²Department of Computer Science, RWTH Aachen University, 52056 Aachen, Germany

Keywords: Pattern Instance Detection, Behavioral Design Pattern, Software Execution Data, General Framework.

Abstract: The detection of design patterns provides useful insights that help in understanding not only the code but also the design and architecture of the underlying software system. Most existing design pattern detection approaches and tools rely on source code as input. However, if the source code is not available (e.g., in the case of legacy software systems), these approaches are no longer applicable. During the execution of software, tremendous amounts of data can be recorded. These data provide rich information for analyzing the runtime behavior of software. This paper presents a general framework to detect behavioral design patterns by analyzing the sequences of method calls and the object interactions that are recorded in software execution data. To demonstrate its applicability, the framework is instantiated for three well-known behavioral design patterns, i.e., the observer, state and strategy patterns. Using the open-source process mining toolkit ProM, we have developed a tool that supports the whole detection process. We applied and validated the framework using software execution data containing around 1,000,000 method calls generated from both synthetic and open-source software systems.

1 INTRODUCTION

As a common design practice, design patterns have been widely applied in the development of many software systems. The use of design patterns leads to the construction of well-structured, maintainable and reusable software systems (Tsantalis et al., 2006). Generally speaking, design patterns are descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context (Gamma, 1995). The detection of implemented design pattern instances during reverse engineering can be useful for a better understanding of the design and architecture of the underlying software system. As a result, the detection of design patterns is a challenging problem that has received a lot of attention in the software engineering domain (Arcelli et al., 2009), (Arcelli et al., 2010), (Bernardi et al., 2015), (Bernardi et al., 2014), (Dabain et al., 2015), (Dong et al., 2009), (Fontana and Zanoni, 2011), (Niere et al., 2002), (Shi and Olsson, 2006).

A design pattern can be seen as a tuple of software elements, such as classes and methods, conforming to a set of constraints. Constraints can be described from both structural and behavioral aspects. The former defines classes and their inter-relationships while the latter specifies how classes and objects interact. Many techniques have been proposed to detect design pattern instances. Table 1 summarizes some typical design pattern detection approaches for object-oriented software by considering the type of analysis (i.e., static, dynamic and combination of both), the artifacts that are required as inputs, the description of pattern instances (i.e., what roles are used to describe the pattern), tool support, and extensibility.

Based on the analysis type, these techniques can be categorized as static analysis techniques, combination analysis techniques and dynamic analysis techniques. Static analysis techniques (e.g., (Bernardi et al., 2015), (Bernardi et al., 2014), (Dabain et al., 2015), (De Lucia et al., 2009b), (Dong et al., 2009), (Fontana and Zanoni, 2011), (Niere et al., 2002), (Shi and Olsson, 2006), (Tsantalis et al., 2006)) take the source code as input and only consider the structural connections among classes of the patterns. Therefore, the pattern instances detected by static analysis only satisfy structural constraints. With the abundance of static analysis tools on the one hand, and the growing availability of software execution data on the other hand, combination analysis techniques come into reach. Combination analysis techniques (e.g., (De Lucia et al., 2009a), (Heuzeroth et al., 2003), (Ng


Liu, C., van Dongen, B., Assy, N. and M. P. van der Aalst, W. A Framework to Support Behavioral Design Pattern Detection from Software Execution Data. DOI: 10.5220/0006688000650076. In Proceedings of the 13th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2018), pages 65-76. ISBN: 978-989-758-300-1. Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

ENASE 2018 - 13th International Conference on Evaluation of Novel Approaches to Software Engineering

Table 1: Summary of Existing Design Pattern Detection Approaches.
Reference | Analysis Type | Required Artifacts | Pattern Instance | Tool | Extensibility
(Niere et al., 2002) | static analysis | source code | class roles | |
(Shi and Olsson, 2006) | static analysis | source code | class & method roles | |
(Tsantalis et al., 2006) | static analysis | source code | class roles | |
(Dong et al., 2009) | static analysis | source code | class roles | |
(De Lucia et al., 2009b) | static analysis | source code | class roles | |
(Fontana and Zanoni, 2011) | static analysis | source code | class roles | |
(Bernardi et al., 2014) | static analysis | source code | class roles | |
(Dabain et al., 2015) | static analysis | source code | class roles | |
(Bernardi et al., 2015) | static analysis | source code | class roles | |
(Heuzeroth et al., 2003) | combination analysis | source code & execution data | class & method roles | |
(Wendehals and Orso, 2006) | combination analysis | source code & execution data | class & method roles | |
(Von Detten et al., 2010) | combination analysis | source code & execution data | class roles | |
(Ng et al., 2010) | combination analysis | source code & execution data | class roles | |
(De Lucia et al., 2009a) | combination analysis | source code & execution data | class roles | |
(Arcelli et al., 2009), (Arcelli et al., 2010) | dynamic analysis | execution data | unclear | |

et al., 2010), (Von Detten et al., 2010), (Wendehals and Orso, 2006)) take as input the candidate design pattern instances that are detected by static analysis tools together with software execution data, and check whether the detected candidate pattern instances conform to the behavioral constraints. Therefore, the pattern instances identified by combination techniques match both structural and behavioral constraints. Compared with static analysis techniques, the combination techniques can eliminate some of the false positives caused by static techniques because the candidate pattern instances are examined with respect to the behavioral constraints.

Both the static analysis and combination analysis techniques require source code as input. However, if the source code is not available (e.g., in the case of legacy software systems), these approaches are no longer applicable. Dynamic analysis techniques can detect design pattern instances directly from the software execution data by considering the sequences of method calls and the interactions of the objects that are involved in the patterns. However, existing research on dynamic analysis techniques suffers from the following problems that restrict its application:

• Unclear Description of Detected Pattern Instances. A design pattern instance is represented as a tuple of the participating classes or methods, each playing a certain role. Existing dynamic techniques (Arcelli et al., 2009), (Arcelli et al., 2010) do not clearly define pattern instances. This may influence the precision of behavioral constraint checking as some (class or method) roles that need to be verified are not specified. In addition, a lot of manual work is required to understand the detected pattern instances if they are not described properly (Pettersson et al., 2010).

• Missing Explicit Definition of Pattern Instance Invocation. To check the behavioral constraints of a candidate pattern instance, execution data that characterize the behavior of the candidate are needed. Normally, the behavioral constraints are defined based on the notion of pattern instance invocation, which represents one independent execution of the underlying pattern instance. One needs to check the behavioral constraints against each pattern instance invocation. Existing dynamic techniques do not clearly define a pattern instance invocation, which causes inaccurate behavioral constraint checking.

• Inability to Support Novel Design Patterns. An approach is extensible if it can be easily adapted to novel design patterns rather than only supporting a sub-group of existing patterns. Existing dynamic analysis approaches do not provide effective solutions to support newly emerging design patterns with novel structural and behavioral characteristics. This limits the applicability and extensibility of existing approaches in the large.

• Lack of Tool Support. The usability of an approach heavily relies on its tool availability. Existing dynamic analysis approaches do not provide usable tools. This unavailability prevents other researchers from reproducing the experiments and comparing their new approaches.

In this paper, a general framework is proposed to support the detection of design pattern instances from execution data with full consideration of the limitations of existing work. We clearly define a pattern instance as a set of roles, each of which is played by a class or a method. In addition, to ensure accurate behavioral constraint checking for execution data that involve multiple pattern instance invocations, we precisely define the notion of pattern instance invocation and propose a refactoring of the execution data (traces) in such a way that each refactored trace represents a pattern instance invocation. Our framework definition is generic and can be instantiated to support new patterns (or new pattern variants). To validate the proposed approach, we have developed a tool, named DEsign PAttern Detection from execution data (DePaD), which supports the whole detection process for the observer, state and strategy design patterns.

The rest of this paper is organized as follows. Section 2 defines some preliminaries. Section 3 formalizes the general framework. Section 4 details each step of the instantiation of the framework by taking the observer design pattern as a running example. Section 5 introduces the tool support. Section 6 conducts the experimental evaluation. Section 7 discusses some threats to the validity of our approach. Finally, Section 8 concludes the paper and presents further directions.

2 PRELIMINARIES

Given a set $S$, $|S|$ denotes the number of elements in $S$. We use $\cup$, $\cap$ and $\setminus$ for the union, intersection and difference of two sets. $\emptyset$ denotes the empty set. The powerset of $S$ is denoted by $\mathcal{P}(S) = \{S' \mid S' \subseteq S\}$. $f : X \to Y$ is a total function, i.e., $dom(f) = X$ is the domain and $rng(f) = \{f(x) \mid x \in dom(f)\} \subseteq Y$ is the range. $g : X \nrightarrow Y$ is a partial function, i.e., $dom(g) \subseteq X$ is the domain and $rng(g) = \{g(x) \mid x \in dom(g)\} \subseteq Y$ is the range. A sequence over $S$ of length $n$ is a function $\sigma : \{1,2,\ldots,n\} \to S$. If $\sigma(1) = a_1, \sigma(2) = a_2, \ldots, \sigma(n) = a_n$, we write $\sigma = \langle a_1, a_2, \ldots, a_n \rangle$. $|\sigma| = n$ denotes the length of sequence $\sigma$. The set of all finite sequences over set $S$ is denoted by $S^*$. Let $\sigma \in S^*$ be a sequence; $\sigma(i)$ represents the $i$th element of $\sigma$ where $1 \le i \le |\sigma|$. Given a sequence $\sigma$ and an element $e$, we have $e \in \sigma$ if $\exists i : 1 \le i \le |\sigma| \wedge \sigma(i) = e$.

To be able to refer to the different entities, we introduce the following universes. Let $\mathcal{U}_M$ be the method call universe, $\mathcal{U}_N$ be the method universe, $\mathcal{U}_C$ be the universe of classes and interfaces, $\mathcal{U}_O$ be the object universe where objects are instances of classes, $\mathcal{U}_R$ be the role universe and $\mathcal{U}_T$ be the time universe. To relate these universes, we use the following notations: For any $m \in \mathcal{U}_M$, $\hat{m} \in \mathcal{U}_N$ is the method of which $m$ is an instance. For any $o \in \mathcal{U}_O$, $\hat{o} \in \mathcal{U}_C$ is the class of $o$. $pc : \mathcal{U}_N \to \mathcal{P}(\mathcal{U}_C)$ defines a mapping from a method to its parameter class set. $ms : \mathcal{U}_C \to \mathcal{P}(\mathcal{U}_N)$ defines a mapping from a class to its method set. $iv : \mathcal{U}_N \to \mathcal{P}(\mathcal{U}_N)$ defines a mapping from a method to its invoked method set.

In this paper, we mainly reason based on the software execution data. A method call is the basic unit of software execution data (Liu et al., 2016), (Leemans and Liu, 2017) and its attributes can be defined as follows:

Definition 1 (Method Call, Attribute). For any $m \in \mathcal{U}_M$, the following standard attributes are defined:

• $\eta : \mathcal{U}_M \to \mathcal{U}_O$ is a mapping from method calls to objects such that for each method call $m \in \mathcal{U}_M$, $\eta(m)$ is the object containing the instance of the method $\hat{m}$.

• $c : \mathcal{U}_M \to \mathcal{U}_M \cup \{\bot\}$ is the calling relation among method calls. For any $m_i, m_j \in \mathcal{U}_M$, $c(m_i) = m_j$ means that $m_i$ is called by $m_j$, and we name $m_i$ the callee and $m_j$ the caller. In particular, for $m \in \mathcal{U}_M$, if $c(m) = \bot$, then $\hat{m}$ is a main method.

• $p : \mathcal{U}_M \to \mathcal{U}_O^*$ is a mapping from method calls to their (input) parameter object sequences such that for each method call $m \in \mathcal{U}_M$, $p(m)$ is the sequence of (input) parameter objects of the instance of the method $\hat{m}$.

• $t_s : \mathcal{U}_M \to \mathcal{U}_T$ is a mapping from method calls to their timestamps such that for each method call $m \in \mathcal{U}_M$, $t_s(m)$ is the start timestamp of the instance of method $\hat{m}$.

• $t_e : \mathcal{U}_M \to \mathcal{U}_T$ is a mapping from method calls to their timestamps such that for each method call $m \in \mathcal{U}_M$, $t_e(m)$ is the end timestamp of the instance of method $\hat{m}$.

Definition 2 (Software Execution Data). Software execution data is defined as a set of method calls, i.e., $SD \subseteq \mathcal{U}_M$.

3 A GENERAL FRAMEWORK TO DETECT DESIGN PATTERNS FROM EXECUTION DATA

Fig. 1 shows an overview of the framework to detect behavioral design patterns. It involves two main phases, i.e., candidate pattern instance discovery and behavioral constraint checking. Software execution data and a design pattern specification are required as the input artifacts. The design pattern specification and the two phases are detailed in the following.

3.1 Design Pattern Specification

A design pattern specification precisely defines how a design pattern should be organized, which includes the role set that is involved in the design pattern, the


definition of pattern instance, a set of structural constraints, the definition of pattern instance invocation, and a set of behavioral constraints.

[Figure 1: General Overview of the Framework. Inputs: Software Execution Data and Design Pattern Specification. Phase 1 (Discovery) produces Candidate Pattern Instances; Phase 2 (Checking) produces Validated Pattern Instances.]

Definition 3 (Design Pattern Specification). A design pattern is defined as $DP = (\mathcal{U}_R^P, rs, sc, pii, bc)$ such that:

• Role Set. $\mathcal{U}_R^P \subseteq \mathcal{U}_R$ is the role set of a design pattern.

• Pattern Instance. $rs : \mathcal{U}_R^P \to \mathcal{U}_N \cup \mathcal{U}_C$ is a mapping from the roles to their values (methods or classes). Essentially, it is an implementation of the pattern.

• Structural Constraints. $sc : (\mathcal{U}_R^P \to \mathcal{U}_N \cup \mathcal{U}_C) \to \mathbb{B}$ with $\mathbb{B} = \{true, false\}$ is used to check the structural constraints of a pattern instance.

• Pattern Instance Invocation Identification. $pii : \mathcal{P}(\mathcal{U}_M) \to \mathcal{P}(\mathcal{P}(\mathcal{U}_M))$ is a function that identifies a set of pattern instance invocations from the method call set of a pattern instance.

• Behavioral Constraints. $bc : \mathcal{P}(\mathcal{P}(\mathcal{U}_M)) \to \mathbb{B}$ is used to check the behavioral constraints of all invocations of a pattern instance.

Note that the way invocations are identified is specific to each type of design pattern and independent of the number of runs included in the software execution data. In addition, the specifications of some structural patterns do not necessarily contain the definition of pattern instance invocation and behavioral constraints.

3.2 Phase 1: Candidate Pattern Instance Discovery

By taking the execution data as input, we need to discover a set of candidate pattern instances, i.e., to find the values (classes or methods) that play certain roles in the pattern instances. Formally, $crs : \mathcal{U}_R^P \nrightarrow \mathcal{U}_N \cup \mathcal{U}_C$ is a partial function that maps a sub-set of roles to their corresponding values. In our case, we have $dom(crs) = \emptyset$, i.e., no role has a value yet and we need to brute force all possibilities according to the structural constraints. $dis : (\mathcal{U}_R^P \nrightarrow \mathcal{U}_N \cup \mathcal{U}_C) \to \mathcal{P}(\mathcal{U}_R^P \to \mathcal{U}_N \cup \mathcal{U}_C)$ is the discovery function that maps an empty pattern instance to a set of complete pattern instances. For any $crs \in \mathcal{U}_R^P \nrightarrow \mathcal{U}_N \cup \mathcal{U}_C$ and $rs \in dis(crs)$, we have $sc(rs) = true$, i.e., the structural constraints hold for each discovered candidate pattern instance.

Existing static pattern instance discovery approaches focus on the structural analysis of classes, operations and their inter-relationships (e.g., inheritance, realization, dependency) by exploring the source code. As for the discovery from execution data, class operations and some of the relationships, e.g., dependency and inheritance relationships, can be recovered. The realization relationship cannot be recovered as interfaces cannot be instantiated and recorded during execution. Hence, some of the roles that are played by interfaces as detected from source code will be replaced by the implementing classes from the execution data.

3.3 Phase 2: Behavioral Constraint Checking

The pattern instance discovery takes as input the execution data and returns a set of candidate pattern instances by considering only the structural constraints of the pattern, i.e., the relationships among classes and methods. The discovered candidate pattern instances therefore inevitably contain false positives, especially for behavioral design patterns. For each candidate pattern instance $rs$ and its execution data $SD$, we check whether the behavioral constraints given in the specification are satisfied with respect to all invocations of the pattern instance, i.e., $bc(pii(SD)) = true$.

3.4 Instantiation of the Framework

Given a novel design pattern X, the framework is required to support its detection. To execute the framework, the design pattern specification of pattern X and software execution data that cover the behavior of such a pattern are required. To test the applicability,


we instantiate the proposed methodology to discover the observer, state and strategy patterns. A detailed explanation of the applicability to the observer pattern is given in the next section with a running example. For the state and strategy patterns, the instantiation details are omitted due to page limits.

4 DETECTION OF OBSERVER DESIGN PATTERN

We first use the observer design pattern as a running example to illustrate the proposed framework. This is one particular instance of our general framework. Later we provide other examples.

4.1 Observer Design Pattern and a Running Example

The observer design pattern (Gamma, 1995) defines a one-to-many dependency between objects so that when one object changes state, all its dependents are automatically notified. The structure of the observer design pattern is shown in Fig. 2. The key participants of this pattern are Subject and Observer. In this pattern, there are many Observer objects which are observing a particular Subject object. Observers are interested and want to be notified when the Subject undergoes a change. Therefore, they register themselves to that Subject. When an observer loses interest in the subject, it simply unregisters from it.

[Figure 2: Structure of the Observer Design Pattern. Subject (attribute observers; +register(), +unregister(), +notify()) is associated with the «interface» Observer (+update()), which is implemented by ConcreteObserver (+update()). Note attached to notify(): foreach o in observers { o->update() }.]

More specifically, a Subject is a class that (1) keeps track of a list of Observer references; (2) sends notifications to its registered observers on state changes; and (3) provides interfaces for registering and de-registering Observer objects. The Observer interface defines an update interface for objects that should be notified when the subject changes. When the state of Subject is changed, it invokes the notify method that implements a sequential call of the update method on all registered Observer objects.

The following naming conventions refer to roles that are involved in the observer pattern. Note that this naming convention is only used for explanations and our approach does not rely on these names.

• Observer is an interface that defines an updating interface for objects that should be notified of changes;

• Subject is a class that stores state of interest to Observer objects and sends notifications to its interested objects when its state changes;

• notify is a method that is responsible for notifying the observers of a state change in the Subject;

• update is a method implemented by the Observer objects and can be called by the notify method;

• register is a method responsible for adding Observer objects to a Subject object; and

• unregister is a method responsible for removing Observer objects from a Subject object.

A concrete implementation of the observer pattern is AISWorld, an academic community software for researchers and practitioners. All its members subscribe to news updates by registering to a public mailing server. When new events of the community occur, the mailing server pushes these news items to all its subscribed members. Community members can also unsubscribe if they no longer want to follow the news.

An excerpt of the execution data generated by an independent run of the AISWorld software is given in Table 2.¹ Its behavior can be described as: (1) three Member objects (Observer objects) are first created and registered to a MailingServer object (Subject object); (2) the notifyMembers method of the MailingServer object is called and the update methods of the three registered Member objects are invoked during its execution; (3) these three Member objects are unregistered from the MailingServer object; (4) another two Member objects are created and registered to the second MailingServer object; (5) the notifyMembers method of the second MailingServer object is called and the update methods of the two registered Member objects are invoked during its execution; and (6) these two Member objects are unregistered from the second MailingServer object.

¹ Note: methods are fully qualified, including package, class and method names. init represents the constructor of a class.
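The AISWorld scenario described above can be pictured with a small Python sketch. This is only an illustration of the observer roles, not the actual AISWorld code: the class bodies, the `inbox` attribute, the `news` argument of `update`, and the snake_case name `notify_members` are assumptions made for the example.

```python
class Member:
    """Plays the Observer role (hypothetical minimal version)."""

    def __init__(self, name):
        self.name = name
        self.inbox = []          # news items received so far (assumed attribute)

    def update(self, news):
        # Called by the subject whenever a news item is pushed.
        self.inbox.append(news)


class MailingServer:
    """Plays the Subject role (hypothetical minimal version)."""

    def __init__(self):
        self._members = []       # registered observers

    def register(self, member):
        if member not in self._members:
            self._members.append(member)

    def unregister(self, member):
        if member in self._members:
            self._members.remove(member)

    def notify_members(self, news):
        # Sequentially call update on every registered observer.
        for member in list(self._members):
            member.update(news)


# Steps (1) to (3) of the described run, for one mailing server.
server = MailingServer()
members = [Member(f"member{i}") for i in range(3)]
for m in members:
    server.register(m)
server.notify_members("call for papers")
for m in members:
    server.unregister(m)
server.notify_members("not delivered")   # nobody is registered any more
```

Recording each call in such a run (method, receiving object, parameter objects, caller, start and end time) would yield data of the kind the framework takes as input.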


Table 2: An Example of Software Execution Data.
ID | (Callee) Method | (Callee) Parameter Objects | (Callee) Object | Caller Method | Caller Object | Start Time | End Time
m1 | Member.init | – | 1807970113 | mainclass.main | – | 709020268 | 709120368
m2 | Member.init | – | 1807567788 | mainclass.main | – | 709244786 | 709267786
m3 | Member.init | – | 1488142454 | mainclass.main | – | 709378641 | 709378641
m4 | MailingServer.init | – | 1333401746 | mainclass.main | – | 717761066 | 717961966
m5 | MailingServer.register | 1807970113 | 1333401746 | mainclass.main | – | 718086509 | 718086619
m6 | MailingServer.register | 1807567788 | 1333401746 | mainclass.main | – | 718261847 | 718361447
m7 | MailingServer.register | 1488142454 | 1333401746 | mainclass.main | – | 718420506 | 718588315
m8 | MailingServer.notifyMembers | – | 1333401746 | mainclass.main | – | 718686715 | 719110738
m9 | Member.update | – | 1807970113 | MailingServer.notifyMembers | 1333401746 | 718788715 | 718880715
m10 | Member.update | – | 1807567788 | MailingServer.notifyMembers | 1333401746 | 718929841 | 718929941
m11 | Member.update | – | 1488142454 | MailingServer.notifyMembers | 1333401746 | 719050867 | 719100467
m12 | MailingServer.unregister | 1807970113 | 1333401746 | mainclass.main | – | 719270253 | 719370253
m13 | MailingServer.unregister | 1807567788 | 1333401746 | mainclass.main | – | 719408812 | 719507712
m14 | MailingServer.unregister | 1488142454 | 1333401746 | mainclass.main | – | 719570465 | 719880462
m15 | Member.init | – | 1491288577 | mainclass.main | – | 719712873 | 719722873
m16 | Member.init | – | 805469502 | mainclass.main | – | 719843307 | 719943908
m17 | MailingServer.init | – | 1936493073 | mainclass.main | – | 720398401 | 720698461
m18 | MailingServer.register | 1491288577 | 1936493073 | mainclass.main | – | 720719568 | 720919968
m19 | MailingServer.register | 805469502 | 1936493073 | mainclass.main | – | 721077514 | 721276516
m20 | MailingServer.notifyMembers | – | 1936493073 | mainclass.main | – | 721526122 | 721976868
m21 | Member.update | – | 1491288577 | MailingServer.notifyMembers | 1936493073 | 721526122 | 721626122
m22 | Member.update | – | 805469502 | MailingServer.notifyMembers | 1936493073 | 721825906 | 721875908
m23 | MailingServer.unregister | 1491288577 | 1936493073 | mainclass.main | – | 722279646 | 722379646
m24 | MailingServer.unregister | 805469502 | 1936493073 | mainclass.main | – | 722579858 | 722779768
m25 | TestMail.mainclass.main | 5336152135 | – | – | – | 703720242 | 723873758
Note: – means the value is unavailable.
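To make the link between Table 2 and Definition 1 concrete, the following Python sketch stores a few rows as records and resolves the calling relation $c$ from the logged caller method and caller object. The `MethodCall` class and the `caller_of` helper are hypothetical names introduced for illustration. Matching the caller by method-name suffix and by time containment is an assumption of this sketch, since the table logs the caller's method and object rather than the caller's call ID.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MethodCall:
    id: str
    method: str                    # the method of the call
    params: Tuple[int, ...]        # p(m): parameter objects
    obj: Optional[int]             # eta(m): object the call runs on
    caller_method: Optional[str]   # as logged in Table 2
    caller_obj: Optional[int]      # as logged in Table 2
    ts: int                        # t_s(m): start timestamp
    te: int                        # t_e(m): end timestamp


# A few rows copied from Table 2 (None stands for an unavailable value).
SD = [
    MethodCall("m8", "MailingServer.notifyMembers", (), 1333401746,
               "mainclass.main", None, 718686715, 719110738),
    MethodCall("m9", "Member.update", (), 1807970113,
               "MailingServer.notifyMembers", 1333401746, 718788715, 718880715),
    MethodCall("m20", "MailingServer.notifyMembers", (), 1936493073,
               "mainclass.main", None, 721526122, 721976868),
    MethodCall("m21", "Member.update", (), 1491288577,
               "MailingServer.notifyMembers", 1936493073, 721526122, 721626122),
    MethodCall("m25", "TestMail.mainclass.main", (5336152135,), None,
               None, None, 703720242, 723873758),
]


def caller_of(m, data):
    """Resolve c(m); returns None when m has no caller (a main method)."""
    if m.caller_method is None:
        return None
    for c in data:
        if (c is not m
                and c.method.endswith(m.caller_method)   # assumed name matching
                and c.obj == m.caller_obj
                and c.ts <= m.ts and m.te <= c.te):      # caller encloses callee
            return c
    return None


by_id = {m.id: m for m in SD}
print(caller_of(by_id["m9"], SD).id)    # caller of the first update call
print(caller_of(by_id["m25"], SD))      # main method has no caller
```

With this representation, software execution data in the sense of Definition 2 is simply the collection `SD` of such records.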

4.2 Candidate Observer Pattern Instances Discovery

In this subsection, we introduce how to discover candidate observer pattern instances from software execution data. Note that different levels of granularity can be used for the roles of a design pattern. For example, most existing works (e.g., (Bernardi et al., 2015), (Bernardi et al., 2014), (Dabain et al., 2015), (De Lucia et al., 2009a), (De Lucia et al., 2009b), (Dong et al., 2009), (Fontana and Zanoni, 2011), (Ng et al., 2010), (Niere et al., 2002) and (Tsantalis et al., 2006)) use a tuple of Subject and Observer as the roles of the observer pattern. Such coarse-grained descriptions of pattern instances are not enough to perform precise behavioral constraint checking. This paper aims to give a complete description of design patterns (in terms of class and method roles). The following definition specifies the observer pattern in detail.

Definition 4 (Role Set and Pattern Instance of Observer Pattern). The role set of the observer pattern is defined as $\mathcal{U}_R^O = \mathcal{U}_{RC}^O \cup \mathcal{U}_{RM}^O$ where $\mathcal{U}_{RC}^O = \{Sub, Obs\}$ and $\mathcal{U}_{RM}^O = \{not, upd, reg, unr\}$. $rs^o : \mathcal{U}_R^O \to \mathcal{U}_C \cup \mathcal{U}_N$ is a mapping from the role set of the observer pattern to their values such that $\forall r \in \mathcal{U}_{RC}^O : rs^o(r) \in \mathcal{U}_C$ and $\forall r \in \mathcal{U}_{RM}^O : rs^o(r) \in \mathcal{U}_N$.

An observer pattern instance is an implementation of the observer pattern and it defines a binding from the role set to its values. The discovery process aims to find the missing values for all roles involved in the role set of the observer pattern with respect to the structural constraints. The following definition formalizes the structural constraints of the observer pattern.

Definition 5 (Structural Constraints of Observer Pattern). For each observer pattern instance $rs^o$, we have $sc^o(rs^o) = true$ iff:

• $rs^o(not) \in ms(rs^o(Sub))$, i.e., notify is a method of Subject; and

• $rs^o(reg) \in ms(rs^o(Sub))$, i.e., register is a method of Subject; and

• $rs^o(unr) \in ms(rs^o(Sub))$, i.e., unregister is a method of Subject; and

• $rs^o(upd) \in ms(rs^o(Obs))$, i.e., update is a method of Observer; and

• $rs^o(Obs) \in pc(rs^o(reg))$, i.e., register should contain a parameter of Observer type; and

• $rs^o(Obs) \in pc(rs^o(unr))$, i.e., unregister should contain a parameter of Observer type; and

• $rs^o(Obs) \notin pc(rs^o(not))$, i.e., notify should not contain a parameter of Observer type; and

• $rs^o(upd) \in iv(rs^o(not))$, i.e., update should be invoked by notify; and


• $rs^o(reg) \neq rs^o(unr)$, i.e., register and unregister cannot be played by the same method.

Based on the structural constraints, we propose an algorithm to discover candidate observer pattern instances from software execution data. Before outlining the algorithm, some important concepts and notations that are used in the remainder are introduced. Given software execution data $SD$, we define:

• $cs(SD) = \{\widehat{\eta(m)} \mid m \in SD\}$ is the class set involved in the software execution data $SD$;

• For any $c \in \mathcal{U}_C$, $ms(c, SD) = \{\hat{m} \mid \widehat{\eta(m)} = c\}$ is the method set of class $c$ in execution data $SD$;

• For any $n \in \mathcal{U}_N$, $pc(n, SD) = \{\hat{o} \mid \exists m \in SD : \hat{m} = n \wedge o \in p(m)\}$ is the parameter class set of method $n$ in the software execution data $SD$; and

• For any $n \in \mathcal{U}_N$, $iv(n, SD) = \{\hat{m} \mid m \in SD \wedge \widehat{c(m)} = n\}$ is the invoked method set of method $n$ in the software execution data $SD$.

For the discovery of observer pattern instances from execution data, we first identify a set of possible values for each role as intermediate results. This is defined as $rs_*^o : \mathcal{U}_R^O \to \mathcal{P}(\mathcal{U}_C \cup \mathcal{U}_N)$. Then we define a function $\omega$ that generates a set of pattern instances by exploiting all possible combinations of different role values. Formally, we have $\omega : (\mathcal{U}_R^O \to \mathcal{P}(\mathcal{U}_N \cup \mathcal{U}_C)) \to \mathcal{P}(\mathcal{U}_R^O \to \mathcal{U}_N \cup \mathcal{U}_C)$. For any $rs_*^o \in \mathcal{U}_R^O \to \mathcal{P}(\mathcal{U}_C \cup \mathcal{U}_N)$: (1) for any $rs^o \in \omega(rs_*^o)$: $dom(rs^o) = \mathcal{U}_R^O$; (2) for any $r \in \mathcal{U}_R^O$: $rs_*^o(r) = \bigcup_{rs^o \in \omega(rs_*^o)} rs^o(r)$; (3) $\nexists\, rs^o, rs^{o'} \in \omega(rs_*^o) : \forall r \in \mathcal{U}_R^O,\ rs^o(r) = rs^{o'}(r)$; and (4) $|\omega(rs_*^o)| = \prod_{r \in \mathcal{U}_R^O} |rs_*^o(r)|$.

Assume that design pattern W involves two types of roles (X and Y), and that we have $rs_*^w(X) = \{a, b\}$ and $rs_*^w(Y) = \{c, d\}$. Then we generate four pattern instances: $\omega(rs_*^w) = \{\{rs_1^w(X) = \{a\}, rs_1^w(Y) = \{c\}\}, \{rs_2^w(X) = \{a\}, rs_2^w(Y) = \{d\}\}, \{rs_3^w(X) = \{b\}, rs_3^w(Y) = \{c\}\}, \{rs_4^w(X) = \{b\}, rs_4^w(Y) = \{d\}\}\}$. The pseudocode description of the observer pattern instance discovery is given in the following algorithm.

As the algorithm discovers a set of candidate

4.3 Behavioral Constraint Checking

This section introduces how to check whether a discovered candidate observer pattern instance conforms to the behavioral constraints. Because one or more invocations may be involved in the execution data, we need to identify independent observer pattern instance invocations from the execution data. Because not all method calls in the software execution data are relevant to the observer pattern instance that we are going to check, we first define the execution data for an observer pattern instance.

Definition 6 (Execution Data of Observer Pattern Instance). Let $rs^o$ be an observer pattern instance and $SD$ be the execution data. $SD^O = \{m \in SD \mid \exists r \in \mathcal{U}_R^O : rs^o(r) = \hat{m}\}$ are the execution data of observer pattern instance $rs^o$.

According to the specification of the observer design pattern, an observer pattern instance invocation starts with the creation of one Subject object and involves all method calls such that: (1) the method plays a role in the observer pattern instance; and (2) its object is the Subject object, or its caller object is the Subject object and the object is an Observer object. In addition, by taking the same observer pattern instance execution data as input, the set of identified observer pattern instance invocations should be unique.

Definition 7 (Observer Pattern Instance Invocation). Let $rs^o$ be an observer pattern instance and $SD^O$ be its execution data. We define the invocation set of $rs^o$ as $pii^o(SD^O) = \{I_1, I_2, \ldots, I_n\} \subseteq \mathcal{P}(SD^O)$, such that:

• for any $I_i \in pii^o(SD^O)$, $m \in I_i$ where $1 \le i \le n$, there exists $o \in \mathcal{U}_O$ with $rs^o(Sub) = \hat{o}$ such that:

  – $(\hat{m} = rs^o(reg) \vee \hat{m} = rs^o(unr) \vee \hat{m} = rs^o(not)) \wedge (\eta(m) = o)$, i.e., the object of each method call is the Subject object and the method should play the role of register, unregister or notify; or

  – $\hat{m} = rs^o(upd) \wedge \eta(c(m)) = o \wedge rs^o(Obs) = \widehat{\eta(m)}$, i.e., the caller object of each method call is the Subject object and the method should play the role of update.

• for any $I_i, I_j \in pii^o(SD^O)$, $m \in I_i$, $m' \in I_j$ where $1 \le i$


Algorithm 1: Candidate observer pattern instances discovery.
Input: Software execution data SD.
Output: Observer pattern instance set RS.

RS ← ∅, ClassSet ← cs(SD)    /* initialization */
for ci ∈ ClassSet do
    /* subject class should contain at least three methods */
    if |ms(ci, SD)| ≥ 3 then
        for cj ∈ ClassSet do
            /* observer class should contain at least one method */
            if ci ≠ cj & |ms(cj, SD)| ≥ 1 then
                /* create intermediate result op */
                op(Sub) ← {ci}, op(Obs) ← {cj}, op(not) ← ∅,
                op(upd) ← ∅, op(reg) ← ∅, op(unr) ← ∅
                for mi ∈ ms(ci, SD) do
                    /* register and unregister are played by methods with a parameter of observer class */
                    if cj ∈ pc(mi, SD) then
                        /* set values for register and unregister roles of op */
                        op(reg) ← op(reg) ∪ {mi}
                        op(unr) ← op(unr) ∪ {mi}
                if |op(reg)| ≥ 2 then
                    for mj ∈ ms(ci, SD) do
                        /* notify should not contain a parameter of observer class */
                        if mj ∉ op(reg) & cj ∉ pc(mj, SD) then
                            for mk ∈ iv(mj, SD) do
                                /* update is invoked by notify */
                                if mk ∈ ms(cj, SD) then
                                    op(not) ← op(not) ∪ {mj}
                                    op(upd) ← op(upd) ∪ {mk}
                /* for each op, generate all possible role-to-value combinations as candidate instances */
                for rs ∈ ω(op) do
                    if rs(reg) ≠ rs(unr) then
                        RS ← RS ∪ {rs}
return RS    /* all detected observer pattern instances */
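To make the combination function ω and Algorithm 1 more concrete, the following is a minimal Python sketch rather than the tool's actual implementation. The `Call` record layout and the readings of `cs`, `ms`, `pc` and `iv` (class set, method set of a class, parameter class set of a method, and methods invoked by a method) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Call:
    """One recorded method call (hypothetical record layout)."""
    method: str                              # qualified method name
    cls: str                                 # class that declares the method
    param_classes: frozenset = frozenset()   # classes of parameter objects
    caller: Optional[str] = None             # method that issued this call


# Assumed readings of the helper functions used in Algorithm 1.
def cs(sd):
    return {c.cls for c in sd}


def ms(c, sd):
    return {x.method for x in sd if x.cls == c}


def pc(m, sd):
    return set().union(*(x.param_classes for x in sd if x.method == m))


def iv(m, sd):
    return {x.method for x in sd if x.caller == m}


def omega(op):
    """All role-to-value combinations: a Cartesian product over the roles."""
    roles = list(op)
    value_lists = [sorted(op[r]) for r in roles]
    return [dict(zip(roles, combo)) for combo in product(*value_lists)]


def discover_observer_candidates(sd):
    """Sketch of Algorithm 1: candidate observer pattern instance discovery."""
    result = []
    classes = cs(sd)
    for ci in classes:
        if len(ms(ci, sd)) < 3:          # subject needs at least three methods
            continue
        for cj in classes:
            if ci == cj or len(ms(cj, sd)) < 1:
                continue
            op = {"Sub": {ci}, "Obs": {cj}, "not": set(),
                  "upd": set(), "reg": set(), "unr": set()}
            for mi in ms(ci, sd):
                if cj in pc(mi, sd):     # takes an observer as a parameter
                    op["reg"].add(mi)
                    op["unr"].add(mi)
            if len(op["reg"]) >= 2:
                for mj in ms(ci, sd):
                    if mj not in op["reg"] and cj not in pc(mj, sd):
                        for mk in iv(mj, sd):
                            if mk in ms(cj, sd):   # update is invoked by notify
                                op["not"].add(mj)
                                op["upd"].add(mk)
            for rs in omega(op):
                if rs["reg"] != rs["unr"]:
                    result.append(rs)
    return result
```

Because `omega` returns an empty list whenever some role has no candidate value, class pairs without a notify/update relation contribute no candidates in this sketch. For a MailingServer/Member trace with `register`, `unregister`, `notifyM` and `update`, it yields two candidates that differ only in which method plays register and which plays unregister.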

Table 3: Two Observer Candidate Pattern Instances.

          Sub    Obs   not       upd      reg          unr
rs^o_1    MS.    M.    notifyM   update   register     unregister
rs^o_2    MS.    M.    notifyM   update   unregister   register

MS. and M. are short for MailingServer and Member.

After obtaining candidate pattern instances and refactoring execution data by invocation identification, we can check whether a candidate conforms to the behavioral constraints. To this end, we formally define the behavioral constraints of the observer pattern. The following notations and operators are defined on the basis of each invocation.

Given an observer pattern instance invocation $I \subseteq SD$, we define:
• For any $n \in U_N$, $N(I, n) = \{m \in I \mid \widehat{m} = n\}$ is the set of method calls with $n$ being their method in $I$;
• For any $M \subseteq I$, $CO(M) = \{o \in U_O \mid \exists m \in M : \eta(m) = o\}$ is the object set of $M$;
• For any $n \in U_N$, $c \in U_C$, $PS(I, n, c) = \{o \in U_O \mid \exists m \in I : \widehat{m} = n \land o \in p(m) \land \widehat{o} = c\}$ is the set of (input) parameter objects of method calls with $n$ being their method and these objects are of class type $c$ in $I$;
• For any $m \in U_M$, $Iv(I, m) = \{m' \in I \mid c(m') = m\}$ is the invoked method call set of method call $m$ in $I$; and
• For any $m \in U_M$, $Pre(I, m) = \{m' \in I \mid t_e(m') < t_s(m)\}$ is the set of method calls that are invoked before method call $m$ in $I$.

Definition 8 (Behavioral Constraints of Observer Pattern). For each observer pattern instance $rs^o$, the behavioral constraints $bc^o(pii^o(SD^O)) = true$ iff there exists an invocation $I \in pii^o(SD^O)$ such that:
• $|N(I, rs^o(not))| \ge 1 \land |N(I, rs^o(upd))| \ge 1 \land |N(I, rs^o(reg))| \ge 1 \land |N(I, rs^o(unr))| \ge 1$, i.e., for each observer pattern invocation, notify, update, register and unregister methods should be invoked at least once; and
• $\forall o \in PS(I, rs^o(reg), rs^o(Obs)) \cup PS(I, rs^o(unr), rs^o(Obs)) : (\exists m \in I : \widehat{m} = rs^o(reg) \land o \in p(m)) \land (\forall m \in I\ \exists m' \in I : \widehat{m} = rs^o(reg) \land \widehat{m'} = rs^o(unr) \land o \in p(m) \land o \in p(m') \land t_e(m)$

Figure 3: Screenshot of the DePaD (annotated regions: input software execution data, output pattern instances, plugin, plugin description, run).

Figure 4: A Detected Observer Pattern Instance.

All experimental results in the following section are based on this tool.
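The per-invocation operators N, CO, PS, Iv and Pre, together with the first behavioral constraint of the observer pattern (each of notify, update, register and unregister is invoked at least once), can be sketched in Python as follows. The `MethodCall` record and its field names are hypothetical and chosen only for this illustration; the sketch covers the first constraint only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MethodCall:
    """Hypothetical record for one method call inside an invocation I."""
    id: int
    method: str                                # the method of the call
    obj: str                                   # object executing the call
    params: Tuple[Tuple[str, str], ...] = ()   # (object id, class) pairs
    caller_id: Optional[int] = None            # id of the calling method call
    ts: int = 0                                # start time
    te: int = 0                                # end time


def N(I, n):
    """Method calls in I whose method is n."""
    return [m for m in I if m.method == n]


def CO(M):
    """Objects that execute the method calls in M."""
    return {m.obj for m in M}


def PS(I, n, c):
    """Parameter objects of class c passed to calls of method n in I."""
    return {o for m in I if m.method == n for (o, oc) in m.params if oc == c}


def Iv(I, m):
    """Method calls in I that are invoked by method call m."""
    return [x for x in I if x.caller_id == m.id]


def Pre(I, m):
    """Method calls in I that ended before method call m started."""
    return [x for x in I if x.te < m.ts]


def all_roles_invoked(I, rs):
    """First constraint: notify, update, register, unregister each occur."""
    return all(len(N(I, rs[role])) >= 1 for role in ("not", "upd", "reg", "unr"))
```

Keeping each operator as a small pure function over a list of calls mirrors the set-builder definitions, which makes it straightforward to compose them when further constraints are added.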


Table 4: Statistics of Subject Software Execution Data.

            Software       #Pac.   #Cla.   #Meth.   #Meth. Call
Synthetic   Synthetic 1    1       5       9        155
            Synthetic 2    1       5       11       135
            Synthetic 3    1       5       15       160
            Synthetic 4    1       6       12       104
            Synthetic 5    1       8       14       258
            Synthetic 6    8       12      28       9728
Real-life   Lexi 0.1.1     5       68      263      20344
            JUnit 3.7      3       47      213      363948
            JHotDraw 5.1   7       1.8     549      583423

Note: The number of packages (#Pac.), the number of classes (#Cla.), the number of methods (#Meth.), and the number of method calls (#Meth. Call).

• Synthetic 2 is a testing software system implementing a state pattern instance and a strategy pattern instance;
• Synthetic 3 is a short message software system implementing an observer design pattern instance;
• Synthetic 4 is a lights control software system implementing a strategy design pattern instance;
• Synthetic 5 is a sensing alarm software system implementing an observer design pattern instance;
• Synthetic 6 is a product management software system implementing an observer pattern instance;
• Lexi 0.1.1 (footnote 3) is a Java-based open-source word processor. Its main functions are to create documents, edit text, save files, etc. The format of exported files is compatible with Microsoft Word.
• JUnit 3.7 (footnote 4) is a simple framework to write repeatable tests for Java programs. It is an instance of the xUnit architecture for unit testing frameworks.
• JHotDraw 5.1 (footnote 5) is a GUI framework for technical and structured 2D graphics. Its design relies heavily on some well-known GoF design patterns.

Note that the execution data of Lexi 0.1.1 and JHotDraw 5.1 are collected by monitoring typical execution scenarios of the software system. For example, a typical scenario of JHotDraw 5.1 is: launch JHotDraw, draw two rectangles, select and align the two rectangles, color them blue, and close JHotDraw. For JUnit 3.7, we monitor the execution of the project test suite with 259 independent tests provided in the MapperXML (footnote 6) release.

3 http://essere.disco.unimib.it/svn/DPB/Lexi%20v0.1.1%20alpha/
4 http://essere.disco.unimib.it/svn/DPB/JUnit%20v3.7/
5 http://www.inf.fu-berlin.de/lehre/WS99/java/swing/JHotDraw5.1/
6 http://essere.disco.unimib.it/svn/DPB/MapperXML%20v1.9.7/

6.2 Quality Metrics

To evaluate the accuracy of the proposed design pattern detection approach, we use precision and recall, measures that are widely adopted in the information retrieval area. Precision measures the percentage of the detected pattern instances that are correct pattern instances, while recall measures the percentage of correct pattern instances that have been correctly detected by our approach.

Formally, Correct refers to the set of actual design pattern instances implemented in the software, and Detected refers to the set of pattern instances detected by our approach. Precision and recall can be computed as follows:

$$precision = \frac{|Correct \cap Detected|}{|Detected|} \qquad (1)$$

$$recall = \frac{|Correct \cap Detected|}{|Correct|} \qquad (2)$$

For the synthetic software systems, we have enough knowledge of the implementation, i.e., Correct is known. In addition, the collected execution data cover all possible execution scenarios, i.e., the execution data can fully represent the behavior of all potential pattern instances. Therefore, we can measure precision and recall of our approach for the synthetic software systems. As Correct is not available for the open-source software, and the execution data that are collected by monitoring typical scenario execution only cover a fraction of the software behavior, we only evaluate the precision. In the experiment, we manually validate each detected pattern instance.

6.3 Evaluation based on Synthetic Software Systems

In this section, we report the detection results obtained by our tool for the six synthetic software systems.

We executed the DePaD tool by taking the execution data collected from the six synthetic software systems as input. Three observer pattern instances were returned. Detailed information on the discovered observer pattern instances is shown in Table 5. By manually inspecting these observer pattern instances with respect to our domain knowledge, we found that (1) all these detected observer pattern instances are valid; and (2) all observer pattern instances implemented in Synthetic 3, Synthetic 5, and Synthetic 6 were fully detected.

Table 5: Observer Pattern Instances Detected from the Execution Data of Synthetic Software Systems.

       #1               #2                   #3
Sub    GlobalClock      ActiveSensorSystem   CommentaryObject
Obs    GlobalClockObs   ActiveAlarmLis       Observer
not    run()            soundTheAlarm()      notifyObservers()
upd    periodPasses()   alarm()              update()
reg    attach()         addAlarm()           subscribe()
unr    detach()         removeAlarm()        unsubscribe()
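The precision and recall measures of Section 6.2 translate directly into set operations. The short Python sketch below illustrates this; the function names are ours, and returning 0.0 for an empty denominator is an assumption of the sketch, not something the definitions prescribe.

```python
def precision(correct, detected):
    """|Correct ∩ Detected| / |Detected|; 0.0 if nothing was detected (assumed convention)."""
    correct, detected = set(correct), set(detected)
    return len(correct & detected) / len(detected) if detected else 0.0


def recall(correct, detected):
    """|Correct ∩ Detected| / |Correct|; 0.0 if Correct is empty (assumed convention)."""
    correct, detected = set(correct), set(detected)
    return len(correct & detected) / len(correct) if correct else 0.0


# Illustrative identifiers only, not results from the paper.
correct_instances = {"p1", "p2", "p3", "p4"}
detected_instances = {"p1", "p2", "p5"}
print(precision(correct_instances, detected_instances))
print(recall(correct_instances, detected_instances))
```

Pattern instances need to be hashable to be stored in sets; a role-to-value mapping could, for example, be represented as a frozenset of (role, value) pairs.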


Similarly, we executed the DePaD tool for state and strategy patterns by taking the execution data collected from the six synthetic software systems as input. Two state pattern instances and two strategy pattern instances were returned. Detailed information on the detected state and strategy pattern instances is shown in Tables 6-7. By manually inspecting these detected pattern instances with respect to the domain knowledge, we found that (1) all detected state/strategy pattern instances are valid; and (2) all state/strategy pattern instances implemented in Synthetic 1, Synthetic 2, and Synthetic 4 were fully detected.

Table 6: State Pattern Instances Detected from the Execution Data of Synthetic Software Systems.

           #4          #5
Context    TContext    Context
State      TS          State
setState   setS()      setState()
request    request()   writeName()
handle     handle()    write()

Table 7: Strategy Pattern Instances Detected from the Execution Data of Synthetic Software Systems.

                     #6              #7
Context              RemoteControl   TContext
State                Command         TS
setStrategy          setCommand()    setS()
contextInterface     pressButton()   contextInterface()
algorithmInterface   execute()       algorithmI()

To validate the accuracy of the detection results, we also manually analyzed the detected pattern instances in terms of precision and recall. Table 8 reports the precision and recall for observer, state and strategy pattern instances detected from the synthetic software execution data. According to the comparison, we conclude that the proposed approach does not include false positives and can find all true positives in case they are included in the execution data.

Table 8: Precision and Recall of the Detected Pattern Instances from Synthetic Software Systems.

            Observer Pattern   State Pattern   Strategy Pattern
Precision   100%               100%            100%
Recall      100%               100%            100%

6.4 Evaluation based on Open-source Software Systems

In this section, we report the evaluation of our approach using three open-source software systems. We executed the DePaD tool by taking the software execution data as input; the numbers of detected observer, state and strategy pattern instances are shown in Table 9. By manually inspecting the detected pattern instances, we found that all of them are valid, i.e., the precision of our approach is 100%. This can be explained by the fact that our approach guarantees that all detected pattern instances satisfy both structural and behavioral constraints.

Table 9: Number of Detected Pattern Instances from the Execution Data of 3 Open-source Software Systems.

               Observer Pattern   State Pattern   Strategy Pattern
Lexi 0.1.1     –                  –               8
JUnit 3.7      2                  2               –
JHotDraw 5.1   4                  22              24

Note that – means no pattern instance is detected from the data.

7 THREATS TO VALIDITY

In the following, we discuss the main threats that may affect the validity of our approach.

• Similar to other dynamic analysis techniques, the quality of the proposed approach heavily depends on the completeness of the execution data. If the execution data do not cover the fractions of the software's behavior that include all pattern candidates, the results will be unreliable.
• The precision and recall of the detection approach heavily rely on the design pattern specification. On the one hand, if the pattern specification is over-defined (e.g., some unnecessary constraints are included), this will cause low recall as some true positives may be missed. On the other hand, if the pattern specification is under-defined (e.g., some essential constraints are not included), this will cause low precision as some false positives may be incorrectly detected.

8 CONCLUSION

Existing dynamic analysis-based design pattern detection approaches generally have problems in performing accurate behavioral constraint checking and in providing an extensible mechanism to support novel design patterns. This paper proposes a general framework to support the detection of behavioral design patterns from software execution data. To test the applicability, the framework was instantiated for three typical behavioral design patterns, i.e., observer pattern, state pattern and strategy pattern. In addition, the proposed approaches have been implemented as a tool in the open-source process mining toolkit ProM. Currently, this tool supports observer pattern, state pattern and strategy pattern. The applicability of this tool was
demonstrated by a set of software execution data containing around 1 million method calls.

This work opens the door for several further research directions. The detection of other typical creational and behavioral design patterns (e.g., singleton pattern, , , visitor pattern) should be included in our framework. In addition, we are analyzing large-scale software projects to detect more design pattern instances, in order to evaluate the scalability and empirically validate the extensibility of the approach.

ACKNOWLEDGEMENTS

This work is supported by the NIRICT 3TU.BSR (Big Software on the Run) research project (footnote 7).

7 http://www.3tu-bsr.nl/doku.php?id=start

REFERENCES

Arcelli, F., Perin, F., Raibulet, C., and Ravani, S. (2009). Jadept: Dynamic analysis for behavioral design pattern detection. In 4th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE, pages 95–106.
Arcelli, F., Perin, F., Raibulet, C., and Ravani, S. (2010). Design pattern detection in java systems: A dynamic analysis based approach. Evaluation of Novel Approaches to Software Engineering, pages 163–179.
Bernardi, M. L., Cimitile, M., De Ruvo, G., Di Lucca, G. A., and Santone, A. (2015). Model checking to improve precision of design pattern instances identification in oo systems. In 10th International Joint Conference on Software Technologies (ICSOFT), volume 2, pages 1–11. IEEE.
Bernardi, M. L., Cimitile, M., and Di Lucca, G. (2014). Design pattern detection using a dsl-driven graph matching approach. Journal of Software: Evolution and Process, 26(12):1233–1266.
Liu, C., van Dongen, B., Assy, N., and van der Aalst, W. (2018). A general framework to detect behavioral design patterns. In International Conference on Software Engineering (ICSE2018), pages 1–2. ACM.
Dabain, H., Manzer, A., and Tzerpos, V. (2015). Design pattern detection using finder. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, pages 1586–1593. ACM.
De Lucia, A., Deufemia, V., Gravino, C., and Risi, M. (2009a). Behavioral pattern identification through visual language parsing and code instrumentation. In Software Maintenance and Reengineering, 2009. CSMR'09. 13th European Conference on, pages 99–108. IEEE.
De Lucia, A., Deufemia, V., Gravino, C., and Risi, M. (2009b). Design pattern recovery through visual language parsing and source code analysis. Journal of Systems and Software, 82(7):1177–1193.
Dong, J., Zhao, Y., and Sun, Y. (2009). A matrix-based approach to recovering design patterns. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 39(6):1271–1282.
Fontana, F. A. and Zanoni, M. (2011). A tool for design pattern detection and software architecture reconstruction. Information sciences, 181(7):1306–1324.
Gamma, E. (1995). Design patterns: elements of reusable object-oriented software. Pearson Education India.
Heuzeroth, D., Holl, T., Hogstrom, G., and Lowe, W. (2003). Automatic design pattern detection. In Program Comprehension, 2003. 11th IEEE International Workshop on, pages 94–103. IEEE.
Leemans, M. and Liu, C. (2017). Xes software event extension. XES Working Group, pages 1–11.
Liu, C., van Dongen, B., Assy, N., and van der Aalst, W. (2016). Component behavior discovery from software execution data. In International Conference on Computational Intelligence and Data Mining, pages 1–8. IEEE.
Ng, J. K.-Y., Guéhéneuc, Y.-G., and Antoniol, G. (2010). Identification of behavioural and creational design motifs through dynamic analysis. Journal of Software Maintenance and Evolution: Research and Practice, 22(8):597–627.
Niere, J., Schäfer, W., Wadsack, J. P., Wendehals, L., and Welsh, J. (2002). Towards pattern-based design recovery. In Proceedings of the 24th international conference on Software engineering, pages 338–348. ACM.
Pettersson, N., Lowe, W., and Nivre, J. (2010). Evaluation of accuracy in design pattern occurrence detection. IEEE Transactions on Software Engineering, 36(4):575–590.
Shi, N. and Olsson, R. A. (2006). Reverse engineering of design patterns from java source code. In 21st IEEE/ACM International Conference on Automated Software Engineering, 2006., pages 123–134. IEEE.
Tsantalis, N., Chatzigeorgiou, A., Stephanides, G., and Halkidis, S. T. (2006). Design pattern detection using similarity scoring. IEEE transactions on software engineering, 32(11).
Von Detten, M., Meyer, M., and Travkin, D. (2010). Reverse engineering with the reclipse tool suite. In 2010 ACM/IEEE 32nd International Conference on Software Engineering, volume 2, pages 299–300. IEEE.
Wendehals, L. and Orso, A. (2006). Recognizing behavioral patterns at runtime using finite automata. In Proceedings of the 2006 international workshop on Dynamic systems analysis, pages 33–40. ACM.
