SEQUENCE INTEGRATION GRAPH AND ITS APPLICATIONS TO MODEL-BASED SOFTWARE DEVELOPMENT

Debasish Kundu


Thesis submitted to the Indian Institute of Technology Kharagpur for the award of the degree of Doctor of Philosophy

by Debasish Kundu

Under the supervision of

Dr. Debasis Samanta

and

Prof. Rajib Mall

SCHOOL OF INFORMATION TECHNOLOGY
INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
MARCH 2014

© 2014 Debasish Kundu. All rights reserved.

CERTIFICATE OF APPROVAL

.... / .... / ....

Certified that the thesis entitled Sequence Integration Graph and its Applications to Model-Based Software Development submitted by Debasish Kundu to the Indian Institute of Technology, Kharagpur, for the award of the degree of Doctor of Philosophy has been accepted by the external examiners and that the student has successfully defended the thesis in the viva-voce examination held today.

Signature: Signature: Signature:

Name: Name: Name:

(Member of the DSC) (Member of the DSC) (Member of the DSC)

Signature: Signature:

Name: Name:

(Supervisor) (Supervisor)

Signature: Signature:

Name: Name:

(External Examiner) (Chairman)

Certificate

This is to certify that the thesis entitled Sequence Integration Graph and its Applications to Model-Based Software Development, submitted by Debasish Kundu to the Indian Institute of Technology, Kharagpur, is a record of bona fide research work carried out under our supervision, and we consider it worthy of consideration for the award of the degree of Doctor of Philosophy of the Institute.

Prof. Rajib Mall
Department of Computer Science
Indian Institute of Technology
Kharagpur -721 302, INDIA

Dr. Debasis Samanta
School of Information Technology
Indian Institute of Technology
Kharagpur -721 302, INDIA

Date: Date:

Declaration

I certify that,

a. the work contained in this thesis is original and has been done by me under the guidance of my supervisors.
b. the work has not been submitted to any other institute for any degree or diploma.
c. I have followed the guidelines provided by the institute in preparing the thesis.
d. I have conformed to the norms and guidelines given in the ethical code of conduct of the institute.
e. whenever I have used materials (data, theoretical analysis, figures, and text) from other sources, I have given due credit to them by citing them in the text of the thesis and giving their details in the references. Further, I have taken permission from the copyright owners of the sources, whenever necessary.

Debasish Kundu

BIO-DATA

Debasish Kundu is currently a Ph.D. research scholar in the School of Information Technology at Indian Institute of Technology Kharagpur, India. His research focuses on UML and three of its applications: code generation, coverage analysis, and infeasible path detection. He received the B.Tech. degree in Computer Science and Technology from Kalyani Govt. Engineering College, India in 2003 and the M.S. degree in Information Technology from Indian Institute of Technology Kharagpur, India in 2008. His current research interests include software engineering, model-based testing using UML, and program analysis.

Dedicated To My beloved Parents

Acknowledgments

Writing this part of the thesis is probably the toughest. Though the list of people to acknowledge is long, making this list is not difficult. The difficult part is to find the words that convey the sincerity and magnitude of my gratitude. People unknown to me until a few years ago have become indispensable in my life, while the people already known to me have remained pillars of support and encouragement all these days. It is because of their presence that I am here now.

First and foremost, I would like to express my deepest gratitude to my supervisors Dr. Debasis Samanta and Professor Rajib Mall for their invaluable guidance and encouragement throughout my work. Their constant motivation, support and infectious enthusiasm have guided me towards the successful completion of my doctoral study. My interactions with them have been of immense help in defining my research goals and in identifying ways to achieve them. Their encouraging words have often pushed me to put in my best possible efforts. Above all, the complete belief that they have placed in me has instilled a great sense of confidence and purpose in my mind, which I am sure will stand me in good stead throughout my career.

It gives me immense pleasure to thank my doctoral scrutiny committee members Dr. S. Misra, Prof. S. Sarkar, and Prof. B. Mahanty for their valuable suggestions during my research tenure. My sincere thanks to the heads of the department Prof. I. Sengupta, Prof. J. Mukhopadhyay, and Prof. R. Mall for the world class infrastructure provided in the department to the research students. I also thank all faculty members of the School of Information Technology for their helpful comments and constant encouragement. I owe my deepest gratitude to Dr. Monalisa Sarma for her continuous support and encouragement during my doctoral study. I sincerely remember the support of the office staff Mithunda, Somadi, Soumitrigi, Malayda, Vinodda, Pratap and others.

I wish to convey my special thanks to my old friends Sandeep and Kamal for their constant support and help during the various stages of my work. I am greatly indebted to many of my friends for their constant inspiration. I gratefully acknowledge the support of my lab mates, namely Sashidhar, Somanth, Rajkumar, Col. Ranjit Singh, Prasenjit, Shobhana madam, Ashalata madam, Sankar, Sharmagi, Barsha, Narendra, Ananth, Jaswasi, Jainath, Gourang, Ganesh, Soumya, Arindamda, Nirnay, Sudhamay, Gautam, Praveen, Kanchan, Partha, and many more. It has been great fun and a source of ideas and energy to have friends like Soumalya, Sayan, Manoj, Pradipta, Tuhin, Raj Kumar, Santa, Jayeeta, and many more during my stay at IIT Kharagpur. My hostel days have been joyful with the presence of friends like Rajarshi, Dinesh, Sahidullah, NirmalyaDa, Santunu, Radhasham, Saikat, Rajib Da, Chandan, and others.

This acknowledgement would certainly remain incomplete if I did not write anything regarding my parents, whom I love the most. For the sake of my studies, they have sacrificed a lot since my childhood days. Their priceless affection, spontaneous encouragement, and dedication to making me a good human being are the pillars of my strength to achieve goals which are almost next to impossible. No words are enough to express their contributions to my life. Finally, I am grateful to my school teachers, well wishers, and elders in our native place for their blessings.

Debasish Kundu

Contents

Certificate of Approval
Certificate
Declaration
Acknowledgements
Contents
List of Figures
List of Tables
List of Symbols and Abbreviations
Abstract

1 Introduction
    1.1 Software automation
    1.2 Model-based software development
    1.3 Scope of work
    1.4 Objectives of the thesis
    1.5 Contributions of our research
    1.6 Organization of the thesis

2 Survey of Related Work
    2.1 Code generation from UML diagrams
    2.2 Testing using UML sequence diagrams
    2.3 Infeasible path detection techniques
        2.3.1 Static analysis techniques
        2.3.2 Dynamic analysis techniques
    2.4 Conclusion

3 Construction of SIG
    3.1 Some definitions
    3.2 Sequence Integration Graph (SIG)
    3.3 Issues with construction of SIG
    3.4 Our approach
    3.5 Analyzing consistency between sequence diagram and SIG
    3.6 Applications of SIG
    3.7 Conclusion

4 Code Generation
    4.1 Our approach
    4.2 Analysis and Results
        4.2.1 Completeness of generated code
        4.2.2 Complexity analysis
        4.2.3 Memory and performance analysis
    4.3 Comparison with related work
    4.4 Conclusion

5 Control of MM Path Coverage
    5.1 MM paths for sequence diagram
    5.2 Our approach
        5.2.1 MM path generation
        5.2.2 Construction of MM coverage model
        5.2.3 Path generation from MM coverage model
    5.3 Experimental results and analysis
        5.3.1 Subject design
        5.3.2 Compare MM coverage model with CFG
        5.3.3 Comparing computation overhead with SIG and CFG
        5.3.4 Threats to validity
    5.4 Comparison with related work
    5.5 Conclusion

6 Identification of Infeasible Paths
    6.1 Two interaction patterns of sequence diagrams
    6.2 Our approach
    6.3 Experimental results
        6.3.1 Objectives
        6.3.2 Subject programs
        6.3.3 Effect of MUX and NLC interaction patterns
        6.3.4 Influence of infeasible paths on test effort estimation
        6.3.5 Influence of locations of interaction patterns
        6.3.6 Compare computation overhead with SIG and CFG
        6.3.7 Threats to validity
    6.4 Comparison with related work
    6.5 Conclusion

7 Conclusions and Future Research
    7.1 Research contributions
    7.2 Directions for Future Research

Bibliography

A Restaurant Automation System (RAS)

B Auditorium Management System (AMS)

List of Figures

3.1 An example sequence diagram and corresponding SIG.
3.2 An example sequence diagram and its control flow graph.
3.3 XMI representation of Fig. 3.2(a).
3.4 for construction of SIG from a sequence diagram.
3.5 Sequence diagram of Issue Book use case.
3.6 for meta-data of sequence diagram in XMI.
3.7 SIG of Issue Book sequence diagram.
3.8 Block diagram of integrated framework.

4.1 Code generation framework.
4.2 Sample code generated from SIG of Issue Book.
4.3 A comparison of lines of generated code for different classes of RAS.
4.4 Comparison of design details (NO × NM) in DSD and RSD of RAS.
4.5 Comparison of code generation from DSD and RSD of RAS.

5.1 Example of MM path.
5.2 Example of redefined MM path.
5.3 An example sequence diagram and corresponding SIG.
5.4 A framework for selecting a subset of all paths in a sequence diagram.
5.5 Sequence diagram of Generate Show Statistics use case.
5.6 SIG and four integration colonies for Generate Show Statistics use case.
5.7 Three graph models with implicit control flow.
5.8 MM coverage model of Generate Show Statistics.

6.1 Sequence diagram with NLC interaction pattern and its SIG.
6.2 Sequence diagram with MUX interaction pattern and its SIG.
6.3 Sequence diagram of Manage Show use case.
6.4 SIG and three integration colonies for Manage Show use case.
6.5 Two scenarios with modification of the object reference under nullify test of NLC pattern.

A.1 Sequence diagram of Deliver Order use case.
A.2 Sequence diagram of Process Order use case.
A.3 Sequence diagram of Generate Statistics use case.
A.4 Sequence diagram of Generate Bill use case.
A.5 Sequence diagram of Pay Bill use case.
A.6 Sequence diagram of Manage Item use case.
A.7 Sequence diagram of Make Order use case.

B.1 Sequence diagram of Book Ticket use case.
B.2 Sequence diagram of Cancel Ticket use case.
B.3 Sequence diagram of Compute Sale Commission use case.
B.4 Sequence diagram of Pay Commission use case.
B.5 Sequence diagram of Manage Show use case.
B.6 Sequence diagram of Generate Show Statistics use case.

List of Tables

1.1 Examples of model based software development in industries.

2.1 Summary of existing approaches for code generation from UML Models.
2.2 Summary of existing approaches for testing object-oriented systems.
2.3 Summary of existing infeasible path detection techniques.

3.1 Association of nodes in Fig. 3.2(b) with tagged elements in XMI of Fig. 3.3.
3.2 Types of meta objects for storing attribute values of elements in XMI.
3.3 Message Node Table.

4.1 Model to Code Map.
4.2 Types of statements within a class method.
4.3 Characteristics of programmers.
4.4 Characteristics of RAS.
4.5 The characteristics of seven sequence diagrams of RAS.
4.6 Results of code generation for RAS system.
4.7 Results of generated LOC for three types of classes in RAS.
4.8 Performance of code generation from design level sequence diagrams (DSD) of RAS.
4.9 Performance of code generation from reverse engineered sequence diagrams (RSD) of RAS.
4.10 Performance of code generation with different levels of abstraction in DSD of RAS.
4.11 Memory and performance analysis of our approach.

5.1 Three rules.
5.2 MM paths obtained from four integration colonies of Generate Show Statistics use case.
5.3 Ranks of all MM paths (see Table 5.2) for each integration colony.
5.4 Ten paths generated from MM coverage model.
5.5 The characteristics of six sequence diagrams of AMS.
5.6 Characteristics of subject designs.
5.7 Coverage of MM paths for Deliver Order.
5.8 Coverage of MM paths for Generate Show Statistics.
5.9 Coverage of MM paths for Manage Show.
5.10 Prioritized MM path coverage of Deliver Order.
5.11 Prioritized MM path coverage of Gen. Show Statistics.
5.12 Prioritized MM path coverage of Manage Show.
5.13 Eight C-paths (P1 ··· P8) for Generate Show Statistics.
5.14 Computation of ∆ for three subject sequence diagrams.

6.1 MM paths of integration colonies of Manage Show use case.
6.2 Independent control blocks of graph model for Manage Show.
6.3 Correlated control blocks of three integration colonies.
6.4 The characteristics of open source softwares.
6.5 Number of MUX and NLC patterns in the sequence diagrams for use cases of RAS and AMS.
6.6 Infeasibility of MM path, MM pairs, scenario paths.
6.7 Correlation of infeasible MM paths with cyclomatic complexity.
6.8 Correlation of infeasible MM pairs with their dependent factors.
6.9 Correlation of infeasible scenarios with theirs dependent factors.
6.10 Infeasible paths of open source applications for MUX patterns.
6.11 Infeasible paths of open source applications based on NLC patterns.
6.12 Computation of UAW for RAS and AMS.
6.13 Computation of UUCW for RAS and AMS.
6.14 The values of technical and environmental factors.
6.15 Computation of TEF.
6.16 Computation of % S for use cases of RAS and AMS.
6.17 Effect of the locations of integration colonies.
6.18 Computation of Z for the different sequence diagrams of RAS and AMS.
6.19 Comparison of characteristics of different infeasibility patterns.
6.20 Number of infeasible MM paths in three open-source applications.
6.21 Number of infeasible MM pairs in three open-source applications.

List of Symbols and Abbreviations

List of Symbols

$AUCP_a$: Adjusted Use Case Points for all paths
$AUCP_f$: Adjusted Use Case Points for all feasible paths
AWPL: Average Weighted Path Length
C-Paths: Paths generated from control flow graph
eWeight: Edge weight
E: Test effort
I: Interaction
IC: Integration colony
MIC: Main integration colony
MIN Limit: Minimum limit on the number of message paths in sequence diagram
MT: Method
M-Paths: Paths generated from MM coverage model
$M_{alt}$: Set of interactions of alt fragment
$M_I$: Set of interactions of a sequence diagram
$M_{loop}$: Set of interactions of loop fragment
$M_{opt}$: Set of interactions of opt fragment
$NM_D$: Number of methods in design level sequence diagram
$NM_R$: Number of methods in reverse engineered sequence diagram
$NO_D$: Number of objects in design level sequence diagram
$NO_R$: Number of objects in reverse engineered sequence diagram
$P(M_I)$: Set of precedence relations for $M_I$
PR: Parameter
PS: Priority score
$PS_{avg}$: Average priority score
r: Correlation coefficient
RC: Receiver class
RO: Receiver object
SD: Sequence diagram
SIG: Sequence Integration Graph
SO: Source object
SR: Signed Rank
ST: Stack
$UUCP_a$: Unadjusted Use Case Points for all paths
$UUCP_f$: Unadjusted Use Case Points for all feasible paths
W: Sum of signed ranks
$C_f^S$: Control node representing the start of the fragment f
$C_f^E$: Control node representing the end of the fragment f
$G(M_i)$: Coverage of the MM path $M_i$
[%S]: Percentage saving in test effort
$F_{seq}$: Set of fragments in sequence diagram
$M_f$: Set of messages of fragment f
$M_{seq}$: Set of messages in sequence diagram
$R_{seq}$: Set of reply messages in sequence diagram
$S^f_{EOprds}$: Set of EOperand objects for fragment f
$S^f_{EMsgs}$: Set of EMessage objects for fragment f
(m, r): An M-R pair
$(\hat{a}, \hat{b})$: A scope edge pair
% of Gen. LOC: Percentage of generated lines of code
≺: Precedence relation
Λ: Null precedence relation
η: Total number of MM paths in main integration colony
$\Gamma_{SIG}$: Total computation overhead with SIG.
$\Gamma_{CFG}$: Total computation overhead with CFG.
$\Gamma_i$: Total computation overhead for a basic path i.
$\mu_i$: Number of MM paths contained in the path i.
$\lambda_i$: Number of nodes of the path i.
δ: Total number of nodes in SIG
#Messages: Number of messages
#Fragments: Number of fragments
#Objects: Number of objects
#DSD: Number of design level sequence diagrams
#RSD: Number of reverse engineered sequence diagrams
#IntegrationColony: Number of integration colonies
#Pri.MMPaths: Number of prioritized MM paths
Message Node Table: Table representing message node details

List of Abbreviations

AMP: All MM Paths coverage
AMS: Auditorium Management System
AUCP: Adjusted Use Case Points
CA: Code Artifacts
CFG: Control Flow Graph
DSD: Design Level Sequence Diagram
LOC: Lines of Code
MM: Method Message
M-R pair: Message-Reply pair
MUX: MUtually Exclusive
NLC: NuLL reference Check
PMP: Prioritized MM Paths coverage
RAS: Restaurant Automation System
RSD: Reverse Engineered Sequence Diagram
SAX: Simple API for XML
SDAL: Sequence Diagram Abstraction Level
SRS: Software Requirement Specification
TEF: Technical Complexity Factor
TPTP: Eclipse Test & Performance Tools Platform
UAW: Unadjusted Actor Weight
UML: Unified Modeling Language
UUCP: Unadjusted Use Case Points
UUCW: Unadjusted Use Case Weights
XMI: XML Metadata Interchange

Abstract

We investigate using UML sequence diagrams for enhanced automatic code generation, coverage analysis, and infeasible path detection. We propose a graph model called sequence integration graph (SIG) to capture control flows among model elements of sequence diagrams as well as their method scope information. To construct a SIG from a sequence diagram, we process its XMI representation using a SAX parser. The difference between SIG and the conventional graph model (i.e. control flow graph) is that the SIG subsumes the control flow graph and additionally contains method scope information of the interactions. Subsequently, we identify subgraphs of SIG where each subgraph contains model elements in the same method scope and then apply mapping rules to the subgraphs to generate code of multiple class methods. The explicit method scope information present in SIG helps to identify the different class methods for which code has to be generated. Our empirical study shows that code generation from sequence diagrams is more effective for the controller classes than for the entity and boundary classes. An MM path represents an execution sequence of model elements from the start to the end of a method scope. We use the SIG for determining effective coverage of MM paths while generating a subset of all message paths from sequence diagrams, since exhaustive testing of all message paths is not feasible in practical situations. For this, we construct an MM coverage model from SIG and follow two coverage criteria: All MM Paths (AMP) and Prioritized MM Paths (PMP). Our empirical study shows that the MM coverage model can be used for selecting a subset of all message paths satisfying our proposed coverage criteria, whereas the control flow graph cannot be used for the same purpose even with prioritization of message paths. Finally, we use two interaction patterns, namely NLC and MUX, to detect a considerable number of infeasible paths by processing SIG. Applying our approach to a few case studies, we identify 14% to 80% infeasible MM paths, 33% to 94% infeasible MM pairs, and 50% to 97% infeasible scenarios.

Keywords: UML sequence diagram, graph model, automatic code generation, MM path coverage, infeasible path detection.

Chapter 1

Introduction

Nowadays software is used in many applications such as ATMs, online reservation, share markets, cell phones, and air traffic control. Since our everyday life is dependent on software, it is expected to be extremely reliable and of high quality. In practical situations, achieving high quality levels for large and complex software is a challenging task. This is particularly true when software developers work under multiple pressures such as meeting deadlines, limiting budgets, and providing critical technical solutions [1]. To cope with this situation, software engineers advocate automation of effort-intensive activities.

1.1 Software automation

Software automation refers to automating different development activities such as designing, coding, code quality metrics report generation, test case design, test execution, test coverage report generation, build, and deployment [2]. The benefits of software automation are the following.

a) Reusability: Automation increases the scope for reuse of software artifacts. For example, designs and code can be reused across similar types of projects, and automated test cases can be reused on different versions of the software [3].

b) Productivity: Automation makes it possible to achieve higher productivity without requiring additional resources. For example, automated tests can be scheduled to run across multiple platforms, browsers, and environments simultaneously, allowing developers to focus on other critical issues.


c) Cost reduction: With a high degree of automation, the cost incurred in defect prevention, regression testing, report generation, and execution of repeatable tests can be saved.

d) Increased fault detection: Automating code review, traceability from design to code, test case design, and test data generation can increase the ability to detect a large number of faults [3].

e) Improved test coverage: High degree of test coverage (e.g. statement, branch, path) can be achieved through test automation [3].

f) Reduced development time: Automation can help to speed up many activities such as coding, testing, documentation, and report generation, thereby reducing software development time.

Although software automation has many benefits, there are several vexing issues pertaining to it, which are discussed below.

a) Automation cannot replace manual effort: Not all development activities can be fully automated. For example, fault localization, test data generation, and test script preparation are difficult to automate fully and require substantial manual effort even when automated tools are used [3].

b) Inappropriate automation strategy: It is hard to decide which activities to automate and to what extent. The selection of the right tool for automating different activities is also a difficult task [3].

c) Lack of tool compatibility and interoperability: Available tools from different vendors are mostly not compatible or interoperable with each other [4]. This causes major difficulty in the tool integration required for systematic progress in software development.

d) Lack of skilled people: Developers with multiple skill sets (e.g. programming knowledge, domain expertise, knowledge of tool usage) are scarce [4].

To address these issues, a model of software can be used. This would enable developers to work at a higher level of abstraction (i.e. model of software) with less effort.

1.2 Model-based software development

Model-based software development (MBD) is a software engineering approach for applying models and model technologies in software development with the goal of both simplifying and automating various activities and tasks that span the entire life cycle. MBD comprises artifacts that are meant for a particular stage in the life cycle (such as describing an architecture), that can be used as a communication medium between the different teams involved in the project, and that link to related artifacts earlier or later in the life cycle [5]. Model-based software development has many advantages, including reduced cost throughout the software development life cycle, reduced development time for new applications, improved software quality, increased return on technology investments, and rapid inclusion of emerging technology benefits into existing systems [6]. In support of these benefits, we refer to a few industrial case studies of model based software development as depicted in Table 1.1 (the data has been taken from [7]). To facilitate model-driven software development, Model Driven Architecture [6], UML [8], XMI [9], SDL [10], Z language [11], etc. have been introduced, which are discussed below.

MDA: Model Driven Architecture is a set of modeling standards specified by the Object Management Group [6]. MDA defines two types of models: the platform (software/hardware) independent model (also called PIM) and the platform specific model (called PSM). The platform-independent models are used to specify the core functionality of an application, separated from the technology-specific code. The platform-independent model (PIM) is translated to a platform-specific model (PSM) by mapping the PIM to some implementation language (e.g., Java) using formal rules. A few OMG standards, such as Unified Modeling Language (UML), Meta Object Facility (MOF) [12], XML Metadata Interchange (XMI), and the Common Warehouse Metamodel (CWM), define the core infrastructure of the MDA.
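As a rough illustration of the PIM-to-PSM idea described above, the following sketch maps a tiny platform-independent class description to a Java class skeleton using a single type-mapping rule. The dictionary layout, the `PIM_TO_JAVA` table, and the `Book` example are assumptions made here for illustration; they are not part of the MDA standards or of the approach developed in this thesis.

```python
# Hypothetical mapping rule: platform-independent type names -> Java type names.
PIM_TO_JAVA = {
    "String": "String",
    "Integer": "int",
    "Boolean": "boolean",
    "Real": "double",
}


def to_java_skeleton(pim_class):
    """Translate a platform-independent class description into Java source text.

    pim_class is an assumed, simplified PIM: a dict with a class 'name' and a
    list of (attribute name, platform-independent type) pairs.
    """
    lines = [f"public class {pim_class['name']} {{"]
    for attr_name, pim_type in pim_class["attributes"]:
        # Apply the type-mapping rule to each attribute.
        lines.append(f"    private {PIM_TO_JAVA[pim_type]} {attr_name};")
    lines.append("}")
    return "\n".join(lines)


# Illustrative PIM element (names are hypothetical).
book = {"name": "Book", "attributes": [("title", "String"), ("copies", "Integer")]}
java_source = to_java_skeleton(book)
```

Printing `java_source` shows a Java class with two private fields; only the structural part of the class is produced, with no method bodies.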

UML: Unified Modeling Language is a semi-formal, platform-independent modeling language. It supports modeling different views of software [13, 14]. UML is used for (a) visualizing the business processes, requirements, architecture; (b) specifying artifacts such as use cases, classes, attributes, operations etc.; (c) constructing software, that is, generate code from design artifacts; (d) documenting useful information [13].

Table 1.1: Examples of model based software development in industries.

| Company | Product | Specified and autocoded | Benefits claimed |
|---|---|---|---|
| Airbus | A340 | 70% Fly-by-wire controls; 70% Automatic flight controls; 50% Display computer | 20 times reduction in errors; Reduced time to market |
| Eurocopter | EC-155/135 Autopilot | 90% of autopilot | 50% reduction in cycle time |
| GE and Lockheed Martin | FADEC Engine Controls | - | Reduction in errors; 50% reduction in cycle time; Decreased cost |
| US Spaceware | DCX Rocket | - | 50%-75% reduction in cost; Reduced schedule and risk |
| PSA | Electrical Management System | 50% SLOC auto generated | 60% reduction in cycle time; 5 times reduction in errors |
| CSEE Transport | Subway Signaling System | 80,000 C SLOC auto generated | Improved productivity from 20 to 300 SLOC/day |
| Honeywell | Primus Epic Flight Control System | 60% Automatic flight controls | 5 times productivity increase |

Reference: http://2013.icse-conferences.org/documents/publicity/MiSE-WS-Whalenslides.pdf [7].

XMI: XMI stands for XML Metadata Interchange. It is an OMG standard for exchanging metadata information by means of Extensible Markup Language (XML) [9]. Note that UML modeling tools export UML diagrams by means of XMI to support interoperability.
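To make the XMI export concrete, the sketch below reads a small XMI-like fragment with an event-driven SAX handler and collects lifeline and message names. The fragment is a hypothetical, heavily simplified stand-in: the namespace URIs, element layout, and the names `IssueBook`, `librarian`, `checkAvailability`, and so on are assumptions for illustration, and real tool exports are far more verbose and differ between tools and XMI versions.

```python
import xml.sax

# Hypothetical, simplified XMI-like fragment (not a real tool export).
SAMPLE_XMI = """<?xml version="1.0"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:uml="http://www.omg.org/spec/UML">
  <packagedElement xmi:type="uml:Interaction" name="IssueBook">
    <lifeline name="librarian"/>
    <lifeline name="book"/>
    <message name="checkAvailability"/>
    <message name="issue"/>
  </packagedElement>
</xmi:XMI>
"""


class InteractionCollector(xml.sax.ContentHandler):
    """Collects lifeline and message names in document order."""

    def __init__(self):
        super().__init__()
        self.lifelines = []
        self.messages = []

    def startElement(self, name, attrs):
        # SAX reports each opening tag as an event; no tree is built in memory.
        if name == "lifeline":
            self.lifelines.append(attrs.get("name"))
        elif name == "message":
            self.messages.append(attrs.get("name"))


def collect(xmi_text):
    """Parse XMI text and return (lifeline names, message names)."""
    handler = InteractionCollector()
    xml.sax.parseString(xmi_text.encode("utf-8"), handler)
    return handler.lifelines, handler.messages


lifelines, messages = collect(SAMPLE_XMI)
```

Because the handler only reacts to events, any structure needed later (for example a graph over the messages) has to be built up explicitly from what the handler records.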

SDL: Specification and description language (SDL) is an object-oriented, formal language defined by the International Telecommunications Union Telecommunications Standardization Sector (ITU-T) [10]. It was originally intended for telecommunication systems and is currently used in complex, event-driven, real-time, and interactive applications. SDL provides both graphical and textual representations, which are equivalent representations of the same underlying semantics. Models are usually shown in the graphical form, whereas the textual representation is mainly used for exchanging models between tools.

Z language: It is a typed language based on set theory and first order predicate logic [11]. In this modeling environment, a system is modeled by representing its states and the set of operations that can change the states.

At present, formal models do not scale to large software systems and are used to a limited extent in practice [15]. Software industries typically use semi-formal modeling languages to model software systems [15]. As UML is a semi-formal language with support for a large, useful, and extensible set of predefined constructs, it is a de facto standard for modeling large and complex software [16]. Researchers have investigated potential applications of UML design models in code generation, test case synthesis, test case prioritization, etc. [17, 18, 19, 20, 21, 22, 23, 24, 25]. The benefits of using UML models for the above mentioned tasks are the following [26].

1) UML model-based analysis often uses message paths [27, 17, 18, 19, 20] and object-states [28, 23, 22], which are explicitly captured in UML sequence and statechart diagrams. In the absence of such diagrams, determining message paths and object-states is a difficult task requiring static/dynamic analysis of a huge amount of information in code.

2) Test case design can be initiated using UML models even before coding starts. This implies that coding and testing activities can be carried out in parallel, thus reducing software development time.

3) The expressiveness and visualization power of UML design models can help to avoid miscommunication among different development teams.

At present, software practitioners face several challenges, three of which are improving communication between design and coding teams, increasing the reliability of software, and reducing the testing effort. UML models can be used to address these challenges as follows.


1) Auto-generate code from design artifacts: Automatic code generation from UML diagrams not only helps automate an effort-intensive activity, but can also help reduce miscommunication between coding and design teams, thereby ensuring that the actual behavior of the software does not differ from the expected one.

2) Test coverage based on design specifications: Reliability can be enhanced by adequately testing critical parts of software. Since UML diagrams explicitly capture behavior information required for testing, UML design specification based coverage can help to mitigate the risks associated with the faults occurring in the critical parts of the software.

3) Identify infeasible paths: Effective use of path infeasibility can help enhance the precision of the results of UML model based analysis. Further, test effort can be reduced by excluding test data generation and test script preparation for the prior-detected infeasible paths.
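The notion of an infeasible path in the third item can be illustrated with a deliberately small sketch: a path is treated as a list of branch decisions, and it is flagged as infeasible when the same predicate is required to be both true and false along it. This is a generic illustration under the stated assumption that the operands of a predicate are not modified between the two decisions; it is not the detection technique developed later in this thesis, and the predicate strings are hypothetical.

```python
def is_infeasible(path):
    """Return True if the path demands contradictory outcomes for a predicate.

    path is a list of (predicate, outcome) pairs, one per branch decision.
    Assumption: predicate operands are not reassigned along the path.
    """
    required = {}
    for predicate, outcome in path:
        if predicate in required and required[predicate] != outcome:
            return True  # same predicate must be both True and False
        required[predicate] = outcome
    return False


def feasible_paths(paths):
    """Keep only the paths that are not flagged as infeasible."""
    return [p for p in paths if not is_infeasible(p)]


# Hypothetical branch decisions for two candidate paths.
contradictory = [("seat != null", True), ("seat != null", False)]
consistent = [("seat != null", True), ("isBooked", False)]
remaining = feasible_paths([contradictory, consistent])
```

Paths filtered out this way need no test data or test scripts, which is the source of the test effort saving mentioned above.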

1.3 Scope of work

The aim of our research is to investigate how the information about object interactions captured in a design model can be effectively used for code generation, test coverage, and infeasible path detection. For this, there are two alternatives, UML sequence diagrams and UML collaboration diagrams, since both are known for capturing message paths of object-oriented systems. In our research, we decided to use sequence diagrams for the following reasons: (1) sequence diagrams correspond to code at the object interaction level; (2) object interactions and their order are well captured by means of fragments and message sequences in sequence diagrams. The scope of our research is as follows.

a) UML modeling tools export UML sequence diagrams by means of the XMI representation, which is an interoperability standard [9]. XMI stores the model elements of sequence diagrams in a complicated structure, which impedes the direct use of XMI in software engineering applications. Therefore, it is necessary to have a graph model for the XMI of a sequence diagram. In this regard, we may note that the existing mapping rules [29, 20, 22] are difficult to apply directly to the XMI of a sequence diagram and are also not able to capture method scope information in a graph model.

b) The existing UML modeling tools support automatic generation of structural code from UML class diagrams. Sequence diagrams can be used to generate the code inside class methods, but most of the existing commercial UML tools do not support this, and it has scarcely been reported in the literature. Behavioral code generation is necessary to avoid human errors during manual translation of sequence diagrams into equivalent code [30], to improve communication between design and coding teams [31], and to reduce coding effort [30].

c) A sequence diagram typically has many message paths, and testing them effectively may not be possible with limited effort [32]. This is because each message path has to be tested with multiple test data in order to expose the faults associated with state transitions, predicates, etc. In a resource-constrained environment, test engineers often have no choice but to select a subset of all message paths, either in an ad hoc manner or based on their intuitive priorities. The selected message paths may then not include all Method-Message paths (MM paths), which are sequences of method executions linked by messages [33, 34], or may include a less critical MM path multiple times while not including a highly critical MM path even once. Therefore, the issue is how to decide the criteria for selecting message paths so as to save test effort without compromising testing quality.

d) UML model-based analysis results reported in the literature implicitly assume that every identifiable path in UML models such as activity, sequence, and communication diagrams is feasible. Such analysis ignores the presence of infeasible paths, which cannot be taken using any test data [35, 36, 37, 38]. However, several studies report that a significant number of paths in object-oriented programs can be infeasible [35, 37]. There is scope to investigate the infeasibility of message paths in UML models.

1.4 Objectives of the thesis

Our research objectives are the following.


A) Define a graph model: Our first objective is to construct a graph model for the XMI representation of a sequence diagram. The graph model should capture control flow among model elements as well as their method scope information.

B) Code generation: Our next objective is to generate behavioral code of different class methods from the graph model. For this, it is necessary to map the model elements in the graph model into code artifacts.

C) Control MM path coverage: Our third objective is to determine effective coverage of MM paths while selecting a subset of message paths from sequence diagrams. For this, it is required to follow suitable coverage criteria and a coverage model.

D) Detect infeasible paths: Our fourth objective is to identify frequently occurring infeasibility patterns in sequence diagrams and to design algorithms that detect the infeasible paths arising out of those patterns.

1.5 Contributions of our research

a) We propose a novel graph model of a sequence diagram, which we have named Sequence Integration Graph (SIG). In contrast to the conventional graph model (i.e., the control flow graph), SIG subsumes the control flow graph and additionally contains method scope information of the interactions.

b) We use SIG to generate behavioral code belonging to multiple class methods. The explicit method scope information present in our graph model helps to identify different class methods for which code has to be generated.

c) Our empirical study has established that code generation from sequence diagrams for use cases is more effective for controller classes than for entity and boundary classes.

d) We process SIG to generate MM paths and subsequently build an MM coverage model to capture their call relationships and priority information. Using this MM coverage model, we determine the coverage of MM paths with respect to two coverage criteria: All MM Paths (AMP) and Prioritized MM Paths (PMP).


e) To prioritize the MM paths, we propose a novel metric called AWPL (Average Weighted Path Length). A higher AWPL value for an MM path implies a greater length of its subpath common with other MM paths and a higher density (i.e., average number of paths covering the subpath).

f) Our experimental results have shown that the MM coverage model can be used to select a subset of message paths satisfying our proposed coverage criteria, whereas the control flow graph cannot be used for the same purpose even with prioritization of message paths.

g) We investigate the infeasibility of message paths in sequence diagrams with respect to two interaction patterns: Null Reference Check (NLC) and Mutually Exclusive (MUX). An NLC interaction pattern consists of modeling both null and non-null return values for a return variable in one method and performing a null check on that return variable in another method. An MUX interaction pattern, on the other hand, consists of state-based interaction of an object that has many choices of actions based on its own current state or another object's state, but is permitted to take only one action at a time.

h) We design two algorithms for detecting infeasible paths for the NLC and MUX patterns. By applying our infeasibility detection algorithms to the sequence diagrams for the Restaurant Automation System and the Auditorium Management System, we identify 14% to 80% infeasible MM paths, 33% to 94% infeasible MM pairs, and 50% to 97% infeasible scenarios.
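To make the NLC pattern in contribution g) concrete, the following is a minimal sketch and not the detection algorithm of this thesis (which is presented in Chapter 6). It assumes a hypothetical encoding in which a message path is a list of events: `("return", var, value)` records the return value modeled for a variable, and `("check", var, value)` records the branch of a null check taken along the path. The function name and the event encoding are illustrative assumptions only.

```python
def nlc_infeasible(path):
    """Return True if a null check on the path contradicts the modeled return value.

    `path` is a list of (kind, variable, value) tuples, where kind is
    "return" or "check" and value is "null" or "nonnull" (hypothetical encoding).
    """
    last_return = {}
    for kind, var, value in path:
        if kind == "return":
            # Remember the most recent return value modeled for this variable.
            last_return[var] = value
        elif kind == "check":
            # A check that disagrees with the modeled value can never succeed.
            if var in last_return and last_return[var] != value:
                return True
    return False


# One method returns null for `table`; another method takes the non-null branch.
contradictory = [("return", "table", "null"), ("check", "table", "nonnull")]
consistent = [("return", "table", "nonnull"), ("check", "table", "nonnull")]

print(nlc_infeasible(contradictory))
print(nlc_infeasible(consistent))
```

Under this encoding, the first path is reported as infeasible because the null check contradicts the modeled return value, while the second path is not.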

1.6 Organization of the thesis

The rest of the thesis has been organized into chapters as follows.

Chapter 2: This chapter reviews the reported work on code generation from UML models, test case generation using sequence diagrams, and infeasible path detection techniques.

Chapter 3: This chapter introduces our proposed graph model called Sequence Integration Graph (SIG). We then discuss our approach to construct SIG from XMI of a given sequence diagram.


Chapter 4: In this chapter, we present our approach to generate code from SIG. In our approach, we discuss how to map model elements in SIG into code artifacts. The time and space complexities of our code generation approach are also reported.

Chapter 5: In this chapter, we describe our technique to select a subset of message paths in sequence diagrams after determining effective coverage of underlying MM paths. For this, we use two coverage criteria and MM coverage model.

Chapter 6: In this chapter, we introduce two interaction patterns, named Null Reference Check (NLC) and Mutually Exclusive (MUX), which cause infeasibility of message paths in UML sequence diagrams. We then present our technique to detect infeasible paths with respect to the MUX and NLC patterns. The effects of the MUX and NLC patterns on a few case studies are also reported.

Chapter 7: In this chapter, we summarize our research contributions and discuss directions for future research.

Chapter 2

Survey of Related Work

Our research work encompasses the areas of code generation, determination of effective test coverage, and detection of infeasible paths using UML sequence diagrams. In this chapter, we first review the existing work on structural and behavioral code generation from UML diagrams. We then discuss the different ways in which researchers have used sequence diagrams for testing object-oriented systems. Finally, we review existing techniques for infeasible path detection.

2.1 Code generation from UML diagrams

Different UML diagrams such as class, statechart, collaboration, and sequence diagrams have been used for code generation, as discussed below.

Engels et al. [39] propose a transformation process based on collaboration diagrams to generate Java code. For this, Engels et al. provide guidelines to transform collaboration diagrams into a well-formed structure. They consider a refined meta-model for collaborations so that a well-formed, structured collaboration diagram can be instantiated from the refined meta-model. The transformation rules are applied on the refined meta-model to generate Java code.

Similar to Engels et al.'s approach, Thongmak et al. [40] propose a set of rules to transform UML sequence diagrams into Java code. For this, Thongmak et al. map sequence and class diagrams into a meta-model suitable for transformation and then apply the meta rules for conditional method invocation, assigning a value to a variable, creating a new object, and invoking a method of the object itself.

Code generation from UML statechart diagrams has been reported by Tanaka et al. [41, 42, 43]. Niaz and Tanaka [41] propose an approach to generate Java code from UML class and statechart diagrams. For code generation, they represent states as objects, transitions as operations, and hierarchical and concurrent substates by means of object composition and delegation. To simplify the code generation process, Niaz and Tanaka introduce a helper object that encapsulates all the state-specific behavior of a multi-state domain object.

Jakimi et al. [31] propose an approach for requirements engineering using UML and high-level Petri nets with additional support for code generation. Initially, they model scenarios in the form of sequence diagrams enriched with time and security constraints. Next, Jakimi et al. [31] combine all sequence diagrams into Hierarchical Colored Petri Nets (CPNs) [44]. After this, the sequence diagrams representing scenarios of one particular use case are composed into a single sequence diagram using four operators: sequential, concurrent, conditional, and iteration. By linking these single sequence diagrams, Jakimi et al. [31] build a global single sequence diagram capturing the behavior of the entire system. This global single sequence diagram is further analyzed to generate code. However, they have not reported detailed code generation steps.

Usman et al. [30] presented a tool called UJECTOR to generate structural and behavioral code from UML 2.0 class, sequence, and activity diagrams. For this, Usman et al. apply an XMI parser to the XMI representation of the input UML diagrams to create three different types of meta-model instances, which are subsequently processed to generate Java code artifacts as follows: classes, their attributes, and method signatures (from class diagram meta-model instances); flow of control within class methods (from sequence diagram meta-model instances); and actions within class methods (from activity diagram meta-model instances).

Parada et al. [45] present an approach to generate code from UML 2.0 class and sequence diagrams for embedded systems. Parada et al. map the messages, loops, and conditions captured in a sequence diagram into method invocations, for/while statements, and switch/if-else statements in Java code, respectively. For structural code generation from class diagrams, they consider attributes, method signatures, constructors, relationships between classes or interfaces, the cardinality of attributes, etc.
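The kind of mapping described for Parada et al. (messages to method invocations, loops to while statements, conditions to if-else statements) can be illustrated with a small sketch. This is not the implementation of any surveyed tool: the classes `Message`, `Loop`, and `Alt`, the function `emit`, and the example model are hypothetical names introduced here for illustration, and the generator simply prints Java-like text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Message:
    receiver: str
    method: str


@dataclass
class Loop:
    guard: str
    body: List["Element"] = field(default_factory=list)


@dataclass
class Alt:
    guard: str
    then_body: List["Element"] = field(default_factory=list)
    else_body: List["Element"] = field(default_factory=list)


Element = Union[Message, Loop, Alt]


def emit(elements, indent=0):
    """Translate a list of sequence-diagram elements into Java-like source lines."""
    pad = "    " * indent
    lines = []
    for el in elements:
        if isinstance(el, Message):
            # A message becomes a method invocation on the receiving object.
            lines.append(f"{pad}{el.receiver}.{el.method}();")
        elif isinstance(el, Loop):
            # A loop fragment becomes a while statement.
            lines.append(f"{pad}while ({el.guard}) {{")
            lines.extend(emit(el.body, indent + 1))
            lines.append(f"{pad}}}")
        elif isinstance(el, Alt):
            # A conditional fragment becomes an if-else statement.
            lines.append(f"{pad}if ({el.guard}) {{")
            lines.extend(emit(el.then_body, indent + 1))
            if el.else_body:
                lines.append(f"{pad}}} else {{")
                lines.extend(emit(el.else_body, indent + 1))
            lines.append(f"{pad}}}")
    return lines


model = [
    Message("order", "validate"),
    Loop("i < items.size()", [Message("inventory", "reserve")]),
    Alt("order.isPaid()", [Message("shipper", "dispatch")], [Message("order", "cancel")]),
]
print("\n".join(emit(model)))
```

The sketch places all generated statements in one body; it does not decide which class method each message belongs to.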

Summary of observations

We summarize the existing work on code generation from UML diagrams in Table 2.1. The sequence diagrams used in the existing work [40, 30, 45] contain messages belonging to a single class method. However, sequence diagrams that are designed to model the behavior of complex use cases in practical situations [46, 47] usually contain messages belonging to multiple class methods. For code generation from such sequence diagrams, consideration of method scope information is necessary, which the existing approaches [40, 30, 45] have ignored. Although scenario-level sequence diagrams have been used in Jakimi et al.'s approach [31], they have followed UML 1.x syntax and have also not reported how to handle method scope information.

Table 2.1: Summary of existing approaches for code generation from UML Models.

| Authors | UML diagrams | Main steps |
|---|---|---|
| Engels et al. [39] | Collaboration diagram | (1) Generate behavioral code; (2) Apply transformation rules to refined meta-model of collaboration diagram |
| Niaz et al. [41] | Statechart diagram | (1) Generate behavioral code; (2) Use helper object that encapsulates all the state-specific behavior |
| Thongmak et al. [40] | Sequence diagram | (1) Use UML 1.x syntax; (2) Generate behavioral code; (3) Apply the rules to the meta-models |
| Jakimi et al. [31] | Sequence diagram | (1) Use UML 1.x syntax; (2) Compose scenario-level sequence diagrams into a global sequence diagram using four operators: sequential, concurrent, conditional, iteration; (3) Generate code from global sequence diagram |
| Usman et al. [30] | Class, sequence, and activity diagrams | (1) Use UML 2.0 syntax; (2) Generate code structure from class diagram, flow of control within methods from sequence diagrams, object manipulations from activity diagrams |
| Parada et al. [45] | Class, sequence diagrams | (1) Use UML 2.0 syntax; (2) Generate structural and behavioral code |

2.2 Testing using UML sequence diagrams

Several research efforts on integration and system testing of object-oriented systems using sequence diagrams have been reported. We briefly review these in the following.


Fraikin et al. [17] presented a tool called SeDiTeC for testing Java applications using sequence diagrams. For SeDiTeC to function, sequence diagrams must contain both the method calls that bring the system into the desired state and those that are actually to be tested. During execution of the application under test, SeDiTeC stores test execution data, which include the values of all objects referred to in the sequence diagrams. This enables SeDiTeC to decide whether the correct method has been called and whether it has been invoked on the correct object in its appropriate state. SeDiTeC also supports the generation of test stubs.

Briand et al. [48] propose a system testing methodology named TOTEM (Testing Object orienTed systEms with the unified Modeling language). They use different artifacts of UML specifications such as use case, activity, and sequence diagrams and the Object Constraint Language (OCL). Briand et al. capture sequential dependencies among use cases in a diagram similar to an activity diagram with the help of system analysts. This helps to determine the legal sequences of use cases. By analyzing sequence diagrams for individual use cases, Briand et al. [48] identify operation sequences along with their initial condition(s), post-condition(s), etc. This information is then used to find test oracles.

Pilskalns et al. [18] propose a test case generation approach that combines information from both UML sequence and class diagrams. For this, Pilskalns et al. propose the following steps: (a) find partitions and boundary values of all attributes/parameters referred to in the class diagram; (b) transform the sequence diagram into a control flow graph, called Object Method Directed Acyclic Graph (OMDAG); (c) generate paths from the OMDAGs; (d) select attribute/parameter values from the partitions and assign them to OMDAG paths. The parameter values associated with OMDAG paths are recorded in a data structure named Object Method Execution Table (OMET), which is subsequently used for test case generation following the All Message Paths and Class Attribute criteria [49].

Garousi et al. [29] propose a methodology to analyze the control flow of UML 2.x sequence diagrams using the Object Constraint Language (OCL). For this, Garousi et al. consider an extended activity diagram meta-model, called Concurrent Control Flow Graph (CCFG). Garousi et al. [29] define a set of mapping rules (expressed in OCL) to map an instance of the sequence diagram meta-model into an instance of the CCFG meta-model. A formal representation of a concurrent control flow path is also discussed in their work.

Rountev et al. [32] investigate the applicability of sequence diagrams for run-time coverage analysis. For this, they propose several coverage criteria based on sequence diagrams and a coverage model called Interprocedural Restricted Control-Flow Graph (IRCFG). The IRCFG captures calling relationships among a set of restricted CFGs (RCFGs), where each RCFG corresponds to a control flow graph restricted to one method. Note that coverage of all IRCFG paths is equivalent to coverage of all message paths in sequence diagrams. For selection of a subset of all message paths from the IRCFG, Rountev et al. [32] advocate the following coverage criteria: All-RCFG-Paths to cover all RCFG paths, All-RCFG-Branches to cover all RCFG edges, and All-Unique-Branches to cover all unique RCFG edges.

Dinh-Trong et al. [21] propose an approach to generate test inputs for validation of UML design models. For this, they use both class and sequence diagrams to build a control flow graph annotated with data-flow information, called Variable Assignment Graph (VAG). Dinh-Trong et al. select paths from the VAG based on different coverage criteria (node, edge, path) and subsequently generate test inputs by solving the path constraints using the constraint solver named Alloy [50].

Sarma et al. [19] use sequence diagrams to synthesize test cases. For this, they model the scenarios of a sequence diagram as a control flow graph, called Sequence Diagram Graph (SDG), where a node represents a message and an edge represents a control flow. Sarma et al. augment the nodes of the SDG with different information, such as attributes of participating objects, parameters of a method, predicate (if any), and range of attribute values, which is necessary to compose test vectors. This information is obtained from use case templates, class diagrams, and the data dictionary. With the node information, the SDG is used to generate test cases following the All Message Paths coverage criterion.

Ali et al. [23] use the UML collaboration diagram, which is equivalent to the sequence diagram, for state-based integration testing. They first build a testable model, called SCOTEM (State COllaboration TEst Model), by combining the control flow graph of a collaboration diagram with state information of the collaborating objects. Ali et al. generate test paths from SCOTEM following different coverage criteria such as Single-Path, All-Transition, n-Path, and Path.

Note that the existing work of Fraikin et al. [17], Pilskalns et al. [18], and Sarma et al. [19] is based on UML 1.x sequence diagrams. Considering the advanced notations of UML 2.0 sequence diagrams (also called interaction diagrams), Nayak et al. [20] propose a test generation approach. For this, they transform a sequence diagram into a control flow graph, called Scenario Graph. Its nodes are of the following types: block, decision, merge, fork, and join. Note that a block node represents a set of messages that are executed without a jump in control flow. To facilitate the generation of paths, Nayak et al. build an intermediate testable model called ITM. For this, they identify minimum regions (a region has a single entry and exit and is not contained in another region) in the Scenario Graph and replace them with composite nodes successively until the Scenario Graph becomes a single chain of nodes, which is the ITM. Nayak et al. use the ITM to generate test scenarios by expanding each composite node into its internal paths in successive iterations. These generated test scenarios are then processed to synthesize test cases.

Extending the work of Dinh-Trong et al. [21], Bandyopadhyay et al. [22] annotate the VAG [21] with information from the state machines of the participating objects to transform it into another graph model, called Extended Variable Assignment Graph (EVAG). For this, the message nodes in the VAG are augmented with the information of the associated transitions. Like Dinh-Trong et al.'s approach, Bandyopadhyay et al. also solve path constraints to generate test inputs for EVAG paths.

To improve fault detection capability for state-based integration testing, Briand et al. [25] use data flow information. They combine the sequence diagram and the statechart diagrams of collaborating objects into a control flow graph annotated with data flow information, called MEACFG (Message/Event/Action Control Flow Graph). The MEACFG is used to generate test cases by using coupling-based, data flow testing criteria.

Summary of observations

We summarize the existing work on testing object-oriented systems using UML sequence diagrams in Table 2.2. From Table 2.2, we can observe that a few approaches [17, 32, 20] have solely used sequence diagrams, while the others [18, 21, 19, 22, 25] have used sequence, class, and/or statechart diagrams. A large number of the reported approaches [18, 21, 19, 20, 22] are based on constructing a control flow graph of the sequence diagram annotated with useful information and have followed the All Message Paths coverage criterion. According to this coverage criterion, all message paths have to be executed at least once. It is well known that effective testing of a message path requires testing it with multiple test data in order to expose the faults associated with state transitions and predicates [23]. With a limited budget, it may not be feasible to test all message paths effectively [32]. In such situations, it is necessary to select a subset of message paths.
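As an illustration of why the All Message Paths criterion becomes expensive, the following minimal sketch enumerates every path through a small acyclic control flow graph. The graph, the node names `m1` to `m7`, and the function `all_paths` are hypothetical and are not taken from any of the surveyed approaches; loops are assumed to be absent.

```python
def all_paths(graph, start, end):
    """Enumerate every path from start to end in an acyclic control flow graph."""
    if start == end:
        return [[end]]
    paths = []
    for nxt in graph.get(start, []):
        # Extend each path found from the successor with the current node.
        for tail in all_paths(graph, nxt, end):
            paths.append([start] + tail)
    return paths


# Two consecutive two-way decisions (at m1 and at m4).
cfg = {
    "m1": ["m2", "m3"],
    "m2": ["m4"],
    "m3": ["m4"],
    "m4": ["m5", "m6"],
    "m5": ["m7"],
    "m6": ["m7"],
}

paths = all_paths(cfg, "m1", "m7")
for p in paths:
    print(" -> ".join(p))
```

Each additional independent two-way decision doubles the number of paths in such a graph, and each path still has to be exercised with multiple test data.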

Table 2.2: Summary of existing approaches for testing object-oriented systems.

| Authors | UML diagrams | Main steps |
|---|---|---|
| Fraikin et al. [17] | Sequence diagram | (1) Presented a tool called SeDiTeC for testing Java applications. (2) SeDiTeC can decide whether the correct method has been called or it has been invoked on the correct object and with its appropriate state. |
| Pilskalns et al. [18] | Sequence and class diagrams | (1) Transform sequence diagram into a control flow graph called OMDAG. (2) Associate the values from the class attribute/parameter partitions with the objects in all paths of the OMDAG. (3) Generate test cases following All Message Paths and Class Attribute criteria. |
| Rountev et al. [32] | Sequence diagram | (1) Use a coverage model called IRCFG to capture calling relationships among a set of restricted CFGs. (2) Propose several coverage criteria: All-RCFG-Paths, All-RCFG-Branches, All-Unique-Branches. |
| Dinh-Trong et al. [21] | Class and sequence diagrams | (1) Build a control flow graph annotated with data-flow information called Variable Assignment Graph (VAG). (2) Select paths from the VAG based on different coverage criteria: node, edge, path and generate their test inputs by solving the path constraints. |
| Sarma et al. [19] | Sequence diagram and data dictionary | (1) Model scenarios of sequence diagram into a control flow graph called sequence diagram graph (SDG). (2) Augment its node information with object attributes, method parameters, predicate, range of attribute values. (3) Generate test cases following All Message Paths coverage criterion. |
| Ali et al. [23] | Collaboration and statechart diagrams | (1) Build a testable model called SCOTEM after combining control flow graph of collaboration diagram with state information of collaborating objects. (2) Generate test paths from SCOTEM following different coverage criteria such as Single-Path, All-Transition, n-Path, and Path. |
| Nayak et al. [20] | Sequence diagram | (1) Convert sequence diagram into a control flow graph called scenario graph. (2) Generate test scenarios, use composite nodes and subsequently, expand their internal paths. (3) Generate test cases following All Message Paths coverage criterion. |
| Bandyopadhyay et al. [22] | Class, sequence, statechart diagrams | (1) Annotate the VAG with the information from state machines of the participating objects to transform it into another graph model called EVAG. (2) Solve path constraints to generate test inputs for EVAG paths. |
| Briand et al. [25] | Sequence and statechart diagrams | (1) Combine sequence diagram and statechart diagrams of collaborating objects into a control flow graph annotated with data flow information called MEACFG. (2) Use MEACFG to generate test cases following coupling-based, data flow testing criteria. |

When a subset of message paths is selected at random, some MM paths may not be covered at all, or the coverage of MM paths may not reflect their priorities. To cope with this situation, we need an MM path based coverage model, since the control flow graph used in the existing approaches is not effective for this purpose. Further, we need a suitable prioritization metric for deciding effective coverage of the underlying MM paths. In this regard, we should consider the length and density (number of paths covering an edge) of the common subpaths of MM paths. This is important because a single fault in a highly dense common subpath of an MM path would cause a larger number of failures during execution, affecting the reliability of the software.
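The notion of density used above (number of paths covering an edge) can be computed directly from a set of paths. The sketch below is only an illustration of that count, not the prioritization metric of this thesis; the function `edge_density` and the three example paths are hypothetical.

```python
from collections import Counter


def edge_density(paths):
    """Count, for each edge, how many of the given paths traverse it."""
    density = Counter()
    for path in paths:
        # Use a set so that an edge repeated within one path is counted once.
        for edge in set(zip(path, path[1:])):
            density[edge] += 1
    return density


# Three hypothetical MM paths that share the subpath b -> c.
mm_paths = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "e"],
    ["f", "b", "c", "d"],
]

density = edge_density(mm_paths)
for edge, count in sorted(density.items()):
    print(edge, count)
```

Here the edge from `b` to `c` is covered by all three paths, so a fault on that edge would affect every one of them, whereas a fault on the edge from `c` to `e` would affect only one.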

2.3 Infeasible path detection techniques

Infeasible path detection has been acknowledged to be an important yet vexing problem [51, 52, 53, 54, 37, 35, 55, 36, 38]. Of late, researchers have proposed several infeasible path detection techniques. These techniques can be classified into two categories: static and dynamic. Static techniques are mainly based on the characteristics of program code, such as branch correlation, code patterns, and satisfiability of a set of predicates along a path. Dynamic techniques, on the other hand, are based on executing a program in which actual values are assigned to input variables and the program's execution flow is monitored. The existing work on static and dynamic techniques is briefly reviewed in the following.

2.3.1 Static analysis techniques

Clark [56] first reported path infeasibility while generating test data for Fortran programs. For this, Clark [56] used symbolic path execution, where expressions rather than values are assigned to program variables. The set of constraints contained in each path is simplified and then checked for satisfiability using an inequality solver. If these constraints are satisfiable, then the corresponding path is feasible; otherwise, it is infeasible. The set of constraints so obtained for a feasible path is also used to generate test data.

Based on experimental evidence, Hedley et al. [51] concluded that an infeasible path contains at least one infeasible LCSAJ (Linear Code Sequence And Jump), where an LCSAJ is a linear sequence of statements that (a) begins at either the start of the program or a point to which the control flow may jump, and (b) ends at either the end of the program or a point from which the control flow may jump. Hedley et al. [51] performed a study on the Fortran NAG Library [57], consisting of 88 routines, and found that 12.5% of the total LCSAJs are infeasible. Their study reveals that the presence of loop constructs and poor programming style are the main causes of infeasible LCSAJs in Fortran programs.

Yates et al. [58] propose a path generation approach for branch testing by excluding infeasible paths. Their approach is based on the assertion that the fewer predicates a program path contains, the more likely it is that the path is feasible. Based on the work of Yates et al. [58], Malevris [53] proposes a path generation method to cover all LCSAJs in a program while minimizing the number of infeasible paths generated. The key idea of Malevris's work [53] is that if an LCSAJ, say m, lies on an infeasible path, then another path containing m with the next smaller number of predicates is considered as a candidate feasible path.

Bodik et al. [52] refine def-use pair analysis [59] using information about infeasible paths. For this, Bodik et al. [52] extend the idea of a compiler optimization technique for elimination of redundant conditional branches [60]. However, this branch correlation analysis does not necessarily yield the shortest infeasible paths, which are required to maximize the number of def-use pairs to be excluded. To identify the shortest infeasible paths, Bodik et al. analyze the control flow graph. Subsequently, def-use pairs that span the shortest infeasible subpaths are excluded.

Forgacs et al. [54] propose an approach to select feasible test paths to reach a specified program point. Their approach is based on the concept of influencing predicates, which determine reachability of a set of selected program statements. Using this concept, Forgacs et al. introduce a new term called principal slice, which is a program slice with an almost minimum number of influencing predicates. To identify feasible test paths, they compute the principal slice using both control and data flow information, since paths derived from principal slices are likely to be feasible.

Souter et al. [37] propose an approach to detect infeasible paths in object-oriented programs by analyzing the corresponding call graphs (which capture all possible runtime calling relationships among class methods). Their approach is based on the identification of type infeasible call chains (execution paths through the call graph) along which the type of data expected to propagate from a polymorphic call site is not valid. While investigating the causes of type infeasible call chains, Souter et al. observe that they can occur due to the use of Object as the static type of a formal parameter, a polymorphic container (which can contain objects of different types), or a polymorphic field (which can be instantiated as different types).

Similar to Clark's approach [56], Zhang et al. [61] present an approach to detect infeasible paths through symbolic execution. Zhang et al. first extract path constraints by applying backward substitution from the final node to the start node of the path. Subsequently, the set of path constraints is checked for satisfiability using a constraint solver named BoNuS [62], which is an extension of a Boolean satisfiability checker. The advantage of using BoNuS is that it can accept variables of the types Boolean, integer, real, fixed-size arrays, etc.

Ngo et al. [35] propose an approach to detect infeasible paths in Java programs with respect to four infeasibility patterns, namely identical/complement-decision, mutually-exclusive-decision, check-then-do, and looping-by-flag. An identical/complement-decision pattern consists of a pair of conditions that are independent and identical or complementary to each other. In a mutually-exclusive-decision pattern, only one action among a set of actions is allowed to be performed. A check-then-do pattern consists of the successful checking of some conditions to activate other actions; the activation is enabled by setting a value to a flag variable. In a looping-by-flag pattern, a flag variable is used to terminate a loop. Ngo et al. analyze control flow graphs of class methods to detect infeasible paths arising out of these four patterns. They also empirically study the effects of these patterns on several object-oriented systems and find a large number of infeasible paths.

To improve the performance of existing test path generation approaches in the presence of infeasible paths, Yan et al. [55] propose an approach to generate a set of feasible basis paths. Note that a basis path set is a maximal linearly independent subset of all paths of a program. Given an upper bound on the path length (which is required to avoid infinite execution of the program), their proposed approach continuously selects a basis path with increasing path length, starting from one. The selected path is then checked to examine whether it is linearly independent of the previously selected paths. To verify the feasibility of the selected path, Yan et al. solve the path constraints using a constraint solver. In case a selected path is found to be infeasible, the proposed approach attempts to find some feasible path of the same or higher length until the path length exceeds the given upper bound.
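Several of the static techniques above decide feasibility by checking whether the predicates collected along a path can be satisfied together. The following minimal sketch illustrates that idea only; it is not the method of any cited work. It replaces a real constraint solver with an exhaustive search over a small bounded integer domain, which is an assumption made here for brevity, so a negative answer only means that no satisfying assignment exists within that domain. The function `is_feasible` and the example predicates are hypothetical.

```python
from itertools import product


def is_feasible(predicates, variables, domain=range(-10, 11)):
    """Return True if some assignment over the bounded domain satisfies all predicates.

    Stand-in for a constraint solver: tries every combination of values
    from `domain` for the given variable names.
    """
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(pred(env) for pred in predicates):
            return True
    return False


# Path A takes the true branch of `x > 5` and then the true branch of `x < 3`,
# with no assignment to x in between, so its predicates contradict each other.
path_a = [lambda e: e["x"] > 5, lambda e: e["x"] < 3]

# Path B takes the true branch of `x > 5` and then requires y == x + 1.
path_b = [lambda e: e["x"] > 5, lambda e: e["y"] == e["x"] + 1]

print(is_feasible(path_a, ["x"]))
print(is_feasible(path_b, ["x", "y"]))
```

Path A resembles the complement-style correlated decisions described for Ngo et al.: once the first branch outcome is fixed, the second outcome required by the path can never occur.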

2.3.2 Dynamic analysis techniques

Bueno et al. [63] present a dynamic approach to generate test data and identify potentially infeasible paths. They use a genetic algorithm to select input data that execute a path. For this, Bueno et al. introduce a new fitness function that combines control and data flow information to drive the search for test data. If a lack of progress in the search is detected over successive generations, the path is considered potentially infeasible.

Ngo et al. [36] decide path infeasibility with respect to a few empirical properties of correlated conditional statements. Note that two conditional statements are said to be correlated when the outcome of the later one can be implied from the outcome of the earlier one. Ngo et al. use program traces to check the validity of the empirical properties. They [36] consider the information about path infeasibility while generating path-oriented test data. If an infeasible path is found, test data generation for that path is skipped, thereby avoiding any further wastage of effort.

Since static analysis is not enough to determine branch correlations timely and accurately [52], Gong et al. [38] combine static and dynamic analysis to detect infeasible paths. For this, Gong et al. [38] collect sample test execution data capturing the outcomes of branches and subsequently use them to calculate maximum likelihood estimates of the conditional probability of a branch's outcome.


Table 2.3: Summary of existing infeasible path detection techniques.

| Authors | Subject language | Main steps |
|---|---|---|
| Bodik et al. [52] | C | (1) Use branch correlation; (2) Exclude def-use pairs that span infeasible paths; (3) Static analysis |
| Souter et al. [37] | Java | (1) Detect type-infeasible call chains due to method calls with polymorphic typed variables; (2) Static analysis |
| Zhang et al. [61] | C | (1) Use symbolic execution; (2) Solve path constraints using a constraint solver named BoNuS |
| Ngo et al. [35] | Java | (1) Use code patterns such as identical/complement-decision, mutually-exclusive-decision, check-then-do, and looping-by-flag; (2) Static analysis |
| Bueno et al. [63] | C | (1) Use a genetic algorithm to find test data for an intended path; (2) Path infeasibility is determined based on lack of search progress; (3) Dynamic analysis |
| Ngo et al. [36] | Java | (1) Consider some relationships between the correlated statements and path infeasibility; (2) Exclude test data generation for infeasible paths; (3) Dynamic analysis |
| Gong et al. [38] | C | (1) Combine static and dynamic analysis; (2) Use sample data to estimate the conditional probability of a branch's outcome for infeasibility detection |

Based on conditional probability values, the branch correlations are determined and hence, infeasible paths are detected.
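The idea of estimating branch correlation from sample executions can be illustrated with a small sketch. It is not the procedure of Gong et al. [38]; it only shows a relative-frequency (maximum likelihood) estimate of the conditional probability of one branch outcome given another. The function name `estimate_conditional`, the branch names `b1` and `b2`, and the sample data are hypothetical.

```python
from collections import Counter

def estimate_conditional(samples, first, second):
    """Relative-frequency estimate of P(second = o2 | first = o1)
    for every outcome pair (o1, o2) observed in the samples."""
    joint = Counter((s[first], s[second]) for s in samples)
    marginal = Counter(s[first] for s in samples)
    return {pair: count / marginal[pair[0]] for pair, count in joint.items()}

# Hypothetical execution traces: outcomes of branches b1 and b2 per test run.
samples = [
    {"b1": True, "b2": True},
    {"b1": True, "b2": True},
    {"b1": False, "b2": True},
    {"b1": False, "b2": False},
]
probs = estimate_conditional(samples, "b1", "b2")

# An outcome pair that was never observed has an estimated probability of 0,
# which marks paths through that combination as candidates for infeasibility.
likely_infeasible = [
    (o1, o2)
    for o1 in (True, False)
    for o2 in (True, False)
    if probs.get((o1, o2), 0.0) == 0.0
]
```

With these samples, `b2` was never false when `b1` was true, so the pair `(True, False)` is flagged. With so few samples such a flag is only a hint, which is why a real approach would combine it with static analysis.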

Summary of observations

We summarize the existing detection techniques in Table 2.3. From Table 2.3, we observe that a few techniques [37, 35, 36] have reported infeasibility patterns for Java programs. These infeasibility patterns take effect only when there is a message path calling the method that contains them. With this consideration, infeasibility of message paths can be of two types: 1) the message path calls a method containing some infeasibility patterns, and 2) the message path is itself infeasible. However, the existing techniques have considered only the first type of infeasibility of message paths. To the best of our knowledge, research investigations on the second type of infeasibility have not been reported in the literature.

Further, we may note that all existing infeasible path detection techniques are code-based. It would be interesting to investigate infeasibility in UML sequence diagrams, which are known for capturing message paths of object-oriented systems. Indeed, UML-based infeasibility detection has the following advantages over code-based infeasibility detection. First, with UML-based infeasibility detection, we need to process a smaller amount of information instead of voluminous code. Second, we can identify which test scenarios are infeasible by analyzing UML design models at the design stage of the software development life cycle. This helps software developers to determine test effort early and hence enables them to devise an effective test plan (in a model-driven software development environment) even before coding starts.

2.4 Conclusion

In this chapter, we have reviewed the existing work related to our research objectives. Based on our review, we infer that method scope information of messages must be considered for generating code of multiple class methods, for determining effective coverage of MM paths (an MM path is an execution sequence of model elements from the start to the end of a method scope), and for detecting infeasible paths. In the next chapter, we report our investigation on how method scope information can be captured in a graph model of a sequence diagram.


Chapter 3

Construction of SIG

Given an XMI representation of a UML 2.x sequence diagram, we propose a graphical representation of the same. We call this graph model the Sequence Integration Graph and abbreviate it as SIG. In this chapter, we discuss the construction of SIG from the XMI of a UML sequence diagram. Our approach consists of the extraction of meta-information from the XMI of a sequence diagram and a set of mapping rules to store this information in the SIG.

3.1 Some definitions

In our approach, we refer to a few definitions, which are discussed in the following. We also illustrate them with reference to an abstract sequence diagram shown in Fig. 3.1(a).

Definition 3.1 (M-R pair). A message pair (m, r) is called an M-R pair if r is the reply message corresponding to the message m. The reply message implies the return of control along with data values (if any) from the receiver object to the sender object.

The example sequence diagram (in Fig. 3.1(a)) has three M-R pairs (m1, r1), (m2, r2), (m5, r5) as r1, r2, r5 are reply messages for the messages m1, m2, m5, respectively.

Definition 3.2 (Interaction). An interaction I refers to a message or a fragment [64]. In general, an interaction can be represented as I = {m1, m2, ···, mn | n ≥ 0}, where m1, m2, ···, mn are some messages of a sequence diagram. If n = 0, then I refers to a null interaction implying no occurrence of a message, and if n = 1, then I refers to a message. On the other hand, if n > 1, then I refers to a fragment where n denotes the number of messages in I.

Figure 3.1: An example sequence diagram and corresponding SIG. (a) A sequence diagram with lifelines a : Display Interface, b : Message Controller, c : AddressBook, and d : Message. (b) SIG, with control nodes, message nodes, control edges, and scope edges.

In the case of our example sequence diagram (Fig. 3.1(a)), we find the set of interactions as MI = {m1, m2, alt1, r2, m5, opt1, m7, r5, r1}. The two fragments alt1 and opt1, which are themselves interactions, contain the interactions m3, m4 and m6, respectively.

Definition 3.3 (Precedence relation). For any two interactions I1 and I2 in the set of interactions MI of a sequence diagram, we say there is a precedence relation between I1 and I2, denoted as I1 ≺ I2, if I1 occurs immediately before I2. It implies that if there exists a precedence relation between I1 and I2 in MI, then there would be no I3 ∈ MI such that I1 ≺ I3 and I3 ≺ I2. This relation satisfies the following properties.


a. If I1 ≺ I2, then I2 ⊀ I1, where I1, I2 ∈ MI [Asymmetric].

b. If I1 ≺ I2 and I2 ≺ I3, then I1 ⊀ I3, where I1, I2, I3 ∈ MI [Non-transitive].

c. I1 ⊀ I1 [Non-reflexive].

In Fig. 3.1(a), we can see that the message m2 occurs immediately before the fragment alt1, implying a precedence relation between them, that is, m2 ≺ alt1.

Applying the precedence relation among the interactions in MI = {m1, m2, alt1, r2, m5, opt1, m7, r5, r1}, we find the set of precedence relations as P(MI) = {m1 ≺ m2, m2 ≺ alt1, alt1 ≺ r2, r2 ≺ m5, m5 ≺ opt1, opt1 ≺ m7, m7 ≺ r5, r5 ≺ r1} ∪ P(Malt1) ∪ P(Mopt1), where P(Malt1) and P(Mopt1) represent the two sets of precedence relations among the interactions of alt1 and opt1.

Definition 3.4 (Null precedence relation). A precedence relation between an interaction I and a null interaction is called a null precedence relation. If a null interaction Λ occurs before I, then we say there exists a left null precedence relation, denoted as Λ ≺ I. On the other hand, when a null interaction Λ occurs after I, then there exists a right null precedence relation, denoted as I ≺ Λ.

The concept of null precedence relation arises when we compute the set of precedence relations among the interactions of a fragment. Let us consider the fragment alt1, for which we do not find a precedence relation between the two interactions m3 and m4 in it (see Fig. 3.1(a)). To ensure the presence of m3 and m4 in some precedence relation, we assume the occurrence of a null interaction before and after each of m3 and m4. Applying the null precedence relation, we find P(Malt1) as {Λ ≺ m3, m3 ≺ Λ, Λ ≺ m4, m4 ≺ Λ}. Similarly, we obtain P(Mopt1) as {Λ ≺ m6, m6 ≺ Λ}.
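The precedence sets of Fig. 3.1(a) can be reproduced with a short sketch. It assumes that interactions are given as ordered lists and that each fragment is given as a list of operands; bracketing every operand with the null interaction is an assumption that agrees with the single-message operands of alt1 and opt1 discussed above. The function names are hypothetical.

```python
NULL = "Λ"  # the null interaction

def precedence_relations(interactions):
    """Immediate-precedence pairs (a, b), meaning a ≺ b, for interactions
    listed in their order of occurrence."""
    return list(zip(interactions, interactions[1:]))

def fragment_relations(operands):
    """Precedence pairs inside a fragment: each operand is bracketed by the
    null interaction, giving left and right null precedence relations."""
    relations = []
    for operand in operands:
        relations += precedence_relations([NULL, *operand, NULL])
    return relations

# Top-level interactions of Fig. 3.1(a) and the operands of its two fragments.
top = ["m1", "m2", "alt1", "r2", "m5", "opt1", "m7", "r5", "r1"]
p_top = precedence_relations(top)
p_alt1 = fragment_relations([["m3"], ["m4"]])
p_opt1 = fragment_relations([["m6"]])
```

Here `p_top` holds the eight top-level relations, while `p_alt1` and `p_opt1` hold the null precedence relations of the two fragments.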

3.2 Sequence Integration Graph (SIG)

We precisely define our proposed Sequence Integration Graph (SIG) in the following.

Definition 3.5 (Sequence Integration Graph). Let us denote the Sequence Integration Graph as SIG. The Sequence Integration Graph for a sequence diagram is a directed graph SIG = <V, E>, where V is a set of nodes and E is a set of edges. The SIG has two types of nodes: control and message. A control node corresponds to the start or end of a fragment, whereas a message node represents a message. On the other hand, edges of an SIG are of two types: control and scope. A control edge represents control flow and a scope edge represents a change of method scope (scope of local variables, message sending within a method) between two interactions.

The two types of nodes and edges in SIG are formally defined in the following.

Control node: A control node of a fragment is denoted as a tuple < T, B, ID >, where

i) T is the fragment type,

ii) B is the start or end of the fragment, and

iii) ID is the identification number of the fragment.

Message node: A message node is denoted as a tuple <SO, SC, RO, RC, M, PR, RVar>, where

i) SO is the name of sender object,

ii) SC is the name of sender class,

iii) RO is the name of receiver object,

iv) RC is the name of receiver class,

v) M is the name of corresponding method,

vi) PR is the set of parameters of M, and

vii) RVar is the return value of M.

Control edge: Between any two nodes (V 1, V 2) in SIG, there exists a control edge if their corresponding model elements (m1, m2) are such that m2 occurs immediately after m1. Note that a model element can be a message, start or end of a fragment.

Scope edge: A scope edge is a special type of control edge between two model elements which belong to different methods.


Example 3.1. Figure 3.1(b) shows the Sequence Integration Graph (SIG) for the sequence diagram shown in Fig. 3.1(a). In Fig. 3.1(b), it is evident that the ten message nodes m1, m2, m3, m4, r2, m5, m6, m7, r5, r1 of the SIG correspond to the ten messages (with the same labels), whereas the two pairs of control nodes (C^S_alt1, C^E_alt1) and (C^S_opt1, C^E_opt1) represent the starts and ends of the two fragments alt1 and opt1. Apart from this, the SIG has five scope edges, namely (m1, m2), (m2, C^S_alt1), (r2, m5), (m5, C^S_opt1), (r5, r1), each of which indicates a change in method scope between two consecutive interactions.
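One possible in-memory form of Definition 3.5 is sketched below. The class names, field names, and the `add_edge` helper are assumptions made for illustration, and the sender and receiver values of the two sample message nodes are placeholders rather than values read from Fig. 3.1(a).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass(frozen=True)
class ControlNode:
    fragment_type: str  # T: fragment type, e.g. "alt" or "opt"
    boundary: str       # B: "start" or "end" of the fragment
    fragment_id: str    # ID: identification of the fragment

@dataclass(frozen=True)
class MessageNode:
    sender_object: str                 # SO
    sender_class: str                  # SC
    receiver_object: str               # RO
    receiver_class: str                # RC
    method: Optional[str]              # M
    parameters: Tuple[str, ...] = ()   # PR
    return_var: Optional[str] = None   # RVar

Node = Union[ControlNode, MessageNode]

@dataclass
class SIG:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Tuple[Node, Node, str]] = field(default_factory=list)

    def add_edge(self, src: Node, dst: Node, scope: bool = False) -> None:
        """Add a control edge; a scope edge is a control edge marked 'scope'."""
        for node in (src, dst):
            if node not in self.nodes:
                self.nodes.append(node)
        self.edges.append((src, dst, "scope" if scope else "control"))

    def scope_edges(self) -> List[Tuple[Node, Node]]:
        return [(s, d) for s, d, kind in self.edges if kind == "scope"]

# Placeholder nodes for m1, m2 and the start of alt1.
m1 = MessageNode("x", "ClassX", "y", "ClassY", "m1")
m2 = MessageNode("y", "ClassY", "z", "ClassZ", "m2")
alt1_start = ControlNode("alt", "start", "alt1")

sig = SIG()
sig.add_edge(m1, m2, scope=True)          # scope edge (m1, m2)
sig.add_edge(m2, alt1_start, scope=True)  # scope edge (m2, C^S_alt1)
```

Storing the edge kind as a label keeps the property that every scope edge is also a control edge, as stated in the definition.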

3.3 Issues with construction of SIG

Some mapping rules are known [29, 65, 21, 20, 22] for constructing a control flow graph from a sequence diagram. In this section, we discuss why the existing mapping rules cannot be applied directly to the XMI of a sequence diagram while constructing a control flow graph (we may note that SIG subsumes the control flow graph).

Let us consider an example sequence diagram (see Fig. 3.2(a)) and its corresponding XMI representation as exported from the MagicDraw 16.0 tool [66] (Fig. 3.3). In Fig. 3.2(a), we see that the sequence diagram has five messages, namely m1, m2, m3, m4, m5, and two fragments opt1 and alt1. We refer to tagged elements in the XMI by the line numbers that appear in Fig. 3.3.

The mapping rules that have been used in the existing work [29, 65, 21, 20, 22] are in essence (1) map a model element (i.e. a message or start of a fragment or end of a fragment) into a node, and (2) add an edge between two nodes if their corresponding model elements (m1, m2) are such that m2 occurs immediately after m1. Prior to applying these mapping rules, it is necessary to have the detailed information of messages and fragments.

Example 3.2. Let us see how the information of the message m1 is stored in multiple tagged elements of XMI. For this, we identify the tagged element (line 4) specifying the name of the method as 'm1'. Invocation of this method corresponds to the call event at line 66. On the other hand, the send and receive events (lines 24, 25) of m1 occur at the lifelines (lines 20, 21) associated with ObjectA and ObjectB (lines 16, 17), which are of the types ClassA and ClassB (lines 2, 3), respectively. In essence, ObjectA of ClassA sends the message m1 to ObjectB of ClassB and this information has to be captured for the message node m1 in the control flow graph. This implies that the node m1 corresponds to the tagged elements at lines 4, 66, 25, 24, 62, 20, 16, 2, 21, 17, 3.

Similar to a message, a fragment also corresponds to multiple tagged elements in XMI. For example, the information of the fragment alt1 is captured in the tagged elements from line 33 to line 50, where the elements from 34 to 41 are for the first operand and the elements from 42 to 49 are for the second operand. In this regard, we may note that the node C^S_alt1 (representing the start of alt1) in the control flow graph corresponds to the elements at 34, 42, whereas the node C^E_alt1 (representing the end of alt1) corresponds to the elements at 41, 49.

Figure 3.2: An example sequence diagram and its control flow graph. (a) Sequence diagram with lifelines ObjectA: ClassA, ObjectB: ClassB, ObjectC: ClassC, ObjectD: ClassD, messages m1 to m5, and fragments opt1 (guard C1) and alt1 (guards C2 and !C2). (b) Control flow graph for Fig. 3.2(a).

Figure 3.3: XMI representation of Fig. 3.2(a).

Table 3.1 shows the association of each node in the control flow graph with a group of tagged elements in XMI. From Table 3.1, we can observe that the information of each node is not only associated with multiple tagged elements, but is also scattered in different places of the XMI. This makes it difficult to apply the mapping rules directly to XMI. The situation becomes more complex when we apply the mapping rules for the edges of nested fragments. This is because the tagged elements for a fragment are intermixed with the elements of other fragments, and a fragment operand may contain nested fragments of arbitrary nesting depth. To address these problems, we devise a suitable technique to construct the graph model from the XMI of a sequence diagram.

Table 3.1: Association of nodes in Fig. 3.2(b) with tagged elements in XMI of Fig. 3.3

| Nodes in Fig. 3.2(b) | Tagged elements in Fig. 3.3 |
|---|---|
| m1 | 4, 66, 25, 24, 62, 20, 16, 2, 21, 17, 3 |
| C^S_opt1 | 28, 29 |
| m2 | 5, 67, 53, 52, 54, 21, 17, 3 |
| C^S_alt1 | 33, 34, 42 |
| m3 | 8, 68, 59, 58, 39, 21, 17, 3, 22, 18, 7 |
| m4 | 11, 69, 61, 60, 47, 21, 17, 3, 23, 19, 10 |
| C^E_alt1 | 41, 49, 50 |
| C^E_opt1 | 56, 57 |
| m5 | 63, 26, 27, 70, 21, 17, 3, 20, 16, 2 |

3.4 Our approach

Our conversion approach consists of seven steps, shown in the form of an activity diagram (see Fig. 3.4). Given the XMI representation of a sequence diagram as input, we parse it to extract meta-information and find the sets of messages and fragments. This helps us to find node information, to identify M-R pairs, and to determine the fragment structure. This information is further processed to find edges and their labels. We now discuss all these steps in detail.

Figure 3.4: Activity diagram for construction of SIG from a sequence diagram. Starting from the XMI, the activities are: identify fragments and their message sets; determine nodes; identify M-R pairs and their message sets; determine fragment structure; determine edges; identify edge labels; identify scope edges. The output is the SIG.


To illustrate our approach, we use the example of the Issue Book use case of a Library Information System [67]. In Issue Book, the user enters his/her card number and the accession number of the book. Depending on the availability of the book, the user's credit status, and the existence of the account, different scenarios may occur, which are modeled in the sequence diagram shown in Fig. 3.5. Note that only the synchronous messages of sequence diagrams have been considered in our study. For simplicity, we refer to all messages by their message numbers and to fragments by their fragment labels (see Fig. 3.5).

Step 1: Identifying fragments and their message sets

We parse the XMI of a given sequence diagram using a standard SAX (Simple API for XML) parser [68]. The SAX parser notifies the application about parsing events (opening tags, closing tags, content of elements etc.) through event handlers (startElement(), endElement()). During invocation of the event handlers, the attributes of the tagged elements are passed as parameters, which we subsequently process in the following tasks.

A) Storing values from tagged elements

B) Synthesizing information of messages

C) Finding set of messages and fragments

D) Finding set of messages in fragments

All the tasks mentioned above are discussed in detail as follows.

A) Storing values from tagged elements: Depending on the types of tagged elements parsed by the SAX parser, we store their attribute values in meta objects of types EMessage, EMessageEvent, ECallEvent, EOperation, EFragment, EOperand, EObject, EClass, ELifeline as shown in Table 3.2. In some cases, object data are obtained from other objects or set with specific values. For example, "type" of an EMessageEvent object corresponding to a send event is set as "sendEvent", whereas for a receive event, it is set as "receiveEvent". If an operand has an inner fragment, "MessageList" of the EOperand object (corresponding to the operand) would contain the Id of the fragment along with the other messages in the same sequence as they appear in the operand. Finally, we obtain a set of meta objects for the individual types (also see Table 3.2).


Figure 3.5: Sequence diagram of Issue Book use case (interaction IssueSD). Lifelines: IBB : Issue Book Boundary, IBC : Issue Book Controller, BR : BookRegister, BookList[i] : Book, aBook : Book, UR : UserRegister, UserList[i] : User, aUser : User, IR : IssueRegister. The messages are numbered 1 to 21, and the fragments are labeled loop-1, loop-2, and alt-1 to alt-6.


Table 3.2: Types of meta objects for storing attribute values of elements in XMI.

| XMI tagged element | Value of attribute "xmi:type" | Class |
|---|---|---|
| <ownedOperation> | --- | EOperation |
| <packagedElement> | "uml:CallEvent" | ECallEvent |
| <message> | --- | EMessage |
| <fragment> | "uml:MessageOccurrenceSpecification" | EMessageEvent |
| <lifeline> | --- | ELifeline |
| <fragment> | "uml:CombinedFragment" | EFragment |
| <operand> | --- | EOperand |
| <packagedElement> | "uml:Class" | EClass |
| <ownedAttribute> | --- | EObject |
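The storing task can be sketched with Python's standard `xml.sax` module, which offers the same event-handler style as the SAX parser described above. The mapping follows Table 3.2; the handler class, the dictionary layout, and the tiny XMI sample (including attribute names such as `operation`, `represents`, and `covered`) are illustrative assumptions and not the export format of any particular tool.

```python
import xml.sax

# (tag, xmi:type) -> meta-object class, after Table 3.2.
# A type of None means that the tag alone decides.
META_CLASS = {
    ("ownedOperation", None): "EOperation",
    ("packagedElement", "uml:CallEvent"): "ECallEvent",
    ("message", None): "EMessage",
    ("fragment", "uml:MessageOccurrenceSpecification"): "EMessageEvent",
    ("lifeline", None): "ELifeline",
    ("fragment", "uml:CombinedFragment"): "EFragment",
    ("operand", None): "EOperand",
    ("packagedElement", "uml:Class"): "EClass",
    ("ownedAttribute", None): "EObject",
}

class MetaObjectHandler(xml.sax.ContentHandler):
    """Collects the attribute values of relevant tagged elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # meta-class name -> list of attribute dictionaries

    def startElement(self, name, attrs):
        xmi_type = attrs.get("xmi:type")
        cls = META_CLASS.get((name, xmi_type)) or META_CLASS.get((name, None))
        if cls is not None:
            self.meta.setdefault(cls, []).append(dict(attrs.items()))

# Hypothetical, heavily reduced XMI fragment.
SAMPLE = b"""<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
                      xmlns:uml="http://schema.omg.org/spec/UML/2.0">
  <packagedElement xmi:type="uml:Class" xmi:id="c1" name="ClassA">
    <ownedOperation xmi:id="op1" name="m1"/>
  </packagedElement>
  <packagedElement xmi:type="uml:CallEvent" xmi:id="ce1" operation="op1"/>
  <lifeline xmi:id="l1" represents="o1"/>
  <fragment xmi:type="uml:MessageOccurrenceSpecification" xmi:id="e1" covered="l1"/>
  <message xmi:id="msg1" sendEvent="e1" receiveEvent="e2"/>
</xmi:XMI>"""

handler = MetaObjectHandler()
xml.sax.parseString(SAMPLE, handler)
```

After parsing, `handler.meta` groups the attribute dictionaries by meta-object type, which is the raw material for the synthesis task that follows.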

B) Synthesizing information of messages: To find the details of a message such as sender and receiver objects and their classes, parameters, and return variable (if any), we find relationships among meta-classes based on the inherent structure of XMI and the meta-model of sequence diagrams [64]. Figure 3.6 shows the relationships among the classes along with their attributes, which are self-explanatory. From Fig. 3.6, we observe that one message has two events, namely send and receive, where the receive event corresponds to a call event for an operation (except for a reply message). Each event occurs at a lifeline associated with an object of some class type.

Considering the class relationships described above, for a message object aMessage (of type EMessage) we identify two message event objects sMEvent and rMEvent (of type EMessageEvent), one call event object aCallEvent (of type ECallEvent), one operation object aOperation (of type EOperation), two lifeline objects sLifeline and rLifeline (of type ELifeline), two objects sObject and rObject (of type EObject), and two objects sClass and rClass (of type EClass) such that they satisfy the following conditions.

(a) sMEvent.type = "sendEvent" and sMEvent.Id = aMessage.SendEventId

(b) rMEvent.type = "receiveEvent" and rMEvent.Id = aMessage.ReceiveEventId


Figure 3.6: Class diagram for meta-data of sequence diagram in XMI. The classes and their attributes are:

ECallEvent: Id, OperationID, SeqNo
EMessageEvent: Id, Type, LifelineId, CallEventId, EMessageId
ELifeline: Id, ObjectId
EObject: Id, ClassId, Name
EClass: Id, Name
EOperation: Id, Name, ParameterList
EFragment: Id, Type, OperandIDList
EOperand: Id, Guard, MessageIDList
EMessage: Id, SendEventId, ReceiveEventId, messageType, ArgumentValueList, ParameterList, MethodName, OperationID, SenderObject, ReceiverObject, SenderClass, ReceiverClass, SeqNumber, ReturnVar

(c) (aCallEvent.Id = rMEvent.CallEventId and aOperation.Id = aCallEvent.OperationId) or (aCallEvent.Id = sMEvent.CallEventId and aMessage.messageType = "reply")

(d) sLifeline.Id = sMEvent.LifelineId and rLifeline.Id = rMEvent.LifelineId

(e) sObject.Id = sLifeline.ObjectId and rObject.Id = rLifeline.ObjectId

(f) sClass.Id = sObject.ClassId and rClass.Id = rObject.ClassId.

Once the meta-objects sObject, rObject, sClass, rClass, aCallEvent, and aOperation are identified for the aMessage object using the above mentioned conditions (a)-(f), we determine the individual member values of the aMessage object as follows.


aMessage.MethodName = aOperation.Name (if aMessage.messageType ≠ "reply")

aMessage.SenderObject = sObject.Name

aMessage.ReceiverObject = rObject.Name

aMessage.SenderClass = sClass.Name

aMessage.ReceiverClass = rClass.Name

aMessage.SeqNumber = aCallEvent.SeqNo

aMessage.ParameterList = aOperation.ParameterList
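Conditions (a)-(f) and the member assignments above amount to a chain of lookups by Id. The sketch below assumes that the meta objects are kept as plain dictionaries indexed by Id within a dictionary `db` keyed by meta-class name; this layout, the function name `synthesize`, and the sample values are hypothetical.

```python
def synthesize(msg, db):
    """Return a copy of the EMessage-like dict `msg` with its derived members
    filled in. `db` maps a meta-class name to a dict of objects keyed by Id."""
    s_ev = db["EMessageEvent"][msg["SendEventId"]]     # condition (a)
    r_ev = db["EMessageEvent"][msg["ReceiveEventId"]]  # condition (b)
    is_reply = msg["messageType"] == "reply"
    # Condition (c): a reply takes the call event of its send event,
    # any other message takes the call event of its receive event.
    call = db["ECallEvent"][(s_ev if is_reply else r_ev)["CallEventId"]]
    # Conditions (d) and (e): event -> lifeline -> object.
    s_obj = db["EObject"][db["ELifeline"][s_ev["LifelineId"]]["ObjectId"]]
    r_obj = db["EObject"][db["ELifeline"][r_ev["LifelineId"]]["ObjectId"]]
    out = dict(msg)
    out.update(
        SenderObject=s_obj["Name"],
        ReceiverObject=r_obj["Name"],
        SenderClass=db["EClass"][s_obj["ClassId"]]["Name"],    # condition (f)
        ReceiverClass=db["EClass"][r_obj["ClassId"]]["Name"],  # condition (f)
        SeqNumber=call["SeqNo"],
    )
    if not is_reply:
        op = db["EOperation"][call["OperationId"]]
        out.update(MethodName=op["Name"], ParameterList=op["ParameterList"])
    return out

# Hypothetical meta objects for a message m1 from ObjectA to ObjectB.
db = {
    "EClass": {"c1": {"Name": "ClassA"}, "c2": {"Name": "ClassB"}},
    "EObject": {"o1": {"Name": "ObjectA", "ClassId": "c1"},
                "o2": {"Name": "ObjectB", "ClassId": "c2"}},
    "ELifeline": {"l1": {"ObjectId": "o1"}, "l2": {"ObjectId": "o2"}},
    "EOperation": {"op1": {"Name": "m1", "ParameterList": []}},
    "ECallEvent": {"ce1": {"OperationId": "op1", "SeqNo": 1}},
    "EMessageEvent": {
        "e1": {"Type": "sendEvent", "LifelineId": "l1", "CallEventId": None},
        "e2": {"Type": "receiveEvent", "LifelineId": "l2", "CallEventId": "ce1"},
    },
}
m1 = synthesize({"Id": "msg1", "SendEventId": "e1", "ReceiveEventId": "e2",
                 "messageType": "synchCall"}, db)
```

Indexing each meta-object collection by Id makes every condition a constant-time lookup, so the scattered XMI elements are joined in a single pass per message.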

Note that return variable of a message m is specified in XMI as a reply message immediately following m. To find return variable for aMessage, we identify another meta-object bMessage (of type EMessage) satisfying the following conditions.

(g) bMessage.SenderObject = aMessage.ReceiverObject

(h) bMessage.ReceiverObject = aMessage.SenderObject

(i) bCallEvent.SeqNo − aCallEvent.SeqNo = 1

(j) bMessage.messageType = "reply"

(k) aMessage.messageType ≠ "reply".

If the bMessage object is found, then we set ReturnVar of the aMessage object as follows.

aMessage.ReturnVar = bMessage.ArgumentValueList[0]

After this update of aMessage.ReturnVar, bMessage becomes redundant and hence is removed from the list of EMessage objects.
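Conditions (g)-(k) can be read as a small folding pass over the synthesized messages. The sketch below assumes that each message is a dictionary carrying the members named in the text and that SeqNumber stands in for the sequence number of the message's call event; the function name and sample data are hypothetical.

```python
def attach_return_vars(messages):
    """Fold each immediately following reply into the ReturnVar of its call
    and drop the absorbed reply. The message dicts are updated in place."""
    by_seq = {m["SeqNumber"]: m for m in messages}
    absorbed = set()
    for a in messages:
        if a["messageType"] == "reply":                          # (k)
            continue
        b = by_seq.get(a["SeqNumber"] + 1)                       # (i)
        if (b is not None
                and b["messageType"] == "reply"                  # (j)
                and b["SenderObject"] == a["ReceiverObject"]     # (g)
                and b["ReceiverObject"] == a["SenderObject"]):   # (h)
            args = b.get("ArgumentValueList") or [None]
            a["ReturnVar"] = args[0]
            absorbed.add(b["SeqNumber"])
    return [m for m in messages if m["SeqNumber"] not in absorbed]

# Hypothetical pair: a Match call and the reply that carries 'found'.
msgs = [
    {"SeqNumber": 1, "messageType": "synchCall", "SenderObject": "BR",
     "ReceiverObject": "BookList[i]", "ArgumentValueList": ["BookId"]},
    {"SeqNumber": 2, "messageType": "reply", "SenderObject": "BookList[i]",
     "ReceiverObject": "BR", "ArgumentValueList": ["found"]},
]
result = attach_return_vars(msgs)
```

A reply that does not immediately follow its call, such as a reply issued from inside a fragment, fails condition (i) and therefore stays in the list as a message of its own.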

C) Finding set of messages and fragments: We obtain the set of messages ($M_{seq}$) and the set of reply messages ($R_{seq}$) after processing the EMessage objects, and the set of fragments ($F_{seq}$) after processing the EFragment objects, as follows.

$$M_{seq} = \{\, aMessage.Id \mid aMessage \text{ is an EMessage object} \,\}$$

$$R_{seq} = \{\, aMessage.Id \mid aMessage.Id \in M_{seq} \ \&\ aMessage.messageType = \text{``reply''} \,\}$$

$$F_{seq} = \{\, aFragment.Id \mid aFragment \text{ is an EFragment object} \,\}$$

D) Finding set of messages in fragments: For a fragment, say f, we identify the corresponding EFragment object (i.e. aFragment) and its associated EOperand objects ($S^{f}_{EOprds}$) as follows.

$$S^{f}_{EOprds} = \{\, aOperand \mid aOperand \text{ is an EOperand object} \ \&\ aFragment.OperandList \text{ contains } aOperand.Id \ \&\ f \rightarrow aFragment \,\}$$

We then find the set of EMessage objects for the fragment f as follows.

$$S^{f}_{EMsgs} = \{\, aMessage \mid aMessage \text{ is an EMessage object} \ \&\ aOperand.MessageList \text{ contains } aMessage.Id \ \&\ aOperand \in S^{f}_{EOprds} \,\}$$

In case f has a nested fragment, $S^{f}_{EMsgs}$ would contain an object $aFragment'$ corresponding to some other fragment $f'$, for which we also need to find $S^{f'}_{EMsgs}$. Subsequently, we update $S^{f}_{EMsgs}$ as

$$S^{f}_{EMsgs} = S^{f}_{EMsgs} \cup S^{f'}_{EMsgs} - \{\, aFragment' \,\}$$

Once the update of $S^{f}_{EMsgs}$ is complete, we obtain the message set ($M_f$) for the fragment f as follows.

$$M_f = \{\, aMessage.Id \mid aMessage \in S^{f}_{EMsgs} \,\}$$


Example 3.3. Applying the above mentioned Step 1 on the sequence diagram (in Fig. 3.5), we obtain M = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21}, R = {4, 5, 12, 13, 21}, and F = {loop1, alt1, alt2, alt3, loop2, alt4, alt5, alt6}. We also find the sets of messages for the fragments as Mloop1 = {3}, Malt1 = {4, 5}, Malt2 = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, Malt3 = {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, Mloop2 = {11}, Malt4 = {12, 13}, Malt5 = {14, 15, 16, 17, 18, 19, 20}, Malt6 = {16, 17, 18, 19, 20}.
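The message sets of Example 3.3 can be reproduced by expanding inner fragments recursively, which is what the repeated update of the message set in task D achieves. In the sketch below, the dictionary `FRAGMENTS` is a hand-written reading of Fig. 3.5 (each fragment maps to its operands, and each operand lists message numbers and inner fragment Ids in order); this encoding and the function name are assumptions.

```python
# Operand contents per fragment, as read from Fig. 3.5.
FRAGMENTS = {
    "loop1": [[3]],
    "alt1": [[4], [5]],
    "alt2": [[6], [7, "alt3"]],
    "alt3": [[8], [9], [10, "loop2", "alt4", "alt5"]],
    "loop2": [[11]],
    "alt4": [[12], [13]],
    "alt5": [[14], [15, "alt6"]],
    "alt6": [[16], [17, 18, 19, 20]],
}

def message_set(fragment_id, fragments):
    """Message set M_f of a fragment: the messages of all its operands, with
    every inner fragment Id replaced by that fragment's own message set."""
    result = set()
    for operand in fragments[fragment_id]:
        for item in operand:
            if item in fragments:   # inner fragment: expand it recursively
                result |= message_set(item, fragments)
            else:                   # plain message number
                result.add(item)
    return result

M_alt2 = message_set("alt2", FRAGMENTS)
M_alt3 = message_set("alt3", FRAGMENTS)
M_alt5 = message_set("alt5", FRAGMENTS)
```

The recursion handles arbitrary nesting depth; for example, alt6 is reached from alt2 through alt3 and alt5.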

Step 2: Determining nodes in SIG

Initially, our graph model SIG is empty. That is, the set of nodes V and the set of edges E are both empty. For each fragment f ∈ F, we add two control nodes C^S_f and C^E_f into V of SIG, where C^S_f and C^E_f represent the start and end of the fragment f, respectively. For each message m ∈ M, we add a message node Vm into V and store the details of the message as a tuple <SO, RO, RC, MT, PR, RVar> in a table called the Message Node Table, where SO = 'sender object', RO = 'receiver object', RC = 'receiver class', MT = 'method', PR = 'set of parameters', RVar = 'return variable'. Note that for a message node corresponding to a reply message, PR would mean the return value, if any, and both MT and RVar would be null.

Example 3.4. Considering the message set (M) and fragment set (F) of the sequence diagram in Fig. 3.5, we find the set of nodes as shown in Fig. 3.7, where the circular and square nodes represent the control and message nodes, respectively. The Message Node Table obtained for the sequence diagram is depicted in Table 3.3.

Figure 3.7: SIG of Issue Book sequence diagram.

Table 3.3: Message Node Table

| Node ID | SO | RO | RC | MT | PR | RVar |
|---|---|---|---|---|---|---|
| V1 | IBB | IBC | IssueBookController | IssueBook | UserID, BookID | - |
| V2 | IBC | BR | BookRegister | FindBook | BookID | aBook |
| V3 | BR | BookList[i] | Book | Match | BookID | found |
| V4 | BR | IBC | IssueBookController | - | BookList[i] | - |
| V5 | BR | IBC | IssueBookController | - | null | - |
| V6 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Book not available" | - |
| V7 | IBC | aBook | Book | GetStatus | - | Status |
| V8 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Book is already issued" | - |
| V9 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Book is already reserved" | - |
| V10 | IBC | UR | UserRegister | FindUser | UserID | aUser |
| V11 | UR | UserList[i] | User | Match | UserID | found |
| V12 | UR | IBC | IssueBookController | - | UserList[i] | - |
| V13 | UR | IBC | IssueBookController | - | null | - |
| V14 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Not a Valid User" | - |
| V15 | IBC | aUser | User | GetStatus | - | CreditStatus |
| V16 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Not credit worthy" | - |
| V17 | IBC | aUser | User | UpdateStatus | - | - |
| V18 | IBC | aBook | Book | SetStatus | status = "Issued" | - |
| V19 | IBC | IR | IssueRegister | AddRecord | UserID, BookID | - |
| V20 | IBC | IBB | IssueBookBoundary | DisplayMessage | Msg = "Book is being issued" | - |
| V21 | IBC | IBB | IssueBookBoundary | - | - | - |

Step 3: Identifying M-R pairs and their message sets

In this step, we determine the set of M-R pairs and the set of messages for each M-R pair, and then eliminate the redundant M-R pairs, if any. We state this step in the following.

a) For each reply message r ∈ R, we find two objects A and B such that r is returned from A to B, and then identify a message m from B to A such that there is no other message between m and r from B to A. Let SM−R be the set of all M-R pairs in a given sequence diagram.

b) For each M-R pair (m, r) in SM−R, we find its message set M(m,r) as the set of messages that occur between m and r, both inclusive. For this, we consider the sequence of messages corresponding to the sequence of tuples as specified in the Message Node Table.

c) We discard an M-R pair (m, r) if its message set is a proper subset of the message set of another M-R pair with the same message m. This may happen when the same message m is M-R paired with multiple reply messages. We update SM−R after excluding the redundant M-R pairs.

Example 3.5. To determine the M-R pairs for the sequence diagram shown in Fig. 3.5, we observe that the message 2 occurs from IBC to BR and subsequently, there is a reply message 4 from BR to IBC (see Fig. 3.5). Since no other message exists from IBC to BR between 2 and 4, (2,4) is an M-R pair. Similarly, we find the other M-R pairs, namely (2,5), (10,12), (10,13), (1,21) for the reply messages 5, 12, 13, 21, respectively. Thus, the set of M-R pairs is SM−R = {(2,4), (2,5), (10,12), (10,13), (1,21)}. We find the message sets of the M-R pairs as M(2,4) = {2, 3, 4}, M(2,5) = {2, 3, 4, 5}, M(10,12) = {10, 11, 12}, M(10,13) = {10, 11, 12, 13}, and M(1,21) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21}. Note that the message 2 is M-R paired with multiple reply messages 4 and 5, and M(2,4) ⊂ M(2,5). Thus, we discard the pair (2,4). Similarly, the pair (10,12) is also discarded. Therefore, the resultant set of M-R pairs is {(2,5), (10,13), (1,21)}.
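Step 3 can be sketched on the Issue Book example as follows. The list `MESSAGES` takes the sender and receiver of each message from the Message Node Table. The function names are hypothetical, and the pruning compares only pairs that share the same message m, which is the reading that reproduces Example 3.5.

```python
# (number, sender, receiver, is_reply) for messages 1 to 21 of Fig. 3.5.
MESSAGES = [
    (1, "IBB", "IBC", False), (2, "IBC", "BR", False),
    (3, "BR", "BookList[i]", False), (4, "BR", "IBC", True),
    (5, "BR", "IBC", True), (6, "IBC", "IBB", False),
    (7, "IBC", "aBook", False), (8, "IBC", "IBB", False),
    (9, "IBC", "IBB", False), (10, "IBC", "UR", False),
    (11, "UR", "UserList[i]", False), (12, "UR", "IBC", True),
    (13, "UR", "IBC", True), (14, "IBC", "IBB", False),
    (15, "IBC", "aUser", False), (16, "IBC", "IBB", False),
    (17, "IBC", "aUser", False), (18, "IBC", "aBook", False),
    (19, "IBC", "IR", False), (20, "IBC", "IBB", False),
    (21, "IBC", "IBB", True),
]

def mr_pairs(messages):
    """Step a): pair each reply r (from A to B) with the closest earlier
    non-reply message m that goes from B to A."""
    pairs = []
    for i, (r, a, b, is_reply) in enumerate(messages):
        if not is_reply:
            continue
        for m, sender, receiver, m_is_reply in reversed(messages[:i]):
            if not m_is_reply and sender == b and receiver == a:
                pairs.append((m, r))
                break
    return pairs

def pair_message_set(pair, messages):
    """Step b): all messages from m to r, both inclusive."""
    numbers = [n for n, *_ in messages]
    m, r = pair
    return set(numbers[numbers.index(m): numbers.index(r) + 1])

def prune(pairs, messages):
    """Step c): drop a pair whose message set is a proper subset of the
    message set of another pair with the same message m."""
    sets = {p: pair_message_set(p, messages) for p in pairs}
    return [p for p in pairs
            if not any(q[0] == p[0] and sets[p] < sets[q] for q in pairs)]

all_pairs = mr_pairs(MESSAGES)
final_pairs = prune(all_pairs, MESSAGES)
```

Without the restriction to pairs sharing m, the pair (2,5) would also be discarded, because its message set is contained in that of (1,21).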

Step 4: Determining fragment structure

We determine the hierarchy structure of the fragments, specifically the outermost fragments (the fragments that are not contained in another fragment) and the inner fragments contained in each fragment, as follows.

i) We initialize a set $F'$ as the set of all fragments in the sequence diagram. We exclude from $F'$ any fragment whose message set is a subset of the message set of another fragment. The exclusion of fragments continues until no such fragment is found. The resultant set $F'$ is the minimum set of fragments that can replace the largest subset of the set of all messages ($M$), and its members are the outermost fragments built by $M$. To encapsulate this information in the message set ($M$), we replace that largest subset of $M$ with the IDs of the fragments in $F'$.

ii) Following the same procedure, we determine the minimal set of fragments $F''$ that together correspond to the largest subset of the message set of each fragment. This information is encapsulated in the message set by replacing that largest subset with the IDs of the fragments in $F''$.

Example 3.6 Let us see how the fragment structure can be determined for the sequence diagram shown in Fig. 3.5. We observe that the fragments loop1, alt1, alt2, alt3, loop2, alt4, alt5, and alt6 correspond to subsets of the message set $M$ of the sequence diagram. Moreover, $M_{alt3} \subset M_{alt2}$, $M_{loop2} \subset M_{alt2}$, $M_{alt4} \subset M_{alt2}$, $M_{alt5} \subset M_{alt2}$, and $M_{alt6} \subset M_{alt2}$. Therefore, the minimal set of fragments that can replace the largest subset of $M$ is $F' = \{loop1, alt1, alt2\}$, and these fragments are the outermost fragments formed by the message set $M$. Replacing the largest subset of $M$ with the IDs of the fragments in $F'$, we obtain $M = \{1, 2, loop1, alt1, alt2, 21\}$. Following the same procedure, we determine the inner fragments of each fragment, if any, as $M_{alt2} = \{6, 7, alt3\}$, $M_{alt3} = \{8, 9, 10, loop2, alt4, alt5\}$, and $M_{alt5} = \{14, 15, alt6\}$. This implies that alt2 contains an inner fragment alt3; alt3 contains three inner fragments loop2, alt4, and alt5; and alt5 contains an inner fragment alt6. However, we do not find any fragment that can replace a subset of $M_{loop1}$, $M_{alt1}$, $M_{loop2}$, $M_{alt4}$, or $M_{alt6}$, and therefore these message sets remain unchanged.
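Step 4 can be illustrated with a small Python sketch. The flat message sets in `FRAGMENTS` are reconstructed by us from Examples 3.6 and 3.7 (for instance, alt6 covers messages 16 to 20), and the helper names `outermost` and `collapse` are hypothetical. The sketch also assumes that no two fragments have identical message sets.

```python
# Flat message sets of the fragments, reconstructed from Examples 3.6 and 3.7.
FRAGMENTS = {
    "loop1": {3}, "alt1": {4, 5}, "alt2": set(range(6, 21)),
    "alt3": set(range(8, 21)), "loop2": {11}, "alt4": {12, 13},
    "alt5": set(range(14, 21)), "alt6": set(range(16, 21)),
}


def outermost(container, fragments):
    """Fragments strictly inside `container` that are not inside another such fragment."""
    inside = {f: s for f, s in fragments.items() if s < container}
    return [f for f, s in inside.items()
            if not any(s < t for t in inside.values())]


def collapse(container, fragments):
    """Replace the messages covered by the outermost fragments with the fragment IDs."""
    outer = outermost(container, fragments)
    covered = set().union(*(fragments[f] for f in outer))
    # Order fragments by their first message so the result follows message order.
    items = [(min(fragments[f]), f) for f in outer]
    items += [(m, m) for m in container - covered]
    return [x for _, x in sorted(items)]


print(collapse(set(range(1, 22)), FRAGMENTS))   # top-level message set M
print(collapse(FRAGMENTS["alt3"], FRAGMENTS))   # message set of alt3
```

Applied to the full message set, the sketch yields the collapsed set {1, 2, loop1, alt1, alt2, 21} of Example 3.6; applied to a fragment without inner fragments, such as loop1, it returns the message set unchanged.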

Step 5: Determining edges To determine the edges of SIG, we apply the precedence relation '$\prec$' on $M$ and obtain a set of precedence relations $P(M)$. If $P(M)$ has a precedence relation $(x \prec y)$ with $x \in F$ or $y \in F$, then we recompute $P(M)$ as $P(M) \cup P(M_f)$, where $P(M_f)$ is the set of precedence relations obtained for the message set of the fragment $f$, and $f = x$ or $y$. This union operation is necessary to include all precedence relations that exist among the messages of the fragment $f$. Depending on the types of the operands of a precedence relation $(x \prec y)$ in $P(M)$, we determine an edge as per the following rules.

i) If $x, y \in M$, then we add a control edge $(V_x, V_y)$ from the message node $V_x$ to the message node $V_y$.

ii) If $x \in M$ and $y \in F$, then a control edge $(V_x, C_y^S)$ is added from the message node $V_x$ to the control node $C_y^S$.

iii) If $x \in F$ and $y \in M$, then we add a control edge $(C_x^E, V_y)$ from the control node $C_x^E$ to the message node $V_y$.

iv) If $x, y \in F$, then a control edge $(C_x^E, C_y^S)$ is added from the control node $C_x^E$ to the control node $C_y^S$.

v) For a left null precedence relation $(\Lambda \prec y) \in P(M)$, we identify a fragment $f \in F$ whose message set contains $y$. If $y \in F$, then we add a control edge $(C_f^S, C_y^S)$ from the control node $C_f^S$ to the control node $C_y^S$ (representing the start of the fragment $y$); otherwise, we add a control edge $(C_f^S, V_y)$ from the control node $C_f^S$ to the message node $V_y$.

vi) Similarly, for a right null precedence relation $(x \prec \Lambda) \in P(M)$, we identify a fragment $f \in F$ whose message set contains $x$. If $x \in F$, then we add a control edge $(C_x^E, C_f^E)$ from the control node $C_x^E$ (representing the end of the fragment $x$) to the control node $C_f^E$ (representing the end of the fragment $f$); otherwise, we add a control edge $(V_x, C_f^E)$ from the message node $V_x$ to the control node $C_f^E$.
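The six rules can be condensed into one small function. The following Python sketch is an illustration only: messages are represented as integers, fragments as strings, the null operand $\Lambda$ as `None`, and node names such as `CS_loop1` (for $C^S_{loop1}$) are our own encoding.

```python
def node(x, end):
    """Node of operand x: a message maps to V_x, a fragment to its start (S) or end (E) node."""
    return f"V{x}" if isinstance(x, int) else f"C{end}_{x}"


def control_edge(x, y, enclosing=None):
    """Map a precedence relation (x, y) to a control edge (origin, target).

    None stands for the null operand; `enclosing` is the fragment f whose
    message set contains the non-null operand (needed for rules v and vi only).
    """
    if x is None:                          # rule v: left null relation
        return (f"CS_{enclosing}", node(y, "S"))
    if y is None:                          # rule vi: right null relation
        return (node(x, "E"), f"CE_{enclosing}")
    # Rules i to iv: leave x through its end node, enter y through its start node.
    return (node(x, "E"), node(y, "S"))


print(control_edge(2, "loop1"))              # rule ii
print(control_edge(None, 3, "loop1"))        # rule v
print(control_edge("alt5", None, "alt3"))    # rule vi
```

The design choice here is that a message node serves as both its own entry and exit, so rules i to iv differ only in which control node of a fragment is used.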

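Example 3.7 Applying the precedence relation on the message set $M$ of the sequence diagram, we find the set of precedence relations as

$P(M) = \{1 \prec 2,\ 2 \prec loop1,\ loop1 \prec alt1,\ alt1 \prec alt2,\ alt2 \prec 21\} \cup P(M_{loop1}) \cup P(M_{alt1}) \cup P(M_{alt2})$.

These unions are due to the presence of the fragments loop1, alt1, and alt2 in some precedence relations of $P(M)$. To compute the unions, we determine the sets of precedence relations for the fragments as follows.

$P(M_{loop1}) = \{\Lambda \prec 3,\ 3 \prec \Lambda\}$

$P(M_{alt1}) = \{\Lambda \prec 4,\ 4 \prec \Lambda,\ \Lambda \prec 5,\ 5 \prec \Lambda\}$

$P(M_{alt2}) = \{\Lambda \prec 6,\ 6 \prec \Lambda,\ \Lambda \prec 7,\ 7 \prec alt3,\ alt3 \prec \Lambda\} \cup P(M_{alt3})$

$P(M_{alt3}) = \{\Lambda \prec 8,\ 8 \prec \Lambda,\ \Lambda \prec 9,\ 9 \prec \Lambda,\ \Lambda \prec 10,\ 10 \prec loop2,\ loop2 \prec alt4,\ alt4 \prec alt5,\ alt5 \prec \Lambda\} \cup P(M_{loop2}) \cup P(M_{alt4}) \cup P(M_{alt5})$

$P(M_{loop2}) = \{\Lambda \prec 11,\ 11 \prec \Lambda\}$

$P(M_{alt4}) = \{\Lambda \prec 12,\ 12 \prec \Lambda,\ \Lambda \prec 13,\ 13 \prec \Lambda\}$

$P(M_{alt5}) = \{\Lambda \prec 14,\ 14 \prec \Lambda,\ \Lambda \prec 15,\ 15 \prec alt6,\ alt6 \prec \Lambda\} \cup P(M_{alt6})$

$P(M_{alt6}) = \{\Lambda \prec 16,\ 16 \prec \Lambda,\ \Lambda \prec 17,\ 17 \prec 18,\ 18 \prec 19,\ 19 \prec 20,\ 20 \prec \Lambda\}$

Therefore, $P(M) = \{1 \prec 2,\ 2 \prec loop1,\ loop1 \prec alt1,\ alt1 \prec alt2,\ alt2 \prec 21,\ \Lambda \prec 3,\ 3 \prec \Lambda,\ \Lambda \prec 4,\ 4 \prec \Lambda,\ \Lambda \prec 5,\ 5 \prec \Lambda,\ \Lambda \prec 6,\ 6 \prec \Lambda,\ \Lambda \prec 7,\ 7 \prec alt3,\ alt3 \prec \Lambda,\ \Lambda \prec 8,\ 8 \prec \Lambda,\ \Lambda \prec 9,\ 9 \prec \Lambda,\ \Lambda \prec 10,\ 10 \prec loop2,\ loop2 \prec alt4,\ alt4 \prec alt5,\ alt5 \prec \Lambda,\ \Lambda \prec 11,\ 11 \prec \Lambda,\ \Lambda \prec 12,\ 12 \prec \Lambda,\ \Lambda \prec 13,\ 13 \prec \Lambda,\ \Lambda \prec 14,\ 14 \prec \Lambda,\ \Lambda \prec 15,\ 15 \prec alt6,\ alt6 \prec \Lambda,\ \Lambda \prec 16,\ 16 \prec \Lambda,\ \Lambda \prec 17,\ 17 \prec 18,\ 18 \prec 19,\ 19 \prec 20,\ 20 \prec \Lambda\}$.

Depending on the operands of a precedence relation $(x \prec y)$ in $P(M)$, we add an edge in SIG as follows.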

i) In our example, we have four precedence relations $1 \prec 2$, $17 \prec 18$, $18 \prec 19$, and $19 \prec 20$ that satisfy the condition ($x, y \in M$). Thus, we add four control edges $(V_1, V_2)$, $(V_{17}, V_{18})$, $(V_{18}, V_{19})$, and $(V_{19}, V_{20})$ in SIG.

ii) As the precedence relations $2 \prec loop1$, $7 \prec alt3$, $10 \prec loop2$, and $15 \prec alt6$ satisfy the second condition ($x \in M$, $y \in F$), we add four control edges $(V_2, C^S_{loop1})$, $(V_7, C^S_{alt3})$, $(V_{10}, C^S_{loop2})$, and $(V_{15}, C^S_{alt6})$.

iii) There is only one precedence relation, $alt2 \prec 21$, satisfying the condition ($x \in F$, $y \in M$). Therefore, we add one control edge $(C^E_{alt2}, V_{21})$.

iv) The four precedence relations $loop1 \prec alt1$, $alt1 \prec alt2$, $loop2 \prec alt4$, and $alt4 \prec alt5$ satisfy the condition ($x, y \in F$). For them, we add four control edges $(C^E_{loop1}, C^S_{alt1})$, $(C^E_{alt1}, C^S_{alt2})$, $(C^E_{loop2}, C^S_{alt4})$, and $(C^E_{alt4}, C^S_{alt5})$.

v) For the left null precedence relation $\Lambda \prec 3$, we identify the fragment loop1 that contains the message 3 and hence add an edge $(C^S_{loop1}, V_3)$. Similarly, we add the edges for the other left null precedence relations, as shown in Fig. 3.7.

vi) We map the right null precedence relation $alt5 \prec \Lambda$ to the edge $(C^E_{alt5}, C^E_{alt3})$. Similarly, the remaining right null precedence relations are mapped to the corresponding edges in SIG (see Fig. 3.7).
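The recursive expansion of $P(M)$ in Example 3.7 can be reproduced with the Python sketch below. The operand structure in `OPERANDS` (which messages and inner fragments lie in which operand of a fragment) is read off the relation sets of Example 3.7 and is an assumption of this sketch; the names `relations` and `expand` are hypothetical. `None` again stands for $\Lambda$.

```python
# Operands of each fragment, read off the relation sets in Example 3.7.
OPERANDS = {
    "loop1": [[3]],
    "alt1": [[4], [5]],
    "alt2": [[6], [7, "alt3"]],
    "alt3": [[8], [9], [10, "loop2", "alt4", "alt5"]],
    "loop2": [[11]],
    "alt4": [[12], [13]],
    "alt5": [[14], [15, "alt6"]],
    "alt6": [[16], [17, 18, 19, 20]],
}
TOP = [1, 2, "loop1", "alt1", "alt2", 21]   # collapsed message set M


def relations(seq, null_ends):
    """Consecutive pairs of seq; inside a fragment, the null operand bounds both ends."""
    items = [None, *seq, None] if null_ends else list(seq)
    return list(zip(items, items[1:]))


def expand(seq, null_ends=False):
    """P(M) for seq, united with P(M_f) of every fragment f occurring in seq."""
    result = relations(seq, null_ends)
    for x in seq:
        for operand in OPERANDS.get(x, []):
            result += expand(operand, null_ends=True)
    return result


P = expand(TOP)
print(len(P))
```

Counting the relation sets listed in Example 3.7 gives 5 + 2 + 4 + 5 + 9 + 2 + 4 + 5 + 7 = 43 relations, which is also what the sketch produces.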

Step 6: Identifying edge labels Once the edges of the SIG are determined, we label the edge corresponding to each left null precedence relation $(\Lambda \prec x) \in P(M)$ with the guard condition associated with the corresponding operand of the fragment that contains $x$ (a combined fragment or a message). All other edges have no label.


Example 3.8

Applying the Step 6 to (Λ ≺ 3), we find that only one operand of the loop1 contains the message 3 and is associated with the condition i

Step 7: Identifying scope edges We identify a control edge of SIG as a scope edge if it corresponds to a precedence relation $(x \prec y) \in P(M)$ satisfying one of the following criteria:

i) $y$ belongs to the message set of some M-R pair $(x, r)$.

ii) $x$ belongs to the message set of some M-R pair $(z, r)$, but $y$ does not.

Example 3.9 After observing the sequence diagram (in Fig. 3.5), we find that the precedence relation $(1 \prec 2)$ satisfies the first condition of Step 7, since $2 \in M_{(1,21)}$ and $(1, 21) \in S_{M-R}$. Thus, the edge $(V_1, V_2)$ corresponding to $(1 \prec 2)$ is a scope edge, which is shown as a dotted edge in Fig. 3.7. On the other hand, the precedence relation $(alt1 \prec alt2)$ satisfies the second condition of Step 7, as evident from $alt1 \in M_{(2,5)}$ and $alt2 \notin M_{(2,5)}$. This implies that $(C^E_{alt1}, C^S_{alt2})$ is a scope edge (see Fig. 3.7). In this way, we identify five scope edges: $(V_1, V_2)$, $(V_2, C^S_{loop1})$, $(C^E_{alt1}, C^S_{alt2})$, $(V_{10}, C^S_{loop2})$, and $(C^E_{alt4}, C^S_{alt5})$.
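The two criteria of Step 7 can be checked mechanically. The Python sketch below is an illustration under two assumptions of ours: a fragment is taken to belong to a message set when all of its messages do (as in $alt1 \in M_{(2,5)}$ above), and only the non-null precedence relations of Example 3.7 are examined. The flat fragment message sets are reconstructed from Examples 3.6 and 3.7, and all identifiers are hypothetical.

```python
# Message sets of the retained M-R pairs (Example 3.5).
MR_SETS = {(2, 5): set(range(2, 6)), (10, 13): set(range(10, 14)),
           (1, 21): set(range(1, 22))}
# Flat message sets of the fragments, reconstructed from Examples 3.6 and 3.7.
FRAGMENT_MESSAGES = {"loop1": {3}, "alt1": {4, 5}, "alt2": set(range(6, 21)),
                     "alt3": set(range(8, 21)), "loop2": {11}, "alt4": {12, 13},
                     "alt5": set(range(14, 21)), "alt6": set(range(16, 21))}


def belongs(x, message_set):
    """A message belongs if it is a member; a fragment belongs if all its messages are."""
    return FRAGMENT_MESSAGES.get(x, {x}) <= message_set


def is_scope_edge(x, y):
    for (m, _r), mset in MR_SETS.items():
        if x == m and belongs(y, mset):                 # criterion i
            return True
        if belongs(x, mset) and not belongs(y, mset):   # criterion ii
            return True
    return False


# Non-null precedence relations of Example 3.7.
RELATIONS = [(1, 2), (2, "loop1"), ("loop1", "alt1"), ("alt1", "alt2"),
             ("alt2", 21), (7, "alt3"), (10, "loop2"), ("loop2", "alt4"),
             ("alt4", "alt5"), (15, "alt6"), (17, 18), (18, 19), (19, 20)]
print([rel for rel in RELATIONS if is_scope_edge(*rel)])
```

The relations selected by the sketch correspond to the five scope edges listed in Example 3.9.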

3.5 Analyzing consistency between sequence diagram and SIG

We need to check whether the SIG constructed so far is consistent with the input sequence diagram, that is, whether (1) the model elements in the sequence diagram (say SD) have correspondences to the nodes in the graph model (say SIG), and (2) the control flows in SD have correspondences to the edges in SIG. We now prove Theorem 3.5.6 using a set of lemmas.

Lemma 3.5.1. For the graph model (SIG) to be consistent with a sequence diagram (SD), the total number of nodes in SIG must be equal to the number of messages plus twice the number of fragments in SD.


Proof. For the graph model (SIG) to be consistent with a sequence diagram (SD), each message and each fragment in SD must be mapped into a message node and a pair of control nodes (representing the start and end of the fragment) in SIG, respectively. Therefore, the total number of nodes in SIG would be the number of messages plus twice the number of fragments in SD. 

Lemma 3.5.2. There are nine possible types of control flows in sequence diagram.

Proof. The type of a control flow in a sequence diagram depends on the types of the model elements connected by the control flow. A model element can be any one of the following three types: message ($m$), fragment start ($f^S$), or fragment end ($f^E$). Considering the types of the connected model elements, a control flow has to be any one of the following nine types: $(m, m)$, $(m, f^S)$, $(m, f^E)$, $(f^S, m)$, $(f^E, m)$, $(f^S, f^S)$, $(f^S, f^E)$, $(f^E, f^S)$, $(f^E, f^E)$.

Lemma 3.5.3. There are nine possible types of edges in graph model.

Proof. The type of an edge in the graph model depends on the types of the nodes connected by the edge. A node can be one of the following three types: a message node ($v$), a control node representing the start of a fragment ($C_f^S$), or a control node representing the end of a fragment ($C_f^E$). Considering the types of the connected nodes, an edge can be any one of the following nine types: $(v, v)$, $(v, C_f^S)$, $(v, C_f^E)$, $(C_f^S, v)$, $(C_f^E, v)$, $(C_f^S, C_f^S)$, $(C_f^S, C_f^E)$, $(C_f^E, C_f^S)$, $(C_f^E, C_f^E)$.

Lemma 3.5.4. For the graph model (SIG) to be consistent with a sequence diagram (SD), there must be a one-to-one correspondence of the type of control flow in SD with the type of edge in SIG.

Proof. During construction of the graph model (SIG) from a sequence diagram (SD), the following mappings are applied: message to message node ($m \to v$), fragment start to the control node representing the start of the fragment ($f^S \to C_f^S$), and fragment end to the control node representing the end of the fragment ($f^E \to C_f^E$). Therefore, the nine types of control flows $(m, m)$, $(m, f^S)$, $(m, f^E)$, $(f^S, m)$, $(f^E, m)$, $(f^S, f^S)$, $(f^S, f^E)$, $(f^E, f^S)$, $(f^E, f^E)$ correspond to the nine types of edges $(v, v)$, $(v, C_f^S)$, $(v, C_f^E)$, $(C_f^S, v)$, $(C_f^E, v)$, $(C_f^S, C_f^S)$, $(C_f^S, C_f^E)$, $(C_f^E, C_f^S)$, $(C_f^E, C_f^E)$, respectively.

Lemma 3.5.5. For the graph model (SIG) to be consistent with a sequence diagram (SD), the number of types of control flows in SD must be equal to the number of corresponding types of edges in SIG.


Proof. We prove Lemma 3.5.5 through contradiction. Let us assume that the number of types of control flows in SD is not equal to the number of corresponding types of edges in SIG. This is only possible when there does not exist one-to-one correspondence of the type of control flow in SD with the type of edge in SIG, which contradicts Lemma 3.5.4. 

Theorem 3.5.6. The graph model (SIG) is consistent with a sequence diagram (SD) if and only if the following conditions are satisfied: (1) the number of nodes in SIG is equal to the number of messages plus twice the number of fragments in SD, and (2) the number of types of edges in SIG is equal to the number of corresponding types of control flows in SD.

Proof. From Lemmas 3.5.1 and 3.5.5, a graph model (SIG) that is consistent with a given sequence diagram (SD) must satisfy conditions (1) and (2). To prove that these two conditions are sufficient for SIG to be consistent with SD, we consider the basic constituent elements of the graph model (SIG), which are the set of nodes and the set of edges. If these two sets of SIG have correspondences to the set of model elements and the set of control flows in SD, respectively, then no construct in SIG is left inconsistent. This implies that conditions (1) and (2) are the necessary and sufficient conditions for the graph model to be consistent with a sequence diagram.
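As a small worked instance of the two conditions, the following Python sketch evaluates the node count of Lemma 3.5.1 for the Issue Book example (21 messages and 8 fragments) and enumerates the nine control flow types together with their edge types. The identifiers are hypothetical and the sketch only restates the lemmas; it is not a consistency checker for arbitrary models.

```python
from itertools import product


def expected_node_count(num_messages, num_fragments):
    """Lemma 3.5.1: one message node per message, two control nodes per fragment."""
    return num_messages + 2 * num_fragments


# Mapping applied during SIG construction (Lemma 3.5.4).
ELEMENT_TO_NODE = {"m": "v", "fS": "CS", "fE": "CE"}

flow_types = list(product(ELEMENT_TO_NODE, repeat=2))
edge_types = [(ELEMENT_TO_NODE[a], ELEMENT_TO_NODE[b]) for a, b in flow_types]

print(expected_node_count(21, 8))
print(list(zip(flow_types, edge_types)))
```

For the Issue Book diagram, a consistent SIG therefore has 21 + 2 × 8 = 37 nodes, and each of the nine control flow types has exactly one edge type.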

3.6 Applications of SIG

In our thesis, we investigate three applications of SIG, namely code generation, selection of a subset of all message paths, and detection of infeasible paths. The integrated framework of our proposed approach for these three applications is shown in Fig. 3.8. Our integrated framework consists of six main modules, as discussed below.

1. SIG Constructor: This module extracts the meta-information of sequence diagrams (as discussed in this chapter) and processes this information to build SIG.

2. Integration Colony Identifier: The main task of this module is to find subgraphs of SIG that are enclosed by a pair of incoming and outgoing scope edges. These subgraphs are referred to as Integration Colonies.

3. Code Generator: This module processes integration colonies (subgraphs of SIG) and maps their model elements into code artifacts.

4. MM path Generator: In this module, paths of Integration Colonies are mapped into MM paths, where MM paths are sequences of model elements from the start to the end of a method scope.

5. Message Paths Selector: This module generates a subset of all message paths by concatenating the underlying MM paths as per their coverage and priorities.

6. Infeasible Paths Detector: In this module, infeasible message paths are detected.

Figure 3.8: Block diagram of integrated framework. [Blocks shown: Sequence Diagram, SIG Constructor, SIG, Integration Colonies Identifier, Integration Colonies, Code Generator, Code, MM Paths Synthesizer, MM paths, Message Paths Selector, Subset of message paths, Infeasible Paths Detector, Infeasible paths.]

3.7 Conclusion

In this chapter, we have proposed a graph model, called Sequence Integration Graph (SIG). We have devised a suitable technique to overcome the difficulties of attributing all necessary information from XMI tagged elements and to capture them in an equivalent graph model. For this, the precedence relations among model elements (messages, starts and ends of fragments) are determined and thereafter mapped into control flows of SIG. The difference between SIG and the conventional graph model (i.e. control flow graph) is that the SIG subsumes control flows and additionally contains method scope information of the messages. Note that method scope information is captured in SIG by means of scope edge pairs (representing the starts and ends of method scopes). This enables us to identify subgraphs of SIG whose elements are in the same method scope. Processing the XMI of a UML sequence diagram is indeed a non-trivial task for large and complex applications, which none of the existing works [29, 65, 21, 20, 22] has reported. We have presented an approach for processing the XMI of a sequence diagram in detail.

In subsequent chapters, we process SIG to generate code of class methods, synthesize MM paths, and identify infeasible paths.


Chapter 4

Code Generation

In the last chapter, we discussed our approach to construct SIG (Sequence Integration Graph) from UML 2.0 sequence diagrams. In this chapter, the application of SIG to code generation is discussed. In addition to control flow information, SIG captures method scope information, which plays a key role in generating code. In this work, we have investigated mapping rules with which the model information in SIG can be mapped into Java code. Our analysis and experiments with a few case studies reveal that, using SIG, we can generate a substantial amount of code for controller classes.

4.1 Our approach

Our code generation approach consists of three steps as shown in Fig. 4.1. Given SIG as an input, we first identify scope edge pairs representing the start and end of method scopes. For each such scope edge pair, we find the subgraph of SIG, whose model elements belong to the same method. Subsequently, we apply mapping rules to the model elements in the subgraphs to generate code for class methods. To illustrate our concept, we use an example of Issue Book sequence diagram as shown in Fig. 3.5.

Figure 4.1: Code generation framework. [Flow shown: SIG, Identify scope edge pairs, Identify subgraphs of SIG, Apply mapping rules, Code.]

Step 1: Identifying scope edge pairs We identify a scope edge pair $(\hat{a}, \hat{b})$ in SIG such that $\hat{a}$ and $\hat{b}$ correspond to the start and end of a method scope, respectively. For this, a depth-first-search traversal of SIG is used. While traversing SIG, if a node is found with an incoming scope edge whose origin node corresponds to the message $m$ of an M-R pair $(m, r)$, then the scope edge is considered as the start of a method scope and is pushed onto a stack, say $ST$. On the other hand, if the traversal reaches a node with an outgoing scope edge $\hat{b}$ whose origin node does not correspond to the message $m$ of an M-R pair $(m, r)$, then the scope edge $\hat{b}$ is considered as the end of the method scope. Consequently, a scope edge, say $\hat{a}$, is popped from $ST$. Note that the scope edge pair $(\hat{a}, \hat{b})$ represents the subgraph of SIG that contains the model elements belonging to the same method scope and is enclosed by $\hat{a}$ and $\hat{b}$. In this way, we identify all the scope edge pairs and keep them in a list of scope edge pairs, $LIST$.

Example 4.1 Let us identify the scope edge pairs for our example SIG shown in Fig. 3.7. We start the depth-first-search traversal of the SIG from the node $V_1$. Since $V_1$ has an outgoing scope edge $(V_1, V_2)$ with its origin node ($V_1$) corresponding to the message 1 of the M-R pair (1, 21), we push the scope edge onto a stack $ST$. We find a similar outgoing scope edge $(V_2, C^S_{loop1})$ at the node $V_2$ and thus push that scope edge, which results in $ST$ holding two scope edges, $(V_2, C^S_{loop1})$ and $(V_1, V_2)$. On the other hand, at the node $C^E_{alt1}$ we find an outgoing scope edge $(C^E_{alt1}, C^S_{alt2})$ whose origin node ($C^E_{alt1}$) does not correspond to the message $m$ of an M-R pair $(m, r)$. Thus, we pop the edge $(V_2, C^S_{loop1})$ from $ST$, resulting in a scope edge pair $[(V_2, C^S_{loop1}), (C^E_{alt1}, C^S_{alt2})]$. Similarly, we push the scope edge $(V_{10}, C^S_{loop2})$ at the node $V_{10}$ and pop the same on finding an outgoing scope edge $(C^E_{alt4}, C^S_{alt5})$ at the node $C^E_{alt4}$. This implies another scope edge pair $[(V_{10}, C^S_{loop2}), (C^E_{alt4}, C^S_{alt5})]$. We insert these two scope edge pairs in $LIST$, while the scope edge $(V_1, V_2)$ in $ST$ remains unpaired.
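The stack discipline of Step 1 can be sketched as follows. This Python fragment is illustrative only: it assumes the scope edges are already available in depth-first traversal order, uses our own node names (`CS_loop1` for $C^S_{loop1}$), and takes the set of nodes corresponding to the message $m$ of some M-R pair as an input named `mr_messages` (a hypothetical parameter).

```python
def pair_scope_edges(scope_edges, mr_messages):
    """Pair start and end scope edges with a stack.

    scope_edges: (origin, target) tuples in depth-first traversal order.
    mr_messages: nodes that correspond to the message m of some M-R pair.
    Returns the list of scope edge pairs and the edges left on the stack.
    """
    stack, pairs = [], []
    for edge in scope_edges:
        if edge[0] in mr_messages:
            stack.append(edge)                 # start of a method scope
        else:
            pairs.append((stack.pop(), edge))  # end of the innermost open scope
    return pairs, stack


edges = [("V1", "V2"), ("V2", "CS_loop1"), ("CE_alt1", "CS_alt2"),
         ("V10", "CS_loop2"), ("CE_alt4", "CS_alt5")]
pairs, unpaired = pair_scope_edges(edges, {"V1", "V2", "V10"})
print(pairs)
print(unpaired)
```

With the five scope edges of the example SIG, the sketch returns the two pairs of Example 4.1 and leaves $(V_1, V_2)$ on the stack. An end edge arriving while the stack is empty would raise an `IndexError`; the sketch does not handle such malformed input.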

Step 2: Identifying the subgraph of SIG for a scope edge pair For each scope edge pair $(\hat{a}, \hat{b})$ in $LIST$, we identify the subgraph $G'$ of SIG such that the start node $s$ and end node $e$ of $G'$ are the same as the end node of the edge $\hat{a}$ and the start node of the edge $\hat{b}$, respectively. So, the set of nodes $N_{G'}$ and the set of edges $E_{G'}$ of $G'$ are initialized as $N_{G'} = \{s, e\}$ and $E_{G'} = \{\}$. $E_{G'}$ is augmented by adding an edge (other than a scope edge), say $\hat{t} \notin E_{G'}$, at least one of whose connected nodes is in $N_{G'}$. As a consequence, a connected node of $\hat{t}$ is added into $N_{G'}$ if it is not already included. The inclusion of edges into $E_{G'}$ and nodes into $N_{G'}$ continues until no more edges can be found, which implies that the identification of the subgraph $G'$ is complete. After identification of $G'$, it is excluded from SIG by connecting the origin node of $\hat{a}$ to the target node of $\hat{b}$ and then removing the scope edge pair $(\hat{a}, \hat{b})$ from SIG. We continue to identify the subgraph of SIG corresponding to a scope edge pair in $LIST$ until all scope edge pairs are processed. We refer to each such subgraph of SIG as an integration colony, which will be further processed in the next chapter.

Example 4.2 Let us identify the subgraph $G'$ of SIG corresponding to the scope edge pair $[(V_2, C^S_{loop1}), (C^E_{alt1}, C^S_{alt2})]$ in our example SIG (see Fig. 3.7). In this case, $C^S_{loop1}$ and $C^E_{alt1}$ are the start and end nodes of $G'$, as $(V_2, C^S_{loop1})$ and $(C^E_{alt1}, C^S_{alt2})$ are the incoming and outgoing scope edges of $G'$, respectively. The set of nodes $N_{G'}$ and the set of edges $E_{G'}$ are initialized as $N_{G'} = \{C^S_{loop1}, C^E_{alt1}\}$ and $E_{G'} = \{\}$. In the first iteration, we augment these two sets as $E_{G'} = \{(C^S_{loop1}, V_3), (V_4, C^E_{alt1}), (V_5, C^E_{alt1})\}$ and $N_{G'} = \{C^S_{loop1}, V_3, V_4, V_5, C^E_{alt1}\}$. In the next iteration, we obtain $E_{G'} = \{(C^S_{loop1}, V_3), (V_3, C^E_{loop1}), (C^S_{alt1}, V_4), (V_4, C^E_{alt1}), (C^S_{alt1}, V_5), (V_5, C^E_{alt1})\}$ and $N_{G'} = \{C^S_{loop1}, V_3, C^E_{loop1}, C^S_{alt1}, V_4, V_5, C^E_{alt1}\}$. Since no other edge can be included in $E_{G'}$, $G'(N_{G'}, E_{G'})$ is the resultant subgraph, as shown in Fig. 4.2(a). Similarly, we find the other subgraphs of SIG, as shown in Fig. 4.2(b) and (c).
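The growth of $N_{G'}$ and $E_{G'}$ is a fixpoint computation, which the following Python sketch makes explicit. The edge list is a fragment of the example SIG written in our own node-name encoding, and `grow_subgraph` is a hypothetical helper. The sketch collects edges in a single sweep per pass rather than in the exact iteration order of Example 4.2, and it also picks up the edge $(C^E_{loop1}, C^S_{alt1})$ produced by rule iv in Example 3.7.

```python
def grow_subgraph(start, end, edges, scope_edges):
    """Grow (nodes, edges) from {start, end} over non-scope edges until nothing changes."""
    nodes, chosen = {start, end}, set()
    changed = True
    while changed:
        changed = False
        for edge in edges:
            if edge in scope_edges or edge in chosen:
                continue
            if edge[0] in nodes or edge[1] in nodes:
                chosen.add(edge)
                nodes.update(edge)
                changed = True
    return nodes, chosen


SCOPE = {("V2", "CS_loop1"), ("CE_alt1", "CS_alt2")}
EDGES = [("V2", "CS_loop1"), ("CS_loop1", "V3"), ("V3", "CE_loop1"),
         ("CE_loop1", "CS_alt1"), ("CS_alt1", "V4"), ("V4", "CE_alt1"),
         ("CS_alt1", "V5"), ("V5", "CE_alt1"), ("CE_alt1", "CS_alt2"),
         ("CS_alt2", "V6")]
nodes, chosen = grow_subgraph("CS_loop1", "CE_alt1", EDGES, SCOPE)
print(sorted(nodes))
```

Because scope edges are never followed, the growth stops at the boundary of the method scope, and the resulting node set matches $N_{G'}$ of Example 4.2.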

Step 3: Applying mapping rules From each subgraph $G'$ identified for a scope edge pair $(\hat{a}, \hat{b})$, we generate the code within the class method whose start and end of scope are marked by $\hat{a}$ and $\hat{b}$, respectively.

Figure 4.2: Sample code generated from SIG of Issue Book. [Panels (a), (b), and (c) each show a subgraph of SIG next to the code generated from it. The code listings survive only in part, as reproduced below; the remaining lines of listings (a) and (b) and lines 1 to 19 of listing (c) are missing.]

(a)
1. Class BookRegister{
2. FindBook(BookID)
3. {
4. While (i

(b)
1. Class UserRegister{
2. FindUser(UserID)
3. {

(c)
20. }
21. else
22. {
23. aUser = UR.FindUser(UserID);
24. if(aUser==null)
25. {
26. IBB.DisplayMessage("Not a Valid User");
27. }
28. else
29. {
30. CreditStatus= aUser.GetStatus();
31. if(CreditStatus!=Valid)
32. {
33. IBB.DisplayMessage("Not Credit Worthy");
34. }
35. else
36. {
37. aUser.UpdateStatus();
38. aBook.SetStatus("Issued");
39. IssueRegister.AddRecord(UserID,BookID)
40. IBB.DisplayMessage("Book is being issued");
41. }
42. }
43. }
44. }
45. }
46. }
47. }

For this, we propose a set of mapping rules, which are presented in Table 4.1. We call this table the Model-to-Code Map Table. It has three columns, namely MElement (model element), Branch, and CA (code artifacts). MElement represents a model element such as a control node (representing the start or end of a fragment), a message node (representing a message), or a predicate associated with an outgoing edge of a control node (representing the start of a fragment). Branch can be left, middle, or right for a control node representing the start of an alt fragment. For the start of other fragments, Branch is null, which implies that the corresponding control node has only one branch. CA represents the mapped code artifacts. For example, Rule 2 in the Model-to-Code Map Table (see Table 4.1) specifies that whenever the middle branch of a control node (representing the start of an alt fragment) of SIG is traversed, the generated code artifact (CA) is "else if (". Thus, these mapping rules specify how the model elements of $G'$ are to be mapped to code artifacts.

We have named our code generation algorithm GenerateCode. Taking a subgraph $G'$ of SIG as the input, this algorithm generates the code within a class method whose method scope contains the model elements of $G'$. For this, GenerateCode applies the set of mapping rules given in Table 4.1 and uses the information of message nodes stored in the Message Node Table (Table 3.3) synthesized during SIG construction.

GenerateCode traverses $G'$ in depth-first order and maps the model elements into the corresponding code artifacts in an output file, called <Code-file>. During the traversal of $G'$, whenever GenerateCode reaches a control node $C_f^E$ (representing the end of a fragment $f$), the algorithm backtracks to the control node $C_f^S$ (representing the start of the fragment $f$) to explore the next branch originating from $C_f^S$.

Table 4.1: Model to Code Map.

Rule  MElement                        Branch  CA
1     C^S_alt                         left    if (
2     C^S_alt                         middle  else if (
3     C^S_alt                         right   else
4     C^E_alt                         -       }
5     C^S_opt                         -       if (
6     C^E_opt                         -       }
7     C^S_loop                        -       while (
8     C^E_loop                        -       }
9     C^S_break                       -       if (
10    C^E_break                       -       return; }
11    P (predicate)                   left    P ) {
12    P                               middle  P ) {
13    P                               right   {
14    P                               -       P ) {
15    V_i <SO, RO, RC, MT, PR, RVar>  -       RVar = RO.MT(PR);
16    V_i <SO, RO, RC, -, PR, ->      -       return PR;
17    V_i <SO, RO, RC, -, -, ->       -       return;

Algorithm GenerateCode
Input: G'<s, e, V_G', E_G'>, scope edge pair (â, b̂), Model-to-Code Map, Message Node Table
Output: Code file <Code-file>
Begin
  Find the origin node n of the scope edge â;
  /* Print class name and method with parameters */
  Find the record <SO, RO, RC, MT, PR, RVar> corresponding to n in Message Node Table;
  Print("Class", RC, "{") into <Code-file>;
  Print(MT, "(", PR, ") {") into <Code-file>;
  Initialize a stack ST = null;
  Initialize Curr_Node = s and Next_Node = null;
  while !IsEmpty(ST) or Curr_Node ≠ null do
    if Child(Curr_Node) ≠ TOP(ST) || Curr_Node ≠ e then
      /* If children of Curr_Node are not already stacked or end node is not reached */
      for each child node (Child_Node) of Curr_Node do
        Push(ST, Child_Node);
      end
    end
    Next_Node = TOP(ST);
    switch Curr_Node do
      case Control Node
        Branch = CheckBranch(Curr_Node, Next_Node);
        /* CheckBranch returns left/middle/right depending on the position of Next_Node
           on the left/middle/right branch of Curr_Node; otherwise null is returned. */
        if Branch == null then
          Find <CA> record corresponding to Curr_Node in Model-to-Code Map;
        else
          Find <CA> record corresponding to Curr_Node for the value of Branch in Model-to-Code Map;
        end
        Add <CA> into <Code-file>;  /* Generate code artifacts for the control node */
        Find the predicate P associated with <Curr_Node, Next_Node>;
        if P exists then
          Find <CA> corresponding to P for the value of Branch in Model-to-Code Map;
          Add <CA> into <Code-file>;
          /* Generate code artifacts for the predicate associated with the control node
             representing the fragment start */
        end
      end
      case Message Node
        Find the record <SO, RO, RC, MT, PR, RVar> corresponding to Curr_Node from Message Node Table;
        Find <CA> record corresponding to Curr_Node in Model-to-Code Map;
        Apply <CA> on <SO, RO, RC, MT, PR, RVar> and obtain the expression, say E;
        Add E into <Code-file>;  /* Generate code artifacts for the message node */
      end
    endsw
    if Curr_Node == C^E_f then
      /* Reset Curr_Node if there exists an unexplored branch of C^S_f */
      if Contains(ST, Child(C^S_f)) then
        Curr_Node = C^S_f;
        Pop(ST);  /* Discard next node from top of ST due to reset of Curr_Node */
      end
    else
      Curr_Node = Pop(ST);
    end
  end
  Print("}") into <Code-file>;  /* End of method MT */
  Print("}") into <Code-file>;  /* End of class */
End

Example 4.3 Let us illustrate our code generation algorithm GenerateCode with the subgraph $G'$ shown in Fig. 4.2(a). First, we identify the class method for which the code is to be generated from $G'$. Since all model elements in $G'$ (Fig. 4.2(a)) belong to the method scope of the message 2 (see Fig. 3.5), we find the necessary information of the corresponding message node $V_2$, such as "BookRegister" (receiver class) and "FindBook" (method name), from the Message Node Table (Table 3.3). The receiver class and the method name constitute the first three lines of the code generated by our algorithm (see Fig. 4.2(a)). Next, we apply the code generation rules to the graph $G'$ starting from $C^S_{loop1}$. For the node $C^S_{loop1}$, we apply Rule 7 of the Model-to-Code Map Table (Table 4.1) and obtain the partial code in line 4 of Fig. 4.2(a). Applying Rule 14 for the predicate P = (i < BookList.length & found = F) associated with the outgoing edge of $C^S_{loop1}$, we obtain the code for lines 4 and 5 of Fig. 4.2(a). To map $V_3$ into the corresponding code artifacts, we apply Rule 15 (Table 4.1) using the information of $V_3$ and generate the code for line 6 of Fig. 4.2(a). By applying the rules to the remaining nodes in $G'$, we obtain the rest of the code for the method shown in Fig. 4.2(a). We then consider the other two subgraphs and generate code in a similar way, as shown in Fig. 4.2(b) and (c). With reference to Fig. 4.2(c), we may note that the predicate (!=issued & !=reserved) labeling the rightmost outgoing edge $(C^S_{alt3}, V_{10})$ of the node $C^S_{alt3}$ has been simplified to a single 'else' (see line 21). This is because we first apply Rule 3 for the node $C^S_{alt3}$ (which gives 'else') and then Rule 13 for the predicate (which gives '{').
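To make the use of the mapping rules concrete, the following Python sketch encodes Rules 1 to 8 and 11 to 17 of Table 4.1 as lookups and small formatting functions. It is an illustration rather than the GenerateCode implementation: the key names (`alt_S`, `loop_E`, and so on) and function names are hypothetical, the break rules are left out, and, as an assumption motivated by the statements in Fig. 4.2(c), the assignment part of Rule 15 is omitted when a message has no return variable.

```python
# Rules 1 to 8 of Table 4.1: (control node kind, branch) -> code artifact.
CONTROL_RULES = {
    ("alt_S", "left"): "if (", ("alt_S", "middle"): "else if (",
    ("alt_S", "right"): "else", ("alt_E", None): "}",
    ("opt_S", None): "if (", ("opt_E", None): "}",
    ("loop_S", None): "while (", ("loop_E", None): "}",
}


def predicate_artifact(predicate, branch):
    """Rules 11 to 14: the right branch of an alt fragment drops its predicate."""
    return "{" if branch == "right" else f"{predicate}) {{"


def message_artifact(record):
    """Rules 15 to 17 applied to a <SO, RO, RC, MT, PR, RVar> record (None means '-')."""
    _so, ro, _rc, mt, pr, rvar = record
    if mt is None:                                   # reply message: Rules 16 and 17
        return f"return {pr};" if pr is not None else "return;"
    call = f"{ro}.{mt}({pr or ''});"
    # Assumption of this sketch: no assignment when there is no return variable.
    return f"{rvar} = {call}" if rvar else call


print(CONTROL_RULES[("loop_S", None)]
      + predicate_artifact("i < BookList.length & found = F", None))
print(message_artifact(("BR", "BookList[i]", "Book", "Match", "BookID", "found")))
print(CONTROL_RULES[("alt_S", "right")], predicate_artifact("ignored", "right"))
```

The first two calls reproduce the combination of Rules 7 and 14 and the use of Rule 15 described in Example 4.3; the last call shows how Rules 3 and 13 together yield a plain `else {`.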

We point out the following aspects of our proposed code generation approach.

• Given a sequence diagram, our approach supports code generation for multiple methods which may belong to different classes.

• Our approach applies code generation rules to the graph model (i.e. SIG). The advantage of using SIG is that it has explicit method scope information that helps us to easily identify the different class methods for which the code is to be generated.

• The generated code from SIG follows the syntax of Java programming language.

We now mention a few limitations and assumptions regarding code generation from sequence diagrams.

• The code generated from SIG for class methods may not be complete, since not all the behavioral information necessary for the complete code of class methods is captured in a sequence diagram. For example, code artifacts such as the definition, modification, and reading of variables, initialization of references, etc. are not captured in sequence diagrams and hence are not generated from SIG.

• Recursive calls are modelled in a sequence diagram as self messages. Code generation for self messages is handled like that for other messages. However, the base condition of a recursive call is usually not modelled in sequence diagrams and therefore needs to be added manually by programmers.

• Code is not generated for all class methods, but only for those class methods whose method scopes are captured in the sequence diagram. For the method scope of a class method to be captured, at least one message must be sent within its method scope. For example, code is not generated for GetStatus (in Fig. 3.5), as it does not send a message to an object within its method scope.

• The generated code is not completely executable, as the type information of the return variables, predicate variables, and local variables within class methods is missing. This is because their type information is not specified in the sequence diagram and hence cannot be resolved at the time of code generation (e.g. i, found in Fig. 3.5).


• In cases where the actual object references can not be determined from analysis of sequence diagram, we assign ad hoc names to these object references which the programmer can resolve during code refinement. This of course will not affect the logic of the code.

• For our code generation approach to work, we assume that the sequence dia- grams must contain reply for those messages which send atleast one message in their method scopes. In fact, this assumption should be adhered by a complete and quality design.

• A sequence diagram may not contain all the interactions between the concerned objects. To flag warnings about missing interactions, class diagrams can not be used. This is because, from class diagrams we can say which class method can possibly call other class methods, but not the actual methods which would be called within a particular class method. Note that class diagram is meant to capture structure of software, not behavior of the same. However, state-chart diagrams can be used for this purpose. Because, this di- agram has provision to specify object-method invocations in do action part of object’s states. To flag missing interactions, it is necessary to check whether receiving object-state of a message (referred in sequence diagram) has some object-method invocations in do action part and those object-method invoca- tions are actually included in sequence diagrams or not.

• In the case of a loop fragment, our approach generates code for artifacts such as the loop condition and the interactions modelled inside the loop fragment. However, the generated code does not include the loop increment operator since this kind of detailed information is not modelled in sequence diagrams. The programmer’s intervention is required to fill in such missing code artifacts.

• The programmer needs to resolve several issues in the generated code. These include inserting the Java library call for string comparison, try-catch blocks for exception handling, etc.

4.2 Analysis and Results

In this section, we analyze the completeness of the generated code and the time and space complexities of our code generation approach. We also substantiate the efficacy of our approach with some case studies. Our analysis and experimental results are presented in the following.

Table 4.2: Types of statements within a class method.

Type of statement                          Generated from sequence diagrams
1. Local variable declaration              ×
2. Variable initialization                 ×
3. Conditional/unconditional method calls  √
4. Variable definition/modification        ×
5. Exception handling constructs           ×

4.2.1 Completeness of generated code

By the term ‘completeness’ of code for an application, we mean complete implementation of all the class methods in the application, which includes the code artifacts related to flow of control, object manipulation, and user interaction for each method [30]. To analyze the completeness of the generated code, we measure how much code has to be written manually to fill class methods after their partial code is generated from sequence diagrams. As Java is our subject language, we need to know what possible types of statements can constitute a class method. With reference to the basic constructs of the Java language [69, 70, 71], we find that a class method may contain five possible types of statements: local variable declaration, variable initialization, conditional/unconditional method calls, variable definition/modification, and exception handling constructs (see Table 4.2).

Let us investigate which of the statement types listed in Table 4.2 can be generated from UML models. We may note that messages and fragments are the constituent elements of a UML 2.x sequence diagram and they can be mapped into unconditional and conditional method calls, respectively. In other words, we can use sequence diagrams to generate conditional/unconditional method calls, including library calls (see Table 4.2). On the other hand, statements such as variable definition/modification can be generated from other UML models, specifically statechart diagrams. However, statements such as local variable declarations and their initializations, and exception handling constructs, are difficult to generate from UML

Table 4.3: Characteristics of programmers.

Programmer  Course enrolled  Experience type  Experience (years)
P1          Ph.D.            Research         4
P2          M. Tech          Industry         2
P3          M. Tech          Industry         1
P4          B. Tech          -                -

diagrams, in particular from sequence diagrams. This is because these constructs are unlikely to be considered while designing system behavior at a higher level of abstraction and hence are not specified in UML designs. Therefore, the possible types of statements that can be generated from UML behavioral diagrams (e.g. sequence, statechart etc.) are the conditional/unconditional method calls and variable definition/modification.

To evaluate the effectiveness of our code generation approach, we use the lines of code (LOC) metric [72]. For this, we need to estimate the actual lines of code (comprising conditional/unconditional method calls and variable definition/modification) required to implement a class method. In this regard, we may note that statements of three types, namely declaration of local variables, their initialization, and exception handling, are not considered when measuring the actual lines of code of a class method. This is because modern IDE tools such as Eclipse [73] and NetBeans [74] support the generation of those three types of statements and hence their coding effort can be ignored. The determination of actual lines of code for a class method depends on the skills and experience levels of the programmers involved in coding. To reduce this bias, we have considered the estimates taken from four programmers with varying skills and experience. Table 4.3 shows the nature of experience of the four programmers. Note that all programmers had prior experience with using UML and Java.

We have considered the Restaurant Automation System (RAS) [75] as a case study. The RAS automates various functionalities of a restaurant such as order processing, bill processing, managing food items etc. We have prepared design specifications for RAS following the SRS document and using MagicDraw 16.0 [66]. The design specification includes (1) one use case diagram consisting of seven use cases, (2) one class

diagram, (3) seven sequence diagrams for seven use cases, (4) statechart diagrams of a few classes. The RAS consists of three types of classes: (1) controller classes, (2) boundary classes, and (3) entity classes. The names of the use cases and the three types of classes of RAS are presented in Table 4.4. The number of messages, fragments, and collaborating objects contained in the seven sequence diagrams of RAS are shown in Table 4.5. A few important design artifacts of RAS are reported in Appendix A of this thesis.

For the experiments, each programmer was asked to complete the partial code generated from the sequence diagrams of RAS using the NetBeans IDE 6.1 [74], following the UML design specifications, and to measure lines of code and relevant parameters. The coding effort of the four programmers varied from 10 to 14 man-days. At the end of the experiments, the measures of the following parameters were collected.

1) Number of methods per class,

2) Number of methods per class whose partial code is generated,

3) Total lines of code per class required for each programmer,

4) Median and standard deviation for the estimates of third parameter of all programmers,

5) Total lines of generated code per class, and

Table 4.4: Characteristics of RAS.

Use cases            Classes
                     Controller                    Entity           Boundary
Generate Statistics  GenerateStatisticsController  ItemRegister     GenerateStatisticsForm
Process Order        ProcessOrderController        PaymentRegister  SelectOrderForm
Deliver Order        ManageItemController          ChequePayment    ManageItemForm
Generate Bill        MakeOrderController           OrderRegister    PaymentForm
Manage Item          RestaurentController          BillRegister     PaymentSlip
Pay Bill                                           Payment          OrderForm
Pay Bill                                           Order            MenuForm
                                                   Bill
                                                   Item


Table 4.5: The characteristics of seven sequence diagrams of RAS.

Use case             #Messages  #Fragments  #Objects
Make Order           10         0           5
Deliver Order        21         13          9
Manage Item          17         9           6
Process Order        34         18          9
Generate Bill        19         6           9
Generate Statistics  16         6           9
Pay Bill             23         5           12

6) Percentage of total lines of generated code per class.
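Parameters 4 to 6 can be derived mechanically from the per-programmer totals. The following is a minimal sketch, not the tooling used in the experiments: the function name `completeness` is hypothetical, and the use of the sample standard deviation is an assumption. The input figures are the ProcessOrderController estimates (125, 137, 143, 130) and its 98 generated lines quoted in the text.

```python
from statistics import median, stdev


def completeness(estimates, generated_loc):
    """Return (median [A], sample std. dev., % generated) for one class.

    estimates     -- total LOC required, one value per programmer
    generated_loc -- lines of code generated from sequence diagrams [B]
    """
    a = median(estimates)              # parameter 4: median of the estimates
    spread = stdev(estimates)          # parameter 4: sample standard deviation (assumed)
    pct = generated_loc / a * 100      # parameter 6: (B / A) * 100
    return a, spread, pct


# ProcessOrderController figures quoted in the text.
med, sd, pct = completeness([125, 137, 143, 130], 98)
print(f"median={med}, std. dev.={sd:.2f}, generated={pct:.2f}%")
```

With an even number of programmers the median is the mean of the two middle estimates, so it need not be a whole number of lines.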

Experimental results are shown in Table 4.6. From Table 4.6, we can observe that a total of 98 lines of code are generated from the sequence diagrams of RAS, partially filling six methods of the class ProcessOrderController. To complete the code for all methods of that class, approximately 133 lines of code are required, which is the median of the estimates of the total lines of code (125, 137, 143, 130) required by the four programmers. In other words, the generated code for the ProcessOrderController class is estimated at 73.40% of its lines of code. Similarly, we find the estimates for the remaining classes as shown in Table 4.6. From Table 4.6, we have the following observations: (1) all controller classes such as ProcessOrderController, ManageItemController, MakeOrderController, GenerateStatisticsController (except only RestaurentController) have more than 34% lines of generated code. Note that the class RestaurentController has no generated code since this class is specially designed to start RAS and, in fact, the sequence diagrams of RAS do not contain a message to an object of the RestaurentController class; (2) boundary classes such as PaymentForm, OrderForm, MenuForm, PaymentSlip (except SelectOrderForm, GenerateStatisticsForm, ManageItemForm) have below 3% lines of generated code; (3) the entity classes such as Item, Bill, Order, Payment, ChequePayment, PaymentRegister have zero lines of generated code. However, a few entity classes such as ItemRegister, BillRegister, OrderRegister have some generated code. These observations are also evident from the chart shown in Fig. 4.3.

To evaluate the relative performance of code generation for the three types of classes, we have estimated the total LOC required, total LOC generated, and % of total LOC

Table 4.6: Results of code generation for RAS system.

Columns: Class name; # methods; # methods with Gen. code; Total LOC [P1], [P2], [P3], [P4]; Median of Total LOC [A]; Std. dev. of Total LOC; Gen. LOC [B]; % of Gen. LOC = (B/A)*100.

The row alignment of this rotated table did not survive; the values of each column are listed below in the order in which they survive.

Class name: Item, OrderRegister, BillRegister, PaymentForm, RestaurentController, GenerateStatisticsController, MakeOrderController, ManageItemController, PaymentRegister, ChequePayment, Payment, Order, Bill, ItemRegister, PaymentSlip, MenuForm, OrderForm, GenerateStatisticsForm, ManageItemForm, SelectOrderForm, ProcessOrderController
# methods: 18 11 11 11 1 2 4 9 4 3 5 2 6 5 5 5 8 4 2 5 9
# methods with Gen. code: 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 0 3 1 3 6
Total LOC [P1]: 125 15 41 19 25 10 12 13 10 36 66 68 33 26 61 27 10 25 4 5 9
Total LOC [P2]: 137 17 48 22 34 12 14 16 12 41 60 75 39 24 68 42 12 22 5 7 8
Total LOC [P3]: 143 18 49 24 29 10 10 18 12 14 39 62 80 44 28 58 36 10 31 6 4
Total LOC [P4]: 130 13 38 23 30 15 11 20 18 11 42 76 63 31 25 70 29 15 27 7 8
Median of Total LOC [A] (integer parts; several values carry a fractional part of .5): 133 44 22 29 14 11 71 25 64 32 5 9 16 11 16 40 64 36 11 26 6
Std. dev. of Total LOC (integer parts): 1 1 2 5 2 3 2 1 3 2 1 2 7 7 5 1 5 6 2 3 7
Std. dev. of Total LOC (fractional parts): 29 82 21 35 16 69 36 29 65 75 70 64 11 50 90 70 67 85 36 77 88
Gen. LOC [B]: 14 98 0 0 0 0 0 0 6 6 6 0 0 1 6 7 1 3 0 8 9
% of Gen. LOC (integer parts): 54 63 11 43 72 34 73 0 0 0 0 0 0 37 0 0 9 9 2 0 2
% of Gen. LOC (fractional parts): 00 00 00 00 00 00 00 00 37 79 77 00 54 15 76 07 72 61 40

The total LOC specified in Table 4.6 represents the total lines of behavioral code (excluding structural code plus auto-generated code). Please note that the actual size of our system (RAS) is 4.6 KLOC.


Figure 4.3: A comparison of lines of generated code for different classes of RAS.

generated together for all controller, boundary, and entity classes as reported in Table 4.7. From Table 4.7, we can observe that 48.13% of the total LOC of controller classes is generated from sequence diagrams, whereas for boundary and entity classes, it is 6.84% and 11.18%, respectively. While investigating the reasons for this difference, we find that about 48% of the lines of code of the controller class methods are message invocations or statements specifying their conditions, which are captured in sequence diagrams. On the other hand, the overall percentage of generated code for the entity classes is far lower than for the controller classes. Based on the performance of code generation for the entity classes, we divide them into two categories. In the first category, a few entity classes such as ItemRegister, BillRegister, OrderRegister have a moderate amount of generated code. After a thorough study of these three classes, we

Table 4.7: Results of generated LOC for three types of classes in RAS.

Group of classes    Total LOC  Total LOC generated  % of total LOC generated
Controller classes  268        129                  48.13
Boundary classes    263        18                   6.84
Entity classes      161        18                   11.18


find that they implement logic for object selection based on user input by delegating method calls to other domain objects, which is captured in the sequence diagrams of RAS, and that is why they have generated code. In the second category, the entity classes such as Item, Bill, Order, Payment, ChequePayment, PaymentRegister have zero lines of generated code since they only manipulate or redefine their attributes, which is not captured in the sequence diagrams. Further, sequence diagrams designed for use cases are not used to model detailed interactions with boundary objects, such as what happens to Swing components after receiving a user input, or how information is visualized by manipulating Swing components. This is the main reason for the lower percentage of LOC generated for the boundary classes.

Our results indicate that code generation from sequence diagrams of use cases is more effective for controller classes than for entity and boundary classes. This observation holds even for large systems, which usually contain a large number of use cases (that can be complex too) and hence have a lower percentage of generated code. Even in that case, the majority of messages in sequence diagrams would belong to controller class methods rather than entity and boundary class methods and, as a result, controller classes would have significantly more generated code than boundary and entity classes.

We now determine the abstraction level of sequence diagrams and thereafter evaluate the performance of code generation with different levels of abstraction. To determine the level of abstraction, we consider characteristics of the sequence diagrams such as the number of objects (NO) involved and the number of messages (NM) exchanged among collaborating objects. Note that the higher the value of the product of NO and NM of a sequence diagram, the lower the level of abstraction of the sequence diagram, and vice versa.
For our design level sequence diagrams (also referred to as DSD) of RAS, we have computed the values of NO, NM, NO × NM and the percentage of generated code as shown in Table 4.8. In Table 4.8, we can observe that the percentage of generated code varies from 16% to 67% depending on code size and abstraction levels of the sequence diagrams.

To quantify the abstraction level of our sequence diagrams with respect to the code, we need to consider reverse engineered sequence diagrams (referred to as RSD). To obtain such reverse engineered sequence diagrams, we consider those methods of controller classes which correspond to some message in our sequence diagrams under experiment. For each use case, we have reverse engineered one controller class method, except Manage Item and Generate Statistics, for which we have used three class methods. The characteristics of RSD and their percentage of generated code are shown in Table 4.9. Since RSD additionally contains library calls compared to DSD (design level sequence diagrams), the value of the product NO × NM and the percentage of generated code for RSD should be higher than those of DSD, which is also evident from the two charts shown in Fig. 4.4 and 4.5.

Table 4.8: Performance of code generation from design level sequence diagrams (DSD) of RAS.

Use Case         Characteristics of DSD      Total LOC  Gen. LOC  % Gen. LOC
                 #DSD  NO  NM  NO × NM
Make Order       1     5   10  50            25         9         36
Process Order    1     9   41  369           93         58        62
Deliver Order    1     9   23  207           54         36        67
Manage Item      1     5   14  70            108        17        16
Generate Bill    1     9   19  171           44         26        59
Pay Bill         1     12  24  288           118        28        24
Gen. Statistics  1     8   17  136           44         17        39

Table 4.9: Performance of code generation from reverse engineered sequence diagrams (RSD) of RAS.

Use Case         Characteristics of RSD      Total LOC  Gen. LOC  % Gen. LOC
                 #RSD  NO  NM  NO × NM
Make Order       1     7   20  140           25         22        88
Process Order    1     14  50  700           93         71        76
Deliver Order    1     11  26  286           54         41        75
Manage Item      3     17  29  493           108        39        36
Generate Bill    1     14  32  448           44         42        95
Pay Bill         1     17  49  833           118        58        49
Gen. Statistics  3     12  28  336           44         31        70

Figure 4.4: Comparison of design details (NO × NM) in DSD and RSD of RAS.

To measure the abstraction level of our sequence diagram (DSD) with respect to RSD, we define a term named Sequence Diagram Abstraction Level (SDAL) as follows:

SDAL = 1 - \frac{NO_D \times NM_D}{NO_R \times NM_R}    (4.1)

where NO_D and NM_D are the number of objects and messages in DSD, and NO_R and NM_R are the number of objects and messages in RSD.

When the abstraction level of a sequence diagram (DSD) is as low as that of the reverse engineered sequence diagram (RSD) (i.e. NO_D = NO_R and NM_D = NM_R), the value of SDAL becomes zero. For a sequence diagram with a higher value of SDAL, the product of NO_D and NM_D must be smaller relative to the product of NO_R and NM_R, which implies less detailed information as well as less generated code for DSD compared to RSD. We compute the values of SDAL for our sequence diagrams (DSD) of RAS as shown in Table 4.10. For example, the sequence diagram of Make Order has an SDAL value of 0.64 (that is, 1 − 50/140), which signifies that its level of abstraction is 64% higher than that of the corresponding reverse engineered sequence diagram. From Table 4.10, we find that five sequence diagrams (i.e. DSD) of RAS are at a 60% or higher level of abstraction than the corresponding reverse engineered sequence diagrams (RSD), and their percentage of generated code varies from 16% to 59%.


Figure 4.5: Comparison of code generation from DSD and RSD of RAS.

Table 4.10: Performance of code generation with different levels of abstraction in DSD of RAS.

Use Case             SDAL  % Gen. Code
Make Order           0.64  36
Process Order        0.47  62
Deliver Order        0.28  67
Manage Item          0.86  16
Generate Bill        0.62  59
Pay Bill             0.65  24
Generate Statistics  0.60  39
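The SDAL column can be recomputed from the NO and NM values of Tables 4.8 and 4.9. The sketch below is illustrative only: the function `sdal` and the dictionary names are hypothetical, and rounding to two decimals is assumed to match the presentation of Table 4.10.

```python
def sdal(no_d, nm_d, no_r, nm_r):
    """Sequence Diagram Abstraction Level, Eq. (4.1)."""
    return 1 - (no_d * nm_d) / (no_r * nm_r)


# (NO, NM) per use case, taken from Table 4.8 (DSD) and Table 4.9 (RSD).
dsd = {
    "Make Order": (5, 10), "Process Order": (9, 41), "Deliver Order": (9, 23),
    "Manage Item": (5, 14), "Generate Bill": (9, 19), "Pay Bill": (12, 24),
    "Generate Statistics": (8, 17),
}
rsd = {
    "Make Order": (7, 20), "Process Order": (14, 50), "Deliver Order": (11, 26),
    "Manage Item": (17, 29), "Generate Bill": (14, 32), "Pay Bill": (17, 49),
    "Generate Statistics": (12, 28),
}

# Round to two decimals, as the values are presented in Table 4.10 (assumed).
values = {uc: round(sdal(*dsd[uc], *rsd[uc]), 2) for uc in dsd}

for use_case, value in values.items():
    print(f"{use_case:20s} {value:.2f}")
```

An SDAL of zero corresponds to a design diagram that is as detailed as its reverse engineered counterpart; values closer to one indicate a more abstract design diagram.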

4.2.2 Complexity analysis

We analyze two tasks: SIG construction and code generation from SIG to compute time and space complexities of our proposed approach. For this, we use the following parameters of a sequence diagram.


m = Number of messages.

f = Number of fragments.

n = Number of model elements = m + f.

k1 = Number of meta objects per message.

k2 = Number of meta objects per fragment.

Time complexity

The number of basic operations required to accomplish the individual steps of our approach is as follows.

SIG construction

Step 1: Identification of message sets, reply message sets, fragment sets, and message sets of all fragments requires the number of basic operations = m × k1 + m × k1 + f × k2 + m × f = O(n × n) = O(n^2).

Step 2: To map m messages and f fragments into nodes of SIG, the number of operations required = m + f = O(n).

Step 3: Determination of M-R pairs and their message sets needs to compare each message with other messages and thus, the number of basic operations required = m × m = O(n^2).

Step 4: To find fragment structure, each fragment has to be checked whether it is contained in another fragment. Thus, the required number of operations = f × f = O(n^2).

Step 5: Mappings of precedence relations into edges of SIG requires the number of basic operations = p × n = O(n) where p is some constant and p << n.

Step 6: Edge labeling requires checking the guard conditions of f fragments and thus, the number of required operations = p × f = O(n) where p is some constant and p << n.

Step 7: For identifying scope edges, the required number of operations = p × n = O(n).

Code generation from SIG


Step 1: Identification of scope edge pairs requires depth-first-search traversal of SIG and therefore, the number of operations required = m + 2 × f + p × n = (p + 1) × n + f = O(n)[76].

Step 2: The number of operations required to identify subgraphs of SIG for all scope edge pairs = m + 2 × f = O(n).

Step 3: To map model elements of SIG into code artifacts, total m + 2 × f nodes and p × n edges have to be processed. Thus, the required number of operations = m + 2 × f + p × n= O(n).

Therefore, time complexity of our code generation approach = O(n^2).
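The step counts above can be tallied numerically to see the quadratic terms dominate. This is a rough sketch under stated assumptions: the function names are hypothetical, k1 = 12 and k2 = 2 are the values used in the space analysis of this section, and p = 2 is an arbitrary stand-in for the unspecified constant p.

```python
def sig_construction_ops(m, f, k1=12, k2=2, p=2):
    """Sum of the basic-operation counts of Steps 1 to 7 of SIG construction."""
    n = m + f
    steps = [
        m * k1 + m * k1 + f * k2 + m * f,  # Step 1: message, reply, fragment sets
        m + f,                             # Step 2: map elements to nodes
        m * m,                             # Step 3: M-R pairs
        f * f,                             # Step 4: fragment structure
        p * n,                             # Step 5: precedence relations to edges
        p * f,                             # Step 6: edge labeling
        p * n,                             # Step 7: scope edges
    ]
    return sum(steps)


def code_generation_ops(m, f, p=2):
    """Sum of the basic-operation counts of Steps 1 to 3 of code generation."""
    n = m + f
    return (m + 2 * f + p * n) + (m + 2 * f) + (m + 2 * f + p * n)


def total_ops(m, f):
    return sig_construction_ops(m, f) + code_generation_ops(m, f)


small = total_ops(100, 50)
large = total_ops(200, 100)   # same diagram shape, twice the size
ratio = large / small
print(small, large, round(ratio, 2))
```

Doubling the number of model elements multiplies the tally by a factor between 3 and 4 for these inputs, which is the behavior expected of an O(n^2) bound with sizeable linear terms.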

Space complexity

The space requirement to accomplish the two tasks of our approach is as follows.

SIG construction

For SIG construction, the elements that need to be stored are as follows: (i) m × k1 meta objects for m messages, (ii) f × k2 meta objects for f fragments, (iii) m message objects, (iv) f fragment objects, (v) total m elements for message sets of fragments, (vi) total l × m elements for the message sets of l M-R pairs, (vii) p × n precedence relations, (viii) total m + 2 × f nodes and p × n edges of SIG. Therefore, the total space required for SIG construction

= m × k1 + f × k2 + m + f + m + l × m + p × n + m + 2 × f + p × n

= (k1 + 3 + l) × m + 2 × p × n + (k2 + 3) × f

= (12 + 3 + l) × m + 2 × p × n + (2 + 3) × f for k1 = 12, k2 = 2

= O(n) as m + f = n and (2 × p) << n.

Code generation from SIG

For code generation, we use a stack to store unexplored children of the nodes during depth-first-search traversal of SIG. Therefore, the required space = m + 2 × f = O(n)[76].

Thus, space complexity of our approach is O(n).
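The simplification of the storage count for SIG construction can be checked by evaluating the item-by-item sum and the collected form side by side. The sketch below is only a sanity check; the function names are hypothetical and the sample values of m, f, l, and p are arbitrary.

```python
def sig_space_itemized(m, f, l, p, k1=12, k2=2):
    """Storage for SIG construction, summed item by item, (i) to (viii)."""
    n = m + f
    return (
        m * k1          # (i)    meta objects for messages
        + f * k2        # (ii)   meta objects for fragments
        + m             # (iii)  message objects
        + f             # (iv)   fragment objects
        + m             # (v)    message sets of fragments
        + l * m         # (vi)   message sets of l M-R pairs
        + p * n         # (vii)  precedence relations
        + (m + 2 * f)   # (viii) nodes of SIG
        + p * n         # (viii) edges of SIG
    )


def sig_space_collected(m, f, l, p, k1=12, k2=2):
    """The same quantity after collecting the terms in m, n, and f."""
    n = m + f
    return (k1 + 3 + l) * m + 2 * p * n + (k2 + 3) * f


samples = [(10, 0, 1, 2), (34, 18, 3, 2), (100, 50, 5, 4)]
for m, f, l, p in samples:
    print(m, f, l, p, sig_space_itemized(m, f, l, p), sig_space_collected(m, f, l, p))
```

Both forms agree on every sample, which confirms that the collected expression is a term-by-term regrouping of items (i) to (viii).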


Table 4.11: Memory and performance analysis of our approach.

Use Case             Memory (in kB)  Time (ms)
Make Order           6.85            287.86
Process Order        20.71           3155.73
Deliver Order        14.93           1389.78
Manage Item          11.38           729.30
Generate Bill        11.20           917.83
Pay Bill             11.83           1221.08
Generate Statistics  10.75           796.06

4.2.3 Memory and performance analysis

We have performed memory and performance analysis of our code generation approach using a Java profiling technique. For this, we have used the Eclipse Test & Performance Tools Platform (TPTP) [77]. Our experiment was carried out on a system with an i3 processor (2.40 GHz), 4 GB RAM, and the Windows 7 operating system. The profiling results obtained for different use cases of RAS are shown in Table 4.11.

4.3 Comparison with related work

We now compare our work with the existing work [31, 40, 30, 78] which closely resemble our work. Jakimi et al. [31] use a global single sequence diagram capturing the behavior of the entire system for code generation. The differences between our approach and Jakimi et al.’s approach [31] are as follows. First, our approach uses fragments to model conditional messages in sequence diagrams following UML 2.x syntax, whereas Jakimi et al. [31] use UML 1.x syntax to model those. Second, our approach uses SIG to handle method scope information. On the other hand, Jakimi et al. [31] have not reported how to handle method scope information which is necessary for generation of code of different class methods. For behavioral code generation, Thongmak and Muenchaisri [40] use only UML 1.x sequence diagrams, whereas Usman and Nadeem [30], Parada et al. [45] use UML 2.0 sequence, class, and activity diagrams. The differences between our approach and

the existing approaches of Thongmak et al. [40], Usman et al. [30], and Parada et al. [45] are as follows. First, our sequence diagrams contain messages belonging to multiple class methods. On the other hand, the sequence diagrams that have been used in the existing work [40, 30, 45] contain messages belonging to a single class method. In practice, sequence diagrams that are designed to model the behavior of complex use cases [46, 47] usually contain messages belonging to multiple class methods. In such cases, our code generation approach is more suitable since the existing techniques [40, 30, 45] have ignored method scope information. Second, our approach supports code generation for multiple methods which may belong to different classes, whereas the existing work [40, 30, 45] supports code generation for a single class method. Third, our approach applies code generation rules to a graph model, whereas the existing work [40, 30, 45] applies the rules to meta models. The advantage of using our graph model is that it has explicit method scope information, which helps us to easily identify the different class methods for which code has to be generated. On the other hand, it is difficult to identify such class methods from meta models.

4.4 Conclusion

We have proposed an approach to generate the behavioral code (code within class methods) from UML 2.x sequence diagrams. For this, we have used a graph model called SIG, which subsumes control flows and additionally contains method scope information of interactions. The explicit method scope information present in our graph model helps to identify the different class methods for which code has to be generated.

Our experimental results show that approximately 48% of the lines of code for controller classes of RAS could be generated from design level sequence diagrams, whereas for boundary and entity classes, it is 6% and 11%, respectively. Our results indicate that code generation from sequence diagrams designed for use cases is more effective for controller classes than for entity and boundary classes. Note that a sequence diagram is not meant to model detailed interactions with boundary objects and hence the proposed code generation approach is not suitable for boundary classes. On the other hand, statechart diagrams, rather than sequence diagrams, would be appropriate for generating code for entity classes.

Apart from code generation, our graph model (SIG) can be used for determining

effective coverage of MM paths, which we shall discuss in the next chapter.

Chapter 5

Control of MM Path Coverage

An MM path represents an interleaved sequence of messages and method execution paths. The coverage of MM paths is important as far as integration testing of object-oriented systems is concerned. Object-oriented systems typically have many MM paths, and multiple MM paths are included in a single execution path. In practice, not all execution paths can be taken up for testing and therefore test engineers often have no choice but to select a subset of all execution paths, either in an ad hoc manner or based on their intuitive priorities. It is possible that the selected execution paths do not include all MM paths, or include a less critical MM path multiple times while not including a highly critical MM path even once. That is why controlling the coverage of MM paths is necessary while selecting a subset of all execution paths.

In this chapter, we aim to address the above-mentioned problem. We use the Sequence Integration Graph (SIG) to synthesize MM paths and thereafter merge them into a coverage model capturing call relationships among MM paths and priority information. This coverage model is used to select a subset of all execution paths after determining the effective coverage of their underlying MM paths. Our experimental results substantiate the usefulness of our coverage model for selecting a subset of all paths.

5.1 MM paths for sequence diagram

In this section, we first discuss the concept of MM paths in general and then for sequence diagrams.

In object-oriented systems, objects collaborate with each other through the exchange of messages. Whenever an object receives a message, a path in the corresponding method is executed. During execution of the path, at some point the object may send a message to another object, which in turn may send a message to yet another object, and so on. As a result, an execution path of an object-oriented system spans multiple object methods, thereby resulting in an interleaved sequence of messages and method execution paths. Such a path is called a Method-Message path and is abbreviated as MM path [33, 34].

Figure 5.1: Example of MM path.

Example 5.1 Let us illustrate the concept of MM path using the example shown in Fig. 5.1. In Fig. 5.1, we can see that two possible execution paths <s1, p1′, e1> and <s1, p1′′, e1> can occur when the object O1 receives the message m1. Note that s1 and e1 are the start and end points of the method m1(), whereas p1′ and p1′′ are the points on two paths of the method m1(). Consider the execution of the path <s1, p1′, e1> where at the point p1′ the object O1 sends the message m2 to the object O2, which in turn sends a message to the object O3. In this fashion, messages are sent from one object to the next until a message is received by some object, say On. Note that the execution path m1 → <s1, p1′> → m2 → <s2, p2′′′> → ··· → mn → <sn, en> represents an interleaved sequence of messages and method execution paths and is an MM path.

In practical situations, MM paths can be arbitrarily long. In effect, even if we know that some MM path has fault(s), locating faults in such a lengthy MM path becomes difficult. In order to facilitate the fault localization process, it is necessary to reduce the length of MM paths. One possible way is to consider one message and its corresponding method execution path. With this consideration, an MM path is specified as the sequence of a message and its corresponding method execution path.

Figure 5.2: Example of redefined MM path.

Example 5.2 Let us illustrate the redefined MM path using the example shown in Fig. 5.2. In Fig. 5.2, we see that for the message m1, there are two possible method execution paths <s1, p1′, e1> and <s1, p1′′, e1> and hence two MM paths, namely M1 = m1 → <s1, p1′, e1> and M2 = m1 → <s1, p1′′, e1>. Similarly, for the message m2, we find three MM paths, namely M3 = m2 → <s2, p2′, e2>, M4 = m2 → <s2, p2′′, e2>, and M5 = m2 → <s2, p2′′′, e2>. Depending on the run-time execution conditions for the MM path M1, one among the three MM paths M3, M4, M5 is executed.

In a sequence diagram, a method execution path p is modeled by the sequence of messages occurring along p. A message sequence, say m1 → m2 → · · · → mn, is called an MM path if m2 → · · · → mn represents a sequence of messages occurring in the method scope of the message m1. The MM paths in a sequence diagram can be obtained from the SIG (discussed in Chapter 3 of this thesis) of the sequence diagram. With reference to an SIG, an MM path is defined as the sequence of model elements occurring between the start and end of a method scope.


Figure 5.3: An example sequence diagram and corresponding SIG. (a) A sequence diagram with lifelines ObjectA, ObjectB, ObjectC, ObjectD, messages m1 to m9, and fragments alt1, alt2, alt3. (b) SIG with message nodes D1 to D9 and control nodes for the start and end of each fragment.

Example 5.3 We illustrate the enumeration of MM paths from an SIG (Fig. 5.3(b)) of a sequence diagram (Fig. 5.3(a)). In Fig. 5.3(b), D1, D2, ···, D9 represent the message nodes for the messages m1, m2, ···, m9, whereas (C^S_alt1, C^E_alt1), (C^S_alt2, C^E_alt2), (C^S_alt3, C^E_alt3) represent the pairs of control nodes for the start and end of the fragments alt1, alt2, alt3. We may note that D3 → C^S_alt1 → D4 → C^E_alt1 and D3 → C^S_alt1 → D5 → C^E_alt1 are two possible execution sequences of model elements, which can occur as a consequence of execution of the message m2 (corresponding to the message node D2). This results in two MM paths: M1 = D2 → D3 → C^S_alt1 → D4 → C^E_alt1 and M2 = D2 → D3 → C^S_alt1 → D5 → C^E_alt1. Similarly, for the message m1 (corresponding to the message node D1), we find three MM paths: M3 = D1 → D2 → C^S_alt2 → D6 → C^E_alt2 → D9, M4 = D1 → D2 → C^S_alt2 → C^S_alt3 → D7 → C^E_alt3 → C^E_alt2 → D9, and M5 = D1 → D2 → C^S_alt2 → C^S_alt3 → D8 → C^E_alt3 → C^E_alt2 → D9.

The SIG shown in Fig. 5.3(b) has a total of six paths (P1, ···, P6). Each of these paths includes multiple MM paths. For example, the path P1 includes two MM paths, namely M1 and M4.

P1 = D1 → D2 → D3 → C^S_alt1 → D4 → C^E_alt1 → C^S_alt2 → C^S_alt3 → D7 → C^E_alt3 → C^E_alt2 → D9
P2 = D1 → D2 → D3 → C^S_alt1 → D5 → C^E_alt1 → C^S_alt2 → C^S_alt3 → D7 → C^E_alt3 → C^E_alt2 → D9
P3 = D1 → D2 → D3 → C^S_alt1 → D4 → C^E_alt1 → C^S_alt2 → C^S_alt3 → D8 → C^E_alt3 → C^E_alt2 → D9
P4 = D1 → D2 → D3 → C^S_alt1 → D5 → C^E_alt1 → C^S_alt2 → C^S_alt3 → D8 → C^E_alt3 → C^E_alt2 → D9
P5 = D1 → D2 → D3 → C^S_alt1 → D4 → C^E_alt1 → C^S_alt2 → D6 → C^E_alt2 → D9
P6 = D1 → D2 → D3 → C^S_alt1 → D5 → C^E_alt1 → C^S_alt2 → D6 → C^E_alt2 → D9

Further, M1 is included only in the paths P1, P3 and P5. This means that if we select the subset {P2, P4, P6}, the MM path M1 is not covered. The question, therefore, is how to select a subset of all paths such that the underlying MM paths are effectively covered.
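The coverage gap described in Example 5.3 can be checked mechanically. The following Python sketch is illustrative only: the node names (for example `CS_alt1` for the start control node of alt1) are a hypothetical encoding, and it assumes that a path covers an MM path when the nodes of the MM path occur in the path in the same order. That test is sufficient for this example but is not claimed to be the general inclusion test.

```python
# Fragment bodies of Example 5.3 (hypothetical string encoding of the nodes).
A4 = ["CS_alt1", "D4", "CE_alt1"]
A5 = ["CS_alt1", "D5", "CE_alt1"]
B6 = ["CS_alt2", "D6", "CE_alt2"]
B7 = ["CS_alt2", "CS_alt3", "D7", "CE_alt3", "CE_alt2"]
B8 = ["CS_alt2", "CS_alt3", "D8", "CE_alt3", "CE_alt2"]

MM = {
    "M1": ["D2", "D3"] + A4,
    "M2": ["D2", "D3"] + A5,
    "M3": ["D1", "D2"] + B6 + ["D9"],
    "M4": ["D1", "D2"] + B7 + ["D9"],
    "M5": ["D1", "D2"] + B8 + ["D9"],
}
PATHS = {
    "P1": ["D1", "D2", "D3"] + A4 + B7 + ["D9"],
    "P2": ["D1", "D2", "D3"] + A5 + B7 + ["D9"],
    "P3": ["D1", "D2", "D3"] + A4 + B8 + ["D9"],
    "P4": ["D1", "D2", "D3"] + A5 + B8 + ["D9"],
    "P5": ["D1", "D2", "D3"] + A4 + B6 + ["D9"],
    "P6": ["D1", "D2", "D3"] + A5 + B6 + ["D9"],
}


def covers(path, mm_path):
    """True if the nodes of mm_path occur in path in the same order."""
    remaining = iter(path)
    return all(node in remaining for node in mm_path)


def covered_by(selection):
    """Names of the MM paths covered by at least one selected path."""
    return {m for m, nodes in MM.items()
            if any(covers(PATHS[p], nodes) for p in selection)}


missing = set(MM) - covered_by(["P2", "P4", "P6"])
print(sorted(missing))
```

Under these assumptions the selection {P2, P4, P6} leaves M1 uncovered, whereas a selection such as {P1, P4, P6} covers all five MM paths with the same number of paths.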

5.2 Our approach

Our approach for selecting a subset of all paths consists of three steps, as shown in Fig. 5.4. Given an SIG as input, we first identify MM paths. We then capture the call relationships among MM paths, as well as their priority information, in a coverage model called the MM Coverage Model. This model is used to determine the effective coverage of MM paths according to two coverage criteria and, accordingly, to generate a subset of all paths. To illustrate our concept, we use the Generate Show Statistics sequence diagram shown in Fig. 5.5.

5.2.1 MM path generation

We have already mentioned that an MM path is a sequence of model elements from the start to end of a method scope. To generate MM paths from a given SIG, we need to identify subgraphs of the SIG, whose model elements belong to the same method scope. We name each such subgraph of SIG as an integration colony.


[Flow: SIG → Generate MM paths → MM paths → Construct MM coverage model → MM coverage model → Generate paths (using coverage criteria) → Subset of all paths]

Figure 5.4: A framework for selecting a subset of all paths in a sequence diagram.

For MM path generation, we follow two steps as mentioned below.

1. Identify integration colonies.

2. Transform integration colony paths into MM paths.

Figure 5.5: Sequence diagram of Generate Show Statistics use case.
[Lifelines: :StatisticsBoundary, :StatisticsController, :CommissionRegister, CommissionList[i] : Commission, aScreen : Screen, :FundRegister, :Fund.]

Step 1: Identify integration colonies

An integration colony is characterized by a pair of incoming and outgoing scope edges representing the start and end of a method scope. To identify integration colonies, we need to find such edge pairs and the corresponding subgraphs enclosed by them. We discuss these two tasks in the following.

a) Finding a list of scope edge pairs

Integration colonies can be nested, that is, an integration colony may contain another, which may in turn contain another, and so on. It is therefore necessary to identify scope edge pairs for the integration colonies starting from the innermost one and proceeding to the next outer level. For this, we follow a depth-first search of the SIG. While traversing the SIG, if a node is found with an incoming scope edge whose origin node corresponds to the message m of an M-R pair (m, r) (see Section 3.1 of Chapter 3 of this thesis), then the scope edge is considered as the start of a method scope and is thus pushed onto a stack, say ST. On the other hand, if the traversal reaches a node with an outgoing scope edge $\hat{b}$ whose origin node does not correspond to the message m of an M-R pair (m, r), then the scope edge $\hat{b}$ is considered as the end of a method scope. Consequently, a scope edge, say $\hat{a}$, is popped from ST. Here, $(\hat{a}, \hat{b})$ is a scope edge pair since the two edges represent the start and end of the same method scope. Similarly, we identify the other scope edge pairs and add them to a list of scope edge pairs, LIST.
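The stack-based pairing of scope edges can be sketched as follows. This is a minimal Python illustration, not the thesis implementation: it assumes the scope edges are already listed in depth-first visiting order as (origin, target) tuples, and that the set of nodes corresponding to the message m of an M-R pair is known. The concrete edge list and the set `call_nodes` are assumptions read off the Generate Show Statistics diagram (messages 1, 2, 6 and 11 are taken to be the ones with replies).

```python
def pair_scope_edges(scope_edges, call_nodes):
    """Pair start and end scope edges with a stack.

    scope_edges: (origin, target) scope edges in DFS visiting order.
    call_nodes:  nodes that correspond to the message m of an M-R pair.
    Returns (pairs, unpaired), innermost pairs first.
    """
    stack, pairs = [], []
    for edge in scope_edges:
        origin, _target = edge
        if origin in call_nodes:
            stack.append(edge)                 # start of a method scope
        elif stack:
            pairs.append((stack.pop(), edge))  # end of the innermost open scope
    return pairs, stack                        # what is left on the stack is unpaired


# Assumed encoding of the scope edges of the Generate Show Statistics SIG.
edges = [
    ("D1", "CS_opt1"),
    ("D2", "CS_loop1"), ("D4", "D5"),
    ("D6", "CS_loop2"), ("D9", "D10"),
    ("D11", "CS_loop3"), ("D14", "D15"),
]
call_nodes = {"D1", "D2", "D6", "D11"}

pairs, unpaired = pair_scope_edges(edges, call_nodes)
for start, end in pairs:
    print(start, end)
print("unpaired:", unpaired)
```

With this input the sketch yields three scope edge pairs and leaves the scope edge that starts at D1 on the stack, which corresponds to the main method scope.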

Example 5.4 For the SIG of Generate Show Statistics shown in Fig. 5.6, we find three scope edge pairs, namely $[(D_2, C^S_{loop1}), (D_4, D_5)]$, $[(D_6, C^S_{loop2}), (D_9, D_{10})]$ and $[(D_{11}, C^S_{loop3}), (D_{14}, D_{15})]$, while the scope edge $(D_1, C^S_{opt1})$ remains unpaired.

b) Finding subgraphs of SIG

For each scope edge pair $(\hat{a}, \hat{b})$ in LIST, we identify the subgraph $G'$ of the SIG such that the start node $s$ and end node $e$ of $G'$ are the target node of the edge $\hat{a}$ and the origin node of the edge $\hat{b}$, respectively. The set of nodes $N_{G'}$ and the set of edges $E_{G'}$ of $G'$ are initialized as $N_{G'} = \{s, e\}$ and $E_{G'} = \{\}$. $E_{G'}$ is augmented by adding an edge $\hat{t} \notin E_{G'}$ (other than a scope edge) that has at least one of its end nodes in $N_{G'}$. As a consequence, each end node of $\hat{t}$ is added to $N_{G'}$ if it is not already included. The inclusion of edges into $E_{G'}$ and nodes into $N_{G'}$ continues until no more edges can be found, which implies that the identification of the subgraph $G'$ is complete. Note that $G'$ represents the integration colony for the scope edge pair $(\hat{a}, \hat{b})$. After $G'$ is identified, it is excluded from the SIG by connecting the origin node of $\hat{a}$ to the target node of $\hat{b}$ and then removing the scope edge pair $(\hat{a}, \hat{b})$ from the SIG. We repeat the above procedure until all scope edge pairs in LIST are processed. In this way, we identify a set of integration colonies and subsequently exclude them from the SIG. This results in a residue graph of the SIG, which we call the main integration colony (MIC).

Please note that these two steps, (a) finding a list of scope edge pairs and (b) finding subgraphs of SIG, are the same as steps 1 and 2 of the code generation approach, respectively (see the preceding chapter). These two steps are common to the MM path generation and code generation approaches and hence are combined in the implementation (see the integrated framework in Section 3.6).
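The growth of a subgraph from a scope edge pair, followed by its exclusion from the SIG, can be sketched in Python as below. This is a hedged illustration: the edge list is a small hypothetical excerpt around loop1, edges are plain (origin, target) tuples, and the function names are not taken from the thesis.

```python
def integration_colony(edges, scope_edges, pair):
    """Grow the colony enclosed by the scope edge pair (a_hat, b_hat)."""
    a_hat, b_hat = pair
    nodes = {a_hat[1], b_hat[0]}        # start node s and end node e
    colony_edges = set()
    changed = True
    while changed:                      # repeat until no more edge can be added
        changed = False
        for edge in edges:
            if edge in scope_edges or edge in colony_edges:
                continue                # scope edges never join a colony
            if edge[0] in nodes or edge[1] in nodes:
                colony_edges.add(edge)
                nodes.update(edge)
                changed = True
    return nodes, colony_edges


def exclude_colony(edges, pair, colony_edges):
    """Remove the colony and bridge the origin of a_hat to the target of b_hat."""
    a_hat, b_hat = pair
    rest = [e for e in edges if e not in colony_edges and e not in pair]
    return rest + [(a_hat[0], b_hat[1])]


# Hypothetical excerpt of an SIG around the loop1 fragment.
edges = [
    ("D1", "CS_opt1"), ("CS_opt1", "D2"), ("D2", "CS_loop1"),
    ("CS_loop1", "D3"), ("D3", "CE_loop1"), ("CE_loop1", "D4"),
    ("D4", "D5"), ("D5", "CE_opt1"),
]
scope_edges = {("D1", "CS_opt1"), ("D2", "CS_loop1"), ("D4", "D5")}
pair = (("D2", "CS_loop1"), ("D4", "D5"))

nodes, colony_edges = integration_colony(edges, scope_edges, pair)
residue = exclude_colony(edges, pair, colony_edges)
print(sorted(nodes))
print(residue)
```

Because scope edges are never added, the growth stops at the boundary of the method scope, and the residue keeps a direct edge from D2 to D5 in place of the excluded colony.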

Example 5.5: With reference to the SIG of Generate Show Statistics, we find four integration colonies: ICa, ICb, ICc, and ICd (see Fig. 5.6). Note that ICa is the main integration colony obtained after excluding ICb, ICc, ICd from SIG.

Figure 5.6: SIG and four integration colonies for Generate Show Statistics use case. (a) SIG. (b) Integration colonies ICa, ICb, ICc, ICd.

Step 2: Transform integration colony paths into MM paths: In the previous step, we have identified all integration colonies for an SIG. We now discuss how to transform integration colony paths into MM paths in the following.

a) Generating execution sequences of model elements

We enumerate all basic paths from each integration colony using the GenerateBasicPathSet algorithm [79]. This algorithm uses a stack to store the unvisited children of a node while traversing the SIG in a depth-first manner. When the traversal reaches an end node, the generation of one basic path is complete. Subsequently, the algorithm continues to generate other basic paths by backtracking to a node whose child node is the top element of the stack. This procedure is repeated until the stack becomes empty. The GenerateBasicPathSet algorithm is presented below.

Algorithm GenerateBasicPathSet
Input : SIG (Sequence Integration Graph)
Output: List of basic paths (ListOfBasicPaths)

Set Current Node as the start node of SIG;
Set Stack as null;
Set Current Basic Path as empty;
while Stack is not empty or Current Node is not null do
    Add Current Node into Current Basic Path;
    /* Visit the node in DFS traversal and construct basic path */
    while Current Node is not end node of SIG do
        Store unvisited child of Current Node into Stack;
        Pop a child node from Stack and add it into Current Basic Path;
        Set Child Node as Current Node;
    end
    /* Generation of Current Basic Path is complete */
    Add Current Basic Path into ListOfBasicPaths;
    Copy Current Basic Path into Next Basic Path;
    /* Continue the generation of Next Basic Path */
    Backtrack from end node of Next Basic Path to a node until its Child Node is in top of stack and discard trailing parts of backtracked node;
    Pop a node from Stack and assign to Current Node;
    Set Next Basic Path as Current Basic Path;
end
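For readers who prefer running code, the following Python sketch enumerates the same kind of basic paths with an explicit stack. It is a simplified variant, not a transcription of GenerateBasicPathSet: instead of backtracking along the current path, each stack entry carries the partial path that reaches its node. The adjacency-dictionary representation is an assumption, and the graph is assumed to be acyclic.

```python
def generate_basic_paths(graph, start, end):
    """Enumerate every start-to-end path of an acyclic graph.

    graph maps a node to the list of its children.
    """
    paths = []
    stack = [(start, [start])]          # (node, partial path that reaches it)
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)          # one basic path is complete
            continue
        # Push children in reverse so that the first child is explored first.
        for child in reversed(graph.get(node, [])):
            stack.append((child, path + [child]))
    return paths


# Integration colony of message 2 in Example 5.3 (hypothetical encoding).
colony = {
    "D3": ["CS_alt1"],
    "CS_alt1": ["D4", "D5"],
    "D4": ["CE_alt1"],
    "D5": ["CE_alt1"],
}
basic_paths = generate_basic_paths(colony, "D3", "CE_alt1")
for p in basic_paths:
    print(" -> ".join(p))
```

Carrying the partial path on the stack costs extra memory but avoids the explicit backtracking step; either formulation visits the colony depth first.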

Note that these paths represent execution sequences of model elements belonging to the same method scope. A problem arises, however, when an integration colony contains control flows that are not explicit. We refer to such control flows as implicit control flows (see the dotted lines and crosses in Fig. 5.7); they may give rise to hidden paths.

Figure 5.7: Three graph models with implicit control flow: (a) opt, (b) break, (c) loop.

Example 5.6 We now look at a few example graph models with implicit control flow. In Fig. 5.7, we can observe that the first graph model (Fig. 5.7(a)) has the implicit control flow $(C^S_{opt}, C^E_{opt})$ and hence a hidden path $m_1 \to C^S_{opt} \to C^E_{opt} \to m_3$, that is, the path without executing the opt fragment. On the other hand, the second graph model (Fig. 5.7(b)) has implicit information such as stopping the control flow at $C^E_{break}$ and transferring the control flow from $C^S_{break}$ to $m_3$, which correspond to two paths: one executing the break fragment and another not executing it. They are actually two hidden paths: $m_1 \to C^S_{break} \to m_2 \to C^E_{break}$ and $m_1 \to C^S_{break} \to m_3$. Similarly, the third graph model (Fig. 5.7(c)) has implicit control flows such as $(C^E_{loop}, C^S_{loop})$ and $(C^S_{loop}, m_3)$. Taking these control flows into account, we find the actual execution path as $m_1 \to C^S_{loop} \to m_2 \to C^E_{loop} \to C^S_{loop} \to m_3$.

To generate hidden paths and transform generated paths into actual execution sequences of model elements, we propose three rules R1, R2, and R3, which are presented in Table 5.1. They are applied to the paths containing a fragment of type loop, break, and opt, respectively. Applying R1, we add two edges to a path, the loop back edge $(C^E_{loop}, C^S_{loop})$ and the loop exit edge $(C^S_{loop}, b)$, and delete the edge $(C^E_{loop}, b)$ from the path. On the other hand, R2 reduces $C^E_{break}$ in a path to a sink node and introduces another path by excluding the break fragment. In the same way, R3 introduces a new path by excluding the opt fragment. Note that R2 and R3 may need to be applied repeatedly in order to exclude each break/opt fragment from a path.

Table 5.1: Three rules.

$C^S_f$, $C^E_f$ : Start and end nodes of a fragment $f$
$a$, $b$ : Two message/fragment nodes
$C^S_f \xrightarrow{c} a$ : An edge from $C^S_f$ to $a$ labeled with predicate $c$
$C^S_f \dashrightarrow a$ : Add an edge from $C^S_f$ to $a$
$C^E_f \nrightarrow a$ : Delete an edge from $C^E_f$ to $a$

R1: $(C^S_{loop} \xrightarrow{c} a \;\&\; C^E_{loop} \to b) \Rightarrow (C^E_{loop} \dashrightarrow C^S_{loop} \overset{!c}{\dashrightarrow} b \;\&\; C^E_{loop} \nrightarrow b)$

R2: $(C^S_{break} \xrightarrow{c} a \;\&\; C^E_{break} \to b) \Rightarrow (C^E_{break} \nrightarrow b) \;\|\; (C^S_{break} \overset{c}{\nrightarrow} a \;\&\; C^E_{break} \overset{c}{\nrightarrow} b \;\&\; C^S_{break} \overset{!c}{\dashrightarrow} b)$

R3: $(C^S_{opt} \xrightarrow{c} a \;\&\; b \to C^E_{opt}) \Rightarrow (C^S_{opt} \overset{!c}{\dashrightarrow} C^E_{opt} \;\&\; C^S_{opt} \nrightarrow a \;\&\; b \nrightarrow C^E_{opt})$

b) Transform execution sequences of model elements into MM paths

Let $P$ be an execution sequence of model elements enabled by a message $m$ and $V$ be the corresponding message node. We transform $P$ into an MM path, say $P'$, by concatenating $V$ to $P$, which is symbolically represented as

$P' \Rightarrow V \to P$

In case of main integration colony, V is the start node and is already included in P . Therefore, this mapping is redundant for the main integration colony. Please note that execution sequences of model elements which we obtain from integration colonies are essentially basis paths and hence, their mapped MM paths are basis MM paths only.
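The effect of the three rules on a single path, and the final prefixing of the message node V, can be sketched on plain node lists. The Python below is an illustrative sketch under stated assumptions: control nodes are named `CS_<fragment>` and `CE_<fragment>` (a hypothetical convention), each fragment occurs once in the path, and the edge predicates of Table 5.1 are not modelled.

```python
def apply_r1(path, frag):
    """R1 (loop): from the loop end go back to the loop start, then exit."""
    e = path.index(f"CE_{frag}")
    return path[:e + 1] + [f"CS_{frag}"] + path[e + 1:]


def apply_r2(path, frag):
    """R2 (break): one path stops at the fragment end, one skips the fragment."""
    s, e = path.index(f"CS_{frag}"), path.index(f"CE_{frag}")
    return [path[:e + 1], path[:s + 1] + path[e + 1:]]


def apply_r3(path, frag):
    """R3 (opt): the hidden path that does not execute the fragment body."""
    s, e = path.index(f"CS_{frag}"), path.index(f"CE_{frag}")
    return path[:s + 1] + path[e:]


def to_mm_path(v, sequence):
    """P' = V -> P: prefix the message node V to the execution sequence P."""
    return [v] + sequence


loop_path = apply_r1(["m1", "CS_loop", "m2", "CE_loop", "m3"], "loop")
break_paths = apply_r2(["m1", "CS_break", "m2", "CE_break", "m3"], "break")
opt_hidden = apply_r3(["m1", "CS_opt", "m2", "CE_opt", "m3"], "opt")
mm_path = to_mm_path("D2", ["D3", "CS_alt1", "D4", "CE_alt1"])
print(loop_path, break_paths, opt_hidden, mm_path, sep="\n")
```

The three calls reproduce the paths discussed for the opt, break and loop graph models with implicit control flow, and the last call rebuilds an MM path of the form used in the earlier alt example.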

Example 5.7 By applying the above steps to the SIG of Generate Show Statistics, we generate the following MM paths: $M_a^1, M_a^2, \cdots, M_a^8$ from ICa, $M_b^1$ from ICb, $M_c^1, M_c^2$ from ICc and $M_d^1, M_d^2$ from ICd, as shown in Table 5.2.

Table 5.2: MM paths obtained from four integration colonies of Generate Show Statistics use case.

$M_a^1 = D_1 \to C^S_{opt1} \to \cdots \to C^E_{opt1} \to C^S_{opt2} \to \cdots \to C^E_{opt2} \to C^S_{opt4} \to \cdots \to C^E_{opt4} \to D_{16}$
$M_a^2 = D_1 \to C^S_{opt1} \to C^E_{opt1} \to C^S_{opt2} \to \cdots \to C^E_{opt2} \to C^S_{opt4} \to \cdots \to C^E_{opt4} \to D_{16}$
$M_a^3 = D_1 \to C^S_{opt1} \to \cdots \to C^E_{opt1} \to C^S_{opt2} \to C^E_{opt2} \to C^S_{opt4} \to \cdots \to C^E_{opt4} \to D_{16}$
$M_a^4 = D_1 \to C^S_{opt1} \to \cdots \to C^E_{opt1} \to C^S_{opt2} \to \cdots \to C^E_{opt2} \to C^S_{opt4} \to C^E_{opt4} \to D_{16}$
$M_a^5 = D_1 \to C^S_{opt1} \to C^E_{opt1} \to C^S_{opt2} \to C^E_{opt2} \to C^S_{opt4} \to \cdots \to C^E_{opt4} \to D_{16}$
$M_a^6 = D_1 \to C^S_{opt1} \to C^E_{opt1} \to C^S_{opt2} \to \cdots \to C^E_{opt2} \to C^S_{opt4} \to C^E_{opt4} \to D_{16}$
$M_a^7 = D_1 \to C^S_{opt1} \to \cdots \to C^E_{opt1} \to C^S_{opt2} \to C^E_{opt2} \to C^S_{opt4} \to C^E_{opt4} \to D_{16}$
$M_a^8 = D_1 \to C^S_{opt1} \to C^E_{opt1} \to C^S_{opt2} \to C^E_{opt2} \to C^S_{opt4} \to C^E_{opt4} \to D_{16}$

$M_b^1 = D_2 \to C^S_{loop1} \to D_3 \to C^E_{loop1} \to C^S_{loop1} \to D_4$

$M_c^1 = D_6 \to C^S_{loop2} \to D_7 \to C^S_{opt3} \to D_8 \to \cdots \to D_9$
$M_c^2 = D_6 \to C^S_{loop2} \to D_7 \to C^S_{opt3} \to C^E_{opt3} \to \cdots \to D_9$

$M_d^1 = D_{11} \to C^S_{loop3} \to D_{12} \to C^S_{opt5} \to D_{13} \to \cdots \to D_{14}$
$M_d^2 = D_{11} \to C^S_{loop3} \to D_{12} \to C^S_{opt5} \to C^E_{opt5} \to \cdots \to D_{14}$

5.2.2 Construction of MM coverage model

In the previous section, we discussed the generation of MM paths from a given SIG. We are now interested in finding the calling relationships (which MM paths call which) and the average number of paths covering the model elements in MM paths. For this, we introduce a coverage model called the MM Coverage Model. An MM coverage model is a set of connected trees encapsulating coverage information of MM paths, such as the number of messages covered and the length and density (i.e. the average number of paths covering an edge) of the overlapping part of the MM paths. The construction of the MM coverage model consists of three steps.

1) Merge MM paths of each integration colony into a tree.


2) Add connector edges between trees to capture call relationships among MM paths.

3) Assign each edge a weight equal to the number of paths covering the edge.

The above steps are discussed in detail as follows.

1) Build trees: We merge MM paths of each integration colony into a tree. After doing the same for all integration colonies, we obtain a set of trees.

2) Add connector edges between trees: To capture call relationships among MM paths of a pair of trees, we add a connector edge (an undirected dotted edge) between two occurrences of the same node in the trees. For such a pair of trees, the tree whose root node has the connector edge is referred to as the child tree, and the other one, in which a non-root node has the connector edge, is called the parent tree. Note that some (possibly all) MM paths of the parent tree can call an MM path of the child tree.

3) Assign edge weights: We assign a weight (called eWeight) to each edge (except connector edges). The eWeight of an edge is defined as the number of paths covering the edge. The assignment starts from some leaf node in the tree for the main integration colony and continues until all edges are assigned. Note that a node can have multiple occurrences in different branches of the same tree or in two different trees. For assignment, we use the following rules, where the term ‘current node’ refers to the node currently being visited.

a) If an outgoing edge of the current node is not assigned, then go downward in that branch until a leaf node or a weighted edge is found.

b) If the current node is connected to a child tree, then check whether all branches containing an occurrence of the current node are assigned from the leaf node up to that occurrence of the current node. If so, then follow Rule (d) for assigning the child tree; otherwise, complete the assignment in these branches using Rules (a)-(e).

c) If all outgoing edges of the current node are already assigned, then assign to its incoming edge a weight equal to the sum of the weights of all outgoing edges. In case the current node does not have an outgoing edge, assign the weight of the incoming edge as one.

d) For assigning a child tree, find its parent tree and their connected node, say n, and then follow these steps.
   i) Compute the sum of the weights assigned to the outgoing edges of all occurrences of the node n in the parent tree.
   ii) Assign that sum to the incoming edge of each leaf node in the child tree. The rest of the edges are assigned following Rules (a)-(e).
   iii) Find the weight W of the outgoing edge of the root node in the child tree.
   iv) Distribute W among the incoming edges of all occurrences of the node n in the parent tree in proportion to their outgoing edge weights.

e) Apply Rules (a)-(d) until all edges are weighted.
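The tree construction and the bottom-up part of the weight assignment (Rule (c), with the leaf initialisation of Rule (d)(ii)) can be sketched in Python as below. This is a simplified illustration, not the complete procedure: connector edges and the redistribution of Rule (d)(iv) are not modelled. The two MM paths stand for those of ICd; nodes that are elided in the listing of MM paths are represented by placeholder names (`x1`, `y1`, ...), and only the number of edges per path (eight and seven) and the leaf weight of four are assumed to match the Generate Show Statistics model.

```python
def build_tree(mm_paths):
    """Merge MM paths that share a root into a nested-dict tree (a trie)."""
    tree = {}
    for path in mm_paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def assign_weights(subtree, leaf_weight=1, prefix=()):
    """Weight each edge by the sum of the weights leaving its target node.

    An edge is identified by the tuple of node names from the root down to
    its target, so repeated node names in different branches stay distinct.
    Returns ({edge: weight}, total weight of the edges at this level).
    """
    weights, total = {}, 0
    for name, children in subtree.items():
        here = prefix + (name,)
        if children:
            below, w = assign_weights(children, leaf_weight, here)
            weights.update(below)
        else:
            w = leaf_weight             # incoming edge of a leaf node
        if prefix:                      # the root has no incoming edge
            weights[here] = w
        total += w
    return weights, total


def path_weights(path, weights):
    """eWeights of the edges along one MM path of the tree."""
    return [weights[tuple(path[:i + 1])] for i in range(1, len(path))]


M_D1 = ["D11", "CS_loop3", "D12", "CS_opt5", "D13", "x1", "x2", "x3", "D14"]
M_D2 = ["D11", "CS_loop3", "D12", "CS_opt5", "CE_opt5", "y1", "y2", "D14"]

tree = build_tree([M_D1, M_D2])
weights, root_out = assign_weights(tree, leaf_weight=4)
print(root_out, path_weights(M_D1, weights), path_weights(M_D2, weights))
```

With a leaf weight of four, the shared prefix of the two paths receives weight eight and each branch receives weight four, so the incoming edge of the opt5 start node carries the sum of its two outgoing edge weights.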

Example 5.8 We explain the construction of the MM coverage model for the SIG of Generate Show Statistics in the following. Initially, we merge the MM paths of the four integration colonies ICa, ICb, ICc, and ICd of Generate Show Statistics into four trees Ta, Tb, Tc and Td, respectively (in Fig. 5.8, also see the labels on their root nodes). We mark the MM paths (see Table 5.2) by the labels at the end nodes of each tree (see Fig. 5.8).

In Fig. 5.8, we observe that the non-root node D2 in Ta is the same as the root node in Tb. Thus, we add a connector edge between these two occurrences of D2 in Ta and Tb. This connector edge signifies that four MM paths in ICa, namely $M_a^1$, $M_a^3$, $M_a^4$ and $M_a^7$, can call an MM path in ICb. Similarly, we find the other connector edges as shown in Fig. 5.8.

We start edge assignment from a leaf node of Ta, say D16 in the tree branch $M_a^1$, and continue this up to the connector node D11. At this point, we need to complete the assignment in the other tree branches (i.e. $M_a^2$, $M_a^3$, $M_a^5$) which have an occurrence of D11. Once the edges from leaf nodes up to all occurrences of D11 are assigned, we start assigning the tree Td (which is connected to Ta). For this, we initialize the incoming edge weight of each leaf node in Td as four, which is the sum of the weights assigned to the outgoing edges of the four occurrences of D11 in Ta. Following the above rules, we assign the weights to the edges in Td as shown in Fig. 5.8. In this regard, we may note that the incoming edge weight of $C^S_{opt5}$ is set as eight because it is the sum of two outgoing edge weights.

Once the assignment in Td is complete, we resume the assignment in the tree Ta. Prior to this, we find the weight (i.e. eight) assigned to the outgoing edge of the root node in Td and then assign equal divisions of eight to the incoming edges of the four occurrences of D11 in Ta. In effect, the incoming edge of each occurrence of D11 in Ta has the weight two. In a similar way, we find the remaining edge weights as shown in Fig. 5.8.

5.2.3 Path generation from MM coverage model

In the previous section, we have discussed how to construct MM coverage model. Now we use this MM coverage model to generate a subset of all paths so that all MM paths of the sequence diagram are effectively covered. For this, we propose two coverage criteria as given below.

• All MM Paths (AMP): Given a set of paths (P) and all MM paths of a sequence diagram, P must cover all MM paths at least once.

• Prioritized MM Paths (PMP): Given a set of paths (P) and all MM paths of a sequence diagram, P must cover all MM paths at least once and individual coverage amount of MM paths has to be according to their priorities, which can be the number of messages covered, number of possible object-states for message sending etc.

Determining the minimum limit for the number of paths

To satisfy our proposed coverage criteria, the number of generated paths must be greater than or equal to a minimum limit (MIN Limit). To determine this limit, we need to consider the coverage of all MM paths at least once. Since the invocation of any MM path is ultimately rooted at the main integration colony (MIC), all MM paths of the MIC (say η in total) have to be included at least once in the generated paths. Therefore, MIN Limit is greater than or equal to η. Two situations can arise.

• For MIN Limit = η: In this case, the number of MM paths that can invoke the MM paths of an integration colony is greater than or equal to the total number of MM paths in that integration colony. As the invocation of all MM paths is rooted at the main integration colony (MIC), coverage of all MM paths (η in total) of the MIC would in this case imply the coverage of all MM paths in the sequence diagram. Note that one path can include only one MM path of the MIC. Therefore, the minimum limit for the number of paths is the total number of MM paths (η) of the MIC.

• For MIN Limit > η: In this situation, the number of MM paths that can invoke the MM paths of some integration colony is less than the total number of MM paths in that integration colony. Therefore, additional coverage of a few MM paths in the MIC is necessary to invoke the remaining MM paths. We first consider one additional coverage of the MIC's MM path that includes the maximum number of remaining MM paths, and continue this until all remaining MM paths are covered. Let η′ be the number of MM paths in the MIC whose additional coverage has been considered to include the remaining MM paths. The value of the minimum limit (MIN Limit) is then η + η′.

Figure 5.8: MM coverage model of Generate Show Statistics.

Path generation satisfying the proposed coverage criteria

From the MM coverage model, we find the length and density of the common subpaths of MM paths. This information helps us to prioritize the MM paths and hence decide their coverage values. Based on the coverage of MM paths, we concatenate them to generate a subset of all paths. All these steps are described in the following.

1) Prioritizing MM paths: For prioritization of MM paths, we consider two aspects of their common subpaths: length of overlap and density (number of paths covering a model element) [24]. To measure this, we propose a metric called AWPL (Average Weighted Path Length). For an MM path $M_i$, AWPL is defined as

$$AWPL(M_i) = \frac{e_1 + e_2 + \cdots + e_l}{l}$$

where $e_1, e_2, \cdots, e_l$ represent the eWeights (number of paths covering an edge) of the $l$ edges of $M_i$ (see the construction of the MM coverage model in Section 5.2.2 of this chapter). Note that a higher AWPL value for an MM path implies that the MM path has a common subpath with higher length and density (the number of paths covering the subpath). Its significance is that even a single fault in an MM path with a higher AWPL value would cause a larger number of failures during execution, and hence such MM paths should be covered more. We compute AWPL for all MM paths of each integration colony and then find their ranks in descending order of AWPL values.
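As a small worked instance of the AWPL metric, the Python sketch below computes AWPL from a list of eWeights and ranks the MM paths of one integration colony. The weight lists are illustrative assumptions: they are chosen so that their totals and lengths give 44/8 and 40/7, the values reported for the two MM paths of ICd in the Generate Show Statistics example.

```python
def awpl(e_weights):
    """Average Weighted Path Length: mean eWeight over the edges of an MM path."""
    return sum(e_weights) / len(e_weights)


def rank_by_awpl(colony):
    """Rank the MM paths of one integration colony, highest AWPL first."""
    ordered = sorted(colony, key=lambda m: awpl(colony[m]), reverse=True)
    return {m: rank for rank, m in enumerate(ordered, start=1)}


# Assumed eWeights for the two MM paths of ICd (8 and 7 edges).
colony_d = {
    "Md1": [8, 8, 8, 4, 4, 4, 4, 4],
    "Md2": [8, 8, 8, 4, 4, 4, 4],
}
for name, e_weights in colony_d.items():
    print(name, round(awpl(e_weights), 2))
ranks = rank_by_awpl(colony_d)
print(ranks)
```

The shorter path ranks first here because the heavily shared prefix makes up a larger share of its edges, which is the overlap-and-density effect the metric is meant to capture.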

2) Determining coverage of MM paths: We use the ranks of MM paths to determine their coverage. First, we need to find the number of paths (say, $G(M_i)$) that are to include an MM path $M_i$. The value of $G(M_i)$ is computed depending on whether $M_i$ belongs to the main integration colony or to another integration colony, as follows.

• For $M_i$ in the main integration colony

We initially set $G(M_i)$ to one for all MM paths (say, η in total) to ensure that each of them is covered by at least one path. With this initial assignment, the number of remaining paths that still need to be considered is $N = N - \eta$, where $N$ initially denotes the number of paths to be generated. We then sequentially find the additional coverage of MM paths in decreasing order of AWPL values as follows.

$$G(M_i) = \begin{cases} Max(M_i) & \text{if } N \ge Max(M_i) - G(M_i) \\ G(M_i) + N & \text{if } N < Max(M_i) - G(M_i) \end{cases}$$

Each time $G(M_i)$ is updated, the number of remaining paths $N$ has to be adjusted by subtracting the additional coverage just granted to $M_i$, that is, the increase in $G(M_i)$.

• For $M_i$ in another integration colony

Let $G_{Total}$ be the sum of $G(M_i)$ of all MM paths that call an MM path $M_k$ of some integration colony (IC). Then $G(M_k)$ is computed as follows:

$$G(M_k) = G_{Total} \times \frac{AWPL(M_k)}{\text{Total AWPL}} \qquad (5.1)$$

where Total AWPL refers to the sum of the AWPL values of all MM paths in IC.

If $G(M_k)$ is not an integer, we round it up to the next higher integer when its fractional part is greater than or equal to 0.5; otherwise, we round it down to the next lower integer.
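The two allocation rules can be exercised with a short Python sketch. It is illustrative only. The AWPL fractions are the ones reported for Generate Show Statistics; the `max_cov` values (standing for Max(M_i), taken here as the number of distinct paths that can contain M_i) and the list of callers of ICc are assumptions inferred from which opt fragments each MM path executes. The remaining count is reduced by the number of additional paths actually granted, which is what the ten-path example requires.

```python
import math


def allocate_mic(n_paths, awpl, max_cov):
    """G(M_i) for the main integration colony: one path each, then extras
    in decreasing AWPL order, capped by max_cov."""
    g = {m: 1 for m in awpl}
    remaining = n_paths - len(g)
    if remaining < 0:
        raise ValueError("fewer paths requested than MM paths in the MIC")
    for m in sorted(awpl, key=awpl.get, reverse=True):
        extra = min(remaining, max_cov[m] - g[m])
        g[m] += extra
        remaining -= extra
    return g


def allocate_called(g_total, awpl):
    """Eq. (5.1) with round-half-up for the MM paths of a called colony."""
    total = sum(awpl.values())
    return {m: math.floor(g_total * a / total + 0.5) for m, a in awpl.items()}


awpl_a = {"Ma1": 74 / 13, "Ma2": 56 / 11, "Ma3": 65 / 11, "Ma4": 71 / 11,
          "Ma5": 47 / 9, "Ma6": 53 / 9, "Ma7": 62 / 9, "Ma8": 44 / 7}
max_cov = {"Ma1": 4, "Ma2": 4, "Ma3": 2, "Ma4": 2,   # assumed Max(M_i) values
           "Ma5": 2, "Ma6": 2, "Ma7": 1, "Ma8": 1}

g_a = allocate_mic(10, awpl_a, max_cov)
callers_of_c = ["Ma1", "Ma2", "Ma4", "Ma6"]          # assumed callers of ICc
g_c = allocate_called(sum(g_a[m] for m in callers_of_c),
                      {"Mc1": 66 / 8, "Mc2": 60 / 7})
print(g_a)
print(g_c)
```

`math.floor(x + 0.5)` is used instead of the built-in `round`, because `round` applies round-half-to-even and would not match the stated rule at exact halves.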

3) Concatenating MM paths: In the previous step, we decided the coverage of MM paths based on their ranks. In this step, we generate paths by concatenating MM paths as follows.

i) Concatenation starts from the main integration colony, from which we select the highest priority MM path $M_i$ with a non-zero $G(M_i)$ value.

ii) We identify the integration colony whose MM paths are called by $M_i$ and then select the highest priority MM path (with a non-zero $G(M_i)$ value) from that integration colony. This is repeated until the selected MM path does not call another MM path.

iii) Check whether the combination of the MM paths selected in the preceding step duplicates some previously generated path. If so, replace some of the selected MM path(s) with the MM path(s) of next highest priority.

iv) Concatenate the selected MM paths by replacing their call nodes with the respective called MM paths. For example, suppose an MM path a → b → c has a call node b from which another MM path b → 1 → 2 is invoked. After concatenating them, we generate the path a → b → 1 → 2 → c. Once a path is generated, the $G(M_i)$ values of all constituent MM paths are decremented by one.

We may note that the generated paths would include all MM paths at least once and that their coverage would be as per their priorities. For the generated paths to satisfy All MM Paths, the above procedure can be used with a prioritization metric called mWeight (representing the number of messages covered by an MM path) instead of AWPL. The motivation for choosing the prioritization metric mWeight lies in the fact that an MM path with a higher mWeight is likely to call a larger number of MM paths, implying more coverage of distinct MM pairs.
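The concatenation step, in which a call node is replaced by the MM path it invokes, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: MM paths are node lists, `called` maps each call node to the one MM path selected for it, every called MM path starts with its own call node, and no MM path invokes itself.

```python
def concatenate(mm_path, called):
    """Replace each call node of mm_path with the MM path it invokes.

    called maps a call node to the selected MM path rooted at that node.
    """
    result = [mm_path[0]]               # the root is the path's own message node
    for node in mm_path[1:]:
        if node in called:
            result.extend(concatenate(called[node], called))
        else:
            result.append(node)
    return result


path = concatenate(["a", "b", "c"], {"b": ["b", "1", "2"]})
print(" -> ".join(path))

# A nested case: the MM path invoked from b calls a further MM path at node 1.
nested = concatenate(["a", "b", "c"], {"b": ["b", "1", "2"], "1": ["1", "x"]})
print(" -> ".join(nested))
```

The first call reproduces the a → b → 1 → 2 → c example; the second shows that the same replacement applies recursively when a called MM path itself contains a call node.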

Example 5.9 We illustrate our technique for generating a subset of all paths with reference to the Generate Show Statistics sequence diagram. In this case, the number of paths to be generated must be at least eight (the minimum limit) to satisfy our proposed coverage criteria. Let us consider the generation of ten paths satisfying Prioritized MM Paths coverage.

First, we compute the AWPL values of all MM paths using the four trees Ta, Tb, Tc, Td of the MM coverage model for Generate Show Statistics (Fig. 5.8) and find their ranks as shown in Table 5.3. Out of ten paths, we initially assign eight paths to cover each MM path (i.e. $M_a^1, \cdots, M_a^8$) of the main integration colony at least once. We then check the possibility of additional coverage of these eight MM paths in decreasing order of their AWPL values. Note that $M_a^4$ and $M_a^3$ are the two highest AWPL valued MM paths that can still be covered once more, and the remaining two paths are assigned to them. Although $M_a^7$ and $M_a^8$ also rank highly in Table 5.3, each of them can occur in only one distinct path, since $M_a^7$ calls only $M_b^1$ and $M_a^8$ calls no other MM path. Considering this, $G(M_i)$ for $M_a^3$ and $M_a^4$ would have the value two, whereas the other six MM paths of ICa would have the value one each.

Table 5.3: Ranks of all MM paths (see Table 5.2) for each integration colony.

MM path    AWPL            Rank
$M_a^1$    74/13 = 5.69    6
$M_a^2$    56/11 = 5.09    8
$M_a^3$    65/11 = 5.90    4
$M_a^4$    71/11 = 6.45    2
$M_a^5$    47/9 = 5.22     7
$M_a^6$    53/9 = 5.88     5
$M_a^7$    62/9 = 6.88     1
$M_a^8$    44/7 = 6.28     3
$M_b^1$    45/5 = 9.00     1
$M_c^1$    66/8 = 8.25     2
$M_c^2$    60/7 = 8.57     1
$M_d^1$    44/8 = 5.50     2
$M_d^2$    40/7 = 5.71     1

In Fig. 5.6, we observe that a total of six paths containing any of $M_a^4$, $M_a^7$, $M_a^1$, $M_a^3$ can call $M_b^1$, and therefore $G(M_b^1)$ would have the value six. Similarly, applying Eq. 5.1, we obtain $G(M_c^1) = 2.45 \approx 2$, $G(M_c^2) = 2.54 \approx 3$, $G(M_d^1) = 2.45 \approx 2$ and $G(M_d^2) = 2.54 \approx 3$.

Based on the coverage of MM paths, we concatenate them and find ten paths ($P_1, \cdots, P_{10}$), which are presented in Table 5.4. While generating the path $P_2$, we select the higher priority MM path $M_c^2$, whereas for the path $P_3$, we select the lower priority MM path $M_c^1$ in order to avoid generating a duplicate path. We may note that these ten paths include all thirteen MM paths of Generate Show Statistics at least once and give the maximum coverage (i.e. two) to the two higher priority MM paths $M_a^4$ and $M_a^3$.

Table 5.4: Ten paths generated from MM coverage model.

Path        Constituent MM paths
$P_1$       $M_a^7 + M_b^1$
$P_2$       $M_a^4 + M_b^1 + M_c^2$
$P_3$       $M_a^4 + M_b^1 + M_c^1$
$P_4$       $M_a^8$
$P_5$       $M_a^3 + M_b^1 + M_d^2$
$P_6$       $M_a^3 + M_b^1 + M_d^1$
$P_7$       $M_a^6 + M_c^2$
$P_8$       $M_a^1 + M_b^1 + M_c^2 + M_d^2$
$P_9$       $M_a^5 + M_d^2$
$P_{10}$    $M_a^2 + M_c^1 + M_d^1$
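The claims made about the ten generated paths can be checked with a few lines of Python. The sketch below is illustrative: it encodes each generated path as the list of its constituent MM paths (with short names such as `Ma7` standing for the seventh MM path of ICa), counts how often each MM path is covered, and tests the All MM Paths criterion.

```python
from collections import Counter

# Constituent MM paths of the ten generated paths.
PATHS = {
    "P1": ["Ma7", "Mb1"],
    "P2": ["Ma4", "Mb1", "Mc2"],
    "P3": ["Ma4", "Mb1", "Mc1"],
    "P4": ["Ma8"],
    "P5": ["Ma3", "Mb1", "Md2"],
    "P6": ["Ma3", "Mb1", "Md1"],
    "P7": ["Ma6", "Mc2"],
    "P8": ["Ma1", "Mb1", "Mc2", "Md2"],
    "P9": ["Ma5", "Md2"],
    "P10": ["Ma2", "Mc1", "Md1"],
}
ALL_MM = {f"Ma{i}" for i in range(1, 9)} | {"Mb1", "Mc1", "Mc2", "Md1", "Md2"}


def coverage(paths):
    """Number of generated paths that contain each MM path."""
    return Counter(mm for parts in paths.values() for mm in parts)


def satisfies_amp(paths, all_mm):
    """All MM Paths: every MM path is covered at least once."""
    return all_mm <= set(coverage(paths))


counts = coverage(PATHS)
distinct = len({tuple(parts) for parts in PATHS.values()})
print(satisfies_amp(PATHS, ALL_MM), distinct)
print(dict(counts))
```

The counts agree with the coverage values decided earlier in the example: the two prioritized MM paths of the main integration colony are covered twice, the single MM path of ICb six times, and the MM paths of ICc and ICd two or three times each.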

5.3 Experimental results and analysis

We have carried out our experiments with the following objectives.

i) Investigate whether MM coverage model and control flow graph can be used to select a subset of message paths in sequence diagram satisfying All MM Paths and Prioritized MM Paths coverage.

ii) Compare computation overhead for identifying MM paths using SIG and control flow graph.

5.3.1 Subject design

Since our approach aims to determine the effective coverage of MM paths in design-level UML sequence diagrams, the availability of UML design specifications is necessary to validate it. We have designed two systems for the purpose: Restaurant Automation System (RAS) and Auditorium Management System (AMS) [75]. RAS consists of twenty-one classes and its code size is 4.6 KLOC, whereas AMS has eighteen classes and its code size is 3.8 KLOC. The functionalities of RAS have been discussed in Section 4.2.1 of Chapter 4. AMS, on the other hand, automates various functionalities of an auditorium such as book ticket, cancel ticket, manage show, generate show statistics, etc. A few important design artifacts of RAS and AMS are given in Appendixes A and B, respectively.

Following the standard design principles [13], we have prepared their design specifications, which include one use case diagram per system, a use case description per use case, one class diagram per system, one activity diagram and one sequence diagram per use case, and statechart diagrams for a few domain classes. Since our approach uses only sequence diagrams, we focus on them only.

We have discussed the design specifications of RAS in Section 4.2.1 of Chapter 4. Now, we describe the same for AMS. The design specification of AMS includes six sequence diagrams for the following use cases: Book Ticket, Cancel Ticket, Compute Sale Commission, Pay Commission, Manage Show, Generate Show Statistics. These sequence diagrams contain three types of objects: (1) controller objects: TicketController, CommissionController, ShowController, StatisticsController; (2) boundary objects: TicketBoundary, CommissionBoundary, ShowBoundary, StatisticsBoundary, Screen; (3) entity objects: IndexSeatRegister, SeatRegister, TicketRegister, Ticket, FundRegister, Fund, Refund, CommissionRegister, Commission, ShowRegister, Show. The characteristics of the six sequence diagrams of AMS are given in Table 5.5.

Table 5.5: The characteristics of six sequence diagrams of AMS.

Use case                    #Messages    #Fragments    #Objects
Book Ticket                 20           4             11
Cancel Ticket               22           4             11
Compute Sale Commission     18           5             9
Pay Commission              12           5             6
Manage Show                 19           9             6
Generate Show Statistics    16           8             7

To show the effectiveness of our proposed approach, we select those sequence diagrams which have the following characteristics: (1) at least three integration colonies; (2) a considerable number of fragments and message paths; (3) dissimilar design structure. Examining all sequence diagrams of RAS and AMS with respect to the above criteria, we have chosen three sequence diagrams, for the use cases Deliver Order (RAS), Generate Show Statistics (AMS), and Manage Show (AMS). The characteristics of the subject designs are shown in Table 5.6.

We now briefly describe the functionalities of the Deliver Order, Generate Show Statistics, and Manage Show use cases.

• Deliver Order: This use case checks the validity of a given order and the availability of the items specified in the order. If all required items of the order are available in sufficient quantities, then they are reserved. When an order is delivered, the quantities of all its items are updated, following which the order is registered for billing.

• Generate Show Statistics: This use case generates various statistical information such as the number of movie tickets sold in a month, the total sale for a movie, etc.

• Manage Show: This use case manages various tasks such as add show, update show, cancel show, etc.

Table 5.6: Characteristics of subject designs.

| Sequence diagram | #IC¹ | #Fragment | #MM paths | MIN Limit |
|---|---|---|---|---|
| Deliver Order | 3 | 13 | 17 | 13 |
| Generate Show Statistics | 4 | 8 | 13 | 8 |
| Manage Show | 3 | 9 | 22 | 18 |

¹ Integration Colonies

5.3.2 Compare MM coverage model with CFG

We now compare the MM coverage model with the control flow graph to determine whether each can be used to select a subset of all paths satisfying our two proposed coverage criteria. For the comparison, we use the control flow graph of a sequence diagram because it has been used as the test coverage model in the existing approaches [18, 19, 21, 20, 22].

All MM Paths coverage

Evaluation metric: We count the number of distinct MM paths covered by the paths generated from the MM coverage model and from the control flow graph.

Experiment: Following our proposed approach, we generate paths from the MM coverage model of each subject sequence diagram. For this, we use the prioritization metric called mWeight (the number of messages covered by an MM path) because an MM path with higher mWeight is likely to call a larger number of MM paths, implying more coverage of distinct MM pairs. The paths are generated in several rounds, with the number of paths increasing over successive rounds. On the other hand, we transform the SIG of each sequence diagram into the corresponding control flow graph by replacing scope edges with control edges. For path generation, we first prioritize the paths of the control flow graph using the same prioritization metric mWeight and then select the paths with higher mWeight values. In this case, we select the same number of paths over successive rounds as we did for the MM coverage model. In our subsequent discussions, we refer to the paths generated from the MM coverage model and the control flow graph as M-paths and C-paths, respectively.
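The selection of C-paths described above (rank by mWeight, keep the top N, then count distinct MM paths) can be sketched as follows. This is only an illustrative sketch: the record layout, the path names, and the toy weights are assumptions made for the example and do not come from the subject designs.

```python
def select_top_paths(paths, n):
    """Return the n paths with the largest mWeight (ties keep input order)."""
    return sorted(paths, key=lambda p: p["m_weight"], reverse=True)[:n]


def distinct_mm_paths(selected):
    """Collect the distinct MM paths covered by the selected paths."""
    covered = set()
    for path in selected:
        covered.update(path["mm_paths"])
    return covered


# Hypothetical candidate paths; names, weights and MM path labels are made up.
candidate_paths = [
    {"name": "C1", "m_weight": 9, "mm_paths": {"Ma1", "Mb1"}},
    {"name": "C2", "m_weight": 8, "mm_paths": {"Ma1", "Mb1"}},
    {"name": "C3", "m_weight": 5, "mm_paths": {"Ma2", "Mb1"}},
    {"name": "C4", "m_weight": 4, "mm_paths": {"Ma3"}},
]

chosen = select_top_paths(candidate_paths, 2)
coverage = distinct_mm_paths(chosen)
print([p["name"] for p in chosen], sorted(coverage))
```

In this toy input the two heaviest paths cover the same MM paths, which illustrates how a purely weight-based selection can leave some MM paths uncovered.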

Results and discussion: We have collected the coverage data of M-paths and C-paths for the three sequence diagrams, as shown in Tables 5.7, 5.8, and 5.9. From these data, we can see the following: (1) M-paths cover all MM paths in each round, whereas C-paths do not (except for N = 29 of Deliver Order); (2) when the number of generated paths is close to the minimum limit, the coverage difference between M-paths and C-paths is large, and it decreases gradually as the number of paths increases.

Table 5.7: Coverage of MM paths for Deliver Order.

| R¹ | N² | #MM paths, M-paths (A) | #MM paths, C-paths (B) | A − B | SR³ | W⁴ |
|---|---|---|---|---|---|---|
| 1 | 13 | 17 | 9 | 8 | +8 | +36 |
| 2 | 15 | 17 | 10 | 7 | +7 | |
| 3 | 18 | 17 | 11 | 6 | +6 | |
| 4 | 20 | 17 | 12 | 5 | +5 | |
| 5 | 22 | 17 | 13 | 4 | +4 | |
| 6 | 24 | 17 | 14 | 3 | +3 | |
| 7 | 26 | 17 | 15 | 2 | +2 | |
| 8 | 28 | 17 | 16 | 1 | +1 | |
| 9 | 29 | 17 | 17 | 0 | | |

W = 36 > 34 (critical value of W at 1% significance level) [80]

¹ Round, ² Number of paths, ³ Signed rank, ⁴ Sum of signed ranks

Table 5.8: Coverage of MM paths for Generate Show Statistics.

| R | N | #MM paths, M-paths (A) | #MM paths, C-paths (B) | A − B | SR | W |
|---|---|---|---|---|---|---|
| 1 | 8 | 13 | 8 | 5 | +10 | +55 |
| 2 | 9 | 13 | 9 | 4 | +7.5 | |
| 3 | 10 | 13 | 9 | 4 | +7.5 | |
| 4 | 11 | 13 | 9 | 4 | +7.5 | |
| 5 | 12 | 13 | 9 | 4 | +7.5 | |
| 6 | 13 | 13 | 10 | 3 | +5 | |
| 7 | 14 | 13 | 11 | 2 | +3.5 | |
| 8 | 15 | 13 | 11 | 2 | +3.5 | |
| 9 | 16 | 13 | 12 | 1 | +1.5 | |
| 10 | 17 | 13 | 12 | 1 | +1.5 | |

z = 2.49 > 2.326 (critical value of z at 1% significance level) [80]

Table 5.9: Coverage of MM paths for Manage Show.

| R | N | #MM paths, M-paths (A) | #MM paths, C-paths (B) | A − B | SR | W |
|---|---|---|---|---|---|---|
| 1 | 18 | 22 | 9 | 13 | +9 | +45 |
| 2 | 22 | 22 | 10 | 12 | +8 | |
| 3 | 26 | 22 | 11 | 11 | +7 | |
| 4 | 30 | 22 | 13 | 9 | +6 | |
| 5 | 34 | 22 | 15 | 7 | +5 | |
| 6 | 38 | 22 | 16 | 6 | +4 | |
| 7 | 42 | 22 | 17 | 5 | +3 | |
| 8 | 46 | 22 | 19 | 3 | +2 | |
| 9 | 49 | 22 | 21 | 1 | +1 | |

W = 45 > 43 (critical value of W at 0.5% significance level) [80]

To verify whether the coverage difference between M-paths and C-paths is significant, we need to compare their population means. After examining the coverage information, we see that the population of M-paths has single-valued data (the total number of MM paths in the sequence diagram), whereas the population of C-paths has a range of values that increases linearly with the number of paths. This implies that their populations are not normally distributed and hence the t-test cannot be applied. In this case, we apply the non-parametric Wilcoxon Signed Rank test [80], where the null hypothesis is that the difference between their population means is zero. We reject the null hypothesis if the test statistic is found to be greater than the critical value. For this, we first determine the absolute coverage difference between the M-paths and C-paths for all rounds, and then find their signed ranks as well as the sum of signed ranks (W), as reported in Tables 5.7, 5.8, and 5.9. When the number of samples is ten or more, we further determine the parameter z (see Table 5.8). From these tables, we can observe that the value of the test statistic (z or W) in each case is greater than the critical value for the number of samples used in our experiment. This implies that C-paths cover significantly fewer MM paths than required to satisfy All MM Paths. In other words, the MM coverage model can be used to select a subset of message paths satisfying All MM Paths coverage, whereas the control flow graph cannot be used for the same.
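The sum of signed ranks W used above can be recomputed from the A − B columns of Tables 5.7, 5.8, and 5.9. The sketch below is a plain-Python illustration under the usual conventions of the test (zero differences are dropped and tied absolute differences share the average rank); the function name is hypothetical and the code does not attempt to reproduce the z value.

```python
def signed_rank_sum(differences):
    """Sum of signed ranks (W) for paired differences.

    Zero differences are discarded; tied absolute values share the
    average of the ranks they occupy.
    """
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j (1-based) are tied; use their average rank.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return sum(rank_of[abs(d)] if d > 0 else -rank_of[abs(d)] for d in nonzero)


# A - B columns copied from Tables 5.7, 5.8 and 5.9.
deliver_order = [8, 7, 6, 5, 4, 3, 2, 1, 0]
generate_show_statistics = [5, 4, 4, 4, 4, 3, 2, 2, 1, 1]
manage_show = [13, 12, 11, 9, 7, 6, 5, 3, 1]

for diffs in (deliver_order, generate_show_statistics, manage_show):
    print(signed_rank_sum(diffs))
```

The three results agree with the W values reported in the tables (36, 55, and 45).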

Prioritized MM Paths coverage

Evaluation metric: We count the number of prioritized MM paths covered by M-paths (generated from the MM coverage model) and C-paths (generated from the control flow graph). Further, we need to determine how much of the critical parts is covered by M-paths and C-paths, which depends on the relative criticalness of their constituent MM paths. To quantify this, we define a term called priority score (PS). The priority score of an MM path is computed depending on whether it belongs to the main integration colony or to some other integration colony, as follows.

For main integration colony: The priority score (PS) of an MM path $M_i$ is defined as the ratio of the AWPL of $M_i$ to the total AWPL of all $n$ MM paths in the main integration colony (MIC), that is,

$$PS(M_i) = \frac{AWPL(M_i)}{\sum_{j=1}^{n} AWPL(M_j)} \tag{5.2}$$

For other integration colony: An MM path of an integration colony (IC) is executed only when it is called by some MM path in the main integration colony (MIC) or in another integration colony. Therefore, the priority score of an MM path, say $M_i$ of IC, depends on the average (let it be $PS_{avg}$) of the priority scores of those MM paths which call $M_i$. Note that $PS_{avg}$ is the same average priority score for every MM path belonging to IC. With this consideration, the priority score (PS) of an MM path $M_i$ is defined as the product of $PS_{avg}$ and the ratio of the AWPL of $M_i$ to the total AWPL of all $n$ MM paths belonging to IC, that is,

$$PS(M_i) = PS_{avg} \times \frac{AWPL(M_i)}{\sum_{j=1}^{n} AWPL(M_j)} \tag{5.3}$$

Example 5.10

Let us compute the average priority score ($PS_{avg}$) for $IC_b$ (see Fig. 5.6 and Table 5.2). In this case, we need to consider the priority scores (i.e. 0.12, 0.124, 0.136, 0.145) of the four MM paths $M_a^1$, $M_a^3$, $M_a^4$, $M_a^7$, since they call an MM path in $IC_b$ (see Fig. 5.6). Therefore, the $PS_{avg}$ for $IC_b$ would be

$$PS_{avg} = \frac{0.12 + 0.124 + 0.136 + 0.145}{4} = 0.131$$
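The two priority score definitions (Eqs. 5.2 and 5.3) can be sketched in a few lines. The function names and the AWPL values below are hypothetical and chosen only for illustration; the four caller scores are those quoted in Example 5.10.

```python
def priority_scores_main(awpl):
    """Eq. 5.2: AWPL of each MM path divided by the colony's total AWPL."""
    total = sum(awpl.values())
    return {name: value / total for name, value in awpl.items()}


def priority_scores_other(awpl, caller_scores):
    """Eq. 5.3: PS_avg of the calling MM paths times the AWPL share."""
    ps_avg = sum(caller_scores) / len(caller_scores)
    total = sum(awpl.values())
    return {name: ps_avg * value / total for name, value in awpl.items()}


# Priority scores of the four calling MM paths from Example 5.10.
caller_scores = [0.12, 0.124, 0.136, 0.145]
ps_avg = sum(caller_scores) / len(caller_scores)

# Hypothetical AWPL values for two MM paths of a non-main colony.
icb_awpl = {"Mb1": 3.0, "Mb2": 1.0}
icb_scores = priority_scores_other(icb_awpl, caller_scores)

# Hypothetical AWPL values for a main colony with three MM paths.
mic_scores = priority_scores_main({"Ma1": 2.0, "Ma2": 1.0, "Ma3": 1.0})
print(round(ps_avg, 3), icb_scores, mic_scores)
```

By construction, the scores of the main colony sum to one, and the scores of any other colony sum to its $PS_{avg}$.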

Experiment: Following our proposed approach, we generate M-paths from the MM coverage model obtained for each sequence diagram. On the other hand, C-paths are generated from the control flow graphs by first prioritizing their paths based on AWPL values and then selecting the paths with higher AWPL values. In both cases, the same number of paths is generated in each of several rounds.

Results and discussion: We have collected the coverage data for M-paths and C-paths of the three sequence diagrams, as shown in Tables 5.10, 5.11, and 5.12. From these coverage data, we can see that M-paths cover all MM paths in each round for Deliver Order (17 MM paths in total), Generate Show Statistics (13 MM paths in total), and Manage Show (22 MM paths in total). On the other hand, C-paths do not cover all MM paths in each case, except in the one round for which the number of paths is close to the maximum limit. As in our previous experiment, we apply the non-parametric Wilcoxon Signed Rank test [80] to verify whether the difference between the population means of M-paths and C-paths is significant. From Tables 5.10, 5.11, and 5.12, we can observe that the test statistic is greater than the critical value for the respective sequence diagrams. This implies that, even when the same prioritization metric is used, C-paths cover significantly fewer prioritized MM paths than required to satisfy Prioritized MM Paths.

Table 5.10: Prioritized MM path coverage of Deliver Order.

| N | M-paths: #Pri. MM paths¹ (A) | M-paths: PS² (B) | C-paths: #Pri. MM paths (C) | C-paths: PS (D) | Signed rank of d = (A − C) | Signed rank of d′ = (B − D) | W(d) | W(d′) |
|---|---|---|---|---|---|---|---|---|
| 13 | 17 | 1.541 | 9 | 0.867 | +7 | +7 | +28 | +28 |
| 15 | 17 | 1.814 | 10 | 1.254 | +6 | +6 | | |
| 18 | 17 | 2.176 | 11 | 1.729 | +5 | +5 | | |
| 20 | 17 | 2.408 | 12 | 2.071 | +4 | +4 | | |
| 22 | 17 | 2.635 | 13 | 2.395 | +3 | +3 | | |
| 24 | 17 | 2.842 | 16 | 2.723 | +1.5 | +1.5 | | |
| 26 | 17 | 3.084 | 16 | 2.965 | +1.5 | +1.5 | | |
| 28 | 17 | 3.324 | 17 | 3.324 | 0 | 0 | | |

W(d) = W(d′) = 28 > 24 (critical value of W(d) at 2.5% significance level) [80]

¹ Number of prioritized MM paths, ² Priority score

Table 5.11: Prioritized MM path coverage of Generate Show Statistics.

| N | M-paths: #Pri. MM paths (A) | M-paths: PS (B) | C-paths: #Pri. MM paths (C) | C-paths: PS (D) | Signed rank of d = (A − C) | Signed rank of d′ = (B − D) | W(d) | W(d′) |
|---|---|---|---|---|---|---|---|---|
| 8 | 13 | 1.99 | 9 | 1.159 | +8.5 | +8 | +45 | +44 |
| 9 | 13 | 2.318 | 9 | 1.485 | +8.5 | +9 | | |
| 10 | 13 | 2.631 | 10 | 1.981 | +6.5 | +7 | | |
| 11 | 13 | 2.814 | 10 | 2.349 | +6.5 | +6 | | |
| 12 | 13 | 3.182 | 11 | 2.849 | +5 | +5 | | |
| 13 | 13 | 3.55 | 12 | 3.441 | +2.5 | +2 | | |
| 14 | 13 | 3.918 | 12 | 3.752 | +2.5 | +4 | | |
| 15 | 13 | 4.086 | 12 | 3.976 | +2.5 | +3 | | |
| 16 | 13 | 4.308 | 12 | 4.2 | +2.5 | +1 | | |
| 17 | 13 | 4.534 | 13 | 4.59 | 0 | -1 | | |

z(d) = 2.03 > 1.96 (critical value of z(d) at 2.5% significance level) [80]
z(d′) = 1.99 > 1.96 (critical value of z(d′) at 2.5% significance level) [80]

Table 5.12: Prioritized MM path coverage of Manage Show.

| N | M-paths: #Pri. MM paths (A) | M-paths: PS (B) | C-paths: #Pri. MM paths (C) | C-paths: PS (D) | Signed rank of d = (A − C) | Signed rank of d′ = (B − D) | W(d) | W(d′) |
|---|---|---|---|---|---|---|---|---|
| 18 | 22 | 1.591 | 10 | 0.752 | +8 | +8 | +36 | +36 − 1 = +35 |
| 22 | 22 | 1.946 | 12 | 1.212 | +7 | +7 | | |
| 26 | 22 | 2.324 | 14 | 1.73 | +6 | +6 | | |
| 30 | 22 | 2.748 | 15 | 2.244 | +5 | +5 | | |
| 34 | 22 | 3.164 | 16 | 2.751 | +4 | +4 | | |
| 38 | 22 | 3.535 | 17 | 3.124 | +3 | +3 | | |
| 42 | 22 | 3.911 | 18 | 3.672 | +2 | +2 | | |
| 46 | 22 | 4.252 | 20 | 4.183 | +1 | +1 | | |
| 49 | 22 | 4.525 | 22 | 4.551 | 0 | -1 | | |

W(d) = 36 > 34 (critical value of W(d) at 1% significance level) [80]
W(d′) = 35 > 29 (critical value of W(d′) at 5% significance level) [80]

To examine to what extent M-paths and C-paths cover critical parts, we compute their total priority scores. Note that the priority score of an individual M-path or C-path can be obtained by summing up the priority scores of its constituent MM paths. Unlike M-paths, C-paths do not cover all MM paths of the sequence diagrams. A few C-paths are seen to cover again an MM path of the main integration colony (MIC) that is already covered, while some MM paths in MIC are not covered even once. Such C-paths should be excluded from the computation of the total priority score, because each MM path in the main integration colony of a sequence diagram should be covered by at least one C-path to satisfy Prioritized MM Paths coverage. To illustrate this situation, let us consider the sample of eight C-paths generated for Generate Show Statistics, as shown in Table 5.13.

Table 5.13: Eight C-paths (P1 ··· P8) for Generate Show Statistics.

| C-path | Constituent MM paths | PS |
|---|---|---|
| P1 | $M_a^4 + M_b^1 + M_c^2$ | 0.328 |
| P2 | $M_a^7 + M_b^1$ | 0.276 |
| P3 | $M_a^4 + M_b^1 + M_c^1$ | 0 |
| P4 | $M_a^6 + M_c^2$ | 0.185 |
| P5 | $M_a^6 + M_c^1$ | 0 |
| P6 | $M_a^1 + M_b^1 + M_c^2 + M_d^2$ | 0.37 |
| P7 | $M_a^1 + M_b^1 + M_c^1 + M_d^2$ | 0 |
| P8 | $M_a^1 + M_b^1 + M_c^2 + M_d^1$ | 0 |

Total MM paths = 9; Total Priority Score = 1.159

From Table 5.13, we can observe that $M_a^2$, $M_a^3$, $M_a^5$, $M_a^8$ are not covered at all. To identify the C-paths because of which these four MM paths could not be covered, we examine the eight C-paths in increasing order of AWPL values (i.e. P8, ···, P1). We find that P3, P5, P7, P8 are the paths which cover $M_a^1$, $M_a^4$, $M_a^6$ additionally without providing even a single coverage of $M_a^2$, $M_a^3$, $M_a^5$, $M_a^8$. That is why P3, P5, P7, P8 are excluded from the total priority score calculation. In other words, the priority scores of P3, P5, P7, P8 are zero.

Following the above-mentioned approach, we compute the total priority scores of M-paths and C-paths for all rounds, as shown in Tables 5.10, 5.11, and 5.12. Since their coverage data are not normally distributed, we apply the Wilcoxon Signed Rank test [80] and find that the test statistic is greater than the critical value for the individual sequence diagrams. This test shows that M-paths cover critical MM paths significantly more times than C-paths, which substantiates the effectiveness of the MM coverage model in satisfying Prioritized MM Paths.
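The bookkeeping behind Table 5.13 can be checked with a short sketch. The MM path labels below follow one reading of the table's superscripts, the main colony is assumed to contain eight MM paths labelled Ma1 to Ma8, and the per-path priority scores are simply the values listed in the table (with the excluded paths already at zero); none of the variable names come from the thesis.

```python
# C-path -> (constituent MM paths, priority score as listed in Table 5.13).
c_paths = {
    "P1": ({"Ma4", "Mb1", "Mc2"}, 0.328),
    "P2": ({"Ma7", "Mb1"}, 0.276),
    "P3": ({"Ma4", "Mb1", "Mc1"}, 0.0),
    "P4": ({"Ma6", "Mc2"}, 0.185),
    "P5": ({"Ma6", "Mc1"}, 0.0),
    "P6": ({"Ma1", "Mb1", "Mc2", "Md2"}, 0.37),
    "P7": ({"Ma1", "Mb1", "Mc1", "Md2"}, 0.0),
    "P8": ({"Ma1", "Mb1", "Mc2", "Md1"}, 0.0),
}

# Assumption: the main integration colony holds MM paths Ma1 .. Ma8.
main_colony = {f"Ma{k}" for k in range(1, 9)}

covered = set().union(*(mm for mm, _ in c_paths.values()))
uncovered_main = sorted(main_colony - covered)
total_ps = sum(ps for _, ps in c_paths.values())

print(len(covered), uncovered_main, round(total_ps, 3))
```

Under this reading, nine distinct MM paths are covered, four MM paths of the main colony remain uncovered, and the total matches the priority score reported for N = 8 in Table 5.11.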

5.3.3 Comparing computation overhead with SIG and CFG

In our proposed approach, we have identified MM paths from the SIG, where the starts and ends of the MM paths are explicitly marked by means of scope edges, whereas they are implicit in the control flow graph (CFG). We now investigate whether the computation overhead for identifying MM paths can be reduced by using the SIG instead of the CFG. By the term ‘computation overhead’, we mean the total number of operations to be performed for identifying MM paths.

Determining computation overhead

The computation overhead for SIG is incurred by two tasks: (1) identification of integration colonies from the SIG, and (2) generation of MM paths from those integration colonies. On the other hand, the overhead for CFG is due to three tasks: (1) generation of paths from the control flow graph, (2) identification of MM paths from those paths, and (3) discarding duplicate MM paths, if any. While considering the overhead of the individual tasks, we find that the path generation overhead is approximately the same for both SIG and CFG. This is because the total number of nodes and edges that need to be processed for generating paths, both from all integration colonies in the SIG and from the CFG, is almost the same (it differs only in the number of scope edges). With this consideration, the path generation overhead may be excluded from our analysis. We use the following notation in our analysis.

Notation

$\Gamma_{SIG}$ : Total computation overhead with SIG.

$\Gamma_{CFG}$ : Total computation overhead with CFG.

$N$ : Total number of basic paths in CFG/SIG.

$\Gamma_i$ : Total computation overhead for a basic path $P_i$, where $1 \le i \le N$.

$\mu_i$ : Number of MM paths contained in $P_i$.

$\lambda_i$ : Number of nodes of $P_i$.

Computation overhead for CFG

Since method scope information is implicit in the generated paths of the CFG, we need to verify whether each node in a basic path of the CFG corresponds to the start or end of some constituent MM path. Subsequently, we need to check whether the newly identified MM paths are duplicates. Therefore, the total computation overhead ($\Gamma_i$) for a basic path (say $P_i$) of the CFG is the processing of all nodes ($\lambda_i$) in the path $P_i$ plus the duplicate check against all previously generated MM paths (i.e. the $\mu_1 + \mu_2 + \cdots + \mu_{i-1}$ MM paths contained in all previous paths). In other words,

$$\Gamma_i = \lambda_i + \sum_{k=1}^{i-1} \mu_k \quad \text{where } 1 \le i \le N$$

Therefore, the total computation overhead ($\Gamma_{CFG}$) for $N$ paths of the CFG would be

$$\Gamma_{CFG} = \sum_{i=1}^{N} \Gamma_i = (\lambda_1 + \lambda_2 + \cdots + \lambda_N) + (N-1) \times \mu_1 + (N-2) \times \mu_2 + \cdots + \mu_{N-1}$$

To simplify our analysis without loss of generality, we can assume $\lambda_1 = \lambda_2 = \cdots = \lambda_N = \lambda$ and $\mu_1 = \mu_2 = \cdots = \mu_N = \mu$. With this consideration, we can rewrite the above expression as follows.

$$\Gamma_{CFG} = \lambda \times N + \mu \times [1 + 2 + \cdots + (N-1)] = N \times \left[\lambda + \mu \times \frac{N-1}{2}\right] \approx N \times \left[\lambda + \mu \times \frac{N}{2}\right]$$
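The summation for the CFG overhead and its closed form can be cross-checked numerically. The sketch below counts operations exactly as in the expression for $\Gamma_i$ (nodes of the current path plus one duplicate check per previously found MM path); the function names and the sample values of N, λ and µ are arbitrary assumptions.

```python
def gamma_cfg(lams, mus):
    """Sum of Gamma_i = lambda_i + (mu_1 + ... + mu_{i-1}) over all paths."""
    total = 0
    seen = 0  # MM paths identified in earlier paths
    for lam, mu in zip(lams, mus):
        total += lam + seen
        seen += mu
    return total


def gamma_cfg_closed_form(lam, mu, n):
    """Closed form for equal lambda_i and mu_i: N * (lambda + mu * (N - 1) / 2)."""
    return n * (lam + mu * (n - 1) / 2)


# Arbitrary sample values, not taken from the subject designs.
n, lam, mu = 30, 5, 2
exact = gamma_cfg([lam] * n, [mu] * n)
closed = gamma_cfg_closed_form(lam, mu, n)
approx = n * (lam + mu * n / 2)
print(exact, closed, approx)
```

For these values the loop and the closed form agree, and the approximation with N in place of N − 1 overestimates by N × µ / 2.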

Computation overhead for SIG

The total computation overhead (i.e. $\Gamma_{SIG}$) is incurred by processing all nodes in two rounds: scope edge pairs are determined in the first round, and the subgraphs corresponding to the scope edge pairs are identified in the second round. If the total number of nodes of the SIG is $\delta$, then the total computation overhead ($\Gamma_{SIG}$) for SIG would be

$$\Gamma_{SIG} = \delta + \delta = 2 \times \delta$$

As $\lambda$ is the average number of nodes per path, we can write

$$\lambda = \frac{\delta}{N}$$

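Overhead comparison:

We simplify the ratio of $\Gamma_{CFG}$ and $\Gamma_{SIG}$ as follows.

$$\frac{\Gamma_{CFG}}{\Gamma_{SIG}} = \frac{N \times \left[\lambda + \mu \times \frac{N}{2}\right]}{2 \times \delta} = \frac{1}{2} + \frac{N \times \mu}{4 \times \lambda} \qquad [\delta = \lambda \times N]$$

For $\Gamma_{CFG} > \Gamma_{SIG}$, the following condition must hold.

$$\Delta = \frac{N \times \mu}{\lambda} > 2 \tag{5.4}$$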

For ∆ to be greater than two for a sequence diagram, it is sufficient that each path of the sequence diagram contains at least two MM paths on average and that the total number of paths is greater than the average number of nodes per path. These two conditions hold in practical real-life application systems. To verify this, we compute the values of the following parameters for our subject designs: µ (average number of MM paths contained in a path), λ (average number of nodes per path), and N (total number of paths), as shown in Table 5.14. The values reported in Table 5.14 indicate a reduction of overhead for identifying MM paths when SIG is used instead of CFG. Note that the reduction of overhead is larger for Manage Show, because duplicate checks of MM paths are avoided over a larger number of paths than in the other two sequence diagrams.

Table 5.14: Computation of ∆ for three subject sequence diagrams.

| Sequence diagram | µ | λ | N | ∆ |
|---|---|---|---|---|
| Deliver Order | 68/30 = 2.26 | 47/30 = 1.56 | 30 | 11.36 |
| Generate Show Statistics | 51/18 = 2.83 | 32/18 = 1.77 | 18 | 7.69 |
| Manage Show | 130/50 = 2.6 | 37/50 = 0.74 | 50 | 44.41 |
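The quantities in the overhead comparison can be recomputed from the µ, λ and N entries of Table 5.14. The sketch below evaluates both ∆ = N × µ / λ from Eq. 5.4 and the ratio Γ_CFG / Γ_SIG = 1/2 + ∆/4. The function names are hypothetical. One observation, offered with caution: the figures listed in the last column of Table 5.14 appear to coincide with the ratio rather than with ∆ itself, which does not change the conclusion, since a ratio above one is equivalent to ∆ > 2.

```python
from fractions import Fraction


def delta(n, mu, lam):
    """Delta = N * mu / lambda (Eq. 5.4)."""
    return n * mu / lam


def overhead_ratio(n, mu, lam):
    """Gamma_CFG / Gamma_SIG = 1/2 + N * mu / (4 * lambda)."""
    return Fraction(1, 2) + delta(n, mu, lam) / 4


# (N, mu, lambda, value reported in the last column of Table 5.14)
subjects = {
    "Deliver Order": (30, Fraction(68, 30), Fraction(47, 30), 11.36),
    "Generate Show Statistics": (18, Fraction(51, 18), Fraction(32, 18), 7.69),
    "Manage Show": (50, Fraction(130, 50), Fraction(37, 50), 44.41),
}

for name, (n, mu, lam, reported) in subjects.items():
    d = float(delta(n, mu, lam))
    r = float(overhead_ratio(n, mu, lam))
    print(f"{name}: delta={d:.2f} ratio={r:.2f} reported={reported}")
```

The small differences between the recomputed ratios and the reported values are consistent with µ and λ having been rounded to two decimals before use.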

5.3.4 Threats to validity

We now discuss the following threats to the validity of our approach.

1. Construct threat: This threat concerns the selection of the right parameters for our experiments. The first parameter is the number of MM paths covered by the generated paths, which is necessary to verify that our proposed coverage criteria are satisfied. In addition, we have used another parameter, the priority score, to measure the relative criticalness of the generated paths in terms of the AWPL values of their constituent MM paths. Note that the same metric AWPL has been used for prioritizing test cases for system-level testing of object-oriented systems [24].

2. Internal threat: This threat concerns the occurrence of errors while building the SIG from sequence diagrams, identifying MM paths from the SIG, constructing the MM coverage model, and generating paths from the MM coverage model. To avoid this kind of error, we have manually checked the generated artifacts (SIG, MM paths, coverage model, paths) against the actual ones.

3. External threat: This threat arises from a concern about the generalization of the subject designs used in our experiment. We had to develop the subject sequence diagrams on our own, since detailed and useful UML specifications (specifically sequence diagrams) of example systems are scarcely available as open source. Note that sequence diagrams obtained by reverse engineering open source software could not be used for our experiments because they have only one integration colony, containing code artifacts belonging to a single class method, and hence do not satisfy the criteria used for selecting subject designs. On the other hand, the limited number of subject designs used in our experiments should not be an influencing factor, because the effectiveness of our proposed approach holds for all sequence diagrams in which each path contains on average two or more MM paths, which is usually seen in practical real-life application systems. Further, we argue that our approach is useful even for large and complex systems because their complex use cases are decomposed into a number of relatively simpler use cases following the standard decomposition principle [91]. Due to this decomposition, the design specifications of larger software systems may contain a larger number of sequence diagrams for use cases than our case studies (RAS and AMS), but the abstraction levels of our sequence diagrams (measured by the product of the number of objects and the number of messages) are similar to those of the individual sequence diagrams of larger software systems.

5.4 Comparison with related work

We compare our work with the existing work of Pilskalns et al. [18], Sarma et al. [19], Nayak et al. [20], Dinh-Trong et al. [21], Bandyopadhyay et al. [22], and Ali et al. [23], which use UML sequence/collaboration diagrams for integration testing of object-oriented systems. The differences between our work and the existing work [18, 19, 20, 21, 22, 23] are the following. First, the existing work selects all message paths for test case generation, whereas our approach selects a subset of message paths to cover all MM paths at least once and as per their priorities. Second, the existing approaches [18, 19, 20, 21, 22] are based on the control flow graph of a sequence diagram, whereas our approach is based on the MM coverage model. When only a limited number of message paths can be tested, the MM coverage model can be used to select them so that the proposed coverage criteria are satisfied, whereas the control flow graph cannot be used for the same. Third, Ali et al.’s SCOTEM [23] captures state transitions of the objects associated with message paths, whereas our MM coverage model captures call relationships among MM paths as well as their priority information. The n-path coverage of SCOTEM [23] (which subsumes the coverage of all state transitions along message paths at least once) cannot be used to cover message paths as per their priorities, which can be achieved with our approach.

Our work is closely related to Rountev et al.’s proposal of several coverage criteria based on sequence diagrams [32]. The differences between Rountev et al.’s work [32] and our work are as follows. First, our approach determines the effective coverage of MM paths using the MM coverage model, whereas Rountev et al.’s work keeps track of run-time coverage information using the IRCFG (Interprocedural Restricted Control Flow Graph). Second, our MM coverage model captures call relationships among MM paths as well as their priority information, whereas the IRCFG captures call relationships among RCFGs (the control flow graph of each method), but no priority information. Third, our All MM Paths coverage is equivalent to the All-RCFG-Paths coverage of IRCFG. That is, the coverage of all MM paths implies the coverage of all RCFG-paths and vice versa. When the number of message paths is equal to the minimum and maximum path limits, Prioritized MM Paths becomes equivalent to All-RCFG-Paths and All-IRCFG-Paths, respectively. When the number of paths lies between the two path limits, Prioritized MM Paths is stronger than All-RCFG-Paths with regard to the coverage of message paths as per their priorities.

5.5 Conclusion

In this chapter, we have proposed an approach to select a subset of message paths in a sequence diagram with effective coverage of their underlying MM paths. For this, we build an MM coverage model capturing call relationships among MM paths and their priority information, and follow two coverage criteria: All MM Paths (AMP) and Prioritized MM Paths (PMP). If the priority of MM paths is a concern, we use Prioritized MM Paths; otherwise All MM Paths should be used. To prioritize the MM paths, we propose a novel metric called AWPL (Average Weighted Path Length), where a higher AWPL value for an MM path implies that the MM path has a common subpath with higher length and density (the number of paths covering the subpath). Its significance is that even a single fault in an MM path with a higher AWPL value would cause a larger number of failures during execution, and hence such MM paths should be covered more.

Our experimental results show that the MM coverage model can be used for selecting a subset of all message paths satisfying our proposed coverage criteria, whereas the control flow graph cannot be used for the same, even with prioritization of message paths. The presence of explicit marks for the starts and ends of MM paths in the SIG helps us to identify them with less computation overhead compared to the control flow graph, as substantiated by our analysis.

For our proposed coverage criteria to be satisfied, the number of message paths must be greater than or equal to the minimum limit. As a message path has to be tested with multiple test data, commensurate with its criticality and the test budget, the individual priority scores of the message paths can be indicative of their number of test data. In this work, we have assumed that all message paths are feasible, that is, they are executable with some test data. We shall investigate the infeasibility of message paths in the next chapter.


Chapter 6

Identification of Infeasible Paths

In the last two chapters, we have discussed two applications of SIG, namely code generation and determination of effective coverage of MM paths. In this chapter, we propose an approach to detect infeasible paths in sequence diagrams using SIG. We introduce two interaction patterns called Null Reference Check (NLC) and Mutually Exclusive (MUX). To identify infeasible paths for these two patterns, we design two algorithms and apply them to the integration colonies and their constituent MM paths. We investigate the effectiveness of the two patterns on a few case studies and determine the test effort saved by using the information about infeasible paths.

6.1 Two interaction patterns of sequence diagrams

In this section, we introduce two interaction patterns namely NLC and MUX.

Null Reference Check (NLC) interaction pattern

An NLC interaction pattern consists of modeling both null and non-null return values for a variable in one method and checking the variable for a null value in another method. The variable must remain unmodified between the point of its return and the check for a null value (we call this check the nullify test). If the variable under the nullify test of an NLC pattern is supposed to have both null and non-null values in a path, then the path would be infeasible, because the variable cannot have two values unless it is modified.

Example 6.1

Consider an example sequence diagram containing an NLC interaction pattern, as shown in Fig. 6.1(a). Here, we can see that DoProcess1() calls FindObject(), which in turn returns either a null or a non-null (ObjectPool[i]) value depending on the availability of an object reference in the list. This return value is assigned to the variable aObject, which is subsequently checked for a null value by the fragment alt2 in DoProcess1().

Let us identify the infeasible path(s) due to the NLC interaction pattern present in the example sequence diagram (Fig. 6.1(a)). For this, we consider the corresponding SIG (Fig. 6.1(b)), which has four paths $P_1$, $P_2$, $P_3$, $P_4$ as shown below.

$$P_1 = m_1 \to m_2 \to C^S_{loop1} \to m_3 \to C^E_{loop1} \to C^S_{loop1} \to C^S_{alt1} \to m_4 \to C^E_{alt1} \to C^S_{alt2} \to m_6 \to C^E_{alt2} \to m_8$$

$$P_2 = m_1 \to m_2 \to C^S_{loop1} \to m_3 \to C^E_{loop1} \to C^S_{loop1} \to C^S_{alt1} \to m_4 \to C^E_{alt1} \to C^S_{alt2} \to m_7 \to C^E_{alt2} \to m_8$$

$$P_3 = m_1 \to m_2 \to C^S_{loop1} \to m_3 \to C^E_{loop1} \to C^S_{loop1} \to C^S_{alt1} \to m_5 \to C^E_{alt1} \to C^S_{alt2} \to m_6 \to C^E_{alt2} \to m_8$$

$$P_4 = m_1 \to m_2 \to C^S_{loop1} \to m_3 \to C^E_{loop1} \to C^S_{loop1} \to C^S_{alt1} \to m_5 \to C^E_{alt1} \to C^S_{alt2} \to m_7 \to C^E_{alt2} \to m_8$$

[Figure 6.1, panel (a): sequence diagram "Pattern1" with lifelines a : Boundary, b : Controller, c : Register, d : ObjectX, e : ObjectX, f : Screen and messages 1: DoProcess1() (m1); 2: aObject = FindObject(Name=) (m2); loop1 [i < ObjectPool.length & found=false] 3: found = Match(Name=) (m3); alt1 [found=true] 4: ObjectPool[i] (m4), [found=false] 5: null (m5); alt2 [aObject!=null] 6: DoAction() (m6), [aObject=null] 7: DisplayMessage(=) (m7); 8: (m8). Panel (b): the corresponding SIG.]

Figure 6.1: Sequence diagram with NLC interaction pattern and its SIG. (a) Sequence diagram. (b) SIG.

Let us consider a path, say $P_2$, in the SIG. For the path $P_2$ to be executable, its two predicates (found = true) and (aObject = null) must be true. The satisfiability of the predicate (found = true) implies that aObject would be assigned the non-null value returned from FindObject(), whereas the satisfiability of the other predicate (i.e. aObject = null) contradicts the former implication. Further, we can observe that aObject has not been modified between message 2 and alt2, which confirms the infeasibility of $P_2$. Similarly, we can find that the path $P_3$ is also infeasible.
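The reasoning of Example 6.1 can be mimicked with a small consistency check. This is only an illustrative sketch and not the detection algorithm designed in this chapter: each message is annotated by hand with what it implies about aObject (m4 and m5 are the return values, m6 and m7 sit behind the guards of alt2), the variable is assumed to stay unmodified along the path, and all identifiers are hypothetical.

```python
# What each message implies about aObject (hand-annotated assumption).
NULLNESS = {
    "m4": "non-null",  # FindObject returns ObjectPool[i]
    "m5": "null",      # FindObject returns null
    "m6": "non-null",  # guarded by [aObject != null]
    "m7": "null",      # guarded by [aObject = null]
}

# The four paths of the SIG in Fig. 6.1(b), reduced to their messages.
paths = {
    "P1": ["m1", "m2", "m3", "m4", "m6", "m8"],
    "P2": ["m1", "m2", "m3", "m4", "m7", "m8"],
    "P3": ["m1", "m2", "m3", "m5", "m6", "m8"],
    "P4": ["m1", "m2", "m3", "m5", "m7", "m8"],
}


def is_infeasible(path):
    """A path is infeasible if it needs aObject to be both null and non-null."""
    facts = {NULLNESS[m] for m in path if m in NULLNESS}
    return len(facts) > 1


infeasible = [name for name, path in paths.items() if is_infeasible(path)]
print(infeasible)
```

The check flags P2 and P3, in agreement with the discussion above.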

Characteristics: From the above discussions, we can summarize the characteristics of the NLC interaction pattern as follows.

1. An NLC interaction pattern consists of modeling both null and non-null return values for a variable in one method and checking null value of the variable in another method.

2. The variable must not be modified between the point of its return and null value check.

3. The paths on which the variable is supposed to have both null and non-null values are infeasible.

Context of occurrence: Let us now investigate the situations where NLC interaction patterns can occur. Situations may arise where an object has to be selected from a list on the basis of user input, and this responsibility may be delegated to an object A by another object B. While performing the responsibility, the delegated object A returns a valid object reference (i.e. a non-null value) if the object is found; otherwise a null reference is returned. On receiving the object reference, the object B needs to check that reference for a null value before using it. This causes some paths to have both null and non-null values of the object reference without modification of the same, which implies that the paths are infeasible. In essence, when object selection is modeled in sequence diagrams by means of delegation from one object to another, NLC interaction patterns are likely to occur.

Mutually Exclusive (MUX) interaction pattern

A MUX interaction pattern consists of a state-based interaction of an object which is permitted to take one action among a set of actions (e.g. sending a message, defining/redefining a variable, etc.) based on its own current state or another object’s state. That means all these actions are mutually exclusive. In this situation, the paths that cover at least two mutually exclusive actions are infeasible.

Example 6.2 To illustrate the concept of the MUX interaction pattern, let us consider the sequence diagram shown in Fig. 6.2(a). Here, the object b gets the state information of the object c by invoking the message GetStatus. Depending on the state of the object c (which is stored in the variable Status), the object b executes one opt fragment as follows: opt1 for Status = ‘StateA’; opt2 for Status = ‘StateB’; opt3 for Status = ‘StateC’. Since the guard conditions of the three opt fragments cannot hold simultaneously and the common influencing variable (i.e., Status) is not modified between opt1 and opt3, only one opt fragment will be executed at a time; that is, the opt fragments are mutually exclusive. Note that all three opt fragments belong to the method scope of DoProcess2.

[Figure 6.2: (a) Sequence diagram Pattern2 with lifelines a : Boundary, b : Controller, c : ObjectX, d : Screen, and e : ObjectY. Messages: 1: DoProcess2(); 2: Status = GetStatus(); 3: DisplayMessage("Message 1 for the State A") inside opt1 [Status="StateA"]; 4: DisplayMessage("Message 2 for the State B") inside opt2 [Status="StateB"]; 5: DoAction() inside opt3 [Status="StateC"]; 6: an unlabeled message. (b) The corresponding SIG with message nodes m1 to m6 and control nodes $C^S_{opt1}$, $C^E_{opt1}$, $C^S_{opt2}$, $C^E_{opt2}$, $C^S_{opt3}$, $C^E_{opt3}$.]

Figure 6.2: Sequence diagram with MUX interaction pattern and its SIG.

Let us find the infeasible paths in the sequence diagram due to the MUX interaction pattern. For this, we consider the SIG of the sequence diagram (Fig. 6.2(a)), shown in Fig. 6.2(b). The SIG has eight paths P1, P2, ..., P8 as given below.

→ → S → → E → S → → E → S → → E → P1 = m1 m2 Copt1 m3 Copt1 Copt2 m4 Copt2 Copt3 m5 Copt3 m6 → → S → E → S → → E → S → → E → P2 = m1 m2 Copt1 Copt1 Copt2 m4 Copt2 Copt3 m5 Copt3 m6 → → S → → E → S → E → S → → E → P3 = m1 m2 Copt1 m3 Copt1 Copt2 Copt2 Copt3 m5 Copt3 m6 → → S → → E → S → → E → S → E → P4 = m1 m2 Copt1 m3 Copt1 Copt2 m4 Copt2 Copt3 Copt3 m6 → → S → → E → S → E → S → E → P5 = m1 m2 Copt1 m3 Copt1 Copt2 Copt2 Copt3 Copt3 m6 → → S → E → S → → E → S → E → P6 = m1 m2 Copt1 Copt1 Copt2 m4 Copt2 Copt3 Copt3 m6 → → S → E → S → E → S → → E → P7 = m1 m2 Copt1 Copt1 Copt2 Copt2 Copt3 m5 Copt3 m6 → → S → E → S → E → S → E → P8 = m1 m2 Copt1 Copt1 Copt2 Copt2 Copt3 Copt3 m6

We observe that the paths P1, P2, P3, and P4 each cover at least two mutually exclusive opt fragments; that is, for each covered fragment, the path includes at least one node between the start and the end of that fragment. Therefore, P1, P2, P3, and P4 are infeasible paths.
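The enumeration in this example can be reproduced with a short Python sketch. It assumes a simplified, hypothetical encoding in which each opt fragment is either entered (its body node is visited) or bypassed; the names FRAGMENTS, BODY, enumerate_paths, and is_infeasible are introduced only for this illustration, and the order in which paths are generated does not follow the numbering P1 to P8.

```python
from itertools import product

# Hypothetical encoding of the SIG of Fig. 6.2(b): three sequential opt
# fragments, each containing exactly one message node.
FRAGMENTS = ["opt1", "opt2", "opt3"]
BODY = {"opt1": "m3", "opt2": "m4", "opt3": "m5"}


def enumerate_paths():
    """Yield (node_sequence, covered_fragments) for every path of the SIG."""
    for choice in product([True, False], repeat=len(FRAGMENTS)):
        nodes = ["m1", "m2"]
        covered = []
        for fragment, taken in zip(FRAGMENTS, choice):
            nodes.append(f"CS_{fragment}")
            if taken:
                nodes.append(BODY[fragment])
                covered.append(fragment)
            nodes.append(f"CE_{fragment}")
        nodes.append("m6")
        yield nodes, covered


def is_infeasible(covered):
    """A path covering two or more mutually exclusive fragments is infeasible."""
    return len(covered) >= 2


paths = list(enumerate_paths())
infeasible = [nodes for nodes, covered in paths if is_infeasible(covered)]

print(len(paths), "paths in total,", len(infeasible), "infeasible")
for nodes in infeasible:
    print(" -> ".join(nodes))
```

Under this encoding, the sketch yields eight paths, four of which cover at least two opt fragments, in agreement with the paths P1 to P4 identified above.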

Characteristics: From the above discussion, we can summarize the characteristics of a MUX interaction pattern as follows.

1) A MUX interaction pattern consists of a set of control blocks which are executed in a mutually exclusive manner.

2) To be mutually exclusive, the control blocks must satisfy the following conditions.

i) They must belong to the same method scope.

ii) They must be independent of each other, that is, one must not contain the other.

iii) They must be influenced by at least one common variable which is not modified between these control blocks.

iv) The predicates associated with the control blocks must not be true simultaneously.

3) A path that covers at least two mutually exclusive control blocks of a MUX interaction pattern is infeasible.

Context of occurrence: Let us investigate the situations where MUX interaction patterns can occur. The basic artifact of a MUX interaction pattern is a set of control blocks encapsulating the actions of an object for its different states. Such control blocks can be modeled in sequence diagrams either as the operands of an alt fragment (if-else if-else) or as a sequence of opt (if) fragments. When the behavior of an object is extended, a new control block (i.e., a new operand) has to be added to the alt fragment; otherwise, the else part would behave in an unexpected way. This means that the behavior encapsulated in an alt fragment does not remain closed upon the addition of a new object state, thus violating the open-closed principle, that is, “software entities (classes, modules, functions etc.) should be open for extension, but closed for modification” [81]. However, such a violation does not arise when modeling with opt fragments, because the behavior encapsulated in the existing opt fragments is not affected by the addition of a new object state. In essence, when state-dependent interactions of an object are modeled in sequence diagrams satisfying the open-closed principle, MUX interaction patterns are likely to occur.

6.2 Our approach

In this section, we discuss our technique to detect infeasibility of message paths with respect to the MUX and NLC patterns. Given the SIG of a sequence diagram, we check whether an integration colony of the SIG has an instance of the MUX pattern and then verify the infeasibility of its constituent MM paths. For the NLC pattern, on the other hand, we check whether a pair of integration colonies contains an instance of the pattern. The MM path pairs of an integration colony pair containing an NLC pattern are then verified against the infeasibility conditions of the NLC pattern. To illustrate our concept, we use the Manage Show sequence diagram designed for the Auditorium Management System (AMS) [75]. In Manage Show, the user is asked to enter the show title, show timings, show days, and an option. Depending on the selected option, which can be add show, update show, or delete show, different scenarios can occur, and these have been modeled in the sequence diagram shown in Fig. 6.3. As we can see in Fig. 6.4, the corresponding SIG contains three integration colonies (Fig. 6.4(b), (c), and (d)), where the main integration colony has eighteen MM paths and the other two integration colonies have two MM paths each (see Table 6.1).

[Figure 6.3: Sequence diagram ManageShowSD with lifelines SB : ShowBoundary, SC : ShowController, SR : ShowRegister, ShowList[i] : Show, and aScreen : Screen. Messages and fragments:

1: ManageShow(Option=, ShowAttribute=, Values=)
opt-1 [Option="Add Show"]: 2: aShow : Show; 3: SetShowDetails(ShowAttributes="", Values=""); 4: AddShow(=aShow); 5: Display(Msg="Show has been successfully added")
opt-2 [Option="Update Show"]: 6: aShow = FindShow(ShowTitle=)
    loop-1 [i < ShowList.size & found=false]: 7: found = MatchShow(ShowTitle=)
    alt-1 [found=true]: 8: ShowList[i]; [else]: 9: null
    alt-2 [aShow!=null]: 10: Display(Msg="Show has not been found"); [else]: 11: UpdateShow(ShowAttributes=, Values="")
opt-3 [Option="Cancel Show"]: 12: aShow = FindShow(ShowTitle=)
    loop-2 [i < ShowList.size & found=false]: 13: found = MatchShow(ShowTitle=)
    alt-3 [found=true]: 14: ShowList[i]; [else]: 15: null
    alt-4 [aShow!=null]: 16: SetStatus(Status="Cancelled"); 17: Display(Msg="Show is successfully canceled"); [else]: 18: Display(Msg="Show has not been found")
19: an unlabeled message]

Figure 6.3: Sequence diagram of Manage Show use case.

For MUX interaction pattern: We propose an algorithm named Detect Infeasibility MUX to detect infeasibility of MM paths due to the MUX interaction pattern. We apply the algorithm to the individual integration colonies and process them as follows.

Step 1 (Identify independent control blocks): Please note that independent control blocks are those which do not contain other control block(s). To identify them, we perform a depth-first-search traversal of the given integration colony and mark the depth at which each control block is found. During the traversal, whenever we see a control node $C^S_f$ representing the start of a control block $(C^S_f, C^E_f)$, we increment the value of a variable level (which is initialized to zero) by one and push $C^S_f$ along with its level number onto a stack ST. On the other hand, if we see a control node $C^E_f$ representing the end of a control block $(C^S_f, C^E_f)$, then we check whether all nodes of the control block have been explored completely. If not, we backtrack to an unexplored child of $C^S_f$ and resume the traversal. Otherwise, we add $(C^S_f, C^E_f)$ along with its level number to a list of control blocks, LIST, and decrement the value of level by one. All control blocks in LIST which are at the same depth of the hierarchy (represented by the value of level) are the independent control blocks.

Example 6.3 Applying Step 1 to all integration colonies of Manage Show, we find the four sets of independent control blocks shown in Table 6.2. From this table, we can see that the main integration colony has two sets of independent control blocks, whereas IC1 and IC2 have one set each.
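The level bookkeeping of Step 1 can be sketched in Python as follows. This is a simplification under a stated assumption: the depth-first traversal is taken to have already been flattened into an ordered list of fragment start and end events, so the backtracking to unexplored children is omitted. The function name group_blocks_by_level and the event encoding are hypothetical.

```python
from collections import defaultdict


def group_blocks_by_level(events):
    """Group control blocks by nesting level.

    events is an ordered list of ("start", name) and ("end", name) pairs,
    standing in for the control nodes C^S_f and C^E_f met during traversal.
    """
    stack = []  # plays the role of ST: (block name, level) pairs
    level = 0
    by_level = defaultdict(list)  # plays the role of LIST, keyed by level
    for kind, name in events:
        if kind == "start":
            level += 1
            stack.append((name, level))
        elif kind == "end":
            open_name, open_level = stack.pop()
            if open_name != name:
                raise ValueError(f"unbalanced fragment: {name}")
            by_level[open_level].append(name)
            level -= 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if stack:
        raise ValueError("some fragments were never closed")
    return dict(by_level)


# Control nodes of the main integration colony of Manage Show (Fig. 6.4(b)).
main_colony_events = [
    ("start", "opt1"), ("end", "opt1"),
    ("start", "opt2"), ("start", "alt2"), ("end", "alt2"), ("end", "opt2"),
    ("start", "opt3"), ("start", "alt4"), ("end", "alt4"), ("end", "opt3"),
]
levels = group_blocks_by_level(main_colony_events)
print(levels)
```

For the main integration colony, the blocks opt1, opt2, and opt3 fall at level one and alt2 and alt4 at level two, which corresponds to the two sets reported for this colony in Table 6.2.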

Step 2 (Determine correlated control blocks): Two independent control blocks are said to be correlated if they have at least one common variable in their associated predicates. For each set of independent control blocks in LIST, we find the set $S_v$ of common predicate variables. If $S_v$ is found to be non-empty, the associated control blocks are correlated and are added to the list LIST′.
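A minimal Python sketch of Step 2 is given below. It assumes that the variables referenced by each predicate have already been extracted into sets; the function names and the dictionary layout are hypothetical, and the variable sets are read off the guards of the Manage Show sequence diagram (Fig. 6.3).

```python
from typing import Dict, List, Set, Tuple


def common_predicate_variables(predicate_vars: Dict[str, Set[str]]) -> Set[str]:
    """Return S_v, the variables shared by the predicates of all given blocks."""
    var_sets = list(predicate_vars.values())
    return set.intersection(*var_sets) if var_sets else set()


def correlated_sets(
    independent_sets: List[Dict[str, Set[str]]],
) -> List[Tuple[List[str], Set[str]]]:
    """Keep only the sets of independent blocks whose S_v is non-empty."""
    result = []
    for blocks in independent_sets:
        s_v = common_predicate_variables(blocks)
        if s_v:
            result.append((sorted(blocks), s_v))
    return result


# Predicate variables per control block, grouped as in Table 6.2.
mic_level_one = {"opt1": {"Option"}, "opt2": {"Option"}, "opt3": {"Option"}}
mic_level_two = {"alt2": {"aShow"}, "alt4": {"aShow"}}
ic1 = {"loop1": {"i", "ShowList", "found"}, "alt1": {"found"}}
ic2 = {"loop2": {"i", "ShowList", "found"}, "alt3": {"found"}}

for blocks, s_v in correlated_sets([mic_level_one, mic_level_two, ic1, ic2]):
    print(blocks, sorted(s_v))
```

All four sets have a non-empty set of common predicate variables in this encoding, so all four are reported as correlated.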

Example 6.4 In our example with four sets of independent control blocks listed in Table 6.2, we find that $(C^S_{opt1}, C^E_{opt1})$, $(C^S_{opt2}, C^E_{opt2})$, and $(C^S_{opt3}, C^E_{opt3})$ are correlated. This is because they

[Figure 6.4: (a) SIG of the Manage Show sequence diagram, with message nodes m1 to m19 and the start and end control nodes of opt-1 to opt-3, loop-1, loop-2, and alt-1 to alt-4; (b) main integration colony; (c) integration colony IC1, comprising m7, the loop-1 control nodes, and alt-1 with m8 and m9; (d) integration colony IC2, comprising m13, the loop-2 control nodes, and alt-3 with m14 and m15.]

Figure 6.4: SIG and three integration colonies for Manage Show use case.

Table 6.1: MM paths of integration colonies of Manage Show use case.

[Table body: the MM paths $M^b_1, \ldots, M^b_{18}$ of the main integration colony, $M^c_1$ and $M^c_2$ of IC1, and $M^d_1$ and $M^d_2$ of IC2.]

Input: An integration colony and its MM paths
Output: Set of infeasible MM paths

Begin
    Initialize a stack ST as null;
    Initialize level as zero;    /* Mark hierarchy level of a control block */
    Initialize Current Node as the start node of the integration colony;

    /* Step 1: Identify independent control blocks (ICB) */
    Visit the integration colony following depth-first-search traversal;
    while Current Node ≠ null do
        if Current Node represents the fragment start ($C^S_f$) then
            Increment the value of level by one;
            Push the Current Node along with its level into ST;
        end
        if Current Node represents the fragment end ($C^E_f$) then
            if a child node of Current Node remains unexplored then
                Backtrack to the unexplored child and resume traversal;
            else    /* Exploration of the control block is complete */
                Add the control block ($C^S_f$, $C^E_f$) along with its level into LIST;
                Pop the top element from ST;
                Decrement the value of level by one;
            end
        end
    end

    /* Step 2: Determine correlated control blocks */
    for each pair of control blocks in LIST do
        if their level value matches then
            Compute their set of common predicate variables;
            if common variable set is non-empty then
                Add the control block pair along with common variable set into LIST′;
            end
        end
    end

    /* Step 3: Identify mutually exclusive control blocks */
    for each pair of control blocks in LIST′ do
        Visit the subgraph of the integration colony enclosing the pair of control blocks following BFS traversal;
        if a common predicate variable is modified in the subgraph then
            Exclude the pair of control blocks from LIST′;    /* They cannot be mutually exclusive control blocks */
            continue;
        end
        /* Check each fragment operand (which has one predicate) for mutual exclusiveness */
        Conflict found = false;
        for each pair of predicates associated with the control block pair do
            if at least one predicate contains the ‘||’ operator then
                continue;    /* The predicate pair cannot be mutually exclusive */
            end
            Identify the predicate clause pair (Cl1, Cl2) which reference the same variable;
            Conflict Status = Call CheckConflictPredicateClausePair(Cl1, Cl2);
            if Conflict Status == true then
                Conflict found = true;    /* Conflicting predicate pair has been found */
                Add (Cl1, Cl2) into CONFLICT LIST;
            end
        end
        if Conflict found == false then
            Exclude the pair of control blocks from LIST′;
        end
    end

    /* Identify infeasible MM paths */
    for each MM path of the input integration colony do
        if the MM path covers a pair of control blocks in LIST′ then
            if the MM path contains a conflicting clause pair (Cl1, Cl2) in CONFLICT LIST then
                Mark the MM path as infeasible;
            end
        end
    end
End

Algorithm 1: Detect Infeasibility MUX

Input: Two predicate clauses Cl1, Cl2
Output: Boolean value

Begin
    /* This procedure checks contradiction between two predicate clauses, if any */
    Find pair of operators (OP1, OP2) for the predicate clause pair (Cl1, Cl2);
    Find pair of right operands (RO1, RO2) for (Cl1, Cl2);
    if (OP1 = "instanceof" && OP2 = "=" && RO2 = "null") || (OP2 = "instanceof" && OP1 = "=" && RO1 = "null") then
        return true;
    end
    Conflict Status = Call CheckConflictOperators(OP1, OP2);
    if Conflict Status = false & RO1 ≠ RO2 then
        return true;
    end
    if Conflict Status = true & RO1 = RO2 then
        return true;
    else
        return false;
    end
End

Algorithm 2: CheckContradictPredicateClausePair

Input: Two operators OP1, OP2
Output: Boolean value

Begin
    if (OP1 = ">" && (OP2 = "≤" || OP2 = "<")) then
        return true;
    end
    if (OP1 = "<" && (OP2 = "≥" || OP2 = ">")) then
        return true;
    end
    if ((OP1 = "≥" || OP1 = ">") && OP2 = "<") then
        return true;
    end
    if ((OP1 = "≤" || OP1 = "<") && OP2 = ">") then
        return true;
    end
    if (OP1 = "≠" && OP2 = "=") || (OP1 = "=" && OP2 = "≠") then
        return true;
    end
    return false;
End

Algorithm 3: CheckConflictOperators

Table 6.2: Independent control blocks of graph model for Manage Show.

Integration colony   Level   Independent control blocks                                                              $S_v$      Correlated
MIC                  one     $(C^S_{opt1}, C^E_{opt1})$, $(C^S_{opt2}, C^E_{opt2})$, $(C^S_{opt3}, C^E_{opt3})$      {Option}   Yes
MIC                  two     $(C^S_{alt2}, C^E_{alt2})$, $(C^S_{alt4}, C^E_{alt4})$                                  {aShow}    Yes
IC1                  one     $(C^S_{loop1}, C^E_{loop1})$, $(C^S_{alt1}, C^E_{alt1})$                                {found}    Yes
IC2                  one     $(C^S_{loop2}, C^E_{loop2})$, $(C^S_{alt3}, C^E_{alt3})$                                {found}    Yes

have a common predicate variable Option, as evident from their predicates shown in Table 6.3. Similarly, the other three sets have one common predicate variable each and hence they are also correlated (see Table 6.3).

Step 3 (Identify mutually exclusive control blocks): For this, we check whether a set of correlated control blocks of an integration colony satisfies the following two conditions.

(i) The integration colony must not have a node defining/redefining a common predicate variable of the correlated control blocks.

(ii) The predicates associated with the correlated control blocks must not be true simultaneously for any domain value of the common predicate variable(s).

To verify the second condition, we check whether the predicate clauses conflict with each other. If the above two conditions are satisfied, then the correlated control blocks are mutually exclusive, that is, only one of them is executed. An MM path is considered infeasible if it covers at least two mutually exclusive control blocks and one conflicting predicate clause pair.
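The clause-level conflict check used for condition (ii) can be sketched in Python as a direct transcription of Algorithms 2 and 3. The Clause tuple, the ASCII operator spellings (">=", "<=", "!="), and the function names are assumptions made for this illustration.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    """Hypothetical predicate clause: variable, operator, right operand."""
    variable: str
    operator: str
    operand: str


def check_conflict_operators(op1: str, op2: str) -> bool:
    """Transcription of Algorithm 3 with ASCII operator spellings."""
    if op1 == ">" and op2 in ("<=", "<"):
        return True
    if op1 == "<" and op2 in (">=", ">"):
        return True
    if op1 in (">=", ">") and op2 == "<":
        return True
    if op1 in ("<=", "<") and op2 == ">":
        return True
    if (op1, op2) in (("!=", "="), ("=", "!=")):
        return True
    return False


def check_conflict_clause_pair(cl1: Clause, cl2: Clause) -> bool:
    """Transcription of Algorithm 2 for two clauses on the same variable."""
    op1, op2 = cl1.operator, cl2.operator
    ro1, ro2 = cl1.operand, cl2.operand
    # A type test contradicts an equality test against null.
    if (op1 == "instanceof" and op2 == "=" and ro2 == "null") or (
        op2 == "instanceof" and op1 == "=" and ro1 == "null"
    ):
        return True
    operators_conflict = check_conflict_operators(op1, op2)
    if not operators_conflict and ro1 != ro2:
        return True  # e.g. Option = 'Add Show' vs Option = 'Update Show'
    if operators_conflict and ro1 == ro2:
        return True  # e.g. x = 5 vs x != 5
    return False


add_show = Clause("Option", "=", "Add Show")
update_show = Clause("Option", "=", "Update Show")
not_null = Clause("aShow", "!=", "null")

print(check_conflict_clause_pair(add_show, update_show))
print(check_conflict_clause_pair(not_null, not_null))
```

As transcribed, the second rule treats any pair of non-conflicting operators with different right operands as a contradiction, which suits the equality-based guards of the running example. The first call reports a conflict, whereas the two identical aShow clauses do not conflict, consistent with the satisfiability column of Table 6.3.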

Example 6.5 In our example, we verify the four sets of correlated control blocks found in the three integration colonies for mutual exclusiveness. For this, we find the predicates associated with each set of correlated control blocks, as shown in Table 6.3. Considering the first set, that is, $\{(C^S_{opt1}, C^E_{opt1}), (C^S_{opt2}, C^E_{opt2}), (C^S_{opt3}, C^E_{opt3})\}$, we find that their predicates Option=‘Add Show’, Option=‘Update Show’, and Option=‘Delete Show’ contradict each other. Further, we do not find a single node in the main integration colony between $C^S_{opt1}$ and $C^E_{opt3}$ that defines/redefines the common variable Option. This implies that $(C^S_{opt1}, C^E_{opt1})$, $(C^S_{opt2}, C^E_{opt2})$, and $(C^S_{opt3}, C^E_{opt3})$ are mutually exclusive. On the other hand, we find a message node m12, corresponding to message 12, that defines the common variable aShow of the set of correlated control blocks $\{(C^S_{alt2}, C^E_{alt2}), (C^S_{alt4}, C^E_{alt4})\}$, violating the first necessary condition. This rules out the possibility of that set of correlated control blocks being mutually exclusive. For a similar reason, the other two sets of correlated control blocks are not mutually exclusive. This means that the main integration colony has only one set of mutually exclusive control blocks, $\{(C^S_{opt1}, C^E_{opt1}), (C^S_{opt2}, C^E_{opt2}), (C^S_{opt3}, C^E_{opt3})\}$, and as a consequence, its MM paths are subject to infeasibility verification. After checking the coverage of these MM paths, we find that twelve MM paths, namely $M^b_7$, $M^b_8$, $M^b_9$, $M^b_{10}$, $M^b_{11}$, $M^b_{12}$, $M^b_{13}$, $M^b_{14}$, $M^b_{15}$, $M^b_{16}$, $M^b_{17}$, and $M^b_{18}$, cover at least two of the three mutually exclusive control blocks $(C^S_{opt1}, C^E_{opt1})$, $(C^S_{opt2}, C^E_{opt2})$, $(C^S_{opt3}, C^E_{opt3})$ and hence they are infeasible.

Table 6.3: Correlated control blocks of three integration colonies.

Set of correlated control blocks   Set of predicates        Satisfiability of predicates   Common variable modification   Mutually exclusive
$(C^S_{opt1}, C^E_{opt1})$         Option=‘Add Show’
$(C^S_{opt2}, C^E_{opt2})$         Option=‘Update Show’     No                             No                             Yes
$(C^S_{opt3}, C^E_{opt3})$         Option=‘Delete Show’
$(C^S_{alt2}, C^E_{alt2})$         aShow ≠ Null
$(C^S_{alt4}, C^E_{alt4})$         aShow ≠ Null             Yes                            Yes                            No
$(C^S, C^E)$                       i

Correctness of Detect Infeasibility MUX: Correctness of this algorithm depends on the correctness of the following three steps.

Step 1: Identify independent control blocks

Step 2: Determine correlated control blocks

Step 3: Identify mutually exclusive control blocks

We now prove the correctness of the individual steps by the method of contradiction.

Lemma 6.2.1. Detect Infeasibility MUX correctly identifies independent control blocks.

Proof: Let us assume that Detect Infeasibility MUX identifies two arbitrary control blocks CB1 and CB2 as independent control blocks although CB1 contains CB2. The algorithm groups control blocks as independent only when their level values match, whereas a control block contained in another is recorded at a strictly greater level. Moreover, as per the definition of independent control blocks, they must not contain each other. This contradicts our assumption.

Lemma 6.2.2. Detect Infeasibility MUX correctly determines correlated control blocks.

Proof: Let us assume that Detect Infeasibility MUX determines two independent control blocks CB1 and CB2 to be correlated although CB1 and CB2 do not have a common predicate variable. The algorithm adds a pair of control blocks to LIST′ only when their set of common predicate variables is non-empty, and as per the definition of correlated control blocks, CB1 and CB2 must have a common predicate variable. This contradicts our assumption.

Lemma 6.2.3. Detect Infeasibility MUX correctly identifies mutually exclusive control blocks.

Proof: Let us assume that Detect Infeasibility MUX identifies two correlated control blocks CB1 and CB2 as mutually exclusive although the common variable used in the predicates of CB1 and CB2 has been modified or their predicates can be true simultaneously. The algorithm excludes a pair from LIST′ whenever a common predicate variable is modified in the enclosing subgraph or no conflicting predicate clause pair is found. Moreover, as per the infeasibility property of the MUX pattern, such a pair of control blocks cannot be mutually exclusive. This contradicts our assumption.

From Lemmas 6.2.1, 6.2.2, and 6.2.3, we can infer that Detect Infeasibility MUX is correct.

For NLC interaction pattern: We propose an algorithm named Detect Infeasibility NLC. This algorithm detects whether the concatenation of a pair of MM paths is infeasible. Detect Infeasibility NLC has two major steps, as discussed below.

Step 1 (Identify correlated integration colony pairs): We consider a pair of integration colonies (IC1, IC2) as correlated iff IC2 has a pair of incoming and outgoing scope edges $(\hat{l}, \hat{m})$ and IC1 has an edge (x, y) between the origin node of $\hat{l}$ and the target node of $\hat{m}$. Each such pair (IC1, IC2), along with the edge (x, y), is added to the list LIST, which is initially empty.

Example 6.6 Applying Step 1 to our example (see Fig. 6.4), we find that IC1 has the incoming scope edge $(m_6, C^S_{loop1})$ and the outgoing scope edge $(C^E_{alt1}, C^S_{alt2})$, whereas the main integration colony (MIC) has an edge $(m_6, C^S_{alt2})$ between the origin node of $(m_6, C^S_{loop1})$ and the target node of $(C^E_{alt1}, C^S_{alt2})$. This implies that (MIC, IC1) is a correlated integration colony pair. Similarly, we find another pair (MIC, IC2).
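Step 1 can be sketched in Python as a lookup over edge sets. The data layout (a dictionary of edge sets per colony and a dictionary of incoming and outgoing scope edges per colony) and the function name correlated_pairs are hypothetical. The edges for IC1 follow Example 6.6; those for IC2 are assumed by symmetry with IC1 and are not stated explicitly in the text, and only the few edges relevant to the check are listed.

```python
def correlated_pairs(colony_edges, scope_edges):
    """Return (caller, callee, (x, y)) for every correlated colony pair.

    colony_edges maps a colony name to the set of edges (u, v) it contains.
    scope_edges maps a colony name to its (incoming, outgoing) scope edges.
    """
    result = []
    for callee, (incoming, outgoing) in scope_edges.items():
        x = incoming[0]  # origin node of the incoming scope edge
        y = outgoing[1]  # target node of the outgoing scope edge
        for caller, edges in colony_edges.items():
            if caller != callee and (x, y) in edges:
                result.append((caller, callee, (x, y)))
    return result


colony_edges = {
    "MIC": {("m6", "CS_alt2"), ("m12", "CS_alt4")},
    "IC1": {("CS_loop1", "m7")},
    "IC2": {("CS_loop2", "m13")},
}
scope_edges = {
    "IC1": (("m6", "CS_loop1"), ("CE_alt1", "CS_alt2")),
    "IC2": (("m12", "CS_loop2"), ("CE_alt3", "CS_alt4")),  # assumed by symmetry
}

pairs = correlated_pairs(colony_edges, scope_edges)
for caller, callee, edge in pairs:
    print(caller, callee, edge)
```

With these inputs, the sketch reports the pairs (MIC, IC1) and (MIC, IC2), each together with the edge (x, y) at which the return value re-enters the main integration colony.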


Step 2 (Check infeasibility for concatenation of MM paths): For each pair of correlated integration colonies (IC1, IC2) in LIST, we find the set of potentially infeasible MM pairs {M1} × {M2} such that M1 is an MM path of IC1 and contains a nullify test predicate (one which checks for a null value) for a variable RVar, whereas M2 is an MM path of IC2 and has a return value RVal for RVar. For each such MM pair (M1, M2), we now check whether it satisfies the following conditions.

i) M1 must not have a node defining/redefining RVar between the point of its return and the nullify test predicate.

ii) The nullify test predicate must not be true for the return value (RVal).

If these two conditions are found to be true, then a contradictory situation arises where the value of RVar expected by the predicate does not match its previously assigned return value (RVal), which is not possible unless RVar is modified. In such a situation, the MM pair (M1, M2), that is, the concatenation of M1 and M2, is infeasible.

Input: SIG, set of integration colonies and their MM paths
Output: Infeasible MM path pairs

Begin
    /* Step 1: Identify correlated integration colony pairs */
    for each integration colony IC1 do
        for each integration colony IC2 do
            Identify the pair of incoming and outgoing scope edges ($\hat{l}$, $\hat{m}$) which enclose IC2 in the SIG;
            Find the origin node x of the scope edge $\hat{l}$;
            Find the target node y of the scope edge $\hat{m}$;
            if IC1 has the edge (x, y) then
                Add the pair of integration colonies (IC1, IC2) along with the edge (x, y) into LIST;    /* IC1 and IC2 are correlated */
            end
        end
    end

    /* Step 2: Check infeasibility for concatenation of MM paths of correlated integration colonies */
    for each pair of integration colonies in LIST do
        Find the product set {M1} × {M2} where M1 is an MM path of IC1 and references a return variable RVar; M2 is an MM path of IC2 and is associated with a return value RVal;
        for each pair of MM paths (M1, M2) in the product set do
            Find the predicate clause Cl used for nullify check of RVar along M1;
            if Cl exists and RVar is not modified between x and Cl then    /* x is the place of return */
                ConflictStatus = CheckConflict(Cl, RVal);
                if ConflictStatus == true then
                    Mark the pair of MM paths (M1, M2) as infeasible;
                end
            end
        end
    end
End

Algorithm 4: Detect Infeasibility NLC

Input: Predicate clause Cl, return value RVal
Output: Boolean value

Begin
    if Cl is (RVar ≠ null) and RVal = null then
        return true;
    end
    if Cl is (RVar == null) and RVal ≠ null then
        return true;
    end
    return false;
End

Algorithm 5: CheckConflict

Example 6.7 Examining the coverage information of the MM paths of the pair (MIC, IC1), we find that $M^b_3$, $M^b_4$, $M^b_7$, $M^b_8$, $M^b_{11}$, $M^b_{12}$, $M^b_{13}$, $M^b_{14}$, $M^b_{15}$, $M^b_{16}$, $M^b_{17}$, and $M^b_{18}$ of MIC have a nullify test predicate for aShow. On the other hand, the two MM paths $M^c_1$ and $M^c_2$ of IC1 have the return values ShowList[i] and null for aShow, respectively. Therefore, all MM pairs in the product set $\{M^b_3, M^b_4, M^b_7, M^b_8, M^b_{11}, M^b_{12}, M^b_{13}, M^b_{14}, M^b_{15}, M^b_{16}, M^b_{17}, M^b_{18}\} \times \{M^c_1, M^c_2\}$ are subject to infeasibility verification. Consider the MM pair $(M^b_3, M^c_2)$, where $M^b_3$ has the nullify test predicate aShow ≠ null and $M^c_2$ has a null return value that cannot satisfy the predicate. The fact that aShow has not been modified between the point of its return and the nullify test predicate confirms the infeasibility of the MM pair $(M^b_3, M^c_2)$. Similarly, we find the other infeasible MM pairs $\{M^b_4, M^b_8, M^b_{12}, M^b_{14}, M^b_{16}, M^b_{18}\} \times \{M^c_1\}$ and $\{M^b_7, M^b_{11}, M^b_{13}, M^b_{15}, M^b_{17}\} \times \{M^c_2\}$.

Repeating the above procedure for the pair (MIC, IC2), we obtain the product set $\{M^b_5, M^b_6, M^b_9, M^b_{10}, M^b_{11}, M^b_{12}, M^b_{13}, M^b_{14}, M^b_{15}, M^b_{16}, M^b_{17}, M^b_{18}\} \times \{M^d_1, M^d_2\}$, each MM pair of which is subject to the infeasibility check. Verifying the necessary conditions, we find the following infeasible MM pairs: $\{M^b_6, M^b_{10}, M^b_{12}, M^b_{14}, M^b_{16}, M^b_{18}\} \times \{M^d_1\}$ and $\{M^b_5, M^b_9, M^b_{11}, M^b_{13}, M^b_{15}, M^b_{17}\} \times \{M^d_2\}$.
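The check for the pair (MIC, IC1) can be sketched in Python as follows. The function check_conflict mirrors Algorithm 5. The assignment of a nullify clause to each MIC path is not given explicitly in the text; it is inferred here from the infeasible pairs listed in this example, so it should be read as an assumption. The sketch also assumes that condition (i) holds, that is, aShow is not modified between its return and the nullify test.

```python
from itertools import product


def check_conflict(clause_operator, return_value):
    """Mirror of Algorithm 5: does the nullify clause contradict RVal?

    clause_operator is "!=" for (RVar != null) and "==" for (RVar == null);
    return_value is None for a null return, anything else for non-null.
    """
    if clause_operator == "!=" and return_value is None:
        return True
    if clause_operator == "==" and return_value is not None:
        return True
    return False


# Nullify clause on aShow per MIC path (assumed, inferred from Example 6.7).
mic_clauses = {
    "M3b": "!=", "M4b": "==", "M7b": "!=", "M8b": "==",
    "M11b": "!=", "M12b": "==", "M13b": "!=", "M14b": "==",
    "M15b": "!=", "M16b": "==", "M17b": "!=", "M18b": "==",
}
# Return value for aShow per IC1 path, as stated in Example 6.7.
ic1_returns = {"M1c": "ShowList[i]", "M2c": None}

infeasible_pairs = [
    (m1, m2)
    for m1, m2 in product(mic_clauses, ic1_returns)
    if check_conflict(mic_clauses[m1], ic1_returns[m2])
]

print(len(infeasible_pairs), "infeasible MM pairs")
print(infeasible_pairs)
```

Every MIC path in the product set contradicts exactly one of the two return values, so half of the MM pairs are marked infeasible under these assumptions, including the pair of M3b and M2c discussed above.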

Correctness of Detect Infeasibility NLC: Correctness of this algorithm depends on the correctness of the following two steps.

Step 1: Identify correlated integration colony pairs

Step 2: Determine infeasibility of MM pairs (pairs of MM paths)

We now prove the correctness of the individual steps by contradiction.

Lemma 6.2.4. Detect Infeasibility NLC correctly identifies correlated integration colony pairs.

Proof: Let us assume that Detect Infeasibility NLC identifies two arbitrary integration colonies IC1 and IC2 as correlated although IC2 has incoming and outgoing scope edges while IC1 does not have an edge between the origin node of that incoming scope edge and the target node of that outgoing scope edge. The algorithm adds (IC1, IC2) to LIST only when IC1 has such an edge, and as per the definition of a correlated integration colony pair, IC1 and IC2 cannot be correlated otherwise. This contradicts our assumption.

Lemma 6.2.5. Detect Infeasibility NLC correctly determines infeasibility of MM pairs.

Proof: Let us assume that Detect Infeasibility NLC reports an arbitrary MM pair (M1, M2) belonging to a correlated integration colony pair as infeasible although M1 modifies the return variable or has a nullify test predicate on the return variable which is satisfiable with the value returned from M2. The algorithm marks a pair as infeasible only when the return variable is not modified between the point of return and the nullify test and CheckConflict reports a conflict. Moreover, as per the infeasibility property of the NLC pattern, such an MM pair (M1, M2) cannot be infeasible. This contradicts our assumption.

From Lemmas 6.2.4 and 6.2.5, we can infer that Detect Infeasibility NLC is correct.

Discussion: Let us now examine whether modification of the object reference under the nullify test of an NLC pattern has any effect on our infeasibility detection technique. Figure 6.5 illustrates two possible cases of such modification. If the object reference under consideration is overwritten with a null value (Fig. 6.5(a)), its null value check becomes void and, as a result, unreachable code arises. On the other hand, if the object reference is overwritten with a non-null value (Fig. 6.5(b)), then the code corresponding to the return of the original object reference has no effect and hence acts as dead code. These two undesirable situations are not supposed to occur in meaningful code (see Fig. 6.5). In these situations, the infeasibility decision made by our algorithm for the NLC pattern would not be appropriate, because the object reference checked for null is not the same reference that was actually returned, which violates the infeasibility conditions of the NLC pattern.

To prevent such undesirable situations from occurring, we suggest a code review prior to applying our technique, in order to discard any modifications made to the object reference between the point of its return and the nullify test. This also helps to avoid the unreachable or dead code arising from such modifications.

[Figure 6.5: Two code scenarios, each showing a caller method c1:m1() and a callee method c2:m2(). In both scenarios, c2:m2() returns List[i] if found == true and null otherwise, and c1:m1() assigns rVar = o2.m2() and later performs the nullify test if (rVar != null) { ... } else { ... }. Between the assignment and the nullify test, c1:m1() reassigns rVar. (a) Scenario 1: reference variable ‘rVar’ is modified with null value (rVar = null), giving rise to unreachable code. (b) Scenario 2: reference variable ‘rVar’ is modified with some other reference (rVar = aReference), giving rise to dead code.]

Figure 6.5: Two scenarios with modification of the object reference under nullify test of NLC pattern.

6.3 Experimental results

In this section, we discuss our experimental results. This is followed by an analysis of the results.

6.3.1 Objectives

The objectives of our experiments are as follows.

i) Investigate the extent to which identified MUX and NLC interaction patterns make MM paths, MM pairs, and scenarios infeasible.

ii) Investigate the test effort savings achieved by excluding infeasible paths.

iii) Investigate whether the locations of interaction patterns influence the number of infeasible paths.

iv) Compare the computation overhead in detecting infeasible paths using SIG and control flow graph.

6.3.2 Subject programs

Our approach aims to detect infeasible message paths in design-level UML sequence diagrams. For our experiments, we use the sequence diagrams of RAS and AMS. The design specifications of RAS and AMS have been discussed in Section 4.2.1 of Chapter 4 and Section 5.3.1 of Chapter 5, respectively.

To mitigate the subjective bias caused by the use of two systems of our own, we choose three open source applications, namely Crimson (11.4 KLOC), Soot (274.5 KLOC), and OpenGTS (139.4 KLOC). For these, we apply reverse engineering techniques to obtain their designs from the code. The purpose of using reverse-engineered sequence diagrams for Crimson, Soot, and OpenGTS is to find evidence of our infeasibility patterns in real-life software applications. We briefly discuss their functionalities below.

Crimson: Crimson is a Java XML parser which supports XML 1.0 [82].

Soot: Soot is an optimization framework for Java bytecode [83]. Soot consists of three intermediate representations, namely Baf (a streamlined representation of bytecode), Jimple (a typed 3-address intermediate representation), and Grimp (an aggregated version of Jimple suitable for decompilation and code inspection) [84]. It supports transformations between these intermediate representations as well as optimization techniques on these representations.

OpenGTS: OpenGTS is an open source project designed to provide web-based GPS tracking services for vehicles [85]. OpenGTS supports the following features: (1) integration with mapping service providers such as Google Maps and Microsoft Virtual Earth; (2) customizable reports to show historical data for a specific vehicle; (3) customizable geofenced areas (geozones) to provide arrival/departure events on reports, etc.

The average cyclomatic complexity, LOC, and number of class methods of the different packages of Crimson, Soot, and OpenGTS are given in Table 6.4.

6.3.3 Effect of MUX and NLC interaction patterns

In this subsection, we analyze the effect of MUX and NLC interaction patterns on infeasibility of MM paths, MM pairs, and scenarios. First, we discuss our experiments


Table 6.4: The characteristics of the open source software.

Sys.     Package                                         Avg. cyclomatic  LOC    Number of
                                                         complexity              methods
Crimson  org.apache.crimson.parser                       5.31             4316   220
Crimson  org.apache.crimson.tree                         2.41             3766   385
Crimson  org.apache.crimson.util                         9.44             467    23
Crimson  org.xml.sax.helpers                             1.76             1711   183
Soot     soot                                            2.09             9604   1049
Soot     soot.coffi                                      3.28             9349   363
Soot     soot.dava.toolkits.base.AST.structuredAnalysis  3.52             2712   190
Soot     soot.dava.toolkits.base.AST.transformations     6.03             6598   207
Soot     soot.jimple.toolkits.pointer                    2.72             2213   208
Soot     soot.jimple.toolkits.thread.mhp                 3.96             4235   194
Soot     soot.jimple.toolkits.thread.synchronization     7.08             3968   97
Soot     soot.jimple.toolkits.typing                     5.25             6909   240
Soot     soot.javaToJimple                               3.58             6489   375
Soot     soot.dava                                       3.87             1789   74
Soot     soot.dava.toolkits.base.finders                 6.04             2255   81
Soot     soot.jimple.toolkits.annotation.arraycheck      4.63             3618   149
Soot     soot.jimple.toolkits.annotation.nullcheck       3.47             1392   80
Soot     soot.toolkits.astmetrics                        2.74             696    50
Soot     soot.jimple.spark.builder                       1.92             466    43
OpenGTS  org.opengts.db                                  2.90             17881  1262
OpenGTS  org.opengts.db.dmtp                             3.64             2904   140
OpenGTS  org.opengts.db.tables                           2.48             23978  2124
OpenGTS  org.opengts.tools                               11.75            2392   38
OpenGTS  org.opengts.geocoder                            1.83             849    93
OpenGTS  org.opengts.war.track                           6.52             2338   65
OpenGTS  org.opengts.util                                2.62             29590  2706
OpenGTS  org.opengts.servers.gtsdmtp                     2.54             827    57
OpenGTS  org.opengts.war.track.page                      6.15             11092  272
OpenGTS  org.opengts.war.events                          9.59             461    10
OpenGTS  org.opengts.war.report.field                    2.47             5497   463

with RAS and AMS. Subsequently, we investigate the effects of MUX and NLC patterns in three open-source applications.

RAS and AMS: We count the number of MUX and NLC patterns found in the different sequence diagrams of RAS and AMS, as shown in Table 6.5. From this table, we can observe that almost all sequence diagrams (except Generate Statistics of RAS and Generate Show Statistics of AMS) contain one or more NLC patterns, whereas seven sequence diagrams have one MUX pattern each.

Following our approach, we build SIG models for all design level sequence diagrams of RAS and AMS, identify their integration colonies, and apply our algorithm Detect Infeasibility MUX on the individual integration colonies to detect infeasible MM paths. We then measure how many of the total MM paths of a SIG are infeasible, as shown in Table 6.6. From Table 6.6, we can observe that 14% to 80% of the MM paths of different sequence diagrams are infeasible due to MUX pattern(s). Now, we examine whether there exists some correlation between the cyclomatic complexity (i.e. total number of MM paths) of the integration colony containing MUX pattern(s) and % infeasible MM paths. Since their distributions are not known, we

Table 6.5: Number of MUX and NLC patterns in the sequence diagrams for use cases of RAS and AMS.

System  Use case                  #Integration colonies (IC)  MUX patterns  NLC patterns
RAS     Deliver Order             3                           1             2
RAS     Manage Item               3                           1             2
RAS     Process Order             4                           1             3
RAS     Generate Bill             2                           0             1
RAS     Generate Statistics       1                           1             0
RAS     Pay Bill                  3                           0             1
AMS     Book Ticket               2                           0             1
AMS     Cancel Ticket             2                           0             1
AMS     Compute Sale Commission   3                           0             1
AMS     Pay Commission            2                           1             1
AMS     Manage Show               3                           1             2
AMS     Generate Show Statistics  4                           1             0


Table 6.6: Infeasibility of MM path, MM pairs, scenario paths.

                                  MM path                MM pair                Scenario
Sys.  Use case                    Total  Infeas.  %      Total  Infeas.  %      Total  Infeas.  %
                                  (A)    (B)      (C)    (D)    (E)      (F)    (G)    (H)      (I)
RAS   Deliver Order               17     4        23.52  30     19       63.33  30     21       70
RAS   Manage Item                 22     12       54.54  48     44       91.66  50     44       88
RAS   Process Order               105    84       80.00  534    507      94.94  678    663      97.78
RAS   Generate Bill               6      0        0      8      4        50     8      4        50
RAS   Generate Statistics         8      4        50     0      0        0      8      4        50
RAS   Pay Bill                    6      0        0      8      3        37.5   6      3        50
AMS   Book Ticket                 5      0        0      6      3        50     6      3        50
AMS   Cancel Ticket               5      0        0      5      2        40     4      2        50
AMS   Compute Sale Commission     6      0        0      6      2        33.33  6      3        50
AMS   Pay Commission              7      1        14.28  10     6        60     10     6        60
AMS   Manage Show                 22     12       54.54  48     44       91.66  50     44       88
AMS   Generate Show Statistics    13     4        30.76  20     15       75     18     12       66.67

Infeas.: number of infeasible paths, pairs, or scenarios; %: percentage infeasible.

apply Spearman’s non-parametric method [86] to compute the individual ranks of both cyclomatic complexity and % infeasible MM paths (see Table 6.7), which are used to determine the correlation coefficient (r) as per the following equation, where d is the difference between the two ranks of a use case and n is the number of use cases.

\[ r = 1 - \frac{6 \sum d^2}{n \times (n^2 - 1)} = 0.86 \quad (n = 7) \tag{6.1} \]
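As an illustration of Eq. 6.1, the following Python sketch recomputes r from the values listed in Table 6.7. It is only a sketch under stated assumptions, not the tooling used for the experiments: the function names average_ranks and spearman_r are hypothetical, ranks are assigned in descending order, and tied values are assumed to share the mean of their positions, which is consistent with the ranks reported in Table 6.7.

```python
def average_ranks(values):
    """Rank values in descending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman's coefficient via the rank-difference formula of Eq. 6.1."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((b - a) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Columns (A) and (C) of Table 6.7, in row order.
complexity = [17, 22, 105, 8, 7, 22, 13]
pct_infeasible = [23.52, 54.54, 80.00, 50, 14.28, 54.54, 30.76]

r = spearman_r(complexity, pct_infeasible)
print(round(r, 2))
```

With these inputs, only Deliver Order and Generate Statistics have non-zero rank differences, so the sum of d^2 is 8 and r evaluates to about 0.86, in agreement with Eq. 6.1.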

The high value of r indicates a strong positive correlation between them at the 0.025% significance level [87]. From this, we can infer that the occurrence of MUX interaction pattern(s) in an integration colony with higher cyclomatic complexity causes a higher percentage of infeasible MM paths.

Next, we apply our algorithm Detect Infeasibility NLC on pairs of integration colonies of individual SIGs to detect infeasible MM pairs, that is, to check whether the concatenations of


Table 6.7: Correlation of infeasible MM paths with cyclomatic complexity.

Use case                  Cyclomatic      % infeasible   Rank of A  Rank of C  d = CR − AR  d^2
                          complexity (A)  MM paths (C)   (AR)       (CR)
Deliver Order             17              23.52          4          6          2            4
Manage Item               22              54.54          2.5        2.5        0            0
Process Order             105             80.00          1          1          0            0
Generate Statistics       8               50             6          4          −2           4
Pay Commission            7               14.28          7          7          0            0
Manage Show               22              54.54          2.5        2.5        0            0
Generate Show Statistics  13              30.76          5          5          0            0

pairs of MM paths are infeasible or not. We also take into account the other infeasible MM pairs, for which at least one MM path is infeasible due to MUX pattern(s). The total MM pairs and their infeasibility percentages for all SIGs are reported in Table 6.6.

From Table 6.6, we can find that the graph models containing MM pairs have 33% to 94% infeasible MM pairs. To identify potential dependent factors of % infeasible MM pairs, we compute Spearman’s correlations [86] (1) between % infeasible MM paths (C) and % infeasible MM pairs (F), and (2) between total MM pairs (D) and % infeasible MM pairs (F). For them, we compute their individual ranks and obtain the correlation coefficient r as 0.96 at the 0.005% significance level for the first pair (C and F) and 0.92 at the 0.005% significance level for the second pair (D and F) [87] (see Table 6.8). This indicates a strong positive correlation for both pairs. From this, we can conclude that (1) a higher percentage of infeasible MM paths in a graph model induces a higher percentage of infeasible MM pairs, and (2) the occurrence of an NLC/MUX interaction pattern in a graph model with a higher number of MM pairs causes a higher percentage of infeasible MM pairs.

Next, we identify the infeasible scenarios after checking whether they contain some infeasible MM path or MM pair. From Table 6.6, we can observe that different graph

Table 6.8: Correlation of infeasible MM pairs with their dependent factors.

Use case                  C      D    F      CR   DR    FR    d1    d1^2  d2    d2^2
Deliver Order             23.52  30   63.33  5    4     5     0     0     1     1
Manage Item               54.54  48   91.66  2.5  2.5   2.5   0     0     0     0
Process Order             80.00  534  94.94  1    1     1     0     0     0     0
Generate Bill             0      8    50.00  9    7.5   7.5   −1.5  2.25  0     0
Pay Bill                  0      8    37.50  9    7.5   10    1     1     2.5   6.25
Book Ticket               0      6    50.00  9    9.5   7.5   −1.5  2.25  −2    4
Cancel Ticket             0      5    40.00  9    11    9     0     0     −2    4
Compute Sale Commission   0      6    33.33  9    9.5   11    2     4     1.5   2.25
Pay Commission            14.28  10   60.00  6    6     6     0     0     0     0
Manage Show               54.54  48   91.66  2.5  2.5   2.5   0     0     0     0
Generate Show Statistics  30.76  20   75     4    5     4     0     0     −1    1

C: % infeasible MM paths; D: total MM pairs; F: % infeasible MM pairs; CR, DR, FR: ranks of C, D, F; d1 = FR − CR; d2 = FR − DR.

models in RAS and AMS have 50% to 97% infeasible scenarios. To identify potential dependent factors of the percentage of infeasible scenarios, we determine Spearman’s correlations [86] (1) between % infeasible MM paths (C) and % infeasible scenarios (I), and (2) between % infeasible MM pairs (F) and % infeasible scenarios (I) (see Table 6.9). The values of the correlation coefficient for these two pairs are 0.88 and 0.94, respectively, both at the 0.001% significance level [87]. This implies that a graph model with a higher percentage of infeasible MM paths or MM pairs results in a higher percentage of infeasible scenarios.

Open source applications: In our experimentation, we have used reverse engineered sequence diagrams of Crimson, Soot, and OpenGTS. Note that the SIG of a reverse engineered sequence diagram contains the artifacts belonging to a single method and thus is equivalent to an integration colony. For this reason, we apply the algorithm for the MUX pattern on the individual SIGs of the reverse engineered sequence


Table 6.9: Correlation of infeasible scenarios with their dependent factors.

Use case                  C      F      I      CR   FR   IR   d3    d3^2   d4    d4^2
Deliver Order             23.52  63.33  70     6    5    4    −2    4      −1    1
Manage Item               54.54  91.66  88     2.5  2.5  2.5  0     0      0     0
Process Order             80.00  94.94  97.78  1    1    1    0     0      0     0
Generate Bill             0      50.00  50     10   7.5  9.5  −0.5  0.25   2     4
Generate Statistics       50.00  0      50     4    12   9.5  5.5   30.25  −2.5  6.25
Pay Bill                  0      37.50  50     10   10   9.5  −0.5  0.25   −0.5  0.25
Book Ticket               0      50.00  50     10   7.5  9.5  −0.5  0.25   2     4
Cancel Ticket             0      40.00  50     10   9    9.5  −0.5  0.25   0.5   0.25
Compute Sale Commission   0      33.33  50     10   11   9.5  −0.5  0.25   −1.5  2.25
Pay Commission            14.28  60.00  60     7    6    6    −1    1      0     0
Manage Show               54.54  91.66  88     2.5  2.5  2.5  0     0      0     0
Generate Show Statistics  30.76  75     66.67  5    4    5    0     0      1     1

C: % infeasible MM paths; F: % infeasible MM pairs; I: % infeasible scenarios; CR, FR, IR: ranks of C, F, I; d3 = IR − CR; d4 = IR − FR.

diagrams to determine how many of the total number of MM paths of an SIG are infeasible. Table 6.10 shows the number of MM paths in the different packages used for our experiments and the number of MUX patterns found in the respective packages. From Table 6.10, we can observe that the MUX patterns cause 30% infeasible MM paths (72 out of 236) in Crimson, 42% (1174 out of 2773) in Soot, and 26% (540 out of 2044) in OpenGTS.

For the NLC pattern, we identify the pairs of SIGs where one SIG contains the nullify test of a return variable and the other SIG contains both the null and non-null return values for that return variable. Applying our algorithm for the NLC pattern to such pairs of SIG models, we measure how many of their total MM pairs


Table 6.10: Infeasible paths of open source applications for MUX patterns.

Sys.     Source package                              # Instances  # MM paths  # Infeasible MM paths
Crimson  org.apache.crimson.parser                   4            64          29
Crimson  org.apache.crimson.tree                     4            172         43
Crimson  Total                                                    236         72 (30.50%)
Soot     soot.dava.toolkits.base.finders             1            810         324
Soot     soot.javaToJimple                           10           291         222
Soot     soot.dava                                   2            736         296
Soot     soot.jimple.toolkits.thread.mhp             1            24          10
Soot     soot.jimple.spark.builder                   1            40          18
Soot     soot.jimple.toolkits.annotation.arraycheck  1            440         160
Soot     soot.jimple.toolkits.annotation.nullcheck   1            432         144
Soot     Total                                                    2773        1174 (42.33%)
OpenGTS  org.opengts.db                              2            1928        482
OpenGTS  org.opengts.war.tools                       1            60          20
OpenGTS  org.opengts.dbtools                         1            56          38
OpenGTS  Total                                                    2044        540 (26.41%)

are infeasible. We have identified 47 instances of NLC patterns in Crimson, 166 in OpenGTS, and 132 in Soot, as shown package-wise in Table 6.11. From this table, we can observe that NLC patterns cause 55% infeasible MM pairs (630 out of 1145) in Crimson, 49% (3656 out of 7360) in Soot, and 52% (1047 out of 1982) in OpenGTS.

6.3.4 Influence of infeasible paths on test effort estimation

We now determine the test effort savings that can be achieved by excluding infeasible paths. For this, we apply Almeida et al.’s approach for test effort estimation based on use case points [88], as follows.

a) Estimating actor’s weight: In this step, a weight is assigned to an actor for each use case with which the actor interacts. An actor (which can be a human or an external system) belongs to one of three possible types: simple, medium, and complex [88].


Table 6.11: Infeasible paths of open source applications based on NLC patterns.

Sys.     Source package                                  Destination package                             # Instances  # MM pairs  # Infeasible MM pairs
Crimson  org.apache.crimson.parser                       org.apache.crimson.parser                       20           894         496
Crimson  org.apache.crimson.tree                         org.apache.crimson.tree                         19           189         103
Crimson  org.apache.crimson.util                         org.apache.crimson.parser                       1            8           4
Crimson                                                  org.apache.crimson.tree                         3            22          11
Crimson  org.xml.sax.helpers                             org.apache.crimson.parser                       2            10          5
Crimson                                                  org.xml.sax.helpers                             2            22          11
Crimson  Total                                                                                                        1145        630 (55.02%)
Soot     soot                                            soot                                            19           245         135
Soot                                                     soot.javaToJimple                               1            9           5
Soot     soot.coffi                                      soot.coffi                                      3            69          37
Soot     soot.dava.toolkits.base.AST.structuredAnalysis  soot.dava.toolkits.base.AST.structuredAnalysis  11           1426        656
Soot                                                     soot.dava.toolkits.base.AST.transformations     13           249         116
Soot     soot.dava.toolkits.base.AST.transformations     soot.dava.toolkits.base.AST.transformations     67           5117        2585
Soot     soot.jimple.toolkits.pointer                    soot.jimple.toolkits.pointer                    8            77          35
Soot     soot.jimple.toolkits.thread.mhp                 soot.jimple.toolkits.thread.mhp                 2            44          27
Soot     soot.jimple.toolkits.thread.synchronization     soot.jimple.toolkits.thread.synchronization     5            46          22
Soot     soot.jimple.toolkits.typing                     soot.jimple.toolkits.typing                     2            54          26
Soot                                                     soot.jimple.toolkits.typing.fast                1            24          12
Soot     Total                                                                                                        7360        3656 (49.67%)
OpenGTS  org.opengts.db                                  org.opengts.tools                               4            44          19
OpenGTS                                                  org.opengts.db.tables                           4            60          26
OpenGTS                                                  org.opengts.db                                  28           200         99
OpenGTS                                                  org.opengts.geocoder                            2            24          10
OpenGTS                                                  org.opengts.war.*                               7            75          41
OpenGTS                                                  org.opengts.cellid                              1            4           2
OpenGTS                                                  org.opengts.util                                1            8           4
OpenGTS                                                  org.opengts.db.dmtp                             1            9           5
OpenGTS                                                  org.opengts.servers.template                    1            6           3
OpenGTS  org.opengts.db.dmtp                             org.opengts.war.report.dmtp                     2            30          16
OpenGTS                                                  org.opengts.servers.gtsdmtp                     2            43          24
OpenGTS                                                  org.opengts.db.dmtp                             3            33          16
OpenGTS                                                  org.opengts.war.track.page.*                    2            8           4
OpenGTS  org.opengts.db.tables                           org.opengts.db.tables                           52           675         368
OpenGTS                                                  org.opengts.db                                  13           121         66
OpenGTS                                                  org.opengts.tools                               1            12          7
OpenGTS                                                  org.opengts.servers.*                           8            138         70
OpenGTS                                                  org.opengts.db.dmtp                             4            30          16
OpenGTS                                                  org.opengts.war.*                               27           444         242
OpenGTS                                                  org.opengts.geocoder                            2            12          6
OpenGTS                                                  org.opengts.cellid                              1            6           3
OpenGTS  Total                                                                                                        1982        1047 (52.82%)

Table 6.12: Computation of UAW for RAS and AMS.

System  Actor    Number of use cases  Weight  Partial UAW  Total UAW
RAS     Manager  3                    1       3            7
RAS     Staff    4                    1       4
AMS     Manager  4                    1       4            6
AMS     Agent    2                    1       2

Simple actors interact with systems through a GUI and have the weight one. Medium actors interact through protocols and are assigned the weight two. Complex actors, on the other hand, interact through an API and have the weight three. Accordingly, we find the total weight of each actor and take the sum over all actors as the Unadjusted Actor Weight (UAW) [88]. For example, manager and staff interact with three and four use cases of RAS, respectively. Considering manager and staff as simple actors (i.e. weight is one), we find their UAW values as three and four, respectively, and hence the total UAW as seven. Similarly, we find the value of UAW for AMS as six (see Table 6.12).

b) Estimating use case’s weight: In this step, a weight is assigned to a use case depending on the types and complexities of its constituent scenarios. The scenarios (which correspond to the paths in sequence diagrams) of a use case are categorized into two types: normal and exceptional [88]. A scenario is considered exceptional if it sends a message to some boundary object showing incompleteness or unsuccessfulness of a task; otherwise, the scenario is treated as normal. A weight P_T is assigned to a use case according to the following expression

\[ P_T = N \times P_N + E \times P_E \]

where N and E are the numbers of normal and exceptional scenarios, and P_N and P_E are their weights [88]. The values of P_N and P_E depend on the cumulative effort required to carry out design, implementation, and testing of the scenarios, which is substantially higher for the normal scenarios than for the exceptional ones. Without loss of generality, the values of the parameters can be assumed as P_N = 1 and P_E = 0.5. That is,

\[ P_T = N + E \times 0.5 \]


Table 6.13: Computation of UUCW for RAS and AMS.

Sys.  Use case                  Normal scenarios (N)  Exceptional scenarios (E)  Total weight P_T
RAS   Deliver Order             8 [2]                 22 [7]                     19 [5.5]
RAS   Process Order             672 [12]              6 [3]                      675 [13.5]
RAS   Generate Statistics       7 [3]                 1 [1]                      7.5 [3.5]
RAS   Generate Bill             2 [1]                 6 [3]                      5 [2.5]
RAS   Pay Bill                  4 [2]                 2 [1]                      5 [2.5]
RAS   Manage Item               49 [5]                1 [1]                      49.5 [5.5]
RAS   Make Order                1 [1]                 0 [0]                      1 [1]
RAS   UUCW                                                                       762 [34]
AMS   Book Ticket               2 [1]                 4 [2]                      4 [2]
AMS   Cancel Ticket             2 [1]                 2 [1]                      3 [1.5]
AMS   Compute Sale Commission   4 [2]                 2 [1]                      5 [2.5]
AMS   Pay Commission            6 [2]                 4 [2]                      8 [3]
AMS   Manage Show               17 [3]                33 [3]                     33.5 [4.5]
AMS   Generate Show Statistics  17 [5]                1 [1]                      17.5 [5.5]
AMS   UUCW                                                                       71 [19]

Values in brackets consider feasible scenarios only.

The P_T values for all use cases (#UC) of a system are summed to obtain its Unadjusted Use Case Weights (UUCW) [88] by using the following expression

\[ UUCW = \sum_{i=1}^{\#UC} P_T \tag{6.2} \]

By applying this to our case studies, we identify the normal and exceptional scenarios of all use cases and find how many of those are feasible. This has been shown in Table 6.13. For example, the use case Process Order has 672 normal and 6 exceptional scenarios, out of which twelve normal and three exceptional scenarios are feasible. Considering all scenarios and feasible scenarios of all use cases, we obtain the UUCW values of RAS as 762 and 34, respectively. Similarly, we find the pair of UUCW values for AMS as 71 and 19 (see Table 6.13). We add the UAW and UUCW values of a system to obtain its Unadjusted Use

Case Points (UUCP) value [88]. The UUCP values of RAS and AMS are obtained as follows.

\[ UUCP_{RAS} = \begin{cases} 762 + 7 = 769 & \text{for all scenarios} \\ 34 + 7 = 41 & \text{for feasible scenarios} \end{cases} \]

\[ UUCP_{AMS} = \begin{cases} 71 + 6 = 77 & \text{for all scenarios} \\ 19 + 6 = 25 & \text{for feasible scenarios} \end{cases} \]

c) Estimation of use case points: In this step, the use case points are determined using UUCP along with technical and environmental factors, which may have a bearing on testing. These factors include testing tools, documented inputs, development environment, test environment, test-ware reuse, distributed systems, performance, security features, and complex interfacing. Each factor is assigned a value (v) in the range [0-5] according to its availability in the testing environment and a weight (w) in the interval [0-4] depending on its importance to the testing. The summation of the products of v and w over all factors is referred to as the technical complexity factor (TEF) [88]. Tables 6.14 and 6.15 show the values and weights of the nine technical and environmental factors influencing our test environment. From these, we can compute the value of TEF as 59. The value of UUCP is adjusted with TEF to obtain the Adjusted Use Case Points (AUCP) [88] as follows.

\[ AUCP = UUCP \times [0.65 + 0.01 \times TEF] \tag{6.3} \]

By using Eq. 6.3, we find a pair of AUCP values (AUCP_a, AUCP_f) of a system by considering all scenarios and all feasible scenarios, respectively. In case of RAS and AMS, the values of these pairs are as follows.

\[ AUCP_a = \begin{cases} 769 \times [0.65 + 0.01 \times TEF] = 953.56 & \text{for RAS} \\ 77 \times [0.65 + 0.01 \times TEF] = 95.48 & \text{for AMS} \end{cases} \tag{6.4} \]

\[ AUCP_f = \begin{cases} 41 \times [0.65 + 0.01 \times TEF] = 50.84 & \text{for RAS} \\ 25 \times [0.65 + 0.01 \times TEF] = 31 & \text{for AMS} \end{cases} \tag{6.5} \]


Table 6.14: The values of technical and environmental factors.

Factor  Description              Assigned value  Reason for choosing the value
F1      Testing tools            4               Testing tools are available for testing activity.
F2      Documented inputs        5               Software documentation is fully available to the testing team.
F3      Development environment  3               Information about development environment is partially available to the testing team.
F4      Test environment         3               Test environment is not fully defined.
F5      Test-ware reuse          0               Reuse of tools does not influence our test environment.
F6      Distributed system       0               Test environment is not configured to support distributed systems.
F7      Performance              1               Software specification does not explicitly mention about performance requirements.
F8      Security features        0               Test environment is not configured to support security features.
F9      Complex interfacing      2               Testing teams interact using simple interfaces.

d) Estimating savings in test effort with infeasibility detection: If the testing effort per use case point is K person-hours, then the total test effort (E) for AUCP use case points would be

\[ E = AUCP \times K \]

Let E_a and E_f be the test efforts required for the use case points AUCP_a and AUCP_f, respectively. Assuming the value of K as 1 person-hour, we find the savings in test effort (S) for RAS and AMS as follows, where 1 person-day = 8 person-hours.

\[ S = E_a - E_f \]

\[ S_{RAS} = (953.56 - 50.84) \text{ person-hour} = 112.84 \text{ person-day} \]

\[ S_{AMS} = (95.48 - 31) \text{ person-hour} = 8.06 \text{ person-day} \]

That is, we can save about 112 and 8 person-days of testing effort for RAS and AMS, respectively, after excluding infeasible paths. Now, we define the percentage savings in test


Table 6.15: Computation of TEF.

Factor  Assigned value  Weight  Extended value  Reason for choosing the value
F1      4               3       12              Automated test tools are necessary to use. In case of their unavailability, manual process can be followed.
F2      5               4       20              Documented inputs are necessary for deciding test coverage.
F3      3               4       12              Information regarding development environment is important as UML design specifications are used for testing.
F4      3               3       9               It is important to define test environment.
F5      0               0       0               Reuse of tools is not relevant in our context.
F6      0               0       0               There is no need to consider the factors associated with distributed systems.
F7      1               2       2               Performance is not so desirable.
F8      0               0       0               Security features are not required to consider in our case.
F9      2               2       4               Consideration of complex interfacing is not so important.
TEF                             59

effort (%S) as follows.

\[ \%S = \frac{E_a - E_f}{E_a} \times 100\% \]

We now examine the average test effort savings per use case for RAS and AMS. For this, we compute the values of UUCP_a (with all paths) and UUCP_f (with feasible paths) of the individual use cases and thus obtain the average value of %S as 54.34% (see Table 6.16). This implies that, with prior detection of infeasible paths, we can save on average 54.34% of the test effort per use case for RAS and AMS. This saving is irrespective of the test environment, because the factor [0.65 + 0.01 × TEF] and K cancel out in %S.
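The steps a) to d) above can be traced with a short Python sketch. It is a hedged illustration rather than the tooling behind the reported numbers: the data are copied from Tables 6.12 to 6.15, and all identifiers (SCENARIOS, uucw, aucp, savings_person_days) are hypothetical names introduced here.

```python
# Scenario counts per use case from Table 6.13:
# ((N, E) for all scenarios, (N, E) for feasible scenarios only).
SCENARIOS = {
    "RAS": {
        "Deliver Order": ((8, 22), (2, 7)),
        "Process Order": ((672, 6), (12, 3)),
        "Generate Statistics": ((7, 1), (3, 1)),
        "Generate Bill": ((2, 6), (1, 3)),
        "Pay Bill": ((4, 2), (2, 1)),
        "Manage Item": ((49, 1), (5, 1)),
        "Make Order": ((1, 0), (1, 0)),
    },
    "AMS": {
        "Book Ticket": ((2, 4), (1, 2)),
        "Cancel Ticket": ((2, 2), (1, 1)),
        "Compute Sale Commission": ((4, 2), (2, 1)),
        "Pay Commission": ((6, 4), (2, 2)),
        "Manage Show": ((17, 33), (3, 3)),
        "Generate Show Statistics": ((17, 1), (5, 1)),
    },
}
UAW = {"RAS": 7, "AMS": 6}  # unadjusted actor weights, Table 6.12
# (assigned value v, weight w) of factors F1..F9, Tables 6.14 and 6.15.
FACTORS = [(4, 3), (5, 4), (3, 4), (3, 3), (0, 0), (0, 0), (1, 2), (0, 0), (2, 2)]
P_N, P_E = 1.0, 0.5          # scenario weights assumed in step b)
K = 1.0                      # person-hours per use case point, step d)
HOURS_PER_PERSON_DAY = 8.0


def uucw(system, feasible):
    """Sum of P_T = N * P_N + E * P_E over the use cases of a system (Eq. 6.2)."""
    which = 1 if feasible else 0
    return sum(n * P_N + e * P_E
               for n, e in (counts[which] for counts in SCENARIOS[system].values()))


def aucp(system, feasible):
    """Adjusted use case points as in Eq. 6.3."""
    tef = sum(v * w for v, w in FACTORS)
    uucp = uucw(system, feasible) + UAW[system]
    return uucp * (0.65 + 0.01 * tef)


def savings_person_days(system):
    """Test effort saved by excluding infeasible scenarios, in person-days."""
    effort_all = aucp(system, False) * K
    effort_feasible = aucp(system, True) * K
    return (effort_all - effort_feasible) / HOURS_PER_PERSON_DAY


for system in ("RAS", "AMS"):
    print(system, uucw(system, False), uucw(system, True),
          round(savings_person_days(system), 2))
```

Under these inputs the sketch yields the UUCW pairs 762/34 for RAS and 71/19 for AMS, and savings of about 112.84 and 8.06 person-days, matching the values derived above.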

6.3.5 Influence of locations of interaction patterns

We now investigate whether the location of an interaction pattern has any influence on the number of infeasible paths arising out of the pattern. To identify the location of an interaction pattern, we need to know the position of the integration colony where


Table 6.16: Computation of % S for use cases of RAS and AMS.

Sys.  Use case                  UUCP_a             UUCP_f             %S
RAS   Deliver Order             19 + 1 = 20        5.5 + 1 = 6.5      67.5%
RAS   Process Order             675 + 1 = 676      13.5 + 1 = 14.5    97.85%
RAS   Generate Statistics       7.5 + 1 = 8.5      3.5 + 1 = 4.5      47.05%
RAS   Generate Bill             5 + 1 = 6          2.5 + 1 = 3.5      41.66%
RAS   Pay Bill                  5 + 1 = 6          2.5 + 1 = 3.5      41.66%
RAS   Manage Item               49.5 + 1 = 50.5    5.5 + 1 = 6.5      87.12%
RAS   Make Order                1 + 1 = 2          1 + 1 = 2          0%
AMS   Book Ticket               4 + 1 = 5          2 + 1 = 3          40%
AMS   Cancel Ticket             3 + 1 = 4          1.5 + 1 = 2.5      37.5%
AMS   Compute Sale Commission   5 + 1 = 6          2.5 + 1 = 3.5      41.66%
AMS   Pay Commission            8 + 1 = 9          3 + 1 = 4          55.55%
AMS   Manage Show               33.5 + 1 = 34.5    4.5 + 1 = 5.5      84.05%
AMS   Generate Show Statistics  17.5 + 1 = 18.5    5.5 + 1 = 6.5      64.86%
      Average % S                                                     54.34%

the interaction pattern occurs. For this, we consider two parameters called Path-In and Path-Out. For an SIG and one of its integration colonies (say IC), the parameter Path-In of IC is defined as the number of incoming paths from the start node of the SIG to the start node of IC, whereas the parameter Path-Out of IC is defined as the number of outgoing paths from the end node of IC to the end node of the SIG. The presence of some interaction pattern (MUX or NLC) in an integration colony causes infeasibility of at least one path in that integration colony. Therefore, the minimum number of paths that can be infeasible due to some interaction pattern in an integration colony is the product of its Path-In and Path-Out. In this regard, we may note that the positions of two integration colonies are interchangeable if they have the same value of the product of Path-In and Path-Out. An example of such a pair of integration colonies is IC1 and IC2 of Manage Show (see Fig. 6.3 and Fig. 6.4).

We now determine the product of Path-In and Path-Out of the integration colonies of all sequence diagrams and the number of infeasible MM pairs due to some interaction pattern in those integration colonies, as shown in Table 6.17. Note that we have not included the main integration colonies of the sequence diagrams, as they represent core

integral parts of the graph models and hence are not subject to restructuring. From Table 6.17, we can observe the following. First, a higher product of Path-In and Path-Out of an integration colony implies a larger number of infeasible MM pairs. Second, integration colonies that belong to the same SIG and have equal product values contain the same number of infeasible MM pairs. The examples of such integration colonies

Table 6.17: Effect of the locations of integration colonies.

Use case                  IC    Path-in  Path-out  A    B    AR    BR   d     d^2
Deliver Order             IC1   1        15        15   17   8     8    0     0
Deliver Order             IC2   2        2         4    2    13.5  17   3.5   12.25
Manage Item               IC3   2        10        20   22   5.5   5.5  0     0
Manage Item               IC4   10       2         20   22   5.5   5.5  0     0
Process Order             IC5   1        339       339  183  1     1    0     0
Process Order             IC6   2        156       312  162  2.5   2.5  0     0
Process Order             IC7   26       12        312  162  2.5   2.5  0     0
Generate Bill             IC8   1        4         4    4    13.5  12   -1.5  2.25
Pay Bill                  IC9   1        3         3    3    16    14   -2    4
Book Ticket               IC10  1        3         3    3    16    14   -2    4
Cancel Ticket             IC11  1        2         2    2    18    17   -1    1
Compute Sale Commission   IC12  1        3         3    2    16    17   1     1
Pay Commission            IC13  1        5         5    6    12    10   -2    4
Manage Show               IC14  2        10        20   22   5.5   5.5  0     0
Manage Show               IC15  10       2         20   22   5.5   5.5  0     0
Generate Show Statistics  IC16  1        8         8    3    9     14   5     25
Generate Show Statistics  IC17  2        3         6    6    10.5  10   -0.5  0.25
Generate Show Statistics  IC18  6        1         6    6    10.5  10   -0.5  0.25

A: Path-in × Path-out; B: infeasible MM pairs; AR, BR: ranks of A, B; d = BR − AR.

are IC3 and IC4 of Manage Item, IC6 and IC7 of Process Order, etc.

We now examine the correlation between the product of Path-In and Path-Out of an integration colony and the number of affected infeasible MM pairs. For this, we apply Spearman’s correlation [86] and find the value of the correlation coefficient to be 0.95. This implies a strong positive correlation between them. From this statistical observation, we can conclude that if we can scale down the value of the product of Path-In and Path-Out of an integration colony, then the number of infeasible paths can be reduced. One possible way to achieve this is to restructure the corresponding sequence diagram, that is, to move its integration colony downward/upward without affecting the overall behavior.
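The Path-In and Path-Out parameters discussed in this subsection amount to counting paths between two nodes of a graph. The Python sketch below illustrates this on a toy graph. It assumes an acyclic graph given as an adjacency dictionary; the node names and the function count_paths are hypothetical and do not correspond to any SIG of RAS or AMS.

```python
from functools import lru_cache


def count_paths(graph, source, target):
    """Count distinct directed paths from source to target in an acyclic graph."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == target:
            return 1
        # Sum the path counts of all successors; nodes without successors add 0.
        return sum(paths_from(succ) for succ in graph.get(node, ()))
    return paths_from(source)


# Hypothetical graph: two ways into the colony, three ways out of it.
graph = {
    "sig_start": ["a", "b"],
    "a": ["ic_start"],
    "b": ["ic_start"],
    "ic_start": ["x", "y"],
    "x": ["ic_end"],
    "y": ["ic_end"],
    "ic_end": ["c", "d", "e"],
    "c": ["sig_end"],
    "d": ["sig_end"],
    "e": ["sig_end"],
}

path_in = count_paths(graph, "sig_start", "ic_start")
path_out = count_paths(graph, "ic_end", "sig_end")
print(path_in, path_out, path_in * path_out)
```

In this toy graph Path-In is 2 and Path-Out is 3, so a single infeasible path inside the colony makes at least 6 end-to-end paths infeasible. Moving the colony to a position with a smaller product would lower that minimum.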

6.3.6 Compare computation overhead with SIG and CFG

In this subsection, we compare the computation overhead of detecting infeasible paths using SIG and control flow graph (CFG). Let us consider an SIG with N scenarios and M MM paths, where a scenario contains on average P MM paths. To detect infeasible scenarios using SIG, we need to check their constituent MM paths with respect to the MUX pattern and MM pairs with respect to the NLC pattern. For this, it is necessary to estimate the number of MM pairs to be checked. Regarding this, we point out two observations: (1) each scenario contains one MM path from the main integration colony; (2) no other MM path can call an MM path of the main integration colony. From these, we can infer that a scenario with P MM paths can call at most (P − 1) MM paths, which means that at most (P − 1) MM pairs can be contained in that scenario. Therefore, the computation involved (T_SIG) in detecting infeasibility of N scenarios using SIG consists of checking M MM paths and M × α MM pairs, where α is a factor satisfying the inequality M × α < N × P. That is,

\[ T_{SIG} = M + M \times \alpha \]

Let us compute the overhead of detecting infeasible paths using an equivalent control flow graph with N scenarios. Unlike SIG, we need to check all scenarios of the control flow graph individually, where the starts and ends of the constituent MM paths are implicit. In case of the control flow graph, the computation overhead (T_CFG) is equivalent to verifying P MM paths and (P − 1) MM pairs for each scenario. In other words,

\[ T_{CFG} = [P + P - 1] \times N = [2P - 1] \times N \]

The ratio of T_CFG and T_SIG is as follows.

\[ Z = \frac{T_{CFG}}{T_{SIG}} = \frac{(2P - 1) \times N}{M + M \times \alpha} \]

Since P × N > M × α and N > M, we find

\[ (P + 1) \times N > M + M \times \alpha \]

Applying this, we reduce Z as

\[ Z > \frac{2P - 1}{P + 1} \tag{6.6} \]

For T_CFG to be higher than T_SIG (i.e. Z > 1), it suffices that P is greater than or equal to 2, since the right-hand side of Eq. 6.6 is then at least 1.

To validate the above theoretical reduction, we compute the values of the parameters N, M, M × α, P, and Z for all sequence diagrams of RAS and AMS, as shown in Table 6.18. From this table, we can see that the value of P is at least two for all sequence diagrams except Generate Statistics, whose scenarios contain one MM path each. This substantiates that detection of infeasible paths incurs higher overhead using control flow graph than using SIG when the scenarios on average contain more than one MM path.
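The ratio Z and the bound of Eq. 6.6 can be evaluated directly. The following Python sketch does so for three rows of Table 6.18. The function names overhead_ratio and lower_bound are hypothetical, and the inputs are the (N, M, M × α, P) values reported in that table.

```python
def overhead_ratio(n, m, m_alpha, p):
    """Z = T_CFG / T_SIG with T_CFG = (2P - 1) * N and T_SIG = M + M * alpha."""
    t_cfg = (2 * p - 1) * n
    t_sig = m + m_alpha
    return t_cfg / t_sig


def lower_bound(p):
    """Right-hand side of Eq. 6.6."""
    return (2 * p - 1) / (p + 1)


# (N, M, M * alpha, P) for selected sequence diagrams of Table 6.18.
rows = {
    "Deliver Order": (30, 17, 30, 2.26),
    "Process Order": (678, 105, 534, 3.84),
    "Generate Statistics": (8, 8, 0, 1),
}

for name, (n, m, m_alpha, p) in rows.items():
    z = overhead_ratio(n, m, m_alpha, p)
    print(name, round(z, 2), round(lower_bound(p), 2))
```

For these rows Z is roughly 2.25, 7.09, and 1.0, which agrees with Table 6.18 up to rounding, and each value lies above its bound from Eq. 6.6.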

6.3.7 Threats to validity

We now discuss the threats to the validity of our approach.

1. Construct threats: This threat concerns the selection of appropriate parameters for our experiments. The first parameter that we have measured is the percentage of infeasible paths, as used in the existing work [51, 35]. In addition to this, we have used another parameter (S) to determine the savings in test effort based on use case points [88, 89]. This measurement is justified for refinement of test effort estimation after taking infeasible paths into consideration. Further, we

Table 6.18: Computation of Z for the different sequence diagrams of RAS and AMS.

Sequence diagram          N    M    M × α  P     Z
Deliver Order             30   17   30     2.26  2.24
Manage Item               50   22   48     2.6   3.00
Process Order             678  105  534    3.84  7.08
Generate Bill             8    6    8      2     1.71
Generate Statistics       8    8    0      1     1
Pay Bill                  6    6    8      2.66  1.85
Book Ticket               6    5    6      2     1.63
Cancel Ticket             4    5    4      2.5   1.77
Compute Sale Commission   6    6    4      2.66  2.59
Pay Commission            10   7    10     2     1.76
Manage Show               50   22   48     2.6   3
Generate Show Statistics  18   13   20     2.83  2.54

have considered the product of two parameters, Path-In and Path-Out, while investigating the influence of the location of an interaction pattern on the number of infeasible paths arising out of that pattern. Our investigation is similar to the existing study on the effects of the number of predicates on path infeasibility [90].

2. Internal threats: This threat concerns the occurrence of errors in the construction of graph models and in the detection of infeasible MM paths (pairs). To avoid such errors, we have manually checked the graph models against the corresponding sequence diagrams, and the infeasible MM paths (pairs) detected by our proposed algorithms against the actual infeasible MM paths (pairs). To establish the relationships among different parameters, we have computed Spearman’s rank correlation coefficient [86], which is a standard practice followed in the literature.

3. External threats: This threat arises from a concern about the generality of the case studies used in our experiments as well as the contexts where MUX and NLC interaction patterns occur. To study the effects of these two patterns, we have used two systems (RAS & AMS) of our own. In this regard, please note that the code size of our systems (smaller compared to real-life commercial software) does not appear to be an issue as far as experiments with design level sequence

152 6.4. Comparison with related work

diagrams for use cases is concerned. This is because, complex use cases of large softwares are decomposed into a number of relatively simpler use cases following the standard decomposition principle [91]. Due to this decomposition, the design specifications of such large softwares may contain a larger number of sequence diagrams for the use cases than our systems (RAS & AMS), but abstraction levels of their individual sequence diagrams (which is measured by the product of number of objects and messages) are similar to our sequence diagrams. Fur- ther, in our empirical study we have used three open source applications Soot, OpenGT S, Crimson to mitigate the possibility of any subjective bias. Note that these applications have also been considered by other researchers [35]. However, the limited number of case studies used in our experiments should not be influencing factor when our two patterns occur in common situations. In this regard, we point out that MUX interaction pattern occurs while modeling the state-dependent object interactions. On the other hand, NLC interaction pattern occurs while modeling the object selection by means of delegation from one object to another. Since state-dependent behavior, object selection, dele- gation are the common characteristics of the object-oriented systems [92], the occurrence contexts of MUX and NLC interaction patterns are not specific to some case studies, rather they refer to a general situation.
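The Spearman rank correlation mentioned under internal threats can be illustrated with a minimal sketch. The function names `average_ranks` and `spearman_rho` are our own, and the sample values in the usage note are hypothetical; they are not measurements from the experiments. The sketch ranks both samples (tied values share the mean of their positions) and then takes the Pearson correlation of the two rank vectors.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if all values in one sample are identical.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For example, with the hypothetical samples `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]`, the squared rank differences sum to 4, so the coefficient is 1 - 6·4/(5·24) = 0.8.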

6.4 Comparison with related work

We now compare our technique first with the existing static techniques [35, 37] and then with dynamic techniques [63, 36, 38]. Static techniques are based on infeasibility patterns such as identical/complement-decision [35], mutually-exclusive-decision [35], check-then-do [35], looping-by-flag [35], and type infeasible call chain [37]. Table 6.19 lists the characteristics of these patterns, whether they are similar to our patterns, and whether they are applicable to Java code and/or sequence diagrams. From an analysis of Table 6.19, we make the following observations.

1) The complement-decision and mutually-exclusive-decision patterns are similar to our MUX pattern. Note that the conditions of the complement-decision pattern are complements of each other and hence are mutually exclusive as well.

2) Our NLC pattern is novel; no such infeasibility pattern has been reported in the literature.


Table 6.19: Comparison of characteristics of different infeasibility patterns.

| Pattern | Main features | Applicable in | Similar to MUX | Similar to NLC |
|---|---|---|---|---|
| Mutually exclusive decision | Among a set of actions, only one action is allowed to perform. Action refers to message sending, modification/definition of variables etc. | Java code + Sequence diagram | √ | × |
| Identical decision | Conditions are independent and identical to each other | Java code + Sequence diagram | × | × |
| Complement decision | Conditions are independent and complement to each other | Java code + Sequence diagram | √ | × |
| Check-then-do | Successful checking of some conditions activates other action to perform. The activation is enabled through setting a value to flag variable. | Java code | × | × |
| Looping-by-flag | Flag variable is used to terminate loop. For this, initialization and modification of flag variable is necessary | Java code | × | × |
| Type infeasible call chain | Multiple caller calls a method with a parameter of polymorphic type, which can be of different types (i.e. Object) | Java code | × | × |

3) The patterns check-then-do, looping-by-flag, and type infeasible call chain are not applicable to sequence diagrams. This is because check-then-do and looping-by-flag consist of code artifacts related to the assignment of a flag variable, which is not captured in sequence diagrams. For the type infeasible call chain pattern, on the other hand, multiple callers send different types of object references to the callee via a polymorphically typed parameter. To detect infeasibility for this pattern, we need information about the actual types of the object references, which cannot be known from sequence diagrams.

Our approach is comparable with the existing techniques based on mutually-exclusive-decision and complement-decision. If we apply the existing algorithms for these two patterns to our graph model (SIG, which is the input of our approach and also subsumes the control flow graph), then they may identify control blocks belonging to different method scopes as mutually exclusive when the control blocks are actually not mutually exclusive. This can happen when the control blocks are influenced by variables with the same name and their predicates cannot be satisfied together. In contrast, our approach avoids this situation by identifying integration colonies (subgraphs whose model elements are in the same method scope) of the SIG and processing them individually. Note that the SIG for a sequence diagram can contain model elements belonging to multiple method scopes.

Let us now compare the infeasibility detection capability of our patterns with that of the existing patterns. As per our empirical study, the existing mutually-exclusive-decision, complement-decision, and check-then-do patterns cause a considerable number of infeasible MM paths in Java programs. To investigate their effects in three applications, namely Crimson, SOOT, and OpenGTS, we have determined the number of infeasible MM paths for them, as shown in Table 6.20.

Table 6.20: Number of infeasible MM paths in three open-source applications.

| System | Total MM paths | Infeasible MM paths: complement-decision | Infeasible MM paths: mutually-exclusive-decision | Infeasible MM paths: check-then-do |
|---|---|---|---|---|
| SOOT | 4059 | 324 | 949 | 301 |
| OpenGTS | 2227 | 20 | 520 | 28 |
| Crimson | 488 | 30 | 42 | 89 |

Since our MUX pattern is similar to complement-decision and mutually-exclusive-decision, we can identify as many infeasible MM paths as are detected using these two existing patterns. Next, we find the percentage of the total infeasible MM paths for the three patterns that can be detected using the MUX pattern: 80% in SOOT (1273 out of 1574), 95% in OpenGTS (540 out of 568), and 44% in Crimson (72 out of 161) (see Table 6.20). However, our approach cannot detect the infeasible paths for the check-then-do pattern. This limitation is due to the use of model information, in which the constituent code artifact of the check-then-do pattern, that is, the assignment of the flag variable, is not captured.

Next, we compare the detection capability of the existing type infeasible call chain pattern and our NLC pattern with regard to infeasibility of MM pairs. To investigate their effects in SOOT, OpenGTS, and Crimson, we have determined the number of infeasible MM pairs, as shown in Table 6.21. We find the percentage of the total infeasible MM pairs for the two patterns that can be detected using the NLC pattern: 86% in SOOT (3656 out of 4230), 87% in OpenGTS (1047 out of 1195), and 100% in Crimson (630 out of 630). As we have already mentioned, infeasibility due to type infeasible call chain cannot be determined from sequence diagrams; therefore, consideration of this pattern is outside the scope of our investigation.

Table 6.21: Number of infeasible MM pairs in three open-source applications.

| System | Total MM pairs | Infeasible MM pairs: NLC | Infeasible MM pairs: type infeasible call chain |
|---|---|---|---|
| SOOT | 8300 | 3656 | 574 |
| OpenGTS | 2250 | 1047 | 148 |
| Crimson | 1145 | 630 | 0 |

Let us now compare our approach with the existing dynamic approaches [63, 36, 38]. Bueno et al. [63] use dynamic information to detect a lack of search progress over successive generations while finding test data for an intended path. Bueno et al.’s approach decides the potential infeasibility of any arbitrary path, whereas our static approach decides the infeasibility of those paths which satisfy certain infeasibility conditions. However, Bueno et al.’s approach incurs a large computation overhead due to its use of a genetic algorithm, whereas our approach incurs less overhead because it processes model information only. Ngo et al. [36] decide path infeasibility with respect to a few empirical properties of correlated conditional statements. Ngo et al. use program traces to check the validity of the empirical properties, which cannot be done with our static analysis. On the other hand, Gong et al. [38] use sample values of branch outcomes to determine branch correlations related to path infeasibility. Note that these techniques [36, 38] aim to detect infeasible paths arising from branch correlations that are identifiable at runtime. However, Ngo et al.’s and Gong et al.’s approaches depend on the test generation process, whereas our approach is independent of it.
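The detection shares quoted earlier in this section can be re-derived from the counts in Tables 6.20 and 6.21. The following is a minimal sketch; the dictionary layout and function names are our own. Truncating to a whole percentage is an assumption, chosen because it reproduces all six quoted values (rounding to the nearest integer would instead give 81%, 45%, and 88% for SOOT paths, Crimson paths, and OpenGTS pairs).

```python
# Counts copied from Table 6.20 (infeasible MM paths per existing pattern).
MM_PATHS = {
    "SOOT": {"complement": 324, "mutex": 949, "check_then_do": 301},
    "OpenGTS": {"complement": 20, "mutex": 520, "check_then_do": 28},
    "Crimson": {"complement": 30, "mutex": 42, "check_then_do": 89},
}

# Counts copied from Table 6.21 (infeasible MM pairs per pattern).
MM_PAIRS = {
    "SOOT": {"nlc": 3656, "type_infeasible": 574},
    "OpenGTS": {"nlc": 1047, "type_infeasible": 148},
    "Crimson": {"nlc": 630, "type_infeasible": 0},
}


def share(detected, total):
    """Whole-number percentage, truncated (assumption, see lead-in)."""
    return (100 * detected) // total


def mux_share(counts):
    """Share of the three-pattern total that the MUX-like patterns account for."""
    detected = counts["complement"] + counts["mutex"]
    return share(detected, detected + counts["check_then_do"])


def nlc_share(counts):
    """Share of the two-pattern total that the NLC pattern accounts for."""
    return share(counts["nlc"], counts["nlc"] + counts["type_infeasible"])


for system in MM_PATHS:
    print(system, mux_share(MM_PATHS[system]), nlc_share(MM_PAIRS[system]))
```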

6.5 Conclusion

In this chapter, we have proposed two algorithms to detect infeasible paths in UML sequence diagrams with respect to MUX and NLC patterns. The NLC pattern occurs when object selection is modeled through delegation from one object to another. On the other hand, the MUX pattern occurs while modeling state-dependent behavior of an object. For infeasibility detection, we have processed the SIG, which subsumes the control flow graph of a sequence diagram and additionally contains method scope information of interactions. Our experimental results show that the additional information in the SIG helps to reduce computation overhead for detecting infeasible paths.

Applying our algorithms for infeasibility detection to the sequence diagrams of RAS and AMS, we identify 14% to 80% infeasible MM paths, 33% to 94% infeasible MM pairs, and 50% to 97% infeasible scenarios. In the case of the three open-source applications Crimson, SOOT, and OpenGTS, we find 26% to 42% infeasible MM paths and 49% to 55% infeasible MM pairs.

Our empirical study shows that we can save on average 54% of the test effort per use case for RAS and AMS by excluding infeasible paths. This saving is realized by avoiding test data generation and test script preparation for the infeasible paths. Further, prior detection of infeasible test scenarios helps practitioners to prepare an effective test plan in the early phase of software development.

We observe from our experimental results that if we can scale down the value of the product of Path-In and Path-Out of an integration colony, then the number of infeasible paths can be reduced. One possible way to achieve this is by reconstructing the corresponding sequence diagram, that is, to move its integration colony downward or upward without affecting the overall behavior.
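The dependence of the Path-In × Path-Out product on the location of a colony can be illustrated with a small sketch. It rests on assumptions that are ours for illustration: the colony is collapsed to a single node of an acyclic graph, Path-In is taken as the number of distinct paths from the start node to that node, and Path-Out as the number of distinct paths from that node to the end node. The precise definitions are those given earlier in this chapter; the graphs and node names below are hypothetical.

```python
from functools import lru_cache


def count_paths(graph, source, target):
    """Number of distinct directed paths from source to target in a DAG."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == target:
            return 1
        return sum(walk(nxt) for nxt in graph.get(node, ()))
    return walk(source)


def path_in_out(graph, start, end, colony):
    """Return (Path-In, Path-Out, product) for a colony collapsed to one node."""
    path_in = count_paths(graph, start, colony)
    path_out = count_paths(graph, colony, end)
    return path_in, path_out, path_in * path_out


# Colony "k" sits after two alternative branches have merged.
merged = {"s": ["a", "b"], "a": ["k"], "b": ["k"],
          "k": ["c", "d", "e"], "c": ["t"], "d": ["t"], "e": ["t"]}

# Colony "k" sits inside one branch, before the merge point "m".
in_branch = {"s": ["a", "b"], "a": ["k"], "k": ["m"], "b": ["m"],
             "m": ["c", "d", "e"], "c": ["t"], "d": ["t"], "e": ["t"]}

print(path_in_out(merged, "s", "t", "k"))     # colony after the merge
print(path_in_out(in_branch, "s", "t", "k"))  # colony inside a branch
```

In the first graph every start-to-end path passes through the colony, whereas in the second only the paths of one branch do, so the product is smaller.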


Chapter 7

Conclusions and Future Research

In this thesis, we have presented a graph model called the Sequence Integration Graph (SIG) and have reported our investigations concerning three of its applications, namely code generation, determination of effective coverage of MM paths, and identification of infeasible paths. In this chapter, we first summarize our important research contributions and then discuss directions for future research.

7.1 Research contributions

We now examine our specific contributions against each research objective that we set out at the start.

Construction of graph model

We have proposed a novel graph model of a sequence diagram called the Sequence Integration Graph (SIG). In contrast to a conventional graph model (i.e. the control flow graph), the SIG subsumes control flow information and additionally contains method scope information of the interactions. The method scope information is captured in the SIG by means of scope edge pairs (representing the start and end of method scopes).

For constructing the SIG from the XMI of a sequence diagram, we use a SAX parser to extract the meta-information necessary for node information. Further, we determine precedence relations among model elements (messages, start and end of fragments) to map them into control flows of our graph model.

Processing the XMI of a UML sequence diagram is indeed a non-trivial task for large and complex applications, and none of the existing work [29, 65, 21, 20, 22] has reported it. We have presented an approach for processing the XMI of a sequence diagram in detail.
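The event-driven extraction step can be sketched as follows. This is a minimal illustration using Python's standard `xml.sax` module rather than the Java parser used in this work, and the element and attribute names (`message`, `fragment`, `sender`, `receiver`, `operator`) are hypothetical simplifications; real XMI exported by a modeling tool uses a more verbose schema. The handler records messages in document order together with the fragments that enclose them, which is the kind of information needed to derive precedence relations.

```python
import xml.sax


class MessageHandler(xml.sax.ContentHandler):
    """Collects messages in document order with their enclosing fragments."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self.fragment_stack = []  # operators of the currently open fragments

    def startElement(self, name, attrs):
        if name == "fragment":
            self.fragment_stack.append(attrs.get("operator", "?"))
        elif name == "message":
            self.messages.append({
                "name": attrs.get("name"),
                "sender": attrs.get("sender"),
                "receiver": attrs.get("receiver"),
                "fragments": tuple(self.fragment_stack),
            })

    def endElement(self, name):
        if name == "fragment":
            self.fragment_stack.pop()


def extract_messages(xmi_text):
    """Parse a simplified XMI-like document and return its messages."""
    handler = MessageHandler()
    xml.sax.parseString(xmi_text.encode("utf-8"), handler)
    return handler.messages


SAMPLE = """<interaction>
  <message name="login" sender="ui" receiver="ctrl"/>
  <fragment operator="alt">
    <message name="validate" sender="ctrl" receiver="db"/>
  </fragment>
</interaction>"""

for message in extract_messages(SAMPLE):
    print(message)
```

A streaming parser of this kind does not hold the whole document in memory, which is one reason it suits large XMI files.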

Code generation

We have proposed an approach to generate behavioral code (i.e. code inside class methods) from sequence diagrams. Our approach supports code generation for methods belonging to different classes, whereas the existing work [40, 30, 45] is limited to code generation for a single class method.

For code generation, we have applied a set of mapping rules to the SIG model to map its model elements into code artifacts inside different class methods, unlike the existing work [40, 30, 45], which applied mapping rules to meta-models. The advantage of using our SIG model is that it has explicit method scope information that helps us to easily identify the different class methods for which code has to be generated. On the other hand, it is difficult to identify such class methods from the meta-models.

Our empirical study has shown that approximately 48% of the lines of code for controller classes can be generated from design-level sequence diagrams, whereas for boundary and entity classes, the figures are 6% and 11%, respectively. Our results imply that code generation from sequence diagrams designed for use cases is more effective for controller classes than for entity and boundary classes.

Our code generation approach is most suitable in situations where sequence diagrams are designed to model the behavior of complex use cases [46, 47] and usually contain messages belonging to multiple class methods. In such situations, the applicability of the existing work [40, 30, 45] is restricted because their sequence diagrams are assumed to have messages belonging to a single class method.
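The role of method scope information in code generation can be illustrated with a small sketch. It is not the set of mapping rules used in this work: the message record layout (`scope_class`, `scope_method`, `receiver`, `operation`, `guard`) and the single rule "a message becomes a call statement inside the method whose scope encloses it" are assumptions made for illustration, and the emitted text is only Java-like.

```python
def generate_method_bodies(messages):
    """Group generated call statements by the (class, method) scope that sends them."""
    bodies = {}
    for msg in messages:
        scope = (msg["scope_class"], msg["scope_method"])
        call = f'{msg["receiver"]}.{msg["operation"]}();'
        if msg.get("guard"):
            # A guarded message is wrapped in a conditional statement.
            call = f'if ({msg["guard"]}) {{ {call} }}'
        bodies.setdefault(scope, []).append(call)
    return bodies


def render(bodies):
    """Emit one Java-like method skeleton per scope."""
    lines = []
    for (cls, method), statements in bodies.items():
        lines.append(f"// {cls}.{method}")
        lines.append(f"void {method}() {{")
        lines.extend(f"    {s}" for s in statements)
        lines.append("}")
    return "\n".join(lines)


# Hypothetical messages from one sequence diagram spanning two method scopes.
MESSAGES = [
    {"scope_class": "OrderController", "scope_method": "processOrder",
     "receiver": "inventory", "operation": "reserve", "guard": None},
    {"scope_class": "Inventory", "scope_method": "reserve",
     "receiver": "stock", "operation": "decrement", "guard": "available"},
    {"scope_class": "OrderController", "scope_method": "processOrder",
     "receiver": "billing", "operation": "charge", "guard": None},
]

print(render(generate_method_bodies(MESSAGES)))
```

Because each message carries its enclosing scope, statements for `OrderController.processOrder` and `Inventory.reserve` end up in separate method bodies even though they come from one diagram.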

Control of MM paths coverage

We have proposed an approach to determine the effective coverage of MM paths while generating a subset of message paths from sequence diagrams. For this, we first generate MM paths from the SIG and subsequently build an MM coverage model to capture their call relationships as well as priority information. The effective coverage of MM paths is decided by analyzing the MM coverage model and following two coverage criteria: All MM Paths (AMP) and Prioritized MM Paths (PMP). If the priority of MM paths is a concern, we use Prioritized MM Paths; otherwise, All MM Paths is sufficient.

To prioritize the MM paths, we have proposed a novel metric called AWPL (Average Weighted Path Length), where a higher AWPL value for an MM path implies that the MM path has a common subpath with higher length and density (the average number of paths covering the subpath). Its significance is that even a single fault in an MM path with a higher AWPL value would cause a larger number of failures during execution; hence, such MM paths should be covered more.

Our proposed approach is most suitable in situations where all message paths cannot be taken up for testing due to limited resources and budget constraints. This is particularly true when a message path has to be tested with multiple test data in order to expose the faults associated with state transitions, predicates, etc. In such situations, the existing work [18, 19, 20, 21, 22] is not effective, since it primarily uses the control flow graph of a sequence diagram and aims to cover all message paths.

To evaluate the efficacy of our approach, we have experimented with several sequence diagrams and have found that the MM coverage model can be used for selecting a subset of all message paths satisfying our proposed coverage criteria, whereas the control flow graph cannot be used for the same purpose even with prioritization of message paths. For satisfying our proposed coverage criteria, the number of message paths must be greater than the minimum limit. As the number of test cases with which a message path has to be tested is commensurate with its criticality and the test budget, the individual priority scores of the message paths can be indicative of their number of test cases.
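Selecting a subset of message paths that still covers every MM path can be sketched as a set-cover problem. The greedy heuristic below is a swapped-in textbook technique used only for illustration; it is not the selection procedure of this work and does not use the MM coverage model or AWPL priorities. The input format (each message path mapped to the set of MM paths it traverses) and all identifiers are hypothetical.

```python
def select_message_paths(coverage):
    """Greedily pick message paths until every MM path is covered.

    coverage maps a message-path id to the set of MM-path ids it traverses.
    Ties are broken in favour of the lexicographically smallest path id.
    """
    uncovered = set().union(*coverage.values()) if coverage else set()
    selected = []
    while uncovered:
        # Pick the path that covers the most still-uncovered MM paths.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical coverage data: three message paths over three MM paths.
COVERAGE = {
    "p1": {"mm1", "mm2"},
    "p2": {"mm3"},
    "p3": {"mm2", "mm3"},
}

print(select_message_paths(COVERAGE))
```

In this example two of the three message paths suffice to cover all MM paths, which is the kind of reduction an All MM Paths style criterion permits.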

Identification of infeasible paths

We have investigated infeasibility of message paths in sequence diagrams due to NLC and MUX interaction patterns. The NLC pattern occurs while modeling object selection through delegation from one object to another. On the other hand, the MUX pattern occurs while modeling state-dependent behavior of an object. Our MUX pattern is similar to complement-decision and mutually-exclusive-decision [35], whereas none of the existing patterns is similar to our NLC pattern. In fact, the NLC pattern is the first of its kind.

We have proposed two algorithms to detect infeasible paths with respect to MUX and NLC patterns. Applying our algorithms to sequence diagrams of a few selected case studies, we identify 14% to 80% infeasible MM paths, 33% to 94% infeasible MM pairs, and 50% to 97% infeasible scenarios. In the case of the three open-source applications Crimson, SOOT, and OpenGTS, we find 26% to 42% infeasible MM paths and 49% to 55% infeasible MM pairs.

Our empirical study has shown that we can save on average 54% of the test effort per use case for RAS and AMS after excluding infeasible paths. This saving is realized by avoiding test data generation and test script preparation for the infeasible paths.

For detecting infeasible paths, we use the UML model instead of program code, as done by the existing techniques [56, 61, 52, 54, 63, 55, 35, 37, 36]. Since our approach is UML model based, prior detection of infeasible test scenarios helps practitioners to prepare an effective test plan in the early phase of software development.
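The idea behind MUX-style infeasibility, and the reason for treating each method scope separately, can be illustrated with a deliberately simplified sketch. It is not the detection algorithm of this work. The assumptions are ours: every guard on a path is an equality test on a state variable, the variable is not reassigned between guards, and each guard is tagged with the method scope it belongs to. All class, variable, and value names are hypothetical.

```python
from collections import namedtuple

# scope: method scope the guard belongs to; the guard itself reads "var == value".
Guard = namedtuple("Guard", ["scope", "var", "value"])


def is_mux_infeasible(path_guards):
    """True if two guards in the same scope require different values of one variable."""
    required = {}
    for guard in path_guards:
        key = (guard.scope, guard.var)
        if key in required and required[key] != guard.value:
            return True
        required.setdefault(key, guard.value)
    return False


# Same scope, same variable, conflicting values: flagged as infeasible.
conflicting = [
    Guard("Account.withdraw", "state", "open"),
    Guard("Account.withdraw", "state", "closed"),
]

# Same variable name but different method scopes: not flagged, because the
# two guards may refer to different variables.
different_scopes = [
    Guard("Account.withdraw", "state", "open"),
    Guard("Card.authorize", "state", "closed"),
]

print(is_mux_infeasible(conflicting), is_mux_infeasible(different_scopes))
```

Keying the check on the scope as well as the variable name avoids reporting guards from different method scopes as mutually exclusive merely because their variables share a name.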

7.2 Directions for Future Research

There are a number of unexplored issues related to the research reported in this thesis that warrant further investigation. We mention here a few of those.

• In our research, we have used sequence diagrams designed for desktop applications. More interesting observations and insights may emerge from applying our research to concurrent and real-time systems. For this, messages along with timing specifications can be used.

• Our code generation approach is based on sequence diagrams. It is worth investigating the extent to which the code of class methods can be generated by considering other behavioral diagrams, such as statechart and activity diagrams.

• In our research, we have used coverage information of MM paths for selecting a subset of message paths in sequence diagrams. These MM paths and state information of their associated objects can be combined to support state-based testing for MM paths. This needs further research.

• For prioritization of MM paths, we have used the AWPL metric to measure the length and density of the subpath common among MM paths. However, other metrics based on the criticality of messages or the number of possible object states for message sending can also be considered.

• We have investigated detection of infeasible paths in sequence diagrams. It may be interesting to investigate infeasibility of paths in other UML diagrams such as statechart diagrams.


• Concurrent programming is becoming increasingly popular. In this context, defining a suitable coverage model and coverage criteria for testing concurrent executions of messages is necessary and can be a promising direction for future research.


Bibliography

[1] D. Graham, E. V. Veenendaal, I. Evans, and B. Rex. Foundation of Software Testing: ISTQB Certification. Cengage Learning, third edition, 2008. 1

[2] Tim Kitchens. Automating software development processes, 20 June 2013. http://www.developerdotstar.com/mag/articles/automate software process.html. 1.1

[3] D. M. Rafi, K. R. K. Moses, K. Petersen, and M. V. Mantyla. Benefits and limitations of automated software testing: Systematic literature review and practitioner survey. In Automation of Software Test (AST), 2012 7th International Workshop on, pages 36–42, 2012. 1.1

[4] Randall W. Rice. Surviving the top 10 challenges of software test automation, 20 June 2013. http://www.crosstalkonline.org/storage/issue-archives/2002/200205/200205Rice.pdf. 1.1

[5] B. Hailpern and P. Tarr. Model-driven development: The good, the bad, and the ugly. IBM Systems Journal, 45(3):451–461, 2006. 1.2

[6] MDA. http://www.omg.org/mda/, 2 July 2013. 1.2, 1.2

[7] Mike Whalen. Why we model: Using MBD effectively in critical domains, 20 June 2013. http://2013.icse-conferences.org/documents/publicity/MiSE-WS-Whalen-slides.pdf. 1.2, ??

[8] http://www.uml.org/, 3 Mar 2011. 1.2

[9] http://www.omg.org/technology/xml/, 2 November 2010. 1.2, 1.3

[10] http://www.itu.int/rec/t-rec-z.109/en, 12 April 2013. 1.2


[11] J. M. Spivey. The Z Notation. Prentice Hall International (UK) Ltd, 1992. 1.2

[12] http://www.omg.org/mof/, 20 May 2011. 1.2

[13] G. Booch, J. Rumbaugh, and I. Jacobson. Object-oriented Analysis and Design. Addison Wesley, 2002. 1.2, 5.3.1

[14] D. Pilone and N. Pitman. UML 2.0 in a Nutshell. O’Reilly, June 2005. 1.2

[15] N. Medvidovic, D. S. Rosenblum, D. F. Redmiles, and J. E. Robbins. Modeling software architectures in the unified modeling language. ACM Transactions on Software Engineering and Methodology, 11(1):2–57, 2002. 1.2

[16] B. Selic. UML 2: A model-driven development tool. IBM Systems Journal, 45(3):607–620, 2006. 1.2

[17] F. Fraikin and T. Leonhardt. SeDiTeC - testing based on sequence diagrams. In IEEE International Conference on Automated Software Engineering (ASE02), pages 261–266, 2002. 1.2, 2.2, 2.2

[18] O. Pilskalns, A. Andrews, S. Ghosh, and R. France. Rigorous testing by merging structural and behavioral UML representations. In International Conference on the Unified Modeling Language, pages 234–248, 2003. 1.2, 2.2, 2.2, 5.3.2, 5.4, 7.1

[19] M. Sarma, D. Kundu, and R. Mall. Automatic test case generation from UML sequence diagram. In Proceedings of the 15th International Conference on Advanced Computing and Communications (ADCOM, 2007), pages 60–67. IEEE Computer Society, Washington, DC, 2007. 1.2, 2.2, 2.2, 5.3.2, 5.4, 7.1

[20] A. Nayak and D. Samanta. Model-based test cases synthesis using UML interaction diagrams. ACM SIGSOFT Software Engineering Notes, 34(2):1–10, March 2009. 1.2, 1.3, 2.2, 2.2, 3.3, 3.3, 3.7, 5.3.2, 5.4, 7.1

[21] Trung T. D.-T., S. Ghosh, and R. B. France. A systematic approach to generate inputs to test UML design models. In Proceedings of the 17th International Symposium on Software Reliability Engineering, pages 95–104. IEEE Computer Society, Washington, DC, 2006. 1.2, 2.2, 2.2, 3.3, 3.3, 3.7, 5.3.2, 5.4, 7.1

[22] A. Bandyopadhyay and S. Ghosh. Test input generation using UML sequence and state machines models. In Proceedings of International Conference on Software Testing Verification and Validation, pages 121–130, 2009. 1.2, 1.3, 2.2, 2.2, 3.3, 3.3, 3.7, 5.3.2, 5.4, 7.1

[23] S. Ali, L. C. Briand, M. J. Rehman, H. Asghar, M. Z. Z. Iqbal, and A. Nadeem. A state-based approach to integration testing based on UML models. Information and Software Technology, 49(11-12):1087–1106, November 2007. 1.2, 2.2, 2.2, 5.4

[24] D. Kundu, M. Sarma, D. Samanta, and R. Mall. System testing for object-oriented systems with test case prioritization. Software Testing, Verification and Reliability, 19(4):297–333, December 2009. 1.2, 5.2.3, 1

[25] L. Briand, Y. Labiche, and Y. Liu. Combining UML sequence and state machine diagrams for data-flow based integration testing. In European conference on Modelling Foundations and Applications, pages 74–89. Springer-Verlag, 2012. 1.2, 2.2, 2.2

[26] M. Utting and B. Legeard. Practical Model-Based Testing: A Tools Approach. Morgan-Kaufmann, 2006. 1.2

[27] R. V. Binder. Testing Object-Oriented Systems Models, Patterns, and Tools. Addison Wesley, Reading, Massachusetts, October 1999. 1.2

[28] Leonard Gallagher, Jeff Offutt, and Anthony Cincotta. Integration testing of object-oriented components using finite state machines. Software Testing, Verification and Reliability, 16(4):215–266, December 2006. 1.2

[29] V. Garousi, L. C. Briand, and Y. Labiche. Control flow analysis of UML 2.0 sequence diagrams. Lecture Notes in Computer Science, 3748(1):160–174, February 2005. 1.3, 2.2, 3.3, 3.3, 3.7, 7.1

[30] M. Usman and A. Nadeem. Automatic generation of Java code from UML diagrams using UJECTOR. International Journal of Software Engineering and Its Applications, 3(2):21–38, April 2009. 1.3, 2.1, 2.1, 4.2.1, 4.3, 7.1

[31] A. Jakimi and M. El Koutbi. An object-oriented approach to UML scenarios engineering and code generation. International Journal of Computer Theory and Engineering, 1(1):35–41, April 2009. 1.3, 2.1, 2.1, 4.3

[32] A. Rountev, S. Kagan, and J. Sawin. Coverage criteria for testing of object interactions in sequence diagrams. In International Conference on Fundamental Approaches to Software Engineering (FASE’05), pages 289–304, 2005. 1.3, 2.2, 2.2, 5.4

[33] P. C. Jorgensen and C. Erickson. Object-oriented integration testing. Communications of the ACM, 37(9):30–38, 1994. 1.3, 5.1

[34] R. Zhao and L. Lin. An UML statechart diagram-based MM-path generation approach for object-oriented integration testing. International Journal of Applied Mathematics and Computer Sciences, 3(1):22–27, 2007. 1.3, 5.1

[35] M. N. Ngo and H. B. K. Tan. Detecting large number of infeasible paths through recognizing their patterns. In Proceedings of the 6th joint meeting of the European Software Engineering conference and the ACM SIGSOFT symposium on Foundations of Software Engineering, ESEC-FSE ’07, pages 215–224, New York, NY, USA, 2007. ACM. 1.3, 2.3, 2.3.1, 2.3, 2.3.2, 1, 3, 6.4, 7.1

[36] M. N. Ngo and H. B. K. Tan. Heuristics-based infeasible path detection for dynamic test data generation. Information and Software Technology, 50(7-8):641–655, June 2008. 1.3, 2.3, 2.3.2, 2.3, 2.3.2, 6.4, 6.4, 7.1

[37] A. L. Souter and L. L. Pollock. Type infeasible call chains. IEEE International Workshop on Source Code Analysis and Manipulation, page 0196, 2001. 1.3, 2.3, 2.3.1, 2.3, 2.3.2, 6.4, 7.1

[38] D. Gong and X. Yao. Automatic detection of infeasible paths in software testing. IET Software, 4(5):361–370, October 2010. 1.3, 2.3, 2.3.2, 2.3, 6.4, 6.4

[39] G. Engels, R. Hucking, S. Sauer, and A. Wagner. UML collaboration diagrams and their transformations to Java. In Proceedings of the 2nd International Conference on the UML, pages 473–488. LNCS Vol. 1723, Springer, Berlin / Heidelberg, 1999. 2.1, 2.1

[40] M. Thongmak and P. Muenchaisri. Design of rules for transforming UML sequence diagrams into Java code. In Proceedings of the Ninth Asia-Pacific Software Engineering Conference (APSEC, 2002), pages 485–494. IEEE Computer Society, Washington, DC, 2002. 2.1, 2.1, 4.3, 7.1

[41] I. A. Niaz and J. Tanaka. An object-oriented approach to generate Java code from UML statecharts. International Journal of Computer and Information Science, 6(2), 2005. 2.1, 2.1


[42] J. Ali and J. Tanaka. Converting statecharts into Java code. In Proceedings of Fourth World Conference on Integrated Design and Process Technology (IDPT99). Dallas, Texas, USA, 2000. 2.1

[43] I. A. Niaz and J. Tanaka. Mapping UML statecharts to Java code. In Proceedings of IASTED International Conf. on Software Engineering (SE 2004), pages 111–116. Innsbruck, Austria, 2004. 2.1

[44] Kurt Jensen. Coloured Petri Nets. Springer, 1996. 2.1

[45] A. G. Parada, E. Siegert, and L. B. de Brisolara. Generating Java code from UML class and sequence diagrams. In Proceedings of the 2011 Brazilian Symposium on Computing System Engineering, pages 99–101. IEEE Computer Society, Washington, DC, 2011. 2.1, 2.1, 4.3, 7.1

[46] K. McNeish. UML sequence diagrams. CoDe Magazine, March/April 2002. http://www.code-magazine.com/Article.aspx?quickid=0203081. 2.1, 4.3, 7.1

[47] B. J. Schmuller. Sams Teach Yourself UML in 24 Hours. Third edition, Sams Publishing, 2004. 2.1, 4.3, 7.1

[48] L. C. Briand and Y. Labiche. A UML-based approach to system testing. Software and Systems Modeling (Springer), 1(1):10–42, 2002. 2.2

[49] A. Andrews, R. France, S. Ghosh, and G. Craig. Test adequacy criteria for UML design models. Journal of Software Testing, Verification and Reliability, 13(2):95–127, June 2003. 2.2

[50] MIT Laboratory for Computer Science. http://alloy.mit.edu/alloy/, 20 June 2013. 2.2

[51] D. Hedley and M. A. Hennell. The causes and effects of infeasible paths in computer programs. In Proceedings of the 8th International Conference on Software Engineering, ICSE ’85, pages 259–266, Los Alamitos, CA, USA, 1985. IEEE Computer Society Press. 2.3, 2.3.1, 1

[52] R. Bodík, R. Gupta, and M. L. Soffa. Refining data flow information using infeasible paths. In Proceedings of the 6th European Software Engineering Conference held jointly with the 5th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC ’97/FSE-5, pages 361–377, New York, NY, USA, 1997. Springer-Verlag New York, Inc. 2.3, 2.3.1, 2.3.2, 2.3, 7.1

[53] N. Malevris. A path generation method for testing LCSAJs that restrains infeasible paths. Information and Software Technology, 37(8):435–441, 1995. 2.3, 2.3.1

[54] I. Forgács and A. Bertolino. Feasible test path selection by principal slicing. In Proceedings of the 6th European Software Engineering Conference held jointly with the 5th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC ’97/FSE-5, pages 378–394, New York, NY, USA, 1997. Springer-Verlag New York, Inc. 2.3, 2.3.1, 7.1

[55] J. Yan and J. Zhang. An efficient method to generate feasible paths for basis path testing. Information Processing Letters, 107(3-4):87–92, July 2008. 2.3, 2.3.1, 7.1

[56] L.A. Clarke. A system to generate test data and symbolically execute programs. IEEE Transactions on Software Engineering, SE-2(3):215–222, 1976. 2.3.1, 7.1

[57] http://www.nag.co.uk/numeric/fl/fldescription.asp, 2 November 2011. 2.3.1

[58] D. Yates and N. Malevris. Reducing the effects of infeasible paths in branch testing. ACM SIGSOFT Software Engineering Notes, 14(8):48–54, November 1989. 2.3.1

[59] R. Mall. Fundamentals of Software Engineering. Prentice-Hall of India Pvt.Ltd, 2004. 2.3.1

[60] R. Bodík, R. Gupta, and M. L. Soffa. Interprocedural conditional branch elimination. In Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, PLDI ’97, pages 146–158, New York, NY, USA, 1997. ACM. 2.3.1

[61] J. Zhang and X. Wang. A constraint solver and its application to path feasibility analysis. International Journal of Software Engineering and Knowledge Engineering, 11(2):139–156, 2001. 2.3.1, 2.3, 7.1


[62] Jian Zhang. Specification analysis and test data generation by solving boolean combinations of numeric constraints. In Quality Software, 2000. Proceedings. First Asia-Pacific Conference on, pages 267–274, 2000. 2.3.1

[63] P. M. S. Bueno and M. Jino. Identification of potentially infeasible program paths by monitoring the search for test data. In Proceedings of the 15th IEEE International Conference on Automated Software Engineering, ASE ’00, pages 209–218, Washington, DC, USA, 2000. IEEE Computer Society. 2.3.2, 2.3, 6.4, 6.4, 7.1

[64] Superstructure Version 2.2 OMG Unified Modeling LanguageTM (OMG UML). http://www.omg.org/spec/UML/2.2/superstructure, 2 November 2010. 3.2, 3.4

[65] Technical report: Control Flow Analysis of UML 2.0 Sequence Diagrams. http://134.117.61.33/pubs/tech report/tr sce-05-09.pdf, September 2005. 3.3, 3.3, 3.7, 7.1

[66] http://www.magicdraw.com/, 2 November 2010. 3.3, 4.2.1

[67] K. Barclay and J. S. Amsterdam. Object-Oriented Design with UML and Java. Elsevier Butterworth-Heinemann, Linacre House, Jordan Hill, Oxford OX2 8DP 200 Wheeler Road, Burlington, MA 01803, 2004. 3.4

[68] http://xerces.apache.org/xerces2-j/, 2 November 2010. 3.4

[69] B. Bates and K. Sierra. Head First Java. O’Reilly, 1005 Gravenstein Highway North, Sebastopol, CA 95472., 2003. 4.2.1

[70] H. Schildt. Java 2: The Complete Reference. Fifth Edition, McGraw-Hill/Osborne, 2002. 4.2.1

[71] H. M. Deitel and P. J. Deitel. Java How to Program. Sixth Edition, Prentice Hall, 2004. 4.2.1

[72] F. Fioravanti and P. Nesi. Estimation and prediction metrics for adaptive maintenance effort of object-oriented systems. IEEE Transactions on Software Engineering, 27(12):1062–1083, December 2001. 4.2.1

[73] http://www.eclipse.org/, 2 November 2010. 4.2.1


[74] http://netbeans.org/, 2 November 2010. 4.2.1, 4.2.1

[75] http://www.nid.iitkgp.ernet.in/isd/, 2 November 2010. 4.2.1, 5.3.1, 6.2

[76] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, Cambridge, Massachusetts, London, England, McGraw-Hill Book Company, 2001. 4.2.2

[77] http://www.eclipse.org/tptp/, 10 February 2012. 4.2.3

[78] http://www.borland.com/us/products/together/index.aspx, 10 February 2012. 4.3

[79] D. Kundu. Model-driven Testing for Object-Oriented Systems with Test Case Prioritization. MS Thesis, School of Information Technology, IIT Kharagpur, 2008. 5.2.1

[80] http://www.vassarstats.net/textbook/ch12a.html, 2 November 2011. ??, 5.8, 5.9, 5.3.2, 5.3.2, ??, 5.11, 5.12, 5.3.2

[81] http://www.objectmentor.com/resources/articles/ocp.pdf, 2 November 2011. 6.1

[82] http://xml.apache.org/crimson/, 2 March 2013. 6.3.2

[83] http://www.sable.mcgill.ca/soot/, 2 March 2013. 6.3.2

[84] R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research (CASCON), page 13. IBM Press, 1999. 6.3.2

[85] http://opengts.sourceforge.net/, 2 March 2013. 6.3.2

[86] Spearman’s rank correlation coefficient. http://en.wikipedia.org/wiki/spearman’s rank correlation coefficient, November 2011. 6.3.3, 6.3.3, 6.3.3, 6.3.5, 2

[87] J. H. Zar. Significance testing of the Spearman rank correlation coefficient. Journal of the American Statistical Association, 67(339):578–580, 1972. 6.3.3, 6.3.3


[88] E. R. C. de Almeida, B. T. de Abreu, and R. Moraes. An alternative approach to test effort estimation based on use cases. In International Conference on Software Testing Verification and Validation, pages 279–288. IEEE CS, 2009. 6.3.4, 6.3.4, 6.3.4, 6.3.4, 6.3.4, 6.3.4, 1

[89] S. Nageswaran. Test effort estimation using use case points. Quality Week, pages 1–6, 2001. 1

[90] N. Malevris, D. F. Yates, and A. Veevers. Predictive metric for likely feasibility of program paths. Information and Software Technology, 32(2):115–118, 1990. 1

[91] Alistair Cockburn. Writing Effective Use Cases. Addison Wesley, 2000. 3, 3

[92] D. A. Taylor. Object Technology: A Manager’s Guide. Addison-Wesley Professional, 1998. 3


Appendix A

Restaurant Automation System (RAS)

interaction DeliverOrderSD[ DeliverOrderSD ]

SOF : SelectOrderForm, POC : ProcessOrderController, ORRG : OrderRegister, OrderList[i] : Order, aOrder : Order, S : Screen, IR : ItemRegister, ItemList[i] : Item, aItem : Item

1: DeliverOrder(=OrderNumber) 2: aOrder = FindOrder(=OrderNumber)

loop-1 loop [i < OrderList.size() and found=false] 3: found = MatchOrder(=OrderNumber)

alt-1 alt [found=true] 4: OrderList[i]

[else] 5: null

alt-2 alt [aOrder!=null] 6: Status = GetStatus()

break-1 break [Status="New Order" or Status="Pending" or Status="Delivered"] opt-1 opt

[Status="New Order"] 7: DisplayMessage(Msg="Order is newly registered and needs to be processed")

opt-2 opt [Status="Pending"] 8: DisplayMessage(Msg="Order is partially processed and some item is not available")

opt-3 opt [Status="Delivered"] 9: DisplayMessage(Msg="Order is already delivered")

opt-4 opt [Status="Processed"] 10: NameList = GetItemList() 11: NumItemPlateList = GetItemPlateNumberList()

loop-2 loop [i< NameList.size()] alt-3 alt [NumItemPlateList[i]=0] 12: DisplayMessage(Msg="Item is not included in this order")

[else] 13: aItem = FindItem(=NameList[i])

loop-3 loop [j < ItemList.size() and found=false]

14: found = MatchItemName(=NameList[i])

alt-4 alt

[found=true] 15: ItemList[j]

[else] 16: null

opt-5 opt [aItem!=Null] 17: DeductItemAvailability(=NumItemPlateList[i])

18: SetStatus(Status="Delivered")

19: DisplayMessage(Msg="Order is being ready for delivery")

[else] 20: DisplayMessage(Msg="Order is not found while delivering the order")

21:

Figure A.1: Sequence diagram of Deliver Order use case.
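
The lookup pattern in messages 2 to 5 of this diagram (loop-1 followed by alt-1) recurs in most of the diagrams in these appendices. The following is a minimal Python sketch of the control flow it appears to describe, not code produced by the approach in this thesis. The class layout, the `order_number` attribute, and the snake_case method names are assumptions introduced for illustration; only the guard conditions and the message order come from the diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    # Hypothetical attribute; the diagram only shows MatchOrder(=OrderNumber).
    order_number: int

    def match_order(self, order_number: int) -> bool:
        # Message 3: found = MatchOrder(=OrderNumber)
        return self.order_number == order_number


class OrderRegister:
    def __init__(self, order_list: List[Order]) -> None:
        self.order_list = order_list

    def find_order(self, order_number: int) -> Optional[Order]:
        # Message 2: aOrder = FindOrder(=OrderNumber)
        i = 0
        found = False
        matched: Optional[Order] = None
        # loop-1: [i < OrderList.size() and found=false]
        while i < len(self.order_list) and not found:
            found = self.order_list[i].match_order(order_number)
            if found:
                matched = self.order_list[i]
            i += 1
        # alt-1: [found=true] reply OrderList[i], [else] reply null
        if found:
            return matched
        return None
```

Calling `find_order` with a number that is absent from the register returns `None`, which corresponds to the `[else]` operand of alt-2 in the diagram.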

interaction ProcessOrderSD[ ProcessOrderSD ]

: SelectOrderForm, POC : ProcessOrderController, ORRG : OrderRegister, OrderList[i] : Order, aOrder : Order, : Screen, IR : ItemRegister, ItemList[j] : Item, aItem : Item

1: ProcessOrder(=OrderNumber) 2: aOrder = FindOrder(=OrderNumber) loop-1 loop [i < OrderList.size() & found=false] 3: found = MatchOrder(=OrderNumber)

alt-1 alt [found=true] 4: OrderList[i]

[else] 5: null

alt-2 alt [aOrder!=Null] 6: Status = GetStatus()

7: NameList = GetItemList()

8: NumPlateList = GetItemPlateNumberList()

9: UnReservedItemList = GetUnReservedItemList()

opt-1 opt [Status="New Order"] loop-2 loop [i< NameList.size()] 10: aItem = FindItem(=NameList[i])

loop-3 loop [j< ItemList.size() & found=false] 11: found = MatchItemName(=NameList[i])

alt-3 alt

[found=true] 12: ItemList[j]

[else] 13: null

alt-4 alt [aItem!=null] 14: ReservedStatus = ReserveItem(=NumPlateList[i])

opt-2 opt [ReservedStatus="false"] 15: UpdateUnReserveItemList(=NameList[i], ="Add")

16: SetStatus(Status="Pending")

17: DisplayMessage(Msg="Item is being prepared")

[else] 18: DeleteOrderItem(=NameList[i])

19: DisplayMessage(Msg="Item is not being prepared today")

20: UpdatedStatus = GetStatus()

opt-3 opt [UpdatedStatus=Status] 21: SetStatus(Status="Processed")

opt-4 opt [Status = "Pending"]

loop-4 loop [i< UnReservedItemList.size()] 22: aItem = FindItem(=UnReservedItemList[i]) loop-5 loop [j< ItemList.size & found=false] 23: found = MatchItemName(=UnReservedItemList[i])

alt-5 alt [found=true] 24: ItemList[j]

[else] 25: null

alt-6 alt [aItem!=null] 26: ReservedStatus = ReserveItem(=NumPlateList[i])

opt-5 opt [ReservedStatus=true] 27: UpdateUnReserveItemList(=UnReservedItemList[i], ="Remove")

[else] 28: DeleteOrderItem(=UnReservedItemList[i])

29: UnReservedItemListEmpty = IsEmptyUnReservedItemList()

opt-6 opt [UnReservedItemListEmpty = true] 30: SetStatus(Status="Processed")

31: DisplayMessage("Order is being ready to serve")

opt-7 opt [Status ="Processed"] 32: DisplayMessage("Order is already processed")

[else] 33: DisplayMessage("Order is not found while processing the order")

34:

Figure A.2: Sequence diagram of Process Order use case.

interaction GenerateStatisticsSD[ GenerateStatisticsSD ]

: GenerateStatisticsForm, GSC : GenerateStatisticsController, ORRG : OrderRegister, OrderList[i] : Order, : Screen, IR : ItemRegister, ItemList[i] : Item, BRG : BillRegister, BillList[i] : Bill

1: GenerateStatistics(Option=)

opt-1 opt [Option=Order] 2: OrderList = GetOrderList()

loop-1 loop [i < OrderList.length] 3: OrderIDList = GetOrderID()

4: OrderStatusList = GetStatus()

5: BillingStatusList = GetBillingStatus()

6: PrintStatistics(==Order, =OrderIDList, =OrderStatusList, =BillingStatusList)

opt-2 opt [Option=Item] 7: ItemList = GetItemList()

loop-2 loop [i < ItemList.length] 8: ItemNameList = GetItemName()

9: ItemPlateNumberPreparedList = GetPreparationStatus()

10: ItemPlateNumberAvailableList = GetAvailableStatus()

11: PrintStatistics(=Item, =ItemNameList, =ItemPlateNumberPreparedList, =ItemPlateNumberAvailableList)

opt-3 opt [Option=Bill] 12: BillList = GetBillList()

loop-3 loop [i < BillList.length] 13: BillNumberList = GetBillNo()

14: BillingStatusList = GetStatus()

15: PrintStatistics(=Bill, =BillNumberList, =BillingStatusList, =)

16:

Figure A.3: Sequence diagram of Generate Statistics use case.

interaction GenerateBillSD[ GenerateBillSD ]

: SelectOrderForm, POC : ProcessOrderController, ORRG : OrderRegister, OrderList[i] : Order, aOrder : Order, : Screen, BRG : BillRegister, BillList[i] : Bill

1: GenerateBill(=OrderNumber) 2: aOrder = FindOrder(=OrderNumber)

loop-1 loop [i< OrderList.size() and found=false] 3: found = MatchOrder(=OrderNumber)

alt-1 alt [found=true] 4: OrderList[i]

[else] 5: null

alt-2 alt

[aOrder!=Null] 6: Status = GetStatus()

break-1 break

[Status != "Delivered"] 7: DisplayMessage("Bill will be generated after delivery of the order")

8: BillList = GetBillList()

loop-2 loop [i< BillList.size()] 9: BillFound = MatchOrderID(=OrderNumber)

break-2 break

[BillFound=true] 10: DisplayMessage("Bill is already generated")

11: <> aBill : Bill

12: GenerateBillNo()

13: SetBillInfo(=OrderNumber)

14: BillNumber = GetBillNo()

15: BillAmount = CalculateBill()

16: AddBill(=aBill)

17: DisplayMessage(Msg="Bill Number:" Bill Number)

[else] 18: DisplayMessage("Order is not found while generating bill")

19:

Figure A.4: Sequence diagram of Generate Bill use case.

interaction PayBillSD[ PayBillSD ]

: PaymentForm, POC : ProcessOrderController, BRG : BillRegister, BillList[i] : Bill, aBill : Bill, ORRG : OrderRegister, OrderList[i] : Order, aOrder : Order, PRG : PaymentRegister, : Screen

1: PayBill(=BillNumber, =PaymentMode, =ChequeNo, =IssuingDate, =IssuingBranch, =BonusCardValue)

2: aBill = FindBill(=BillNumber)

loop-1 loop [i

alt-1 alt [found=true] 4: BillList[i]

[else] 5: null

alt-2 alt [aBill!=null] 6: OrderID = GetOrderID()

7: aOrder = FindOrder(=OrderID)

loop-2 loop [i

8: found = MatchOrder(=OrderID)

9: OrderList[i]

10: BillAmount = GetBillAmount()

11: RebateAmount = GenerateRebate(=BillAmount, =BonusCardValue)

alt-3 alt [PaymentMode="Cheque"] 12: <> aChequePayment : ChequePayment 13: SetPaymentInfo(==BillNumber, =(BillAmount-RebateAmount), =PaymentDate, =PaymentMode, =ChequeNo, =IssuingDate, =IssuingBranch) 14: AddPayment(=aChequePayment)

[else] 15: <> aPayment : Payment 16: SetPaymentInfo(==BillNumber, =(BillAmount-RebateAmount), =PaymentDate, =PaymentMode)

17: AddPayment(=aPayment)

18: SetStatus(="Paid")

19: SetBillingStatus(="Paid")

20: NewBonusCardValue = GenerateNewCardValue(=BonusCardValue)

21: DisplayBonusCard(=RebateAmount, =NewBonusCardValue)

[else] 22: DisplayMessage(Msg="Invalid Bill number: Try again")

23:

Figure A.5: Sequence diagram of Pay Bill use case.

interaction ManageItemSD[ ManageItemSD ]

: ManageItemForm, MIC : ManageItemController, IR : ItemRegister, ItemList[i] : Item, : Screen

1: ManageItem(Option=, ItemName=, ItemPlateNumber=, ItemPrice)

opt-1 opt [Option=AddItem] 2: <> aItem : Item

3: SetItemInfo(ItemName=, ItemPlateNumber=, ItemPrice=)

4: AddItem(=aItem)

opt-2 opt [Option=PrepareItem]

5: aItem = FindItem(=ItemName)

loop-1 loop [i < ItemList.length & found=false] 6: found = MatchItemName(=ItemName)

alt-1 alt [found=true] 7: ItemList[j]

[else] 8: null

alt-2 alt [aItem!=null] 9: AddItemAvailability(=ItemPlateNumber)

[else] 10: DisplayMessage(Msg="Item is not found")

opt-3 opt [Option=DeleteItem] 11: aItem = FindItem(=ItemName)

loop-2 loop [i < ItemList.length & found=false]

12: MatchItemName(=ItemName)

alt-3 alt [found=true] 13: ItemList[i]

[else] 14: null

alt-4 alt [aItem!=null] 15:

[else] 16: DisplayMessage(Msg="Item has not been found")

17:

Figure A.6: Sequence diagram of Manage Item use case.

interaction Make Order[ Make Order ]

: OrderForm, MOC : MakeOrderController, : Screen, : OrderRegister

1: MakeOrder(=ItemNameList, =NumPlateList)

2: <> aOrder : Order

3: GenerateID()

4: OrderNumber = GetOrderID()

5: DisplayMessage(Msg="Order Number generated")

6: SetOrder(=ItemNameList, =NumPlateList)

7: DisplayOrder()

8: SetStatus(Status="New Order")

9: AddOrder(=aOrder) 10:

Figure A.7: Sequence diagram of Make Order use case.

Appendix B

Auditorium Management System (AMS)

interaction BookTicketSD[ BookTicketSD ]

: TicketBoundary, : TicketController, : IndexSeatRegister, SeatRegisterList[i] : SeatRegister, aSeatRegister : SeatRegister, : Screen, : TicketRegister, : TicketRegister, : FundRegister

1: BookTicket(aBookingAgent=, ShowTitle="", ShowDate="", ShowTime="", SeatType="")

2: aSeatRegister = SelectSeatRegister(ShowTitle=, ShowDate=, ShowTime=)

loop1 loop [i < SeatRegisterList.length & found=false]

3: found = HasMatch(ShowTitle=, ShowDate=, ShowTime=)

alt-1 alt [found=true] 4: SeatRegister[i]

[else] 5: null

alt-2 alt [aSeatRegister=null] 6: Display(Msg=="Invalid show date and time and please check properly")

[else] 7: SeatNo = FindAvailableSeat(SeatType="")

alt-3 alt [SeatNo=0] 8: Display(Msg=="Seat is not available")

[else] 9: ReserveSeat(==SeatNo)

10: <> aFund : Fund 11: SetBookingAgent(=aBookingAgent)

12: SetFund(=SeatType, =ShowTitle)

13: GenerateTicketPrice()

14: TicketPrice = GetTicketPrice()

15: AddFund(==aFund)

16: <> aTicket : Ticket 17: SetTicketInfo(ShowTitle=, ShowDate=, ShowTime="", SeatNo="", SeatType="", TicketPrice="")

18: AddTicket(==aTicket)

19: DisplayTicket(=aTicket)

20:

Figure B.1: Sequence diagram of Book Ticket use case.

interaction CancelTicketSD[ CancelTicketSD ]

: TicketBoundary, : TicketController, : TicketRegister, TicketList[i] : Ticket, aTicket : Ticket, : IndexSeatRegister, SeatRegisterList[i] : SeatRegister, aSeatRegister : SeatRegister, : FundRegister, : Screen

1: CancelTicket(TicketNo=) 2: aTicket = FindTicket(TicketNo=)

loop-1 loop [i < TicketList.length & found=false] 3: found = HasMatch(TicketNo=)

alt-1 alt [found=true] 4: TicketList[i]

[found=false] 5: null

alt-2 alt [aTicket!=null] 6: ShowTitle = GetShowTitle()

7: ShowDate = GetShowDate()

8: ShowTime = GetShowTime()

9: aSeatRegister = SelectSeatRegister(ShowTitle=, ShowDate=, ShowTime=)

loop-2 loop [i < SeatRegisterList.length & found=false] 10: found = HasMatch(ShowTitle=, ShowDate=, ShowTime=)

11: SeatRegisterList[i]

12: SeatNo = GetSeatID()

13: UnReserveSeat(==SeatNo)

14: TicketPrice = GetPrice()

15: TicketType = GetTicketType()

16: aRefund : Refund

17: RefundAmount = ComputeRefund(TicketPrice=, TicketType=)

18: AddRefund(=aRefund)

19: UpdateStatus(Status="Cancelled")

20: Display(=="Ticket has been successfully canceled and collect refund amount", RefundAmount)

[aTicket=null] 21: Display(Msg=="Ticket has not been found and please enter valid ticket number")

22:

Figure B.2: Sequence diagram of Cancel Ticket use case.

interaction ComputeSaleCommissionSD[ ComputeSaleCommissionSD ]

: CommissionBoundary, : CommissionController, : CommissionRegister, CommissionList[i] : Commission, aCommission : Commission, : Screen, : FundRegister, FundList[i] : Fund

1: ComputeSaleCommission(aSaleAgent=, Month=, Year=)

2: aCommission = FindCommission(aSaleAgent=, Month=, Year=)

loop-1 loop [i < CommissionList.length & found=false]

3: found = HasMatch(aSaleAgent=, Month=, Year=)

alt-1 alt [found=true] 4: CommissionList[i]

[else] 5: null

alt-2 alt [aCommission!=null] 6: Amount = GetCommission()

7: DisplayCommission(Msg="Commission amount", =Amount)

[else] 8: TotalSale = GetTotalSale(aSaleAgent=)

loop-2 loop [i

9: flag = IsSoldBy(aSaleAgent) opt-1 opt [flag=true] 10: GetTicketPrice()

11:

12: Amount = GenerateCommission(=TotalSale)

13: <> bCommission : Commission 14: AddCommission(==bCommission)

15: SetCommission(aSaleAgent=, Month=, Year=, Amount=)

16: SetPaymentStatus(Status='Non-Paid') 17: DisplayCommission(Msg="Commission is Generated", =Amount)

18:

Figure B.3: Sequence diagram of Compute Sale Commission use case.

interaction PayCommisionSD[ PayCommisionSD ]

: CommissionBoundary, : CommissionController, : CommissionRegister, CommissionList[i] : Commission, aCommission : Commission, : Screen

1: PayCommission(aSaleAgent=, Month=, Year=)

2: aCommission = FindCommission(aSaleAgent=, Month=, Year=)

loop-1 loop [i < CommissionList.length & found=false]

3: found = HasMatch(aSaleAgent=, Month=, Year=)

alt-1 alt [found=true] 4: CommissionList[i]

[else] 5: null

alt-2 alt [aCommission!=null] 6: Status = CheckPaymentStatus()

opt-1 opt [Status=Paid] 7: Display(Msg='Commission is already paid')

opt-2 opt [Status=Non-Paid] 8: SetPaymentStatus(Status='Paid')

9: Amount = GetCommission()

10: DisplayCommission(Msg="Commission amount paid", =Amount)

[else] 11: Display(Msg="Not Valid Agent or No commission is payable")

12:

Figure B.4: Sequence diagram of Pay Commission use case.
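
The nesting of alt-2, opt-1, and opt-2 in this diagram can be read as ordinary branching. The following Python sketch shows one plausible reading of messages 6 to 11, assuming that the lookup in messages 2 to 5 has already produced `a_commission` (or `None`). The `Commission` dataclass, its field names, and the returned list of displayed messages are assumptions made for illustration and do not appear in the diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commission:
    # Hypothetical fields standing in for the Commission lifeline's state.
    amount: float
    payment_status: str = "Non-Paid"


def pay_commission(a_commission: Optional[Commission]) -> List[str]:
    """Return the messages that would be sent to the Screen lifeline."""
    shown: List[str] = []
    if a_commission is not None:
        # alt-2 [aCommission!=null], message 6: Status = CheckPaymentStatus()
        status = a_commission.payment_status
        if status == "Paid":
            # opt-1 [Status=Paid], message 7
            shown.append("Commission is already paid")
        if status == "Non-Paid":
            # opt-2 [Status=Non-Paid], messages 8 to 10
            a_commission.payment_status = "Paid"
            amount = a_commission.amount
            shown.append(f"Commission amount paid: {amount}")
    else:
        # alt-2 [else], message 11
        shown.append("Not Valid Agent or No commission is payable")
    return shown
```

Both opt guards test the status value read at message 6, so updating the status inside opt-2 does not also trigger opt-1 within the same call.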

interaction ManageShowSD[ ManageShowSD ]

SB : ShowBoundary, SC : ShowController, SR : ShowRegister, ShowList[i] : Show, aScreen : Screen

1: ManageShow(Option=, ShowAttribute=, Values=)

opt-1 opt [Option="Add Show"] 2: <> aShow : Show 3: SetShowDetails(ShowAttributes="", Values="")

4: AddShow(=aShow)

5: Display(Msg="Show has been successfully added")

opt-2 opt [Option="Update Show"] 6: aShow = FindShow(ShowTitle=)

loop-1 loop [i < ShowList.size & found=false] 7: found = MatchShow(ShowTitle=):""

alt-1 alt [found=true] 8: ShowList[i]

[else] 9: null

alt-2 alt [aShow!=null] 10: Display(Msg=="Show has not been found")

[else] 11: UpdateShow(ShowAttributes=, Values="")

opt-3 opt [Option="Cancel Show"] 12: aShow = FindShow(ShowTitle=)

loop-2 loop [i < ShowList.size & found=false] 13: found = MatchShow(ShowTitle=)

alt-3 alt [found=true] 14: ShowList[i]

[else] 15: null

alt-4 alt [aShow!=null] 16: SetStatus(Status="Cancelled")

17: Display(Msg=="Show is successfully canceled")

[else] 18: Display(Msg=="Show has not been found")

19:

Figure B.5: Sequence diagram of Manage Show use case.

interaction GenerateShowStatistics[ GenerateShowStatistics ]

: StatisticsBoundary, : StatisticsController, : CommissionRegister, CommissionList[i] : Commission, aScreen : Screen, : FundRegister, : Fund

1: GenerateShowStatistics(==Option, =aShowTitle, =Month)

opt-1 opt [Option=Commission] 2: TotalCommission = ComputeTotalCommission()

loop-1 loop [i

4:

5: DisplayCommission(Msg="Total Commission paid is", =TotalCommission)

opt-2 opt [Option=MovieSale] 6: TotalShowSale = ComputeTotalShowWiseSale(==aShowTitle):""

loop-2 loop [i< FundList.length] 7: Flag = IsSaleofShow(=aShowTitle)

opt-3 opt [Flag=true] 8: GetFundAmount()

9:

10: DisplayMessage(Msg="Total Sale for the show", =TotalShowSale)

opt-4 opt

[Option=MonthSale] 11: TotalMonthSale = ComputeTotalMonthWiseSale(==Month)

loop-3 loop [i

opt-5 opt [Flag=true] 13: GetFundAmount()

14:

15: DisplayCommission(Msg="Total Sale for the month", =TotalMonthSale)

16:

Figure B.6: Sequence diagram of Generate Show Statistics use case.


Dissemination of Work

1. Debasish Kundu, Debasis Samanta, and Rajib Mall, An Approach to Convert XMI Representation of UML 2.x Interaction Diagram into Control Flow Graph, ISRN Software Engineering Journal, 2012, Vol. 2012, 22 pages.

2. Debasish Kundu, Debasis Samanta, and Rajib Mall, Automatic Code Generation from UML Sequence Diagrams, IET Software, 2013, Vol. 7, No. 1, pp. 12-28.

3. Debasish Kundu, Debasis Samanta, and Rajib Mall, A UML Model-Based Approach to Detect Infeasible Paths, ACM Transactions on Software Engineering and Methodology (TOSEM), Manuscript Id: TOSEM-2012-0083.R1 (Submitted after major revision).

4. Debasish Kundu, Debasis Samanta, and Rajib Mall, Controlling MM Path Coverage While Generating Paths From UML Sequence Diagrams, IEEE Transactions on Software Engineering, Paper-Id: TSE-2013-01-0036 (Communicated).