Linköping University | Department of Computer and Information Science Master’s thesis, 30 ECTS | Datateknik 2019 | LIU-IDA/LITH-EX-A--2019/101--SE

Investigating the applicability of Software Metrics and Technical Debt on X++ Abstract Syntax Tree in XML format – calculations using XQuery expressions

Tillämpning av mjukvarumetri och tekniska skulder från en XML representation av abstrakta syntaxträd för X++ kodprogram

David Tran

Supervisor: Jonas Wallgren
Examiner: Martin Sjölund

External supervisor: Laurent Ricci


Copyright

The publishers will keep this document online on the Internet ‐ or its possible replacement ‐ for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to down‐ load, or to print out single copies for his/hers own use and to use it unchanged for non‐commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www homepage: http://www.ep.liu.se/.

© David Tran

Abstract

This thesis investigates how XML representations of X++ abstract syntax trees (ASTs) residing in an XML database can be subjected to static code analysis. Dynamics 365 for Finance & Operations comprises a large and complex corpus of X++, and intuitive ways of visualizing and analysing the state of the code base in terms of software metrics and technical debt are non-existent. A solution is to extend an internal web application and semantic search tool called SocrateX to calculate software metrics and technical debt. This is done by creating a web service that constructs XQuery and XPath code and runs it against the XML database. The values are stored in a relational database and imported into Power BI for intuitive visualization. The software metrics have been chosen based on the amount of previous research and their compatibility with the X++ AST, whereas technical debt has been estimated using the SQALE method. This thesis concludes that XML representations of X++ abstract syntax trees are viable candidates for measuring the quality of source code with the use of functional query programming languages.

Acknowledgments

First and foremost, all work for this thesis has been carried out at Microsoft Development Center in Copenhagen. I would like to thank Laurent Ricci for being my mentor and giving me the opportunity to conduct my thesis at Microsoft. Secondly, I would like to thank my supervisor Jonas Wallgren for his indispensable help and feedback throughout the writing of this thesis, and Martin Sjölund for being my examiner. My friends in Norrköping, you know who you are. Thank you for making the five years of studying fly by in the blink of an eye. Thank you, Tobias Matts, for being my opponent in this thesis and for giving me valuable feedback on my report. I would also like to express my deepest gratitude to my family: Minh Tran, Phuong Nguyen and Amanda Tran, for their unconditional love and support throughout my five years at Linköping University. Without them, I would not have had this opportunity. Lastly, I want to thank Tu Do for being my rock during this thesis. There are no words that can express how much your encouragement, love and support have meant to me.


Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations

2 Background
  2.1 Microsoft Development Center
    2.1.1 Microsoft Dynamics 365 for Finance & Operations
  2.2 SocrateX

3 Theory
  3.1 Quality Assessment in Software Engineering
    3.1.1 Software Development Life Cycle
    3.1.2 Quality in Software
  3.2 Abstract Syntax Trees
  3.3 Software metrics
    3.3.1 Traditional metrics
    3.3.2 Chidamber & Kemerer Metric Suite
  3.4 Technical Debt
  3.5 SQALE Method
    3.5.1 The SQALE Quality Model
    3.5.2 The SQALE Analysis Model
    3.5.3 The SQALE Indices
  3.6 X++ Programming Language
    3.6.1 Language Specific Restrictions
  3.7 XML
    3.7.1 XPath
    3.7.2 XQuery
  3.8 XLNT Framework
  3.9 SocrateX
    3.9.1 ASP.NET Core
    3.9.2 Angular Framework
    3.9.3 BaseX
    3.9.4 SocrateX Architecture
  3.10 Entity Framework Core
  3.11 Power BI
  3.12 Related Work

4 Method
  4.1 Approach
    4.1.1 Constructs in XML abstract syntax tree
    4.1.2 Applicability of software metrics
    4.1.3 Applicability of technical debt
    4.1.4 Integration with SocrateX
  4.2 Implementation
    4.2.1 Traditional metrics
    4.2.2 Chidamber & Kemerer Metric Suite
    4.2.3 SQALE Method for Technical Debt
    4.2.4 Web service - Static Code Analysis
    4.2.5 Power BI
    4.2.6 Report Module in Angular

5 Results
  5.1 Performance benchmark
  5.2 Software metric values
  5.3 Power BI Report
    5.3.1 Overview page
    5.3.2 Artifact page
    5.3.3 Software metric page
    5.3.4 Rule page
    5.3.5 Team page

6 Discussion
  6.1 Results
    6.1.1 Database design and XQuery expressions
    6.1.2 XQuery performance
    6.1.3 Software metrics
    6.1.4 Technical Debt
    6.1.5 Power BI Report
  6.2 Method
    6.2.1 Validity
    6.2.2 Replicability
    6.2.3 Reliability
    6.2.4 Source criticism
  6.3 The work in a wider context

7 Conclusion
  7.1 Consequences
  7.2 Future Work
    7.2.1 Historical data support
    7.2.2 Task scheduler and Assembly in ASP.NET Core
    7.2.3 XQuery parallelism
    7.2.4 Code refactoring
    7.2.5 Utilize the full XML AST

Bibliography

List of Figures

3.1 The core stages in the Software Development Life Cycle.
3.2 The three aspects of software quality in a software development environment.
3.3 ISO/IEC 25010 product quality model composed of eight quality characteristics which are further divided into sub-characteristics.
3.4 A code snippet and its corresponding abstract syntax tree of the for-loop.
3.5 A binary search algorithm implementation in X++ and its corresponding control-flow graph.
3.6 A tree interpreted as a class hierarchy. Each node is a class.
3.7 The general structure of the SQALE Quality Model. It is divided into three hierarchical levels.
3.8 A detailed overview of the mappings between the hierarchical levels in the SQALE Quality Model.
3.9 A high-level overview of the SocrateX software architecture.

4.1 A high-level overview of the new updated SocrateX architecture.
4.2 The database schema of the entities for static code analysis.

5.1 The overview page of the static code analysis report.
5.2 The artifact page of the static code analysis report. The class Tax is selected from the search bar and its associated metadata, software metric values and rule violations are shown.
5.3 The software metric page of the static code analysis report. The software metric Number of Children (NOC) has been selected.
5.4 The rule page of the static code analysis report. The rule Calls to obsolete methods is selected and its corresponding metadata, rule violations and technical debt are shown.
5.5 The team page of the static code analysis report. The selected team is Warehouse and Transportation and its associated metadata, artifacts, software metric values and technical debt are shown.


List of Tables

3.1 Description of Functional Quality, Process Quality and Structural Quality.
3.2 Branching and looping constructs.
3.3 Cyclomatic complexity thresholds.
3.4 A description of the SQALE Quality Model’s characteristics, adopted from ISO/IEC 25010.
3.5 Two examples of requirement samples, their mappings between the hierarchical levels and their associated remediation function.
3.6 Overview of system classes for a broad range of system programming areas.
3.7 Description of axes supported in the XPath specification.

4.1 Examples of constructs derived from the XML abstract syntax tree of X++ code.

5.1 A benchmark for the NOC implementation with and without maps.
5.2 Eight software metric values obtained from the XML database. Minimum, maximum and average values are displayed.


1 Introduction

The outline of this chapter is as follows: Section 1.1 describes the research problem. The aim of the thesis and the associated research questions are presented in section 1.2 and section 1.3 respectively. Lastly, section 1.4 describes the scope of this thesis.

1.1 Motivation

The importance of quality in the software engineering domain continues to increase: in the software industry, quality is no longer merely an advantage in a competitive environment but a necessity in order to prevent serious bottom-line effects on the business. Managers in the industry are progressively focusing on process improvement in the software development life cycle because it plays a fundamental role in the delivery of a product [3]. This demand makes it pivotal for companies to implement new approaches to software development or improve existing ones, with perhaps the most prominent being object-orientation. As a result, measurements of quality need to be defined, allowing successful ideas and concepts found in non-software fields to be converted into comparable entities for use in the software engineering domain. These measurements are known as software metrics and are widely used in the industry to measure an application’s quality [9]. Apart from software metrics, there is another concept that is used extensively at all levels of an IT organization: technical debt, which refers to the additional work needed to complete the software development caused by not following the best practices of a given programming language [21].

Microsoft Dynamics 365 for Finance & Operations is a cloud-based enterprise resource planning (ERP) system that is being developed at Microsoft. The application is written in X++, a proprietary programming language, and constitutes a large and complex corpus of source code [44]. One of many challenges that limits companies in the industry from scaling in terms of the number of developers is the ramp-up time each individual developer needs in order to become proficient enough to make changes to production. For that reason, the possibility to analyze the X++ code base of Microsoft Dynamics 365 for Finance & Operations with regard to software metrics and technical debt is essential for a developer to understand the code more quickly. It will also give developers and managers an indication of whether certain parts of the application are regressing or improving.


Currently, there is an existing tool used internally at Microsoft, called SocrateX, that enables the user to generate the AST (abstract syntax tree) of any source code from the application’s code base in XML format. The ASTs are stored in BaseX, an open-source XML database. The user can also write custom queries against the database in order to get insight into the source code of a given artifact. Although the current tool provides some pre-defined queries for this purpose, there is currently no intuitive approach to analyze and visualize source code and metadata in rich and centralized ways.

1.2 Aim

An interesting aspect from developers’ and managers’ point of view is to get an understanding of the code in terms of software metrics and technical debt. Therefore, the aim of this thesis is to investigate the applicability of calculating software metrics and technical debt on the existing code base of Microsoft Dynamics 365 for Finance & Operations, represented as XML ASTs of X++ code, in order to analyze the code base. This will be done using XQuery and XPath, which are query languages for finding and extracting information from XML documents. As a result, SocrateX will be extended to support business intelligence in the form of a visualization report.

1.3 Research questions

The research questions for this thesis are:

1. How can XQuery and XPath queries be used to calculate software metrics and technical debt over XML representations of ASTs?

2. How can software metrics and technical debt be applied to Microsoft Dynamics 365 for Finance & Operations?

3. How can the values be presented to enable developers to keep track of key performance indicators of the X++ code base?

1.4 Delimitations

This thesis only aims to investigate which software metrics are suitable for an XML representation of the X++ AST and how one can estimate the technical debt. Thus, additional programming languages are not supported. The thesis is based on an existing internal tool developed by Microsoft developers and therefore, the current technology stack of the application will not be replaced. The parsing of the AST and serialization to XML have already been implemented.

Software metrics can be defined at a variety of levels, such as by package, by system and by class. In this thesis, software metrics have been chosen based on the amount of previous research and their compatibility with the metadata of the X++ AST. One of the metrics from the Chidamber & Kemerer metric suite was not included in this thesis: Lack of Cohesion Of Methods (LCOM), which measures the cohesion of a class by counting the number of connected components, where a connected component is a set of related methods (and class-level variables) [9]. Results from previous research papers and a previously conducted thesis project on the very same X++ code base showed that the LCOM metric is unreliable due to a number of factors: LCOM will be skewed if getter and setter accessors are included, by methods that do not access instance variables, and by class constructors, to name a few [39].

Furthermore, the original intent was to include J. Bansiya and C.G. Davis’s QMOOD (Quality Model for Object-Oriented Design), which is a hierarchical model that defines the relationship between high-level quality attributes, such as flexibility and reusability, and design properties, such as coupling and cohesion [3]. These design properties are proportionally weighted and quantified by their effects on the high-level quality attributes with the help of linear equations based on values from software metrics. However, the software metrics defined in the model are based on a design and not on a single artifact, e.g. a class. Therefore, a decision was made to not include QMOOD in this thesis.

Initially, the thesis would include code refactoring as well. However, a decision was made to deviate from code refactoring and include technical debt instead, using the SQALE method, due to time restrictions and the limited scope of the thesis. Non-remediation functions defined in the SQALE definition document were not included, since the thesis is strictly limited to development and does not take business plans and costs into consideration. The requirements of technical debt are based on best practice rules associated with the X++ language and not on rules defined in research papers.


2 Background

This chapter gives an introduction to the Microsoft office in Copenhagen, Microsoft Dynamics 365 for Finance & Operations and SocrateX, the internal tool used by Microsoft developers.

2.1 Microsoft Development Center

This thesis was conducted at Microsoft Development Center in Copenhagen. The main focus at this office is on developing Microsoft Dynamics business apps such as Microsoft Dynamics 365 for Finance & Operations and Visual Studio App Center. Additionally, there are teams of scientists and students working on quantum research.

2.1.1 Microsoft Dynamics 365 for Finance & Operations

Microsoft Dynamics 365 for Finance & Operations, hereafter called MD365FO, was formerly known as Microsoft Dynamics AX and was originally developed as a collaboration between IBM and the Danish company Damgaard Data as IBM Axapta. After the release of Microsoft Dynamics AX version 1.5, IBM released all rights in the product to Damgaard Data. The company later merged with Navision Software A/S, and Microsoft acquired the combined company in July 2002.

MD365FO is part of the Dynamics 365 product line and is a cloud-based ERP system that offers businesses the ability to integrate the management of core processes, such as finance, inventory, HR, manufacturing and supply chain, into a single system. This enables businesses to simplify the flow of real-time information across departments and ecosystems, which ultimately helps the business make fast, decisive decisions, quickly adapt to changing market demands and drive rapid business growth.

MD365FO is written in X++, a proprietary programming language which is further described in section 3.6.

2.2 SocrateX

SocrateX is a web application and a semantic search tool used by Microsoft developers in order to get insight into the source code of MD365FO.


When the source code is compiled, abstract syntax trees are generated that can subsequently be used for static code analysis and code generation. These ASTs are serialized into XML. The XML representations have types recorded for all expressions and contain important metadata as well as the start and end positions of every construct. Therefore, these XML representations are conveniently queried through their code document object model (DOM).

3 Theory

This chapter presents the theory and work related to this thesis. A background on quality assessment in the software development field is introduced in section 3.1. Abstract syntax trees are described in section 3.2. Software metrics and technical debt are presented in sections 3.3 and 3.4 respectively. The SQALE method is described in section 3.5. Section 3.6 gives a brief introduction to X++. XML is introduced in section 3.7. Finally, the tools and frameworks used in this thesis are introduced in sections 3.8-3.11. Related work and studies relevant to this thesis can be found in section 3.12.

3.1 Quality Assessment in Software Engineering

This section gives a concise introduction to pivotal factors concerning quality assessment in software engineering. Section 3.1.1 presents a brief introduction to the software development life cycle, followed by quality in the software engineering domain in section 3.1.2. Lastly, the definition of a software quality model is presented at the end of section 3.1.2.

3.1.1 Software Development Life Cycle

The delivery of a software product which satisfies the customer’s requirements is an indispensable factor in the software engineering field. To achieve this, many software companies have adopted different methodologies and processes to further elevate their business. One of the most prominent approaches is the software development life cycle (SDLC), which is a workflow methodology for designing, developing and testing high-quality software. The international standard for SDLC is ISO/IEC 12207, which strives to be the standard that defines all the tasks required for developing and maintaining software [16]. SDLC is divided into distinct core stages of development cycles. There are numerous interpretations of the stages in SDLC, but the most common one can be seen in figure 3.1 [5]:


Figure 3.1: The core stages in the Software Development Life Cycle.

1. Planning: Planning, also known as requirement analysis, is the most fundamental stage in SDLC. The ambition is to gather information from customers, domain experts in the industry and stakeholders to plan the approach.

2. Requirements: This stage is solely focused on creating the software requirement specification (SRS) to define and document the product requirements for approval by stakeholders and customers.

3. Designing: The design of the product architecture, with reference to the SRS from the earlier stage, is created in this stage. The design approach is documented in a Design Document Specification (DDS).

4. Development: This stage is focused on the development of the product with reference to the DDS.

5. Testing & Integration: Testing and integration of the product is done at this stage. Bugs and defects of the product are reported and fixed in this stage until the product reaches the quality standards.

6. Deployment & Maintenance: This stage is dedicated to the deployment of the product. Afterwards, the product is maintained for the customer.

3.1.2 Quality in Software

The definition of quality can be quite complex and ambiguous. ISO/IEC/IEEE 24765:2017 describes quality as “the degree to which a component, system, or process meets specified requirements and/or user/customer needs and expectations“ [17]. In the context of software engineering, ISO/IEC 9126 (revised as ISO/IEC 25010) defines software quality as “the totality of functionality and features of a software product that bear on its ability to satisfy stated or implied needs“ [40]. There are many perceptions of software quality among researchers. David Chappell breaks software quality down into three aspects: functional quality, structural quality and process quality [8], see figure 3.2.


Figure 3.2: The three aspects of software quality in a software development environment [8].

Figure 3.2 illustrates the three main groups associated with software quality: sponsors, users and the development team.

• Sponsors: Sponsors can be regarded as stakeholders for the software product.
• Users: Users are the customers of the software product.
• Development Team: The team that implements the software.

The definitions of the three aspects of software quality can be seen in table 3.1:

Table 3.1: Description of Functional Quality, Process Quality and Structural Quality.

Functional Quality: Functional quality focuses on the software functionality offered to end-users. Attributes concerning functional quality are requirements, performance and accessibility of a software product.

Process Quality: Process quality covers the development process of a software product. A good example of a development process that aids consistency and efficiency in a development team is the SDLC.

Structural Quality: Structural quality represents the code structure of a software product. Testability, maintainability, understandability, efficiency and security in code are central attributes concerning structural quality.

The three aspects of software quality are closely linked to each other and come with both advantages and disadvantages depending on the type of software and organization.

Software Quality Model

Quality in software is difficult to assess because of its ambiguity. To address this issue, researchers have developed models that make it possible to measure quality in software. Such a model is known as a software quality model: a standardized procedure to objectively evaluate and measure the quality of a product based on quality characteristics and metrics [19].


An example of such a model, ISO/IEC 25010 [40], can be seen in figure 3.3.

Figure 3.3: ISO/IEC 25010 product quality model composed of eight quality characteristics which are further divided into sub-characteristics.

3.2 Abstract Syntax Trees

An abstract syntax tree (AST) is a data structure which represents the abstract syntactic structure of source code in tree form. Technically, a tree data structure is a directed graph with the restrictions that a tree has no cycles and that a child node can only have one parent. ASTs are particularly well known in compiler construction, where the AST is considered an intermediate representation and is created during the syntax analysis phase, commonly referred to as the front-end of a compiler [1]. An AST differs from a parse tree in that it does not contain syntactic details such as semi-colons, commas and parentheses. Numerical operators such as +, -, * and / are not represented as leaves; instead they are represented as parent nodes. ASTs are generally used for semantic analysis, optimization and code generation. Figure 3.4 illustrates a code snippet and its corresponding AST.

int a = 1, x = 0, y = 10;
for(int i = 0; i < 20; i++)
{
    if(i > a + 5)
    {
        x = x + 2;
    }
}

(a) Code snippet of a for-loop with an if-statement.
(b) An abstract syntax tree of the for-loop in 3.4a.

Figure 3.4: A code snippet and its corresponding abstract syntax tree of the for-loop.

3.3 Software metrics

William Thomson, also known as Lord Kelvin, was one of the most renowned physicists in the world. Perhaps his most prominent quote is [18]:


“When you can measure what you are speaking about, and express it in numbers, you know something about it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts advanced to the stage of science.“

The capability of evaluating software in terms of numerical values is essential as technologies continue to advance in the field. Moreover, as William Thomson stated, without measurement there is no knowledge and thus no improvement. To measure quality characteristics in software quality models, appropriate software metrics have to be defined. Software metrics are quantitative measurements of a product and are related to one or multiple quality characteristics. The intention of software metrics is to make assessments during the software development life cycle as to whether the software requirements are being accomplished [14].

3.3.1 Traditional metrics

This section will give an introduction to traditional object-oriented metrics that have been used in this thesis.

Source Lines of Code (SLOC)

Source lines of code (SLOC) is a software metric used to measure the size of a class by counting the number of lines in the text of the class’s source code. It is one of the most used metrics to measure the understandability and readability of a program. However, there are a few complications when using this metric. First and foremost, there is no clear indication of whether a lower SLOC is better than a higher one, due to a number of factors. For instance, comments contribute to a higher SLOC value, but comments are there for readability and understandability for the developer. Secondly, programming languages differ in terms of syntax, which also affects SLOC, and thus it can be difficult to compare values between different languages. Another factor that needs to be taken into consideration is the developer’s personal coding style, as it may vary from developer to developer and there is usually more than one way to write code. Therefore, one can argue that SLOC does not measure code quality, readability and understandability [23]. Despite this, there are a few arguments that need to be highlighted:

1. A higher SLOC means more complication.

2. The more complicated the code, the harder it is to understand.

3. Before developers implement new features or fix bugs, they need to understand the code in order to write production quality code.

4. Understanding takes time.

5. Time costs money.

In conclusion, on the basis of the arguments above, less code implies easier understanding and thus makes it cheaper to add new features. SLOC provides a good hint to developers of the amount of effort required to understand how the code works. SLOC is also valuable to managers because it gives a good indication regarding resource allocation and estimation of time.
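Since the XML representations of the ASTs record the start and end positions of every construct (see section 2.2), SLOC can be approximated directly from the AST with a short XQuery expression. The following is a minimal sketch; the element and attribute names (Class, @Name, @StartLine, @EndLine) are assumptions made for illustration and may differ from the actual X++ AST schema.

(: Minimal sketch: approximate SLOC per class from AST position metadata. :)
(: Class, @Name, @StartLine and @EndLine are assumed names, not the real schema. :)
for $class in //Class
return
  <Sloc Name="{$class/@Name}">
    { xs:integer($class/@EndLine) - xs:integer($class/@StartLine) + 1 }
  </Sloc>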


Comment Percentage (CP)

Comments in source code are there to aid new developers and maintainers of the code in terms of understandability, readability and maintainability. Equation 3.1 shows the calculation of the comment percentage metric.

CP = NumberOfComments / SLOC    (3.1)

According to SATC (Software Assurance Technology Center), the desired comment percentage is about 30% [24].
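Under the same assumptions about the AST schema, the comment percentage of equation 3.1 could be computed along the following lines; the Comment element name is likewise hypothetical.

(: Minimal sketch: comment percentage per class, assuming comments are :)
(: serialized as <Comment> nodes and SLOC is derived from position data. :)
for $class in //Class
let $sloc := xs:integer($class/@EndLine) - xs:integer($class/@StartLine) + 1
let $comments := count($class//Comment)
return <Cp Name="{$class/@Name}">{ $comments div $sloc }</Cp>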

Number of Methods

Number of methods (NOM) is a metric which simply counts all methods in a class, including private methods [23]. This metric can be interpreted as a complexity metric, because a class is likely to be more complex if it contains significantly more methods than a class with fewer methods. Understandability is also one characteristic associated with NOM.

Cyclomatic Complexity (CC)

Cyclomatic complexity is a source code complexity measure based on graph theory and was defined by Thomas J. McCabe [25]. By constructing a control-flow graph, one can measure the number of independent paths in a program module. A control-flow graph is defined as a directed graph with unique entry and exit nodes. In this context, a program module is a method with a single entry point and a single exit point. Each node in the graph corresponds to a block of code that executes commands in the module. Between nodes are directed edges; an edge means that the second node might be executed after the first node. The equation for cyclomatic complexity can be seen in equation 3.2.

M = E − N + 2P    (3.2)

where M is the complexity, E is the number of edges in the graph, N is the number of nodes in the graph and P is the number of connected components. The cyclomatic complexity in figure 3.5b would be:

M = E − N + 2P ⟹ M = 12 − 10 + 2 · 1 = 4    (3.3)

However, while this works well for small methods, constructing the control-flow graph and performing the calculations can be quite cumbersome for larger methods. Therefore, it is also possible to calculate the cyclomatic complexity by counting the branching and looping constructs and adding one [6]. Examples of constructs can be seen in table 3.2.

Table 3.2: Branching and looping constructs.

Construct        Decision   Reasoning
If..Then         +1         An if statement is a single decision.
If..Then..Else   +1         The if contributes a decision; the else construct does not add one.
While            +1         A while loop has a decision at the start.
Do..While        +1         A do-while loop has a decision at the end.
For loop         +1         A for-loop has a decision at the start.
Try-Catch        +1         Each catch in a try-block statement adds a new path.


 1  int BinarySearch(List numbers, int target)
 2  {
 3      int left = 0;
 4      int right = numbers.elements()-1;
 5      int mid = 0;
 6      while(left <= right)
 7      {
 8          mid = left+((right-left) / 2);
 9          if(numbers[mid] == target)
10          {
11              return mid;
12          }
13          else if(numbers[mid] > target)
14          {
15              right = mid-1;
16          }
17          else
18          {
19              left = mid+1;
20          }
21      }
22      return -1;
23  }

(a) Binary search algorithm implementation in X++.
(b) A control-flow graph for the binary search algorithm where the number in each node corresponds to the line number in figure 3.5a.

Figure 3.5: A binary search algorithm implementation in X++ and its corresponding control-flow graph.

Looking at the code in figure 3.5a, there is one while-statement, one if-statement and one else-if-statement. The cyclomatic complexity using this approach can be seen in equation 3.4.

M = 1(while) + 1(if) + 1(else if) + 1 = 4    (3.4)

A higher complexity value correlates with greater testing and maintenance requirements as well as low understandability [23]. Consequently, it can increase the difficulty of achieving a high test coverage in the program. Thomas J. McCabe established an upper-bound complexity limit of 10, which seemed like a feasible threshold between acceptable code and too complex code. Whenever the complexity exceeded the upper-bound limit, refactoring had to be done in the method, or the software had to be completely redone [25]. It is undoubtedly difficult to estimate an acceptable threshold for cyclomatic complexity. Fortunately, there are reference values from the Software Engineering Institute at Carnegie Mellon University, where a total of four ranges of cyclomatic complexity and their respective risk evaluations were defined [12]. See table 3.3.


Table 3.3: Cyclomatic complexity thresholds.

Cyclomatic Complexity   Risk Evaluation
1-10                    Simple program, low risk.
11-20                   More complex, moderate risk.
21-50                   Complex, high risk.
>= 50                   Untestable program, very high risk.

Even so, it is difficult to know which threshold range is appropriate for a given situation. For instance, the cyclomatic complexity threshold arguably depends on the project. The complexity limit of 10 might be applicable to a certain project but can be impractical in others. It ultimately comes down to the complexity of the software architecture, the code style of the developer and the organization. A lower cyclomatic complexity is usually the desired goal, yet refactoring techniques could increase cyclomatic complexity even though the expected goal of refactoring code is to simplify and improve code while increasing understandability and testability. The cyclomatic complexity threshold is therefore subjective and could be altered based on requirements and projects.
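To give an idea of how the construct-counting approach translates to the XML AST, the following XQuery sketch counts the branching and looping constructs of table 3.2 per method and adds one. The statement element names are assumptions; the names in the actual serialized X++ AST may differ.

(: Minimal sketch: cyclomatic complexity per method by counting the :)
(: branching and looping constructs of table 3.2 and adding one. :)
declare function local:cc($method as element()) as xs:integer {
  count($method//*[local-name() = ('IfStatement', 'IfThenElseStatement',
                                   'WhileStatement', 'DoWhileStatement',
                                   'ForStatement', 'CatchBlock')]) + 1
};

for $m in //Class/Method
return <Cc Method="{$m/@Name}">{ local:cc($m) }</Cc>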

3.3.2 Chidamber & Kemerer Metric Suite

This section will give an introduction to the object-oriented metrics that have been used in this thesis, based on the metric suite defined by Shyam R. Chidamber and Chris F. Kemerer [9]. It is one of the most well-known metric suites and has been cited in several research papers and implemented in various static code analysis tools. These metrics have been selected due to this recognition and establishment.

Weighted Method per Class (WMC)

Weighted Method per Class (WMC) is a software metric which calculates the aggregate complexity of all methods within a class. The computation of WMC is shown in equation 3.5.

WMC = ∑_{i=1}^{n} c_i    (3.5)

where n is the number of methods within the class and c_i is the complexity of a single method. All methods within the class should be taken into consideration, including inherited methods. To determine the complexity of a method, McCabe’s cyclomatic complexity is used as the weight metric. Other metrics can also be used to decide the weight of an individual method, such as SLOC. If all method complexities are unity, then WMC is equal to the number of methods in the class. A high WMC is a solid indication of a class that has a large number of methods, effectively limiting the possibility of reuse. There is also a likelihood that a subset of those methods are inherited, and thus derived classes are also affected in the process. One can draw the conclusion that a higher WMC is directly correlated with increased testing and maintenance efforts [23].
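With cyclomatic complexity as the weight, WMC reduces to a sum over the methods of a class. A minimal XQuery sketch under the same assumed AST shape as before:

(: Minimal sketch: WMC as the sum of per-method cyclomatic complexities, :)
(: using an assumed construct-counting local:cc as the weight. :)
declare function local:cc($m as element()) as xs:integer {
  count($m//*[local-name() = ('IfStatement', 'WhileStatement',
              'DoWhileStatement', 'ForStatement', 'CatchBlock')]) + 1
};

for $class in //Class
return <Wmc Name="{$class/@Name}">{ sum($class/Method ! local:cc(.)) }</Wmc>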

Depth of Inheritance Tree (DIT)

Given a class hierarchy, which can be interpreted as a tree where each node is a class, the Depth of Inheritance Tree (DIT) metric is defined as the depth of a given node, that is, the number of edges from the node to the root node. In other words, DIT measures the number of ancestors a class has. Consider the following figure:

Figure 3.6: A tree interpreted as a class hierarchy. Each node is a class.

In figure 3.6, class A is the base class in the class hierarchy and its children are classes derived from class A. Classes B, E and F have a value of one, since their only ancestor is the base class. Finally, classes C and D have a value of two, because two classes are ancestors of each of them. It is important to know a class’s DIT value because it has a direct correlation with object-oriented (OO) design. A higher DIT contributes to a greater OO design complexity, because more classes and methods are involved. Ultimately, a high value makes the class harder to maintain and more error prone, since modifications could potentially result in breaking changes. However, a high value also suggests that the likelihood of methods being reusable increases [23]. The recommended DIT can be ambiguous due to different assessments of this metric between developers. A class with a lower value might indicate poor exploitation of OO principles but could be perfectly suitable depending on the project. As previously mentioned, classes with a higher DIT are more complex and error prone, thus a low value is good and a high value is bad. A recommended starting point is that no class should have a value higher than five [39].

Number of Children (NOC)

Number of Children (NOC) and Depth of Inheritance Tree (DIT) are closely related metrics. The latter measures the depth of a class to the base class, whereas the former measures the breadth, which is the number of immediate sub-classes a class has. The NOC in figure 3.6 for class A is three, since classes B, E and F are immediate sub-classes of class A. A higher NOC corresponds to an increase in testing efforts, because a modification in a class with a high value has a direct impact on its sub-classes. This implies that testing needs to be done for all immediate sub-classes. Reusability also increases with a higher NOC, since inheritance is a form of reuse [23].
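Both inheritance metrics can be computed from a flat list of classes if each class records its parent. The sketch below assumes that every <Class> element carries @Name and @Extends attributes and that class names are unique; both assumptions are made for illustration only.

(: Minimal sketch: NOC counts immediate sub-classes; DIT follows the :)
(: assumed @Extends chain upwards until no parent class is found. :)
declare function local:dit($c as element(Class)?) as xs:integer {
  if (empty($c)) then -1
  else 1 + local:dit(//Class[@Name = $c/@Extends])
};

for $c in //Class
return
  <Inheritance Name="{$c/@Name}"
               Noc="{count(//Class[@Extends = $c/@Name])}"
               Dit="{local:dit($c)}"/>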

Coupling Between Objects (CBO)

Coupling Between Objects (CBO) measures a class’s dependency on other classes, also known as coupling. This can be estimated by counting the number of unique reference types, excluding primitive data types, that occur through:

• Method calls.


• Method parameters.
• Variables in methods.
• Field declarations.

In X++, the concept of a class is ambiguous because it can be interpreted as e.g. a table or a form. Assessments had to be made for this particular metric and are described in section 3.6.1. Excessive coupling between classes negatively influences reusability, since the class is dependent on other classes. Modifications in a class with a high CBO might affect other classes in the design, and consequently an increase in maintenance and a decrease in testability are to be expected [23].
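As a rough illustration of this counting rule, the following XQuery sketch collects the distinct non-primitive type references of a class. The TypeRef element and the list of primitive type names are assumptions about the AST; section 3.6.1 describes the assessments that were actually made for X++.

(: Minimal sketch: CBO as the number of distinct non-primitive types :)
(: referenced by a class, excluding the class itself. :)
let $primitives := ('int', 'int64', 'real', 'str', 'boolean', 'date')
for $class in //Class
let $refs := distinct-values($class//TypeRef/@Name[not(. = $primitives)])
return <Cbo Name="{$class/@Name}">{ count($refs[. != $class/@Name]) }</Cbo>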

Response For a Class (RFC)

Response For a Class (RFC) counts the number of methods of a class and the methods that they directly call. The formula for RFC is defined in equation 3.6:

RFC = |RS|    (3.6)

where RS is the response set and can be expressed as:

RS = {M} ∪ ⋃_{all i} {Ri}    (3.7)

where {Ri} is the set of methods called by method i and {M} is the set of all methods in the class.

public class A
{
    public A(){};
    public void M1(){...}
    public void M2(){...}
    public void M3()
    {
        B object = new B();
        object.M1B();
        object.M1B();
    }
}

public class B
{
    public B(){};
    public void M1B(){...}
}

Listing 3.1: A code snippet which illustrates the concept of RFC.

Listing 3.1 shows two classes, A and B, containing three methods and one method respectively. The response set of class A is:

RS = {M1,M2,M3} ∪ {M1B,M1B} = {M1,M2,M3,M1B} (3.8)


A set is a collection of distinct values, and thus multiple calls to the same method count as one. The RFC of class A from equation 3.8 is four. Classes with a high RFC are unfavourable. The difficulty of understanding and testing a class increases with a high RFC, since more methods are involved and, as a result, the class becomes more complex [23].
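Because distinct-values applies set semantics, the union in equation 3.7 maps naturally onto XQuery. A minimal sketch, where Method/@Name and MethodCall/@Name are assumed shapes for method declarations and call sites in the AST:

(: Minimal sketch: RFC = |RS|, the distinct union of a class's own :)
(: methods and the methods they call directly. :)
for $class in //Class
let $own := $class/Method/@Name ! string(.)
let $called := $class/Method//MethodCall/@Name ! string(.)
return <Rfc Name="{$class/@Name}">{ count(distinct-values(($own, $called))) }</Rfc>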

3.4 Technical Debt

Technical debt is a well-established concept in software development and has been adopted by numerous organizations in the industry. Ward Cunningham used this metaphor for the first time in 1992 and described the concept as follows [10]:

“Shipping first-time code is like going into debt. A little debt speeds develop- ment so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire en- gineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.“

There is no clear definition of what technical debt is. However, it can be interpreted as the amount of extra development time it takes to refactor code. The unit of measurement for technical debt can be money, hours or work units, to name a few. Common causes of technical debt are lack of conception, poor scheduling and developers not following best practice rules [34]. Technical debt can be estimated with the SQALE method, which is described in section 3.5.

3.5 SQALE Method

The SQALE method (Software Quality Assessment based on Lifecycle Expectations) was developed by Inspearit, an organization which specializes in transformation and process improvement of agile methodologies and management, in response to the lack of methods for estimating a reliable quality score in current standards such as ISO 9126 [21]. It is used to accurately estimate the technical debt at source code level. The SQALE method consists of four key concepts [22]:

• The Quality Model

• The Analysis Model

• The Indices

• The Indicators

The last key concept, the SQALE Indicators, will not be described in this chapter because it has not been used in this thesis. Moreover, the SQALE method is also defined by nine fundamental principles in order to achieve an accurate estimation of the quality and technical debt:

1. The quality of the source code is a non-functional requirement.

2. A requirement associated with quality of source code must be formalized according to the following quality criteria: atomic, unambiguous, non-redundant, justifiable, acceptable, implementable and verifiable.


3. Assessing quality in source code is determining the distance between its current state and its expected quality objective.

4. The SQALE method assesses the distance to requirement conformity by considering the remediation cost required to bring the source code to conformity.

5. The SQALE method checks the importance of a non-conformity by considering the resulting cost of delivering the source code with this non-conformity.

6. The SQALE method respects the representation condition.

7. The SQALE method takes the sum of all remediation costs, non-remediation costs and its indicators to determine the technical debt.

8. The SQALE method is orthogonal, which implies that a quality requirement occurs only once in the quality model.

9. The SQALE method takes the software development life-cycle into consideration.

3.5.1 The SQALE Quality Model

Most software quality models rely on the decomposition of high-level quality characteristics into associated lower-level requirements that need to be well-defined and measurable. The SQALE Quality Model is no exception. The goal of the model is to define an appropriate methodology for organizing requirements associated with code quality in order to estimate the code’s technical debt. The SQALE Quality Model’s general structure can be seen in figure 3.7.

Figure 3.7: The general structure of the SQALE Quality Model. It is divided into three hierarchical levels.

As illustrated in figure 3.7, there are three hierarchical levels in the SQALE Quality Model, constructed such that the software development life cycle (SDLC) is taken into account along with the developer’s and software user’s points of view [21][22]. The first level is composed of quality characteristics. For each characteristic, there are multiple sub-characteristics that constitute the second level of the SQALE Quality Model. The final level consists of requirements which are linked to each sub-characteristic, see figure 3.8.


Figure 3.8: A detailed overview of the mappings between the hierarchical levels in the SQALE Quality Model [21]. © 2012 IEEE, used with permission.

Figure 3.8 presents a comprehensive overview of the hierarchical levels of the SQALE Quality Model. The characteristics are in chronological order based on when they appear in an SDLC, with ’Testability’ as the foundation, because it is burdensome to make an untestable component reliable. The requirements are concrete blueprints of ”right code” and must be linked to an appropriate quality characteristic that would be affected in the case of a requirement violation [21]. Ultimately, debt is created when there is noncompliance with the specified requirements. The precision and amount of technical debt depend on the remediation functions associated with each requirement. The hierarchical levels of the SQALE Quality Model are further described in the following sections.

Defining the Characteristics (Level 1)

The software quality characteristics defined in ISO/IEC 25010 have been adapted and used in the first hierarchical level of the SQALE Quality Model. A total of eight characteristics are present in this level, see table 3.4 [22, 40].


Table 3.4: A description of the SQALE Quality Model’s characteristics, adopted from ISO/IEC 25010.

Testability: The degree of effectiveness and efficiency of testing and verifying a software system, product or component.

Reliability: Reflects the degree to which a software system, product or component can maintain its intended functionality over a specified period of time, even when facing adversity. For example, a distributed system is considered reliable if it keeps delivering its services even when one or several of its software or hardware components fail.

Changeability: The degree of effectiveness and efficiency with which a software system, product or component enables modifications to be implemented.

Efficiency: The amount of resources expended to provide the required functionality of a software system, product or component. Examples of resources are financial cost, time, materials and disk space.

Security: The degree to which a software system, product or component protects sensitive information, stored data and data in transmission. Levels of authorization for data access are also related to security.

Maintainability: The degree of effectiveness and efficiency with which a software system, product or component can be modified by intended maintainers. Examples of concrete actions directly related to maintainability are correction, improvements, adaptation to software changes and installation of updates and upgrades.

Portability: The degree of effectiveness and efficiency of a software system, product or component’s ability to adapt to changes in terms of software, hardware or usage environment.

Reusability: Reflects the degree to which an asset can be reapplied to a new problem without significant effort.

Note that testability, changeability and reusability are sub-characteristics in the ISO/IEC 25010 standard, yet they are labelled as characteristics in the SQALE Quality Model. These sub-characteristics have been moved into the first level of the SQALE Quality Model because they are needed early in the life cycle of a source code file [20]. The quality characteristics in this level have been selected because they rely on the code’s internal properties and directly impact the software development life cycle.

Defining the Sub-characteristics (Level 2)

The second level of the SQALE Quality Model consists of sub-characteristics and is used to refine the characteristics. Generally, sub-characteristics are affiliated with life cycle activities of a software product; unit testing, integration testing and optimization of processor usage are prime examples related to the software development life cycle. Sub-characteristics can also refer to common taxonomies of best practices related to software architecture and coding [22].


Defining the Requirements (Level 3)

The final level of the SQALE Quality Model consists of the requirements. Requirements are rules for what is considered ’right code’ and pertain to the artifacts that compose the software’s source code, e.g. components, files and classes. Requirements can be interpreted as best practice rules. The definition of ’right code’ relates to implementation level, naming requirements and presentation requirements, and depends exclusively on the project and organization.

3.5.2 The SQALE Analysis Model

The third fundamental principle of the SQALE method listed in section 3.5 states that assessing a software source code is comparable to estimating the distance between the source code’s current state and its expected quality objective. The SQALE Analysis Model is responsible for normalizing the internal measurements of the source code into costs, which is done using a remediation index [20]. The remediation index is the cost of fixing the noncompliance with the requirements. The cost is determined by remediation functions, described in the next section.

Remediation Functions

When a requirement is breached, a debt is created. Once the requirements in the third level of the SQALE Quality Model have been formulated, the source code is analyzed against them, and the unit of measure collected is the number of noncompliances per artifact [21]. Each requirement needs to be associated with a remediation function. The goal of a remediation function is to convert the number of noncompliances of an artifact into a remediation cost. Remediation functions vary depending on the workload required to bring the source code into conformity. One can simply use a multiplicative factor, in terms of time, for each violation of a requirement, see table 3.5.

Table 3.5: Two examples of requirement samples, their mappings between the hierarchical levels and their associated remediation function.

Characteristic: Testability
Requirement: There is no method with a cyclomatic complexity over 12
Remediation Detail: Refactor and write tests
Remediation Function: 1 hour per occurrence if the measure is < 24; 2 hours if > 24

Characteristic: Maintainability
Requirement: Parameters of private and internal methods must start with underscores
Remediation Detail: Add an underscore to the parameter
Remediation Function: 1 minute per occurrence

Table 3.5 references two distinct requirements with different remediation functions. The first row of the table presents a requirement for cyclomatic complexity. If violated, the required action must be taken as described in the remediation detail column, which is to refactor the code and update the tests. The second row presents a requirement that demands that the names of private/internal method parameters start with underscores. The necessary action is to add underscores to the parameter names. For example, imagine an artifact’s source code which has 10 private/internal method parameters whose names do not start with underscores. The technical debt for this specific artifact is then 10 minutes, since one violation takes approximately one minute to fix.


Comparing the two requirements, one can conclude that the difference in remediation workload is significant, and the remediation functions are adjusted accordingly.
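As an illustration, the naming requirement in table 3.5 could be evaluated over the AST with an XQuery expression like the following, yielding the remediation cost in minutes per class. The element and attribute names, including how visibility is encoded, are assumptions.

(: Minimal sketch: 1 minute of debt per parameter of a private or internal :)
(: method whose name does not start with an underscore (table 3.5). :)
for $class in //Class
let $violations :=
  $class/Method[@Visibility = ('private', 'internal')]
        /Parameter[not(starts-with(@Name, '_'))]
return <Debt Name="{$class/@Name}" Minutes="{count($violations)}"/>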

3.5.3 The SQALE Indices

The SQALE Indices are the third concept of the SQALE method. The indices represent the total remediation cost of a project and are used to analyze the technical debt. The remediation cost of artifacts related to a certain quality characteristic is estimated as the sum of all remediation costs for that characteristic. Accordingly, the indices can be categorized with respect to the quality characteristics [22]:

• SQALE Testability Index (STI)

• SQALE Reliability Index (SRI)

• SQALE Changeability Index (SCI)

• SQALE Efficiency Index (SEI)

• SQALE Usability Index (SUI)

• SQALE Security Index (SSI)

• SQALE Maintainability Index (SMI)

• SQALE Portability Index (SPI)

• SQALE Reusability Index (SRuI)

These indices are used to analyze the technical debt, yet there is no single index representing the total technical debt. Fortunately, there is an approach to measure it. The aggregate value of all characteristic indices is called the SQALE Quality Index (SQI) and denotes the concept of ”Technical Debt” commonly used in agile projects [22]. This implies that the technical debt of a large code base can be estimated by taking the sum of the remediation costs for all artifacts in the code base. As mentioned previously, the precision of the technical debt depends solely on the level of detail of the formulated remediation functions.
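If rule violations are materialized together with their characteristic and remediation cost, both the per-characteristic indices and the SQI reduce to sums. A minimal sketch, assuming hypothetical <Violation Characteristic="..." Cost="..."/> records with the cost in minutes:

(: Minimal sketch: per-characteristic indices and the total SQI as sums :)
(: over assumed, pre-computed violation records. :)
let $violations := //Violation
return
  <Sqale Sqi="{sum($violations/@Cost ! xs:integer(.))}">
    {
      for $v in $violations
      group by $c := string($v/@Characteristic)
      return <Index Characteristic="{$c}"
                    Minutes="{sum($v/@Cost ! xs:integer(.))}"/>
    }
  </Sqale>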

3.6 X++ Programming Language

X++ is an object-oriented programming language with similarities to C# [44]. It is used for enterprise resource planning programming and database applications in MD365FO. An overview of system programming areas in MD365FO can be seen in table 3.6.


Table 3.6: Overview of system classes for a broad range of system programming areas [44].

Classes: In addition to system classes, MD365FO also provides application classes for managing many types of business processes. Reflection on classes is supported.

Tables: X++ programmers can access the relational tables in MD365FO. X++ includes keywords that match most of the keywords in standard SQL. Reflection on tables is supported.

User interface: Manipulation of user interface items, such as forms and reports.

Best Practice Checks: X++ code is checked for syntax errors during compile time. The compile process also performs best practice checks. Violations of best practices can generate compiler messages.

Garbage collection: The X++ runtime execution engines have automatic mechanisms to discard objects that are no longer referenced, so that memory space can be reused.

Interoperability: MD365FO supports interoperability between classes written in X++ and in C# (or other .NET Framework languages).

File manipulation: File input and output is supported, including XML building and parsing.

Collections: Dynamic arrays are supported, and X++ includes several collection objects.

Although X++ is a highly object-oriented language, there are some language-specific restrictions when calculating software metrics, because certain language features of X++ were not taken into consideration by the authors who defined the software metrics introduced in section 3.3. These language-specific restrictions are presented in the next section.

3.6.1 Language Specific Restrictions

The chosen metrics were implemented based on the research papers by the original authors. One of many things that makes X++ unique is the ability to write queries to the database anywhere in the code (embedded SQL). Since X++ is a proprietary programming language, these special language features were not taken into consideration by the authors, and thus necessary adjustments had to be made in order not to deviate from the original intent of the software metric definitions [9].

Local functions

Local functions, also known as nested functions or embedded methods, allow developers to declare functions within a method. It is preferred to declare a local function at the same level as variable declarations. Listing 3.2 showcases two code snippets which are functionally equivalent but implemented with a local function and a private method respectively. Local functions have access to variables in the outer method, whereas code in the outer method cannot access variables that are instantiated inside the local function. Local functions can only be called within the method where they were declared. The main argument for using local functions instead of private methods is to restrict the functionality of a local function to the method where it was declared. As a result, no other method in the class can access and use the local function except for the one method that declared it.


public class TestClassA
{
    public void MethodA()
    {
        int num = 0;
        int MethodB(int number)
        {
            number++;
            return number;
        }
        int newNumber = MethodB(num);
    }
}

(a) A class with a local function.

public class TestClassB
{
    public void MethodA()
    {
        int num = 0;
        int newNumber = MethodB(num);
    }

    private int MethodB(int number)
    {
        number++;
        return number;
    }
}

(b) A class with a private method.

Listing 3.2: Code snippets which illustrate the difference in implementation between a local function and a private method.

As a result, no other method in the class can access and use the local function except for the one method that declared it. The usage of local functions does not violate any best practice rules. Even so, it is encouraged to use private methods instead of local functions, because local functions cannot be overridden, overlayered, or reused. The use of local functions also increases the size of the method and thus decreases understandability and readability, because it most likely requires a higher effort to understand the code [39, 44].

Embedded SQL

X++ allows developers to execute SQL statements, either interactively or within source code, to access and retrieve data stored in the database. The following statements are available for data manipulation:

• SELECT: Select the data to read or modify.

• INSERT: Add one or more records to a table.

• UPDATE: Modify existing data in existing tables.

• DELETE: Remove existing data from a table.

A simple select statement fetches only one record or field from the database. The data is stored in a table buffer variable. Listing 3.3 illustrates how a developer can retrieve one single customer from the table CustTable whose account number is less than 100. A while-select statement is used to handle data based on conditionals. It works by looping over multiple records which meet the criteria specified in the code. An example of a while-select statement in X++ can be seen in listing 3.4 [43]. The code loops through all records from the BankAccountTable where magicNumber is less than 1 and prints the value of AccountID and magicNumber to an InfoLog window. A while-select statement can be interpreted as the equivalent of while-loops in other programming languages such as C# or C++. However, a crucial detail is that boolean expressions in while-select statements are evaluated only once.


static void SelectStatement()
{
    CustTable customerTable;

    select * from customerTable
        where customerTable.AccountNum < 100;
}

Listing 3.3: A select statement in X++.

static void WhileSelectStatement()
{
    int magicNumber = 0;
    BankAccountTable xrecBAT;

    while select * from xrecBAT
        where magicNumber < 1
    {
        magicNumber++;
        Global::info(strFmt("%1, %2", magicNumber, xrecBAT.AccountID));
    }
}

Listing 3.4: A while-select statement in X++.

Therefore, the behavior differs compared to C# or C++, and the boolean expressions cannot be interpreted as short-circuit operators [43]. This implies that the while-select statement in listing 3.4 may have more than one iteration. For cyclomatic complexity, a select statement does not contribute to the cyclomatic complexity, since it can be interpreted as a method which returns a single object, whereas a while-select statement adds one to the cyclomatic complexity regardless of boolean operators [39]. Other SQL statements such as insert, update and delete do not contribute to the cyclomatic complexity.

User-defined Classes

The definition of a class in the object-oriented paradigm is a software construct which depicts behavior based on data and methods. The data represents the state of the construct and the methods represent the behavior of the construct. In X++, the interpretation of a class is ambiguous. The reason is that classes can be divided into Class and Form objects, where a form is a user interface item. Tables, extended data types and enums can also be interpreted as classes, because one can instantiate them directly in the code. For this reason, the implementation of CBO includes all data types except primitive data types.

3.7 XML

Extensible Markup Language (XML) is a markup language for describing and storing data [7]. XML was developed and introduced in 1996 by the World Wide Web Consortium (W3C) and is derived from the Standard Generalized Markup Language (SGML), also known as ISO 8879 [37].


3.7.1 XPath

XML Path Language (XPath) is a W3C-recommended expression language used to identify and traverse nodes in XML documents [35]. The most essential feature of XPath is the path expression, which provides the ability to traverse the nodes of an XML document hierarchically through its tree structure. In addition, one can select nodes by a variety of criteria with predicates, and XPath can also be used to compute values from the content of an XML document.

XPath Nodes

As previously mentioned, XPath operates on an XML document as a tree of nodes. In XPath, there are seven types of nodes [42]:

1. Element: For every element in an XML document, there exists a corresponding element node. An element node is the content between an element's start and end tags, which are defined by angle brackets. Consider the example in listing 3.5 of an XML document describing a movie and associated metadata:

<Cinema>
    <Movie>
        <Title>Avatar</Title>
        <Director>James Cameron</Director>
        <Year>2009</Year>
    </Movie>
</Cinema>

Listing 3.5: An example of an XML document describing a movie.

In listing 3.5, the topmost element of the tree is defined as the root node; in this case, Cinema is the root node of the XML document. There also exist several relationship types between element nodes. For instance, every element node has one parent, which can have an arbitrary number of children. In the example above, Movie is the parent of Title, Director and Year, which in turn are the children of Movie. Moreover, siblings, ancestors and descendants are all valid relationship types between element nodes.

2. Attribute: An attribute is owned by an element node and is designed to contain data related to the affiliated element node. Each attribute has a name and a value.

<Movie Title="Avatar" Director="James Cameron" Year="2009"/>

Listing 3.6: An element node with attributes.

Listing 3.6 shows an element node of Movie with related attributes. This information is equivalent to listing 3.5.

3. Text: Text is the data between two enclosing tags. A text node always has at least one character of data and never has an immediately following or preceding sibling that is also a text node. Examples of text nodes are the data between the tags for Title, Director and Year in listing 3.5.


4. Namespace: A namespace node is identifiable as the binding of a namespace URI (Uniform Resource Identifier) to a namespace prefix. The syntax for a namespace declaration is the following: xmlns:prefix=URI. Namespaces are used to uniquely name elements and attributes in XML documents with different sources, and thus effectively avoid element name conflicts by using a namespace prefix.

5. Processing Instruction: Processing instructions (PI) are used to carry instructions intended for the application and can occur anywhere in the XML document.

6. Comment: Comment nodes can be placed anywhere in the document and are defined as the content between two angle brackets combined with an exclamation mark and double dashes: <!-- and -->.

7. Document: Each XML document is encapsulated in a document node, which represents the entire document including all other nodes (elements, processing instructions and comments).

Additionally, there are two important types of nodes which need to be addressed: the context node and the current node. The context node is the node that the XPath processor is currently looking at, and it is altered during the evaluation of a query. Meanwhile, the current node is the first node that the XPath processor looks at when an evaluation of a query begins, and it does not change. In other words, each XPath expression uses the context node as its point of reference when navigating the XML document tree.
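To make the distinction concrete, consider the document from listing 3.5: in the predicate of the following expression, each Movie element in turn becomes the context node while the predicate is evaluated, so the relative path Title is resolved against that Movie.

(: The predicate is evaluated once per Movie element; inside the
   brackets, the Movie currently being tested acts as the context node. :)
/Cinema/Movie[Title = 'Avatar']/Year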

XPath Axes

An axis describes a relationship between the context node and the nodes relative to it. Axes are used to locate nodes by their relationship to the context node. Table 3.7 presents the thirteen axes supported by XPath.


Table 3.7: Description of axes supported in the XPath specification.

ancestor: All ancestors of the context node, from its parent node up to the root node, are selected.
ancestor-or-self: All ancestors including the context node are selected.
attribute: Attributes of the context node are selected. Can be abbreviated with the at sign (@).
child: Selects all children of the context node.
descendant: Selects all descendants (children, grandchildren and so forth) of the context node.
descendant-or-self: Selects the context node and all its descendants (children, grandchildren and so forth).
following: Selects all nodes that appear after the context node, except descendants, attribute and namespace nodes.
following-sibling: Selects all siblings after the context node.
namespace: Selects all namespace nodes. This axis is deprecated since XPath 2.0.
parent: Selects the parent of the context node.
preceding: All nodes before the context node are selected, except ancestors, attribute nodes and namespace nodes.
preceding-sibling: Selects all siblings before the context node.
self: Selects the context node.

Furthermore, the axes (except self and namespace) can be divided into two categories, illustrated by the example expressions after this list:

• Forward axes: Nodes that occur after the context node: child, descendant, attribute, descendant-or-self, following, following-sibling.

• Reverse axes: Nodes that occur before the context node: ancestor, ancestor-or-self, parent, preceding, preceding-sibling.
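For example, given the document in listing 3.5, the following two expressions use a forward and a reverse axis, respectively:

(: Forward axis: every Title element below the context node. :)
/Cinema/descendant::Title

(: Reverse axis: the element ancestors of a Title node, here Movie and Cinema. :)
/Cinema/Movie/Title/ancestor::*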

Location Path

In order to select a node (or a set of nodes), one can use a location path, which is the most important kind of expression in XPath. Location paths are used to specify the exact path to the node or nodes of interest and consist of one or several location steps. Each location step has three components:

1. An axis which specifies the relationship between the nodes selected and the context node.

2. A node test which defines the name or type of the node (element, attribute, text, document, comment or processing instruction).

3. Zero or more predicates which are used to further refine the set of nodes selected by the location step.


The axis and the node test are separated by a double colon, and predicates are enclosed in brackets. The syntax for a location step is: axis::nodetest[predicate]. Beginning a location path with either / or // implies that the path starts at the document root node, making it an absolute location path. This means the root node becomes the context node for the first location step in the path. Consider the XML document in listing 3.7.

<Cinema>
    <Movie Title="Avatar" Director="James Cameron" Year="2009"/>
    <Movie Title="Interstellar" Director="Christopher Nolan" Year="2014"/>
    <Movie Title="Joker" Director="Todd Phillips" Year="2019"/>
</Cinema>

Listing 3.7: An example of an XML document describing movies.

A location path which selects all movies released before 2015 can be seen in listing 3.8.

/Cinema/Movie[@Year < 2015]

Listing 3.8: A location path which selects all movies released before 2015.
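The same path can be written with explicit axes and node tests, which makes all three components of each location step visible; child:: is the default axis and @ abbreviates attribute::.

(: Unabbreviated form, equivalent to listing 3.8. :)
/child::Cinema/child::Movie[attribute::Year < 2015]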

3.7.2 XQuery

The most significant advantage of XPath is its simplicity: it is easy and convenient to specify a location path to select nodes in the XML tree. Unfortunately, that is also its biggest drawback, because XPath does not support advanced queries such as nested iterations and recursive queries. This is where the XML Query Language (XQuery) comes into the picture. XQuery is a purely functional programming language developed by W3C and is used to retrieve and manipulate data in XML format [36]. XQuery is the successor of Quilt, a query language for XML, which in turn integrated several features from other languages such as XPath and the Structured Query Language (SQL). XPath is a subset of XQuery, and the two languages share the same data model and support the same functions and operators. XQuery provides a versatile query statement called FLWOR. The term FLWOR is an acronym for the keywords for, let, where, order by and return, which are a subset of the many clauses introduced in XQuery. The statement is made up of the following clauses:

• For: Selects a sequence of nodes for iteration.

• Let: Binds values to variables.

• Where: Serves as a filter for the nodes.

• Order By: Value-based ordering of the nodes.

• Return: Determines what to return, and is evaluated once for every node.

The XML document in listing 3.7 contains three distinct element nodes describing different movies. An example of a FLWOR expression operating on it is seen in listing 3.9:


for $movie in /Cinema/Movie
let $test := "Binding variable"
where $movie/@Year < 2018
order by $movie/@Title
return <Result Title="{$movie/@Title}" Note="{$test}"/>

Listing 3.9: XQuery implementation to select all movies released before 2018.

The FLWOR expression selects all movies in the XML document that were released before 2018, and the result is ordered by title. It also illustrates binding values to variables and printing them to the result. The resulting XML is shown below:

<Result Title="Avatar" Note="Binding variable"/>
<Result Title="Interstellar" Note="Binding variable"/>

These clauses provide great capability and flexibility to the language. Moreover, there are several other features that XQuery possesses, such as sorting, grouping and sliding windows over the resulting query; conditional queries with logical operators, such as if-statements; and recursion with user-defined functions. Although there are numerous built-in operators and functions in XQuery, some situations require more sophisticated implementations. Fortunately, an open-source XQuery function library called FunctX is available [13]. This library contains many functions in different areas, such as strings, numbers and sequences to name a few.
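As a small illustration of these features, the following sketch (assuming the FunctX module is imported) combines a conditional expression with the FunctX function functx:sort to return the movie titles from listing 3.7 in sorted order whenever there are more than two of them:

(: A conditional query combined with a FunctX library function. :)
let $titles := /Cinema/Movie/@Title/string()
return
    if (fn:count($titles) > 2)
    then functx:sort($titles)
    else $titles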

3.8 XLNT Framework

XLNT (X++ LaNguage Toolkit) is a framework written in C# that allows the user to hook into the X++ compiler to analyze the source code [45]. The implementation of the back-end in ASP.NET Core has a class library dedicated to the generation of ASTs based on X++ source code files. Given a path to X++ source code files, it leverages the XLNT framework by creating a metadata provider to extract metadata from the X++ source code and a multi-pass administrator to compile it. This generates an AST in XML format for every X++ source code file.

3.9 SocrateX

The following section describes the technology frameworks of SocrateX and its software architecture.

3.9.1 ASP.NET Core

ASP.NET Core is an open-source, cross-platform and cloud-optimized web framework for developing web applications [2]. The framework supports both the full .NET Framework runtime, specifically for Windows, and the cross-platform .NET Core runtime for Linux and OSX.

3.9.2 Angular Framework

Angular is an open-source platform and web framework developed by the Angular team at Google [15]. It was first released in 2010 under the name AngularJS, a JavaScript-based

front-end web framework. However, in 2014, AngularJS was completely rewritten into Angular (Angular v2+) by the same team. There are several underlying differences between AngularJS and Angular, perhaps the most decisive being that Angular is written in Microsoft's TypeScript rather than JavaScript. TypeScript is a superset of JavaScript that is transpiled to JavaScript, and it supports various features such as static typing, interfaces, class properties and accessibility levels. Up until now, five incremental updates have been released, Angular version 8 being the latest.

Architecture Overview

Angular applications are modular, and the foundation of the modularity system is NgModules. NgModules can be thought of as containers for a cohesive block of code dedicated to a specific application domain, and are comparable to C# namespaces and Java packages. Each Angular application has at least one root module, and an Angular application is initialized by bootstrapping its root module. An NgModule in turn consists of one or more components. Each component is responsible for a part of the application logic and is associated with a template, which defines a part of the screen called a view. Components may also use services, which provide additional functionality. Services can be injected into components as dependencies.

3.9.3 BaseX

BaseX is an open-source, lightweight, high-performance and scalable XML database and XQuery 3.1 processor which supports the W3C Update and Full Text extensions [4]. The database allows the user to store, query and process large corpora of textual data such as XML, JSON and CSV. In SocrateX, BaseX is used to store the XML representations of the ASTs as soon as a compilation is done.

3.9.4 SocrateX Architecture

The software architecture of SocrateX can be divided into two parts: the SocrateX front-end and the SocrateX back-end. A high-level overview of the SocrateX architecture can be seen in figure 3.9.

Figure 3.9: A high-level overview of the SocrateX software architecture.

The SocrateX front-end is built with Angular 6. Users can write advanced queries against the code base and convert X++ code to XML abstract syntax trees in real time. The back-end is written on the .NET framework and exposes API endpoints for querying the XML database and for finding out which team a specific artifact belongs to. Furthermore, there are class libraries dedicated to converting X++ code to XML abstract syntax trees (AST). The conversion is achieved by running a script which takes X++ source code files from a folder and runs the


XLNT framework to compile the code, extract metadata and convert the ASTs to XML format. These XML files are then stored in the XML database. At the moment, SocrateX is hosted on a server at Microsoft and the ASTs are rebuilt once every day.

3.10 Entity Framework Core

Microsoft's primary means of interaction between .NET applications and relational databases was first released in 2008 under the name Entity Framework, an object-relational mapping (ORM) framework specifically for Windows. Since then, there have been five incremental releases. It was not until 2016 that Entity Framework Core was first released, as a result of Microsoft's decision to modernize, componentize and bring .NET cross-platform to Linux and OSX. Entity Framework Core (EFCore) is an open-source, lightweight, extensible, cross-platform ORM framework and is a complete rewrite of Entity Framework [11]. EFCore enables developers to query and manipulate .NET objects, called entities, in a database using an object-oriented paradigm. EFCore supports numerous database providers such as SQL Server, SQLite, PostgreSQL and more. There are two development approaches with EFCore: code-first and database-first.

• Code-first: The code-first approach creates a database and tables from entity models defined by the developer.

• Database-first: The database-first approach constructs the entity models based on an existing database.

Furthermore, it is important to understand the following fundamentals when working with the EFCore code-first approach. The following points briefly describe these fundamentals:

• DbContext: In EFCore, DbContext is a class which represents a session with the database. It can be interpreted as a data access layer and lets developers establish a database connection, configure models and relationships, query data, track changes made to entities and manage transactions, to name a few.

• DbSet: DbSet is a class which represents a collection for a given entity and is mapped by default to database tables. The DbSet class acts as a gateway for database operations against a specific entity.

• Model: An EFCore model is a conceptual model of an application’s domain. The model is represented by .NET objects.

• Relationship: In the context of databases, a relationship is a connection between two database tables that are logically related to each other. There are three types of relationships: one-to-one, one-to-many and many-to-many. Relationships can be constructed by defining navigation properties, a type of property that cannot be mapped to a primitive or scalar type by the database provider.

• Fluent API: Fluent API allows for a more complex configuration of the entities.

• Migrations: Models might change during development, and it is crucial that the database stays in sync. The migration feature keeps the database schema in sync with the model while preserving existing data in the database. From the entities, an EFCore model is created by the framework. Lastly, a migration is executed, which creates a database or updates an existing one based on the model.


3.11 Power BI

Data visualization and business intelligence (BI) are valuable assets in all types of organizations. BI technologies help enterprises identify and develop new business opportunities, and provide a central point for tracking key performance indicators in their products. In 2011, Microsoft released a fully fledged business analytics service in Power BI, a collection of software services, applications and connectors that transform data into comprehensible, immersive and interactive visualizations in reports and dashboards [32]. Power BI consists of four parts:

• Power BI Desktop: A Windows application for creating reports and data visualization based on the data from the data source.

• Power BI Service: An online SaaS (Software as a Service) service.

• Power BI Mobile Apps: Mobile apps that enable users to stay connected to their data.

• Power BI Report Server: An on-premise report server which allows developers to deploy Power BI reports.

There is a large range of data sources that are compatible with Power BI, such as simple files (e.g. JSON, XML), various database providers (e.g. SQL Server, PostgreSQL), Azure, or even web platforms such as Facebook and Google Analytics. To connect the data with Power BI, there are three available data connectivity modes:

• Import: Copies the data from the data source and imports it locally into the Power BI cache. When published to the Power BI Service, the data will reside in the Microsoft data center.

• Direct Query: The model schema, without data, is stored in Power BI Desktop; the data itself stays in the data source. When creating visualizations in Power BI Desktop, queries are sent directly to the data source to get the data values.

• Live Connection: Specifically for analysis services only.

With regard to databases with multiple relationships between the tables, Power BI preserves the schema, and thus relationship filtering is possible. Additionally, Power BI supports Data Analysis Expressions (DAX), a functional programming language used to construct more complex data manipulations, which allows for more sophisticated visuals in the visualization.

3.12 Related Work

It is interesting to see how other players in the industry address software metrics and technical debt in static code analysis tools. NDepend is a static code analysis tool for complex .NET code bases [27]. The tool allows developers to identify dependency graphs, technical debt and software metric values in the code base. It is integrated in Visual Studio and the results are visualized in a report. The code base is treated as a database, which allows developers to write user-defined LINQ queries that can be executed against it. Developers can thereby build a collection of best practice rules, expressed as LINQ queries, that must always be maintained. SonarQube is another static code analysis tool, developed by SonarSource, which identifies bugs, technical debt, code smells, software metric values and security vulnerabilities, and supports over 20 programming languages [38]. SonarQube can be integrated with various DevOps tools such as Maven, Gradle, Jenkins and Azure DevOps. The values are visualized in a dashboard.


There is also a plugin available which implements the SQALE method for SonarQube as well. PMD Source Code Analyzer is a static code analysis tool used to enforce best practice rules by scanning and evaluating the source code [30]. PMD supports eight programming languages but focuses mostly on Java. Abstract syntax trees (AST) of Java source files are generated using JavaCC, which is convenient for static code analysis. The reason why PMD is presented here is that best practice rules can be defined either in Java or as XPath expressions to assess the ASTs. This is similar to SocrateX, where best practice rules defined in XPath can be queried against the code base to find violations.

Gordana Rakić and Zoran Budimac presented a software metric tool in response to the limitations of the current software metric tools available in the software engineering field [33]. One of the limitations is that the static code analysis tools available in the industry are generally not independent of programming language or underlying platform. This leads to inconsistency, because different tools are used for different projects. The authors developed an early prototype in Java which achieved platform and programming language independence along with support for a wide range of software metrics. To achieve language independence, abstract syntax trees are a good starting point. The problem is that software metric algorithms are sensitive to the syntax of the input programming language. Therefore, they used enriched concrete syntax trees (eCST) produced with the ANTLR parser generator. An eCST is a modification of the concrete syntax tree and is stored in an XML structure. eCSTs contain concrete source code elements attached to corresponding language elements, and additional information is stored as universal nodes. These eCSTs are then used to calculate software metric values.

Truong, Roe & Bancroft introduced in their research paper a static code analysis framework which calculates software metrics and analyzes structural similarity for students' Java programs using an XML representation of the program's abstract syntax tree (AST) [41]. It leverages the ANTLR parser to generate an XML representation of a source code's AST, which is then analyzed by executing software metric algorithms implemented in Java. The XML elements determine which metric to invoke, using Java reflection.

Nödler, Neukirchen & Grabowski introduced in their paper an XQuery-based software code analysis framework [28] to calculate metrics and detect code smells in Java source code, UML models and the Testing and Test Control Notation (TTCN-3). The framework leverages the facade design pattern to mediate between the concrete analysis targets, i.e. programming languages, and the actual analysis. A facade layer interface is used as a fixed interface for the analysis layer, and each underlying concrete analysis target provides a corresponding implementation of the facade layer interface. XML is the universal representation of any software artifact in this framework.
The authors highlighted several approaches to encode source code in XML: JavaML (Java Markup Language), a self-describing representation of Java source code in XML, obtained using the Jikes Java compiler framework; srcML (Source Code Markup Language), an XML representation of source code with all constructs and information preserved; XMI (XML Metadata Interchange), a standardized XML language for exchanging metadata information such as metamodels; and lastly, mapping abstract syntax trees to XML. The analysis of software artifacts is done using XQuery, and as a result of the facade layer, an XQuery expression works for all concrete analysis targets as long as they adhere to the facade layer interface.

Mendonça et al. introduced in their paper a refactoring framework called ReFax, which works over an XML representation of source code [26]. The approach taken is to define so-called pre-conditions, responsible for guaranteeing the legitimacy of the refactoring operation. These could, for example, be code smells, and can be found using XQuery and XPath as introduced in this thesis. Next, XUpdate, a query language for modifying XML data, is used to modify the source code. Finally, post-conditions are evaluated, which verify that the modified source code still preserves its external behaviour. Although their framework takes neither software metrics nor technical debt into consideration, it is still relevant because of the technologies and approach used.


Anders Tind Sørensen presented his 2005 thesis on software metrics relevant to X++ code. He investigated X++, specifically language features such as embedded SQL, and implemented metrics from the Chidamber and Kemerer metric suite as well as other traditional metrics. The implementation was integrated into an existing best practice tool, which allows developers to analyze their code and investigate whether improvements are possible in their current implementation. Coincidentally, his thesis was also carried out at the Microsoft Development Center in Copenhagen [39].


4 Method

The following chapter describes the methodology used in this thesis and the corresponding implementation. Note that the term 'artifact' in the following sections corresponds to traditional classes, tables and forms in MD365FO.

4.1 Approach

This section describes the approach taken in this thesis.

4.1.1 Constructs in the XML abstract syntax tree

In order to fully understand the applicability of software metrics and technical debt on X++ abstract syntax trees (AST) in XML format, one must understand the constructs and information that can be derived from the tree. As previously mentioned in section 3.8, the conversion of X++ source code to an AST and the serialization to XML are done using the XLNT framework [45]. This had already been done, and the result stored in an XML database, prior to the start of this thesis. Unfortunately, it was not possible to retrieve a list of all constructs that are made available after the conversion of source code to AST, which is crucial for the implementation of software metrics and technical debt. Therefore, this had to be investigated manually. In section 3.9.4, it was mentioned that users can convert X++ source code to an AST in XML format in real time using the semantic search tool SocrateX. This has been used extensively to find out which constructs occurring in the source code are available. The constructs are represented as element nodes in the XML document of an X++ AST. As there are too many constructs to cover, table 4.1 shows five examples of constructs derived from the X++ AST in XML format. The dots in the element nodes correspond to additional attributes derived from the XML AST, such as the start and end columns of where the construct occurs in the source code, types, variable names and so on.


Table 4.1: Examples of constructs derived from the XML abstract syntax tree of X++ code.

Member variables outside methods: <FieldDeclaration ... />
Calling methods from an object: <QualifiedCall ... />
While select: <SearchStatement ... />
Try-Catch: <TryStatement ... > .... </TryStatement>
Local functions: <LocalDeclarationsStatement> <FunctionDeclaration ... /> </LocalDeclarationsStatement>
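As a sketch of how this manual investigation can be supported, a single query against the XML database can enumerate every distinct element name, and thus every construct, occurring in the stored ASTs; this is a minimal illustration rather than part of the SocrateX implementation.

(: Enumerates every distinct element name occurring in the stored ASTs,
   i.e. every construct emitted by the compilation pipeline. :)
fn:distinct-values(
    for $e in //*
    return fn:name($e)
)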

4.1.2 Applicability of software metrics

After investigating which constructs can be derived from the AST, one must find suitable software metrics. Therefore, research into which software metrics are applicable to X++ source code was conducted by reading and studying various papers, discussion forums and available static code analysis tools, in order to gain more knowledge and insight. This includes the Chidamber & Kemerer metric suite as well as the static code analysis tool SonarQube. The software metrics chosen in this thesis are in the source-code-level category, meaning all information required to calculate and analyze a specific software metric can be obtained directly from the source code. This corresponds to the traditional metrics, such as source lines of code and comment percentage, described in section 3.3.1. Moreover, there is information that can be obtained from the X++ AST in XML format, specifically attributes of element nodes. One example is the Extends attribute, available in every artifact's corresponding XML AST, which specifies the name of the artifact it extends. This enables additional possibilities to adopt other software metrics, such as the number of children metric from the Chidamber & Kemerer metric suite described in section 3.3.2. After identifying a set of software metrics to be used in this thesis, the next step was to implement each metric using XQuery. This is further described in section 4.2.


4.1.3 Applicability of technical debt

There is a set of best practice rules for X++ that engineers need to consider when writing production-quality code. The question is how best practice rules can be converted into technical debt. In section 3.5, the SQALE method was introduced. The method states that in order to estimate technical debt, one has to associate each best practice rule with a corresponding quality characteristic, remediation function and remediation detail. A total of 13 best practice rules were identified, each associated with a quality characteristic, remediation function and remediation detail. This was done together with engineering managers working on the MD365FO product at Microsoft, to make the technical debt estimate as accurate as possible. The process was to go through a set of best practice rules for X++ and pair each rule with an associated quality characteristic. Moreover, a remediation detail and an approximate remediation function had to be defined for each rule. The best practice rules are compatible with the constructs of the X++ AST in XML format, meaning the rules can be implemented using XQuery. As previously mentioned in section 3.7.2, XQuery is an efficient query language for searching and retrieving information from XML documents. In this case, XQuery is used to find information in the X++ AST in XML format that violates the best practice rules. The number of violations of each best practice rule is counted, which in turn can be converted into technical debt based on the remediation function. See section 4.2.3 for implementation details.

4.1.4 Integration with SocrateX

Section 3.9.4 described the architecture of the semantic search tool SocrateX. After identifying the software metrics and best practice rules to be used in this thesis, the question is how they can be integrated into SocrateX. The approach is to introduce an additional web service within the SocrateX back-end, responsible for creating the XQuery code and retrieving the information from the XML database. The web service exposes HTTP endpoints to initiate the migration of the database and to query the XML database with the XQuery code for the software metrics and best practice rules. Furthermore, a database has to be created to store the results obtained from the XQuery code. Because the results are structured, a relational database is suitable and, in this case, SQL Server is used.

4.2 Implementation

The first three subsections of this section present the mappings from code constructs to XML, and the XQuery implementations of the software metrics and of the best practice rules for technical debt using the SQALE method, as described in sections 3.3 and 3.5. Section 3.9.4 introduced the current architecture of SocrateX; section 4.2.4 describes the implementation of the new microservice focused on static code analysis and how it is integrated with the current architecture of SocrateX. Lastly, section 4.2.5 presents how Power BI leverages the code analysis data stored in the relational database.

4.2.1 Traditional metrics

The following sections describe how each metric has been implemented with XQuery and XPath.

Source Lines of Code (SLOC)

The source lines of code (SLOC) metric counts the number of lines in the source code. Each artifact in the XML database has an attribute labelled Source, which consists of the whole source code as a string. The XQuery implementation is shown in listing 4.1.


declare variable $slocMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    let $sloc := fn:count(functx:lines($a/@Source))
    return map { $a/@Artifact/string() : $sloc }
);

declare function local:SourceLinesOfCode($artifact)
{
    let $artifactName := $artifact/@Artifact
    return if(map:contains($slocMap, $artifactName)) then map:get($slocMap, $artifactName)
           else 0
};

Listing 4.1: XQuery implementation for SLOC.

The implementation is straightforward. A variable called slocMap stores each artifact name as a key along with its associated SLOC. The Source attribute is split into individual lines using the lines function from the FunctX namespace, after which the number of lines can be calculated with the count function. Thereafter, the function local:SourceLinesOfCode takes an artifact and passes its name to the map. If the name exists in the map, the SLOC is returned.

Comment Percentage (CP)

The comment percentage (CP) metric requires the number of comment lines and the total number of lines in the source code. The implementation for this metric can be seen in listing 4.2.

declare variable $commentsMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    for $line in functx:lines($a/@Source)
    where functx:contains-any-of($line, ('//', '/*', '*/', '///')) = true()
    group by $a := $a/@Artifact/string()
    return map { $a : fn:count($line) }
);

declare variable $slocMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    let $sloc := fn:count(functx:lines($a/@Source))
    return map { $a/@Artifact/string() : $sloc }
);

declare function local:CommentPercentage($artifact)
{
    let $sloc := map:get($slocMap, $artifact/@Artifact)
    let $comments := map:get($commentsMap, $artifact/@Artifact)
    return if($sloc = 0 or fn:empty($comments) or fn:empty($sloc)) then ( 0 )
           else (
               let $cp := 100 * ($comments div $sloc)
               return $cp
           )
};

Listing 4.2: XQuery implementation for CP.


First, the number of comment lines in the source code of all artifacts is calculated and stored in the variable commentsMap. The difference compared to the SLOC implementation is that only lines containing //, /*, */ or /// are counted. Secondly, the same implementation as for SLOC is reused. Lastly, the function local:CommentPercentage takes an artifact and obtains the values from the maps for SLOC and the number of comments respectively. Unfortunately, the implementation does not consider multiline comments, due to time restrictions and complications with XQuery. In hindsight, one could have used regular expressions on the source code to remove all types of comments.
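A hedged sketch of that alternative is shown below; the helper local:stripComments is hypothetical and the regular expressions are illustrative rather than validated against the real corpus.

(: Hypothetical helper: removes block comments first (flag 's' lets '.'
   match newlines; '*?' is a reluctant quantifier), then line comments. :)
declare function local:stripComments($source as xs:string) as xs:string
{
    let $noBlockComments := fn:replace($source, '/\*.*?\*/', '', 's')
    return fn:replace($noBlockComments, '//[^\n]*', '')
};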

Number of Methods (NOM)

The implementation of the number of methods (NOM) metric simply counts the methods, including local functions, in the source code.

declare variable $nomMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    let $numMethods := fn:count($a/Method)
    let $numLocalFunctions := fn:count($a/Method/LocalDeclarationsStatement/FunctionDeclaration)
    let $NOM := $numMethods + $numLocalFunctions
    return map { $a/@Artifact/string() : $NOM }
);

declare function local:NumberOfMethods($artifact)
{
    let $artifactName := $artifact/@Artifact
    return if(map:contains($nomMap, $artifactName)) then map:get($nomMap, $artifactName)
           else 0
};

Listing 4.3: XQuery implementation for NOM.

Here, the sum of the number of methods and local functions is stored as the value, with the associated artifact name as the key. The metric does not take getters and setters into consideration.

Cyclomatic Complexity (CC)

As previously mentioned in section 3.3.1, the cyclomatic complexity can be calculated by counting the branching and looping constructs of the source code. Each branching and looping construct is defined as an element node in the XML representation of the X++ source code; see listing 4.4 for the implementation. The input argument to the local:CyclomaticComplexity function is a method element node from the XML abstract syntax tree of an artifact's source code. Each method element node contains additional element nodes that make up the method. Here, the function simply counts the number of branching and looping constructs which contribute to the cyclomatic complexity.

4.2.2 Chidamber & Kemerer Metric Suite

The following sections describe how each metric from the Chidamber & Kemerer metric suite has been implemented with XQuery and XPath.


declare function local:CyclomaticComplexity($m)
{
    let $complexity := 1 + fn:count($m/descendant::IfStatement)
        + fn:count($m/descendant::IfThenElseStatement)
        + fn:count($m/descendant::WhileStatement)
        + fn:count($m/descendant::DoWhileStatement)
        + fn:count($m/descendant::ForStatement)
        + fn:count($m/descendant::SearchStatement)
        + fn:count($m/descendant::CaseValues)
        + fn:count($m/descendant::ConditionalExpression)
        + fn:count($m/descendant::TryStatement/descendant::CatchExpression)
        + fn:count($m/descendant::TryStatement/descendant::CatchAllValues)
    return $complexity
};

Listing 4.4: XQuery implementation for CC.

Weighted Method per Class (WMC)

Weighted method per class (WMC) calculates the total complexity of all methods within a class. The weight used to determine complexity is McCabe's cyclomatic complexity (CC) metric. Therefore, the implementation of WMC is similar to that of CC, see listing 4.5.

declare variable $wmcMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    let $complexity := 1 + fn:count($a/descendant::IfStatement)
        + fn:count($a/descendant::IfThenElseStatement)
        + fn:count($a/descendant::WhileStatement)
        + fn:count($a/descendant::DoWhileStatement)
        + fn:count($a/descendant::ForStatement)
        + fn:count($a/descendant::SearchStatement)
        + fn:count($a/descendant::CaseValues)
        + fn:count($a/descendant::ConditionalExpression)
        + fn:count($a/descendant::TryStatement/descendant::CatchExpression)
        + fn:count($a/descendant::TryStatement/descendant::CatchAllValues)
    group by $artifactName := $a/@Artifact/string()
    return map { $artifactName : fn:sum($complexity) }
);

declare function local:WeightedMethodPerClass($a)
{
    let $artifactName := $a/@Artifact
    return if(map:contains($wmcMap, $artifactName)) then map:get($wmcMap, $artifactName)
           else 0
};

Listing 4.5: XQuery implementation for WMC.

The difference is that the complexity here is the aggregate CC of all methods within a single artifact. The variable wmcMap is a map that holds the artifact name as key and its WMC as value. The implementation counts all descendant element nodes of the branching and looping constructs from the artifact's root, which are then summed together.


Depth of Inheritance Tree (DIT)

Each artifact has an attribute labelled Extends which contains the name of the artifact it extends. Therefore, the implementation of the depth of inheritance tree (DIT) metric utilizes this attribute to count the number of ancestors an artifact has.

declare variable $ditMap as map(xs:string, xs:string) := map:merge(
    for $a in /Class | /Table | /Form
    return map { $a/@Name/string() : $a/@Extends/string() }
);

declare function local:recurseDepth($artifactName, $count)
{
    if(map:contains($ditMap, $artifactName) and map:get($ditMap, $artifactName) != '') then
        (local:recurseDepth(map:get($ditMap, $artifactName), $count + 1))
    else
        $count
};

declare function local:DepthOfInheritance($artifact)
{
    let $artifactName := $artifact/@Name
    return local:recurseDepth($artifactName, 0)
};

Listing 4.6: XQuery implementation for DIT.

Listing 4.6 shows the implementation of the DIT metric. First, the variable ditMap holds, for each artifact, the name of the artifact it extends. Secondly, the function local:DepthOfInheritance takes an artifact and passes the artifact's name to the recursive helper function recurseDepth. The recursive function checks whether the artifact name passed into the function exists in the map. If it does, the corresponding value is passed into the recursive function again and a counter is incremented. The counter will ultimately be the number of ancestors of the initial artifact.

Number of Children (NOC)

The number of children (NOC) is the number of immediate subclasses an artifact has. This can be thought of as a grouping problem where all children of an artifact are of interest. The implementation can be seen in listing 4.7. As in previous implementations, maps are used for performance reasons. In this case, the group-by clause is used, which in XQuery concatenates the values of all non-grouping variables that belong to a specific group. Here, the value is assigned to a grouping variable called extends, which is the name of the artifact that the currently evaluated artifact extends. Next, the map sets the grouping variable as key, and the count of all artifacts extending this grouping variable is assigned as the value.

Coupling Between Objects (CBO)

Coupling between objects (CBO) counts the number of unique reference types, excluding primitive data types, that occur through method calls, method parameters, variables in methods and field declarations. The element nodes for these constructs do not distinguish what type they are. However, all constructs have an attribute called Type, which indicates what data type the


declare variable $nocMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table
    group by $extends := $a/@Extends/string()
    return map { $extends : fn:count($a) }
);

declare function local:NumberOfChildren($artifact)
{
    let $artifactName := $artifact/@Name
    return if(map:contains($nocMap, $artifactName)) then map:get($nocMap, $artifactName)
           else 0
};

Listing 4.7: XQuery implementation for NOC.

construct is, and therefore it is possible to differentiate whether it is a primitive data type or not. Listing 4.8 shows the implementation of CBO. The approach taken here is to declare a sequence of strings describing the primitive data types. The next step is to gather all variables in methods, method parameters, method calls and field declarations, which is done using location paths. All location paths have a predicate using the FunctX namespace to select only nodes whose Type attribute is not in the primitive data type sequence.

Response for a Class (RFC)

The response for a class (RFC) metric counts the number of methods in an artifact and the number of methods they directly call. This is calculated by counting the number of methods, including local functions, in the source code, together with the distinct method calls within these methods. The implementation is shown in listing 4.9. Location paths for the methods, local functions and method calls are used. The function distinct-values selects only distinct method calls.

4.2.3 SQALE Method for Technical Debt

As previously mentioned in section 3.5, a debt is created when a requirement is violated. First, requirements need to be defined. As previously mentioned, in X++ there is a variety of best practice rules that engineers need to consider when writing production-quality code. Currently, SocrateX already provides sample queries of best practice rules which can be executed on the whole X++ code base. However, these queries need to be rewritten to reflect the SQALE method. Thirteen best practice rules have been rewritten to be compatible with the microservice presented in section 4.2.4. Since there is a large range of diverse best practice rules depending on organization, project and programming language, it is not necessary to include all of them in this report. Nonetheless, listing 4.10 presents one of the best practice rules currently in X++. In X++, replaceable methods are methods that partners can replace. There is one catch: if a replaceable method references private or internal methods or member variables within the source code, then the method becomes inaccessible to partners, because they cannot access internal or private code blocks. This means the requirement is that no artifact should have replaceable methods that reference private or internal code blocks. Recall that the aim of all remediation functions is to convert the number of times a requirement has been violated into a remediation cost. The implementation in listing 4.10 creates a map, replaceMap, that counts


declare variable $pdt := ('boolean', 'date', 'guid', 'anytype', 'int', 'real', 'str', 'timeOfDay', 'utcdatetime');

declare variable $cboMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    let $fieldCount := fn:count(fn:distinct-values(
        $a/FieldDeclaration[functx:is-value-in-sequence(@Type, $pdt) = fn:false()]/@Type))
    let $varCount := fn:count(fn:distinct-values(
        $a/Method/LocalDeclarationsStatement/VariableDeclaration[functx:is-value-in-sequence(@Type, $pdt) = fn:false()]/@Type))
    let $paramCount := fn:count(fn:distinct-values(
        $a/Method/ParameterDeclaration[functx:is-value-in-sequence(@Type, $pdt) = fn:false()]))
    let $methodCallCount := fn:count(fn:distinct-values(
        $a/descendant::QualifiedCall[functx:is-value-in-sequence(@Type, $pdt) = fn:false()]))
    let $cbo := $fieldCount + $varCount + $paramCount + $methodCallCount
    return map { $a/@Artifact/string() : $cbo }
);

declare function local:CouplingBetweenObjects($artifact)
{
    let $artifactName := $artifact/@Artifact
    return if(map:contains($cboMap, $artifactName)) then map:get($cboMap, $artifactName)
           else 0
};

Listing 4.8: XQuery implementation for CBO.

declare function local:ResponseForClass($artifact)
{
    let $methods := $artifact/Method
    let $localFunctions := $methods/LocalDeclarationsStatement/FunctionDeclaration
    let $methodCalls := fn:distinct-values($methods/descendant::QualifiedCall/@MethodName)
    return fn:count($methods) + fn:count($localFunctions) + fn:count($methodCalls)
};

Listing 4.9: XQuery implementation for RFC.

the number of methods within an artifact that reference private or internal methods or member variables. The count is the number of violations of the requirement and can thus be multiplied by the cost given by the remediation function. The quality characteristic associated with this particular rule is changeability, and the remediation function is one minute per violation, since the only action that needs to be taken is to change the access modifier of the member variables/methods from private to protected.


declare variable $replaceMap as map(xs:string, xs:integer) := map:merge(
    for $a in /Class | /Table | /Form
    for $m in $a/Method
    let $members := $m/descendant::SimpleField[@Name != 'this']
    let $calls := $m/descendant::QualifiedCall
    where ($members/@Name = $a/FieldDeclaration[@IsPrivate='true' or @IsInternal='true']/@Name
           or $calls/@MethodName = $a/Method[@IsPrivate='true' or @IsInternal='true']/@Name)
          and $m/@IsReplaceable = 'true'
    group by $a := $a/@Artifact/string()
    return map { $a : fn:count($m) }
);

declare function local:ReplaceableMethods($artifact)
{
    let $artifactName := $artifact/@Artifact
    return if(map:contains($replaceMap, $artifactName)) then map:get($replaceMap, $artifactName)
           else 0
};

Listing 4.10: XQuery implementation for Replaceable Methods rule.
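Given listing 4.10, converting the violation count into a remediation cost is then a single multiplication; the function local:ReplaceableMethodsDebt below is an illustrative sketch following the one-minute-per-violation remediation function described above, not part of the thesis implementation.

(: Remediation cost in minutes: one minute per violation; the
   constant and function name are illustrative. :)
declare variable $minutesPerViolation as xs:integer := 1;

declare function local:ReplaceableMethodsDebt($artifact)
{
    local:ReplaceableMethods($artifact) * $minutesPerViolation
};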

4.2.4 Web service - Static Code Analysis

A new ASP.NET Core web service was created within the SocrateX back-end repository. The sole purpose of this service is to calculate software metrics and evaluate code rules for every artifact in the MD365FO code base. The generated values then need to be stored in a relational database so they can later be used in Power BI. The updated software architecture of SocrateX can be seen in figure 4.1.


Figure 4.1: A high-level overview of the updated SocrateX architecture.

The code analysis service calls the SocrateX service through an HTTP call with an XQuery built to calculate the software metrics and code rules. In return, the code analysis service receives an HTTP response in XML format containing the calculated values and metadata for every artifact in the code base. The response is then parsed into C# plain old CLR objects (POCOs) and the relationships between the POCOs are constructed; this is the conceptual model of the application's domain, as mentioned in section 3.10. During this process, HTTP requests are executed against an ownership API endpoint which returns the team name and its metadata for each artifact. The reason is that each artifact in the X++ code base follows a naming convention, so the team owning an artifact can be determined based on the prefix or suffix of the artifact's name, which is what the ownership API endpoint does. Thereafter, the POCOs are inserted into the relational database, which in turn connects to Power BI for visualization. To maintain a modular folder structure in the repository, the code analysis part is implemented in a separate class library called CodeAnalysis. The following sections present the steps in more detail.

Mapping Queries to JSON format

C# and F# are programming languages that can be used to build applications and libraries in ASP.NET Core. XQuery is not supported as a programming language, and there are no open-source libraries that enable the use of XQuery in the ASP.NET Core environment. It is, however, possible to load XML files in memory, which enables the use of XPath expressions through the System.Xml namespace. This comes with a drawback: there are over 90 000 artifacts in the XML database, and it would not be feasible to load all XML files in memory. Furthermore, it defeats the purpose of using a database, and thus only a subset of the XML database was

utilized. Additionally, XPath expressions are not as sophisticated as XQuery; most of the implementation cannot be replicated from XQuery to XPath, even though their grammars and language descriptions are generated from a common source for consistency. The results of the queries implemented in section 4.2 are of interest, as is the information describing the software metrics and rules. A solution is to create a JSON file for each software metric and code rule, containing relevant metadata for the metric/rule and its XQuery implementation, taking advantage of the attribute-value pair feature of JSON. The reason is that serializing and deserializing between JSON and .NET objects is rather convenient with the Newtonsoft Json.NET framework. Listing 4.11 presents the JSON text for the depth of inheritance metric.

{
    "Name": "Depth of Inheritance",
    "Description": "Measures the inheritance upon which a class was built.",
    "VariableName": "dit",
    "FunctionName": "local:DepthOfInheritance",
    "Query":
        "declare variable $classMap as map(xs:string, xs:string) := map:merge(
            for $c in /Class | /Table
            return map {$c/@Name/string() : $c/@Extends/string()}
        );

        declare function local:recurseDepth($className, $count)
        {
            if(map:contains($classMap, $className) and map:get($classMap, $className) != '') then
                (local:recurseDepth(map:get($classMap, $className), $count + 1))
            else
                $count
        };

        declare function local:DepthOfInheritance($class)
        {
            let $className := $class/@Name
            return local:recurseDepth($className, 0)
        };"
}

(a) Mapping of the Depth of Inheritance metric to JSON format.

public class MetricDTO
{
    string Name {get; set;}
    string Description {get; set;}
    string VariableName {get; set;}
    string FunctionName {get; set;}
    string Query {get; set;}
}

(b) Data transfer object for software metrics.

Listing 4.11: Depth of Inheritance metric implementation and metadata in JSON format and a data transfer object for software metrics.

The JSON text in listing 4.11 is converted into an equivalent data transfer object (DTO), which in turn can be used to build the query. Each JSON file is added to the class library as an embedded resource, which means that the files are part of the application and can be accessed at run time. The next section describes how each metric/rule is used to build a query that calculates the software metrics and code rules for every artifact in the XML database.

XQuery Code Builder

The interest lies in each and every artifact stored in XML format in BaseX. Consider a set of five software metrics that need to be evaluated. Naively, one would have to dedicate five different HTTP requests to each artifact, which is not efficient, since it increases the likelihood of e.g. timeouts. Moreover, as the number of requests grows, a failure in one process could potentially lead to outright failure, because resources (CPU, threads, memory, etc.) may be exhausted.


Fortunately, one can take advantage of FLWOR expressions available in XQuery. It is possible to obtain the data with a single HTTP request by concatenating the query implementations into the following format:

(: Definitions and declarations of all functions :)
(: ..... :)

for $c in /Class | /Table | /Form
return
  <Result Name="{$c/@Name/string()}">
  {
    let $metric :=
      <Metricvalues>
      {
        let $doi := local:DepthOfInheritance($c)
        let $sloc := local:SourceLinesOfCode($c)
        return (
          <dit>{$doi}</dit>,
          <sloc>{$sloc}</sloc>
        )
      }
      </Metricvalues>

    let $rules :=
      <Rulevalues>
      {
        let $cwd := local:ClassesWithoutDocumentation($c)
        let $todo := local:TODOComments($c)
        return (
          <cwd>{$cwd}</cwd>,
          <todo>{$todo}</todo>
        )
      }
      </Rulevalues>

    return ($metric, $rules)
  }
  </Result>

Listing 4.12: XQuery code generated from the JSON files.

The generated XQuery code calculates two software metrics and two code rules for every artifact available in the XML database. The code in listing 4.12 is generated by performing string manipulations on the property values of the DTOs. Thereafter, it can be used to query the XML database. Listing 4.13 shows an example of the result for one single class. Each artifact and its associated metrics and rules are enclosed within the angle bracket tags of a result element node. The descending element nodes of each result are the metrics and rules along with their values.
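A minimal sketch of this string manipulation is given below; it is restricted to metrics, and the class name and exact formatting are illustrative rather than the thesis implementation:

using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class XQueryCodeBuilder
{
    // Concatenates the Query declarations of all metric DTOs and wraps the
    // per-artifact function calls in one FLWOR expression (a simplified
    // shape of listing 4.12).
    public static string Build(IReadOnlyList<MetricDTO> metrics)
    {
        var code = new StringBuilder();
        foreach (MetricDTO m in metrics)
            code.AppendLine(m.Query); // declare variable ... / declare function ...

        string lets = string.Join("\n", metrics.Select(m =>
            $"let ${m.VariableName} := {m.FunctionName}($c)"));
        string elements = string.Concat(metrics.Select(m =>
            $"<{m.VariableName}>{{${m.VariableName}}}</{m.VariableName}>"));

        code.AppendLine("for $c in /Class | /Table | /Form");
        code.AppendLine(lets);
        code.AppendLine("return");
        code.AppendLine("  <Result Name=\"{$c/@Name/string()}\">");
        code.AppendLine($"    <Metricvalues>{elements}</Metricvalues>");
        code.AppendLine("  </Result>");
        return code.ToString();
    }
}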

Fetching Data

Recall the updated architecture of SocrateX in figure 4.1. The SocrateX service exposes API endpoints to query the XML database. Currently, the code analysis service is a standalone service in the same repository as the SocrateX service. Ultimately, they should be separated into two distinct services, and thus interservice communication is compulsory. There are two messaging patterns that microservices can use to communicate with each other: synchronous communication and asynchronous message passing.


Listing 4.13: Result of the generated query for class ProdJournalCheckPost.

Here, synchronous communication using the HTTP protocol is used. The code analysis service calls an API that the SocrateX service exposes, leveraging IHttpClientFactory in ASP.NET Core to make HTTP requests. The generated XQuery code as seen in listing 4.12 is sent to the SocrateX service for evaluation. Since there are thousands of artifacts, the resulting XML response in listing 4.13 can be extensive. Storing the whole response in a string would end up on the large object heap, which is undesirable. Fortunately, the response can be streamed and sent directly to the parser.
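A minimal sketch of the streaming call, assuming a named client and a placeholder endpoint:

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public class SocrateXQueryClient
{
    private readonly IHttpClientFactory _factory;

    public SocrateXQueryClient(IHttpClientFactory factory) => _factory = factory;

    // Sends the generated XQuery to the SocrateX service and hands back the
    // response body as a stream, so the large XML result is never buffered
    // into one string on the large object heap.
    public async Task<Stream> EvaluateAsync(string xquery)
    {
        HttpClient client = _factory.CreateClient("socratex");
        var request = new HttpRequestMessage(HttpMethod.Post, "api/query")
        {
            Content = new StringContent(xquery)
        };

        // ResponseHeadersRead returns as soon as the headers arrive,
        // leaving the body to be consumed as a stream.
        HttpResponseMessage response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStreamAsync();
    }
}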

Parsing the response

In reference to listing 4.13, the response is in XML format. Each artifact needs to be parsed into entities in the form of plain-old CLR objects (POCO), which can be seen in the next section. The streamed response is used to instantiate an XmlReader to read the XML data from the stream. The major advantages of using XmlReader are speed and memory efficiency. It provides forward-only, read-only access to XML data in a document or a stream. Therefore, it does not load the whole XML tree into memory. Instead, it reads the content of each node one by one. Moreover, the reader supports selective processing, which is a fundamental feature when traversing the XML data. It allows one to skip nodes and only process nodes of interest by element name. In this case, the interesting element nodes are Result, Metricvalues and Rulevalues along with their children.
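A minimal sketch of this selective, forward-only pass (the Name attribute is an assumption about the generated result schema):

using System.IO;
using System.Xml;

public static class ResponseParser
{
    // Streams over the query response, visiting only Result elements and
    // handing their Metricvalues/Rulevalues subtrees to further parsing.
    public static void Parse(Stream responseStream)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(responseStream, settings))
        {
            while (reader.ReadToFollowing("Result"))
            {
                string artifactName = reader.GetAttribute("Name");

                // ReadSubtree scopes a nested reader to this Result only,
                // so its Metricvalues and Rulevalues children can be read
                // in isolation and mapped to POCO entities.
                using (XmlReader result = reader.ReadSubtree())
                {
                    // ... materialize entities for artifactName here
                }
            }
        }
    }
}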

Database

The previous section described how the response is parsed into entities. The following entities have been constructed (a sketch of two of them as POCOs follows the list):

• Team: This entity holds metadata of each team available in MD365FO as well as the artifacts associated with it.

• XppClass: The entity of an artifact of the X++ code base. This can either be a class, a table or a form and holds multiple MetricValue and RuleValue entities.

• Metric: Entity that corresponds to a software metric.

• Rule: Entity that corresponds to a requirement/best practice rule.

50 4.2. Implementation

• MetricValue: An entity which holds one single value of a specific metric.

• RuleValue: An entity which holds one single value of a specific requirement/best practice rule.
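A sketch of two of the entities as POCOs; the property names are illustrative, not the exact thesis implementation:

using System.Collections.Generic;

public class XppClass
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Team Team { get; set; }
    public List<MetricValue> MetricValues { get; set; }
    public List<RuleValue> RuleValues { get; set; }
}

public class MetricValue
{
    public int Id { get; set; }
    public double Value { get; set; } // one single value of one metric
    public Metric Metric { get; set; }
    public XppClass XppClass { get; set; }
}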

Figure 4.2 shows the database schema for the entities and the relationships between them:

Figure 4.2: The database schema of the entities for static code analysis.

A Team has a one-to-many relationship with XppClass. The reasoning behind this approach is that multiple artifacts belong to one team based on the name of the artifact. MD365FO constitutes a large code base of over 90 000 artifacts and thus it is compelling to group artifacts based on their responsibility in terms of functionality and features. For instance, artifacts starting with the prefix 'WHR' belong to a team called 'Warehouse And Transportation'. XppClass has in turn one-to-many relationships with MetricValue and RuleValue. The generated XQuery described in section 4.2.4 evaluates multiple metrics and requirements for each artifact, and the MetricValue and RuleValue entities simply hold the values returned. Metric and Rule each have a one-to-many relationship with MetricValue and RuleValue respectively.

Before insertions of entities are possible, a migration has to be made based on the configured DbContext. DbSets for all entities are defined and the relationships between the entities are configured using the Fluent API, by overriding the OnModelCreating method. The configured DbContext class is then registered for dependency injection in Startup.cs, and only then can a migration be made.
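A minimal sketch of such a Fluent API configuration, assuming the navigation properties from the POCO sketch above; the thesis implementation may differ in details:

using Microsoft.EntityFrameworkCore;

public class CodeAnalysisContext : DbContext
{
    public CodeAnalysisContext(DbContextOptions<CodeAnalysisContext> options)
        : base(options) { }

    public DbSet<Team> Teams { get; set; }
    public DbSet<XppClass> XppClasses { get; set; }
    public DbSet<Metric> Metrics { get; set; }
    public DbSet<Rule> Rules { get; set; }
    public DbSet<MetricValue> MetricValues { get; set; }
    public DbSet<RuleValue> RuleValues { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // One team owns many artifacts.
        modelBuilder.Entity<Team>()
            .HasMany(t => t.XppClasses)
            .WithOne(c => c.Team);

        // Each artifact holds many metric values and many rule values.
        modelBuilder.Entity<XppClass>()
            .HasMany(c => c.MetricValues)
            .WithOne(v => v.XppClass);
        modelBuilder.Entity<XppClass>()
            .HasMany(c => c.RuleValues)
            .WithOne(v => v.XppClass);
    }
}

The context would be registered with services.AddDbContext<CodeAnalysisContext>(...) in Startup.ConfigureServices, after which a migration can be created.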


4.2.5 Power BI

It was previously described that the entities making up the model were inserted into a relational database; the next step is thus to connect the database as a data source to Power BI. This has been done using the Import method to ensure high performance. A decisive feature in Power BI is the ability to preserve the relationships between entities when using a database as data source. Not only does it allow complex relationships between entities, it also supports cross-filtering, and the data reflect the changes between entities as well.

DAX expressions have been used to calculate the technical debt of the requirements available for each artifact, as described in section 3.5.1. Since the remediation functions are all defined in minutes, the approach is to multiply the number of occurrences with the remediation function for the associated rule. Since the relationships between entities are preserved in Power BI, the data changes will also reflect on the XppClass entity. The code for calculating the technical debt can be seen in listing 4.14.

Technical Debt =
var minutes = SUMX(RuleValues, RuleValues[Occurence] * RELATED(Rules[RemediationFunction]))
var numberOfDays = INT(minutes / 1440)
var numberOfHours = INT(MOD(minutes, 1440) / 60)
var numberOfMinutes = MOD(MOD(minutes, 1440), 60)
return
    numberOfDays & "d " & FORMAT(numberOfHours, "#00") & "h " & FORMAT(numberOfMinutes, "#00") & "m"

Listing 4.14: DAX expression to calculate technical debt in the format DD-HH-MM.
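As a worked example with an assumed input: 3 000 minutes of remediation yields INT(3000/1440) = 2 days, INT(MOD(3000, 1440)/60) = 2 hours and MOD(MOD(3000, 1440), 60) = 0 minutes, so the measure renders the string 2d 02h 00m.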

The structure of the visualization report is broken down into five parts:

• Overview page: A complete overview of the code base with a linear plot of the technical debt based on compilation date, bar charts by quality characteristic, tree map of teams, metric values and additional metadata.

• Rule page: An in-depth page dedicated to the requirements.

• Metric page: An in-depth page dedicated to the metrics.

• Team page: An in-depth page dedicated to the teams.

• Artifact page: An in-depth page dedicated to the artifacts.

Finally, the report is deployed to the Power BI Report Server and can then be embedded into any web application.

4.2.6 Report Module in Angular

The front-end part of SocrateX is based on Angular 6. The code and folder structure of the front-end have been refactored to provide more scalability and maintainability, focusing on a multiple-module architecture. This means the existing user interface has been refactored into one module, socratex-editor. This module is responsible for querying the XML database and converting X++ code to XML; therefore, all components and services associated with it are within this module, as well as the templates. Afterwards, a new module is introduced, socratex-report. This module conveniently contains one component with the embedded report created in Power BI.

By default, modules are eagerly loaded. This means that as soon as the web application loads, so do the modules, even though they are not necessarily needed immediately. In this particular case, socratex-editor will be the default module when the web application is initiated. The other module, socratex-report, will be lazy-loaded instead, meaning the module will only be loaded when it is requested by the user. A toggle button has been implemented in the header of the web application to enable users to switch between the editor and the report, with the default being the editor.


5 Results

The implementation was executed according to what is described in section 4.2. The result was a web service in ASP.NET Core responsible for converting multiple code implementations written in XQuery, which calculate software metric values and the number of best practice rule violations for each artifact, into one piece of XQuery code. The resulting code is then used to query the XML database and the values returned are later stored in a SQL Server database. Note that, as previously mentioned in section 4.2.4, the generated code was queried on a subset of the XML database, meaning not all X++ artifacts were evaluated. The data from the database was imported into Power BI Desktop for analysis, visualization and creation of the interactive report. The report was deployed to Power BI Report Server and later embedded into SocrateX. The front-end part of SocrateX has been extended accordingly by providing a toggle button in the top bar, enabling users to alternate between the editor and the report.

5.1 Performance benchmark

As previously mentioned in section 4.2, maps were heavily utilized due to the complexity of some of the XQuery implementations for software metrics. Additional work was conducted to investigate whether it was possible to increase performance by using maps, since they were introduced in XQuery 3.1, which BaseX supports. Listing 5.1 shows two variations of the XQuery implementation of the NOC metric. The implementation in listing 5.1a is the NOC metric without the use of maps: each artifact is passed into the function local:NumberOfChildren, which counts the number of artifacts that extend the passed artifact name. Listing 5.1b constructs a map with string and integer as key-value pair, where artifacts are grouped based on what they extend, as previously described in section 4.2.2. A benchmark was conducted to compare the two different implementations. See table 5.1.

5.2 Software metric values

The result obtained from the generated XQuery code presented in section 4.2.4 is shown in table 5.2. These metrics have been evaluated on 33 889 artifacts in the form of traditional classes, tables and forms from the compilation date 2019-02-07.


declare variable $context := /Class | /Table | /Form;

declare function local:NumberOfChildren($artifact)
{
  let $artifactName := $artifact/@Name
  return fn:count($context[@Extends = $artifactName])
};

for $artifact in $context
let $noc := local:NumberOfChildren($artifact)
return $noc

(a) XQuery implementation of the NOC metric without the use of maps.

declare variable $nocMap as map(xs:string, xs:integer) := map:merge(
  for $a in /Class | /Table | /Form
  group by $extends := $a/@Extends/string()
  return map { $extends : fn:count($a) }
);

declare function local:NumberOfChildren($artifact)
{
  let $artifactName := $artifact/@Name
  return
    if (map:contains($nocMap, $artifactName))
    then map:get($nocMap, $artifactName)
    else 0
};

for $artifact in /Class | /Table | /Form
let $noc := local:NumberOfChildren($artifact)
return $noc

(b) XQuery implementation of the NOC metric with the use of maps.

Listing 5.1: Two XQuery implementations of the NOC metric which produce identical values.

Table 5.1: A benchmark for the NOC implementation with and without maps.

#Artifacts        1         10 000     30 000
Without maps      30 sec    649 sec    Time limit exceeded
With maps         6 sec     34 sec     59 sec

5.3 Power BI Report

This section presents the static code analysis report created in Power BI Desktop, which is fully integrated with SocrateX. It is composed of five different pages with different visualizations and features. The values presented in the Power BI report are obtained from the compilation date 2019-02-07. Note that pages with significant empty space indicate a lack of visualization options or data to fill that space. The following sections give a detailed description of each page.


Table 5.2: Eight software metric values obtained from the XML database. Minimum, maximum and average values are displayed.

Software metric    Min    Max        Avg
SLOC               2.0    27011.0    215.1
WMC                0.0    2272.0     8.3
RFC                0.0    1297.0     17.0
NOM                0.0    638.0      7.7
CBO                0.0    242.0      4.0
NOC                0.0    97.0       0.2
CP                 0.0    79.0       15.8
DIT                0.0    6.0        0.5

5.3.1 Overview page

The overview page in figure 5.1 highlights the most general statistics and visualizations of the entire evaluated X++ code base.

Figure 5.1: The overview page of the static code analysis report.

Here, the left-hand side of the white vertical line in the page shows statistics of the evaluated X++ code base in terms of technical debt, number of artifacts, number of teams, number of methods, number of lines of code and compilation date. The right-hand side of the line consists of a linear plot which tracks the technical debt by compilation date, a tree map which highlights the top 20 teams with the highest amount of technical debt, a horizontal bar plot that shows the technical debt in minutes by quality characteristic and, finally, two tables which show best practice rules and the number of violation occurrences as well as software metrics with minimum, maximum and average values. As explained in section 4.2.5, cross-filtering is supported in Power BI and thus the visualizations are interactive, which means the user can select certain parts of a visualization and the statistics change accordingly to give further insights about the data. In this case, the linear plot, bar plot, tree map and the tables are interactive.

The overview page provides the user a high-level, centralized point to quickly get insights into the current state of the X++ code base. But sometimes it is interesting and necessary to learn more about a specific artifact, team, rule or metric. For that reason, the following sections describe four additional pages which provide more comprehensive information at a lower level.

5.3.2 Artifact page

The artifact page is responsible for displaying metadata and values of specific artifacts of the X++ code base, such as classes, tables and forms, selected by the user. See figure 5.2.

Figure 5.2: The artifact page of the static code analysis report. The class Tax is selected from the search bar and its associated metadata, software metric values and rule violations are shown.

The user can get insights about a specific artifact by searching for it in the search bar. Upon selecting an artifact, its corresponding metadata, software metric values and rule violations are shown. In this case, the class Tax has been selected and its metadata, such as number of methods, number of lines of code, and model id (which is mapped to a model name), is shown. Moreover, the artifact page shows which team and package this artifact belongs to, along with the reviewers and the technical debt of the source code.

5.3.3 Software metric page

This page focuses on the software metrics evaluated on the X++ code base. See figure 5.3. A total of eight software metrics have been evaluated on the X++ code base. Here, the NOC metric has been selected and its description, minimum, maximum and average values are displayed, together with artifacts sorted by NOC in descending order. The left-hand side of the white vertical line shows the search bar for software metrics. Additionally, the user can filter the metric values within a range, which is shown at the bottom of the search bar box.


Figure 5.3: The software metric page of the static code analysis report. The software metric Number of Children (NOC) has been selected.

5.3.4 Rule page

This page shows the 13 best practice rules evaluated on the X++ code base. See figure 5.4. The left-hand side of the white vertical line provides a search bar for users to search for specific best practice rules. In this case, the rule Calls to obsolete methods is selected. The right-hand side of the line shows the selected rule and its associated quality characteristic, description, remediation detail and technical debt. It also shows the number of artifacts affected by the selected rule as well as a table containing the number of violations for each artifact. Furthermore, there are two different bar charts: a vertical one that shows all rules and their technical debt sorted in descending order from left to right, and a horizontal one that illustrates the density of technical debt in terms of quality characteristics for all the rules.

5.3.5 Team page

The X++ code base is grouped into teams based on functionalities and features due to its large and complex corpus of source code. A total of 65 teams have been identified and evaluated. See figure 5.5. The selected team is Warehouse and Transportation, which constitutes a total of 2456 artifacts. The group name, area path and reviewers associated with the team are displayed. Since a team represents a subset of the total number of artifacts in the X++ code base, only the software metric values and technical debt for those artifacts are shown. Here, the software metric values for the 2456 artifacts are shown, as well as a vertical bar chart which highlights the density of technical debt of the rules.


Figure 5.4: The rule page of the static code analysis report. The rule Calls to obsolete methods is selected and its corresponding metadata, rule violations and technical debt are shown.

Figure 5.5: The team page of the static code analysis report. The selected team is Warehouse and Transportation; its associated metadata, artifacts, software metric values and technical debt are shown.

6 Discussion

This chapter discusses the results achieved with the method described in chapter 4. It also discusses and criticizes the method in terms of validity, replicability, and reliability.

6.1 Results

6.1.1 Database design and XQuery expressions

The implementation was carried out as described in chapter 4. The database schema presented in section 4.2.4 shows how the data retrieved from the generated XQuery code is stored in the SQL Server database. Although the database schema works as intended, it is not sophisticated enough to handle complex entities. For instance, the MetricValue and RuleValue entities store only single values. The database schema assumes that every metric and rule implementation returns one single value, which is common in XQuery expressions. In this thesis, all XQuery implementations return one single value. As a result, the cyclomatic complexity metric has been intentionally left out as a standalone metric and has only been used as a weight metric for Chidamber & Kemerer's weighted method per class (WMC) metric. The reason is that the cyclomatic complexity metric estimates the complexity of artifacts at method level, and each artifact may have multiple methods.

6.1.2 XQuery performance

For each software metric and best practice rule, a corresponding implementation in XQuery was created. The size of the XML database is 33 889 artifacts, which is only a subset of the entire X++ code base. Eight software metrics and 13 best practice rules were evaluated and calculated for each artifact. As previously mentioned in section 4.2.4, the implementations of the metrics and rules are concatenated into one single XQuery code where the base block of the generated code is an iteration over every artifact of the XML database. This proves to be a performance bottleneck because some of the XQuery implementations require nested loop traversals, which have a time complexity of O(n²) where n is the number of artifacts. A prime example is the number of children (NOC) metric from Chidamber & Kemerer's metric suite, since one needs to find all artifacts that extend the current one, as described in section 3.3.2. In fact, earlier implementations resulted in time limit exceeded when querying the generated code to the XML database, even though appropriate indexes, such as attribute indexes, were properly configured in BaseX. This can be seen from table 5.1 in section 5.1. The time limit exceeded occurred after approximately 30 minutes of elapsed time. This implies that the calculations could not be completed using an implementation without maps. Further investigation shows that the implementation of NOC with the use of maps is approximately 19 times faster than without maps and proves to be scalable with respect to the number of artifacts. Using maps effectively makes the algorithm perform better due to the leverage of keys and constant-time lookups. Therefore, this technique has been applied to implementations which can use maps to their advantage. The drawback is that a map object and an additional function must be declared for each metric and rule implementation that requires this. The execution time taken to query the XML database with the fully generated XQuery code was significantly faster with this approach as well, at approximately 8 minutes.

6.1.3 Software metrics

Limitations of MD365FO
The software metrics in this thesis are chosen based on the amount of previous research and compatibility with the metadata of the X++ AST. The granular level of software metrics varies from e.g. system, package, module, class and method level. As previously mentioned in section 1.4, QMOOD metrics were excluded from this thesis. Some metrics are class-level metrics, and some are design-level metrics. A meeting with an engineer from Microsoft was conducted to discuss the definition of a design in MD365FO. J. Bansiya and C.G. Davis demonstrated in their paper that QMOOD was applied on projects with a maximum design size of 352 [3]. The equivalent of a design in MD365FO would be a module, which are arranged by prefix/postfix of the artifact name. However, the number of artifacts in a module exceeds the number of artifacts evaluated in their paper by a significant factor. A possible solution is to build a graph between artifacts based on cross-references; from there, each cluster could in theory be interpreted as a design. However, it can be quite complicated, especially using XQuery. Furthermore, some artifacts are cross-referenced with artifacts in other modules, which makes it harder to define the requirement of a cluster.

Software metric values
The following sections give a discussion on each software metric presented in table 5.2 in chapter 5, in order to gain insight into whether the values are reasonable.

Source Lines of Code (SLOC)
On average, the number of source lines of code per artifact is 215. The minimum value is 2.0 and the maximum is 27011. The values seem reasonable but could potentially be affected by the fact that block comments are not excluded. In general, the artifacts in the X++ code base are large and that is reflected in the average value of SLOC.

Weighted Method per Class (WMC)
The purpose of this metric is to estimate an artifact's complexity with cyclomatic complexity as the weight metric. As stated in section 4.2, the cyclomatic complexity is calculated by counting the number of branching and looping constructs of the source code. The minimum value is 0, which implies that the artifact is perhaps the equivalent of a data transfer object in C#, meaning that the artifact only holds property values. The maximum value is 2272, which is significant and indicates an artifact that is difficult to reuse, test, and maintain. The average value is 8.3, which is an adequate number and implies that only a small subset of the X++ code base has a high complexity. There is, however, a possibility that the values are skewed because traditional classes have, in general, more logic compared to tables and forms.


Response for a Class (RFC)
The values for the RFC metric are 17 on average, with 1297 and 0 as maximum and minimum values respectively. The maximum value implies a large artifact that contains dozens of methods and method calls, which could be interpreted as a complex artifact with a significant responsibility of functionality in MD365FO.

Number of Methods (NOM)
On average, the number of methods is 8, which seems reasonable and balanced. Note that the NOM metric includes not only normal methods, but local methods as well. This suggests that the majority of the artifacts evaluated are not overly complex and that the understandability for new developers is on average high. Certainly, some edge cases are bound to exist, such as artifacts with 638 methods, which can be questioned in terms of complexity and understandability.

Coupling Between Objects (CBO)
The maximum value of CBO is 242, which indicates that a lot of references to other classes, through method calls, method parameters, variables in methods and field declarations, are needed to execute a method. This also implies that the various artifacts rely very much on each other, which would indicate high coupling. Fortunately, the average value is only 4, which leads to the conclusion that most of the artifacts evaluated have low coupling; therefore the maintenance efforts stay relatively low and testing is more manageable. However, one can question the XQuery expressions used for this metric, mainly regarding the metric definition and the availability of the XML AST, which are further explained in section 6.2.

Number of Children (NOC)
The mean and minimum values for the NOC metric are 0, whereas the maximum value is 97. The maximum value suggests that the artifact is a base class from which many other artifacts extend. As presented in the metric page of the Power BI report, the artifact with a NOC of 97 is AxInternalBase, which is a class whose suffix implies that it is a base class. The values suggest that most of the X++ code base would require moderate testing efforts and that the base classes are highly reusable, since reusability increases with higher NOC.

Comment Percentage (CP)
The comment percentage is 16% on average for all artifacts, and the maximum value is 79%. The average case is slightly lower than the recommended comment percentage of 30% according to SATC (Software Assurance Technology Center) [24]. However, as presented in the method chapter, the comment percentage implementation does not support block comments enclosed by /* and */ and therefore, in reality, the average comment percentage could be higher.

Depth of Inheritance Tree (DIT)
The depth of inheritance tree (DIT) has a minimum value of 0, a maximum value of 6 and an average value of 0.5. According to Anders Tind Sørensen's thesis [39], the recommended value of DIT should be no more than five, and the results lie largely within that range. These values indicate that object-oriented principles are adhered to and that the artifacts are well-reasoned, as many artifacts inherit. At the same time, the DIT does not get excessively deep such that it could affect the maintainability of the artifacts.


Summary of the Software metrics
The values have been obtained using the XQuery implementations presented in section 4.2. The XQuery code has been implemented based on definitions of software metrics and customized to accommodate the X++ language restrictions and the capabilities of XQuery. Although the values presented in the previous sections are from one specific compilation date only, it can be concluded that XQuery and XPath are viable functional query programming languages for extracting relevant information from the XML abstract syntax trees (AST) in order to compute values that reflect specific metrics.

6.1.4 Technical Debt

Technical debt has been estimated using the guidelines of the SQALE method [22], where each best practice rule has been assigned an associated remediation detail, remediation function and quality characteristic. A total of 13 best practice rules have been used in this thesis, where the affected quality characteristics are maintainability, reliability and changeability. This deviates from the SQALE definition document, where the number of quality characteristics representing the SQALE indices is nine. Therefore, the visualizations of technical debt by quality characteristic are limited to the quality characteristics represented by the available best practice rules. Furthermore, SQALE indicators such as SQALE ratings, the pyramid and the debt map have been excluded in this thesis due to the limitation of visualization options in Power BI.

XQuery is a powerful query language for finding and extracting information from XML documents. This made it straightforward to find best practice rule violations over the XML abstract syntax trees (AST) due to the nature of ASTs, which represent the abstract syntactic structure of source code. XQuery has previously been utilized to find violations in source code. For instance, Nödler et al. [28], Mendonça et al. [26], and the PMD Source Code Analyzer [30] all utilize XQuery over XML representations of ASTs when looking for code smells. Thus, it is achievable to count the number of best practice rule violations using XQuery. However, in this thesis, the actual calculation of the technical debt has been done in Power BI.

6.1.5 Power BI Report

Initially, multiple wireframes were constructed to gain an understanding of how the visual architecture of the Power BI report should be arranged. As seen in chapter 5, all pages follow the same structure to preserve a form of consistency throughout the report. However, some pages have empty spaces; an example is the metric page in section 5.3, figure 5.3. This can be explained by a lack of data and visualization options. To fill these empty spaces, in the context of the metric page, one could add a histogram which displays the distribution of the metric values over the X++ code base.

The technical debt has been calculated as presented in section 4.2.5 using DAX expressions, while the software metric values are simply imported and displayed using various visualization techniques. The Power BI report shows how one can have a centralized point where software metrics and technical debt of a software product are displayed, in this case MD365FO.

6.2 Method

The following sections discuss the validity, replicability and reliability of the method described in chapter 4.


6.2.1 Validity

Software metrics
The definitions of the software metrics used in this thesis have been adopted and implemented using XQuery expressions. As X++ is a proprietary programming language, several language-specific restrictions had to be taken into consideration, as presented in section 3.6.1. This has also been highlighted by Anders Tind Sørensen's thesis about complexity in X++ code [39]. Furthermore, additional meetings with engineers at Microsoft were held to discuss the language-specific restrictions and the definitions of the software metrics.

There are also some limitations of XQuery as a functional programming language which have hindered some of the software metric implementations. For instance, there are multiple variations of comments in X++, such as single-line comments and block comments. The current implementation of the source lines of code (SLOC) metric does not exclude block comments. In hindsight, this could be solved by evaluating the source code with regular expressions. Another example is the coupling between objects (CBO) metric. According to the definition, CBO is measured by counting the number of unique data types excluding primitive data types [9]. The language-specific restriction section also describes user-defined classes in X++, and the conclusion is that all data types are included except primitive data types. The implementation of this metric in the method chapter is to create a sequence that holds all primitive data types as strings and then exclude all artifacts that contain the types. One major flaw is that the X++ AST does not have explicit element nodes for the primitive data types and relies on string matching of attribute values.

Consequently, the interpretation of software metrics could deviate from the definitions introduced in research papers and therefore, the XQuery implementations in this thesis may not adhere to the specifications. This negatively impacts the degree of validity in this thesis. To increase the validity of the software metrics, one should have unit tests for each metric, which can be done using the unit module in BaseX, as well as enforcing correctness of the generated XQuery code by writing unit tests in ASP.NET Core. Unfortunately, in this thesis, unit tests for the metrics were not implemented.
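Returning to the block-comment limitation: a sketch of the regular-expression idea (a hypothetical helper, not part of the thesis implementation) could look as follows:

using System.Text.RegularExpressions;

public static class SlocHelper
{
    // Strips /* ... */ block comments from raw X++ source before counting
    // non-empty lines. Deliberately simplified: it does not handle comment
    // delimiters that appear inside string literals.
    private static readonly Regex BlockComments =
        new Regex(@"/\*.*?\*/", RegexOptions.Singleline);

    public static int CountSourceLines(string source)
    {
        string withoutComments = BlockComments.Replace(source, "");
        int count = 0;
        foreach (string line in withoutComments.Split('\n'))
            if (line.Trim().Length > 0)
                count++;
        return count;
    }
}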

Technical debt
Other frameworks that adopt the SQALE method are NDepend and SonarQube, as introduced in section 3.12. There are best practice rules available for both frameworks, created with the supported programming languages in mind, and the remediation functions are already estimated by seasoned professionals. In this thesis, the best practice rules have been defined by engineers working with the X++ code base. Since X++ is a proprietary programming language, no available static code analysis frameworks support it for now. A meeting was conducted with engineers and managers with years of experience working with the X++ language and MD365FO to define the remediation details and estimate the remediation functions for all available best practice rules.

The precision of the technical debt depends entirely on the organization and the care taken to define the remediation functions. Thus, the degree of validity directly correlates with the granularity of the decisions taken to define the remediation functions. In this case, the degree of validity is high because the remediation functions have been thoroughly discussed with experienced X++ engineers.

6.2.2 Replicability

There are several factors determining the degree of replicability in this thesis. The method chapter describes the various XQuery implementations for software metrics and best practice rules, as well as the web service used to concatenate the XQuery code, query the XML database with it and store the obtained data in a SQL Server database. Should one use the same approach with the same subset of the X++ code base used in this thesis, one should expect similar results in terms of functionality. Of course, given the circumstances, the values generated from the XQuery code will deviate depending on the X++ code base, which is further described in the next section regarding reliability.

6.2.3 Reliability

The XQuery implementations of software metrics and best practice rules were evaluated on a subset of the X++ code base of MD365FO with a compilation date of 7 February 2019. MD365FO is a continuously evolving product and code changes occur every day as multiple engineers work on the code base. Furthermore, the XML database storing all artifacts of the X++ code base in XML format is updated routinely, specifically once every day. Should the same XQuery code defined in the method chapter be used to evaluate the current X++ code base, it would render different values than the ones presented in this thesis.

There are other factors which affect the degree of reliability, such as the approach for creating the XQuery code. At the moment, the XQuery expressions for individual software metrics and best practice rules are stored in JSON files along with the associated metadata, as described in section 4.2.4. The drawback of this approach is that the probability of failure due to errors is high. One case is that a JSON string is sensitive to special characters, which can occur in XQuery expressions. The probability of errors when generating the XQuery code is also high because multiple JSON files might have the same variable name; such errors are inevitable since XQuery code cannot contain duplicate variable names. With these factors in mind, the conclusion is that the degree of reliability is relatively low.
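As an illustration of the duplicate-name failure mode, a simple guard (not implemented in this thesis) could reject a metric set before code generation; MetricDTO is the type sketched in chapter 4:

using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryValidation
{
    // Fails fast when two JSON definitions share a VariableName, since the
    // concatenated XQuery code cannot contain duplicate variable names.
    public static void EnsureUniqueVariableNames(IEnumerable<MetricDTO> metrics)
    {
        List<string> duplicates = metrics
            .GroupBy(m => m.VariableName)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();

        if (duplicates.Count > 0)
            throw new InvalidOperationException(
                "Duplicate XQuery variable names: " + string.Join(", ", duplicates));
    }
}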

6.2.4 Source criticism

Most of the sources used in this thesis are peer-reviewed. They are considered relevant research literature because they are based on static code analysis and software quality, which are highly relevant and valid topics for this thesis. There are also sources based on online documentation for frameworks and the X++ language. These are relevant sources but are unfortunately subject to change, since they are referenced by URL.

6.3 The work in a wider context

As mentioned in the introduction, static code analysis is an important part of the software industry, where quality is a crucial factor. For engineers and managers, having an overview of the software product in terms of technical debt and software metrics proves to be very beneficial for learning and monitoring the code base and for determining whether the application is regressing or improving. The result of this thesis provides a centralized, interactive visualization report to learn more about the code base of a specific software application.

Although this thesis is focused on MD365FO, it can be argued that static code analysis is a valuable asset in academia as well. Static code analysis can be used to teach students programming and introduce guidelines for what is called "clean code". This would greatly benefit students who are new to programming, as well as supervisors assessing a student's code. Truong et al. [41] introduced a static analysis framework for students learning Java. Their framework, however, does not include an interactive visualization report. A visualization report would be even more beneficial because of the way the human brain processes information. Technology is advancing at an immense rate and with it come new software applications. Programming has been introduced in middle schools because of the technology age we are living in. This thesis provides a starting point for how software metrics and technical debt can be presented in the software industry and in academia for different purposes.

7 Conclusion

This thesis was carried out to answer the following research questions:

1. How can XQuery and XPath queries be used to calculate software metrics and technical debt over XML representations of X++ ASTs?

2. How can software metrics and technical debt be applied to Microsoft Dynamics 365 for Finance & Operations?

3. How can the values be presented to enable developers to keep track of key performance indicators of the X++ code base?

Abstract syntax trees (AST) represent the abstract syntactic structure of source code in a tree representation. Usually, the easiest way to traverse and search for patterns in an AST is to use the visitor design pattern. For every code metric and rule violation, one can instantiate a visitor and evaluate it over the tree. It is a powerful approach in terms of speed and accuracy, but it takes a long time to program.

This thesis confirms that static code analysis can be carried out over XML representations of ASTs using query and functional programming languages such as XQuery and XPath, due to their ability to find and assess element and attribute nodes of XML documents. The XML document object model represents the AST structure of the X++ source code, where each node denotes a construct occurring in the source code. This suggests that software metrics which are concretely defined at source code level are suitable candidates under this circumstance. In terms of technical debt, the foundation is to find and count the number of code smells which violate best practice rules defined by the organization, and the XML ASTs enable us to find these using XQuery and XPath. The actual calculation of the technical debt has been done within Power BI, as described in section 4.2.5.

The semantic search tool and web application SocrateX enables developers to get insight into the source code of MD365FO. This is possible because abstract syntax trees of the code base in XML format are stored in an XML database. This thesis demonstrated that it is possible to integrate software metrics and technical debt by creating a web service to generate pre-defined software metric and best practice rule implementations in XQuery and XPath. The resulting data is stored in a relational database and imported into Power BI, which in turn is embedded into the web application. Now, software engineers and managers can observe, get insights into and analyze the state of the X++ code base in terms of software metrics and technical debt, and thus key performance indicators of Microsoft Dynamics 365 for Finance & Operations can be tracked.

7.1 Consequences

The main consequence of this thesis is the fact that the code base of MD365FO is updated continuously, which can lead to inconsistency when evaluating software metrics and technical debt. Furthermore, the tools and frameworks used in this thesis are frequently updated, which could alter the desired effect described in chapter 4.

7.2 Future Work

This thesis demonstrated that it is possible to use XQuery and XPath to calculate software metrics and technical debt on XML representations of ASTs. The current pipeline of the web service integrated in the SocrateX back-end shows that software metrics and technical debt can be incorporated into the web application by using Power BI, which can visually present the values. However, there is still work left that has not been implemented due to time limitations. This is described in the following sections.

7.2.1 Historical data support

As of now, the relational database storing the values retrieved from the generated XQuery code is updated after each execution. This implies that existing values are lost and replaced with new values from another compilation date. As a consequence, it is not possible to analyze key performance indicators from a historical perspective. One valuable aspect of static code analysis for managers and engineers is being able to observe whether an application is regressing or improving and to pinpoint areas in the code base where this is happening. A possible solution is to create an additional database that is responsible for historical data of previous software metric values and technical debt based on compilation date, and add it as a data source in Power BI. A shell script could be created to fetch data daily from the XML database and store it both in the historical log database and in the relational database. This enables more refined visualizations where both current and previous data are available.

7.2.2 Task scheduler and Assembly in ASP.NET Core

As described in section 4.2.4, a class library in ASP.NET Core was created for the static code analysis part. At the moment, the web service is invoked manually from an HTTP endpoint using an API testing tool such as Postman [31]. This is not necessary in this circumstance, because the web service should be executed daily and not triggered manually by users. Therefore, it is more feasible to incorporate the web service in a development and operations (DevOps) tool such as Jenkins or Azure Task Scheduler. The former lets one define automatically scheduled tasks to run, such as calling HTTP endpoints. Alternatively, it is possible to create an assembly of the class library. The assembly, in the form of an .exe or .dll, could be used in a shell script which triggers the fetching of data from the XML database and stores it in the relational database. The shell script could be integrated in a DevOps tool for automatic scheduling.

7.2.3 XQuery parallelism

Recall the generated XQuery code in listing 4.12: the functions for both software metrics and best practice rules are called within the FLWOR expression. The XQuery code, albeit optimized with the use of maps, is still not fast enough.


The advantage of the FLWOR expression is that the 'for' clause generates a sequence of independent nodes. Since the implementations of software metrics and best practice rules are evaluated per artifact in the XML database and the results do not depend on each other, this could be a suitable situation in which to introduce parallelism. For instance, one could assign each thread a specific, non-overlapping chunk of the for-loop to process and later merge the partial results into a single result, similar to OpenMP, which is a shared-memory multiprocessing API for C, C++ and Fortran [29]. At the time of writing, the XQuery 3.1 processor in BaseX does not support parallel pragma clauses like OpenMP. However, parallel execution can be achieved using the function xquery:fork-join from BaseX's XQuery module, which executes the supplied (non-updating) XQuery 3.1 [36] function items in parallel. Investigating how this can be incorporated into the generated XQuery code is left for future work.

7.2.4 Code refactoring

There are many other possibilities with XML representations of abstract syntax trees (AST). Static code analysis of software metrics and technical debt has been proven achievable in this thesis. There was an interest from Microsoft to investigate how code refactoring can be applied on the X++ XML AST using programming languages that support transformation and modification of XML data. This has been left out due to time limitations; however, it has previously been achieved by Mendonça et al. [26], as described in section 3.12. A similar approach could be taken in this thesis. A set of pre-conditions and post-conditions could be defined using XQuery and XUpdate in order to trigger the refactoring operation and verify that the process preserves the source code behaviour. The challenging parts would be to serialize the XML AST back to X++ code and to determine how refactoring can be visualized and incorporated in SocrateX and Power BI.

7.2.5 Utilize the full XML AST

The XML AST of X++ source code contains a mixture of element nodes with multiple attributes that have not been fully utilized in this thesis. As an example, each construct of the X++ source code has attributes such as start and end columns, indicating where this particular construct occurs in the source code. As of now, the Power BI report shows which artifact violates which rule, but it does not show where the violations occur in the source code. This proves to be highly inefficient if developers are looking to fix certain best practice rule violations in large source files. A solution is to alter the database design to support multiple values for each Rule entity and add start/end column fields to the RuleValue entity.


Bibliography

[1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1986. isbn: 0-201-10088-6.
[2] ASP.NET Core documentation. Sept. 2019. url: https://docs.microsoft.com/en-us/aspnet/core/?view=aspnetcore-3.0.
[3] J. Bansiya and C. G. Davis. "A hierarchical model for object-oriented design quality assessment". In: IEEE Transactions on Software Engineering 28.1 (Jan. 2002), pp. 4-17. issn: 0098-5589. doi: 10.1109/32.979986.
[4] BaseX - The XML Framework. Apr. 2019. url: http://basex.org/.
[5] Vivek Bhatnagar. "A comparative study of Software Development Life Cycle models". In: IJAIEM 4 (Oct. 2015), pp. 23-29.
[6] Rex Black and Jamie L. Mitchell. Advanced Software Testing - Vol. 3: Guide to the ISTQB Advanced Certification As an Advanced Technical Test Analyst. 1st. Rocky Nook, 2011. isbn: 1933952393, 9781933952390.
[7] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible Markup Language (XML) 1.0 (Fifth Edition). Available at http://www.w3.org/TR/REC-xml/. 2008.
[8] David Chappell. The Three Aspects of Software Quality: Functional, Structural and Process. url: http://www.davidchappell.com/writing/white_papers/The_Three_Aspects_of_Software_Quality_v1.0-Chappell.pdf (visited on 06/05/2019).
[9] S. R. Chidamber and C. F. Kemerer. "A Metric Suite for Object-Oriented Design". In: IEEE Transactions on Software Engineering 20.6 (June 1994), pp. 293-318.
[10] Ward Cunningham. "The WyCash Portfolio Management System". In: SIGPLAN OOPS Mess. 4.2 (Dec. 1992), pp. 29-30. issn: 1055-6400. doi: 10.1145/157710.157715.
[11] Entity Framework Core documentation. June 2019. url: https://docs.microsoft.com/en-us/ef/core/.
[12] John T. Foreman, Jon Gross, Robert Rosenstein, David Fisher, and Kimberly Brune. C4 Software Technology Reference Guide - A Prototype. Tech. rep. CMU/SEI-97-HB-001. Software Engineering Institute, Carnegie Mellon University, Jan. 1997, pp. 145-149.
[13] FunctX XQuery Functions. Sept. 2019. url: http://www.xqueryfunctions.com/.


[14] "IEEE Standard for a Software Quality Metrics Methodology". In: IEEE Std 1061-1992 (Mar. 1993), pp. 1-96. doi: 10.1109/IEEESTD.1993.115124.
[15] Introduction to the Angular docs. June 2019. url: https://angular.io/docs.
[16] "ISO/IEC/IEEE International Standard - Systems and software engineering - Software life cycle processes". In: IEEE STD 12207-2008 (Jan. 2008), pp. 1-138. doi: 10.1109/IEEESTD.2008.4475826.
[17] "ISO/IEC/IEEE International Standard - Systems and software engineering - Vocabulary". In: ISO/IEC/IEEE 24765:2017(E) (Aug. 2017), pp. 1-541. doi: 10.1109/IEEESTD.2017.8016712.
[18] W.T.B. Kelvin. Popular Lectures and Addresses. Nature series v. 1. Macmillan and Company, 1891. url: https://books.google.com/books?id=JcMKAAAAIAAJ.
[19] Barbara A. Kitchenham and Shari Lawrence Pfleeger. "Software Quality: The Elusive Target". In: IEEE Software 13 (1996), pp. 12-21.
[20] J. Letouzey. "The SQALE method for evaluating Technical Debt". In: 2012 Third International Workshop on Managing Technical Debt (MTD). June 2012, pp. 31-36. doi: 10.1109/MTD.2012.6225997.
[21] J. Letouzey and M. Ilkiewicz. "Managing Technical Debt with the SQALE Method". In: IEEE Software 29.6 (Nov. 2012), pp. 44-51. issn: 0740-7459. doi: 10.1109/MS.2012.129.
[22] Jean-Louis Letouzey. The SQALE Method Definition Document. Jan. 2012.
[23] Rudiger Lincke. "Compendium of Software Quality Standards and Metrics - Version 1.0". In: 2007.
[24] Linda H. Rosenberg and Lawrence E. Hyatt. Software Quality Metrics for Object Oriented System Environments. A report of SATC's research on OO metrics.
[25] T. J. McCabe. "A Complexity Measure". In: IEEE Transactions on Software Engineering SE-2.4 (Dec. 1976), pp. 308-320. issn: 0098-5589. doi: 10.1109/TSE.1976.233837.
[26] N. C. Mendonça, P. H. M. Maia, L. A. Fonseca, and R. M. C. Andrade. "RefaX: a refactoring framework based on XML". In: 20th IEEE International Conference on Software Maintenance, 2004. Proceedings. Sept. 2004, pp. 147-156. doi: 10.1109/ICSM.2004.1357799.
[27] NDepend. Sept. 2019. url: https://www.ndepend.com/.
[28] J. Nödler, H. Neukirchen, and J. Grabowski. "A Flexible Framework for Quality Assurance of Software Artefacts with Applications to Java, UML, and TTCN-3 Test Specifications". In: 2009 International Conference on Software Testing Verification and Validation. Apr. 2009, pp. 101-110. doi: 10.1109/ICST.2009.34.
[29] OpenMP Architecture Review Board. OpenMP Application Program Interface. Specification. 2018. url: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf.
[30] PMD Source Code Analyzer. Sept. 2019. url: https://pmd.github.io/.
[31] Postman - The Collaboration Platform for API Development. Oct. 2019. url: https://www.getpostman.com/.
[32] Power BI documentation. June 2019. url: https://docs.microsoft.com/en-us/power-bi/.
[33] Gordana Rakic and Zoran Budimac. "Problems in Systematic Application of Software Metrics and Possible Solution". In: CoRR abs/1311.3852 (2013). arXiv: 1311.3852. url: http://arxiv.org/abs/1311.3852.
[34] Nicolli Rios, Rodrigo Spínola, Manoel Mendonça, and Carolyn Seaman. "The most common causes and effects of technical debt: first results from a global family of industrial surveys". In: Oct. 2018, pp. 1-10. doi: 10.1145/3239235.3268917.


[35] J. Robie, M. Dyck, and J. Spiegel. XML Path Language (XPath) 3.1. Available at https://www.w3.org/TR/xpath-31/. 2017.
[36] J. Robie, M. Dyck, and J. Spiegel. XQuery 3.1: An XML Query Language. Available at https://www.w3.org/TR/xquery-31/. 2017.
[37] Joan M. Smith and Robert Stutely. SGML: The User's Guide to ISO 8879. New York, NY, USA: Halsted Press, 1988. isbn: 0-470-21126-1.
[38] SonarQube: Code Quality and Security. Sept. 2019. url: https://www.sonarqube.org/.
[39] A. T. Sørensen. Measuring Complexity in X++ Code. Supervised by Knud Smed Christensen, IMM. Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, 2006.
[40] "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models". In: ISO/IEC 25010:2011 (Mar. 2011).
[41] Nghi Truong, Paul Roe, and Peter Bancroft. "Static Analysis of Students' Java Programs". In: Jan. 2004, pp. 317-325.
[42] N. Walsh, J. Snelson, and A. Coleman. XQuery and XPath Data Model 3.1. Available at https://www.w3.org/TR/xpath-datamodel-31/. 2017.
[43] X++ data selection and manipulation. May 2019. url: https://docs.microsoft.com/en-us/dynamics365/unified-operations/dev-itpro/dev-ref/xpp-data-query#maintain-fast-sql-operations.
[44] X++ Programming Language Reference. Apr. 2019. url: https://docs.microsoft.com/en-us/dynamics365/unified-operations/dev-itpro/dev-ref/xpp-language-reference.
[45] XLNT - A Most "Excellent" Framework for X++. Aug. 2013. url: https://community.dynamics.com/365/financeandoperations/b/daxmusings/archive/2013/08/20/xlnt-a-most-quot-excellent-quot-framework-for-x.
