Side-Effect Removal Tool


Chapter 1: Project Definition

1.1 Introduction

This chapter introduces the problem tackled in this report and outlines the importance of software testing, program maintenance, program comprehension and automated test-case generation. It briefly explains the relevant paradigms and shows how side-effect removal is closely related to them. The chapter also defines the aims and objectives of the project, explaining in detail the timetable followed and the gaps that this project aims to fill, and it assesses the academic and commercial interest in the research. The rest of the report is structured as follows: Chapter 2 reviews the literature in the relevant areas and explains what has to be done; Chapter 3 explains the design of the algorithms and presents solutions to the problems; Chapter 4 introduces the side-effect removal algorithm; Chapter 5 explains the implementation of the algorithm in greater detail; Chapter 6 concentrates on the testing and evaluation of the implemented algorithm; and finally Chapter 7 gives conclusions and suggestions for future work.

1.2 Problem description

Program comprehension and ease of maintenance are of great importance in large software systems. Many software maintenance techniques only work properly on a side-effect free version of the software. One example is the Maintainer's Assistant, a reverse-engineering tool that operates only on side-effect free programs. Nearly 60% of the cost and effort expended on software may be spent on maintenance [2, 5]. To avoid such costs, as much as possible should be done during development to keep maintenance costs low. Furthermore, the software maintenance community argues that side effects harm program comprehension and maintenance and should be avoided where possible [3]. However, programmers often rely on side effects for performance gains and faster coding. A side-effect removal tool is therefore desirable: it allows a faster version of a program to be kept for execution, alongside a side-effect free version for maintenance, comprehension, slicing, testing, re-engineering and so on. Such a tool also helps techniques such as symbolic execution, slicing, partial evaluation and transformation [1]. In addition, side effects have a detrimental effect on many programming-related tasks, such as testing and re-engineering. Side effects also reduce the applicability of evolutionary testing, because they make it difficult to define an accurate fitness function. As Pressman argues, "Program maintainability and program understandability are parallel concepts; the more difficult a program is to understand, the more difficult it is to maintain" [5]. Characterized as an "iceberg", unmaintainable software is an old problem. Side-effect removal will facilitate the reverse engineering of poorly designed, poorly coded and poorly documented software.

According to the C programming language standard, a side effect is generally defined as: "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects" [1]. To keep the project tractable, this definition is simplified, and a side effect is considered to be any change of state that occurs when an expression is evaluated.

1.3 Aims and Objectives

The main aim of the project is to identify and remove side effects from a C program using the post-placement side-effect removal algorithm introduced by Harman, Hu, Hierons et al. [3]. The outcome will be a complete application that takes a C file that may contain side effects and constructs an equivalent program that is guaranteed to be side-effect free. The objectives of the project are the main steps needed to complete it successfully. Depending on how accurate the initial expectations prove to be, some objectives may be changed or adapted to the problems encountered. (For a detailed objective specification and timetable, please see the appendix.) The objectives are:

- Conduct relevant research and identify the scope of the project
- Investigate suitable data-binding tools for mapping the abstract syntax tree onto a corresponding data structure
- Investigate DOM and SAX as a fallback if data binding is not achievable. This situation is quite likely, since the DTD that corresponds to the generated XML file is not supported by many data-binding tools
- Design and evaluate possible rules for the side-effect removal algorithm
- Implement the post-placement side-effect removal algorithm
- Analyze and evaluate the outcome in terms of size, performance and comprehension
- If time permits, investigate the impact of side-effect removal on evolutionary testing

1.4 Project Scope

As already mentioned, a side effect is [7]:

- Modification of an object, that is, modification of some memory location or register.
- Access to an object that is declared volatile.
- Invocation of a system routine that produces side effects, for instance input or output to files.
- Invocation of functions that perform any of the above.

This is a very broad definition, so it is simplified here: a side effect is considered to be "any change in the value of a variable which occurs when an expression is evaluated, other than an assignment expression-statement" [4]. According to Harman et al., "such side-effects are created using the assignment operator in an expression and the pre- and post- increment and decrement operators" [3]. Consequently, a side-effect free program is any program in which assignment expressions change only the value of the variable on the left-hand side of the assignment; the right-hand side must not change the value of any variable.
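To make the simplified definition concrete, the following C sketch (our own illustration, not output of the tool; the function names are hypothetical) contrasts `a = b++;` and `a = ++b;` with equivalent sequences in which each assignment changes only its left-hand side. The variable b is passed by pointer so that its update is visible to the caller.

```c
#include <assert.h>

/* With a side-effect on the right-hand side:  a = b++;
 * a receives the old value of b, and b is incremented as well. */
static int assign_post_increment(int *b)
{
    int a = (*b)++;
    return a;
}

/* Side-effect free equivalent:  a = b;  b = b + 1;
 * each assignment changes only the variable on its left-hand side. */
static int assign_post_increment_sef(int *b)
{
    int a = *b;
    *b = *b + 1;
    return a;
}

/* With a side-effect on the right-hand side:  a = ++b; */
static int assign_pre_increment(int *b)
{
    int a = ++(*b);
    return a;
}

/* Side-effect free equivalent:  b = b + 1;  a = b; */
static int assign_pre_increment_sef(int *b)
{
    *b = *b + 1;
    int a = *b;
    return a;
}
```

Calling the two versions of each pair with the same starting value of b gives the same result and leaves b in the same final state, which is exactly the equivalence the transformation rules must preserve.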

As shown in Figure 1, the cycle of removing side-effects starts by transforming the C file into an Abstract Syntax Tree (AST) represented in XML. This is easily achieved with the ANSIC.exe parser supplied by Daimler Chrysler. The generated XML file is then loaded into memory using the Document Object Model (DOM). After the relevant side-effect removal transformations have been performed, the process is reversed using the xml.exe application, which transforms the XML back into a C file.

Figure 1: System overview

These side-effects are created using the assignment operator in an expression and the pre- and post-increment and decrement operators. To be side-effect free, an assignment expression may change only the value of its left-hand side; the right-hand side must not change the value of any variable. The project therefore concentrates on defining transformation rules that remove the side-effects created by the post-increment/decrement and pre-increment/decrement operators in any part of the program. In addition, the project considers side-effects created with the assignment operator and attempts to remove them from the program. The project does not address other kinds of unstructuredness, such as goto statements, or the removal of flags.

1.5 Academic and Commercial Interests

Such research would be particularly beneficial to academics, since it aims to show that program transformations are a solution for removing side-effects from a program. It also supports the conclusion that side-effects have a negative effect on program comprehension, and that the newly created programs are more readable even though they are larger. The research would help to analyze the effectiveness of the transformation algorithms on source code containing side-effects and will seek formal confirmation of the algorithm's effectiveness on real-life source code. These transformations should lead to more effective automated test data generation and, in turn, to better program test coverage.

Commercially, program transformations and side-effect removal could be used by any company that uses automated test data generation. In particular, this project was closely related to the evolutionary testing system of Daimler Chrysler. Currently, that system addresses only the problems caused by flags and has to expect side-effects in the code under test, which creates extra processing complexity in automated test case generation. If the tested code were guaranteed to be side-effect free, the evolutionary algorithm would be much faster and would generate more effective test cases. Furthermore, a solution that removes side-effects while preserving program semantics is also beneficial for transforming a company's legacy systems for the purposes of program understanding, maintenance and reverse engineering.

1.6 Conclusion

This chapter introduced the side-effect problem and set the background for the following chapters of this report. It described the reasons for developing a side-effect removal tool, stated the aims and objectives of this project and briefly explained some of the academic and commercial interests related to side-effect removal.

Chapter 2: Literature Review

2.1 Introduction

This chapter explores the areas of program transformation, test data generation and evolutionary testing, and examines how they relate to side-effects. It also gives a brief history of previous work.

2.2 Transformations

Program transformation can be broadly defined as the "act of changing one program to another" [11, 14]. The term program transformation is also used to describe an algorithm that implements such a change. The languages in which the original and the resulting programs are written are called the source and target languages respectively. The aim of program transformation is to "increase programmer productivity by automating programming tasks, thus enabling programming at higher level of abstraction" [14]. Program transformation is used in many areas of software engineering, including compiler construction, software visualization, documentation generation and automatic software renovation [11]. According to Visser, there are two main categories of program transformation: those in which the source and target languages are different (translations) and those in which they are the same (rephrasings). These categories can be further subdivided according to the degree to which program semantics is preserved [14]. To remove side-effects from a program correctly, its semantics must be preserved. This project therefore implements transformations built on one central idea: "it changes the syntax of a program without changing its semantics" [12]. An early example of program transformation is the pretty printer, software that alters the layout of existing text by removing and inserting white space according to a set of predefined rules, as illustrated in Table 1. Another example of program transformation happens inside a compiler when it optimizes the generated object code, making the program faster [12].

Original program:

public class HelloWorld {
public static void main( String[] args )
{
System.out.println( "Hello World!" );}}

Transformed program:

public class HelloWorld{
    public static void main( String[] args ){
        System.out.println( "Hello World!" );
    }
}

Table 1: Pretty-printer example transformation

Moreover, program transformations denote any set of rules which, when applied to a source program, produce an equivalent target program. The defined rules, often called "transformation axioms", transform the program in a series of stages, applying many rules at each stage to produce the desired outcome. According to Ward, transformations can be described by semantic rules and can therefore be used for a whole class of problems and situations [13]. Because each transformation axiom preserves semantics, the equivalence between the source and target programs can be guaranteed. This report uses program transformation and transformation rules to remove side-effects and to turn programs into more understandable and more maintainable ones.

2.3 Testing and evolutionary testing

There are many published definitions of software testing; however, all of them share the same meaning: software testing is the process of executing a program with the intent of finding errors [15, 16]. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and code generation. To illustrate the importance of testing, it is not unusual for a software development organization to expend between 30 and 40 percent of the total project effort on testing. The time spent on testing can increase drastically for human-rated software (for example flight control or nuclear reactor monitoring), where testing can cost three to five times as much as all other software engineering steps combined [5]. Once the source code has been generated, the software must be tested to uncover and correct as many errors as possible before delivery. Testing is highly labor intensive, and generating good test cases that have a high likelihood of finding errors is extremely hard. Exhaustive testing of a program is usually impossible, so the main aim is to test a program as thoroughly as possible. Consequently, the need for automated test case generation arises [5, 17].

Evolutionary testing is an "iterative testing method which is based on the process of natural genetics, the theory of evolution, and Darwin's survival of the fittest principle" [18]. Evolutionary testing can be employed to automate existing test methods, or it may be used as an independent test method. No matter how evolutionary testing is used, it always involves generating and evaluating new populations of individuals. New populations are generated via selection, recombination, mutation, fitness assignment and reinsertion of new offspring until an optimal solution has been found or a predetermined termination criterion is met [18]. The notion of fitness is fundamental to evolutionary algorithms, and a good fitness function may be crucial for their success. The fitness value is a numerical value that expresses the performance of an individual with regard to the current optimum. It is desirable that the fitness value changes neither too rapidly nor too slowly with the design parameters [19]. The fitness of an individual can be the value obtained directly from the objective function. The idea of selection is to favor the fitter individuals, in the hope of breeding better offspring [20].

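Example program:

input x;
if (x == 2)
{
    x++;
}
else
{
    x--;
}

Control flow graph: node 1 (input x) leads to node 2 (the conditional x == 2); node 2 leads to node 3 (x++) when the condition is true and to node 4 (x--) when it is false; nodes 3 and 4 both lead to node 5 (exit).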

Table 2: Evolutionary testing example

The table above shows a simple example program with its corresponding control flow graph. Suppose that node 3 is to be tested. To reach it, the conditional at node 2 (x == 2) must evaluate to true. Only one input value (x = 2) makes this conditional true, and random test data generation is very unlikely to produce it. Evolutionary testing provides an automated solution to this problem by repeatedly narrowing the search space until the criterion is met [21]. As mentioned earlier, evolutionary testing requires a fitness function that gives a score to each executed test case. Each test case is evaluated by how close its test data comes to executing the desired node, and a corresponding fitness value is allocated. The test case that comes closest to executing the desired node is therefore considered the fittest individual.

Returning to the previous example, if the input to the program were 50, the evolutionary algorithm would mark this individual as unfit and would gradually move the value towards the desired one. In the next try the value might be 20. This value still does not satisfy the condition, but it is much closer to the target, so the newly produced individual is fitter than the previous one and the search space is drastically reduced. Evolutionary algorithms depend heavily on the correctness of the fitness function to determine how close they are to satisfying a given criterion. Unfortunately, a correct fitness function cannot be defined in all cases. Some programming language characteristics impede the correctness of the fitness function and reduce the search to random testing. These are side-effects, flags and program unstructuredness.
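The fitness idea in this example can be sketched in C. The sketch is an assumption for illustration only: it uses the branch distance |x - 2| as the fitness for the condition x == 2 (0 means the target node is reached) and a deliberately tiny (1+1) evolutionary loop that keeps a mutant only when it is at least as fit. It is not the Daimler Chrysler system, and the names `branch_distance` and `search_for_node3` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Fitness (branch distance) for the condition (x == 2) in Table 2:
 * 0 means the condition holds; smaller values are fitter. */
static int branch_distance(int x)
{
    return abs(x - 2);
}

/* Tiny (1+1) evolutionary search: one individual is mutated each
 * generation and the mutant replaces it only if it is at least as fit.
 * A fixed linear congruential generator keeps runs reproducible.
 * Returns the number of generations used, or -1 if the limit is hit. */
static int search_for_node3(int start, int max_generations, int *found)
{
    unsigned int seed = 12345u;
    int current = start;

    for (int gen = 0; gen < max_generations; gen++) {
        if (branch_distance(current) == 0) {
            *found = current;                      /* node 3 is reached */
            return gen;
        }
        seed = seed * 1103515245u + 12345u;        /* next pseudo-random number */
        int step = (int)((seed >> 16) % 21u) - 10; /* mutation in [-10, 10] */
        int mutant = current + step;
        if (branch_distance(mutant) <= branch_distance(current))
            current = mutant;                      /* selection keeps the fitter */
    }
    return -1;
}
```

With a starting input of 50, branch_distance gives 48, while the later individual 20 gives 18, so the search treats 20 as fitter even though neither input satisfies the condition.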

2.4 Flags and unstructuredness

Flags and unstructuredness are beyond the scope of this project. However, they are mentioned briefly in this section, since the side-effect removal algorithm must not introduce either of these program features when performing its transformations. A flag is a variable whose value is either true or false. Flags degrade the performance of evolutionary testing when generating test cases. The presence of a flag variable yields a coarse fitness landscape with a single super-fit plateau and a single super-unfit plateau, corresponding to the two possible values of the flag variable [22]. Consequently, the evolutionary approach of incrementally finding the best test cases fails, since there is no measure of how well or badly a test input performed: the fitness criterion can take only two values, true or false, so it cannot guide the search. An unstructured program is a program that contains 'break', 'return', 'goto' or 'continue' statements. All of these are jump statements, which allow the programmer to transfer control to other parts of the program in an unstructured way. The creation of more than one exit point therefore negatively influences the correctness of the fitness function in evolutionary testing. An example is the presence of more than one 'break' statement in a loop: the path the fitness function should follow becomes unclear, and the algorithm could easily misguide the search [23].
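The plateau problem can be shown with a small C sketch (our own example; the function names are hypothetical). A fitness that looks only at a flag set by x == 2 distinguishes just two kinds of input, while a distance-based fitness on the condition that defines the flag still ranks inputs by how close they are to the target.

```c
#include <assert.h>
#include <stdlib.h>

/* Program under test branches on a flag:
 *     int flag = (x == 2);  if (flag) { ... target ... }
 * A fitness based only on the flag has two values: one super-fit
 * plateau (0) and one super-unfit plateau (1). */
static int flag_fitness(int x)
{
    int flag = (x == 2);
    return flag ? 0 : 1;
}

/* Fitness based on the condition that defines the flag still
 * tells the search how close an input is to the target. */
static int distance_fitness(int x)
{
    return abs(x - 2);
}

/* Count the distinct fitness values produced over inputs lo..hi
 * (assumes all values lie in 0..999). */
static int distinct_values(int (*fitness)(int), int lo, int hi)
{
    int seen[1000] = {0};
    int count = 0;
    for (int x = lo; x <= hi; x++) {
        int f = fitness(x);
        if (f >= 0 && f < 1000 && !seen[f]) {
            seen[f] = 1;
            count++;
        }
    }
    return count;
}
```

Over the inputs -100 to 100 the flag-based fitness takes only 2 distinct values, whereas the distance-based fitness takes 103, which is what gives the search a gradient to follow.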

2.5 Side-effects

As simplified in the project scope section (Section 1.4), a side-effect is taken to be any change in the value of a variable that occurs when an expression is evaluated. Such side-effects are created using the assignment operator in an expression and the pre- and post-increment and decrement operators [3]. Many programmers consider side-effects a fast way of writing compact programs, but many authors advise that "one should consider the trade-off between increased speed and decreased maintainability that results when embedded assignments are used in artificial places" [24]. According to Cannon et al. [24],

a = b + c;
d = a + r;

should not be replaced by

d = (a = b + c) + r;

even though the latter may save one cycle. In the long run the time difference between the two will decrease as the optimizer gains maturity, while the difference in ease of maintenance will increase as the human memory of what is going on in the latter piece of code begins to fade. A simple example of a side-effect in a loop is:

int b = 0;
int d = 5;
int a = 0;

while (++b < d) {
    a++;
}

Figure 2: Side-effects in a while loop

The condition of the while loop is influenced by the pre-increment of the variable b. On the first test, this side-effect increments b by one before the comparison is made, so b is already 1 when the loop body is first entered. As a result, the loop executes one iteration fewer than a reader might expect, and the final value of a is smaller by one. In more complex loops the order in which side-effects are evaluated can be quite difficult to follow, and a reader could easily assume that b is only incremented after the first iteration of the loop. Side-effects of this kind in loops and conditionals have a negative influence on the correctness of the fitness function in an evolutionary algorithm: they can misguide the search for the best test input, since the measure of how well a test input performed could be incorrect. When transformed with the Side-Effect Removal Tool (SERT), the program fragment from Figure 2 yields:

int b = 0;
int d = 5;
int a = 0;

while (((b + 1) < d)) {
    b = (b + 1);
    a = (a + 1);
}
b = (b + 1);

Figure 3: Side-effect free version

When executed, both programs leave the value 5 in variable b and 4 in variable a. However, the program in Figure 3 is much easier to understand. The side-effect free version should also lead to better test cases with the evolutionary algorithm, since the loop condition is simplified and a correct fitness value can be assigned to each input. Another type of side-effect is created by combining the pre-increment/decrement or post-increment/decrement operators with an assignment operator, as shown in Figure 4.
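The equivalence claimed above can be checked directly by running both fragments. In this sketch the two figures are wrapped in functions (the names are our own) that report the final values of a and b.

```c
#include <assert.h>

/* Figure 2: loop condition with a pre-increment side-effect. */
static void figure2(int *out_a, int *out_b)
{
    int b = 0; int d = 5; int a = 0;
    while (++b < d) { a++; }
    *out_a = a;
    *out_b = b;
}

/* Figure 3: side-effect free version of the same fragment. */
static void figure3(int *out_a, int *out_b)
{
    int b = 0; int d = 5; int a = 0;
    while (((b + 1) < d)) {
        b = (b + 1);   /* side-effect of a successful test */
        a = (a + 1);
    }
    b = (b + 1);       /* side-effect of the final, failing test */
    *out_a = a;
    *out_b = b;
}
```

Both functions report a = 4 and b = 5.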

a. a = b++;

b. while(y = x++ < 10){d++;}

c. a = ++x && ++c;

Figure 4: Assignment side-effects example

These side-effects were found to create flags when the assignment appears in a loop condition (Figure 4, example b). Because < binds more tightly than =, y is assigned the result of the comparison x++ < 10: whatever the value of y before the loop, it is 1 inside the loop and 0 after it. Within the condition, the comparison uses the value of x before the post-increment takes effect. A flag variable is also created when the assignment operator is combined with Boolean operators in an expression: the expression in Figure 4, example c, assigns true or false (1 or 0) to the variable a, depending on the value of the expression on the right-hand side of the assignment operator. As mentioned in the section on flags and unstructuredness, flags are an unwanted program feature when automated test case generation is considered. Therefore, transformations of side-effects that introduce flags, besides the removal of pre- and post-increment/decrement side-effects, are the main focus of this report. Side-effects in function calls are another type well known to impede evolutionary testing, but they are beyond the scope of this project; they are mentioned briefly in the following section for completeness.
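The flag behavior of examples b and c in Figure 4 can be observed with the following C sketch (the function names and chosen inputs are illustrative assumptions). Note also that in example c, && short-circuits, so ++c is not evaluated at all when ++x yields 0; this is a further subtlety that any transformation of such expressions has to respect.

```c
#include <assert.h>

/* Example b of Figure 4:  while (y = x++ < 10) { d++; }
 * Because < binds tighter than =, y receives the result of the
 * comparison (1 or 0), so y behaves as a flag. */
static void example_b(int x, int *y_in_body, int *y_after, int *d_out)
{
    int y = -1, d = 0;
    *y_in_body = -1;          /* stays -1 if the body never runs */
    while ((y = x++ < 10)) {
        *y_in_body = y;       /* always 1 inside the loop */
        d++;
    }
    *y_after = y;             /* always 0 after the loop */
    *d_out = d;
}

/* Example c of Figure 4:  a = ++x && ++c;  a is always 0 or 1. */
static int example_c(int x, int c)
{
    int a = ++x && ++c;
    return a;
}
```

Starting from x = 7, y is 1 on each of the three iterations and 0 once the loop exits; example_c(5, 9) yields 1 and example_c(-1, 9) yields 0.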

2.6 Side-effects in functions

Side-effects in functions occur when an assignment to a global variable is performed within the body of a function. Consider the following example:

int sum(int x, int y)
{
    int sum = 0;
    w = 2 * y;      /* w is a global variable */
    sum = x + w;
    return sum;
}

Figure 5: Side-effects in functions example

The function sum(int x, int y) multiplies the second parameter by 2 and returns the sum of the first parameter and this doubled value. For example, sum(2, 3) returns 2 + 2 * 3 = 8. However, in the process of computing the result, the function uses the global variable w as temporary storage and assigns it the value 2 * y. The function could be called from main as follows:

int w = 1;
int globalSum = 0;
globalSum = sum(2, 3) + w;

Figure 6: Function call from the main

The function call shown in Figure 6 changes the value of the global variable w from 1 to 6 before w is used to compute globalSum. If the function call were on the other side of the arithmetic expression, the old value of w would be used instead. For example, assuming left-to-right evaluation of the operands, sum(2, 3) + w evaluates to 8 + 6 = 14, while w + sum(2, 3) evaluates to 1 + 8 = 9 [26]. C does not actually specify the order in which the operands of + are evaluated, so the result of such expressions is compiler dependent, as the next section discusses.
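Because the order of evaluation of the operands of + is unspecified in C, one way to pin down the two readings discussed above is to split each expression into separate statements. The sketch below uses the sum function of Figure 5 (with 2y written as 2 * y); the helper names call_first and read_first are hypothetical.

```c
#include <assert.h>

int w = 1;   /* global variable modified by sum() */

/* Figure 5: assigning to the global w is a side-effect of the call. */
int sum(int x, int y)
{
    int sum = 0;
    w = 2 * y;
    sum = x + w;
    return sum;
}

/* Intended meaning of  sum(2, 3) + w  with the call evaluated first. */
int call_first(void)
{
    int s = sum(2, 3);        /* w becomes 6 */
    return s + w;             /* 8 + 6 */
}

/* Intended meaning of  w + sum(2, 3)  with w read first. */
int read_first(void)
{
    int old_w = w;            /* w is still 1 here */
    return old_w + sum(2, 3); /* 1 + 8 */
}
```

With w reset to 1 before each call, call_first returns 14 and read_first returns 9, and in both cases w is 6 afterwards.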

2.7 Compiler differences

Many languages, including C and C++, do not specify the order of evaluation within an expression. In most languages this is not a problem, but in languages that allow side-effects in expressions it can cause chaos [25]. Side-effects within expressions can therefore result in code whose semantics are compiler dependent. Consider the examples below:

a. i = 0; a = b[i++];
b. i = 5; k = i - --i;

Figure 7: Compiler-dependent side-effects

In example a, the assignment to a uses the old value of i as the index, so a is always assigned b[0], the first element of the array; what is left to the compiler is only whether i itself is incremented before or after the assignment to a. Example b is more serious: i is both modified and read in the same expression without an intervening sequence point, which makes its behavior undefined in C. Depending on the compiler, k may be assigned 0 (if the pre-decrement is applied before either operand is read, giving 4 - 4) or 1 (if the left operand is read first, giving 5 - 4) [24, 25]. With the compiler used for the development of this project, embedded in Microsoft Visual C++, k is assigned 1, i.e. the left operand is read before the pre-decrement is applied.
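A side-effect free rewrite removes this compiler dependence by making the order explicit. The following sketch (helper names are our own) shows the rewrite of example a and the two possible explicit readings of example b; which reading is intended has to be decided by the programmer, since the original expression has no defined meaning.

```c
#include <assert.h>

/* a = b[i++];  rewritten as  a = b[i];  i = i + 1; */
static int index_then_increment(const int b[], int *i)
{
    int a = b[*i];      /* the old value of i is used as the index */
    *i = *i + 1;
    return a;
}

/* k = i - --i;  reading 1: the left operand is read first. */
static int read_left_first(int *i)
{
    int left = *i;
    *i = *i - 1;
    return left - *i;   /* 5 - 4 = 1 when i starts at 5 */
}

/* k = i - --i;  reading 2: the decrement is applied first. */
static int decrement_first(int *i)
{
    *i = *i - 1;
    return *i - *i;     /* 4 - 4 = 0 when i starts at 5 */
}
```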

2.8 Previous work

A very similar side-effect removal tool, called Linsert, was developed by Lin Hu et al.; its theoretical foundations, the post-placement side-effect removal algorithm and an empirical study are presented in their publications [1, 3, 4]. Linsert implements side-effect removal for a subset of the full C language called C--, which contains sufficient statement and expression syntax to allow all side-effect issues to be considered. The tool is based on the post-placement side-effect removal approach. This approach has the advantage of being applicable in all cases, and the disadvantage of producing much larger code, because the code that models the side-effect is copied. Two other possible approaches are pre-placement and the introduction of temporary variables. Pre-placement is not applicable in all cases, but it is desirable because it avoids large increases in code size. Introducing temporary variables can decrease comprehension, but it offers more flexibility in the choice of where to place side-effects. However, neither of these two approaches was used in the development of Linsert; instead, standard program simplification techniques were used to clean up the code after side-effects had been removed. The side-effect removal algorithm uses a top-level transformation that walks over the abstract syntax tree of the program with side-effects and replaces statements that contain side-effects with side-effect free equivalents.

After the algorithm for Linsert had been developed, an empirical study with 18 students was performed to explore the effect of side-effect removal on program comprehension. The students were allocated to two groups according to their ability and understanding of the C programming language. Each group was presented with small samples of C code and questions concerning the final values of variables in each sample program. The program comprehension questions were split into two tests: in one version the sample code contained side-effects, and in the other the same questions were presented without side-effects. To reduce possible bias in the choice of test, the authors produced two versions of the test and used a 'cross-over' design for the experiment. The first version, called the 'possibly biased test', was written by academics familiar with the side-effect removal algorithm. The second, the 'unbiased test', was written by academics familiar with C programming but not with the side-effect removal algorithm. The results were in line with expectations: they were noticeably better for all side-effect free versions of the programs. It was therefore concluded that side-effect removal using the post-placement algorithm improves program comprehension.
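To illustrate the difference between the approaches, the following sketch (our own example, not output of Linsert or SERT) removes the side-effect from `if (x++ > 0) y = 1;` in two ways: in the post-placement style, by copying the increment into both branches (creating an else branch), and by introducing a temporary variable t. Pointers are used only so that the final values can be inspected.

```c
#include <assert.h>

/* Original:  if (x++ > 0) y = 1; */
static void original(int *x, int *y)
{
    if ((*x)++ > 0) *y = 1;
}

/* Post-placement style: the condition is made side-effect free and a
 * copy of the side-effect is placed at the start of both branches. */
static void post_placement(int *x, int *y)
{
    if (*x > 0) {
        *x = *x + 1;
        *y = 1;
    } else {
        *x = *x + 1;
    }
}

/* Temporary-variable style: save the value the condition needs. */
static void temporary_variable(int *x, int *y)
{
    int t = *x;
    *x = *x + 1;
    if (t > 0) *y = 1;
}
```

All three functions leave x and y in the same state for any starting x; the post-placement version is longer because the side-effect is duplicated, which is the code-size cost mentioned above.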

2.9 Data binding problems encountered

As specified in the project objectives, data binding seemed a reasonable solution for transforming the XML representation of the C file. "XML data binding refers to the process of representing the information in an XML document as an object in computer memory" [8]. This allows direct access to the XML data through the created object, rather than using the DOM to retrieve data from the in-memory representation of the XML file itself. Data binding therefore enables a more natural way of manipulating information. Data binding products map an XML document onto objects while preserving the structure of the document, using the XML Schema or DTD paired with the XML file [9]. In the research phase of the project many data binding tools were considered, but unfortunately none of them worked properly. The tools considered were the Liquid XML binding tool, Zeus, Castor, Xerces, JAXB and BorlandXML, the data binding tool built into JBuilder. With the free data binding tools (all except Liquid Technologies), data binding could not be accomplished because the corresponding DTD was too complicated. In particular, XLinks, which are used for attaching links to XML documents, were either not supported or, in some cases (BorlandXML and the Liquid XML binding tool), supported but produced code and classes that were not usable [10]. After substantial time had been spent cleaning bugs from the generated code and changing the DTD to make it compatible with the binding tools, DOM and SAX were adopted as a reasonable solution to the problem.

2.10 DOM and SAX

A DOM Document is a collection of nodes that represent pieces of information organized in a hierarchy. Since a document parsed with DOM is loaded entirely into memory, it is easy to navigate around the tree and to find and manipulate relevant information. Because it is based on a hierarchy of information, the DOM is said to be tree-based, or object-based. For exceptionally large documents, parsing and loading the entire document can be slow and resource intensive, so other approaches are better suited to dealing with the data. Event-based models such as the Simple API for XML (SAX) work on a stream of data, processing it as it goes by. An event-based API eliminates the need to build the data tree in memory, but does not allow a developer to change the data in the original document. This is a drawback of SAX for this project, since the main aim of the side-effect removal tool is to remove side-effects and thus change the initial data representation. DOM, on the other hand, as shown in Figure 8, provides an API that allows a developer to add, edit, move or remove nodes at any point in the tree [6]. As mentioned previously, the side-effect removal tool uses a DOM parser to represent the XML file created with the third-party ANSIC.exe parser. The DOM tree is not simply a tree of the XML elements: the different types of information contained in an XML document give rise to several different types of nodes. Elements are just one type of node represented by DOM. Since an element node is a container of information, that information may be other element nodes, text nodes, attribute nodes or other types of information.

Figure 8: A DOM road map [6]

2.11 Conclusion

Chapter 3: Design

3.1 Introduction

This chapter introduces the design of the algorithm for removing side-effects created with the pre- and post-increment/decrement operators as well as the assignment operator. It also describes in more detail the formal approach used to analyze and develop the algorithms.

3.2 Software process

After the implementation started, the development process was reconsidered, and the more appropriate incremental model is now used. It combines elements of the linear sequential model used previously, with the difference that each increment delivers a working version of the system. The incremental model was slightly adapted for the purpose of this project: as shown in Figure 9, an additional testing stage was included before implementation.

Figure 9: Incremental process model (increments 1 to 3, each passing through analysis, design, design testing, coding and testing)

Introducing another testing stage in each increment resulted in a good and precise algorithm design before any implementation was done. This shortened the development time, and design mistakes and ambiguities were resolved early: the designed solution to a particular problem was translated directly into the C programming language, and both files, the file with the side-effects and the side-effect free file, were compiled and run, and their outputs were compared. If the outputs were the same, this suggested that the transformation was correct. If the outputs differed, the design was reconsidered and black-box design testing was repeated until a correct solution was found.

3.2.1 Increments explained

In this section the development of the increments is explained briefly, stating the design rules and patterns used in each increment. The first increment was a core version of the product that removed only simple side-effects created with the post-increment operator in conditionals and loops. Before solving the problem of removing side-effects, the relevant design rules were developed. Each expression is transformed differently depending on the types of side-effects and on the position of those side-effects in the program. Every ‘for loop’ is converted to a while loop by copying the conditional expression of the ‘for loop’ into a while loop. If the conditional contains side-effects, it is further transformed by the while rule, which places a copy of the side-effect after the loop to account for the fact that the body of the loop is not executed on the last test of the side-effect free version of the conditional. The ‘do…while’ rule likewise transforms the loop into a while loop. This involves copying the loop body, together with its side-effects, in front of the while loop, since the body of a do…while loop executes once before the side-effects of its controlling expression take place. The rule for conditional statements creates an else branch, if one is not already present, and replicates the side-effects in both branches to preserve the semantics. At this stage of the implementation only the basic requirements were addressed, and many supplementary and crucial features, some known and others unknown, remained undelivered. In the next increment, multiple simple loops and conditionals were considered. At this stage of development the program was able to transform correctly any combination of loops and conditional statements within those loops. Furthermore, post-decrement and pre-increment/decrement operators were introduced.
This addition changed the rules specified in the first increment, since a pre-increment/decrement operator changes the value of a variable before that value is used, for example before the first test of a loop. This increment also introduced side-effects created with the assignment operator combined with pre/post operators. It was found that such side-effects create flags when they are assigned in loops: for example, regardless of the value of a variable before the loop, the variable is assigned the value 1 inside the loop and 0 after the loop. In the body of the loop, the expression is evaluated as expected and the variable is assigned the value of another variable before the side-effect takes place. The third and final increment of the project considers complex conditionals. As planned, this increment is implemented in three steps: first, extracting the preorder expression of a complex conditional (already implemented); second, checking it for side-effects; and third, performing the required transformation.

3.3 Side-Effect Removal Case Scenarios

This section presents the design of the side-effect removal algorithm from an informal point of view; in other words, the algorithm is explained using input/output examples. The algorithm presented here focuses on two main categories of side-effects: side-effects created with pre- and post-increment/decrement operators in loops and conditionals, and side-effects created with an assignment operator combined with pre/post operators outside loops. This section also describes the side-effect rules mentioned in increment two in more depth. It starts with the simple side-effect removal rules and moves on to more complicated ones at the end of the section. All of the examples shown were transformed with the developed Side-Effect Removal Tool (SERT). Because of the limited space in this report and the tendency of SERT to produce large transformations for complex examples, the examples discussed here are kept simple and aim to explain the basic features of the algorithm. For more complex examples please see Appendix A.

3.3.1 Example one: basic transformation

Depending on the position of the side-effect and the kind of transformation needed, the algorithm uses two functions for simple side-effect removal. The sideEffect() function takes an expression and replicates its side-effects outside the controlling expression in the form of assignment statements. This replication is needed because the side-effects must still happen when the transformed program is executed. The transform() function takes an expression and returns a side-effect free version of it, i.e. an expression which has the same value but does not cause a side-effect [12]. For example, if(x++ && --y) yields a conditional of the form if(x && y - 1). These simple transformations are shown in the following tables.

Function              Transformation
a. sideEffect(x++);   x = x + 1;
b. sideEffect(x--);   x = x - 1;
c. sideEffect(++x);   x = x + 1;
d. sideEffect(--x);   x = x - 1;

Table 3.1: side-effects transformation

Function              Transformation
a. transform(x++);    x;
b. transform(x--);    x;
c. transform(++x);    x + 1;
d. transform(--x);    x - 1;

Table 3.2: side-effects removal

3.3.2 Example one: for-to-while transformation

This is one of the simplest transformations: every for loop is transformed into a while loop to make further manipulation easier.

Example code:

    for(count = 1; count < 10; count++)
    {
        a++;
    }

Transformed code:

    count = 1;
    while ((count < 10))
    {
        count = (count + 1);
        a = (a + 1);
    }

Table 3.3: For-to-while example

The transformed while loop must account for the initialization and update expressions of the for loop. The initialization is therefore placed immediately before the produced while loop, so that the variable is not re-initialized on each iteration. The update expression is placed inside the body of the while loop, and the remaining condition of the for loop becomes the condition of the while loop.

3.3.3 Example two: while transformations

This example concentrates on the transformation of while loops. The rule for while loops places a copy of the side-effect after the loop to account for the fact that the body is not executed on the last test of the side-effect free version of the controlling expression. Consider the example below:

Example code:

    for(count = 1; count++ < 10; count++ )
    {
        while(s++ < --w)
        {
            x++;
        }
        a++;
    }

Transformed code:

    count = 1;
    while ((count < 10))
    {
        count = (count + 1);
        count = (count + 1);
        while ((s < (w - 1)))
        {
            w = (w - 1);
            s = (s + 1);
            x = (x + 1);
        }
        w = (w - 1);
        s = (s + 1);
        a = (a + 1);
    }
    count = (count + 1);

Table 3.4: Simple While transformation example

In the first transformation, the for loop is transformed into a while loop, and then both while loops are transformed. Because the resulting outer while loop has a side-effect in its controlling expression (count++ < 10), the increment of count is cloned twice into the body of the outer loop (once for the side-effect of the test and once for the update expression of the for loop) and once after the loop. The inner while loop simply follows the rule specified above and replicates its side-effects accordingly. Furthermore, the side-effects inside the controlling expressions are removed by applying the transform() function explained earlier in this section. However, side-effect removal becomes more complicated when complex statements are encountered. The example in table 3.5 shows a complex transformation of a while loop.

Example code:

    while(w++ < --s && v++ && g++ < 10)
    {
        c++;
    }

Transformed code:

    while ((((w < (s - 1)) && v) && (g < 10)))
    {
        g = (g + 1);
        v = (v + 1);
        s = (s - 1);
        w = (w + 1);
        c = (c + 1);
    }
    if ((! (w < (s - 1))))
    {
        s = (s - 1);
        w = (w + 1);
    }
    else
    {
        if ((! v))
        {
            w = (w + 1);
            s = (s - 1);
            v = (v + 1);
        }
        else
        {
            if ((! (g < 10)))
            {
                g = (g + 1);
                v = (v + 1);
                s = (s - 1);
                w = (w + 1);
            }
        }
    }

Table 3.4: Complex While transformation example

The rule used for simple while expressions is modified when transforming complex while conditions connected with the && operator. On the last loop iteration, only the side-effects of the conditions up to and including the ‘termination point’ (the condition that ended the loop) are executed. For example, if the last condition (g < 10) was the termination point, the side-effects of all three conditions, including the increment of g, are executed on the final test, even though that test fails. After the loop, only the conditional for the condition that terminated the loop, here if (!(g < 10)), is executed. If two or more of the conditions are false after the loop, only the first false one from the left is executed, since evaluation of && stopped there; this is why the conditionals after the loop are nested.

3.3.4 Example three: Conditionals (IF) transformation

This section describes the design of the transformation that removes side-effects from conditionals. The rule for conditionals creates an else branch, if one is not already present, and replicates the side-effects in both the if and else parts, as the following example shows.

Example code:

    if(my_array[++x] < c++)
    {
        s++;
    }

Transformed code:

    if ((my_array[(x + 1)] < c))
    {
        c = (c + 1);
        x = (x + 1);
        s = (s + 1);
    }
    else
    {
        c = (c + 1);
        x = (x + 1);
    }

Table 3.5: Simple Conditional transformation example

The simple rule of replicating the side-effects in both branches becomes more complex when Boolean operators are introduced. Because of short-circuit evaluation, Boolean operators determine the order in which side-effects are evaluated and whether a side-effect is evaluated at all. Consider the example below.

Example code:

    if(++a < 10 || b || c++ < --q)
        x++;

Transformed code:

    if (((((a + 1) < 10) || b) || (c < (q - 1))))
    {
        if (((a + 1) < 10))
        {
            a = (a + 1);
        }
        else
        {
            if (b)
            {
                a = (a + 1);
            }
            else
            {
                if ((c < (q - 1)))
                {
                    a = (a + 1);
                    c = (c + 1);
                    q = (q - 1);
                }
            }
        }
        x = (x + 1);
    }
    else
    {
        q = (q - 1);
        c = (c + 1);
        a = (a + 1);
    }

Table 3.6: Complex Conditional transformation example

The above example demonstrates a complex conditional combined with the logical OR operator. If a condition is evaluated and it contains a side-effect, the side-effect is executed whether the condition is true or false. Every side-effect to the left of the evaluated condition is also executed, since the conditional is evaluated from left to right. If all of the conditions are false, all side-effects are executed, since every condition has to be evaluated. For example, if (++a < 10) is false and b is true, both of these conditions are evaluated, so the side-effect of ++a is executed. However, the condition (c++ < --q) is not evaluated, since evaluation of the OR operator stops at the first true condition. Consider the following example:

Example code:

    if(++a && --b < 10 && c++)
    {
        x--;
    }
    else
    {
        x++;
    }

Transformed code:

    if ((((a + 1) && ((b - 1) < 10)) && c))
    {
        c = (c + 1);
        b = (b - 1);
        a = (a + 1);
        x = (x - 1);
    }
    else
    {
        if (((a + 1) && ((b - 1) < 10)))
        {
            a = (a + 1);
            b = (b - 1);
            c = (c + 1);
        }
        else
        {
            if ((a + 1))
            {
                a = (a + 1);
                b = (b - 1);
            }
            else
            {
                if ((! (a + 1)))
                {
                    a = (a + 1);
                }
            }
        }
        x = (x + 1);
    }

Table 3.7: Complex Conditional transformation example

The above example demonstrates a complex conditional combined with the logical AND operator. The main difference between the two Boolean operators is that with AND the next condition is evaluated only if the current condition is true, whereas with OR evaluation stops as soon as a true condition is encountered. Therefore, if all of the conditions are true, all side-effects are executed. If a condition is false, the side-effects of that condition and of all conditions to its left are executed.

References

[1] Harman M, Lin H, Malcolm M, Xingyuan Z, “Side-Effect Removal Transformation”, 9th IEEE International Workshop on Program Comprehension (IWPC 2001). Toronto, Canada, May 12th-13th, 2001, pages 309-319.

[2] Vienneau L. Robert, Data & Analysis Center for Software(DACS), “The Present Value of Software Maintenance”, Journal of Parametrics, Vol XV, No 1, 25th April 1995, pages 18-36.

[3] Harman M, Lin H, Rob H, Malcolm M, Xingyuan Z, Jose J.D, Mari C.O, Joachim W, “A Post-Placement Side-Effect Removal Algorithm”, 18th IEEE International Conference on Software Maintenance (ICSM 2002), 3 - 6 October, 2002, Montreal, Canada. Pages 2- 11.

[4] Dolado J. J, Mark H, Mari C.O, Lin H, “An Empirical Investigation of the Influence of a Type of Side Effects on Program Comprehension”, IEEE Transactions on Software Engineering, 2003, pages 665-670.

[5] Pressman S. Roger, “Software Engineering”, Fifth Edition, McGraw-Hill, New York, International Edition 2001

[6] Chase Nicholas, “Understanding DOM”, IBM tutorial, url: https://www6.software.ibm.com/developerworks/education/x-udom/index.html

[7] Langer Angelika, Klaus Kreft, “Sequence Points and Expression Evaluation”, Visual System Journal, August 2002, url:http://www.langer.camelot.de/Articles/VSJ/SequencePoints/SequencePoints.html

[8] Wikipedia, The Free Encyclopedia, XML Data Binding, url: http://en.wikipedia.org/wiki/XML_Data_Binding

[9] Bourret Ronald, “XML Data Binding Resources”, last updated: April 14, 2005, url: http://www.rpbourret.com/xml/XMLDataBinding.htm#databinding

[10] Harold R. Elliot, W. Scott Means, “XML in a Nutshell”, Third Edition, O’Reilly Media, Inc., Sebastopol, September 2004

[11] Visser Eelco, Program Transformation, last visited: July 19, 2005, url: http://www.program-transformation.org/Transform/ProgramTransformation

[12] Harman Mark, “Program Transformation – new programs for old”, Centaur Communications Ltd, EXE Magazine, July 1997, pages 25-30

[13] Ward. M. P, “Reverse Engineering through Formal Transformation”, Computer Journal, Vol 37, No 7, 1994

[14] Visser E, A Survey of Rewriting Strategies in Program Transformation Systems, In Electronic Notes in Theoretical Computer Science, The Netherlands, 2001

[15] Dyck S, David S, “Software Testing”, Seng 621, Winter 1999, University of Calgary url: http://sern.ucalgary.ca/~sdyck/courses/seng621/webdoc.html#Intro

[16] Osterlie T, “An introduction to software Testing”, IPL Information Processing Ltd, 13/05/99

[17] Wegener J, “Evolutionary Testing – Overview”, Daimler Chrysler AG, Research and Technology

[18] Harman M, “Slides on Evolutionary testing”, King’s College London, 2005

[19] Sthamer H, Joachim W, Andre B, “Using Evolutionary testing to improve Efficiency and Quality in Software Testing”, Daimler Chrysler AG, Research and Technology

[20] McMinn P, “Search-based Software Test Data Generation: A Survey”, Software Testing, Verification and Reliability, 14, pages 105-156, June 2004

[21] Sthamer H, Joachim W, Andre B, “Evolutionary Testing of Embedded Systems”, Daimler Chrysler AG, Research and Technology

[22] Harman M, Lin H, Robert H, Baresel A, Sthamer H (2002), Improving Evolutionary Testing by Flag Removal, AAAI Genetic and Evolutionary Computation Conference 2002 (GECCO 2002). New York, USA, July 9th-13th 2002. Pages 1351-1358.

[23] Gellerich W, Kosiol M, Ploedereder E, Where does GOTO go to?, in: Reliable Software Technologies, Ada-Europe 1996, Lecture Notes in Computer Science 1088, pp 385-395, 1996

[24] L. Cannon, R. Elliot, L. Kirchhof, J. Miller, J. Milner, R. Mitze, E. Schan, N. Whittington, “Recommended C Style and coding Standards”, Revision 6.0, 25 June, 1990 url: http://www.cs.cornell.edu/courses/cs314/2004FA/tutorials/cstyle.pdf

[25] Polak. S, “In Search of the Ideal Programming Language”, Pace University, Department of computer Science, 1997 url: http://members.aol.com/SergeyP/paper.html

[26] Fatiregun. A, “Testability Transformation”, BSc report, Brunel University, May,2002
