Automatic Test Data Generation Using Constraint Programming and Search Based Software Engineering Techniques

UNIVERSITÉ DE MONTRÉAL

AUTOMATIC TEST DATA GENERATION USING CONSTRAINT PROGRAMMING AND SEARCH BASED SOFTWARE ENGINEERING TECHNIQUES

ABDELILAH SAKTI
DÉPARTEMENT DE GÉNIE INFORMATIQUE ET GÉNIE LOGICIEL
ÉCOLE POLYTECHNIQUE DE MONTRÉAL

DISSERTATION SUBMITTED FOR THE DEGREE OF PHILOSOPHIÆ DOCTOR (COMPUTER ENGINEERING)
DECEMBER 2014

© Abdelilah Sakti, 2014.

This dissertation, entitled:
AUTOMATIC TEST DATA GENERATION USING CONSTRAINT PROGRAMMING AND SEARCH BASED SOFTWARE ENGINEERING TECHNIQUES
presented by: SAKTI Abdelilah
for the degree of: Philosophiæ Doctor
has been duly accepted by the examination committee composed of:
Mr. BELTRAME Giovanni, Ph.D., chair
Mr. PESANT Gilles, Ph.D., member and research supervisor
Mr. GUÉHÉNEUC Yann-Gaël, Doctorate, member and research co-supervisor
Mr. ANTONIOL Giuliano, Ph.D., member
Mr. HARMAN Mark, Ph.D., external member

This dissertation is dedicated to my wife, Khadija, my sons, Adam and Anas, and to my mother, Aïcha. Their support, encouragement, patience, and understanding have sustained me throughout my life.

ACKNOWLEDGMENTS

First of all, I would like to thank Almighty ALLAH, without whose will nothing is possible.

Completion of this doctoral research would not have been possible without the contributions of many people throughout the research project. Foremost are the generous support and patience of my supervisors. I would like to take this opportunity to express my appreciation to my principal supervisor, Prof. Dr. Gilles Pesant, and my associate supervisor, Prof. Dr. Yann-Gaël Guéhéneuc, for their encouragement, advice, guidance, and inspiration throughout the duration of this research project. I have learned much from them about the attitudes and skills needed to conduct research and present ideas. They brought forth this research and allowed me to extend my education beyond the formal studies leading to this dissertation.
They may not even realize how much I have learned from them. I am indebted to them for their motivation and their immense knowledge of software engineering, which together make them great mentors. I would also like to thank Prof. Dr. Giuliano Antoniol for his feedback and the productive discussions during the various stages of the research project. I would also like to thank the members of my Ph.D. committee, who enthusiastically agreed to read and evaluate my dissertation. I am very thankful to all my colleagues in the Quosseca laboratory and the Ptidej team for their feedback and the productive discussions. I am deeply grateful for the support my family provided during every stage of this dissertation. Finally, appreciation is extended to the staff of the École Polytechnique de Montréal for their support throughout my research project.

RÉSUMÉ

Proving that a piece of software corresponds to its specification, or exposing errors hidden in its implementation, is a very difficult and tedious testing task that can cost more than 50% of the total cost of the software. During the software testing phase, test-data generation is one of the most expensive tasks. Consequently, automating this task can considerably reduce software cost, development time, and time to market. Several research works have proposed automated approaches to generate test data. Some of these works have shown that test-data generation techniques based on metaheuristics (SB-STDG) can automatically generate test data. However, these techniques are very sensitive to their guidance, which can impact the whole test-data generation process. A lack of relevant information about the test-data generation problem can weaken this guidance and negatively affect the efficiency and effectiveness of SB-STDG.
In this dissertation, our research proposal is that statically analyzing the source code to identify and extract relevant information, and exploiting this information in the SB-STDG process, could offer more guidance and thus improve the efficiency and effectiveness of SB-STDG. To extract information relevant to SB-STDG guidance, we statically analyze the internal structure of the source code, focusing on six features, i.e., constants, conditional statements, arguments, data members, methods, and relationships. Focusing on these features and using different existing static analysis techniques, i.e., constraint programming (CP), schema theory, and some lightweight static analyses, we propose four approaches: (1) focusing on arguments and conditional statements, we define a hybrid approach that uses CP techniques to guide SB-STDG in reducing its search space; (2) focusing on conditional statements and using CP techniques, we define two new metrics that measure the difficulty of satisfying a branch (i.e., a condition), from which we derive two new fitness functions to guide SB-STDG; (3) focusing on conditional statements and using schema theory, we adapt the genetic algorithm to better address the test-data generation problem; (4) focusing on arguments, conditional statements, constants, data members, methods, and relationships, and using lightweight static analyses, we define an instance generator that generates relevant candidate test data and a new representation of the object-oriented test-data generation problem that implicitly reduces the search space of
SB-STDG. We show that static analyses help improve the efficiency and effectiveness of SB-STDG. The results obtained in this dissertation show significant improvements in terms of efficiency and effectiveness. They are promising, and we hope that further research in the field of test-data generation could improve efficiency or effectiveness even further.

ABSTRACT

Proving that a software system corresponds to its specification, or revealing hidden errors in its implementation, is a time-consuming and tedious testing process that can account for more than 50% of the total software cost. Test-data generation is one of the most expensive parts of the software testing phase. Therefore, automating this task can significantly reduce software cost, development time, and time to market. Many researchers have proposed automated approaches to generate test data. Among these approaches, the literature has shown that Search-Based Software Test-data Generation (SB-STDG) techniques can automatically generate test data. However, these techniques are very sensitive to their guidance, which impacts the whole test-data generation process. Insufficient relevant information about the test-data generation problem can weaken the SB-STDG guidance and negatively affect its efficiency and effectiveness.

In this dissertation, our thesis is that statically analyzing source code to identify and extract relevant information, and exploiting this information in the SB-STDG process, could offer more guidance and thus improve the efficiency and effectiveness of SB-STDG. To extract information relevant to SB-STDG guidance, we statically analyze the internal structure of the source code, focusing on six features, i.e., constants, conditional statements, arguments, data members, methods, and relationships.
Focusing on these features and using different existing static analysis techniques, i.e., constraint programming (CP), schema theory, and some lightweight static analyses, we propose four approaches: (1) focusing on arguments and conditional statements, we define a hybrid approach that uses CP techniques to guide SB-STDG in reducing its search space; (2) focusing on conditional statements and using CP techniques, we define two new metrics that measure the difficulty of satisfying a branch, from which we derive two new fitness functions to guide SB-STDG; (3) focusing on conditional statements and using schema theory, we tailor the genetic algorithm to better fit the problem of test-data generation; (4) focusing on arguments, conditional statements, constants, data members, methods, and relationships, and using lightweight static analyses, we define an instance generator that generates relevant test-data candidates and a new representation of the problem of object-oriented test-data generation that implicitly reduces the SB-STDG search space. We show that using static analyses improves SB-STDG efficiency and effectiveness. The results achieved in this dissertation show important improvements in terms of effectiveness and efficiency. They are promising, and we hope that further research in the field of test-data generation might improve efficiency or effectiveness even further.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
RÉSUMÉ
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Problem and Motivation
1.2 Thesis
1.3 Contributions
1.4 Organization of the Dissertation
I Background
CHAPTER 2 AUTOMATIC TEST DATA GENERATION
2.1 White-box Testing
2.1.1 Control Flow Graph (CFG)
2.1.2 Coverage Criterion
2.1.3 Data Flow Analysis
2.2 Search Based Software Test Data Generation (SB-STDG)
2.2.1 Fitness Function
