Automatic Test Data Generation Using Constraint Programming and Search Based Software Engineering Techniques
UNIVERSITÉ DE MONTRÉAL

AUTOMATIC TEST DATA GENERATION USING CONSTRAINT PROGRAMMING AND SEARCH BASED SOFTWARE ENGINEERING TECHNIQUES

ABDELILAH SAKTI
DÉPARTEMENT DE GÉNIE INFORMATIQUE ET GÉNIE LOGICIEL
ÉCOLE POLYTECHNIQUE DE MONTRÉAL

THESIS PRESENTED IN VIEW OF OBTAINING THE DEGREE OF PHILOSOPHIÆ DOCTOR (COMPUTER ENGINEERING)
DECEMBER 2014

© Abdelilah Sakti, 2014.

This thesis, entitled AUTOMATIC TEST DATA GENERATION USING CONSTRAINT PROGRAMMING AND SEARCH BASED SOFTWARE ENGINEERING TECHNIQUES, presented by SAKTI Abdelilah in view of obtaining the degree of Philosophiæ Doctor, has been duly accepted by the examination jury composed of:

Mr. BELTRAME Giovanni, Ph.D., president
Mr. PESANT Gilles, Ph.D., member and research supervisor
Mr. GUÉHÉNEUC Yann-Gaël, Doctorate, member and research co-supervisor
Mr. ANTONIOL Giuliano, Ph.D., member
Mr. HARMAN Mark, Ph.D., external member

This dissertation is dedicated to my wife, Khadija, my sons, Adam and Anas, and to my mother, Aïcha. Their support, encouragement, patience, and understanding have sustained me throughout my life.

ACKNOWLEDGMENTS

First of all, I would like to thank almighty ALLAH. Without his wish nothing is possible.

Completion of this doctoral research would not have been possible without the contributions of many people throughout the research project. Foremost are the generous support and patience of my supervisors.

I would like to take this opportunity to express my appreciation to my principal supervisor, Prof. Dr. Gilles Pesant, and associate supervisor, Prof. Dr. Yann-Gaël Guéhéneuc, for their encouragement, advice, guidance, and inspiration throughout the duration of this research project. I have learned much from them about the attitudes and skills for conducting research and presenting ideas.
They brought forth this research and allowed me to extend my education beyond the formal studies leading to this dissertation. They may not even realize how much I have learned from them. I am very grateful for their motivation and immense knowledge of software engineering, which together make them great mentors.

I would also like to thank Prof. Dr. Giuliano Antoniol for his feedback and productive discussions during the various stages of the research project.

I would also like to thank the members of my Ph.D. committee, who enthusiastically agreed to monitor and read my dissertation.

I am very thankful to all my colleagues in the Quosseca laboratory and the Ptidej team for their feedback and the productive discussions.

I am deeply grateful for the support my family provided during every stage of this dissertation.

Finally, appreciation is extended to the staff of the École Polytechnique de Montréal for their support throughout my research project.

RÉSUMÉ

Proving that a software system conforms to its specification, or exposing hidden errors in its implementation, is a very difficult and tedious testing task that can cost more than 50% of the total cost of the software. During the software testing phase, test-data generation is one of the most expensive tasks. Consequently, automating this task can considerably reduce software cost, development time, and time to market.

Several research works have proposed automated approaches to generate test data. Some of these works have shown that test-data generation techniques based on metaheuristics (SB-STDG) can generate test data automatically. However, these techniques are very sensitive to their guidance, which can affect the whole test-data generation process. A lack of relevant information about the test-data generation problem can weaken the guidance and negatively affect the efficiency and effectiveness of SB-STDG.

In this thesis, our research proposition is that statically analyzing source code to identify and extract relevant information, and exploiting that information in the SB-STDG process, could offer more guidance and thus improve the efficiency and effectiveness of SB-STDG.

To extract information relevant to SB-STDG guidance, we statically analyze the internal structure of the source code, concentrating on six features, i.e., constants, conditional statements, arguments, data members, methods, and relationships. Focusing on these features and using different existing static analysis techniques, i.e., constraint programming (CP), schema theory, and some lightweight static analyses, we propose four approaches: (1) focusing on arguments and conditional statements, we define a hybrid approach that uses CP techniques to guide SB-STDG in reducing its search space; (2) focusing on conditional statements and using CP techniques, we define two new metrics that measure the difficulty of satisfying a branch (i.e., a condition), from which we derive two new fitness functions to guide SB-STDG; (3) focusing on conditional statements and using schema theory, we adapt the genetic algorithm to better address the test-data generation problem; (4) focusing on arguments, conditional statements, constants, data members, methods, and relationships, and using lightweight static analyses, we define an instance generator that generates relevant candidate test data and a new representation of the object-oriented test-data generation problem that implicitly reduces the SB-STDG search space.

We show that static analyses help improve the efficiency and effectiveness of SB-STDG. The results obtained in this thesis show significant improvements in terms of efficiency and effectiveness. They are promising, and we hope that further research in the field of test-data generation could further improve efficiency or effectiveness.

ABSTRACT

Proving that a software system corresponds to its specification, or revealing hidden errors in its implementation, is a time-consuming and tedious testing process, accounting for 50% of the total software cost. Test-data generation is one of the most expensive parts of the software testing phase. Therefore, automating this task can significantly reduce software cost, development time, and time to market.

Many researchers have proposed automated approaches to generate test data. Among the proposed approaches, the literature has shown that Search-Based Software Test-data Generation (SB-STDG) techniques can automatically generate test data. However, these techniques are very sensitive to their guidance, which impacts the whole test-data generation process. Insufficient relevant information about the test-data generation problem can weaken the SB-STDG guidance and negatively affect its efficiency and effectiveness.

In this dissertation, our thesis is that statically analyzing source code to identify and extract relevant information, and exploiting that information in the SB-STDG process, could offer more guidance and thus improve the efficiency and effectiveness of SB-STDG.
To extract information relevant to SB-STDG guidance, we statically analyze the internal structure of the source code, focusing on six features, i.e., constants, conditional statements, arguments, data members, methods, and relationships. Focusing on these features and using different existing static analysis techniques, i.e., constraint programming (CP), schema theory, and some lightweight static analyses, we propose four approaches: (1) focusing on arguments and conditional statements, we define a hybrid approach that uses CP techniques to guide SB-STDG in reducing its search space; (2) focusing on conditional statements and using CP techniques, we define two new metrics that measure the difficulty of satisfying a branch, from which we derive two new fitness functions to guide SB-STDG; (3) focusing on conditional statements and using schema theory, we tailor the genetic algorithm to better fit the problem of test-data generation; (4) focusing on arguments, conditional statements, constants, data members, methods, and relationships, and using lightweight static analyses, we define an instance generator that generates relevant test-data candidates and a new representation of the problem of object-oriented test-data generation that implicitly reduces the SB-STDG search space.

We show that using static analyses improves the efficiency and effectiveness of SB-STDG. The results achieved in this dissertation show important improvements in terms of effectiveness and efficiency. They are promising, and we hope that further research in the field of test-data generation might further improve efficiency or effectiveness.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
RÉSUMÉ
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
  1.1 Problem and Motivation
  1.2 Thesis
  1.3 Contributions
  1.4 Organization of the Dissertation

I Background

CHAPTER 2 AUTOMATIC TEST DATA GENERATION
  2.1 White-box Testing
    2.1.1 Control Flow Graph (CFG)
    2.1.2 Coverage Criterion
    2.1.3 Data Flow Analysis
  2.2 Search Based Software Test Data Generation (SB-STDG)
    2.2.1 Fitness Function