Genetic Improvement of Software for Multiple Objectives

GENETIC IMPROVEMENT OF SOFTWARE From Program Landscapes to the Automatic Improvement of a Live System saemundur oskar haraldsson Doctor of Philosophy Institute of Computing Science and Mathematics University of Stirling May 2017 [ 18th October 2017 at 10:37 ] DECLARATION I, Saemundur Oskar Haraldsson, hereby declare that all material in this thesis is my own original work except where reference has been given to other authors. The work presented here has not been submitted for the award of any other degree at the University of Stirling nor any other Institute. Stirling, May 2017 Saemundur Oskar Haraldsson ii [ 18th October 2017 at 10:37 ] ABSTRACT In today’s technology driven society, software is becoming increasingly important in more areas of our lives. The domain of software extends beyond the obvious domain of computers, tablets, and mobile phones. Smart devices and the internet-of-things have inspired the integra- tion of digital and computational technology into objects that some of us would never have guessed could be possible or even necessary. Fridges and freezers connected to social media sites, a toaster activated with a mobile phone, physical buttons for shopping, and verbally asking smart speakers to order a meal to be delivered. This is the world we live in and it is an exciting time for software engineers and computer scientists. The sheer volume of code that is currently in use has long since outgrown beyond the point of any hope for proper manual maintenance. The rate of which mobile application stores such as Google’s and Apple’s have expanded is astounding. The research presented here aims to shed a light on an emerging field of research, called Genetic Improvement (GI) of software. It is a methodology to change program code to improve existing software. This thesis details a framework for GI that is then applied to explore fitness landscape of bug fixing Python software, reduce execution time in a C++ program, and integrated into a live system. We show that software is generally not fragile and although fitness landscapes for GI are flat they are not impossible to search in. This conclusion applies equally to bug fixing in small programs as well as execution time improvements. The framework’s application is shown to be transportable between programming languages with minimal effort. Additionally, it can be easily integrated into a system that runs a live web service. The work within this thesis was funded by EPSRC grant EP/J017515/1 through the DAASE project. 1 1 http://daase.cs.ucl.ac.uk/ iii [ 18th October 2017 at 10:37 ] To my wife and children iv [ 18th October 2017 at 10:37 ] ACKNOWLEDGMENTS During the four years of my studies at the University of Stirling I have been fortunate to benefit from discussions with a large number of people. I will never be able to include each of them personally in my thanks and so I extend my gratitude to all I have had the pleasure to exchange ideas with. First I would like to thank my supervisory team for a very fruitful partnership, especially John which I consider my close friend. Sandy has contributed no less to my progress even though he joined the team half-way through. David Cairns gets my thanks for insightful and well thought out observations every time we meet. Edmund Burke was responsible for bringing me to Scotland and for that I am truly thankful. I am also very thankful for having had the opportunity to work with the other original PI’s of the DAASE project Mark Harman, John Clark, and Xin Yao. Every discussion (social and technical) with any one of you had an impact on my work and career. Additionally I have benefited for getting to know Earl Barr, the newly appointed PI of the DAASE project. Our discussions when you came to Stirling and while we had our day of adventure in Buenos Aires have helped my research and career directions after the PhD. Much of the work in this thesis had not been possible without brilliant collaborators. Janus Rehabilitation and the Icelandic Heart Association have provided me with the material to experiment with. Albert V. Smith has my thanks for good ideas and giving me a very short lesson in genetics. I have been extremely lucky by having the full trust of Kristín and Vilmundur. Half of the contents of this thesis would not have been realised if it were not for their complete faith in me. Within DAASE, I have had the extreme pleasure to work with one of my long time idols, Bill Langdon. Every time we meet I am reminded why I want to stay in academia. Justyna Petke and David White, I thank you for our collaborations, ideas, and most of all friendship. There is never a dull moment when we meet, in conferences and events. Everyone else on DAASE has my thanks as well but there are just to many to list. I would like to thank the other PhD students, past and present, in particular: Jason, Kevin, Annan, Ken, Paul, and Sarah thanks for sharing coffee, interesting conversations, laughter, and fun whenever there was need. My parents and brother are admirable for putting up with me before Heiða took me off their hands: I hope I make you proud. My in-laws, Brynjólfur and Kristín, have a special place in my thoughts and I will be forever grateful for their support. v [ 18th October 2017 at 10:37 ] Lastly and most importantly I have everything to thank for my wonderful wife, Heiða, ég elska þig. This thesis along with all the research and work that led to its submission would not have been possible without you. Your patience, support, and encouragements have inspired and helped me immensely. My children, Guðný Birna and Brynjólfur Kristinn, deserve compliments for their endurance in waiting for their dad to finish the thesis. vi [ 18th October 2017 at 10:37 ] LISTOFPUBLICATIONS The following publications present work that was used in this thesis: • Saemundur O. Haraldsson and John R. Woodward Automated design of algorithms and genetic improvement: contrast and commonalities. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation 2014 Jul 12 (pp. 1373-1380). ACM. This postition paper served as a basis for the literature review. • Saemundur O. Haraldsson and John R. Woodward Genetic Improvement of Energy Usage is only as Reliable as the Measurements are Accurate. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation 2015 Jul 11 (pp. 821-822). ACM. This postition paper is referenced in the literature review and forms a large part of the energy optimisation section. • Saemundur O. Haraldsson, John R. Woodward, Alexander E.I. Brownlee and David Cairns Exploring Fitness and Edit Distance of Mutated Python Programs. In European Conference on Genetic Programming 2017 Mar 30 (pp. ?-?). Springer International Publishing. This paper is the underlying publication for the work in Chapter 6. • Saemundur O. Haraldsson, Ragnheidur Dora Brynjolfsdottir, John R. Woodward, Kristin Siggeirsdottir, and Vilmundur Gudnason. The Use of Predictive Models in a Dynamic Planning of Treatment. In Proceedings - IEEE Symposium on Computers and Commu- nications, Heraklion, Greece, 2017. IEEE. The work of this paper is used in Chapter 8 • Saemundur O. Haraldsson, John R. Woodward, Alexander E.I. Brownlee, and Kristin Siggeirsdottir. Fixing Bugs in Your Sleep: How Genetic Improvement Became an Overnight Success. In Proceedings of the 2017 Conference Companion on Genetic and Evolutionary Computation Companion, Berlin, Germany, 2017. ACM. This paper forms the work of Chapter 8 • Saemundur O. Haraldsson, John R. Woodward, Alexander E.I. Brownlee, Albert V Smith, and Vilmundur Gudnason. Genetic Improvement of Runtime and its fitness landscape vii [ 18th October 2017 at 10:37 ] in a Bioinformatics Application. In Proceedings of the 2017 Conference Companion on Genetic and Evolutionary Computation Companion, Berlin, Germany, 2017. ACM. This paper is the basis for Chapter 7 • Saemundur O Haraldsson, John R Woodward, and Alexander I E Brownlee. The Use of Automatic Test Data Generation for Genetic Improvement in a Live System. In 8th International Workshop on Search-Based Software Testing, Buones Aires, 2017. ACM. This paper is the predecessor of the main paper for Chapter 8 The following papers were written while conducting the research for this thesis. They are cited and mentioned but do not contribute to the main narrative. • Kristin Siggeirsdottir, Ragnheidur Dora Brynjolfsdottir, Saemundur Oskar Haraldsson, Sigurdur Vidar, Emanuel Geir Gudmundsson, Jon Hjalti Brynjolfsson, Helgi Jonsson, Omar Hjaltason, and Vilmundur Gudnason. Determinants of outcome of vocational rehabilitation. Work, 55(3):577–583, nov 2016. • Kocsis ZA, Neumann G, Swan J, Epitropakis MG, Brownlee AE, Haraldsson SO, Bowles E. Repairing and optimizing Hadoop hashCode implementations. In International Symposium on Search Based Software Engineering 2014 Aug 26 (pp. 259-264). Springer International Publishing. viii [ 18th October 2017 at 10:37 ] CONTENTS i introduction1 1 motivationsforimprovingprogramsautomatically 2 1.1 Introduction . 2 1.2 Genetic Improvement . 3 1.3 Thesis Structure . 4 2 hypotheses 5 2.1 Introduction . 5 2.2 Central Hypothesis . 5 2.2.1 Small Programs’ Landscapes . 5 2.2.2 Execution Time Improvements . 6 2.2.3 Dynamic Adaptive Software . 7 2.3 Summary . 7 ii literature review8 3geneticimprovementofsoftware 9 3.1 Introduction . 9 3.2 Representation of Changes to Programs . 10 3.2.1 Whole Programs . 11 3.2.2 Patches . 15 3.2.3 Programming Languages Targeted with GI . 20 3.2.4 Non-Evolutionary Representations . 24 3.3 Search Methodologies used for GI . 25 3.3.1 Genetic Programming . 25 3.3.2 Genetic Algorithm . 27 3.3.3 Other Search Methods .

Genetic Improvement of Software for Multiple Objectives

A Specification-Oriented Semantics for the Refinement of Real-Time Systems

Removing Redundant Refusals: Minimal Complete Test Suites for Failure Trace Semantics

A Comprehensive Survey of Trends in Oracles for Software Testing

Interval Temporal Logic

Issue 2005 2 June 2005

Interval Temporal Logic

Community Engagement

June 2005 a C T S

Formal Specification of Scientific Applications Using Interval

The Developer Factor in Software Privacy

A Model-Driven Architecture Based Evolution Method and Its Application in an Electronic Learning System

Set-Based Models for Cryptocurrency Software