PDF hosted at the Radboud Repository of the Radboud University Nijmegen

The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/204194

Please be advised that this information was generated on 2021-09-28 and may be subject to change.

Nominal Techniques and Black Box Testing for Automata Learning
Joshua Moerman

Work in the thesis has been carried out under the auspices of the research school IPA (Institute for Programming research and Algorithmics)
Printed by Gildeprint, Enschede
Typeset using ConTEXt MKIV
ISBN: 978–94–632–3696–6
IPA Dissertation series: 2019-06
Copyright © Joshua Moerman, 2019
www.joshuamoerman.nl

Nominal Techniques and Black Box Testing for Automata Learning

Doctoral thesis to obtain the degree of doctor from Radboud University Nijmegen, on the authority of the Rector Magnificus prof. dr. J.H.J.M. van Krieken, according to the decision of the Council of Deans, to be defended in public on Monday 1 July 2019 at 16:30 precisely by Joshua Samuel Moerman, born on 1 October 1991 in Utrecht

Supervisors:
– prof. dr. F.W. Vaandrager
– prof. dr. A. Silva (University College London, United Kingdom)

Co-supervisor:
– dr. S.A. Terwijn

Members of the manuscript committee:
– prof. dr. B.P.F. Jacobs
– prof. dr. A.R. Cavalli (Télécom SudParis, France)
– prof. dr. F. Howar (Technische Universität Dortmund, Germany)
– prof. dr. S. Lasota (Uniwersytet Warszawski, Poland)
– dr. D. Petrișan (Université Paris Diderot, France)

Paranymphs:
– Alexis Linard
– Tim Steenvoorden

Samenvatting (summary in Dutch, translated)

Automata learning plays an increasingly important role in the verification of software. During learning, a learning algorithm explores the behaviour of software. In principle this is fully automatic, and the algorithm picks up interesting properties of the software by itself. This makes it possible to build a reasonably precise model of how the piece of software under scrutiny works. Bugs and unexpected behaviour of the software can be exposed in this way.

In this thesis we first look at techniques for test generation. These are needed to give the learning algorithm a helping hand: after automatically exploring behaviour, the learning algorithm formulates a hypothesis that does not yet model the software well enough. To refine the hypothesis and continue learning, we need tests. Efficiency is central here: we want to test as little as possible, because testing takes time. On the other hand, we do have to test completely: if there is a discrepancy between the learned model and the software, we want to be able to pinpoint it with a test.

In the first few chapters we show how testing of automata works. We give a theoretical framework for comparing various existing n-complete test generation methods. On this basis we describe a new, efficient algorithm. This new algorithm is central to an industrial case study in which we learn a model of complex printer software from Océ. We also show how one of the subproblems – distinguishing states with inputs that are as short as possible – can be solved efficiently.

The second theme of this thesis is the theory of formal languages and automata with infinite alphabets. This, too, is useful for automata learning. Software, and internet communication protocols in particular, often uses "identifiers", for example to distinguish different users. Preferably we assume infinitely many such identifiers, since we do not know how many are needed for learning the automaton.

We show how the learning algorithms can easily be generalised to infinite alphabets by using nominal sets. In particular, this allows us to learn register automata. We then develop the theory of nominal automata further. We show how these structures can be implemented efficiently, and we give a special class of nominal automata that have a much smaller representation. This could be used to learn such automata more quickly.

Summary

Automata learning plays an increasingly prominent role in the field of software verification. Learning algorithms are able to automatically explore the behaviour of software. By revealing interesting properties of the software, these algorithms can create models of the otherwise unknown software. These learned models can, in turn, be inspected and analysed, which often leads to finding bugs and inconsistencies in the software.

An important tool which we need when learning software is test generation. This is the topic of the first part of this thesis. After the learning algorithm has learned a model and constructed a hypothesis, test generation methods are used to validate this hypothesis. Efficiency is key: we want to test as little as possible, as testing may take valuable time. However, our tests have to be complete: if the hypothesis fails to model the software well, we had better have a test which shows this discrepancy.

The first few chapters explain black box testing of automata. We present a theoretical framework in which we can compare existing n-complete test generation methods. From this comparison, we are able to define a new, efficient algorithm. In an industrial case study on embedded printer software, we show that this new algorithm works well for finding counterexamples for the hypothesis. Besides the test generation, we show that one of the subproblems – finding the shortest sequences to separate states – can be solved very efficiently.

The second part of this thesis is on the theory of formal languages and automata with infinite alphabets. This, too, is discussed in the context of automata learning. Many pieces of software make use of identifiers or sequence numbers. These are used, for example, in order to distinguish different users or messages. Ideally, we would like to model such systems with infinitely many identifiers, as we do not know beforehand how many of them will be used.

Using the theory of nominal sets, we show that learning algorithms can easily be generalised to automata with infinite alphabets. In particular, this shows that we can learn register automata. Furthermore, we deepen the theory of nominal sets. First, we show that, in a special case, these sets can be implemented in an efficient way. Second, we give a subclass of nominal automata which allow for a much smaller representation. This could be useful for learning such automata more quickly.

Acknowledgements

Foremost, I would like to thank my supervisors. Having three of them ensured that there were always enough ideas to work on, theory to understand, papers to review, seminars to attend, and chats to have.

Frits, thank you for being a very motivating supervisor, pushing creativity, and being only a few meters away. It started with a small puzzle (trying a certain test algorithm to help with a case study), which was a great, hands-on start of my Ph.D. You introduced me to the field of model learning in a way that showcases both the theoretical and practical aspects.

Alexandra, thanks for introducing me to abstract reasoning about state machines, the coalgebraic way. Although not directly shown in this thesis, this way of thinking has helped and you pushed me to pursue clear reasoning. Besides the theoretical things I’ve learned, you have also taught me many personal lessons inside and outside of academia; thanks for inviting me to London, Caribbean islands, hidden cocktail clubs, and the best food.
And thanks for leaving me with Daniela and Matteo, who introduced me to nominal techniques, while you were on sabbatical.

Bas, thanks for broadening my understanding of the topics touched upon in this thesis. Unfortunately, we have no papers together, but the connections you showed to logic, computational learning, and computability theory have influenced the thesis nevertheless. I am grateful for the many nice chats we had.

I would like to thank the members of the manuscript committee, Bart, Ana, Falk, Sławek, and Daniela. Reading a thesis is undoubtedly a lot of work, so thank you for the effort and feedback you have given me. Thanks, also, to the additional members coming to Nijmegen to oppose during the defence, Jan Friso, Jorge, and Paul.

On the first floor of the Mercator building, I had the pleasure of spending four years with fun office mates. Michele, thanks for introducing me to the Ph.D. life, by always joking around. Hopefully, we can play a game of Briscola again. Alexis, many thanks for all the tasty proeverijen, whether it was beers, wines, poffertjes, kroketten, or anything else. Your French influences will be missed. Niels, thanks for the abstract nonsense and bashing on politics.

Next to our office was the office with Tim, with whom I had the pleasure of working from various coffee houses in Nijmegen. Further down the corridor, there was the office of Paul and Rick. Paul, thanks for being the kindest colleague I’ve had and for inviting us to your musical endeavours. Rick, thanks for the algorithmic sparring, we had a great collaboration. Was there a more iconic duo on our floor? A good contender would be Petra and Ramon. Thanks for the fun we had with ioco, together with Jan and Mariëlle. Nils, thanks for steering me towards probabilistic things and opening a door to Aachen. I am also very grateful to Jurriaan for bringing back some coalgebra and category theory to our floor, and