Synthesis Via Partial Exploration and Concise Representations

Fakultat¨ fur¨ Informatik Technische Universitat¨ Munchen¨ A Learning Twist on Controllers: Synthesis via Partial Exploration and Concise Representations Pranav Ashok VollständigerAbdruck der von der FakultätfürInformatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktor der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Prof. Dr. Helmut Seidl Prüfendeder Dissertation: 1. Prof. Dr. rer. nat. Jan Kˇret´ınský 2. Prof. Kim Guldstrand Larsen R, Aalborg University, Denmark 3. Prof. Dr.-Ing. Matthias Althoff Die Dissertation wurde am 28.09.2020 bei der Technischen UniversitätMünchen eingereicht und durch die FakultätfürInformatik am 10.02.2021 angenommen. Abstract Quantitative model checking and controller synthesis deal with the task of synthesiz- ing dependable controllers satisfying certain specifications for probabilistic and hybrid systems. However, commonly used techniques are susceptible to the curse of dimension- ality. That is, when the number of state variables increase, the size of the state space grows exponentially. Further, even if this problem is handled and controllers are synthesized successfully, the resulting controllers are typically large and incomprehensible. This thesis investigates techniques to mitigate these two problems. State-space explosion. We present techniques based on partial exploration for -optimal controller synthesis of discrete- and continuous-time Markov decision processes. Our ap- proach makes use of simulations to identify a small part of the state space, sufficient to synthesize a controller with the desired guarantees. For Markov decision processes with the reachability objective, we combine the guaranteed simulation-based technique of bounded real-time dynamic programming, and the extremely scalable game solving technique of Monte Carlo tree search. The resulting algorithm benefits from the best of both worlds. For continuous-time Markov decision processes with the time-bounded reachability objective (TBR), we use simulations to obtain under- and over-approximating sub- models. On them, we run existing TBR algorithms to obtain lower and upper bounds on the optimal TBR value. The sub-model is expanded with the help of more simulations until corresponding bounds sufficiently converge, allowing to extract an -optimal controller. Large and incomprehensible controllers. We propose the use of decision trees to represent controllers synthesized from probabilistic and hybrid systems. Decision tree learning can exploit internal structures of these controllers to group sets of states with the same control action. We extend standard decision tree learning algorithms with the ability to represent controllers with correctness guarantees without losing these guarantees. Based on this, we develop two tools. First, we extend the state-of-the-art control synthesis tool Uppaal Stratego to produce safe, near-optimal, and small controllers for hybrid Markov decision processes. We introduce a novel safe pruning technique that allows one to explore the size-optimality trade-off without losing safety guarantees of the original controller. Next, we introduce the dtControl toolkit, which can convert any non-randomized memoryless controller available in the form of a lookup table into a decision tree. We also present a novel determinization strategy that can locally determ- inize parts of a permissive controller in order to obtain significant reductions in the size of the resulting decision tree. We demonstrate our results on controllers obtained from the hybrid systems tool SCOTS as well as Uppaal Stratego. iii Zusammenfassung Quantitative Modellüberprüfungund Reglersynthese beschäftigensich mit der Aufgabe, zuverlässigeRegler zu synthetisieren, die bestimmte Spezifikationen fürprobabilistische und hybride Systeme erfüllen. Häufigverwendete Techniken leiden jedoch unter dem sogenannten Fluch der Dimensionalität.Das heißt die Größedes Zustandsraums wächst exponentiell mit der Anzahl der Zustandsvariablen. Außerdem, selbst wenn, trotz dieses Problems, Regler erfolgreich synthetisiert werden, sind die resultierenden Regler normalerweise groß und unverständlich. In dieser Arbeit werden Techniken untersucht, die diese beiden Probleme mildern. Zustandsraumexplosion. Wir stellen Techniken vor, die auf einer partiellen Erkundung füreine -optimale Reglersynthese von zeitdiskreten und zeitkontinuierlichen Markov- Entscheidungsprozessen basieren. Unser Ansatz nutzt Simulationen, um einen kleinen Teil des Zustandsraums zu identifizieren, der ausreicht, um einen Regler mit den ge- wünschten Garantien zu synthetisieren. FürMarkov-Entscheidungsprozesse mit dem Ziel der Erreichbarkeit kombinieren wir die garantierte simulationsbasierte Technik der bes- chränktendynamischen Echtzeitprogrammierung und die extrem gut skalierbare Spiele- Lösungstechnik der Monte-Carlo-Baumsuche. Der resultierende Algorithmus profitiert vom Besten aus beiden Welten. Fürzeitkontinuierliche Markov-Entscheidungsprozesse mit dem zeitgebundenen Erreichbarkeitsziel (TBR, aus dem Englischen "time-bounded reachability") verwenden wir Simulationen, um unter- und überapproximierende Teilm- odelle zu erhalten. Auf diesen lassen wir existierende TBR-Algorithmen laufen, um un- tere und obere Grenzen fürden optimalen TBR-Wert zu erhalten. Das Teilmodell wird mit Hilfe weiterer Simulationen erweitert, bis die entsprechenden Grenzen ausreichend konvergieren, so dass ein -optimaler Regler extrahiert werden kann. Kleine und erklärbare Controller. Wir schlagen vor, Entscheidungsbäumezu nutzen, um Regler darzustellen, die aus probabilistischen und hybriden Systemen synthetisiert wurden. Das Lernen von Entscheidungsbäumenkann interne Strukturen dieser Reg- ler ausnutzen, um Gruppen von Zuständenmit derselben Kontrollaktion zu vereinen. Wir erweitern die Standardalgorithmen zum Lernen von Entscheidungsbäumenum die Fähigkeit, Regler mit Korrekheitsgarantien darzustellen, ohne dabei diese Garantien zu verlieren. Auf dieser Grundlage entwickeln wir zwei Programme. Erstens erweitern wir das dem Stand der Technik entsprechende Regelungssynthese-Programm Uppaal Stratego um sichere, nahezu optimale und kleine Regler fürhybride Markov-Entscheid- ungsprozesse bereitzustellen. Wir führeneine neue Technik namens sicheres Stutzen ein, die es erlaubt, das Spannungsfeld zwischen Größeund Optimalitätzu erforschen, ohne die Korrektheitsgarantie des ursprünglichen Reglers zu verlieren. Als nächstes stellen wir v Zusammenfassung das Programm dtControl vor, das jeden nicht randomisierten, gedächtnislosen Control- ler, der in Form einer Nachschlagetabelle verfügbarist, in einen Entscheidungsbaum umwandeln kann. Außerdem stellen wir eine neuartige Bestimmungsstrategie vor, mit der Teile eines nicht deterministischen Reglers lokal bestimmt werden können,um die Größedes resultierenden Entscheidungsbaums deutlich zu reduzieren. Wir demonstri- eren unsere Ergebnisse an Reglern, die mit dem Programm fürhybride Systeme SCOTS sowie Uppaal Stratego gewonnen wurden. vi Acknowledgements First and foremost, I would like to thank my doctoral advisor, Jan, for all the support and guidance he has provided me. I am incredibly grateful for the belief and trust he has placed in me that has encouraged me to take up roles of responsibility. It has always been a pleasure to discuss with him, both work and life. I would like to express my sincere gratitude to Prof. Kim G. Larsen, for hosting me at Aalborg University for two wonderful research stays. I consider his infectious enthusiasm and excitement entirely responsible for driving my work on decision trees. I always look forward to meeting and brainstorming with Kim. Further, I would like to thank Prof. TomásBrázdilfor inviting me to Brno for my first research stay abroad. It was delightful working with you and Ondrej, not to forget the great lunches! I am thankful to all my co-authors, Adrien Le Coënt, Christoph H. Lampert, Hol- ger Hermanns, Jakob Haahr Taankvist, Krishnendu Chatterjee, Majid Zamani, Mathias Jackermeier, Przemyslaw Daca, Pushpak Jagtap, Stefanie Mohr, Tobias Winkler, Vahid Hashemi, Viktor Toman, and Yuliya Butkova for the great discussions and fruitful col- laborations. My doctoral studies and my stay in Germany wouldn't have been possible without the generous scholarship provided by the IGSSE Project \10.06 PARSEC" and employment funded by DFG Project \Statistical Unbounded Verification”. I would like to thank the whole IGSSE team, especially Marco Barden and Jo-Anna Küster,who have been amazingly supportive during the far too many times I approached them. I will never forget the immense support provided by Agnieszka Slota of TUM Graduate School during the deportation scare I had. I would like to thank Javier Esparza, our chair, whom I look up to for his patience and ability to calmly deal with challenging situations. Further, I would like to thank all my I-7 colleagues for the fun Mensa lunches and discussions. I cannot sufficiently express my words of gratitude towards Maxi, colleague and friend, for being such a great work partner and going out of his way to help me every time I got myself into trouble. I really enjoyed collaborating with him, both professionally and personally. I am thankful to Bala, Christoph, Kush, Maxi, Shyamlal, and Tobi for proofreading various parts of this thesis and for their valuable feedback. Special thanks to my sister Cookies for pointing out all those bugs in my use of English, and not to

Synthesis Via Partial Exploration and Concise Representations

January 2011 Prizes and Awards

The Best Nurturers in Computer Science Research

Prizes and Awards

October 2008

Valiant Transcript Final with Timestamps

The 2011 Noticesindex

January 2011 Prizes and Awards

Novel Frameworks for Mining Heterogeneous and Dynamic

BEATCS No 115

Tree-Like Structure in Graphs and Embedability to Trees A

Most Prolific DBLP Authors

48Th International Colloquium on Automata, Languages, and Programming