Foundational Analysis Techniques for High-Level Transformation Programs
Total Page:16
File Type:pdf, Size:1020Kb
Foundational Analysis Techniques for High-Level Transformation Programs Ahmad Salim Al-Sibahi Advisors: Committee: Andrzej Wąsowski Riko Jacob, IT University of Copenhagen Aleksandar Dimovski Tijs van der Storm, University of Groningen Submitted: September 2017 Xavier Rival, CNRS/ENS/INRIA/PSL Research University A A yst dqÌt Hy wkyF A w rysA rb yl rb Progress is beyond improving what is; Progress is moving towards what will be. Gibran Khalil Gibran v Abstract High-level transformation languages like TXL and Rascal, simplify writing and maintaining transformations such as refactorings and optimizations, by providing specialized language constructs that include powerful pattern matching, generic traversals, and fixed- point iteration. This expressivity unfortunately makes it hard to reason about the semantics of these languages, and there exists few automated validation and verification tools for these languages. The goal of this dissertation is to develop foundational tech- niques and prototype tools for verifying transformation programs, based on techniques from traditional programming language re- search. At first, I present a systematic analysis of declarative high- level transformation languages, like QVT and ATL, seeking to clar- ify their expressiveness. The analysis showed that virtually all ex- amined languages were Turing-complete and thus could not be treated specially for verification. I describe a joint effort in designing and validating an indus- trial modernization project using the high-level transformation lan- guage TXL. The transformation was validated using the translation validation and symbolic execution, catching 50 serious bug cases in an otherwise well-tested expert-written transformation, and thus showing the need for verification tools in this domain. Using these experiences, I develop a formal symbolic execu- tion algorithm for a small transformation language capturing key high-level constructs. A proto-type white-box test generation tool was implemented using the algorithm, and evaluated on a series of refactorings and model transformations comparing favorably against baseline black-box test generator in terms of test coverage. I present the first formal operational semantics for a large sub- set of Rascal, called Rascal Light, which has been validated by for- mally proving a series of semantic properties regarding strong typ- ing, safety and termination. This creates a solid basis for further foundational work on high-level transformation languages. Finally, I develop a formal abstract interpreter for Rascal Light, with a modular abstract domain design and a Schmidt-style ab- stract operational semantics, adapting the idea of widening by trace memoization to work with inputs from infinite domains. The tech- nique is evaluated on real Rascal transformations, showing that it is possible to effectively verify target type and shape properties. vi Resumé Højniveau transformationssprog såsom TXL og Rascal gør det nem- mere at skrive og vedligeholde transformationer, f.eks. refaktore- ringer og optimeringer, ved at understøtte specielle sprogskon- struktioner inklusiv kraftfulde mønstersamsvarsoperationer (orig. pattern matching), generisk gennemgang af datastrukturer, og fiks- punktsiteration. Denne ekspresivitet gør det dog desværre svært at få en fuld forståelse for semantikken til disse sprog, og der eksiste- rer derfor kun få automatiske værktøjer der understøtter validering og verifikation af dem. Hovedformålet med denne afhandling er at udvikle fundamen- tale teknikker og værktøjsprototyper som understøtter verifikation af transformationsprogrammer, baseret på teknikker fra traditionel programmeringssprogsforskning. I starten, præsenterer jeg en sy- stematisk analyse af deklarative højniveau transformationssprog, såsom QVT og ATL, for at tydeliggøre deres ekspresivitet. Analy- sen viste at stort set alle undersøgte sprog var Turing-komplette og kunne derfor ikke blive særbehandlet i verifikationsøjemed. Jeg beskriver en fælles indsats hvis formål er at designe og va- lidere et industrielt moderniseringsprojekt ved brug af højniveau transformationssproget TXL. Transformationen var valideret ved brug af oversættelsesvalidering (orig. translation validation) og sym- bolsk eksekvering, og opdagede 50 seriøse programfejl i en ellers veltestet transformation skrevet af en ekspert, hvilket fremviser be- hovet for verifikationsværktøjer i dette område. Gørende brug af disse erfaringer, udvikler jeg en formel sym- bolsk eksekveringsalgoritme for et lille transformationssprog der modellerer de vigtigste højniveau konstruktioner. Jeg implemen- terede en white-box testgenereringsværktøjsprototype ved brug af algoritmen, og evaluerede denne vha. en række refaktoreringer og modeltransformationer, hvilket viste gode resultater i forhold til programgrensdækning sammenligned med black-box testgenere- ringsværktøjet som blev brug som basislinje. Jeg præsenterer den første formelle operationelle semantik for en stor delmængde af sproget Rascal, kaldet Rascal Light, hvilket er valideret ved at bevise en række semantiske egenskaber omkring stært typedisciplin, sikkerhed, og programafslutning. Dette lægger et solidt fundament for videre formelt arbejde omkring højniveau transformationssprog. vii Afslutningsvis, udvikler jeg en formel abstrakt fortolker til Ra- scal Light, med et tilhørende modulært abstrakt domæne og en abstrakt operationel semantik i stil med Schmidt, og tilpasser kon- ceptet om udvidelse (orig. widening) via memoisering af programs- por til at virke med input fra uendeligt store domæner. Teknikken evalueres ved brug af rigtige Rascal transformationsprogrammer, hvilket viser at det er muligt at effektivt verificere de egenskaber omkring typer og form som vi havde til mål. Foreword How long a time is three years? It is too long, you get to know your project so well that you become an expert in its extremely narrow sub- ject. It is too short, it ended at the moment you just started to know enough in the field to dare to tackle more complex, more interesting problems. Maybe, it is just enough time to earn a PhD. The thing I recall most is my failing attempts at solving the prob- lems I encountered, moreso than the successes that I present in this dissertation. Yet, these failures, remaining hidden, under the carpet, in my mind, are what makes research all the worthwile. For how can we otherwise make worthwile contributions, if there is nothing to dare; is it truly research if it can not fail, if it does not fail, if it will not fail again? Acknowledgements I would like to thank my family, who helped me with material and moral support, and especially during my PhD studies. I would like to thank the Danish society for funding my education throughout my life, all the way up to this point. This project in particular was supported mainly by The Danish Council for Independent Research under a Sapere Aude project grant no. 0602-02327B., VARIETE, and partially by ARTEMIS JU under grant agreement n◦295397 together with Danish Agency for Science, Technology and Innovation. I would like to thank my supervisors, Andrzej W ˛asowski and Alek- sandar Dimovski for comments, discussions and guidance during my studies, and for putting up with my occasional cloud-headedness. I would like thanks the rest of my collaborators, Claus Brabrand, Thomas Jensen, Alexandru Florin Iosif-Laz˘ar, Juha Erik Savolainen and Krzysztof Sierszecki, for their contributions to this project as well. I would like to ii Foreword thank the comittee, Riko Jacob, Xavier Rival and Tijs van der Storm, for evaluating this dissertation and providing valuable feedback. I would like to thank Thomas Jensen, again, and his group, Celtique, at INRIA Rennes, for the great welcome I received during my stay- abroad, and for the interesting discussions during the reading groups. I would like to thank Lars Birkedal and his group at Aarhus Univer- sity, and similarly Constantin Enea and Michaela Sighireanu and their group at Paris Diderot University (VII) for giving me the opportunity to present my work and discuss it during the two short visits I had during my PhD. I thank Karl Portratz for introducing our team to the industrial mod- ernization project at Danfoss, Rolf-Helge Pfeiffer for implementing the modernization transformation and Claus Brabrand for discussions on its design. I would like to thank Paul Klint and the Rascal team at CWI Amsterdam for answering my questions about the formal Rascal seman- tics in details and providing corrections on the corresponding report. I would like to thank Rasmus Ejlers Møgelberg for discussions about the correctness of the fixed-point domain abstraction theorem in the Rascal abstract interpreter. Formal Notation • For a binary relation r ⊆ A × A, r∗ is used to denote the reflexive- transitive closure, and similarly r+ for the transitive (non-reflexive) closure. • For a set A, }(A) is used to denote the power set. • For a function f 2 A ! B, dom f is used to represent all defined inputs i.e., fx j 9y.y = f (x)g, img f to represent all de- fined outputs fy j 9x.y = f (x)g, and graph f to represent the set f(x, f (x)) j x 2 dom f g. • For a function f 2 A ! B, f [a 7! b] is used to represent function updates, so that f [a 7! b](a) = b and f [a 7! b](a0) = f (a0) when a0 6= a. • Approximate (symbolic or abstract) semantic components, sets and operations are marked with a wide hat ba. • A sequence of es is represented by either marked by an underline e or explicitly summarized e1,..., en, the empty sequence is denoted by # and the concatenation