Augmenting Type Inference with Lightweight Heuristics

Augmenting Type Inference with Lightweight Heuristics Inauguraldissertation der Philosophisch-naturwissenschaftlichen Fakultät der Universität Bern vorgelegt von | downloaded: 29.9.2021 Nevena Lazarevic´ von Serbien Leiter der Arbeit: Prof. Dr. O. Nierstrasz Institut für Informatik Universität Bern Von der Philosophisch-naturwissenschaftlichen Fakultät angenommen. Der Dekan: Bern, 21.06.2017. Prof. Dr. Gilberto Colangelo https://doi.org/10.24442/boristheses.846 source: This dissertation can be downloaded from scg.unibe.ch. Copyright ©2017 by Nevena Lazarevic´ This work is licensed under the terms of the Creative Commons Attribution – Noncommercial-No Derivative Works 2.5 Switzerland license. The license is available at http://creativecommons.org/licenses/by-sa/2.5/ ch/ Attribution–ShareAlike To my parents who always support my every decision. To my husband who always understands me. Mojim roditeljima koji me uvek u svemu podržavaju. Mom suprugu koji uvek ima razumevanja za mene. Acknowledgements First of all, I would like to thank Oscar Nierstrasz, for giving me the opportunity to work as a part of the Software Composition Group. I am eternally thankful to him for guiding me through the PhD life, for encouraging the research I was doing, and for giving desperately needed advice whenever necessary. I really enjoyed being a member of SCG, and solving every one of the coffee puzzles! I am very grateful to Stéphane Ducasse, first and foremost for agreeing to be on my PhD committee, and for providing me with his comments on my work. I thank Gerhard Jäger for chairing the PhD committee. A big thank you to all of the guys from SCG, who shared their time with me. Thank you, Mircea, for making every day shiny, joyous and colourful, even when it poured like hell. :) Thank you, Claudio, for finding time to read my work numerous times, and especially for sharing the office with me for, hm, eight months. :) Thank you, Haidar, for being a good friend, giving me tips for my travels and singing with me. Thank you, Boris, for all of our discussions in Serbian. Thank you, Andrei, for always asking how my research is going. :) Thank you, Andrea, for always smiling and being positive. Thank you, Jan, for always having a calm word and a piece of advice. Thank you, Leonel, for your precious advice regarding my trips to Chile and Argentina. Thank you, Mohammad, for always forcing me to perform better, even when I did not believe it was possible. :) Thank you, Yuriy, for all the laugh during coffee breaks. Thank you, Manuel, for always being ready for a cup of coffee. I also thank Erwann for the advice he shared with me at the beginning of my PhD. And, thank you, Cédric and Oliver for being my officemates for a couple of months. I am very thankful to Iris for her huge help during my PhD journey, and all the nice chats we had at SCG. She made all the administrative stuff so easy! I also thank David Röthlisberger and Romain Robbes for allowing me to visit and work for a month in Chile. It was an incredible experience! i One thank you to Clément Béra, for his collaboration and help regarding some of the work presented in this thesis. I want to use this opportunity to thank my parents, who always believe in me and support me. I hope I made them proud. An enormous thank you to my husband for standing (by) me even when I was unbearable (who hasn’t been like that at least once during PhD?!), and for always giving me courage and support. I want to thank all of my friends from Caˇ cakˇ and Belgrade for always being there for me, and for helping home feel like home: Neda Niketic,´ Kaca´ Petrovic,´ Ana Gutic,´ Isidora Cpajak,ˇ Jelena Petronijevic,´ Jovana Pavlovic,´ Mladen Pavlovic,´ Jelena Ilic,´ Nina Radojiciˇ c,´ Ivana Milovic,´ Stefan Miškovic´ and Ðorde¯ Bojovic.´ Big thank you to all of my friends in Zürich for making Switzerland feel like home: Anja and Filip (and for all the board-games-playing weekends we had), Džimi and Alica, Markovic´ and Irena. I am very thankful to my friends in Paris, Igor and Alina, for making my every trip to Paris enjoyable and for making me wish to be a student there again. I thank all the people who directly, or indirectly, contributed to this thesis, and made it possible. Let the journey begin! Audaces fortuna iuvat. April 03, 2017 Nevena Lazarević ii Abstract Static type information facilitates program comprehension and analysis. Yet, such information is absent in dynamically-typed languages, and that increases the time needed for software maintenance. Type inference algorithms may provide type information to developers, but in order to be fast and assist in development phase, they sacrifice the precision for speed. One of the biggest obstacles for their precision is polymorphism presence. In this thesis we first analyse the prevalence of polymorphism in object- oriented software, to assess the criticality it imposes on simple type inference. We find that polymorphism is omnipresent in object-oriented code, and that static analysis in dynamically-typed languages is also hampered by the usage of cross-hierarchy polymorphism, i.e., duck typing. As this big obstacle for static code analysis cannot be bypassed, we propose the need for lightweight heuristics to tackle the problem of imprecision of simple type inference algorithms. Four lightweight heuristics are employed to improve the performance of two simple and fast type inference approaches. These heuristics are founded on the source code and run-time information that are easy to collect without interrupting the workflow of a developer. The heuristics are evaluated and compared with the underlying algorithms based on their inference time and precision. All of them show a significant improvement when compared to the basic algorithm. They in- troduce a negligible overhead on the inference time, thus we deem them usable during regular coding tasks. iii iv Contents 1 Introduction1 1.1 Thesis statement.......................5 1.2 Contributions........................6 1.2.1 A large-scale software study on the prevalence of polymorphism in statically and dynamically-typed languages......................6 1.2.2 Lightweight heuristics for improving simple type inference algorithms................7 1.3 Outline........................... 10 2 State of the art 11 2.1 Gradual typing....................... 11 2.2 Optional typing....................... 13 2.3 Static type inference.................... 14 2.4 Dynamic type inference................... 17 2.4.1 Code instrumentation................ 17 2.4.2 Inline caches.................... 18 2.5 Other techniques...................... 19 3 Study of polymorphism prevalence 21 3.1 Introduction......................... 21 3.2 Related Work........................ 26 3.3 Terminology......................... 27 3.4 Experimental Setup..................... 32 3.4.1 Data processing................... 33 3.4.2 Data analysis.................... 33 3.5 Experimental Results.................... 34 3.5.1 Implementing polymorphism............ 34 3.5.2 Using polymorphism................ 35 v 3.5.3 Cardinality of polymorphic message sends.... 37 3.5.4 Implementing duck typing............. 39 3.5.5 Using duck typing................. 40 3.6 Threats to Validity...................... 41 3.7 Discussion.......................... 43 3.8 Conclusion......................... 45 4 Static class usage frequency heuristics 47 4.1 Introduction......................... 47 4.2 Overview.......................... 49 4.3 Heuristics and Approaches................. 53 4.3.1 Terminology.................... 53 4.3.2 Heuristics...................... 57 4.3.3 Assigned types vs. selector types.......... 60 4.3.4 Approaches..................... 61 4.4 Evaluation.......................... 62 4.4.1 Class instantiation heuristic............ 64 4.4.2 Name occurrence heuristic............. 70 4.4.3 Comparison with EATI............... 71 4.5 Discussion and threats to validity.............. 74 4.6 Conclusion and future work................. 76 5 Mining inline caches for class usage 77 5.1 Introduction......................... 77 5.2 Gathering of dynamic type information.......... 79 5.2.1 Execution of message sends............ 79 5.2.2 Run-time type information gatherer built..... 80 5.3 Type inference algorithm.................. 81 5.3.1 Dynamic information................ 81 5.4 Evaluation.......................... 82 5.4.1 Inline caching type gathering............ 82 5.4.2 Projects used for evaluation............ 83 5.4.3 Overall results — Hierarchy-Based approach... 83 5.4.4 Difference between the basic algorithm and ICTI. 85 5.4.5 Not guessed variables............... 86 5.4.6 Position of the correct type............. 89 5.4.7 Overall results — Class-Based approach...... 90 5.5 Threats to validity...................... 91 5.6 Conclusion......................... 92 vi 6 Exploiting Type Hints in Method Argument Names 93 6.1 Introduction......................... 93 6.2 Motivation.......................... 95 6.3 Algorithm.......................... 98 6.3.1 The Cartesian Product Algorithm......... 98 6.3.2 Type hints from method argument names..... 102 6.3.3 Upgraded CPA — CPA∗ .............. 103 6.4 Implementation....................... 103 6.4.1 CPA......................... 103 6.4.2 Type Hints in Smalltalk.............. 104 6.5 Evaluation.......................... 106 6.5.1 Argument names without inferred type hint.... 110 6.6 Threats to validity...................... 112 6.7 Conclusion......................... 113 7 Conclusion 115 7.1 Contributions........................ 116 7.1.1

Load more