When and How Java Developers Give up Static Type Safety

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by RERO DOC Digital Library When and How Java Developers Give Up Static Type Safety Doctoral Dissertation submitted to the Faculty of Informatics of the Università della Svizzera Italiana in partial fulfillment of the requirements for the degree of Doctor of Philosophy presented by Luis Mastrangelo under the supervision of Prof. Matthias Hauswirth and Prof. Nathaniel Nystrom June 2019 Git Commit d3fcf49 Dissertation Committee Prof. Antonio Carzaniga Università della Svizzera Italiana, Switzerland Prof. Gabriele Bavota Università della Svizzera Italiana, Switzerland Prof. Jan Vitek Northeastern University & Czech Technical University Prof. Hridesh Rajan Iowa State University Dissertation accepted on 13 June 2019 Research Advisor Research Advisor Prof. Matthias Hauswirth Prof. Nathaniel Nystrom Ph.D. Program Director Ph.D. Program Director Prof. Walter Binder Prof. Olaf Schenk i I certify that except where due acknowledgement has been given, the work presented in this thesis is that of the author alone; the work has not been submitted previously, in whole or in part, to qualify for any other academic award; and the content of the thesis is the result of work which has been carried out since the official commencement date of the approved research program. Luis Mastrangelo Lugano, 13 June 2019 ii To Ankur Singhal, In Memoriam iii iv “It is not only the violin that shapes the violinist, we are all shaped by the tools we train ourselves to use, and in this respect programming languages have a devious influence: they shape our thinking habits.” – Edsger W. Dijkstra, 2001, To the members of the Budget Councila ahttp://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html v vi Abstract The main goal of a static type system is to prevent certain kinds of errors from happening at run time. A type system is formulated as a set of constraints that gives any expression or term in a program a well-defined type. Besides detecting these kinds of errors, a static type system can be an invaluable maintenance tool, can be useful for documentation purposes, and can aid in generating more efficient machine code. However, there are situations when the developer has more information about the program that is too complex to explain in terms of typing constraints. To that end, programming languages often provide mechanisms that make the typing constraints less strict to permit more programs to be valid, at the expense of causing more errors at run time. These mechanisms are essentially two: Unsafe Intrinsics and Reflective Capabilities. We want to understand how and when developers give up these static constraints. This knowledge can be useful as: a) a recommendation for current and future language designers to make informed decisions, b) a reference for tool builders, e.g., by providing more precise or new refac- toring analyses, c) a guide for researchers to test new language features, or to carry out controlled programming experiments, and d) a guide for developers for better practices. In this dissertation, we focus on the Unsafe API and cast operator—a subset of unsafe intrinsics and reflective capabilities respectively—in Java. We report two empirical studies to understand how these mechanisms— Unsafe API and cast operator—are used by Java developers when the static type system becomes too strict. We have devised usage patterns for both the Unsafe API and cast operator. Usage patterns are recurrent programming idioms to solve a specific issue. We believe that having usage patterns can help us to better categorize use cases and thus understand how those features are used. vii viii Acknowledgements First of all, I want to thank both my advisors, Prof. Matthias Hauswirth and Prof. Nate Nystrom. Their expertise helped me to become a better researcher. They taught me how to ask the proper questions when facing a problem, that not only will help me in my professional career, but in everyday life. In our Unsafe study, Luca Ponzanellli, Andrea Mocci and Prof. Michele Lanza collaborated with us to correlate Unsafe comments in the Questions and Answers database, Stack Overflow, mentioned in Chapter 3. Thanks to them. Special thanks to Max Schäfer from Semmle. He executed our QL queries in the entire lgtm database used in Chapter 4. Without his help, this thesis would not have been possible. Prof. Antonio Carzaniga and Prof. Gabriele Bavota made valuable comments during the Ph.D. proposal. They contributed to find an appropriate sample size in the cast study. My group colleagues, Mohammad, Dmitri, Andrea, and Igor. With whom we had uncountable talks during my Ph.D. Special mention for Amanj and Nosheen, I learned a lot in our coffee breaks. Thank you guys. On the personal side, Vanesa, Cinzia, Mariano, Esteban, Dino, Olaf, Di- mos, Jeff, Marisa, Rali, Gabi, Tanya, Dilly, Marie-line, Lea, Celine, Stefania, Carmen and Grace who make me feel I found a family in Lugano. My flat mate and dearest friend, Ankur Singhal, I wish you were here to see this moment. The Singhal family, Anil, Anita, and Ankita. I appreciate your support in the toughest moments. And last but not least, I want to thank my family, for their unconditional support, and for always believing in me. ix x Contents Contents xi List of Figures xv List of Tables xvii List of Listings xvii 1 Introduction 1 1.1 Beyond Static Type Checking . .2 1.2 Research Question . .4 1.3 Thesis Outline . .6 2 Literature Review 7 2.1 Benchmarks and Corpora . .8 2.2 Tools for Mining Software Repositories . .9 2.3 Empirical Studies of Large Codebases . 11 2.3.1 Unsafe Intrinsics in Java . 13 2.3.2 Reflective Capabilities . 14 2.4 Conclusions . 17 3 Empirical Study on the Unsafe API 19 3.1 The Risks of Compromising Safety . 21 3.2 Is Unsafe Used? . 24 3.2.1 Which Features of Unsafe Are Actually Used? . 25 3.2.2 Beyond Maven Central . 29 3.3 Finding sun.misc.Unsafe Usage Patterns . 30 3.4 Usage Patterns of sun.misc.Unsafe ............... 32 3.4.1 Allocate an Object without Invoking a Constructor . 35 3.4.2 Process Byte Arrays in Block . 35 xi xii Contents 3.4.3 Atomic Operations . 36 3.4.4 Strongly Consistent Shared Variables . 36 3.4.5 Park/Unpark Threads . 37 3.4.6 Update Final Fields . 37 3.4.7 Non-Lexically-Scoped Monitors . 38 3.4.8 Serialization/Deserialization . 38 3.4.9 Foreign Data Access and Object Marshaling . 39 3.4.10 Throw Checked Exceptions without Being Declared 39 3.4.11 Get the Size of an Object or an Array . 40 3.4.12 Large Arrays and Off-Heap Data Structures . 40 3.4.13 Get Memory Page Size . 40 3.4.14 Load Class without Security Checks . 41 3.5 What is the Unsafe API Used for? . 41 3.6 Conclusions . 45 4 Empirical Study on the Cast Operator 47 4.1 Casts in Java . 48 4.2 Issues Developers have using Cast Operations . 52 4.3 Finding Cast Usage Patterns . 54 4.3.1 Corpus Analysis . 54 4.3.2 Is the Cast Operator used? . 55 4.3.3 Manual Detection of Cast Patterns . 56 4.3.4 Methodology . 56 4.4 Overview of the Sampled Casts . 58 4.5 Cast Usage Patterns . 60 4.5.1 Typecase .......................... 63 4.5.2 Equals ........................... 73 4.5.3 ParserStack ....................... 77 4.5.4 Stash ............................ 78 4.5.5 Factory .......................... 84 4.5.6 KnownReturnType ................... 86 4.5.7 Deserialization ..................... 88 4.5.8 NewDynamicInstance ................. 89 4.5.9 Composite ......................... 92 4.5.10 Family ........................... 93 4.5.11 CovariantReturnType ................. 95 4.5.12 FluentAPI . 97 4.5.13 UseRawType ....................... 99 4.5.14 RemoveWildcard .................... 101 xiii Contents 4.5.15 CovariantGeneric ................... 102 4.5.16 SelectTypeArgument .................. 103 4.5.17 GenericArray ...................... 105 4.5.18 UnoccupiedTypeParameter .............. 106 4.5.19 SelectOverload ..................... 108 4.5.20 SoleSubclassImplementation ............ 110 4.5.21 ImplicitIntersectionType ............... 111 4.5.22 ReflectiveAccessibility ................ 113 4.5.23 AccessSuperclassField ................. 114 4.5.24 Redundant ........................ 116 4.5.25 VariableSupertype ................... 118 4.5.26 ObjectAsArray ..................... 121 4.6 Discussion . 123 4.7 Conclusions . 126 5 Conclusions 127 5.1 Research Questions . 127 5.2 Java’s Evolution . 128 5.3 Limitations . 129 5.4 Future Work . 129 5.5 Lessons Learned . 131 5.6 Artifacts . 131 A Automatic Detection of Patterns using QL 133 A.1 Introduction to QL . 133 A.2 Additional QL Classes and Predicates . 135 B JNIF: Java Native Instrumentation 143 B.1 Introduction . 143 B.2 Related Work . 145 B.3 Using JNIF . 148 B.4 JNIF Design and Implementation . 151 B.5 Validation . 154 B.6 Performance Evaluation . 156 B.7 Limitations . 158 B.8 Conclusions . 159 Bibliography 163 xiv Contents Figures 2.1 Categorization of Unsafe usages from Sandoz’ survey . 15 3.1 sun.misc.Unsafe method usage on Maven Central ...... 26 3.2 sun.misc.Unsafe field usage on Maven Central ........ 27 3.3 com.lmax:disruptor call sites . 31 3.4 org.scala-lang:scala-library call sites . 32 3.5 Classes using off-heap large arrays . 33 4.1 Projects, by fraction of their methods containing casts. Bulk of data summarized by box plot, outliers shown as individual data points. 55 4.2 Cast distribution . 57 4.3 Cast Usage Pattern Occurrences . 62 4.4 Typecase Variant Occurrences . 63 4.5 Equals Variant Occurrences . 74 4.6 Stash Variant Occurrences . 79 B.1 Instrumentation time on DaCapo and Scala benchmarks . 161 xv xvi Figures Tables 1.1 Safe/Unsafe and Statically/Dynamically checked languages2 3.1 Patterns and their occurrences in the Maven Central repository 34 3.2 Patterns and their alternatives .

When and How Java Developers Give up Static Type Safety

Log4j-Users-Guide.Pdf

CDC Runtime Guide

Developer's Guide Sergey A

Safety Properties Liveness Properties

The Java Hotspot VM Under the Hood

Cyclone: a Type-Safe Dialect of C∗

Unravel Data Systems Version 4.5

LMAX Disruptor

High Performance with Distributed Caching

CSI Web Server for Linux

High Performance & Low Latency Complex Event Processor

Getting Started with Apache Avro