
DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Record Types in Scala: Design and Evaluation

OLOF KARLSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master in Computer Science
Date: June 28, 2017
Supervisor: Philipp Haller
Examiner: Mads Dam
Swedish title: Record-typer för Scala: Design och utvärdering
School of Computer Science and Communication

Abstract

A record type is a data type consisting of a collection of named fields that combines the flexibility of associative arrays in some dynamically typed languages with the safety guarantees and possible runtime performance of static typing. The structural typing of records is especially suitable for handling semi-structured data such as JSON and XML, making efficient records an attractive choice for high-performance computing and large-scale data analytics. It has proven difficult to implement record types in Scala however. Existing libraries suffer from either severe compile-time penalties, large runtime overhead, or other restrictions in usability such as poor IDE integration and hard-to-comprehend error messages.

This thesis provides a systematic description and comparison of both existing and possible new approaches to records in Scala and Dotty, a new compiler for the Scala 3 language. A novel benchmarking suite is presented, built on top of the Java Microbenchmark Harness (JMH), for measuring runtime and compile-time performance of records running on the Java Virtual Machine and currently supporting Scala, Dotty, Java and Whiteoak. To achieve field access times comparable to nominally typed classes, it is conjectured that width subtyping has to be restricted to explicit coercion, and a compilation scheme for such record types is sketched. For unordered record types with width and depth subtyping however, hashmap-based approaches are found to have the most attractive runtime performance characteristics. In particular, Dotty provides native support for such an implementation using structural refinement types that might strike a good balance between flexibility and runtime performance for records in the future.

Sammanfattning

A record type is a data type consisting of a collection of named fields that combines the flexibility of associative arrays in some dynamically typed programming languages with the safety guarantees and potential execution speed of static typing. The structural typing of records is particularly well suited to handling semi-structured data such as JSON and XML, which makes computationally efficient records an attractive choice for high-performance computing and large-scale data analysis. Implementing records in the Scala programming language has, however, proven difficult. Existing libraries suffer from either long compilation times, slow execution speed, or other usability problems such as poor integration with development environments and hard-to-understand error messages.

This thesis gives a systematic description and comparison of both existing and new solutions for records in Scala and Dotty, a new compiler for Scala 3. A new benchmarking tool for measuring the execution speed and compilation time of records running on the Java Virtual Machine is presented. The benchmarking tool is built on the Java Microbenchmark Harness (JMH) and currently supports Scala, Dotty, Java and Whiteoak.

To achieve execution times comparable to nominally typed classes, it is conjectured that width subtyping must be restricted to explicit coercion calls, and a sketch of a compilation strategy for such records is presented. For record types with unordered fields and width and depth subtyping, records based on hash tables instead turn out to have the most attractive execution times. Dotty provides support for such an implementation using structural refinement types, which may come to strike a good balance between flexibility and execution speed for records in the future.

Dedication

To Dag, for providing shelter in times of need and always reminding me of what en- gineering is all about. I would also like to thank my friends and family for invaluable support, and A3J - thanks for all the coffee!

Contents

1 Introduction
   1.1 Problem Description and Objective
   1.2 Research Question and Report Structure
   1.3 Contribution
   1.4 Societal and Ethical Aspects

2 Background
   2.1 Definition of Record and Record Type
   2.2 Type Systems for Polymorphic Records
      2.2.1 Structural Subtyping
      2.2.2 Bounded Quantification
      2.2.3 Other Forms of Parametric Polymorphism
   2.3 The Scala Language

3 Method
   3.1 Qualitative Comparison
   3.2 Quantitative Comparison
      3.2.1 Wreckage Benchmarking Suite Generator Library
      3.2.2 Runtime Benchmarks
      3.2.3 Compile-Time Benchmarks
      3.2.4 Statistical treatment
         3.2.4.1 Runtime Benchmarks
         3.2.4.2 Compile-Time benchmarks

4 Description of Existing Approaches
   4.1 Scala's Structural Refinement Types
      4.1.1 Basic Features
      4.1.2 Implementation
   4.2 scala-records v0.3
      4.2.1 Basic Features
      4.2.2 Lack of Explicit Types
      4.2.3 Other Features
   4.3 scala-records v0.4
      4.3.1 Basic Features
      4.3.2 Explicit Types
      4.3.3 Other Features
   4.4 Compossible
      4.4.1 Creation through Extension through Concatenation
      4.4.2 Extension and (Unchecked) Update
      4.4.3 Access and Select
      4.4.4 Explicit Types
      4.4.5 Polymorphism
      4.4.6 Other Features
   4.5 Shapeless 2.3.2
      4.5.1 HList Records
      4.5.2 Create
      4.5.3 Field Access
      4.5.4 Explicit Types
      4.5.5 Subtyping
      4.5.6 Parametric Polymorphism
      4.5.7 Other Type Classes
      4.5.8 HCons Extension
   4.6 Dotty's New Structural Refinement Types
      4.6.1 Implementation
      4.6.2 Basic Features
      4.6.3 Polymorphism
      4.6.4 Extension
      4.6.5 Update

5 Comparison of Existing Approaches
   5.1 Qualitative Comparison
   5.2 Quantitative Evaluation using Benchmarks
      5.2.1 Runtime performance
         5.2.1.1 Creation Time against Record Size
         5.2.1.2 Access Time against Field Index
         5.2.1.3 Access Time against Record Size
         5.2.1.4 Access Time against Degree of Polymorphism
      5.2.2 Compile-Time Performance
         5.2.2.1 Create
         5.2.2.2 Create and Access All Fields

6 Analysis and Possible new Approaches
   6.1 Strengths and Weaknesses of Existing Approaches
   6.2 Design Space for Records
   6.3 Record Type Representations
   6.4 Compilation Schemes for Subtyped Records
      6.4.1 P−W−D±: No Permutation, No Width Subtyping
      6.4.2 P−W+D±: Width Subtyping for Ordered Fields
      6.4.3 P+W−D±: Unordered Records without Width Subtyping
      6.4.4 P+W+D±: Unordered Records with Width Subtyping
         6.4.4.1 Option 1: Searching
         6.4.4.2 Option 2: Information Passing
         6.4.4.3 Option 3: Use the JVM
      6.4.5 Summary
   6.5 Benchmarks of Possible Data Structures
      6.5.1 Access Time against Record Size
      6.5.2 Access Time against Degree of Polymorphism

7 Discussion and Future Work
   7.1 Subtyping and Field Access
   7.2 Type-level Operations
   7.3 Not One but Three Record Types to Rule Them All?
   7.4 Future work

8 Related Work
   8.1 Theoretical Foundations
   8.2 Structural Types on the JVM

9 Conclusions

Bibliography

A Whiteoak 2.1 Benchmarks

Chapter 1

Introduction

Software is getting more and more complex and programming languages need to constantly evolve to help programmers cut through this complexity. In a perfect world it is effortless to develop systems in a short amount of time that are easy to understand, maintain and augment while at the same time being robust with few bugs, high runtime performance and low operating cost. In the real world however, there does not seem to be a silver bullet and these factors have to be weighed against each other. Different programming paradigms tend to focus more on some aspects at the expense of others; scripting languages emphasize rapid development and syntactic simplicity while compiled languages tend to focus more on robustness and runtime efficiency.

Scala is a statically typed language with lightweight syntax that is designed to provide a middle-ground between these two extremes. It is a multi-paradigm language combining the virtues of object-oriented and functional programming, and an advanced static type system is combined with local type inference to lessen the syntactic burden [1]. Furthermore, Scala has its theoretical foundation in the νObj calculus [2], recently replaced by DOT [3], which combines nominally typed classes and objects with structural typing. It is therefore natural to consider the possibility of extending the Scala language with structurally typed records.

A record type is a collection of named fields that combines the flexibility of associative arrays in some dynamically typed languages with the safety guarantees of static typing. Structural typing opens up several possibilities for record polymorphism, including width and depth subtyping, making records especially suitable for handling complex and semi-structured heterogeneous data such as JSON and XML. Together with the safety benefits and potential run-time performance of static typing, this makes records an attractive choice for high-performance computing and large-scale data analytics and a potentially valuable addition to the Scala language.

1.1 Problem Description and Objective

Several attempts at implementing record types in Scala have been made but each approach seems to suffer from some weakness preventing it from gaining widespread use. Existing libraries suffer from either severe compile-time penalties, large runtime overhead or other restrictions in usability such as poor IDE integration and hard-to-comprehend error messages [4, 5, 6]. The nature and reasons behind these weaknesses are poorly understood however, and current knowledge mainly consists of bug-reports [7], online wiki-pages [5] and blog-posts [8]. The objective of this thesis project is therefore to describe and evaluate existing approaches to record types in Scala, provide a structured analysis of their strengths and weaknesses, and finally investigate the possibilities for a new approach addressing as many of the found weaknesses as possible.

1.2 Research Question and Report Structure

The main research question guiding the thesis is the following:

What are the possible approaches to record types in Scala and what are their respective strengths and weaknesses?

Here, possible approaches include both existing and novel implementations, and in order to answer this question the thesis consists of the following parts: The necessary theoretical background and an overview of common record type features are covered in Chapter 2. Chapter 3 describes the method used to carry out the assignment - in particular, the construction of a novel benchmarking suite for records running on the Java Virtual Machine is outlined. Chapter 4 contains an overview and description of existing approaches to records in Scala. This is followed by Chapter 5 where their qualitative features are summarized and the benchmarking suite is used to evaluate and compare their runtime and compile-time performance. The determined strengths and weaknesses of existing approaches are analyzed in Chapter 6 and various possibilities for a new approach are evaluated both in terms of their supported features and their performance. A discussion of the results from Chapters 5 and 6 is found in Chapter 7, also outlining interesting paths of future work that were not covered by the analysis of Chapter 6. Related works are found in Chapter 8, and finally the thesis is concluded in Chapter 9.

1.3 Contribution

The novelty and contribution of the thesis follow from the following constituent parts:

• An overview of existing approaches to record types in Scala, displaying their respective feature set, strengths and weaknesses.

• A novel benchmark suite called Wreckage that is publicly available under an open-source license ensuring reproducible results and portability to other languages.

• An overview of possible new approaches and evaluation of their potential features and performance.

1.4 Societal and Ethical Aspects

Hopefully, the outcome of this thesis is a deepened knowledge about the design space for records and how the feature can be implemented in the Scala programming language. Although it is possible to hide from societal and ethical questions by noting that this work is theoretical in nature and the contribution is limited to a small corner of human knowledge, it is worth thinking about the consequences of advancing knowledge and technology in general. While some might argue that humanity is not ready to handle the technology we develop in a responsible way and that the power of the tools we use should be limited, it is also possible to argue that the quality of life has increased tremendously over the years thanks to technological advances. Today's society undoubtedly faces several challenges, but as much as technology can be said to be the cause it might also provide the solutions.

In the best case, this work will be a small step on the road towards better software that is faster and more fun to develop, easier to maintain, less buggy and with lower resource demands and operating costs. This is important for several reasons. First, the energy consumption of IT-systems and data centers around the world is increasing [9]. The study of how software can be made to run more efficiently is therefore important for allowing continued development in a future with reduced energy usage and lower CO2 emissions. Second, hard-to-maintain and complex software is not merely a nuisance to programmers but can be viewed as a cost to society as a whole. With less time spent on maintenance, more time can be spent on developing services that benefit people and meet real needs. Lastly, as more and more of society's infrastructure is computerized it is of great importance to ensure software robustness and minimize the risk of failure in mission-critical systems. Here, static typing provides at least a partial solution as it can provide guarantees against certain errors that are caught during compile-time. Reducing code complexity might also help as programs that are easier to read and understand probably also contain fewer bugs.

As for any technology, increased computing power can certainly be used to do both harm and good and many times it might even be hard to tell the difference. But as long as there is a potential for doing good I believe it is worth trying. Every step backwards can be compensated by at least two steps forward, and what better way to enjoy the journey than doing science and increasing our knowledge and understanding of life, the universe and everything?

Chapter 2

Background

This chapter provides a background on records and their corresponding type system features in theory and practice, as well as an overview of some characteristic features of the Scala programming language.

2.1 Definition of Record and Record Type

A record, sometimes also called a labeled product, is a data type consisting of a collection of labeled values called fields. Records provide a natural way of composing heterogeneous data and come in many forms in the literature and in real-world programming languages. Given that Scala already supports nominally-typed class instances and objects for grouping labeled values together, the focus of this thesis is exclusively on structurally typed records.

Structural typing means that a record type is fully determined by its collection of named fields and the type of their corresponding values [10]. Thus, a record type does not have to be statically declared with any name or qualifier in the program text before use, and the type of a record is not dependent on the data constructor used to instantiate it. It should be noted that not all programming languages that have a construct called a record define it in this way. Most notably Haskell, OCaml and F# have records that are nominally-typed and more similar to Scala's case classes in features and usage [11, 12, 13].

Without formalizing things too much, the following notation due to Pierce [10] will be used to talk about records and their types in a language and implementation agnostic way: A record consisting of n fields labeled l1, l2, ..., ln holding values v1, v2, ..., vn of type T1, T2, ..., Tn respectively will be written as

{l1 = v1, l2 = v2, ..., ln = vn}

with corresponding type

{l1 : T1, l2 : T2, ..., ln : Tn}.

Fields are accessed through their labels using a familiar dot-notation. For example, accessing the name field of type String on a record r is written

r.name

and naturally returns the corresponding String value.


Record types have been extensively studied and several type systems and calculi supporting record types have been proposed with varying capabilities. Besides being able to create records and access their fields, common record operations are: updating a record's value (potentially also changing its type), extending or restricting a record by adding or removing fields, as well as relabeling existing values. Note that all record values are assumed to be immutable unless stated otherwise. That is, updating, extending, restricting or relabeling a record does not change the value of the original record, but rather creates a separate updated copy from the existing one. In particular, various mechanisms for supporting record polymorphism have been proposed and Section 2.2 provides an overview of some of these approaches and their supported operations.

2.2 Type Systems for Polymorphic Records

To avoid code duplication it is often desirable to allow certain functionality to be defined once and then used anywhere it is applicable. In the case of records this may be illustrated by the following example due to Ohori [14] of a getter function in a simply typed lambda calculus:

λx. x.name

Without some kind of record polymorphism, this function would have to be defined for every type of record we want to apply it to, like

getNameFromNameRec := λx : {name : String}. x.name
getNameFromNameAgeRec := λx : {name : String, age : Int}. x.name
getNameFromNameAgeHeightRec := λx : {name : String, age : Int, height : Float}. x.name
...

which quickly becomes tedious and error prone. In object-oriented languages the answer to this problem is often to use some form of subtyping, whereas functional programming languages instead lean towards using some form of parametric polymorphism [10]. Both concepts can be adapted to the case of record types.

2.2.1 Structural Subtyping

In a nominal type system every subtyping relation is established explicitly by the programmer. A Dog is not a subtype of Animal unless the program somewhere says it is (in the case of Scala by using the extends and with keywords). With structural subtyping, the subtyping relation is instead based on the very structure of the types in question. If an Animal type declares the field name of type String and age of type Int, any type containing these fields may be considered a structural subtype of Animal. Following Pierce [10], the structural subtyping relation <: will be expressed using three different rules defining permutation, width and depth subtyping. The permutation subtyping rule states that a record type is a subtype of another record type if it consists of a permutation of the same fields.

  {k1 : S1, k2 : S2, ..., kn : Sn} is a permutation of {l1 : T1, l2 : T2, ..., ln : Tn}
  --------------------------------------------------------------------------------- (PERMUTATION)
  {k1 : S1, k2 : S2, ..., kn : Sn} <: {l1 : T1, l2 : T2, ..., ln : Tn}

This rule allows record types to be viewed as unordered collections of fields. For example {name : String, age : Int} and {age : Int, name : String} are subtypes of each other and can be used interchangeably. The next rule is width subtyping:

  {l1 : T1, l2 : T2, ..., ln : Tn} <: {l1 : T1, l2 : T2, ..., ln−k : Tn−k}   (WIDTH)

For ordered records this means that a record type is a supertype of another record type if it is a prefix of the other record type. If combined with the permutation rule however, a high degree of flexibility is achieved where a record type is a supertype of another record type if it contains any subset of its fields. For example, the type {name : String} becomes applicable to all records containing a name field of type String in any position. The third rule, depth subtyping, recursively applies the subtyping relation to a record type's fields:

  Si <: Ti  for each i
  --------------------------------------------------------------------------------- (DEPTH)
  {l1 : S1, l2 : S2, ..., ln : Sn} <: {l1 : T1, l2 : T2, ..., ln : Tn}

With these three rules in place, we can define our getter function once and for all as

getName := λx : {name : String}. x.name

and then apply it to any record containing a name field of type String or a subtype of String.

Casting, Coercion and Equality Not all type systems that support some kind of structural subtyping do it in its most general form as described above, but the type conversion from a type to a structural supertype may be more or less restricted. In, for example, OCaml an object¹ of type {name : String, age : Int} may only be assigned to a reference of type {name : String} by applying an explicit coercion operator and afterwards it is not possible to down-cast to get the hidden fields back [12]. Since the coercion is statically type-checked to respect the structural subtyping relation however, OCaml can still be said to support some kind of limited structural subtyping. This thesis follows the terminology used by Pierce [10] regarding casts and coercion; casting is defined as the operation of changing the type of a value without changing the underlying value itself. As such, it is a purely static operation only affecting the type-level of a program. Coercion on the other hand lets a value of a certain type be applied in a context requiring another type by actually creating a new value of the target type from the original value.² Type-casts can either change a type from a subtype to a supertype, known as up-cast or widening, or from a supertype to some subtype, known as down-cast or narrowing. In Scala (and also the lambda calculus with subtyping developed in [10]) up-casts always succeed and can be either explicit or implicit, whereas down-casts may generate a runtime exception and must always be explicit using .asInstanceOf[T] [15].

¹ In OCaml, objects are structurally typed.
² Using this terminology, casting is performed in Scala either implicitly by assignment or explicitly by using the .asInstanceOf[T] method, whereas coercion is performed using some conversion method of the form .toT (at least for reference values, both actually perform coercion for primitive values) [15].

Coercion is not bound to follow some class-hierarchy (we may for example coerce the string "12" to the integer 12) and may or may not discard data in the process (for example by coercing the float 12.34 to the integer 12). This in turn affects subsequent equality checks. Consider the following example where a record r containing the fields name and age is coerced (here denoted by the as operator) and assigned to a reference s of a type containing a name field only.

r := {name = "Mme Tortue", age = 123}
s := r as {name : String}
s == r // ?

If the coercion discards the age data, it is natural for the equality check to fail. If the coercion on the other hand keeps the runtime age data around and merely hides it from the static type, it is presumably up to the language specification to decide what should happen.
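To make the distinction concrete in Scala terms, here is a minimal sketch using two hypothetical case classes as stand-ins for the record types above (Scala has no record literal syntax); the class names are illustrative only.

case class NameAge(name: String, age: Int)
case class NameOnly(name: String)

val r = NameAge("Mme Tortue", 123)

// Coercion: a new NameOnly value is created and the age data is discarded,
// so a subsequent equality check against r naturally fails.
val s = NameOnly(r.name)
s == r // false

// Casting, by contrast, only changes the static type; the underlying value
// (including the age data) is untouched.
val any: Any = r
any.asInstanceOf[NameAge] == r // true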

2.2.2 Bounded Quantification

In the presence of subtyping, the subtyping relation can be used to express a form of parametric polymorphism called bounded quantification as described by Cardelli and Wegner [16]. In contrast to universal quantification where the type parameter ranges over the whole universe of types, bounded quantification restricts the type parameter to only range over the subtypes of a given type bound. Knowing a base type for the type parameter it is then possible to do operations on the polymorphically typed arguments, for example access fields. The getName function from before can be applied to all types that are subtypes of {name : String}, and with bounded quantification this can be expressed as

getName := λR <: {name: String}. λr:R. r.name

Since R may only be instantiated to subtypes of {name : String} it is safe to do the field access on the r parameter and the function type-checks. This form of polymorphism has the benefit that the type parameter can capture the full type of a function's argument and refer to it later, for example in the return type. Consider the following function that selects the record with the highest age.

oldest := λR <: {age: Int}. λa:R. λb:R. if (a.age >= b.age) a else b

If this function is applied to the arguments

a := {name="Achilles", age=24}
b := {name="Mme Tortue", age=123}

the parameter R will capture both the name and the age field allowing the return type to be the full type signature {name: String, age: Int}. This would not be possible using structural subtyping on the function arguments as all static information about any additional fields except age would be lost.

2.2.3 Other Forms of Parametric Polymorphism

Using bounded quantification it is possible to let a type parameter capture a record type while keeping some information about the present fields so that they can be accessed in a type safe way. Several type systems have been proposed to provide similar functionality without relying on subtyping. Wand [17] introduced the notion of a row variable to achieve extensible record types with polymorphism in a context without subtyping. A row is defined as a set of fields represented as a partial function ρ from labels to types, and a record type is written as a product over this set, Πρ. A row can be extended with a new field l : T or have a new type T associated with an existing field labeled l by extending the partial function, written as ρ[l ← T]. For example, the expression

r with name := "Achilles"

extends the record r with a name field and given that r has record type Πρ the extended record has type Πρ[name ← String]. Row extension is also used to express that certain fields are present, similar to bounded quantification. For example the getName function above has the type ρ[name ← String] → String in Wand's system. It was later shown that the proof for complete type inference was incorrect in the original paper [18], but the idea of using row variables to represent unknown record fields has seen many applications since. In, for example, OCaml polymorphic object types are expressed using an anonymous row variable denoted by .. (ellipsis). Such a type is called open and represents an object containing an arbitrary number of methods in addition to name: string [19].

Ohori [14] developed another typed lambda calculus for polymorphic records and implemented it as an extension for Standard ML (SML) called SML#. Ohori's system uses kinded quantification to restrict the set of types a type parameter ranges over. The quantification ∀t :: k restricts the type parameter t to range only over the record types represented by the kind k, and a record kind is defined as a set of fields {{l1 : T1, ..., ln : Tn}}. Using this system the getName function above has type ∀t :: {{name : String}}.t → String, where t ranges over the kind of all record types containing a name field of type String.

2.3 The Scala Language

Scala is a statically typed language that runs on the Java Virtual Machine (JVM). It is multi-paradigm and provides the usual object oriented abstractions such as classes with inheritance and a form of interfaces called traits, as well as functional concepts such as first class functions, algebraic data types and pattern matching.

Classes, case classes, objects and traits Classes are declared with the class keyword. Member fields are declared to be mutable with var and immutable with val. Methods are defined with def. All statements in a class declaration body are part of the class constructor, allowing concise class declarations such as the following:

class Person(_name: String, _age: Int) {
  val name = _name
  var age = _age
  def birthday(): Unit = { age = age + 1 }
}

Unit is a type with only one member (), analogous to void in C-style languages. Type ascriptions are placed to the right of a colon :, but can in many cases be left out thanks to local type inference. The class is instantiated using the new keyword, for example val p = new Person("Achilles", 24). Scala also has a special kind of class called case class providing equality by value and pattern matching by default. A case class Person with immutable public values name and age can be declared as

case class Person(name: String, age: Int)

and instantiated by the expression Person("Mme Tortue", 123) without using new. The constructor parameters are public vals by default, and the arguments determine case class equality and allow pattern matching:

p match {
  case Person("Mme Tortue", age) => "Hello, you "+age+" year old turtle!"
  case Person("Achilles", _) => "I used to be an adventurer like you..."
}

In addition to classes, Scala also has singleton objects declared by the object keyword. A class can have a companion object with the same name where static members associated with the class can be defined.

object Person {
  def birthday(p: Person) = Person(p.name, p.age+1)
}

A trait is like an interface but with optional default implementations. A class can inherit from a single parent class and several traits using the extends and with keywords. For example a Cat class can inherit from an Animal parent class and mix in behavior from the Purrer and Hunter traits as follows:

class Animal { def eat() = ... }
trait Purrer { def purr() = ... }
trait Hunter { def hunt(prey: Animal) = ... }

class Cat extends Animal with Purrer with Hunter

The type Animal with Purrer with Hunter is called a compound type. Algebraic data types are implemented in Scala using traits and case classes. For example a Tree data type consisting of a sum of types Node and Leaf where Node is a product of two Trees can be implemented as:

trait Tree
case class Node(left: Tree, right: Tree) extends Tree
case class Leaf() extends Tree

Type parameters and variance Scala classes can be parameterized by adding a type parameter in square brackets:

case class Box[T](x: T)

By default the class type is invariant in the type parameter, so for classes A and B where B is a subtype of A there is no subtyping relation between Box[B] and Box[A]. By adding a + (plus) modifier a class is declared covariant in the type parameter. Thus, implementing the Box class as

case class Box[+T](x: T)

makes the type Box[B] a subtype of Box[A]. Similarly, a class is made contravariant by adding a - (minus).
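For illustration, assuming the covariant Box[+T] above together with the Animal and Cat classes from the previous example:

val catBox: Box[Cat] = Box(new Cat())

// Accepted with the covariant Box[+T], since Cat is a subtype of Animal.
// With the invariant Box[T] this line would be a compile-time error.
val animalBox: Box[Animal] = catBox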

Generic functions and Bounded Quantification It is also possible to parameterize functions using the same square bracket notation:

def createBox[T](x: T) = new Box[T](x)

Bounded quantification is achieved by specifying an upper or lower bound using the <: and >: operators respectively, for example:

def putCatInBox[T <: Cat](a: T) = { a.purr(); new Box(a) }

Here, the type parameter T ranges only over subtypes of Cat so that it is safe to call purr() on the argument a. Note that the last expression in a block is the return value by default.

Implicit arguments A function can take an extra argument list with implicit arguments. These arguments can be omitted by the caller and are inserted automatically by the Scala compiler.

def goForAHunt(cat: Cat)(implicit prey: Rat) = cat.hunt(prey)

A value is eligible for being inserted as an implicit argument if it is declared with the implicit keyword:

implicit val rat = new Rat()
goForAHunt(cat) // rat inserted automatically

The implicit resolution process looks for such implicit arguments in the current scope and in the companion object of the Rat class. If no valid implicit argument is found or if several valid implicits are found it is a compile-time error. CHAPTER 2. BACKGROUND 11

Implicit conversions Scala also allows values to be implicitly converted to other values by declaring implicit conversion functions, for example turning a Person into a Cat:

implicit def metamorphosis(x: Person): Cat = ...

If such an implicit conversion is in scope, it is possible to call methods from the Cat class on an instance of a Person and the compiler will automatically insert a conversion from Person to Cat before calling the method:

val p = Person("Mme Tortue", 123)
p.purr() // compiled to metamorphosis(p).purr()

Type classes Implicits can be used to codify type classes in Scala. Consider the type class Adder[T] that defines a binary add operation taking two instances of T and returning the sum of type T:

abstract class Adder[T] {
  def add(a: T, b: T): T
}

A function that sums a list of elements implementing this type class can be defined as

def sumListOfAdders[T](l: List[T])(implicit adder: Adder[T]): T = {
  l.reduce( (x, y) => adder.add(x, y) )
}

where => denotes a lambda function. Any type can be made a member of the Adder type class by providing a suitable implementation of the Adder class. For integers it might be implemented as:

implicit object IntAdder extends Adder[Int] {
  def add(a: Int, b: Int): Int = a + b
}

If the sum function is applied to a list of integers, implicit resolution will look in the current scope, in the companion object of the Adder class and in the companion object of the Int class for an implementation of Adder[Int]. Given the IntAdder implementation the implicit resolution succeeds and the list can be summed.
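As a usage sketch, with IntAdder in implicit scope a list of integers can then be summed directly:

val total = sumListOfAdders(List(1, 2, 3)) // IntAdder is resolved implicitly
// total == 6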

Def Macros Def macros are methods that are expanded into abstract syntax trees (ASTs) inlined at the call site during compilation. Macros are divided into whitebox and blackbox. The main difference is that whitebox macros allow the type of the expanded expression to be more specific than the declared return type whereas blackbox macros do not. This allows whitebox macros to interact with the typer in interesting ways. It is for example possible to implement a record as a hash map from String labels to Any values and then let field selection be implemented as a whitebox macro that refines the return type to the specific type of the accessed field. See the scala-records and Compossible libraries described in Chapter 4 for examples of this technique.

Implicit materializer macros Macros can also be used to instantiate implicits during implicit resolution. One application is to materialize type class implementations depending on the type parameter. For example, Adder implementations can be materialized by defining an implicit macro returning an Adder[T]:

implicit def materializeAdder[T]: Adder[T] = macro materializeAdder_impl[T]

This way it is not necessary to provide a separate implementation of Adder for every possible T, but the macro can inspect the type parameter T and provide a suitable implementation at compile-time as needed.

Dotty Dotty [20] is an experimental compiler for Scala, implementing new language concepts and features that will eventually replace Scala 2. For this thesis, important changes include the introduction of singleton types, intersection types, and a new compilation scheme for structural refinement types.

Singleton types are types with only one inhabitant. For example the string literal "foo" is a member of the singleton type String("foo") that is a subtype of String but with only this single member. Scala currently assigns such types to literal constants during typing and they can be assigned to values returned by whitebox macros, but it is not possible to express these types explicitly in program text. In Dotty it is possible to express these types by simply writing the literal constant in the type namespace, for example Box["onlythisstring"] [21, 20].

As the name suggests, the intersection of two types A and B is a type whose members are restricted to those included in both A and B. In Dotty, type intersection is expressed using the & operator and replaces the with-based compound types of current Scala. Type intersection is commutative and recursive in covariant type members so that for example List[A] & List[B] is equivalent to List[B] & List[A] and List[A & B] [20].

Scala's current structural refinement types are described in Section 4.1 and the new compilation scheme for Dotty is described in Section 4.6.
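A small sketch of the two features in Dotty syntax, reusing the Box, Purrer and Hunter definitions from the earlier examples (illustrative only):

// Singleton type: only the literal "onlythisstring" inhabits this type.
val b: Box["onlythisstring"] = Box("onlythisstring")

// Intersection type: the argument must provide the members of both traits.
def play(pet: Purrer & Hunter): Unit = pet.purr()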

Chapter 3

Method

The main research question guiding this thesis is the following:

What are the possible approaches to record types in Scala and what are their respective strengths and weaknesses?

Here, "possible approaches" include both existing and novel implementations, and in order to answer this question the thesis is divided into three main parts:

Chapter 4, Description of Existing Approaches: A detailed description of existing approaches to records in Scala covering their implementation, syntax and supported features.

Chapter 5, Comparison of Existing Approaches: A structured qualitative comparison of the existing approaches, as well as a quantitative comparison of their runtime and compile-time performance using a novel benchmarking suite.

Chapter 6, Analysis and Possible new Approaches: An analysis of the determined strengths and weaknesses of existing approaches, followed by an evaluation of possible new approaches addressing as many of these weaknesses as possible.

The existing approaches to records in Scala covered by the first and second part are

• scala-records 0.3 [22]

• scala-records 0.4 [4]

• Compossible 0.2 [23]

• Shapeless 2.3.2 [24], as well as

• Scala’s built in anonymous classes with structural refinement types, and

• Records using Dotty's new structural refinement types and Selectables [25].

The contents of the qualitative comparison are described in Section 3.1, and the method used for the quantitative comparison is described in Section 3.2 below. The focus of the last part, "Analysis and Possible new Approaches", is primarily on how different forms of record subtyping and polymorphism interact with possible underlying data structures and how this in turn affects runtime performance. New approaches are suggested and evaluated using the same benchmarking suite and methodology as used for existing approaches.


3.1 Qualitative Comparison

The qualitative comparison will summarize the results from the description in Chapter 4 by looking at the following aspects:

• Access syntax. What is the syntax for field access?

• Equality semantics. Is equality by reference or by value?

• Type safety. Is field access typed, and is it a compile-time error to access nonexis- tent fields?

• Subtyping. Is field permutation, width subtyping and/or depth subtyping sup- ported?

• Explicit types. Can record types be expressed explicitly in program text?

• Parametric polymorphism. Is bounded quantification supported, or some other form of parametric polymorphism?

• Extension, restriction, update, relabeling. Are some of these operations supported for monomorphic or polymorphic record types?

• IDE support. What is the library support in Eclipse and IntelliJ respectively?

• Other. What other features of interest do the libraries provide?

The answers to these questions are provided by documentation, source code inspection and by REPL session examples in the descriptions. In Section 5.1 the results are compiled into a structured feature matrix.

3.2 Quantitative Comparison

A novel benchmarking library called Wreckage [26] was built to be able to measure the runtime and compile-time performance of various approaches to records on the JVM. The Wreckage library is built on top of the Java Microbenchmark Harness (JMH) and is capable of generating, building and running benchmarking code written in Scala, Dotty, Java and Whiteoak¹. The Wreckage library is publicly available at https://github.com/obkson/wreckage.

3.2.1 Wreckage Benchmarking Suite Generator Library

Benchmarking code running on the JVM is a non-trivial task. The result of a benchmark does not only depend on system factors such as the virtual machine it is run on, the garbage collection algorithm in use and the heap size, but is also subject to nondeterministic just-in-time compilation, lazy class loading and optimization strategies such as dead-code elimination and loop unrolling [27, 28]. To overcome at least some of these difficulties the Wreckage library was built on top of the Java Microbenchmark Harness (JMH) developed by Oracle [29]. JMH is a widely used framework for benchmarking on the JVM [30] that makes it possible to prevent dead code optimization, garbage collection and other disturbing events from happening during a benchmark.

¹ Whiteoak [6] is a Java extension that brings structural typing to the Java language, discussed in Sections 6.4.4.3 and 8.2.

The JMH documentation recommends putting the benchmarking source files in a standalone Maven project that imports the code to be benchmarked as a library dependency. Then a custom build process using JMH bytecode generators builds and packages this project into an executable JAR-file containing everything needed to run the benchmarks. The Wreckage library respects this recommendation and is built as a source code generator capable of generating JMH benchmarking projects for records implemented in Scala, Dotty, Java and Whiteoak. To introduce some hopefully clarifying terminology, Wreckage can be described as a Benchmarking Suite Generator Library.

For each record implementation that should be benchmarked, a new Scala project is created and the Wreckage library is imported. A Benchmarking Suite Generator is then created by subclassing the appropriate abstract JMH ProjectBuilder class depending on the language used (ScalaJMHProjectBuilder, DottyJMHProjectBuilder etc.) and implementing the missing methods needed to complete the implementation. The subclass should provide the following missing pieces (a toy sketch of the idea follows the list below):

• A Maven artifact identifier for the records library that should be benchmarked, alternatively a path to an unpublished JAR-file on the local file system.

• An implementation of a special RecordSyntax class, providing methods that describe this record library's particular syntax for record creation, field access, extension, explicit type signatures etc.

• A list of benchmarks to include in the generated JMH benchmarking suite.
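The following self-contained toy sketch illustrates the idea of such a syntax object: each record library describes the source-text fragments for its operations, and the generator splices them into benchmark templates. All names and signatures here are hypothetical and do not reflect the actual Wreckage API.

trait RecordSyntax {
  def create(fields: Seq[(String, Int)]): String
  def access(record: String, label: String): String
}

// Hypothetical syntax description for a library with a Rec(...) constructor.
object ToyRecSyntax extends RecordSyntax {
  def create(fields: Seq[(String, Int)]): String =
    fields.map { case (l, v) => s"$l = $v" }.mkString("Rec(", ", ", ")")
  def access(record: String, label: String): String = s"$record.$label"
}

// Splicing the fragments into a (much simplified) benchmark template.
def accessBenchmarkSource(syntax: RecordSyntax, size: Int): String = {
  val fields = (1 to size).map(i => (s"f$i", i))
  s"""val r = ${syntax.create(fields)}
     |@Benchmark def access = ${syntax.access("r", s"f$size")}
     |""".stripMargin
}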

This generator is then compiled and run to generate a JMH Benchmarking Suite in the form of a standalone Maven project containing a source file for each Benchmark. From this point on, the project is built and packaged exactly as any other JMH project using JMH's custom build process to produce a standalone JAR that can be run to take the measurements. The architecture and benchmark generation process is illustrated in Fig. 3.1.

The Wreckage library comes with prepared templates for a number of different benchmarks that do not depend on any particular record syntax. The templates contain all boilerplate needed in a source file to setup and run a benchmark with JMH, as well as the methods that are to be benchmarked but with placeholders for all record operations. The JMHProjectBuilder class contains a main method that takes the provided information and injects it into the templates to create the final source files.

An alternative solution to source code generation, used by for example the scala-records-benchmarks suite [31], is to use macros to expand the benchmarks into the correct abstract syntax trees for different record libraries during compilation. The main reason for choosing the slightly less elegant approach of generating source files that have to be compiled in a separate compilation step is to make the benchmarks portable to other JVM languages than Scala, such as Java, Whiteoak and Dotty. Furthermore, generated source files have the significant benefit of being easy to inspect and validate compared to macro expansions.

Figure 3.1: The Wreckage Benchmarking Library architecture and benchmark generation process.

3.2.2 Runtime Benchmarks

The runtime benchmarks are micro benchmarks that measure the time it takes to execute a single record operation such as record creation or field access in isolation. As one such operation typically takes less time than what is possible to measure accurately using the system clock, the execution time has to be measured as an average over multiple invocations. JMH achieves this by calling the method as many times as possible during a specified time bound, and then the total time² is divided by the invocation count [29]. One such sequence of invocations is called an iteration and accounts for one measurement. The benefit of this approach compared to running some predefined number of invocations is that the total run time of the benchmarks becomes predictable and independent of the execution time of the benchmarked function (a really slow function is simply called fewer times). The Wreckage benchmarking library is currently capable of generating the following runtime benchmarks:

Creation Time against Record Size The time it takes to create a record is measured as a function of the size of the created record. The record is created in a single expression using field labels f1, f2, ... up to the size of the record, and storing integer values 1, 2, .... The use of integer values will incur boxing and unboxing operations for some libraries, which may affect the run time. On the other hand, numeric values are assumed to be a common payload in the kind of large scale scientific computations where run time matters the most and so it seems reasonable to use such values in the benchmarks.
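As an illustration, a generated creation benchmark for records of size 4 might look roughly like the following JMH code, here using a plain case class as a stand-in for the record library under test (the actual generated sources in Wreckage may differ):

import org.openjdk.jmh.annotations.Benchmark

case class Rec4(f1: Int, f2: Int, f3: Int, f4: Int)

class CreateSize4 {
  // Returning the created record hands it to JMH, which consumes the value and
  // thereby prevents the JIT compiler from eliminating the allocation as dead code.
  @Benchmark
  def create: Rec4 = Rec4(1, 2, 3, 4)
}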

Access Time against Field Index Access time is measured as a function of the index of the accessed field. A record with 32 fields f1, f2, ..., f32 is created and used during all measurements, and then the execution time is measured for accessing field f1 up to f32. For ordered records the index will correspond with the field’s position in the record type, whereas for unordered records the index merely identifies the field’s name.

Access Time against Record Size In the previous benchmark the record size was con- stant and the accessed field was varied. In this benchmark, a record of increasing size is created and for each record size the access time is measured for the field with the highest index.

Access Time against Degree of Polymorphism For records that support subtyping, the degree of polymorphism at a method call site is defined as the total number of different runtime record types that are represented among the receivers of the call. The general benchmarking technique is described by Dubochet and Odersky [32], and is here implemented as follows: An array of 32 records with different record types is created, but where all records have size 32 and a field named g1. The type of the array is declared as Array[{g1: Int}]. For each record type the other 31 fields are a set of n fields f1, f2, ..., fn, and m = 31 − n fields h1, h2, ..., hm, and each record in the array has a different n from 0 to 31. For records with ordered fields, the fields are stored in sorted order as f1, f2, ..., fn, g1, h1, h2, ..., hm.

² which may be slightly more than the specified time bound to let the last invocation finish

To make a measurement of field access time at a call site with polymorphism degree d, the benchmark cycles over the first d records in this array during the measurement time bound, and in each invocation the field g1 is accessed on a record with a different type from the preceding invocation. Due to this cycling, each measurement also includes a constant overhead of an index increment modulo d and a record array access in addition to the actual record field access.
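The cycling scheme can be sketched as follows, here shown with only four (instead of 32) records and with Scala's structural types standing in for a record library; this is illustrative only and the generated Wreckage benchmarks may differ in detail.

import scala.language.reflectiveCalls
import org.openjdk.jmh.annotations.{Benchmark, Scope, Setup, State}

@State(Scope.Benchmark)
class AccessDegree4 {
  type HasG1 = { val g1: Int }

  val d = 4                          // degree of polymorphism at this call site
  var i = 0
  var records: IndexedSeq[HasG1] = _

  @Setup
  def setup(): Unit = {
    // Records of different runtime classes that all contain a g1 field
    // (the real benchmark uses 32 records of size 32 each).
    records = Vector[HasG1](
      new { val g1 = 1; val f1 = 2 },
      new { val g1 = 1; val h1 = 2 },
      new { val g1 = 1; val f1 = 2; val f2 = 3 },
      new { val g1 = 1; val h1 = 2; val h2 = 3 })
  }

  @Benchmark
  def access: Int = {
    i = (i + 1) % d                  // constant overhead: index increment modulo d
    records(i).g1                    // the measured structural field access
  }
}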

3.2.3 Compile-Time Benchmarks

The Scala compiler is written in Scala and available as the Global class in the scala.tools.nsc package. The compile-time benchmarks instantiate this Global compiler in a setup phase and store it in a global benchmark state. Before each iteration a new Global.Run is instantiated and then the benchmark measures the execution time of running Run.compileSources on a prepared code snippet. Using this approach the compilation can be benchmarked by JMH as any other method or function, and there is no overhead of setting up the compiler in the measured compile times. The Wreckage benchmarking library is currently capable of generating the following compile-time benchmarks:

Create The compile time is measured for a code snippet that creates a class containing a record:

class C {
  val r = {f1=1, f2=2, ...}
}

Compile time is measured as a function of record size, and a linear factor is expected as the length of the snippet also increases with record size.

Create and Access All Fields This benchmark extends the previous one with a field access operation for every field in the record:

class C {
  val r = {f1=1, f2=2, ...}
  val f1 = r.f1
  val f2 = r.f2
  ...
}

This is the same benchmark as is used by scala-records-benchmarks [31] to measure compile time, except that record creation is included in the snippet.
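For illustration, driving the compiler programmatically in the way described at the start of this section can be sketched as follows; the settings shown are illustrative and the actual setup in Wreckage may differ.

import scala.tools.nsc.{Global, Settings}
import scala.reflect.internal.util.BatchSourceFile

val settings = new Settings
settings.usejavacp.value = true            // reuse the JVM classpath for the compiler

val global = new Global(settings)          // done once, in the benchmark setup phase

// Measured part: one fresh Run compiling the prepared snippet.
val run = new global.Run
run.compileSources(List(new BatchSourceFile("C.scala", "class C { val x = 1 }")))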

3.2.4 Statistical treatment

The raw JMH measurement data was treated in a post-processing step to calculate mean execution times with confidence intervals. The runtime and compile-time cases are described separately below.

3.2.4.1 Runtime Benchmarks

The runtime benchmarks measure steady state performance. This means that the first iterations are discarded as warm up runs allowing the JIT-compiled code to stabilize before making measurements. The following approach suggested by Georges et al. [27] is used to measure average steady state running time:

The total average steady state running time x̄ is calculated from n independent samples from separate JVM processes, called VM forks in the following. Each such sample is taken as follows: In VM fork i, a series of measurements x_{i,1}, x_{i,2}, ... are done. Starting from the kth measurement, the coefficient of variation (CoV) is calculated on a sliding window of the k previous measurements, defined as the standard deviation divided by the mean. When the CoV for such a window reaches below a threshold of 0.02, steady state is assumed and the mean of these k measurements is taken as the steady state running time for this trial. That is, if steady state is detected for measurements x_{i,j−k+1}, ..., x_{i,j} the mean x̄_i is calculated as

\[ \bar{x}_i = \frac{1}{k} \sum_{l=j-k+1}^{j} x_{i,l} \]

These k measurements are not statistically independent as they are run on the same JVM and are chosen based on their CoV. To get independent measurements the above process is instead repeated n times in separate VM forks, generating samples x¯1, x¯2, ..., x¯n. The overall average steady state running time is taken as the mean over these samples:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} \bar{x}_i . \]

The standard deviation s is then calculated as usual as

\[ s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (\bar{x}_i - \bar{x})^2 } . \]

Modeling these n measurements x̄_1, x̄_2, ..., x̄_n as independent samples from the same distribution with mean µ, the transformed variable

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

can be assumed to follow the Student's t-distribution with n − 1 degrees of freedom. Confidence intervals for a confidence level of 99.9 % may then be computed around x̄ as

\[ \left( \bar{x} - t_{0.9995,\,n-1} \frac{s}{\sqrt{n}},\; \bar{x} + t_{0.9995,\,n-1} \frac{s}{\sqrt{n}} \right). \]

Here, t_{0.9995,n−1} is defined so that for a random variable T following the Student's t-distribution with n − 1 degrees of freedom it holds that the probability

\[ \Pr[T \le t_{0.9995,\,n-1}] = 0.9995. \]

In all experiments n = 10 VM forks were used.

The above scheme for dynamically detecting when steady state has occurred based on the measurement data is somewhat at odds with how JMH is designed, since JMH only allows a fixed number of warm up runs to be specified and a fixed number of measurements to be taken after that. Instead of using JMH's built in warmup feature, a sequence of 20 raw measurements was taken using JMH without any warmup, and then the above algorithm was run in a post-processing step on the raw data with k = 10. If steady state was not reached by the end of the sequence, the mean over the last 10 measurements was taken anyway.
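A small sketch of this post-processing step (assuming the sample standard deviation in the CoV, which is not specified above):

def steadyStateMean(xs: Seq[Double], k: Int = 10, threshold: Double = 0.02): Double = {
  def mean(w: Seq[Double]) = w.sum / w.size
  def cov(w: Seq[Double]) = {                // coefficient of variation of a window
    val m = mean(w)
    val sd = math.sqrt(w.map(x => (x - m) * (x - m)).sum / (w.size - 1))
    sd / m
  }
  val windows = xs.sliding(k).toSeq          // windows of k consecutive measurements
  // Mean of the first window whose CoV is below the threshold (steady state),
  // falling back to the last k measurements if steady state is never detected.
  windows.find(w => cov(w) < threshold).map(mean).getOrElse(mean(windows.last))
}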

3.2.4.2 Compile-Time benchmarks

For compile-time benchmarks the steady state performance is less relevant, since compilation is typically a one-time job. For these benchmarks JMH's Single Shot mode is used, which measures the execution time of a single invocation, without any preceding warm up runs. The measured compile times are greater than the granularity of the system clock by a good margin (seconds versus microseconds) and so there is no need to take the average across several invocations to get a single measurement. Several independent Single Shot trials are instead made in separate VM forks, allowing mean compile time with confidence intervals to be calculated in the same fashion as described for runtime benchmarks.

Chapter 4

Description of Existing Approaches

This chapter provides an overview of the current major implementations of records for Scala: scala-records [22, 4], Compossible [23] and Shapeless records based on HLists [24], as well as an implementation of records using Dotty's new structural refinement types and Selectable trait [25]. For each approach, the basic implementation strategy is described as well as the support for and syntax of common record features.

The overview includes both scala-records v0.3 and v0.4 although the latter is essentially an improvement over the former in every respect. There are two reasons for this: First, the documentation [4] of scala-records v0.4 mentions weaknesses and problems that are actually fixed in version 0.4, but do apply to version 0.3. Thus, by including version 0.3 here the background of the claims in the official documentation can be better understood. Second, the weaknesses are fixed in v0.4 by making significant changes in how the record types are represented in Scala's type system. Thus, it is of interest for a possible new approach to investigate and compare these two different representations.

Before describing any of these libraries though, the possibilities and limitations of Scala's native support for anonymous objects and structural refinement types are investigated.

4.1 Scala’s Structural Refinement Types

Scala has had native support for structural typing since version 2.6 [33], making it possible to cast any conforming class instance to a structural type and then call methods declared on that structural type. For example, the following is valid Scala code:

class Turtle {
  def run() = println("Slowly crawling along the race track...")
}
class Achilles {
  def run() = println("Pushing it to the limit!")
}

type Runner = { def run(): Unit }

def race(a: Runner, b: Runner) = {
  a.run()
  b.run()
}

race(new Turtle(), new Achilles())

Here, Runner is a structural type declaring a run method. By width and depth subtyping both Turtle and Achilles are considered structural subtypes of Runner and can thus both


participate in the race, without declaring this subtyping relation nominally.

By combining Scala's anonymous classes and refinement types with structural typing, much of the functionality normally associated with records can actually be achieved without any library support at all.¹

4.1.1 Basic Features

The following is a quick overview of the various record-like features supported by structurally typed anonymous classes.

Create An instance of an anonymous class can be created with record fields in the form of val definitions:

scala> val r = new {val name="Mme Tortue"; val age=314}
r: AnyRef{val name: String; val age: Int} = $anon$1@403c3a01

The result type shown in the REPL is evidently a structural refinement of AnyRef.

Access Fields are accessed as usual with dot-notation:2

scala> val n = r.name n: String = Mme Tortue

Equality Equality is by reference, which may or may not be what we want:

val r = new {val name="Mme Tortue"; val age=314}
val s = new {val name="Mme Tortue"; val age=314}

scala> r == s res7: Boolean = false

Type safety We get all the usual type safety guarantees as for normal class instances. Field access is type checked:

scala> val n: Int = r.name
:12: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

1In sections 4.1.2 and 5.2.1.4 we will see why this thesis does not stop here, however; the reflective calls used to realize Scala's structural typing come with a non-negligible performance cost on the JVM.
2The code samples issue a feature warning unless the compiler option -language:reflectiveCalls is set or scala.language.reflectiveCalls is imported.

scala> val a = r.address :12: error: value address is not a member of AnyRef{val name: String; val age: Int}

Subtyping As noted above, any class can be cast to a structural type. Thus, the following up-cast works as expected:

scala> val r: {val name: String} = new {val name="Mme Tortue"; val age=123}
r: AnyRef{val name: String} = $anon$1@544d57e

as well as for function arguments:

scala> def getName(x: {val name: String}) = x.name getName: (x: AnyRef{val name: String})String

scala> getName(r) res16: String = Mme Tortue

Bounded Quantification It is also possible to achieve parametric polymorphism with bounded quantification, exemplified by the oldest function

def oldest[R <: {val age: Int}](a: R, b: R): R = if (a.age >= b.age) a else b

val t = new {val name="Mme Tortue"; val age=123}
val a = new {val name="Achilles"; val age=24}

scala> oldest(a,t).name res19: String = Mme Tortue

Least Upper Bounds The example of bounded quantification above used two records with identical fields. By casting the argument records to their least upper bound (LUB), two heterogeneous records can be passed to the function as well, and the return type will preserve as much information about the records as possible:

val t = new {val name="Mme Tortue"; val age=123; val address="Zenos road 42, Elea"} val a = new {val name="Achilles"; val age=24}

scala> oldest(a,t) res22: AnyRef{val name: String; val age: Int} = $anon$1@4ebea12c

However, and perhaps surprisingly, this only works as long as one of the argument record types is a direct supertype of the other. For example, the below does not work:

val t = new {val name="Mme Tortue"; val age=123; val address="Zenos road 42, Elea"} val a = new {val name="Achilles"; val age=24; val height=1.88}

scala> oldest(a,t)
:15: error: inferred type arguments [Object] do not conform to method oldest's type parameter bounds [R <: AnyRef{val age: Int}]

Here, the LUB is inferred to be Object rather than {val name: String; val age: Int}, and the typer needs a little nudge in the right direction to see that a more precise upper bound is possible:

scala> oldest(a: {val name: String; val age: Int},t) res28: AnyRef{val name: String; val age: Int} = $anon$1@32057e6

4.1.2 Implementation

Since the JVM does not support structural typing natively, Scala realizes this feature by using reflection and polymorphic inline caches [32]. To be able to pass any conforming object to a structural reference, the type of such references is erased to type Object during compilation. When a method is called on the object, Scala's type system knows that the runtime class implements the method and that it can be safely called, but to convince the JVM of this fact a reflective call is needed. A method call a.f(b, c), where a is of a structural type, and b and c are of type B and C respectively, is thus mapped to:

a.getClass
  .getMethod("f", Array(classOf[B], classOf[C]))
  .invoke(a, Array(b, c))

In [32] it is noted that such a reflective call is about 7 times slower than a regular call and that most of the time is spent in the getMethod lookup. Thus, to improve runtime performance a strategy using polymorphic inline caches is employed. The method handle is cached at each call site using the receiver's class as key. The getMethod call is replaced by a cache lookup, and reflection need only be performed the first time a method is called on a certain class.

The cache is implemented as a linked list, and so the lookup time grows linearly with the degree of polymorphism at the call site. For monomorphic and moderately polymorphic call sites, however, the caching mechanism is found to be satisfactory and a good alternative to the generative technique used by Whiteoak v.1 [32, 6].
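The following is a minimal sketch of this caching strategy for the run method of the Runner example above; it is written out by hand for illustration and is not the code the compiler actually generates:

import java.lang.reflect.Method

object RunCallSite {
  // One linked-list entry per receiver class seen at this call site.
  private final case class Entry(cls: Class[_], m: Method, next: Entry)
  @volatile private var cache: Entry = null

  def invokeRun(receiver: AnyRef): Unit = {
    val cls = receiver.getClass
    // Linear scan of the cache, mirroring the behaviour described above.
    var e = cache
    while (e != null && e.cls != cls) e = e.next
    val m =
      if (e != null) e.m
      else {
        val found = cls.getMethod("run") // reflective lookup, done once per receiver class
        cache = Entry(cls, found, cache)
        found
      }
    m.invoke(receiver)
  }
}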

4.2 scala-records v0.3

The scala-records library uses structural refinement types on the type level and hash maps on the value level. Whitebox macros are used to translate field access to direct

hash map lookups, instead of the reflective calls the Scala compiler would normally use for the refinement types. The essence of the approach is to translate record creation like

val r = Rec(name="Mme Tortue", age=123)

to a structural refinement of the trait Rec, adding name and age getter methods as well as a data container _data in the form of a HashMap:3

val r = new Rec {
  private val _data = HashMap[String,Any]("name"->"Mme Tortue", "age"->123)
  def name: String = macro selectField_impl
  def age: Int = macro selectField_impl
  // [other methods: toString, hashCode, dataExists etc...]
}

Since the name and age methods are implemented using macros, field access will not be compiled to reflective calls. Instead, field access such as

val n = r.name

is expanded by the selectField_impl macro to

val n = r._data("name").asInstanceOf[String]

Thus, reflection is avoided and we get the same runtime performance as for a HashMap.4 Supported Scala versions are 2.10.x and 2.11.x.

4.2.1 Basic Features

Create Records are created either by using named arguments:

scala> val r = Rec(name="Mme Tortue", age=123)
r: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

or using associations:

val r = Rec("name"->"Mme Tortue", "age"->123)
r: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

The resulting type of r is a refinement of Rec, as revealed by the REPL results above.

3This code is simplified and "re-sugared" for clarity. The compiler flag -Ymacro-debug-lite was used to inspect the real macro-expansion.
4Modulo a slight overhead from burying the hash lookup inside a series of interface calls in the actual implementation, see Section 5.2.

Access A record's fields are accessed the same way as regular class fields using dot-notation:

scala> val n = r.name n: String = Mme Tortue

Equality Equality is by value:

scala> Rec(name="Mme Tortue") == Rec(name="Mme Tortue") res5: Boolean = true

Pattern Matching Pattern matching can be used to extract fields from a record (only in Scala 2.11.x):

scala> val n = r match { case Rec(name) => name } n: String = Mme Tortue

Type-safety Representing the record as a structural refinement automatically gives basic type safety; return types are checked:

scala> val n: Int = r.name
:15: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

scala> val a = r.address :15: error: value address is not a member of records.Rec{def name: String; def age: Int}

4.2.2 Lack of Explicit Types

Unfortunately, by implementing the fields as def-macros it is not possible to express record types explicitly. The following compiles happily in both Scala 2.10.x and 2.11.x

scala> val s: Rec{def name: String; def age: Int} = r
s: records.Rec{def name: String; def age: Int} = Rec { name = Mme Tortue, age = 123 }

but blows up at runtime when a field is accessed on the structurally typed reference s:

scala> s.name

warning: there was one feature warning; re-run with -feature for details java.lang.NoSuchMethodException: $anon$1.name()

The feature warning reveals what goes on: After the assignment to s, the compiler no longer translates field access to macro expansion, but rather calls the name method using reflection (hence the feature warning). Since def-macros do not generate any actual code at the declaration site, the JVM is right: there really is no such method declared on the class of s. The issue is known as SI-7340 and is currently open (for all supported versions of Scala 2.10 and 2.11, that is, up to Scala 2.10.6 and 2.11.11).

Effects on Subtyping The lack of ability to express types explicitly puts rather severe limitations on how the library can be used. Subtyping expressions and up-casts such as the following will not work since the parent type cannot be expressed:

scala> val s: Rec{def name: String} = r

This also affects the possibility to define functions with record parameters:

def getName(x: Rec{def name: String}) = x.name

scala> getName(r) java.lang.NoSuchMethodException: $anon$1.name()

As long as the record types are inferred rather than stated explicitly, however, subtyping works as expected. For example, the following works:

scala> var s = Rec(name="Achilles") s: records.Rec{def name: String} = Rec { name = Achilles }

scala> s = r s: records.Rec{def name: String} = Rec { name = Mme Tortue, age = 123 }

scala> s.name res5: String = Mme Tortue

Thus, the subtyping itself works; we are just not allowed to express it explicitly in client code.

Effects on Bounded Quantification There is no documented support for parametric polymorphism, and the following basic application of Scala generics also breaks due to SI-7340:

def getName[R <: Rec{def name: String}](x: R) = x.name

scala> getName(r) java.lang.NoSuchMethodException: $anon$1.name()

4.2.3 Other Features

There is no documented, or otherwise known to the author, support for extension, restriction, updating or renaming of fields. There are however other features worth mentioning.

Case class conversion A record can be converted to a case class instance (explicitly, as well as implicitly if records.RecordConversions is imported), provided that the case class corresponds to a structural supertype of the record:

case class Tortoise(name: String, age: Int)

scala> val c = r.to[Tortoise]
c: Tortoise = Tortoise(Mme Tortue,123)

and if the fields do not match it is a compile-time error:

var s = Rec(name="Achilles")

scala> s.to[Tortoise]
:18: error: Converting to Tortoise would require the source record to have the following additional fields: [age: Int].

The conversion is one-directional however, and a record cannot automatically be created from a case class instance.

Backend Agnostic Another interesting feature of the scala-records library is that it is designed so that it is easy to provide a custom backend for storing and fetching the actual data. By extending the core classes of the library, the default _data hash map seen above may be overridden. An example use-case given in the documentation is to use scala-records as an interface for type-safe database queries.

IDE Support Eclipse IDE has support for whitebox macros, and since the fields are declared as methods they are included in the autocompletion feature. IntelliJ on the other hand relies on static code analysis and does not support whitebox macros.

4.3 scala-records v0.4

In scala-records v0.4 the record type signature has changed, and the refinement type has moved inside a type parameter on the Rec trait. What was Rec{def name: String; def age: Int} in scala-records v0.3 is now Rec[{def name: String; def age: Int}]. Most importantly, this solves the SI-7340 problem so that types can now be written explicitly, opening up true structural subtyping capabilities. The new approach works as follows:5 Record creation

val r = Rec(name="Mme Tortue", age=123)

5Again, the description is somewhat simplified for increased conceptual clarity.

is still translated to a structural refinement of the Rec trait, but it looks a bit different:

val r = new Rec[{def name: String; def age: Int}] {
  private val _data = Map[String, Any]("name"->"Mme Tortue", "age"->123)
}: Rec[{def name: String; def age: Int}]

Note that the name and age methods are no longer present, and the refinement merely injects the hash map holding the actual data. Instead, field selection is implemented through an implicit conversion macro declared on the companion object:

object Rec extends Dynamic {
  // [... record creation using dynamics etc...]

  implicit def fld[Fields](rec: Rec[Fields]): Fields = macro accessRecord_impl[Fields]
}

If a field is accessed like r.name the implicit macro expands the record reference r to a new structural refinement by inspecting the type signature in the Fields type parameter. This new refinement has the original record r embedded as a private value, but otherwise looks more like the old scala-records v0.3 refinement with name and age methods declared. Since this expansion creates an object that implements the name method, it is accepted as a valid conversion by the implicit resolution algorithm and we get:

val n = (new {
  private val __rec = r
  def name: String = macro selectField_impl[String]
  def age: Int = macro selectField_impl[Int]
}).name

This expression is then transformed in a second macro expansion of selectField_impl into the actual hash-lookup:

val n = r._data("name").asInstanceOf[String]

Again, yielding HashMap performance for field access.

4.3.1 Basic Features

Records are created as before, and the new type signature is visible in the REPL:

scala> val r = Rec(name="Mme Tortue", age=123) r: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

Otherwise, all the basic features from scala-records v0.3 are unchanged.

4.3.2 Explicit Types

Now, record types can be written explicitly without a problem:

scala> val s: Rec[{def name: String; def age: Int}] = r s: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

scala> s.name res2: String = Mme Tortue

The s reference no longer has a structural refinement type, and the underlying object does not even implement the name method. Thus, the implicit conversion takes over at field access, and the access can be achieved without reflection as described above.

Subtyping With SI-7340 out of the way, we get access to full structural subtyping:

def getName(x: Rec[{def name:String}]) = x.name

scala> val n = getName(r) n: String = Mme Tortue

Bounded Quantification There is no documented support for parametric polymorphism, but the following basic use of generics now works as expected

def getName[R <: Rec[{def name: String}]](x: R) = x.name

scala> getName(r) res5: String = Mme Tortue

This allows us to implement the oldest function from before:

def oldest[R <: Rec[{def age: Int}]](a: R, b: R) = if (a.age >= b.age) a else b

val a = Rec(name="Achilles", age=24) val b = Rec(name="Mme Tortue", age=123)

scala> val o = oldest(a,b) o: records.Rec[AnyRef{def name: String; def age: Int}] = Rec { name = Mme Tortue, age = 123 }

Note that the returned record type has the name field intact!

Figure 4.1: Eclipse whitebox macro support is broken for scala-records v0.4

4.3.3 Other Features

The feature set is, as far as the author can tell, otherwise unchanged, with one exception: Unfortunately, the Eclipse IDE6 whitebox support is no longer enough to support the field access, see Fig 4.1. It appears as though the second macro expansion does not get the AST from the first expansion as input, but rather the AST from before the first expansion, and therefore fails. This does not seem to be a fundamental issue however, and can be fixed by providing a custom macro expansion implementation for Eclipse. IntelliJ support is unchanged.

4.4 Compossible

Where scala-records used structural refinement types to represent record fields, Compossible instead uses Scala's notion of compound types. A field is represented as a tuple of its label and its type, and a label is in turn represented by its singleton string type. For example, the field age: Int is represented as the Scala type Tuple2[String("age"), Int], where String("age") is the singleton type with only one inhabitant: the string "age". Current versions of Scala do not openly support singleton types [21], but the typer by default assigns them to constant literals and so they can be read and created using whitebox macros. The collection of fields f1: T1, f2: T2, ..., fn: Tn is represented by the compound type (String("f1"), T1) with (String("f2"), T2) with ... with (String("fn"), Tn), using customary tuple notation. Similar to scala-records v0.4, Compossible then represents an actual record type as a common base class Record[+Fields] with the field compound in the covariant type parameter, and with the actual data in a hash map. Whitebox macros are used to translate record creation, access and other record operations to value-level operations on the hash map and type-level operations on the type parameter.

4.4.1 Creation through Extension through Concatenation

Compossible records are extensible, and records are also created by starting from a record with a single field and then adding fields one by one. A record with a name and age field

6Scala IDE build of Eclipse SDK, v4.5.0

is created by the following syntax:

val r = Record name "Mme Tortue" age 123

Here, the method name is first called on the Record class' companion object. This object extends Dynamic and so the call is translated to an applyDynamic call, in turn implemented by a macro:

object Record extends Dynamic {
  def applyDynamic[K <: String](key: K)(value: Any): Record[(String, Any)] = macro createMacro[K]
  // [... other methods ...]
}

The createMacro then goes on to create a record with the name field stored in a HashMap, and we end up with:

val r = (new Record[(String("name"), String)](Map("name" -> "Mme Tortue"))) age 123

Now, the age field is added to the record by calling applyDynamic on the Record class:

class Record[+T <: (String,Any)](val values: Map[String, Any]) extends Dynamic {
  def applyDynamic[K <: String](key: K)(value: Any): Record[(String, Any)] = macro appendFieldMacro[K]
  // [... other methods, incl "def &", see below ...]
}

This appendFieldMacro is in turn implemented, not as record extension, but as the more general record concatenation, or merge, operation implemented by the & method. That is, a new record is created containing the single age field

val r = new Record[(String("name"), String)](Map("name" -> "Mme Tortue")) &
        new Record[(String("age"), Int)](Map("age" -> 123))

and then concatenated with the first record. The concatenation method is implemented on the Record class as

def &[O <: (String,Any)](other: Record[O]) = new Record[T with O](values ++ other.values)

and performs the merge without using any macro magic, just relying on Scala compound types and hash map merge.

4.4.2 Extension and (Unchecked) Update

The mechanism used internally by the library above to create a record with multiple fields is also the mechanism used to extend an existing record with additional fields. The record r from above can be extended using the & method like so:

scala> val s = r & (Record phone "+4670123456" address "Zenos road 42") s: Record[(String("name"), String) with (String("age"), Int) with (String("phone"), String) with (String("address"), String)] = Record(Map(name -> Mme Tortue, age -> 123, phone -> +4670123456, address -> Zenos road 42))

Due to the way the concatenation is implemented, this mechanism can also be used to update an already existing field.

scala> val s = r & (Record age (r.age+1)) s: Record[(String("name"), String) with (String("age"), Int) with (String("age"), Int)] = Record(Map(name -> Mme Tortue, age -> 124))

scala> s.age res9: Int = 124

However, the somewhat strange type signature above reveals that this update is actually not type-safe, since Scala does not employ the required overwrite semantics for compound types. By changing the type of an already existing label, while keeping the old type signature, an exception can be triggered:

scala> var r = Record age "very old" r: Record[(String("age"), String)] = Record(Map(age -> very old))

scala> r = r & (Record age 123) r: Record[(String("age"), String)] = Record(Map(age -> 123))

scala> r.age java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String

4.4.3 Access and Select

As already seen above, the syntax for field access is through dot-notation. Since the Record class extends Dynamic, an access like r.name is translated to a selectDynamic call. This is in turn implemented by a lookupMacro that inspects the type of the prefix record r, and if a field with the right label exists, translates the access into a hash map lookup like

r.values("name").asInstanceOf[String]

Thus, field access is type checked:

scala> val n: Int = r.name
:12: error: type mismatch;
 found   : String
 required: Int

and it is a compile error to access non-existent fields:

scala> val a = r.address :14: error: Record has no key .address

Besides accessing a single field, it is also possible to select multiple fields at once, thus projecting a record to a new record with a subset of the fields of the original record. To achieve this, a class called select is provided that represents the projection. A select object is built in a fashion similar to how records are created, but consists of a compound of stand-alone labels rather than whole fields:

val r = Record name "Mme Tortue" age 123 phone "+4670123456" address "Zenos road 42"

scala> val s = (select name & phone) s: select[String("name") with String("phone")] = select@61c3767e

scala> r(s) res2: Record[(String("name"), String) with (String("phone"), String)] = Record(Map(name -> Mme Tortue, phone -> +4670123456))

As demonstrated above, a record is projected by calling apply on it with a select object. This allows a very compact syntax where the above projection could be written inline as r(select name & phone).

4.4.4 Explicit Types

Since labels are represented by singleton string types, Compossible types cannot be written explicitly in the program text.7 Instead a special class RecordType is provided as a means of generating record types. The type corresponding to the r record above may be generated by first creating an instance of RecordType, and then accessing the path-dependent type Type on this instance.

scala> val rt = (RecordType name[String] & age[Int] &)8
rt: RecordType[(String("name"), String) with (String("age"), Int)] = RecordType@4bbad28f

scala> type NameAndAge = rt.Type defined type alias NameAndAge

7This will be possible in future Scala versions though, see [21].
8Here, the & symbols are not concatenation methods as before but actually just dummy arguments that must be provided to overcome a restriction in Scala's syntax disallowing type application for postfix operators.

scala> val r: NameAndAge = (Record name "Mme Tortue" age 123) r: NameAndAge = Record(Map(name -> Mme Tortue, age -> 123))

The above construction works similarly to how a record and a select are created. By chaining applyDynamic calls implemented as macros, a RecordType instance is created with the compound field representation in its type parameter. The corresponding record type can then be accessed through the Type member, which is defined to reflect the fields in the RecordType's type parameter:

class RecordType[T <: (String, Any)] extends Dynamic {
  type Type = Record[T]

  // [... other methods ...]
}

4.4.5 Polymorphism

Equipped with a way of expressing types, the subtyping and parametric polymorphism support can be investigated.

Subtyping The compound types satisfy permutation, width and depth subtyping relations, and using a RecordType instance we can for example define the getName function from before as:

val rt = (RecordType name[String] &)
def getName(r: rt.Type) = r.name

scala> getName(Record name "Mme Tortue" age 123) res8: String = Mme Tortue

Bounded Quantification Parametric polymorphism with bounded quantification is not supported however. The field access macro inspects the type parameter to determine if a field is present, but using bounded quantification the type of the accessed record is just an abstract parameter R and the macro actually breaks down with an exception:

val rt = (RecordType age[Int] &)

scala> def oldest[R <: rt.Type](a: R, b: R): R = if (a.age >= b.age) a else b :14: error: exception during macro expansion: java.util.NoSuchElementException: head of empty list ... def oldest[R <: rt.Type](a: R, b: R): R = if (a.age >= b.age) a else b ^

This does not seem to be a fundamental issue with using compound types however. Field access could presumably be implemented by an implicit materialization macro

similar to the one scala-records v0.4 uses, or by detecting in the access macro that the record type is in fact a generic type parameter and in that case inspecting the parameter's declared upper type bound.

Least Upper Bounds The LUB inference problem of refinement types also applies to compound types. It works as long as one type is a direct supertype of the other:

val a = Record name "Achilles"
val t = Record name "Mme Tortue" age 123

scala> if (true) t else a
res3: Record[(String("name"), String)] = Record(Map(name -> Mme Tortue, age -> 123))

But if the LUB is some different type, things break down.

val a = Record name "Achilles" height 1.88
val t = Record name "Mme Tortue" age 123

scala> if (true) t else a
:15: error: type arguments [Product with Serializable] do not conform to class Record's type parameter bounds [+T <: (String, Any)]

Again, the typer needs a little help from a friend:

val rt = (RecordType name[String] &)

scala> if (true) t else a: rt.Type
res5: rt.Type = Record(Map(name -> Mme Tortue, age -> 123))

4.4.6 Other Features

Equality Compossible does not implement any particular equality check, so equality is by reference:

val r = Record name "Mme Tortue"
val s = Record name "Mme Tortue"

scala> r == s
res13: Boolean = false

Case class and Tuple conversion In contrast to scala-records, a Compossible record can be created from a case class but not converted back to one. A record can however be converted to a tuple.

case class Person(name: String, age: Int)
val p = Person("Mme Tortue", 123)

scala> val r = Record.fromCaseClass(p)
r: Record[(String("name"), String) with (String("age"), Int)] = Record(Map(name -> Mme Tortue, age -> 123))

scala> val t = Record.tuple(r) t: (String, Int) = (Mme Tortue,123)

IDE Support Due to the use of whitebox macros, the situation is the same as for scala-records; Eclipse can infer the types correctly whereas IntelliJ lacks support. In contrast to scala-records, Compossible lacks autocompletion in Eclipse since field access is implemented through selectDynamic rather than field selection on a refinement.

4.5 Shapeless 2.3.2

Shapeless is an extensive library for generic programming in Scala with a broad array of use-cases.9 At the core of many of the library's features is a rich implementation of the heterogeneous list (HList) data type, and one of the many features built on top of this is an implementation of extensible records. Covering the entire shapeless library is well outside the scope of this thesis, and the following overview focuses exclusively on the parts of the library involving these records.

4.5.1 HList Records

An HList is, as the name suggests, a linked list where each element may have a unique type. A minimal implementation can be realized in Scala using traits and case classes as follows:

trait HList
case class HCons[+H, +T <: HList](head: H, tail: T) extends HList
case class HNil() extends HList

Each HCons element contains a value in the head field and a link to the rest of the list in the tail field. A simple instance with a string element of value "Mme Tortue" and an integer element of value 123 can be constructed as

HCons("Mme Tortue", HCons(123, HNil()))

with resulting type

HCons[String, HCons[Int, HNil]]

By tagging each element's head type with a label, a record-like data type can be constructed. Shapeless provides a trait KeyTag[L, V] where L is a type-level representation of a field's label and V is the field value's type. Using string singleton types as labels, the field (age: Int) is represented by a KeyTag as:

9See for example https://github.com/milessabin/shapeless/wiki/Built-with-shapeless for a list of projects that use shapeless in one way or another.

Int with KeyTag[String("age"), Int]

Here the first Int is the type of the field, and the tagging is accomplished by creating a compound type with the KeyTag. By linking such tagged values, a record can be created. For example, the record {name="Mme Tortue", age=123} is represented on the value level exactly as the HList above:

HCons("Mme Tortue", HCons(123, HNil()))

but with labeled type

HCons[String with KeyTag[String("name"), String],
  HCons[Int with KeyTag[String("age"), Int],
    HNil]]

Note however that the above implementation does not provide a way of actually creating the records so that they get the suggested typing, nor how to access the fields by label. Shapeless fills in these missing pieces by a clever use of type classes, implicit conversions and whitebox macros.

4.5.2 Create

A shapeless record can be created in several different ways. First off, the labels are not actually limited to strings only, but can be any type that has a singleton type representation, such as integers, symbols10 and objects. To keep this presentation simple, only the case of string labels will be treated however. Given this choice of label type, one way of creating a record is by using Shapeless' arrow operator ->>.

scala> val r = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil r: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

Several things are worth noting here. First, Shapeless' version of the HCons class above is named ::, analogous to Scala's built-in (homogeneous) list constructor. Second, the record fields are linked using a constructor method with the same name :: that may be written as a right-associative infix operator due to Scala's convention for method names ending with colons. And lastly, the arrow operator is called on each string label with the field value as an argument, although Scala's Strings do not define this operator. It is instead defined by Shapeless using an implicit materializer macro that converts each string label to an instance of a class named SingletonOps that does implement the ->> method. The return value of this method is the field value tagged by the appropriate KeyTag.

10Symbols do not have singleton types in current Scala, but Shapeless provides a workaround where singleton symbol types are represented as wrapped String singleton types.

In the case of the age field:

123.asInstanceOf[Int with KeyTag[String("age"), Int]]

The infix method :: then links the values together to create the final record.

Alternative record creation Another way of creating a record is to use the Dynamic trait's applyDynamicNamed method, exactly as scala-records does:

val r = Record(name="Mme Tortue", age=123)
r: ::[String with KeyTag[tag.@@[Symbol,String("name")],String], ::[Int with KeyTag[tag.@@[Symbol,String("age")],Int],HNil]] = Mme Tortue :: 123 :: HNil

However, as can be seen in the resulting type signature, this instead creates a record using symbols as labels.

Equality Since the records are built from case classes, equality is automatically by value:

scala> ("name" ->> "Mme Tortue") :: HNil == ("name" ->> "Mme Tortue") :: HNil res3: Boolean = true

4.5.3 Field Access

Field access is expressed as function application directly on the record with the label's string literal as key:

scala> val a = r("age")
a: Int = 123

or equivalently by calling get on the record:

scala> val n = r.get("age") n: Int = 123

Again, the record does not implement these methods, but the :: element that r refers to is implicitly converted to an instance of a class named RecordOps, which does implement them:

class RecordOps[L <: HList](val l : L) {
  def get(k: Witness)(implicit selector: Selector[L, k.T]): selector.Out = selector(l)
  def apply(k: Witness)(implicit selector: Selector[L, k.T]): selector.Out = selector(l)
  // ... other
}

To be able to call these methods however, the string literal "age" has to be converted to an instance of Witness and implicit resolution has to find a suitable instance of Selector[L, k.T] for the record type L and witness type T, both described below.

The Witness This trait bridges the gap between label literals and their singleton type level representation. The Witness trait is declared as

trait Witness { type T; val value: T {} }

and holds both the type-level representation of a field in the abstract type T and its value-level representation in the value field. Implicit materialization macros are used to create Witnesses from label literals. In the example above, the "age" literal is implicitly converted to an instance of

Witness { T = String("age"); value = "age" }

The Selector The Selector[L, K] trait implements a type class providing the method apply(l: L) that takes a record of type L and returns the value for the field with label K, cast to the right field type. When an implicit selector of type Selector[L, K] is needed, an implicit materialization macro instantiates it provided that the label K is present in the record L. Otherwise an implicit-not-found error "No field $K in record $L" is generated. In the field access example above, the following selector is created:

Selector[::[String with KeyTag[String("name"),String],
           ::[Int with KeyTag[String("age"), Int],
             HNil]],
         String("age")] {
  type Out = Int
  def apply(l: ::[String with KeyTag ... , HNil]): Int = HList.unsafeGet(l, 1)
}

where HList.unsafeGet(l, i) gets the element at index i from record l.

Putting it all together When r("age") is called, r is converted to a RecordOps that implements the apply method. The label "age" is converted to a Witness holding its type-level representation. Furthermore, an implicit selector of type Selector[::[String with KeyTag..., HNil], String("age")] is materialized by a macro. The selector has an apply method that takes an HList as argument and returns the value stored at index 1, cast to an Int. RecordOps(r).apply("age") calls this selector with r, and 123: Int is returned.
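Hand-expanded for illustration (not actual compiler output, and with ageWitness and ageSelector as hypothetical names for the materialized Witness and Selector instances), the whole chain amounts to:

val n: Int = new RecordOps(r).apply(ageWitness)(ageSelector) // = HList.unsafeGet(r, 1), the Int stored at index 1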

Type Safety As noted above, the return value of field access is cast to the right type, giving type safety for field access:

scala> val n: Int = r("name") :23: error: type mismatch; found : String required: Int

Furthermore, the implicit resolution only works for the Selector trait if the accessed field exists on the record type, and so it is a compile-time error to access nonexistent fields:

scala> val a = r("address") :23: error: No field String("address") in record ::[String with... HNil]

4.5.4 Explicit Types

Since Shapeless record types rely on singleton types for the labels, they cannot be written explicitly in the program text. Shapeless provides at least two different ways to circumvent this difficulty. First, it is possible to create Witnesses for the labels and get the type representation from the type member T:

val ageLabel = Witness("age")
val nameLabel = Witness("name")
type rt = ::[String with KeyTag[nameLabel.T,String], ::[Int with KeyTag[ageLabel.T,Int], HNil]]

scala> val r: rt = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil r: rt = Mme Tortue :: 123 :: HNil

As this is rather verbose and cumbersome to write, Shapeless also provides another way of expressing explicit types, using backticks to embed a record type in a path-dependent type:

type rt = Record.`"name" -> String, "age" -> Int`.T

scala> val r: rt = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil r: rt = Mme Tortue :: 123 :: HNil

Since the Record object extends Dynamic, the above path will result in a call to selectDynamic with the embedded record type as a string argument. This method is in turn implemented by a whitebox macro that creates a type carrier in the form of a dummy instance of Unit, (), cast to an anonymous refinement type with the desired record type in a type member T. The following pseudo-code illustrates the end result:

type rt = ( ().asInstanceOf[{ type T = {name: String, age: Int}}] ).T

In contrast to the RecordType class used by Compossible and the approach using Witnesses above, these embedded types can be expressed inline in type expressions, for example directly in a function type signature:

def getName(r: Record.`"name" -> String, "age" -> Int`.T): String = r("name")

However, the embedded types are limited to "standard" types; it is not possible to refer to fields holding custom class types or nested records.

class A

scala> type rt = Record.`"a" -> A`.T :20: error: Malformed literal or standard type A

4.5.5 Subtyping

The HLists are fundamentally ordered, and so permutation subtyping is not provided. The limited form of width subtyping described in section 2.2.1 is not provided either, as the :: class is not a subtype of HNil:

scala> val s: Record.`"name" -> String`.T = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil :20: error: type mismatch; found : ::[String with KeyTag[String("name"),String], ::[Int with ... , HNil]] required: ::[String with KeyTag[String("name"),String], HNil]

The elements are covariant in their value types however, so depth subtyping is provided:

class A
class B extends A
val fld = Witness("fld")

scala> val r: ::[A with KeyTag[fld.T, A], HNil] = ("fld" ->> new B) :: HNil
r: ::[A with KeyTag[fld.T,A], HNil] = B@9c2b45e :: HNil

4.5.6 Parametric Polymorphism

Without permutation or width subtyping, it is not meaningful to express parametric polymorphism through bounded quantification. The solution is to instead use the Selector type class, and provide it along with argument records:

val nameLabel = Witness("name")
def getName[L <: HList](r: L)(implicit sel: Selector[L, nameLabel.T]): sel.Out = r("name")

scala> getName(r) res27: String = Mme Tortue

At each site where the function is applied to a record, implicit resolution will provide an implicit Selector capable of accessing the name field of all HLists of type L, provided that the name field exists. When the field is then accessed on the record in the function body, this selector will be in scope for the implicit resolution process described above for field access. This way, field selection can be carried out in the polymorphic context exactly as in the monomorphic case above where all fields were known.

Note that this approach has strong similarities to the one suggested by Ohori [14]: The implicit selector can be viewed as a constraint or predicate on the type parameter, and the index to use for field selection is actually embedded in the implicit selector as a closure for each call site, similar to Ohori's indexing abstractions.

4.5.7 Other Type Classes

Shapeless does not only provide a type class for field selection, but supports various other record operations such as extension, restriction, update, relabeling, merge etc. All these features are implemented in the same consistent way, following the example of field access through the Selector class above: The method (select, update, ...) requires an implicit class instance (Selector, Updater, ...) that implements the behavior for a particular record type (select field at index i, ...). This instance is in turn created by an implicit materializer macro, provided that the operation can be performed (field exists, ...). The following is a short summary of the provided features and their syntax.

Extension Records can be extended by using the + operator:

scala> val s = r + ("address" ->> "Elea 42") s: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], ::[String with KeyTag[String("address"),String], HNil]]] = Mme Tortue :: 123 :: Elea 42 :: HNil

The corresponding type class is Updater, and a record can be extended in a polymor- phic context by passing along an implicit parameter of this type:

val addressLabel = Witness("address") type AddressStringField = String with KeyTag[addressLabel.T, String]

def addAddress[R <: HList](r: R)(implicit updater: Updater[R, AddressStringField]) :updater.Out = r + ("address"->>"Elea 42")

scala> val s = addAddress(r)
s: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], ::[String with KeyTag[String("address"),String], HNil]]] = Mme Tortue :: 123 :: Elea 42 :: HNil

Note that the result type is represented as the path dependent type updater.Out on the Updater instance.

Restriction Fields can be removed using the - operator:

scala> val anon = r - "name" anon: ::[Int with KeyTag[String("age"),Int], HNil] = 123 :: HNil

The corresponding type class is Remover, but for unclear reasons it is defined so that a function taking such an implicit seems to require the following definition:

def removeName[Out <: HList, R <: HList](r: R)
    (implicit remover: Remover.Aux[R, nameLabel.T, (String, Out)]): Out = remover(r)._2

scala> val anon = removeName(r)
anon: ::[Int with KeyTag[String("age"),Int],HNil] = 123 :: HNil

Update If a field already exists when using the extension operator +, the value will be updated:

scala> val s = r + ("age" ->> (r("age") + 1)) s: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 124 :: HNil

If the type of the new value is different from the old one, the new value will be stored last in the record while keeping the old one:

scala> val s = r + ("age" ->> "very old") s: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], ::[String with KeyTag[String("age"),String], HNil]]] = Mme Tortue :: 123 :: very old :: HNil

This has the perhaps surprising consequence that when the label is subsequently accessed, the old value, which is "to the left" in the record, will be returned instead of the new one:

scala> s("age") res30: Int = 123

To guard against this behavior, there is also a replace method that requires the new value to be of the same type, and otherwise there is a compile-time error:

scala> val s = r.replace("age","very old")
:24: error: could not find implicit value for parameter ev: Selector[::[String with KeyTag[String("name"),String],

::[Int with KeyTag[String("age"),Int], HNil]], String("age")]{type Out = String}

This is achieved in a polymorphic context by taking both a Selector and an Updater for the updated field:

def birthday[R <: HList](r: R)
    (implicit sel: Selector.Aux[R, ageLabel.T, Int],
     updater: Updater[R, Int with KeyTag[ageLabel.T, Int]]): updater.Out =
  r + ("age" ->> (r("age") + 1))

scala> val s = birthday(r)
s: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 124 :: HNil

The selector has to be of type Selector.Aux instead of just Selector to specify that the type of the age field is Int. The reason for this was not investigated further.
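For reference, Aux follows the common pattern sketched below (shown schematically under the assumption that Shapeless defines it this way for Selector); it exposes the Out type member as a type parameter, which is what allows the field type to be pinned to Int in the signature above:

type Aux[L <: HList, K, Out0] = Selector[L, K] { type Out = Out0 }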

Relabel Using the renameField method an existing label can be changed to another one:

scala> val s = r.renameField("name", "nick") s: ::[String with KeyTag[String("nick"),String], ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

The corresponding type class is called Renamer:

val nickLabel = Witness("nick")
def nameToNick[R <: HList](r: R)(implicit renamer: Renamer[R, nameLabel.T, nickLabel.T]): renamer.Out =
  r.renameField("name", "nick")

scala> val s = nameToNick(r) s: ::[String with KeyTag[String("nick"),String], ::[Int with KeyTag[String("age"),Int], HNil]] = Mme Tortue :: 123 :: HNil

scala> s("nick") res24: String = Mme Tortue

Merge Two records can be concatenated with overwrite from the right using the merge method:

val r = ("name" ->> "Mme Tortue") :: ("age" ->> 123) :: HNil
val s = ("name" ->> "Achilles") :: ("height" ->> 1.88) :: HNil

scala> val t = r.merge(s)
t: ::[String with KeyTag[String("name"),String], ::[Int with KeyTag[String("age"),Int], ::[Double with KeyTag[String("height"), Double], HNil]]] = Achilles :: 123 :: 1.88 :: HNil

The corresponding type class is Merger.

Other Other features include the possibility to convert a record to an HList containing only the labels using .keys, only the values using .values, or label-value pairs using .fields. A record can also be converted to its corresponding untyped Map[String,Any].
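For illustration, using the record r from above (result types are elided; the toMap conversion is an assumption about the record syntax, while keys, values and fields are the methods named in the text):

import shapeless.record._

val labels  = r.keys    // "name" :: "age" :: HNil
val values  = r.values  // Mme Tortue :: 123 :: HNil
val pairs   = r.fields  // ("name", Mme Tortue) :: ("age", 123) :: HNil
val untyped = r.toMap   // Map(name -> Mme Tortue, age -> 123)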

4.5.8 HCons Extension

Besides using the Updater type class it is also possible to add more fields to an existing record by using the :: (HCons) method. The following is a working approach to extending an HList record by adding a new element to its head:

val ageLabel = Witness("age")
type AgeIntField = Int with KeyTag[ageLabel.T, Int]
def addAge[R <: HList](r: R): AgeIntField :: R = ("age"->>123) :: r

val r = "name" ->> "Mme Tortue" :: HNil

scala> val s = addAge(r) s: ::[AgeIntField, ::[String with KeyTag[String("name"),String], HNil]] = 123 :: Mme Tortue :: HNil

This extension is completely unchecked however, and already existing fields will naturally still be present in the extended record:

val r = "age" ->> "very old" :: HNil

scala> val s = addAge(r) s: ::[AgeIntField, ::[String with KeyTag[String("age"),String], HNil]] = 123 :: very old :: HNil

Depending on the application, this might actually be a feature rather than a disadvantage; the new field will have precedence over the old field on subsequent field access:

scala> s("age")
res3: Int = 123

and the old value can be restored by removing the added field:

scala> val t = s - "age" t: ::[String with KeyTag[String("age"),String], HNil] = very old :: HNil

This functionality is similar to the extensible records with scoped labels suggested by Leijen [34] where records are extended by adding new values to a stack for each label and restricted by popping this stack.

4.6 Dotty’s New Structural Refinement Types

In Dotty, the way methods are called on structural refinement types has changed. The new implementation is described by Odersky [25] and summarized below.

4.6.1 Implementation

Let r be a value of structural refinement type C { fields } where C is a class type and fields is a set of declarations refining C. Furthermore, let f be a field that is a member of fields but not a member of C. In current Scala, the structural field access r.f is compiled into a reflective call as described in Section 4.1.2. In Dotty, field access is instead translated into the following pseudo-code:

(r: Selectable).selectDynamic("f").asInstanceOf[T]

where Selectable is a trait defined as

trait Selectable extends Any {
  def selectDynamic(name: String): Any
  def selectDynamicMethod(name: String, paramClasses: ClassTag[_]*): Any =
    new UnsupportedOperationException("selectDynamicMethod")
}

The cast to Selectable succeeds if C extends Selectable or if there exists some implicit conversion method from C to Selectable. Either way, the field access logic is handed over to the provided implementation of the selectDynamic method. This allows programmable field access, and it is up to the implementation of the selectDynamic method to take the accessed field's name as a String parameter and return the corresponding value. This is very much like how the selectDynamic method works for the Dynamic marker trait in current Scala, although here the field access is type safe. Method calls that take arguments are instead translated into a call to selectDynamicMethod, returning a method based on the accessed method's name and parameter class tags.

It is still possible to access structural members using Java reflection by importing the implicit conversion method scala.reflect.Selectable.reflectiveSelectable. This method converts any structurally typed object to a scala.reflect.Selectable that implements the call like current Scala does.

As described by Odersky [25], the above compilation scheme for structural types allows a simple record class to be implemented as

case class Record(elems: (String, Any)*) extends Selectable {
  def selectDynamic(name: String): Any = elems.find(_._1 == name).get._2
}

By casting instances of this class to structural refinement types, fields can be accessed through the selectDynamic method. For example, a record r with fields name and age can be created as:

val r = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: Int}]

Since the name and age fields are declared on the refinement type but are not members of the Record class, accessing for example the name field of r is translated into:

(r: Selectable).selectDynamic("name").asInstanceOf[String]

The selectDynamic method is called, the stored name value is found, and the return value is cast to its statically known type String.

Although the above implementation clearly demonstrates the capabilities of the new structural types, it is not very efficient. The access time of the elems list is linear in the number of stored fields, making a string comparison for every field until a match is found. Therefore, another implementation will be considered in the following, where the elems list is replaced by a hash map from the Scala collections library. The Record case class is defined as follows:

case class Record(_data: Map[String, Any]) extends Selectable {
  def selectDynamic(name: String): Any = _data(name)
}

To avoid having to explicitly create an instance of a Map to pass as the _data argument, the following convenience method is provided on the companion object, creating a record with an immutable hash map as data store:

object Record {
  def apply(_data: (String, Any)*) = new Record(_data = HashMap(_data: _*))
}

The following is an overview of the record-like features supported by this approach to records in Dotty.

4.6.2 Basic Features

Create A record is created by calling apply on the companion object and casting the result to a structural refinement of the Record case class:

scala> val r = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: Int}]
val r: Record{name: String; age: Int} = Record(Map(name -> Mme Tortue, age -> 123))

Note that the type signature is cleaner compared to current Scala, with the val declarations removed on the refinement type.

Access Fields are accessed using dot-notation:

scala> r.name val res1: String = "Mme Tortue"

Type-safety Type safety is provided by the structural refinement type and it is a compile-time error to access a non-existent field:

scala> r.address
-- [E008] Member Not Found Error: :12:2 ------
12 |r.address
   |^^^^^^^^^
   |value `address` is not a member of Record{name: String; age: Int}

For the name field that does exist, a call to r.name is translated into r.selectDynamic("name").asInstanceOf[String] so that the return type is type-checked:

scala> val n: Int = r.name
-- [E007] Type Mismatch Error: :11:13 ------
11 |val n: Int = r.name
   |             ^
   |             found:    String
   |             required: Int

As noted by Odersky [25] however, the initial cast to a structural type presents a single point of failure for this type-safety. If the cast is incorrect, everything breaks down:

scala> val e = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Record{val name: String; val age: String}] val e: Record{name: String; age: String} = Record(Map(name -> Mme Tortue, age -> 123))

scala> e.age java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String

Equality Case class and Map equality is by value in Scala/Dotty and since Record is a case class with the data map declared as a parameter, this applies to records as well:

scala> Record("name"->"Mme Tortue").asInstanceOf[Record{val name: String}] == Record("name"->"Mme Tortue").asInstanceOf[Record{val name:String}] val res2: Boolean = true

4.6.3 Polymorphism

In this section the subtyping capabilities are investigated as well as the support for parametric polymorphism.

Subtyping As for current Scala, the structural refinements support permutation, width and depth subtyping:

scala> val s: Record{val age: Any} = r val s: Record{age: Any} = Record(Map(name -> Mme Tortue, age -> 123))

As before, it is therefore possible to define and call the getName function implemented as:

def getName(r: Record{val name: String}) = r.name

scala> getName(r) val res11: String = "Mme Tortue"

Least Upper Bounds The problem with inferring least upper bounds remains from current Scala; it works as long as one type is a direct supertype of the other (same r and s as above):

scala> if (true) r else s val res21: Record{age: Any} = Record(Map(name -> Mme Tortue, age -> 123))

But otherwise all fields are lost:

val t = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[(Record{val name: String; val age: Int})] val a = Record("name"->"Achilles", "height"->1.88).asInstanceOf[Record{val name: String; val height: Double}]

scala> if (true) t else a val res22: Record = Record(Map(name -> Mme Tortue, age -> 123))

Bounded Quantification As for current structural refinement types, parametric polymorphism with bounded quantification is supported:

def oldest[R <: Record{val age: Int}](a: R, b: R): R = if (a.age >= b.age) a else b

// val t: Record{name: String; age: Int} // val a: Record{name: String; age: Int}

scala> oldest(a,t).name val res15: String = "Mme Tortue"

4.6.4 Extension

Disregarding the structural types for a moment, it is possible to add extra fields to a record by using the built-in add operation on the data map:

scala> val s = Record(r._data + ("color"->"green"))
val s: Record = Record(Map(name -> Mme Tortue, color -> green, age -> 123))

or even merge two records with overwrite from the right:

val t = Record("name"->"Mme Tortue", "age"->123) val a = Record("name"->"Achilles", "height"->1.88)

scala> val m = Record(t._data ++ a._data) val m: Record = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

The question is how to represent extension on the type level. There is no documented way of extending a record with additional fields in the description by Odersky [25], but Dotty's new intersection types provide at least a partial solution.

Extension by Intersection In Dotty, the compound type operator with is replaced by the type intersection operator &. As noted in the background, type intersection is commutative and recursive in covariant type members [20]. For record types this implies the desired property that Record{val f1: T1} & Record{val f2: T2} is equivalent to Record{val f2: T2} & Record{val f1: T1} and to Record{val f1: T1; val f2: T2}. Unfortunately, if the extension is in fact an update where the updated field gets a new type, the commutative and recursive property also means that Record{val f: T} & Record{val f: S} is equivalent to Record{val f: T & S} instead of Record{val f: S}. This section investigates both the correct and the incorrect case.

The following experiment with intersection types reveals that Dotty's implicit resolution system is able to prove that the intersection of Record{val name: String; val age: Int} and Record{val name: String; val height: Double} is equivalent to the merged type Record{val name: String; val age: Int; val height: Double}:

type Turtle = Record{val name: String; val age: Int}
type Hero = Record{val name: String; val height: Double}
type Merged = Record{val name: String; val age: Int; val height: Double}

scala> implicitly[Merged =:= (Turtle & Hero)]
val res22: =:=[Merged, Turtle & Hero] =

This allows the merge of Mme Tortue and Achilles above to be typed as:

val t = Record("name"->"Mme Tortue", "age"->123).asInstanceOf[Turtle]
val a = Record("name"->"Achilles", "height"->1.88).asInstanceOf[Hero]

scala> val m = Record(t._data ++ a._data).asInstanceOf[Turtle & Hero]
val m: Turtle & Hero = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

Since Turtle & Hero is equivalent to Merged, we should now be able to access name, age and height on m. It works for the name field that is present on both Turtle and Hero but when accessing one of the non-overlapping fields, the Dotty REPL crashes:

scala> m.name
val res27: String = "Achilles"

scala> m.age
exception while typing m.age of class class dotty.tools.dotc.ast.Trees$Select # 90491
...
[error] (run-main-3) java.lang.AssertionError: NoDenotation.owner
...
[error] (compile:console) Nonzero exit code: 1

unless m is given type Merged explicitly first:

scala> (m: Merged).age
val res0: Int = 123

Provided that this bug is fixed in the future so that the above example works out, there is an even better alternative using the path-dependent types t.type and a.type on the merged records:

scala> val m = Record(t._data ++ a._data).asInstanceOf[t.type & a.type]
val m: a = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

scala> m.name
val res35: String = "Achilles"

scala> m.age
exception while typing m.age of class class dotty.tools.dotc.ast.Trees$Select # 85855
...

This also allows merge to be defined directly on the Record case class, hiding the unsafe cast from client code:

case class Record(_data: Map[String, Any]) extends Selectable {
  def selectDynamic(name: String): Any = _data(name)
  def ++(that: Record) =
    Record(this._data ++ that._data).asInstanceOf[this.type & that.type]
}

scala> val m = t ++ a
val m: a = Record(Map(name -> Achilles, height -> 1.88, age -> 123))

scala> implicitly[m.type <:< Merged]
val res0: <:<[(Turtle(t) & Hero(a))(m), Merged] =

where the last line shows that the type of m is a subtype of Merged, as desired. It is harder to see how to define the extension operator accepting one key-value pair at a time however, since there is no direct way of translating the new field label to its type representation:

def +[T](kv: (String, T)) = Record(this._data + kv).asInstanceOf[this.type & Record{val ???: T}]

As mentioned above however, the ++ method is not entirely correct either. There is no check whether the extension is actually an update, and type intersection only works as expected if the type of the updated field does not change. If the new value is of another type, the correct behavior would be to either refuse to perform the update at compile-time, or overwrite the old type with the new type from the right as is done for the values. But type intersection is commutative and instead the intersection is applied recursively to the updated field, making it an intersection of the new and old type. This leads to an incorrect cast with runtime errors down the line:

val o = Record("age"->"very old").asInstanceOf[Record{val age: String}]

scala> val e: Record{val name: String; val age: Int & String} = t ++ o
val e: Record{name: String; age: Int & String} = Record(Map(name -> Mme Tortue, age -> very old))

scala> e.age
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

Section 7.2 provides some pointers for how this problem could be solved in the future.

Polymorphic Extension Provided that the field access bug is fixed for intersection types and ignoring the fact that extension as defined above is unsound for type-changing updates, there is no difference between the monomorphic and the polymorphic case. The following function takes a record of parameterized type R as argument using bounded quantification and adds a color field:

def colorize[R <: Record](r: R): R & Record{val color: String} =
  r ++ Record("color"->"green").asInstanceOf[Record{val color: String}]

scala> colorize(t)
val res17: Turtle & Record{color: String} = Record(Map(name -> Mme Tortue, color -> green, age -> 123))

4.6.5 Update
Nothing prevents a record from having its values updated without changing the type:

scala> val u = Record(t._data + ("age"->124)).asInstanceOf[t.type]
val u: Turtle = Record(Map(name -> Mme Tortue, age -> 124))

scala> u.age
val res11: Int = 124

Note that the update is without safety guarantees though, as it is possible to pass any value to the data map and the unsafe cast will succeed without complaints at compile-time.

Polymorphic Update Making sure that the new value is of the same type as the old value, it would seem as though the following function is a safe application of record update in a polymorphic context:

def updateX[R <: Record{val x: A}](r: R): R =
  Record(r._data + ("x"->new A())).asInstanceOf[R]

As noted by Pierce [10], however, this is actually not correct. The type Record{val x: A} is only an upper bound on R, so A is only an upper bound on the type of x, and if depth subtyping is used to instantiate R at the call site, the type of r.x might be any subtype of A. If r.x has some type B that is a subtype of A, the return type of the function will be Record{val x: B}. But the new value for x is of type A, and so the function actually makes an unsafe down-cast from A to B that results in a runtime error once the field is accessed:

val r = Record("x"->new B()).asInstanceOf[Record{val x: B}]

scala> updateX(r)
val res22: Record{x: B} = Record(Map(x -> A@17dc96c6))

scala> updateX(r).x
java.lang.ClassCastException: A cannot be cast to B

Using the merge operation ++ for Records defined above does not help either. Since Record{x: B} & Record{x: A} is equivalent to Record{x: B & A} which is equivalent to Record{x: B}, the unsafe cast still passes without compile error:

def updateX[R <: Record{val x: A}](r: R): R = r ++ Record("x"->new A())

scala> updateX(r)
val res9: Record{x: B} = Record(Map(x -> A@2b23be99))

Note that this is not an issue that applies specifically to this implementation of records, but to functional update under bounded quantification in general.

The problem here is rather that the update operation is based on an unsafe cast that makes it possible to override an otherwise sound type system. The solution presented by Pierce [10] to allow sound record update under bounded quantification is to add a special mark that makes a record type invariant in the marked field types. That is, the depth subtyping rule is disabled for marked fields, so that when R is instantiated above it is statically guaranteed that r.x has exactly type A. Only marked fields are allowed to be updated. The corresponding pseudo-code for Dotty using annotations would be something along the lines of:

def updateX[R <: Record{@invariant val x: A}](r: R): R = r ++ Record("x"->new A())

stating that the Record type is invariant in the x field so that it is safe to update. It is also safe to let Record be contravariant in the updated fields, but then they can no longer be safely accessed.

Chapter 5

Comparison of Existing Approaches

This chapter summarizes the described features of existing approaches in a feature matrix and presents their runtime and compile-time performance obtained from the Wreckage benchmarking suite.

5.1 Qualitative Comparison

The features of the existing approaches to records described in Chapter 4 are summarized in Table 5.1. The scala-records v0.3 library is omitted as it is superseded by v0.4 on every point except Eclipse IDE support. The record implementation using Dotty’s new refinement types is referred to as Dotty Selectable. Type Safety refers to whether field access is typed and whether accessing a non-existent field is a compile error. Compossible’s entry is in parentheses as it is type-checked in general, but it is possible to trip the type checker by making unsafe updates. Dotty Selectable’s entry is also in parentheses as the type safety relies on an unsafe initial cast to the refinement type. The subtyping support is expressed using the following naming convention: P means permutation subtyping, W means width subtyping and D means depth subtyping. A plus after each letter means that it is supported, and a minus that it is not supported. For example, P+W+D+ means that all of permutation, width and depth subtyping are supported, and P−W−D+ means that only depth subtyping is. Parametric Polymorphism refers to whether and how it is possible to let a generic type parameter capture a record type while keeping some usable information, for example that certain fields are present and can be accessed. Scala’s default way of achieving this is through bounded quantification, using a supertype in a subtyping relationship to express the information known about the parameterized type. This form of parametric polymorphism is also supported for structural refinement types, making it available for anonymous refinement types, the phantom refinement types used by scala-records 0.4, and Dotty Selectable. Shapeless records do not support permutation or width subtyping and so cannot use bounded quantification. Instead it is possible to express the fact that certain fields are present on a parameterized record type by demanding that it implement a corresponding Selector type class for each field. Extension, Restriction, Update and Relabeling describe whether these operations are supported, and in that case whether they are supported in a monomorphic context where the full record type is known, or also in a polymorphic context where only partial information about a record’s type is available.


                         Anon.           scala-records    Compossible      shapeless        Dotty
                         Refinements     0.4              0.2              2.3.2            Selectable

Access syntax            r.f             r.f              r.f              r("f")           r.f
Equality                 reference       value            reference        value            value
Type Safety              X               X                (X)              X                (X)
Subtyping                P+W+D+          P+W+D+           P+W+D+           P−W−D+           P+W+D+
Explicit types           X               X                type carrier,    type carrier,    X
                                                          not inline       inline
Parametric               Bounded         Bounded          -                Selector         Bounded
Polymorphism             quantification  quantification                    type class       quantification
Extension                -               -                monomorph.       polymorph.       (polymorph.)
Restriction              -               -                -                polymorph.       -
Update                   -               -                (monomorph.)     polymorph.       (monomorph.)
Relabeling               -               -                -                polymorph.       -
Eclipse IDE              X               (-)              X                X                ?
IntelliJ IDE             X               -                -                -                ?
to case class            -               X                -                X                -
from case class          -               -                X                X                -

Table 5.1: Feature matrix for existing approaches to records in Scala

Compossible’s entry for Update and Dotty Selectable’s entries for Extension and Update are in parentheses as the operations are supported but are not type-safe in general. The lack of Eclipse support for scala-records 0.4 is in parentheses as it is not fully working, but seems easy to fix. The IDE support for Dotty was not investigated. Lastly, the possibilities to convert records to and from case class instances are covered.

5.2 Quantitative Evaluation using Benchmarks

JMH benchmarks were generated for each evaluated approach using the Wreckage Benchmarking Library. The benchmarks were then run using version 8 of the Java SE Runtime Environment on a Java HotSpot™ 64-Bit Server VM with an initial heap size of 256 MB and a maximum heap size of 4 GB. The host computer was a MacBook Pro with a 3.1 GHz Intel Core i7 processor. Raw measurement data was collected using JMH’s JSON output format and then post-processed using a MATLAB® script, as described in Section 3.2.4. Scala 2.11.8 and Dotty 0.1.1 were used in all benchmarks.
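As a rough illustration of the kind of source the suite produces, a hand-written JMH benchmark for case class field access might look as follows; the class and method names are invented for this sketch and do not correspond to the actual generated code.

// Hypothetical sketch of a JMH field-access benchmark; the Wreckage suite generates
// analogous sources for each record library and each record size.
import org.openjdk.jmh.annotations._

case class Rec4(f1: Int, f2: Int, f3: Int, f4: Int)

@State(Scope.Thread)
class CaseClassAccessBenchmark {
  var r = Rec4(1, 2, 3, 4) // created once per benchmark thread

  @Benchmark
  def accessF4: Int = r.f4 // JMH reports the mean steady state time per call
}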

5.2.1 Runtime Performance
The benchmarked approaches are Scala’s anonymous refinement types, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2 and records using a hash map with Dotty’s new structural refinement types (Dotty Selectable). Scala’s nominally typed case classes were also included to provide a baseline of what performance is possible to achieve with classes using virtual method calls on the JVM.

5.2.1.1 Creation Time against Record Size
Creation time was measured against record size in terms of number of fields. The results are presented in Fig. 5.1. Both anonymous refinements and case classes are compiled to Java classes on the JVM. As expected, their creation times overlap and are faster than any of the record libraries using linked lists (Shapeless) or underlying hash maps (scala-records, Compossible, Dotty Selectable). Shapeless requires the least creation time of the record libraries, followed by scala-records. A Compossible record is created by extending it field by field, and an immutable add operation is performed on the underlying data map for every field, along with creation of intermediate record class instances. Although Scala’s immutable maps are implemented as Hash Tries with effectively constant time complexity for adding key-value pairs [35], the runtime cost is shown to be significant compared to creating the complete hash map from the start, as is done by scala-records and Dotty Selectable. It is unclear why the creation time actually goes down for Shapeless records with more than 26 fields.

5.2.1.2 Access Time against Field Index
Access time was measured against the index of the accessed field. For ordered records (only Shapeless) the index corresponds to the position in the linked list, whereas for the unordered approaches the index merely identifies the field name. The results are presented in Fig. 5.2.


Figure 5.1: Record creation time against record size in number of integer fields. Measured as mean steady state execution time per created record and plotted with 99.9% confidence intervals. The graphs for Anonymous Refinements and Case Class overlap close to the x-axis.

It is somewhat surprising that the cached reflection of Scala’s anonymous refinement types is in many cases the fastest of the tested approaches (except for the case class baseline). On the other hand, in this benchmark the call site is monomorphic, so reflection will only be carried out once per VM fork and the cached method handle will then give an immediate match for every subsequent call. As expected, the access time is also independent of field index. The hash map based approaches (scala-records, Compossible and Dotty Selectable) vary between two different access times depending on field index. A possible explanation is that hash lookup for certain keys requires one more indirection in the Hash Trie than for others. The constant overhead of scala-records compared to Compossible is believed to be due to the fact that scala-records wraps the hash lookup inside an extra interface call. It is unclear why Dotty Selectable’s hash lookup varies between scala-records’ and Compossible’s access times. As expected, the linked list data structure used by Shapeless shows a clear linear access time in the field index. In practice though, this approach is actually the fastest for the first 6 fields and on par with the hash maps for at least the first 12 fields.

5.2.1.3 Access Time against Record Size
In the previous benchmark the record size was constant and the accessed field was varied. In this benchmark, a record of increasing size is created and the field with the highest index is accessed. The results are presented in Fig. 5.3.


Figure 5.2: Record access time against field index on a record with 32 integer fields f1, f2, ..., f32. Measured as mean steady state execution time per access operation on fields f1, f2, f4, f6, ..., f32. Plotted with 99.9% confidence intervals.

Again, Shapeless has linear access time as the last index is the worst case from Section 5.2.1.2 for each record size. For the hash-based approaches the index again merely identifies the field label. The same varying pattern between two different access times is observed as before, with no noticeable increasing trend with record size. Anonymous refinements are also shown to have constant access time in record size.

5.2.1.4 Access Time against Degree of Polymorphism
Access time was measured against degree of polymorphism and the results are presented in Fig. 5.4. Shapeless was not included as the benchmark implementation requires the records to support permutation and width subtyping in order to store them in an array of least upper bound type {g1: Int}. Anonymous refinement types clearly have linear access time in the degree of polymorphism. This is expected as the inline cache is implemented as a linked list, and it confirms the results of Dubochet and Odersky [32]. The other approaches are also affected slightly by increasing polymorphism, but not as much. A possible explanation is that polymorphism interferes with JIT compiler optimization. It is worth noting that although Compossible’s and Dotty’s hash lookup is faster than cached reflection already from polymorphism degree 2, it is not until around polymorphism degree 16 that the linear curve really starts to diverge from the others.
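As a rough illustration of what such a polymorphic access site looks like, consider the following hand-written sketch; the class names are invented and the real benchmark cycles through up to 32 generated record variants.

// Hypothetical sketch of a degree-2 polymorphic access site: the element type is the
// least upper bound {val g1: Int}, but the receivers have different runtime classes.
import scala.language.reflectiveCalls

class R1(val g1: Int, val f2: Int)
class R2(val g1: Int, val h2: Double) // ... the real benchmark uses many more classes

object PolyAccessDemo extends App {
  val rs: Vector[{ val g1: Int }] = Vector(new R1(1, 2), new R2(3, 4.0))

  var sum = 0
  var i = 0
  while (i < rs.length) { sum += rs(i).g1; i += 1 } // the call site sees several receiver types
  println(sum)
}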


Figure 5.3: Record access time against record size in number of integer fields. Measured as mean steady state execution time per access operation on records with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with highest index was accessed. Plotted with 99.9% confidence intervals.


Figure 5.4: Record access time against degree of polymorphism on an array of different records with 32 integer fields. Measured as mean steady state execution time per field access (including array indexing) and plotted with 99.9% confidence intervals.


Figure 5.5: Compilation times for a code snippet that creates a single record of varying size. Measured as single shot compile times for records with 1, 50, 100, 150, 200 and 250 fields and plotted with 99 % confidence intervals.

5.2.2 Compile-Time Performance
The benchmarked approaches are Scala’s anonymous refinement types, scala-records 0.4, Compossible 0.2 and Shapeless 2.3.2. Scala’s nominally typed case classes were included to provide a baseline. Dotty Selectable was not included in the compile-time benchmarks.

5.2.2.1 Create
The results can be seen in Fig. 5.5. All approaches are found to have a more or less linear compile time in record size, as expected. Although the absolute numbers are machine-dependent and hard to generalize, it is worth noting that the compile times are quite high; it may take up to 20 seconds to compile a single expression creating a large Shapeless or Compossible record on a modern computer. The corresponding scala-records record takes 6-8 seconds and a case class of the same size around 1 second.

5.2.2.2 Create and Access All Fields
Figure 5.6 shows the compile times of a code snippet that creates a record and also accesses all its fields one at a time. This time Compossible has the fastest compile times of the library approaches, followed by Shapeless and scala-records. Case classes act as a baseline here as well, with a constant overhead of 1 s increasing up to 2 seconds for a record with 250 fields. Clearly, Shapeless no longer shows the exponential compile times that were found by Jovanovic et al. using scala-records-benchmarks [31, 5]. The latest version (2.3.2) is actually faster to compile than scala-records. This difference in asymptotic compile time was traced back to a change in how the Selector (and Updater) type class is implemented between Shapeless 2.2.5 and 2.3.0 [36], and this was verified by re-running the benchmark with these versions.


Figure 5.6: Compilation times for a code snippet that creates a record and accesses all its fields. Measured as single shot compile times for records with 1, 50, 100, 150, 200 and 250 fields and plotted with 99 % confidence intervals.

The results are found in Fig. 5.7 and confirm the findings of Jovanovic et al. [5] for Shapeless 2.2.5. Before version 2.3.0, field access was achieved by instantiating a Selector instance through a recursive implicit resolution process over the accessed list, instead of using a materializer macro directly as described in Section 4.5.3. This recursion incurs a linear number of implicit resolutions in the index of the accessed field, which is then multiplied by the record size as every field is accessed in the benchmark. This would account for quadratic compile times however, and does not explain the super-quadratic compile time implied by the least squares fitted curves. A possible explanation is that the time of each implicit resolution also grows with record size, but the exact nature of this process was not investigated further.


Figure 5.7: Compilation times for different versions of the Shapeless library against record size, compiling a snippet that creates a record and accesses all fields. Measured for records with 1, 5, 10, 15, 20, 25 and 30 fields and plotted with 99 % confidence intervals. An exponential and a quadratic curve were fitted to the Shapeless 2.2.5 data using the least squares method at record sizes 1 to 25 to see how well the models predict the compile time at record size 30.

Chapter 6

Analysis and Possible New Approaches

In this chapter the results from Chapter 5 are analyzed and the design space for new approaches to records in Scala is investigated. New approaches are suggested and evaluated using the same benchmarking suite and methodology used for existing approaches.

6.1 Strengths and Weaknesses of Existing Approaches

Overall, the existing libraries were found to be in better shape than expected. Since version 0.4, scala-records supports explicit types, which in turn enables record types to be used for function parameters and bounded quantification. Furthermore, Shapeless no longer shows the exponential compile times found by Jovanovic et al. [5]. Looking at the feature matrix of Section 5.1, the one feature that scala-records 0.4 lacks compared to Compossible is monomorphic extension. But there does not seem to be any fundamental difficulty in adding this feature to scala-records as well through whitebox macros, along with monomorphic restriction, update and relabeling. Shapeless’ fields are ordered, which limits the way records can be stored in heterogeneous collections and makes it less straightforward to pass them as function arguments. On the other hand, Shapeless is the only library to provide extensive support for polymorphic extension, restriction, update, etc. through type classes. The problem of not being able to express the types explicitly is solved by using macro-parsed path dependent types producing type carriers that can be inlined. This solution is far from perfect however, as the types are restricted to "Standard types" and nested record types cannot be expressed. Three weaknesses stand out as being common for a majority of the existing approaches:

1. Whitebox macros prevent static code analysis tools such as IntelliJ from being used, and reduce the expected lifetime of the library since whitebox macro support will be dropped in the future [37]. (All approaches except Dotty.)

2. Suboptimal runtime performance for field access compared to case classes, using reflection, hash maps or linked lists.

3. Poor support for monomorphic extension, restriction, update and relabeling and no support for polymorphic versions of the same. (All approaches except Shapeless.)

Regarding whitebox macros, all investigated libraries for Scala rely on them in one way or the other to be able to represent record types using existing Scala type primitives, bridge the gap between this representation and the value level, and provide a clean syntax. Although the drawbacks of relying on whitebox macros are clear, it is hard to see any other possibility for current versions of Scala, besides writing a compiler plugin that adds new record syntax to the language and augments the typer. But this would ultimately have the same drawbacks as the whitebox macros: limited lifetime and poor integration with static analysis tools such as IntelliJ IDEA. At the same time, current versions of Scala do support whitebox macros, they work with the Eclipse IDE, and they can be excellent tools for experimenting with possible new approaches to records without forking the compiler. A successful whitebox approach can always be transformed into a native solution incorporated in future versions of Scala or Dotty later. By this argument, whitebox macros will not be considered an issue in the following analysis, and the focus will be solely on how points 2 and 3 above might be addressed. Especially field access will be investigated: if and how it is possible to achieve better runtime performance than using a hash map for unordered approaches and linked lists for ordered ones.

6.2 Design Space for Records

The quest for faster field access and polymorphic extension reveals that the data structure chosen to represent a record’s values is tightly coupled to what type level representation is used and what operations to support on that type representation. The problem of designing a new approach to records for Scala is reduced to the following four questions:

1. What type level representation is chosen for record fields and types?

2. What value level data structure is chosen to store record values?

3. What subtyping rules should apply to record types?

4. What type-level operations should be allowed?

Unfortunately, the possible answers to these questions do not provide an orthogonal basis for the design space of records in Scala, and the answer to one question affects the possible choices for the others. This thesis will refrain from giving subjective pointers as to what particular combination of features is desirable for a new approach to records; the focus is instead on providing a background and practical tools to aid such a decision in the future. First, the possible answers to question 1 are limited to a selection of six different type representations in Section 6.3, "Record Type Representations". Next, questions 2 and 3 are tackled together in Section 6.4, "Compilation Schemes for Subtyped Records". A selection of seven different possible data structures for storing records is then benchmarked in Section 6.5, "Benchmarks of Possible Data Structures". Lastly, question 4 is postponed to Chapter 7, "Discussion and Future Work", where possible solutions for supporting record operations such as extension and update are discussed and interesting paths for further work are outlined.

6.3 Record Type Representations

So far, three different ways of representing record types in Scala have been presented: the structural refinement types used by anonymous classes, scala-records and Dotty Selectable, the compound types used by Compossible, and the tagged HLists used by Shapeless. By combining these approaches with the option of putting them in a phantom type parameter (as is done by scala-records 0.4 and Compossible) one can obtain a total of six different type representations, each with its own characteristics:

1a) Refinement types
    Scala syntax: Rec{val f1: T1; val f2: T2 ...}
    Examples: scala-records 0.3 [22], Dotty Selectable [25]

1b) Phantom refinement types
    Scala syntax: Rec[{val f1: T1; val f2: T2 ...}]
    Examples: scala-records 0.4 [4]

2a) Compound types
    Scala syntax: Rec with Field["f1", T1] with Field["f2", T2] ...
    Examples: "Add records to Dotty" [38]

2b) Phantom compound types
    Scala syntax: Rec[Field["f1", T1] with Field["f2", T2] ...]
    Examples: Compossible 0.2 [23]

3a) HList records
    Scala syntax: Field["f1", T1] :: Field["f2", T2] :: ... :: HNil
    Examples: Shapeless 2.3.2 records [24]

3b) Phantom Type List (TList) records
    Scala syntax: Rec[Field["f1", T1] :: Field["f2", T2] :: ... :: TNil]
    Examples: "Type Lists and Heterogeneously Typed Arrays" [39]

The Scala syntax above should be understood as pseudo-code for a record type with fields labeled f1,f2,... of types T1, T2,... using each approach. A Field["f1", T1] is a pseudo-code type level representation of a field with label f1 and type T1. One possible concrete implementation is

trait Field[L <: String, V]

using singleton string types to represent the label, and with V as the type of the field value. Compossible uses this approach with Tuple2 instead of a custom Field trait, whereas Shapeless’ version is called KeyTag. The HList approach is fundamentally different from the others though, as the Fields are not only type level representations of fields but must also hold the actual values. Shapeless solves this by implementing the Field type as V with KeyTag[L,V], but there are other possibilities as well, for example using a case class:

case class Field[L <: String, V](val value: V)

This is also the reason for the distinction between HLists and TLists above; an HList is a list of heterogeneous elements whereas a TList is a list of types. The following analysis will be limited to the above selection of type level representations and their characteristics.

6.4 Compilation Schemes for Subtyped Records

Regardless of programming language and type system, the record fields and their values must be stored in some kind of underlying data structure on the target platform. The choice of this data structure naturally affects what kind of operations can be performed on the records, what level of structural subtyping can be achieved, and at what runtime cost. In this section, the relation between the value level and the subtyping relation is investigated. Although the term structural subtyping commonly refers to the combination of all three of permutation, width and depth subtyping, records in other languages as well as the existing approaches in Scala show that this need not be the case. scala-records and Compossible support all three, whereas Shapeless only has depth subtyping through the covariance of the elements. SML, on the other hand, only allows permutation but no width or depth subtyping [14]. To be thorough, the 2³ = 8 possible ways of combining the subtyping rules are investigated below. For each set of subtyping rules the following questions will be asked:

• What type level representation can be used to achieve this subtyping relation?

• What runtime performance is possible to achieve for field selection?

The naming scheme from the qualitative comparison will be used to denote the various combinations of the subtyping rules: P means permutation subtyping, W means width subtyping and D means depth subtyping. A plus after each letter means that it is supported, and a minus that it is not supported.

6.4.1 P−W−D±: No Permutation, No Width Subtyping
Example: Shapeless
Both refinement types and compound types provide automatic permutation subtyping, and so the only options for representing ordered records are the HList and TList approaches. To optionally restrict the depth subtyping, the fields can be made invariant in the value type parameter. With ordered fields and no width subtyping a record can be viewed as a tuple where the indices are aliased with labels. The compiler always has complete knowledge of the fields a record contains and it is possible to translate field access to direct indexing. Thus, the TList approach can store the values in an Array and get efficient constant time field access. For the HList however, reaching a field requires a linear number of pointer dereferences through the list, as demonstrated by Shapeless.
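A minimal sketch of the array-backed idea is shown below; the names are invented and the phantom Fields parameter stands in for the TList that a real implementation would carry.

// Sketch: for ordered records without width subtyping, the field order is statically
// known, so the values can live in a flat array and a field access can be compiled to
// a single array read at the statically known index.
final class ArrayRec[Fields](private val values: Array[Any]) {
  def unsafeApply(i: Int): Any = values(i) // the index would be inserted by the compiler
}

object ArrayRecDemo extends App {
  // Conceptually Fields = Field["name", String] :: Field["age", Int] :: TNil
  val r = new ArrayRec[Nothing](Array("Mme Tortue", 123))
  val age = r.unsafeApply(1).asInstanceOf[Int] // "age" is statically known to be index 1
  println(age)
}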

6.4.2 P−W+D±: Width Subtyping for Ordered Fields
As noted in Section 2.2.1 it is possible to define width subtyping without field permutation as a slicing operation where a record type is a supertype of another record type if it is a prefix of the other. Shapeless records do not satisfy this subtyping relation, but it could presumably be achieved for both HList and TList based approaches by letting HCons (TCons) extend HNil (TNil). Depth subtyping is again controlled by the variance of the list elements. The indexing scheme described above is not affected by the subtype slicing, as the static type will always be a proper prefix of the dynamic type, starting from the 0th index.
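The prefix-based width subtyping can be sketched with a toy list whose cons cell extends the nil type; the definitions below are illustrative only and are not taken from Shapeless or any other library discussed here.

// Toy sketch of prefix-based width subtyping: because HCons extends HNil and the tail
// is covariant, a longer record is a subtype of any of its prefixes; covariant heads
// additionally give depth subtyping.
sealed trait HNil
final case class HCons[+H, +T <: HNil](head: H, tail: T) extends HNil
case object HEnd extends HNil

object PrefixDemo extends App {
  type Name    = HCons[String, HNil]
  type NameAge = HCons[String, HCons[Int, HNil]]

  val na: NameAge = HCons("Mme Tortue", HCons(123, HEnd))
  val n: Name = na // accepted: {name, age} is usable where the prefix {name} is expected
  println(n.head)
}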

6.4.3 P+W−D±: Unordered Records without Width Subtyping
Example: SML
In the absence of width subtyping the compiler again has complete static knowledge about the fields a record contains, and although the fields are unordered it is possible to achieve constant time field access. A naive approach would be to give every field in each record type an arbitrary index, store the values in this order in an array, and translate every field access to indexing into this array. The problem with this approach is separate compilation, as we cannot guarantee that the indexing will be the same across different compilation units. The solution is simple however: introduce a canonical ordering of the fields, for example sorting them in alphabetical order. Then a particular field name will always have the same index in a given record type, and a record declared in one compilation unit can be safely accessed by field index in another without confusion [14]. Scala refinement types and compound types have width and depth subtyping by default. By putting these representations as phantom types in an invariant type parameter however, the desired level of subtyping is achieved. It does not seem possible to only restrict the width subtyping this way however, and the depth subtyping restriction comes with the package, making the P+W−D+ combination infeasible. Another possibility is to use a sorted TList in the type parameter. If the sorting is done automatically, all explicit permutations in client code will be represented by the same list of sorted fields in the background, making the records appear as unordered while restricting width subtyping. Depth subtyping could be controlled by the variance of the elements as before. Note though that the user would have to be prevented from creating such types explicitly in a way that interferes with the sorting invariant. For example, an HList-based implementation must be complemented with other means of creating the records than using the HCons constructor. One possibility is to let record types be expressed using Shapeless’ parsed path dependent types (see Section 4.5.4).
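The canonical-ordering idea can be illustrated with a small sketch; the names are invented, and a real implementation would of course perform the label-to-index translation at compile time rather than keep the label array around.

// Sketch: labels are sorted alphabetically once, so a given label maps to the same
// index in every compilation unit and field access can be compiled to values(i).
object SortedRec {
  def apply(fields: (String, Any)*): (Array[String], Array[Any]) = {
    val sorted = fields.sortBy(_._1)
    (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }
}

object SortedRecDemo extends App {
  val (labels, values) = SortedRec("name" -> "Mme Tortue", "age" -> 123)
  // "age" sorts before "name", so r.age could be rewritten to values(0) by the compiler
  println(values(0).asInstanceOf[Int])
}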

6.4.4 P+W+D±: Unordered Records with Width Subtyping
Example: scala-records, Compossible
Possible type level representations for this typing scheme are: the refinement types used by scala-records 0.3, the phantom refinement types used by scala-records 0.4, the phantom compound types in a type parameter used by Compossible, as well as the compound of field types suggested for future records in Dotty by Odersky [38]. Depth subtyping can be switched off for the compound type representations by making the Field representations invariant in the value type. The HList and TList approaches are not suitable as they are inherently ordered, and the presence of width subtyping makes the sorting approach unusable.

The combination of unordered fields and width subtyping complicates the choice of data structure considerably. Again, consider the getName function, here expressed using pseudo-code for records:

def getName(r: {name: String}) = r.name

Since any record containing a name field of type String can be passed to this function, it is in general unknown at what index name might be stored in the record at hand. A theoretical solution suggested by Cardelli [40] is to give every field a globally unique index and let every record be represented by a potentially very large and sparse value array capable of containing every field ever declared in the code-base. But besides breaking separate compilation, this is of course not practical from a memory perspective. Giving up on achieving some kind of statically known indexing, there are two different approaches taken in the literature and practical implementations [14, 34]:

1. Resort to runtime searching for the field.

2. Pass in some extra information with the argument record.

But in the case of Scala there is also a third alternative:

3. Use some approach provided by the JVM platform (presumably using some strategy from 1 or 2 under the hood).

These options will be considered in turn below and a selection of approaches from each category is then benchmarked in Section 6.5.

6.4.4.1 Option 1: Searching
Using common data structures, the following asymptotic performance of field lookup can be achieved:

• Unordered list or array with linear search: O(n)

• Sorted array with binary search, as suggested by Leijen [34]: O(log₂ n)

• Scala’s immutable HashMap with effectively constant lookup time [35]: O(log₃₂ n)

Here a sorted array may be advantageous if it is somehow known when the static type matches the runtime type exactly, in which case one gets the constant time field access outlined in Section 6.4.3. Otherwise a HashMap seems to be the most attractive alternative.
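For concreteness, the first two options might be realized roughly as in the following sketch; the helper functions are hypothetical and not part of any benchmarked library.

// Option 1 lookups over a label array and a parallel value array.
object Lookup {
  // Linear search over unordered labels: O(n)
  def linear(labels: Array[String], values: Array[Any], field: String): Any = {
    var i = 0
    while (i < labels.length && labels(i) != field) i += 1
    values(i)
  }

  // Binary search, assuming the labels are sorted alphabetically: O(log n)
  def binary(labels: Array[String], values: Array[Any], field: String): Any = {
    var lo = 0
    var hi = labels.length - 1
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      val c = labels(mid).compareTo(field)
      if (c == 0) return values(mid)
      else if (c < 0) lo = mid + 1
      else hi = mid - 1
    }
    throw new NoSuchElementException(field)
  }
}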

6.4.4.2 Option 2: Information Passing
An approach inspired by how Golang solves its structural interface typing [41] is the following: let a record consist of a "fat pointer" containing

• a reference to an arbitrarily ordered array of values (the record data),


• some unique id (for example a hash), identifying this particular field order (the "runtime-type"), and

• a reference to an itable.

The itable is a mapping from field names to indices in the value array. It can be implemented as an array that is sorted on the field names in alphabetical order, exactly as the simple SML-style records described above. When a record subtype is cast to some structural supertype, an itable is created containing the supertype’s fields sorted in order, mapping the fields to their indices in the value array of the record at hand. This itable creation can be done in linear time, and the itable can then be globally cached on the (runtime type, static type)-pair. Thus, subsequent casts can be done in effectively constant time if the cache is implemented as a hash map. But maybe more interesting, this approach allows constant time field access by simple array indexing. The itables can be accessed just like ordered records by the sorted field name index, and that index can then be used to access the desired value. This approach also has similarities to the approach proposed by Ohori [14] and used in SML# to solve constant time field access under parametric polymorphism, although in [14] the itables are represented by lambda abstractions containing the lookup indices.
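The following sketch shows one possible shape of such fat-pointer records and their itables; the names, the cache key and the caching strategy are all assumptions made for illustration.

// Sketch of the "fat pointer" idea: values in an arbitrarily ordered array, plus a
// cached itable per (runtime field order, static type) that maps the static type's
// sorted field names to indices in the value array.
final class FatRecord(
    val values: Array[Any],          // the record data
    val runtimeLabels: Array[String] // identifies this particular field order
)

object Itables {
  private val cache =
    scala.collection.mutable.Map.empty[(Seq[String], Seq[String]), Array[Int]]

  // staticLabels: the alphabetically sorted fields of the static record type
  def itable(r: FatRecord, staticLabels: Array[String]): Array[Int] =
    cache.getOrElseUpdate(
      (r.runtimeLabels.toSeq, staticLabels.toSeq),
      staticLabels.map(l => r.runtimeLabels.indexOf(l)) // linear, one-off construction
    )
}

object FatRecordDemo extends App {
  val r = new FatRecord(Array(123, "Mme Tortue"), Array("age", "name"))
  val it = Itables.itable(r, Array("name")) // static type {name: String}
  println(r.values(it(0)))                  // two constant-time indexing operations
}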

The Problem: Variant Generics The problem with this approach is that it requires a run-time operation at the point of the implicit up-cast from a subtype to some structural supertype. This in turn requires every up-cast operation to have some explicit point in the program where it happens, and this is where Scala’s variant generics become a problem. Consider for example Scala’s List data type, which is covariant in its type argument, and let A and B be two types where B is a subtype of A. Then it is possible to pass a list of type List[B] to a reference of type List[A] by means of an implicit upcast. But if the up-cast requires some run-time operation to be performed on each of the elements of the list, it is unclear where the compiler should insert them. Automatically mapping the coercion operation over collections would make a simple reference assignment a costly linear time operation [42], and can potentially be disastrous for infinite lazy streams.

The Solution: Explicit width coercion (P+Wcoerced D±) By requiring every upcast to be an explicit coercion operation, the above problem can be avoided altogether. That is, simply forbid casts from List[B] to List[A] if A and B are record types, and let the responsibility for iterating through collections fall on the programmer, making the coercion and the linear performance hit explicit. The benefit is potentially huge: a form of statically type-checked structural subtyping for records with constant time field access. One could argue that this is essentially the "Unordered records without width subtyping (P+W−D±)" scheme from Section 6.4.3 all over again, as a coercion operator can be applied to change the type of those records as well. The difference lies in the efficiency of the cast. Whereas a coercion for the approach using itables is a one-time cost that is cacheable, arbitrarily coercing the sorted records of Section 6.4.3 would be a linear time operation every time. The possible type level representations are the same however, using refinement types or compound types in an invariant type parameter to lock down width subtyping (and unfortunately also depth subtyping), or using sorted HList or TList based approaches.
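Building on the FatRecord sketch above, an explicit width coercion could be exposed roughly as follows; again, the names are purely illustrative.

// Sketch: width subtyping only via an explicit, cacheable coercion call. After the
// coercion, field access on the result is plain array indexing.
object Coerce {
  def widthCoerce(r: FatRecord, staticLabels: Array[String]): FatRecord = {
    val it = Itables.itable(r, staticLabels)              // cached after the first call
    new FatRecord(it.map(i => r.values(i)), staticLabels) // narrowed view of the record
  }
}

// An assignment from {name, age} to {name} must go through such a call, so collections
// like List[{name, age}] are never silently retyped element by element.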

6.4.4.3 Option 3: Use the JVM
In Scala there is also the possibility of letting the JVM handle the underlying field selection logic by using classes for data storage and the various types of virtual, interface, reflective or dynamic calls for field access. This section covers some of these possibilities.

(Cached) Method Reflection This is the approach used by Scala for general structural typing, covered in Section 4.1. The problem with this is poor performance for polymorphic and megamorphic call sites, as shown in Section 5.2.1.4. It is unclear whether megamorphic call sites are a real threat in practice though, as the findings of Hölzle et al. [43] suggest that polymorphism degrees above 10 might be rare in real-life code.

(Cached) Field Reflection Method reflection on the JVM requires a method name lookup that is also checked against the static type of the method parameters [32]. For storing and accessing record fields however, reflection can be made directly on the Java class fields instead. This lookup is potentially faster as it only involves name comparison.
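As an illustration (not the benchmark code used here), field reflection on an ordinary class could look like this:

// Sketch: record values stored as ordinary class fields and read by name via
// java.lang.reflect.Field, skipping the parameter-type checks of method reflection.
class TurtleData(val name: String, val age: Int)

object FieldReflectionDemo extends App {
  def readField(obj: AnyRef, field: String): Any = {
    val f = obj.getClass.getDeclaredField(field) // name-based lookup only
    f.setAccessible(true)
    f.get(obj)
  }

  val r = new TurtleData("Mme Tortue", 123)
  println(readField(r, "age")) // 123
}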

One Interface per Field This method is suggested in the "Add Record To Dotty" discussion post by Odersky [38]. For example, creating the record {name="Mme Tortue", age=123} could be translated into:

trait FieldName[T](val name: T)
trait FieldAge[T](val age: T)

class Rec$$anon$1(name: String, age: Int) extends FieldName[String](name) with FieldAge[Int](age)

new Rec$$anon$1("Mme Tortue", 123)

The problem with this approach is that it breaks separate compilation. The same interfaces may be generated in several different compilation units but will not be treated as the same by the JVM. The different instances of the "same" interface will either cause a name conflict, or get different symbols and cause the structural subtyping relation to break between records created in different compilation units. A possible solution is to wait with interface generation until runtime, described next.

One Interface per Field (Generated at Runtime) To admit separate compilation the interfaces could instead be generated at runtime, using some bytecode generation library. This kind of generative approach has some problems however, as noted by Dubochet and Odersky [32]:

• Dependency on a bytecode generation framework.

• Needs access to the class-loader, which may not be permitted in e.g. web applications for security reasons.

One Interface per Record Type + Runtime Generated Wrappers Whiteoak [6] solves the problem of separate compilation another way: instead of creating one interface per field-type pair, an interface is generated at compile-time for each declared structural type in the application. Then a wrapper class is created lazily at runtime for each (runtime class, interface)-pair as needed. The wrapper class implements the interface and delegates all method calls to the wrapped class. The wrapper class is then cached so that it will only have to be created once for each combination of runtime class and structural type. However, the initial wrapper class generation comes with a noticeable runtime cost and has the same dependency on a code generation framework and class loader access mentioned above [32]. Furthermore, if the wrapper classes are created at first assignment to a structural reference, this solution cannot support implicit structural subtyping for the same reason as the solutions outlined under Option 2 above. Whiteoak instead generates the wrapper classes at first field access, but then the performance benefit is unclear in the case of records; if every field access requires a cache lookup in some data structure to fetch the wrapper class, that time could be spent accessing the field value from a similar data structure directly instead.
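Expressed at the source level (the actual implementation generates bytecode at runtime), the wrapper idea corresponds roughly to the following sketch with invented names.

// Sketch: one interface per declared structural type, plus a per-(class, interface)
// wrapper that delegates to the wrapped instance. The wrapper would be generated and
// cached lazily at runtime rather than written by hand.
trait HasName { def name: String }            // stands for the structural type {name: String}

class Turtle2(val name: String, val age: Int) // an ordinary class with a matching member

class Turtle2AsHasName(underlying: Turtle2) extends HasName {
  def name: String = underlying.name          // pure delegation
}

object WrapperDemo extends App {
  val wrapped: HasName = new Turtle2AsHasName(new Turtle2("Mme Tortue", 123))
  println(wrapped.name)                       // an ordinary interface call from here on
}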

6.4.5 Summary
For each level of subtyping described above, the possible type level representations from Section 6.3 are summarized in Table 6.1. For ordered records, it is clearly possible to achieve constant time field access by translating labels to array indices. This approach can also be used for unordered records by introducing a canonical ordering of the labels and storing the values in this order, provided that width subtyping is not allowed. The next step in subtyping flexibility is to allow a limited form of width subtyping, where every cast has to be an explicit coercion. Then it is possible to maintain a mapping from the current static type of the record to the dynamic type of the record, making field access possible in constant time by two indexing operations, whereas the cast itself is a linear time operation. The benefit of this approach over simply disallowing width subtyping completely and instead allowing records to be projected field by field to arbitrary supertypes is that the coercion is cacheable. Once unrestricted permutation, width and depth structural subtyping is required, the above approaches cannot be used. Instead it seems as though the only alternatives are to rely on runtime searching, hashing or using some native JVM data structure. The above overview merely provides the theoretical asymptotic runtime performance of the various compilation schemes, and to quantify the statements a further benchmark was carried out for a selection of interesting approaches. The benchmarked data structures are:

• Scala’s mutable ArrayBuffer
• Scala’s (linked) List
• Scala’s immutable HashMap
• One Scala trait per field
• Java field reflection
• Java method reflection
• Scala’s cached method reflection (using the Anon. Refinements from Section 4.1)

The results for these data structures are presented in Section 6.5.

           Refinement   Phantom       Compound      Phantom       HList            Phantom TList
           types        refinement    types         compound
                        types                       types

A  P−W−D−  -            -             -             -             inv. fields      inv. fields
   P−W−D+  -            -             -             -             cov. fields      cov. fields
   P−W+D−  -            -             -             -             HCons <: HNil,   TCons <: TNil,
                                                                  inv. fields      inv. fields
   P−W+D+  -            -             -             -             HCons <: HNil,   TCons <: TNil,
                                                                  cov. fields      cov. fields

B  P+W−D−  -            inv. t.p.     -             inv. t.p.     auto sorted,     auto sorted,
                                                                  inv. fields      inv. fields
   P+W−D+  -            -             -             -             auto sorted,     auto sorted,
                                                                  cov. fields      cov. fields

C  P+W+D−  -            -             inv. fields   inv. fields   -                -
   P+W+D+  default      cov. t.p.,    cov. fields   cov. t.p.,    -                -
                        cov. fields                 cov. fields

Table 6.1: In group A it is possible to achieve constant time field access by translating labels to indices. In B it is possible to achieve constant time field access by sorting the labels and storing their values in this order. In C permutation is combined with width subtyping, and other methods from Section 6.4.4 have to be used. "inv." is short for invariant, "cov." is short for covariant, and "t.p." is short for type parameter.

The difference between field reflection and method reflection turned out to be small and therefore cached field reflection was not benchmarked. The generative approach described by Gil and Maman [6] has been dropped in Whiteoak version 2.1 and the old version is no longer available for download. Although the Wreckage benchmarking library has support for benchmarking Whiteoak 2.1, the details of the new compilation strategy were not investigated further and the benchmarking results are not presented here. For the interested reader, the benchmarking results for Whiteoak 2.1 can instead be found in Appendix A.

6.5 Benchmarks of Possible Data Structures

6.5.1 Access Time against Record Size
In Fig. 6.1, field reflection is shown to be faster than method reflection but still significantly slower than any of the other approaches. A zoomed-in version of the same results for the faster approaches is found in Fig. 6.2. The list’s access times grow linearly, as expected. Cached method reflection, array indexing and interface calls are shown to take constant time across record size, with cached reflection taking about 3 times longer than the other two. It is also worth noting that in this benchmark interface calls are shown to have the same performance as case class field access. The result for the hash map agrees with the results for Compossible and Dotty Selectable in Fig. 5.3. Again, the hash lookup is shown to be slightly faster than cached reflection for all but a few accessed keys.

6.5.2 Access Time against Degree of Polymorphism
The execution times for accessing a field at a polymorphic call site are shown in Fig. 6.3 and Fig. 6.4. Again, both method and field reflection are orders of magnitude slower than any of the other approaches. To explain the large variance in access time for method reflection, the steady state access time measurements for each VM fork were included as a scatter plot. This reveals that steady state is detected at two different levels of JIT compiler optimisation for different VM forks, one much slower than the other. For low degrees of polymorphism, cached method reflection is shown to be around 2 times slower than making an interface call, confirming the results of Dubochet and Odersky [32]. The execution times of Java interface calls grow linearly with the degree of polymorphism however, and at polymorphism degree 32 the slow-down is down to a factor of 1.6. It is also worth noting that using a hash map is actually faster than making interface calls for polymorphism degrees higher than 10. The array and hash map data structures are shown to have more or less constant access times across the varying degrees of polymorphism. The slightly faster execution times at lower degrees of polymorphism can possibly be explained by more successful JIT compiler optimizations. The list data structure’s linearly increasing access times can be explained by the fact that the accessed field’s maximal index also grows with increasing degrees of polymorphism. To discern the effect of polymorphism itself the accessed index would have to be kept constant, but then the performance would be determined by the chosen index instead and not easily compared with the other approaches.


Figure 6.1: Access time against record size in number of integer fields for various data structures on the JVM. Measured as mean steady state execution time per access operation on a record with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with highest index was accessed. Plotted with 99 % confidence intervals. See Fig 6.2 for a zoomed-in view of the faster approaches.


Figure 6.2: Zoomed-in view of Fig. 6.1, without method reflection or field reflection approaches.


Figure 6.3: Record access time against degree of polymorphism for various data structures on the JVM. Measured as mean steady state execution time per field access (including ar- ray indexing) for polymorphism degree 1, 2, 4, 6, ... up to 32. Plotted with 99 % confidence intervals. See Fig 6.4 for a zoomed-in view of the faster approaches.


Figure 6.4: Zoomed-in view of Fig. 6.3, without method reflection or field reflection approaches.

Chapter 7

Discussion and Future Work

7.1 Subtyping and Field Access

For unordered records with width subtyping, the benchmarking results in Section 6.5 suggest that a hash map is the best choice of underlying data structure among the ones covered in this thesis. Cached reflection may provide faster field access if the call site is monomorphic, but the difference is small and for polymorphic call sites the linear cache lookup time eventually outgrows the advantage. Perhaps surprisingly, hash lookup is also faster than accessing the fields through interface calls for high enough degrees of polymorphism. To achieve even faster field access, one possibility is to restrict width subtyping to an explicit coercion operation. Then record values can be stored in an array and the compiler can translate field access to direct array indexing, as outlined in Section 6.4.4.2. The results in Fig. 6.2 and Fig. 6.4 suggest that this solution would be on par with native class field access and significantly faster than using a hash map. If it is acceptable to also restrict depth subtyping to coercion, this level of subtyping can be achieved in current versions of Scala by storing an unordered field representation in an invariant type parameter, see Tab. 6.1. The field representation can be the refinement types used by scala-records or some compound type of fields like the one Compossible uses. Another possibility is to let a phantom TList represent the fields in a covariant type parameter, but then this list must be automatically sorted to make record types appear unordered. All of these approaches currently require whitebox macros to be realized in practice, but accepting this fact it is also possible to provide type classes for additional record operations such as extension, restriction, update and relabeling through implicit materialization macros, as is done by Shapeless. Dotty’s new approach to structural types provides unordered record types with width and depth subtyping and makes it possible to implement records with a hash map as underlying data structure without using whitebox macros. However, the supported operations are restricted to creation and field access, as the structural refinement types do not allow type-safe record extension, restriction, update or relabeling to be expressed at the type level. These operations are discussed next.


7.2 Type-level Operations

As shown for Compossible in Section 4.4.2, the with operator is not suitable for representing record extension when the extension is in fact an update that changes the type of an already existing field. The same problem was shown for Dotty's structural refinement types using the type intersection operator & in Section 4.6.4. In the presence of subtyping, the fact that the with and & operators do not overwrite the type of an updated field also prevents record update from being implemented in a sound way, as shown for Dotty in Section 4.6.5. In the monomorphic context where all existing fields are known, it might be possible to implement record extension and record update as whitebox macros that make sure that the return types are correct. It is unclear, however, how to use this approach in a polymorphic context where only a subset of the present fields is known.

Shapeless' implementation of record operations through type classes solves all of the above problems. The type classes allow record extension, restriction, update and relabeling to be performed in a consistent way, both in a monomorphic context where all fields are statically known and under parametric polymorphism. Furthermore, the Selector type class for field access makes polymorphic field access possible for record types that do not support the permutation and width subtyping that is required for bounded quantification to work, such as records with ordered fields or coercion-based width subtyping.

The type classes are encoded in Scala in the standard way, by letting evidence of type class membership be provided by an implicit parameter that implements the required functionality. Since the result type of the operations can be stored as a path-dependent type on the implicit instance, there is no need to express record extension using native type operations such as the with operator or type intersection. This presumably allows the type classes to be implemented for any record type representation.

However, this approach currently relies on whitebox macros to be able to materialize the implicit evidence for each required record type. This will not be possible in Dotty, where whitebox macros are no longer supported. An interesting line of future work is therefore to investigate the possibility of adding native compiler support for this implicit materialization in Dotty. This could potentially improve the compile time compared to existing approaches in Scala using macros, but might also make certain runtime optimizations possible. For example, the implicits that are used solely as type carriers do not have to be instantiated after typing is finished and can presumably be erased from the compiled code. The Wreckage benchmarking library could be extended to help verify these optimizations.
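As a simplified illustration of the type-class encoding described above (a sketch, not Shapeless' actual definitions, whose instances are materialized by whitebox macros), a field-selection type class can carry its result type as a path-dependent type member, so that no native type operation such as with or & is needed to express the result:

```scala
// Evidence that a record of type R has a field identified by the key type K;
// the field's type is exposed as the abstract type member Out.
trait Selector[R, K] {
  type Out
  def apply(record: R): Out
}

object Selector {
  // Conventional "Aux" alias for fixing Out during implicit resolution.
  type Aux[R, K, O] = Selector[R, K] { type Out = O }
}

object RecordOps {
  // Polymorphic field access: usable for any R and K for which evidence can be
  // found, independently of whether R supports width or permutation subtyping.
  def get[R, K](record: R, key: K)(implicit sel: Selector[R, K]): sel.Out =
    sel(record)
}
```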

7.3 Not One but Three Record Types to Rule Them All?

At the end of the day, it might be the case that different record operations are needed at different stages of a program, and that there is no need to provide a record type supporting all possible operations with the best possible performance at all times. Instead, different record type representations might be used depending on the situation:

• If fast record extension is needed but it does not matter whether the record type is ordered, a linked list approach seems like the best option, since new fields can be added to the head of the list in constant time.

• For maximal flexibility, Scala's structural types provide permutation, width and depth subtyping as well as parametric polymorphism with bounded quantification. A hash map can be used as the underlying data structure for acceptable field access times.

• If fast field access is the main criterion, an approach capable of translating field access to direct array indexing is preferable. With the right compiler support, this is possible for ordered records, but also for unordered records if width subtyping can be restricted to explicit coercion.

Shapeless currently provides an answer to the first scenario, although it remains to be seen how much of Shapeless can be ported to Dotty and what will instead be provided by standard libraries and native compiler support for HLists in the future. The second scenario is covered by the new structural refinement types in Dotty, whereas the scala-records library seems like a viable approach with a similar feature set for current Scala. The last scenario, however, remains open. The approach outlined in Section 6.4.4.2 can presumably be realized for current versions of Scala using whitebox macros, whereas it is unclear whether a native solution for Dotty can be implemented without substantial changes to the language and the compiler.
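To illustrate the runtime side of the third scenario, the sketch below shows an array-backed record where field access is plain array indexing; it is illustrative only, and the explicit integer indices stand in for what the compilation scheme of Section 6.4.4.2 would resolve statically from the ordered record type.

```scala
// Illustrative array-backed record: field access is direct array indexing,
// and width subtyping is an explicit, copying coercion instead of a subtype
// relation between record types.
final class ArrayRecord(private val values: Array[Any]) {
  def get(index: Int): Any = values(index) // direct array indexing

  // Explicit width coercion: copy the selected fields into a smaller layout.
  def coerceTo(indices: Array[Int]): ArrayRecord =
    new ArrayRecord(indices.map(i => values(i)))
}

object ArrayRecordExample {
  val r = new ArrayRecord(Array("Emma", 42, true)) // fields: name, age, active
  val age = r.get(1)                               // would be statically typed as Int with compiler support
  val narrowed = r.coerceTo(Array(0, 1))           // explicitly drop the `active` field
}
```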

7.4 Future Work

Besides implementing native support for record type classes in Dotty, an interesting line of future work is to implement the approach outlined in Section 6.4.4.2 for current versions of Scala and provide efficient conversion methods between this implementation and, for example, Shapeless and scala-records. This would fill a gap in the design space of records for Scala and could provide valuable insight into how records are used in practice and which representation is preferable in which situation.

Another important continuation of this work is to extend the benchmarking suite with more possible approaches to structural typing on the JVM. In particular, the invokedynamic bytecode instruction introduced in Java 1.7 was not covered in this thesis due to limited time, but could potentially improve runtime performance over hash maps for unordered records with width subtyping.

Furthermore, the benchmarking suite would benefit from being extended with benchmarks of real-world use cases. Microbenchmarks should always be interpreted with a grain of salt, and it would be valuable to complement the results presented in this thesis with benchmarks where the record operations are used in a context; for example, reading JSON data, manipulating it in some way and then serializing it to JSON again. Such a benchmark could also be used to investigate the potential benefit of switching between different record representations in different parts of the test program. Especially the difference in runtime performance between cached reflection and hash maps would be interesting to investigate further, as it is surprisingly small in the microbenchmarks presented in this thesis.

Lastly, this thesis did not cover possible Scala support for recursive record types, type inference for record types or pattern matching. Nor was the dual of records, called variants, covered.
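Such a use-case benchmark could reuse the JMH structure of the existing microbenchmarks; the sketch below is only a rough outline under assumed names (the benchmark class and its map-based fixture are hypothetical and not part of Wreckage, which generates its benchmark sources).

```scala
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
class RecordUseCaseBenchmark {
  // Hypothetical fixture: a "parsed JSON object" represented as a hash map,
  // standing in for whichever record representation is under test.
  var record: Map[String, Any] = Map.empty

  @Setup(Level.Iteration)
  def setup(): Unit = {
    record = Map("name" -> "Emma", "age" -> 42, "active" -> true)
  }

  @Benchmark
  def readUpdateSerialize: String = {
    val updated = record.updated("age", 43)                         // manipulate one field
    updated.map { case (k, v) => s"$k=$v" }.mkString("{", ",", "}") // naive serialization
  }
}
```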

Related Work

8.1 Theoretical Foundations

Records have been extensively studied in programming language research, both in their own right and as a theoretical foundation for encoding object-oriented programming into pure lambda calculi. Cardelli and Wegner [16] described the mechanism of bounded quantification over structurally subtyped records, and in [17] Wand introduced the notion of row variables to achieve record polymorphism in a setting without subtyping. The presented proof of complete type inference was later shown to be incorrect [18], but the idea of using row variables prevailed; for example, OCaml uses a form of anonymous row variables to achieve object polymorphism, as described by Rémy and Vouillon [19]. Ohori [14] extended Standard ML with polymorphic records using a kind system that makes it possible to annotate type parameters with fields that must be present, similar to bounded quantification but without relying on subtyping.

Pierce [10] provides a thorough introduction to typed lambda calculus extended with record types. A lambda calculus is developed with records supporting both structural subtyping and parametric polymorphism through bounded quantification. Records with both ordered and unordered fields are treated, where the unordered records are achieved by defining field permutation as a subtyping rule. Pierce also considers the performance consequences of allowing record subtyping with unordered fields, and introduces a "coercion semantics" that inserts runtime coercions everywhere subtyping is used in a program. When the exact order of the fields in a runtime record is known, a compilation scheme translating field access to direct array indexing is suggested. However, the consequences of combining this semantics with (co)variant collections are not treated.

Among the more recent work on records we find the extensible record calculus with scoped labels developed by Leijen [34]. This provides an interesting solution to the problem of unchecked extension by implementing each field as a stack of values that is pushed for extension and popped for restriction. Several possible implementation schemes are discussed.

8.2 Structural Types on the JVM

Whiteoak [6] is a language extension that brings structural typing to Java. Any conforming Java class can be cast to a structural type, and when a method is called on such a structurally typed reference, a wrapper class is generated at runtime that implements the interface corresponding to the structural type and delegates all method calls to the runtime class of the receiver. A caching scheme is implemented to amortize the runtime penalty of class generation. In this way, efficient structural method dispatch is made possible on the nominally typed JVM.

In [32] Dubochet and Odersky describe their implementation of structural typing on the JVM used by the Scala compiler and compare it to Whiteoak's approach. Scala uses reflection instead of bytecode generation to dispatch structural method calls to the right runtime class, and the method handles are cached inline at each call site for improved runtime performance. Although Whiteoak is found to be faster in scenarios where its global caching scheme works well, for example in very tight loops with a low degree of polymorphism, the difference is conjectured to be small in practice. Scala's approach is all in all found to be a good alternative to Whiteoak's, considering that the reflective technique is simpler to implement and maintain, does not incur a runtime dependency on a bytecode generation framework, and does not require access permissions to the class loader.

It should be noted that the compilation of structural types on the JVM as discussed by Gil and Maman [6] and Dubochet and Odersky [32] is a more general problem than the one considered in this thesis. There are several reasons why it is possible to devise more efficient compilation schemes for records than for general structural types: First, the records considered in this thesis are exclusively viewed as data containers, thus avoiding problems with method dispatch. Second, records are declared as such at the creation site and can be prepared for structural typing from scratch. Finally, the field selection problem is here simplified by considering a weaker form of subtyping using explicit coercion.

Chapter 9

Conclusions

The goal of this thesis was to answer the question:

What are the possible approaches to record types in Scala and what are their respective strengths and weaknesses?

To that end, existing and possible new approaches to records have been described and compared both qualitatively and quantitatively.

Six different existing approaches to records in Scala were described: scala-records 0.3, scala-records 0.4, Compossible 0.2, Shapeless 2.3.2, as well as Scala's native anonymous structural refinement types and Dotty's new structural refinement types. The syntax and semantics of basic features such as record creation, field access, type safety and equality were investigated, as well as the support for structural subtyping and record polymorphism.

To complement the qualitative evaluation with quantitative benchmarks of runtime and compile-time performance, a novel benchmarking suite for records running on the JVM called Wreckage was presented. The benchmarking suite is built on top of the Java Microbenchmark Harness (JMH), which is a widely used and trusted microbenchmarking framework for the JVM.

Overall, the existing libraries were found to be in better shape than expected; Shapeless no longer suffers from the exponential compile times it used to, and, contrary to what the documentation says, explicit types are now supported by scala-records 0.4. However, three common weaknesses were found among the investigated approaches: dependency on whitebox macros, suboptimal runtime performance compared to nominally typed classes, and poor support for record operations such as extension, restriction, update and relabeling. As current versions of Scala do support whitebox macros, and the features currently provided by macros can be ported to native compiler support in Dotty in the future, the first point was not investigated further. Instead, the focus was put on finding ways of improving the second and third points.

Various possible compilation schemes for record types with different subtyping rules were described along with their possible type-level representation and potential runtime performance. Seven different possible approaches for storing and accessing record values on the JVM were then benchmarked using the presented Wreckage benchmarking suite: arrays, linked lists, hash maps, Scala classes with one trait per field, Java classes using field reflection, Java classes using method reflection and Scala structural types using cached method reflection.


To achieve field access times comparable to nominally typed classes, it is conjectured that width subtyping has to be restricted to explicit coercion, and a compilation scheme for such record types using an array as the underlying data structure was sketched. For unordered record types with width and depth subtyping, however, the hash map was found to have the most attractive runtime performance characteristics. For records using Dotty's new structural refinement types, the hashmap-based implementation presented in Section 4.6 therefore seems like a good option.

Shapeless was found to provide a promising approach to type-safe extension, restriction, update and relabeling of records, using type classes and implicit resolution to guarantee the correctness of the operations. Provided that these type classes can be implemented in Dotty, either by some kind of macros or by native compiler support, the new structural types in Dotty might strike a good balance between flexibility and runtime performance for records in the future.

Bibliography

[1] Martin Odersky. What is Scala? https://www.scala-lang.org/what-is-scala.html. [Online; accessed 22-June-2017].

[2] Martin Odersky, Vincent Cremet, Christine Röckl, and Matthias Zenger. A nominal theory of objects with dependent types. In 17th European Conference on Object-Oriented Programming (ECOOP ’03), pages 201–224, 2003.

[3] Nada Amin, Samuel Grütter, Martin Odersky, Tiark Rompf, and Sandro Stucki. The essence of dependent object types. In A List of Successes That Can Change the World, pages 249–272. Springer, 2016.

[4] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. scala-records 0.4. https://github.com/scala-records/scala-records/tree/v0.4. [Online; accessed 4-June-2017].

[5] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. Why Scala records with structural types and macros? https://github.com/scala-records/scala-records/wiki/Why-Scala-Records-with-Structural-Types-and-Macros%3F, 2015. [Online; accessed 22-May-2017].

[6] Joseph Gil and Itay Maman. Whiteoak: Introducing structural typing into Java. In Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '08), pages 73–90, 2008.

[7] Rob Norris. Issue #486: Compilation time for record access depends on value types. https://github.com/milessabin/shapeless/issues/486, 2015. [Online; accessed 22-June-2017].

[8] Rob Norris. Why no one uses Scala's structural typing. http://www.draconianoverlord.com/2011/10/04/why-no-one-uses-scala-structural-typing.html, 2011. [Online; accessed 22-June-2017].

[9] Ward Van Heddeghem, Sofie Lambert, Bart Lannoo, Didier Colle, Mario Pickavet, and Piet Demeester. Trends in worldwide ICT electricity consumption from 2007 to 2012. Computer Communications, 50:64–76, 2014.

[10] Benjamin C. Pierce. Types and programming languages. MIT Press, 2002.

[11] Miran Lipovaca. Learn you a haskell for great good!: a beginner's guide. No Starch Press, 2011.


[12] Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Real World OCaml: Functional programming for the masses. O’Reilly Media, Inc., 2013.

[13] Don Syme, Anar Alimov, Keith Battocchi, Jomo Fisher, Michael Hale, Jack Hu, Luke Hoban, Tao Liu, Dmitry Lomov, James Margetson, Brian McNamara, Joe Pamer, Penny Orwick, Daniel Quirk, Kevin Ransom, Chris Smith, Matteo Taveggia, Donna Malayeri, Wonseok Chae, Uladzimir Matsveyeu, Lincoln Atkinson, et al. The F# 3.1 language specification. fsharp.org, January, 2016.

[14] Atsushi Ohori. A polymorphic record calculus and its compilation. ACM Transactions on Programming Languages and Systems (TOPLAS ’95), 17(6):844–895, 1995.

[15] Martin Odersky, Philippe Altherr, Vincent Cremet, Gilles Dubochet, Burak Emir, Philipp Haller, Stéphane Micheloud, Nikolay Mihaylov, Adriaan Moors, Lukas Rytz, Michel Schinz, Erik Stenman, and Matthias Zenger. Scala 2.11 language specification. scala-lang.org, March, 2006.

[16] Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys (CSUR), 17(4):471–522, 1985.

[17] Mitchell Wand. Complete type inference for simple objects. In Proceedings of the Symposium on Logic in Computer Science (LICS ’87), pages 37–44, 1987.

[18] Mitchell Wand. Corrigendum: Complete type inference for simple objects. In Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS '88), page 132, 1988.

[19] Didier Rémy and Jérôme Vouillon. Objective ML: An effective object-oriented extension to ML. Theory and Practice of Object Systems (TAPOS), 4(1):27–50, 1998.

[20] Dotty documentation 0.1.1. http://dotty.epfl.ch/docs/. [Online; accessed 3-June-2017].

[21] George Leontiev, Eugene Burmako, Jason Zaugg, Adriaan Moors, Paul Phillips, and Oron Port. SIP-23 - Literal-based singleton types. http://docs.scala-lang.org/sips/pending/42.type.html. [Online; accessed 29-May-2017].

[22] Vojin Jovanovic, Tobias Schlatter, Plociniczak, et al. scala-records 0.3. https://github.com/scala-records/scala-records/tree/v0.3. [Online; accessed 4-June-2017].

[23] Jan Christopher Vogt. Compossible. https://github.com/cvogt/compossible. [Online; accessed 15-May-2017].

[24] Miles Sabin et al. Shapeless. https://github.com/milessabin/shapeless. [Online; accessed 15-May-2017].

[25] Martin Odersky. Rethink structural types #1886. https://github.com/lampepfl/dotty/issues/1886, 2017. [Online; accessed 22-May-2017].

[26] Olof Karlsson. The Wreckage Records Benchmarking Library. https://github.com/obkson/wreckage.

[27] Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. ACM SIGPLAN Notices, 42(10):57–76, 2007.

[28] Vojtech Horky, Peter Libic, Antonin Steinhauser, and Petr Tuma. DOs and DON’Ts of conducting performance measurements in Java. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, pages 337–340. ACM, 2015.

[29] Oracle Corporation. Java Microbenchmark Harness (JMH). http://openjdk.java.net/projects/code-tools/jmh/, 2017. [Online; accessed 16-May-2017].

[30] Petr Stefan, Vojtech Horky, Lubomir Bulej, and Petr Tuma. Unit testing performance in Java projects: Are we there yet? In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering, pages 401–412. ACM, 2017.

[31] Vojin Jovanovic. scala-records-benchmarks. https://github.com/scala-records/scala-records-benchmarks. [Online; accessed 25-May-2017].

[32] Gilles Dubochet and Martin Odersky. Compiling structural types on the JVM: a comparison of reflective and generative techniques from Scala's perspective. In Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems (ICOOOLPS '09), pages 34–41, 2009.

[33] Gilles Dubochet. Scala git commit cea527a9dc7cfef933ed911b8196858f412827b2. https://github.com/scala/scala/commit/cea527a9dc7cfef933ed911b8196858f412827b2, 2007. [Online; accessed 14-May-2017].

[34] Daan Leijen. Extensible records with scoped labels. In Proceedings of the 2005 Symposium on Trends in Functional Programming (TFP '05), pages 297–312, 2005.

[35] Scala collections performance characteristics. http://docs.scala-lang.org/overviews/collections/performance-characteristics.html. [Online; accessed 8-June-2017].

[36] Miles Sabin. Shapeless git commit d4c3c71933e8c4ab6bc1fcde17e92961f9c0f897. https://github.com/milessabin/shapeless/commit/d4c3c71933e8c4ab6bc1fcde17e92961f9c0f897, 2015. [Online; accessed 25-May-2017].

[37] Martin Odersky. Scala, the road ahead. https://www.slideshare.net/Odersky/scala-days-nyc-2016, 2016. Scala Days NYC 2016.

[38] Martin Odersky. Add records to Dotty #964. https://github.com/lampepfl/dotty/issues/964, 2015. [Online; accessed 22-May-2017].

[39] Jesper Nordenberg. Type lists and heterogeneously typed arrays. http://jnordenberg.blogspot.se/2009/09/type-lists-and-heterogeneously-typed.html, 2009. [Online; accessed 22-May-2017].

[40] Luca Cardelli. Extensible records in a pure calculus of subtyping. In Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-oriented Programming, pages 373–425. MIT Press, 1994.

[41] Siarhei Matsiukevich. Golang internals, part 2: Diving into the Go compiler. https://blog.altoros.com/golang-internals-part-2-diving-into-the-go-compiler.html, 2015.

[42] Scott McKinney. Structural types in Gosu. https://gosu-lang.github.io/2014/04/22/structural-types-in-gosu.html, 2014. [Online; accessed 2-May-2017].

[43] Urs Hölzle, Craig Chambers, and David Ungar. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. In European Conference on Object-Oriented Programming (ECOOP '91), pages 21–38. Springer, 1991.

Appendix A

Whiteoak 2.1 Benchmarks

In Fig. A.1 and Fig. A.2, Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala's cached reflection for anonymous refinement types. Note that this version of Whiteoak does not employ the generative technique described by Gil and Maman [6] and is therefore not included in the comparison in Chapter 6. It is instead supplied here for reference without further investigation.

[Figure A.1 (plot): access time (ns) vs. record size for Whiteoak 2.1, Method Reflection, Field Reflection and Anon. Refinements.]

Figure A.1: Record access time against record size in number of integer fields. Measured as mean steady state execution time per access operation on records with 1, 2, 4, 6, ... up to 32 fields. For each size, the field with the highest index was accessed. Plotted with 99.9% confidence intervals. Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala's cached reflection for anonymous refinement types.


[Figure A.2 (plot): access time (ns) vs. degree of polymorphism for Whiteoak 2.1, Method Reflection, Field Reflection and Anon. Refinements.]

Figure A.2: Record access time against degree of polymorphism on an array of different records with 32 integer fields. Measured as mean steady state execution time per field access (including array indexing) and plotted with 99.9% confidence intervals. Whiteoak 2.1 is compared to Java method reflection, Java field reflection and Scala's cached reflection for anonymous refinement types.