Tail Call Elimination in the Opensmalltalk Virtual Machine

TAIL CALL ELIMINATION IN THE OPENSMALLTALK VIRTUAL MACHINE by Matthew Ralston Bachelor of Computer Science (Honours) University of Windsor, 2004 A thesis presented to Ryerson University in partial fulfillment of the requirements for the degree of Master of Science in the program of Computer Science Toronto, Ontario, Canada, 2019 © Matthew Ralston 2019 Author’s Declaration I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research. I further authorize Ryerson University to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research. I understand that my thesis may be made electronically available to the public for the purpose of scholarly research only. ii Abstract Tail Call Elimination in the OpenSmalltalk Virtual Machine Matthew Ralston Master of Science, Computer Science Ryerson University, 2019 Tail call elimination is used by languages and compilers to optimize the activation of methods in tail position. While this optimization has been the source of much research, it has not previously been implemented in the OpenSmalltalk virtual machine - the open-source Smalltalk virtual machine used by Smalltalk environments such as Pharo and Squeak. There are many approaches described in the literature to implement tail call elimination, such as removing stack frames on method activation instead of method return. Two implementations of tail call elimination using a stack frame removal approach are presented for Opensmalltalk VM. One implementation is presented for the Inter- preter and one for the Cog JIT compiler. These implementations are tested with both ideal and real world scenarios and show improvements in execution time and memory usage. iii Table of Contents Abstract iii List of Tables viii List of Figures ix 1 Introduction 1 1.1 Problem Background . .1 1.1.1 Introduction to Smalltalk Syntax . .1 1.1.2 Definition of a Tail Call . .3 1.1.3 Tail Recursion: A Special Case . .8 1.1.4 Rationale for Tail Call Elimination . .9 1.2 Objectives and Proposed Methodology . .9 1.3 Contributions . 10 1.4 Dissertation Outline . 10 2 Related Work 11 2.1 The OpenSmalltalk Virtual Machine . 11 2.1.1 Early Development . 11 2.1.2 The Cog VM . 13 2.1.3 Other Work in OpenSmalltalk . 17 iv 2.2 Tail Calls in Different Environments . 19 2.2.1 Scheme . 19 2.2.2 Javascript . 19 2.2.3 Java Virtual Machine . 19 2.2.4 Common Language Runtime . 20 2.2.5 Python . 20 2.2.6 Other . 20 2.3 Research in Tail Call Elimination . 21 2.3.1 Continuation Passing Style and Trampolines . 25 3 Experimental Design 27 3.1 OpenSmalltalk Architecture . 27 3.2 Stack Interpreter . 28 3.2.1 Execution Flow in the Stack Interpreter . 28 3.2.2 Sending on the Stack Level . 33 3.2.3 Returning on the Stack Level . 36 3.2.4 Applying Tail Call Elimination . 38 3.2.5 Identifying a Tail Call . 40 3.3 Cog JIT Compiler . 41 3.3.1 Execution Flow in the Cog VM . 42 3.3.2 JIT Compiling a Method . 46 3.3.3 Tail Call Elimination in the JIT . 57 3.3.4 Garbage Collection and Method Compaction . 63 3.3.5 Debugging . 64 3.4 Designing Tests for Tail Call Elimination . 64 3.5 Test Runner . 65 3.5.1 Execution Time . 65 3.5.2 Memory . 65 v 3.6 Tail Call Recursion . 66 3.6.1 Factorial . 66 3.6.2 Fibonacci Sequence . 69 3.6.3 WhileTrue Recursive Loop . 71 3.7 Real World Scenarios . 73 3.7.1 Compiler . 73 3.7.2 Browse Test . 75 4 Results 78 4.1 Tail Call Statistics . 79 4.1.1 Static Tail Calls . 79 4.1.2 Dynamic Tail Calls . 80 4.2 Tail Recursion Tests . 81 4.2.1 Factorial 500 Tests . 81 4.2.2 Factorial 5000 Tests . 83 4.2.3 Fibonacci 1000 Tests . 84 4.2.4 Fibonacci 10000 Tests . 86 4.2.5 WhileTrue 1000 Tests . 86 4.2.6 WhileTrue 10000 Tests . 88 4.3 Real World Performance Test . 89 4.3.1 Compile All Test . 89 4.3.2 Windows Test . 91 4.4 Analysis . 93 4.4.1 Tail Call Recursion Analysis . 93 4.4.2 Real World Analysis . 94 4.4.3 Final Analysis . 95 vi 5 Conclusions 97 5.1 Contributions . 97 5.2 Future Work . 98 Bibliography 99 Glossary 104 vii List of Tables 3.1 Execution variables in StackInterpreter . 30 3.2 Send and Return Bytecodes . 30 3.3 IA-32 Registers and Usage . 49 4.1 Execution Tests . 79 4.2 Memory Tests . 79 4.3 Implementations to Test . 80 4.4 Static Tail Call Counts . 80 4.5 Dynamic Tail Call Counts . 81 viii List of Figures 1.1 Stack before sending at: ...........................5 1.2 Stack before sending linkAt: .........................6 1.3 Stack after sending linkAt: and before sending value ...........6 1.4 Stack after sending value ...........................7 1.5 Stack after returning at: ...........................7 2.1 Smalltalk-80 Context Object . 12 3.1 OpenSmalltalk Architecture . 27 3.2 Method Execution in StackInterpreter . 29 3.3 Spur 32 bit Method Header . 31 3.4 Sending methodB: from methodA: ...................... 34 3.5 Activating methodB: after send . 35 3.6 After activation of methodC: ......................... 36 3.7 Prior to return in methodC: .......................... 37 3.8 Stack of methodB: after returning from methodC: .............. 37 3.9 Stack of methodA: after returning from methodB: .............. 38 3.10 Tail call activation of methodC: ........................ 39 3.11 Method Execution in CogVM . 44 3.12 Stack before returning after Send 1 . 49 3.13 Spur Object Header Format . 51 3.14 Spur Object Pointer Formats . 51 3.15 Stack before returning after Send 2 . 55 4.1 Factorial 500 - Execution Time . 82 4.2 Factorial 500 - Memory Usage . 83 4.3 Factorial 5000 - Execution Time . 83 4.4 Factorial 5000 - Memory Usage . 84 4.5 Fibonacci 1000 - Execution Time . 85 4.6 Fibonacci 1000 - Memory Usage . 85 4.7 Fibonacci 10000 - Execution Time . 86 4.8 Fibonacci 10000 - Memory Usage . 87 4.9 WhileTrue 1000 - Execution Time . 87 4.10 WhileTrue 1000 - Memory Usage . 88 4.11 WhileTrue 10000 - Execution Time . 89 ix 4.12 WhileTrue 10000 - Memory Usage . 89 4.13 Compile-All - Execution Time . 90 4.14 Compile-All - Memory Usage . 91 4.15 Windows - Execution Time (8 mb JIT space for Cog) . 92 4.16 Windows - Memory Usage . 92 x Chapter 1 Introduction My thesis is that tail call elimination is a useful optimization for an object-oriented language such as Smalltalk. This is demonstrated in the form of two implementations using the open-source Smalltalk virtual machine OpenSmalltalk. One implementation is done in the Stack Interpreter, which is the bytecode interpreter component of the VM. The second implementation uses the just-in-time compiler (occasionally known as the Cog JIT compiler, or Cog VM) to implement tail call elimination in the generated machine code. 1.1 Problem Background Before exploring the details of these implementations, it is worthwhile to fully explain what tail calls are, and why elimination of tail calls would be considered an.

Load more