UNIVERSITY OF CALIFORNIA, IRVINE

Efficient Hosted Interpreter for Dynamic Languages

DISSERTATION

submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in Computer Engineering

by

Wei Zhang

Dissertation Committee:
Professor Michael Franz, Chair
Professor Kwei-Jay Lin
Professor Guoqing Xu

2015

Portion of Chapter 3 © 2013, 2014 ACM, doi 10.1145/2532642
Portion of Chapter 4 © 2014 ACM, doi 10.1145/2660193.2660223
Portion of Chapter 5 © 2014 ACM, doi 10.1145/2660193.2660223
All other materials © 2015 Wei Zhang

DEDICATION

To my supporting parents and lovely wife.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
CURRICULUM VITAE
ABSTRACT OF THE DISSERTATION

1 Introduction
2 Background
2.1 Virtual Machines
2.2 Interpreters
2.3 Just-In-Time Compilers
2.4 Type Specialization for Dynamic Languages
3 Fast Instruction Dispatch for Hosted Bytecode Interpreters
3.1 Performance Anatomy of Bytecode Interpreters
3.1.1 Switch-based Dispatch
3.2 Efficient Instruction Dispatch Techniques
3.2.1 Direct Threading Dispatch
3.2.2 Subroutine Threading Dispatch
3.3 Just-In-Time Threaded Code for Hosted Bytecode Interpreters
3.3.1 System Overview
3.3.2 Threaded Code Generation
3.4 Evaluation
4 ZipPy: A Fast Python 3 for the JVM
4.1 Python on Truffle
4.2 Fast Arithmetics Via Type Specialization
4.2.1 Numeric Types
4.2.2 Applying Type Specializations
4.3 Efficient Data Representation for Composite Data Types
4.3.1 Unboxed Sequence Storage
4.3.2 Profiling-based List Literal Specialization
4.4 Control Flow Specializations
4.4.1 For Loop Specializations
4.4.2 List Comprehensions
4.5 Discussion
5 Generator Peeling
5.1 Motivation
5.2 Generators in Python
5.3 Generators Using an AST Interpreter
5.3.1 AST Interpreters vs. Bytecode Interpreters
5.3.2 Generator ASTs
5.4 Optimizing Generators with Peeling
5.4.1 Peeling Generator Loops
5.4.2 Peeling AST Transformations
5.4.3 Polymorphism and Deoptimization
5.4.4 Frames and Control Flow Handling
5.4.5 Implicit Generator Loops
5.4.6 Multi-level Generator Peeling
6 Optimizing Object Model and Calls
6.1 Object Model
6.1.1 Python Object Data Representations
6.1.2 Attribute Resolutions
6.1.3 Modeling Custom Mutable Types
6.1.4 Inline Caching for Attribute Accesses
6.2 Call Site Modeling
6.2.1 Call Site Structures in Python
6.2.2 Call Node Specializations
6.2.3 Call Site Dispatch and Inlining
6.3 Flexible Object Storages
6.3.1 Flexible Object Storage Generation
6.3.2 Continuous Storage Class Generation
6.3.3 A Generalization to the Object Model
6.3.4 Zombie Resurrection
6.3.5 Discussion
7 Evaluation
7.1 The Performance of ZipPy
7.1.1 Experiment Setup
7.1.2 Benchmark Selection
7.1.3 Experiment Results
7.1.4 Performance Analysis
7.2 The Effectiveness of Generator Peeling
7.2.1 Benchmark Selection
7.2.2 Experiment Results
7.2.3 Performance Analysis
7.2.4 ZipPy vs. PyPy
7.3 The Effectiveness of Flexible Object Storages
7.3.1 Object Model Configurations
7.3.2 The Performance of Flexible Object Storages
7.3.3 The Space Efficiency of Flexible Object Storages
7.3.4 Discussion
8 Related Work
8.1 Hosted Interpreters for Dynamic Languages
8.2 Truffle Languages
8.3 Generators and Coroutines
9 Conclusions
Bibliography

LIST OF FIGURES

3.1 Interpretation costs of bytecode interpreters
3.2 switch-based dispatch
3.3 direct threading dispatch
3.4 subroutine threading dispatch
3.5 Jython on Modular VM
3.6 Threaded code generation
3.7 Annotated i-op
3.8 I-op with next dispatch
3.9 Direct threading example
3.10 Jython’s direct threaded interpreter vs. switch-based
3.11 Jython’s direct threaded interpreter vs. class file compiler
4.1 Python on Truffle
4.2 Numeric types in ZipPy
4.3 Implementation of NotNode in ZipPy
4.4 Derivatives of NotNode in ZipPy
4.5 Sequence storage types in ZipPy
4.6 List construction loop
4.7 List literal specialization
4.8 For loop specialization for range iterators
4.9 List comprehensions
5.1 A simple generator function in Python
5.2 A simple generator expression in Python
5.3 Idiomatic uses of generators
5.4 Two different WhileNode versions
5.5 Translation to generator AST
5.6 Translation of a yield expression
5.7 Program execution order of a generator loop
5.8 Peeling transformation
5.9 Transformed generator loop
5.10 Peeling AST transformation
5.11 Handling of polymorphic generator loop
5.12 The caller and generator frame objects of the Fibonacci example
5.13 Complex control flow handling
5.14 Implicit generator loop transformation
5.15 Multi-level generator peeling
6.1 Three data representations for Python objects
6.2 Attribute resolution for different data representations
6.3 The implementation of PythonObject
6.4 Mutable object layout
6.5 Attribute access dispatch chain
6.6 The transformation of a get attribute dispatch node
6.7 Two types of calls in Python
6.8 The structure of a PythonCallNode
6.9 Call node specializations for a simple call site
6.10 Attribute call sites with different primary object representations
6.11 Call node specializations for an attribute call site
6.12 Operator overloading by overwriting special method
6.13 AddNode specialization for special method overwriting
6.14 AddNode specialized for special method dispatch
6.15 Call dispatch chain
6.16 Flexible object storage example
6.17 Python object layout change example
6.18 Constructor call site transformation
6.19 Continuous storage class generations and object layout changes of an example Python class
6.20 A Python constructor that exposes a reference to self
7.1 Detailed speedups of different Python implementations normalized to CPython 3.4.0
7.2 Generator optimization in PyPy
7.3 Detailed speedups of different object model configurations normalized to fixed object storage of size 5
7.4 The memory overheads of fixed object storage of size 1, 3 and 5 relative to flexible storage allocation with continuous generation
7.5 The slowdowns of fixed object storage of size 1, 3 and 5 relative to flexible storage allocation with continuous generation

LIST.