Increasing the Performance and Predictability of the Code Execution
Total Page:16
File Type:pdf, Size:1020Kb
— Dissertation — Increasing the Performance and Predictability of the Code Execution on an Embedded Java Platform — Ansätze zur Steigerung der Leistungsfähigkeit und Vorhersagbarkeit der Codeausführung auf einer eingebetteten Java-Plattform zur Erlangung des akademischen Grades Doktoringenieur (Dr.-Ing.) vorgelegt an der Technischen Universität Dresden Fakultät Informatik eingereicht von Dipl.-Inf. Thomas B. Preußer geboren am 02.Mai 1978 in Frankfurt(Oder) verteidigt am 12. Oktober 2011 Betreuer: Prof. Dr.-Ing. habil. Rainer G. Spallek, TU Dresden Zweitgutachter: Prof. Dr.-Ing. habil. Andreas Koch, TU Darmstadt Fachreferent: Prof. Dr. rer. nat. habil. Uwe Aßmann, TU Dresden Selbständigkeitserklärung Hiermit erkläre ich, dass ich diese Dissertation selbständig, unter ausschließlicher Verwen- dung der angegebenen Hilfsmittel und Quellen angefertigt habe. Alle Zitate sind als solche gekennzeichnet. Dresden, 19.Oktober2011 Thomas B. Preußer Marken Die in dieser Arbeit genannten Marken sind Handelsmarken und Markennamen ihrer jeweili- gen Inhaber und deren Eigentum. Die Wiedergabe von Marken, Warenbezeichnungen u.ä. in diesem Dokument berechtigt auch ohne besondere Kennzeichnung nicht zur Annahme, dass diese frei von Schutzrechten sind und frei benutzt werden dürfen. Abstract This thesis explores the execution of object-oriented code on an embedded Java platform. It presents established and derives new approaches for the implementation of high-level object- oriented functionality and commonly expected system services. The goal of the developed techniques is the provision of the architectural base for an efficient and predictable code execution. The research vehicle of this thesis is the Java-programmed SHAP platform. It consists of its platform tool chain and the highly-customizable SHAP bytecode processor. SHAP offers a fully operational embedded CLDC environment, in which the proposed techniques have been implemented, verified, and evaluated. Two strands are followed to achieve the goal of this thesis. First of all, the sequential execution of bytecode is optimized through a joint effort of an optimizing offline linker and an on-chip application loader. Additionally, SHAP pioneers a reference coloring mechanism, which enables a constant-time interface method dispatch that need not be backed a large sparse dispatch table. Secondly, this thesis explores the implementation of essential system services within des- ignated concurrent hardware modules. This effort is necessary to decouple the computational progress of the user application from the interference induced by time-sharing software im- plementations of these services. The concrete contributions comprise a spill-free, on-chip stack; a predictable method cache; and a concurrent garbage collection. Each approached means is described and evaluated after the relevant state of the art has been reviewed. This review is not limited to preceding small embedded approaches but also includes techniques that have proven successful on larger-scale platforms. The other way around, the chances that these platforms may benefit from the techniques developed for SHAP are discussed. i Contents 1 Introduction 1 1.1 Motivation.................................... 1 1.2 ObjectiveandContributions . 3 1.3 TerminologyandNotation . 4 1.4 Structure..................................... 6 1.5 Acknowledgments ............................... 6 2 Embedded Java Processing: State of the Art 7 2.1 ASurveyofEmbeddedPlatforms . 7 2.2 EfficientBytecodeExecution . 12 2.2.1 GeneralAspects ............................ 12 2.2.2 BytecodeLinking............................ 13 2.2.3 Object-OrientedBytecodes. 14 2.3 HardwareAssistance .............................. 22 2.3.1 HardwareStack............................. 22 2.3.2 MethodCaching ............................ 23 2.3.3 ManagedObjectHeaps. 24 2.3.3.1 Motivation.......................... 24 2.3.3.2 StandardTechniques. 25 2.3.3.3 Hardware-AssistedHeapManagement . 28 3 The SHAP Platform 31 3.1 Introduction................................... 31 3.2 ApplicationEmployment . 32 3.3 TargetPlatforms................................. 33 3.4 Microarchitecture ............................... 34 3.4.1 Overview ................................ 34 3.4.2 InstructionSetArchitecture . 35 3.4.3 DataRepresentation . .. .. .. .. .. .. .. 38 3.4.4 Object-OrientedBytecodes. 39 3.4.5 BytecodeExecution . .. .. .. .. .. .. .. 41 3.5 RuntimeLibrary................................. 42 3.6 PlatformFootprint ............................... 43 3.7 Contributors................................... 44 ii 4 Optimization Measures 45 4.1 OptimizedSequentialBytecodeExecution . 45 4.1.1 BytecodeOptimization. 45 4.1.1.1 Overview .......................... 45 4.1.1.2 LocalVariableRelocation . 45 4.1.1.3 RedundantPopElimination . 50 4.1.1.4 JumpMerging ........................ 50 4.1.1.5 ClassModifierStrengthening . 50 4.1.1.6 MethodCallPromotion . 53 4.1.1.7 MethodInlining . .. .. .. .. .. .. 54 4.1.1.8 MethodCallForwarding . 56 4.1.1.9 StringDataCompaction. 56 4.1.1.10 Evaluation .......................... 58 4.1.2 ConstantPoolElimination . 62 4.1.2.1 Overview .......................... 62 4.1.2.2 ResolutionProcess. 63 4.1.2.3 Evaluation .......................... 66 4.1.3 ReferenceColoring. .. .. .. .. .. .. .. 67 4.1.3.1 Overview .......................... 67 4.1.3.2 DataFlowAnalysis . 67 4.1.3.3 Runtime Data Structures and Bytecode Implementation . 72 4.1.3.4 BoundingtheTableSize. 76 4.1.3.5 Evaluation .......................... 78 4.2 ExploitingHardwareConcurrency . 79 4.2.1 IntegratedStackModule . 79 4.2.1.1 FunctionalDescription . 79 4.2.1.2 JustificationforDesignParameters . 84 4.2.1.3 Evaluation .......................... 89 4.2.2 MethodCache ............................. 89 4.2.2.1 ArchitecturalOverview . 89 4.2.2.2 DesignJustification . 92 4.2.2.3 QuantitativeEvaluation . 96 4.2.2.4 FutureImprovements . 101 4.2.3 HeapMemoryManagement . 103 4.2.3.1 Overview ..........................103 4.2.3.2 StructureofMemoryManagementUnit . 104 4.2.3.3 MemoryOrganization . 106 4.2.3.4 GarbageCollectionCycle . 107 4.2.3.5 Properties ..........................109 4.2.3.6 WeakHandling . .. .. .. .. .. .. .110 4.2.3.7 RootinganApplication . 112 4.2.3.8 ExperimentalMeasurements . 112 4.2.3.9 Evaluation ..........................117 4.3 JemBenchResults................................119 5 Summary and Future Work 121 iii Appendices A AHistoricalAccountonObjectOrientation I A.1 Subroutines ................................ I A.2 Coroutines................................. II A.3 TheAbstractDataType.......................... II A.4 ClassesandMethods ........................... III A.5 DynamicTypingandPrototypes . IV A.6 Multimethods............................... IV A.7 Polymorphism............................... VI B Method Dispatching IX B.1 TheKindsofJavaMethodInvocations . IX B.2 GenericLookup.............................. XII B.3 MultipleInheritance............................ XIV B.4 Interfaces ................................. XVI C Accessibility Issues in the Virtual Table Construction XIX D SHAP Code Tables XXV D.1 BytecodeInstructionSet ......................... XXV D.2 MicrocodeInstructionSet. XXVI D.2.1 Mnemonics............................ XXVI D.2.2 Encodings............................. XXVIII E SHAP Data Structures XXXI E.1 SHAPFileFormat ............................ XXXI E.2 ClassandObjectLayout . XXXIII Bibliography XXXV v List of Figures 1.1 UMLClassRepresentationwithOmittedDetails . 5 2.1 CACAO:CompactClassDescriptorLayout . 16 2.2 CACAO:FastClassDescriptorLayout . 16 2.3 ExampleofFilledIMTswithaConflict . 18 2.4 Example VirtualTable of a Class Loaded by the SableVM . 19 3.1 The SHAP Application Employment and Platform Design Flows....... 32 3.2 TheSHAPMicroarchitecture. 35 3.3 LayoutofReferenceValues. 39 3.4 Implementations of Standard Dispatch Schemes on SHAP . 40 3.5 SHAPPipelineExecution. 41 4.1 Extraction of Inlining Equivalent by Bytecode Reordering .......... 55 4.2 MethodInliningandForwardingCombined . 57 4.3 Evaluation of ShapLinker Optimizations on a Single-CoreSHAP(ML505) . 60 4.4 Evaluation of ShapLinker Optimizations on a 9-Core SHAP (ML505) . 61 4.5 Overall Resolution Flow from the classfiles to the Running SHAP Application 65 4.6 ExampleColorInference . 70 4.7 Virtual Table of SHAP Implementation of Thread Class........... 73 4.8 Virtual Table on SHAP with Specializing Interface . ......... 75 4.9 Redundancy in Virtual Tables on SHAP in Parallel Hierarchies ........ 76 4.10 Dense Interface Inheritance Hierarchy . 77 4.11 Alternative Indirect itable Implementation . 78 4.12 BasicStackOrganizationandOperations . 80 4.13 MethodFrameAbstractionbytheStackModule . 81 4.14 SegmentedStackMemoryOrganization . 83 4.15 DistributionofArgumentCounts . 85 4.16 DistributionofLocalVariableCounts . 86 4.17 Distribution of Maximum Operation Stack Depths . ......... 88 4.18 Classic Cache Overview (Direct-Mapped) . 90 4.19 Linked-List Organization of Bump-Pointer Cache Storage .......... 91 4.20 CompositeDesignPattern . 95 4.21 Conservative Call Graph Cycle on Thread.toString() ......... 96 4.22 Comparison of the Reply Rates of the Cache Architectures.......... 98 4.23 Virtual Trace of Call Depth of Embedded CaffeineMark . .......... 99 4.24 Virtual Trace of Call Depth of FSTest Application (Prefix) ..........100 4.25 GCFeaturesofDifferentJavaEditions . 104 4.26 Heap Memory Management: Structural Overview . 105 4.27 LifecycleofMemorySegments. 106 vi 4.28 PhasesoftheGarbageCollectionCycle . 107 4.29 ReferenceStrengthTags