List of lectures
1Introduction and Functional Programming 2Imperative Programming and Data Structures 3Parsing TDDA69 Data and Program Structure 4Evaluation Virtual Machines and Bytecode 5Object Oriented Programming 6Macros and decorators Cyrille Berger 7Virtual Machines and Bytecode 8Garbage Collection and Native Code 9Distributed Computing 10Declarative Programming 11Logic Programming 12Summary
2 / 46
How is a program interpreted? Lecture content
Source code Parser Virtual Machines Types of virtual/hardware machines Parser Bytecode Abstract Syntax Tree Tree visitor Simple instruction set From AST to Bytecode Generator Source code ... Bytecode Interpreter Meta-circular evaluator Bytecode Virtual Machine
Assembler Assembly Operating System CPU
3 / 46 4 / 46 What is a Virtual Machine?
A Virtual Machine is a hardware or software emulation of a real or hypothetical computer system A system virtual machine emulates a Virtual Machines complete system and is intented to execute a complete operating system Examples: VirtualBox, VMWare, Parallels... A process/language virtual machine runs a single program in a single process Examples: JVM, CPython, V8, Dalvik...
6
Language Virtual Machine Native Environment vs Emulated Environment
Operating System Source Code Input / Output
Native Program Hardware Compiler Libraries Byte Code Virtual Machine Operating System Emulated Virtual Machine Virtual Machine Virtual Machine Input / Output Input / Output
Emulated Emulated Program Hardware Interface Hardware Windows Linux Mac Bindings Libraries
7 8 Benefits of Virtual Machines
Portability: Virtual Machines are compatible with various hardware platforms and Operating Systems Isolation: Virtual Machines are Types of virtual/hardware machines isolated from each other For running incompatible applications concurrently Encapsulation: computation is seperated from the operating system Beneficial for security
9
Types of virtual/hardware machines Register Machine: Formal definition
Register Machine A (in)finite set of registers, which holds a Stack Machine single non-negative integer An instruction set, which defines the operation on registers Arithmetic, control, input/output... A state register, which holds the current instruction and its index Sequential list of labeled instructions which defines the program to be executed
11 12 Stack Machine: Formal definition Stack Machines vs Register Machines
An (almost) infinite stack, which holds Stack Machines need more compact code integers Arithmetic instruction are smaller An instruction set, which defines the Stack Machines have simpler compiler, operation on the stack interpreters and a minimal processor state Arithmetic operations are always applied on the top two Stack Machines have a performance elements and the results is stored in the top disadvantages A state register, which holds the current More memory references, less caching of temporaries instruction and its index Higher cost of factoring out common subexpressions It has to be stored as a temporary variable Sequential list of labeled instructions Most common hardware are register machines which defines the program to be executed
13 14
Virtual Machines for Dynamic Typing
Remember: With static typing, types are checked during compilation With dyamic typing, types are checked during execution Bytecode Implication Stack/Registers contains pointer to objects Function call convention
15 Bytecode
For a virtual stack machine Instruction set Generate the bytecode Interpreting the bytecode Simple instruction set
17
Stack managment Arithmetic operators
PUSH [constant_value] ADD, MUL... Push the constant on the stack Pop two arguments on the stack POP [number] Push the result Pop a certain numbers of variables from the stack
19 20 Variables Jumps
LOAD [varname] JMP [idx] Push the value of variable Jump to execute instruction at the given index DCL [varname] IFJMP [idx] Declare the variable Pop the value and if true jump to [idx] STORE [varname] Get the value, store the result and push the value
21 22
Functions Objects
CALL [arguments] LOAD_MEMBER [varname] Pop the function object and call it with the Pop the object and push the value of variable given number of arguments STORE_MEMBER [varname] RET Pop the object and value and store it and push Return the value To simplify the bytecode this can be implemented as function call
23 24 From AST to Bytecode
With a tree visitor... It can take several pass: Find the variables Compute the jumps From AST to Bytecode Generate the code The real challenge is to map high level language to instructions
26
Arithmetic operation Functions
1+2 func(1) a-2 func(1,2) var a console.log('Hello world!') a=1 function func(a, b) { return a + b; var a = 1 } a+=2 a=b*c+2 a.b=c
27 28 Controls
if(a) { console.log('test') } if(a) { console.log('hello') } else { console.log('world') } var a = 10; Bytecode Interpreter while(a) { a -= 1; } for(var a = 0; a < 10; a += 1) { console.log(a); } for(var a = 0; a < 10; a += 1) { console.log(a); if(a+b) { break; } }
29
Components of bytecode interpreter Verification
Verifier Verifier checks correctness of Bytecode may come from buggy compiler or bytecode malicious source Every instruction must have a valid operation Dynamic checking of types, array bounds, code function arguments... Every instruction must have valid parameters Instruction executer Every branch instruction must branch to the start of some other instruction, not middle of instruction
31 32 Bytecode Interpreter Interpreter loop
Standard virtual machine instruction_index = 0 instructions = [ ... ] interprets instructions stack = [ ] Perform run-time checks such as array bounds current_env = Environment() and types and function arguments while instruction_index < len(instructions): Possible to compile bytecode to native code next_instruction = instructions[instruction_index] (JIT: Just-In-Time) switch(next_instruction.opcode): Call native methods case ADD: .. Typically functions written in C case JMP: ...
33 34
Interpreter Example Function Call
Print the absolute value of 4-1: Recursive call 1 PUSH 4 For a function call, instantiate a new 2 PUSH 1 interpreter loop 3 SUB Stack 4 DUP Push on the stack some information on how to 5 PUSH 0 restore the interpreter when returning 6 SUP More complex code, but higher performance 7 IFJMP 9 and flexibility 8 NEG Allow infinite recursion 9 LOAD 'print' 10 CALL
35 36 Function Call - Stack Handling Exceptions (1/2)
instruction_index = 0 instructions = [ ... ] stack = [ ] Use the implementation language current_env = Environment() while instruction_index < len(instructions): next_instruction = instructions[instruction_index] exceptions switch(next_instruction.opcode): ... case CALL: env = Environment() Add exception information to the func = stack.pop() for arg in func.args: value = stack.pop() stack env.set(arg.name, value) stack.push( [ next_instruction, instructions, current_env ] ) next_instruction = 0 instructions = func.instructions current_env = env
case RET: retval = stack.pop() info = stack.pop() stack.push(retval) next_instruction = info[0] instructions = info[1] current_env = info[2]
37 38
Handling Exceptions (2/2)
instruction_index = 0 instructions = [ ... ] stack = [ ] current_env = Environment() while instruction_index < len(instructions): next_instruction = instructions[instruction_index] switch(next_instruction.opcode): ... case TRY_PUSH: Meta-circular evaluator stack.push(Exception(next_instruction.rescue_index, instructions)) case TRY_POP: stack.pop() case THROW: while True: info = stack.pop() if(info is Exception): instructions = info.instructions next_instruction = info.rescue_index
39 Meta-circular evaluator RPython
An evaluator that is written in the RPython is a subset of Python same language that it evaluates is Variables contains value of at most one type Module globals are constants said to be metacircular (SICP 4.1). Restrictions on control flow, objects, Easy to extend the language exceptions... Make it easier to build sophisticated debuggers Meant to be easy to interpret More suited for building new languages
41 42
RPython and PyPy PyPy Virtual Machine
Usually interpreters are written in a Source Code target platform language such as C Compiler PyPy VM But C is a complex and unsafe language, and interpreter tend to Byte Code get complicated PyPy uses RPython: PyPy VM PyPy VM PyPy VM to provide a set of tools for implementing interpreters for any language Windows Linux Mac a python implementation using those tools
43 44 Benefits of RPython over host language Conclusion
Easier to develop Virtual Machines for interpreting If you know Python, you know RPython programs Easier to debug Stack machines are easier to Since you can run RPython program with a regular Python interpreter implement but slower than Factor the port to different register machines platform Virtual Machines introduce compilation overhead, but are faster to execute
45 46 / 46