(Implementation Details of Ruby 2.0 VM).succ
笹田 耕一 Koichi Sasada
[email protected] @koichisasada 1 (“Implementation Details of Ruby 2.0 VM”).succ #=> "Implementation Details of Ruby 2.0 VN"
笹田 耕一 Koichi Sasada
[email protected] @koichisasada 2 (Implementation Details of Ruby 2.0 VM).succ != Ruby 2.0 sucks 笹田 耕一 Koichi Sasada
[email protected] @koichisasada 3 (Implementation Details of Ruby 2.0 VM).succ == Ruby 2.0 Rocks! 笹田 耕一 Koichi Sasada
[email protected] @koichisasada 4 Disclaimer
• (As you can see) I can
speak English little. http://www.flickr.com/photos/andosteinmetz/2901325908/ • Ask me an questions in 日本語 Japanese (WELCOME!), Ruby or SLOW English • All of I want to say is on the screen. You can read them.
5 Who am I ?
• 笹田耕一 (Koichi Sasada) – Matz team at Heroku, Inc. • Full-time CRuby development – CRuby/MRI committer • Virtual machine (YARV) from Ruby 1.9 • YARV development since 2004/1/1 – 2.0 Release manager assistant • Organizing feature request • Many mails to ruby-core/ruby-dev
6 Matz team at Heroku, Inc.
Matz @ Shimane Boss
Communication with Skype ko1 @ Tokyo Nobu @ Tochigi me Drunker
7 Commit number/day of Ruby's trunk 90
80
70
60
50 40 total 30
20
10
0
8 Commit number/day of Ruby's trunk 90
80
70
60
50
40 total
30 matz
20
10
0
9 Commit number/day of Ruby's trunk 90
80
70
60 50 total 40 matz 30
20 ko1
10
0
10 Commit number/day of Ruby's trunk 90
80
70
60 total 50
40 matz
30 ko1 20 nobu 10
0
11 Today’s topics
• Ruby 2.0 Features • Ruby 2.0 Optimizations – Method dispatch • After Ruby 2.0
12 Ruby 2.0
20th Anniversary Release of Ruby language
will be release at 2013/02/24 (Fixed)
ADD (Anniversary Driven Development) 13 Ruby 2.0 Release policy
• Compatibility (Ruby level) • Compatibility (Ruby level) • Compatibility (Ruby level) • Usability • Performance
14 Ruby 2.0 Roadmap
2012/Dec 2013/2/24 2012/Oct Now 2013/Jan Ruby 2.0 Release Feature freeze Preview2 RC1 (20th anniversary)
2012/Aug 2012/Nov 2012/Dec 2013/Feb “Big-feature” freeze Preview1 X’mas RC2 (was ignored) Code freeze
Only about *few weeks* to introduce new codes
“[ruby-core:40301] A rough release schedule for 2.0.0” and Endo-san’s (release manager) leak 15 Introduction of Ruby 2.0 features
What features are introduced?
16 # -*- rdoc -*- "Fiber#transfer". * __callee__ has returned * added nil.to_h which variable has been set. * new methods: This version is largely now stringifies the given * String#chars to the original behavior, and returns {} * added * Net::HTTP#local_host OpenSSL::SSL::OP_DONT_INSE backwards-compatible with object using to_s. * String#codepoints = NEWS for Ruby 2.0.0 * File now Thread#backtrace_locations * Net::HTTP#local_host= RT_EMPTY_FRAGMENTS. previous rdoc versions. * Shellwords#shelljoin() * String#bytes * extended method: returns the called name * Process which returns similar * Net::HTTP#local_port * OpenSSL requires The most notable change is accepts non-string objects in but not the original name in an information of passwords for decrypting an update to the ri data format the given This document is a list of user * File.fnmatch? now * added method: * Net::HTTP#local_port= These methods no longer aliased method. Kernel#caller_locations. PEM-encoded files to be at (ri data must array, each of which is visible feature changes made expands braces in the pattern * added getsid for getting * extended method: return an Enumerator, * Kernel#inspect does not * incompatible changes: least be regenerated for gems stringified using to_s. between if session id (unix only). * Net::HTTP#connect uses although passing a call #to_s anymore * Thread#join and four characters long. This shared across rdoc versions). releases except for bug fixes. File::FNM_EXTGLOB local_host and local_port if led to awkward situations Further API changes block is still supported for option is given. (it used to call redefined * Range Thread#value now raises a specified. * syslog backwards compatibility. #to_s). ThreadError if target thread where an export with are internal and won't affect Note that each entry is kept so * added method: * Added Syslog::Logger which is the current or main a password with fewer than most users. provides a Logger API atop brief that no reason behind or * GC * added Range#size for lazy * net/imap four characters was possible, Code like * LoadError thread. Syslog. reference information is * improvements: size evaluation. * new methods: but accessing the str.lines.with_index(1) { |line, * added method: See * Syslog::Priority, lineno| ... } no longer supplied with. For a full list of * introduced the bitmap * added Range#bsearch for * Net::IMAP.default_port file afterwards failed. https://github.com/rdoc/rdoc/ changes * added LoadError#path * Time Syslog::Level, Syslog::Option works because str.lines marking which suppresses to binary search. * OpenSSL::PKey::RSA, blob/master/History.rdoc for a and Syslog::Macros with all sufficient information, copy a memory page method to return the file * change return value: OpenSSL::PKey::DSA and list of returns an array. Replace lines name that could not be Net::IMAP.default_imap_port are introduced for easy with see the ChangeLog file. with Copy-on-Write. * Signal * Time#to_s returned OpenSSL::PKey::EC changes in rdoc 4.0. loaded. encoding defaults to US-ASCII * detection of available each_line in such cases. * introduced the non- * added method: Net::IMAP.default_tls_port therefore now enforce the constants on a == Changes since the 1.9.3 recursive marking which but automatically same check when exporting a * added Signal.signame transcodes to * * resolv running system. release avoids unexpected stack * Module which returns signal name Net::IMAP.default_ssl_port private key to PEM with a * Signal.trap overflow. Encoding.default_internal if it password - it has to be at least * new methods: * added method: * is set. four characters * Resolv::DNS#timeouts= * tmpdir === C API updates * added Module#prepend * incompatible changes: Net::IMAP.default_imaps_port See above. * GC::Profiler which is similar to long. * * incompatible changes: * NUM2SHORT() and * Signal.trap raises Resolv::DNS::Config#timeouts NUM2USHORT() added. They * added method: Module#include, * TracePoint * SSL/TLS support for the * Dir.mktmpdir uses ArgumentError * objspace = FileUtils.remove_entry instead * Merge Onigmo. are similar to NUM2INT, but * added however a method in the * new class. This class is Next Protocol Negotiation when :SEGV, :BUS, :ILL, :FPE, : replacement of set_trace_func. * new method: extension. Supported of https://github.com/k- short. GC::Profiler.raw_data which prepended module overrides VTALRM takata/Onigmo the Easy to use and efficient * with OpenSSL 1.0.1 and * rexml * rb_newobj_of() and returns raw profile data for GC. are specified. ObjectSpace.reachable_object FileUtils.remove_entry_secure. NEWOBJ_OF() added. They corresponding method in implementation. higher. * REXML::Document#write create a new object of a given the prepending module. s_from(obj) * OpenSSL::OPENSSL_FIPS supports Hash arguments. This means that applications * The :close_others option is * Hash should not true by default for system() class. * added Module#refine, * String * toplevel allows client applications to * REXML::Document#write * added method: change the permission of and exec(). which extends a class or * added method: * added method: * openssl detect whether OpenSSL supports new :encoding * added Hash#to_h as module locally. * added String#b returning * Consistently raise an error is running in FIPS mode and option. It changes the created temporary Also, the close-on-exec flag === Library updates explicit conversion method, * added directory to make is set by default for all new file (outstanding ones only) [experimental] a copied string whose when trying to encode nil to react to the special XML document encoding. like Array#to_a. main.define_method which accessible from other users. descriptors. * added encoding is ASCII-8BIT. defines a global function. values. All instances requirements this Without :encoding option, * extended method: * change return value: of OpenSSL::ASN1::Primitive might impy. encoding in This means file descriptors * builtin classes Module#refinements, which doesn't inherit to spawned * Hash#default_proc= can returns refinements defined in * String#lines now returns now raise TypeError when XML declaration is used for * yaml be passed nil to clear the * cgi process unless the an array instead of an calling to_der on an * ostruct XML document encoding. * Syck has been removed. * Array default proc. enumerator. * Add HTML5 tag maker. instance whose value is nil. YAML now completely explicitly requested such as receiver. [experimental] * new methods: system(..., fd=>fd). * added method: * String#chars now returns * CGI#header has been All instances of * RubyGems depends on libyaml being * added Module#using, renamed to CGI#http_header * OpenStruct#[], []= * added Array#bsearch for * Kernel which imports refinements an array instead of an OpenSSL::ASN1::Constructive * Updated to 2.0.0.preview2 installed. binary search. and * OpenStruct#each_pair * Kernel#respond_to? * added method: into the receiver. enumerator. raise NoMethodError in the * incompatible changes: aliased to CGI#header. same case. Constructing such * OpenStruct#eql? against a protected method * added Kernel#Hash [experimental] * String#codepoints now RubyGems 2.0.0 features * zlib * random parameter of conversion method like Array() returns an array instead of an * When HTML5 tagmaker values is still * OpenStruct#hash now returns false You * extended method: cancalled, overwrite readCGI#header, themthe following improvements: * Added streaming support unless the second argument Array#shuffle! and or Float(). enumerator. permitted. * OpenStruct#to_h converts * Module#define_method for Zlib::Inflate and is true. Array#sample now * added Kernel#using, * String#bytes now returns CGI#header function is to * TLS 1.1 & 1.2 support by the struct to a hash. Zlib::Deflate. This allows accepts a UnboundMethod create a
• Traces Internal – TracePoint Inspection – DTrace Ruby 2.0 introduces • Inspection huge inspection support – caller_locations – debug_inspector API • Memory inspection – ObjectSpace.reachable_objects_from(obj) – GC.stat[:total_allocated_object]
18 Internal Inspection features
• Generally, you don’t need to use these inspection features • If you got a trouble, please remember inspection features
Nobody knows, so I introduce them today
19 TracePoint
• OO designed set_trace_func • Usage # old style set_trace_func(lambda{|ev,file,line,id,klass,binding| puts “#{ev} #{file}:#{line}” }
# new style with TracePoint trace = TracePoint.trace{|tp| # access event info by methods puts “#{tp.event}, #{tp.path}:#{tp.line}”
} 20
TracePoint Flexible on/off trace = TracePoint.new{...} trace.enable do ... # enable trace only in this block end trace.enabe # enable trace after this point trace.disalbe{ ... # disable trace only in this block }
21 TracePoint Events • Same as set_trace_func – line – call/return, c_call/c_return – class/end – raise • New events (only for TracePoint) – thread_begin/thread_end – b_call/b_end
22 TracePoint Filtering events • TracePoint.new(events) only hook “events”
TracePoint.trace(:call, :return){...} ...
23 TracePoint Event info • Same as set_trace_func – event – path, lineno – defined_class, method_id – binding • New event info – return_value (only for retun, c_return, b_return) – raised_exception (only for raise)
24 TracePoint Advantages • Advantage of TracePoint compared with set_trace_func – OO style – On/Off – Lightweight • Creating binding object each time is too costly – Event filtering
25 DTrace
• Solaris, MacOSX FreeBSD and Linux has DTrace tracing features • Ruby interpreter support some events • See https://bugs.ruby- lang.org/projects/ruby/wiki/DTraceProbes – Not stable. Be careful this probe spec can be changed before and after Ruby 2.0 release.
Skip this section because I’m not an expert.
26 caller_locations
• caller() returns Backtrace strings array. – like ["t.rb:1:in `
27 Debug inspection API
• Returns all bindings of current stack – Provided as C API – Debugger can use them
28 Memory inspection ObjectSpace.reachable_objects_from • ObjectSpace.reachable_objects_from(obj) returns reachable objects – Examples: (1) When obj is [“a”, “b”, “c”], returns [Array, “a”, “b”, “c”] (2) When obj is [“a”, “a”], returns [Array, “a”, “a”] (3) When obj is [a = “a”, a], returns [Array, “a”] (1) array (2) Array array (3) obj Array obj array Array obj “b” “c” “a” “a” “a” “a” 29 Memory inspection ObjectSpace.reachable_objects_from • You can analyze memory leak. ... Maybe. • Combination with ObjectSpace.memsize_of() (introduced at 1.9) is also helpful to calculate how many memories consumed by obj.
array Array obj 12 byte
“a” “a” DEMO 1 byte Total 14 bytes 1 byte
(this is fake example) 30 Memory inspection GC.stat[:total_allocated_object] • GC.stat returns implementation dependent GC (memory) usage by hash – :count means how many GC occurs • From 2.0, two information are added • :total_allocated_object • How many objects are allocated since interpreter launched • :total_freeed_object • How many objects are freed by GC. – Note that these numbers can be overflow. 31
Desirable behavior
100_000.times{|i| ""; # Generate an empty string h = GC.stat puts "#{i}¥t#{h[:total_allocated_object]}¥t#{h[:total_freed_object]}"} Leakey behavior
Live obj#
ary = [] 100_000.times{|i| ary << "" # generate an empty string and store (leak) h = GC.stat puts "#{i}¥t#{h[:total_allocated_object]}¥t#{h[:total_freed_object]}"} Internal Inspection features Again • Generally, you don’t need to use these inspection features • If you got a trouble, please remember inspection features
34 goto :next_topic
Change the title of this presentation to...
35 Lecture series of Computer Science How to make interpreter? #3 Method dispatch
Prof. Koichi Sasada (*1) Akihabara University (*2)
*1: Prof. means ... *2: Of course, joking. No such University 36 Review slide Requirement and Assumption • You need to finish “Ruby language basic” course • This course uses “Ruby” language/interpreter – One of the most popular languages – Used in world-wide programming • Web application • Text processing • and everything!! – CRuby • Ruby has many alternative implementations • CRuby has their own VM 37 Review slide How to implement virtual machine? • Execute instructions – Execute compiled instructions (bytecodes) – Pointed by “Program counter” (PC) • Stack machine architecture – All of values on the stack – Stack top is pointed by “Stack pointer” (SP) – V.S. Register machine architecture • Advantages and disadvantages • Yunhe Shi, et al: “Virtual machine showdown: stack versus registers” (2005) 38 Review slide Stack machine execution (basic)
Ruby Program b+ca a = b + c Compile b c YARV Instructions c getlocal b b+cb getlocal c local env YARV Stack send + setlocal a PC SP
39 Review slide [Advanced] Optimization techniques • Peephole optimizations (compiler technique) – Reduce instruction number • Make macro instructions – Operand unification – Instruction unification • Direct threading – Using GCC specific feature • Stack caching – n-level stack caching – Impact on CPU’s branch prediction
40 Review slide [Advanced] VM generator
VM generator enables VM Instrunction flexible VM building Description Dis-assembler
Assembler
Virtual Machine Documents
Compiler (Optimizer)
41 Today’s lecture: Method dispatch # Example recv.selector(arg1, arg2)
• recv: receiver • selector: method id • arg1, arg2: arguments
42 sp Before method dispatch
1. Evaluate `recv’ arg2 2. Evaluate `arg1’ and `arg2’ arg1 3. Method dispatch (`selector’) recv
Stack after #2 # Ruby’s disassembled bytecodes of Ruby 2.0 trunk 0016 getlocal recv, 0 # 1 receiver 0019 getlocal arg1, 0 # 2 arg1 0022 getlocal arg2, 0 # 2 arg2 0025 send
43 Method dispatch Overview 1. Get class of `recv’ (`klass’) 2. Search method `body’ named `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 3. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) 3. Store `PC’ and `SP’ to continue after method returning 4. Build `local environment’ 5. Set program counter 4. And continue VM execution
44 Overview Method search
• Search method from `klass’ BasicObject 1. Search method table of `klass’ 1. if method `body’ is found, return Kernel `body’ 2. `klass’ = super class of `klass’ and Object selector: body repeat it ... 2. If no method is given, Each Class has C1 exceptional flow method table C2 • In Ruby language, `method_missing’ will be called
45 Overview Cheking arity and visibility • Checking arity – Compare with given argument number and expected argument number • Checking visibility – In Ruby language, there are three visibilities (can you explain each of them ?:-p) • public • private • protected
46 Overview Building `local environment’ • How to maintain local variables? → Prepare `local variables space’ in stack → `local environment’ (short `env’) • Parameters are also in `env’
47 Overview Building `local environment’
sp
loc2 ep ep: env pointer sp loc1 env arg2 arg2 all local variables are accessible with arg1 arg1 `ep’ (ep[0], ep[-1], ep[-2], ...) recv recv Stack before Stack after method dispatch method dispatch 48 Method dispatch Overview (again) 1. Get class of `recv’ (`klass’) 2. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 3. Dispatch method with `body’ About 7 steps 1. Check visibility 2. Check arity (expected args # and given args #) 3. Store `PC’ and `SP’ to continue after method returning 4. Build `local environment’ 5. Set program counter It seems very easy 4. And continue VM execution and simple! and slow... 49 Method dispatch Ruby’s case • Quiz: How many steps in Ruby’s case? – Hint: More complex than I explained overview ① 8 steps ② 12 steps ③ 16 steps Answer is ④ ④ 20 steps About 20 steps
50 Method dispatch Ruby’s case
1. Check caller’s arguments 1. Check splat (*args) 2. Check block (given by compile time or block parameter (&block)) 2. Get class of `recv’ (`klass’) 3. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 4. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) and process 1. Post arguments 2. Optional arguments 3. Rest argument 4. Keyword arguments 5. Block argument 3. Push new control frame 1. Store `PC’ and `SP’ to continue after method returning 2. Store `block information’ 3. Store `defined class’ 4. Store bytecode info (iseq) ... simple? 5. Store recv as self 4. Build `local environment’ 5. Initialize local variables by `nil’ 6. Set program counter 5. And continue VM execution (*) Underlined items are additonal process 51 Ruby’s case 4. Dispatch method with `body’ • Previous explanation is for Ruby methods – `body’ (defined as rb_method_definition_t in method.h) has several types at least the following two types: • Method defined by Ruby code • Method defined by C function (in C-extension) • Quiz: How many method types in CRuby? – Hint: At least 2 types (Ruby method and C method) ① 3 types ② 6 types Answer is ③ 9 types ④ ④ 11 types About 11 types
52 Ruby’s case Method types 1. VM_METHOD_TYPE_ISEQ: Ruby method (using `def’ keyword) 2. VM_METHOD_TYPE_CFUNC: C method 3. VM_METHOD_TYPE_ATTRSET: defined by @attr_accessor 4. VM_METHOD_TYPE_IVAR: defined by @attr_reader 5. VM_METHOD_TYPE_BMETHOD: defind by `define_method’ 6. VM_METHOD_TYPE_ZSUPER: used in internal 7. VM_METHOD_TYPE_UNDEF: `undef’ed method 8. VM_METHOD_TYPE_NOTIMPLEMENTED: not implemet 9. VM_METHOD_TYPE_OPTIMIZED: optimization 10. VM_METHOD_TYPE_MISSING: method_missing type 11. VM_METHOD_TYPE_CFUNC_FRAMELESS: optimization two There are 11th different method dispatch procedure (dispatch by switch/case statement) 53 Ruby’s case
• Quiz: I introduce (virtual) registers `pc’, `sp’ and `ep’. How many registers in virtual machine (in Ruby 1.9.x)? ① 4 registers Answer is ② 6 registers About ④ 11 registers ③ 9 registers ↓ ④ 11 registers Need to store/restore 11 registers each method call
54 Ruby’s case Store registers • Introduce “control frame stack” to store registers – To store `pc’, `sp’, `ep’ and other information, VM has another stack named “control frame stack” – Not required structure, but it makes VM simple → Easy to maintain
11 regs 10 regs /* 1.9.3 */ /* 2.0 */ reduced, but many yet typedef struct { typedef struct { VALUE *pc; /* cfp[0] */ VALUE *pc; /* cfp[0] */ VALUE *sp; /* cfp[1] */ VALUE *sp; /* cfp[1] */ VALUE *bp; /* cfp[2] */ rb_iseq_t *iseq; /* cfp[2] */ rb_iseq_t *iseq; /* cfp[3] */ VALUE flag; /* cfp[3] */ VALUE flag; /* cfp[4] */ VALUE self; /* cfp[4] / block[0] */ VALUE self; /* cfp[5] / block[0] */ VALUE klass; /* cfp[5] / block[1] */ VALUE *lfp; /* cfp[6] / block[1] */ VALUE *ep; /* cfp[6] / block[2] */ VALUE *dfp; /* cfp[7] / block[2] */ rb_iseq_t *block_iseq; /* cfp[7] / block[3] */ rb_iseq_t *block_iseq; /* cfp[8] / block[3] */ VALUE proc; /* cfp[8] / block[4] */ VALUE proc; /* cfp[9] / block[4] */ const rb_method_entry_t *me;/* cfp[9] */ const rb_method_entry_t *me;/* cfp[10] */ } } rb_control_frame_t; 55 pushed values
locals rb_thread_t::cfp points current control frame spval SP PC self args BP me LFP pushed iseq DFP foo block values block iseq iseq def foo flags locals proc bar{ iseq SP .. PC spval BP } me end args flags self def bar block bar iseq yield pushed LFP iseq end values DFP proc foo locals SP PC BP self spval me LFP iseq args DFP block foo iseq iseq pushed flags proc values SP PC flipflo p locals BP self data me local[0] LFP per method $~ iseq spval DFP block iseq Ruby 1.9 VM stacks structure $_ args flags proc top iseq 56 Value stack Control frame Ruby’s case Complex parameter checking • “def foo(m1, m2, o1=..., o2=..., p1, p2, *rest, &block)” – m1, m2: mandatory parameter – o1, o2: optional parameter – p1, p2: post parameter – rest: rest parameter – block: block parameter • From Ruby 2.0, keyword parameter is supported
57 Method dispatch Ruby’s case
1. CHeck caller’s arguments 1. Check splat (*args) 2. Check block (given by compile time or block parameter (&block)) 2. Get class of `recv’ (`klass’) 3. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 4. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) and process 1. Post arguments 2. Optional arguments 3. Rest argument 4. Keyword arguments 5. Block argument 3. Push new control frame 1. Store `PC’ and `SP’ to continue after method returning Complex 2. Store `block information’ 3. Store `defined class’ 4. Store bytecode info (iseq) and 5. Store recv as self 4. Build `local environment’ Slow!!! 5. Initialize local variables by `nil’ 6. Set program counter 5. And continue VM execution 58 Method dispatch Method dispatch overhead is big Overhead especially on micro-benchmarks Fib Pentomino
Others Others NotRuby NotRuby Others Others Cfunc Insn Cfunc CompileOthersBlockMMGC Obj
Insn MM Obj VM Method VM GC CompileInst
Block Method
OS: Linux 2.6.31 32-bit CPU: IntelCore2Quad 2.66GHz ruby 1.9.3dev (2010-05-26) Mem: 4GB C Compiler: GCC 4.4.1, -O3 Profiled by Mr. Shiba 59 Profiled by Oprofile Homework • Report about “Method Dispatch speedup techniques” 1. Analyze method dispatch overhead on your favorite application 2. Survey method dispatch speed-up techniques 3. Propose your optimization techniques to improve method dispatch performance 4. Implement techniques and evaluate their performance
• Deadline: 2012/12/23 (Sun) 23:59 JST • Submit to: Koichi Sasada
60 Lecture was finished
Presentation is not finished
61 Back to the presentation “Implementation Details of Ruby 2.0 VM”
Report “Optimization techniques for Ruby’s method dispatch”
Koichi Sasada
62 Speedup techniques for method dispatch 1. Specialized instructions Note that these optimizations 2. Method caching may not be my original. 3. Caching checking results 4. Frameless CFUNC method 5. Special path for `send’ and `method_missing’
Introduced techniques from Ruby 2.0 Today’s main subject 63 Method dispatch overheads 1.Check caller’s arguments 2.Search method `body’ `selector’ from `klass’ 3.Dispatch method with `body’ 1. Check visibility and arity 2. Push new control frame 3. Build `local environment’ 4. Initialize local variables by `nil’ 64
Optimization Specialized instruction (from 1.9) • Make special VM instruction for several methods – +, -, *, /, ...
def opt_plus(recv, obj) if recv.is_a(Fixnum) and obj.is_a(Fixnum) and Fixnum#+ is not redefined return Fixnum.plus(recv, obj) else return recv.send(:+, obj) # not prepared end end 65 Optimization Specialized instruction • Pros. – Eliminate all of dispatch cost (very effective) • Cons. – Limited applicability • Limited classes, limited selectors • Tradeoff of VM instruction numbers – Additional overhead when not prepared class
66 Optimization Method caching • Eliminate method search overhead – Reuse search result BasicObject – Invalidate cache entry with VM stat • Two level method caching Kernel – Inline method caching – Global method caching Object
method search class, id => body C1 class, id => body miss .... class => miss fill class, id => body C2 body return fill Inline cache Global cache naive search 1 element per call-site hash table 67 Optimization Caching checking results (from 2.0) • Idea: Visibility and arity check can be skipped after first checking – Store result in inline method cache 1. Check caller’s arguments
2. Search method `body’ `selector’ from `klass’
3. Dispatch method with `body’ First time First 1. Check visibility and arity
Second Second time 1. Cache result into inline method cache 2. Push new control frame 3. Build `local environment’
4. Initialize local variables by `nil’ 68
Optimization Frameless CFUNC (from 2.0) • Introduce “Frameless” CFUNC methods – Idea: Several CFUNC doesn’t need method frame • For example, String#length doesn’t need method frame. It only return the size of given String 1. Check caller’s arguments 2. Search method `body’ `selector’ from `klass’ 3. Dispatch method with `body’
1. Check visibility and arity
2. Push new control frame
3. Build `local environment’ Skip here Skip 4. Initialize local variables by `nil’ 69 Optimization Eliminate frame building (from 2.0) • Compare with specialized instruction – Pros. • You can define unlimited number of frameless methods – Cons. • A bit slow compare with specialized instruction • Note that evaluation result I will show you doesn’t include this technique Optimization Special path for `send’ and `method_missing’ (from2.0)
Before After
send(:foo) send(:foo)
invoke send method search `foo’ invoke `foo’ method (cfunc)
invoke `foo’ method
71 Evaluation result Micro benchmarks Faster than first date 2
vm1_attr_ivar*
1.5 vm1_attr_ivar_set* vm1_block* vm1_simplereturn* vm1_yield* 1 vm2_defined_method* vm2_method* vm2_method_missing* vm2_method_with_block* Speedup ratio Speedupratio 0.5 vm2_poly_method* vm2_send* vm2_super* 0 vm2_zsuper*
trunk 2012/10/13 trunk 2012/10/31 72 Evaluation results Applications Faster than first date 1.4
1.2
app_aobench 1 app_erb app_factorial 0.8 app_fib app_mandelbrot 0.6 app_pentomino app_raise
0.4 app_strconcat Speedup ratio Speedupratio app_tak 0.2 app_tarai app_uri 0
trunk 2012/10/13 trunk 2012/10/31 73 Future work
• Restructure “method frame” – Reduce required information per frame • Improve “yield” performance – Using something cached
74 Conclusion Method dispatch speed-up • Ruby’s method dispatch is nightmare – Too complex • Speedup upto 50% at simple method dispatch with new optimizations • Need more effort to achieve performance improvements
75 Other optimizations from 2.0
• Introducing Flonum (only on 64bit OSs) • Lightweight Backtrace capturing • Re-structure VM stacks/ISeq data
• Bitmap marking garbage collection (by nari3) • “require” performance (not by me)
76 Introducing Flonum (only on 64bit CPU) • Problem: Float objects are not immediate on Ruby 1.9 – It causes GC overhead problem • To speedup floating calculation, represent Float object as immediate object – Specified range Float objects are represented as immediate object (Flonum) like Fixnum • 1.72723e-77 < |f| < 1.15792e+77 (approximately) and +0.0 • Out of this range and all Floats on 32bit CPU are allocated in heap – No more GCs! (in most of case) – Flonum and old Float are also Float classes – Proposed by [K.Sasada 2008] – On 64bit CPU, object representation was changed
77 Benchmark results
3
2.5
2
1.5
1 clean/flonum clean/flonum.z 0.5
0
app_tak
so_fasta
so_sieve
loop_for
so_array
io_select
so_nbody
so_pidigits
vm2_case*
vm2_array*
vm1_block*
app_answer
vm2_super*
vm2_raise2*
vm1_length*
app_factorial
vm1_rescue*
io_file_create
vm2_method*
vm1_lvar_set*
so_nsieve_bits
so_mandelbrot
vm3_backtrace
loop_whileloop
app_pentomino
vm1_float_calc*
vm_thread_pass
so_count_words
vm3_thread_mutex vm_thread_mutex1 vm2_poly_method_ov*
78 Flonum: Float in Heap (1.9 or before)
All of Float object On 64bit CPU are allocated in heap HEAD - T_FLOAT - Float Data structure in heap - etc contains IEEE754/double VALUE IEEE754 Double 8B x 6w = - 48 byte for Float object - Flonum: Encoding IEEE754 double floating number 64bit double
b63 b62-b52 b51-b0 s: sign(1bit) e: exponent (11bit) m: mantissa (52bit)
80 Flonum: Range
IEEE754 double b63 b60-b62 b60-b52 b51-b0
Check if e (b52 to b62) is with-in 768 to 1279, then it can be represent in Flonum.
This check can be done with b60-b62.
(+0.0 (0x00) is special case to detect)
81 Flonum: Encoding
IEEE754 double b61 b63 b60-b52 b51-b0 b62
Only “rotate” and “mask”
Ruby’s Flonum
b60-b52 b51-b0 b63 1 0
Flonum representation bits (2 bits) #define FLONUM_P(v) ((v&3) == 2)
☆ +0.0 is special case (0x02)
82 Flonum: Object representation on VALUE Non Flonum Flonum Fixnum ...xxxx xxx1 ...xxxx xxx1 Flonum N/A ...xxxx xx10 Symbol ...xxxx 0000 1110 ...xxxx 0000 1100 Qfalse ...0000 0000 ...0000 0000 Qnil ...0000 0100 ...0000 1000 Qtrue ...0000 0010 ...0001 0100 Qundef ...0000 0110 ...0011 0100 Pointer ...xxxx xx00 ....xxxx x000
83 Lightweight Backtrace capturing
• Backtrace is Array of String objects – [“file:lineno method”, ...] • Idea: Capture only ISeqs and translate to String (file and line) only when it is accessed – Backtrace information may be ignored
iseq1 String
iseq2 String
・・・・・・
iseq_n String 84 After Ruby 2.0
What we should do?
85 Performance overhead rdoc Rails
Others Others Insn Insn Method CompileOthersBlock NotRuby VM GC VM Method Obj MM OthersBlock NotRuby Others Compile
Cfunc Obj GC Others
MM Cfunc OS: Linux 2.6.31 32-bit CPU: IntelCore2Quad 2.66GHz ruby 1.9.3dev (2010-05-26) Mem: 4GB C Compiler: GCC 4.4.1, -O3 Profiled by Mr. Shiba 86 Profiled by Oprofile VM techniques
• On Rails and other applications, VM is not an bottleneck • On Mathematic, Symbolic computation, VM is matter – To speedup then, we need compilation framework • 2.0?
87 Object Allocation and garbage collection • Lightweight object allocation – Sophisticate object creation – Create objects in non-GC managed area • Sophisticate Garbage collection – Per-type garbage collection – Generational garbage collection • Introduce write barriers with dependable techniques
88 Parallelization
• Multiple processes • Multiple VMs • Multiple Threads
No time and space to discuss about them!
89 Conclusion
90 Conclusion Our challenge has just begun!!
俺たちの戦いはまだ始まったばかりだ!
91 謝謝 Thank you for your attention
笹田 耕一 Koichi Sasada Heroku, Inc.
[email protected] @koichisasada 92