(Implementation Details of Ruby 2.0 VM).succ

笹田 耕一 Koichi Sasada

[email protected] @koichisasada 1 (“Implementation Details of Ruby 2.0 VM”).succ #=> "Implementation Details of Ruby 2.0 VN"

笹田 耕一 Koichi Sasada

[email protected] @koichisasada 2 (Implementation Details of Ruby 2.0 VM).succ != Ruby 2.0 sucks 笹田 耕一 Koichi Sasada

[email protected] @koichisasada 3 (Implementation Details of Ruby 2.0 VM).succ == Ruby 2.0 Rocks! 笹田 耕一 Koichi Sasada

[email protected] @koichisasada 4 Disclaimer

• (As you can see) I can

speak English little. http://www.flickr.com/photos/andosteinmetz/2901325908/ • Ask me an questions in 日本語 Japanese (WELCOME!), Ruby or SLOW English • All of I want to say is on the screen. You can read them.

5 Who am I ?

• 笹田耕一 (Koichi Sasada) – Matz team at Heroku, Inc. • Full-time CRuby development – CRuby/MRI committer • Virtual machine (YARV) from Ruby 1.9 • YARV development since 2004/1/1 – 2.0 Release manager assistant • Organizing feature request • Many mails to ruby-core/ruby-dev

6 Matz team at Heroku, Inc.

Matz @ Shimane Boss

Communication with Skype ko1 @ Tokyo Nobu @ Tochigi me Drunker

7 Commit number/day of Ruby's trunk 90

80

70

60

50 40 total 30

20

10

0

8 Commit number/day of Ruby's trunk 90

80

70

60

50

40 total

30 matz

20

10

0

9 Commit number/day of Ruby's trunk 90

80

70

60 50 total 40 matz 30

20 ko1

10

0

10 Commit number/day of Ruby's trunk 90

80

70

60 total 50

40 matz

30 ko1 20 nobu 10

0

11 Today’s topics

• Ruby 2.0 Features • Ruby 2.0 Optimizations – Method dispatch • After Ruby 2.0

12 Ruby 2.0

20th Anniversary Release of Ruby language

will be release at 2013/02/24 (Fixed)

ADD (Anniversary Driven Development) 13 Ruby 2.0 Release policy

• Compatibility (Ruby level) • Compatibility (Ruby level) • Compatibility (Ruby level) • Usability • Performance

14 Ruby 2.0 Roadmap

2012/Dec 2013/2/24 2012/Oct Now 2013/Jan Ruby 2.0 Release Feature freeze Preview2 RC1 (20th anniversary)

2012/Aug 2012/Nov 2012/Dec 2013/Feb “Big-feature” freeze Preview1 X’mas RC2 (was ignored) Code freeze

Only about *few weeks* to introduce new codes

“[ruby-core:40301] A rough release schedule for 2.0.0” and Endo-san’s (release manager) leak 15 Introduction of Ruby 2.0 features

What features are introduced?

16 # -*- -*- "Fiber#transfer". * __callee__ has returned * added nil.to_h which variable has been set. * new methods: This version is largely now stringifies the given * String#chars to the original behavior, and returns {} * added * Net::HTTP#local_host OpenSSL::SSL::OP_DONT_INSE backwards-compatible with object using to_s. * String#codepoints = NEWS for Ruby 2.0.0 * File now Thread#backtrace_locations * Net::HTTP#local_host= RT_EMPTY_FRAGMENTS. previous rdoc versions. * Shellwords#shelljoin() * String#bytes * extended method: returns the called name * Process which returns similar * Net::HTTP#local_port * OpenSSL requires The most notable change is accepts non-string objects in but not the original name in an information of passwords for decrypting an update to the ri data format the given This document is a list of user * File.fnmatch? now * added method: * Net::HTTP#local_port= These methods no longer aliased method. Kernel#caller_locations. PEM-encoded files to be at (ri data must array, each of which is visible feature changes made expands braces in the pattern * added getsid for getting * extended method: return an Enumerator, * Kernel#inspect does not * incompatible changes: least be regenerated for gems stringified using to_s. between if session id (unix only). * Net::HTTP#connect uses although passing a call #to_s anymore * Thread#join and four characters long. This shared across rdoc versions). releases except for bug fixes. File::FNM_EXTGLOB local_host and local_port if led to awkward situations Further API changes block is still supported for option is given. (it used to call redefined * Range Thread#value now raises a specified. * syslog backwards compatibility. #to_s). ThreadError if target thread where an export with are internal and won't affect Note that each entry is kept so * added method: * Added Syslog::Logger which is the current or main a password with fewer than most users. provides a Logger API atop brief that no reason behind or * GC * added Range#size for lazy * net/imap four characters was possible, Code like * LoadError thread. Syslog. reference information is * improvements: size evaluation. * new methods: but accessing the str.lines.with_index(1) { |line, * added method: See * Syslog::Priority, lineno| ... } no longer supplied with. For a full list of * introduced the bitmap * added Range#bsearch for * Net::IMAP.default_port file afterwards failed. https://github.com/rdoc/rdoc/ changes * added LoadError#path * Time Syslog::Level, Syslog::Option works because str.lines marking which suppresses to binary search. * OpenSSL::PKey::RSA, blob/master/History.rdoc for a and Syslog::Macros with all sufficient information, copy a memory page method to return the file * change return value: OpenSSL::PKey::DSA and list of returns an array. Replace lines name that could not be Net::IMAP.default_imap_port are introduced for easy with see the ChangeLog file. with Copy-on-Write. * Signal * Time#to_s returned OpenSSL::PKey::EC changes in rdoc 4.0. loaded. encoding defaults to US-ASCII * detection of available each_line in such cases. * introduced the non- * added method: Net::IMAP.default_tls_port therefore now enforce the constants on a == Changes since the 1.9.3 recursive marking which but automatically same check when exporting a * added Signal.signame transcodes to * * resolv running system. release avoids unexpected stack * Module which returns signal name Net::IMAP.default_ssl_port private key to PEM with a * Signal.trap overflow. Encoding.default_internal if it password - it has to be at least * new methods: * added method: * is set. four characters * Resolv::DNS#timeouts= * tmpdir === API updates * added Module#prepend * incompatible changes: Net::IMAP.default_imaps_port See above. * GC::Profiler which is similar to long. * * incompatible changes: * NUM2SHORT() and * Signal.trap raises Resolv::DNS::Config#timeouts NUM2USHORT() added. They * added method: Module#include, * TracePoint * SSL/TLS support for the * Dir.mktmpdir uses ArgumentError * objspace = FileUtils.remove_entry instead * Merge Onigmo. are similar to NUM2INT, but * added however a method in the * new class. This class is Next Protocol Negotiation when :SEGV, :BUS, :ILL, :FPE, : replacement of set_trace_func. * new method: extension. Supported of https://github.com/k- short. GC::Profiler.raw_data which prepended module overrides VTALRM takata/Onigmo the Easy to use and efficient * with OpenSSL 1.0.1 and * rexml * rb_newobj_of() and returns raw profile data for GC. are specified. ObjectSpace.reachable_object FileUtils.remove_entry_secure. NEWOBJ_OF() added. They corresponding method in implementation. higher. * REXML::Document#write create a new object of a given the prepending module. s_from(obj) * OpenSSL::OPENSSL_FIPS supports Hash arguments. This means that applications * The :close_others option is * Hash should not true by default for system() class. * added Module#refine, * String * toplevel allows client applications to * REXML::Document#write * added method: change the permission of and exec(). which extends a class or * added method: * added method: * openssl detect whether OpenSSL supports new :encoding * added Hash#to_h as module locally. * added String#b returning * Consistently raise an error is running in FIPS mode and option. It changes the created temporary Also, the close-on-exec flag === Library updates explicit conversion method, * added directory to make is set by default for all new file (outstanding ones only) [experimental] a copied string whose when trying to encode nil to react to the special XML document encoding. like Array#to_a. main.define_method which accessible from other users. descriptors. * added encoding is ASCII-8BIT. defines a global function. values. All instances requirements this Without :encoding option, * extended method: * change return value: of OpenSSL::ASN1::Primitive might impy. encoding in This means file descriptors * builtin classes Module#refinements, which doesn't inherit to spawned * Hash#default_proc= can returns refinements defined in * String#lines now returns now raise TypeError when XML declaration is used for * yaml be passed nil to clear the * cgi process unless the an array instead of an calling to_der on an * ostruct XML document encoding. * Syck has been removed. * Array default proc. enumerator. * Add HTML5 tag maker. instance whose value is nil. YAML now completely explicitly requested such as receiver. [experimental] * new methods: system(..., fd=>fd). * added method: * String#chars now returns * CGI#header has been All instances of * RubyGems depends on libyaml being * added Module#using, renamed to CGI#http_header * OpenStruct#[], []= * added Array#bsearch for * Kernel which imports refinements an array instead of an OpenSSL::ASN1::Constructive * Updated to 2.0.0.preview2 installed. binary search. and * OpenStruct#each_pair * Kernel#respond_to? * added method: into the receiver. enumerator. raise NoMethodError in the * incompatible changes: aliased to CGI#header. same case. Constructing such * OpenStruct#eql? against a protected method * added Kernel#Hash [experimental] * String#codepoints now RubyGems 2.0.0 features * zlib * random parameter of conversion method like Array() returns an array instead of an * When HTML5 tagmaker values is still * OpenStruct#hash now returns false You * extended method: cancalled, overwrite readCGI#header, themthe following improvements: * Added streaming support unless the second argument Array#shuffle! and or Float(). enumerator. permitted. * OpenStruct#to_h converts * Module#define_method for Zlib::Inflate and is true. Array#sample now * added Kernel#using, * String#bytes now returns CGI#header function is to * TLS 1.1 & 1.2 support by the struct to a hash. Zlib::Deflate. This allows accepts a UnboundMethod create a

element. * Improved support for will be called with one which imports refinements from a Module. an array instead of an setting * extended method: processing of a stream argument, maximum value. into the current scope. enumerator. OpenSSL::SSL::SSLContext#ssl_ default gems shipping with * Dir.mktmpdir in * Module#const_get * OpenStruct.new also ruby 2.0.0+ without the use of large * when given Range [experimental] * iconv version to accepts an OpenStruct / Struct. lib/tmpdir.rb accepts a qualified constant * A gem can have arbitrary amounts of memory. arguments, Array#values_at * added Kernel#__dir__ string, e.g. * Struct * Iconv has been removed. :TLSv1_2, :TLSv1_2_server, : * Added support for the new now returns nil for each TLSv1_2_client metadata through which returns a current * added method: Use String#encode instead. * pathname Gem::Specification#metadata deflate strategies Zlib::RLE and See above. value that is out-of-range. dirname. or :TLSv1_1, :TLSv1_1_server Object.const_get("Foo::Bar::B * added Struct#to_h * extended method: * `gem search` now defaults Zlib::FIXED. * added az") returning values with keys * io/wait :TLSv1_1_client. The version * Zlib streams are now * OpenStruct new methods being effectively used can be * Pathname#find returns an to --remote and is anchored * Enumerable Kernel#caller_locations which corresponding to the * new features: enumerator if no block is given. like gem list. processed without the GVL. can conflict with custom returns an array of queried * added method: * Mutex instance variable names. * added IO#wait_writable * Added --document to This allows gzip, zlib and attributes named frame information objects. with * added Enumerable#lazy * added method: method. * replace --rdoc and --ri. Use -- deflate streams to be "each_pair", "eql?", "hash" * extended method: OpenSSL::SSL#ssl_version. processed in parallel. or "to_h". method for lazy enumeration. * added Mutex#owned? * Thread * added IO#wait_readable Furthermore, it is also possible * rake has been updated to no-document to * Kernel#warn accepts which returns the mutex is * added method: method as alias of IO#wait. to version 0.9.5. disable documentation, -- multiple args in like puts. * Enumerator held by current * added blacklist the new TLS document=rdoc to only === Language changes * Thread#join, Thread#value * added method: * Kernel#caller accepts thread orat not. Thread#thread_variable_get NEWS * net/http versions with filegenerate rdoc. This version is backwards- second optional argument `n' [experimental] for getting thread local OpenSSL::SSL:OP_NO_TLSv1_1 * Only ri-format * Added %i and %I for symbol See above. * added Enumerator#size which specify * new features: compatible with previous rake for lazy size evaluation. * incompatible changes: variables and versions and documentation is generated list creation (similar to %w required caller size. * Proxies are now by default. * extended method: * Mutex#lock, (these are different than automatically detected from contains many bug fixes. and %W). * Mutex#lock, Mutex#unlock, * Kernel#to_enum and * `gem server` uses * Enumerator.new accept Mutex#unlock, Fiber local variables). the http_proxy environment OpenSSL::SSL::OP_NO_TLSv1_ Mutex#try_lock, enum_for accept a block for Mutex#try_lock, * added 2. RDoc::Servlet from RDoc 4.0 to * Default source encoding is Mutex#synchronize and an argument for lazy size lazy size evaluation. variable. See See generate HTML evaluation. Mutex#synchronize Thread#thread_variable_set Net::HTTP::new for details. * Added changed to UTF-8. (was US- Mutex#sleep * incompatible changes: documentation. and Mutex#sleep are no for setting thread local * gzip and deflate OpenSSL::SSL::SSLContext#ren ASCII) * system() and exec() variables. egotiation_cb. A user-defined http://rake.rubyforge.org/doc/ * ENV longer allowed to be used compression are now release_notes/rake- See above. closes non-standard file from trap handler * added callback For an expanded list of * aliased method: requested for all requests by 0_9_5_rdoc.html for a list === Compatibility issues descriptors and raise a ThreadError in Thread#thread_variables for may be set which gets called updates and bug fixes see: (excluding feature bug fixes) * ENV.to_h is a new alias default. See Net::HTTP for of changes in rake 0.9.3, (The default such case. getting a list of the thread details. whenever a new handshake is for ENV.to_hash of :close_others option is local negotiated. This 0.9.4 and 0.9.5. * Mutex#sleep may * SSL sessions are now https://github.com/rubygems/ * Array#values_at changed to true by default.) variable keys. also allows to /blob/master/Histor spurious wakeup. Check after reused across connections for * Fiber * respond_to? against a wakeup. * added programmatically decline * rdoc y.txt a single instance. See above. * incompatible changes: protected method now Thread#thread_variable? for (client) renegotiation attempts. * rdoc has been updated to returns false unless This speeds up connection version 4.0 * Fiber#resume cannot * NilClass testing to see if a particular by using a previously * Support for "0/n" splitting * shellwords resume a fiber which invokes the second argument is thread negotiated session. of records as BEAST mitigation * Shellwords#shellescape() * String#lines true. * added method: via Doubled from Nov/2012 17 Ruby 2.0 Main features

• Traces Internal – TracePoint Inspection – DTrace Ruby 2.0 introduces • Inspection huge inspection support – caller_locations – debug_inspector API • Memory inspection – ObjectSpace.reachable_objects_from(obj) – GC.stat[:total_allocated_object]

18 Internal Inspection features

• Generally, you don’t need to use these inspection features • If you got a trouble, please remember inspection features

Nobody knows, so I introduce them today

19 TracePoint

• OO designed set_trace_func • Usage # old style set_trace_func(lambda{|ev,file,line,id,klass,binding| puts “#{ev} #{file}:#{line}” }

# new style with TracePoint trace = TracePoint.trace{|tp| # access event info by methods puts “#{tp.event}, #{tp.path}:#{tp.line}”

} 20

TracePoint Flexible on/off trace = TracePoint.new{...} trace.enable do ... # enable trace only in this block end trace.enabe # enable trace after this point trace.disalbe{ ... # disable trace only in this block }

21 TracePoint Events • Same as set_trace_func – line – call/return, c_call/c_return – class/end – raise • New events (only for TracePoint) – thread_begin/thread_end – b_call/b_end

22 TracePoint Filtering events • TracePoint.new(events) only hook “events”

TracePoint.trace(:call, :return){...} ...

23 TracePoint Event info • Same as set_trace_func – event – path, lineno – defined_class, method_id – binding • New event info – return_value (only for retun, c_return, b_return) – raised_exception (only for raise)

24 TracePoint Advantages • Advantage of TracePoint compared with set_trace_func – OO style – On/Off – Lightweight • Creating binding object each time is too costly – Event filtering

25 DTrace

• Solaris, MacOSX FreeBSD and Linux has DTrace tracing features • Ruby support some events • See https://bugs.ruby- lang.org/projects/ruby/wiki/DTraceProbes – Not stable. Be careful this probe spec can be changed before and after Ruby 2.0 release.

Skip this section because I’m not an expert.

26 caller_locations

• caller() returns Backtrace strings array. – like ["t.rb:1:in `

'"] • caller_locations() retuns OO style backtrace information – caller_locations(0).each{|loc| p "#{loc.path}:#{loc.lineno}“} – No need to parse Backtrace string! • [advanced] caller and caller_locations support range and 2nd argument like Array#[] to specify how many backtrace information are needed

27 Debug inspection API

• Returns all bindings of current stack – Provided as C API – Debugger can use them

28 Memory inspection ObjectSpace.reachable_objects_from • ObjectSpace.reachable_objects_from(obj) returns reachable objects – Examples: (1) When obj is [“a”, “b”, “c”], returns [Array, “a”, “b”, “c”] (2) When obj is [“a”, “a”], returns [Array, “a”, “a”] (3) When obj is [a = “a”, a], returns [Array, “a”] (1) array (2) Array array (3) obj Array obj array Array obj “b” “c” “a” “a” “a” “a” 29 Memory inspection ObjectSpace.reachable_objects_from • You can analyze memory leak. ... Maybe. • Combination with ObjectSpace.memsize_of() (introduced at 1.9) is also helpful to calculate how many memories consumed by obj.

array Array obj 12 byte

“a” “a” DEMO 1 byte Total 14 bytes 1 byte

(this is fake example) 30 Memory inspection GC.stat[:total_allocated_object] • GC.stat returns implementation dependent GC (memory) usage by hash – :count means how many GC occurs • From 2.0, two information are added • :total_allocated_object • How many objects are allocated since interpreter launched • :total_freeed_object • How many objects are freed by GC. – Note that these numbers can be overflow. 31

Desirable behavior

100_000.times{|i| ""; # Generate an empty string h = GC.stat puts "#{i}¥t#{h[:total_allocated_object]}¥t#{h[:total_freed_object]}"} Leakey behavior

Live obj#

ary = [] 100_000.times{|i| ary << "" # generate an empty string and store (leak) h = GC.stat puts "#{i}¥t#{h[:total_allocated_object]}¥t#{h[:total_freed_object]}"} Internal Inspection features Again • Generally, you don’t need to use these inspection features • If you got a trouble, please remember inspection features

34 goto :next_topic

Change the title of this presentation to...

35 Lecture series of Computer Science How to make interpreter? #3 Method dispatch

Prof. Koichi Sasada (*1) Akihabara University (*2)

*1: Prof. means ... *2: Of course, joking. No such University  36 Review slide Requirement and Assumption • You need to finish “Ruby language basic” course • This course uses “Ruby” language/interpreter – One of the most popular languages – Used in world-wide programming • Web application • Text processing • and everything!! – CRuby • Ruby has many alternative implementations • CRuby has their own VM 37 Review slide How to implement virtual machine? • Execute instructions – Execute compiled instructions () – Pointed by “Program counter” (PC) • Stack machine architecture – All of values on the stack – Stack top is pointed by “Stack pointer” (SP) – V.S. Register machine architecture • Advantages and disadvantages • Yunhe Shi, et al: “Virtual machine showdown: stack versus registers” (2005) 38 Review slide Stack machine execution (basic)

Ruby Program b+ca a = b + c Compile b c YARV Instructions c getlocal b b+cb getlocal c local env YARV Stack send + setlocal a PC SP

39 Review slide [Advanced] Optimization techniques • Peephole optimizations (compiler technique) – Reduce instruction number • Make macro instructions – Operand unification – Instruction unification • Direct threading – Using GCC specific feature • Stack caching – n-level stack caching – Impact on CPU’s branch prediction

40 Review slide [Advanced] VM generator

VM generator enables VM Instrunction flexible VM building Description Dis-assembler

Assembler

Virtual Machine Documents

Compiler (Optimizer)

41 Today’s lecture: Method dispatch # Example recv.selector(arg1, arg2)

• recv: receiver • selector: method id • arg1, arg2: arguments

42 sp Before method dispatch

1. Evaluate `recv’ arg2 2. Evaluate `arg1’ and `arg2’ arg1 3. Method dispatch (`selector’) recv

Stack after #2 # Ruby’s disassembled bytecodes of Ruby 2.0 trunk 0016 getlocal recv, 0 # 1 receiver 0019 getlocal arg1, 0 # 2 arg1 0022 getlocal arg2, 0 # 2 arg2 0025 send

43 Method dispatch Overview 1. Get class of `recv’ (`klass’) 2. Search method `body’ named `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 3. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) 3. Store `PC’ and `SP’ to continue after method returning 4. Build `local environment’ 5. Set program counter 4. And continue VM execution

44 Overview Method search

• Search method from `klass’ BasicObject 1. Search method table of `klass’ 1. if method `body’ is found, return Kernel `body’ 2. `klass’ = super class of `klass’ and Object selector: body repeat it ... 2. If no method is given, Each Class has C1 exceptional flow method table C2 • In Ruby language, `method_missing’ will be called

45 Overview Cheking arity and visibility • Checking arity – Compare with given argument number and expected argument number • Checking visibility – In Ruby language, there are three visibilities (can you explain each of them ?:-p) • public • private • protected

46 Overview Building `local environment’ • How to maintain local variables? → Prepare `local variables space’ in stack → `local environment’ (short `env’) • Parameters are also in `env’

47 Overview Building `local environment’

sp

loc2 ep ep: env pointer sp loc1 env arg2 arg2 all local variables are accessible with arg1 arg1 `ep’ (ep[0], ep[-1], ep[-2], ...) recv recv Stack before Stack after method dispatch method dispatch 48 Method dispatch Overview (again) 1. Get class of `recv’ (`klass’) 2. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 3. Dispatch method with `body’ About 7 steps 1. Check visibility 2. Check arity (expected args # and given args #) 3. Store `PC’ and `SP’ to continue after method returning 4. Build `local environment’ 5. Set program counter It seems very easy 4. And continue VM execution and simple! and slow... 49 Method dispatch Ruby’s case • Quiz: How many steps in Ruby’s case? – Hint: More complex than I explained overview ① 8 steps ② 12 steps ③ 16 steps Answer is ④ ④ 20 steps About 20 steps

50 Method dispatch Ruby’s case

1. Check caller’s arguments 1. Check splat (*args) 2. Check block (given by compile time or block parameter (&block)) 2. Get class of `recv’ (`klass’) 3. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 4. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) and process 1. Post arguments 2. Optional arguments 3. Rest argument 4. Keyword arguments 5. Block argument 3. Push new control frame 1. Store `PC’ and `SP’ to continue after method returning 2. Store `block information’ 3. Store `defined class’ 4. Store info (iseq) ... simple? 5. Store recv as self 4. Build `local environment’ 5. Initialize local variables by `nil’ 6. Set program counter 5. And continue VM execution (*) Underlined items are additonal process 51 Ruby’s case 4. Dispatch method with `body’ • Previous explanation is for Ruby methods – `body’ (defined as rb_method_definition_t in method.h) has several types at least the following two types: • Method defined by Ruby code • Method defined by C function (in C-extension) • Quiz: How many method types in CRuby? – Hint: At least 2 types (Ruby method and C method) ① 3 types ② 6 types Answer is ③ 9 types ④ ④ 11 types About 11 types

52 Ruby’s case Method types 1. VM_METHOD_TYPE_ISEQ: Ruby method (using `def’ keyword) 2. VM_METHOD_TYPE_CFUNC: C method 3. VM_METHOD_TYPE_ATTRSET: defined by @attr_accessor 4. VM_METHOD_TYPE_IVAR: defined by @attr_reader 5. VM_METHOD_TYPE_BMETHOD: defind by `define_method’ 6. VM_METHOD_TYPE_ZSUPER: used in internal 7. VM_METHOD_TYPE_UNDEF: `undef’ed method 8. VM_METHOD_TYPE_NOTIMPLEMENTED: not implemet 9. VM_METHOD_TYPE_OPTIMIZED: optimization 10. VM_METHOD_TYPE_MISSING: method_missing type 11. VM_METHOD_TYPE_CFUNC_FRAMELESS: optimization two There are 11th different method dispatch procedure (dispatch by switch/case statement) 53 Ruby’s case

• Quiz: I introduce (virtual) registers `pc’, `sp’ and `ep’. How many registers in virtual machine (in Ruby 1.9.x)? ① 4 registers Answer is ② 6 registers About ④ 11 registers ③ 9 registers ↓ ④ 11 registers Need to store/restore 11 registers each method call

54 Ruby’s case Store registers • Introduce “control frame stack” to store registers – To store `pc’, `sp’, `ep’ and other information, VM has another stack named “control frame stack” – Not required structure, but it makes VM simple → Easy to maintain

11 regs 10 regs /* 1.9.3 */ /* 2.0 */ reduced, but many yet typedef struct { typedef struct { VALUE *pc; /* cfp[0] */ VALUE *pc; /* cfp[0] */ VALUE *sp; /* cfp[1] */ VALUE *sp; /* cfp[1] */ VALUE *bp; /* cfp[2] */ rb_iseq_t *iseq; /* cfp[2] */ rb_iseq_t *iseq; /* cfp[3] */ VALUE flag; /* cfp[3] */ VALUE flag; /* cfp[4] */ VALUE self; /* cfp[4] / block[0] */ VALUE self; /* cfp[5] / block[0] */ VALUE klass; /* cfp[5] / block[1] */ VALUE *lfp; /* cfp[6] / block[1] */ VALUE *ep; /* cfp[6] / block[2] */ VALUE *dfp; /* cfp[7] / block[2] */ rb_iseq_t *block_iseq; /* cfp[7] / block[3] */ rb_iseq_t *block_iseq; /* cfp[8] / block[3] */ VALUE proc; /* cfp[8] / block[4] */ VALUE proc; /* cfp[9] / block[4] */ const rb_method_entry_t *me;/* cfp[9] */ const rb_method_entry_t *me;/* cfp[10] */ } } rb_control_frame_t; 55 pushed values

locals rb_thread_t::cfp points current control frame spval SP PC self args BP me LFP pushed iseq DFP foo block values block iseq iseq def foo flags locals proc bar{ iseq SP .. PC spval BP } me end args flags self def bar block bar iseq yield pushed LFP iseq end values DFP proc foo locals SP PC BP self spval me LFP iseq args DFP block foo iseq iseq pushed flags proc values SP PC flipflo p locals BP self data me local[0] LFP per method $~ iseq spval DFP block iseq Ruby 1.9 VM stacks structure $_ args flags proc top iseq 56 Value stack Control frame Ruby’s case Complex parameter checking • “def foo(m1, m2, o1=..., o2=..., p1, p2, *rest, &block)” – m1, m2: mandatory parameter – o1, o2: optional parameter – p1, p2: post parameter – rest: rest parameter – block: block parameter • From Ruby 2.0, keyword parameter is supported

57 Method dispatch Ruby’s case

1. CHeck caller’s arguments 1. Check splat (*args) 2. Check block (given by compile time or block parameter (&block)) 2. Get class of `recv’ (`klass’) 3. Search method `body’ `selector’ from `klass’ – Method is not fixed at compile time – “Dynamic” method dispatch 4. Dispatch method with `body’ 1. Check visibility 2. Check arity (expected args # and given args #) and process 1. Post arguments 2. Optional arguments 3. Rest argument 4. Keyword arguments 5. Block argument 3. Push new control frame 1. Store `PC’ and `SP’ to continue after method returning Complex 2. Store `block information’ 3. Store `defined class’ 4. Store bytecode info (iseq) and 5. Store recv as self 4. Build `local environment’ Slow!!! 5. Initialize local variables by `nil’ 6. Set program counter 5. And continue VM execution 58 Method dispatch Method dispatch overhead is big Overhead especially on micro-benchmarks  Fib Pentomino

Others Others NotRuby NotRuby Others Others Cfunc Insn Cfunc CompileOthersBlockMMGC Obj

Insn MM Obj VM Method VM GC CompileInst

Block Method

OS: Linux 2.6.31 32-bit CPU: IntelCore2Quad 2.66GHz ruby 1.9.3dev (2010-05-26) Mem: 4GB C Compiler: GCC 4.4.1, -O3 Profiled by Mr. Shiba 59 Profiled by Oprofile Homework • Report about “Method Dispatch speedup techniques” 1. Analyze method dispatch overhead on your favorite application 2. Survey method dispatch speed-up techniques 3. Propose your optimization techniques to improve method dispatch performance 4. Implement techniques and evaluate their performance

• Deadline: 2012/12/23 (Sun) 23:59 JST • Submit to: Koichi Sasada • This report is important for your grade of this course!

60 Lecture was finished 

Presentation is not finished

61 Back to the presentation “Implementation Details of Ruby 2.0 VM”

Report “Optimization techniques for Ruby’s method dispatch”

Koichi Sasada

62 Speedup techniques for method dispatch 1. Specialized instructions Note that these optimizations 2. Method caching may not be my original. 3. Caching checking results 4. Frameless CFUNC method 5. Special path for `send’ and `method_missing’

Introduced techniques from Ruby 2.0 Today’s main subject  63 Method dispatch overheads 1.Check caller’s arguments 2.Search method `body’ `selector’ from `klass’ 3.Dispatch method with `body’ 1. Check visibility and arity 2. Push new control frame 3. Build `local environment’ 4. Initialize local variables by `nil’ 64

Optimization Specialized instruction (from 1.9) • Make special VM instruction for several methods – +, -, *, /, ...

def opt_plus(recv, obj) if recv.is_a(Fixnum) and obj.is_a(Fixnum) and Fixnum#+ is not redefined return Fixnum.plus(recv, obj) else return recv.send(:+, obj) # not prepared end end 65 Optimization Specialized instruction • Pros. – Eliminate all of dispatch cost (very effective) • Cons. – Limited applicability • Limited classes, limited selectors • Tradeoff of VM instruction numbers – Additional overhead when not prepared class

66 Optimization Method caching • Eliminate method search overhead – Reuse search result BasicObject – Invalidate cache entry with VM stat • Two level method caching Kernel – Inline method caching – Global method caching Object

method search class, id => body C1 class, id => body miss .... class => miss fill class, id => body C2 body return fill Inline cache Global cache naive search 1 element per call-site hash table 67 Optimization Caching checking results (from 2.0) • Idea: Visibility and arity check can be skipped after first checking – Store result in inline method cache 1. Check caller’s arguments

2. Search method `body’ `selector’ from `klass’

3. Dispatch method with `body’ First time First 1. Check visibility and arity

Second Second time 1. Cache result into inline method cache 2. Push new control frame 3. Build `local environment’

4. Initialize local variables by `nil’ 68

Optimization Frameless CFUNC (from 2.0) • Introduce “Frameless” CFUNC methods – Idea: Several CFUNC doesn’t need method frame • For example, String#length doesn’t need method frame. It only return the size of given String 1. Check caller’s arguments 2. Search method `body’ `selector’ from `klass’ 3. Dispatch method with `body’

1. Check visibility and arity

2. Push new control frame

3. Build `local environment’ Skip here Skip 4. Initialize local variables by `nil’ 69 Optimization Eliminate frame building (from 2.0) • Compare with specialized instruction – Pros. • You can define unlimited number of frameless methods – Cons. • A bit slow compare with specialized instruction • Note that evaluation result I will show you doesn’t include this technique Optimization Special path for `send’ and `method_missing’ (from2.0)

Before After

send(:foo) send(:foo)

invoke send method search `foo’ invoke `foo’ method (cfunc)

invoke `foo’ method

71 Evaluation result Micro benchmarks Faster than first date 2

vm1_attr_ivar*

1.5 vm1_attr_ivar_set* vm1_block* vm1_simplereturn* vm1_yield* 1 vm2_defined_method* vm2_method* vm2_method_missing* vm2_method_with_block* Speedup ratio Speedupratio 0.5 vm2_poly_method* vm2_send* vm2_super* 0 vm2_zsuper*

trunk 2012/10/13 trunk 2012/10/31 72 Evaluation results Applications Faster than first date 1.4

1.2

app_aobench 1 app_erb app_factorial 0.8 app_fib app_mandelbrot 0.6 app_pentomino app_raise

0.4 app_strconcat Speedup ratio Speedupratio app_tak 0.2 app_tarai app_uri 0

trunk 2012/10/13 trunk 2012/10/31 73 Future work

• Restructure “method frame” – Reduce required information per frame • Improve “yield” performance – Using something cached

74 Conclusion Method dispatch speed-up • Ruby’s method dispatch is nightmare – Too complex • Speedup upto 50% at simple method dispatch with new optimizations • Need more effort to achieve performance improvements

75 Other optimizations from 2.0

• Introducing Flonum (only on 64bit OSs) • Lightweight Backtrace capturing • Re-structure VM stacks/ISeq data

• Bitmap marking garbage collection (by nari3) • “require” performance (not by me)

76 Introducing Flonum (only on 64bit CPU) • Problem: Float objects are not immediate on Ruby 1.9 – It causes GC overhead problem • To speedup floating calculation, represent Float object as immediate object – Specified range Float objects are represented as immediate object (Flonum) like Fixnum • 1.72723e-77 < |f| < 1.15792e+77 (approximately) and +0.0 • Out of this range and all Floats on 32bit CPU are allocated in heap – No more GCs! (in most of case) – Flonum and old Float are also Float classes – Proposed by [K.Sasada 2008] – On 64bit CPU, object representation was changed

77 Benchmark results

3

2.5

2

1.5

1 clean/flonum clean/flonum.z 0.5

0

app_tak

so_fasta

so_sieve

loop_for

so_array

io_select

so_nbody

so_pidigits

vm2_case*

vm2_array*

vm1_block*

app_answer

vm2_super*

vm2_raise2*

vm1_length*

app_factorial

vm1_rescue*

io_file_create

vm2_method*

vm1_lvar_set*

so_nsieve_bits

so_mandelbrot

vm3_backtrace

loop_whileloop

app_pentomino

vm1_float_calc*

vm_thread_pass

so_count_words

vm3_thread_mutex vm_thread_mutex1 vm2_poly_method_ov*

78 Flonum: Float in Heap (1.9 or before)

All of Float object On 64bit CPU are allocated in heap HEAD - T_FLOAT - Float Data structure in heap - etc contains IEEE754/double VALUE IEEE754 Double 8B x 6w = - 48 byte for Float object - Flonum: Encoding IEEE754 double floating number 64bit double

b63 b62-b52 b51-b0 s: sign(1bit) e: exponent (11bit) m: mantissa (52bit)

80 Flonum: Range

IEEE754 double b63 b60-b62 b60-b52 b51-b0

Check if e (b52 to b62) is with-in 768 to 1279, then it can be represent in Flonum.

This check can be done with b60-b62.

(+0.0 (0x00) is special case to detect)

81 Flonum: Encoding

IEEE754 double b61 b63 b60-b52 b51-b0 b62

Only “rotate” and “mask”

Ruby’s Flonum

b60-b52 b51-b0 b63 1 0

Flonum representation bits (2 bits) #define FLONUM_P(v) ((v&3) == 2)

☆ +0.0 is special case (0x02)

82 Flonum: Object representation on VALUE Non Flonum Flonum Fixnum ...xxxx xxx1 ...xxxx xxx1 Flonum N/A ...xxxx xx10 Symbol ...xxxx 0000 1110 ...xxxx 0000 1100 Qfalse ...0000 0000 ...0000 0000 Qnil ...0000 0100 ...0000 1000 Qtrue ...0000 0010 ...0001 0100 Qundef ...0000 0110 ...0011 0100 Pointer ...xxxx xx00 ....xxxx x000

83 Lightweight Backtrace capturing

• Backtrace is Array of String objects – [“file:lineno method”, ...] • Idea: Capture only ISeqs and translate to String (file and line) only when it is accessed – Backtrace information may be ignored

iseq1 String

iseq2 String

・・・・・・

iseq_n String 84 After Ruby 2.0

What we should do?

85 Performance overhead rdoc Rails

Others Others Insn Insn Method CompileOthersBlock NotRuby VM GC VM Method Obj MM OthersBlock NotRuby Others Compile

Cfunc Obj GC Others

MM Cfunc OS: Linux 2.6.31 32-bit CPU: IntelCore2Quad 2.66GHz ruby 1.9.3dev (2010-05-26) Mem: 4GB C Compiler: GCC 4.4.1, -O3 Profiled by Mr. Shiba 86 Profiled by Oprofile VM techniques

• On Rails and other applications, VM is not an bottleneck • On Mathematic, Symbolic computation, VM is matter – To speedup then, we need compilation framework • 2.0?

87 Object Allocation and garbage collection • Lightweight object allocation – Sophisticate object creation – Create objects in non-GC managed area • Sophisticate Garbage collection – Per-type garbage collection – Generational garbage collection • Introduce write barriers with dependable techniques

88 Parallelization

• Multiple processes • Multiple VMs • Multiple Threads

No time and space to discuss about them!

89 Conclusion

90 Conclusion Our challenge has just begun!!

俺たちの戦いはまだ始まったばかりだ!

91 謝謝 Thank you for your attention

笹田 耕一 Koichi Sasada Heroku, Inc.

[email protected] @koichisasada 92