Specializing Ropes for Ruby
Authors removed for review

Abstract

Ropes are a data structure for representing character strings via a binary tree of operation-labeled nodes. Both the nodes and the trees constructed from them are immutable, making ropes a persistent data structure. Ropes were designed to perform well with large strings, and in particular, concatenation of large strings. We present our findings in using ropes to implement mutable strings in JRuby+Truffle, an implementation of the Ruby programming language using a self-specializing abstract syntax tree interpreter and dynamic compilation. We extend ropes to support Ruby language features such as encodings and refine operations to better support typical Ruby programs. We also use ropes to work around underlying limitations of the JVM platform in representing strings. Finally, we evaluate the performance of our implementation of ropes and demonstrate that they perform 0.9x – 9.4x as fast as byte array-based string representations in representative benchmarks.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—Run-time environments

Keywords Virtual Machine, Interpreter, Ropes, Strings, Truffle, Graal, Ruby, Java

1. Introduction

Strings of character data are one of the most frequently used data types in general purpose programming languages. Providing a textual representation of natural language, they are a convenient form for interacting with users at I/O boundaries. They are often used for internal system operations as well, particularly where program identifiers are conveniently represented as strings, such as in metaprogramming. Some languages rely so heavily on strings that they are glibly referred to as “stringly typed.”

The choice of string representation must strike a balance between potentially competing concerns. Even modest improvements to either string memory consumption or performance of common string operations could have a tremendous effect on overall efficiency.

The predominant string representation is as a thin veneer over a contiguous byte array. As a consequence, the runtime performance of string operations in such systems follows that of a byte array while the use cases may not.

The Cedar programming language [4] introduced ropes [1] as an alternative string representation in order to better match the design goals of the language. By modeling strings as an immutable binary tree where the leaves represent sequences of characters and interior nodes represent lazy operations, such as concatenation, the language was able to drastically alter the cost of common string operations. However, in order to satisfy performance goals for different use cases, ropes were introduced as a data type distinct from strings with limited compatibility between the two.

In summary, this paper makes the following contributions:

• We apply the ropes data structure to represent strings in the Ruby programming language and show how self-optimizing AST interpreters facilitate their implementation.
• We introduce new lazy operations that can be represented as nodes in a rope.
• We show how ropes allow further optimizations to metaprogramming operations that are heavily used in idiomatic Ruby.
• We demonstrate that ropes, an immutable data structure, have utility in languages with mutable string data types.
• We show that ropes can reduce the cost of string operations involving multi-byte characters.

2. Background

This section discusses traditional string representations and ropes in more detail, along with the complications introduced by the presence of multi-byte characters. We then present Ruby’s string semantics. Finally, we introduce our execution environment: JRuby+Truffle.
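
As a minimal illustration of the rope model described in the introduction (leaves holding character sequences, interior nodes recording a lazy concatenation), the following Ruby sketch may help. The class and method names (Leaf, Concat, byte_at, flatten) are hypothetical and are not taken from Cedar or JRuby+Truffle; the sketch works on raw bytes only and ignores encodings, tree balancing, and every operation other than concatenation.

```ruby
# Hypothetical sketch of an immutable rope; all names are illustrative only.

# A leaf holds a frozen sequence of bytes.
class Leaf
  attr_reader :byte_length

  def initialize(string)
    @bytes = string.b.freeze          # binary copy of the input, frozen
    @byte_length = @bytes.bytesize
    freeze                            # the node itself is immutable too
  end

  # Returns the byte (as an Integer) at the given byte index.
  def byte_at(index)
    @bytes.getbyte(index)
  end

  # Appends this leaf's bytes to a mutable buffer and returns the buffer.
  def flatten_into(buffer)
    buffer << @bytes
  end
end

# An interior node records a concatenation lazily: no bytes are copied
# when the node is created, only references to the two children are kept.
class Concat
  attr_reader :left, :right, :byte_length

  def initialize(left, right)
    @left = left
    @right = right
    @byte_length = left.byte_length + right.byte_length
    freeze
  end

  # Descends into whichever child contains the requested byte index.
  def byte_at(index)
    if index < @left.byte_length
      @left.byte_at(index)
    else
      @right.byte_at(index - @left.byte_length)
    end
  end

  # Writes the left subtree, then the right subtree, into the buffer.
  def flatten_into(buffer)
    @left.flatten_into(buffer)
    @right.flatten_into(buffer)
  end
end

# Materializes a rope as a contiguous binary string.
def flatten(rope)
  rope.flatten_into(String.new)
end

greeting = Concat.new(Leaf.new("Hello, "), Leaf.new("world"))
doubled  = Concat.new(greeting, greeting)   # the same subtree is shared, not copied
puts flatten(doubled)
```

Because every node is frozen, the `greeting` subtree can safely appear twice inside `doubled`. Only `flatten` pays the cost of copying bytes into contiguous memory.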
2.1 String Representation

The classical representation of strings is as an array of bytes with an optional terminal character. These strings require contiguous memory cells and optimize for compactness. Consequently, they have the same advantages arrays have, such as constant-time element access and data pre-fetching. By extending this simple representation with additional metadata, such as string byte length or head offsets, the set of fast operations available to strings can be expanded. For example, string truncation can simply update the length value rather than require a new array allocation and byte copy.

The byte array representation is not well-suited for all applications, however. In some languages, such as Java, the length of an array may have an upper bound that is smaller than addressable memory, placing a similar limit on the length of a string. Requiring contiguous memory may also prevent allocations if a free block cannot be found. Additionally, operations that require byte copying, such as string concatenation, may lead to excessive memory fragmentation.

2.2 Ropes

Ropes [1] are a persistent data structure [5] that can be used as an alternative implementation of strings. They were designed for the Cedar programming language [4], with the explicit goal of improving runtime performance over large strings. They were inspired by other immutable string representations, but introduced the efficient sequencing of operations by representing them as operation-labeled nodes linked together in a binary tree.

Since ropes are chained together via pointers, the upper bound on the length of a rope is the total amount of addressable memory. Moreover, the nodes in a rope do not need to reside in contiguous memory. The price of this flexibility is a heavier base data structure. Depending on the application, however, it may be possible to recoup that cost via deduplication, as the immutable nature of ropes means they can be reused by the runtime.

Beyond the additional overhead of the data structure, ropes suffer from some inefficiencies that mutable byte arrays do not. In the pathological case of transforming each character in a string, the rope would devolve to an allocated node for each character. The Cedar environment employed heuristics to recognize a handful of such problematic usage patterns and provide specialized operations for them.

2.3 Encodings

The byte array representation of strings is optimal as long as characters can be represented within the space of a byte. As computers have broadened their reach to a more diverse audience, the set of characters that must be representable has grown well beyond the 256 afforded by a byte. Encodings have been developed as a means of expanding the set of representable characters by interpreting byte sequences, rather than individual bytes, as characters. These byte sequences can either be fixed-width or variable-width.

When encodings are taken into account, traditional string representations can be seen as being optimized for fixed-width encodings with a character width of 1 byte. For any other type of encoding, some of the advantages inherent to byte arrays, such as constant-time character access, are muted.

Ropes, as implemented in Cedar, suffer from some of the same encoding issues as their byte array counterparts. For example, accessing a character by index cannot be performed by a simple binary search. Instead, a full tree walk must be performed so the byte sequences can be processed.

2.4 Ruby

Ruby¹ is an object-oriented, dynamically-typed programming language with mutable, encoding-aware strings as a core data type. Ruby strings also serve as the language’s byte array type via a special binary encoding.

Ruby strings can be interned to another data type known as a symbol. Due to their performance advantages and more compact syntax, symbols are used frequently in idiomatic Ruby. However, the symbol API is deliberately quite limited and symbols cannot be easily composed without constructing a string as a temporary buffer. Additionally, symbols were not eligible for garbage collection in Ruby versions prior to 2.2.0. Consequently, Ruby programs often consist of a mixture of strings and symbols to perform similar tasks. The programmer must decide which type to use as a careful balance between code concerns, such as concision and value safety, and VM concerns, such as performance, garbage collection, and memory reuse.

2.5 JRuby+Truffle

Due to its rich features, such as metaprogramming, comprehensive collection APIs, and a pure object-oriented design, Ruby has been an attractive target for language implementors and researchers. JRuby² is an open source, alternative implementation of Ruby that uses its own compilation phases to apply Ruby-level optimizations before generating Java bytecode, including extensive use of the invokedynamic instruction. JRuby+Truffle [7] is developed as part of the JRuby project, providing an alternative backend for the JRuby runtime that eschews the guest language compiler in favor of a self-optimizing AST interpreter written using the Truffle framework [9]. To achieve peak performance, JRuby+Truffle must be paired with GraalVM [8], a convenient distribution that bundles together OpenJDK and the Graal dynamic compiler.

JRuby+Truffle achieves much of its performance via specialized implementations of core Ruby methods. Most often these specializations are based upon argument types so that polymorphic calls can be distilled down to their constituent cases without incurring the overhead of unused code paths. However, it can also specialize on attributes associated with values. In our implementation, the various rope cases are distinct types allowing for type specialization, but they also track metadata that we can specialize on, such as character width. Being able to specialize on both structure and data allows us to tailor the code generated for each call site.

¹ Ruby, Yukihiro Matsumoto and others, https://www.ruby-lang.org/
² JRuby, Charles Nutter, Thomas Enebo and others, http://jruby.org/

2016/4/23

The reference Ruby implementation, MRI, is written in C and uses a pointer and length field. In some cases the pointer can be copy-on-write shared by referring to the same character array; however, this is only implemented for simple

3.