The Ruby Intermediate Language

Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster, Michael Hicks
Department of Computer Science
University of Maryland
College Park, MD 20742
{furr,davidan,jfoster,mwh}@cs.umd.edu

ABSTRACT

Ruby is a popular, dynamic scripting language that aims to "feel natural to programmers" and give users the "freedom to choose" among many different ways of doing the same thing. While this arguably makes programming in Ruby easier, it makes it hard to build analysis and transformation tools that operate on Ruby source code. In this paper, we present the Ruby Intermediate Language (RIL), a Ruby front-end and intermediate representation that addresses these challenges. Our system includes an extensible GLR parser for Ruby, and an automatic translation into RIL, an easy-to-analyze intermediate form. This translation eliminates redundant language constructs, unravels the often subtle ordering among side-effecting operations, and makes implicit interpreter operations explicit in its representation.

We demonstrate the usefulness of RIL by presenting a simple static analysis and source code transformation to eliminate null pointer errors in Ruby programs. We also describe several additional useful features of RIL, including a pretty printer that outputs RIL as syntactically valid Ruby code, a dataflow analysis engine, and a dynamic instrumentation library for profiling source code. We hope that RIL's features will enable others to more easily build analysis tools for Ruby, and that our design will inspire the creation of similar frameworks for other dynamic languages.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DLS ’09 Orlando, Florida USA
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00.

1. INTRODUCTION

Ruby is a popular, object-oriented, dynamic scripting language inspired by Perl, Python, Smalltalk, and LISP. Over the last several years, we have been developing tools that involve static analysis and transformation of Ruby code. The most notable example is Diamondback Ruby (DRuby), a system that brings static types and static type inference to Ruby [4, 3].

As we embarked on this project, we quickly discovered that working with Ruby code was going to be quite challenging. Ruby aims to "feel natural to programmers" [9] by providing a rich syntax that is almost ambiguous, and a semantics that includes a significant amount of special case, implicit behavior. While the resulting language is arguably easy to use, its complex syntax and semantics make it hard to write tools that work with Ruby source code.

In this paper, we describe the Ruby Intermediate Language (RIL), a parser and intermediate representation designed to make it easy to extend, analyze, and transform Ruby source code. As far as we are aware, RIL is the only Ruby front-end designed with these goals in mind. RIL is written in OCaml, which provides strong support for working with the RIL data structure, due to its data type language and pattern matching features.

RIL provides four main advantages for working with Ruby code. First, RIL's parser is completely separated from the Ruby interpreter, and is defined using a Generalized LR (GLR) grammar, which makes it much easier to modify and extend. In particular, it was rather straightforward to extend our parser grammar to include type annotations, a key part of DRuby. (Section 2.) Second, RIL translates many redundant syntactic forms into one common representation, reducing the burden on the analysis writer. For example, Ruby includes four different variants of if-then-else (standard, postfix, and standard and postfix variants with unless), and all four are represented in the same way in RIL. Third, RIL makes Ruby's (sometimes quite subtle) order of evaluation explicit by assigning intermediate results to temporary variables, making flow-sensitive analyses like data flow analysis simpler to write. Finally, RIL makes explicit much of Ruby's implicit semantics, again reducing the burden on the analysis designer. For example, RIL replaces empty Ruby method bodies by return nil to clearly indicate their behavior. (Section 3.)

In addition to the RIL data structure itself, our RIL implementation has a number of features that make working with RIL easier. RIL includes an implementation of the visitor pattern to simplify code traversals. The RIL pretty printer can output RIL as executable Ruby code, so that transformed RIL code can be directly run. To make it easy to build RIL data structures (a common requirement of transformations, which often inject bits of code into a program), RIL includes a partial reparsing module [?]. RIL also has a dataflow analysis engine, and extensive support for run-time profiling. We have found that profiling dynamic feature use and reflecting the results back into the source code is a good way to perform static analysis in the presence of highly dynamic features, such as eval [3]. (Section 4.)

Along with DRuby [4, 3], we have used RIL to build DRails, a tool that brings static typing to Ruby on Rails applications (a work in progress).
In addition, several students in a graduate class at the University of Maryland used RIL for a course project. The students were able to build a working Ruby static analysis tool within a few weeks. These experiences lead us to believe that RIL is a useful and effective tool for analysis and transformation of Ruby source code. We hope that others will find RIL as useful as we have, and that our discussion of RIL's design will be valuable to those working with other dynamic languages with similar features. RIL is available as part of the DRuby distribution at http://www.cs.umd.edu/projects/PL/druby.

2. PARSING RUBY

The major features of Ruby are fairly typical of dynamic scripting languages. Among other features, Ruby includes object-orientation (every value in Ruby is an object, including integers); exceptions; extensive support for strings, regular expressions, arrays, and hash tables; and higher-order programming (code blocks). We assume the reader is familiar with, or at least can guess at, the basics of Ruby. An introduction to the language is available elsewhere [10, 2].

The first step in analyzing Ruby is parsing Ruby source. One option would be to use the parser built into the Ruby interpreter. Unfortunately, that parser is tightly integrated with the rest of the interpreter, and uses very complex parser actions to handle the near-ambiguity of Ruby's syntax. We felt these issues would make it difficult to extend Ruby's parser for our own purposes, e.g., to add a type annotation language for DRuby.

Thus, we opted to write a Ruby parser from scratch. The fundamental challenge in parsing Ruby stems from Ruby's goal of giving users the "freedom to choose" among many different ways of doing the same thing [11]. This philosophy extends to the surface syntax, making Ruby's grammar highly ambiguous from an LL/LR parsing standpoint. In fact, we are aware of no clean specification of Ruby's grammar.¹ Thus, our goal was to keep the grammar specification as understandable (and therefore as extensible) as possible while still correctly parsing all the potentially ambiguous cases.

[...]

Even though the assignment on line 3 will never be executed, its existence causes Ruby's parser to treat x as a local variable from there on. At run-time, the interpreter will initialize x to nil after line 3, and thus executing x + 2 on line 4 is an error. In contrast, if line 3 were removed, x + 2 would be interpreted as x() + 2, evaluating successfully to 6. (Programmers might think that local variables in Ruby must be initialized explicitly, but this example shows that the parsing context can actually lead to implicit initialization.)

As a second parsing challenge, consider the code

6  f() do |x| x + 1 end

Here we invoke the method f, passing a code block (higher-order method) as an argument. In this case the code block, delimited by do ... end, takes parameter x and returns x + 1.

It turns out that code blocks can be used by several different constructs, and thus their use can introduce potential ambiguity. For example, the statement

7  for x in 1..5 do puts x end

prints the values 1 through 5. Notice that the body of for is also a code block, and hence if we see a call

8  for x in f() do ... end ...

then we need to know whether the code block is being passed to f() or is used as the body of the for. (In this case, the code block is associated with the for.)

Of course, such ambiguities are a common part of many languages, but Ruby has many cases like this, and thus a grammar built with standard techniques like refactoring the grammar or using operator precedence parsing would be quite challenging to maintain.

To meet these challenges and keep our grammar as clean as possible, we built our parser using the dypgen generalized LR (GLR) parser generator, which supports ambiguous grammars [8]. Our parser uses general BNF-style productions to describe the Ruby grammar, and without further change would produce several parse trees for conflicting [...]
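
The local-variable behavior discussed in Section 2 can be reproduced with a short Ruby sketch. This is not the paper's own listing: the method names with_dead_assignment and without_assignment are hypothetical, and the only assumption carried over from the text is a method x that returns 4 (which is what makes x() + 2 evaluate to 6).

```ruby
# Illustrative sketch only; helper names are hypothetical, not from the paper.

# A method named x, assumed to return 4 so that x() + 2 evaluates to 6.
def x() 4 end

def with_dead_assignment
  # This assignment never runs, but once the parser has seen it,
  # x is treated as a local variable for the rest of the method.
  if false then x = 1 end
  x + 2                      # x is the (implicitly nil) local, so this raises
rescue NoMethodError => e
  "error: #{e.class}"
end

def without_assignment
  x + 2                      # no local x in scope: parsed as x() + 2
end

puts with_dead_assignment    # prints "error: NoMethodError"
puts without_assignment      # prints 6
```

The two method bodies differ only in the presence of a statement that is never executed, which is why a tool for Ruby needs parsing context, and not just the expression x + 2, to decide what x refers to.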
