Ometa: an Object-Oriented Language for Pattern Matching

OMeta: an Object-Oriented Language for Pattern Matching Alessandro, Warth, Ian Piumarta VPRI Technical Report TR-2007-003 Viewpoints Research Institute, 1209 Grand Central Avenue, Glendale, CA 91201 t: (818) 332-3001 f: (818) 244-9761 OMeta: an Object-Oriented Language for Pattern Matching ∗ Alessandro Warth Ian Piumarta Computer Science Department Viewpoints Research Institute University of California, Los Angeles [email protected] and Viewpoints Research Institute [email protected] Abstract against a grammar—which itself is a collection of produc- This paper introduces OMeta, a new object-oriented lan- tions, or patterns—to produce parse trees. Several other guage for pattern matching. OMeta is based on a variant of tasks, such as constant folding and (na¨ıve) code generation, Parsing Expression Grammars (PEGs) [5]—a recognition- can be implemented by pattern matching on parse trees. based foundation for describing syntax—which we have Despite the fact that these are all instances of the same extended to handle arbitrary kinds of data. We show that problem, most compiler writers use a different tool or tech- OMeta’s general-purpose pattern matching provides a nat- nique (e.g., lex, yacc, and the visitor design pattern [6]) to ural and convenient way for programmers to implement implement each compilation phase. As a result, the skill of tokenizers, parsers, visitors, and tree transformers, all of programming language implementation has a steep learning which can be extended in interesting ways using familiar curve (because one must learn how to use a number of dif- object-oriented mechanisms. This makes OMeta particularly ferent tools) and is not widely understood. well-suited as a medium for experimenting with new designs Several popular programming languages—ML, for in- for programming languages and extensions to existing lan- stance—include support for pattern matching. Unfortu- guages. nately, while ML-style pattern matching is a great tool for processing structured data, it is not expressive enough on its Categories and Subject Descriptors D.1.5 [Programming own to support more complex pattern matching tasks such Techniques]: Object-Oriented Programming; D.3.3 [Pro- as lexical analysis and parsing. gramming Languages]: Language Constructs and Features Perhaps by providing programming language support for a more general form of pattern matching, many useful tech- General Terms Design, Languages niques such as parsing—a skill more or less exclusive to Keywords pattern matching, parsing, metacircular imple- “programming languages people”—might become part of mentation the skill-set of a much wider audience of programmers. (Consider how many Unix applications could be improved 1. Introduction if suddenly their implementors had the ability to process more interesting configuration files!) This is not to say that Many problems in computer science, especially in program- general-purpose pattern matching is likely to subsume spe- ming language implementation, involve some form of pat- cialized tools such as parser generators; that would be diffi- tern matching. Lexical analysis, for example, consists of cult to do, especially in terms of performance. But as we will finding patterns in a stream of characters to produce a stream show with various examples, general-purpose pattern match- of tokens. Similarly, a parser matches a stream of tokens ing provides a natural and convenient way to implement tokenizers, parsers, visitors, and tree transformers, which makes ∗ This material is based upon work supported by the National Science Foun- dation under Grant No. 0639876. Any opinions, findings, and conclusions it an unrivaled tool for rapid prototyping. or recommendations expressed in this material are those of the authors and This work builds on Parsing Expression Grammars (PEGs) do not necessarily reflect the views of the National Science Foundation. [5] (a recognition-based foundation for describing syntax) as a basis for general-purpose pattern matching, and makes the following technical contributions: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute 1. a generalization of PEGs that can handle arbitrary kinds to lists, requires prior specific permission and/or a fee. DLS’07, October 22, 2007, Montreal,´ Quebec,´ Canada. of data (i.e., not just streams of characters) and supports Copyright c 2007 ACM 978-1-59593-868-8/07/0010. $5.00 ! parameterized and higher-order productions (Section 2), expression meaning meta E { e1 e2 sequencing dig ::= ’0’ | ... | ’9’; e1 | e2 prioritized choice num ::= <dig>+; e∗ zero or more repetitions fac ::= <fac> ’*’ <num> e+ one or more repetitions (not essential) | <fac> ’/’ <num> e negation | <num>; <∼p> production application exp ::= <exp> ’+’ <fac> ’x’ matches the character x | <exp> ’-’ <fac> | <fac>; Table 1. Inductive definition of the language of parsing ex- } pressions (assumes that e, e , and e are parsing expressions, 1 2 Figure 1. A PEG, written in OMeta, that recognizes simple and that p is a non-terminal). arithmetic expressions. 2. a simple yet powerful extensibility mechanism for PEGs Semantic actions are specified using the => operator and (Section 3), written in OMeta’s host language, which is usually the 3. the design and implementation of OMeta, a programming language in which the OMeta implementation was writ- language with convenient BNF-like syntax that embodies ten. We currently have two implementations: one written 2 (1) and (2), and in COLA [11], and another in Squeak Smalltalk , both of which can be downloaded from http://www.cs.ucla. 4. a series of examples that demonstrate how our general- edu/ awarth/ometa/. purpose pattern matching facilities may be used in the ∼ domain of programming language implementation. Important: All of the examples in this paper were writ- The rest of this paper explores our notion of general-purpose ten for the COLA implementation of OMeta; if you are pattern matching in the context of OMeta. not familiar with COLA, try to think of it as a cross be- tween Scheme and Smalltalk. 2. OMeta: an extended PEG The syntax of the functional parts of the language is very An OMeta program is a Parsing Expression Grammar (PEG) similar to Scheme, e.g., (+ 1 2 (f 5)). that can make use of a number of extensions in order to handle arbitrary kinds of data (PEGs are limited to processing COLA message sends look similar to Smalltalk’s, but are streams of characters). This section begins by introducing written in square brackets, e.g., [jimmy eat: banana]. the features that OMeta and PEGs have in common, and then Note that each []-expression represents a single message describes some of OMeta’s extensions to PEGs. send; the COLA translation of the Smalltalk message Array new: x size is [Array new: [x size]]. 2.1 PEGs, OMeta Style Refer to http://piumarta.com/pepsi/coke.html Parsing Expression Grammars (PEGs) [5] are a recognition- for a more detailed description of COLA syntax. based foundation for describing syntax. A PEG is a collection of productions of the form non-terminal parsing- expression; the language of parsing expressions→is shown in Here is one way our grammar’s exp production might be Table 1. modified in order to create parse trees (the other productions To avoid ambiguities that arise from using a non-de- in our grammar would have to be modified accordingly): terministic choice operator (the kind of choice found in exp ::= <exp>:x ’+’ <fac>:y => ‘(+ ,x ,y) CFGs), PEGs only support prioritized choice. In other | <exp>:x ’-’ <fac>:y => ‘(- ,x ,y) words, choices are always evaluated in order. As a result, | <fac>; there is no such thing as an ambiguous PEG, and their be- havior is easy to understand.1 Note that the results of <exp> and <fac> are bound to iden- Figure 1 shows a PEG, written in OMeta syntax, that tifiers (with the : operator) and referenced by the seman- recognizes simple arithmetic expressions (it does not cre- tic actions to create parse tree nodes. Note also that the last ate parse trees). In order to create parse tree nodes and/or choice in this production, <fac>, does not specify a seman- do anything else upon successful matches, the programmer tic action. In the absence of a semantic action, the value re- must write semantic actions. turned by a production upon a successful match is the result of the last expression evaluated (hence <fac> is equivalent 1 Although the use of left-recursive productions should result in infinite re- to <fac>:x => x). cursion (because of prioritized choice), some PEG implementations, includ- ing OMeta, provide special support for left recursion as a convenience. 2 The Squeak port of OMeta was joint work with Yoshiki Ohshima. OMeta has a single built-in production from which every 2.3 PEG Extensions for Expressiveness other production is derived. The name of this production is OMeta’s productions, unlike those in PEGs, may take any (underscore), and it consumes exactly one element from the number of arguments. This feature can be used to imple- input stream. Even the end production, which detects the ment a lot of functionality that would otherwise have to be end of the input stream, is implemented in terms of : built into the language. As an example, consider regular- end ::= ~<_>; expression-style character classes, which traditional PEG implementations support in order to spare programmers In other words, we are at the end of the input stream if from the tedious and error-prone job of writing productions it is not possible to consume another element from it. As such as noted in [5], the ~ operator—used for negation—can also be used to provide unlimited look-ahead capability.

Load more