OOMatch: Pattern Matching as Dispatch in Java

by Adam Richard

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Master of Mathematics in Computer Science

Waterloo, Ontario, Canada, 2007

© Adam Richard 2007

AUTHOR’S DECLARATION FOR ELECTRONIC SUBMISSION OF A THESIS

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

We present a new language feature, specified as an extension to Java. The feature is a form of dispatch, which includes and subsumes multimethods (see for example [CLCM00]), but which is not as powerful as general predicate dispatch [EKC98]. It is, however, intended to be more practical and easier to use than the latter.

The extension, dubbed OOMatch, allows method parameters to be specified as patterns, which are matched against the arguments to the method call. When matches occur, the method applies; if multiple methods apply, the method with the more specific pattern overrides the others. The pattern matching is very similar to that found in the “case” constructs of many functional languages (ML [MTHM97], for example), with an important difference: functional languages normally allow pattern matching over variant types (and other primitives such as tuples), while OOMatch allows pattern matching on Java objects. Indeed, the wider goal here is the study of the combination of functional and object-oriented programming paradigms.

Maintaining encapsulation while allowing pattern matching is of special importance. Class designers should have the control needed to prevent implementation details (such as private variables) from being exposed to clients of the class.
We here present both an informal “tutorial” description of OOMatch, as well as a formal specification of the language, and a proof that the conditions specified guarantee run-time safety.

Acknowledgments

Special thanks go to my supervisor, Ondřej Lhoták, for all his help in the production of this thesis, as well as to Gordon Cormack, Peter Buhr, Brad Lushman, Nomair Naeem, and Ghulam Lashari for their help and suggestions.

Contents

1 Introduction and Motivation
2 Background
    2.1 Pattern Matching
    2.2 Dispatch
        2.2.1 Multimethods
        2.2.2 Predicate Dispatch
3 Using OOMatch - Informal Description
    3.1 Pattern Matching
    3.2 Deconstructors
    3.3 Order of Deconstructors
    3.4 Null
    3.5 Undecidable Errors
        3.5.1 Interfaces
        3.5.2 Different deconstructors
        3.5.3 Non-deterministic deconstructors
    3.6 Where clause
    3.7 Matching Against the Values of Variables
    3.8 Cross-class Ambiguities and Final Methods
    3.9 Abstract Methods
    3.10 Overriding among Primitive Types
    3.11 Manual Overriding
    3.12 Let statement
    3.13 Static deconstructors
    3.14 Backwards Compatibility
    3.15 Multi-threaded Applications
4 Formal Specification
    4.1 Grammar
    4.2 Notation and Definitions
    4.3 Deconstructor disambiguation
    4.4 Method dispatch
        4.4.1 Applicable methods
        4.4.2 Preferred Methods
        4.4.3 Overall Method Dispatch
    4.5 Compile-time checks
        4.5.1 Parameter Intersection
        4.5.2 Conditions to be checked statically
    4.6 Absence of runtime ambiguities
        4.6.1 Undecidable equivalence
        4.6.2 Common descendent
        4.6.3 Deterministic deconstructors
        4.6.4 Claims of safety
5 Implementation
    5.1 Overview: how Polyglot works
    5.2 Overview: OOMatch compiler
    5.3 Preprocessing Visitor Passes
    5.4 Typechecking and Building Override DAGs
    5.5 Transformation
        5.5.1 Rename methods
        5.5.2 Add dispatchers
        5.5.3 Transform deconstructors
        5.5.4 Passing null and primitive values
        5.5.5 Name Mangling
        5.5.6 Example Transformation
6 Limitations
    6.1 Patterns in Constructors
    6.2 Abstract Deconstructors
    6.3 Error Messages
    6.4 Extending Java Classes
7 Use Cases
    7.1 Polyglot
    7.2 Soot
8 Other Related Work
    8.1 Scala
    8.2 OCaml
    8.3 TOM
    8.4 Views
    8.5 JMatch
    8.6 JPred
    8.7 Other Approaches to Pattern Matching as Dispatch
9 Conclusion and Future Work
    9.1 Conclusion
    9.2 Design Goals
    9.3 Future Work
        9.3.1 Other data types
        9.3.2 Disjunctive Patterns and Regular Expressions
        9.3.3 Partial Deconstructing
A Example of typechecking algorithm
B Examples of uses of OOMatch
C Example dispatcher output

List of Figures

5.1 OOMatch compiler overview
5.2 OOMatch Visitor Passes
5.3 Code for example transformation
A.1 Typecheck algorithm example code
A.2 Typecheck algorithm example, step 1
A.3 Typecheck algorithm example, step 2
A.4 Typecheck algorithm example, step 3
A.5 Typecheck algorithm example, step 4
A.6 Typecheck algorithm example, step 5

List of Tables

4.1 Partial function ⊓ (parameter intersection)

Chapter 1 Introduction and Motivation

Object-oriented programming languages have become a widespread means of writing large programs (of millions of lines of code), and with good reason. Building large systems naturally lends itself to breaking down a task into components, and programmers must be provided with a simple interface to conceptualize and work with these components. A class-based type system is intended for such modularization.

Functional programming languages, on the other hand, have gained a group of devoted followers not only for their beauty, but also because programs written in a functional style tend to have fewer bugs.
Because of their declarative nature and lack of side effects, it is generally easier to avoid bugs in a functional language (once the software passes the compiler) than in an object-oriented language. As security receives more attention, as computers are used for more and riskier applications, and as programs get larger, it is becoming increasingly important to prevent bugs from the start. The strong static checking of languages like Standard ML [MTHM97] and Haskell [PJ03] provides this safety quite well. But most functional languages (with notable exceptions such as OCaml [CMP07] and Scala [OAC+06]) do not have built-in support for class-based object-oriented programming as found in languages like Java and C++.

Each methodology (functional and object-oriented programming) is useful for a wide class of problems, and embodies the programming style of a large group of people. It is very difficult to say whether one or the other methodology is ideal for all situations, or whether some mixture is ideal. Therefore, we believe that for the time being, it is best to provide a language that supports as many styles as possible. The features embodying each style should further be composable with each other, so that different pieces of code written in different styles can be pieced together to form a program.

One might counter that we could simply write separate components in separate languages and combine the components with a standard interface. A functional or object-oriented language could be selected for each component depending on which is most applicable to the programmer or situation. The problem with this approach is that we might still want to use elements of both styles at once. We might want a class, with all the power of an object-oriented class, that contains a function that can be used as a value; or we might want a function that contains a nested function as well as a class definition within it.
In general, the combination of several styles may yield new and powerful idioms or code patterns that are not present in their mere sum, and by studying the combination of functional and object-oriented styles, we hope to discover what some of these idioms might be.

This thesis presents a small step towards the goal of unifying object-oriented and functional programming. In particular, it considers how pattern matching, a common and useful feature of many functional languages, might be interwoven into the object-oriented tapestry. Pattern matching in ML, for example, allows one to decompose algebraic types or tuples into their components, either in a case statement or in a set of functions. Though this pattern matching is useful in a functional context, simple algebraic types and tuples are not used much in object-oriented programming; classes are used far more often. So we here present a means of deconstructing objects into their components and specifying patterns that match objects with certain properties.

Further, and most importantly, our pattern matching is used in determining method dispatch. The patterns are specified as parameters to methods (as in ML), and the compiler decides on a natural order in which to check for a matching pattern, i.e. it determines which methods override which. Methods with more specific parameters override methods with more general parameters. Since parameters of subclass type are considered more specific than those of superclass type, this feature subsumes polymorphic dispatch and multimethods. It is important to this approach that information hiding of objects (a fundamental property of object-oriented systems) be preserved; we must not allow clients to access the data of the object except in ways the class writer explicitly allows.
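To make the dispatch rule above concrete, the following plain-Java sketch hand-codes the decision that the text says the compiler makes: try the most specific pattern first, then fall back to more general ones. It is not OOMatch syntax and is not taken from the thesis. The expression classes Num, Var and Add, the methods simplify and show, and the class DispatchSketch are all hypothetical names chosen for illustration. The fields are private and reached only through accessors, which stands in for the requirement that the class writer controls what a pattern may observe.

```java
// A minimal sketch, assuming a hypothetical expression hierarchy.
// Plain Java only; it emulates by hand the "most specific pattern wins" rule.
final class DispatchSketch {

    interface Expr {}

    static final class Num implements Expr {
        private final int value;                 // hidden state
        Num(int value) { this.value = value; }
        int value() { return value; }            // exposed deliberately by the class author
    }

    static final class Var implements Expr {
        private final String name;
        Var(String name) { this.name = name; }
        String name() { return name; }
    }

    static final class Add implements Expr {
        private final Expr left;
        private final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        Expr left() { return left; }
        Expr right() { return right; }
    }

    // Hand-written dispatcher. The cases are ordered from most to least specific.
    static Expr simplify(Expr e) {
        if (e instanceof Add) {
            Add a = (Add) e;
            // Case "Add whose left operand is the constant 0":
            // more specific than the general Add case, so it is tried first.
            if (a.left() instanceof Num && ((Num) a.left()).value() == 0) {
                return simplify(a.right());
            }
            // Case "any Add": simplify both operands.
            return new Add(simplify(a.left()), simplify(a.right()));
        }
        // Case "any Expr": nothing to simplify.
        return e;
    }

    // Renders an expression as text; handles only the three classes above.
    static String show(Expr e) {
        if (e instanceof Num) return Integer.toString(((Num) e).value());
        if (e instanceof Var) return ((Var) e).name();
        Add a = (Add) e;
        return "(" + show(a.left()) + " + " + show(a.right()) + ")";
    }

    public static void main(String[] args) {
        Expr e = new Add(new Num(0), new Add(new Var("x"), new Num(1)));
        // prints "(0 + (x + 1)) simplifies to (x + 1)"
        System.out.println(show(e) + " simplifies to " + show(simplify(e)));
    }
}
```

In this hand-written form the programmer must keep the cases in the right order and in one place. The approach described in the text would instead let each case be written as a separate method whose parameters are patterns, with the override order determined by the compiler.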
