Functional Programming with Structured Graphs

Bruno C. d. S. Oliveira, National University of Singapore, [email protected]
William R. Cook, University of Texas, Austin, [email protected]

Abstract

This paper presents a new functional programming model for graph structures called structured graphs. Structured graphs extend conventional algebraic datatypes with explicit definition and manipulation of cycles and/or sharing, and offer a practical and convenient way to program graphs in functional programming languages like Haskell. The representation of sharing and cycles (edges) employs recursive binders and uses an encoding inspired by parametric higher-order abstract syntax. Unlike traditional approaches based on mutable references or node/edge lists, well-formedness of the graph structure is ensured statically and reasoning can be done with standard functional programming techniques. Since the binding structure is generic, we can define many useful generic combinators for manipulating structured graphs. We give applications and show how to reason about structured graphs.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications—Functional Languages; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs

General Terms: Languages

Keywords: Graphs, parametric HOAS, Haskell.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICFP'12, September 9–15, 2012, Copenhagen, Denmark.
Copyright © 2012 ACM 978-1-4503-1054-3/12/09. $10.00

1. Introduction

Functional programming languages, including Haskell [31] and ML [29], excel at manipulating tree structures. In those languages algebraic datatypes describe the structure of values, and pattern matching is used to define functions on such tree-structured values. These mechanisms provide a high-level declarative programming model, which avoids explicit manipulation of pointers or references. Additionally, algebraic datatypes facilitate reasoning about functions using standard proof methods, including structural induction.

However, there are many kinds of data that are more naturally represented as graph structures rather than trees. Some examples include: typical compiler construction concerns including control/data flow graphs or grammars [2]; entity-relational data models [8]; finite state machines; or transition systems. Sadly, functional programming languages do not have an equally good answer when it comes to manipulating graphs as they do for trees.

In impure functional languages, including ML or OCaml [24], a combination of algebraic datatypes and mutable references can be used to model sharing and cycles. However, this requires explicit manipulation of mutable references, which precludes many of the benefits of functional programming and algebraic datatypes. For example, observing sharing via pointer/reference comparison breaks referential transparency. Even if it is possible to encapsulate the use of mutable references under a purely functional interface, reasoning about implementations remains challenging [19, 35].

In call-by-need functional languages, like Haskell, it is possible to construct true cyclic structures. For example:

    ones = 1 : ones

creates a cyclic list where the head contains the element 1 and the tail is a reference to itself. However, sharing is not observable and, from a purely semantic perspective, ones is no different from an infinite list of 1's. A drawback of this approach is that when an operation is applied to this list sharing is lost, even if it would be possible to preserve sharing. For example,

    twos = map (λx → x + 1) ones

creates an infinite list of 2's instead of a cyclic list with a single 2.

To deal with the need for observable sharing some researchers have proposed approaches that use recursive binders to model cycles and sharing [14, 15, 20]. For example, ones can be expressed using a recursive binder (µ) as follows:

    ones = µ x. (1 : x)

The idea is to be able to observe and manipulate the binders (µ) and variables (x), making sharing effectively explicit.

However, several questions need to be answered for this model to become effective in practice:

1. What programming language mechanisms are needed to support convenient representation and manipulation of such binders and variables? Does the approach guarantee well-formedness of cyclic structures (no unbound variables or other types of junk)?

2. Is the model expressive enough? Can it deal with general graph edges, including the back edges and cross edges which arise in non-linear graph structures?

3. Can the model deal with operations that require special treatment of fixpoint computations? For example, is it possible to exploit monotonicity to ensure termination on all inputs for an operation that checks the nullability [7, 28] of a grammar or regular expression?

As far as we know no approach deals with #3. Furthermore all approaches provide only partial answers to #1 and #2 (a detailed discussion is given in Section 7), and fall short in providing a practical programming model for cyclic structures.

This paper presents a new functional programming model for graph structures, called structured graphs, that builds on the idea of using recursive binders to model sharing and cycles and provides an answer for all three questions. Structured graphs can be viewed as an extension of algebraic datatypes that allows explicit definition and manipulation of cycles or sharing by using recursive binders and variables to explicitly represent possible sharing points. To provide a convenient and expressive programming interface, structured graphs use a binding representation based on parametric higher-order abstract syntax (PHOAS) [9]. This representation not only ensures well-formedness of the binding structure, but it also allows using standard proof methods, including structural induction, in proofs for a large class of programs. To deal with cross edges we use a recursive multi-binder inspired by letrec expressions in functional programming. Furthermore, the expressiveness and flexibility of our PHOAS-based representation allow us to define operations that require special treatment of fixpoint computations.

Since the binding structure is generic, it is possible to define many useful generic combinators for manipulating structured graphs. By employing some lightweight datatype-generic programming [17] techniques, we also propose a datatype-generic formulation of structured graphs. This formulation enables the definition of useful combinators like generic folds and transformations. Using such combinators it is often possible to write programs for processing graphs that are no more difficult to write than programs on conventional algebraic datatypes.

The programming model also allows for transformations requiring complex manipulation of the binding structure, which are less easily captured in combinators. Those operations can always be defined by direct pattern matching on the binding structure (variables and binders).

To summarize, our contributions are:

• Structured graphs: a new programming model extending the classic notion of algebraic datatypes with cycles and sharing. This model supports the same benefits as algebraic datatypes and facilitates reasoning over cyclic structures. The binding infrastructure is conveniently defined and manipulated using a PHOAS-based representation.

• Generic combinators and infrastructure: Additional convenience is provided through the use of generic combinators for folds and transformations. Such generic combinators can also encapsulate the use of special fixpoints for certain operations.

• Recursive binders using PHOAS: We also show how to define recursive binders with PHOAS. The recursive multi-binder presented in Section 3.2 is particularly relevant, since it enables the definition of cross edges.

The presentation of our work uses Haskell. Occasionally we use some common extensions implemented in the GHC compiler. The code for this paper is available online at http://ropas.snu.ac.

    data PLambda a =
        Var a
      | Int Int
      | Bool Bool
      | If (PLambda a) (PLambda a) (PLambda a)
      | Add (PLambda a) (PLambda a)
      | Mult (PLambda a) (PLambda a)
      | Eq (PLambda a) (PLambda a)
      | Lam (a → PLambda a)
      | App (PLambda a) (PLambda a)

    newtype Lambda = ↓ {↑ :: ∀a. PLambda a}

Figure 1. PHOAS-encoded lambda calculus with integers, booleans and some primitives.

... with classic HOAS (and other higher-order approaches) many operations are non-trivial to define in languages like Haskell, whereas with PHOAS various operations are generally easier to define. This unique combination of features makes PHOAS a particularly attractive foundation for our work.

To illustrate the advantages of PHOAS in more detail, we use the lambda calculus (with standard extensions) presented in Figure 1. Lambda terms are encoded by the newtype Lambda, which is defined in terms of the datatype PLambda a.

Well-scopedness. The type argument a in PLambda a is supposed to be abstract: it should not be instantiated to a concrete type when constructing lambda terms. To enforce this, a universal quantifier (∀a. PLambda a) is used in the definition of Lambda. Note that, in Haskell, the following type synonym:

    type Lambda = ∀a. PLambda a

can be problematic as an encoding of Lambda. This is because in Haskell all universal quantifications are pushed to the left-most position after expansion of the type synonym. This sometimes makes types less polymorphic than expected. The use of a newtype circumvents this problem, at the cost of introducing explicit embedding and projection functions ↓ (hide) and ↑ (reveal).

Using a as an abstract type ensures that only variables bound by a constructor Lam can be used in the constructor Var. For example, the identity function can be defined as:

    idLambda = ↓ (Lam (λx → Var x))

However the following terms are not valid, and are rejected by the type system:

    invalid1 = ↓ (Var 1)
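As a rough illustration, the encoding of Figure 1 can be transcribed into plain GHC Haskell along the following lines. This is a sketch under stated assumptions rather than the paper's own code: the ASCII names Hide and reveal stand in for ↓ and ↑, and the Value type, eval, evalLambda, asInt, countVars and the example terms applied and branch are hypothetical helpers introduced here only to show how a function over PHOAS terms picks a concrete instantiation of the variable type a.

```haskell
{-# LANGUAGE RankNTypes #-}

-- PHOAS terms as in Figure 1; the parameter a is the type of variables.
data PLambda a
  = Var a
  | Int Int
  | Bool Bool
  | If (PLambda a) (PLambda a) (PLambda a)
  | Add (PLambda a) (PLambda a)
  | Mult (PLambda a) (PLambda a)
  | Eq (PLambda a) (PLambda a)
  | Lam (a -> PLambda a)
  | App (PLambda a) (PLambda a)

-- Assumption: Hide and reveal are ASCII stand-ins for the paper's ↓ and ↑.
newtype Lambda = Hide { reveal :: forall a. PLambda a }

-- Hypothetical value domain for a small evaluator (not from the paper).
data Value = VInt Int | VBool Bool | VFun (Value -> Maybe Value)

-- The evaluator instantiates the variable type to Value, so a bound
-- variable simply carries the value it was bound to.
eval :: PLambda Value -> Maybe Value
eval (Var v)    = Just v
eval (Int n)    = Just (VInt n)
eval (Bool b)   = Just (VBool b)
eval (If c t e) = case eval c of
  Just (VBool b) -> if b then eval t else eval e
  _              -> Nothing
eval (Add x y)  = intOp (+) x y
eval (Mult x y) = intOp (*) x y
eval (Eq x y)   = case (eval x, eval y) of
  (Just (VInt m), Just (VInt n)) -> Just (VBool (m == n))
  _                              -> Nothing
eval (Lam f)    = Just (VFun (eval . f))
eval (App f x)  = case (eval f, eval x) of
  (Just (VFun g), Just v) -> g v
  _                       -> Nothing

-- Shared helper for the two integer primitives.
intOp :: (Int -> Int -> Int) -> PLambda Value -> PLambda Value -> Maybe Value
intOp op x y = case (eval x, eval y) of
  (Just (VInt m), Just (VInt n)) -> Just (VInt (op m n))
  _                              -> Nothing

evalLambda :: Lambda -> Maybe Value
evalLambda t = eval (reveal t)

-- Project an integer result, if there is one.
asInt :: Maybe Value -> Maybe Int
asInt (Just (VInt n)) = Just n
asInt _               = Nothing

-- A different instantiation (a = ()) counts variable occurrences.
countVars :: Lambda -> Int
countVars t = go (reveal t)
  where
    go :: PLambda () -> Int
    go (Var ())   = 1
    go (Int _)    = 0
    go (Bool _)   = 0
    go (If c x y) = go c + go x + go y
    go (Add x y)  = go x + go y
    go (Mult x y) = go x + go y
    go (Eq x y)   = go x + go y
    go (Lam f)    = go (f ())
    go (App f x)  = go f + go x

-- The identity function from the text.
idLambda :: Lambda
idLambda = Hide (Lam (\x -> Var x))

-- Hypothetical example terms built only from the Figure 1 constructors.
applied :: Lambda
applied = Hide (App (reveal idLambda) (Int 42))

branch :: Lambda
branch = Hide (If (Eq (Int 2) (Add (Int 1) (Int 1))) (Mult (Int 3) (Int 4)) (Int 0))

-- Not accepted by the type checker: Var 1 needs a Num instance for a,
-- so it cannot have the type forall a. PLambda a.
-- invalid1 = Hide (Var 1)
```

Under these assumptions, asInt (evalLambda applied) yields Just 42 and countVars idLambda yields 1. The constructor is applied with parentheses rather than with ($), since some GHC versions reject passing a polymorphic argument through ($).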
