Transformation of Structure-Shy Programs Applied to XPath Queries and Strategic Functions
Alcino Cunha, Joost Visser
DI-CCTC, Universidade do Minho, Portugal
{alcino,joost.visser}@di.uminho.pt

Abstract

Various programming languages allow the construction of structure-shy programs. Such programs are defined generically for many different datatypes and only specify specific behavior for a few relevant subtypes. Typical examples are XML query languages that allow selection of subdocuments without exhaustively specifying intermediate element tags. Other examples are languages and libraries for polytypic or strategic functional programming and for adaptive object-oriented programming.

In this paper, we present an algebraic approach to transformation of declarative structure-shy programs, in particular for strategic functions and XML queries. We formulate a rich set of algebraic laws, not just for transformation of structure-shy programs, but also for their conversion into structure-sensitive programs and vice versa. We show how subsets of these laws can be used to construct effective rewrite systems for specialization, generalization, and optimization of structure-shy programs. We present a type-safe encoding of these rewrite systems in Haskell which itself uses strategic functional programming techniques.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; F.3 [Logics and Meanings of Programs]: Semantics of Programming Languages

General Terms Languages, Performance, Theory

Keywords Algebraic program transformation, Strategic functional programming, XML query languages, Point-free program calculation, Type specialization, Type generalization

1. Introduction

Structure-shy programming techniques have been introduced for dealing with highly structured data such as terms, semi-structured documents, and object graphs in a largely generic manner. A structure-shy program specifies type-specific behaviour for a selected set of data constructors only. For the remaining structure, generic behaviour is provided. Prominent flavours of structure-shy programming are adaptive programming [20], strategic programming [24, 17, 18, 26], polytypic or type-indexed programming, and several XML programming languages and APIs [29].

Structure-shy programming offers various clear benefits [15, 27]. A structure-shy program can be significantly more concise, focusing on the essence of the algorithm rather than oozing with boilerplate code. This reduces development time and improves understandability. Also, structure-shy programs are only loosely bound to the data structures on which they operate. As a result, they do not necessarily need adaptation when those data structures evolve, and they may be reusable for different data structures.

The flip side of these benefits is that structure-shy programs have potentially worse space and time behaviour than equivalent structure-sensitive programs. A source of performance loss, generally by a factor linear in the input size, are dynamic checks employed in the execution of structure-shy programs to determine at each data node whether to apply specific or generic behaviour. Another source of inefficiency is that algorithmic optimizations, such as cutting off traversal into certain substructures, cannot be expressed without to some extent sacrificing structure-shyness and its benefits. In fact, manual optimization of structure-shy programs typically involves such sacrifice.

For adaptive programming and polytypic programming, substantial effort has been invested in the development of optimizing compilers. Compilation schemes for Generic Haskell and Generic Clean specialize and optimize polytypic input programs for specific types [28, 1, 2]. Adaptive programs are compiled to plain object-oriented programs with optimized navigation behaviour [19].

In this paper, we present an approach to transformation of structure-shy programs that encompasses typed strategic programming and XML programming. Our approach builds on the pioneering work of Backus [3] and the ensuing tradition of algebraic transformation of point-free functional programs [9, 6]. Algebraic program transformation laws can be formulated for structure-shy strategic programs and XML processors, just as they have been formulated for structure-sensitive point-free functional programs. Further laws can be formulated that mediate between structure-shy and structure-sensitive programs by type-specialization and generalization. Such laws can be leveraged, not only for optimization of structure-shy programs, but also conversely for increasing a program's degree of structure-shyness, which may have its use in program understanding, refactoring, or re-engineering.

We show that the various algebraic laws involving structure-shy programs can be harnessed in type-safe, type-directed rewriting systems. We employ the functional language Haskell, extended with generalized algebraic datatypes, to implement such systems.

In Section 2, we briefly motivate our work with some basic examples. In Section 3, we recapitulate algebraic laws for structure-sensitive point-free functional programs, and we complement them with laws for structure-shy programs and for mediation between structure-sensitive and structure-shy programs. In Section 4, we explain the encoding in Haskell of rewrite systems that harness the algebraic laws. In Section 5, we discuss various application scenarios of our rewrite systems. Section 6 discusses related work, and Section 7 concludes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PEPM 2007, January 15-16, 2007, Nice, France.
Copyright © 2007 ACM 978-1-59593-620-2/07/0001 $5.00

[Schema tree diagram. Root element: imdb. Other elements shown: movie, actor, year, title, director, review, box office, name, played, country, value, role, award. An asterisk (*) in the diagram marks a repeated element.]

Figure 1. A movie database schema, inspired by IMDb (http://www.imdb.com/).

The following syntax describes essential parts of XPath:

    location := '/' ?(step ('/' step)*)
    step     := axis '::' test pred*
    axis     := 'child' | 'descendant' | 'self' | 'descendant-or-self'
    test     := name | '*' | 'text()' | 'node()'
    pred     := '[' expr ']'
    name     := any document tag

The full syntax is available in the XPath language reference [29]. Abbreviated syntax is available and heavily used where for instance // expands to /descendant-or-self::node()/ and an element name without preceding axis modifier expands to /child::name.

Figure 2. Summary of XPath.

Strategic programming was first supported in non-typed setting in the Stratego language [24]. A strongly-typed combinator suite was introduced as a Haskell library by the Strafunski system [17, 18]. This suite was generalized into the so-called 'scrap-your-boilerplate' approach to generic functional programming [14]. We focus on a limited set of combinators that convey the essence of strategic programming [16].

Combinators for type-preserving generic functions (transformations):

    nop   :: T                 -- identity
    (.)   :: T → T → T         -- sequence
    mapT  :: T → T             -- map over children
    mkT_A :: (A → A) → T       -- creation
    apT_A :: T → (A → A)       -- application

For readability we put single-letter type constants in sans serif font.

Combinators for type-unifying generic functions (queries):

    ∅     :: Q R               -- empty result
    (∪)   :: Q R → Q R → Q R   -- union of results
    mapQ  :: Q R → Q R         -- fold over children
    mkQ_A :: (A → R) → Q R     -- creation
    apQ_A :: Q R → (A → R)     -- application

The result type R is assumed to be a monoid, with a zero element and associative plus operator. Typical derived combinators:

    everywhere :: T → T
    everywhere f = f . mapT (everywhere f)
    everything :: Q R → Q R
    everything f = f ∪ mapQ (everything f)

See the running text for examples of usage.

Figure 3. Strategic functional programming.

2. Motivating examples

Structure-shy programming allows concise formulation of queries and transformations on rich data formats. Consider as an example the XML schema shown in Figure 1 for documents that hold information about movies. Let's consider some queries and transformations for this schema of varying degrees of structure-shyness.

Retrieve all movie directors from a document
In the XPath query language, this query can be formulated as follows:

    //movie/director

In particular, this query asks to retrieve director elements that are direct children of a movie element, where the movie element can appear at any depth in the document structure. See Figure 2 for a summary of XPath constructs. The query is structure-shy in the sense that it does not explicitly specify the structural elements that occur between the document root and the movie element.

Truncate reviews to 100 characters
Using strategic functional programming, this transformation can be expressed as follows:

    everywhere (mkT_Review take100)
      where take100 (Review r) = Review (take 100 r)

The everywhere combinator applies its generic argument function in topdown fashion to every node in a term. The mkT_A combinator applies its type-specific argument function to a given node if it is of type A, otherwise it returns the node untouched. A summary of strategic functional programming is provided in Figure 3.
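The two motivating examples can be tried out in plain Haskell using only the Data.Data class from base. What follows is a minimal sketch under stated assumptions, not the encoding developed in this paper: the datatypes Imdb, Movie, Director, and Review, the value sample, and the names truncateReviews and allDirectors are hypothetical, and mkT, mkQ, everywhere, and everything are defined locally in the style of Figure 3 instead of being imported from Strafunski. The query allDirectors is only a strategic analogue of //movie/director, not an XPath evaluator.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}

import Data.Data (Data, Typeable, cast, gmapQ, gmapT)

-- Hypothetical Haskell rendering of a small part of the movie schema.
newtype Director = Director String deriving (Eq, Show, Data, Typeable)
newtype Review   = Review String   deriving (Eq, Show, Data, Typeable)
data Movie = Movie { title :: String, directors :: [Director], reviews :: [Review] }
  deriving (Eq, Show, Data, Typeable)
newtype Imdb = Imdb [Movie] deriving (Eq, Show, Data, Typeable)

-- Lift a type-specific transformation to all types: apply it when the
-- node has the expected type, otherwise leave the node untouched.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Lift a type-specific query to all types: nodes of any other type
-- contribute the empty result of the monoid.
mkQ :: (Monoid r, Typeable a, Typeable b) => (b -> r) -> a -> r
mkQ f x = maybe mempty f (cast x)

-- Apply a generic transformation to every node. This version rewrites the
-- children first and then the node itself; for the review example below
-- the order makes no observable difference.
everywhere :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere f x = f (gmapT (everywhere f) x)

-- Collect the results of a generic query over every node, combining the
-- result for the node with the results for its children.
everything :: (Monoid r, Data a) => (forall b. Data b => b -> r) -> a -> r
everything f x = f x `mappend` mconcat (gmapQ (everything f) x)

-- Truncate reviews to 100 characters, wherever they occur.
truncateReviews :: Data a => a -> a
truncateReviews = everywhere (mkT take100)
  where take100 (Review r) = Review (take 100 r)

-- Collect the directors of every movie, at any depth.
allDirectors :: Data a => a -> [Director]
allDirectors = everything (mkQ directors)

-- Hypothetical sample document.
sample :: Imdb
sample = Imdb
  [ Movie "A" [Director "Ann", Director "Bo"] [Review (replicate 150 'x')]
  , Movie "B" [Director "Cy"] [Review "short"]
  ]
```

Loading this in GHCi and evaluating allDirectors sample or truncateReviews sample exercises both definitions. Neither function mentions Imdb or the list structure between the root and the nodes of interest, which is the sense in which they are structure-shy; the price is a run-time type test (cast) at every node, including every character of every string.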
This structure-shy transformation suffers from performance The structure-shyness of the