Using Prolog for Transforming XML Documents
René Haberland
Technical University of Dresden, Free State of Saxony, Germany
Saint Petersburg State University, Saint Petersburg, Russia
email: [email protected]
(translation into English from 2007)

arXiv:1912.10817v2 [cs.PL] 23 Apr 2021

This work is licensed under the Creative Commons Attribution License (CC BY-NC-SA 4.0).

Abstract: Proponents of the programming language Prolog share the opinion that Prolog is more appropriate for transforming XML documents than other well-established techniques and languages like XSLT. This work proposes a tuProlog-styled interpreter for parsing XML documents into Prolog-internal lists and, vice versa, for serialising lists into XML documents.

Based on this implementation, a comparison between XSLT and Prolog follows. First, criteria are researched, such as the considered language features of XSLT, usability and expressibility. These criteria are validated. Second, it is assessed when Prolog distinguishes between input and output parameters towards reversible transformation.

I. INTRODUCTION

A. Preliminaries

A transformation within XML is a mapping from XML onto XML. W.l.o.g. only XML as output is considered in this work. Unlike in programming, not a program but document data is acquired from some sources and output later on. The output is the result of some transformation process. Templates are documents in which some parts are slots, which are filled with data from documents as requested. The document obtained is called the target document. Templates are sometimes called stylesheets.

Examples of template-based transformation languages are Xduce [12], Xact [36] and XSLT [41]. Transformation languages are often markup languages. A markup (tag) has a meaning dedicated to a certain domain. For instance, tags can be categorised into command, directive and output information. Markups encapsulate text sections which, taken together, represent the corresponding document. Markups are recursively defined over text. XML consists of markups.

Due to their focus on documents, transformation languages do not have much in common with programming languages. Despite this circumstance, some concepts are still seamlessly interchangeable:
• typing
• backtracking
• pattern matching
• monads
• unification
• higher-order functions
• non-strict functions
• modules

All of the mentioned and even most of the unmentioned languages have in common that they cannot be integrated at all, or only with severe restrictions, into hosting languages like Java, C++ or Pascal.

In order to resolve this problem, two strategies can be identified as most promising. First, integrate new features: the language gets extended. However, this can only succeed if the language concepts are universal enough w.r.t. lexemes and idioms. Second, choose a federated approach. Depending on the implementation, the hosting language is simulated by introspection. Only the concepts remain untouched.

The reasons against the first approach are a massive rise in complexity and a notoriously incompatible paradigm. Hence, the federated approach is chosen to implement the transformation language, with Prolog being visible to the user.

B. Motivation

Prolog has two key features which make it very powerful: unification and backtracking, neither of which is present in XSLT. Unification allows terms to be composed and described easily. However, the handling may become cumbersome when terms reach a specific size. Backtracking may trace multiple solutions in a tree-structured search space effectively if applied wisely. Prolog is also well known for concise programs solving rather complex tasks. It is expected that unification, backtracking and additional features may improve expressibility. However, Prolog is suspected to be inappropriate for some numeric problems due to its minimalistic language features. The non-distinction between input and output parameters may also indicate flexible expressibility. Since there is almost no previous work on this topic available, new characteristics of Prolog as a transformation language are expected.

C. Related Work and Foundations

XML-processing with SWI-Prolog

Seipel [27] introduced the purely experimental transformation language FNPath as a subset of SWI-Prolog. Since Prolog is good at dealing with symbolic terms, Seipel also considers it for transforming XML documents.

XML documents are represented in FNPath as terms. Queries are composed of navigational operators. For distinguishing monotone from non-monotone operators, three classes were introduced: FNTree, AssignmentTree and SelectionTree. Those classes are sorted in ascending order by abstraction level. FNTree is the most generalised class. SelectionTree is the most specialised class.

The FNPath-expression O*[^a:5,^c:6] denotes that in a subtree of O attribute a is replaced by 5, and then attribute c is replaced by 6.

Since there are no templates foreseen in FNPath, a direct comparison with XSLT is somewhat problematic. However, some questions still arise. For instance, are all introduced operators complete w.r.t. a transformation language? Is there any improvement in usability, and is the chosen representation adequate?

In general, each node in an XML tree is reachable from any other node with FNPath. However, access may still be very hard due to bloated representations, numerous overloadings and overly complex accessor functions. Another remark on FNPath is that both the parse and the serialise operators are bound tightly to the SWI-Prolog framework and are largely incompatible otherwise. All critical operations are written in C and are not part of ISO-Prolog. Platform independence is violated even though Prolog programs are interpreted. These are serious concerns.

Seipel proposes Prolog or another declarative language for transformations due to its high expected abstraction level ([27], p.12). The transformation language should be embedded in a conventional programming language. Because of the potential non-termination of recursive clauses, a DATALOG-based evaluation manager should be used instead.

Scheme-based XSLT-processor

Kiselyov and Krishnamurthi [16] summarise design discrepancies and flaws of XSLT. The most important of these are:
• A few very essential functions require some extraordinarily complex templates.
• XSLT is not appropriate for invertible transformations because “templates are not higher-order” ([16], p.1).
• XSLT is a closed system, with no extensions possible.
• Operators are not complete at all. User-defined operators are hardly available.

Apart from these flaws, limited expressibility and poor readability are also attributed to markup. At this point, the quotation from [20] cited in [16] (p.4) should be mentioned: “The really bad thing is that the designers of XSLT [...] failed to include fundamental support for basic functional programming idioms. Without such support, many trivial tasks become hell.” The third point addresses the same problem as was already mentioned by [27].

SXSLT is a new implementation and an extension of XSLT, which is written in Scheme. In Scheme, introspection allows programs (and therefore also templates) to be invoked at runtime.

SXSLT offers the following features for free as a result of Scheme as the embedded language:
• higher-order functions, which are handled as so-called S-expressions. This allows an associated function to be called by name at runtime.
• local templates
• flexible iteration ordering
• access to a resulting tree

Although the authors criticise both the syntactic discrepancy and the operations ([16], p.3) between XPath and XSLT in matching expressions, they still introduce the event-based navigation language SSAX.

Fig. 1 shows the SXSLT function that traverses an XML tree. The result of the function pre-post-order is an event tree generated from bindings by successive application of templates. When the function is called with an element node and a traversal function as arguments, the latter is tried. If traversal fails, then the current node is traversed in pre-order. Child nodes are handled similarly.

    (define (pre-post-order tree bindings)
      (cond
        ;; node set: traverse every member with the same bindings
        ((nodeset? tree)
         (map (lambda (a)
                (pre-post-order a bindings))
              tree))
        ;; atomic (text) node: look up the *text* or *default* handler
        ((not (pair? tree))
         (let ((trigger '*text*))
           (cond
             ((or (assq trigger bindings)
                  (assq '*default* bindings))
              => (lambda (b)
                   ((if (procedure? (cdr b))
                        (cdr b) (cddr b))
                    trigger tree)))
             (else
              (error "Unknown binding")))))
        ;; element node: handling elided in the source
        (else handle-children-nodes...)))

Fig. 1. Scheme-function pre-post-order (after [16])

In the next step, Kiselyov and Krishnamurthi want to integrate additional features into SXSLT, like context propagation, additional traversal strategies and a type system.

Hypothetical XML-transformation processor in Haskell

Meijer and Shields [19] proposed XMλ for typing transformation languages.

Typing was considered to be dropped too often in favour of a shorter and easier notation. That is why both designed the language XMλ. XMλ is based on Haskell, so it is a statically typed transformation language and provides higher-order functions, type polymorphism, pattern matching, type constructors and monads ([19], p.6). Transformation directives are modelled as tags, which are evaluated in Haskell. So the transformation script consists solely of tags encapsulating element constructors and Haskell expressions internally.

Fig. 2 shows the typing and definition of the example function getPara. The paragraph tag <P> contains the bound variable p, which occurs on the right-hand side of the lambda term. The call getPara <P>Hello World!</P> returns Hello World!.

Higher-order functions in XSLT

getPara::P->String