
2nd EUROPEAN COMPUTING CONFERENCE (ECC’08) Malta, September 11-13, 2008

Processing XPath/XQuery to be Aware of XLink

LULE AHMEDI, MENTOR ARIFAJ
University of Prishtina, Computer Engineering
Kodra e diellit p.n., 10000 Prishtinë, Kosova
{lule.ahmedi | mentor.arifaj}@fiek.uni-pr.edu

Abstract: - The XML Linking Language (XLink), a W3C recommendation, defines the standard way to use links within XML documents. With these links it is possible to select arbitrarily narrow content, rather than being restricted to selecting the whole document as is the case with HTML hyperlinks. Few implementations of XLink exist, although it has been a recommendation since 2001, and most of them focus on handling XLink links while browsing rather than on querying them. One implementation uses LDAP, which is based on a data model other than XML, and another extends eXist, an XML database system, to support XLinks. We extended another well-known and widely used open-source XPath/XQuery system, Saxon-B. Saxon-B is an in-memory system that can be used from programming languages to provide the XML features it implements. We describe our extension in several steps and illustrate it with algorithms and samples. We complement the discussion by comparing our system with the existing systems (eXist and LDAP), and confirm that our system complements rather than duplicates them.

Keywords: - XML, Querying XML, XLink hyperlinks, XML Applications, LDAP.

1 Introduction

Linking is an important feature of the Web; through it the Web becomes a single, complex system [14]. The XML Linking Language (XLink) is an XML-based language that specifies the standard way to incorporate and use hyperlinks within XML documents. These XML documents may be located on intranets or on the Internet. XLink has been a World Wide Web Consortium (W3C) recommendation since 2001 [1]. Based on our investigation, there are few implementations of this recommendation compared with other W3C recommendations, especially XPath/XQuery [2][3]. The existing XLink implementations may cover only one part of XLink or be dedicated to browsers; from the querying point of view there are very few, such as those using eXist [5] and the Lightweight Directory Access Protocol (LDAP) [4].

Section 2 describes XLinks in more detail and the need to query XML documents that contain XLink links. It also describes the similarities and differences between our implementation and the eXist and LDAP implementations. Sections 3 and 4 describe our steps towards the implementation. Finally, Section 5 discusses related work and concludes that our implementation is not a duplication of the existing implementations but a complementary implementation for querying interlinked XML documents.

2 XLink Hyperlinks and Current Bottlenecks

Hyperlinks, or links, are a very important part of the Web [6], and today they are popular and widely used. XLink is the standard way of incorporating hyperlinks into XML documents. XLink links are attached to XML elements: an XML element is an XLink link if it carries the type attribute of the XLink namespace. XLink defines two types of links: simple links, with attribute xlink:type="simple", and extended links, with attribute xlink:type="extended" [1].

XLink simple links are similar to HTML hyperlinks, but with simple links any element can be a link, not only a predefined "a" element as in HTML. With XLink extended links it is possible to create more complicated linking structures [1]. In an extended link, the XML element containing the attribute xlink:type="extended" serves as a container for nested elements whose xlink:type attribute must be set to one of the values "locator", "resource", "arc" or "title" [1].

There are numerous systems implementing XPath/XQuery technology, whereas for XLink there are only a few implementations [3, 2]. For XPath/XQuery there exists a wide range of tools on different platforms, built with different programming languages. According to [2], most XLink implementations are for browsers and few concern querying of the links. During our investigation we found that there exist

ISSN: 1790-5109 ISBN: 978-960-474-002-4

two implementations for querying XML documents in the presence of XLink links. One is an extension of LDAP [4] to support XPath queries over XLink links. This system is based on a data model other than the XML data model, and it maps XPath queries to that data model. The other is an extension of eXist, a native XML database system [5]. The eXist extension is based on a logical model proposed by Wolfgang May in [7]. From the query (or database) point of view there are no official drafts [7] or recommendations on how to handle interlinked XML documents, and this can be seen as an issue left open by the XLink recommendation.

Given that XPath/XQuery is widely used and that links in XML documents need to be handled, while very few implementations support querying such documents, we have implemented an extension of the open-source [10] system Saxon-B. Saxon-B is an XPath 2.0, XQuery 1.0 and XSLT 2.0 [8] system developed on the Java platform; it is a main-memory (in-memory) XQuery processor [9]. Like eXist, Saxon-B is a native XML processor, whereas LDAP is not native XML. The difference between LDAP and Saxon-B is therefore the more explicit one, as LDAP is based on a data model other than XML. The difference between Saxon-B and eXist lies in how the tools are used: Saxon-B can be used as an extension to applications built with different programming tools (Java, C#, etc.), whereas eXist is an XML database system.

First, we are aware of the possibility of using Saxon-B on top of another application and then executing XPath expressions via that application. … an implementation rather than a new extension of an …

3 Extending XPath/XQuery to Model

XPath models the XML document [15] as a tree of nodes. XPath defines 13 axes [17]; the most used are child, attribute, ancestor and descendant [8].

We base our implementation on the following logical model: the XML model is not a tree but rather a graph, with the major part of the nodes interconnected via child/attribute arcs, plus an additional type of arc that may refer to local or remote portions of the graph.

An XML element declared as an XLink link can contain an href attribute. For simple links, this attribute specifies the location the XML element points to, and in general it has the form given in Expression 1.

Expression 1: href = uri-reference [4]

The uri-reference can be expressed as in Expression 2.

Expression 2: uri-reference = uri#fragment [4]

Here uri represents the Internet resource and can be expressed as in Expression 3, and the fragment is expressed with XPointer (a W3C recommendation [11]) pointers.

Expression 3: uri = scheme://host:port/path [4]

As our aim is to extend an existing XPath/XQuery system, this leads us to the condition that the fragment is an XPath expression, as in the existing LDAP extension [4].

Condition 1: fragment = xpathexpression

Example 1: xpathexpression = //country/cities/city

We took saxon-resources9-0-0-1 [10] and explored it using NetBeans 6.0.1 [12].

Example 2: part of the adopted distributed Mondial database [13], from the countries.xml file.

Albania
…


xlink:href="http://localhost/Mondial-Distributed//countries.xml#//countries/country[@car_code='MK']" borderlength="151"/>

First, to handle element attributes derived from the XLink recommendation, the processor must recognize the xlink namespace, which in XML documents is most often declared at the beginning of the document as in Expression 4.

Expression 4: xlink namespace
xmlns:xlink="http://www.w3.org/1999/xlink"

To make querying possible for files like the one in Example 2, we introduce another condition.

Condition 2: to query XML documents with XLink links, the nodes that contain an href attribute must be written with that attribute, in the form [@xlink:href], in the query expression.

Condition 2 is based on the logical model: to connect remote resources, we use the reference attribute during query formulation to tell the processor that it must use the attribute axis to continue the query. Condition 2 is illustrated by Example 3.

Example 3: our XPath query over the countries.xml file to find all cities of Albania, which are pointed to in the cities.xml file using xlink:href:
//country[@car_code='AL']/cities[@xlink:href]/name

Example 3 can be generalized by Expression 5.

Expression 5:
//xpath1/xpath2/…/xpathk[@xlink:href]/…/xpathn

Condition 2 tells the Saxon-B processor how to find the way to the other document, in our case from countries.xml to cities.xml, via the href value
xlink:href="http://localhost/Mondial-Distributed/Cities/cities-AL.xml#//cities/city"/>

In Saxon-B we satisfy Condition 2 by checking for the presence of @xlink:href in the query. To implement this we added a procedure to the class QueryParser of the net.sf.saxon.query package. In general, it can be described by the algorithm presented in Figure 1.

Algorithm ProcessingXLink(S)
  Let S be a query expression
  Let B be an indicator of the presence of (@xlink:href)
  // Initialize B to False
  B = False
  If ( S contains (@xlink:href) )
    Let S1 = S (part before @xlink:href with
    …
  End If-Else
  If ( B == True )
    Let A be the array of results from processing of S1, which could have 0, 1 or more members with values of the @xlink:href attribute
    /*
    Array A content could be like:
    A[0] = "http://..//file0.xml#xpathexpression[0]"
    A[1] = "http://..//file1.xml#xpathexpression[1]"
    …
    A[n-1] = "http://..//file(n-1).xml#xpathexpression[n-1]"
    (n is the total number of members of A)
    or in general A[i] = uri[i]#xpathexpression[i]
    */
    For (i = 0; i < n; i++)
      A[i] = A[i] & S2
      A[i] = Separate(A[i])
      ProcessingXLink(A[i])
    End For
  End If

Figure 1: Query Processing Algorithm with the Extended Saxon-B (ProcessingXLink Procedure)

Algorithm Separate(C)
  Let C be a string of the form C = uri#xpathexpression
  Let U = uri // first part of argument C
  Let X = xpathexpression // second part of argument C
  return X

Figure 2: Algorithm for Separating the XPath Expression from the URI Part (Separate Procedure)

The algorithm presented in Figure 1 is only the logical way to handle a query with XLink links. We now walk through the algorithm step by step, using our query from Example 3 and the adopted Mondial database from Example 2. From Example 3, S is given by Expression 6.

Expression 6:
S = "//country[@car_code='AL']/cities[@xlink:href]/name"
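The procedures of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the Saxon-B/Bota code: the function names, the choice of S1 as the link step followed by /@xlink:href, and the evaluate callback standing in for a real XPath engine are all assumptions made for illustration. Unlike Figure 2, separate here returns both the uri and the XPath expression, so that the recursive call knows which document to query. The sketch does not guard against cyclic links.

```python
# Illustrative sketch only; all names below are hypothetical, not Saxon-B APIs.

XLINK_PREDICATE = "[@xlink:href]"


def split_query(query):
    """Split S at its first [@xlink:href] step into (S1, S2).

    Returns (query, None) when the query contains no link step.
    """
    head, found, tail = query.partition(XLINK_PREDICATE)
    if not found:
        return query, None
    # Assumption: S1 selects the href values of the link elements.
    return head + "/@xlink:href", tail


def separate(reference):
    """Split 'uri#xpathexpression' into (uri, xpathexpression)."""
    uri, _, xpath = reference.partition("#")
    return uri, xpath


def processing_xlink(uri, query, evaluate):
    """Evaluate query against uri, following every [@xlink:href] step."""
    s1, s2 = split_query(query)
    if s2 is None:
        return evaluate(uri, query)  # plain XPath, no link to follow
    results = []
    for reference in evaluate(uri, s1):  # the array A of href values
        target_uri, target_xpath = separate(reference)
        # A[i] & S2: continue the query inside the referenced document.
        results.extend(processing_xlink(target_uri, target_xpath + s2, evaluate))
    return results


COUNTRIES = "http://localhost/Mondial-Distributed//countries.xml"
CITIES_AL = "http://localhost/Mondial-Distributed/Cities/cities-AL.xml"


def stub_evaluate(uri, xpath):
    """Hypothetical stand-in for an XPath engine; returns canned answers."""
    canned = {
        (COUNTRIES, "//country[@car_code='AL']/cities/@xlink:href"):
            [CITIES_AL + "#//cities/city"],
        (CITIES_AL, "//cities/city/name"):
            ["Tirane", "Shkoder", "Durres", "Vlore", "Elbasan", "Korce"],
    }
    return canned.get((uri, xpath), [])


S = "//country[@car_code='AL']/cities[@xlink:href]/name"
print(processing_xlink(COUNTRIES, S, stub_evaluate))
```

Because the recursive call splits the rewritten query again, a query with several [@xlink:href] steps is followed one link at a time, provided the evaluate callback can resolve each intermediate document.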


As S contains @xlink:href, the query is divided into two parts, S1 and S2.

Expression 7: …

The S1 part is then processed as an ordinary XPath expression. In our case it gives the result shown in Example 4.

Example 4: the result of querying the file in Example 2 with Expression 6:
http://localhost/Mondial-Distributed/Cities/cities-AL.xml#//cities/city

Then, to every member of the array A we append the remaining part S2, as in Example 5.

Example 5:
A[i] = "//cities/city/name"

The uri part from Example 4 gives the location of the XML document on which the query is to be executed. In Example 5 the query is executed on another file, the one named in the uri part of the reference attribute. We used local URIs to test the system, and we do not foresee any difficulties in querying XML documents if the files are distributed over the Web.

The results achieved are presented in Example 6.

Example 6: result of the query in Example 5.

Tirane
Shkoder
Durres
xmlns:dbxlink="http://dbis.informatik.uni-goettingen.de/linxis">Vlore
Elbasan
Korce

Expression 5 can be extended to a more general format:

Expression 9:
//xpath1/xpath2/…/xpathk[@xlink:href]/…/xpathl[@xlink:href]/…/xpathm[@xlink:href]/…/xpathn

Such queries are also executable in our implementation.

4 Implementation

We used the above modeling framework to extend the existing Saxon-B XPath/XQuery engine [10] to handle XLink hyperlinks when querying. The prototype of this extension, called Bota (Albanian for "world"), is implemented on top of Saxon-B, with additional code in several existing classes such as QueryParser in package net.sf.saxon.query, XQueryExpression in package net.sf.saxon.query, and the abstract class Expression in package net.sf.saxon.expr. Initial tests with Bota show that the prototype works properly; it will be a subject of further consideration.

5 Related Work and Conclusion

Several frameworks for handling XLinks while … into LDAP referrals needs to take place prior to … distinction in being implicit when following links. We …


Our system extends Saxon-B, an open-source in-memory XML processor, i.e., non-server-based [16], as opposed to eXist [5], which falls into the category of the so-called XML database systems. The latter offers the functionality required for typical data-intensive "off-line" XML applications with linkbases [5], whereas our in-memory system is flexible in dealing with interlinked XML data, as XML was originally coined to serve as the exchange format of the future hyperlinked Web. Typical usage scenarios might involve resolving advanced Web feeds, like RSS or Atom [18], which accumulate selected portions of the source data through explicitly stated XLink links. Further, Web Services [19] can make use of these advanced links with our system plugged in as a lightweight XLink processor for Web Services integration.

As mentioned in Section 2, it was possible to build an application and use it to execute queries on XML documents containing XLink hyperlinks. We regard this, however, as a simulation of an existing XPath/XQuery system rather than an extension of the system to handle queries in the presence of XLink links.

An open issue from the modeling point of view remains the pathological cases such as "Link Bomb", "Oscillator" and "Infinite Horizontal Growth" [5]: how to react if these cases occur, or whether to prevent the data from allowing such cycles, as expressed in [5]. We aim to test the system on such cases as further work. A performance comparison of our system with the existing XLink systems, the LDAP-based one and the eXist extension, is also the subject of future work.

References:
[1] S. DeRose, E. Maler, D. Orchard, XML Linking Language (XLink) Version 1.0, W3C Recommendation, 2001.
[2] XLink implementations: http://www.w3.org/XML/Linking
[3] XPath/XQuery implementations: http://www.w3.org/XML/Query/
[4] L. Ahmedi, Making XPath Reach for the Web-Wide Links, The 20th ACM International Symposium on Applied Computing (SAC 2005) - Web Technologies and Applications Track, March 13-17, 2005, Santa Fe, New Mexico, USA.
[5] W. May, E. Behrends and O. Fritzen, Integrating and Querying Distributed XML Data via XLink, Information Systems, to appear, 2008.
[6] M. Eckert, Processing Hypertext Links after XLink, Project Work within the frames of the Advanced Practical Work, Institute of Informatics, Ludwig-Maximilians-Universität München, 2004.
[7] W. May, Querying Linked XML Document Networks in the Web, 11th International World Wide Web Conference (WWW 2002), CDROM/online at http://www2002.org/CDROM/alternate/166/, Honolulu, Hawaii, May 2002.
[8] M. Kay, XPath™ 2.0 Programmer's Reference, eISBN: 0-764-57756-5, Wiley Publishing, Inc., 2004.
[9] E. Behrends, Evaluation of Queries on Linked Distributed XML Data, PhD dissertation, Mathematical-Natural Science, Georg-August University of Göttingen, 2006.
[10] Saxon-B, http://saxon.sourceforge.net/
[11] P. Grosso, E. Maler, J. Marsh and N. Walsh, XPointer Framework, W3C Recommendation, 2003.
[12] NetBeans 6.0.1, www.netbeans.org
[13] Distributed Mondial Database for XLink, http://www.dbis.informatik.uni-goettingen.de/Mondial/#XML
[14] E. Wilde and D. Lowe, XPath, XLink, XPointer, and XML: A Practical Guide to Web Hyperlinking and Transclusion, Addison Wesley, ISBN: 0-201-703440, 2002.
[15] S. Holzner, XPath: Navigating XML with XPath 1.0 and 2.0 Kick Start, Sams Publishing, ISBN: 0-672-32411-3, 2003.
[16] O. Fritzen, Modeling and Querying of Distributed XML Data in Presence of 3rd Party Links, PhD dissertation, Mathematical-Natural Science, Georg-August University of Göttingen, 2007.
[17] J. Simpson, XPath and XPointer, O'Reilly, ISBN: 0-596-0029-2, 2002.
[18] G. Vossen, S. Hagemann, Unleashing Web 2.0: From Concepts to Creativity, Morgan Kaufmann, 2007.
[19] G. Alonso, F. Casati, H. Kuno, V. Machiraju, Web Services: Concepts, Architectures and Applications, Springer Verlag, Berlin, 2004.
