The Active XML Project: an Overview
The VLDB Journal manuscript No. (will be inserted by the editor)

Serge Abiteboul · Omar Benjelloun · Tova Milo

Abstract This paper provides an overview of the Active XML project developed at INRIA over the past five years. Active XML (AXML, for short) is a declarative framework that harnesses Web services for distributed data management, and is put to work in a peer-to-peer architecture.

The model is based on AXML documents, which are XML documents that may contain embedded calls to Web services, and on AXML services, which are Web services capable of exchanging AXML documents. An AXML peer is a repository of AXML documents that acts both as a client, by invoking the embedded service calls, and as a server, by providing AXML services, which are generally defined as queries or updates over the persistent AXML documents.

The approach gracefully combines stored information with data defined in an intensional manner as well as dynamic information. This simple, rather classical idea leads to a number of technically challenging problems, both theoretical and practical.

In this paper, we describe and motivate the AXML model and language, overview the research results obtained in the course of the project, and show how all the pieces come together in our implementation.

Keywords Data exchange, Intensional information, Web services, XML

The first and third authors were partially funded by the European Project Edos.

S. Abiteboul
INRIA-Futurs & LRI

O. Benjelloun
Stanford University

T. Milo
Tel-Aviv University

1 Introduction

Since the 1960s, the database community has developed the necessary science and technology to manage data in central repositories. From the early days, many efforts have been devoted to extending these techniques to the management of distributed data as well, and in particular to its integration, e.g., [42,78,71]. However, the Web revolution is setting new standards, primarily because of (i) the high heterogeneity and autonomy of data sources, and (ii) the scale of the Web. The goal of this work is to propose a new framework for distributed data management, which addresses these concerns, and is therefore suitable for data management on the Web.

XML and Web services. Our approach is based on XML and Web services. XML [82] is a data model and representation format for semi-structured data [9], which raised considerable interest among the data management community as a standard for data exchange between remote applications, notably over the Web. Most importantly for data management, XML comes equipped with high-level query languages such as XQuery [84], and flexible schema languages such as XML Schema [83]. The publication of XML data and the access to it are facilitated by Web services, which are network-accessible programs taking XML parameters and returning XML results. The WSDL [79] and SOAP [69] standards enable describing and calling these remote programs seamlessly over the Internet. In our approach, XML and Web services are effectively leveraged for complex data management tasks.

Peer-to-peer architecture. Centralized architectures for data integration somewhat contradict the essence of the Web, which is based on a loose coupling of autonomous systems. Furthermore, centralized architectures hardly scale up to the large size of the Web. Peer-to-peer architectures offer a credible alternative, and are already spreading, notably in the context of file-sharing (e.g., see [52,40,21]). Peer-to-peer architectures capture the autonomous nature of the participating systems, and their ability to act both as producers of information (i.e., as servers) and as consumers of information produced by others (i.e., as clients). In our approach, a peer-to-peer architecture forms the basis for scalable distributed data management.

We present Active XML (AXML in short) [15], a language that leverages Web services for distributed data management and is put to work in a peer-to-peer architecture. AXML documents are XML documents with embedded calls to Web services. Such documents are enriched by the results of invocations of the service calls they contain. Since service invocations rely exclusively on Web services standards, AXML documents are portable and can be exchanged. The AXML model also defines AXML services, which are Web services that exchange AXML documents. The ability of participating systems to exchange AXML documents leads to powerful data-oriented schemes for distributed computation, where several systems dynamically collaborate to perform a specific data management task, possibly discovering new relevant data sources at run-time.

One should note that data with embedded calls to operations is not a new idea, and has even already been considered in the context of XML and Web services. For instance, in Microsoft Office XP, SmartTags within Office documents can be linked to Microsoft's .NET platform for Web services [66]. However, to our knowledge, AXML is the first proposal that actually turns calls to Web services embedded in XML documents into a powerful tool for distributed data management. By seamlessly combining extensional data (expressed in XML) and intensional data (the service calls, which provide means to get data), our approach notably captures different styles of data integration, such as warehousing and mediation, and allows for flexible combinations of both.

We give here a comprehensive presentation of a multi-year project, which developed and used Active XML as its foundation. We thoroughly discuss motivations for AXML. Although such motivations appeared in shorter forms in previous papers, they evolved greatly with our understanding of the issues, so we thought it useful to include them here in detail. We also present the key technical issues raised by the management of AXML documents and services. While previous papers (e.g., [57,5,14]) addressed these issues, none of them explained how the various pieces fit together to form the big picture of data management using AXML. The presentation will not be detailed and the reader is referred to previous papers for detailed descriptions of particular aspects. We also discuss a variety of applications that we developed using AXML, and lessons we learned from them.

A first contribution of the paper is therefore a comprehensive presentation of Active XML, the language and the general project, with an emphasis on motivations. To a certain extent, the aforementioned material is not new. However, only facets of it were previously presented in conference articles. We believe that it is important to now present the general picture.

AXML documents and services fit nicely in a peer-to-peer architecture, where each system is a persistent store of AXML documents, and may act both as a client, by invoking the service calls embedded in its AXML documents, and as a server, by providing services over these documents. Building on this paradigm, we implemented a system (called the AXML peer) that is devoted to the management of AXML documents and services. The second contribution of the paper is a detailed presentation of the AXML peer system, both in terms of its functionality and its implementation. We cover extensions to the system. In particular, we mention bridges to a workflow system (namely BPEL4WS) and P2P protocols (namely JXTA) that considerably extend the practicality of AXML. We also discuss some AXML peers with persistent storage (provided by the eXist [37] and Xyleme [86] systems) that we developed to support the management of large volumes of AXML documents.

The paper is organized as follows. The AXML model is presented and motivated in Section 2, where techniques for schema-based exchange control and lazy query evaluation are also overviewed. The functionality of the AXML peer, both as a client and as a server, is the topic of Section 3. The implementation of the AXML peer is discussed in Section 4. Our experience with the AXML peer in terms of building applications, and the lessons we learned, are presented in Section 5. Systems of AXML peers are considered in Section 6. After a study of related work in Section 7, we conclude in a last section.

2 The Active XML model

In this section, we introduce the fundamental components of the Active XML language, namely AXML documents and AXML services. We illustrate them through examples, and give motivations for this new model. At the end of this section, we overview two important issues raised by the AXML model, namely AXML data exchange and the optimization of queries over AXML documents.

2.1 Active XML documents

Active XML documents are based on the simple idea of embedding calls to Web services inside XML documents. More precisely, we define an XML syntax to denote service calls, and allow elements conforming to this syntax to appear anywhere in a document. Intuitively, these calls represent some (XML) information that is not given extensionally, but intensionally, by providing a means to get the corresponding data. Service calls can be materialized, which means that the associated Web service is invoked (using the SOAP protocol), and the results it returns enrich the document, at the location of the service call.

Figure 1 is a simple AXML document, which repre- ... the service getEvents provided by TimeOut.com. For the sake of clarity, the service attribute syntax that is used in the paper is actually a simplification of the full syntax, which accounts for all the information needed to invoke Web services using the SOAP protocol.
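To make the notion of materialization concrete, the following minimal Python sketch walks an XML tree, finds embedded service calls, and splices the returned results in at the location of each call. Several names here are assumptions made for illustration only and are not taken from the paper: the element name `sc`, the `param` child, the attribute value `getEvents@TimeOut.com`, and the `SERVICES` registry. The remote service is replaced by a local stub, whereas an AXML peer would issue a SOAP request. Whether the call element is kept once its results arrive is not specified in this section; the sketch simply removes it.

```python
import xml.etree.ElementTree as ET


def get_events(params):
    """Hypothetical local stub standing in for a remote Web service."""
    kind = params[0].text if params else "all"
    show = ET.Element("show")
    show.text = f"Sample {kind} listing"
    return [show]


# Hypothetical registry mapping a service identifier to a callable.
SERVICES = {"getEvents@TimeOut.com": get_events}


def materialize(root):
    """Replace every <sc> element with the results of its stubbed service."""
    # Collect (parent, call) pairs first so the tree is not mutated while
    # it is being traversed.
    calls = [
        (parent, child)
        for parent in root.iter()
        for child in list(parent)
        if child.tag == "sc"
    ]
    for parent, call in calls:
        results = SERVICES[call.get("service")](list(call))
        index = list(parent).index(call)
        parent.remove(call)
        # Insert the results where the call used to be, preserving order.
        for offset, node in enumerate(results):
            parent.insert(index + offset, node)
    return root


doc = ET.fromstring(
    "<directory><exhibits>"
    '<sc service="getEvents@TimeOut.com"><param>exhibits</param></sc>'
    "</exhibits></directory>"
)
materialize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The sketch handles a single pass only: calls that appear inside returned results are not followed, and no policy is applied for deciding when a call should be invoked.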