Evaluation of XPath Queries against XML Streams
Dan Olteanu

Dissertation submitted for the academic degree of Doctor of Natural Sciences at the Faculty of Mathematics, Computer Science and Statistics of the Ludwig-Maximilians-Universität München

Submitted by Dan Olteanu
Munich, December 2004

First reviewer: François Bry
Second reviewer: Dan Suciu (University of Washington)
Date of oral examination: 11 February 2005

To my wife Flori

Abstract

XML is nowadays the de facto standard for electronic data interchange on the Web. Available XML data ranges from small Web pages to ever-growing repositories of, e.g., biological and astronomical data, and even to rapidly changing and possibly unbounded streams, as used in Web data integration and publish-subscribe systems. Driven by the ubiquity of XML data, XML querying is becoming a task of great theoretical and practical importance. Recent years have seen efforts from both practitioners and theoreticians towards defining an appropriate XML query language. At the core of this common effort lies a navigational approach to locating information in XML data, embodied in a practical and simple query language called XPath [46].

This work brings together these two "worlds", XPath query evaluation and XML data streams, and shows both the theoretical and the practical relevance of this fusion. This setting is not covered by traditional database management systems, because they are not designed for the rapid and continuous loading of individual data items and do not directly support the continuous queries that are typical of stream applications [17].
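To make the single-pass setting concrete, here is a minimal sketch, assuming a query language restricted to forward path queries built from child and descendant steps with name tests (no predicates, no reverse axes). It reads the document once as a sequence of start/end events through Python's standard `xml.etree.ElementTree.iterparse` and keeps only a stack with one entry per open element. It is a generic stack-based matcher, not the pushdown-transducer network developed in this work; the function name `stream_eval` and the `(axis, name)` query encoding are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def stream_eval(xml_bytes, steps):
    """Evaluate a forward path query in one pass over start/end events.

    `steps` is a non-empty list of (axis, name) pairs, axis being "child" or
    "descendant" and name an element name or "*"; for example
    /descendant::a/child::b is [("descendant", "a"), ("child", "b")].
    Returns the 0-based document-order positions (counted at start tags)
    of the selected elements.
    """
    last = len(steps) - 1
    # One stack entry per open element: the set of step indices k such that
    # children of that element are still candidates for step k.
    stack = [frozenset({0})]  # virtual document node: step 0 applies below it
    selected = []
    position = 0
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            candidates = set()
            for k in stack[-1]:
                axis, name = steps[k]
                if axis == "descendant":
                    candidates.add(k)  # deeper elements may still match step k
                if name in ("*", elem.tag):
                    if k == last:
                        selected.append(position)  # every step has been matched
                    else:
                        candidates.add(k + 1)  # children may match the next step
            stack.append(frozenset(candidates))
            position += 1
        else:
            stack.pop()  # leaving the element: forget its candidate steps
            elem.clear()  # drop its content; iterparse still keeps an empty skeleton
    return selected


if __name__ == "__main__":
    doc = b"<r><a><b/><c><b/></c></a><b/></r>"
    print(stream_eval(doc, [("descendant", "a"), ("child", "b")]))  # [2]
    print(stream_eval(doc, [("descendant", "a"), ("descendant", "b")]))  # [2, 4]
```

The stack grows with document depth times query length rather than with document size. The matcher only moves forward, which is why queries with reverse steps such as parent or ancestor first have to be brought into forward form, the subject of the first contribution.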
The first central contribution of this work is the definition and theoretical investigation of three term rewriting systems that rewrite queries with reverse predicates, like parent or ancestor, into equivalent forward queries, i.e., queries without reverse predicates. This rewriting approach is vital for evaluating queries with reverse predicates against unbounded XML streams, because evaluating reverse predicates directly would require either storing past fragments of the stream or traversing the stream several times, and neither is affordable. Beyond their main purpose of establishing equivalences between queries with reverse predicates and forward queries, the rewriting systems shed light on other properties of the query language, such as the expressivity of some of its fragments, query minimization, and even the complexity of query evaluation. For example, these systems can rewrite any graph query into an equivalent forward forest query.

The second main contribution is a streamed and progressive evaluation strategy for forward queries against XML streams. The evaluation is specified using compositions of so-called stream processing functions and is implemented using networks of deterministic pushdown transducers. The complexity of this evaluation strategy is polynomial in both the query size and the data size for forward forest queries, and even for a large fragment of graph queries.

The third central contribution consists of two real monitoring applications that directly use the results of this work: the monitoring of processes running on UNIX computers, and a system that graphically presents real-time traffic and travel information broadcast within ubiquitous radio signals.

German Abstract (Zusammenfassung)

Nowadays, XML is the de facto standard for data interchange on the Web.
The range of available XML data extends from small Web pages to ever-growing collections of, for example, biological or astronomical data, and even to possibly unbounded data streams with high data rates, such as those used in publish-subscribe systems. Driven by the wide adoption of XML data, query processing over XML data is gaining increasing theoretical and practical importance. In recent years, initiatives from both industry and research have aimed at defining an appropriate XML query language. The core result of these initiatives is the identification of a navigational approach to locating information in XML data, embodied in the user-oriented query language XPath.

This work brings together the two worlds mentioned above, XPath query processing and XML streams, and shows both the practical and the theoretical relevance of this combination. The first main contribution of this work is the definition and theoretical investigation of three term rewriting systems for rewriting queries with so-called "reverse" predicates, such as parent or ancestor, into equivalent queries that contain no such predicates. Our approach is essential for evaluating queries with "reverse" predicates against unbounded XML streams, since it requires neither storing already processed stream fragments nor multiple passes over the XML stream. Beyond this main goal, applications of our rewriting systems shed new light on other properties of the query language, such as the expressive power of some fragments, query minimization, and even the complexity of query evaluation.
For example, using these rewriting systems, arbitrary graph queries can be rewritten into equivalent forest queries without "reverse" predicates.

The second main contribution is a stream-based, progressive evaluation strategy for forest queries without "reverse" predicates against XML streams. The evaluation is specified by the composition of so-called stream processing functions and implemented using networks of deterministic pushdown automata. The complexity of this evaluation strategy is polynomial in both the query size and the data size for forest queries without "reverse" predicates, and even for many graph queries.

The last main contribution consists of two practically usable monitoring systems built directly on the results of this work: the monitoring of processes running on a UNIX system, and a system that monitors traffic information from radio signals in real time and presents it graphically.

Acknowledgments

During the last three years, many people have contributed directly or indirectly to the development of this dissertation, and I would like to express my gratitude to them. First of all, I am deeply indebted to my advisor François Bry for his continuing trust and support during the evolution of this thesis. Further, I am grateful to Dan Suciu, whose work on XML query processing has constantly influenced my research directions. This thesis and its author further benefited from long and very useful discussions with two of my best supporters, Tim Furche and Holger Meuss. Without their active commitment, this dissertation would not have been possible. I thank the students whose theses I co-supervised for their interest in my work and for bringing relevant new ideas to the surface: Fatih Coskun, Serap Durmaz, Tim Furche, Tobias Kiesling, Sebastian Schaffert, Dominik Schwald, and Markus Spannagel.
I also thank the members of our teaching and research group for creating a stimulating environment at the office and for making my stay in Munich a pleasant one, among others Slim Abdennadher, Sacha Berger, Tim Geisler, Martin Josko, Michael Kraus, Ellen Lilge, Bernhard Lorenz, Hans Jürgen Ohlbach, Paula Pătrânjan, Stephanie Spranger, and Felix Weigel. I especially want to mention Norbert Eisinger for his always competent advice on subjects ranging from easy ones, like the confluence of rewriting systems, to complex ones, like teaching computer science topics.

Last, but definitely not least, I thank my wife, Flori, for her love and uninterrupted support, my parents and my brother for enduring the physical distance that separated us for such a long time, and all my friends for the weekends we spent together doing no research.

Contents

1 Introduction
  1.1 Data Streams: Use, Concepts, and Research Issues
  1.2 Thesis Contributions and Overview
2 Preliminaries
  2.1 XML Essentials
  2.2 Example Scenarios
3 LGQ (Logic Graph Query): An Abstraction of XPath
  3.1 Data Model
  3.2 Syntax
  3.3 Semantics
  3.4 Digraph Representations
  3.5 Path, Tree, DAG, Graph Formulas and Queries
  3.6 Forward Formulas and their Specializations
  3.7 Measures for Formulas
  3.8 LGQ versus XPath
    3.8.1 XPath
    3.8.2 Conciseness of LGQ over XPath
    3.8.3 XPath=LGQ Forests
4 Source-to-source Query Transformation: From LGQ to Forward LGQ
  4.1 Problem Description
  4.2 A Taste of Term Rewriting Systems
  4.3 Rewrite Rules preserving LGQ Equivalence
    4.3.1 Rules adding single-join DAG-Structure
    4.3.2 Rules preserving Tree-Structure
    4.3.3 Rules removing DAG-Structure
    4.3.4 Rules for LGQ Normalization
    4.3.5 Rules for LGQ Simplification
  4.4 Three Approaches to Rewrite LGQ to Forward LGQ Forests
    4.4.1 Rewriting Examples
    4.4.2 Soundness and Completeness
    4.4.3 Termination
    4.4.4 Confluence
  4.5 Complexity Analysis
  4.6 Related Work
5 Evaluation of Forward LGQ Forest Queries against XML Streams
  5.1 Problem Description
  5.2 Specification
    5.2.1 Stream Messages
    5.2.2 Stream Processing Functions
    5.2.3 From LGQ to Stream
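The transformation from LGQ to forward LGQ listed in Chapter 4 can be previewed on a single XPath pair. The query `/descendant::b/parent::a` selects every `a` element that has a `b` child, but reaches it through the reverse axis `parent`; the forward query `/descendant::a[child::b]` selects the same nodes using only downward navigation. The sketch below uses Python's standard `xml.etree.ElementTree` with hypothetical function names and evaluates both forms on one small document. It illustrates the kind of equivalence the rewriting systems establish; it is not one of the thesis's rewrite rules reproduced verbatim, and checking one document is not a proof of equivalence.

```python
import xml.etree.ElementTree as ET


def reverse_form(root):
    """/descendant::b/parent::a evaluated literally: find each b, then step back up."""
    # A child-to-parent map over the whole document: the kind of look-back
    # that an unbounded stream cannot afford.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = {parent_of[b] for b in root.iter("b")
            if b in parent_of and parent_of[b].tag == "a"}
    return [e for e in root.iter() if e in hits]  # document order, no duplicates


def forward_form(root):
    """/descendant::a[child::b]: the same nodes, using only downward navigation."""
    # find("b") looks only at direct children, i.e. the child::b predicate.
    return [e for e in root.iter("a") if e.find("b") is not None]


if __name__ == "__main__":
    root = ET.fromstring("<r><a><b/></a><a><c/></a><c><b/></c><a><c><b/></c><b/></a></r>")
    print(len(reverse_form(root)), reverse_form(root) == forward_form(root))  # 2 True
```

The forward form never moves upward, so it fits a single forward pass over a stream, although a candidate `a` element may still have to be buffered until a `b` child has been seen.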