The XML Framework and Its Implications for Corpus Access and Use

Total Page:16

File Type:pdf, Size:1020Kb

The XML Framework and Its Implications for Corpus Access and Use The XML Framework and Its Implications for Corpus Access and Use Nancy Ide Department of Computer Science Vassar College Poughkeepsie, NY 12604-0520 USA [email protected] Abstract The eXtensible Markup Language (XML) (Bray, et al., 1998) is the emerging standard for data representation and exchange on the World Wide Web. The XML Framework includes very powerful mechanisms for accessing and manipulating XML documents that are likely to significantly impact the way annotated corpora are created and accessed. This paper outlines a few of the possibilities. Introduction in the document itself), and allows addressing text The eXtensible Markup Language (XML) (Bray, et al., fragments within a particular element by providing 1998) is the emerging standard for data representation and predicates for manipulating chains of characters; • exchange on the World Wide Web. At its most basic level XPointer (DeRose, Daniel, & Maler, 1999), which XML is a document markup language directly derived extends XPath syntax to allow addressing points and from SGML (i.e., allowing tagged text (elements), ranges as well as nodes, locating information by element nesting, and element references). As such, string matching, and use of addressing expressions in translation of an SGML encoded document into XML is URI-references as fragment identifiers. relatively trivial. However, various features and For example, the Xpath expression extensions of XML make it a far more powerful tool for /div/p[2]/s[3] specifies the third <s> (sentence) data representation and access than SGML. The following element within the second <p> (paragraph) element within outlines some of these mechanisms and suggests ways in each <div> (text division) element; /descendant::p which they can be used for creation and exploitation of specifies all <p> elements in the document. In addition, annotated corpora. Xpath allows addressing text fragments within a particular element by providing predicates for manipulating chains XML links of characters. The expression The recommended practice in encoding annotated substring(/p/s[2]/text(),6) corpora is to maintain all or most annotations in separate selects the string "one would expect that the whole sky documents, each of which references appropriate locations would be as bright as the sun, even at night." from the in the document containing the original data (Ide & Brew, following text: 2000). This strategy yields, in essence, a finely linked <p><s id="d3p13s4">The difficulty hypertext format where the links specify a semantic role is that in an infinite static universe rather than navigational options. That is, links signify the nearly every line of sight would end on the surface of a star.</s><s location(s) where markup contained in a given annotation id="d3p13s5">Thus one would expect document would appear in the document to which it is that the whole sky would be as bright linked. As such, annotation information comprises remote as the sun, even at night.</s></p> or "stand-off" markup that is virtually added to the The expression document to which it is linked. In principle, the original data could contain no markup at all (or, more likely, substring(/p/s[2]/text(), 10, 12) markup for gross logical structure only); all markup could • selects "would expect". Thus the reference is made by be retained in separate documents with links into the specifying (1) the address (absolute or relative) of the original based on offsets. element closest to the substring to be referred to, and The standoff scheme, then, requires addressing XML (2) the substring within this element. elements, as well as characters and chains of characters within those elements. It also requires that elements and The Xlink mechanism can be used to link corresponding characters can be addressed both within the same segments of two or more primary texts (As for alignment document and in other XML documents. XML provides of text or speech), or to link annotation documents to a the following linking mechanisms, which are substantially base document containing the primary text. For example, more powerful than the mechanisms provided in SGML, in the following, annotation information (e.g., morpho- which satisfy these requirements: syntactic information) about a specific token (<tok>) is • XLink (DeRose, et al., 2000), a mechanism for linked to the string of characters in the original text to specifying a link (uni-directional or more complex which it applies: linking structures) between two or more resources or <tok portions of resources; xlink:href= • the XML Path Language (XPath) (Clark & DeRose, "substring(/p/s[2]/text(),10,12)"> 1999), an extended addressing syntax that defines a Although this example shows linkage for text, the concise notation for element localization in the mechanism provides for linking resources in any medium document tree (as defined by the nesting of elements (audio, video, etc.), which allows for linking speech, external images, video, applets, form-processing A suite of documents representing a base text (or programs, style sheets, etc. texts) and its annotations can be manipulated to serve any In addition to specifying the target location for application that relies on part or all of its contents. Thus, information in the same or external documents, XLink XSLT is likely to have the most to offer for manipulation attributes can be used to specify the role of the link, i.e., of and access to annotated corpora. how the link should be activated (by hand, or XSLT is relatively complex and will not be described automatically by the browser) and what to do with the in detail here.1 A short example can provide some idea of target fragment (replace it or insert it into the source the possibilities. Using as input a document containing document). morpho-syntactic information (e.g., a document In XML, annotated fragments are referenced by the containing the fragment in Figure 12), the XSLT URI (remote or local) of the target resource, and an document in Figure 2 can be used to create an HTML extended pointer identifying a element and, where document that displays a text in "word | lemma | pos" necessary, the selected substring of that element's content, form. When the resulting HTML document is loaded into as in the following: a browser, it will display the following: <tok It|it|PPER3 was|be|PAST3 a|a|DINT xlink:href= bright|bright|ADJE cold|cold|ADJE "http://www.loria.fr/doc.xml#xptr day|day|NN… (substring(/p/s[2]/text(), 10, 12))"> Annotation resulting from automatic processing <?xml version="1.0"> (marking of sentence boundaries, tokens, links between <chunk type="BODY" lang="en" parallel texts, etc.) often includes thousands of links to the xml:base= same external document. Repetition of the document "http://www.cs.vassar.edu/~ME/Oen.xcesDoc#"> <par xlink:href="xptr(substring(//p[1]"> name on every relevant element in an annotation <s xlink:href="xptr(substring(//p/s[1]"> document would obviously significantly multiply its size. <tok type="WORD" XML includes an attribute xml:base (Marsh, 2000) that xlink:href= can be used to specify inheritance of an attribute. For "xptr(substring(//p/s[1]/text(),1,2"> example, in the following text: <orth>It</orth> <disamb> <chunk <base>it</base> xml:base= <msd>Pp3ns</msd> "http://www.mysite.edu/doc.xml#"> <ctag>PPER3</ctag></lex> <tok <lex> xlink:href="xptr(substring <base>it</base> (/p/s[2]/text(), 10, 12))"/> <msd>Pp3ns</msd> <tok <ctag>PPER3</ctag></lex></tok> xlink:href="xptr(substring <tok type="WORD" (/p/s[2]/text(), 24, 4))"/> xlink:href= </chunk> "xptr(substring(//p/s[1]/text(),4,2"> the value of the attribute xml:base specified on the <orth>was</orth> <disamb> <chunk> element is inherited by the two <tok> elements <base>be</base> that are its children, and therefore need not be re- <msd>Vmis3s</msd> specified. The inclusion of xml:base in the XML <ctag>PAST3</ctag></lex> specification ensures that conformant XML processors <lex> <base>be</base> will handle it (unlike SGML). <msd>Vais1s</msd> <ctag>AUX1</ctag></lex> XML transformations <lex> <base>be</base> The Extensible Style Language (XSL) is a part of the <msd>Vais3s</msd> XML framework, consisting of two parts: the best known <ctag>AUX3</ctag></lex> is the XSL formatting or "style sheet" language; and a <lex> powerful tree-traversal language, XSLT (Clark, 1999), <base>be</base> <msd>Vmis1s</msd> that can be used to convert any XML document into <ctag>PAST1</ctag></lex> another document in any form (e.g., XML, well-formed <lex> HTML, plain text, etc.) by selecting, rearranging, and/or <base>be</base> adding information to it. The transformed documents may <msd>Vmis3s</msd> or may not be intended for rendering data on a computer <ctag>PAST3</ctag></lex></tok>… screen, but may be used simply to move data from one computer system or program to another (e.g., to transduce between encoding and/or annotation formats, etc.). Figure 1 : Fragment of a cesAna document XSLT supports the following kinds of document manipulation: • selection of elements or portions of element content using the XPath syntax; 1 Full documentation is available at http://www.w3.org/TR/xslt. • rearrangement or transformation of extracted 2 Note that this document, encoded according to the xcesAna information (including not only text content but also specifications (Ide, Bonhomme, & Romary, 2000), contains full element names, etc.) in the target document; sementation and annotation information, including full morpho- • addition of information in the target document. syntactic specifications for all potential annotations and the results of automatic disambiguation. different content models and attributes in order to provide <xsl:stylesheet version="1.0" for tight validation of form in each context. For example, xmnls:xsl= in the SGML-based CES (Ide, 1998a, b), header elements "http://www.w3.org/1999/XSL/Transform"> are prefixed with "h." so that names in headers are tagged <xsl:template match= “/”> with <h.name>, whose content model is different from <html> that of the <name> element that can appear in the body of <body> the text.
Recommended publications
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • RDF/XML: RDF Data on the Web
    Developing Ontologies • have an idea of the required concepts and relationships (ER, UML, ...), • generate a (draft) n3 or RDF/XML instance, • write a separate file for the metadata, • load it into Jena with activating a reasoner. • If the reasoner complains about an inconsistent ontology, check the metadata file alone. If this is consistent, and it complains only when also data is loaded: – it may be due to populating a class whose definition is inconsistent and that thus must be empty. – often it is due to wrong datatypes. Recall that datatype specification is not interpreted as a constraint (that is violated for a given value), but as additional knowledge. 220 Chapter 6 RDF/XML: RDF Data on the Web • An XML representation of RDF data for providing RDF data on the Web could be done straightforwardly as a “holds” relation mapped according to SQLX (see ⇒ next slide). • would be highly redundant and very different from an XML representation of the same data • search for a more similar way: leads to “striped XML/RDF” – data feels like XML: can be queried by XPath/Query and transformed by XSLT – can be parsed into an RDF graph. • usually: provide RDF/XML data to an agreed RDFS/OWL ontology. 221 A STRAIGHTFORWARD XML REPRESENTATION OF RDF DATA Note: this is not RDF/XML, but just some possible representation. • RDF data are triples, • their components are either URIs or literals (of XML Schema datatypes), • straightforward XML markup in SQLX style, • since N3 has a term structure, it is easy to find an XML markup. <my-n3:rdf-graph xmlns:my-n3="http://simple-silly-rdf-xml.de#"> <my-n3:triple> <my-n3:subject type="uri">foo://bar/persons/john</my-n3:subject> <my-n3:predicate type="uri">foo://bar/meta#name</my-n3:predicate> <my-n3:object type="http://www.w3.org/2001/XMLSchema#string">John</my-n3:object> </my-n3 triple> <my-n3:triple> ..
    [Show full text]
  • XML for Java Developers G22.3033-002 Course Roadmap
    XML for Java Developers G22.3033-002 Session 1 - Main Theme Markup Language Technologies (Part I) Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences 1 Course Roadmap Consider the Spectrum of Applications Architectures Distributed vs. Decentralized Apps + Thick vs. Thin Clients J2EE for eCommerce vs. J2EE/Web Services, JXTA, etc. Learn Specific XML/Java “Patterns” Used for Data/Content Presentation, Data Exchange, and Application Configuration Cover XML/Java Technologies According to their Use in the Various Phases of the Application Development Lifecycle (i.e., Discovery, Design, Development, Deployment, Administration) e.g., Modeling, Configuration Management, Processing, Rendering, Querying, Secure Messaging, etc. Develop XML Applications as Assemblies of Reusable XML- Based Services (Applications of XML + Java Applications) 2 1 Agenda XML Generics Course Logistics, Structure and Objectives History of Meta-Markup Languages XML Applications: Markup Languages XML Information Modeling Applications XML-Based Architectures XML and Java XML Development Tools Summary Class Project Readings Assignment #1a 3 Part I Introduction 4 2 XML Generics XML means eXtensible Markup Language XML expresses the structure of information (i.e., document content) separately from its presentation XSL style sheets are used to convert documents to a presentation format that can be processed by a target presentation device (e.g., HTML in the case of legacy browsers) Need a
    [Show full text]
  • Odata-Csdl-Xml-V4.01-Os.Pdf
    OData Common Schema Definition Language (CSDL) XML Representation Version 4.01 OASIS Standard 11 May 2020 This stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.pdf Previous stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.pdf Latest stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.pdf Technical Committee: OASIS Open Data Protocol (OData) TC Chairs: Ralf Handl ([email protected]), SAP SE Michael Pizzo ([email protected]), Microsoft Editors: Michael Pizzo ([email protected]), Microsoft Ralf Handl ([email protected]), SAP SE Martin Zurmuehl ([email protected]), SAP SE Additional artifacts: This prose specification is one component of a Work Product that also includes: • XML schemas: OData EDMX XML Schema and OData EDM XML Schema.
    [Show full text]
  • XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département D©Informatique Et De Recherche Opérationnelle Université De Montréal
    XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 Abstract This tutorial gives a high-level overview of the main principles underlying some XML technologies: DTD, XML Schema, RELAX NG, Schematron, XPath, XSL stylesheets, Formatting Objects, DOM, SAX and StAX models of processing. They are presented from the point of view of the computer scientist, without the hype too often associated with them. We do not give a detailed description but we focus on the relations between the main ideas of XML and other computer language technologies. A single compact pretty-print example is used throughout the text to illustrate the processing of an XML structure with XML technologies or with Java programs. We also show how to create an XML document by programming in Java, in Ruby, in Python, in PHP, in E4X (Ecmascript for XML) and in Swift. The source code of the example XML ®les and the programs are available either at the companion web site of this document or by clicking on the ®le name within brackets at the start of the caption of each example.
    [Show full text]
  • Integración De XML En Páginas Web Dinámicas DOM
    Integración de XML en páginas Web dinámicas DOM Jose Emilio Labra Gayo Departamento de Informática Universidad de Oviedo Motivación: Computación Dinámica Páginas Web estáticas vs. Dinámicas Computación Dinámica = Contenido se genera en el momento en que se hace la petición 2 Posibilidades: Cliente vs Servidor Internet Cliente Servidor GET http://servidor.com/hola.html http:/1.0 200 OK <html> <body> Enlace a <a href =“otro.html”>Otro</a> </body> </html> Computación Dinámica en Servidor Al solicitar ciertas páginas, el servidor genera el contenido Ejemplos: CGI's, Servlets, JSP, ASP, PHP, etc. El proceso es trasparente para el cliente El cliente recibe únicamente código HTML No tiene porqué haber problemas de usabilidad Desventajas La interactividad requiere comunicación entre cliente/servidor Mayor carga de la red y del servidor Desperdicio de las capacidades del cliente Computación dinámica en Cliente Se pueden incluir objetos computacionales que son interpretados por el cliente Varias posibilidades: Applets, Javascript, ... Se ejecutan en el entorno que ofrece el navegador DOM ofrece una API que permite acceder/manipular los elementos del documento También se puede acceder a otros elementos (eventos, barra de estado, etc.) Por seguridad no se permite escribir/leer ficheros JavaScript Un poco de historia Netscape 2 (Dic. 1995) incorporó Javascript (diseñado por Brendan Eich) Permite modificar contenido de páginas Web interactivamente JavaScript no tiene nada que ver con Java Se llamaba LiveScript (cambio de nombre por marketing) Microsoft
    [Show full text]
  • OWL 2 Web Ontology Language XML Serialization W3C Proposed Recommendation 22 September 2009
    OWL 2 Web Ontology Language XML Serialization W3C Proposed Recommendation 22 September 2009 OWL 2 Web Ontology Language XML Serialization W3C Proposed Recommendation 22 September 2009 This version: http://www.w3.org/TR/2009/PR-owl2-xml-serialization-20090922/ Latest version: http://www.w3.org/TR/owl2-xml-serialization/ Previous version: http://www.w3.org/TR/2009/CR-owl2-xml-serialization-20090611/ (color-coded diff) Editors: Boris Motik, Oxford University Computing Laboratory Bijan Parsia, University of Manchester Peter F. Patel-Schneider, Bell Labs Research, Alcatel-Lucent Contributors: (in alphabetical order) Sean Bechhofer, University of Manchester Bernardo Cuenca Grau, Oxford University Computing Laboratory Achille Fokoue, IBM Corporation Rinke Hoekstra, University of Amsterdam This document is also available in these non-normative formats: PDF version. Copyright © 2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The OWL 2 Web Ontology Language, informally OWL 2, is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents. The OWL 2 Document Overview describes the overall state of OWL 2, and should be read before other OWL 2 documents. Page 1 of 35 http://www.w3.org/TR/2009/PR-owl2-xml-serialization-20090922/ OWL 2 Web Ontology Language XML Serialization W3C Proposed Recommendation 22 September 2009 This document specifies an XML serialization for OWL 2 that mirrors its structural specification.
    [Show full text]
  • Xml Xpath, Xslt
    More XML XPATH, XSLT CS 431 – February 27, 2006 Carl Lagoze – Cornell University XPath • Language for addressing parts of an XML document –XSLT –Xpointer • Tree model similar to DOM • W3C Recommendation (1999) – http://www.w3.org/TR/xpath Remember to think in terms of DOM trees DOCUMENT type=Element <?xml version="1.0" name="book" encoding="UTF-8"?> <book> <title lang='"en"'>"XML Basics"</title> type=Element name="title" </book> type=Text type=Attribute data="XML name="lang" Basics" data="en" Xpath Concepts •Context Node – current node in XML document that is basis of path evaluation – Default to root (remember that root is “Document”) • Location Steps – selection from context node – Axis – sub-tree(s) selection from context node – Node Test – select specific elements or node type(s) – Predicates – predicate for filtering after axis and node tests Context, Axis, Node Test, Predicate Document Root Location Path Specification • /step/step/…. – absolute from document root • step/step …. – relative from context • //step/step – anywhere in document tree • where step is: axis::node-test[predicate] axis::node-test[predicate] • child:: all children of context • descendant:: all children, grandchildren, … • parent:: parent of context • ancestor:: all nodes on path to root from context axis::node-test[predicate] • Element name: e.g. “Book” – make sure to pay attention to namespaces!!!! •Wildcard: * • Type(): where type is “node”, “text”, etc. – Remember in DOM that everything is a node axis::node-test[predicate] • Boolean and comparative operators
    [Show full text]
  • Best Practices Managing XML Data
    ® IBM® DB2® for Linux®, UNIX®, and Windows® Best Practices Managing XML Data Matthias Nicola IBM Silicon Valley Lab Susanne Englert IBM Silicon Valley Lab Last updated: January 2011 Managing XML Data Page 2 Executive summary ............................................................................................. 4 Why XML .............................................................................................................. 5 Pros and cons of XML and relational data ................................................. 5 XML solutions to relational data model problems.................................... 6 Benefits of DB2 pureXML over alternative storage options .................... 8 Best practices for DB2 pureXML: Overview .................................................. 10 Sample scenario: derivative trades in FpML format............................... 11 Sample data and tables................................................................................ 11 Choosing the right storage options for XML data......................................... 16 Selecting table space type and page size for XML data.......................... 16 Different table spaces and page size for XML and relational data ....... 16 Inlining and compression of XML data .................................................... 17 Guidelines for adding XML data to a DB2 database .................................... 20 Inserting XML documents with high performance ................................ 20 Splitting large XML documents into smaller pieces ..............................
    [Show full text]
  • XML Projects in Japan and Fujitsu's Approach to Xlink/Xpointer
    UDC 621.395.74:651.55:681.32 XML Projects in Japan and Fujitsu’s Approach to XLink/XPointer VToshimitsu Suzuki VMasatomo Goto (Manuscript received June 2, 2000) The Extensible Markup Language (XML)1) is a markup language developed in response to a recommendation by the World Wide Web Consortium (W3C).2) It is a meta lan- guage used to make an information structure. XML’s original specification is the Stan- dard Generalized Markup Language (SGML).3) Now, XML is used not only as a format language but also as a framework in various areas beyond the SGML field. This paper describes the XML technology trends, the current state of XML technology, and some case studies in Japan and at Fujitsu. This paper also describes HyBrick,4) which is an XML/SGML browser that was demonstrated at SGML’97, and the XLink5)/XPointer6) technology. 1. Introduction verified through these experiments. The specification of XML has been in use by Up to now, XML has been related to infor- the W3C since 1996. XML version 1.0 was released mation technology (IT), and because of the IT as a W3C recommendation in February 1998. revolution it is becoming a secure part of the foun- XML is a meta language for marking up informa- dation of new IT systems. Last year, a lineup of tion. XML’s original specification is the Standard XML parsers and other XML basic software ap- Generalized Markup Language (SGML), which is peared, establishing a development environment used to mark up documents for archiving and re- for application software.
    [Show full text]
  • XHTML Rdfa Modules XHTML Rdfa Modules
    XHTML RDFa Modules XHTML RDFa Modules XHTML RDFa Modules Modules to support RDF annotation of elements W3C Editor’s Draft 11 August 2007 This version: http://www.w3.org/MarkUp/2007/ED-xhtml-rdfa-20070811 Latest version: http://www.w3.org/TR/xhtml-rdfa Previous Editor’s Draft: http://www.w3.org/MarkUp/2007/ED-xhtml-rdfa-20070705 Diff from previous Editor’s Draft: xhtml-rdfa-diff.html Editors: Mark Birbeck, x-port.net Ltd. Shane McCarron, Applied Testing and Technology, Inc. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The XHTML RDFa (RDF using attributes) modules define a collection of elements and attributes that enhance a document authors ability to annotate the relationships of content within and among documents. These modules can be integrated into any markup language based upon XHTML Modularization [XHTMLMOD [p.45] ]. Status of This Document This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/. - 1 - Table of Contents XHTML RDFa Modules This document is an internal editors draft for development purposes. However, its content is based upon mature materials from [XHTML2 [p.45] ] and is therefore considered nearly complete.
    [Show full text]