Parsing and Querying XML Documents in SML

Dissertation submitted for the academic degree of Doctor of Natural Sciences (Doktor der Naturwissenschaften) at Department IV (Fachbereich IV) of the Universität Trier

by Diplom-Informatiker Andreas Neumann

December 1999

    The forest walks are wide and spacious;
    And many unfrequented plots there are.

    The Moor Aaron, in [1]: The Tragedy of Titus Andronicus by William Shakespeare

[1] Found with fxgrep using the pattern ==SPEECH[ (=="forest") ] in titus.xml [Bos99]

Acknowledgments (Danksagung)

At this point I would like to thank everyone who supported me, directly or indirectly, in writing this thesis. First and foremost, my thanks go to my advisor, Helmut Seidl. In numerous discussions he always had the time and patience to engage with my ideas about tree automata. His extensive background knowledge and his constructive criticism contributed substantially to the success of this work.

Special thanks go to Steven Bashford and Craig Smith for their painstaking and attentive proofreading of the thesis. They uncovered many an inconsistency and found quite a few all too "German" formulations in my English. I would also like to commend all members of the Chair for Programming Languages and Compiler Construction (Lehrstuhl für Programmiersprachen und Übersetzerbau) at the Universität Trier for the good working atmosphere, which is certainly not a matter of course at every chair.

Last but not least, I thank my girlfriend Dagmar, who in recent months showed great patience and understanding for my frequent absence, and not only the physical kind. Her support and encouragement, as well as her ever-open ear, were invaluable.

Abstract

The Extensible Markup Language (XML) is a language for storing and exchanging structured data in a sequential format.
Though originally designed for use in document processing, it is by now employed to represent data in virtually all areas of computing, especially on the Internet.

The basis of all XML processing software is an XML parser, which parses a document in XML syntax and exposes it as a document tree to the application. Document processing then basically reduces to tree manipulation. Modern functional programming languages like SML and Haskell, with their tree-structured data types, are therefore well suited for implementing such applications. Nonetheless, the area of XML processing software is dominated by Java. Functional programming languages play a minor role, partly due to the lack of a complete implementation of an XML parser.

One of the most important tasks in document processing is querying, that is, the extraction of parts of a document that match a structural condition and are in a specific context. Due to the tree-like view of documents, querying XML can be implemented with techniques from tree language and automata theory. These techniques must, however, be adapted to the needs of XML. One specific requirement is that even extremely large documents must be processed. It must therefore be possible to perform the querying algorithm in a single pass through the document, without the need to construct a copy of the document tree in memory.

This work is divided into two parts. The first part presents fxp, an XML parser written completely in SML. We describe the implementation of fxp and discuss our experiences with SML. We analyze the run-time behavior of fxp and compare it to XML parsers written in imperative and object-oriented programming languages.

The second part presents an XML querying algorithm based on forest automata theory. The algorithm locates all matches of a query in at most two passes through the document. For an important subclass of queries even a single run suffices.
Moreover, we discuss the implementation of the algorithm based on fxp.

For each of the two parts a separate, more detailed introduction is given.

Zusammenfassung (German summary, translated)

XML (Extensible Markup Language) is a sequential format for storing and transmitting structured data. Although it was originally developed for document processing, XML is used today in nearly all areas of data processing, and especially on the Internet.

All XML document processing software is based on an XML parser. The parser reads a document in XML syntax and makes it available to the actual application as a document tree. Document processing is then essentially the manipulation of trees. Modern functional programming languages such as SML and Haskell support trees as basic data types and are therefore particularly well suited for implementing document processing systems. It is all the more surprising that this area is for the most part dominated by Java software. This is due not least to the fact that no complete implementation of the XML syntax as a parser yet exists in a functional programming language.

One of the most important tasks in document processing is querying, that is, locating parts of a document that satisfy a given structural condition and occur in a particular context. The tree-like view of documents in XML allows querying to be realized with techniques from the theory of tree languages and tree automata. These techniques must, however, be adapted to the special requirements of XML. One of these requirements is that even extremely large documents must be processed. The querying algorithm should therefore be executable in a single pass through the document, without having to build the document tree explicitly in memory.

This work consists of two parts.
The first part describes the XML parser fxp, which was programmed entirely in SML. In particular, it discusses the experience with SML gained during the implementation of fxp. This is followed by an analysis of the run-time behavior of fxp and a comparison with other XML parsers developed in imperative or object-oriented programming languages.

In the second part we describe an algorithm for querying XML documents that is founded on the theory of forest automata. It finds all matches of a query in at most two passes through the document. For an important subclass of queries, querying can even be realized in a single pass. In addition, the implementation of the algorithm in SML with the help of fxp is presented.

Contents

I An XML Parser Written in SML
  Introduction
  Einführung (introduction in German)
  1 Document Processing with XML
    1.1 Introduction to XML
      1.1.1 Elements, Attributes, and the Document Type
      1.1.2 Entities
      1.1.3 Public Identifiers and Catalogs
      1.1.4 Miscellaneous Markup
      1.1.5 Unicode
      1.1.6 Well-Formedness, Validity and Compatibility
    1.2 XML Software
      1.2.1 Areas of XML Processing
      1.2.2 Commercial software
      1.2.3 XML Application Interfaces
      1.2.4 The Information Set
      1.2.5 Further Characteristics of XML Parsers
      1.2.6 Existing XML Parsers
  2 Implementation of fxp
    2.1 The Implementation Language – SML
    2.2 System Architecture
    2.3 The Unicode Frontend
      2.3.1 Basic Types
      2.3.2 Character Classes
      2.3.3 Access to the File System and the Network
      2.3.4 Byte-Stream Operations
      2.3.5 Decoding into Unicode
      2.3.6 Representing Encoded Files
      2.3.7 Auto-Detection of Character Encodings
    2.4 Errors and Options
    2.5 The Entity Manager
      2.5.1 Internal Entities
      2.5.2 External Entities
      2.5.3 Implementing Look-Ahead
      2.5.4 Other Important Entity Manager Functions
    2.6 The DTD Manager
      2.6.1 Dynamic Dictionaries
      2.6.2 Data Types for Declarations
      2.6.3 The DTD structure
      2.6.4 Adding Declarations
      2.6.5 Processing Attribute Values
    2.7 The Parser Modules
      2.7.1 No Tokenization
      2.7.2 Syntax Errors and Error Recovery
      2.7.3 Parsing Simple Components
      2.7.4 Entity References
      2.7.5 Character Data in Content
      2.7.6 Validation of Element Content
    2.8 The Programming Interface
      2.8.1 Hooks - A Functional Variant of Event Handlers
      2.8.2 Functorizing the Parser
      2.8.3 Implications on the Implementation of the Parser
      2.8.4 Functor Dependencies
      2.8.5 Building Applications with fxp
      2.8.6 Implementing a Tree-Based Interface
      2.8.7 Catalog Support
  3 Analysis, Comparison and Discussion
    3.1 Run-Time Analysis and Comparison
      3.1.1 Profiling the Parser
      3.1.2 Imperative Optimizations
      3.1.3 Comparison to XML Parsers Written in Other Languages
    3.2 Discussion of the Implementation Language
II Querying Structured Documents
  Introduction
  Einführung (introduction in German)
  4 Preliminaries
    4.1 Regular Expressions and Word Languages
      4.1.1 The Berry-Sethi Construction
    4.2 Trees and Forests
      4.2.1 Relation to Ranked Trees
      4.2.2 Structural Induction
      4.2.3 Path Induction
  5 Regular Forest Languages
    5.1 Forest Grammars
    5.2 Closure Properties of Regular Forest Languages
    5.3 Bibliographic Notes
  6 Forest Automata
    6.1 Bottom-Up Forest Automata
      6.1.1 Regularity
      6.1.2 Deterministic and Right-to-Left Forest Automata
    6.2 Pushdown Forest Automata
      6.2.1 Regularity
      6.2.2 Deterministic Pushdown Forest Automata
      6.2.3 Right-to-Left Pushdown Forest Automata
    6.3 Decision Problems
      6.3.1 Bottom-Up Forest Automata
      6.3.2 Pushdown Forest Automata
    6.4 Matching Structural Conditions
      6.4.1 Matching Structure with Bottom-Up Automata
      6.4.2 Matching Structure with Pushdown Automata
    6.5 Bibliographic Notes
  7 Locating Matches in Context
    7.1 Contextual Conditions
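
Illustration: querying in a single pass

The abstract requires that a query can be answered in one pass over the document without keeping a copy of the whole document tree in memory. The following is a minimal sketch of that idea only. It does not use fxp, fxgrep, or forest automata; it relies on Python's standard-library event parser instead. The function name, the sample document, and the SPEAKER and LINE element names are assumptions made for illustration; only the SPEECH element name and the word "forest" come from the fxgrep pattern quoted on the title page, and the sketch does not claim to reproduce that pattern's exact semantics.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the SPEECH tag name comes from the text above.
SAMPLE = b"""<PLAY>
  <ACT>
    <SPEECH><SPEAKER>AARON</SPEAKER>
      <LINE>The forest walks are wide and spacious;</LINE>
    </SPEECH>
    <SPEECH><SPEAKER>TAMORA</SPEAKER>
      <LINE>My lovely Aaron</LINE>
    </SPEECH>
  </ACT>
</PLAY>"""


def speakers_of_matching_speeches(source, word):
    """Yield the SPEAKER text of each SPEECH element whose text contains `word`."""
    open_elements = []  # elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()  # `elem` itself has just been closed
        if elem.tag != "SPEECH":
            continue
        if word in "".join(elem.itertext()):
            yield elem.findtext("SPEAKER", default="")
        # Detach the finished SPEECH subtree from its parent so that it can be
        # garbage-collected instead of accumulating in a full document tree.
        if open_elements:
            open_elements[-1].remove(elem)


print(list(speakers_of_matching_speeches(io.BytesIO(SAMPLE), "forest")))
# prints ['AARON']
```

Each SPEECH subtree is inspected when its end tag arrives and is then discarded, so memory use depends on the largest SPEECH element and the chain of enclosing elements rather than on the size of the whole document. A condition that refers to content appearing after the candidate element cannot be decided this way, which is consistent with the abstract's statement that general queries may need two passes.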