XML Family of Languages Overview and Classification of W3C Specifications
Airi Salminen
05 February 2003

This version: http://www.cs.jyu.fi/~airi/xmlfamily-20030205.html
Latest version: http://www.cs.jyu.fi/~airi/xmlfamily.html
Previous version: http://www.cs.jyu.fi/~airi/xmlfamily-20021022.html

Table of Contents
1. Introduction
2. Classification of the Languages
3. XML
4. XML Accessories
5. XML Transducers
6. XML Applications
6.1 Non-textual Data
6.2 Web Publishing
6.3 Semantic Web
6.4 Web Communication and Services
About this report

1. Introduction

XML is a markup language for presenting information as structured documents. The language has been developed from SGML (Standard Generalized Markup Language, ISO 8879) as an activity of the World Wide Web Consortium (W3C). Within W3C, a number of other XML-related language development activities are under way. Their intent is to specify syntactic and semantic rules either for some specific kind of XML data or for data to be used together with XML data for a specific purpose. Together with XML, we call this group of languages the XML family of languages.

The purpose of this report is to give a concise overview of the languages in the family and the current state of their development at W3C. The document introduces a classification for the languages and also serves as a portal to the specifications of the languages.

Results of W3C development activities are published as W3C Technical Reports. The process of developing technical reports is described in the W3C Process Document. This overview is based on the analysis of current technical reports of four types: Working Drafts, Candidate Recommendations, Proposed Recommendations, and Recommendations. The four types are listed below in order of increasing maturity:

· A Working Draft (WD) represents work in progress; it is a draft document and may be updated, replaced, or obsoleted by other documents at any time.
· A Candidate Recommendation (CR) has received significant review from its immediate technical community. The document is an explicit call for implementation and technical feedback.
· A Proposed Recommendation (PR) represents consensus within the group that produced it and has been proposed by the Director to the Advisory Committee for review.
· A Recommendation (R) represents consensus within W3C. W3C makes every effort to maintain its Recommendations (e.g., by tracking errata, providing testbed applications, and helping to create test suites) and to encourage widespread implementation. The practice in W3C is to collect all known errors in a Recommendation into an errata document referred to in the Recommendation.

2. Classification of the Languages

Considering the purpose of the XML-related languages developed at W3C, four main categories can be identified. The first category consists of the different versions of XML itself. XML is intended for representing information as structured documents on the Web and for defining special languages for special purposes. The other three categories are called XML Accessories, XML Transducers, and XML Applications:

XML Accessories are languages which are intended for wide use to extend the capabilities specified in XML. Examples of XML accessories are the XML Schema language, which extends the definition capability of XML DTDs, and XML Names, which extends the naming mechanism so that a single XML document can contain element and attribute names that are defined for and used by multiple software modules.

XML Transducers are languages which are intended for transducing some input XML data into some output form. Examples of XML transducers are the style sheet languages CSS and XSL, intended to produce an external presentation from some XML data, and XSLT, intended for transforming XML documents into other XML documents. A transducer language is associated with some kind of processing model which defines the way output is derived from input.
XML Applications are languages which define constraints for a class of XML data for some special application area, often by means of a DTD. Examples of XML applications are MathML, defined for mathematical data, and XML-Signature, intended for digital signatures.

XML accessories and XML transducers are often XML-based languages and thus also XML applications. In this report, however, a language is classified as an XML application only if it has not been included in the accessories or transducers.

The languages in the XML applications category can be further divided into four subcategories according to the application area:

· Non-textual forms of data like mathematical data or voice.
· Web publishing, to replace HTML with an XML-based representation format.
· Semantic web.
· Web communication and services.

The following sections introduce the languages according to the classification given above. The sections include tables listing the specification documents and those W3C Technical Reports which are closely related to the specifications. In the tables there are links to the specifications and other reports as they were at the date of this report. In cases where the target of a link in this overview document is outdated, the target itself provides a link to the latest version of the W3C document. The tables also show the current phase of the specification process (WD = Working Draft, CR = Candidate Recommendation, PR = Proposed Recommendation, or R = Recommendation). As a reminder of the emergent nature of the W3C specifications and their continuing redevelopment, the links to Recommendations (R) are associated with links to their errata documents. Note that all specifications described by Working Drafts (WD) are work in progress and may change in any way.

3. XML

The XML development started in 1996.
The use of HTML (HyperText Markup Language) as the publishing language of the Internet had expanded quickly at the beginning of the 1990s. The capabilities of HTML to encode information were, however, very limited, and there was a need to agree on a generic markup language straightforwardly usable over the Internet. SGML (Standard Generalized Markup Language), published as an ISO standard in 1986, had been widely accepted as a generic markup language for digital documents, but the large collection of rules in SGML and the number of different optional features caused problems in the implementation and utilization of SGML. The goal in the XML development was to restrict the rules of SGML and thus to ease the writing of programs for processing documents.

The first W3C Recommendation for XML 1.0 was published in February 1998, the second in October 2000. The Second Edition of XML 1.0 incorporates the changes dictated by the first edition errata. The second edition does not specify a new version of XML. Work has started on XML version 1.1 (formerly known as XML Blueberry). Table 1 includes links to the different XML specifications and also to those W3C documents which describe an abstract model for XML documents.

Table 1. Specifications for XML

Columns: Document, Phase (R, PR, CR, WD), Month and Year

- Extensible Markup Language (XML) 1.0, R, Feb. 1998
  Errata: XML 1.0 Specification Errata
- Extensible Markup Language (XML) 1.0 (Second Edition), R, Oct. 2000
  Errata: XML 1.0 Second Edition Specification Errata
- XML 1.1, CR, Oct. 2002
- XML Blueberry Requirements, WD, Sept. 2001

Abstract models for XML documents:

- XML Information Set, R, Oct. 2001
- XML Path Language (XPath) Version 1.0, R, Nov. 1999
- Document Object Model (DOM) Level 1 Specification Version 1.0, R, Oct. 1998
- Document Object Model (DOM) Level 2 Core Specification Version 1.0, R, Nov. 2000
- XQuery 1.0 and XPath 2.0 Data Model, WD, Nov. 2002

The XML specifications describe the concrete syntax of XML documents and, partially, the behaviour of an XML processor, i.e., a software module used to read XML documents and to provide access to their content and structure. Slightly different abstract models for information available in XML documents have been introduced at W3C:

· The XML Information Set specification defines an abstract data set called XML Information Set (Infoset). The definitions in the specification are intended for other specifications that need to refer to information in a well-formed XML document.
· The XPath Data Model is included in the XML Path Language (XPath) specification to support the addressing of parts of an XML document.
· DOM (Document Object Model) is an application programming interface for XML and HTML documents. It defines the way data in a document is structured, accessed, and manipulated. The DOM Level 1 Specification was published in 1998; the DOM Level 2 specifications, published in November 2000, extend and update the Level 1 specification. Level 2 consists of five parts: Core, Views, Events, Style, and Traversal and Range. The underlying data structure of XML documents is in the Core specification. Work on the DOM Level 3 Core specification has started.
· XQuery 1.0 and XPath 2.0 Data Model is intended to define the information contained in the input to an XSLT or XQuery processor.

All four models describe an XML document as a tree structure, but there are differences in the trees and in the information available in them.

XML is intended to be a universal format for data on the Web. To support references to Internet resources, the use of different character sets, and the use of different natural languages of the world, the XML specification uses a set of specifications developed by authorities other than W3C. These specifications are listed in Table 2.
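As a rough illustration of how the tree-structured models described above can be used in practice, the following sketch reads the same small document once through a DOM-style interface and once through a path expression. It is only a sketch under stated assumptions: the sample document and the function names are hypothetical, and the Python standard library modules `xml.dom.minidom` and `xml.etree.ElementTree` were chosen for illustration; they are not mentioned in this report, and ElementTree supports only a limited subset of XPath.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    "<report>"
    "<title>XML Family</title>"
    "<section id='s1'>Introduction</section>"
    "<section id='s2'>XML</section>"
    "</report>"
)


def section_ids_via_dom(xml_text):
    """Walk the document through a DOM-style API (nodes, elements, attributes)."""
    doc = minidom.parseString(xml_text)
    sections = doc.getElementsByTagName("section")
    return [element.getAttribute("id") for element in sections]


def section_ids_via_path(xml_text):
    """Address the same elements with a path expression.

    ElementTree implements only a subset of XPath; the expression below
    selects the 'section' children of the root that carry an 'id' attribute.
    """
    root = ET.fromstring(xml_text)
    return [element.get("id") for element in root.findall("./section[@id]")]


if __name__ == "__main__":
    print(section_ids_via_dom(SAMPLE))
    print(section_ids_via_path(SAMPLE))
```

Both functions see the document as a tree rooted at the `report` element, but they expose it differently: the DOM interface hands back node objects to navigate, while the path expression names the parts of the tree to select.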
Unicode has replaced several different character encoding systems with a uniform encoding in which a unique number is provided for every character, to be used on different platforms, by different programs, and for different languages of the world. In principle, it allows data to be transported through different systems without corruption.
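The idea of one number per character can be sketched in a few lines of Python. This is a minimal illustration, not part of the report: the function names and the sample string are hypothetical. It shows the code point assigned to each character, and that text survives an encode/decode round trip when both sides agree on the encoding but is corrupted when they do not.

```python
def code_points(text):
    """Return the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def round_trips(text, encoding):
    """Check that text is unchanged after encoding and decoding with one encoding."""
    return text.encode(encoding).decode(encoding) == text


def misread(text, written_as, read_as):
    """Decode bytes with a different encoding than the one used to write them."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    sample = "Jyv\u00e4skyl\u00e4"  # hypothetical sample containing a non-ASCII letter
    print(code_points(sample))
    print(round_trips(sample, "utf-8"), round_trips(sample, "utf-16"))
    print(misread(sample, "utf-8", "latin-1"))
```

The last call shows the kind of corruption a shared character repertoire is meant to avoid: bytes written as UTF-8 but read as Latin-1 no longer represent the original characters.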