Describing Namespaces with GRDDL
Total Page:16
File Type:pdf, Size:1020Kb
Describing Namespaces with GRDDL Erik Wilde, ETH Zurich¨ (Swiss Federal Institute of Technology) ABSTRACT combine the human-readability of HTML documents which Describing XML Namespaces is an open issue for many users machine-readable semantics, the Resource Directory Descrip- of XML technologies, and even though namespaces are one tion Language (RDDL) was invented. RDDL used the un- of the foundations of XML, there is no generally accepted popular XLink standard, and thus the Gleaning Resource and widely used format for namespace descriptions. We Descriptions from Dialects of Languages (GRDDL) [3] lan- present a framework for describing namespaces based on guage was developed, which defines a way how to expose the GRDDL using a controlled vocabulary. Using this frame- machine-readable information as RDF [5]. work, namespace descriptions can be easily generated, har- The GRDDL model assumes that the machine readable vested and published in human- or machine-readable form. information of an XHTML page is extracted by using an XSLT program that transforms the GRDDL into RDF state- ments. Even though this adds little to the expressiveness 1. DESCRIBING SCHEMAS of RDDL, it is a little bit easier to handle (because the While an XML Schema describes constraints for a class machine-readable information can be encoded at the users of XML documents, but it does not convey any information discretion, as long as it can be transformed into RDF), and about the semantics. For most application scenarios, the may gain more popularity because it uses RDF rather than semantics of a schema are described in some informal way, XLink. in most cases using simple prose. It would be useful to have A namespace description is a set of machine-readable in- some standard mechanism how to associate this human- formation about how other resources are related to the name- readable description with the XML Schema in a standard space, such as schemas defining the namespace’s vocabulary, way. There is no established way for doing this, and there or documentation for the namespace application. In order to are two main approaches to solve this problem: (1) It is pos- make GRDDL work, this vocabulary of how linked resources sible to embed additional information within the schema. related to the namespace must be well-known. (2) Since the targetNamespace of a schema defines a name- space name according to the XML Namespaces [2] recom- 2.1 Description Roles mendation, it is possible to associate additional information GRDDL namespace descriptions serve as a supplement to for a schema through the namespace name. the documentation, providing machine-readable documen- While both approaches solve the problem, we chose to tation that can be used to compile a directory of schemas. adapt the second approach, because it better separates the To make this directory as rich as possible, a number of roles schema implementation from the schema description. Schema that must or can be used to describe schemas has to be de- descriptions as described here are generic enough to not only fined. These roles describe how a particular resource relates serve as XML Schema descriptions, but as descriptions for to the namespace being described. any vocabulary associated with a namespace name. There- The description roles thus constitute that fraction of name- fore, we subsequently refer to them as namespace descrip- space description that should be available in a machine- tions. The Web architecture [4] does not define a data readable way, so that it can be collected and processed. format for namespace descriptions, but recommends that Technically, the resource roles are defined in a simple XML namespace names should point to some kind of description. document, which serves as configuration for defining the GRDDL namespace description format described in the fol- 2. NAMESPACE DESCRIPTIONS lowing section, and for the RDF Schema used for the infor- Despite the fact that the Namespaces specification itself mation extraction process described in Section 3. does not require any resource to be available at a name- space’s URI, it is convenient if this is the case. The approach 2.2 Creating Descriptions of having HTML pages serving as namespace descriptions is Schema authors must create GRDDL descriptions for their useful for humans, but makes it very hard to process this schemas, and they must follow the guidelines requiring cer- information automatically. In many cases, it would be ben- tain kinds of roles to be present. However, schema authors eficial to have machine-readable namespace descriptions, be- are free how they create the GRDDL. While many choose to cause these could be collected and compiled into a database write their GRDDL by hand, others don’t want to “learn” of namespaces and related resources. GRDDL, even though it is very easy to learn. In an effort to create a namespace description language to For these users, we provide a simple XML Schema that can be used to capture all the information required for a GRDDL document, and an XSL Transformations (XSLT) Copyright is held by the author/owner. WWW 2005, May 10–14, 2005, Chiba, Japan. program to generate the GRDDL that then serves as name- ACM 1-59593-051-5/05/0005. space description. This schema for namespace descriptions is generated from the list of resource roles described in the able way of validating a schema that is tightly integrated previous section. with a host language, and that XSLT 1.0 does not provide A second schema — also generated from the list of re- any support for checking datatypes. The result of the XSLT- source roles described in the previous section — is required based validation is a report containing a list of warnings or for describing the attributes (this schema does not define errors raised during the validation process. any elements) that convey the machine-readable information within the GRDDL document. Some of the attributes are 4. PUBLISHING DESCRIPTIONS defined to appear on XHTML a elements, augmenting the After the harvesting and validation process, GRDDL doc- link with well-defined semantics. Other attributes appear on uments are processed using XSLT, which after joining the XHTML div elements and are used to enclose textual de- individual RDF graphs results in a single RDF graph de- scriptions. Using these attributes, namespace descriptions scribing all harvested namespace descriptions. This aggre- contain all the information that we want to make accessible gated set of descriptions is published as XML and XHTML. in machine-readable form. XHTML is published as heavily crosslinked pages, en- abling users to retrieve all the information present in the RDF using a regular browser. Using search features, it is xmlns:... possible to search for specific text in all literal informa- xsi:schemaLocation xsd:target .html .xml tion, so that access through the XHTML pages is provided Namespace through search-based retrieval as well. .xsd For users interested in a machine-readable description of the collected data, the data is published as XML. This XML .html uses more traditional structures using standard ID/IDREF references rather than being based on RDF. Even though GRDDL uses an RDF-based data model, it was decided that an application-specific XML Schema is better suited for representing the namespace descriptions. The reason for this is that RDF is not well-suited for processing it with This figure shows the complete model of how namespace XPath/XSLT, whereas a suitably designed XML can be pro- descriptions are used. We assume that the namespace de- cessed very simply by users with relatively little XPath or scription is generated, so that the actual GRDDL document XSLT experience. located at the namespace URI of the schema is an XHTML document generated by XSLT. The generated XHTML con- tains information about associated resources (the schema 5. CONCLUSIONS described, documentation, and transformations), and some Our namespace description approach implements the idea of these associated resources may be namespace descriptions of a light-weight Semantic Web, searching for the middle themselves (versioning and other dependencies). ground between the considerable effort necessary to create machine-readable descriptions for many details of an IT en- 3. HARVESTING DESCRIPTIONS vironment, and the absence of any machine-readable de- scription in the plain namespace handling defined by the As pointed out in Section 2, namespace descriptions are recommendation. GRDDL documents, which means they are XHTML with In an effort to implement the light-weight Semantic Web embedded semantic information. This makes automated as easily, standards-compliant and future-proof as possible, processing of namespace descriptions easy, because only the we employed a variety of Web technologies such as XML, semantic information has to be extracted. This task is best XML Schema, XSLT, GRDDL, and RDF. Using these tech- done by XSLT, which means that harvesting namespace de- nologies and combining them in the most efficient way en- scriptions means collecting GRDDL documents, and then abled us to implement what we consider to be a gap in the using XSLT to generate RDF from these documents. XML landscape of today without too much effort. As pointed out in Section 2.2, we do not require schema authors to generate their namespace description. They can generate it, in which case the schema and the XSLT for the 6. REFERENCES generation will guarantee an error-free namespace descrip- [1] Jonathan Borden and Tim Bray. Resource Directory tion. However, if the namespace description is generated by Description Language (RDDL), February 2002. hand, errors may be introduced, so that the harvesting also [2] Tim Bray, Dave Hollander, and Andrew Layman. Namespaces in XML. World Wide Web Consortium, needs to validate the harvested documents. Recommendation REC-xml-names-19990114, January 1999. Validity in our context means that the harvested descrip- [3] Dominique Hazael-Massieux¨ and Dan Connolly.