XML Support for Tcl
Total Page:16
File Type:pdf, Size:1020Kb
The following paper was originally published in the Proceedings of the Sixth Annual Tcl/Tk Workshop San Diego, California, September 14–18, 1998 XML Support for Tcl Steve Ball Zveno Pty Ltd For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: [email protected] 4. WWW URL:http://www.usenix.org/ XML Support For Tcl Steve Ball Zveno Pty Ltd http://www.zveno.com/ [email protected] document. These elements are defined using a Abstract Document Type Definition (DTD), along with XML is emerging as a significant technology for use allowed general entities. The DTD includes rules on both the World Wide Web and in many other which govern what elements or text are allowed to application areas, such as network protocols. occur inside each element. In contrast, HTML gives Documents written in XML have a rich, hierarchical the author a fixed set of tags to use, and those tags structure, the document tree. An application which is have fixed semantics and behaviour. to process XML documents must be able to access and manipulate the document tree in order to be able XML has been designed with the goal of being easier to examine and change the structure. for tool developers to write software for processing documents, so that there will be plentiful applications The DOM is a language-independent specification of available for handling XML. The syntax is far more how an application accesses and manipulates the restricted than SGML, with most optional features document structure. TclDOM is a Tcl language removed. For example, tag minimisation and binding for the DOM. The TclDOM specification omission are not allowed so it is easy to detect the provides a standard API for Tcl applications to end of an element. Empty elements must be explicitly process a XML or HTML document. marked. TclXML is a Tcl package which provides a sample There are two levels of conformity for a XML implementation of TclDOM. It provides XML document. At the very least, a XML document must parsers along with the tools needed to create a be well-formed. A well-formed XML document hierarchical representation of documents which can adheres to all of the syntactic rules of the XML be conveniently processed by a Tcl script. There are specification. All elements must have an end tag, also facilities to check the validity of a document, unless they are empty elements in which case the tag along with commands to produce document output. must include a trailing slash, such as <BR/>. Element TclXML provides a framework for parser and attributes must have a value and the value must be validator modules which allows some or all of the quoted. There may be no occurrances of illegal various components to be implemented in an characters, and so on. XML has been designed so that extension language. a program can check that a document is well-formed without having to use the document's DTD. Keywords: Tcl, World Wide Web, WWW, XML, DOM, Parsing In addition to being well-formed, a XML document may also be valid. A valid XML document conforms A Brief Introduction To XML and DOM to the rules laid out in its associated DTD. This means that all elements have content which is XML allowed according to the element's definition of its content model in the document's DTD. All attributes The eXtensible Markup Language, XML [XML], is a used in elements must also be permitted for that Recommendation from the World Wide Web element and their value must be of the correct type. Consortium (W3C) [W3C] which is a significant All attribute values which are used for identification simplification of SGML, intended to make a subset must be unique, and so on. Validity is a much SGML suitable for use on the Web. XML provides a stronger assertion for a document than well- much more meaningful and rich way in which to formedness, but requires access to the document's represent semantic structure in a document when DTD and more processing time to check the various compared to HTML. With XML, a document author rules. can "call a Spade a <Spade>". That is to say, elements can be given names which are meaningful XML is internationalised and support for Unicode is to the author and convey more of the semantics of the a requirement of XML, so Tcl version 8.1 is well XAPI-J and SAXDOM. A language binding for Tcl positioned to be used for software applications which is discussed in this paper. handle and process XML documents. Other Proposed Standards The following document is a small example of a XML document instance: XML will never replace HTML because XML does not supply any of HTML's semantic features, in <?xml version="1.0"> particular the elements which provide presentational <!DOCTYPE paper SYSTEM "paper.dtd"> and/or behavioural semantics such as STRONG, EM, <paper> UL and FORM. HTML elements perform a function <title>TclXML Example</title> when they are used in a document, whereas XML <abstract>This is an example of a XML document instance. elements do not. XML elements only provide </abstract> syntactic declaration of a document's structure. Of <introduction>XML uses the course, this is what makes XML useful for a wider <acronym><short>DTD</short> range of applications than HTML. <full>Document Type Definition</full></acronym> In order to provide certain semantics for the elements to define classes of documents. used in a XML document other facilities are required. Such a document is known as a A number of languages have been proposed for these <definition>document purposes. The XML Link Language, XLL, provides instance</definition>. the hyperlinking model for XML. XLL has been </introduction> derived from TEI and HyTime and as a result will be <point>A DTD is used to define the a much richer model than HTML's. For example, elements and entities that are XLL will allow bidirectional and multi-destination allowed to appear in a document. links, as well as allowing link targets to specify and Empty elements have a trailing use the structure of the destination resource. In order slash: to render XML documents for viewing or printing a <figure image="figure-1.png"/> </point> stylesheet language will be used. Either Cascading <conclusion>XML is way better than Style Sheets (CSS) [CSS] will be used or the new HTML.</conclusion> proposed XML Stylesheet Language (XSL). XSL <references></references> will also include scripting. </paper> Tcl Support For XML DOM Tcl can be very useful for processing and generating The Document Object Model (DOM) [DOM] is a XML documents, just as Tcl is useful for handling language-neutral specification of a standard HTML documents in CGI scripts. There are many Application Programming Interface (API) for both applications which may be able to use a scripting accessing the features and the content of a document language for document processing, both in GUI and and creating or modifying documents. Documents are non-GUI modes and in standalone, server and client in the form of a tree and the DOM has methods of environments. accessing properties of the tree nodes. DOM provides a core set of features and specific features may be Since applications will have varying requirements, provided for certain markup languages. In the first different supporting libraries may be used to process instance, DOM is providing a specification of documents. Some of these may be implemented features for XML and HTML. The aim of the DOM entirely in Tcl, for cross-platform portability, some is to allow application developers to write software may use C or C++ extensions to improve which manipulates a document and to use whichever performance and some may interface with Java programming language is suitable for the task. The components. Ideally, there will be a standard same underlying constructs will be available in programming interface for applications written in Tcl whatever language the developer chooses. to access the document structure for processing. It is the goal of TclDOM to provide that standard DOM already has language bindings for Java and interface, based upon the DOM specification. ECMAscript. Many tools are being written in Java for processing XML, and standard interfaces are XML support in Tcl has two distinct parts. Firstly, starting to emerge for Java components, such as TclDOM is a language binding for the DOM. TclDOM provides an implementation-independent may need to access the document's tree structure specification of a Tcl API of the DOM for Tcl scripts while at the same time accessing the document's XSL to access and manipulate XML and HTML stylesheet, which is itself a XML document. documents. Secondly, TclXML is a sample implementation of TclDOM which provides XML Previous Work parsers and generators. Uhler's html_library Zveno is making the TclDOM specification and TclXML freely available. Both the TclDOM Stephen Uhler wrote an all-Tcl HTML parser called specification and the TclXML distribution may be html_library [Uhler95]. This library was found on the Zveno website [TCLXML]: sufficiently generalised internally to be easily http://www.zveno.com/zm.cgi/in- adapted for parsing XML documents. Indeed, the tclxml/ tokeniser which is part of the Tcl XML parser in TclXML is derived from html_library. TclDOM html_library translates a HTML or XML TclDOM is a Tcl language binding for the DOM. The document into a series of calls to a Tcl procedure. specification details how to access DOM features Hence, it is an event-based parser where the using Tcl constructs. The goals for TclDOM are: to application is notified of the start and end of provide access to all features of the DOM from a Tcl elements.