STG XML Validation Form


STG's XML 1.0 Reference Validator

Abstract

This report examines why validation, and readily available validation facilities, are critical to the rapid dissemination and success of XML; it also introduces a new, public reference validator intended to help fill this niche.

Table of Contents

A Surprising Fact
What's Wrong With Invalid XML?
Valid XML and the DTD
DTDs and STG's Validator
Using the Validator (aka "Quick Start")
Inside the Validator
Availability

Note: This report was originally written in October of 1998 - at a time when there were no complete, working, web-available XML validators. Since then, in addition to STG's validator, one other web-available validator has appeared (author: Richard Tobin). Others will doubtless follow.

A Surprising Fact

With all the hubbub surrounding XML lately - all the conferences, debates, books, papers, and articles - it is a surprising fact that only a small fraction of the XML available on the net is actually valid; i.e., only a small fraction of it follows the full February 1998 W3C XML 1.0 spec. The reason for this is simple: there isn't much XML software (as yet) to adequately generate and check it. Nor are there any full, working, web-based XML validation services analogous to what we see in the HTML world.

Access to validation services, however, is critical to the success of XML, because without it we end up back where we started, i.e., back in the very same chaos that prompted the development of XML in the first place. In an effort to help reduce the chaos, and to make validation facilities more broadly available, Brown University's Scholarly Technology Group (STG) has placed on its website a public reference XML 1.0 validator. This report examines the rationale behind that validator and offers a brief semi-technical overview of its design.

What's Wrong With Invalid XML?
http://cds.library.brown.edu/service/xmlvalid/Xml.tr98.2.shtml [6/20/14 12:22:39 AM]

The ubiquity of invalid XML documents (or, more broadly, our inability to detect them easily as such) presents a serious obstacle to the rapid dissemination and success of XML, because it perpetuates the same interoperability problems that have hampered the development of XML's cousin, HTML.

As most Web designers and programmers are well aware, nonconformant HTML (i.e., HTML that fails to validate against an IETF or W3C standard) is, in many quarters, more the rule than the exception. Nonconformant HTML often works out in practice, though, because browser manufacturers, in addition to creating their own HTML extensions, have managed to work around most of the mistakes that programmers and authors typically make. But the manufacturers can't anticipate every possible mistake, and neither can every piece of software we use with our HTML. As a result, HTML software is something of a free-for-all: some software works fine with some HTML; other software breaks on the same material.

The fundamental reason why HTML software has become such a free-for-all is that HTML began its life with no formal specification. Worse yet, when formal specifications finally did begin to appear, they came too slowly to be of much use to Web designers and programmers. As a result, every browser manufacturer felt obligated to define its own version of HTML, and Microsoft and Netscape also felt it necessary to hire armies of programmers to figure out what their competitors were doing. The result has been a dramatic increase in the cost and complexity of HTML processors - and an interoperability nightmare.

Valid XML and the DTD

With XML ("Extensible Markup Language"), the situation is potentially quite different from what we have seen with HTML. With XML we don't have to worry as much about browser manufacturers arbitrarily redefining the specs.
Nor do we have to wait for standards bodies to reach consensus. With XML, each of us has the power to take matters into our own hands: to define our own markup language, or to extend an existing one - and to decide what is, and isn't, a valid construct in that language. What is more, we can do all this in a way that conforming XML processors will understand. In other words, we can do it without creating the same interoperability problems that have dogged HTML.

The mechanism through which XML grants us these powers is the document type definition (DTD) - a document that specifies what elements, attributes, and entities an XML document instance may consist of, and in what order and combination. With a DTD (and a stylesheet) users have close to total control over the language and presentation of their documents. (Although HTML has official DTDs, they are controlled by standards organizations, are rarely used, and often do not reflect actual practice.)

DTDs and STG's Validator

Despite the freedom that XML DTDs can give us, there is, as yet, little software that allows anyone to take advantage of them. Most XML processors available now essentially ignore the DTD. And of those that do full (DTD-aware) validation, only one, as of this writing (Oct 98), is freely available over the Internet (I have not yet managed to get that validator, based in Korea, to work). See Robin Cover's definitive XML testing and validation resource list.

The absence of a full, working, publicly available XML reference validator creates a critical gap, especially now that consortiums have begun popping up everywhere, defining their own XML-based formats and laying claim to XML's platform independence and interoperability. Without widely available validation facilities these claims are hollow, because there is no way to verify, or enforce, actual conformance.
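The DTD mechanism described above can be sketched with a small example. The element and attribute names here are invented purely for illustration; the point is that the internal DTD subset declares exactly which elements may appear, in what order, and with what attribute values, and a validating processor checks the instance against those declarations:

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- Element declarations: a memo is a 'to', then a 'from', then a 'body'. -->
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!-- Attribute-list declaration: 'priority' is restricted to three values. -->
  <!ATTLIST memo priority (low | normal | high) "normal">
]>
<memo priority="high">
  <to>Staff</to>
  <from>STG</from>
  <body>Please validate your XML.</body>
</memo>
```

A validating processor would reject, say, a memo whose body preceded its from element, or a priority value outside the declared list; a processor that checks only well-formedness would accept both.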
Perhaps not surprisingly, even an informal check of actual and proposed XML interchange formats reveals that most do not reflect valid XML 1.0 constructions. Some are so far from the spec that one wonders how anyone could call them XML. Until there is a publicly available reference XML validator people can point to, it will be difficult to stem the tide of this faux XML, and to get down to the business of creating genuinely interoperable formats and field-testing the XML processors that are to operate on them. It is in an effort to fill this need for an XML reference validator that the Brown University Scholarly Technology Group (STG) has placed on its website a simple form-based XML 1.0 validation system.

Using the Validator

Using STG's XML validator is easy. Just go to the Web form, and either type in a local filename or paste some actual XML into its text field; then click on the validate button. The validator will then either respond with a "validates OK" message, or else output a list of error and warning messages.

Inside the Validator

The overall design of STG's system is tripartite. It is a familiar design common to many "traditional" web-based interfaces. It consists of:

1. a static HTML form
2. a short (500-line) Perl script
3. a back end written with stock programming utilities (e.g., YACC and Lex)

The back end (component 3 above) is written specifically for legacy computer systems that lack intrinsic library support for Unicode and that may even have old-style SGML catalogs around. It validates at a rate of about ten seconds per megabyte on an old dual 125 MHz HyperSparc 20 server, and about four seconds per megabyte on a Pentium Pro 200 desktop. For more information on the back end, see its Unix man page.
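To make concrete the kind of mistake a DTD-aware validator reports (this fragment is an invented illustration, not actual output or input from STG's system), consider a document that is perfectly well-formed - so a non-validating processor accepts it without complaint - yet violates its own DTD in two ways:

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <from>STG</from>           <!-- invalid: 'from' appears before 'to' -->
  <to>Staff</to>
  <subject>Hello</subject>   <!-- invalid: 'subject' is never declared -->
  <body>A short test memo.</body>
</memo>
```

Only full (DTD-aware) validation catches the misordered children and the undeclared element; this is precisely the checking that most current XML processors skip.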
The Perl script (component 2 above) is something of a bottleneck, but it uses the now nearly universal CGI interface, and has the advantage of being portable and easy to maintain. The same might be said of the static HTML form (component 1 above), which provides a simple, effective, maintainable entry point into the system. Obviously it would be nice to have an XML-based entry point, but the software is not yet available to support this.

Availability

The reference validator's back end has just finished a brief in-house alpha testing, and the system as a whole is now ready for public access on STG's main website:

http://cds.library.brown.edu/service/xmlvalid/xmlvalid.var

We consider the system to be in beta testing now, and we invite bug reports. (Doubtless there will be more than a few of these.) The source code for the parser is available at STG's website, as are binaries for a few platforms. Please direct questions or comments on the system, or on any of the issues surrounding its release, to the STG staff (address below).

STG: [email protected]

XML Validation Form

To validate a small XML document, just paste it into the text field below and hit the validate button. If the document is too large to be conveniently pasted into the text field, enter its filename into the local file field. You may also validate an arbitrary XML document on the Web by typing its URI into the URI field. For more instructions, see below. See also the FAQ.

The form offers three input methods - a local file field, a URI field, and a text field - each with options to suppress warning messages and to relax namespace checks.

Instructions

This interface offers full XML 1.0 validation facilities.