The Future of XML: How Will You Use XML in Years to Come?
Level: Intermediate

Elliotte Rusty Harold, Adjunct Professor, Polytechnic University

05 Feb 2008

Elliotte Rusty Harold prognosticates what he thinks is in store for XML. The wheels of progress turn slowly, but turn they do. The crystal ball might be a little hazy, but the outline of XML's future is becoming clear. The exact time line is a tad uncertain, but where XML is going isn't.

XML's future lies with the Web, and more specifically with Web publishing. It seems a little funny to have to say that. After all, isn't publishing what the Web is about? The Web was designed first and foremost as a mechanism to publish information. What else can it do?

Quite a lot. The last three years have seen an explosion of interest in Web applications that go far beyond traditional Web sites. Word processors, spreadsheets, games, diagramming tools, and more are all migrating into the browser. This trend will only accelerate in the coming year as local storage in Web browsers makes it increasingly possible to work offline. But XML is still firmly grounded in Web 1.0 publishing, and that's still very important.

Oneiromancy

Several dreams are coming true this year. Sun's dream of network deployed applications is happening now, although shockingly the language of choice for these applications is JavaScript™, not Java™. This is a missed opportunity of the first order: Sun could have delivered this 10 years ago, but sadly it never had the experience, vision, or interest in the client to make it happen; now Sun is playing a desperate (and doomed) game of catch-up.

Netscape's dream of replacing the operating system with a browser is also coming true this year. Netscape had the vision to see this coming. Unfortunately, it didn't have the business savvy or artistic taste necessary to pull it off.
Nonetheless, Firefox and the Mozilla Foundation, both direct descendants of Netscape, are key players in bringing about this bold new world.

For Microsoft®, the nightmare of a younger, nimbler competitor overtaking it is also coming true. The company was so distracted by Sun and Netscape that it failed to notice Google sneaking up on Office and Windows. Gmail, Google Docs, and similar applications from a variety of sources are rapidly rendering the underlying operating system irrelevant. Sure, you still need an operating system to run a browser, but increasingly no one will care which operating system it is, any more than anyone in the last decade cared who manufactured their PC, as long as it ran Microsoft Windows®. Now no one will care who manufactures their operating system as long as it runs Google. Operating systems are being commodified, just as PCs were. The Windows monarch hasn't been defeated so much as abandoned, leaving Microsoft guarding the gates to an empty castle.

From www.ibm.com/developerworks/xml/library/x-xml2008prevw.html, 3 February 2008

What does XML have to do with this?

For a much-hyped technology, XML has had little to do with this situation. Although the rebels sail under the Asynchronous JavaScript + XML (Ajax) banner, and although the x in Ajax stands for XML, no one uses XML much for any of this. Almost as soon as the acronym was coined, Web developers began replacing XML with raw JavaScript code, passing it around as data, and then executing it with eval(), security issues be damned.

Sidebar: The evalJSON() option
Yes, I know about evalJSON(). Judging from some of the Ajax apps I've seen, I'm more familiar with it than quite a few developers writing Ajax apps, who persist in using eval() years after better options became available.

The problem is one of APIs, not data formats. More specifically, it's a problem with one API: Document Object Model (DOM). Most developers learn DOM first and then never learn any of the alternatives.
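To make the API complaint concrete, here is a minimal sketch, assuming Python (a language this article does not otherwise use). It pulls the same titles out of a small document twice: once through the W3C DOM as exposed by the standard library's xml.dom.minidom, and once through ElementTree, which merely stands in for the kind of alternative API that rarely gets learned. The sample document and both function names are invented for illustration.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<library>
  <book year="2001"><title>Alpha</title></book>
  <book year="2002"><title>Beta</title></book>
</library>"""


def titles_with_dom(text):
    """Collect book titles using the W3C DOM API (xml.dom.minidom)."""
    document = xml.dom.minidom.parseString(text)
    titles = []
    for book in document.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Element text lives in child text nodes, so it is reassembled by hand.
        parts = [node.data for node in title.childNodes
                 if node.nodeType == node.TEXT_NODE]
        titles.append("".join(parts))
    return titles


def titles_with_elementtree(text):
    """Collect the same titles using ElementTree's path-based lookup."""
    root = ET.fromstring(text)
    return [title.text for title in root.findall("book/title")]
```

Both functions return the same list for this input. The difference is only in how much node-level bookkeeping the caller has to do, which is the sense in which the problem is one of APIs rather than data formats.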
These developers don't distinguish between DOM and XML, and thus they confuse their well-founded disgust with DOM with an unfounded disgust with XML. DOM isn't a least-common-denominator API: it's a worst-common-denominator API. You couldn't design a worse API for processing XML if you tried. But developers are extremely resistant to learning new things. Outside the Java community, where JDOM and dom4j have made some progress, better alternatives like E4X and the Amara XML Toolkit remain almost unknown and are actively resisted.

The genius of JavaScript Object Notation (JSON) was also its biggest weakness. Because JSON is executable JavaScript code, it doesn't require JavaScript programmers to learn anything new to use it. A more secure data-transfer format wouldn't have been accepted.

DOM is a millstone around XML's neck. It's the single biggest impediment to broader XML adoption in software development. XML has gone as far as it can in programming while dragging this 2,000-pound boat anchor behind it. Unless the World Wide Web Consortium (W3C) and browser vendors deprecate DOM and replace it with a sane alternative (preferably several sane alternatives: trying to do everything with one API is a large part of why DOM is as bad as it is), XML has run its course in software development, especially Web software development (and increasingly, that's the only kind of software development that matters). The W3C should address the needs of working developers and deprecate a bad spec when required.

Frequently used acronyms
• API: application programming interface
• HTML: Hypertext Markup Language
• SGML: Standard Generalized Markup Language
• W3C: World Wide Web Consortium
• XML: Extensible Markup Language

Is XML dead?

No, I believe that XML has a bright and important future. It just isn't a future that has much if anything to do with either classic or Web software development.
To understand where XML is moving in 2008 and beyond, you have to first look back to 1997 and even earlier to find the origins of XML.

The roots of XML

You have to understand that XML was never meant to be used in software development, at least not in the early days. None of the early specs (XML 1.0, XPath, Extensible Stylesheet Language Transformations (XSLT), Namespaces in XML, Extensible Hypertext Markup Language (XHTML), and DOM) focused on the needs of software developers. If XML had been designed for software development, it would have supported lists and maps and data types as JSON eventually did. XML was instead designed for publishing, and more specifically for publishing Web pages.

XML was an outgrowth of a 20-year-older technology known as SGML. At roughly the same time Codd was at IBM® figuring out how to structure data by shredding it into tiny little unordered pieces, Charles F. Goldfarb, Edward Mosher, and Raymond Lorie were also at IBM figuring out how to structure large ordered documents that would never make sense as tables. Codd was thinking about business data like inventories and financial records. Goldfarb, Mosher, and Lorie were thinking about business documents like annual reports and airplane technical manuals.

SGML was intended to solve publishing problems: how do you write, maintain, update, print, search, and read documents that may run to tens of thousands of pages across a variety of platforms with different processors, character sets, natural languages, operating systems, and vendors? SGML achieved some success with organizations in the government and military sectors that had these needs, and a few technical publishers like O'Reilly made occasional use of it; but overall it was too large and complex for most people's needs, even people in the publishing industry.

SGML's biggest success was also its biggest failure: HTML.
HTML was intended to be an SGML application, but almost none of the people who wrote browsers, editors, or Web pages knew anything about SGML beyond what the acronym stood for. (Many didn't even know that.) Extensions were introduced willy-nilly that rapidly degraded any claim HTML had to SGML conformance. Even the few and expensive SGML tools that then existed couldn't process the miasma of real-world HTML on the Web circa 1996.

This was the situation XML was invented to rectify. On the one hand, it was supposed to simplify SGML down to one reasonable, standard subset everyone could agree on and faithfully implement. The hope was that this simpler specification could achieve the broader adoption that had eluded SGML. In this, XML mostly succeeded. On the other hand, XML was meant to lay the groundwork for a well-formed Web with fewer annoying cross-browser incompatibilities and idiosyncrasies. In this, XML mostly failed. XML and XHTML just introduced yet another dialect of HTML that browsers would have to handle, without even coming close to replacing tag soup.

Success or failure, XML was intended for publishing: books, manuals, and, most important, Web pages. XML wasn't optimized or planned for use in software development outside of publishing. Its use for config files, remote procedure calls, object serialization, database dumps, and similar developer-oriented tasks wasn't anticipated or planned for. Therefore it should come as little surprise that XML isn't always a perfect fit for these chores. Nonetheless, XML did offer developers something they never had before: a platform-independent, language-agnostic, internationally savvy data format with numerous high-quality, free parsers easily available.
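The imperfect fit for chores like config files can be sketched briefly. The Python below is an assumption of this illustration, as are the configuration keys and function names, none of which come from the article. It loads the same hypothetical settings from JSON and from XML. Both parsers are free and in the standard library, but only the JSON text carries numbers, booleans, and lists in the format itself; with XML, the program has to supply those types and shapes on its own.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical configuration, invented for this illustration.
JSON_CONFIG = '{"port": 8080, "debug": false, "hosts": ["a.example", "b.example"]}'

XML_CONFIG = """<config>
  <port>8080</port>
  <debug>false</debug>
  <hosts><host>a.example</host><host>b.example</host></hosts>
</config>"""


def load_json_config(text):
    """JSON carries numbers, booleans, lists, and maps in the format itself."""
    return json.loads(text)


def load_xml_config(text):
    """XML delivers elements and strings; the caller supplies types and shapes."""
    root = ET.fromstring(text)
    return {
        # Every value arrives as text and is converted by hand.
        "port": int(root.findtext("port")),
        "debug": root.findtext("debug") == "true",
        # That <hosts> means "a list of <host>" is a convention of this code,
        # not something the XML itself declares.
        "hosts": [host.text for host in root.findall("hosts/host")],
    }
```

The two loaders produce equal dictionaries for these inputs, but the XML version encodes decisions (which elements are integers, which repeat) that live in the program rather than in the data. A schema language could record those decisions, at the cost of more machinery than this sketch uses.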