
Improving Web Search: New challenges and new techniques
Product Briefs: XML • SVG • Smart cards • PDA security
Semantic Resources: Test suites • Validators • Tools • Resources

WWW10 in Asia

With broadband access available to 95 percent of households and all businesses, Hong Kong seemed an appropriate choice for the first Asian host to the International World Wide Web conference series. WWW10 started off with a roar as two lion dancers welcomed the 1,200 guests to the rhythm of drums and traditional instruments off stage. Over the days that followed, attendees had access to more than 300 presentations and papers.

The World Wide Web Consortium (W3C) has made much progress toward fleshing out the "semantic Web" since last May's Amsterdam meeting. In Hong Kong, working group members got to show off phase-next specs and tools and continue the ongoing dialogue.

The Road Ahead

To kick off his opening keynote, W3C director Tim Berners-Lee officially released the XML Schema recommendation. With that done, he turned to the crowd and asked, "Are we done yet?" as he followed a hyperlink in his presentation to a component diagram showing the W3C technology road map for reaching the full-fledged semantic Web (see opposite page). Many ellipses represented current, prototyped, or finished work, but just as many represented missing pieces. Clearly, the work is not yet through. (Try viewing the "road map" scalable vector graphic using W3C's Amaya browser, or one of several plug-in viewers.)

The foundations of XML are in place now that namespaces and schemas are ready, but XML remains incomplete. Query functions are still needed, says Berners-Lee, and things like XPointer, XInclude, and XLink are still under development to fill in missing elements. XML also needs mechanisms for privacy, security, quality of service, and handling binary attachments. Moreover, he said, the current spec is complex, and elements in it could be interpreted in conflicting ways. He thus suggested revisiting the architecture and processing model to remove clutter so that XML can be implemented on PDAs, phones, and so on.

A lot was accomplished over the past year: XHTML is stable; SVG is making inroads in usage; and cascading style sheets (CSS) and extensible stylesheet language transformations (XSLT) are in place for presentation. Some convergence questions remain, however, regarding things like methodologies for mixing languages such as XHTML and MathML, or CSS and XSLT.

Infrastructure questions are also gaining importance as "always on" connectivity increases because of broadband and wireless access. What's more, W3C's concern with universal access issues pushes up against the so-called digital divide. Berners-Lee suggested that if the Internet were created again without the installed telecommunications infrastructure found in the West, the network might be a very different thing, and that perhaps choices along the road thus far were not always optimal.

W3C's ability to solve these issues is limited by resources (their annual operating budget is about US$7 million), but some very real questions remain for the technical community as a whole. W3C specs should address such concerns, he said, but they can't solve the digital divide. Those issues are also discussed within the IETF and Internet Society (ISOC).

Berners-Lee closed his talk by congratulating W3C members for great progress toward establishing a solid foundation for the semantic Web, "the stuff of creating harmony and integration." But he finished with a warning about the dangers of "frivolous patents" that threaten open standards.

To help limit patent-related troubles, W3C asks its corporate members to waive royalties before they are allowed into working groups. By supporting open standards, Berners-Lee counseled, everyone will benefit far more from the expanded user base than from proprietary claims on technologies that are tied to specifications.

Semantic Overview

Many working group members were clearly excited about all the ongoing work surrounding the semantic Web activities, which were frequently described as being in the "fun stage" of development. As W3C metadata activity leader Ralph Swick said, "We want to actually build some stuff, like in the early days of the Web."

12 JULY • AUGUST 2001 1089-7801/01/$10.00 ©2001 IEEE IEEE INTERNET COMPUTING Marketplace

Not surprisingly, most sessions at WWW10 focused on technologies like RDF and the aspects of the XML puzzle that fill in the pieces of Berners-Lee's road map. W3C initiatives can be broadly divided into three areas:

• information management (sharing information in small units, including descriptions for privacy and so on),
• process and workflow (semantics and issues like who can do what within a process), and
• trust and proof (ensuring data integrity with digital signatures and other technologies).

See the sidebar, "Semantic Resources," for pointers to some of the tools and utilities for incorporating W3C specifications into development processes.

Device Independence

Web technologies are diverging in non-PC arenas with things like digital TV, the wireless application protocol (WAP), and NTT DoCoMo's iMode service, but W3C is trying to coordinate and cooperate with external standards organizations to minimize fragmentation of the Web.

The W3C device independence (DI) activity wants to develop general frameworks for network access regardless of the devices in use. The activity is only a few months old now, but it includes a number of development efforts including composite capability/preference profiles (CC/PP) and voice browsers.

Profiles and Preferences

CC/PP will provide information to help adapt content to different devices and user needs. A CC/PP profile uses the RDF model and syntax to define device capabilities. RDF triples describe components and properties for hardware, software, and user preferences, and can describe proxy behavior as well.

The CC/PP WG created a working draft in March 2001 to define the structure and recommend vocabulary, but the working group is leaving full vocabulary definitions to groups like the WAP Forum, which adopted CC/PP in its user profiles in 1999, and the 3GPP, which incorporated the framework in 2000. Hidetaka Ohto, chair of the DI WG, said the current proposals aim to define a framework rather than a protocol, and the final details might actually be worked out by the IETF if it comes to a protocol-level answer. A birds-of-a-feather meeting at the 48th IETF meeting included mention of CC/PP.
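To make the triple model concrete, here is a minimal Python sketch of a device profile written as subject-property-value triples. Everything in it is an assumption made for illustration: the component names, property names, and values are hypothetical and are not taken from the CC/PP working draft or any WAP Forum vocabulary, and plain tuples stand in for real RDF syntax.

```python
# Sketch only: a device profile as (subject, property, value) triples.
# All names and values below are hypothetical; they are not taken from
# the CC/PP working draft or from any WAP Forum vocabulary.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

PROFILE: List[Triple] = [
    ("MyDevice", "component", "TerminalHardware"),
    ("MyDevice", "component", "TerminalSoftware"),
    ("MyDevice", "component", "UserPreferences"),
    ("TerminalHardware", "screenSize", "160x160"),
    ("TerminalHardware", "colorCapable", "no"),
    ("TerminalSoftware", "acceptsMarkup", "XHTML-Basic"),
    ("UserPreferences", "language", "zh-HK"),
]


def properties_of(triples: List[Triple], subject: str) -> Dict[str, List[str]]:
    """Collect the property/value pairs recorded for one subject."""
    result: Dict[str, List[str]] = {}
    for s, p, v in triples:
        if s == subject:
            result.setdefault(p, []).append(v)
    return result


def choose_image_format(triples: List[Triple]) -> str:
    """Toy adaptation rule: pick an image variant from the hardware triples."""
    hardware = properties_of(triples, "TerminalHardware")
    if hardware.get("colorCapable") == ["yes"]:
        return "color-png"
    return "monochrome-png"


if __name__ == "__main__":
    print(properties_of(PROFILE, "MyDevice"))
    print(choose_image_format(PROFILE))
```

A server or proxy could run a lookup like `properties_of` against the profile it receives and then apply rules such as `choose_image_format` to decide which variant of a page to send.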

[Figure: W3C technology road map, a component diagram topped by the goal "Full potential of the Web." © 2001 Tim Berners-Lee. Used with permission.]

W3C technology road map. In the end, all W3C activities are in service to the top-level goal of reaching the semantic Web's full potential. Arrows indicate "how" things are implemented; following them in reverse indicates "why" they exist (or should). View the full-size figure at [URL missing]. Legend: goal, LEAD, subgoal, spec, software; current, prototyped, planned, external, v1 done; "this prototypes X," "X depends on this," "X influenced by this."

Speech interfaces

The voice browser WG (http://w3.org/voice/) was founded in May 1999 and is due for rechartering later this year with a clarified IP policy. Current efforts focus on designing a speech interface framework that could be used by people with disabilities as well as by motorists and others who need hands-free network access. The working group has published requirements drafts for LexiconML, which aims at pronunciation extensibility, and Natural LanguageML, which aims to go beyond keywords to a richer natural language comprehension that can include sophisticated linguistic analysis techniques.

Dave Raggett, technical lead for W3C's work on voice browsers, says speech recognition technology now boasts greater than 90 percent accuracy, especially with constrained entry (as with lists of choices). Voice interfaces are currently used mainly for personal assistants, voice portals, and front-end call centers, but future usage will be multimodal (matching voice and graphical display), which first requires a dialogue-based interaction model.

VoiceXML is based on forms, but it includes tags for events and actions that can be used for voice interfaces to applications. The current view of the architecture includes a VoiceXML gateway between the PSTN/VoIP network and the Web site, which will also have a front end that feeds audio files and parses grammars.

W3C Town Meeting

On the final day of the regular conference, a panel of W3C staffers answered questions about W3C policies and activities at an open-microphone session. The discussion turned at one point to how W3C prioritizes its activities, and the panel admitted that there aren't really any set rules.

The activity process starts when the W3C team or members propose a topic. The member organizations vote on whether to commit resources to the topic, and the W3C director makes the final decision. Some in the audience suggested adding online voting forms.

In general, the ad hoc per-project approach no longer seems sufficient for managing the array of activities the W3C has going. HTML WG chair Steven Pemberton said he is preparing to propose a charter for a "horizontal" working group, like the Web accessibility initiative (WAI), to protect users' interests and help with interactivity coordination. He said NIST, IBM, and Sun have already expressed support for the idea.

When asked about conformance testing for XHTML and other specifications to ensure interoperable implementations and reduce fragmentation, Pemberton added that the consortium is about to launch a "Conformance and Quality Assurance Activity." The activity will focus on ensuring that W3C recommendations are correctly implemented. It should thus help address concerns on adding and improving test suites, which Pemberton said "continually haunt us," not least because even the WGs have trouble reaching consensus on implementation details.

There are already test suites for SVG, the synchronized multimedia integration language (SMIL), MathML, and CSS, and the DOM activity is building them now for all three levels, although one audience member noted that different tests for DOM level 1 interpret things differently. The SVG conformance suite implementation, which actually came from outside the WG, was released at the same time as the original spec, and a new version is due out soon.

Conclusion

As modern living becomes ever more reliant on Internet-based applications, the specifications that rule the Web are increasingly important to the international community. W3C is reaching out to the developer community by publishing implementation reports and adding more public areas to its Web site.

For information on next May's conference in Honolulu, Hawaii, see [URL missing].

Improving Web Search

The information retrieval community has been wrestling with search issues for the past 25 years or so, but the Web adds a new aspect over the relatively controlled corpus of information in a typical enterprise setting. On the Web, data is primarily unstructured, remote, hyperlinked, and often dynamic.

New Challenges

The "Beyond Keyword Search" panel discussion at the 10th World Wide Web Conference started with such existential questions as, "What is the value of Web search?" and turned to some technical fundamentals like, "Is search a computational problem?" In the end, the panel worked through to some research topics that can improve the search experience for frustrated users.

"Link analysis is a good start for bootstrapping semantics, but it is not enough," said moderator Prabhakar Raghavan (see the interview on enterprise portals with him in this issue of IC Online at http://). The panelists agreed that semantic representations should be made in both natural language and machine-usable computational descriptions. That means understanding user needs as well as the relationships between documents, which reside in a "free-for-all" space on the Web.

New Techniques

Leveraging metadata is vital to the effort to mine the diffuse underlying structure because the keyword interface is too weak to leverage the inherent relational structure that exists on the Web.

AltaVista's Andrei Broder suggested that next-generation search techniques will use semantic context analysis and cross-information integration to help determine "the need behind the query."

John Lowe of AskJeeves believes advances in search will come from the query side. "Dialog management will bring dramatic improvements using linguistics and voice-recognition techniques," he said, but task and content-dependent evaluations must first be standardized to enable conversational searches.

- Steve Woods
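As a rough illustration of the link analysis the panel discussed, the Python sketch below ranks pages by repeatedly passing score along hyperlinks, a PageRank-style power iteration. The panel did not name a specific algorithm, so the choice of this one is an assumption, as are the damping factor, the iteration count, and the toy graph.

```python
# Sketch only: PageRank-style link analysis by power iteration.
# The algorithm choice, damping factor, iteration count, and toy graph
# are illustrative assumptions, not details reported from the panel.
from typing import Dict, List


def rank_pages(links: Dict[str, List[str]],
               damping: float = 0.85,
               iterations: int = 50) -> Dict[str, float]:
    """Score pages; `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ordered = sorted(pages)
    n = len(ordered)
    scores = {page: 1.0 / n for page in ordered}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in ordered}
        for page in ordered:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in ordered:
                    new_scores[other] += share
        scores = new_scores
    return scores


if __name__ == "__main__":
    toy_web = {
        "home": ["specs", "tools"],
        "specs": ["home"],
        "tools": ["home", "specs"],
        "orphan": ["home"],
    }
    ranked = sorted(rank_pages(toy_web).items(), key=lambda kv: -kv[1])
    for page, score in ranked:
        print(f"{page}: {score:.3f}")
```

The scores sum to 1, and pages that attract links from well-linked pages rise to the top. A ranking like this uses only the hyperlink structure and ignores what the pages say, which fits Raghavan's remark that link analysis alone is not enough.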

Marketplace

XML Resources Update

IC Online's XML resources page has been updated to include the newest products, standards, and development utilities in the ever-expanding XML space, from graphical user interfaces (GUIs) and wireless technologies to Web services.

Web services are a new breed of self-describing, self-contained modular applications engineered to be platform-neutral, operate within heterogeneous environments, and interconnect to create products and services on demand. The applications are usually written in a platform-specific programming language and then networked so that they launch and coordinate with each other in real time.

Web services won't really take off until they are attached to an intuitive user interface and can be accessed across a wireless network. It's a bold vision and one that still has a ways to go before being realized, but the basic concepts are sound and the recent shift toward service-based architectures and lightweight, cross-platform clients has moved the Internet a few steps closer to ubiquity.

- Lisa Rein

PDA Encryption

Certicom's new movianCrypt software uses a password-based user log-in system and 128-bit Advanced Encryption Standard (AES) to encrypt data stored on Palm PDAs running Palm OS 3.0 or higher and Handspring Visors running Palm OS 3.1 or higher. The company says applications run unmodified, but data is encrypted on the fly as it is stored and decrypted as it is accessed.

Certicom also shipped version 1.1 of its IPSec-based movianVPN client for handheld devices. The software supports two-factor authentication mechanisms (such as the RSA SecurID token cards used in gateways from Alcatel, Cisco Systems, and Nortel Networks) for devices using both Palm OS 3.5 and WinCE 3.0.

Further information is available from [URL missing].

Getting Smart

In response to the disproportionate number of fraudulent Internet-based versus offline sales, MasterCard has started several initiatives aimed at end-to-end protection. The company has introduced new merchant rules and authorization schemes that use the secure electronic transaction (SET) protocol and 3DSET, and users can even get a virtual card just for Internet commerce. Bundled with IBM's e-wallet, the virtual card includes a security code and card number but no physical card.

MasterCard has also initiated several smart-card programs throughout the Asia-Pacific region in the past five years. Most employ a public-key infrastructure (PKI) and the Multos operating system (http://www.multos.com/), which can run multiple applications on each card. Various cards are already being used by banks in 16 Chinese cities, and the Taipei city government and the National Interoperability Project in Australia are using them for identification, tracking, transportation, e-cash, and several other applications.

The Apache XML group has released the open-source Batik 1.0 (http://xml. ...), a Java-based toolkit for Scalable Vector Graphics (SVG). The toolkit comprises a set of modules for generating, manipulating, transcoding, and searching SVG images. For example, Java-based applications can export graphics as SVG using Batik's SVG generator.

The Apache team plans to update the toolkit to reflect modifications in SVG, which is currently a W3C Proposed Recommendation. Batik 1.0 supports most of the static features in the SVG basic effectivity test suite; future releases should support dynamic SVG behaviors, such as scripting and animation with the synchronized multimedia integration language (SMIL).

Semantic Resources

Test Suites / Validators
• Cascading Style Sheets
• HTML Validation Service
• Mathematical Markup Language
• Scalable Vector Graphics (SVG)
• IBM XML Schema Quality Checker

Web Tools
• Amaya browser
• Annotea annotation application
• DC-Dot Dublin Core editing tool
• HTML Tidy
• Jigsaw Web server
• Jena Java API for RDF (... rdf/jena/index.htm)
• Redland RDF application framework (beta)

Other Resources
• DARPA Agent Markup Language
• Dave Beckett's RDF Resource Guide
• Dublin Core Metadata Initiative
• IC's XML Resources Pages
• Int'l Semantic Web Workshop, 30-31 July 2001, Stanford University, http:://
• Ontology Inference Layer (OIL)
• RDF Interest Group
• RDF Interest Group "scratch pad"
• RDF at [URL missing]
• RDF Site Summary (RSS) 1.0
• XML Cover Pages