W3C on mobile, CSS, multimodal and more

A look at the upcoming standards by W3C

Bert Bos, W3C, December 2005

I saw in an entry on the 22C3 blog the question rather than replace is the recently started if this presentation and my other one on W3C Mobile Web Initiative (MWI), a project inside could be considered to be about hacking. The W3C in close cooperation with OMA. The goal author thought so. Working for W3C certainly is to develop guidelines and tests for how to does feel like hacking: hacking the future. write Web content that is accessible from desk- Working for W3C requires idealism, W3C top computers as well as from mobile phones. isn't a dot-com. It also requires political skills, W3C already had a Device Independence (DI) the only way to reach a long term goal is often Working Group, which is developing technol- by means of compromises and short-term solu- ogy (in particular mark-up and APIs) to facili- tions. And it requires creativity in finding sim- tate semi-automatic adaptation of content for ple solutions for novel problems. And isn't that various devices and situations. the essence of hacking? The CSS Working Group is also working In fact, W3C is a balancing act. It is an on adaptation, in so far as the adaptation is in industry consortium with a humanitarian ideal, the domain of style. The technologies by the viz., to let the whole world communicate eas- CSS and DI Working Groups are complemen- ily. It has to create short-term solutions in order tary. W3C believes mobile technologies will be to reach a long term goal. Sometimes people the next big step on the path to the “Semantic challenge us to choose: is W3C about evolu- Web,” so let's take a closer look at those tech- tion or revolution? In fact, the two aren't oppo- nologies as they are currently being developed. sites, they just happen to rhyme. Let me Adaptation is necessary to make optimal explain why. use of the features of each device and the pref- W3C is revolutionary in the sense that it erences of each user. The assumption is that wants to replace proprietary systems with open most information has an abstract essence (its standards; it wants to give the reader the power semantics) that can be represented in various to use information anywhere at any time; it ways. You display it differently on a big screen wants to create an information system that is than on a small one, you may want to replace a powerful and easy enough to make things pos- layout in four columns by a sequence of small sible that have never been possible before and pages, and if there is sound available or even a that nobody has even thought about yet. And it speech generator, some or all of the informa- wants all these things as soon as possible. tion may pass better in audio. Ideally, the user But W3C is also evolutionary, in that it agent (typically a browser) adapts the content wants to progress with small steps, which to the device and the user's preferences based allow for mistakes to be caught early and on the structure and metadata of that content, which allow existing systems to be integrated. but fully automatic adaptation doesn't yet work It would be a waste if adopting the Web meant very well and the author or somebody else losing everything that is not in HTML and on often has to add extra information to guide the an HTTP server. One of the “glue” systems of adaptation process. the Web is the URL (or URI, as it is now CC/PP is one of the technologies devel- known): Web resources can link to almost any- oped by the DI WG. It is a format for exchang- thing on the Web, it doesn't have to be on an ing device characteristics and user preferences HTTP server. between a client and a server. The format is An example of W3C's wish to integrate based on RDF (and thus on XML). It is exten- 12 Yes 208x320 Yes ISO-8859-1 ISO-8859-2 ... ... GSM_GPRS_IPV4 GSM_CSD_IPV4 ...

Figure 1. A fragment of a device profile in CC/PP with the UAPROF vocabulary.

sible and can be used for almost any kind of default, the URL can point to the version stored characteristics, but it is currently used mostly on the manufacturer's site. for UAPROF, a set of characteristics of mobile The server typically doesn't even have to phones defined by OMA. Most mobile phones fetch the profile when it recognizes a implement UAPROF. well-known URL. The goal of CC/PP is to provide the server The DI WG is finalizing DISelect, an with enough information about the device and XML-based format for storing alternative ver- the user to be able to adapt the content it sends. sions of documents in one file. The idea is that This is especially useful if the author has pre- alternatives typically only differ in small areas pared optimized alternative content for some so you can write most of the document once devices or when the device has limited capabil- and then have multiple branches where needed, ities of adapting content by itself. See figure 1 very much like “#ifdef” in C. The branches for an example profile. depend on conditions, which are typically Profiles can be sent from the client to the device characteristics such as those found in server, but typically they are stored somewhere CC/PP or in Media Queries (see below). Here and referred to by URL. It would be too ineffi- is a fragment (taken from the draft specifica- cient to send the profile with every request. If tion): the profile is unmodified from the factory css">

Figure 2. An example of HTML with Media Queries. The first line uses only the MEDIA attribute from HTML4.

vided by programming libraries on clients and

The flooding was quite that gives programs access to device character- extensive.

istices, the same characteristics that DISelect

hasProperty() and insertProp-

erty(). Via these calls, a program can, e.g., find out what the screen size is or the battery

Many people were evacuated from their homes.

charge and display a warning when the battery is low. This fragment expresses that the ele- Where DISelect is static, DCI is meant to ment (presumably some photo) should only be be dynamic. Programs can not only query and kept in the document if the document is going change device characteristics, they can also lis- to be displayed in a window of more than 200 ten to events that announce that characteristics pixels wide. This example uses a language that have changed. is a mixture of XHTML1 and DISelect and the Media Queries is a technology developed result of processing it with a DISelect imple- by the CSS WG. It consists of a small expres- mentation is thus an XHTML1 document. sion syntax and eleven keywords for device The expression in this case uses a device characteristics. It can be used in CSS, HTML characteristic ('width') defined by Media and XML whenever there are different style Queries (see below), but the syntax is mangled sheets available and the choice of style sheet a bit because expressions in this syntax have to depends on the type of device. Media Queries be compatible with XPath and are, moreover, are a generalization of the MEDIA attribute embedded in an XML file. and @media rule that have been in HTML, The DISelect format expresses what the XML and CSS for many years, figure 2 shows alternatives are for each type of client, but an example. doesn't limit where the adaptation is per- Compare this with the DISelect example formed. That depends on the protocols and the above, which used the same 'width' feature to situation. It may be done by the server, by a exclude an element from an XHTML proxy that a phone provider offers to its cus- file. In this case, the conditions are used to tomers, or by the client itself. determine which style sheets should be used. DISelect is only for static adaptation. The In principle, the Media Queries are DISelect document is processed once and dynamic, in the sense that a change in device yields a new document (an XHTML document characteristics, such as a change in window in the example above). The adaptation isn't size, may cause a different style sheet to be re-run when the device characteristics change. applied instantly. But browsers may have some DCI (Delivery Context Interfaces) is the limitations in how “dynamic” they are, espe- latest and least stable technology from the DI cially when the change requires new style WG. It is an API that should eventually be pro- sheets to be downloaded. The CSS Mobile Profile 1.0 (and there is corresponding Mobile Profile for XHTML1) is body { display: "aaa" "bcd" } not a technology for adaptation. It only defines #main { position: a } the baseline that all implementations must con- #weather { position: b } form to. The idea is that an implementation of #news { position: c } CSS on a handheld device should either imple- #adv { position: d } ment all of the profile or nothing at all. It may implement more, but CSS is big and it is Of course, layouts can be nested. Note espe- unlikely that any implementation implements cially that the order of the elements on the everything, at least not in the short term. screen is determined with the 'position' prop- W3C defined a profile a few years back erty and is thus largely independent of the with help from some mobile phone vendors. order in the source. When the screen is big Since then, the OMA has defined a new CSS enough, less important things can be put above profile, which differs in some details. At the or to the left of more important things, but on a moment, OMA and W3C are working together small screen you want to avoid that. on the next version, which should be both a Such layouts can be combined with Media W3C and an OMA standard. This is one of the Queries to provide several alternative layouts goals of the MWI project. for the same document. Web designers know that you can create The idea of One Web, i.e., access to the grid-like layouts with CSS, such as the very same information, no matter where, when or popular three-column layout found on many with what equipment, also has implications in home pages, but it is not as easy as it should other areas, such as in interaction. Desktop be. The properties of CSS each influence small computers typically have a mouse and a key- details of the layout and you need many of board, but phones don't, or if they have some- them to describe a high-level “grid.” When you thing similar, it it not as easy to use. On a see a style sheet for a particular layout, it is not phone you may want to use voice and gestures, easy to see what layout it describes. maybe also handwriting. W3C's Multimodal Designers who have to provide several Interaction (MMI) WG develops technology different layouts for the same content would for these alternative modes of interaction. In like there to be a way to specify the high-level fact, its goal is to enable switching interaction layout and replace that specification easily with modes in the middle of a session, even switch- another one, without touching other aspects of ing devices. Think of planning a route on your the style. This is especially true in design for smartphone and then having the board com- mobile devices, where the different screen puter of your car take over the interaction when sizes require content to be laid out and even you enter the car and store away the phone. ordered differently for each device. The MMI WG is developing InkML, a Providing such high-level layout descrip- common format for storing the traces you make tions in CSS is an old idea, first proposed in with a stylus, and EMMA, a format to annotate 1997, but at that time there was little interest. raw data with partial interpretations. EMMA is Recently, however, the DI WG asked the CSS used whenever interpreting user input requires WG to take up work on this again. The first multiple passes, possibly done by different pro- result is the CSS3 Advanced Layout draft that cesses on different devices. If you speak a has just been published. It is an early draft, but word into a voice recognition system, for a small example should give the idea of what it example, the raw sound undergoes several lev- tries to do. els of analysis to determine the set of possible Assume you have a “portal” page in words corresponding to the sound and the most HTML, with a big area (a) in the upper half likely words in the given context. EMMA is an and three blocks (b, c and d) side by side in the XML format in which you can store both the lower half. The following CSS rules describe raw data, the possible interpretations and the that layout. likelihood that each interpretation is correct. Boston Denver

Austin Denver

Figure 3. The EMMA file in this figure could be an intermediate result of a speech recognition sys- tem. So far, the system has identified two likely interpretations: “Boston” and “Austin”. The meta- data includes confidence values and time stamps.

Hopefully, at the end of the processing pipeline there is only one interpretation left. of two cities as two XML elements and . (description) is optional. There is also Also driven in large part by the mobile room for metadata, such as timestamps, the Web is another new activity in W3C, called brand of the capture device, the maximum and Efficient XML Interchange (EXI). In fact, it has minimum coordinates of the device, the accu- little to do with XML. Its goal is to develop a racy, the maximum frequency of capture, the meta-format that will do for structured binary number of the pen/brush in case there are sev- data what XML has done for structured text eral, etc. data. It will be based on the same tree model of elements and attributes as XML, the so-called XML Infoset, but it will be a binary format. Robert's signature The format should allow quick access to each element in the tree, without parsing intervening 130 155 144 159 158 160 170 154 179 143 179 129 166 125 152 128 elements (as much as possible). It's meant for 140 136 131 149 126 163 124 177 applications such as streaming video to devices 128 190 137 200 150 208 163 210 with limited memory or accessing parts of 178 208 192 201 205 192 214 180 compressed data without the need to uncom- press all data up to the desired parts. All techniques for adaptation of content to The example in figure 3 is from the EMMA new devices carry a privacy concern. On the specification. It shows one stage in a speech one hand, giving precise information to a recognition system that eventually should pro- server about a device and the user's preferences duce a simple XML file containing the names enables optimizations that enhance the user's experience, on the other hand, the user may not contain DCI calls, it is not possible for a client want all servers to know so much about him. to know what it would have done it it had been An example is information about handicaps. a different kind of device. Even if you don't tell the server exactly what W3C recently started work on defining a your handicap is, when you ask for certain common platform for applications, under the types of adaptations chances are that you are a name of RWC (Rich Web Client). Most of certain type of user. And that may result in W3C's work, apart from the DOM, has so far spam or worse. been concentrated on documents, but there are At first sight, techniques such as Media designers who don't want to leave the UI to the Queries that let the adaptation be done on the browser (as proved by the interest in client side should be safer, but servers can still DHTML/AJAX). And, of course, there are find out things with carefully designed Web kinds of information that are more easily dis- pages: if certain style sheets or certain images tributed procedurally than declaratively, i.e., in are only downloaded by certain types of the form of a program rather than a document. devices, you know that whoever downloads With such a platform, the possibilities for that resource has that device – unless clients hacking are endless (and thus security is of pri- mask their actions with random other down- mary concern). The possibilities for my col- loads. leagues at W3C who work on the specification As long as the adaptation is described to hack it into their ideal programming plat- declaratively, the client can see what it whould form are somewhat more limited, but it may have done in other situations. When programs nevertheless turn into something interesting…