SpeciAl SecTion {Open Source}

and Open{ StandardOpen Sourced SMIL for Interactivity

by Daniel F. Zucker | Wake3, Inc. | [email protected]

| | [email protected] by Dick Bulterman CWI

Smil, SynchrOniZeD multimeDia elements is activated, as well as where they are ren- integratiOn language, was the first mem- dered. Activation is triggered either by time-based ber in the family of open, XML-based standards rules or user interaction. developed and supported by the World Wide Web SMIL can be used to structure passive multimedia Consortium (W3C). It is an important resource in presentations that play without user input like a TV the interaction designer’s toolbox. It can be used show. More interestingly, SMIL can be used to react not only to develop time-based multimedia presen- to user input to build a more interactive presenta- tations, but also to implement media-rich interfaces tion such as the news [5] shown in Figure 1. In this for PC or embedded applications and devices. SMIL example, SMIL is used to create a in is supported by a number of open source players which clicking still-picture key frames causes the [1, 2], as well as by ’s Explorer, audio-video program to start in the main window. Apple’s QuickTime, and RealNetwork’s RealPlayer. SMIL can be used for any application requiring SMIL 1.0 was first published as a W3C recom- synchronization and presentation of media in time. mendation in June 1998. The SYMM Workgroup The DAISY Consortium [6] uses SMIL, for example, [3], responsible for the SMIL specification, is plan- to control the playback of talking books for the ning for a new release of SMIL with updated and visually impaired. SMIL is useful because it provides enhanced functionality as SMIL 3.0 [4]. SMIL 3.0 is a standards-based structure to DAISY’s time-based expected to become a recommendation by early audio presentation. Using SMIL, the user can easily 2008. The main new features of SMIL 3.0 will be navigate throughout the document, rapidly mov- discussed later in this article. ing to the next word, paragraph, or chapter. SMIL SMIL is used to specify the time-based interac- is also useful because it enforces the separation tions between different multimedia objects. SMIL between presentation content and structure. This does not encode the multimedia objects them- means that a common talking-book presentation selves, but references Web addresses where the structure can be used for many different individual media can be found. In creating a multimedia pre- talking books that differ only in media content. sentation consisting of video, still images, audio, By virtue of being an open standard, SMIL is well and captions, for example, the SMIL document supported by the open source community. SMIL would specify when each of these various media can be used license and royalty free. Combined

interactions / november + december 2007 :/ 1

IA XIV-6.V24.indd 41 10/15/07 4:42:32 PM SMIL STRUCTURE. Compared with a tempo- rally static markup language such as HTML or SVG (SVG without SMIL, that is), SMIL adds the concept of time. HTML and SVG describe how something looks at a single point in time—animated images, EcmaScript, and SMIL encapsulated in these formats are exceptions to this. Let’s focus on the core lan- guage for this analogy; SMIL provides a framework SMIL presentation Figure 1: An interactive to allow such presentations to vary over time. To be clear, SVG does incorporate the concept of time for animations. It does this by including with similar royalty-free media codecs, SMIL offers SMIL in a modular fashion using the SMIL anima- a complete multimedia platform comparable to tion module. In fact, SMIL’s uses for SVG animation other closed proprietary standards requiring hefty are perhaps one of the better-known uses of SMIL. license fees. This allows SVG elements to be animated over time. For example, in this way a clock can be constructed SMIL AND INTERACTION DESIGN. For the entirely in SVG without using EcmaScript. interaction designer, SMIL is useful for any presen- MMS (multimedia messaging) is another well- tation or interface requiring time-based interactions known use of SMIL. Cell phone users can create, with media. It is especially useful for media that can send, and receive slideshows. For MMS, the OMA be repurposed for playback on different form factor (owners of the MMS specification) define SMIL as devices. the presentation markup [7]. In this case, SMIL is the SMIL has several advantages. SMIL enforces the vehicle that allows the stringing together of single separation of structure and media in a presentation images into a time-based slideshow with optional or interface. By design, the media objects them- transitions between the media. What is less known selves are kept separate from the SMIL document is that SMIL can do much more than this. describing the presentation. This greatly simplifies Par, Seq, and Excl. SMIL’s declarative syntax to maintenance and eases the task of, for example, describe the interaction between time-based media using a single SMIL document to describe a presen- relies on three primary constructs, or containers: tation format while updating the media to produce par, seq, and excl. Par is for parallel, seq for sequen- a new presentation. It also allows the media objects tial, and excl for exclusive. Two media objects to change over time, even though the basic presen- placed in a par container play in parallel, meaning tation logic remains the same. they both play at the same time. (Layout description SMIL uses a declarative format rather than a likely has them located in two different places so procedural scripting language such as EcmaScript. that both media objects are visible in a meaning- Advanced effects such as transitions and animations ful way.) Two media objects placed in a seq play are supported implicitly in the language and require sequentially. That is, they play one after another. no heavy-duty scripting. The declarative nature of Excl was added for SMIL 2.0 and means that only SMIL also allows it to be automatically generated one media element in the group of elements can or to provide a basis for further transformation and play at a time. Usually, some sort of event logic is customization using standard Web tools. used to determine which media object is playing. SMIL is structured to allow efficient repurpos- Each of SMIL’s time containers define a local ing of content for adaptation to different types of timeline, in which a group of related media objects display devices. SMIL has a content-control archi- can be managed. The nature of the time container tecture that allows a designer to specify content provides a basic set of activation constraints that alternatives for different devices or for differing user eases the designer’s task of creating a presentation. preferences. The run-time binding of objects to the A seq container imposes general slideshow-like presentation also allows a presentation’s content temporal constraints among the objects: Adding model to change without having to redefine the new slides with default timing is easy, since only SMIL container. a new media reference needs to be added to the

:/ 42 interactions / november + december 2007

IA XIV-6.V24.indd 42 10/15/07 4:42:33 PM SpeciAl SecTion {Open Source}

SMIL file. None of the timings on individual objects as a media object’s beginning or terminating) and needs to be changed. In a parallel container, a com- external user events (such as an object within mon multitrack timeline is defined that provides a the presentation—a next button or a navigation common reference time base for the activation of arrow—being selected interactively by a user). multiple objects. The par and seq containers can Nearly all SMIL timing-related actions define a be nested, allowing an audio track to accompany a begin or end event. This allows companion media slideshow, or to provide several logical collections objects to be scheduled interactively, based on of media objects within the same presentation the duration or (conditional) activation of related context. content. No scripts are required to control this inter- SMIL is used to determine the interaction activity; instead, a begin (or end) condition is set on between media elements. It does not define the the companion object. media itself. Media elements can be audio, video, For user-centered interaction, SMIL provides text, decorated text such as HTML or time-based a mechanism for associating user events such as text. The last is useful for subtitles. mouse clicks or hovering activity with the start SMIL Timing. The par/seq/excl time-container or end of either individual media objects or with structure of SMIL provides a default set of timing SMIL timing containers. This allows basic interac- relations among objects. In many cases, all timing tion within a presentation with the need to invoke within a presentation can be determined by simply a scripting architecture. SMIL also provides basic placing objects in an appropriate time container. support for interaction using a DOM interface, The default timing can be further refined by using a although many SMIL players do not yet support this collection of timing attributes: attributes that add a feature. specific begin offset, an end time, or a duration to a media object. The timing attributes in SMIL override any tim- ing within a particular media object. For example, Messaging: a 15-second video clip can be trimmed or extended Multimedia using SMIL timing attributes. The attributes clip- Begin/clipEnd also allow a fragment of a larger MMS is a short text video to be played. Note that unlike formats such message optionally including as MPEG4, which define a timeline based on the media; the entire message can encoding of a particular video object, SMIL provides a flexible timeline that abstracts timing away from be played as a slideshow. A user the media and into the overall presentation. can start with a short message using In addition to timing attribute control, SMIL also SMS on a cell phone, and add provides time navigation within a presentation via a elements; some clients start with temporal hyperlink architecture. Jumping from one MMS. Adding specific fields object to another in a presentation has the effect of changes the SMS to MMS, which is adjusting the presentation timeline to the context of the link’s destination anchor. This allows all of tariffed at a higher rate. For the related content that would otherwise be active example, adding a subject line or a when the destination had been reached normally CC to an SMS makes it MMS. MMS can to also be active once the link is followed. It is the also have media attachments— SMIL scheduler that determines the temporal rela- a photo, a video, or an audio tionship among elements, meaning that each of the clip—but only one attachment can be individual media objects does not need to be aware dynamic (audio or video). The user of the presence of other media in the presenta- tion—a major benefit of SMIL. can add and rearrange pages. When SMIL Events. Interaction in SMIL is provided the message is received, it can be by a declarative event-based architecture that dis- played as a slideshow or viewed tinguishes between internal player events (such as a series of pages.

interactions / november + december 2007 :/ 3

IA XIV-6.V24.indd 43 10/15/07 4:42:41 PM In SMIL 3.0, a state mechanism is expected to be EZ Channel downloads content packages of added to SMIL in which a collection of variables can SMIL-based multimedia content that is played for be defined to further control presentation interac- the user like an interactive television show. The con- tion. These variables can be used to implement tent is downloaded during off-peak hours, stored counters, to store dynamic content or to interact on the handset, and is then played on demand with the content-control mechanism in SMIL to for the user. Today, there are many channels avail- influence the selection of individual media objects. able showcasing a variety of content. The KDDI EZ Channel service has been perhaps one of the most

References 1. Ambulant Open MEDIA IN SMIL. Prior to SMIL 3.0, W3C SMIL successful services based on SMIL to date. SMIL Player. http://www.cwi. profiles did not require particular media types or nl/projects/Ambulant/distPlayer. codecs. Unfortunately, this has led to interoper- ENHANCEMENTS FOR SMIL 3.0. SMIL html 2. “RealNetworks Releases Industry Standard SMIL Source ability problems in that compliant SMIL players can 3.0 includes several significant enhancements Code to Helix Community” 2006 play the same SMIL document, but they may not be from SMIL 2.1. It includes several new modules— Press Releases. 7 July 2003. able to play the same SMIL media. This means that SMILtext, State, External Timing, and DOM—as 3. necessarily play for another player. This is perhaps The bulk of SMIL structure and syntax remains W3C Synchronized Multimedia one factor that has dampened SMIL’s popularity. unchanged from SMIL 2.1, while several new addi- Home page. 4. Synchronized SMIL 3.0 attempts to remedy this situation by tions add functionality often requested from previ- Multimedia Integration Language. including a common set of popular, royalty-free ous versions or were implemented as private exten- 5. Ambulant Open SMIL Player Demos 6. Digital Access ed media types is guaranteed to be playable on any use within SMIL. Prior to SMIL 2.1, there was no Information SYstem (DAISY) Consortium. 7. Open Mobile erability between content intended for both desk- presentation. The text needed to be referenced Alliance Release Programs and top and mobile devices. Future set-top box profiles from an external file. Thus, a new file often had to Specifications 8. Freed, N. and N. Borenstein “RFC The required formats from the SMIL Mobile, text formatting and decoration, such as the ability 2046: Multipurpose Internet Extended Mobile, and Language Profiles are defined to specify colors, fonts, and text styles, are support- Mail Extensions (MIME) Part as follows: ed. SMILtext is meant to be lightweight. If a heavier Two: Media Types,” November 1996, 9. “ audio • Ogg Vorbis [9] Exchange Profile” (DFXP) [14] is recommended. format” from Xiph.Org Foundation. • image/png [10,11] The addition of SMILtext is extremely useful to A fully open, non-proprietary, patent-and-royalty-free, general- • image/ [8, 12] include simple inline text without the overhead of purpose compressed audio format. creating a new external media object. The Ogg Vorbis specification is COMMERCIAL EXAMPLES OF SMIL DOM. For the first time SMIL 3.0 explicitly available at http://www.xiph. org/vorbis/doc/ 10. Randers- ACCESS/KDDI deployment. One example in defines a DOM to allow interaction with scripting Pehrson, Glenn and Thomas which SMIL is successfully used in a commercial languages such as EcmaScript. The DOM functional- Boutell. “Registration of new service is KDDI’s EZ Channel Plus Service [13]. It ity is divided into primarily two components. The Media Type image/png.” 27 July 1996.

:/ 44 interactions / november + december 2007

IA XIV-6.V24.indd 44 10/15/07 4:42:41 PM S p e c i a l S e c t i o n {Open Source}

the model earlier provided for SMIL Animation profiles to meet the needs of the mobile industry within SVG. for cellular handsets such as used in the KDDI SMIL External Timing, or Timesheets. Timesheets service described earlier. The Mobile Profile is based are an exciting new addition to SMIL 3.0. The goal on 3gpp’s SMIL profile, while the Extended Mobile of timesheets is to allow the inclusion of SMIL infor- Profile is based on 3gpp2’s SMIL profile. In SMIL assignments/media-types/image/ mation into another XML-based markup language 3.0, required content types have been added to png.> 11. “Portable Network such as HTML. In this way, timing information improve interoperability. Graphics (PNG) Specification (Second Edition)” Information is added to a presentation format that does not Tiny Profile. The Tiny Profile replaces SMIL technology—Computer graphics include timing information. The original document Basic from earlier versions of SMIL. This is meant to and image processing—Portable can still be played using a legacy host player since be the lightest feature set that can be implemented Network Graphics (PNG): Functional specification. ISO/ the addition of SMIL external timing information while still supporting the core elements of SMIL. It IEC 15948:2003 (E).Thomas does not modify the host-language syntax. Of is designed for low-complexity devices such as digi- Boutell (Ed.), World Wide Web course, in this case the timing information is lost, tal cameras or players. It can also be used to Consortium 01 October 1996. This W3C Recommendation is but the presentation is still playable. support functionality as would be used from available at http://www.w3.org/ External timing uses an external file to specify a media server. TR/REC-png. 12. “JPEG File timing information. In this way, it is syntactically Daisy Profile. The Daisy Consortium defines Interchange Format, Version 1.02”; Eric Hamilton, World Wide similar to CSS [15]. The functionality provided in a format for digital talking books for the print Web Consortium September the ability to append time-based information to an impaired, including those with blindness, low vision, 1,1992. This specification is existing XML document can be considered analo- and dyslexia. Daisy is a recognized worldwide stan- available at http://www.w3.org/ Graphics/JPEG/jfif.txt. 13. gous to the ability to append style information dard for talking books and has been involved with “NetFront(tm) SMIL Player using CSS. Similarly, a single timesheet document, SMIL for some time. SMIL 3.0 includes a formal Adopted as Multimedia Player for like a style sheet, can be defined once and reused Daisy profile for the first time. Profiles such as the KDDI’s New ‘EZ Channel Plus’ Service” ACCESS Press Release. in multiple documents to provide a common tem- Mobile Profile are too heavy, since Daisy does not 21 September 2006. 14. Timed Text variables from within SMIL. The intent of this is Daisy. (TT) Authoring Format 1.0— to allow more explicit control and visibility for the Distribution Format Exchange state of the SMIL presentation without requiring USING SMIL. An example better helps illustrate Profile (DFXP) 15. the use of DOM and an external scripting language. SMIL and SMIL’s benefits to the designer. Along Cascading Style Sheets Home Applications for state could include quizzes and with the Ambulant open source SMIL player, a Page< http://www.w3.org/Style/ computer-aided instruction or interactive adaptation number of examples are provided for download CSS/> 16. Ambulant Demos example. It is difficult to appreciate all the aspects SMIL 3.0 PROFILES. SMIL is designed to be of a multimedia presentation in print format, so modular. There are 63 modules grouped into 13 the reader is encouraged to download and play this functional areas. Individual modules of SMIL are example on a SMIL player. grouped to create a profile. A profile is designed to The Flashlight example illustrates several key fit a particular application’s needs in terms of bal- points about SMIL. First, it shows how to structure ancing functionality and complexity. When a given a SMIL presentation for basic navigation. If the user player supports a profile, then it is guaranteed that does nothing, then the presentation plays automati- any content authored for that profile will play in cally from start to finish in sequence. However, the that player. user has the option of clicking on any of the major SMIL 3.0 adds two new profiles, the Daisy Profile topic buttons, and that part of the presentation and Tiny Profile, and keeps four profiles—Mobile, begins playing immediately. Extended Mobile, Scalability Framework, and Second, this presentation shows how two SMIL Language—from SMIL 2.1. presentations can be created referring to the same Mobile Profiles. The SMIL 3.0 Mobile Profile content library in order to develop presentations and Extended Mobile Profile are mostly unchanged targeted for different devices. In this case, there is from their introduction for SMIL 2.1. A key moti- a presentation for a desktop player and a presenta- vator for the release of SMIL 2.1 was to define tion for a handheld player. The handheld presenta-

interactions / november + december 2007 :/ 45

IA XIV-6.V24.indd 45 10/15/07 4:42:42 PM tion is targeted for a low-bandwidth device and ogy you are looking for; it is already well known for uses still pictures with an audio voiceover, while the its role in SVG and MMS. SMIL 3.0 will add miss- desktop presentation uses video. The handheld pre- ing features and functionalities to further broaden sentation also makes more use of animation, which SMIL’s appeal. tends to be more efficient in bandwidth usage. The

layouts are also adjusted to accommodate different- About the Authors Dr. Daniel size displays. Figure 2 compares the two presenta- Zucker is an independent consultant special- tions. izing in the mobile Web. He was co-chair for Last, the presentation takes advantage of the the W3C SMIL Workgroup, and most recently systemTest element, allowing a single presentation was senior director of technology and FAE for ACCESS, a to be dynamically modified based on application leading supplier of non-PC Web browsers. Prior to requirements. In this case, audio voiceovers are ACCESS, Zucker was VP of engineering for Mobilearia and supplied using both British and American English. CTO for ePocrates. The attribute systemLanguage is used to globally select, at runtime, the proper audio files based on Dr. Dick Bulterman is a senior researcher at intended audience. CWI in Amsterdam, where he has headed The code for this is shown here: the distributed multimedia languages and interfaces theme since 2004. Prior to joining group at TU Delft (1985), and a part-time appointment in computer science at the University of Utrecht (1989-1991). SUMMARY. Both open source SMIL players as Dr. Bulterman received a Ph.D. in computer science from well as many SMIL content examples are available Brown University in 1982. He is on the editorial board of for downloading from the Web. The next time you the ACM/Springer Multimedia Systems Journal and need to create a multimedia presentation, try the Multimedia Tools and Applications. He is a member of open source approach. SMIL may be the technol- Sigma Xi, ACM, and IEEE.

Figure 2: Presentations with (right) referencing the same contentMobile library. Layout (left) and Desktop Layout

© ACM 1072-5220/07/1100 $5.00

:/ 46 interactions / november + december 2007

IA XIV-6.V24.indd 46 10/15/07 4:42:45 PM