An MPEG Standard for Rich Media Services
What’s New with MPEG?

Jean-Claude Dufourd and Olivier Avaro, Streamezzo
Cyril Concolato, ENST

October–December 2005
1070-986X/05/$20.00 © 2005 IEEE

Lightweight Application Scene Representation (Laser) is the Moving Picture Experts Group’s solution for delivering rich media services to mobile, resource-constrained devices. Laser provides easy content creation, optimized rich media data delivery, and enhanced rendering on all devices.

A rich media service is a dynamic, interactive collection of multimedia data such as audio, video, graphics, and text. Services range from movies enriched with vector graphic overlays and interactivity (possibly enhanced with closed captions) to complex multistep services with fluid interaction and different media types at each step. Demand for these services is rapidly increasing, spurred by the development of the next-generation mobile infrastructure and the generalization of TV content to new environments. However, despite long-lasting deployments and significant investment from various industries, mobile and, more generally, embedded interactive services (mobile Internet and interactive mobile TV, for example) have failed to reach the masses. In addition to conjunctural (such as economic) and structural (such as a lack of compelling business models) problems, current technologies (see the related sidebar on p. 63) have failed to provide an effective user experience.

Using rich media services is more challenging in embedded devices than on PCs, where various interfaces are available and homogeneously implemented (such as a mouse or keyboard), and ergonomic concepts have been tested and validated. On the move, when it’s not always easy to interact and time is limited, users expect to be one click away from the information they need. In addition, end users are accustomed to quality Web interfaces, so to be successful in the embedded domain, service interfaces must leverage the online experience. Finally, because users pay for these services, they also expect a decent level of quality, efficiency, and readability.

The Moving Picture Experts Group (see the “Related Standards Organizations” sidebar on p. 64 for brief descriptions of MPEG and other groups involved in this effort) has specified MPEG-4 part 20 (formally known as ISO/IEC 14496-20) as the new rich media standard dedicated to the mobile, embedded, and consumer electronics industries. MPEG-4 part 20 defines two binary formats: Lightweight Application Scene Representation (Laser) and Simple Aggregation Format (SAF). Laser enables a fresh and active user experience on constrained networks and devices based on enriched content, including audio, video, text, and graphics. It addresses the requirements of the end-to-end rich media publication chain: ease of content creation, optimized rich media data delivery, and enhanced rendering on all devices. As such, it fulfills the need for an efficient open standard.

Key features and use cases

Four key features distinguish Laser from existing technologies:

❚ In Laser, graphic animations, audio, video, and text are packaged and streamed together. Unlike existing mobile technologies that are mostly aggregations of various components, Laser’s design is based on a single, well-defined, and deterministic component that integrates all of the media (the same design that made Macromedia Flash successful on the Web). This integration ensures the richness and quality of the end-user experience.

❚ Laser provides full-screen interactivity with all streams. It uses vector graphic technology so users can easily fit content to the screen size. Laser therefore provides optimal content display despite high variations in screen resolution. In addition, it can use virtually all pixels as elements of the user interface, letting users design rich and user-friendly interfaces.

❚ Laser efficiently delivers real-time content over constrained networks. More specifically, it delivers media content in packaged pieces, letting the device display a piece as it’s received (as opposed to download-and-play mechanisms). Laser generalizes the streaming concept, already in place for audio and video data, to scene description and rich media. As such, content providers can design services to keep some information of interest on the screen at all times.

❚ Finally, Laser delivers rich media services at rates starting from 10 kilobits per second (Kbps) using vector graphic compression and dynamic scene updates. This drastically reduces end users’ waiting time compared to standard Web-like approaches in which the system resends the complete page even if only small changes have been made. This functionality is useful not only in low bit-rate networks such as General Packet Radio Service, but also in higher bit-rate networks in which rich media services can be sent at low rates, preserving bandwidth to improve audio and video quality.

Figure 1. Laser-based rich media services: (a) rich media portal, (b) interactive mobile TV, and (c) interactive screen saver.

Three use cases demonstrate the new standard’s benefits.

Rich media portal

In this application (see Figure 1a), a Laser engine enhances an existing Wireless Application Protocol (WAP) or Extensible Hypertext Markup Language (XHTML) service with rich media, much like Flash enhances Web sites. When a user accesses the WAP portal, a hyperlink provides access to the site’s rich media part, giving the user a complete, deterministic, and consistent rich media experience. An intuitive interface with a pop-up menu makes navigating the site simpler and the screen feel larger. Figure 1a shows rich text with small but readable Arabic fonts. At any time, a hyperlink can bring the user back to the original portal.

Interactive mobile TV

Interactive mobile TV (see Figure 1b) aggregates multiple rich media use cases, from interactive mosaic, electronic program guide, and voting to personalized newscast. All of these applications require a system that can provide deterministic rendering and behavior of rich media content (including audiovisual, text, graphics, and images, along with streamed TV and radio channels) in the user interface. The system must allow fluid navigation through content in a single application or service, as well as local or remote synchronized interaction for voting and personalization (for example, related menus or submenus, advertising, and content in the user profile or service subscription).

Interactive screen saver

The interactive screen saver (illustrated in Figure 1c) is an instance of a larger class of applications that receive content updates in the background (such as fixed or mobile convergent services). The screen saver uses mostly static data (that is, text, graphics, and images) arranged with transitions similar to those in a slide show, with an element of randomness in the presentation. The server adds new elements and removes expired elements from the application stored on the device. Developers should design this application with care to avoid overuse of the device’s power resource.

Technical aspects

Part 20 of the MPEG-4 standard consists of two specifications: Laser, which specifies the coded representation of multimedia presentations for rich media services; and SAF, which defines tools for fulfilling the requirements of rich media service design at the interface between scene representation and transport mechanisms.

Laser

In Laser, a multimedia presentation is a collection of a scene description and media (zero, one, or more). Each media item is an individual piece of content of one of the following types: image (still picture), video (moving picture), audio, and, by extension, font data. A scene description consists of text, graphics, animation, interactivity, and spatial and temporal layout.

A Laser scene description specifies four aspects of a presentation:

❚ how the scene elements (media or graphics) are organized spatially (for example, the visual elements’ spatial layout);

❚ how the scene elements are organized temporally (for example, if and how they’re synchronized, or when they start or end);

❚ how users interact with the elements in the scene (for example, when a user clicks on an image); and

❚ how changes occur in a scene.

[...] guaranteeing tight synchronization between the scene and the media assets composing the rich media presentation.

MPEG-4 part 20 defines a Laser engine as the viewer for Laser presentations. Such an engine has rich media composition capabilities on top of the capabilities common to classic multimedia players. These composition capabilities are, as a result of the technology selection process, based on Scalable Vector Graphics (SVG) Tiny 1.1 [1]. The composition capabilities rely on the use of an SVG scene tree and are enhanced with key features for mobile services, such as binary encoding, dynamic updates, state-of-the-art font representation, and the stable features of the upcoming SVG Tiny 1.2 specification (described in the “Current Technologies” sidebar), including audio and video support. Figure 2 illustrates the Laser engine architecture.

Figure 2. Laser engine architecture. Diagram labels: Application; Scalable Vector Graphics (SVG) scene tree; Laser extensions; Audio, Video, Image, Font, ...; Dynamic updates; Binary encoding; Simple Aggregation Format (SAF); Transport; Network.

SVG scene tree. Laser uses an SVG scene tree at its core. It imports composition primitives from the World Wide Web Consortium’s (W3C) specifications (all of SVG Tiny 1.1, some of SVG 1.1, and Synchronized Multimedia Integration Language, or SMIL, version 2) and uses the SVG rendering model to present the scene tree. Laser specifies hyperlinking capabilities, audio and video media embedding, vector graphics representations, animation, and interactivity features.

Scene tree extensions. After selecting SVG as Laser’s core technology, MPEG identified several areas needing extensions to allow the development of efficient services:

❚ Because SVG Tiny doesn’t have clipping, MPEG added simple axis-aligned rectangular clipping to let developers create such common user interface widgets as ticker tapes and simple transitions.

❚ SVG lacks a restricted, nonresampling rotation for video and images and a full-screen mode,