T-111.5350 Multimedia Programming Pablo Cesar

What is Multimedia?

Pablo Cesar http://www.tml.hut.fi/~pcesar

Outline

• Definitions of Multimedia
• Multimedia Elements:
  – Multimedia Objects: audio, video, graphics, text
  – Visual Style
  – Layout of those objects
      • Temporal dimension (animation, synchronization)
      • Graphical layout
  – Application Logic: state of the application (e.g., games)
  – User Interaction: from passive to authoring (visualization, navigation, WIMP concepts)
• Taxonomy of Authoring Content Formats
  – Expressive power, ease of use, safety of distribution, interoperability
      • Compiled languages (C, C++)
      • Virtual machine languages (Java)
      • XML based languages (SMIL, XForms)

Multimedia (Heller)
• Media Type: text, sound, graphics, and motion
• Media Expression: describes the level of abstraction used in the media (increasing abstraction):
  – Elaboration: unedited information
  – Representation: edited or stylized
  – Abstraction: for example, icons (most abstract)
• Context: includes properties such as
  – Interactivity
  – Aesthetics
  – Audience

[Figure: Heller's taxonomy along three axes: Media Type (text, sound, graphics, motion), Media Expression (elaboration, representation, abstraction), and Context (audience, discipline, interactivity, quality, usefulness, aesthetics)]

Multimedia (Purchase)
• Modality:
  – Aural: audio
  – Visual: graphics
• Nature of the sign:
  – Concrete iconic (photorealistic image)
  – Abstract iconic (map)
  – Symbolic (written word)
• Syntax / Arrangement:
  – Individual
  – Augmented
  – Temporal
  – Linear
  – …

[Figure: a sign classified by modality (aural, visual), nature (concrete iconic, abstract iconic, symbolic), and syntax (individual, augmented, temporal, linear, schematic, network)]

Multimedia (Bulterman and Hardman)
• Media Assets: how to reference the multimedia objects of the presentation
• Synchronization Composition:
  – Hard timing relationships, relative structural ordering
  – Constraints
• Spatial Layout
  – Implicit (video), explicit, and dynamic
• Asynchronous Events
  – Content-based (timing) and user interaction (navigation)
• Adjunct/Replacement Content
  – Alternative content / adaptation content
• Performance Analysis
  – Performance optimization for various delivery scenarios
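Adjunct/replacement content can be illustrated with a small selection routine. The Java sketch below works through the alternative renditions in author order and picks the first one the delivery context can handle. This is similar in spirit to SMIL's `switch` element. The `Alternative` and `Context` records and their fields (minimum bitrate, language) are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Optional;

/** One alternative rendition of a media asset (hypothetical fields for illustration). */
record Alternative(String src, int minBitrateKbps, String language) {}

/** Capabilities of the delivery scenario or playback device (hypothetical). */
record Context(int bitrateKbps, String language) {}

final class ContentSelector {
    /** Returns the first alternative, in author order, that the context can accept. */
    static Optional<Alternative> select(List<Alternative> alternatives, Context ctx) {
        for (Alternative alt : alternatives) {
            boolean bandwidthOk = ctx.bitrateKbps() >= alt.minBitrateKbps();
            // A null language means the rendition is language-neutral.
            boolean languageOk = alt.language() == null || alt.language().equals(ctx.language());
            if (bandwidthOk && languageOk) {
                return Optional.of(alt);
            }
        }
        return Optional.empty(); // no acceptable rendition: the asset is left out
    }
}
```

Putting the least demanding, language-neutral rendition last gives every delivery scenario a fallback.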

Multimedia (Vuorimaa)
• Multiple media
  – Types: text, graphics, animation, image, audio, and video
  – Source: natural (e.g., video) vs. artificial (e.g., 3D graphics)
• Interaction
  – Stand-alone vs. networked applications
  – Level of interaction (user interface, application, and service)
  – Amount of interaction:
      • E-mail, video-on-demand, video conference, video
      • Games and virtual reality
• Timing
  – External synchronization of different media (e.g., video and slides)
  – Internal timing within a single medium (e.g., video)
  – Applications usually have a time dimension (e.g., a story line)
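External synchronization can be sketched by letting the video's own clock drive a second medium. The minimal Java example below assumes the video reports its media time in milliseconds. A `TreeMap` maps cue times to slides, and `floorEntry` finds the slide that should be visible at a given video time. `SlideTrack` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Maps the media time (ms) of a video to the slide that should be visible (hypothetical). */
final class SlideTrack {
    private final TreeMap<Long, String> cues = new TreeMap<>();

    /** Registers a slide that becomes visible when the video reaches startMs. */
    void addCue(long startMs, String slide) {
        cues.put(startMs, slide);
    }

    /** Returns the slide for the current video time, or null before the first cue. */
    String slideAt(long videoTimeMs) {
        // floorEntry gives the latest cue at or before the current time.
        Map.Entry<Long, String> cue = cues.floorEntry(videoTimeMs);
        return cue == null ? null : cue.getValue();
    }
}
```

The video acts as the master clock here. Seeking in the video therefore keeps the slides in step without any extra bookkeeping.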

Multimedia Summary (1/2)
• Multimedia Objects
  – Audio, video, graphics, text
• Visual Style
• Layout of those objects
  – Temporal dimension (animation, synchronization)
  – Spatial layout
• User Interaction
  – From passive to authoring (visualization, navigation, WIMP concepts)
• Application Logic
  – State of the application (e.g., games)

Multimedia Summary (2/2)
“Computer-mediated applications that integrate and present different media objects, which are arranged spatially and temporally. Moreover, user interaction can control the behavior of the application.”

[Figure: the multimedia elements (multimedia objects, visual style, temporal dimension, spatial layout, user interaction) together with application logic]

Multimedia Elements: Objects and Visual Style
• Discrete Media
  – Icons: semantic images (e.g., a stop symbol); they require the user to have previous knowledge
  – Graphics: computer generated; 2D or 3D depending on the goal
  – Images: natural source (e.g., photograph)
  – Text: size, font type, color
• Continuous Media
  – Motion pictures (audio + video)

Multimedia Elements: Spatial Layout (Pihkala 2003, Boll 2001)
• Absolute
  – Coordinates relative to an origin
• Directional relations
  – Define order in space (e.g., North, East)
• Topological relations
  – Disjoint, touch, equals, inside of, covered by, contains, covers, and overlap
• Text flow
  – One-dimensional flow shown in a two-dimensional area
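The topological relations above can be computed for rectangular regions. The Java sketch below assumes axis-aligned rectangles with integer coordinates, which is a simplification, since layouts may contain other shapes. The `Region`, `Topology`, and `SpatialRelations` names are hypothetical.

```java
/** Axis-aligned rectangle: top-left corner plus width and height (assumption for this sketch). */
record Region(int x, int y, int w, int h) {
    int right()  { return x + w; }
    int bottom() { return y + h; }
}

enum Topology { DISJOINT, TOUCH, EQUALS, INSIDE_OF, COVERED_BY, CONTAINS, COVERS, OVERLAP }

final class SpatialRelations {
    /** Returns the topological relation of region a with respect to region b. */
    static Topology relate(Region a, Region b) {
        if (a.equals(b)) return Topology.EQUALS;
        // Overlap length per axis: negative means a gap, zero means a shared edge.
        int overlapX = Math.min(a.right(), b.right()) - Math.max(a.x(), b.x());
        int overlapY = Math.min(a.bottom(), b.bottom()) - Math.max(a.y(), b.y());
        if (overlapX < 0 || overlapY < 0) return Topology.DISJOINT;
        if (overlapX == 0 || overlapY == 0) return Topology.TOUCH;
        if (within(a, b)) return strictlyWithin(a, b) ? Topology.INSIDE_OF : Topology.COVERED_BY;
        if (within(b, a)) return strictlyWithin(b, a) ? Topology.CONTAINS : Topology.COVERS;
        return Topology.OVERLAP;
    }

    /** True if inner lies within outer; boundaries may coincide. */
    private static boolean within(Region inner, Region outer) {
        return outer.x() <= inner.x() && inner.right() <= outer.right()
            && outer.y() <= inner.y() && inner.bottom() <= outer.bottom();
    }

    /** True if inner lies within outer without touching its boundary. */
    private static boolean strictlyWithin(Region inner, Region outer) {
        return outer.x() < inner.x() && inner.right() < outer.right()
            && outer.y() < inner.y() && inner.bottom() < outer.bottom();
    }
}
```

"Inside of" and "covered by" differ only in whether the boundaries touch. The same holds for "contains" and "covers".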

Multimedia Elements: Temporal Dimension (Pihkala 2003, Boll 2001)
• Temporal models:
  – Definite: for example, 6 seconds
  – Indefinite: for example, when the user clicks
  – Parallel and sequential relations (e.g., start these two videos at this moment, or start this video after this other one)
• Animation:
  – Mixture of temporal dimension and spatial layout (i.e., the position of an object changes in time)
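The temporal models above can be sketched as a small composition tree. This Java example is hypothetical: the `Clip`, `Seq`, `Par`, and `Timeline` names do not come from any particular format.

- A sequential node starts each child after the previous one ends.
- A parallel node starts all of its children together.
- An indefinite duration (an empty `OptionalDouble`) stops the scheduler from computing later begin times, because those depend on a runtime event such as a click.

The `positionAt` method shows animation as linear interpolation of a position over time. This is one simple choice among many.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

/** A node in a hypothetical temporal composition tree. */
interface TimeNode {}

/** A media item; an empty duration models an indefinite one (e.g., "until the user clicks"). */
record Clip(String name, OptionalDouble duration) implements TimeNode {}
record Seq(List<TimeNode> children) implements TimeNode {}
record Par(List<TimeNode> children) implements TimeNode {}

final class Timeline {
    /** Duration in seconds, or empty if it depends on an indefinite event. */
    static OptionalDouble duration(TimeNode node) {
        if (node instanceof Clip c) return c.duration();
        List<TimeNode> kids = node instanceof Seq s ? s.children() : ((Par) node).children();
        double total = 0;
        for (TimeNode kid : kids) {
            OptionalDouble d = duration(kid);
            if (d.isEmpty()) return OptionalDouble.empty();
            // Sequential: durations add up. Parallel: the longest child wins.
            total = node instanceof Seq ? total + d.getAsDouble() : Math.max(total, d.getAsDouble());
        }
        return OptionalDouble.of(total);
    }

    /** Collects "name@begin" for every clip whose begin time is already known. */
    static void schedule(TimeNode node, double begin, List<String> out) {
        if (node instanceof Clip c) {
            out.add(c.name() + "@" + begin);
        } else if (node instanceof Par p) {
            for (TimeNode kid : p.children()) schedule(kid, begin, out); // all start together
        } else if (node instanceof Seq s) {
            double t = begin;
            for (TimeNode kid : s.children()) {
                schedule(kid, t, out);
                OptionalDouble d = duration(kid);
                if (d.isEmpty()) return; // later children wait for a runtime event
                t += d.getAsDouble();
            }
        }
    }

    /** Animation: linear interpolation between two positions over [0, dur] seconds. */
    static double positionAt(double from, double to, double dur, double t) {
        double progress = Math.max(0, Math.min(1, t / dur)); // clamp to [0, 1]
        return from + (to - from) * progress;
    }
}
```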

Multimedia Elements: User Interaction
• Different levels of interaction (Aleem):
  – Passive: only visualization
  – Reactive: limited interaction (e.g., scroll pane functionality)
  – Proactive: choose a path or make selections (e.g., button)
  – Reciprocal: corresponds to user authoring of information
• Interaction models (Boll):
  – Navigational: choice to decide where to go next
  – Design: the user can modify the visual style of the presentation (e.g., colors)
  – Movie: the user can control the global time (e.g., VCR capabilities)
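The movie interaction model can be sketched as a global clock that the user controls like a VCR. In this hypothetical Java class, wall-clock time is passed in explicitly, an assumption that keeps the example deterministic, and media time is derived from it. Pausing freezes media time and seeking moves it.

```java
/** Global presentation clock controlled like a VCR (hypothetical sketch). */
final class MovieClock {
    private final double duration; // total presentation length in seconds
    private double position;       // media time at the last state change
    private double anchor;         // wall-clock time of the last state change
    private boolean playing;

    MovieClock(double duration) { this.duration = duration; }

    /** Current media time for a given wall-clock time, clamped to the presentation length. */
    double mediaTime(double now) {
        double t = playing ? position + (now - anchor) : position;
        return Math.min(t, duration);
    }

    void play(double now) {
        if (!playing) { anchor = now; playing = true; }
    }

    void pause(double now) {
        if (playing) { position = mediaTime(now); playing = false; }
    }

    /** Jumps to a media time (rewind or fast-forward), keeping the play/pause state. */
    void seek(double target, double now) {
        position = Math.max(0, Math.min(target, duration));
        anchor = now;
    }
}
```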

Multimedia Elements: Application Logic
• Traditionally, multimedia presentations did not have much logic:
  – Virtual visit to a museum, DVD menus...
• Real-time interactive systems:
  – Virtual reality worlds, games
• Application logic needs a programming language (if, case, goto...):
  – Compiled languages: C, C++
  – Virtual machine: Java
  – World Wide Web, MPEG-4, Director: scripting
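Application logic of this kind can be as simple as a state machine. This hypothetical Java sketch of a DVD-style menu uses a `switch` on the current state to decide what each remote-control key does. The states and key names are illustrative only.

```java
/** Hypothetical DVD-style menu: the application state decides what a key press does. */
final class DvdMenu {
    enum State { MAIN_MENU, CHAPTER_MENU, PLAYING }

    private State state = State.MAIN_MENU;
    private int chapter = 1;

    State state() { return state; }
    int chapter() { return chapter; }

    /** Handles a remote-control key; this switch is the "application logic". */
    void onKey(String key) {
        switch (state) {
            case MAIN_MENU -> {
                if (key.equals("PLAY")) state = State.PLAYING;
                else if (key.equals("CHAPTERS")) state = State.CHAPTER_MENU;
            }
            case CHAPTER_MENU -> {
                if (key.matches("[1-9]")) { // select a chapter and start it
                    chapter = Integer.parseInt(key);
                    state = State.PLAYING;
                } else if (key.equals("BACK")) {
                    state = State.MAIN_MENU;
                }
            }
            case PLAYING -> {
                if (key.equals("MENU")) state = State.MAIN_MENU;
            }
        }
    }
}
```

The same key ("3") means different things in different states. That dependence on state is what a passive presentation lacks.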

Taxonomy of Authoring Content Formats: Requirements
• Supported media types: audio, video, text, graphics, and animation
• Arrangement of the signs: spatial and temporal
• Interaction: passive, reactive, proactive, and reciprocal
• Difficulty of use (threshold)
• Expressive power (i.e., ceiling)
• Safety of distribution
• Interoperability

                        Threshold   Ceiling   Interoperability   Safety of Distribution
Compiled Languages      +++         +++       +                  +
VM Languages            ++          ++        ++                 ++
XML based Languages     +           +         +++                +++

Compiled Languages
Normally used for system software (e.g., …) and resource-demanding services: C, C++
Pro
• Efficient approach
• Expressive power (closer to computer hardware)
Con
• Interoperability (each service has to be compiled for the target device)
• Less safe to distribute (it can include harmful code)

Compiled Languages: System Software
• "User Interface Software Tools" (Myers, 1995) defines a layered model:
  – Applications: implemented using higher-level tools
  – Toolkit: a library of widgets used by applications
  – Windowing system: helps the user to monitor and control different contexts (input and output functionality)

Compiled Languages: Windowing System (1/3)

[Figure: layered X-Window architecture, from top to bottom: desktop environments (KDE, Gnome); window managers, one per session (Kwin, Enlightenment); toolkits (KDE libraries, Gnome libraries, GIMP Toolkit (GTK), GDK); base libraries (Xlib, GLib); the X network protocol; and the X server]

Compiled Languages: Windowing System (2/3)
• X-Window
  – X.org: font management, graphics card support, functionality
  – Desktop environments: KDE, GNOME (toolkits + applications)
  – Window managers: FluxBox, Sawfish…
• DirectFB
  – XDirectFB: X-Window support on DirectFB
  – DirectFBGL
• Microsoft Windows
  – DirectX
• Mac
  – Video: QuickTime
  – 3D: OpenGL
  – 2D: QuickDraw

Compiled Languages: Windowing System: DirectFB

[Figure: DirectFB architecture. In user space, a DirectFB application runs on DirectFB and its chipset driver; in kernel space, the framebuffer driver; below them, the timing and mode settings, the framebuffer, and the hardware accelerator]

Compiled Languages: Windowing System: Direct-X

[Figure: DirectX architecture. Win32 applications use either the Direct3D API or GDI; Direct3D calls go through the HAL device, and both paths reach the graphics hardware through the Device Driver Interface (DDI)]

Compiled Languages: Toolkits
• Toolkits provide:
  – Interaction: handling of user input
  – Canvas operations: rendering regions (canvases) and graphics primitives
  – Set of widgets: predefined user interface elements (e.g., Button)
  – Graphical layout: control of the location of the widgets
• Examples: Qt, GTK
• Virtual toolkit
  – Device-independent toolkit
  – Mapped to the actual toolkit in the device
  – Example: AWT
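A virtual toolkit can be sketched as an interface that applications program against, with per-device implementations behind it. The Java example below is loosely modelled on the way AWT maps its components to native peers. However, `ButtonPeer`, `DeviceToolkit`, `DesktopToolkit`, `TvToolkit`, and `VirtualButton` are hypothetical names, and rendering is faked with strings.

```java
/** Device-specific rendering of a button (hypothetical peer interface). */
interface ButtonPeer { String render(String label); }

/** An actual toolkit on one device; each supplies its own peers. */
interface DeviceToolkit { ButtonPeer createButton(); }

final class DesktopToolkit implements DeviceToolkit {
    public ButtonPeer createButton() { return label -> "[ " + label + " ]"; }
}

final class TvToolkit implements DeviceToolkit {
    // A TV toolkit might draw large, upper-case buttons suited to remote-control focus.
    public ButtonPeer createButton() { return label -> "<< " + label.toUpperCase() + " >>"; }
}

/** The device-independent widget: applications only see this class. */
final class VirtualButton {
    private final String label;
    private final ButtonPeer peer;

    VirtualButton(String label, DeviceToolkit toolkit) {
        this.label = label;
        this.peer = toolkit.createButton(); // mapped to the actual toolkit at creation time
    }

    String draw() { return peer.render(label); }
}
```

The application code that creates a `VirtualButton` stays the same on every device; only the toolkit passed in changes.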

Compiled Languages: Media Providers
• Audio/Video: Xine, MPlayer
• Television: linuxtv
• Games: SDL
• Other languages: for example, libflash
• 3D graphics:
  – OpenGL
  – OpenGL ES
• Home media platforms: LIMMBO, MythTV

VM Languages
A virtual machine is an abstraction of the computing environment: JVM + APIs.
Pro
• Platform independence
• Safer to distribute (the VM concept restricts potential security attacks)
• Expressive power (programming language)
• Well-documented APIs
Con
• Heavy applications
• Difficult to use (programming language)
• Less powerful than compiled languages

VM Languages: Java Overview
• Nowadays, Java tries to target all kinds of computing devices
• Editions:
  – Java 2 Enterprise Edition (J2EE): for servers and enterprise computers
  – Java 2 Standard Edition (J2SE): for servers and personal computers
  – Java 2 Micro Edition (J2ME): for embedded devices, PDAs, mobile phones, and digital television set-top boxes
  – Java Card: for smart cards
• Profile (e.g., MIDP)
  – Requirements for a specific vertical market of devices (set of APIs)
• Configuration (e.g., CLDC, running on the KVM)
  – Minimum platform for a horizontal grouping of devices (VM + core APIs)

[Figure: the Java 2 platform, from servers to smart cards. J2EE and J2SE (with optional packages) run on the Java Virtual Machine for servers and personal computers. J2ME serves TV set-top boxes, PDAs, and mobile phones, with the CDC configuration (Foundation Profile, Personal Profile, optional packages) on the JVM and the CLDC configuration (MIDP, optional packages) on the KVM. Java Card runs on the Card VM]

VM Languages: Multimedia
• User interface development (AWT/Swing)
  – Layout: grid, north-south-east-west, flow (GridLayout, BorderLayout, FlowLayout)
  – Set of widgets: Button, TextArea
  – User interaction: java.awt.event.* (mouse, keyboard…)
• Video/audio and synchronization (JMF)
  – Manager, Player, DataSource, and Controller
• 3D graphics
  – Java3D
  – Java wrappers for OpenGL
• Different devices
  – Television: MHP/OCAP/ACAP/ARIB -> GEM
  – Handheld: MIDP


VM Languages: JMF (1/2)
• DataSource: retrieves the actual media data
• Controller: implements the state machine
• Player: decodes and plays the media data

VM Languages: JMF (2/2)
• Unrealised: the player does not yet have all the information needed to acquire its resources
• Realised: the player has all the information needed to acquire its resources
• Prefetched: the player has all the needed resources and has already prefetched enough media data to start playing immediately
• Started: the player is actually playing the media
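The state machine above can be modelled in a few lines. This Java sketch is a simplified, hypothetical model, not the real JMF API. JMF is not part of the standard JDK. Its `javax.media.Player` provides `realize()`, `prefetch()`, and `start()`, and it also has transitional Realizing and Prefetching states, which are omitted here. As with a JMF player, calling `start()` on an unrealised player first passes through the earlier states.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the player states on this slide (not the javax.media API). */
final class PlayerModel {
    enum State { UNREALIZED, REALIZED, PREFETCHED, STARTED }

    private State state = State.UNREALIZED;
    private final List<State> history = new ArrayList<>(List.of(State.UNREALIZED));

    State state() { return state; }
    List<State> history() { return history; }

    /** Learns which resources are needed (e.g., by inspecting the media format). */
    void realize() {
        if (state == State.UNREALIZED) moveTo(State.REALIZED);
    }

    /** Acquires resources and buffers enough data to start at once; realizes first if needed. */
    void prefetch() {
        realize();
        if (state == State.REALIZED) moveTo(State.PREFETCHED);
    }

    /** Starts playback, passing through any earlier states that were skipped. */
    void start() {
        prefetch();
        if (state == State.PREFETCHED) moveTo(State.STARTED);
    }

    /** In this model, stopping returns to Prefetched so playback can restart quickly. */
    void stop() {
        if (state == State.STARTED) moveTo(State.PREFETCHED);
    }

    private void moveTo(State next) {
        state = next;
        history.add(next);
    }
}
```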

VM Languages: 3D Graphics
• Java3D
  – Completely new API for stand-alone 3D graphics applications
  – Can use any underlying architecture (Direct-X, OpenGL...)
  – It might not be the most efficient approach
  – Developers have to learn a new API
• Java wrappers of OpenGL
  – Functionality from OpenGL
  – Developers already know the API
  – Only wrappers: they use the Java Native Interface (JNI)
  – Much intercommunication between layers (Java -> C)
  – API is not standardised yet (Java Specification Requests):
      • JSR 231: OpenGL
      • JSR 239: OpenGL ES

VM Languages: J2ME
• Defines two configurations:
  – CDC: high-end consumer devices (TV set-top boxes, high-end PDAs)
      • RAM Java memory: around 2 MB
      • ROM Java memory: around 2.5 MB
  – CLDC: low-end consumer devices (mobile phones, low-end PDAs)
      • Processor: 16 bit/16 MHz or higher
      • Java total memory: 160-512 KB
• CDC (Connected Device Configuration) profiles
  – Personal Profile
      • Adds support for lightweight AWT
  – Foundation Profile
      • Basic application APIs (no GUI)
• CLDC (Connected Limited Device Configuration) profile
  – Mobile Information Device Profile (MIDP)
      • Application APIs + GUI APIs

[Figure: J2ME stack. CDC with the Foundation and Personal profiles and optional packages runs on the JVM; CLDC with MIDP and optional packages runs on the KVM]


VM Languages: Television

[Figure: interactive TV receiver software stack. Interoperable applications and interoperable data applications run on the application manager, transport protocol(s), and the Sun Java, HAVi, DAVIC, and DVB-specific APIs; these sit on the Java Virtual Machine, which runs on the system software (operating system, drivers, firmware)]

VM Languages: Summary
Supported media types
  Text, graphics            AWT
  Video, audio              JMF
Arrangement of the signs
  Spatial                   AWT
  Temporal                  Java threads
Interaction                 AWT events
Different devices
  Handheld                  MIDP
  Television                GEM
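Temporal arrangement with Java threads can be sketched with the standard `java.util.concurrent` scheduler. In this minimal example, each cue of a presentation is scheduled at a fixed offset from a common start, and a single scheduler thread fires the cues in time order. The cue names and offsets are made up, and `CueScheduler` is a hypothetical helper. A real player would also have to keep these cues aligned with the media clock.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Fires presentation cues at fixed offsets from a common start (hypothetical helper). */
final class CueScheduler {
    /** Schedules each cue (name, offset in ms) and returns the names in the order they fired. */
    static List<String> run(List<String> names, List<Long> offsetsMs) {
        List<String> fired = new CopyOnWriteArrayList<>(); // safe to append from the timer thread
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            timer.schedule(() -> fired.add(name), offsetsMs.get(i), TimeUnit.MILLISECONDS);
        }
        timer.shutdown(); // cues that are already scheduled still run
        try {
            timer.awaitTermination(5, TimeUnit.SECONDS); // wait for the last cue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return fired;
    }
}
```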

XML Based Languages
Declarative programming languages (they state only what has to be done, not how). The major contributor is the W3C.
Pro
• Ease of use (you can even use a text editor)
• Interoperability (only needs a compatible browser)
• Safest to distribute
Con
• Expressive power (quite limited, not a programming language!)
• Use of scripting for application logic (or not?)
• Needs a service under it (browser)

XML Based Languages: Overview
• Document
  – HTML & XHTML
• Multimedia
  – SMIL, Timesheets
• User Interface
  – XForms, XIML
• Vector Graphics
  – SVG
• Voice
  – VoiceXML

XML Based Languages: HTML & XHTML
HTML
• HTML 4.01: W3C Recommendation (24 Dec. 1999)
• Lingua franca for publishing hypertext on the WWW
• Non-proprietary
• Can be created by a wide range of tools:
  – Text editors
  – Authoring tools
• All kinds of features (mixed together):
  – UI components
  – Fonts
  – Lists
XHTML
• XHTML 1.0: W3C Recommendation (26 Jan. 2000, revised 1 Aug. 2002)
• XHTML 2.0: W3C Working Draft (22 July 2004)
• Reformulation of HTML 4 in XML
• Intention:
  – To describe only the structure of the document (formatting with CSS)
• XHTML 1.0
  – Well-formed documents
  – Proper nesting
  – ...
• XHTML 2.0
  – Not backwards compatible
  – Reduces scripting
  – Includes XForms and XML Events
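The XHTML 1.0 requirements of well-formed documents and proper nesting can be checked with the XML parser that ships with the JDK. This sketch uses `javax.xml.parsers.DocumentBuilder` and treats any fatal parse error as "not well-formed". `WellFormed.check` is a hypothetical helper. It checks only XML syntax and does not validate against the XHTML DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Checks XML well-formedness (closed tags, proper nesting), as XHTML requires. */
final class WellFormed {
    static boolean check(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Signal problems only by exception instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {}
                public void error(SAXParseException e) {}
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Markup that many HTML browsers tolerate, such as `<b><i>text</b></i>` or an unclosed `<br>`, is rejected here.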

XML Based Languages: XHTML Modularization and XHTML 1.1

[Figure: XHTML Modularization and XHTML 1.1 built from modules. Core modules: Structure, Text, Hypertext, List. Other XHTML modules: Applet, Presentation, Edit, Bi-directional Text, Forms, Basic Forms, Tables, Basic Tables, Image, Client-side Image Map, Server-side Image Map, Object, Intrinsic Events, Frames, Target, IFrame, Name Identification, Legacy, Metainformation, Scripting, Stylesheet, Style Attribute, Link, Base. Other W3C modules (e.g., Ruby Annotation) and private modules can be added]

XML Based Languages: Multimedia
SMIL
• SMIL 2.0: W3C Recommendation (07 Aug. 2001)
• Easy to write, like HTML
• Doesn't define media formats, only integrates them
Timesheets
• Similar to CSS, but for the temporal dimension
• Document composed of:
  – Content: XHTML
  – Formatting: CSS

XML Based Languages: User Interface
XForms
• XForms 1.0: W3C Recommendation (14 Oct. 2003)
• Next generation of web forms
• Not intended as a self-standing document type
• Uses a host language for the document layout (e.g., XHTML, SMIL)
• Advanced user interface features:
  – Text input, select one, select many, submit
• User input can be validated on the client side
• Calculations are also done on the client side
XUL
• XML User Interface Language
• Only supported in Mozilla and Netscape 6 (or later) browsers
• Only for window-based graphical UIs (mobile phones?)
• Abstraction only at the platform level (not at the UI level; voice?)
• It separates:
  – Client application definition and programmatic logic
  – Presentation (using CSS)
  – Language-specific text labels
• Look & feel can be changed as wished
• Interaction achieved by scripting
• Interface elements: windows, menubar, scrollbar

XML Based Languages: Vector Graphics, Voice
SVG
• SVG 1.0: W3C Recommendation (04 Sept. 2001)
• SVG 1.1: W3C Recommendation (14 Jan. 2003)
• Describes vector-based graphics (not pixel based):
  – Shapes (e.g., lines & curves)
  – Images
  – Text
• Drawings can be:
  – Interactive (e.g., mouse clicked)
  – Animated (e.g., change location)
VoiceXML
• VoiceXML 2.0: W3C Recommendation (16 Mar. 2004)
• Creation of audio dialogs (user interfaces)
• Input for the Web:
  – Speech recognition and/or touch tone (keypad)
• Output:
  – Pre-recorded audio and Text-to-Speech synthesis (TTS)
• Describes:
  – Spoken prompts: synthetic speech
  – Recognition of spoken words and touch-tone key presses (fields)
  – Control of dialog flow (menu, form that can be submitted to a server)
  – Telephony control (call transfer)

XML Based Languages: Terminals & Browsers (Desktop)
• X-Smiles: http://www.xsmiles.org
• Opera: http://www.opera.com/
• Mozilla: http://www.mozilla.org
• Internet Explorer: http://www.microsoft.com/windows/ie/
• Netscape: http://home.netscape.com/

XML Based Languages: Terminals & Browsers (Embedded)
• Espial Browser
• Web TV
• Mobile phone

XML Based Languages: Summary
                            XHTML             SVG     SMIL       XForms
Media types
  Audio                     No                Yes     Yes        --
  Video                     No                No      Yes        --
  Text, images              Yes               Yes     Yes        --
Arrangement of the signs
  Spatial                   Flow & Absolute   --      Absolute
  Temporal                  No                No      Yes        --
Interaction                 Links             Links   Links      Full

Multimedia Languages

[Figure: an interactive TV screen ("Please, select a topic by using your remote control", three buttons, and OK) and its MPEG-4 scene graph: an OrderedGroup (scene) containing a MovieTexture (video) and a Transform2D (graphics) with Text and Rectangle (background), ImageTexture (image of topics), and TouchSensor (panel) nodes]

Multimedia Languages: MPEG-4 Overview (1/2)
• Evolution:
  – MPEG traditionally targeted audio/video codecs (MPEG-1, MPEG-2)
  – MPEG-4 is a complex toolkit capable of providing solutions for multimedia applications
• Scene:
  – Composition of different multimedia objects (2D, 3D, video), including their spatial and temporal relationships
• Entry points:
  – BInary Format for Scenes (BIFS):
      • Hierarchical structure (scene graph)
      • Properties: color, size, position, and timing
      • Behavior: BIFS commands (conditional) and animations
  – MPEG-Java: set of Java APIs
  – eXtensible MPEG-4 Textual format (XMT): XML language that describes scenes

Multimedia Languages: MPEG-4 Overview (2/2)
• Some of the scene nodes:
  – Top: root of the graph (e.g., Layer3D and Layer2D)
  – Grouping: containers of multimedia objects
  – Sensor: nodes capable of detecting events (e.g., Time and Touch)
  – Shape: graphical primitives that include two fields: geometry (e.g., rectangle and circle) and appearance (e.g., texture and material)
  – Face: integration of synthetic 3D human-like objects
• Interaction:
  – Sensors detect events and routes distribute them
  – Predefined behaviors: resize, relocate
  – Complex behavior: script or Java
• Widgets:
  – Can be implemented (e.g., sensor + shape)
• Layout:
  – Local coordinates of the objects (more complex automatic layout is not permitted)
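Sensor-and-route interaction can be illustrated with a tiny event model. This hypothetical Java sketch models routing only, not the BIFS encoding. A route copies the value of a source node's outgoing event into a field of a target node. Cascades, where a routed field triggers further routes, are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scene-graph node with named fields. */
final class SceneNode {
    final String name;
    final Map<String, Object> fields = new HashMap<>();
    SceneNode(String name) { this.name = name; }
}

/** ROUTE from.eventOut TO to.field, in the spirit of VRML-derived BIFS routes. */
record Route(SceneNode from, String eventOut, SceneNode to, String field) {}

final class Scene {
    private final List<Route> routes = new ArrayList<>();

    void addRoute(Route route) { routes.add(route); }

    /** A sensor generates an event; every matching route copies the value to its target. */
    void emit(SceneNode source, String eventOut, Object value) {
        source.fields.put(eventOut, value);
        for (Route r : routes) {
            if (r.from() == source && r.eventOut().equals(eventOut)) {
                r.to().fields.put(r.field(), value);
            }
        }
    }
}
```

For example, routing a TouchSensor's `touchTime` to a MovieTexture's `startTime` makes a click start the video. This familiar idiom comes from VRML, which BIFS builds on.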

Multimedia Languages: MHEG Overview
• Content classes
  – Multimedia objects (e.g., video or audio clips)
  – Contained in the MHEG object (small data) or referenced (e.g., filename, web server address)
  – The author can reference smaller sections (e.g., track 5)
• Behavior classes:
  – Synchronization of events and user interaction
  – User interaction
• The action class:
  – Event triggers
  – Sequential and parallel
• The link class: establishes relationships between events and objects, i.e., what actions to take on which objects in response to a particular event
• Selection and modification classes:
  – E.g., push button, checkbox, radio button, slider, text entry field, and text lists
  – Selections, input information, and trigger events

Multimedia Languages: MHEG Example

(scene: InfoScene1
  group-items:
    (bitmap: BgndInfo
      content-hook: #bitmapHook
      original-box-size: (320 240)
      original-position: (0 0)
      content-data: referenced-content: "InfoBngd"
    )
    (text:
      content-hook: #textHook
      original-box-size: (280 20)
      original-position: (40 50)
      content-data: included-content: "1. Lubricate..."
    )
  links:
    (link: Link1
      event-source: InfoScene1
      event-type: #UserInput
      event-data: #Left
      link-effect: action: transition-to: InfoScene2
    )
)
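The link in the MHEG example can be read as an event-action rule. This Java sketch models only that part, assuming events are identified by source, type, and data strings as in the example. `Link` and `MhegEngine` are hypothetical names, not an MHEG engine API.

```java
import java.util.List;

/** Event-action link: when the event matches, transition to another scene (simplified). */
record Link(String eventSource, String eventType, String eventData, String transitionTo) {
    boolean matches(String source, String type, String data) {
        return eventSource.equals(source) && eventType.equals(type) && eventData.equals(data);
    }
}

final class MhegEngine {
    private final List<Link> links;
    private String currentScene;

    MhegEngine(String startScene, List<Link> links) {
        this.currentScene = startScene;
        this.links = links;
    }

    String currentScene() { return currentScene; }

    /** Fires the first link that matches the event and applies its transition-to action. */
    void dispatch(String source, String type, String data) {
        for (Link link : links) {
            if (link.matches(source, type, data)) {
                currentScene = link.transitionTo();
                return;
            }
        }
        // No matching link: the event is ignored.
    }
}
```

With `Link1` from the example, a `#Left` user input in `InfoScene1` moves the presentation to `InfoScene2`, and any other key leaves it where it is.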

Conclusion

• Multimedia
  – Multimedia objects, visual style
  – Spatial layout, temporal dimension
  – Application logic, user interaction
• Four alternatives (from the taxonomy)
  – Compiled languages (C): most efficient, less safe to distribute
  – VM languages (Java): programming language, interoperable
  – XML based languages: most interoperable, less expressive power
  – Multimedia languages: intended for multimedia
• A number of APIs
  – C: OpenGL/Direct-X, DirectFB, SDL, linuxTV
  – Java: AWT, Swing, JMF, Java3D, Java OpenGL
  – XML: XHTML, SMIL, Timesheets, XForms, SVG, VoiceXML

References
1. T. A. Aleem. A Taxonomy of Multimedia Interactivity. Doctoral dissertation, The Union Institute, USA, September 1998.
2. S. Boll. ZYX, Towards Flexible Multimedia Document Models for Reuse and Adaptation. Doctoral dissertation, University of Vienna, Austria, August 2001.
3. D. C. A. Bulterman and L. Hardman. Structured Multimedia Authoring. ACM Transactions on Multimedia Computing, Communications, and Applications, 1(1):89-109, February 2005.
4. P. Cesar. Graphics Software Architecture for High End Interactive Television Terminals. Helsinki University of Technology, Finland, December 2005 (in print).
5. R. S. Heller, C. D. Martin, N. Haneef, and S. Gievska-Krliu. Using a Theoretical Multimedia Taxonomy Framework. ACM Journal of Educational Resources in Computing, 1(1): article number 6, 2001.
6. K. Pihkala. Extensions to the SMIL Language. Doctoral dissertation, Helsinki University of Technology, Finland, November 2003.
7. H. Purchase. Defining Multimedia. IEEE Multimedia, 5(1):8-15, 1998.
8. M. Williams. A Taxonomy of Media Usage in Multimedia. Doctoral dissertation, Nova Southeastern University, USA, May 2003.
9. P. Vuorimaa. Multimedia Technology Course (http://www.tml.hut.fi/Opinnot/T-111.350).