It has been a long journey—almost 300 pages about a couple of new elements in HTML. Who would have thought there was this much to learn? Realistically, though, we are only at the start of what will be possible with audio and video in the coming years. Right now we are seeing only the most basic multimedia functionality implemented on the Web. Once the technology stabilizes, publishers and users will follow, and with them businesses, and further requirements and technologies will be developed. For now, however, we have a fairly complete overview of existing functionality. Before giving you a final summary of everything we have analyzed in this book, let us mention two more areas of active development.
A.1 Outlook

Two further topics deserve a brief mention: metadata and quality of service metrics. Both of these topics sound rather ordinary, but they enable functionalities that are quite amazing.
A.1.1 Metadata API

The W3C has a Media Annotations Working Group1. This group has been chartered to create “an ontology and API designed to facilitate cross-community data integration of information related to media objects in the Web, such as video, audio, and images”. In other words: part of the work of the Media Annotations Working Group is to come up with a standardized API to expose and exchange metadata of audio and video resources. The aim is to facilitate interoperability in search and annotation. In the Audio API chapter we already came across something related: a means to extract key information about the encoding parameters of an audio resource through the properties audio.mozChannels, audio.mozSampleRate, and audio.mozFrameBufferLength. The API that the Media Annotations Working Group is proposing is more generic and higher level. The proposal is to introduce a new Object into HTML that describes a media resource. Without going into too much detail, the Object introduces functions to expose a list of properties. Examples are media resource identifiers, information about the creation of the resource, about the type of content, content rights, distribution channels, and ultimately also the technical properties such as frame size, codec, frame rate, sampling rate, and number of channels. While the proposal is still a bit rough around the edges and could be simplified, the work certainly identifies a list of interesting properties about a media resource that are often carried by the media
1 See http://www.w3.org/2008/WebVideo/Annotations/
297 APPENDIX ■ SUMMARY AND OUTLOOK
resource itself. In that respect, it aligns with requests from archival organizations and the media industry, including the captioning industry, to make such information available through an API. Interesting new applications become possible when such information is exposed. An example is the open source Popcorn.js semantic video demo2. Popcorn.js is a JavaScript library that dynamically connects a video, its metadata, and its captions with related content from all over the Web. It essentially creates a mash-up that changes over time as the video content and its captions change. Figure A–1 shows a screenshot of a piece of content annotated and displayed with Popcorn.js.
Figure A–1. A screenshot of a video mashup example using Popcorn.js
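To make the metadata idea concrete, here is a rough sketch of how such a media resource Object might be queried. The makeMediaResource() helper, the getProperty() function, and all property names are hypothetical illustrations of the approach, not the Working Group's actual API:

```javascript
// Hypothetical sketch of a Media Annotations-style metadata lookup.
// The property names follow the spirit of the Working Group's list
// (identifiers, technical properties); none of these identifiers are
// taken from the actual specification.
function makeMediaResource(metadata) {
  return {
    getProperty: function (name) {
      return Object.prototype.hasOwnProperty.call(metadata, name)
        ? metadata[name]
        : null; // unknown properties report "no value" rather than throwing
    }
  };
}

// Mocked-up metadata for an imaginary video resource.
var resource = makeMediaResource({
  identifier: "http://example.com/video.webm",
  frameSize: { width: 640, height: 360 },
  frameRate: 25,
  samplingRate: 44100,
  channels: 2
});
```

A search engine or annotation tool could then ask, say, `resource.getProperty("frameRate")` without caring which container format the value originally came from.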
A.1.2 Quality of Service API

A collection of statistics about the playback quality of media elements is expected to be added to the media element in the near future. Concrete metrics make it possible to monitor the quality of service (QoS) that a user perceives, to benchmark implementations, and to help sites determine the bitrate at which their streaming should start. We would have used this functionality to measure the effectiveness of Web Workers in Chapter 7 had it been available. Even more importantly, if QoS statistics are continuously available, a JavaScript developer can use them to implement adaptive HTTP streaming.
2 See http://webmademovies.etherworks.ca/popcorndemo/
We have already come across adaptive HTTP streaming in Chapter 2 in the context of protocols for media delivery. We mentioned that Apple, Microsoft, and Adobe offer solutions for MPEG-4, but that no solutions yet exist for other formats. Once playback statistics exist in all browsers, it will be possible to implement adaptive HTTP streaming for any format in JavaScript. This is also preferable to an immediate implementation of support for a particular manifest format in browsers, even though Apple has already done exactly that in Safari with M3U8, the format required for delivery to the iPhone and iPad. So, what are the statistics under discussion for a QoS API? Mozilla has an experimental implementation of mozDownloadRate and mozDecodeRate3 for the HTMLMediaElement API. These capture, respectively, the rate at which a resource is being downloaded and the rate at which it is being decoded, both in bytes per second. Further, there are additional statistics for video called mozDecodedFrames, mozDroppedFrames, and mozDisplayedFrames, which count the number of decoded, dropped, and displayed frames for a media resource. Together, these allow a bottleneck to be identified as either a network or a CPU issue. Note that Adobe has a much more extensive interface for Flash4.
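To illustrate how these counters could be put to work, the following sketch classifies a playback bottleneck. It assumes plain objects carrying Mozilla-style statistics (on a real page they would be read off the media element), and the 10% dropped-frame threshold is an invented heuristic, not part of any specification:

```javascript
// Sketch: classify a playback bottleneck from experimental statistics
// in the style of mozDownloadRate / mozDecodeRate (bytes per second)
// and the frame counters. The decision rule is an invented heuristic.
function classifyBottleneck(stats) {
  if (stats.downloadRate < stats.decodeRate) {
    return "network"; // data arrives more slowly than it can be decoded
  }
  if (stats.droppedFrames > 0.1 * stats.decodedFrames) {
    return "cpu";     // the network keeps up, but frames are being dropped
  }
  return "none";
}
```

For example, a resource downloading at 50 KB/s while the decoder could consume 200 KB/s would be classified as network-bound.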
A slightly different set of QoS metrics for use in adaptive HTTP streaming is suggested in the WHATWG wiki5:

• downloadRate: The current server-client bandwidth (read-only)
• videoBitrate: The current video bitrate (read-only)
• droppedFrames: The total number of frames dropped for this playback session (read-only)
• decodedFrames: The total number of frames decoded for this playback session (read-only)
• height: The current height of the video element (already exists)
• videoHeight: The current height of the video file (already exists)
• width: The current width of the video element (already exists)
• videoWidth: The current width of the video file (already exists)

These metrics also reveal the bitrate a video has actually achieved, which can be compared with the requested one to decide whether to switch to a higher or lower bitrate stream. We can be sure that adaptive HTTP streaming implementations will appear shortly after such an API enters the specification and is supported in browsers. This concludes the discussion of HTML5 media technologies under active development.
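With metrics like downloadRate and videoBitrate available, the core switching decision of a JavaScript-based adaptive streaming player can be sketched as follows. The bitrate ladder and the 20% safety margin are assumptions for illustration, not part of the WHATWG proposal:

```javascript
// Sketch of the switching decision behind JavaScript-driven adaptive
// HTTP streaming, using the metric names from the WHATWG wiki proposal.
// The available bitrates and the safety margin are illustrative choices.
var BITRATES = [200000, 500000, 1000000, 2000000]; // bits per second

function pickBitrate(downloadRate) {
  var budget = downloadRate * 0.8; // keep a 20% margin against stalling
  var chosen = BITRATES[0];        // never go below the lowest quality
  for (var i = 0; i < BITRATES.length; i++) {
    if (BITRATES[i] <= budget) {
      chosen = BITRATES[i];
    }
  }
  return chosen;
}
```

A player would call pickBitrate() periodically with the measured downloadRate and, when the result differs from the current videoBitrate, fetch the next media segments from the corresponding alternative stream.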
A.2 Summary of the Book

In this book we have taken a technical tour of HTML5 media.
3 See http://www.bluishcoder.co.nz/2010/08/24/experimental-playback-statistics-for-html-video-audio.html
4 See http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStreamInfo.html
5 See http://wiki.whatwg.org/wiki/Adaptive_Streaming#QOS_Metrics
Fallback solutions—Flash, VLC, and Cortado for Ogg Theora—can help a content provider deliver only a single format without excluding audiences on browsers that do not support that format natively.
The Introductory Chapters

In the Audio and Video Elements chapter we had our first contact with creating and publishing audio and video content through the audio and video elements.
Interacting with other HTML Elements

In HTML5 Media and SVG we used SVG to create further advanced styling. We used SVG shapes, patterns, or manipulated text as masks for video, implemented overlay controls for videos in SVG, placed gradients on top of videos, and applied filters to the image content, such as blur, black-and-white, sepia, or line masks. We finished this section by looking at the inclusion of video as a native element in SVG or through foreignObject. This, together with SVG transformations, enabled the creation of video reflections and the implementation of edge detection. For further frame- and pixel-based manipulation and analysis, SVG isn't quite the right tool. In Chapter 6, HTML5 Media and Canvas, we focused on such challenges by handing video data through to one or more Canvas elements.
Chapter 7 introduced Web Workers: a JavaScript process that runs in parallel to the main process and communicates with it through message passing. Image data can be posted from and to the main process. Thus, Web Workers are a great means to introduce sophisticated Canvas processing of video frames into a web page in parallel, without putting a burden on the main page: it stays responsive to user interaction and can play videos smoothly. We experimented with parallelizing motion detection, region segmentation, and face detection. A limiting factor is the need to pass every single video frame that a Web Worker is to analyze through a message, which massively reduces the efficiency of the Web Worker. There are discussions in the WHATWG and W3C about giving Web Workers more direct access to video and image content to avoid this overhead.

All chapters through Chapter 7 introduced technologies that have been added to the HTML5 specifications and are supported in several browsers. The three subsequent chapters reported on further new developments that have received only trial implementations in browsers, but are rather important. Initial specifications exist, but more work is needed before there are interoperable implementations in multiple browsers.
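The per-frame pixel processing handed to a Web Worker is ordinary JavaScript. For example, the gray-scaling step might look like the sketch below; inside a worker it would run in an onmessage handler and return its result via postMessage. The 0.3/0.59/0.11 luma weights are the common approximation:

```javascript
// The kind of per-frame pixel work offloaded to a Web Worker, shown as
// a plain function. It operates on RGBA pixel data as found in an
// ImageData.data array: four values per pixel, alpha left untouched.
function grayscale(pixels) {
  for (var i = 0; i < pixels.length; i += 4) {
    // weighted average of the R, G, and B channels (luma approximation)
    var y = 0.3 * pixels[i] + 0.59 * pixels[i + 1] + 0.11 * pixels[i + 2];
    pixels[i] = pixels[i + 1] = pixels[i + 2] = Math.round(y);
  }
  return pixels;
}
```

In a worker file this would typically be wrapped as `onmessage = function (e) { postMessage(grayscale(e.data)); };`, which is exactly the message-passing overhead the text above describes.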
Recent Developments

In Chapter 8, on the HTML5 Audio API, we introduced two complementary pieces of work that bring audio data manipulation to HTML5. The first proposal, by Mozilla, creates a JavaScript API to read audio samples directly from an audio or video element so they can be rendered, for example, as a waveform or a spectrum. It also provides functionality to write audio data to an audio or video element. By combining the two and writing processing functions in JavaScript, any kind of audio manipulation is possible. The second proposal, by Google, introduces an audio filter network API into JavaScript with advanced pre-programmed filters such as gain, delay, panning, low-pass, high-pass, channel splitting, and convolution.

The chapter on Media Accessibility and Internationalization introduced usability requirements for audio and video elements with special regard to sensory-impaired and foreign-language users. Transcriptions, video descriptions, captions, sign translations, subtitles, and navigation markers were discussed as formats that create better access to audio-visual content. Support for these alternative content technologies is finding its way into HTML5. We saw the newly defined track element and the WebSRT format.
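As a reminder of how sample-level audio generation works with the Mozilla data API, sine-wave samples like those a tone generator would feed to mozWriteAudio() can be computed in plain JavaScript. The function itself is generic; only the surrounding mozSetup()/mozWriteAudio() calls are Mozilla-specific:

```javascript
// Generate "count" samples of a sine tone in the -1..1 float range
// used by the Mozilla audio data API. On a real page, an audio element
// would first be configured with audio.mozSetup(1, sampleRate) and the
// result handed to audio.mozWriteAudio(samples).
function sineSamples(frequency, sampleRate, count) {
  var samples = new Array(count);
  for (var i = 0; i < count; i++) {
    // one full cycle spans sampleRate / frequency samples
    samples[i] = Math.sin(2 * Math.PI * frequency * i / sampleRate);
  }
  return samples;
}
```

A 440 Hz tone at a 44,100 Hz sampling rate, for instance, is just `sineSamples(440, 44100, 44100)` written out one second at a time.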
I hope your journey through HTML5 media was enjoyable, and I wish you many happy hours developing your own unique applications with these amazing new elements.
319 ■ INDEX
plain transcript for, 255 encoding, 35 putImageData() function, 171–173 with @loop attribute, 21 relative positioned, 56 with preload set to “metadata”, 23 rotated, 70–73 VP8 video codec standardization of, 2–7 description of, 5 styled, 50–51 support by Adobe, 7 support for, 9–10 Texas Instruments and, 5 that transitions on mouse-over, 68–69 videoWidth and videoHeight values for, 95 ■ W video gallery, creating, 74 W3C (World Wide Web Consortium) @videoHeight IDL attribute, 95–96 Audio Incubator Group, 223, 239 video pixels description of, 3 data from, introducing into canvas, 166 Media Annotations Working Group, 297 segmentation of Media Fragment Working Group, 278 playFrame() function, 213–215 WAV audio postFrame() and drawFrame() with @autoplay attribute, 21 functions, 216–217 embedding in HTML5, 21 sepia coloring of with preload set to “none”, 22 painting into Canvas, 204–205 waveform using Web Workers, 206–207 display of, 243–245 video player, custom, building, 130–134 rendering audio samples in, 227–230 video support, introduction of into main web browsers. See browsers; specific browsers browsers and video publishing sites, 6 Web interface definition language (WebIDL), @videoWidth IDL attribute, 95–96 purposes of, 81 viewport, 15 web pages, reloading, 86 vision-impaired users, alternative content web servers technologies for node.js, 289 deaf-blind users, 253 publishing media files to web pages, 36–37 interacting with content, 249–250 Web Subtitle Resource Tracks. 
See WebSRT perceiving video content, 248–249 Web Workers visual elements face detection, 217–222 absolute positioning mode, 59–60 functions and events, 203 box model for IE and, 204 block box type, 55 motion detection with inline box type, 52–54 gray-scaling, 209–210 none box type, 54–55 implementation of, 210–212 overview of, 50–52 overview of, 208 positioning modes, 52 moving operations on video data to thread float positioning mode, 58–59 and feeding back to main web pages, relative positioning mode, 55–58 204–208 scaling and alignment within box, 60–62 overview of, 203 @volume IDL attribute, 92–93 region segmentation, 212–217 Vorbis audio format Web-audio API. See filter graph audio API description of, 4 WebIDL (Web interface definition language), embedding in HTML5, 21 purposes of, 81
320 ■ INDEX
WebM container, 271 WHATWG WebM project (Google), 5–7 adaptive HTTP streaming and, 299 WebM video ConnectionPeer API, 295–296 with @autoplay and @loop, 13 @width attribute of element, 15–17 embedding in HTML5, 10 World Wide Web Consortium (W3C) encoding, 34–35 Audio Incubator Group, 223, 239 with @preload of “auto”, 19 description of, 3 with @width and @height, 15 Media Annotations Working Group, 297 WebSocket API Media Fragment Working Group, 278 message exchange, 289–291 writing with filter graph API, 239–240 overview of, 288–289 shared video control, 291–293 ■ X video conferencing, 293–295 x264 encoding library, 30 WebSRT (Web Subtitle Resource Tracks) XHTML captions, 261–263 inline SVG with video element in, 157 chapter tracks, 264–265 inline SVG with video element in, and grammatical markup, 266–267 circular mask, 158 Karaoke-style subtitles, 265–266 rendering instructions, 264 ■ Y, Z subtitles, 263–264 YouTube and Flash, 2 text description, 259–261