
© 2014 by the authors; licensee RonPub, Lübeck, Germany. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Open Access
Open Journal of Web Technologies (OJWT)
Volume 1, Issue 2, 2014
http://www.ronpub.com/journals/ojwt
ISSN 2199-188X

A Comparative Evaluation of Current HTML5 Web Video Implementations

Martin Hoernig, Andreas Bigontina, Bernd Radig

Image Understanding and Knowledge-Based Systems, Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany, {hoernig,bigontia,radig}@in.tum.de

ABSTRACT

HTML5 video is the upcoming standard for playing videos on the web. Although its specification has not been fully adopted yet, all major browsers provide the HTML5 video element and web developers already rely on its functionality. But there are differences between implementations and inaccuracies that trouble the web developer community. To help improve the current situation, we draw a comparison between the most important web browsers. We focus on the event mechanism, since it is essential for interacting with the video element. Furthermore, we compare the seeking accuracy, which is relevant for more specialized applications. Our tests reveal a variety of differences between browser interfaces and show that even simple solutions may still need third-party plugins in today’s browsers.

TYPE OF PAPER AND KEYWORDS

Short communication: HTML5 video, events, seeking, user agents, browser, evaluation

1 INTRODUCTION

After the Web Hypertext Application Technology Working Group (WHATWG) was founded in 2004, the advancement of the Hypertext Markup Language (HTML) [15] was soon concentrated mainly in this group. Here, HTML is developed without version numbers as a living standard [3]. According to this group, HTML5 has become a misleading name that suggests a completed language [1]. As some parts of the specification are more mature than others (and have more reliable browser support), they argue that a versioning scheme has become impractical for HTML. However, the “official” specification of the W3C [9] aims at a stable snapshot of this living standard. In this paper, we focus only on the W3C version of HTML and refer to it as HTML5, in the same way the W3C does.

HTML5 [9] introduces a standard for the integration of videos or movies into web pages: HTML5 video. Before HTML5, browser plugins (like Flash Player [4] or Silverlight [21]) were usually used. Now, given a standard compliant browser, a web developer should be able to achieve comparable or even better results with native techniques. But are current browsers standard compliant regarding HTML5 video? To examine the situation, we focused our experiments on the most popular browsers and operating systems, which have a market share of over 95% (see, for instance, [2] for browser usage statistics). Hence, Microsoft Internet Explorer [20], Mozilla Firefox [23] and Chrome [13] were tested on Microsoft Windows [22] and Apple Safari [6] on Apple OS X [5] (in each case the newest version was used). The HTML5 standard does not specify a video format. Therefore, the supported video formats vary. We decided to perform our experiments with H.264/AVC [29] within MP4 containers [16], which is the only video format supported by all user agents within our test set. Furthermore, it is a widely used industry standard for applications like broadcast TV or Blu-ray video.


[Listing 1: HTML page with the title “Non-stop video?”]

Listing 1 shows a simple HTML page with JavaScript [7] that loads a video (“sample.mp4”) with standard browser video controls and waits for the video element to receive the pause event. If the video is started and later paused by the user, this event is fired, causing the playback to continue immediately. According to the specification [9], the pause event is also fired when the video ends. Hence, the JavaScript code should also cause an endless loop of the video. Some browsers, however, violate the standard and miss the pause event when the video reaches the end. Therefore, the behavior is browser dependent even in this simple example.

More on events and how the implementations handle them in contrast to the specification is given in Section 2. Section 3 addresses the seeking capabilities. In particular, we check whether the shown frame changes appropriately when the seeked event occurs, by examining the frame number and measuring any deviation from the expected one.

Of course the results of both sections may change with newer browser versions, and our paper only shows the state of the development on the day of its release. But as browsers are often in use for a longer period of time, the issues of current browsers remain the problems of tomorrow’s web developers. Internet Explorer 8, for instance, a browser released in 2009 that does not support the video element, still has a market share of 21.4% five years later according to [2].

1.1 Related Work

Besides the specifications [9, 15] themselves, recent guidebooks to HTML5 video and its possibilities in general are [18, 26, 12]. With HTML5 enabling a new generation of web applications, a wider range of functionalities within a common environment became possible. In earlier times the integration of plugins was necessary to create rich and interactive user experiences. While [...]

2 EVENTS

Events are an important link between HTML and JavaScript and are part of the HTML specification. They are used to communicate all kinds of user input and user agent (that is, browser) occurrences to the running script. In the case of the video element, events are used to handle incidents occurring during the playback of a video. A collection of events important in this section and the conditions under which they are fired is shown in Table 1.

To check the different browsers for wrong event behavior, we created a semi-automatic test setup consisting of video playback with buffer underrun and seeking. Because we had to force the buffer underrun via a reduction of the network bandwidth with an external application [27], we tested this part manually. Our test procedure contains the following commands (1, 2, 5) and conditions (3, 4, 6):

1. load: Load a test video, assure that the network bit rate is high enough for interruption-free playback. Note that autoplay is disabled.
2. start: Start playback (via JavaScript) and play the first seconds.
3. underrun: Force a buffer underrun by slowing down the connection while the user agent tries to receive more data.
4. reset: Reestablish a connection with sufficient bandwidth for interruption-free playback. (The playback should start automatically.)
5. seek: Seek forward to an unbuffered position. (The playback should start as data is forthcoming.)
6. end: The video reaches the end.
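The event part of such a test setup boils down to logging every event that the media element fires, in order. The following is a minimal sketch of that logging step, not the test code used for this paper: the helper name `recordEvents` and the constant `MEDIA_EVENTS` are hypothetical, and the only thing assumed about the target is the standard `EventTarget` interface that a video element also implements.

```javascript
// Media events relevant to the six test parts (names as in the HTML5 specification).
const MEDIA_EVENTS = [
  'loadstart', 'progress', 'suspend', 'stalled', 'durationchange', 'resize',
  'loadedmetadata', 'loadeddata', 'canplay', 'canplaythrough', 'play',
  'playing', 'waiting', 'seeking', 'seeked', 'timeupdate', 'pause', 'ended'
];

// Hypothetical helper: logs the names of all listed events fired on `target`.
// Returns the live log array and a function that detaches the listeners again.
function recordEvents(target, eventNames = MEDIA_EVENTS) {
  const log = [];
  const listener = (event) => {
    log.push(event.type);
  };
  for (const name of eventNames) {
    target.addEventListener(name, listener);
  }
  const stop = () => {
    for (const name of eventNames) {
      target.removeEventListener(name, listener);
    }
  };
  return { log, stop };
}
```

In a page, `recordEvents(document.querySelector('video'))` would be called before the load step so that no early event is missed; the resulting log can then be compared against the expected sequence for each test part.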


Table 1: Summary of events, cited from [9, 4.7.10.16 Event summary]

| Event | Fired when... |
|---|---|
| progress | The user agent is fetching media data. |
| suspend | The user agent is intentionally not currently fetching media data. |
| stalled | The user agent is trying to fetch media data, but data is unexpectedly not forthcoming. |
| loadedmetadata | The user agent has just determined the duration and dimensions of the media resource and the text tracks are ready. |
| loadeddata | The user agent can render the media data at the current playback position for the first time. |
| canplay | The user agent can resume playback of the media data, but estimates that if playback were to be started now, the media resource could not be rendered at the current playback rate up to its end without having to stop for further buffering of content. |
| canplaythrough | The user agent estimates that if playback were to be started now, the media resource could be rendered at the current playback rate all the way to its end without having to stop for further buffering. |
| playing | Playback is ready to start after having been paused or delayed due to lack of media data. |
| waiting | Playback has stopped because the next frame is not available, but the user agent expects that frame to become available in due course. |
| seeking | The seeking IDL attribute changed to true, and the user agent has started seeking to a new position. |
| seeked | The seeking IDL attribute changed to false after the current playback position was changed. |

Others: loadstart, abort, error, emptied, durationchange, play, pause, ended, timeupdate, resize, ratechange, volumechange

Figure 1: Specification conform event sequence. Fired during loading (1), playing (2), buffer underrun start (3) and end (4), seeking (5), and video end (6). Horizontally shown events are fired once in the given order, vertical entries represent events fired repeatedly.

Events shown per test part:

| Test part | Events |
|---|---|
| 1. load | loadstart, progress, durationchange, resize, loadedmetadata, loadeddata, canplay, progress, canplaythrough |
| 2. start | play, tupd, playing |
| 3. underrun | waiting |
| 4. reset | canplay, playing, progress, canplaythrough, timeupdate |
| 5. seek | seeking, waiting, timeupdate, seeked, canplay, progress, canplaythrough, playing, tupd |
| 6. end | pause, ended |

A video processed in a hypothetical standard conform user agent would fire an event sequence according to Figure 1. This is not the only valid sequence as some options exist, e.g. a user agent can suspend the loading step after it has loaded the meta data to reduce bandwidth and fire the suspend event. A stalled event could


also be fired if the buffer underrun (4) lasted too long (user agent specific timeout) and “data is unexpectedly not forthcoming.” [9, 4.7.10.16 Event summary]. However, the simulated buffer underrun (4) is defined to be short enough not to evoke a stalled event. Besides such exceptions, the events are mandatory and have to occur in the given sequence. Since we know the network bit rate and the video bit rate, we can take the canplaythrough event as mandatory as well.

The behavior of the test browsers in the points (1) to (6) is evaluated against the event sequence that a user agent conforming to [9] evokes. A compilation of all events is shown in Figure 1. Our primary test video is “Big Buck Bunny” from The Peach Open Movie Project [28], which is also used as a test video at W3C.

Unfortunately, not a single browser passes our test setup (see Table 2). Only Chrome fires resize on load as required, but it also sends a timeupdate event, which should not be present. While the video start is handled correctly, only Firefox reliably fires the waiting event if the playback has stopped. Safari does it sometimes. We denoted such behavior as unpredictably missing (um) as we have not discovered the underlying pattern. The playback resume is again done right by Firefox (except for the canplaythrough event, see the following list of observations for details). Seeking, in turn, is handled correctly by Chrome. The most important events in this context, seeking and seeked, are present all across the test set. In Internet Explorer a pause event is missing when the video ends.

Observations:

• Internet Explorer fires timeupdate events in waiting state (when no update has to be committed). These events must be fired every 15 to 250 ms during the “time marches on” steps [9, 4.7.10.8 Playing the media resource], which apply if the current playback position changes, which is not the case in waiting state.

• Firefox fires the canplaythrough event after buffering is completed or halted instead of using a bandwidth-dependent solution. In earlier versions of Firefox (e.g. version 28.0) this was handled correctly. Now, this event seems to be misunderstood.

• Chrome uses an optional substep within the resource fetching algorithm [9, 4.7.10.5 Loading the media resource], which is designed for a conservative download strategy if the user does not request the resource actively (e.g. in a preload step). This is not true in our case. The resource fetching fires a lot of suspend events to the video element, which is in this sense a wrong behavior.

• Safari shows some unpredictable behavior if buffer underruns occur, for example:
  – No audio is played after the playback starts again.
  – The specification states that a user agent must show the last rendered frame in the waiting state [9, 4.7.6 The video element], but Safari unpredictably shows a loop of the last frames.
  – The canplaythrough event is fired together with a waiting event, but canplaythrough should only be fired if the new internal ready state is HAVE_ENOUGH_DATA [9, 4.7.10.7 Ready states]. It is a strange behavior if the ready state alternates between HAVE_ENOUGH_DATA and something else without reason, particularly within a buffer underrun.

Considering today’s conditions, a web application cannot rely on a unified user agent behavior. While video loading and end detection are no noteworthy problems, the resize event on load (1) is handled correctly only in Chrome.

Since the presence of most of the events is insignificant for the example from Listing 1, most browsers will show the same result. Internet Explorer, however, will not restart the video after it reaches the end. If a buffer underrun occurs, maybe during a broadcast on a mobile connection, the behavior of the event system is in general unpredictable. Most of the tested browsers showed problems with the handling of the waiting event, which signals that “playback has stopped because the next frame is not available, but the user agent expects that frame to become available in due course.” [9, 4.7.10.16 Event summary]. In this situation, a proper buffer underrun handling is not possible. A web-based player is not able to communicate the situation to the user. Moreover, instabilities during the playback were observed during buffer underruns that made a page reload necessary.

Nevertheless, the events seeking and seeked were present across all tested user agents. In the next section, we are going to examine how reliable they are.

3 SEEKING

In this section we will examine the seeking capabilities of the selected user agents. An implementation compatible with the standard will set the seeking attribute to true and fire a seeking event when the currentTime attribute is changed. When the video data at the requested time is available, seeking has to be set to false and a seeked event must be fired.
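The seeking/seeked handshake described above can be wrapped in a promise. The following is a minimal sketch under stated assumptions, not the test code used for this paper: `seekTo` is a hypothetical helper, the timeout is an added safeguard, and `media` is assumed to behave like an HTML5 media element (an `EventTarget` with a writable `currentTime` attribute). Note that the resolved promise only says that seeked was fired; it does not guarantee that the new frame is already displayed.

```javascript
// Hypothetical helper: sets currentTime and resolves once 'seeked' is fired.
// Rejects if no 'seeked' event arrives within timeoutMs (an assumed safeguard).
function seekTo(media, time, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      media.removeEventListener('seeked', onSeeked);
      reject(new Error('no seeked event within ' + timeoutMs + ' ms'));
    }, timeoutMs);

    function onSeeked() {
      clearTimeout(timer);
      media.removeEventListener('seeked', onSeeked);
      resolve(media.currentTime);
    }

    // Register the listener before changing currentTime so the event cannot be missed.
    media.addEventListener('seeked', onSeeked);
    media.currentTime = time;
  });
}
```

A caller could write `seekTo(video, 12.5).then((t) => { /* inspect the frame at t */ })`, keeping in mind that inspecting the frame immediately may be too early in some user agents.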


Table 2: Event test results. The user agents undergo our tests with some discrepancies. The candidates showed the absence of expected events (missing m, unpredictably missing um) and unexpected events (wrongly fired wf).

| User agent, version, OS | 1. load | 2. start | 3. underrun | 4. reset | 5. seek | 6. end |
|---|---|---|---|---|---|---|
| Internet Explorer 11.0.9600.16521, Windows | m: resize | ✓ | m: waiting | m: canplay, playing, canplaythrough | m: canplay, canplaythrough | m: pause |
| Firefox 31.0, Windows | m: resize, m: canplaythrough | ✓ | ✓ | wrong order; canplaythrough | wf: waiting | ✓ |
| Chrome 36.0.1985.125 m, Windows | wf: timeupdate | ✓ | m: waiting | m: canplay, playing, canplaythrough | ✓ | ✓ |
| Safari 7.0 (9537.71), OS X | m: resize | ✓ | um: waiting | um: canplay, playing | m: waiting, canplay, canplaythrough, playing | ✓ |
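The categories "m" and "wf" used in Table 2 can be derived mechanically from an expected and an observed event list. The sketch below shows one possible way to do this, assuming a simple count-based comparison that ignores ordering, so the "wrong order" and "unpredictably missing" cases are not covered. The names `countEvents` and `classifyEvents` are hypothetical, and the sample sequences are made up for illustration rather than measured.

```javascript
// Counts how often each event name occurs in a list.
function countEvents(names) {
  const counts = new Map();
  for (const name of names) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: compares observed against expected events for one test part.
// missing      -> expected but not observed often enough ("m" in Table 2)
// wronglyFired -> observed more often than expected      ("wf" in Table 2)
function classifyEvents(expected, observed) {
  const exp = countEvents(expected);
  const obs = countEvents(observed);
  const missing = [];
  const wronglyFired = [];
  for (const [name, n] of exp) {
    if ((obs.get(name) || 0) < n) missing.push(name);
  }
  for (const [name, n] of obs) {
    if ((exp.get(name) || 0) < n) wronglyFired.push(name);
  }
  return { missing, wronglyFired };
}

// Illustrative data only, not a measured sequence:
// missing is ['pause'], wronglyFired is ['timeupdate'].
const sample = classifyEvents(['pause', 'ended'], ['timeupdate', 'ended']);
```

Because events such as timeupdate are legitimately fired repeatedly, a real comparison would need to treat them separately; the count-based check is only meant to show the basic idea.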

Table 3: seeking and seeked events are fired appropriately in all tested user agents.

| User agent, version | Operating system | seeking event | seeked event |
|---|---|---|---|
| Internet Explorer 11.0.9600.16521 | Windows | 100% | 100% |

create preview pictures to certain positions in the video.

[Listing 2: HTML page with the title “Snapshot on seeked?”]


To answer those questions, we created a video that stores the frame number in each image. This is simply done by a binary coding of this number and drawing it as black and white blocks onto the image. Some additional checksum bits ensure the correctness of the encoding. When playing this video with a user agent, we can retrieve the frame number by drawing the video onto a canvas element and examining the pixel data.

As before we seek to 1000 positions in the video and check the frame number when the seeked event is fired. This procedure is repeated ten times with different positions. Various issues have been observed, depending on the user agent. The results are summarized in Table 4.

Table 4: We examine whether a seeked event actually indicates a change of the displayed frame and observe several issues.

| User agent, version | seeked event fired without change of the image? (e.g. too early) |
|---|---|
| Internet Explorer 11.0.9600.16521 | Happens in 86.17% of all cases. |
| Firefox 31.0 | Never. |
| Chrome 36.0.1985.125 m | Never. |
| Safari 7.0 (9537.71) | Happens when seeking as reaction to a loadedmetadata event. |

Using Internet Explorer, in 86.17% of all cases it was not possible to retrieve the expected frame number from a video when the seeked event was fired. However, when checking again later, a change in the frame number could be observed (the latencies differed depending on video and test system). It is therefore likely that the seeked event is fired while the video element is still decoding or rendering the requested frame. Under these conditions, taking the snapshot after seeking, as in Listing 2, is not reliably possible. It is unknown which position in the video actually corresponds to the resulting snapshot.

With Safari we observe a seeked event that is fired too early when we seek within a callback of the loadedmetadata event. From the statement of the specification about the HAVE_METADATA ready state that “The API will no longer throw an exception when seeking.” [9, 4.7.10.7 Ready states] we conclude that seeking must be possible with the occurrence of the loadedmetadata event. Except for this special case Safari fires the seeked event as required. Chrome and Firefox also perform these tests without problems.

Next, we analyze the accuracy of the seeking implementation. We give every user agent enough time to actually finish seeking before we check the frame number. The results can be found in Table 5.

We find that Internet Explorer has a constant deviation from the expected frame by two frames. Looking at this problem in detail we can see that the first frame of the video is displayed three times as long as the others. This leads to the observed offset of two frames. Chrome fails to seek to the correct frame in 1.45% of all cases. When the wrong frame is shown, it is one, two or three frames off in our experiments. This might be close enough for some applications, but might trouble others. Finally, Firefox and Safari both seek to the expected frames without deviations.

Altogether, the experiments show that the seeked event is not very reliable. What is the benefit of a seeked event, if we cannot be sure that a new frame is shown when it is fired? Furthermore, we have shown small deviations from the expected frame. These are usually not important and most users won’t even take notice of this issue. However, it prevents the emergence of applications that rely on frame-precise seeking.

4 IMPROVEMENTS TO HTML5 VIDEO

As we are interested in a powerful HTML5 video platform to supersede third party plugins as the leading internet video technology, the portfolio of possible applications should not suffer from a transition to HTML5. Examples for possible media-oriented tasks are:

• video post-processing (edit the pixel data manually, e.g. to change the contrast),
• audio equalization, and
• audio spectrum visualization.

Unfortunately, these applications are currently not possible using pure HTML5. Though the video buffer can be obtained and drawn to a canvas (timer-based), it is not possible to ensure this for every frame, since the frame rate is unknown. The audio data cannot be accessed anyway. We suggest the following improvements to HTML5:

• Querying detailed meta data information (e.g. frame rate, bit rate, etc.)
• Frame-wise stepping, querying frame number
• Audio buffer access and manipulation
• Manipulating pixel data of the video (e.g. introducing a new newframe event to control the manipulation on a canvas, fired after the availability of a new frame buffer)
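For the video post-processing task mentioned above, the per-pixel work itself is easy to express once pixel data is available. The sketch below adjusts the contrast of RGBA data of the kind a canvas `getImageData` call returns. The function name `adjustContrast` and the linear scaling around mid-gray are assumptions chosen for illustration; getting every video frame onto the canvas is still subject to the timer-based limitation described above.

```javascript
// Hypothetical helper: scales each color channel around mid-gray (128).
// `data` is RGBA pixel data (e.g. ImageData.data, a Uint8ClampedArray);
// the alpha channel is left untouched.
// factor > 1 increases contrast, 0 <= factor < 1 reduces it.
function adjustContrast(data, factor) {
  for (let i = 0; i < data.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      // A Uint8ClampedArray clamps the result to the range 0..255 on assignment.
      data[i + c] = (data[i + c] - 128) * factor + 128;
    }
  }
  return data;
}
```

In a browser, the data would typically come from `context.getImageData(...)` after `context.drawImage(video, 0, 0)` and be written back with `context.putImageData(...)`.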


Table 5: The seeking accuracy is tested by measuring any deviation from the expected frame number.

| User agent, version | Operating system | Wrong frames | Average absolute deviation | Maximal absolute deviation |
|---|---|---|---|---|
| Internet Explorer 11.0.9600.16521 | Windows | 100% | 2 | 2 |
| Firefox 31.0 | Windows | 0% | 0 | 0 |
| Chrome 36.0.1985.125 m | Windows | 1.45% | 1.8966 | 3 |
| Safari 7.0 (9537.71) | OS X | 0% | 0 | 0 |
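The measurement behind Table 5 has two computational steps: reading the frame number back from the block pattern and summarizing the deviations. The following sketch illustrates both under explicit assumptions. The paper does not specify the checksum, so a single even-parity bit is assumed here; `decodeFrameNumber` and `summarizeDeviations` are hypothetical names; and the average and maximum are assumed to be taken over wrong frames only. That last reading is consistent with the Chrome row, since an average of 1.8966 cannot result from averaging over all seeks when only 1.45% of them are off by at most 3 frames.

```javascript
// Hypothetical decoder: `bits` holds one 0/1 value per block, most significant
// bit first; the last entry is an even-parity bit (assumed checksum scheme).
// Returns the frame number, or null if the parity check fails.
function decodeFrameNumber(bits) {
  const dataBits = bits.slice(0, -1);
  const parity = dataBits.reduce((sum, b) => sum + b, 0) % 2;
  if (parity !== bits[bits.length - 1]) return null;
  return dataBits.reduce((value, b) => value * 2 + b, 0);
}

// Summarizes deviations between expected and displayed frame numbers.
// Assumption: average and maximum are computed over wrong frames only.
function summarizeDeviations(expected, displayed) {
  const deviations = expected
    .map((frame, i) => Math.abs(displayed[i] - frame))
    .filter((d) => d > 0);
  const total = deviations.reduce((sum, d) => sum + d, 0);
  return {
    wrongFramesPercent: 100 * deviations.length / expected.length,
    averageAbsoluteDeviation: deviations.length ? total / deviations.length : 0,
    maximalAbsoluteDeviation: deviations.reduce((m, d) => Math.max(m, d), 0)
  };
}
```

In a browser, the bit values would be obtained by sampling the canvas pixel data at the center of each block and thresholding its brightness; that sampling step depends on the block layout of the test video and is left out here.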

5 CONCLUSION

HTML5 video is an important step towards the goal of having a unified way of displaying videos on the web without requiring third party plugins. However, when trying to interact with the element via JavaScript and create a user agent independent solution, several issues become apparent. We took a close look at the event system of the browsers and compared their behavior. The comparison between user agents and the specification showed missing events and differences between implementations. As a consequence, among other issues, there is no reliable way of handling an underrun.

The analysis of seeking implementations reveals further problems. The seeked event that is fired too early in Internet Explorer prevents web applications, for instance, from seeking to a position and taking a snapshot of the video. The inaccuracies in seeking in Chrome prevent meaningful frame by frame stepping through the video. Those and other bugs force web developers to find browser-dependent workarounds. As this contradicts the paradigm of feature detection-based web development [11], HTML5 video needs a more consistent application interface in current browsers. If special applications like web video cutting are planned, we have to recommend (based on today’s level of knowledge) the usage of third party plugins, as seeking accuracy and completion are in general no trustworthy information.

We are offering user agent tests, test cases, and test data to developers to evaluate and improve new implementations. We hope the next versions of the tested browsers pass our tests. With a growing distribution of a unified and standard conform video interface, we will see more advanced online multimedia applications and user experiences.

ACKNOWLEDGMENTS

This work was supported by the German Research Foundation (DFG) and the Technische Universität München within the funding programme Open Access Publishing.

REFERENCES

[1] “HTML is the new HTML5,” http://blog.whatwg.org/html-is-the-new-, 2011, Accessed: 2014-09-15.

[2] “NetApplications,” http://www.netmarketshare.com/report.aspx?qprid=0&qptimeframe=M&qpsp=184&qpcustomd=0, 2014, Accessed: 2014-06-10.

[3] “What does “Living Standard” mean?” http://wiki.whatwg.org/wiki/FAQ#What_does_.22Living_Standard.22_mean.3F, 2014, Accessed: 2014-09-15.

[4] Adobe Systems Inc., “Flash Player,” http://www.adobe.com/products/flashplayer, Initial release: 1996.

[5] Apple Inc., “OS X,” http://www.apple.com/osx, Initial release: 2001.

[6] Apple Inc., “Safari,” http://apple.com/safari, Initial release: 2003.

[7] B. Eich, Ecma International, “ECMAScript Language Specification, 5th Edition,” Ecma International, Geneva, Switzerland, Tech. Rep., 2009, ECMA-262.

[8] A. Bakker, R. Petrocco, M. Dale, J. Gerber, V. Grishchenko, D. Rabaioli, and J. Pouwelse, “Online video using bittorrent and html5 applied to wikipedia,” in Peer-to-Peer Computing (P2P), 2010 IEEE Tenth International Conference on. IEEE, 2010, pp. 1–2.


[9] R. Berjon, S. Faulkner, T. Leithead, E. D. Navara, E. O’Connor, S. Pfeiffer, and

[25] S. Pfeiffer and C. Parker, “Accessibility for the HTML5


Bernd Radig received his diploma degree in Physics in 1972 from the University of Bonn and his doctoral degree in Computer Science in 1978 from the University of Hamburg. At the University of Hamburg he also received his venia legendi and completed his habilitation dissertation in 1982. He was Assistant and Associate Professor in Hamburg (1982-1986) and full professor, chair of Image Understanding and Knowledge-Based Systems, Fakultät für Informatik, Technische Universität München (1986-2009). He is a member of the Emeriti of Excellence programme. He was chairman and founder of the Association of Bavarian Research Cooperations (1993-2007), a unique network of scientists, specialising in challenging disciplines in accordance with Bavarian enterprises. In 1988 he founded the Bavarian Research Centre for Knowledge Based Systems (FORWISS), an institute common to the three universities TU München, Erlangen and Passau. He was the general chairman of the annual symposium of the German Association for Pattern Recognition in 1981, 1991 and 2001. He was also the general chairman of the European Conference on Artificial Intelligence (ECAI) in 1988. He is active as organizer and programme committee member of the German-Russian Workshop on Pattern Recognition. He holds the German Order of Merit (1992) and the award Pro Meritis Scientiae et Litterarum of the State of Bavaria for outstanding contributions to science and art (2002). His current research activities are in real-time image sequence understanding for applications in robotics, sports or driver assistance systems.
