Mobile Vision Mixer

A System for Collaborative Live Mobile Production

Ramin Toussi

Department of Computer and Systems Sciences

January 2011

Advisor: Oskar Juhlin
Second Advisor: Arvid Engström
Examiner: Fredrik Kilander

This master's thesis corresponds to 30 credits.

Summary. Mobile phones equipped with video cameras and access to high-bandwidth networks enable a new generation of mobile applications: live amateur video broadcasting. There are already mobile technologies that allow capturing, editing and broadcasting live video from mobile phones; however, some production techniques still remain exclusively in the hands of professionals, multi-camera filming in a live broadcast of sporting events being an obvious example. This thesis describes a system developed to address these needs for amateur video producers. Specifically, it focuses on providing a real-time collaborative mobile environment for mobile users who are interested in making live video from various angles using only their phones. A user study was also conducted to evaluate the system and to see how people would use it to collaboratively produce a live video. Results from the study show that although producing a live mobile video is not always easy and straightforward, features like a live preview and the ability to actively communicate with each other are of great importance and help.

Acknowledgments

This project would never have become possible without the contribution of several people. First and foremost I would like to thank my advisor and MobileLife centre director, Oskar Juhlin. Thank you for the great support, for trusting me and guiding me through the project. I also want to thank Arvid Engström, my second advisor and Mobility Studio director at MobileLife, whose knowledge and talent have always inspired me. Arvid also led the team while evaluating the prototype in Malmö and Göteborg. When it comes to the evaluation of the work, acknowledgments should also go to Alexandra Weilenmann and Emelie Dahlström, with whom I had fantastic experiences. A great thank you to Mahak Memar and Mudassar Ahmed Mughal, my friends and colleagues at SICS and MobileLife, who helped us run the tests by acting remotely from Stockholm. I also have to mention all those anonymous young people who volunteered to participate in our test sessions. Emelie wrote a separate report about the evaluation. It was written in Swedish, but later on she helped me translate parts of it into English for this thesis and the paper. Tack så mycket, Emelie!

The project, the dissertation and all other related reports and papers received invaluable comments from other contributors to this effort. First of all I want to thank Goranka Zoric (Goga) for the fruitful discussions we had when writing the paper; she really inspired me with her knowledge and patience. This also includes Kari Gustafsson, Michael Kitzler and Per Sivborg, with whom we collaborated a lot on patenting the idea; the thesis, and particularly its technical parts, were greatly inspired by these people. While implementing, I received advice and technical support from people at external companies. Among them I mention Bambuser and MediaLooks, who provided us with resources about their services and who were both smart and fast in responding to my technical questions. In particular, I want to mention Måns Adler from Bambuser and Hanno Roding from MediaLooks.

A very special thanks to Fredrik Kilander, my teacher at DSV, manager of the KTH Interactive System Engineering programme and examiner of this thesis. I believe I was very lucky to meet you and have you as my examiner. You impressed me most with your kindness, patience and commitment to your job and to your students. The present dissertation also received several excellent comments from you. Thank you so very much. Acknowledgments as well to all the people at MobileLife, SICS and the Interactive Institute, for every wonderful moment we shared together. And last but not least, I have to thank my family and friends from the bottom of my heart. My parents, for always being there; my father, who has always been a real father, supportive, caring and compassionate. To mum, for all the good you have in your soul, your love and devotion. My brother and his wife, for all the laughter we had together, and because you will be amazing parents soon!

Contents

1 Introduction
   1.1 Research Problem
   1.2 Methodology
   1.3 Contribution
   1.4 Layout

2 Background
   2.1 Video in HCI and CSCW
   2.2 Video Production
      2.2.1 Professional Production
      2.2.2 Amateurs and semi-professionals practices
      2.2.3 Comparison and conclusion
   2.3 Mobile Broadcasting, More Mobility
   2.4 Related Work

3 System overview
   3.1 Inspiration and Implication for Design
   3.2 Employed Technologies and Components
   3.3 Ideal Architecture
   3.4 Use Scenario
   3.5 First Attempts and Lessons Learnt
   3.6 Implemented Architecture
      3.6.1 Bambuser API
      3.6.2 Broadcasting with Adobe Flash Media Live Encoder
      3.6.3 Combiner Process
      3.6.4 Switch Process
      3.6.5 Vision Mixer Mobile Application
      3.6.6 Communication
   3.7 Further Technical Improvements

4 System Evaluation
   4.1 Method and Setting
   4.2 Study Results
   4.3 Problems Found
   4.4 Discussion

5 Conclusion and Further Work
   5.1 Further Work

References

List of Figures

1.1 SKY Sport24 news channel
1.2 A typical combination of live streams
1.3 The Mobile Vision Mixer application prototype running on a Nokia N86 8MP phone

3.1 FLV playback with DirectShow in GraphEdit
3.2 Vision Mixer Architecture
3.3 Mobile Vision Mixer in Operation
3.4 Simplified Vision Mixer Architecture
3.5 Mobile Vision Mixer in Operation, in a Simplified Architecture
3.6 Combiner Flash component layout
3.7 Abstract Model of Communication and Data Flow in Mobile Vision Mixer
3.8 Conceptual Design of the Ideal Video Combiner and Switch Component Integration

4.1 Evaluation in Malmö
4.2 Codirecting with MVM

List of Tables

3.1 Some of the available metadata fields contained in a typical video object returned by the Bambuser "getVideos" API function

1

Introduction

This thesis reports on the Mobile Vision Mixer (MVM) system, an application prototype that provides mobile users with a real-time collaborative environment in which they can make a live broadcast of any event using their own mobile phones. It can be particularly useful for mobile users who are interested in video practices. This work can be regarded as a significant step forward in mobile video; between July and December 2010 it attracted some press interest1 and was designated an innovation, while a patent is also pending and is expected to be finalized soon2.

With features like video cameras and high-bandwidth network access integrated into recent (2010) mobile phones, mobile users now have a first-hand ability to create social media. Mobile phones have thereby moved beyond being devices for communication and passive media consumption. This integration, by taking advantage of a defining characteristic of mobile devices, being available everywhere and at all times, has led to the emergence and development of new services for the immediate publishing and sharing of live images and video [16, 26, 27]. ComVu Pocket Caster, launched in 2005 and later renamed Livecast, was the pioneer in live mobile video publishing. In the years that followed, more services were introduced, such as Qik, Kyte, Flixwagon, Floobs, Next2Friends, Stickam, Ustream and Bambuser3; among these, Qik and Bambuser are the most widely used [22].

Employing this sort of "capture-and-share-straightaway" [23] service allows people to instantly share their captured mobile images through manageable web pages instead of using emails, paper prints and web publishing [23, 26]. Mobile phones in this way enhance a shared experience among the spectators of a live event. Moreover, in distributed events like car rallies or bicycle races, this experience becomes even more enjoyable [16, 18]. However, results from previous studies show that although these live mobile broadcasting services are available to individuals, merely allowing people to broadcast from their mobile devices is not enough.

1 e.g. http://www.metro.se/2010/09/22/49027/gor-gerillatv-med-din-mobiltelefon/
2 European Patent Office, under Rule 19(3) EPC
3 http://bambuser.com/

This includes situations in which a group collaborates to create a live show, such as sporting events or live TV interviews. Accordingly, challenges remain for the designers of these services to provide their users with features that so far have been exclusively in the hands of professionals [22].

The production of live TV shows usually takes place under time-critical conditions [19] and needs to be highly coordinated among all members of the production team. Moreover, events like team-based sports may be distributed over a large area or unfold too fast (as in ice hockey and football) to be covered by a single camera; hence the need for the real-time coordination of several cameras is strongly felt [19]. In such multi-camera settings, each camera starts filming from a position defined by the director, and the corresponding video streams are simultaneously transmitted to the production control room. There, the director, with multiple views of both live and pre-recorded items on an array of monitors, can manage a suitable selection and combination of streams to give the spectators of the final broadcast the best viewing experience. The main role in the production room is played by the director or Vision Mixer (VM), who also controls the switching operation, processing amplifiers (ProcAmps) such as brightness, contrast, hue and saturation, as well as instant replays and the communication between team members [19, 25].

Video production in this sense should be considered an interactional process which demands extensive collaboration to provide the director with efficient direction capabilities [14]. The spectators, consequently, enjoy the final outcome of this collaboration, the seamless broadcast of the event from multiple angles, without ever being aware of what happens behind the scenes [19]. Figure 1.1 shows the SKY Sport24 news channel production control room. In this picture, the director and his staff, together with the mixing console, video sources, preview monitors and other equipment, are visible.

With the advancements in digital and mobile technology, contributions from amateurs to video production have become possible. Phones with advanced functionality are emerging that can advance current amateur mobile video technologies by establishing distributed real-time collaborative environments [16, 18, 25]. The area has recently gained more interest in research and a growing field of practice is visible as well; yet to enable users to experience real video work, more research and studies are needed [19, 22].

There are already mobile technologies for capturing and sharing multimedia content; some current devices like the Nokia N-Series, Apple iPhones and Android phones also support basic post-recording editing functions, such as rough cuts and ordinary transitions, that are useful for individual spectators who want to capture and share moments with others.

Fig. 1.1. SKY Sport24 news channel production control room4

However, no solution exists to address the more advanced requirements of mobile video users. It has also been argued that, with the increasing interest in this area and to support amateurs with more robust and effective collaboration on the Internet, applications like a mobile mixer need to be designed to allow viewers to collaborate in such a shared experience [16, 18, 25]. The Mobile Vision Mixer (MVM) prototype targets these needs by providing users with freedom, liveness and coordination of the task. This work mainly addresses the mobility and collaborative aspects of amateur video making.

MVM consists of a mobile application and a remote service setup for choosing and receiving a group of four live video streams from users broadcasting an event with their mobile phones. The mobile application presents a live preview of each stream to its user. The user can then select a video stream from the preview for broadcast to the Internet. This broadcast can immediately be shared publicly via social networks or a personal webpage. Figure 1.2 depicts a possible mix of streams from four different cameras in an ice hockey match.

MVM is presented in the current discourse. It is an innovative example in which a collaborative mobile environment enables a group of users to co-produce and broadcast live footage using only their mobile phones. MVM is developed as a functional prototype to probe the feasibility of mobile collaborative live video production applications and to investigate how people would use such systems to make a video. In the course of the work, room for further work was discovered, and some valuable results were found, such as an agenda of system properties and other challenges in developing similar and more advanced systems.

Fig. 1.2. A typical combination of live streams

The system consists of a backend mixing and switching application, a web service, and a mobile application for any mobile phone that supports the Adobe Flash platform [6]. Bambuser is used to address the live streaming needs. The system thus includes four mobile camera operators streaming from their mobile phones, while the user who runs MVM on his phone (the director) sees a quadruple live preview of them (mixing) and can select one for broadcast at any moment (switching).

The system is created to let mobile users coordinate a live mobile video production in a multi-camera setting. Previous studies also reveal that live TV shows or sporting events become more attractive and understandable to viewers if multi-angle shots, such as wide and medium shots or detail and overview shots, are provided [14, 18, 25]. MVM addresses this need through its seamless switching capability. Figure 1.3 depicts a running instance of MVM. MVM can make co-directing a live broadcast enjoyable by having all participants join the production task; a live preview of every stream provides the director with a better understanding of the collaboration. The author hopes that this work can influence the design of mobile video services.

1.1 Research Problem

Current technologies do not allow amateur video producers to collaboratively make and broadcast live footage. Addressing this problem is the main focus of this dissertation; by presenting a system prototype, the thesis belongs to the Artifact Development type5.

Fig. 1.3. The Mobile Vision Mixer application prototype running on a Nokia N86 8MP phone

Chapter 2 justifies why such a system is required and which aspects were found missing in previous works. Briefly speaking, bringing extensive mobility to video producers and providing mobile video users with a collaborative tool, all together in one mobile system, is the main research problem of this thesis.

1.2 Methodology

Given that the task was to create a functional prototype in a limited time (February to May 2010), it was realized that traditional user-centered design methods, which expect HCI designers to begin by observing the target group, defining their requirements and designing for them, might be too time-consuming. The alternative was to put more effort into designing and implementing the prototype. The main work of this thesis can be divided into the following steps:

1. Pre-studies
2. Design and implementation of the prototype
3. System evaluation

The first step consisted of a literature review, examining similar works, becoming familiar with related technologies and trying out possible solutions for the prototype.

5 Thesis Information, Version 4, May 3, 2010, available at http://dsv.su.se, last visited: December 10, 2010

The next step focused on the actual implementation of the system, which is described in detail in chapter 3. System evaluation, as the final phase, was expected to provide essential feedback about users' experience of the system. To this end, user studies were conducted and the test sessions were video recorded for further analysis. The study method as well as the results found are presented in chapter 4.

1.3 Contribution

The work was carried out as part of my master's thesis project at the MobileLife research centre6 in Kista, Stockholm, between January and July 2010. I was mainly responsible for designing and developing the entire MVM system. The evaluation of the project was not within the scope of my task when the project started; it was later decided upon and carried out under the supervision of the same working group, morevideo!7 at MobileLife.

The main contribution of the thesis is the MVM prototype itself, a mobile system that lets its users co-produce a live video. The successful launch of the system demonstrated the feasibility of similar and more sophisticated systems. Along the way, my colleagues and I also learned a lot, particularly from the user studies, from seeing people in collaboration and from uncovering other related aspects.

1.4 Layout

After presenting an introduction to the main idea of the work in this chapter, the rest of the dissertation is outlined as follows. Chapter 2 gives a background to the topic, mentioning the importance of video in HCI8 and CSCW9, followed by an overview of professional and amateur video production interests and practices. Mobile broadcasting services as well as some examples from previous works are also presented in the same chapter. Chapter 3 then gives an overview of the design and implementation of the system. Technical aspects are discussed in detail by presenting the architecture of the system, how it is implemented, the use scenario and the system components. Expected technical improvements are also briefly presented.

6 http://mobilelifecentre.org/
7 http://mobilelifecentre.org/project/show/3
8 Human Computer Interaction
9 Computer Supported Collaborative Work

The evaluation process, covering the two user studies conducted, is briefly discussed in chapter 4. The results of the evaluation as well as the problems found with MVM are also described in that chapter. The dissertation concludes with chapter 5, which also discusses future work to improve the system.

2

Background

When it comes to video production in CSCW, video as a social medium attracts interest from researchers in two ways, since it can be considered both a mediated channel for communication and a topic of concern in itself [25]. Technologies have been developed to provide a certain amount of collaboration between members of a video production team; they are usually supplied along with fixed facilities like TV studios, production rooms or custom-fitted buses, which provide the production team with some degree of fluidity. The concept of mobility, however, has been widely ignored in CSCW [24], while with the recent advancements in mobile technologies more research effort on live interaction with visual content is expected. This movement includes improvements in the processing power of mobile phones and the spread of high-bandwidth mobile Internet networks like 3G and, recently, 4G (2010); at the same time, the costs of high-speed network access and storage are decreasing [19].

On the consumer side, the situation is even worse: there are no real practices that provide ordinary people with collaborative video production, even though several products have been developed that allow some fluidity between individuals on fixed devices, which limits users to those devices' domain [24, 25].

These issues are discussed briefly in the following subsections. First, video as a visual social medium and its role in HCI and CSCW is discussed. Then, video production is presented from two perspectives: what happens in professional production and what amateurs do, followed by a comparison of the two. The concept of mobility in collaboration, together with how camera phones are used in video production, is described next. Then, mobile webcasting and the services available so far are investigated; background on previous research and similar works is provided, and the findings from previous research and the implications for design are outlined at the end.

2.1 Video in HCI and CSCW

The topic of video has long attracted a vast and growing interest in HCI, covering different concerns such as production, live and non-live media, streaming, user-generated content, and video as a means to support collaboration [19, 22, 25]. Perry et al. [25] show how these topics, although usually considered separately, come together in the practice of TV production. They also discuss how collaborative work is supported and how the coordination of multiple people takes place around and through the video material.

It was mentioned in chapter 1 how live TV can be regarded as an interactional process. To manage the production in a meaningful way, an enormous amount of coordination and collaboration between the actors is needed; here, the turn toward user involvement has resulted in a broader focus on CSCW, allowing non-professionals to collaborate and coordinate the production of live media, such as broadcasting an event as it happens. With the advancements in multimedia computing, high-bandwidth networks and mobile video-enabled devices, the issue has gained even more interest and has also been an incentive for designers to support amateurs' contribution to video production [19].

The relationship between video and interaction has been a longstanding topic of concern in HCI for the past 20 years [19]. The main goal is to help both professionals and amateurs not only create their own video data but also access this complex data in the best possible ways and with the most suitable techniques, such as browsing, editing and summarization [19].

2.2 Video Production

There has been notable development in video production during the past decades, focusing mainly on digital facilities for live and post-production practices. In a typical digital production, the process starts with a setting of multiple cameras producing and proposing shots from different angles; at the same time, the director watches these images and selects one for broadcast by pressing the corresponding button on a switchboard. Examples of this are the broadcasting of live sports events, where the action is fast and needs to be covered from multiple views, and the production of live interviews, where the director needs to switch between wide and medium shots to cover both participants and interviewers. In such settings, where camera operators and the director co-produce the media, cameras are the starting point in a process enabled by interaction through technological interfaces for the simultaneous transfer of images, switch operations, and mediated talk between participants [14, 19, 27].

In the traditional way of filming, cameras are brought to events, and scenes are recorded and shared with a selected audience later. With the emergence of camera-enabled mobile phones, however, more spontaneity in filming is visible. This spontaneity can be seen both in capturing video and in sharing the created media afterwards, through Bluetooth, email or playback directly on the device screen. The presence of mobile devices everywhere, at any time and for any purpose has in this way affected the traditional ways of video production [16, 18].

A more professional setting of video production consists of multi-camera work and vision mixing, as in live sports broadcasts, TV interviews and studio productions. In a typical configuration with five cameras, two produce close-up shots of the guests, two others are positioned behind the participants taking close-ups of the interviewers, while the fifth camera is responsible for wide and spontaneous pictures [14]. The end product of this effort is a series of shots produced through a carefully coordinated collaboration among team members [14, 19]. The production cycle, together with the interests and methods of both professionals and amateurs, is discussed in more detail below.

2.2.1 Professional Production

Professional video production is a process that demands a high degree of collaboration among the team members, including the camera operators, the director (or the Vision Mixer) and their equipment. In the case of a live sports broadcast or a live TV show, this setting might also include instant replay operators, commentators and interviewers. These two cases are taken as models below to describe some aspects of the professional production process. The final outcome of a production team's collective work is a series of shots delivered to the audience. In such collaborative work, the media itself is a means of collaboration that organizes the process [25].

Depending on the event being broadcast, its spatial distribution, length, speed and other factors, the best shots need to be selected by the director and presented to the remote viewers. This selection gives the audience an appropriate understanding of the event by letting them feel both its progress and its liveness. To coordinate the task, the director talks to the camera operators through a headset while they propose shots through their filming; he can also light a tally lamp on the selected camera to show that it is selected and needs to remain steady, not seeking new shots, panning or zooming, until the next camera is selected for broadcast. This method of communication and collaboration between team members is called "proposal-acceptance"; it has enabled the mixing of overview shots, like the total unfolding of the action, with details, like participants' expressions. Video in this way is used both as the topic of concern and as a means of collaboration [16, 25].

To sum up, camera operators present and suggest their work through filming while they receive feedback from the Vision Mixer (VM) on their selection, shown by the tally lamp. Other means of communication, like an audio link or gestures in cases where verbal communication is not allowed, may also be available [25].

The main process of direction and production takes place in the production control room, where the vision mixer and others orient toward the broadcast. The main role in this studio is played by the vision mixer, who continuously identifies the potentially selectable feeds from different angles and previously recorded shots and manages a selection for the broadcast. In live sports production, the vision mixer also cooperates with the instant replay operator to replay key sequences immediately as they occur in the game.

The setup of the production control room also includes an image gallery displaying all sources together with preview monitors. A backchannel is likewise provided for giving instructions to others both inside and outside the studio. The outcome of the co-development in this room is a balanced and dynamic assembly of images covering the action from various viewpoints, providing the remote viewers with an engaging, ongoing experience of the action [19, 25]. This intersection of the interactional process of video production and emerging new technologies has been a significant motivation for recent research interest in video production and consumption.

2.2.2 Amateurs and semi-professionals practices

The chief difference between professional and amateur video work is in how the cameras are used: first in the types of cameras, and second in their setup configuration. Kirk et al. [23], in their investigation of non-professional home video makers, describe how teenagers (the focus group of their study) experience the practice of video production. Most of the teenagers they spoke to preferred to use their mobile phones rather than video cameras, as they found no benefit in buying cameras. The spontaneity of the action and the level of involvement in it were found to be two other topics of concern. As noted in section 2.2, this spontaneity is also visible in capturing and sharing the media. As for the level of engagement, users do not want to be too absorbed in the filming; they prefer to participate actively as well. This also explains why video cameras are not carried around constantly [23]. Camera phones, on the other hand, can be brought everywhere and at all times. They are easily carried and, while in use, provide their users with a high degree of mobility. However, to collaborate around the topic of interest, the positions of the filming devices need to be negotiated between the director and the camera operators [25].

Engström et al. [19] also argue that amateur video production could be improved if the participants can see and get enough information about what other group members are doing. To this end, a video backchannel could be provided, for example, to enable the team to communicate over and through different modalities like text and speech interfaces. As a result, by using mobile phones as video cameras and letting people actively engage in producing footage and consuming it at the same time, the amateur video production experience could be drastically improved [18, 23]; yet little attention has so far been paid to designing services or adapting modern technologies to address these needs [25, 27].

2.2.3 Comparison and conclusion

The difference between amateur and professional video practices lies not only in which equipment and devices are used, but also in how they are utilized in the process. Professional video production starts with a set of video cameras brought to the location and might also include a control unit or a central place of operation (e.g. a custom-fitted bus), while amateurs might have no interest in following this setup. In the latter case, camera-enabled mobile phones can be an alternative to video cameras that also provide their users with a greater degree of fluidity and mobility.

An extensive degree of spontaneity has also been seen in mobile video making and consumption, whereas with professionals the recorded material is usually edited and shared with the viewers afterward. This is because mobile-created materials are rarely planned in advance and are often used to "enhance the moment" [23]. In this manner, post-production work like editing seems unnecessary to their producers [22, 23].

In a multi-camera setting, the communication between team members is of immense importance. Methods like proposal-acceptance are used to increase awareness within the whole team; tally lights as well as body gestures are other means of communication. However, there are no current technologies that provide non-professionals with collaborative mobile video production, and challenges remain for the designers of these services to support amateur collaborative video practices. Allowing people only to broadcast from their phones does not seem to be enough [22].

With the MVM prototype, I wanted to take advantage of the unique characteristics of mobile phones, their extensive mobility and spontaneity in action, to provide users with basic mixing and switching functions in a collaborative environment. In this manner, amateur video producers can coordinate their own footage of any event.

2.3 Mobile Broadcasting, More Mobility

Juhlin et al. [22] define mobile broadcasting, its characteristics and features as follows. Mobile broadcasting services are new enhancements on mobile phones that allow users to capture and broadcast live video from their devices to web interfaces on the Internet in real time. The web application lets people browse these live and archived feeds and interact with their producers. In this sense, mobile broadcasters, simply by grabbing their phones, can share their moments through the Internet with an unlimited audience. Mobile webcasting services typically provide the following features:

• Immediate sharing of live video from mobile phones to websites
• Archiving of the live videos for later review
• Distribution of the live feeds via social networks, email or embedding in other web pages
• Title and GPS location descriptions
• Live chats and online commenting

Mobile broadcasting is similar to mobile video conferencing systems and webcam live video chats in that all provide immediate sharing of the captured video [22]. It differs, on the other hand, in that the cameras are wireless and allow users to capture from anywhere within the coverage of mobile networks, targeting thousands of online viewers [22]. Taking advantage of the most remarkable characteristic of mobile camera phones, their ubiquity and being always present and reachable, users can also benefit from the combination of capture and immediate remote sharing of images that has not been possible before with traditional or digital cameras [22, 26]. Broadcasting from a handheld has its own restrictions, such as small screens and limited interaction. Mobile streaming users also lack a key feature, editing; but it is still evident that mobile broadcasting is a growing medium with a growing number of users [22].

2.4 Related Work

Engström et al. present the Instant Broadcasting System (IBS), formerly known as SwarmCam [16, 17], with which people, using their mobile phones connected through mobile networks such as 3G together with a laptop computer, can collaboratively produce, edit and broadcast live video. IBS also provides its users with live features and techniques, such as real-time loop editing and a backchannel for communication, that were previously available only to professional TV producers. Possible use scenarios include VJ systems in nightclubs that support visitor-generated video, or parents who want to broadcast live images of their children at events like football matches. IBS is a considerable step forward for non-professional collaborative video production, although it lacks full user mobility by requiring the vision mixer to use a stationary computer. MVM takes up the successful aspects of IBS, but extends their mobility by allowing the vision mixer to run on a mobile phone.

InstantShareCam is another example in this area [27]. InstantShareCam is a service concept that targets ordinary citizens holding video cameras. It operates on wirelessly networked cameras and allows its users to collaborate simultaneously to capture, edit and view the coverage of an event in real time. InstantShareCam, however, relies on wireless networks and personal video cameras, which restrict its degree of mobility and its target population respectively.

Bambuser also presents a web application that gives its users the ability to collaborate and co-produce their footage in a multi-camera setting1. It is an online event manager which allows users to get together through the web interface and add their cameras to each other's events. The user who has set up the event then holds control over the vision mixer and at any moment can select one camera for the broadcast. Those who are watching the event through the Bambuser website are presented with a mixed stream consisting of videos from all participants.

With MVM, an innovative mobile system has been set out for non-professionals who are interested in collaborative video practices. Even though the current work only provides basic mixing and switching functions, it will hopefully become a strong foundation for the development of further mobile multimedia systems.

1 http://blog.bambuser.com/2008/04/core-vision-from-start-with-bambuser.html (accessed January 20, 2011)

3

System overview

The Mobile Vision Mixer (MVM) prototype has been implemented to allow a group of people to collaboratively film and coordinate the broadcast of an event. The current work enables collaboration between four camera operators, who live stream their desired scenes to Bambuser using their phones, and a director, who runs the Vision Mixer (VM) application on his phone and can switch to any of them for the final broadcast at any moment. The system has been implemented in a two-tier architecture with a web service in between to connect the tiers and handle message passing. However, due to the project purpose and the time limit, its scope had to be narrowed down in some ways, as explained in section 3.5. The following subsections explain the design and implementation issues of the latest version of MVM at the time of writing this dissertation.

3.1 Inspiration and Implication for Design

Perry et al. [25] have shown that there is no real technology to support amateurs' collaborative video production. Their research has also determined how distinctly the need is felt to design consumer technologies that assist inexperienced users with collaborative live video production [25]. The present work was inspired by these facts; the main goal has been to develop an approach that lets people collaboratively film and broadcast an event. In this, several issues needed to be considered.

First, Juhlin et al. [22] argue that only providing users with live video streaming from their mobile phones is not enough, and that mobile webcasting has not fulfilled the potential researchers expected of it. This has led them to claim that: "There remains a challenge for the designers of these services to develop the concept in order to support people's appropriation and thereby democratize a medium which up to now has been entirely in the hands of well-trained professional TV-producers."

Second, since the product is intended to be used by almost every mobile user, the design should be simple enough that even amateur users can use it. To address this, Bambuser was chosen as the live broadcasting service. It is easy to install, learn and use, and it is one of the most popular mobile live broadcasting services [22]. Its low latency was another reason for this choice.

Finally, the most challenging issue has been the live preview of what all the participants are filming. It has been argued that in cases like VJing, static thumbnails of prerecorded material can be enough for basic recognition, yet that would never be sufficient for a live video production [16]. Later evaluations and comparisons also confirmed this [15]; it was likewise expected from the beginning that a live preview would give a better impression. These three considerations, as well as the team's previous experience, resulted in the design of a mobile mixer with a quadruple preview window of Bambuser live broadcasts. The mixer also allows its user to switch seamlessly between these streams for the final broadcast.

3.2 Employed Technologies and Components

The following services, technologies and components are used in MVM:

1. Bambuser: Bambuser is described by its founders as follows [1]. It is an online live streaming service that lets its users broadcast instantly from their mobile phones or desktops and share the streams straight away with followers all over the net. Bambuser can easily be integrated with social networks like Facebook, Twitter and Myspace and also with users' blogs and websites. In the current work, Bambuser is used to carry the individuals' broadcasts; it is also utilized to stream the combined video and to handle the switching operation. Here, we have benefited from its capability of receiving broadcasts from a desktop application, Adobe Flash Media Live Encoder in this case. Bambuser also provides its users with an API for fetching complementary data about any activity, such as the date, time, title, size and location of the broadcast; if the caller is eligible, a direct URL pointing to the location on Bambuser's storage that holds the actual FLV file of the broadcast is also returned.

2. FLV: Flash Video, or FLV, is a file format originally developed by Macromedia but currently owned and supported by Adobe. According to [8], FLV is used to deliver video over the Internet using Adobe Flash Player versions 6 to 10; it is gaining growing interest from online video providers like YouTube, Google Video and Metacafe as well as services like BBC Online and Reuters.com. FLV files can contain multimedia content encoded with the VP6, Sorenson Spark, H.264 and HE-AAC codecs. FLV content can also be embedded in SWF files. Flash Video is viewable not only on most operating systems but also on the majority of video-enabled mobile devices.

3. SWF: According to [11], SWF or Small Web Format is a container for multimedia content, originally developed by FutureWave Software, transferred to Macromedia and now owned by Adobe. SWF is intended to contain animation, applets, ActionScript code and FLV videos. Its small size enables SWF components to be easily published to the web. FLV files embedded in SWF containers can be watched on most operating systems and a variety of mobile phones.

4. Adobe Flash: Adobe Flash is a multimedia platform created by Macromedia and later transferred to Adobe. Flash is typically used to create rich Internet applications by adding animation, visual content, multimedia and interactivity. Flash content can be displayed on most operating systems, some electronic devices and mobile phones [6].

5. Adobe Flash Lite: Adobe Flash Lite, the lightweight version of Adobe Flash as its name implies, is the Flash platform for mobile phones and portable electronic devices. It enables its users to view multimedia content on their devices. Flash Lite supports ActionScript, which brings some degree of interaction to its users. The latest version of Flash Lite, 3.0, is based on Flash and also supports the H.264, On2 VP6 and Sorenson video codecs, which means it can play FLV videos [12].

6. ActionScript: ActionScript is a scripting language primarily developed by Macromedia to run and control simple animations, but later owned by Adobe, targeting Adobe Flash platforms on web pages and portable devices in the form of embedded SWF files [5]. With the release of recent versions of Adobe Flash, ActionScript is now an object-oriented programming language that provides more interaction and features like video playback and control (since ActionScript 3) [5].

7. Microsoft DirectShow: The Microsoft DirectShow application programming interface (API) is the media streaming platform of Microsoft Windows, providing high-quality capture, editing and playback of multimedia streams. DirectShow supports a variety of multimedia formats such as Advanced Systems Format (ASF), Motion Picture Experts Group (MPEG), Audio-Video Interleaved (AVI) and MPEG Audio Layer-3 (MP3). By providing access to the underlying stream control architecture, DirectShow can be extended to support new formats such as FLV [4]. A common DirectShow application is based on a set of connected components (filters) for receiving multimedia content as well as for parsing, presenting, changing and streaming it. Filters are embedded and connected to each other in a container called a graph, which is in charge of running and controlling the media stream through the filters [4]. DirectShow graphs can be built and controlled from programming languages like C++, C#, VB.NET and Java. Additionally, for testing, simulation or rapid development, tools like Microsoft GraphEdit exist that allow developers

to find their desired filters as building blocks and to build up a graph using drag and drop [4]. Figure 3.1 shows a simple DirectShow graph for FLV video file playback in GraphEdit; a minimal C# sketch of building and running such a graph programmatically is given after this list.

Fig. 3.1. FLV playback with DirectShow in GraphEdit

8. MediaLooks Flash DirectShow Source Filter: Adobe Flash content is not internally supported in Microsoft DirectShow; yet with the MediaLooks Flash DirectShow Source Filter, Flash media (.swf and .flv) can be played back in DirectShow graphs via the native Flash runtime [9].

9. Adobe Flash Media Live Encoder: According to [13], Adobe Flash Media Live Encoder is described as follows. It is an application for capturing, encoding and streaming multimedia content to Adobe Flash Media Server or the Adobe Flash Video Streaming Service. Through its interface, Flash Media Live Encoder provides its user with audio and video capture and live streaming features. It is also possible to run it from the command line.

10. MediaLooks MultiGraph Sink/Source: This is a set of DirectShow filters developed by MediaLooks to allow the transfer of multimedia content between different DirectShow graphs running in either the same or different threads and processes [10].

11. Web Service: Web services are described as follows [2]. Web services are software services executing on remote hosts, accessible through the HTTP protocol. One type of web service, called XML web services, uses XML standards for data structure and transfer. XML web services are becoming a platform for distributed application integration through the Internet. One key advantage of these services is that they allow programs written in different languages and running on different platforms to communicate with each other; this also includes function calls from mobile phones to remote hosts, which was exploited in the design of MVM [2].

12. Microsoft C#: Pronounced "C sharp", this is a modern object-oriented language that also supports component-oriented programming. It is developed by Microsoft as part of the .NET framework. C# has its roots in the C family of programming languages, yet shows many similarities to the Java programming language [3].

13. Microsoft Internet Information Services (IIS): Formerly called Internet Information Server, this is a web server application with a number of feature extensions, developed by Microsoft for use with Windows systems [20].
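To make the role of DirectShow graphs in MVM more concrete, the following is a minimal sketch of building and running a playback graph from C#, the programmatic counterpart of the GraphEdit graph in figure 3.1. It assumes the third-party DirectShowLib .NET wrapper, which is not mentioned in the thesis, as well as an installed source filter (such as the MediaLooks Flash Source Filter) that can render the given file; the file path is illustrative.

    using DirectShowLib;

    class GraphDemo
    {
        static void Main()
        {
            // The filter graph manager; filters are added and connected inside it.
            IGraphBuilder graph = (IGraphBuilder)new FilterGraph();

            // Let DirectShow pick and connect suitable filters for the source,
            // analogous to wiring the filters by hand in GraphEdit (figure 3.1).
            int hr = graph.RenderFile(@"C:\mvm\sample.flv", null);
            DsError.ThrowExceptionForHR(hr);

            // Run the graph: media flows from the source filter to the renderers.
            IMediaControl control = (IMediaControl)graph;
            control.Run();

            System.Console.ReadLine();   // keep playing until Enter is pressed
            control.Stop();
        }
    }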

3.3 Ideal Architecture

MVM is a setup of different services and applications, consisting of Bambuser as the live streaming service provider, a group of four video-enabled mobile phones streaming to Bambuser, the mixer machine and the VM mobile application. All mobile phones are connected to the mobile Internet through 3G, while the mixer machine uses its own Internet connection to communicate with Bambuser. A backchannel between the camera operators and the VM is also part of this model; due to the project scope, however, the backchannel is not implemented. Figure 3.2 represents this model, in which the communication between the different parts of the system can also be seen.

3.4 Use Scenario

This section describes how people are envisioned using the system. An example scenario in which MVM could be utilized is as follows. A group of five people, all interested in making home video, are attending an ice hockey match. During the game, they decide to use their mobile phones to live broadcast their own view of the game over the available 3G mobile network. They choose one among themselves to be the director while the other four hold mobile cameras. They start filming and moving around in the spectator area, each trying to find the best view. They have planned the following camera setup:

• One camera for providing overview shots,
• two cameras for details, one on each side of the rink,
• and one camera aiming at the spectators and the bench coach.

During the match, the director is able to see live previews of every camera and at any moment can seamlessly switch to the most liked one. The video created in this way is the coverage of the event through a spontaneous collaboration; it is publicly visible on the Bambuser website and is also recorded there. Figure 3.3 represents this scenario in operation and is described in more detail below:

Fig. 3.2. Vision Mixer Architecture

Fig. 3.3. Mobile Vision Mixer in Operation

1. The camera operators (ML1, ML2, ML3 and ML4) start filming and broadcasting to Bambuser.
2. The VM application enables the director to view the Bambuser video feeds.
3. The VM application also enables the director to select a group of video streams to be combined for an intended preview on its display.
4. Upon selection of the group of video feeds, the VM application sends a request to the mixer machine. The request includes information associated with the group of video streams to be combined.
5. The VM application also notifies the selected camera operators (corresponding to the group of video streams) to stay on shot.
6. Upon receipt of the request from the VM application, the Mixer Machine requests and fetches the selected group of video streams from Bambuser.
7. The combiner process then combines the group of video streams to create a combined video signal (a quadruple view).
8. Subsequently, the stitched stream is broadcast back to Bambuser (under the MLMixer username). At the same time, another video signal, which represents the final output, is broadcast (under the MLDirector username).
9. The switching operation affects the MLDirector output.
10. The Mixer Machine also sends a notification about the availability of the combined stream to the VM application hosted on the mobile device.
11. The VM application then fetches the combined stream from Bambuser and starts displaying it to the user.
12. The VM user (the director) chooses a video stream for broadcast from the previewed combined video signal on the display.
13. The VM application sends a notification to the mixer machine with details regarding the selected video stream.
14. Concurrently, the VM application also notifies the camera operator corresponding to the selected video stream about its on-air status.
15. On receipt of the notification, the Mixer Machine, through its switching process, switches the final output stream to the selected video from the combined video signal.

3.5 First Attempts and Lessons Learnt

After investigating different possibilities for implementing the Mixer Machine, the following five solutions were proposed for initial implementation and feasibility testing:

1. DirectShow mixer + DirectShow switch application
2. Flash mixer + DirectShow switch application
3. Flash mixer + Flash switch application
4. Flash Lite mobile mixer + Flash switch application
5. Switch on the player side + any kind of mobile or desktop switch application

In all five, the Flash Source Filter was supposed to be used as the FLV source player in DirectShow, together with Adobe Flash Media Live Encoder as the final broadcasting tool. In addition, two video mixer filter objects were available, the MediaLooks Vision Mixer Filter and the Rogue Stream 3D Video Mixer Filter1. These filters are capable of real-time mixing of several video streams from a variety of sources; in the case of MVM, they were intended to be used in conjunction with a group of Flash Source Filters to perform basic video mixing. Finally, using DirectShow graphs at the intermediate level for connecting the Flash components to Adobe Flash Media Live Encoder seemed to be inevitable.

It is also worth mentioning that the pre-studies and some basic tests showed that the best way to bring Bambuser live videos into the programming context (DirectShow graphs) is to use the MediaLooks Flash Source Filter. Later tests confirmed this model, although it was a novel approach that had not been tested before. Among the mentioned solutions, the first two were recognized as infeasible due to difficulties between Flash and DirectShow such as:

1. Threading/process problem: One strange behavior witnessed when using the Flash Source Filter in DirectShow was that using more than one instance of that object in a graph is impossible. This means that to bring four different video streams into one graph at the same time, at least four separate DirectShow graphs are needed, which in turn results in four different processes. Running several processes of video tasks is extremely resource-consuming, and personal computers are generally not capable of this.

2. Latency: An increasing delay between the live stream and the graph output was seen after running the mixer graph for a few minutes.

3. Incompatibility issues: The incompatibilities between Flash and DirectShow were resolved using the MediaLooks Flash Source Filter as a bridge connecting the two platforms. However, for a highly resource-consuming process like live video mixing with the previously mentioned video mixer filters, instabilities, strange behaviors and unpredictable performance were observed.

4. Sound: Another observed problem with the integration of the Flash Source Filter into DirectShow was its inability to play the live interleaved audio stream. MediaLooks later reported that this is likely to happen if the interleaved stream has no audio data in its first frame. The issue was solved by playing a soundless audio stream on the first frame of the Flash object.

1 http://www.roguestream.com/3D video mixer filter.html

The fourth solution (Flash Lite mobile mixer + Flash switch application) was also rejected. Although Flash Lite objects are capable of playing video streams, bringing four live FLV streams into one context at the same time is impossible due to mobile network bandwidth limits2.

The third solution (Flash mixer + Flash switch application), however, seemed feasible. Thanks to the internal support for live FLV streams and playback over the RTMP and HTTP protocols in the Adobe Flash platform, playing multiple video streams from live sources is possible, and seamless switching between different video streams is not technically a problem either. Communication between the Flash objects (combiner and switch), DirectShow and Adobe Flash Media Live Encoder is managed with the help of the Flash Source Filter. Section 3.6 describes this in more detail.

The idea for the fifth solution (switch on the player side + any kind of mobile or desktop switch) was taken from Adobe's article on building live video switches [7]. This solution was also rejected later, since it is entirely based on running server-side ActionScript code on a media server, which demands running a separate server application. However, sending commands that make registered watching clients switch to another video stream is a novel idea.

3.6 Implemented Architecture

To meet the timeline of the project, its scope had to be narrowed down. The most notable outcome of this reduction is the removal of the backchannel. Figure 3.4 represents the new architecture, in which this change is visible. Considering this simplified model, and assuming that the combiner and switch processes are already running and controlled by an operator, the system operates as follows:

1. The camera operators start filming and broadcasting to Bambuser.
2. The Mixer Server requests and fetches the selected streams from Bambuser.
3. The Mixer Server runs an instance of the Video Combining Process, providing it with the specified selection of streams. The related information about the selection is written to the XML input text file by the Initiator Application. An instance of the Stream Switching Process is also executed to manage the switching operations afterward.
4. Once the combined video stream is ready, it is broadcast to Bambuser.
5. The director runs the Vision Mixer application on his phone; it requests the combined stream from Bambuser and starts displaying it.
6. Through the application, the director can select any of the four combined streams for broadcast.

2 Perhaps with the emergence of 4G networks and mobile devices in the near future (4G has already been accessible in some parts of Sweden since September 2010) this issue could be resolved.

Fig. 3.4. Simplified Vision Mixer Architecture

7. A request is sent to the Mixer Server to switch to the desired stream.
8. The Mixer Server switches to the requested stream. This is done through the Stream Switching Process, and the result is visible on Bambuser.

Fig. 3.5. Mobile Vision Mixer in Operation, in a Simplified Architecture

The Combiner and Switch components, as well as the VM application, are discussed in the following subsections, together with a quick look at the remote procedure calls, the communication between the different parts and the processes in the entire system.

3.6.1 Bambuser API

Bambuser provides its users with a set of API functions for accessing parts of its database through HTTP calls3. The API includes two main functions, getVideo and getVideos; the latter is used both in the implemented VM mobile application and by the Mixer Machine to retrieve information about the recent activities of MLMixer and ML1, ML2, ML3 and ML4. The "getVideos" function returns a list of results containing the following fields:

vid       Integer   Id that uniquely identifies the Bambuser video.
title     String    The broadcast title, set in the broadcast device before starting the broadcast.
type      String    Available values: "live", "archived".
username  String    Bambuser username of the broadcasting user.
created   Integer   Unix timestamp for the video.

Table 3.1. Some of the available metadata fields contained in a typical video object returned by the Bambuser "getVideos" API function

Depending on the caller's permission level, the function may also return another String value, url, which points to a location on the Bambuser server where the broadcast can be reached over HTTP or RTMP.
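To make the shape of such a call concrete, the following is a minimal C# sketch of how the Mixer Machine side could invoke the feed and read a few of the fields listed in table 3.1. The endpoint path, query parameters and XML layout are assumptions made for illustration; only the field names and the feed URL given in the footnote come from the text.

using System;
using System.Net;
using System.Xml;

class BambuserFeedReader
{
    static void Main()
    {
        // Hypothetical request: ask for the most recent videos of the ML1 user.
        // The real feed lives at http://bambuser.com/api/feeds (see footnote 3);
        // the query string below is an assumption.
        string url = "http://bambuser.com/api/feeds?method=getVideos&username=ML1";

        using (WebClient client = new WebClient())
        {
            string response = client.DownloadString(url);

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(response);

            // Field names follow table 3.1; the surrounding XML layout is assumed.
            foreach (XmlNode video in doc.SelectNodes("//video"))
            {
                XmlNode title = video.SelectSingleNode("title");
                XmlNode type = video.SelectSingleNode("type");
                XmlNode user = video.SelectSingleNode("username");
                XmlNode streamUrl = video.SelectSingleNode("url"); // only returned to privileged callers

                Console.WriteLine("{0} ({1}) by {2} -> {3}",
                    title == null ? "?" : title.InnerText,
                    type == null ? "?" : type.InnerText,
                    user == null ? "?" : user.InnerText,
                    streamUrl == null ? "n/a" : streamUrl.InnerText);
            }
        }
    }
}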

3.6.2 Broadcasting with Adobe Flash Media Live Encoder

It was mentioned in section 3.2 how Adobe Flash Media Live Encoder (FME) can be used for live media streaming purposes. FME, through its interface, can be configured to use audio and video capture devices as input sources. Since the MediaLooks Source filter, paired with the Sink filter, is designed to operate as a virtual standard capture device, it can be identified and used by FME; in this way, and with the help of a DirectShow graph, broadcasting to Bambuser from the Combiner and Switch components becomes possible. For security reasons, broadcasting to Bambuser from applications other than their own mobile service is limited. However, with authentication profiles created specifically for each user, FME instances can be configured to stream to Bambuser as well.

3.6.3 Combiner Process

The video combiner process fetches the current broadcasts of the Bambuser users ML1, ML2, ML3 and ML4, stitches them together in a cross view and broadcasts the result back to Bambuser under the MLMixer username. As already mentioned, fetching from Bambuser is made possible through the provided API function, "getVideos".

3 http://bambuser.com/api/feeds

The "getVideos" output, which contains all the necessary information including a location on the Bambuser server, is written to an XML text file, bam.xml. The initialization of the system, including this function call and the preparation of the bam.xml file, is done by the operator through a simple application written in C#, which is also provided. The actual combiner component, however, is implemented as a Flash SWF file which has four video player objects laid out on its canvas, each locating a video stream on Bambuser, as well as a group of labels providing complementary information to the user. The Flash object is illustrated in figure 3.6, in which its quadruple layout and the camera index and timestamp labels are visible.

Fig. 3.6. Combiner Flash component layout
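The initialization step described above, calling getVideos and preparing bam.xml, can be pictured with the hedged C# sketch below. The element and attribute names are assumptions; only the existence of bam.xml and the kind of information it carries (stream locations, usernames and camera indices for ML1 to ML4) follow from the text.

using System.Xml;

class BamXmlWriter
{
    // Writes one <stream> element per selected camera. Element and attribute
    // names are illustrative guesses, not the prototype's actual schema.
    public static void Write(string path, string[] usernames, string[] streamUrls)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("streams");
        doc.AppendChild(root);

        for (int i = 0; i < usernames.Length; i++)
        {
            XmlElement stream = doc.CreateElement("stream");
            stream.SetAttribute("index", (i + 1).ToString()); // camera 1..4
            stream.SetAttribute("username", usernames[i]);    // ML1..ML4
            stream.SetAttribute("url", streamUrls[i]);        // RTMP/HTTP location from getVideos
            root.AppendChild(stream);
        }

        doc.Save(path); // e.g. "bam.xml" next to the Combiner SWF
    }
}

In the prototype, the operator would run such a routine once before starting the Combiner, passing the four ML usernames and the url values returned by "getVideos".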

In operation, this Flash component starts by reading the bam.xml file. The related information for each broadcast is retrieved from the file and assigned to the associated player objects. The complementary information, including the time, title and username, is also shown, and the automatic playback of every stream starts afterwards. For broadcasting with FME, a DirectShow graph is used as a link in which a Flash Source Filter is connected to a Sink Filter. Here, the Flash Source Filter reads from the Combiner SWF file and streams its content down to a shared memory area allocated by the Sink Filter. FME simultaneously reads from this shared area using a paired instance of the MediaLooks Source Filter. Starting both the DirectShow graph and the FME instance has to be done by the operator.

3.6.4 Switch Process

The switch process works much like the Combiner. The only difference is in the Flash SWF file where, instead of stitching the streams in a cross layout, they are put on top of each other. Each stream in this arrangement corresponds to a different channel, and all are started automatically with their sound off. Whenever a switching request is received, the corresponding player is brought to the front, covering all other instances; in this way, the selected player is heard while the others remain muted. In this implementation, the Flash object re-reads the content of a text file, broadcast.txt, every second. The retrieved value is expected to be a number between 1 and 4, representing the selected channel. It will be shown later how this file is used by an implemented web service, accessible from the mobile application, to allow mobile users to switch between video streams from a live simultaneous preview of the combined video. Another instance of FME, in conjunction with a DirectShow graph, is used for broadcasting the output to Bambuser; this instance is configured to work under the MLDirector privilege.
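The server-side entry point for these switching requests can be pictured as a single web service method that validates the requested channel and writes it to the text file polled by the Switch component. The sketch below is an assumed ASP.NET-style implementation in C#; the class name, file path and the choice of broadcast.txt rather than channel.txt are illustrative guesses, not the prototype's actual code.

using System.IO;
using System.Web.Services;

public class MixerService : WebService
{
    // Assumed location of the file that the Switch SWF re-reads every second.
    private const string ChannelFile = @"C:\MVM\broadcast.txt";

    [WebMethod]
    public void selectChannel(int channel)
    {
        // Ignore anything outside the four combined channels.
        if (channel < 1 || channel > 4)
            return;

        // The Switch component picks this value up on its next one-second poll
        // and brings the corresponding player to the front.
        File.WriteAllText(ChannelFile, channel.ToString());
    }
}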

3.6.5 Vision Mixer Mobile Application

The Vision Mixer mobile application was created to allow a director to coordinate a production task. It is implemented as an Adobe Flash Lite object, which is suitable for ordinary interactive mobile applications. The object can run on any mobile device with Flash support, concurrently with all other applications and without interrupting them. However, just before the VM application is started, the desktop mixer needs to be run by an operator, as mentioned previously. The application is configured to retrieve and play back the current Bambuser broadcast of the MLMixer user via a call to the "getVideos" function. The Flash Lite object also lets the user press keys 1 to 4 to switch between the video channels. The application waits for the user (i.e. the director) input on the mobile device to select a channel. The area of the selected channel is demarcated by a red rectangle to separate it from the rest of the display, and a red "On Air" text lights up to show that the chosen video is selected for broadcast. At the same time, through the web service selectChannel method call, the corresponding channel number (1 to 4) is sent to the Mixer Machine to be read by the Switch component. The VM application is implemented as a two-layer Flash Lite object. It also has an input controller which reads the user's keypad input and maps it to the proper channel selection. The first layer is a video player object which loads a stream from the Bambuser live broadcasts (pointing to the current MLMixer activity), while the second provides the labels with complementary information and the red rectangle.

3.6.6 Communication

An abstract schema of the communication model in the entire system is illustrated in figure 3.7. As seen in that figure, Bambuser lies in the middle of the architecture as the media provider, with the mobile application on one side and the desktop mixing machine on the other, connected through the 3G mobile network and the Internet over HTTP respectively. The VM mobile application also accesses the Mixer Machine to send channel-switching requests through web service method calls over 3G. The four mobile devices broadcasting to Bambuser are not depicted in the figure.

Fig. 3.7. Abstract Model of Communication and Data Flow in Mobile Vision Mixer

On the desktop side, the inter-process communication takes place via text files (as with bam.xml for the initialization of the system and channel.txt for switching) or via DirectShow graphs hosting instances of the MediaLooks Sink Filter.
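The following sketch illustrates the reading half of this file-based protocol: a loop that polls the channel file once per second and reacts when the number changes. In the prototype this polling is done inside the Switch Flash object; the C# version here only mirrors the same behaviour under the assumption that the file contains a single number between 1 and 4, and the file name is taken from the text (which mentions both channel.txt and broadcast.txt).

using System;
using System.IO;
using System.Threading;

class ChannelFilePoller
{
    static void Main()
    {
        string path = "channel.txt"; // assumed name; the prototype may use broadcast.txt
        int current = 1;

        while (true)
        {
            if (File.Exists(path))
            {
                int requested;
                string text = File.ReadAllText(path).Trim();

                if (int.TryParse(text, out requested)
                    && requested >= 1 && requested <= 4
                    && requested != current)
                {
                    current = requested;
                    Console.WriteLine("Switching to channel " + current);
                }
            }

            Thread.Sleep(1000); // same one-second interval the Switch SWF uses
        }
    }
}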

3.7 Further Technical Improvements

Although the current prototype works well enough to meet its primary purposes, further improvements and changes are expected for a real product. Among them, the most prominent ones are as follows:

1. A full-duplex backchannel is needed to provide the production team with a good means of mutual communication. Via this channel, they should be able to send and receive messages, talk and propose scenes for capture and broadcast.

2. The desktop application that handles the combining and switching operations should be startable by a remote operator. The web service could be extended to support an array of method calls, such as automatically starting the combining and switching processes on receiving the proper commands.
3. The current mobile application needs further development to provide a better GUI and interaction for its user. The existing Adobe Flash Lite application can be extended for this purpose; however, mobile application development kits, frameworks and SDKs such as J2ME or the Android SDK could also be considered, as they support programming with device-specific functions such as touch screens and soft keys.
4. No security checks, authentication or logins are performed in the desktop application, the web service or the mobile side. In a real collaborative environment, users are supposed to log into the system and be known and authenticated both to the applications and to other collaborators.
5. The director should also be given enough options to make his own selection from the Bambuser feeds. The current application is restricted to videos made by the pre-defined users (ML1, ML2, ML3 and ML4).
6. As already mentioned, the result of switching and the combined stream are both broadcast to Bambuser and are publicly visible. This is unnecessary, and policies should be applied to make these feeds private.
7. DirectShow filters could be developed to play live FLV streams directly from RTMP or HTTP media servers. With this, the embedding of SWF files in DirectShow graphs would be replaced by direct access to the Bambuser live streams, and the overall control and possibilities for video practices in DirectShow would be extended; utilizing editing functions, effects and transitions would become possible as well. Such DirectShow filters would also allow a better integration of the Switch and Combiner components. Another set of DirectShow filters could be written for media broadcast to Bambuser or any other media server, which again would enable both a closer integration and a slimmer final implementation by omitting the FME instances. Figure 3.8 illustrates a conceptual design of this integration.

Fig. 3.8. Conceptual Design of the Ideal Video Combiner and Switch Component Integration

4 System Evaluation

To evaluate the system, two studies were conducted. The method used to evaluate the system as well as the study results are described in this chapter. Evaluating the system was not within the scope of the author's work, although performing some basic tests was intended. However, due to the successful launch of the first version of the system, and realizing the potential for continuation, a user study was designed and arranged by my colleagues at MobileLife. I was asked to join them as technical staff, setting up the mobile devices, controlling the remote Mixer Machine and helping the test subjects learn how to use Bambuser live streaming as well as the VM application. A separate report on the design of the user studies, their purpose and their relationship to the topic has already been written [15]; some relevant facts from it are presented below1.

4.1 Method and Setting

Evaluation sessions took place in two public places in Sweden during June and July 2010; first in Stapelbäddsparken2, Malmö, and then in Universeum3, Göteborg. The evaluation sessions were ethnographic field studies during which participants were filmed and observed. With this method of observation, people can be studied in context to see how they face the situation [29]. At the beginning of every session, participants were given a brief introduction to the system and how it would work. To conclude the sessions, participants were brought together and interviewed after each test to examine the experiment and their views of the prototype. Through these debriefing sessions, the situation they were placed in could be analyzed and understood [29]. These gatherings were also video recorded for later content analysis. Figure 4.1 shows an evaluation session in Stapelbäddsparken, Malmö.

1 The first evaluation report is written in Swedish, but an informal English translated summary exists as well.
2 http://www.stapelbaddsparken.se/
3 http://www.universeum.se/

Fig. 4.1. Evaluation in Malmö [15]

The studies were performed in a total of 7 sessions, with volunteers aged between 11 and 17. Participants were divided into groups consisting of four camera holders and one director. If they wished, the director or camera operators were allowed to appoint an assistant. Every group was asked to provide footage of the environment; however, evaluators in Stapelbäddsparken were free to choose their topic of filming, while those in Universeum were instructed to produce an overview of one of the various showcases there, the "Crime Lab"4. They were all asked to represent the environment in the way most understandable to the web followers.

4.2 Study Results

The results from interviewing participants during the test sessions, as well as from observations and analysis of the recorded material, are presented below. Interestingly, MVM was liked by most of its users and was said to feel natural, as it could show a live preview of what others were filming; the Vision Mixer test subjects also said that they could experience the feeling of playing the main role in the production team. The overall collaboration to perform the task was also an enjoyable experience for the whole team, even though some problems arose. These issues are discussed in the next section. In addition, the Vision Mixer was reported as an easy-to-use and learnable application by most directors, since after using the prototype for a while they could understand how it would function. On the basis of this new learning, some groups restructured their work after a few minutes. Some participants also chose to have assistants to collaborate better with the director. In this arrangement, the cameraman could focus on the task while the assistant coordinated it, for example by talking to the director or proposing topics for filming. This most obviously became a solution for directors when the Vision Mixer lagged and other team members were out of sight. Codirecting a scene by using MVM is depicted in figure 4.2.

4 http://www.universeum.se/index.php?option=com content&task=view&lang=en&id=677

Fig. 4.2. Codirecting with MVM [15]

In some cases, directors were observed instructing camera holders on what to film. These observations illustrate how the whole team found workarounds in order to collaborate in a better way.

4.3 Problems Found

During the test sessions, the following problems were reported by the participants:

1. Delay: Most of the directors complained about the obvious delay of up to 30 seconds seen in the mixer. The delay is between the actual moment of filming and the moment the stream is received by the Vision Mixer; it tended to appear after a while5 of using the Vision Mixer. Mixing in this way became difficult for some directors, since what they were seeing on the display was behind what the camera operators were filming. One chosen solution to this problem was to use a co-director, following the cameramen and watching what they were filming, but mixing and switching on the mobile phone. Some other evaluators, on the other hand, said that they could benefit from this delay in some way, being able to plan in advance and keep in mind what would be on screen soon.
2. Communication and feedback: Since the system does not currently provide any channel for awareness, almost every camera holder experienced a common problem: they could never know who was selected for broadcast. The only solution to this problem was to ask the director or the co-director. This means that the need for a backchannel, or at least some feedback making the current camera selection visible, is inevitable.
3. Picture quality: A group of evaluators in Malmö thought that having higher quality pictures could improve the application. Moreover, this group reported that because of the low quality of video on their mobile phones they rarely use mobile video. What they complained about might be due to the conditions of the Malmö study, which took place on a sunny day in an open area. The mobile screens were reflecting sunlight, and watching video therefore became problematic.

5 By the time the user studies were being conducted, there was no exact estimate of how long this "a while" could be; however, recently, at the time of writing this report, my colleagues and I have been working on this issue by setting up a simulated environment, testing the prototype several times, trying to measure the amount of delay and figuring out how it could affect the collaboration.

4.4 Discussion

The successful launch of the prototype proved that the development of similar or even more advanced systems is possible. The study revealed that for users of such systems, being aware of what others are doing is indispensable. Showing live previews of every camera is one way of doing so, but providing a channel for background communication also appears to be of high importance. This could be implemented as a verbal channel, text-based chat or simply one-way notifications showing who is on air. Employing different methods for the user studies was very beneficial, in that my colleagues and I could gain a large amount of feedback and information about the system and the users' opinions. We learned a lot from the analysis of the recorded material as well. Overall, the evaluation of the system helped us gather data and see our next steps for future iterations. Delay and related problems turned out to be important issues, although participants managed to work with delayed streams and some even claimed it to be advantageous. With the emergence of the 4G network and the benefit of a greater network capacity [21, 28], these problems are expected to be solved, yet the system needs to be adapted to use this higher bandwidth. With 4G, broadcasting videos with higher quality and resolution (e.g. HD video streams) might also become possible; thus, more details would be visible in the picture, and people who have not been using mobile cameras because of quality issues could start enjoying mobile video.

The participants' choice to appoint assistants was another notable finding of the evaluation. It was not only interesting to see how people collaborated to overcome the delays; it also became a motivation for developing a backchannel in the next version.


5 Conclusion and Further Work

MVM has demonstrated the potential of mobile phones in collaborative live video production. What was envisioned from the start of the project was to address amateurs' needs in video practices. MVM, in this sense, is a considerable step forward by which an innovative way has been set out for mobile users to co-produce their own footage of events such as sports, as mentioned in section 3.4. Further investigation also showed that MVM is the first mobile video mixing system of its kind1 2, taking mobile video another step further. Yet another application of MVM would be in cases where people or events need to be watched or recorded, such as in surveillance and security systems. During the development of the system, several challenges arose. Fetching live streams from Bambuser and bringing them into a multimedia environment such as Microsoft DirectShow was the biggest one. The selection of DirectShow, however, was promising, since it gives almost full control over the media streams. DirectShow also provided functions for integration with Adobe Flash Media Live Encoder to help broadcast the combined video to Bambuser. To get DirectShow running, an intermediate server needed to be used. It was shown in chapter 3 how this architecture was beneficial; transmitting one video stream that actually contains four was a solution for overcoming the current mobile network limitations.

5.1 Further Work

There are still remaining challenges, such as the technical improvements mentioned in section 3.7. Besides these, I want to create a real mobile application rather than the current Flash Lite object. Mobile apps are more consistent and durable; also, with recent advancements in mobile operating systems, they seem to provide more flexibility in programming and in developing user interfaces.

1 Designation of invention - communication under Rule 19(3) EPC, European Patent Office
2 http://www.metro.se/2010/09/22/49027/gor-gerillatv-med-din-mobiltelefon/

My colleagues at MobileLife and I want to continue our work on MVM to address these issues. A few new features also deserve some effort, like expanding the system to provide the director with more editing capabilities. We also intend to utilize HD video, which requires that live streaming service providers like Bambuser support this as well. We are also interested in testing the system over time in larger and different events, like ice hockey or football matches. This will help us better understand how the application would be used to co-produce footage in real situations. In particular, we want to conduct further studies to measure the delays and see how the lags could affect the collaboration.

References

1. Bambuser. http://bambuser.com. Accessed August 24, 2010.
2. XML web services basics. http://msdn.microsoft.com/en-us/library/ms996507.aspx, December 2001. Accessed August 25, 2010.
3. C# language specification version 3.0. http://download.microsoft.com/download/3/8/8/388e7205-bc10-4226-b2a8-75351c669b09/CSharp Language Specification.doc, 2007. Accessed August 31, 2010.
4. Introduction to DirectShow. http://msdn.microsoft.com/en-us/library/dd390351(v=vs.85).aspx, June 2007. Accessed August 24, 2010.
5. ActionScript technology center. http://www.adobe.com/devnet/actionscript.html, July 2009. Accessed December 24, 2010.
6. Adobe Flash platform blog. http://blogs.adobe.com/flashplatform/, July 2009. Accessed December 24, 2010.
7. Building a live video switcher with Flash Communication Server MX. http://www.adobe.com/devnet/flash/articles/live video switcher print.html, July 2009. Accessed September 1, 2010.
8. F4V/FLV technology center. http://www.adobe.com/devnet/f4v.html, July 2009. Accessed September 2, 2010.
9. Flash DirectShow source filter. http://www.medialooks.com/products/directshow filters/flash source.html, 2009. Accessed August 24, 2010.
10. Multigraph sink/source. http://www.medialooks.com/products/directshow filters/multigraph sink source.html, 2009. Accessed August 25, 2010.
11. SWF technology center. http://www.adobe.com/devnet/swf.html, July 2009. Accessed December 20, 2010.
12. Adobe - Flash Lite. http://www.adobe.com/ap/products/flashlite/, 2010. Accessed August 24, 2010.
13. Adobe. Using Adobe Flash Media Live Encoder 3.
14. Mathias Broth. The production of a live TV-interview through mediated interaction. Recent Developments and Applications in Social Research Methodology (Proceedings of the Sixth International Conference on Logic and Methodology), 2004. SISWO, Amsterdam.
15. Emelie Dahlström. Att dokumentera och uppleva med live video, en utvärdering av två mobila applikationer för live videoredigering (Documenting and experiencing with live video, an evaluation of two mobile applications for live video editing). MobileLife, 2010.
16. A. Engström, M. Esbjörnsson, and O. Juhlin. Mobile collaborative live video mixing. In Proceedings of the 10th international conference on Human computer interaction with mobile devices and services, MobileHCI '08, pages 157–166, New York, NY, USA, 2008. ACM.
17. Arvid Engström, Liselott Brunnberg, Josefin Carlsson, and Oskar Juhlin. Instant broadcasting system: mobile collaborative live video mixing. In ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation, SIGGRAPH ASIA '09, pages 73–73, New York, NY, USA, 2009. ACM.

18. Arvid Engström, Mattias Esbjörnsson, Oskar Juhlin, and Cristian Norlin. More TV! - support for local and collaborative production and consumption of mobile TV. In Proceedings of the 10th international conference on Human computer interaction with mobile devices and services, 2010. Accessed September 2, 2010.
19. Arvid Engström, Oskar Juhlin, Mark Perry, and Mathias Broth. Temporal hybridity: footage with instant replay in real time. In Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pages 1495–1504, New York, NY, USA, 2010. ACM.
20. Alex Homer et al. Book Review: Professional Active Server Pages 3.0, chapter 1 - Overview of Internet Information Services 5.0. Wrox Press Ltd., Birmingham, UK, 1999. Reviewed by John Meade, Microsoft Corporation.
21. Vangelis Gazis, Nikos Houssos, Nancy Alonistioti, and Lazaros Merakos. On the complexity of always best connected in 4G mobile networks. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.646.
22. Oskar Juhlin, Arvid Engström, and Erika Reponen. Mobile broadcasting: the whats and hows of live video as a social medium. In Proceedings of the 12th international conference on Human computer interaction with mobile devices and services, MobileHCI '10, pages 35–44, New York, NY, USA, 2010. ACM.
23. David Kirk, Abigail Sellen, Richard Harper, and Ken Wood. Understanding videowork. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI '07, New York, NY, USA, 2007. ACM.
24. Paul Luff and Christian Heath. Mobility in collaboration. In Proceedings of the 1998 ACM conference on Computer supported cooperative work, CSCW '98, pages 305–314, New York, NY, USA, 1998. ACM.
25. Mark Perry, Oskar Juhlin, Mattias Esbjörnsson, and Arvid Engström. Lean collaboration through video gestures: co-ordinating the production of live televised sport. In Proceedings of the 27th international conference on Human factors in computing systems, CHI '09, pages 2279–2288, New York, NY, USA, 2009. ACM.
26. Risto Sarvas, Mikko Viikari, Juha Pesonen, and Hanno Nevanlinna. MobShare: controlled and immediate sharing of mobile images. In Proceedings of the 12th annual ACM international conference on Multimedia, MULTIMEDIA '04, pages 724–731, New York, NY, USA, 2004. ACM.
27. Akemi Tazaki. InstantShareCam: turning users from passive media consumers to active media producers. In Proceedings of the 2007 conference on Designing pleasurable products and interfaces, 2006.
28. http://telia4g.se/. Accessed December 9, 2010.
29. Alexandra Weilenmann. Negotiating use: Making sense of mobile technology. Personal and Ubiquitous Computing, 5:137–145, January 2001.