Lean Collaboration Through Gestures: Co-ordinating the Production of Live Televised Sport

Mark Perry1, Oskar Juhlin2, Mattias Esbjörnsson2, Arvid Engström2

1SISCM, Brunel University, Uxbridge, Middx, UB8 3PH, UK
“Authors listed in reverse alphabetical order”
2Mobile Life, Interactive Institute, Box 1197, SE-164 26 Kista, Sweden
{arvid.engstrom, oskarj, mattia}@tii.se

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CHI 2009, April 4–9, 2009, Boston, Massachusetts, USA.

ABSTRACT
This paper examines the work and interactions between camera operators and a vision mixer during an ice hockey match, and presents an interaction analysis using video data. We analyze video-mediated indexical gestures in the collaborative production of live sport on television between distributed team members. The findings demonstrate how video forms the topic, resource and product of collaboration: whilst it shapes the nature of the work (editing), it is simultaneously also the primary resource for supporting mutual orientation and negotiating shot transitions between remote participants (co-ordination), as well as its end product (broadcast). Our analysis of current professional activities is used to develop implications for the design of future services for live collaborative video production.

Author Keywords
Live TV, collaboration, communication, video production, sport, mobile technology, indexical gestures.

ACM Classification Keywords
H.5.3 Group and Organization Interfaces: H.5.3. CSCW; Synchronous Interaction.

INTRODUCTION
In CHI, the topics of video production, and video as a means to support collaboration, have always been considered separately. We show how both these topics come together as part of ongoing work in the practice of TV production. Our interest in the collaborative production of live television is to understand how the co-ordination of multiple people takes place around and through the video materials that they are preparing for broadcast. This video production process is unique in that it involves collaboration in, on and through video: by this we mean that collaborative activity is in (i.e. working within the medium) and through (i.e. communicating across the medium) video, as it is with video conferencing, and has been noted in previous research [7,22]. However, in the case of live TV production, video is also the topic of concern of the collaboration (hence, on the medium). The end product of the team’s collective work involves the assembly of the broadcast media that they simultaneously use to organise the collaborative production process through. This type of collaboration, in a setting with limited means for communication and where the hands of the participants are constantly moving, relies on the use of in-camera gestures for coordination.

As we will show, live and collaborative conditions conspire to make a particularly fast-moving and complex activity, yet one that is practically managed with extremely lean mechanisms of interaction. In this, we have chosen to examine a particular form of television, the production of live sport. Here, the fast moving nature of its topic matter also contributes to particular forms of video-based co-ordination and image production techniques that are suited to this type of activity.

In this paper, we make a close empirical examination of a professional television production team to examine how gestures are used to coordinate the production of seamless broadcast during game play. The applied aim of this research is to influence the design of systems such as the emerging field of mobile collaborative video production tools [5,26]. Using networked camera phones, such as the Nokia N-series, it is possible to mix concurrent video streams from multiple users for public display. Such situations might include the broadcast of live images of motor sports by fans or of soccer matches by parents.

In our examination of collaboration in the production of live video of sporting events, we focus on the details of the on-going interaction between cameras and production team. In particular, we consider the specific narrative feature of mixing of overview shots with detailed, or close up shots, during highly mobile and complex game sequences. By the use of narrative, we simply refer to the organisation of broadcast video that viewers of the unfolding game connect and interpret as to its outcomes through the assembly of a coherent sequence of visual images in real time. Since collaborative video technologies and co-ordination practices do not exist within the consumer market, we have little option to understand how these activities might operate within our target population. Thus, although our aims are to inform the design of future consumer (i.e. non-professional) technology, we have applied Sacks’ Gloss [6] here. This recommends that researchers should search out people who in their daily work are professionally responsible for accomplishing a particular social activity (i.e. gesture based TV co-ordination) in order to understand how that social activity is practically managed.

RELATED WORK ON EDITING LIVE VIDEO
Given that television forms one of the most pervasive features of contemporary life and that as an industry it occupies a major economic sector within the developed world, it is perhaps surprising that the collaborative production of televised material has received scant academic attention. This is also somewhat peculiar, given that in a parallel set of developments we have recently seen a turn towards the production and consumption of video material by end users, with users developing their personal mobile and video blogs, utilising resources such as youtube.com and producing citizen journalism [e.g. 4,8]. In the area of user-generated content, this turn towards examining the design and use of visual display media within HCI and CSCW sits with a broader focus on the technical editing of visual materials that builds on the ready availability of powerful multimedia computing, high bandwidth networks, and mobile, wireless and video-enabled devices.

Few academic texts focus on coordinating practices in live TV production and the use of broadcast technology in the production of sports [17]. Much of the academic corpus deals with the commoditisation and commercialisation of sports. In line with our survey, Grunneau [10] notes that many scholarly articles on televised sport focus on their textual analysis, and not on their production techniques and situated practices that underlie their creation. However, some notable exceptions exist that explore the use of live TV footage, including Real’s [24] examination of the process of ‘myth’ production and ritual, and Williams’ [29] work on the structural components of production techniques, looking at camera use, graphics and audio. Similarly, Bower’s [2] work on ‘inhabited television’ integrates live TV broadcasts with collaborative virtual environments. It focuses on the problems emerging from ‘virtual’ camera control, and provides useful insights into coordination challenges for the amateurs and the need to provide virtual tools similar to studio camera technologies.

Whilst the topic itself differs, Heath et al. [11] examine the production of action and interaction around video displays, showing how (live) CCTV video footage can be used to develop coordinated responses to developing situations. Studies of distributed social scientists’ professional work in analyzing video data [e.g. 27] also offer some insights here with respect to utterance and the timing of actions at the interface, and how participants render visible the hidden work that makes sense of the video materials to their remote co-participants. Recent work by Silk et al. [25] explicitly addresses situated practices in the video production process to show how they influence broadcast outputs, although they focus their analyses on its impact on the media organisations themselves. However, whilst useful to understanding live video, these papers do not deal with the interactive practices of creating broadcastable footage.

One exception to this dearth of empirical work stands out. Broth [3] examined the co-ordinating processes of a studio-based live television broadcast production team, using Conversation Analysis to investigate video mediated workplace interaction between team members. He shows how the interaction between the vision mixer and ‘script’ in the control room and the camera operators in the studio is to a considerable extent non-verbal and relies on all members’ ability to predict each other’s actions. Camera operators communicate with the control room through their choice of framing shots and camera movement. Broth calls this form of gestural communication ‘proposal-acceptance’: camera operators propose shots by stabilising their camera; if a shot is selected, they remain with this image until the next camera is selected and their red ‘on air’ light becomes dimmed.

The role of gestural communication is well documented when it comes to studies of video mediated communication. Luff et al. [19] report how participants’ gestures involve much more than simple reference when participants coordinate their actions with each other: …gestures can be used to demarcate different elements of an object, to exaggerate features and characteristics, and to animate and embed action in materials, so that they gain a significance, then and there, that they might not otherwise have.... Through bodily action users can act in ways that are impossible when they are disassociated from the material objects with which they are engaged. In this way, collaborative work is accomplished in and through objects and artefacts, tools, and technologies [15,18]. These resources not only feature in how people produce actions but also in the ways that they recognize or make sense of the actions of others.

When it comes to using video as a mediated channel for communication, as well as the topic of concern, we turn to the area of automatic video editing. A number of automated and semi-automated editing tools have been proposed, catering to a common set of problems. Most notably, amateur videographers lack the time and skills to produce high quality video without lengthy episodes of uninteresting and badly captured material [1,9,13,30]. Furthermore, they are unmotivated or find it difficult to learn video editing software. These tools let the user extract edited sequences from raw material, utilising various approaches, including image analysis and automating established editing principles to discard footage deemed ‘unsuitable’ by the system [9,13,30]. Automation of live video capture has not been explored to the same extent. However, in a recent attempt, Ranjan et al. [23] present a system for automatic multi camera control allowing video capture in meetings. Their system leverages television production principles for camerawork and uses input from a motion tracking system and microphones, and utilises ideal framing, movement, timing and mixing in automated multi camera productions.

Complementing this move to support amateur video production, recent work examines mobile collaborative video production. One such study by Kirk et al. [16] investigated ‘video work’ among teenagers, concerning aspects of video recording, editing and sharing. Analysis showed that traditional video cameras were used relatively formulaically, while mobile phones were used more spontaneously in video capture. This spontaneity was also visible in the sharing of clips, which was usually done locally immediately after recording. Users did not see the point of manipulating the clips, as these were short snippets of action, and the clip title gave enough information for later recall. In a technical slant on spontaneous capture, Tazaki [26] presents a fictitious, though thoroughly elaborated, conceptual design (InstantSharecam) that emphasizes the collaborative process in video production. She envisions a group of users, each with a video camera, simultaneously shooting and co-directing coverage of an event in real time. With some similarities, Engström et al. [5] present a study on how VJs produce and mix visuals live. The study informs the design of the SwarmCam prototype, which is intended for use in club settings, where club visitors can capture video and stream it directly to the VJ, who can merge the video into the live VJ performance. Clearly, much of this videowork is likely to be very different to professional TV production in its goals and values. Nevertheless, both [5] and [25] display similarities in that they are designed for several simultaneous users collaborating in the production process.

PRODUCING LIVE SPORTS TV
Professional handbooks on TV production provide a starting point for understanding collaboration and coordination technology in this area [12,21]. Live sports TV mixing focuses primarily on rendering an aesthetically appealing and understandable view of the action at all times. Visually, live television production follows traditional film grammar, a system of rules for how to effectively tell a story in images. A main goal in editing is providing multiple viewpoints on the covered action, and using these to produce rhythm, balance between detail and overview, and a dynamic and compelling sequence of images [12]. For practical reasons, the positioning of cameras in a hockey arena is restricted to certain fixed points along the rinkside and from the balcony. Too many shots taken from similar camera angles and a similar distance from the subject are considered to be dull and tiring for the audience, and this is typically addressed by the patterning of shots [12]. A scene often opens on a wide shot showing the general setting and mood, followed by more closely framed medium and close-up shots of the characters. This way, the viewer gets both the overview and the emotional closeness to the characters as the scene progresses. Other predictable situations that reoccur throughout the production may also have predefined patterns that support editing decisions.

The main direction and visual production of live shows is conducted in a production control room by a vision mixer (VM), who manages the selection of video for broadcast. This room contains a ‘gallery’ of video monitors (Fig. 1) displaying all image sources. At the centre of this typically lies a main broadcast monitor and a preview monitor. An intercom system enables communication between the production control room and the camera operators. The role of the vision mixer in producing sports programming on live TV involves a degree of spontaneous selection [21], picking from the various camera feeds on the monitors in the gallery and mixing these with graphics and instant replay materials for broadcast. Although the demands on production teams differ, some general rules apply. The vision mixer must always be able to present an appropriate angle of the action to the viewer. The camera operators must not only cover the live action but be ready to record the unexpected [21].

Figure 1: Gallery

The conditions of live sports productions, and specifically for arena-based sports such as ice hockey, pose particular coordination problems for their broadcast crews. They need to ensure that the broadcast footage of in-play action enables viewers to understand the progress of the game as it unfolds, yet at the same time to convey the impression of its ‘live-ness’ to remote viewers [27]. Shots therefore need to be selected to allow viewers to appreciate these dual concerns, and the production team need to manage the narrative of the game at the same time as producing broadcast footage that is professionally produced, i.e. that broadcast footage is relatively steady, cameras are not seen to be searching for footage, and that cuts between cameras do not occur at inappropriate moments of play. Thus, the production team needs to attend to the film ‘grammar’ noted in the literature review whilst under particularly challenging conditions of action. Because of its live-ness and depending on the rules of the game, its location, duration and other factors, the various production team members have to coordinate their efforts to cover the action in a meaningful way. This is particularly problematic in conditions in which several events may take place simultaneously. The totality of the action may be distributed over a large area, or too fast to be covered from one angle, and thus demands a combination of coverage from different vantage points and close-up cameras fluidly following the action. Yet this cannot be verbally articulated and negotiated, as intercom communication is normally restricted and asymmetrical [3]. Camera operators receive directions verbally and through a red tally light on their cameras, indicating they are ‘on-air’, but they are unable to respond back to the vision mixer verbally: they can only communicate back through their camerawork.

The vision mixer has a number of activities to manage simultaneously, alongside the mechanical process of switching between cameras and the other visual images seamlessly. Part of this is to maintain a sense of narrative ‘flow’ in the game that can be interpreted and understood by its viewers, in which the game is seen by its viewers as a series of connected shots that show how the game is progressing. This is also understood by the camera operators, replay operator and graphics operator who have to be able to provide timely footage that supports this narrative structure. To do this, they need to understand the way that narrative flow is

developed in the production of broadcast footage, but must also be sensitive to feedback or communication from the vision mixer to adapt their camerawork to support the vision mixer in developing this narrative. In this, sport footage is similar to the production of TV news, involving sense making and the co-construction of meaning with both the audience and production team [20].

METHOD AND SETTING
Data collection on the live TV production process involved a number of sources and participants, and took place during three ice hockey matches in 2007 in Sweden. The majority of the empirical data collected and presented in this paper has involved video-recording within an outside broadcast (OB) studio (in a custom-fitted bus). This was supplemented with ethnographic observations and interviews to aid the analysis, although the empirical analysis presented below relies on the video corpus. In line with Jordan and Henderson [14], we report here not on participants’ accounts of action, but on the observable mechanisms of action, so that our theorising is responsive to the recorded phenomenon itself [ibid.: 51] and not reconstructed from our participants’ own interpretations of their actions. In addition to the studio data collection, we also examined the work of the remote camera operators in their rink-side positions. We have also examined the final product of their collaboration: the broadcast match program.

Data collection focussed mainly on the production control room gallery, where the broadcast images were selected, in the OB bus situated just outside of the arena. Inside the arena, two manned cameras (C1 and C2) were positioned on a high vantage point up on the grandstand to provide an unobscured overview of the rink (fig. 2); these cameras were fixed, but free to pan and tilt. These two cameras provided the bulk of the footage, from wide long shots to close-ups. The manned rinkside camera (C3) moved on wheels on a small platform slightly elevated above the ice. With its low perspective, it covered a near-180 degree view, roughly including all face-off zones (see fig. 2), but was obstructed by the rink in the near corners of the field.

In total, the study generated a substantial body of video data. Ice hockey matches last approximately 2.5 hours, and with two recording cameras, the three games resulted in over 15 hours of tape recordings. One camera was aimed at the monitors in the control room, whilst the other was aimed at the vision mixer from the side. All participants freely agreed to their participation in data collection and we have accorded them anonymity. These recordings were repeatedly viewed in team analysis sessions, and core events transcribed and categorized. The first part (of three) of one of these games was selected for particular analysis, and forms the basis of the following analysis. Whilst video recording is increasingly used in data collection during workplace studies in HCI and CSCW, there is, as yet, no common transcription coding scheme for remote, multiparticipant activity similar to that used in Conversation Analysis. Consequently, we have developed a coding scheme that accounts for the unfolding game action, camerawork and the verbal interactions that are pertinent to the coordination of the TV production process.

ANALYSIS: SHOT SELECTION FOR BROADCAST
In the following sections, we first discuss why particular camera shots are selected and how the vision mixer makes her selection between the cameras. Second, we show how the selection of images for broadcast is practically achieved, revealing problems that are faced by the vision mixer and how she resolves them. Third, we examine a specific instance of interaction between the camera operators and the vision mixer, to show how the camera operators and vision mixer establish an agreement on when to stay with a shot, and when they need to look for new topics. We distinguish the production of in-play footage from the production activities involved in game pauses and broadcast intermissions, and focus here on in-play footage only. As a topic, these in-play activities are very complex, mobile and unpredictable, and these problems generate challenges both for the coordination of the production team and the maintenance of meaningful broadcast footage.

Variations in camera angles and shot selections occur as the production team collaborate around their orientation to the specific tasks given to the camera operators. In the case of in-play footage, interview data revealed a very basic separation in the way the camera operators are expected to film the developing events: C1 always tries to provide a broad frame, i.e. what we refer to as the overview, whereas both C2 and C3 search for more tightly framed close-up footage. But this functional separation between the camera operators is not sufficient to simply weave into the live broadcast as it arises, and this requires additional image assembly by the vision mixer to produce coherent and interesting broadcast footage of the game to its audience.

In order to illustrate the co-ordination of the live production process, we focus mainly on one of the fundamental features of sports broadcasts: how the production team enables a seamless narrative of the game by alternating between overview and detail [12] within in-play situations. To do this, we discuss a specific instance of an in-play situation where the technical producer deselects the overview camera to broadcast a close up view of game play. An in-play detail shot denotes an occasion where a camera operator frames a close up view during the on-going game and where the vision mixer selects it for broadcast. Our data shows that such in-play shots are rare and very brief compared to the overview shots. During most of the game, the vision mixer selects C1, which provides a wide view of about a third of the ice rink at any given time. Occasionally however, she selects detailed shots of in-play situations (C2 or C3), allowing the viewers to see close-up shots of the action.

Selecting in-detail shots for broadcast
From a narrative perspective, there are several reasons for broadcasting in-play detail shots. In most circumstances, the only way to understand the progress of match play is to see the interactions between players over the whole rink. Individual players’ activities need to be understood in the context of the interaction of many players, the details of which are only meaningful if viewers have an overview of the broader game. This overview is readily provided by C1, which frames a view of the play including most of the players and the puck. Yet although this overview shows the team at work, it misses out on other aspects of team-based game play. In this respect, detail shots allow an appreciation of individual skills, emotional expressions, and so on, which can only be seen with a close up or zoom lens shot. As much of the skill in ice hockey lies in one-on-one play, to maintain a sense of richness in the narrative, the detailed activities of individual game play need to be shown. Overview shots provide a poor level of detail, necessitating the vision mixer and camera operators to provide footage that supports this narrative production at local and global levels, whilst maintaining a smooth sense of transition between these different levels of focus.

We have documented all of the occasions where the vision mixer chooses to include a detailed view from C2 or C3 to help reveal the narrative and interactional concerns that emerge in the live production process. A careful analysis of our empirical material reveals 28 selections of in-play detail shots during the first period of play. When those occurrences were plotted vis-à-vis their specific location on the rink (fig. 2) we identified several common patterns in the data. First, there are the selections of C3, of which all (except on one occasion, 5) occurred immediately in front of the camera operator’s position (6,8,16,20,25). We refer to this pattern as rink-side. The remaining in-play detail shots were selected from C2, and can be clearly differentiated as displaying either tackles (3,4,11,12,18,19,22,23,28), or a player from the back bringing up the puck from their own zone (1,2,5,7,9,10,13-15,17,21,24,26,27). In the following discussion, we examine why these situations are selected while a diversity of other situations were not. We argue that the vision mixer’s selection of a close up camera is guided by narrative concerns, as well as by the practical constraints of the live production process.

Figure 2: Location of in-play detail shots (C2 and C3)

It appears that many in-play detail shots are selected when the players are moving slowly and where this may have a potential impact on future game outcomes. This accounts for both tackle cases, where the players’ movements across the ice are very limited, and when a defender skates slowly towards the attacking zone. When the players are moving relatively slowly, camera operators have to move the camera less than if they were skating fast. Camera operators and the vision mixer can therefore more easily predict that this particular selection can provide footage of game action that will not be lost from view, with a consequent ‘empty’ broadcast footage or the need for a sudden and visually jarring cut to another camera.

Following this point about predicting the potential impact of changes in the state of play, we observe that the regularities, or patterns, in game behaviour lend themselves to skilled reading of the game by the production team, in the same way as these might be understood by a knowledgeable supporter. This allows the production team to make reasonable and reliable assumptions about events in the rink, and also about the behaviour of the other members of their team. In the case of a defender bringing up the puck, an in-detail shot can also be reliably selected because the next action in the game is largely predictable. In such cases, the defender will be looking for the possibility of making a pass to a player in the attacking line (i.e. a centre forward or winger). The vision mixer can broadcast the in-detail shot, whilst preparing to select C1 at the moment of the pass, which would reveal where the puck went, and at the same time, the audience can also be prepared for making sense of that next action (i.e. the subsequent pass). The camera operators can also orient their activities to provide footage to meet the vision mixer’s image requirements. Thus, the production team orients to a pattern within the game around which they can reliably co-ordinate to provide an in-detail view of the action. This is a situation that is recognizable by the vision mixer, camera operators and the TV audience as ‘a back player bringing up a puck’, where the next action will be ‘delivering a pass forward.’

In these instances, the shots selected for broadcast not only have to account for the whereabouts of the players, but also the puck. Problematically for the production team, the puck moves much faster than the players. Thus, if an in-detail view is provided of a player and a puck, it is very likely that one of these two objects will not be in the frame for very long. In order to resolve this, as soon as an important narrative feature of the game is unavailable in the detail shot, the vision mixer will select an overview from C1. However, the viewer and the vision mixer still need to locate the puck in the bigger picture. This is easy both for an interested TV audience and for the vision mixer from C1, and hence it is an obvious choice of camera selection when such a separation event occurs.

Nevertheless, the live footage shows that the VM does not always cut to an overview camera at such separation points. Whilst the position of the puck itself determines the outcome of the match, the empirical data suggests that to provide a meaningful and interesting narrative of the game is not always seen by the VM as simply to ‘follow the puck’. This can be observed during several of the rinkside selections. Here, the selection of C3, showing close up footage of fast passing between players, without showing the puck, was the preferred choice. This makes explicit the narrative demands of displaying the speed of the game. These images, although not C3’s main task in the production, are repeatedly selected for broadcast when possible. We argue that on those occasions, the low angle and close distance between C3 and the players provide shots that demonstrate the intensity of the game when the game action is fast moving and visually intense. These shots are visually very different to the angled and zoomed images presented by C2, and thus are selected whenever the vision mixer can predict that they will become available.

In summary, the work of the production team is to co-develop and orient towards a narrative structure in the broadcast footage. This includes presenting an overview of the game, but at the same time, to include views on individual players’ activities when exciting or relevant to anticipated game developments. Yet it is difficult to provide a broadcast that includes both of these, and the production team constantly takes risks in this. But these risks are collaboratively managed by the team as a whole, as they mutually orient to patterns of behaviour within the game and their expectations of the actions of other team members.

Within-media co-ordination: mutual orientation to topic
Below, we extend our analysis of when shots are selected, and present a detailed transcript from a single tackle situation, revealing how this is managed as an ongoing collaborative achievement by its members, and more specifically, to show how the vision mixer communicates with the camera operators, how they respond, the selection of cameras for broadcast, what the different cameras deliver, and the practical limitations on the production and selection of footage for broadcast. Here, the role of the camera operators is to select and film shots for broadcast. Since the topic of their concern (that is the game action) is highly mobile, the camera will be moving almost all of the time. This is the case both when cameras are focused on the action, and when the camera operator is searching for a shot. The excerpt in Table 1 illustrates both these practical constraints, and how camera selection is still possible, without creating a misunderstanding between the VM and camera operators, and their subsequent broadcast of poor quality footage.

The excerpt begins with the red team attacking, as a red player passes the puck to the yellow end of the rink. The puck shoots between the defending yellow players and behind their goal cage. A red player and a yellow player chase after the puck. The defender looks over his shoulder before decelerating at the rink side to get hold of the puck, which he then passes on. The attacking player stops skating and [...] activities and image framing made available to the vision mixer by each of the camera operators. Verbal communications are also shown.

What we see in the analysis of camera selections (from all cameras, whether selected or not) is illuminating: we can see what is shot on each camera as events develop over time. C1 provides overview shots throughout the initial sequence, in which several players and the puck are visible. C2 searches for, and provides, more detailed shots. First, he focuses on the red defence players who make the initial pass (31:03). When the puck has been passed away, he zooms out (31:06) and pans away for a four second long search, until he focuses on the defending yellow player (31:08). C2 stays with this shot throughout the tackle, but then chooses to follow the attacking player as he skates out of the situation. C3 provides the same type of shot as C2, but becomes obscured by other players. Just as the two players collide, the vision mixer says “two now” (31:10), and selects that camera for broadcast. The cut into C2 is elegantly timed in a way previously discussed as ‘on the action’, in this case, the collision between two bodies. She stays with this camera for one second to show the tackle and when the attacking player starts to skate away. She then says “one now”, and selects C1 again for an overview shot.

The detailed broadcast clip from C2 was only a second long, but gave a detailed view into the tackle. We will first discuss how the camera operators and the VM collaborate to make this possible. An important feature in the co-production is to achieve mutual understanding between the camera operator and the vision mixer so that searching for interesting topics is postponed, and that the camera will stay on a shot as long as it is broadcast. By this, they avoid broadcasting irrelevant or poorly framed footage of the game. In this case, we argue that C2 proposed a shot to the vision mixer by focusing on the defending yellow player (31:08). From that moment, C2 has the yellow player in the frame, and in focus. Since the player is constantly moving towards the end of the rink, the camera moves accordingly. Here, Broth’s proposal-acceptance mechanism, which differentiates between a moving and a steady camera to convey the camera operator’s intention to search or stay with a shot, is not possible. Yet somehow, the vision mixer identifies C2’s shot as a viable proposition and selects it. Both players are by then at the same place for a second, until one skates away (31:11), which requires both C1 and C2 to move their cameras. We argue that C2 could be selected because the vision mixer identified a proposition, i.e. the way in which the cameraman consistently framed an individual player (31:08). Thus, the proposition was made because they could both recognise the content as stable and that he was going to stay with this player.
Effectively, and glides towards the defender at the rink-side, ending with in addition to doing his job of searching out interesting giving him a hard tackle towards the rinkside. The puck shots to broadcast, the camera operator ‘points’ with his passes away, and the game continues. In the excerpt, ‘Time’ camera, making an indexical gesture to the VM as to what refers to the tape time indicator, showing minutes and sec- he intends to do (i.e. provide stable footage for the duration onds of footage; ‘Broadcast’ shows the camera currently of the unfolding event) and that he is ready to go live. selected for live broadcast, and C1, C2 and C3 show the Time Broadcast C1 C2 C3 Communication 31:03 C1 Overview of red team’s zone Frames a close-up shot of the Attempts to cover passing on left side of the rink, show- player passing the puck. player, but this shot is blocked ing a defender grabbing the by other players. puck.

31:05 C1 The defender makes a long Pans right to follow the puck. Switches framing: pans swiftly pass towards the yellow end from the passing attacking of the rink. Camera pans right player, zooms in to find the to cover the pass. puck

31:06 C1 Overview of yellow team’s Switches framing: pans As above zone: shows defending player swiftly, zooms in on yellow skating to get the puck. At- player tacking player approaches from behind at high speed.

31:08 C1 Defender passes the puck Focuses and frames on yel- Zoom in and frames the defend- forward. low defender gliding towards ing player. the rinkside. Attacking player enters the frame just before the tackle.

31:09 C1 Overview of the yellow As above. Frames the yellow defender. team’s zone. Attacking red player tackles the defending yellow player.

31:10 C2 Moves away from the tackle Close up on the red player Shot is blocked by another Vision mixer: “two, to follow the puck, which tackling the yellow player player at the time of impact of now” leaves toward the near corner Stays on the tackle situation the tackle. Stays on the tackle just after the impact of the as it dissolves. situation as it dissolves. tackle.

31:11 C1 Overview of the yellow Follows the attacking player Follows the attacking player Vision mixer: “One, team’s zone (on left of cam- while he skates away. Cam- while he skates away. Camera now” era) era does not follow the puck, is not following the puck, which still is in play. which still is in play.

Table 1: Excerpt showing camera choice and interaction during in-play detail focus

Such topic-oriented coordination depends on joint recognition of a specific and narrative-relevant event, in this case a tackle, and that they do this before it happens is empirically available in the excerpt. We argue that C2’s framing of the yellow player (31:08), despite his being without the puck, is done since the camera operator recognises it as being an important part of the game’s narrative. The way in which they ‘recognise’ the game-play as an upcoming tackle is visible since both C2 and C3 proposed a detailed shot of the defence player even before the attacking player reached him. They also both left enough space in the frame for the attacking player to enter it before the tackle. This indicates an understanding of the event that is to come. Furthermore, the vision mixer cuts from C1 into C2 just as the bodies collide. This would be hard to do if she was not orienting towards it as an upcoming situation. It is also noteworthy that she makes the cut exactly ‘on the action’, i.e. not before or after the tackle but just as it starts (indeed, she cut on the precise frame prior to the tackle). These activities, taken together, support the argument that the organisation of the production is based on mutual orientation towards specific topics of narrative concern. This is made possible since the camera operators and the vision mixer do not simply capture what is happening on the ice, but predict what is going to happen. So, their coordination is based on identifying relevant topics. This is important in both establishing an agreement on when camera operators should stay focussed, and also when they should go looking for other topics.

Yet selection of cameras by the VM does not only include interpreting the players’ intentions, but also the practical constraints of the situation at hand. When the vision mixer actually selected C2, the camera operator had just stopped moving and zooming (in Table 2). We suggest that this is of
importance for the selection, but for different reasons than Broth’s interpretation of the meaning of camera movements [3], in which camera stability indicates readiness to go live. Since ice-hockey is highly mobile, it can be practically difficult to provide detailed shots of action in the game. If a player makes a pass or a shot it is very hard for C2 and C3 to provide detailed shots of the puck being passed, and so they do not attempt to do so. We argue that C2 could be selected in this case because the game play provided a temporarily immobile situation. Thus, the steady camera footage provided by C2 was taken as a practical opportunity by the vision mixer that it would be possible to broadcast. Thus, the team has to orient towards emerging topics, rather than just what is currently happening on the ice.

Summing up, we argue that mixing overview shots with detail shots was made possible through mutual orientations to topics, which was possible because the temporal unfolding of actions was a recognisable feature, and because this narrative feature was only used during brief moments with a relatively immobile character. Thus it was possible to display one-to-one actions with more emotional features, such as facial expressions and body impacts. So, although it would initially appear that they are simply filming in their assigned roles, the camera operators are actively engaged in showing, through the ways that they frame their footage, that they are both ready (or not) to ‘go live’, and are attending to narrative concerns in the ongoing game play. We call the form of shot proposal that is used by camera operators a mediated indexical gesture: a form of communication that is enabled and limited through the particular media at their disposal, which is both the topic of their work as well as the means through which they are able to convey their intentions. Understanding this lies at the heart of the production of live sport coverage.

IMPLICATIONS FOR DESIGN

The design implications relate to three areas of non-professional live video production. First, we consider the automated selection of candidate broadcast footage, drawing on our analysis to show when this might be appropriate, and how it could be practically achieved. Second, we explore the issue of task allocation between team members. This relates specifically to camera operators whose task responsibilities may be unstable or where they have different interests in the game to the rest of the production team. Finally, we consider camera-mediated coordination mechanisms. In this respect, our analysis suggests that it is both possible and useful to orchestrate the organisation of the live production process through the camera and editing software itself, rather than mediating communication through additional independent technologies. These implications are not intended to be specific to ice hockey, but to live sports in general under conditions of fast-moving game play, although they are intimately connected to our analysis and illustrated with data from the hockey study.

1. Recurrent activities and recognisable patterns

There is a degree of regularity in game play, and this predictability provides an opportunity for computationally supporting broadcast image selection. We can see instances where image or pattern recognition might be useful in the data, so for example, when players are filmed moving slowly this allows selection of an in-detail shot. The problem here for automated editing is that identifying material for broadcast relies on a deep understanding of the game. The team as a whole has to recognise emerging game action as being of a particular type of activity. They must then orient towards this activity in guiding their own subsequent activities to help the viewers appreciate the overall picture of play or the performance of individual players. Moreover, many situations are extremely brief and need to be recognised as likely to take place before they actually occur in order to select an appropriate and timely in-detail shot. This makes automation incredibly hard to achieve, and as ‘intelligent’ image analysis is still in its infancy, it will be unlikely to offer much support in these circumstances.

Our analysis also demonstrates the interpretative flexibility around the rules that the production team follows, or rather, as we emphasise, the rules that they orient towards. In this, there is an inherently situated dimension to creating a narrative through the choice of broadcast video during the game. For example, in interviews, we were told the general rule for filming was to follow the puck. However, the pattern that we see in Table 1 diverges from this: the cameras stay on the tackling player through the tackle, and only then move on to display the puck. This also happens occasionally in the ‘rink-side’ pattern. So, there are other concerns in editing than just showing the puck that interest the production team: there are rules that are somewhat conflicting, and there is an interpretative flexibility of what to show in a given situation. This also makes automated editing problematic, in that rule-based processes are largely inflexible and it is inappropriate to simply cut the camera footage to track and broadcast video footage showing the puck.

The intricate nature of a sports event makes it difficult to produce through full automation, although there are opportunities here to offer live video editors automated support in making decisions to broadcast relevant footage. While current systems allow live video editing from several video streams to provide variations between detail shots and overview shots in the constrained setting of an office meeting [23], our analysis suggests that such design approaches will struggle with complex settings and camera configurations such as those seen at sports events.

We suggest that the live editing process could be augmented or supported through partial automation. Although they are not absolute rules that are invariably followed, regularities in the editing process, such as those observed in the in-detail framing and selection of slow moving players, could be brought to the attention of the video editor as a suggested shot, for example, when the broadcast had settled on an overview for a long period of time. This option to select a particular video stream offers editors contextually relevant footage that they could opportunistically choose to use under appropriate conditions. Alternatively, they could choose not to select this material if it was deemed not to meet broadcast criteria, such as relevance to narrative concerns or quality thresholds.

2. Task negotiation and job allocation

In order to effectively cut between overview and close-up footage, camera operators need to be available to film, be reflexively oriented to narratively relevant topics of interest and in position to offer relevant footage. This is not a problem for the professional camera operatives in our data, but these conditions cannot be presupposed when mixing live amateur footage. We have seen that the position of cameras and the knowledge that the vision mixer has of the functional roles of their operators and their expertise in reading the game leads to reasonable expectations of their future actions. This is important in her making sense of when camera operators will choose to select or search for footage, and how they will interpret and reflexively orient towards the unfolding broadcast narrative of the game.

We have also shown how the mixing of overview shots with detail shots in play requires skilful action. The vision mixer did not select in-detail shots whenever possible, but only under particular conditions in which they were narratively relevant, and practically possible to co-ordinate without camera jumps. Yet a very real problem for editors is that there may be many topics of interest for amateur camera users within a sports event, from video of their family and the setting itself in addition to the sport. This suggests an opportunity for design in providing support for building a common narrative.

As our analysis shows, mixing between overview shots and in-detail shots depends on a clear functional separation between camera operators in framing the topic at hand. In professional production this is relatively straightforward in that their tasks are pre-allocated, but for amateurs, this may be more flexible and open to negotiation. Amateur teams might decide on task allocations before the event, in line with the professional team. However, and depending on how formal or stable these video collaborations are, participants might need to do ad hoc negotiations on this during the production, and might require dynamic allocations as users entered and left the location or moved around within it. Whilst a basic and important feature in the coordination of TV production is the stable and known positions of the cameras, in an amateur production the positions and availabilities of cameras may have to be negotiated and articulated. Here, we can see value in supporting articulation work around how requests for a particular type of footage might be dynamically sought out or allocated. In reference to the three issues identified above, interactive technology offers opportunities to support the vision mixer in determining camera availability (e.g. through providing contextual awareness on the user’s current activities), in supporting a collaborative orientation to recognised topics of interest (e.g. by providing a video backchannel to devices of the unfolding broadcast narrative of the game, or allowing them to overhear a verbal commentary of the game to assess the broadcast foci), and in determining that they are in a position allowing them to offer relevant footage (e.g. providing the editor with the locations of the camera operators).

3. Camera-mediated coordination mechanisms

We have shown that camera use plays a central role in video collaboration, and this suggests a foundation on which to implement an adapted form of camera-based interaction that might also be applicable to support for the non-professional production process. Our video data shows that the conditions under which the live production team we examined works are highly time-limited, with a rapidly moving and dynamic game to take to air, asymmetrical patterns of collaborative interaction and resources for communication between the remote camera operators and vision mixer, and a demand to produce televisually exciting material. This situation is not likely to differ for amateur production teams, yet they are likely to be operating under substantially less well organised conditions, and will require additional support for a successful outcome.

The function of interactions between the vision mixer and the camera operator centres on acquiring agreements on what the operator intends or should do next, i.e. staying with the current footage or searching out a new topic. We have shown how the complex and mobile character of the topic at hand provides for a very lean and brief form of interaction, akin to ‘grabbing’ a camera briefly for an in-detail broadcast. This is not a trivial collaborative problem, but is successfully managed with very limited communication through a one-way audio link, in combination with mediated indexical gestures through camera use. This observation lends itself to using lightweight in-camera mechanisms for co-ordinating amateur productions. Thus a person making a live broadcast could ‘grab’ remote camera footage providing a detailed shot for a brief moment, whilst indicating to the camera operator that this was in progress with a signal such as a tally light to indicate that they were currently ‘live’. This would not require camera users to negotiate or articulate the shot proposals or selections through other co-ordination technologies in these instances. We could extend this to provide a different coloured tally light to indicate to users that they had been pre-selected as potential next selections, signifying that they should stay on shot. Similarly, camera operators could make use of buttons on the camera to indicate that they were ready to go live and awaiting potential selection.

DISCUSSION AND CONCLUSION

Methodological issues arise from our strategy of applying Sacks’ Gloss to a professional production team and extending this to designing for amateurs. The general tenets of user-centred design (UCD) would require us to understand the user, but we argue that UCD is not relevant here: there is no current technology for supporting amateur users, or a set of existent practices that they might collaborate through. What can be seen is the general relevance of Sacks’ Gloss in uncovering the problematic nature of the work and its co-ordination, although we recognise its limitations in supporting design, e.g. in the professional practices that arise from training and enculturation which are not likely to be apparent in amateur and ad hoc teams. Nevertheless, such analysis highlights these specifically professional practices, and allows us to consider their particular role in the unfolding collaborative action.

The role played by video in the collaborative production of live TV is rather different to the work of social scientists examining the use of technology to mediate interaction through discourse (i.e. in and through the media). This is because, in our case, it is the media through which communication takes place that also provides the focus of work, and thus defines the mechanisms through which interaction occurs. Live video production is unusual in that it tightly couples topic and medium: video does not just provide a communicative back channel for collaborative interaction, but it is also the task through which this collaborative action develops. The gestures used by camera operators to indicate their readiness to shoot footage and consequently interpreted by image editors in selecting live video streams are therefore critical in co-ordinating the video production process (see also [3]). Live sports footage is especially unusual in that for practical reasons, camera operators cannot use either voice or rich symbolic gestures (e.g. nods or shakes), but are only able to make indexical gestures (literally, pointing through their viewfinders) at their topic matter due to the speed of unfolding events and other situational concerns. Thus, coordination is performed with a minimum of rich interactive sequences and few utterances. These findings from our in-detail analysis of professionals’ practices on collaboration in, on and through video, in the process of producing a live TV broadcast, have distinct implications for the future design of consumer technologies supporting collaborative video production.

REFERENCES

1. Adams, B. and Venkatesh, S. (2005) IMCE: Integrated media creation environment. ACM Trans. Multimedia Computing, Commun. and Applications, 1(3), 211-247.
2. Bowers, J. (2001) Crossing the line: a field study of inhabited television. BIT, 20(2), 127-140.
3. Broth, M. (2004) The Production of a live TV-interview through mediated interaction. In Proc. Int. Conference on Logic and Methodology. SISWO, Amsterdam.
4. Cohen, K.R. (2006) What does the photoblog want? Media, Culture & Society, 27(6), 883-901.
5. Engström, A., Esbjörnsson, M. and Juhlin, O. (2008) Mobile Collaborative Live Video Mixing. In Proc. ACM MobileHCI’2008, 157-166.
6. Garfinkel, H. and Wieder, D.L. (1992) Two incommensurable, asymmetrically alternate technologies of social analysis. In Watson & Seiler (Eds) Text in context. Sage: London, 175-206.
7. Gaver, W.W. (1992) The affordances of media spaces for collaboration. In Proc. ACM CSCW, 17-24.
8. Gillmor, D. (2004) We the media: grassroots journalism by the people, for the people. Sebastopol, CA: O'Reilly.
9. Girgensohn, A. et al. (2000) A semi-automatic approach to home video editing. In Proc. ACM UIST, 81-89.
10. Gruneau, R. (1989) Making spectacle: A case study in television sports production. In Wenner (Ed.), Media, Sports and Society. Thousand Oaks, CA: Sage, 134-154.
11. Heath, C., Luff, P. and Svensson, M.S. (2002) Overseeing organizations: configuring action and its environment. Brit. Jnl. Sociology, 53(2), 181-201.
12. Holland, P. (2000) The Television Handbook, 2nd ed. London: Routledge.
13. Hua, X.S., Lu, L., et al. (2003) AVE-Automated home video editing. In Proc. ACM Multimedia, 490-497.
14. Jordan, B. and Henderson, A. (1995) Interaction analysis: foundations and practice. Journal of the Learning Sciences, 4(1), 39-103.
15. Kirk, D., Crabtree, A. and Rodden, T. (2005) Ways of the Hands. In Proc. 9th Conf. on ECSCW, 1-21.
16. Kirk, D., Sellen, A., Harper, R. and Wood, K. (2007) Understanding videowork. In Proc. ACM CHI, 61-70.
17. Krein, M.A. and Martin, S. (2006) 60 Seconds to Air: television sports production basics and research review. In Raney and Bryant (eds) Handbook of sports and media. LEA: NJ.
18. Luff, P. et al. (2003) Fractured Ecologies: Creating Environments for Collaboration. HCI, 18(1), 51-84.
19. Luff, P. et al. (2004) Working Documents. In Aizawa, Nakamura, and Satoh (Eds.) PCM 2004, LNCS 3331, 81-88. Springer-Verlag: Berlin Heidelberg.
20. Malone, P. (2004) TV News as narrative: the 'real' story. Ann. Meeting of Int. Comms. Assoc., New Orleans.
21. Millerson, G. (1999) Television Production. Woburn, MA: Focal Press.
22. Nardi, B.A. et al. (1993) Turning away from talking heads: the use of video-as-data in neurosurgery. In Proc. ACM CHI, 327-334.
23. Ranjan, A. et al. (2008) Improving meeting capture by applying television production principles with audio and motion detection. In Proc. ACM CHI, 227-236.
24. Real, M.R. (1975) Super bowl: mythic spectacle. Journal of Communication, 25(1), 31–43.
25. Silk, M., Slack, T., & Amis, J. (2000) Bread, butter, and gravy: an institutional approach to televised sport production. Culture, Sport, Society, 3, 1-21.
26. Tazaki, A. (2006) InstantShareCam: turning users from passive media consumers to active media producers. Workshop paper, CHI’06.
27. Tutt, D. et al. (2007) The distributed work of local action: interaction amongst virtually collocated research teams. In Proc. ECSCW, 199-218.
28. Verna, T. (1987) Live TV: an inside look at directing and producing. Focal Press: London.
29. Williams, B.R. (1977) The structure of televised football. Journal of Communication, 27(3), 133-139.
30. Yip, S., Leu, E. and Howe, H. (2003) The Automatic Video Editor. In Proc. ACM Conf. Multimedia, 596-597.
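
As a supplementary illustration of the tally-light signalling proposed under ‘Camera-mediated coordination mechanisms’, the following is a minimal Python sketch and not a description of any system used or built in the study. The four-state model and every name in it (Tally, Mixer, signal_ready, preselect, grab, release) are assumptions introduced here for illustration only; the paper itself proposes just a ‘live’ tally light, a differently coloured ‘pre-selected’ light, and a camera button for signalling readiness.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tally(Enum):
    """Hypothetical per-camera signal states (an assumption, not from the study)."""
    OFF = "off"                  # operator is free to search for a new topic
    READY = "ready"              # operator pressed a button: "I will hold this shot"
    PRESELECTED = "preselected"  # mixer may cut here next: stay on shot
    LIVE = "live"                # footage is currently being broadcast


@dataclass
class Mixer:
    """Tracks one tally state per camera; at most one camera is LIVE."""
    overview: str                              # camera the mixer falls back to
    tallies: dict = field(default_factory=dict)

    def __post_init__(self):
        # The overview camera starts on air, as C1 does in Table 1.
        self.tallies.setdefault(self.overview, Tally.LIVE)

    def add_camera(self, cam: str) -> None:
        self.tallies.setdefault(cam, Tally.OFF)

    def signal_ready(self, cam: str) -> None:
        # Operator-side gesture: only meaningful while the camera is idle.
        if self.tallies[cam] is Tally.OFF:
            self.tallies[cam] = Tally.READY

    def preselect(self, cam: str) -> None:
        # Mixer-side hint asking the operator to keep the current framing.
        if self.tallies[cam] in (Tally.OFF, Tally.READY):
            self.tallies[cam] = Tally.PRESELECTED

    def grab(self, cam: str) -> None:
        # Cut to `cam`; whichever camera was live goes dark first.
        for other, state in self.tallies.items():
            if state is Tally.LIVE:
                self.tallies[other] = Tally.OFF
        self.tallies[cam] = Tally.LIVE

    def release(self) -> None:
        # Return to the overview camera after a brief detail shot.
        self.grab(self.overview)

    def live_camera(self) -> str:
        return next(c for c, s in self.tallies.items() if s is Tally.LIVE)
```

Under these assumptions, the tackle sequence in Table 1 would correspond to C2 calling signal_ready at 31:08, the mixer calling grab("C2") at 31:10, and release() at 31:11 returning the broadcast to C1. The design choice is that no separate negotiation channel is needed: each state change is carried by the camera itself.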
