Augmenting Live Broadcast Sports with 3D Tracking Information
Rick Cavallaro, Sportvision, Inc., Mountain View, CA, USA
Maria Hybinette, Computer Science Department, University of Georgia, Athens, GA, USA
Marvin White, Sportvision, Inc., Mountain View, CA, USA
Tucker Balch, School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA

Corresponding author: Maria Hybinette. Address: University of Georgia. Email address: [email protected] (Maria Hybinette). URL: http://www.cs.uga.edu/~maria (Maria Hybinette).

Abstract

In this article we describe a family of technologies designed to enhance live television sports broadcasts. Our algorithms provide a means for tracking the game object (the ball in baseball, the puck in hockey), instrumenting broadcast cameras, and creating informative graphical visualizations of the object's trajectory. The graphics are embedded in the live image in real time in a photorealistic manner. The demands of live network broadcasting require that many of our effects operate on live video streams at frame rate (60 Hz) with latencies of less than a second. Some effects can be composed off-line, but must be complete in a matter of seconds. The final image sequence must also be compelling, appealing and informative to the critical human eye. Components of our approach include: (1) instrumentation and calibration of broadcast cameras so that the geometry of their field of view is known at all times; (2) tracking items of interest such as balls and pucks, and in some cases participants (e.g., cars in auto racing); (3) creation and transformation of graphical enhancements, such as illustrative object highlights (e.g., a glowing puck trail) or advertisements, into the same field of view as the broadcast cameras; and (4) techniques for combining these elements into informative on-air images. We describe each of these technologies and illustrate their use in network broadcasts of hockey, baseball, football and auto racing. We focus on vision-based tracking of game objects.

Key words: Computer Vision, Sporting Events

1 Introduction and Background

We have developed a number of techniques that provide an augmented experience for viewers of network (and cable, e.g., ESPN) broadcast sporting events. The final televised video contains enhancements that assist the viewer in understanding the game and the strategies of the players and teams. Some effects enable advertisements and logos to be integrated into the on-air video in subtle ways that do not distract the viewer. Example visuals include (1) graphical aids that help the audience easily locate rapidly moving objects or players, such as the puck across the ice in hockey or the ball in football and baseball; (2) information such as the speed or predicted trajectories of interesting objects such as a golf ball, golf club, baseball or bat; and (3) highlights of particular locations within the arena, such as the offside line in soccer, a line that moves dynamically depending on the game play.

Fig. 1. In the K-Zone system multiple technologies, including tracking, camera instrumentation and video compositing, come together at once to provide a virtual representation of the strike zone over home plate and occasionally a trail on the ball.
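Effects such as speed readouts or a predicted ball trail are derived from tracked positions. As a toy illustration only (not the production pipeline), the Python sketch below estimates an object's speed from two tracked 3D positions and extrapolates a simple drag-free ballistic trail; the function name, sample positions and parameters are all hypothetical.

    import numpy as np

    GRAVITY = np.array([0.0, 0.0, -9.81])   # m/s^2, z-up world frame

    def speed_and_trail(p0, p1, dt, horizon=0.5, steps=30):
        """Estimate speed from two tracked 3D positions sampled dt
        seconds apart, then extrapolate a drag-free ballistic path over
        a short horizon.  Returns (speed in m/s, steps x 3 positions)."""
        v = (p1 - p0) / dt                    # finite-difference velocity
        t = np.linspace(0.0, horizon, steps)[:, None]
        # p(t) = p1 + v t + (1/2) g t^2, ignoring air drag and spin
        trail = p1 + v * t + 0.5 * GRAVITY * t**2
        return float(np.linalg.norm(v)), trail

    # Example: a pitch tracked at two instants one field (1/60 s) apart.
    speed, trail = speed_and_trail(np.array([0.0, 18.40, 1.80]),
                                   np.array([0.0, 17.75, 1.80]), 1.0 / 60.0)
    print(f"{speed * 2.23694:.0f} mph")       # ~87 mph

A real deployment would filter many noisy samples rather than differencing two, but the sketch shows how little is needed to turn tracked positions into viewer-facing numbers.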
A cornerstone of our approach is a careful calibration of the broadcast cameras (those that gather the images sent to the viewing public) so that the spatial coverage of the image is known precisely [Honey et al. (1999); Gloudemans et al. (2003)]. A separate but equally important aspect of the approach is the sensor apparatus and processing that provides a means for locating and tracking important objects in the sporting event, for instance, the locations of the puck and players in a hockey game [Honey et al. (1996)].

The raw video images and their augmentations (the graphics) are merged using a technique similar to the chroma-key process [White et al. (2005)]. However, our compositing process is more capable than standard chroma-keying because it allows translucent (or opaque) graphical elements to be inserted between foreground and background objects. A novelty is that our approach allows multiple ranges of colors to be specified for keying, while other ranges are specifically excluded from keying. This enables graphics such as corporate logos, or details such as a virtual offside line in soccer, to be added to the ground on a field in such a way that they appear to have been painted on the grass.

This work is driven by the demands of network television broadcasting and is accordingly focused on implementations that run in real time and that are economically feasible and practical. Nevertheless, many of these ideas can be applied in other arenas such as augmented reality [Azuma et al. (2001)]. We will describe several applications of the approach; some incorporate computer vision to track objects, while others use non-vision sensors but serve to illustrate important aspects of our general approach.

In the following sections we describe the evolution of our tracking system and broadcast effects from 1996 to 2006. Our initial system, successfully deployed during the 1996 NHL All-Star Game, enhanced live televised hockey with a visual marker (a bluish glow) over the puck in the broadcast image [Cavallaro (1997); Honey et al. (1996, 1999)]. Since that time, we have applied our technologies to broadcast football, baseball [Cavallaro et al. (2001a,b)], soccer, skating, Olympic sports and NASCAR racing events [Cavallaro et al. (2003)]. In this paper we focus on individual tracking technologies and on the core instrumentation and compositing techniques shared by the various implementations.

2 Overview: Adding Virtual Elements to a Live Broadcast

Most of our effects depend on a real-time integration of graphical elements with raw video of the scene to yield a final, compelling broadcast image. Before we discuss methods for integrating dynamically tracked elements into a broadcast image, it is instructive to consider the steps necessary for integrating a static graphical element, such as a logo or a virtual line marker, with raw video. Figure 2 provides a high-level overview of the flow of video and data elements through our system to insert a virtual graphical element (in this case a corporate logo) into a live football scene.

Fig. 2. An overview of the video and data flow from the field of play (left) to the final composited image (right).
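To make the data flow of Figure 2 concrete, the following Python sketch outlines the three core stages: a perspective transform of the graphic into the camera's view, a multi-range key matte of the kind described above, and the final blend. This is a minimal illustration under simplified assumptions (an ideal pinhole camera, axis-aligned RGB color ranges); the function and parameter names are ours, not the production system's.

    import numpy as np

    def project(K, R, t, points_world):
        """Pinhole projection x ~ K [R | t] X: map 3D world points (N x 3)
        to pixel coordinates (N x 2).  K encodes focal length (e.g., from
        a zoom encoder); R and t encode camera orientation and position."""
        pc = R @ points_world.T + t[:, None]   # world -> camera frame
        uv = K @ pc
        return (uv[:2] / uv[2]).T              # perspective divide

    def key_matte(video, key_ranges, protect_ranges):
        """Multi-range keying: a pixel admits graphics if its color falls
        inside any keyed range (e.g., shades of grass green) and inside
        no protected range (e.g., team jersey colors).  Each range is a
        (low, high) pair of RGB triples."""
        def in_any(ranges):
            hit = np.zeros(video.shape[:2], dtype=bool)
            for lo, hi in ranges:
                hit |= np.all((video >= lo) & (video <= hi), axis=-1)
            return hit
        return (in_any(key_ranges) & ~in_any(protect_ranges)).astype(float)

    def composite(video, effect, matte, opacity=1.0):
        """Blend the rendered effect into the raw video: where the matte
        is 1 the graphic shows (scaled by opacity, allowing translucent
        elements); where it is 0 the foreground pixels are preserved."""
        a = (matte * opacity)[..., None]
        return (1.0 - a) * video + a * effect

Feathering the matte edges and keying in a color space less sensitive to lighting than raw RGB would be natural refinements of this sketch.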
For the effect to appear realistic, the graphic must be firmly “stuck” in place in the environment (e.g., on the ground). Achieving this component of realism proved to be one of the most significant challenges for our technology. It requires accurate and precise calibration of the broadcast cameras, correct modeling of the environment, and fast computer graphics transformations and compositing.

For now, consider the sequence of steps depicted along the bottom of the diagram. A corporate logo is transformed so that it appears in proper perspective in the scene. The transformation step is accomplished using a model of the environment that describes the location of the camera and the relative locations and orientations of the virtual graphical objects to be included. Encoders on the camera are used to calculate the current field of view of the camera and to perform the appropriate clipping and perspective transformation on the graphical elements. We must also compute a matte that describes, at each pixel, whether foreground or background elements are present. The matte is used by the compositor to determine an appropriate mixture of the effect image and raw video for the final image.

2.1 Adding Dynamic Objects to the Image

Given the infrastructure described above, it is fairly straightforward to add dynamic elements. The locations of the graphical elements can be determined using any of a number of tracking algorithms, as appropriate to the sport. As an example, we have recently developed a system for tracking the locations of all the competitors in a motor racing event. Each vehicle is equipped with a GPS receiver, an IMU, and other measurement devices. The output of each device is transmitted to a central processor that uses the information to estimate the position of each vehicle [Cavallaro et al. (2003)]. Given the position information and the pose of the broadcast cameras, we are able to superimpose relevant graphics that indicate the positions of particular vehicles in the scene (see Figure 3).

Fig. 3. Objects that are tracked dynamically can be annotated in the live video. In this example, Indy cars are tracked using GPS technologies, then the positions of different vehicles are indicated with pointers.

2.2 Camera Calibration

Fig. 4. The broadcast camera is instrumented with precise shaft encoders that report its orientation and zoom.

Yard lines, puck comet trails and logos must be added to the video at the proper locations and with the right perspective to look realistic. To determine proper placement, the field of view (FOV) of the broadcast camera is computed from data collected from sensors on the camera. Others have also addressed this problem and propose exploiting redundant information for auto-calibration [Welch and Bishop (1997); Hoff and Azuma (2000)].

To determine the FOV of the video image, the tripod heads of the broadcast cameras are fitted with optical shaft encoders that report pan and tilt (a tripod head is shown in Figure 4). The encoders output 10,000 true counts per revolution (we obtain 40,000 counts per revolution by reading the phase of the two optical tracks in the encoders).
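As a back-of-the-envelope check of the angular resolution this affords, the short Python sketch below converts raw encoder counts to degrees; the helper name and zero-offset convention are ours, not part of the deployed system.

    COUNTS_PER_REV = 10_000              # true optical counts per revolution
    QUADRATURE = 4                       # phases of the two optical tracks
    EFFECTIVE_CPR = COUNTS_PER_REV * QUADRATURE   # 40,000 counts/rev

    def counts_to_degrees(counts, zero_offset=0):
        """Convert a raw pan or tilt encoder reading into degrees
        relative to a calibrated zero position."""
        return ((counts - zero_offset) % EFFECTIVE_CPR) * 360.0 / EFFECTIVE_CPR

    # Each count resolves 360 / 40,000 = 0.009 degrees of pan or tilt.
    print(counts_to_degrees(10_000))     # -> 90.0

At 0.009 degrees per count, the encoders can report camera orientation far more finely than a single pixel subtends at typical broadcast zoom levels, which is what keeps inserted graphics visually locked to the scene.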