
COMPUTING PRACTICES

Tracking Pitches for Broadcast Television

A cable network’s desire to enhance its telecasts resulted in K Zone, a computerized video tracking system that may have broader applications.

André Guéziec, Triangle Software

In baseball, a pitcher’s fame and fortune depend on his mastery of the strike zone. Pitches that pass outside the strike zone count as balls and can be safely ignored. Those that pass through it untouched, however, count as strikes, three of which will retire the batter to his team’s dugout. Players, fans, and sports journalists thus have an intense interest in compiling statistics about these pitches, as do the umpires who determine each pitch’s status when it crosses the plate.

During the 2001 season, the strike zone received special attention when officials decided to enforce the game’s original strike zone definition, placing the zone’s upper limit between the batter’s shoulders and belt. In the past, major-league umpires had rarely called a strike above the belt. League officials and journalists thought that the effect of enforcing the original definition would be so significant that it might change the hierarchy of hitters and pitchers. Further, 2001 turned out to be a particularly exciting year for baseball as Barry Bonds pursued, and ultimately surpassed, Mark McGwire’s single-season home run record set in 1998.

These developments made tracking pitches accurately more important than ever. Tracking the flight of a pitch during a live broadcast presents two major challenges, however: speed and image-processing reliability. Speed is an issue because ensuring rapid calculation of the trajectory practically requires real-time processing of the 60-fields-per-second video.

Ensuring image-processing reliability, on the other hand, requires overcoming several obstacles. During a baseball game, dramatic changes in lighting conditions and the movement of objects and players can result in a shifting pattern of light and color that makes it especially difficult to track a pitched ball. Further, several ballparks have a net in place behind home plate, which contributes further to the visual clutter that the image-processing system must filter when tracking the baseball.

Meeting these challenges required developing a complex system that fuses high-end computer graphics with a sophisticated algorithm for calculating flight trajectories. The ESPN K Zone system uses computer-generated graphics to create a shaded, translucent box that outlines the strike zone boundaries for viewers. Behind the flashy graphics, K Zone (named after a synonym for the strike zone) is a sophisticated computing system that monitors each pitch’s trajectory.[1]

K ZONE TAKES SHAPE

In February 2001, ESPN contracted with Sportvision to build a system for analyzing baseball pitches during its Major League Baseball broadcasts. ESPN wanted a system that would determine electronically, within one to two centimeters, whether each pitch qualified as a strike or a ball. The system would then draw a representation of the strike zone on the TV screen, superimposed over the replayed broadcast video, to clearly show the pitch’s status. ESPN chose Sportvision for the project because of the company’s track record in graphically enhancing sports broadcasts.

38 Computer 0018-9162/02/$17.00 © 2002 IEEE

System overview

Figure 1 shows K Zone in action during a television broadcast. ESPN insisted that the effect appear on the program video as an integral part of the scene, not as a separate graphical animation. To fulfill this requirement, the developers minimized the graphics so that they would not obscure any part of the game.

Figure 1. K Zone during a televised game. The pitch-tracking effect is an integral and unobtrusive part of the telecast.

The overall broadcast enhancement system uses three subsystems to produce the final televised graphics:

• The camera pan-tilt-zoom encoding subsystem calibrates the broadcast cameras in real time.
• The measurement subsystem detects the baseball’s trajectory, measures the batter’s stance, and determines if the pitch is a strike or a ball.
• The graphic overlay subsystem uses these measurements to produce the televised graphics. To draw them in the proper position, this subsystem needs the real-time calibration data that the camera subsystem provides.

The trajectory component, which consists of three PCs connected to three video cameras, tracks a pitched baseball’s flight toward the strike zone. Two cameras observe the baseball, while the third observes the batter to provide proper sizing for the strike zone.

For calibrating the broadcast cameras, technicians install an encoder on each camera that measures the pan and tilt angles, zoom voltage, and zoom extender positions. The encoders collect these measurements 30 times per second and transmit them to the graphic overlay subsystem.

The graphic overlay subsystem renders a graphic and superimposes it on the broadcast video. This graphic consists of two video streams: the fill, which contains the actual graphic, and the key, which contains the transparency map that indicates the video pixels the graphic affects. These two streams are input to the linear keyer, a piece of video equipment that overlays the graphic on the broadcast video. The graphic overlay subsystem uses an SGI O2 computer to draw a three-dimensional representation of the strike zone in the position that the broadcast camera’s pan, tilt, and zoom parameters specify.

Although Sportvision had used the camera and graphic-overlay systems in broadcasts for several years, using them with K Zone required modifications. The measurement subsystem had to be built from scratch.

Measurement subsystem

K Zone’s measurement system uses two Pentium 4 PCs running Windows 2000, linked to two video cameras that observe each pitch. An operator uses a third camera and PC to locate the strike zone’s top and bottom boundaries. All these components fit conveniently in one short equipment rack.

During operation, each PC processes the video for one camera in real time. The processing uses a four-way multithreaded software architecture. One thread reads the video frame into memory, a second displays the video, a third handles the image processing, and the fourth writes the video to disk.

We tested the pitch-tracking system extensively before using it in broadcasts. Technicians checked various camera locations for tracking the baseball, selected views that permitted the most reliable detection, and refined the tracking algorithm. These tests took place over several weeks early in the season at baseball games played in Oakland, Minneapolis, and New York.

TRACKING CONSTRAINTS

To accomplish pitch tracking, the developers needed to deal with four primary constraints.

Performance

Full-resolution digital NTSC (National Television System Committee) or PAL (phase alternating line) video requires 270 Mbits per second of bandwidth. Importing, displaying, and exporting this data in real time takes several passes through the personal computer’s PCI bus, stretching it nearly to capacity. Doing all this data transmission in real time requires carefully optimized software engineering. At the very least, multithreading is essential to keep the CPU working on the video-processing pipeline while waiting for the next video field or frame to arrive. The system then decomposes the video-processing pipeline into tasks executed independently in a thread-safe manner.

Real-time operation

Although ESPN planned to use the system for replays, we designed the image-processing pipeline to work in real time to keep pace with the video frame rate, with a delay of two seconds.
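The four-way multithreaded architecture described under "Measurement subsystem" can be illustrated with a minimal Python sketch. This is not the K Zone code: the function name `run_pipeline`, the queue size, and the three stand-in stage functions are assumptions made purely for illustration, and a "field" here is just a short list of numbers rather than real video.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def run_pipeline(fields):
    """Fan each video field out to display, processing, and disk-writer threads."""
    names = ("display", "process", "write")
    queues = {name: queue.Queue(maxsize=8) for name in names}
    seen = {name: [] for name in names}

    # Hypothetical stand-ins for the real display, detection, and disk stages.
    stages = {
        "display": lambda field: field,
        "process": lambda field: max(field),  # e.g. brightest pixel value
        "write": lambda field: len(field),    # e.g. number of samples stored
    }

    def reader():
        # Thread 1: "read" each field into memory and hand it to the others.
        for field in fields:
            for q in queues.values():
                q.put(field)
        for q in queues.values():
            q.put(STOP)

    def consumer(name):
        # Threads 2 to 4: each drains its own queue until the sentinel arrives.
        while True:
            field = queues[name].get()
            if field is STOP:
                break
            seen[name].append(stages[name](field))

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=consumer, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


result = run_pipeline([[1, 5, 2], [7, 0, 3]])
```

Bounded queues are one plausible way to keep a slow stage (such as disk writes) from letting unprocessed fields pile up without limit; whether the original system used queues at all is not stated in the article.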

Designing the pipeline for real-time operation allows using the effect live, provided the program receives a two-second or longer delay when broadcast. Sports broadcasts commonly apply such delays.

Even when used for replays alone, the pitch-tracking system still needed to process video quickly. Creating a successful replay required executing several steps in rapid succession. In addition to processing the video, using the system required coordinating with ESPN’s television operators to cue the appropriate footage. So pressing were these constraints that the show’s director would periodically cancel replays for lack of time.

Reliability

Image processing and computer vision have been well-established academic fields since the 1970s. Although many academics have published applications in these fields, few applications have been highly reliable and successful in practice. Academic emphasis tends to be directed more toward pure innovation and mathematical elegance, demonstrated with a few carefully chosen test cases, than toward reliability. Partially because taking measurements only seconds before airing them on television can be extremely stressful, engineers who build applications for commercial broadcasting tend to emphasize reliability and repeatability.

Developing the image-processing system required meeting the particularly significant challenge of designing an algorithm that would work in all possible lighting conditions for an event staged outdoors. The image-processing system needed to function in sunny or overcast lighting, during the day and at night, on both well-lit and shadowed subjects, on scenes composed of different viewing angles or involving markedly different backgrounds, and on images filtered through a foreground net.

Efficient detection

Developing a successful tracking algorithm provided the key to making K Zone work. To track pitches effectively, the system needed to isolate the ball and follow it throughout its trajectory. We began with color cameras working in the standard and less costly interlaced mode at 30 Hz, which meant that the video would contain 60 fields, or half frames, per second. The interlacing, which is the television standard, displays two fields in alternating lines on a frame and thus represents two different moments in time.

Each camera has a field of view that covers about half the baseball’s flight. As a consequence of the relatively wide field of view, the baseball’s image consists of only a few pixels. Depending on the view, background, foreground, and other factors, the baseball can appear as no more than two pixels after detection. In one view, the ball passes near, and often over, the white foul line, creating a white-on-white image.

Several moving objects and shadows could be mistaken for the ball as well. The home plate umpire, catcher, and batter typically stay immobile, then move swiftly and precisely when the ball is pitched, while the computers are busy detecting it. Baseball uniforms typically have white or gray patches, such as a white handkerchief hanging from the umpire’s pocket. Helmets exhibit specular highlights that can be mistaken for the ball in some views.

Figure 2 shows how a later-abandoned image-processing algorithm that uses multiresolution pattern matching created a misleading image in the video sequence taken from the centerfield position. Compare this image with Figure 3, generated by an algorithm that I developed. The system draws the successive detected positions of the baseball, whether successfully tracked or not, superimposing colored squares on a corresponding video sequence’s images. In Figures 3b and 3c, the still frame represents two fields and shows two successive ball positions, one for each field. The baseball locations in the images appear as small green blobs of just a few pixels, with the interlacing causing the color to skip every other line.

Figure 2. Broken trajectory mapping. This ultimately unsuccessful algorithm seeks the baseball pattern in the color image, resulting in the scattered ball positions that the red squares denote. The algorithm became ineffective when lighting conditions, the field, and team colors combined to make parts of the uniforms and background look more like a baseball than the baseball itself.

TRACKING ALGORITHM

Instead of using pattern matching, the K Zone tracking algorithm exploits the kinematic properties of the baseball’s flight. The first step, however, is to qualify a potential baseball position in the image.

A potential ball position corresponds to a number of adjacent pixels, or blobs, that satisfy simple criteria in terms of size, shape, brightness, and color. Specifically, the algorithm eliminates anything that is too colorful, looking instead for a grayish shape with perhaps a little red dirt or green from the grass on it. More importantly, these adjacent pixels must be significantly different from what occupied that section of the image in previous fields.
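The qualification step just described can be sketched in Python as follows. The thresholds (`max_spread`, `min_brightness`, `min_change`, `max_size`), the function names, and the use of a single previous field as the reference are all hypothetical choices for illustration; the article does not give the actual criteria values, and this sketch omits the shape test.

```python
def is_ball_like(pixel, previous, max_spread=40, min_brightness=150, min_change=50):
    """True if an (r, g, b) pixel is grayish, bright, and new in this field."""
    spread = max(pixel) - min(pixel)           # small spread means "not colorful"
    brightness = sum(pixel) / 3
    change = abs(brightness - sum(previous) / 3)
    return (spread <= max_spread
            and brightness >= min_brightness
            and change >= min_change)


def candidate_blobs(field, previous_field, max_size=12):
    """Group qualifying pixels into small 4-connected blobs of (row, col) pairs."""
    rows, cols = len(field), len(field[0])
    mask = [[is_ball_like(field[r][c], previous_field[r][c]) for c in range(cols)]
            for r in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or visited[r][c]:
                continue
            stack, blob = [(r, c)], []
            visited[r][c] = True
            while stack:                        # flood fill from this seed pixel
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not visited[ny][nx]):
                        visited[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) <= max_size:           # large regions are not a ball
                blobs.append(sorted(blob))
    return blobs


grass = (40, 120, 40)
previous_field = [[grass] * 4 for _ in range(4)]
field = [row[:] for row in previous_field]
field[1][1] = field[1][2] = (230, 230, 225)     # two-pixel, near-white ball image
field[3][3] = (220, 30, 30)                     # bright but too colorful
blobs = candidate_blobs(field, previous_field)
```

In this toy field, only the two near-white pixels qualify; the red pixel fails the color test and the grass fails both the color and brightness tests.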

Figure 3. Successful trajectory mapping. The final algorithm chooses the correct ball positions, shown as red squares, from among a variety of candidates, shown as green patches, whose shape, color, and brightness patterns resemble those of the baseball. K Zone provided three views of each pitch: (a) batter close-up (for sizing the strike zone only), (b) high home, and (c) high first base. In these stills, used for real-life detection, the shutter speed necessary to keep the ball’s image from blurring causes some of the shots to be dark.

Background subtraction

To estimate the probability of obtaining a given pixel value, the tracking system draws samples of intensity values from previous frames or fields.[2] A Gaussian distribution works well for a background pixel that does not change over time. To handle variations over time, the system uses a mixture of Gaussian distributions centered about recently observed pixel values for a given location. Pitch tracking updates these models continuously to monitor changes in the background caused by moving shadows and lighting variations such as switching from natural to artificial light. If the current pixel deviates significantly from a predicted value, the system registers the change as motion. The various pixel locations a ball occupies when it passes through the camera’s field of view generally satisfy this criterion.

Size, shape, color, and background differencing criteria, although significantly constraining, cannot by themselves provide enough data to distinguish a pitched ball from various other moving elements in a video sequence. Specifically, depending on the view, perhaps 10 to 15 blobs of pixels satisfy these criteria at a given time, with at most one corresponding to the ball’s correct location. Figure 3, for example, shows all ball candidates as green blobs of pixels. Narrowing the cameras’ field of view could resolve this problem at least partially.
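The background test described in this section can be illustrated with a deliberately simplified sketch. It keeps one running Gaussian (mean and variance) per pixel rather than the mixture of Gaussians the article describes, and the class name, learning rate `alpha`, threshold `k`, and variance floor are assumptions chosen for illustration only.

```python
import math


class GaussianBackground:
    """Per-pixel running mean and variance of grayscale intensity."""

    def __init__(self, first_field, initial_var=25.0, alpha=0.05, k=3.0, min_var=4.0):
        self.mean = [[float(v) for v in row] for row in first_field]
        self.var = [[initial_var] * len(row) for row in first_field]
        self.alpha = alpha      # how quickly the model follows lighting changes
        self.k = k              # deviation, in standard deviations, counted as motion
        self.min_var = min_var  # floor so a static pixel never gets zero tolerance

    def apply(self, field):
        """Return a motion mask and update the model for background pixels."""
        mask = []
        for r, row in enumerate(field):
            mask_row = []
            for c, value in enumerate(row):
                diff = value - self.mean[r][c]
                moving = abs(diff) > self.k * math.sqrt(self.var[r][c])
                if not moving:
                    # Only background pixels refine the model.
                    self.mean[r][c] += self.alpha * diff
                    new_var = self.var[r][c] + self.alpha * (diff * diff - self.var[r][c])
                    self.var[r][c] = max(self.min_var, new_var)
                mask_row.append(moving)
            mask.append(mask_row)
        return mask


background = GaussianBackground([[100, 100], [100, 100]])
quiet_mask = background.apply([[101, 99], [100, 100]])   # small flicker only
ball_mask = background.apply([[100, 200], [100, 100]])   # one pixel jumps
```

A flicker of one gray level stays within three standard deviations and is absorbed into the model, while the jump to 200 is flagged as motion and leaves the model unchanged.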
A narrower field of view, however, would require more cameras, each showing less of the trajectory, so that each could track a larger and thus less ambiguous image of the pitched ball.[3] Even if there had been time to implement this upgrade, it would have increased the system’s cost and complexity significantly.

Finite state machine

The system does not necessarily select the correct blob instantly from among all candidates. Rather, the selection depends on the past and future blob sets, and the consistency the system can find among them. The selection algorithm uses a finite state machine with two states. In the first state, the algorithm looks for physically plausible trajectories. It checks the angle, the velocity in pixels per field, and the trajectory’s deviation with respect to a locally linear fit of the samples. If the tracking system can match enough candidates in successive fields, they serve as seeds for starting a trajectory in the image’s two-dimensional plane.

To acquire a plausible trajectory, the system delays the image processing by a small number of fields so that the algorithm can look both ahead and back a few fields, corresponding to a time lapse of perhaps one tenth of a second, for potential ball positions that form a consistent trajectory.

Once the algorithm has seeded a trajectory, it works in the second state, in which it tests all the potential blobs for a fit with the existing trajectory. The actual mathematics of the trajectory use a Kalman filter, as the “Using Kalman Filtering to Track Trajectories” sidebar describes.

Since the process uses regular interlaced NTSC video, a still frame contains two successive positions of the baseball, shown as green blobs in the figures. Background and foreground variations can complicate trajectory acquisition. Figure 3b, for example, shows how the foregrounds with net or no net, and the backgrounds with grass or dirt, vary dramatically within the same trajectory.

DETERMINING THREE-DIMENSIONAL TRAJECTORIES

Our two pitch-tracking computers work as a team to compute each final trajectory. One observes the view from high home, the other from high first base, and the two exchange the two-dimensional positions they compute.
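The two-state selection logic described under "Finite state machine" can be sketched as follows. This is a simplified illustration, not the K Zone algorithm: it seeds from three consecutive fields, uses a constant-velocity linear prediction in place of the Kalman filter, and coasts on the prediction when no candidate fits. The function names and the `min_speed`, `max_speed`, and `gate` values (in pixels per field) are hypothetical.

```python
import math


def find_seed(a, b, c, min_speed, max_speed, gate):
    """Look for one candidate in each of three fields moving at steady velocity."""
    for p in a:
        for q in b:
            if not min_speed <= math.dist(p, q) <= max_speed:
                continue                       # implausible speed for a pitch
            predicted = (2 * q[0] - p[0], 2 * q[1] - p[1])
            for r in c:
                if math.dist(r, predicted) <= gate:
                    return [p, q, r]
    return None


def track(fields, min_speed=5.0, max_speed=40.0, gate=3.0):
    """Run the SEEKING/TRACKING state machine over per-field candidate lists."""
    state, path = "SEEKING", []
    for i, candidates in enumerate(fields):
        if state == "SEEKING":
            if i < 2:
                continue
            seed = find_seed(fields[i - 2], fields[i - 1], candidates,
                             min_speed, max_speed, gate)
            if seed:
                path, state = seed, "TRACKING"
        else:
            (x1, y1), (x2, y2) = path[-2], path[-1]
            predicted = (2 * x2 - x1, 2 * y2 - y1)
            best = min(candidates, key=lambda p: math.dist(p, predicted), default=None)
            if best is not None and math.dist(best, predicted) <= gate:
                path.append(best)              # candidate fits the trajectory
            else:
                path.append(predicted)         # assumption: coast on the prediction
    return path


fields = [
    [(0, 0), (50, 50)],
    [(10, 1), (20, 40)],
    [(20, 2), (5, 5)],
    [(30, 3), (60, 10)],
    [(33, 33)],                                # ball not detected in this field
]
path = track(fields)
```

Here the distractor blobs never line up at a steady velocity, so only the candidates moving about ten pixels per field are chained together, and the last field is filled in by prediction.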

Sidebar: Using Kalman Filtering to Track Trajectories

Invented in 1960 by Rudolf E. Kalman, Kalman filtering is a commonly used technique for removing measurement errors and estimating a system’s variables.[1,2] A vector represents each system parameter and measurement. Linear equations must describe the system and measurement evolutions over time. The Kalman filter provides optimal estimates of the system parameters, such as position and velocity, given measurements and knowledge of a system’s behavior.

In general, the Kalman filter assumes that the following two relations can describe a system:

$$x_k = A_k x_{k-1} + w_k \tag{1}$$

$$z_k = H_k x_k + v_k \tag{2}$$

$x_k$ is the state vector, such as a position and velocity, perhaps an acceleration or other parameters, while $z_k$ is the measurement, such as a position. $x_0$, $w_k$, and $v_k$ are mutually uncorrelated vectors, and $w_k$ (process noise, or process evolution) and $v_k$ (measurement noise) are white noise sequences. The first equation determines the evolution of the state over time, and the second relates measurement and state.

Once this is established, a recursive algorithm estimates $x_k$ optimally. In this case, “optimally” relates to minimizing the mean squared error in Equation 2 over all the measurements.

The Kalman filter algorithm uses the following notations: $:=$ designates the assignment operator, $P$ is the covariance matrix associated with the state, $Q$ is the process noise covariance matrix, $R$ is the measurement noise covariance matrix, and $K$ is the Kalman gain matrix. The algorithm consists of two main steps, as follows.

The time update or prediction step:

$$x := A x \tag{3}$$

$$P := A P A^T + Q \tag{4}$$

and the measurement update step, which adapts the state to the new measurement value:

$$K_1 := H P H^T + R \tag{5}$$

$$K := P H^T K_1^{-1} \tag{6}$$

$$x := x + K (z - H x) \tag{7}$$

$$P := (I - K H) P \tag{8}$$

After a time update, the algorithm can reject a measurement if the predicted value in Equation 3 is too different from the measurement $z$, given the current uncertainty on the prediction: $H P H^T + R$. Also, if the algorithm can choose between several potential measurements to feed the filter, with only one being the correct measurement, the correct choice will probably be close to the predicted value within the uncertainty region.

Typically, $P$ gradually decreases as the algorithm incorporates more measurements: Confidence in the state builds up. Equation 7 shows that if $K$ is large (which is the case if $R$ is small, meaning that there is little noise in the measurements), the new measurement $z$ is weighted heavily. Instead, if $K$ is small, the value the current state $x$ predicts has a higher weight. Thus $K$ works as a gain.

Sidebar references
1. G. Welch and G. Bishop, “An Introduction to the Kalman Filter,” http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html.
2. L. Levy, “The Kalman Filter: Navigation’s Integration Workhorse,” http://www.cs.unc.edu/~welch/media/pdf/Levy0997_kalman.pdf.
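Equations 3 through 8 of the sidebar, together with the measurement-rejection test it mentions, can be written out in plain Python. This is a sketch under stated assumptions: the state is a one-dimensional position and velocity, the measurement is a scalar position (so the inverse in Equation 6 reduces to a division), and the 3-sigma `gate`, the matrices `A`, `H`, `Q`, `R`, and the helper names are illustrative choices rather than values from K Zone.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kalman_step(x, P, z, A, H, Q, R, gate=3.0):
    """One predict/update cycle for a scalar measurement z."""
    # Time update (Equations 3 and 4).
    x = matmul(A, x)
    P = add(matmul(matmul(A, P), transpose(A)), Q)
    # Prediction uncertainty H P H^T + R (Equation 5), a 1x1 matrix here.
    K1 = add(matmul(matmul(H, P), transpose(H)), R)[0][0]
    innovation = z - matmul(H, x)[0][0]
    if innovation ** 2 > gate ** 2 * K1:
        return x, P, False                     # reject: too far from the prediction
    # Gain (Equation 6), state update (7), and covariance update (8).
    K = [[row[0] / K1] for row in matmul(P, transpose(H))]
    x = [[xi[0] + ki[0] * innovation] for xi, ki in zip(x, K)]
    KH = matmul(K, H)
    n = len(KH)
    I_minus_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)]
                  for i in range(n)]
    P = matmul(I_minus_KH, P)
    return x, P, True


A = [[1.0, 1.0], [0.0, 1.0]]                   # position advances by velocity each field
H = [[1.0, 0.0]]                               # only position is measured
Q = [[1e-4, 0.0], [0.0, 1e-4]]
R = [[1.0]]
x, P = [[0.0], [0.0]], [[100.0, 0.0], [0.0, 100.0]]
for k in range(1, 21):                         # ball moving one unit per field
    x, P, accepted = kalman_step(x, P, float(k), A, H, Q, R)
x_after, P_after, outlier_accepted = kalman_step(x, P, 500.0, A, H, Q, R)
```

After twenty consistent measurements the estimated velocity settles near one unit per field, and the wild measurement of 500 falls far outside the gate, so the filter keeps its prediction instead.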

The computers exchange these two-dimensional positions, which the tracking algorithm computes in each view, in real time over a local Ethernet connection. K Zone uses synchronized time-code generators to inscribe a field-accurate time code into each field of each camera view. The pitch-tracking computers use the time codes to tag each two-dimensional position unambiguously.

One of the two pitch-tracking computers intersects two lines of sight, one from each camera, to combine two two-dimensional positions that correspond to the same time code into a three-dimensional position. More specifically, the computer calculates the point of closest approach between the two straight lines. Each two-dimensional position, or pixel, in a camera view can be associated with a straight line in three-space that essentially depicts the path of the photons that reach that pixel. Figure 4 illustrates the process of using intersecting line-of-sight pairs to locate a baseball’s successive positions. Associating each pixel with a line of sight is called camera calibration[4] and presents several challenges in terms of accuracy, especially in a large environment such as a ballpark.

To determine the final trajectory, the system feeds successive three-dimensional positions into another Kalman filter.

The third computer and camera provide a close-up view of the batter. A TV camera operator uses a joystick to locate the strike zone’s top and bottom on the live video. Since the rule book says a strike occurs when any part of the baseball enters any part of the strike zone, K Zone computes the intersection of the strike zone’s volume (a pentagonal prism) with a cylinder that has the same radius as the baseball, centered on the computed trajectory. The tracking system reports the intersection as a strike and the absence of an intersection as a ball. In either case, the system reports the baseball trajectory’s point of impact with the front plane of the strike zone and draws this intersection on the TV screen.

USER AND VIEWER FEEDBACK

K Zone launched on 1 July 2001 during ESPN’s Sunday Night Baseball, and the network used it to augment every edition of the program aired that summer. Estimates from ESPN’s published ratings indicate that about 13 million viewers tuned in to the show each week and watched the graphical representation of the strike zone as each pitch either hit the zone for a strike or missed it for a ball. Replays presented only a small fraction of the 300 to 400 pitches thrown each game.
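Returning to the three-dimensional reconstruction described under "Determining three-dimensional trajectories": the point of closest approach between two lines of sight has a standard closed-form solution, sketched below. The function name, the choice to return the midpoint of the shortest connecting segment together with its length, and the camera coordinates in the example are assumptions for illustration; the article does not say which point on that segment K Zone reports.

```python
def closest_approach(p1, d1, p2, d2):
    """Midpoint of the shortest segment between lines p1 + s*d1 and p2 + t*d2.

    Also returns the length of that segment, which indicates how well the
    two lines of sight agree.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines of sight are parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * u for p, u in zip(p1, d1)]    # closest point on the first line
    q2 = [p + t * u for p, u in zip(p2, d2)]    # closest point on the second line
    midpoint = [(u + v) / 2 for u, v in zip(q1, q2)]
    gap = dot([u - v for u, v in zip(q1, q2)], [u - v for u, v in zip(q1, q2)]) ** 0.5
    return midpoint, gap


# Two hypothetical cameras whose lines of sight meet exactly at (5, 5, 1).
meet_point, meet_gap = closest_approach((0, 0, 10), (5, 5, -9), (10, 0, 10), (-5, 5, -9))

# Two skew lines two units apart: the x-axis, and a line parallel to y at z = 2.
skew_point, skew_gap = closest_approach((0, 0, 0), (1, 0, 0), (0, 0, 2), (0, 1, 0))
```

With real calibration error the two lines rarely intersect exactly, so the size of the gap is a natural sanity check on a pair of time-code-matched detections.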

Although replays showed only some pitches, the system tracked every pitch, with an extremely low failure rate. In this context, failure means that the system did not accurately detect the pitched baseball’s trajectory. During the entire season, at worst the system missed only a handful of pitches per game, and frequently none at all. Significantly, operator error accounted for most of the few reported failures.

ESPN used the system intensively, showing 17 replays in K Zone’s debut game, then 20 to 30 replays per game thereafter. Viewers and critics alike responded positively to K Zone’s visual effects through messages posted to various Internet forums. Several critics went further, urging ESPN to use the system on controversial umpire calls. Although the network rarely used K Zone in this capacity early in the season, it did so occasionally later in the season.

Figure 4. Locating a baseball’s successive positions. To locate a baseball’s position in three dimensions, K Zone finds the point of closest approach between the two video cameras’ lines of sight: high above home plate and high above first base.

BROADER APPLICATIONS

A potential future application for the technologies used in K Zone may lie in computerized medical-image analysis. The medical literature documents that radiologists sometimes make mistakes in reading x-rays and other imaging studies, as do experienced, well-trained, and dedicated umpires when watching a baseball flying straight toward them at 90 miles per hour.

Although much progress has been made lately in the automated processing and understanding of medical images, the process of examining a chest x-ray, even digitally, remains surprisingly similar to what it was several decades ago. Healthcare providers could apply some of the computer vision techniques used to detect a baseball to these analyses, either directly or in a closely related form. Image differencing, which relates to background differencing, is commonly used in digital subtraction angiography, for example, to show a patient’s arterial network after injection of a contrast material. Likewise, technicians could compute an anatomical shape’s motion and trajectory on image sequences to determine how a clinical condition evolves over time or in response to treatment.

Computer vision is coming of age, after decades of mostly academic pursuits. We finally have the computational power, at a reasonable price, and proper expertise to apply this technology to challenging technical problems. However, while solving some problems might seem easy intuitively, in reality, doing so may be tremendously difficult. In particular, the visual nature of these issues may give a false impression of simplicity: Humans are typically very skilled at pattern recognition, when identifying faces, for instance, whereas computers obviously have none of these human perceptual skills built in. K Zone, built on a tight deadline that allowed only four months for its development, augmented existing technology to create a sophisticated and reliable system that has proven commercially viable and a valuable enhancement for sports fans. Further, ESPN is so pleased with the technology it is seeking an Emmy nomination for K Zone. This development model may well prove effective in future computer vision projects, regardless of the application domain.

Acknowledgments

I thank Sportvision’s Rick Cavallaro, Mike Cramer, Matt Lazar, Jim McGuffin, Alon Moses, and Marv White; J.R. Gloudemans; and Sportvision, ESPN, and Reality Check.

References
1. A. Guéziec, “Tracking a Baseball Pitch for Broadcast Television,” http://www.trianglesoftware.com/pitch_tracking.htm.
2. A. Elgammal, D. Harwood, and L. Davis, “Nonparametric Model for Background Subtraction,” http://fizbin.eecs.lehigh.edu/FRAME/Elgammal/bgmodel.html.
3. G. Pingali, Y. Jean, and A. Opalach, “Ball Tracking and Virtual Replays for Innovative Tennis Broadcasts,” Proc. 15th Int’l Conf. Pattern Recognition (ICPR), IEEE CS Press, Los Alamitos, Calif., 2000.
4. O. Faugeras, Three-Dimensional Computer Vision, MIT Press, Boston, 1993.

André Guéziec is the founder and CEO of Triangle Software (http://www.trianglesoftware.com). He is the main developer and designer of baseball tracking in K Zone. His interests include image processing and computer vision, notably applied to medical imaging, 3D shape modeling, and processing (particularly simplification), as well as modeling, simulation, and animation of road traffic. Guéziec received a PhD in computer science from the University of Paris at Orsay, where he specialized in medical image analysis. He is a senior member of the IEEE and holds several US and international patents. Contact him at [email protected].

March 2002 43