Automatic Classification of Tennis Video for High-Level Content-Based Retrieval
G. Sudhir, John C. M. Lee, and Anil K. Jain
Technical Report HKUST-CS, August
Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Department of Computer Science, Michigan State University, East Lansing, MI, USA
This research is supported in part by the Sino Software Research Centre of The Hong Kong University of Science and Technology under grant SSRCEG, and by an RGC Direct Allocation Grant under grant DAGEG.
Email: {sudhir, cmlee}@cs.ust.hk, jain@cps.msu.edu

Abstract

This report presents our techniques and results on the automatic analysis of tennis video for the purpose of high-level classification of tennis video segments, to facilitate content-based retrieval. Our approach is based on the generation of an image model, valid under perspective projection, for the court lines in a tennis video. We first derive this model by making use of knowledge about the dimensions of a tennis court, the typical camera geometry used when capturing tennis video, and the connectivity of the court lines. Based on this model, we develop a court-line detection algorithm and a robust player-tracking algorithm to track the tennis players over the image sequence. In order to select, from raw input footage, only those video segments that contain the tennis court, we also propose a color-based tennis-court clip selection method. The automatically extracted court lines and player locations form the crucial measurements for the high-level reasoning module, which analyzes the positions of the tennis players relative to the court lines and the net in order to map these measurements to high-level event semantics such as baseline-rallies, serve-and-volleys, net-games, and passing-shots. Results on real tennis video data are presented, demonstrating the validity and performance of the approach.

Introduction

Automatic classification and
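The core of the approach in the abstract is a court-line model projected into the image. Since the court is planar, a perspective camera maps court-plane coordinates to pixels through a 3x3 homography. The sketch below illustrates this idea using the official court dimensions (23.77 m by 10.97 m for doubles, singles sidelines 8.23 m apart, service lines 6.40 m from the net); the homography values are invented for illustration and are not taken from the report.

```python
# Sketch of the knowledge-based court model: project the known court
# geometry into the image with a planar homography. The matrix H below
# is a made-up example, NOT a value from the report.

# Court-plane coordinates in metres; origin at one corner of the
# doubles court (10.97 m across, 23.77 m along, net at mid-court).
COURT_LINES = {
    "baseline-near": ((0.0, 0.0), (10.97, 0.0)),
    "baseline-far": ((0.0, 23.77), (10.97, 23.77)),
    "doubles-left": ((0.0, 0.0), (0.0, 23.77)),
    "doubles-right": ((10.97, 0.0), (10.97, 23.77)),
    "singles-left": ((1.37, 0.0), (1.37, 23.77)),
    "singles-right": ((9.60, 0.0), (9.60, 23.77)),
    "service-near": ((1.37, 5.485), (9.60, 5.485)),    # 6.40 m from net
    "service-far": ((1.37, 18.285), (9.60, 18.285)),
    "net": ((0.0, 11.885), (10.97, 11.885)),
}

def project(H, point):
    """Apply a 3x3 homography to a court-plane point, returning pixels."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)

# Hypothetical homography for the typical broadcast camera behind one
# baseline, looking down the court (numbers chosen only for illustration).
H = [[30.0, 0.0, 160.0],
     [0.0, -6.0, 260.0],
     [0.0, 0.02, 1.0]]

# The image-domain court model: each named line as a pair of pixel endpoints.
image_model = {name: (project(H, p), project(H, q))
               for name, (p, q) in COURT_LINES.items()}
```

With any plausible camera the projected far baseline comes out shorter and higher in the image than the near baseline, which is exactly the kind of constraint the report's line-detection algorithm can exploit.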
retrieval of video information based on content is a very challenging research area, and there have recently been many research efforts addressing the relevant issues. The most difficult question faced at the outset is: what does "content" mean? More specifically, how should one characterize the visual content present in video, and how can it be extracted for the purpose of generating useful high-level annotations to facilitate content-based indexing and retrieval of video segments from digital video libraries? It is generally accepted that content is too subjective to be characterized completely, as it often depends on the objects, background, context, domain, etc. in the video. This is one of the main reasons why the problem of content-based access remains largely unsolved. On the other hand, advances in video and storage technology have enabled (i) representation of visual data in digital form for archiving and browsing, (ii) efficient storage using compression algorithms, and (iii) fast retrieval with random-access capability on storage media such as CD-ROMs, Laser Discs, Video CDs, and the latest DVD-Video and DVD-ROM systems. The recent literature contains a number of approaches to characterizing visual content based on color, texture, shape, and motion. While these approaches have the merit of being applicable to generic image and video data, they also have a major limitation: they can characterize only low-level information. End users will almost always prefer to interact at a high level when accessing or retrieving images and video segments from a database, and the importance of useful high-level annotations cannot be over-emphasized. At the same time, manual generation of high-level annotations is cumbersome, time-consuming, and expensive. Hence there is a dire need for algorithms that can automatically infer high-level content from the low-level information extracted from the data. While this is extremely difficult for images and video in general, limiting the analysis to specific
domains can help overcome many of the hurdles in the automatic generation of high-level semantic annotations relevant to those domains. This report presents a case study in which domain-specific knowledge and constraints are exploited to accomplish the difficult task of automatically annotating video segments with high-level semantics to enable content-based retrieval. There have been some recent attempts to exploit contextual or domain knowledge for the automatic generation of high-level annotations for video segments. Most of them attempt to relate or map the low-level information measured from the image and video data to high-level content pertinent to specific domains. Zhang et al. present a method to parse and index news video based on a syntactic, model-based approach. While they parse the news video into anchor-person shots and news-story shots using image-processing techniques, and succeed in automatically creating a more detailed description of a structured news video, they do not attempt to describe the content of the news video, for the obvious reason that this falls into the domain of generic video and is hence far too difficult a problem. Gong et al. address the problem of automatic parsing of TV soccer programs. They make use of the standard layout of a soccer field (domain knowledge) and partition it into nine categories, such as midfield and penalty area. They then apply various image-processing techniques, such as edge detection and line identification, to TV images for automatic determination of the camera location. A similar attempt has been made by Yow et al., who analyze a video library of soccer games. They report special techniques for automatically detecting and extracting soccer highlights by analyzing image content, and for presenting the detected shots through panoramic reconstruction of selected events. A context-based technique for the automatic extraction of embedded captions is reported by Yeo and Liu. They apply their method directly to MPEG compressed video and extract
visual content that is highlighted by the extracted captions. Saur et al. present a method for the automatic analysis and annotation of basketball video. They use the low-level information available directly from MPEG compressed basketball video, as well as prior knowledge of basketball video structure, to provide high-level content analysis such as close-up views, fast breaks, and steals.

In this report we present our approach and results for the automatic analysis of tennis video for the purpose of high-level classification of tennis video segments, to facilitate content-based retrieval. We exploit the available domain knowledge about tennis video and demonstrate that it is possible to generate meaningful semantic annotations applicable to the domain. Our goal is to automate the assignment of useful high-level annotations such as baseline-rallies, passing-shots, net-games, and serve-and-volley games to the pertinent segments of the video. We have chosen these annotations because they describe some of the most common play events in a tennis video. Video is an important resource in the sport of tennis, particularly in teaching, coaching, and learning how to play the game. Videos are used to teach the rules of the game, analyze and summarize tennis matches and tournaments, demonstrate proper playing techniques, document the history of the sport, and profile current and past champions. Videotapes are the dominant video source today; however, with the advent of digital compression and storage technology, and the associated advantages of random and network-based access, we foresee that digital video will play a dominant role in the future. Annotating raw, unstructured tennis video with this high-level content can help professional tennis players and coaches retrieve video segments in a more meaningful manner from a digital library of tennis video. For example, a player who wants to improve or develop his serve-and-volley abilities would only be interested in retrieving and studying a variety of serve-and-volley video segments. Similarly, a tennis coach giving a visual demonstration of passing techniques and/or strategies to players is mainly interested in a collection of passing-shot video segments as part of his instructional material. If the above-mentioned high-level annotations are available for indexing a digital library of tennis video, then players and coaches can get content-based access to the relevant video segments. In this report we present our techniques for achieving this objective, along with experimental results on real tennis video data that demonstrate the merit of our approach.

The rest of the report is organized as follows. We first present an overview of our system, followed by a color-based algorithm that selects, from the raw input footage, only those video clips containing the tennis court for further analysis. We then derive a quantitative model for a tennis court that is valid in the image domain under the assumption of a perspective projection model for the camera, and, based on this model, present our court-line detection algorithm. Next we present our player-tracking algorithm, which tracks the two tennis players over the image sequence, and briefly describe our method of mapping the low-level positional information about the court lines and the locations of the two players to high-level tennis-play events in the video segments. Finally, we present some of the high-level annotation results we have obtained and conclude the report.
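The final step described above, mapping player positions to play events, can be illustrated with a toy sketch. Here each player's position is reduced to a normalized depth along the court (0.0 at that player's own baseline, 1.0 at the net); the thresholds, labels, and rules below are our own illustration, not the report's actual reasoning module, which works on measured image positions relative to the detected court lines and net.

```python
# Toy illustration of the high-level reasoning step: label a rally from
# the two players' normalized court depths. Thresholds are invented for
# illustration only.

NET_ZONE = 0.7        # deeper than this counts as "at the net"
BASELINE_ZONE = 0.25  # shallower than this counts as "at the baseline"

def classify_play(player_a_depth, player_b_depth):
    """Map the two players' normalized court depths to a play-event label."""
    a_net = player_a_depth > NET_ZONE
    b_net = player_b_depth > NET_ZONE
    a_base = player_a_depth < BASELINE_ZONE
    b_base = player_b_depth < BASELINE_ZONE
    if a_net and b_net:
        return "net-game"
    if a_base and b_base:
        return "baseline-rally"
    if a_net != b_net:
        return "serve-and-volley"  # one player attacks the net
    return "unclassified"
```

For instance, `classify_play(0.1, 0.15)` yields `"baseline-rally"` and `classify_play(0.85, 0.1)` yields `"serve-and-volley"`. A real classifier would also need temporal context, e.g. to distinguish a passing-shot (a shot hit past a net-rushing opponent) from a generic net approach, which a single-frame position snapshot cannot capture.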