MASSACHUSETTS INSTITUTE OF TECHNOLOGY

ARTIFICIAL INTELLIGENCE LABORATORY

AI Technical Report No., May

A Trainable System for Object Detection in Images and Video Sequences

Constantine P. Papageorgiou

cpapa@ai.mit.edu

This publication can be retrieved by anonymous ftp to publications.ai.mit.edu.

Abstract

This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results.

For our extensions to video sequences, we augment the core static detection system in several ways: extending the representation to five frames, implementing an approximation to a Kalman filter, and modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle.

As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full-body version. We also experiment with various other representations, including pixels and principal components, and show results that quantify how the number of features, color, and gray-level affect performance.

© Massachusetts Institute of Technology

Contents

Introduction
Object Detection
Pattern Classification and Machine Learning
Previous Work
Face Detection
People Detection
Car Detection
Our Approach
Thesis Contributions
Outline
The Static Detection System
Architecture
Training Data
Representation
The Wavelet Representation
Support Vector Machines
SVMs and Conditional Densities
Experimental Results
Experimental Results
Test Procedure
Experiments
Pixels, Wavelets, PCA
Signed vs. Unsigned Wavelets
Complete vs. Overcomplete
Color vs. Gray-Level
Faces, People, and Cars
Discussion
Different Classifiers
Feature Selection
Training with Small Data Sets
Detection by Components
Introduction
Classifier Combination Algorithms
Previous Work
System Details
Overview of System Architecture
Details of System Architecture
Results
Experimental Setup
Experimental Results
A Real-Time System and Focus of Attention
A Real-Time Application
Speed Optimizations
Focus of Attention
Integration With The DaimlerChrysler Urban Traffic Assistant
Future Work
Focus of Attention Experiments
Integration Through Time
Pure Pattern Classification Approach
Unimodal Density Tracking
Kalman Filter
Poor Man's Kalman (PMK)
Multimodal Density Tracking
Condensation
Estimating P(z|x)
Results
Discussion of Time Integration Techniques
Pattern Classification Approach
Poor Man's Kalman Heuristic
Condensation-Based Approach
Conclusion
A Wavelets
The Haar Wavelet
2-Dimensional Wavelets
Quadruple Density Transform
B Support Vector Machines
Theory and Mathematics
Practical Aspects
C Raw Wavelet Data

List of Figures

Static image versus video sequences
Examples from the training database of people images
System architecture
Examples from the training database of face images
Images of people from our database of training examples
Examples from the training database of cars
Haar wavelets
Fine scale edges are too noisy to use as features
Coded average values of wavelet features for faces
Coded average values of wavelet features for people
Coded average values of wavelet features for cars
Comparing small versus large margin classifiers
Mapping to a higher dimensional feature space
Summary of detection algorithm
Face detection results
People detection results
Car detection results
Shifting the decision surface to change the accuracy/false positive rate tradeoff
Histogram equalization and scaling coefficient representation
ROC curves for face detection
ROC curves for people detection
ROC curves for car detection
Comparing different classifiers for people detection
Manually selected features overlayed on an example person
ROC curves comparing different numbers and types of features for people detection
ROC curves comparing different positive and negative training set sizes for people detection
Schematic outline of the operation of the component based detection system
Illustration of the importance of geometric constraints in component detection
Geometric constraints placed on different components
Examples of heads, legs, and left and right arms used to train the component classifiers
ROC curves of the component classifiers
Comparison of methods of combining classifiers
Images demonstrating the capability of the component based person detector
Results of the system's application to images of partially occluded people and people whose body parts blend in with the background
Examples of component based people detection in large cluttered images
Side view of the DaimlerChrysler Urban Traffic Assistant (UTA)
View of the cameras in UTA
Quantifying the reduction in performance from using gray-level features
Looking out from UTA
The top salient region in each frame of a video sequence
ROC curves for system using saliency mechanism
Multi-frame training data for pattern classification approach to people detection in video
ROC curves comparing static and pattern classification approaches to people detection in video sequences
Illustration of PMK technique for dynamic detection
ROC curves comparing static and Poor Man's Kalman techniques for people detection in video sequences
Illustration of Condensation in action
SVM output decays smoothly across location and scale
Comparing global thresholding to peak finding with local thresholding
ROC curves comparing static and Condensation-based people detection in video sequences

Chapter 1

Introduction

Until recently, digital information was practically limited to text. Now we are in the midst of an explosion in the amount of digital visual information that is available. In fact, the state of technology is quickly moving from where databases of images are standard to where the proliferation of entire video databases will be de rigueur.

With this increase in the amount of online data available, there has been a corresponding push in the need for efficient, accurate means for processing this information. Search technology thus far has been almost exclusively targeted to processing textual data (Yahoo! Inc., Compaq Computer Corp., Lycos Inc., Excite Inc.); indeed, the amount of online data was heavily weighted towards text, and the automatic processing of image information has significantly lagged. Only recently, as the amount of online image data has exploded, have there been systems that provide indexing and cataloging of image information in mainstream search services (IBM, Virage Inc.). These systems provide various means for searching through mainly static online image libraries but are fairly limited in their capabilities. We can expect that as the standard in available online visual information transitions from images to entire video libraries, effective techniques and systems for searching this data will quickly become imperative. Consider this hypothetical problem: a digital historian is looking for all of the CNN video footage from the past five years that shows Bill Clinton and Monica Lewinsky together in the same scene. Manually searching this data would be a daunting, if not impossible, task. A system that would be able to automatically search for specific objects and situations would be indispensable.

In addition to the Internet as a motivator and forum for video search, the improved, cheaper processing power of computers has opened the door to new applications of image and video searching that were previously infeasible, if not inconceivable. In the near future, we can expect on-board automotive vision systems that inform or alert the driver about people, track surrounding vehicles, and read street signs. Security systems will soon be able to detect when a person is in the field of view and will be able to intelligently ignore benign intruders like domestic animals. Robots will be able to autonomously navigate through complicated terrain by detecting different landmarks and comparing these to internal maps. One day, there may even be smart bombs that are able to home in on specific parts of buildings or bridges by seeing the target and determining where the correct point of impact should be. Devices that implement object detection could also be useful for applications that aid the blind and deaf.

Object Detection

Fast, robust object detection systems are fundamental to the success of these types of next-generation image and video processing systems. Object detection systems are capable of searching for a specific class of objects, such as faces, people, cars, dogs, or airplanes. In contrast, the problem of recognition, which is the ability to identify specific instances of a class, deals with understanding the difference between my face and your face, not the difference between faces and things that are not faces (see Murase and Nayar for relevant work in object recognition).

On the surface, this may seem trivial; people are able to immediately detect objects with little, if any, thought. People, though, have had the benefit of millions of years of evolution and development. In contrast, object detection for computers is a nontrivial task, as we are faced with the problem of giving a mass of wires and MOSFETs the ability to see. We must deal with questions such as: How are images represented digitally? What are the characteristics of people, for instance, that distinguish them from similar looking objects like columns and fire hydrants? How do we tell a computer program to look for this information?

In this thesis, the reader will find an analysis of these questions and several proposed solutions to the problem of automatic object detection in images and video sequences.

The work in this thesis addresses the problem of object and pattern detection in video sequences of cluttered scenes. The general problem of object detection by computers is a difficult one, as there are a number of variables that we cannot account for or model in any but the most contrived situations. The system should not make any assumptions about the scene lighting, the number of objects present, the size or pose of the objects, or motion, among other characteristics.

There are two basic angles this problem could take: static images or video sequences. If we would like to detect objects in static images, the problem becomes a pure pattern classification task; the system must be able to differentiate between the objects of interest and everything else. With the variability in the scene and the uncontrollable conditions identified above, the model of a person must be rich enough to cope with these variations.

On the other hand, if the problem is to detect objects in video sequences, there is a richer set of information available, namely the dynamical information inherent in the video sequence. (These two types of systems often complement one another; the first step in a recognition system is usually to locate the object.)

Figure: An illustration of the inherent differences between doing detection in static images and in video sequences. In static images, the detection task becomes a pure pattern classification problem; dynamical information available in video sequences usually makes the detection problem easier.

However, for a general purpose system that does not make limiting assumptions about the objects, we cannot exclusively rely on motion information per se. What if the particular scene is of a group of people standing at a bus stop? A system that relies on motion to detect people would clearly fail in this case. The figure above contrasts the two types of visual data.

What we need is a technique, or combination of techniques, that uses a model that is rich enough to both (a) describe the object class to the degree that it is able to effectively model any of the possible shapes, poses, colors, and textures of the object for detection in static images, and (b), when we know that we are processing a video sequence, harness the dynamical information inherent in video sequences without making any underlying assumptions about the dynamics.

Pattern Classification and Machine Learning

Pattern classification encompasses a wide variety of problems. We assume that some system is presented with a certain pattern x and a set of possible classes y_i, one of which is the true class of the pattern. The elements of the pattern are individual features that can encode characteristics of the pattern, like height, color, length, pixel value, center of mass, mood, etc.: literally anything that can describe the thing we are trying to classify. The goal of the system is to decide to which class x belongs. Put forth in this manner, it is essentially equivalent to asking what kind of thing the pattern is. In this thesis, we will focus on the two class classification problem, where every pattern falls into exactly one of two possible classes.

There are several ways that the system can encode the knowledge needed to tackle this problem. Using a rule-based approach, a user can describe a set of rules that the system should follow in the decision process. While these systems have been successful for certain types of problems, they typically involve significant effort in engineering the rules and hence are quite expensive to develop. A more promising approach is one where the system learns to classify the patterns. This is exactly the approach we will take.

Machine learning describes a large set of techniques, heuristics, and algorithms that all share a single common characteristic: by using a set of examples, they somehow impart upon a system the ability to do a certain task. In the context of our pattern classification problem, we are interested in presenting the system with a set of example patterns of both classes from a set of training data and having it automatically learn the characteristics that describe each class and differentiate one class from the other. The positive examples are labeled as +1 and the negative as -1. The goal and measure of success is the degree of performance that the trained system achieves on a set of examples that were not present in the training set, called the test set. What we are determining when using the test set to evaluate the performance is how well the system is able to generalize to data it has never seen.

The problem of learning from examples is formulated as one where the system attempts to derive an input-output mapping, or, equivalently, a model of the domain, from a set of training examples. This type of approach is particularly attractive for several reasons. First and foremost, by learning the characteristics of a problem from examples, we avoid the need for explicitly handcrafting a solution to the problem. A handcrafted solution may suffer from the user's imposition of what he thinks are the important features or characteristics of a decision problem. With a learning-based approach, the important features and relationships of a decision problem are automatically abstracted away as a trained model. On the other hand, learning-based approaches may suffer from the problem of poor generalization on account of overfitting, where the model has learned the decision problem too well and is not able to generalize to new data.

For a given learning task, we have a set of N-dimensional labeled training examples

\[ (x_1, y_1), (x_2, y_2), \ldots, (x_\ell, y_\ell), \qquad x_i \in \mathbb{R}^N, \; y_i \in \{-1, +1\}, \]

where the examples have been generated from some unknown pdf P(x, y). We would like the system to learn a decision function f : x → y that minimizes the expected risk,

\[ R = \int |f(x) - y| \, dP(x, y). \]

In most cases we will not know P(x, y); we simply see the data points that the distribution has generated. Thus, direct minimization of the expected risk above is not possible. What we are able to directly minimize is the empirical risk, the actual error over the training set,

\[ R_{emp} = \frac{1}{\ell} \sum_{i=1}^{\ell} |f(x_i) - y_i|. \]
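As a concrete illustration, the empirical risk above can be computed directly from a labeled training set; the following is a minimal Python sketch with hypothetical names, not code from the thesis.

import numpy as np

def empirical_risk(f, X, y):
    """Average loss |f(x_i) - y_i| over a labeled training set.

    f : callable mapping a feature vector to a predicted label in {-1, +1}
    X : array of shape (num_examples, N) holding the training patterns
    y : array of shape (num_examples,) holding labels in {-1, +1}
    """
    predictions = np.array([f(x) for x in X])
    # For labels in {-1, +1}, |f(x_i) - y_i| is 0 on a correct prediction
    # and 2 on an error, so this value is twice the misclassification rate.
    return float(np.mean(np.abs(predictions - y)))

# Toy usage: a trivial classifier that thresholds the first feature.
X = np.array([[0.2, 1.0], [0.8, -0.3], [0.5, 0.1], [0.9, 0.4]])
y = np.array([-1, 1, -1, 1])
f = lambda x: 1 if x[0] > 0.5 else -1
print(empirical_risk(f, X, y))  # 0.0 on this toy set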

Learning engines that minimize empirical risk tend to overfit the training data. This means that, though the error rate on the training data may be extremely low, the error rate on out-of-sample test data could be quite high. There are a variety of techniques to overcome this, including introducing a prior (MacKay; Wolpert), early stopping (Nelson and Illingworth), and using a holdout data set (Bishop).

These methods for improving generalization are largely ad hoc. Recently, a new training technique for classifiers, called Support Vector Machines, has emerged that directly minimizes both the empirical risk and the complexity of the classifier at the same time. We will be exclusively using SVM training for our system. This technique is described in more detail in the section on support vector machines and in Appendix B.

Previous Work

In this section, we describe prior work in object detection in static images and in video sequences that is relevant to our technique. The descriptions are grouped according to application area (faces, people, and cars), from the simpler to the more complex systems, and within each area are very roughly chronological.

The types of technologies used for object detection have largely been dictated by computing power. For instance, early systems for object detection in static images generally used edge detection and simple heuristics, while recent systems that have access to much more storage and processing power are able to use large sets of training data to derive complex models of different objects. The first systems for object detection typically used motion information to segment out the moving object. While assuming the availability of this type of information can be seen as restrictive for certain applications, motion remains a powerful source of information used to both provide better detection and faster processing, even in some of the more recent systems.

Face Detection

Much of the work in object detection in static images has been concentrated in face and people detection. These choices of domains are obvious ones, as detecting faces and people are important steps in most systems where there is some sort of human-computer interaction.

Most of the early systems that find faces in images use simple shapes and constraints. Yang and Huang developed a system that uses local constraints on an image pyramid to detect components of faces. A similar rule-based approach is described in Kotropoulos and Pitas. These rule-based approaches have the benefit of low computational cost and work well for faces on account of the regular interior structure across faces. This concept was extended by Sinha, who introduced the idea of the template ratio, encoding a human face as a set of binary relationships between the average intensities of regions. The assumption behind the template ratio approach was that these relationships will hold regardless of significant changes in illumination direction and magnitude. For example, the eye sockets are almost always darker than the forehead or the cheeks. The success and robustness of the simple rule-based and template ratio approaches for face detection indicate that a representation based on the encoding of differences in average intensities of different regions is a promising direction.

A number of systems take advantage of the regular structure in faces by processing patterns to find fine scale edges corresponding to individual facial features and then fitting the geometry of the components to a deformable template. In these systems (Yuille; Yuille et al.; Kober et al.; Kwon and Lobo; Venkatraman and Govindaraju), the deformable templates are handcrafted based on prior knowledge of the structure of human faces.

Often, these systems are plagued by the noise and spurious information in the fine scale edges they use. One way of countering the effects of inconsistent fine scale edges is to use multiscale gradient features instead of the finer information. In Leung et al., Burl et al., and Burl and Perona, they use local templates based on oriented Gaussian filters to match eye, nose, and mouth features on the human face and determine valid arrangements of these features using random graph matching. Similar systems include Yow and Cipolla, where the features are second derivative Gaussians combined using a Bayesian network, and Guarda et al., where gradient features are combined in a genetic algorithm framework. There have also been efforts that use Laplacian of Gaussian features (Hoogenboom and Lew) or gradient features (Qian and Huang) to compare an entire face pattern to a set of templates instead of individual components.

Wavelets (described in Chapter 2 and Appendix A) provide another formulation for multiscale intensity differences. Some motivation for using wavelets as features is provided in Micheli-Tzanakou and Marsic, where they hypothesize that wavelets may provide excellent features for detection and recognition when combined with a classifier. Initial work showing that the wavelet response peaks over objects in a fairly uniform background is shown in Braithwaite and Bhanu and in Chapa and Raghuveer. In a related area of research, Kruger et al. determine the pose of a face by matching wavelet features to learned 3D head representations.

While many face detection systems have used features based on intensity differences of some sort, it is the regularity in intensity across faces (see Sinha) that makes it possible for approaches using intensity features to succeed for face detection. We will see later that this is not the case with people images, where the intensities have little regularity across an ensemble of examples. Lew and Huang use the Kullback relative information as a measure of closeness between a face candidate and a known template image of a face. Similarly, Colmenarez and Huang present face detection systems using quantized intensity images combined with maximum likelihood and cross entropy decision measures. Lakshmi Ratan et al. describe a system that detects faces by using a dynamic programming approach over quantized intensity values to match a small number of prototype images.

One of the recent focuses in the literature has been trainable systems for face detection that take advantage of increases in computing power and storage. These systems for detecting unoccluded, vertical, frontal views of human faces in images have been developed using example-based approaches: a typically large set of training data is processed by the systems, eventually enabling them to differentiate between faces and non-faces. These view-based approaches can handle detecting faces in cluttered scenes and have shown a reasonable degree of success when extended to handle non-frontal views (Sung; Sung and Poggio; Rowley et al.). These systems can be subdivided into those using density estimation based methods and those that are pure pattern classification techniques.

For the density based techniques, Sung and Poggio, Moghaddam and Pentland, Duta and Jain, and Reiter and Matas essentially model faces in a high dimensional space and do detection by finding where new patterns fall in this density. In these systems, some measure of distance from or closeness to a density is needed; typically, a Mahalanobis-like metric is used. The work of Schneiderman and Kanade develops a face detection system that is also based on density estimation, but they model the joint statistics of local patterns. Rikert et al. accomplish detection by clustering the output of multiscale, multiorientation features and modeling the clusters as mixtures of Gaussians for a set of both in-class and out-of-class data. Of course, all of these approaches rely on the availability of large data sets so that density estimation is possible. The system of Duta and Jain is essentially equivalent to that of Sung and Poggio, and that of Reiter and Matas is based on Moghaddam and Pentland.

Pattern classifiers like neural networks and support vector machines can be viewed as techniques that take as input large sets of labeled data and find a nonlinear decision surface that separates the in-class (face) patterns from the out-of-class (non-face) patterns. The benefit of these types of systems is that no explicit modeling needs to be done. On the other hand, they are typically computationally expensive, especially during training, and it can be difficult to extract intuition from a trained system as to what it has learned internally. Pattern classification approaches to face detection have been developed by Rowley et al., Vaillant et al., and Osuna et al.; Rowley and Vaillant use neural networks with different receptive fields, and Osuna uses a support vector machine to classify the patterns.

There have been several systems that combine the previous ideas and use wavelet features as input to example based systems. While the particular classes of wavelets and learning techniques differ, the general structure of these approaches is similar to ours, but they use very small numbers of features, are developed for a particular domain, and are not rigorously tested on large out-of-sample data sets. Philips uses matching pursuit over Gabor wavelet features to detect features on a face. This technique decomposes an out-of-sample pattern into a linear combination of wavelets and matches the linear coefficients to those of a known face. The systems of Mirhosseini and Yan and of Shams and Spoelstra use a neural network trained on Gabor wavelet features to detect eyes, while Weber and Casasent use a similar approach to detect tanks in IR imagery.

There are other types of information that have been used in face detection systems. While these features are exclusive to faces and do not generalize to other object classes, they nevertheless can be powerful cues for face detection. Several researchers use the fact that the colors of human faces fall into a narrow range. These systems use the prior knowledge that color can be used to segment out skin regions as information for detection (Saber and Tekalp; Wu et al.; Garcia et al.). Another system, by Crowley and Berard, uses the unique time signature of blinks, as well as color, to detect faces.

People Detection

The existing work in people detection has largely not addressed the detection problem per se; rather, most existing systems use some sort of prior knowledge, place heavy restrictions on the scene, assume fixed cameras with known backgrounds, or implement tracking.

The early systems that detect people focused on using motion and simple shapes or constraints; much of this is due to the computational limitations of the time. Tsukiyama and Shirai use simple shape descriptions to determine the location of leg motion against a white background, and a distance measure is utilized to determine the correspondences between moving regions in consecutive images. This system can handle multiple people in an image but requires a stationary camera and only uses leg movement to track people. Leung and Yang use a voting process to determine candidate edges for moving body parts and a set of geometric constraints to determine actual body part locations. This architecture also assumes a fixed camera; another important restriction is that it is only able to deal with a single moving person.

The use of 3D models has been prominent in finding people in video sequences. This type of system, while adequate for particular well-defined domains, involves using domain specific information in the development of the model and is not easily portable to new domains. Hogg describes a system that is based on modeling a human figure as a hierarchical decomposition of 3D cylinders, using dynamic constraints on the movement of the limbs as well. Edge detection is used to determine the possible locations of body parts, and a search tree is used to determine the location that maximizes a plausibility measure indicating the likelihood that there is a person at this location. Rohr develops a system using similar 3D cylindrical models of the human body and kinematic motion data. Model contours are matched with edges that are found in an image using a grid search method. A Kalman filter is used to determine the exact position and pose of the walking person across multiple frames. Both these architectures assume a fixed camera and a single moving person in the image.

A large number of systems use the fact that, if our camera is fixed and we know what the scene background is, we can effectively subtract the background from a new image of the scene and recover the moving objects. They can further restrict the domain to specify that the only objects that move will invariably be people. Once the moving bodies have been segmented out, it is possible to do tracking with blob models of the different body parts assigned by a maximum a posteriori approach, as in Wren et al. A similar system (Sullivan et al.) models and tracks moving people using a simple deformable model. McKenna and Gong cluster the motion information to separate different bodies of motion and subsequently use a Kalman filter to track different people.

This use of background subtraction can be extended to do more than just detection and tracking. Kahn and Swain use motion and color to segment a person, then use prior knowledge about the geometry of the body to determine where the person is pointing. In Haritaoglu et al., they present a real-time surveillance system for recognizing actions that analyzes basic shapes and is able to cope with occlusions and merging bodies; a related approach is described in Takatoo et al. Similarly, Davis and Bobick present a system for recognizing actions based on time-weighted, binarized motion images taken over relatively short sequences. However, that system is not used for detection.

In a different vein, Heisele et al. use clusters of consistent color to track moving objects. Initially, the system computes the color clusters for the first image in a sequence. The system recomputes the cluster centroids for subsequent images, assuming a fixed number of clusters. To track an object, the clusters corresponding to that object are manually labeled in an initial image and are tracked in subsequent frames; the user is, in effect, performing the first detection manually. The authors highlight as future work investigating object detection with this algorithm. An important aspect of this system is that, unlike other systems described in this section, this technique does not assume a stationary camera. This system has been combined with a time delay neural network to detect and recognize pedestrians (Heisele and Wohler).

Another approach that differs from the traditional techniques is to mark specific points on the human body with sensors or lights and to record data off of moving people. This information is more suited to understanding human motion and can subsequently be used for analysis or animation. Campbell and Bobick take a different approach to analyzing human body motion. They present a system for recognizing different body motions using constraints on the movements of different body parts. Motion data is gathered from ballet dancers with different body parts marked with sensors. The system uses correlations between different part motions to determine the best recognizer of the high-level motion. They use this system to classify different ballet motions. Lakany and Hayes also use moving light displays (MLDs), combined with a 2D FFT for feature extraction, to train a neural network to recognize a walker from his or her gait.

While systems relying on motion or background information are prevalent for people detection, there have been recent efforts in developing systems that actually do detection. In Forsyth and Fleck, they describe a system that uses color, texture, and geometry to localize horses and naked people in images. The system can be used to retrieve images satisfying certain criteria from image databases but is mainly targeted towards images containing one object. Methods of learning these body plans (hand coded hierarchies of parts) from examples are described in Forsyth and Fleck. In a direction of work more similar to ours, a recent system by Gavrila and Philomin describes a technique for finding people by matching oriented outlines, generated by an edge detector, to a large set of people template images via a distance transform measure. The system is related to ours in that it looks at some sort of intensity difference information, but unlike ours it only considers outlines composed of fine scale edges and uses an explicit, holistic match metric.

Car Detection

Cars, like people, have been less heavily studied than faces as a domain for object detection. Indeed, most systems for car detection typically impose many of the same restrictive assumptions as for people detection, largely due to the variability in the patterns and shapes and the availability of characteristic motion cues in video sequences. Much of the work relies on background subtraction (Ali and Dagless; Ebbecke et al.). Several systems use some other segmentation method that keys off of the fact that cars occur against fairly constant backgrounds of pavement. For example, Kalinke et al. describe a technique based on local image entropy. Beymer et al. present a traffic monitoring system that has a car detection module. This portion of the system locates corner features in highway sequences and groups features for single cars together by integrating information over time. Betke et al. and Betke and Nguyen use corner features and edge maps combined with template matching to detect cars in highway video scenes. This system can afford to rely on motion since it is designed for a fairly narrow domain, that of highway scene analysis from a vehicle.

Most of these systems apply heuristics that are specific to the fairly limited domains in which they are designed to run, typically from a fixed camera monitoring a piece of road (highway or intersection) or from the front of a vehicle.

A more relevant example of a true trainable car detection system is that of Rajagopalan et al., which clusters the positive data in a high dimensional space and, to classify an unknown pattern, computes and thresholds a distance measure based on the higher order statistics of the distribution. This technique has a good deal in common with the face detection system of Sung and Poggio. A similar detection system is described by Lipson, who uses a deformable template for side view car detection. In this system, the wheels, mid-body region, and regions above the wheels are roughly detected based on photometric and geometric relations; the wheels are then more precisely localized using a Hausdorff match. Processing is confined to high resolution images, which is possibly a restriction for more general detection tasks. This system has been applied to scene classification (Lipson et al.; Ratan and Grimson) and shares some conceptual similarity with that of Sinha.

All these systems have succeeded to varying degrees but have relied on the following restrictive features:

- explicit modeling of the domain
- the assumption that the object is moving
- a stationary camera and a fixed background
- marking of key moving features with sensors or lights
- implementing tracking of objects rather than detection of specific classes

Model-based approaches need a large amount of domain specific knowledge, while marking features is impractical for real world use. The tracking systems have problems handling the entrance of new objects into the scene; to overcome this problem, a tracking system would need to emulate a detection system. This work will overcome these problems by introducing an example-based approach that learns to recognize patterns and avoids the use of motion and explicit segmentation.

Our Approach

The approach taken in this thesis has two components. First, we develop a robust, trainable object detection system for static images that achieves a high degree of performance. We then develop several modifications and enhancements of the original system that allow us to take advantage of dynamical information when we process video sequences.

Figure: Example images of people used in the training database of our static detection system. The examples show variation in pose, color, texture, and background.

The core static detection system is one that learns from examples. We present it with a set of training data that are images of the object class we would like to detect and a set of data that are examples of patterns not in the object class, and the system derives an implicit model from this data. To allow the system to find a better model, we do not directly use the pixel images as training data, since the pixel patterns have a high degree of variability. Rather, we transform the pixel images into a representation that encodes local, oriented, multiscale intensity differences and provides for a more descriptive model of a variety of object classes.

The system learns using patterns of a fixed size, but in general images we do not know what size objects we will be looking for, how many of these objects will be in the scene, or where they will be located. To detect objects at all sizes and locations, we implement a brute force search over the image, looking at all locations and sizes of patterns. We assume that the orientations we are interested in must be expressed in the training data.

When we are processing video sequences, the naive approach is to use our core system directly, applying the static detection system to each frame sequentially. This, however, ignores all the dynamical information inherently available in video sequences. We can take advantage of the facts that objects appearing in one frame typically appear in approximately the same position in subsequent frames and that objects do not usually appear spontaneously in a frame when they were not present in previous frames. This general idea can be coded more rigorously in several ways, each of which uses the core static detection technique as its basis. We will describe and provide empirical results for these different methods.

Thesis Contributions

Representation

From the example images of people shown in the figure above, it is clear that a pixel-based representation is plagued by a high degree of variability. A learning-based approach would have a difficult time finding a consistent definition of a person using this type of representation. We describe a new representation in which the features are derived as the responses of filters that detect oriented intensity differences between local adjacent regions. This is accomplished within the framework of Haar wavelets, and the particular transform we use results in an overcomplete dictionary of these Haar features.

We will also investigate the power of this representation compared with both a pixel representation and principal components analysis (PCA), local and global representations, respectively.

In addition, we present a set of experiments that capture the informational content inherent in color images versus gray-level images by comparing the performance achieved with color and gray-level representations. In other words, these experiments quantify the value of color information in the context of detection performance.

Learning Machine

Traditional training techniques for classifiers, such as multilayer perceptrons, use empirical risk minimization and only guarantee minimum error over the training set. These techniques can result in overfitting of the training data and therefore poor out-of-sample performance. In this thesis, we use a relatively new pattern classification technique, support vector machines, that has recently received a great deal of attention in the literature. The number of applications of SVMs is still quite small, so the presentation of SVMs as the core learning machine represents a significant advancement of the technique in the context of practical applications.

Support Vector Machines (SVMs) (Vapnik; Cortes and Vapnik; Burges) approximate structural risk minimization, which is a well-founded learning method with guaranteed theoretical bounds on the generalization performance. SVMs minimize a bound on the generalization error by simultaneously controlling the complexity of the machine and the performance on the training set, and therefore should perform better on novel data. The SVM framework is characterized by the use of nonlinear kernels that map the data from the original input space into a much higher dimensional feature space in which the learning capability of the machine is significantly increased. In this higher dimensional feature space, an SVM classifier finds the optimal separating hyperplane, that is, the one that maximizes the margin between the two classes.
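For reference, the margin maximization described here corresponds to the standard soft-margin SVM training problem; the following is a textbook statement of that problem, not an equation quoted from this thesis:

\[
\min_{w, b, \xi} \;\; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{\ell} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w \cdot \Phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0,
\]

where \Phi is the feature map induced by the kernel, the slack variables \xi_i allow some training errors at cost C, and minimizing \|w\| maximizes the margin 2/\|w\| between the two classes.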

Faces, People, Cars

Much of the previous work in object detection in static images has focused on the problem of face detection (Sung and Poggio; Rowley et al.; Vaillant et al.; Moghaddam and Pentland; Osuna et al.). While this is an important domain for static object detection systems, we consider frontal face detection to be essentially a solved problem (we note that the more general problem of pose invariant face detection has still not been sufficiently dealt with).

To explore the generality of our system and highlight its performance in less studied domains, we provide in-depth results on people and car detection as well as face detection. To our knowledge, this work is the first exposition of people detection in static images without making any assumption on motion, scene structure, background, or the number of people in the scene. In addition, while there have been several car detection systems developed in the literature, they too typically require the use of dynamical information.

This thesis explores the three object detection domains of faces, people, and cars and shows that our single, general purpose, trainable architecture is able to handle each of these classes of objects with excellent results.

Detection by Components

The most prevalent problem with our static detection system is its difficulty in detecting objects when a portion of the pattern is occluded or there is little contrast between the background and part of the pattern. This is a consequence of the fact that we train our system on the complete patterns of our objects of interest. Many object classes are decomposable into a hierarchy of constituent elements. For instance, we know that a person is composed of a head, left arm, right arm, torso, and legs. When we see these components in the proper configuration, we know that we are looking at a person.

One would expect that if we knew there was a head, left arm, and legs in the proper configuration, but we could not see the right arm, this may still be a person. In other words, if we look for the core building blocks, or components, of an object and allow some leniency, letting one or two of the components that make up the object be missing, this may result in a more robust detection system than our full pattern approach.

This thesis presents a component based framework for object detection and shows that, for a test domain of people detection, the system performs better than the full body detection system.




Dynamical Detection

This thesis presents experiments on object detection in video sequences using several different methods that take advantage of dynamical information in different ways. We will use the brute force technique of applying a static detection system to individual frames of a video sequence as our baseline.

Our first system is a pure pattern classification approach to dynamical object detection; here, we seek to circumvent the need for the extensive engineering that is quite typical in current dynamical detection systems and to avoid assuming particular underlying dynamics. We modify our base static detection approach to represent dynamic information by extending the static representation into the time domain. With this new representation, the system will be able to learn the dynamics of people, with or without motion. The system will learn what a person looks like and what constitutes valid dynamics over short time sequences, without the need for explicit models of either shape or dynamics.

The second system to take advantage of dynamical information is a rule based module that integrates information through time as an approximation to a Kalman filter. Kalman filtering theory assumes an underlying linear dynamical model and, given measurements of the location of a person in one image, yields a prediction of the location of the person in the next image and the uncertainty in this prediction. Our heuristic smooths the information in an image sequence over time by taking advantage of this fundamental a priori knowledge that a person in one image will appear in a similar position in the next image. We can smooth the results through time by automatically eliminating false positives, that is, detections that do not persevere over a small subsequence of frames.
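As a rough illustration of this kind of temporal smoothing (a sketch under assumptions about the detection format and the matching test, not the thesis's Poor Man's Kalman implementation), detections can be kept only when similar detections appear in neighboring frames:

def overlaps(a, b, tol=0.25):
    """Crude test that two detections (x, y, scale) lie near each other."""
    ax, ay, asc = a
    bx, by, bsc = b
    return (abs(ax - bx) <= tol * asc and abs(ay - by) <= tol * asc
            and 0.8 <= asc / bsc <= 1.25)

def smooth_detections(frames, min_support=2, window=3):
    """Keep a detection only if matching detections appear in at least
    min_support of the window frames centered on it.

    frames: list over time; each element is a list of (x, y, scale) detections.
    Returns one filtered detection list per frame.
    """
    filtered = []
    for t, detections in enumerate(frames):
        lo = max(0, t - window // 2)
        hi = min(len(frames), t + window // 2 + 1)
        kept = []
        for d in detections:
            # Count nearby frames (including this one) containing a match.
            support = sum(
                any(overlaps(d, other) for other in frames[u])
                for u in range(lo, hi)
            )
            if support >= min_support:
                kept.append(d)
        filtered.append(kept)
    return filtered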

The third system we will develop uses a new approach to propagating general, multimodal densities through time, based on the so-called Condensation technique (Isard and Blake). This technique has a significant advantage over the Kalman filter, namely that it is not constrained to model a single Gaussian density.
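For intuition, one resample-predict-measure cycle of a generic particle filter of the kind Condensation performs might look like the following sketch; the dynamics and measurement models here are placeholders, not those used in this thesis.

import numpy as np

def condensation_step(particles, weights, drift, measure, rng):
    """One cycle over a set of weighted state hypotheses.

    particles: (n, d) array of hypotheses (e.g., location and scale)
    weights:   (n,) array of normalized weights
    drift:     function applying (possibly stochastic) dynamics
    measure:   function returning a likelihood for each predicted state
    """
    n = len(particles)
    # 1. Resample proportionally to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    predicted = drift(particles[idx], rng)          # 2. Predict.
    new_weights = measure(predicted)                # 3. Measure.
    return predicted, new_weights / new_weights.sum()

# Example with random-walk dynamics and a dummy likelihood peaked at x = 50.
rng = np.random.default_rng(0)
particles = rng.uniform(0, 100, size=(500, 3))      # x, y, scale hypotheses
weights = np.full(500, 1.0 / 500)
drift = lambda s, rng: s + rng.normal(0.0, 2.0, s.shape)
measure = lambda s: np.exp(-((s[:, 0] - 50.0) ** 2) / 200.0) + 1e-9
particles, weights = condensation_step(particles, weights, drift, measure, rng)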

A Practical Application

This thesis presents a real-time application of a particular, optimized version of our static detection system. Our people detection technology has been integrated into a system for driver assistance; the combined system, including our people detection module, is currently deployed live in a DaimlerChrysler S Class demonstration vehicle.

Outline

In Chapter 2, we describe our core trainable object detection system for static images, with details on our wavelet representation and support vector machine classification, and give test results. Chapter 3 investigates the use of alternate representations, including pixels and principal components analysis, for the purposes of object detection. The chapter also describes a manual technique for feature selection, as well as experiments quantifying how training set size affects detection performance. In Chapter 4, we describe the component based approach for object detection and test it on the domain of people detection. Chapter 5 highlights a real application of our system as part of a driver assistance system in a DaimlerChrysler test vehicle and describes experiments using a focus of attention module. In Chapter 6, we extend the static system into the time domain to take advantage of dynamical information when we are processing video sequences; several different approaches are described and tested. Chapter 7 summarizes our results and provides some directions for future work.

Chapter 2

The Static Detection System

Architecture

An architectural overview of our system, as applied to the task of people detection, is shown in the figure below; it covers both the training and testing phases. In the training step, the system takes as input a set of images of the object class that have been aligned and scaled so that they are all in approximately the same position and are the same size, and a set of patterns that are not in our object class. An intermediate representation that encapsulates the important information of our object class is computed for each of these patterns, yielding a set of positive and negative feature vectors. These feature vectors are used to train a pattern classifier to differentiate between in-class and out-of-class patterns.

In the testing phase, we are interested in detecting objects in out-of-sample images. The detection process works as follows. The system slides a fixed size window over an image and uses the trained classifier to decide which patterns show the objects of interest. At each window position, we extract the same set of features as in the training step and feed them into our classifier; the classifier output determines whether or not we highlight that pattern as an in-class object. To achieve multiscale detection, we iteratively resize the image and process each image size using the same fixed size window.
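To make the search procedure concrete, here is a minimal sketch of the multiscale sliding-window loop described above; the window size, scale step, stride, feature extractor, and classifier are stand-ins, not values or code from the thesis.

import numpy as np
from PIL import Image

def detect(image, extract_features, classify, window=(128, 64),
           scale_step=1.2, stride=4):
    """Brute-force multiscale detection over a PIL image.

    extract_features maps a window-sized crop to a feature vector, and
    classify returns True if that vector is an in-class pattern.
    Returns a list of (x, y, scale) detections in original image coordinates.
    """
    win_h, win_w = window
    detections = []
    scale = 1.0
    while True:
        w, h = int(image.width / scale), int(image.height / scale)
        if w < win_w or h < win_h:
            break  # the resized image no longer fits the detection window
        arr = np.asarray(image.resize((w, h)))
        # Slide the fixed-size window over this image size.
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                crop = arr[y:y + win_h, x:x + win_w]
                if classify(extract_features(crop)):
                    # Map the window back to original image coordinates.
                    detections.append((int(x * scale), int(y * scale), scale))
        scale *= scale_step
    return detections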

This section addresses the key issues in the development of our trained pattern classifier: the representation and the learning engine.

Training Data

Our example based approach uses a set of images of an object class to learn what constitutes an in-class and an out-of-class pattern. Here, we take people detection as a sample domain. Since the output of our system will be an image with boxes drawn around the people, the training process needs data that reflects what is and is not a person, so we need our positive data to be examples of people. To ensure that our classification engine will learn on data that has consistent information, we require

that the example images of people be scaled to the same size and aligned so that the body is located in the same position in each image.

Figure: The training and testing phases of our system. (Diagram: the Training and Testing pipelines each pass patterns through the Overcomplete Representation to the SVM Classifier; the test output labels each pattern person or non-person.)

We have developed a simple tool that allows a user to click on a few identifying marks of an object in an image; the tool then automatically cuts, scales, and aligns the pattern for use in training. Negative training data is gathered by randomly sampling patterns in images that do not contain the object of interest.

Figure: Examples from the database of faces used for training. The images are gray level.

Representation

Wavelets

One of the key issues in the development of an object detection system is the representation of the object class. Even within a narrowly defined class of objects, such as faces or people, the patterns can show a great deal of variability in color, texture, and pose, as well as in the lack of a consistent background. Our challenge is to develop a representation that achieves high interclass variability with low intraclass variability.

Figure: Examples from the database of people used for training. The images are color. The examples vary in pose, color, texture, and background.

To motivate our choice of representation, we can start by considering several traditional representations. Pixel based and color region based approaches are likely to fail because of the high degree of variability in color in certain object classes, like people, and the number of spurious patterns. Traditional fine scale edge based representations are also unsatisfactory due to the large degree of variability in these edges.

The representation that we use is an overcomplete dictionary of Haar wavelets, in which there is a large set of features that respond to local intensity differences at several orientations. We present an overview of this representation here; details can be found in Appendix A and in Mallat and Stollnitz et al. The Haar wavelets in their possible orientations are shown in panel (b) of the Haar wavelet framework figure below.

For a given pattern, the wavelet transform computes the responses of the wavelet filters over the image. Each of the three oriented wavelets (vertical, horizontal, and diagonal) is computed at several different scales, allowing the system to represent coarse scale features all the way down to fine scale features. In our object detection systems, we use two consecutive scales of wavelets. In the traditional wavelet transform, the wavelets do not overlap; they are shifted by the size of the support of the wavelet in x and y. To achieve better spatial resolution and a richer set of features, our transform shifts by 1/4 of the size of the support of each wavelet, yielding an overcomplete dictionary of wavelet features (see panel (c) of the Haar wavelet framework figure below). The resulting high dimensional feature vectors are used as training data for our classification engine.

Figure: Examples from the database of cars used for training. The images are color and are normalized so that the front or rear bumper is a fixed number of pixels wide.

There is certain a priori knowledge embedded in our choice of the wavelets. First, we use the absolute values of the magnitudes of the wavelets. This tells the system that a dark object on a light background and a light object on a dark background have the same information content. Second, for color images, we compute the wavelet transform for a given pattern in each of the three color channels and then, for a wavelet of a specific location and orientation, use the one that is largest in magnitude. This allows the system to use the most visually significant features.
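The following minimal sketch illustrates how a single unsigned Haar feature could be computed at one location and orientation, taking the maximum response over the color channels as described above; the region layout and sign conventions are assumptions for illustration, not the thesis code.

import numpy as np

def haar_coefficient(channel, x, y, size, orientation):
    """Signed response of one non-standard 2D Haar wavelet of support
    size x size at top-left corner (x, y): a difference of average
    intensities between adjacent sub-regions."""
    half = size // 2
    block = channel[y:y + size, x:x + size].astype(float)
    if orientation == "vertical":      # responds to vertical boundaries
        return block[:, half:].mean() - block[:, :half].mean()
    if orientation == "horizontal":    # responds to horizontal boundaries
        return block[half:, :].mean() - block[:half, :].mean()
    # diagonal: difference between the two diagonal pairs of quadrants
    return (block[:half, :half].mean() + block[half:, half:].mean()
            - block[:half, half:].mean() - block[half:, :half].mean())

def unsigned_feature(image_rgb, x, y, size, orientation):
    """Take the channel whose response is largest in magnitude, then keep
    only the magnitude (dark-on-light and light-on-dark look the same)."""
    responses = [haar_coefficient(image_rgb[:, :, c], x, y, size, orientation)
                 for c in range(image_rgb.shape[2])]
    return abs(max(responses, key=abs))

# Shifting (x, y) in steps of size // 4 over the pattern produces the
# quadruple-density, overcomplete dictionary described in the text.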

Motivation

Our main motivation for using wavelets is that they capture visually plausible features of the shape and interior structure of objects that are invariant to certain transformations. The result is a compact representation where dissimilar example images from the same object class map to similar feature vectors.

Figure: The Haar wavelet framework: (a) the Haar scaling function and wavelet, (b) the three types of 2-dimensional non-standard Haar wavelets (vertical, horizontal, and diagonal), and (c) the quadruply dense shift used in our overcomplete transform, compared to the shift of the standard transform.

With a pixel representation, what we would be encoding are the actual intensities

of different parts of the patterns; a simple example makes it clear that this encoding does not capture the important features for detection. Take, for instance, two data points of the same class, where one is a dark body on a white background and the other is a white body on a dark background. With an intensity based representation like pixels, each of these examples maps to a completely different feature vector. A representation that encodes local, oriented intensity differences, like Haar wavelets, would yield similar feature vectors, where the features corresponding to uniform regions are zero and those corresponding to boundaries are non-zero. In fact, since in our representation we encode only the magnitude of the intensity difference, the feature vectors for this simple two example case would be identical.

We do not use all the very fine scales of wavelets as features for learning, since these scales capture high frequency details that do not characterize the class well. For instance, in the case of people, the finest scale wavelets may respond to checks, stripes, and other detail patterns, all of which are not features that are characteristic of the entire class. Similarly, the very coarse scale wavelets are not used as features for learning, since their support will be as large as the object and will therefore not encode useful information. So, for the object detection systems we have developed, we throw out the very fine and very coarse wavelets and only use two medium scales of wavelets as features for learning. These scales depend on the object class and the size of the training images and are chosen a priori.

Figure: The top row shows examples of images of people in the training database; the bottom row shows edge detection of the pedestrians. Edge information does not characterize the pedestrian class well.

The Wavelet Representation

The Haar transform provides a multiresolution representation of an image, with wavelet features at different scales capturing different levels of detail. The coarse scale wavelets encode large regions, while the fine scale wavelets describe smaller, local regions. The wavelet coefficients preserve all the information in the original image, but the coding of the visual information differs from the pixel-based representation in two significant ways.

First, the wavelets encode the difference in average intensity between local regions along different orientations, in a multiscale framework. Constraints on the values of the wavelets can express visual features of the object class: strong response from a particular wavelet indicates the presence of an intensity difference, or boundary, at that location in the image, while weak response from a wavelet indicates a uniform area.

Second, the use of an overcomplete Haar basis allows us to propagate constraints between neighboring regions and describe complex patterns. The quadruple-density wavelet transform provides high spatial resolution and results in a rich, overcomplete dictionary of features.

In the following sections we show how our wavelet representation applies to faces, people, and cars. This coding of local intensity differences at several scales provides a flexible and expressive representation that can characterize complex object classes. Furthermore, the wavelet representation is computationally efficient for the task of object detection: we do not need to compute the transform for each image region that is examined, but only once for the whole image, and we then process the image in the space of wavelets.

Analyzing the Face Class

For the face class, we have a training set of gray-scale images of faces. This set consists of a core set of faces, with some small angular rotations added to improve generalization, and a set of non-face patterns. The face images are all scaled to the same dimensions and show the face from above the eyebrows to below the lips; typical images from the database are shown in the corresponding figure. Databases of this size and composition have been used extensively in face detection (Sung; Rowley et al.; Osuna et al.). For the size of patterns our face system uses, we have several wavelet scales at our disposal. Instead of using the entire set of wavelets, we a priori limit the dictionary to the two finest of these scales, since coarser features do not contain significant information for detection purposes. At the finer scale the features are in quadruple density for each wavelet class, and at the coarser scale they are in double density for each class.

The raw value of a coefficient may not necessarily be indicative of a boundary; a weak coefficient in a relatively dark image may still indicate the presence of an intensity difference that is significant for the purposes of classification. To reduce these effects on the features used for classification, we normalize a coefficient's value against the other coefficients in the same area. For the normalization step, we compute the average of each wavelet class, {vertical, horizontal, diagonal} x {the two scales}, over the current pattern and divide the wavelet response at each spatial location by its corresponding class average. We calculate the averages separately for each class, since the power distribution between the different classes may vary. For a given pattern p, the class averages are

avg_{o,s} = \frac{1}{n} \sum_{i \in p} w_{o,s}(i)

where o denotes a fixed orientation, s denotes a fixed scale, i indexes the wavelets in the pattern p, and the w_{o,s}(i) are the n individual wavelet coefficients at orientation o and scale s. The normalization for all wavelets within the pattern p is then

\hat{w}_{o,s}(i) = \frac{w_{o,s}(i)}{avg_{o,s}}

After the normalization, the average value of a coefficient for random patterns should be one. Three classes of feature magnitudes will emerge: ensemble average values much larger than one, which indicate strong intensity-difference features that are consistent across all the examples; values much less than one, which indicate consistent uniform regions; and values close to one, which are associated with inconsistent features or random patterns.
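As an illustration only, the following is a minimal Python sketch of this per-class normalization; it assumes the unsigned wavelet responses of a single pattern are stored per orientation and scale, and the data layout and names are ours, not the system's.

import numpy as np

def normalize_wavelets(responses):
    """Normalize unsigned wavelet responses by their class averages.

    `responses` maps (orientation, scale) -> 2-D array of absolute wavelet
    coefficients for one pattern.  Each coefficient is divided by the mean
    of its own orientation/scale class, so a value near 1 means no
    consistent feature, values well above 1 mark strong intensity
    differences, and values well below 1 mark uniform regions.
    """
    normalized = {}
    for key, w in responses.items():
        avg = w.mean()                 # class average avg_{o,s} over the pattern
        normalized[key] = w / avg      # w_hat_{o,s}(i) = w_{o,s}(i) / avg_{o,s}
    return normalized

# Toy usage: two orientation/scale classes with random positive responses.
rng = np.random.default_rng(0)
pattern = {("vertical", 4): rng.random((16, 16)),
           ("horizontal", 4): rng.random((16, 16))}
norm = normalize_wavelets(pattern)
print(norm[("vertical", 4)].mean())    # approximately 1 after normalization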

To visualize the detected face features, we code the ensemble average of the wavelet coefficients using gray level and draw them in their proper spatial layout, as shown in the figure below. Coefficients with values close to one are plotted in gray, those with values larger than one are darker, and those with values less than one are lighter. It is interesting to observe the patterns that emerge in the facial features. The vertical coefficients capture the sides of the nose, while the horizontal coefficients capture the eye sockets, eyebrows, and tip of the nose. Interestingly, the mouth is a relatively weak feature compared to the others. The diagonal coefficients respond strongly to the endpoints of facial features.

Figure: Ensemble average values of the wavelet features of faces coded using gray level. Coefficients whose values are above the average are darker; those below the average are lighter. Panels (a)-(c) are the vertical, horizontal, and diagonal wavelets at the first of the two scales used; (d)-(f) are the same orientations at the second scale.

Table: Ensemble average of normalized horizontal coefficients at one scale over the set of face images. Meaningful coefficients are the ones with values much larger or much smaller than one; average values close to one indicate no meaningful feature.

Analyzing the People Class

For learning the people class, we have collected a set of color images of people in different poses (shown in the corresponding figure), and we use their mirror images as well, along with a set of non-people patterns. All of the images are normalized to the same dimensions, and the people images are aligned such that the bodies are centered and approximately the same size (the distance from the shoulders to the feet spans a fixed number of pixels). As in the case of faces, to code features at scales appropriate for people detection, that is, at scales at which we expect relevant features of people to emerge, we restrict the system to the wavelets at two scales of support.

In our people detection system, our training database consists of color images. For a given pattern, we compute the quadruple-density Haar transform in each color channel (RGB) separately and take, as the coefficient value at a specific location and orientation, the one largest in absolute value among the three channels. This technique maps the original color image to a pseudo-color channel that gives us the same number of wavelet coefficients as if we had been using gray-level images.
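A minimal sketch of this channel collapse, under the assumption that the per-channel wavelet responses for one orientation and scale are already available as an array (the names and toy dimensions are illustrative only):

import numpy as np

def collapse_color_channels(wavelets_rgb):
    """Collapse per-channel wavelet responses into one pseudo-channel.

    `wavelets_rgb` has shape (3, H, W): the Haar wavelet responses of one
    orientation/scale computed separately on the R, G, and B channels.
    For every location we keep the response that is largest in magnitude
    across the three channels.
    """
    return np.max(np.abs(wavelets_rgb), axis=0)

# Toy usage with random responses for a single orientation and scale.
rng = np.random.default_rng(1)
rgb_responses = rng.normal(size=(3, 8, 8))
pseudo = collapse_color_channels(rgb_responses)
print(pseudo.shape)    # (8, 8): same number of coefficients as gray level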

To visualize the patterns that emerge using this wavelet representation for people, we can code the average values of the coefficients in gray level and display them in the proper spatial layout, as we did for the faces. The corresponding figure shows each average wavelet displayed as a small square, where features close to one are gray, stronger features are darker, and weaker features are lighter. As with faces, we observe that each class of wavelet coefficients is tuned to a different type of structural information. The vertical wavelets capture the sides of the people. The horizontal wavelets respond to the shoulders and to a weaker belt line. The diagonal wavelets are tuned to corner features, i.e., the shoulders, hands, and feet. The finer-scale wavelets provide fine spatial resolution of the body's overall shape, and smaller-scale details such as the head and extremities are clearly evident.

Analyzing the Car Class

The car detection system uses a database of frontal and rear color images of cars, normalized to a common size and aligned such that the front or rear bumper spans a fixed number of pixels across. For training, we use the mirror images as well, together with a set of negative patterns; a few examples from our training database are shown in the corresponding figure. As with faces and people, we use two scales of wavelets for detection. Like the processing for people, we collapse the three color channel features into a single channel by using the maximum wavelet response of each channel at a specific location, orientation, and scale. This gives us the set of wavelet features that are used to train the SVM.

Figure: Ensemble average values of the wavelet features of people coded using gray level. Coefficients whose values are above the average are darker; those below the average are lighter. Panels (a)-(c) are the vertical, horizontal, and diagonal wavelets at the first of the two scales used; (d)-(f) are the same orientations at the second scale.

The average wavelet feature values for cars are coded in gray level in the figure below. The gray-level coding of the average feature values shows that the wavelets respond to the significant visual characteristics of cars. The vertical wavelets respond to the sides of the car; the horizontal wavelets respond to the roof, underside, top of the grille, and bumper area; and the diagonal wavelets respond to the corners of the car's body. At the finer scale, we can even see evidence of what seem to be license plate and headlight structures in the average responses.

Figure: Ensemble average values of the wavelet features of cars coded using gray level. Coefficients whose values are above the average are darker; those below the average are lighter. Panels (a)-(c) are the vertical, horizontal, and diagonal wavelets at the first of the two scales used; (d)-(f) are the same orientations at the second scale.

Discussion

Comparing the database of people to the database of faces illustrates an important, fundamental difference between the two classes. In the case of faces, there are clear patterns within the face, consisting of the eyes, nose, and mouth, and these patterns are common to all the examples. This is not the case with full-body images of people. The people do not share any common color or texture, and the people images contain many spurious details such as jackets, ties, and bags. On the other hand, we would expect that people can be characterized quite well by their fairly similar overall body shape, or silhouette. Our approach treats these two cases, where there is different underlying information content in the object classes, in a uniform manner. Frontal and rear views of cars have both a certain amount of common interior structure (top of grille, license plates, headlights) and fairly uniform outer boundaries. We will see that our wavelet representation is suitable for car detection as well.

There is certain a priori knowledge embedded in our choice of the wavelets. The use of the absolute value of the coefficient is essential in the case of people, since the direction of the intensity difference at a certain feature's orientation is not important: a dark body against a light background and a light body against a dark background should be represented as having the same information content. Furthermore, we compute the wavelet transform for a given pattern in each of the three color channels and then, for a wavelet at a specific location and orientation, we use the one that is largest in magnitude among the three channels. This is based on the observation that there is little consistency in color between different people, and it allows the system to key off of the most visually significant features.

Once we have generated the feature vectors for an object class, and have done the same for a set of images not in our object class, we use a learning algorithm that learns to differentiate between the two classes. The particular learning engine we use is a support vector machine, described below.

Support Vector Machines

Figure: The separating hyperplane in (a) has a small margin; the hyperplane in (b) has a larger margin and should generalize better on out-of-sample data.

Support vector machines (SVMs) are a technique for training classifiers that is well founded in statistical learning theory (Vapnik; Burges). One of the main attractions of using SVMs is that they are capable of learning in high-dimensional spaces with very few training examples. SVMs accomplish this by minimizing a bound on the empirical error and the complexity of the classifier at the same time.

This concept is formalized in the theory of uniform convergence in probability:

R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2\ell}{h}+1\right) - \ln\frac{\eta}{4}}{\ell}}

with probability 1 - \eta. Here, R is the expected risk, R_{emp} is the empirical risk, \ell is the number of training examples, h is the VC dimension of the classifier that is being used, and the square-root term is the VC confidence of the classifier. Intuitively, this means that the uniform deviation between the expected risk and the empirical risk decreases with larger amounts of training data and increases with the VC dimension h. This leads us directly to the principle of structural risk minimization, whereby we can attempt to minimize, at the same time, both the actual error over the training set and the complexity of the classifier; this will bound the generalization error as in the expression above. It is exactly this technique that support vector machines approximate.

Figure: (a) The original data set may not be linearly separable. (b) The support vector machine uses a nonlinear kernel to map the data points into a very high-dimensional feature space in which the classes have a much greater chance of being linearly separable.

This simultaneous control of both the training-set error and the classifier's complexity has allowed support vector machines to be successfully applied to very high-dimensional learning tasks: Joachims presents results on SVMs applied to a high-dimensional text categorization problem, and Osuna et al. describe a high-dimensional face detection system.

The separating boundary is, in general, of the form

f(x) = \sum_{i=1}^{\ell} \alpha_i y_i K(x, x_i) + b

where \ell is the number of training data points (x_i, y_i), y_i being the label of training point x_i; the \alpha_i are non-negative parameters learned from the data; and K is a kernel that defines a dot product between projections of the two arguments in some feature space (Vapnik; Wahba), where a separating hyperplane is then found (see the earlier figure). For example, when K(x, y) = x \cdot y is the chosen kernel, the separating surface is a hyperplane in the space of x (input space). The kernel K(x, y) = \exp(-\|x - y\|^2) defines a Gaussian radial basis function (Girosi et al.), and K(x, y) = (x \cdot y + 1)^n describes an n-th degree polynomial. In general, any positive definite function can be used as the kernel (Vapnik; Wahba).

The main feature of the SVM is that, among all possible separating surfaces of this form, it finds the one which maximizes the distance between the two classes of points, as measured in the feature space defined by K. The support vectors are the points nearest to the separating boundary and are the only ones, typically a small fraction of the training data, for which the corresponding \alpha_i in the expansion above is positive.

Using the SVM formulation, the classification step for a pattern x using a polynomial of degree two (the typical classifier we use for our system) is as follows:

f(x) = \theta\left( \sum_{i=1}^{N_s} \alpha_i y_i (x \cdot x_i + 1)^2 + b \right)

where N_s is the number of support vectors, the training data points that define the decision boundary, the \alpha_i are Lagrange parameters, and \theta(\cdot) is a threshold function. If we introduce d(x), which returns a value proportional to the distance of x from the separating hyperplane, then f(x) = \theta(d(x)).

In our case, the feature vector we use is composed of the wavelet coefficients for the pattern we are currently analyzing.
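The following is a minimal Python sketch of this classification step under the definitions above; the support vectors, labels, multipliers, and bias are taken as given, and the variable names are our own.

import numpy as np

def svm_distance(x, support_vectors, labels, alphas, bias):
    """Raw SVM output d(x) for a degree-2 polynomial kernel.

    support_vectors: (N_s, d) array of support vectors x_i
    labels:          (N_s,)  array of labels y_i in {-1, +1}
    alphas:          (N_s,)  array of Lagrange multipliers alpha_i
    bias:            scalar threshold b
    """
    kernel = (support_vectors @ x + 1.0) ** 2           # (x . x_i + 1)^2
    return float(np.sum(alphas * labels * kernel) + bias)

def svm_classify(x, support_vectors, labels, alphas, bias):
    """Thresholded decision f(x) = theta(d(x)): +1 for in-class, -1 otherwise."""
    return 1 if svm_distance(x, support_vectors, labels, alphas, bias) > 0 else -1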

SVMs and Conditional Densities

The raw output of a single SVM classification, d(x), is a real number that is proportional to the distance of the point x from the separating hyperplane. To facilitate comparisons between the outputs of different support vector machines, and to provide a probabilistic interpretation, it is necessary to normalize the outputs somehow; in their raw form they are not directly comparable. Several methods for interpreting the output of an SVM as a conditional density have emerged. While we do not use these methods in our system, they are outlined below for completeness.

Density estimation in the space of d(x)

Niyogi et al. describe a method for converting SVM outputs to probabilities that simply relies on the approximation P(x | y) \approx P(d(x) | y), thereby doing density estimation in the lower-dimensional space of distances from the hyperplane rather than in the full feature space. The posterior is then estimated as

P(y = 1 \mid x) = \frac{P(d(x) \mid y = 1)\,P(y = 1)}{P(d(x) \mid y = 1)\,P(y = 1) + P(d(x) \mid y = -1)\,P(y = -1)}

Maximum likelihood fitting

Dumais et al. and Platt directly fit a sigmoid to the output of an SVM using a regularized maximum likelihood approach. The resulting posterior is

P(y = 1 \mid x) = \frac{1}{1 + e^{A\,d(x) + B}}
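Purely as an illustration (we do not use this in our system), a sketch of evaluating such a sigmoid posterior for already-fitted parameters A and B; the parameter values below are arbitrary:

import math

def sigmoid_posterior(d_x, A, B):
    """P(y = 1 | x) = 1 / (1 + exp(A * d(x) + B)) for fitted A, B."""
    return 1.0 / (1.0 + math.exp(A * d_x + B))

# Example: for a negative A (the typical outcome of the fit), a point
# farther from the hyperplane on the positive side gets a higher posterior.
print(sigmoid_posterior(1.0, -2.0, 0.0), sigmoid_posterior(2.0, -2.0, 0.0))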

Decomposition of feature space

Another method for estimating conditional probabilities from SVMs is presented in Vapnik. Here, the feature space is decomposed into a direction orthogonal to the separating hyperplane and the remaining dimensions. Each of these is parameterized separately, by t (a scaled version of d(x)) and u, respectively, and the density along the orthogonal direction is modeled as

P(y = 1 \mid t, u) = a_0(u) + \sum_{i=1}^{N} a_i(u)\cos(i t)

Experimental Results

An algorithmic summary of the out-of-sample detection process is shown in the figure below. In the subsequent figures we present examples of our trainable object detection system as applied to the domains of face, people, and car detection, respectively. We reiterate that the system makes no a priori assumption on the scene structure or the number of objects present, and it does not use any motion or other dynamical information. The performance of each of these particular instantiations of detection systems could easily be improved by using more training data. We have not sought to push the limits of performance in particular domains; rather, our goal has been to show that this uniform architecture for object detection leads to high performance in several domains.
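As an illustration, here is a minimal Python sketch of this brute-force multiscale search; the pseudocode actually used appears in the figure below. The resize, Haar transform, and classifier routines are passed in as placeholders, and the scale constants are arbitrary defaults, not the values used by the system.

def detect(image, resize, haar_transform, classify, window_shape,
           s_init=0.2, s_final=1.5, s_step=1.1):
    """Brute-force multiscale sliding-window detection (illustrative sketch).

    resize(img, s)       -> the image rescaled by factor s
    haar_transform(img)  -> NumPy array of normalized wavelet features,
                            indexed by (row, col, ...)
    classify(fv)         -> +1 for the object class, -1 otherwise
    """
    detections = []
    scale = s_init
    while scale <= s_final:
        scaled = resize(image, scale)              # rescale the input image
        features = haar_transform(scaled)          # transform once per scale
        rows, cols = features.shape[:2]
        win_r, win_c = window_shape
        for r in range(rows - win_r + 1):
            for c in range(cols - win_c + 1):
                fv = features[r:r + win_r, c:c + win_c].ravel()
                if classify(fv) == 1:              # pattern classified as object
                    detections.append((scale, r, c))
        scale *= s_step                            # move to the next scale
    return detections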

Let I be the image on which we are running the detection system
Let s_r be the rescaling factor for the image
Let s_i be the initial scale
Let s_f be the final scale
Let s_c be the current scale of the image being processed
Let I_c be the input image scaled by s_c
Let H(I) be the Haar transform of image I
Let q be a pattern in wavelet space
Let fv be the feature vector used to classify a pattern

set s_c = s_i
while s_c <= s_f do
    I_c = resize I by s_c
    H(I_c) = compute the Haar wavelet transform of I_c
    for each (r, c) in H(I_c)    (loop over all rows and columns)
        q = pattern at (r, c)
        compute the average response of each type of wavelet
            (each scale, each orientation in {V, H, D}) in pattern q
        fv = normalize each wavelet in q by its class average
        class = classify fv using the SVM classifier
        if class = 1 then
            pattern q is a person, so draw a rectangle around q
        if class = -1 then
            pattern q is not a person, so ignore it
    s_c = s_c * s_r
end

Figure: Algorithm for detecting people in out-of-sample images.

Figure: Results of our face detection system on a set of out-of-sample images. A, C, E, F, G, H, I, J, K, L, M, N are from the test database of Sung and Poggio; B, D are from www.starwars.com; O is from www.corbis.com. Missed faces (B, F, I, J, K, M) are due to significant head rotations that were not present in the training data. False positives (E, F, N) are due to insufficient training data and can be eliminated by using more negative training data. The face detection system processes a large number of candidate patterns in each image.

Figure: Results of people detection on out-of-sample images. A, I, K are from www.starwars.com; B, D, E, F, H, J, N are from www.corbis.com; C, G are from www.cnn.com; L, O, P were taken in Boston and Cambridge; M was provided by DaimlerChrysler. Missed detections occur when the person is too close to the edge of the image (B) or has an uncharacteristic body shape not represented in the training data (I). False positives often look very similar to people (A) or are due to the presence of strong vertical and horizontal intensity differences (D, E, K, L, M, O). The people detection system processes a large number of candidate patterns in each image.

Figure: Results of car detection on out-of-sample images. A is from www.lewistonpd.com; B, C, D, E, F, G, H, J, K, L, M, O are from www.corbis.com; I is from www.enn.com; N is from www.foxglove.com. Missed positive examples are due to occlusions (A, F, O) or a car being too close to the edge of the image (A). False positives (C, J, I, N) are due to insufficient training and can be eliminated with more negative training patterns. The car detection system processes a large number of candidate patterns in each image.

Chapter

Experimental Results

Our criteria for the representation we use are that it be efficiently computable and that it identify local, oriented intensity-difference features. Haar wavelets satisfy these criteria and are perhaps some of the simplest such features with finite support. Thus far, we have focused on the Haar wavelet representation, but we could have considered other possible representations, including pixels and principal components. This chapter empirically quantifies the added value of the wavelet representation as compared with several alternative representations. We also present the results of experiments that compare the effect of using different types of classifiers, including linear, polynomial, and radial basis function classifiers. Finally, we provide empirical evidence that relatively few training examples may be needed to sufficiently train an object detection system.

Test Procedure

To accurately measure our detection system's performance, we use a receiver operating characteristic (ROC) curve, which quantifies the tradeoff between detection accuracy and the rate of false positives. In all of our ROC curves, the y axis is the percentage of correct positive detections and the x axis is the rate of false positives, measured as the number of false positives per negative pattern processed.

To test our detection system, we have developed an automated testing package that gives us a detailed view of how well the system is performing. The testing procedure measures positive and negative pattern performance separately, over different sets of data. The positive test data consist of aligned but unscaled examples of the object class, with a sufficiently large boundary around them that the detection system can run at several different locations and scales. While these images can be of various sizes, the proportion at which the object occurs in the image is constant from test example to test example. This allows us to check whether or not each detection in the image falls on the actual object. We allow tolerances of a few pixels in the detections; the exact tolerances are given in the table below.

Table: Pixel tolerances for the automated testing procedure for faces, people, and cars. The given tolerances are for both the x and y positions, normalized to the sizes of the objects in the training set; they are scaled appropriately to the pattern sizes used for faces, people, and cars.

To generate the false positive rate, we have a set of images of different natural and man-made scenes that do not contain any examples of the objects we are detecting. The false positive rate is simply the number of detections in this set of data divided by the total number of patterns that are examined. Using the actual backgrounds of images containing people might give a more accurate false positive rate, but we ignore this issue, opting for the simplest method of determining the false positive rate. The table below gives the number of positive examples and negative patterns for each of the object detection systems we develop in this thesis.

Table: Summary of the test set sizes (numbers of positive examples and negative patterns) for each of the detection systems: faces, people, and cars.

This technique gives us a single detection/false-positive point. To generate a full ROC curve, we shift the SVM decision surface and obtain detection/false-positive points for various shifts. Shifting the decision surface has the effect of tuning the system to be more strict or more relaxed in its definition of what is and is not an element of the object class. The shifting is accomplished as

f(x) = \sum_{i=1}^{\ell} \alpha_i y_i K(x, x_i) + b + s

where s dictates the magnitude and direction of the shift, as shown in the figure below. For s = 0 we have the decision surface that SVM training provides; s < 0 moves the decision surface towards the positive class, making classification more strict, and s > 0 moves it towards the negative class, making classification more lenient.
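A minimal sketch of generating ROC points by sweeping this shift, assuming a function d(x) that returns the raw SVM output for a pattern; the sweep range is an arbitrary placeholder:

import numpy as np

def roc_points(d, positives, negatives, shifts):
    """Compute (false-positive rate, detection rate) pairs by shifting the bias.

    d(x)       -> raw SVM output for pattern x (distance-proportional score)
    positives  -> list of in-class test patterns
    negatives  -> list of out-of-class test patterns
    shifts     -> iterable of bias shifts s; larger s is more lenient
    """
    pos_scores = np.array([d(x) for x in positives])
    neg_scores = np.array([d(x) for x in negatives])
    points = []
    for s in shifts:
        detection_rate = np.mean(pos_scores + s > 0)    # fraction of positives detected
        false_pos_rate = np.mean(neg_scores + s > 0)    # false positives per negative pattern
        points.append((false_pos_rate, detection_rate))
    return points

# Example sweep from strict (negative shifts) to lenient (positive shifts):
# points = roc_points(d, positives, negatives, np.linspace(-2.0, 2.0, 41))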

Experiments

The dense Haar transform captures a rich set of features that allow the SVM classifier to obtain a powerful class model; the wavelets respond to significant visual features while smoothing away noise. This choice of features is a priori, however, so this section presents the results of many tests comparing different features for object detection.

Figure: Shifting the original decision surface (s = 0) by changing the bias term makes classification more strict if s < 0 and more lenient if s > 0. Each shifted decision boundary yields a detection system with different performance characteristics; we generate ROC curves by plotting the performance obtained for each shifted decision surface.

There are many possible alternative representations that have been used in the literature, including pixels and PCA, and these different representations are compared within our detection framework. Another decision we made was to ignore the sign of the wavelets and use their absolute values; this is tested against the signed values. In addition, for people detection, our training set is in color; we empirically quantify the improvement in performance from using color data as opposed to gray-level data.

In the results presented in this section, our people detection system is trained on positive patterns (frontal and rear people images and their mirror images) and on non-people patterns, and it is tested on images containing people and on a separate set of non-people patterns. The face detection system is trained on face images and non-face patterns, and it is tested on images containing faces and on a separate set of non-face patterns. The car detection system is trained on frontal and rear color images of cars (including their mirror images) and on non-car patterns, and it is tested on images containing cars and on a separate set of non-car patterns.

Figure: Comparing histogram-equalized pixels to histogram-equalized scaling coefficients: (a) raw pixels, (b) histogram-equalized pixels, and (c) histogram-equalized overlapping scaling coefficients.

Pixels, Wavelets, PCA

Our main premise for choosing a wavelet-based representation is that intensity differences between local adjacent regions contain higher-quality information for the purpose of object detection than other traditional representations. Pixel representations capture the most local features; these have been used extensively for face detection, but due to the variability in the people patterns, we would expect pixel representations to fail for people detection. At the other end of the locality spectrum are global representations like PCA, which encodes a class in terms of basis functions that account for the variance in the data set. We can change the class of features to see which yields the best performance. For people, we use overlapping pixel averages instead of the raw pixels, for a fairer comparison that uses a similar number of features. Furthermore, these averages are histogram equalized in the same manner as the pixel representation (see the figure above).

Signed vs. Unsigned Wavelets

The features our system uses do not contain information on the sign of the intensity gradient; they are the absolute values of the wavelet responses. With these features, we are describing only the strength of the intensity differences. For an object class like people, where a dark body on a light background carries the same information as a light body on a dark background and there is little consistency in the intensities, the sign of the gradient should not matter. On the other hand, if we consider face patterns, there is consistent information in the sign of the gradient of the intensity differences: for instance, the eyes are darker than the cheeks and the forehead, and the mouth is darker than the cheeks and the chin. These types of relationships have been explored in Sinha. We might expect that using the sign information would enhance results in this case.

Complete vs. Overcomplete

The motivation for using the overcomplete Haar wavelet representation is to provide a richer set of features over which the system will learn and, ultimately, a more accurate description of a person. We test this against the standard complete Haar representation.

Color vs. Gray Level

For color images, in the case of people detection, we collapse information from the three color channels into a single pseudo-channel that maintains the strongest local intensity differences. Intuitively, color images contain much richer information than the corresponding gray-scale versions. We present experiments that quantify the inherent information content of using color images as opposed to gray level for object detection.

Faces, People, and Cars

Our ROC curves plot the performance of the detection system, measured as accuracy over out-of-sample data, against the rate of false positives, measured as the number of false positives per pattern examined. The ROC curves that compare different representations for the face detection system are shown in the first figure below. The representations used for face detection are raw pixels, histogram-equalized pixels, principal components of histogram-equalized pixels, gray signed wavelets, and gray unsigned wavelets. (The wavelets that we use actually form an undercomplete basis for this space; a more correct characterization of the representation is that the features are the wavelets from the complete basis at the two scales we are using.)

Figure: ROC curves for face detection comparing different features, using pixel features as a benchmark. The graph shows gray unsigned and signed wavelets, raw and histogram-equalized pixels, and PCA of histogram-equalized pixels. The face detection system typically processes a large number of candidate patterns per image.

Figure: ROC curves for people detection comparing different features, using pixel-type features as a benchmark. The graph shows color and gray, signed and unsigned wavelets (overlapping), color unsigned wavelets (non-overlapping, i.e., complete), overlapping pixel averages, and PCA of the overlapping pixel averages. We compare the wavelets against the overlapping pixel averages instead of the pixels themselves in order to use approximately the same number of features for a fairer comparison. The people detection system typically processes a large number of candidate patterns per image.

Figure: Preliminary ROC curve for car detection using wavelet features over color images. The car detection system typically processes a large number of candidate patterns per image.

Gray unsigned wavelets yield the best performance for faces, while gray signed wavelets and histogram-equalized gray pixels lead to the same level of performance, slightly worse than the gray unsigned wavelets. The version using principal components is less accurate than the histogram-equalized pixels. That the unsigned wavelets perform better than the signed wavelets is somewhat counterintuitive: we had postulated that the signs of the wavelets contain important information for face detection, since human faces have consistent patterns. Using the absolute magnitude of the wavelets may result in a representation with less variability than the signed version, while still encoding the important information for detection, allowing the classifier to find a better decision surface. To gauge the performance of the system, we can take a point on the ROC curve and translate the performance into real image terms; for instance, a given detection rate corresponds to a certain rate of false positives per pattern processed and, equivalently, to an approximate number of false positives per image.

The ROC curves for the people detection system are shown in the corresponding figure. Here, using all of the color features performs best; for instance, one operating point on that curve corresponds to about three false positives per image. Gray-level wavelets perform significantly better than the corresponding gray-level averages: here, unlike in the case of face detection, the raw pixel values do not characterize the object class well. When we use the PCA of the averages, the performance is significantly worse. The same figure also supports our hypothesis on the necessity of an overcomplete versus a complete representation: the system starting from a complete representation (non-overlapping color wavelets) underperforms all of the systems based on the overcomplete representation. The signed versions of both the color and gray-level wavelets perform worse than their unsigned versions; we hypothesize that the reason is the same as in the case of faces, namely that the unsigned versions result in more compact representations over which it is easier to learn (see the intuition given earlier).

The preliminary ROC curve for our car detection system, using unsigned wavelet features on color images, is shown in the corresponding figure.

Discussion

There is a simple relation between linear transformations of the original images and kernels that bears mentioning. An image x can be linearly decomposed into a set of features c by c = Ax, where A is a real matrix; we can think of the features c as the result of applying a set of linear filters to the image x.

If the kernel used is a polynomial of degree m, as in the experiments, then

K(x_i, x_j) = (x_i \cdot x_j + 1)^m, while K(c_i, c_j) = (c_i \cdot c_j + 1)^m = (x_i^\top A^\top A\, x_j + 1)^m

So using a polynomial kernel in the c representation is the same as using the kernel (x_i^\top A^\top A\, x_j + 1)^m in the original one. This implies that one can consider any linear transformation of the original images by choosing the appropriate square matrix A^\top A in the kernel K of the SVM.
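A small numerical check of this identity, using random data and a random square matrix A (purely illustrative):

import numpy as np

rng = np.random.default_rng(42)
m = 2                                    # polynomial degree
A = rng.normal(size=(10, 10))            # a square linear transformation
x_i, x_j = rng.normal(size=10), rng.normal(size=10)

c_i, c_j = A @ x_i, A @ x_j              # features c = A x

k_feature  = (c_i @ c_j + 1.0) ** m                  # polynomial kernel on c
k_original = (x_i @ (A.T @ A) @ x_j + 1.0) ** m      # equivalent kernel on x

print(np.isclose(k_feature, k_original))             # True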

As a consequence of this observation, we have a theoretical justification of why the pixel and eigenvector representations should lead to the same performance. In the case of PCA, the matrix A is orthonormal, therefore A^\top A = I, which implies that the SVM should find the same solution in both cases. On the other hand, if we choose only some of the principal components, or if we project the images onto a non-orthonormal set of Haar wavelets, the matrix A is no longer orthonormal, so the performance of the SVM may be different.

This theoretical justification is empirically validated, or at least not contradicted, in the case of faces, where the PCA and pixel representations perform at about the same level. However, for people, our results seem to contradict the theory: the PCA of the local averages performs much worse than the local averages themselves. Why might this be happening?

Closer analysis reveals that the pixel and principal component representations do not, in fact, lead to identical solutions. In the case of pixels, our polynomial kernel has the form

K(x_i, x_j) = (x_i \cdot x_j + 1)^m

In the case of PCA, we actually compute the eigenvectors over the set of mean-normalized pixel images x - \bar{x}, yielding as the kernel

K(c_i, c_j) = (c_i \cdot c_j + 1)^m = \left((x_i - \bar{x})^\top A^\top A (x_j - \bar{x}) + 1\right)^m

where now the c = A(x - \bar{x}) are computed from the mean-normalized pixel features. Since A is orthonormal in the case of PCA, we are actually computing

K(c_i, c_j) = (c_i \cdot c_j + 1)^m = \left((x_i - \bar{x}) \cdot (x_j - \bar{x}) + 1\right)^m

which is strictly not equivalent to the kernel in the case of pixels. Further analysis of these issues is an area of future research. (In general, the observation above holds for any kernel for which only dot products between the input arguments are needed, i.e., also for radial basis functions.)

The number of support vectors in the different solutions is one indicator of how difficult the individual problems are. The table below lists the number of support vectors for each of the different representations we have considered above. One of the most important conclusions we can draw from this table is that the signed representations do indeed result in more complex SVM decision surfaces than their unsigned counterparts, meaning that the signed representations are not as conducive to learning.

Table: Number of support vectors in each of the solutions to the classification problems (faces, people, and cars) for the different representations considered: raw pixels, histogram-equalized pixels, PCA, gray signed wavelets, gray unsigned wavelets, color signed wavelets, and color unsigned wavelets.

Different Classifiers

As we stated earlier, our main motivation for using classifiers trained by the support vector machine algorithm is its ability to find separating hyperplanes, in sparsely populated, high-dimensional feature spaces, that do not overfit. Until now, we have ignored exactly what form the classifier takes and have described the system in the context of a polynomial classifier of degree two. In this section, we present further experiments in which several different types of classifiers are used, and we compare the results over an identical test set.

The general form of the decision surface found through SVM training is

f(x) = \sum_{i=1}^{\ell} \alpha_i y_i K(x, x_i) + b

where \ell is the number of training data points (x_i, y_i), y_i being the label of training point x_i; the \alpha_i are non-negative parameters learned from the data; and K is the kernel that defines a dot product between projections of the two arguments in some feature space. By choosing different kernels K, we define qualitatively different decision surfaces: for instance, K(x, y) = (x \cdot y + 1)^n defines an n-th degree polynomial, and K(x, y) = \exp(-\|x - y\|^2) defines a Gaussian radial basis function.

Here we consider polynomial classifiers of varying degree n, from a perceptron (n = 1) to a degree-5 polynomial; several attempts at using a Gaussian RBF classifier would not converge in SVM training. The figure below shows the performance of each of these systems. From the ROC curves it is immediately evident that there is no appreciable difference among the classifiers. This has important practical ramifications: with a linear classifier, the decision surface can be described as

f(x) = \sum_{i=1}^{\ell} \alpha_i y_i (x \cdot x_i) + b

but since the y_i, \alpha_i, and x_i are all predetermined at run time, we can write the decision function in its more general form

f(x) = w \cdot x + b

since w = \sum_i \alpha_i y_i x_i. This means that we can classify a point with a single dot product.

Figure: ROC curves for people detection with the full feature set, comparing different classifiers: linear and polynomial classifiers of degree 2 through 5, including a polynomial classifier of degree 2 with no bias term. There is no appreciable difference in performance among these classifiers.
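A minimal sketch of this collapse for a linear SVM; the support vectors, labels, and multipliers are assumed given, and the names are illustrative:

import numpy as np

def linear_svm_weights(support_vectors, labels, alphas):
    """Precompute w = sum_i alpha_i * y_i * x_i for a linear SVM."""
    return (alphas * labels) @ support_vectors        # shape (d,)

def classify_linear(x, w, bias):
    """Classify with a single dot product: sign of w . x + b."""
    return 1 if (w @ x + bias) > 0 else -1

# Toy usage with random "support vectors".
rng = np.random.default_rng(3)
sv = rng.normal(size=(5, 8))
y = np.array([1, -1, 1, 1, -1])
a = rng.random(5)
w = linear_svm_weights(sv, y, a)
print(classify_linear(rng.normal(size=8), w, 0.1))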

Feature Selection

The goal of feature selection is to identify which features are important for detection and subsequently to use this subset of features as the representation over which learning occurs. Until now, we have described systems with no feature selection: after deciding on a feature class (pixels, wavelets, or PCA), we use all of the features of that class. Generally, we pay a price for large sets of features in the form of expensive computation. Moreover, it may be the case that certain features are not important for detection; for instance, the wavelets in the background portion of the people patterns are not good indicators of the presence of a person. It may be possible to reduce the dimensionality of the representation through a feature selection step while still preserving most of the system's representational power.

To find the optimal set of features, we would need to determine the performance of our system using every possible subset of features; for the wavelet representation, this would mean checking all \sum_s \binom{N}{s} unique subsets of the N wavelet features, an astronomical number. For a given size s, the selection of the best s features amounts to an integer programming problem, which is NP-complete.

We have implemented a manual feature selection technique that results in much lower-dimensional data, and therefore less expensive classification, while still maintaining a reasonable level of accuracy.

To do manual feature selection, we start with the tables of the average wavelet responses for our training set, broken down by scale and orientation. From these tables, we manually choose the strongest and weakest features, while at the same time ensuring that the features we choose are not overly redundant (i.e., do not overlap). This process is fairly subjective, but it allows for the enforcement of criteria such as picking strong features that span the space in some greedy manner, rather than just picking the n strongest features, some of which may overlap significantly and therefore not provide any improvement.

For the people detection case, the tables of raw averages are shown in Appendix C. The manually chosen features are shown overlaid on an example person in the figure below, and the performance of this reduced, 29-feature system is shown in the corresponding ROC figure. While it underperforms the versions using the full set of 1,326 wavelet features, the system is able to capture much of the structure of the human form using just a few local features, and the resulting performance may be acceptable for certain applications.

Training with Small Data Sets

Figure: The reduced set of manually chosen wavelet features for fast people detection, overlaid on an example image of a person.

Most example-based systems for object detection have relied on large sets of positive and negative examples to train a system to differentiate between the target class and the non-target class. The table below enumerates the number of positive and negative examples used to train different detection systems reported on in the literature.

Gathering positive examples of an object class is an expensive, tedious task: all of these systems require input data in which the objects are aligned in the same position in the image and the images are scaled to the same size. Typically, we have cheap access to an unlimited number of negative examples, while obtaining a large number of positive examples is relatively expensive. In our domain of people detection, we invested significant effort in gathering a large number of positive examples. Furthermore, the domains for object detection that people have tackled are well defined and easily accessible: while it is time consuming to gather more positive examples, there are no inherent limitations in doing so. We know what a face or person looks like, so we simply find or take pictures of examples from these classes; we bring to bear much prior knowledge in engineering object detection systems for these domains.

What if our detection problem were such that we had information about only a small number of elements of the positive class? Let us also assume that we have no prior experience or information about this detection problem. This is not too unbelievable a situation; consider a hypothetical task of detecting certain rare anomalies in images taken with an electron microscope.

Taking this one step further, what if it is also the case that our knowledge of negative examples is severely restricted as well?

Figure: ROC curves for people detection comparing different wavelet features (color, gray unsigned, and gray signed) and different feature set sizes (the full set of 1,326 features versus the reduced set of 29).

Researchers                      Domain
Vaillant, Monrocq & Le Cun       Faces
Sung and Poggio                  Faces
Rowley, Baluja & Kanade          Faces
Moghaddam and Pentland           Faces
Schneiderman and Kanade          Faces
Papageorgiou                     Faces
Papageorgiou                     People

Table: Training set sizes (positive and negative examples) for several object detection systems reported on in the literature.

One of the main attractions of the SVM framework is that it controls both the training error and the complexity of the decision classifier at the same time. This can be contrasted with other training techniques, like back-propagation, that only minimize the training error; since there is no control of the classifier complexity, this type of system will tend to overfit the data and provide poor generalization.

In practical terms, this means that SVMs can find good solutions to classification problems in very high dimensions. In addition to being a theoretically sound property, this capability has been demonstrated empirically in the literature in face detection (Osuna et al.), text categorization (Joachims), and people detection (Papageorgiou et al.). All of these systems, and other object detection systems (Sung and Poggio; Rowley et al.), use a large set of positive examples in addition to a large set of negative examples.

The figure below quantifies the performance of our wavelet-feature color people detection system when the positive training set size is varied (1, 10, 100, and the full set of people plus mirror images) and the negative training set size is varied (10, 100, and the full set of non-people patterns). The experiments with the reduced training set sizes were each run several times with randomly chosen data points; the ROC curves report the average performance.

If we assume that we have unlimited access to negative examples, as represented by panel (a) of the figure, we can see that even with just one positive example the system performs extremely well: at the fully trained system's benchmark false positive rate, the systems trained with one positive example still correctly detect a substantial fraction of the people, and increasing the number of positive examples to 100 raises the detection rate further.

With limited access to negative examples, panels (b) and (c) of the figure show that the system still performs very well: even with as few as 10 negative training examples, the larger positive-training versions reach good accuracy at the benchmark false positive rate.

This leads us to believe that, for the domain of people detection, the choice of representation is more important than gathering large sets of training data. In our case, the transformation from pixels to wavelets compresses the image information into a model that is quite compact, so that even a single positive training example characterizes the class well.

Figure: ROC curves comparing different sized positive training sets for people detection. The full positive training set is compared against sets of 100, 10, and 1 positive examples, each averaged over several random draws. We vary the size of the negative training set from the full set of 11,361, to 100, to 10 negative examples in panels (a), (b), and (c), respectively. Even with one positive example, the system is able to learn a great deal of the structure of the people class.

Chapter

Detection by Components

One of the most prevalent problems with our static people detection system is its difficulty in detecting people when a portion of the body is occluded or there is little contrast between the background and part of the body. One would expect that if we knew there was a head, left arm, and legs in the proper configuration, but we could not see the right arm, this might still be a person. In other words, if we look for the core building blocks, or components, of an object, and allow some leniency in permitting one or two of the components that make up the object to be missing, this may result in a more robust detection system than our full-pattern approach.

This chapter develops a component-based object detection system for static images. Here we apply it to the problem of people detection, but the architecture is quite general and could be applied to, among other objects, faces and cars. The description in this chapter closely follows the material published in Mohan and in Mohan et al. We first introduce the detection-by-components framework and review some relevant work in component-based object detection, then describe the system development and architecture, and finally show results of the system and compare it to the original full-body people detection system.

Introduction

In this type of system, geometric information concerning the physical structure of the human body supplements the visual information present in the image and should thereby improve the overall performance of the system. More specifically, the visual data in an image is used to detect body components, and knowledge of the structure of the human body allows us to determine whether the detected components are proportioned correctly and arranged in a permissible configuration. In contrast, a full-body person detector relies solely on visual information and does not take advantage of the known geometric properties of the human body.

Also, it is sometimes difficult to detect the human body pattern as a whole due to variations in lighting and orientation. The effect of uneven illumination and varying viewpoint on individual body components like the head, arms, and legs is less drastic, and hence they are comparatively easier to identify. Another reason to adopt a component-based approach to people detection is that the framework directly addresses the issue of detecting people who are partially occluded or whose body parts have little contrast with the background: the system may be designed, using an appropriate classifier combination algorithm, to detect people even if not all of their components are detected.

The fundamental design of the system is a two-level hierarchical classifier: there are specialized detectors for finding the different components of a person at the base level, whose results are combined in a top-level classifier. The component-based detection system attempts to detect components of a person's body in an image, i.e., the head, the left and right arms, and the legs, instead of the full body. The system checks to ensure that the detected components are in the proper geometric configuration and then combines them using a classifier. We will show that this approach of integrating components using a classifier increases accuracy compared to the full-body version of our people detection system.

The system introduces a new hierarchical classification architecture for visual data classification. Specifically, it comprises distinct example-based component classifiers, trained to detect different objects, at one level, and a similar example-based combination classifier at the next. This type of architecture, where example-based learning is conducted at two or more levels, is called an Adaptive Combination of Classifiers (ACC). The component classifiers separately detect components of the person object, i.e., heads, legs, and arms. The combination classifier takes the output of the component classifiers as its input and classifies the entire pattern under examination as a person or a non-person.

Classifier Combination Algorithms

Recently, a great deal of interest has been shown in hierarchical classification structures, i.e., pattern classification techniques that are combinations of several other classifiers. In particular, two methods have received considerable attention: bagging and boosting. Both of these algorithms have been shown to increase the performance of certain classifiers for a variety of data sets (Breiman; Freund and Schapire; Quinlan). Despite the well-documented practical success of these algorithms, the reason why bagging and boosting work is still open to debate. One theory, proposed by Schapire (Schapire et al.), likens boosting to support vector machines in that both maximize the minimum margin over the training set; however, his definition of margin differs from Vapnik's. Bauer and Kohavi present a study of such structures, including bagging and boosting, oriented towards determining the circumstances under which these algorithms are successful.

Previous Work

Most of the previous work on component-based detection systems has focused on face detection. An overview of the relevant systems (Yuille; Yuille et al.; Leung et al.; Burl et al.; Burl and Perona; Shams and Spoelstra; Yow and Cipolla; Forsyth and Fleck; Lipson) is presented in the introduction. We highlight several systems here.

Yuille and Yuille et al. describe systems that extract facial features in a framework where the detection problem is cast as an energy minimization problem; hand-crafted deformable templates are used for the individual features. Leung et al. and Burl et al. use local templates to match eye, nose, and mouth features on the human face and determine valid arrangements of these features using random graph matching. Computationally, this amounts to a constrained search through a very large set of candidate face configurations. This system has been shown to have some robustness against occlusions.

The system of Shams and Spoelstra uses a neural network to generate confidences for possible left- and right-eye regions, which are paired together to form all possible combinations. The confidences of these pairings are weighted by their topographic suitability, and the weighted scores are then thresholded to classify the pattern; the weights are defined by a 2D Gaussian.

Yow and Cipolla have also developed a component-based approach to detecting faces. In their system, potential features are categorized into candidate groups based on topographic evidence, and probabilities that these groups are faces are assigned to them. The probabilities are updated using a Bayesian network; if the final probability measure of a group is above a certain threshold, it is declared a detection. The features are initially identified using an image invariance scheme.

While the component-based systems described above take different approaches to detecting objects in images by components, they have two features in common: they all have component detectors that identify candidate components in an image, and they all have a means of integrating these components and determining whether together they define a face.

System Details

In this section, we describe the structure and operation of the component-based object detection system as applied to the domain of people detection.

Overview of System Architecture

This section explains the overall architecture of the system by tracing the detection process when the system is applied to an image; the figure below is a graphical representation of this procedure.

The process of classifying a pattern starts by taking a 128 x 64 window as input. This input is then processed to determine where, and at which scales, the components of a person (head, legs, left arm, and right arm) may be found within the window, using our prior knowledge of the geometry of bodies. All of these candidate regions are processed by the respective component detectors to find the strongest candidate components. There are four distinct component detectors in this system, which operate independently of one another and are trained to find, separately, the four components of the human body: the head, the legs, and the left and right arms.

The component detectors process the candidate regions by applying the quadruple-density Haar wavelet transform to them and then classifying the resultant data vector. The component classifiers are quadratic polynomials trained using the support vector machine algorithm; the training of the component and combination classifiers is described in detail later in this chapter. The strongest candidate component is the one that produces the highest positive raw output, referred to in this thesis as the component score, when classified by the component classifiers. The raw output of an SVM is a rough measure of how well a classified data point fits its designated class, as defined earlier.

The highest component score for each component is fed into the combination classifier, which is a linear classifier. If the highest component score for a particular component is negative, i.e., the component detector in question did not find a component in the geometrically permissible area, then a component score of zero is used instead. The combination classifier processes the set of scores received from the component classifiers to determine whether the pattern is a person.

Since our classifier is not shift and scale invariant, we follow the same brute force search procedure used in the general detection system: the 128 x 64 window is shifted across and down the image, and the image itself is processed at several sizes relative to its original size. These steps allow the system to detect people of various sizes at any location in an image.
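As a rough illustration of this brute force, multi-scale search, the following Python sketch slides a 128 x 64 window over a resized image pyramid. It is a minimal sketch, not the thesis implementation: the scale factors, stride, and the classify_window function are placeholders standing in for the trained component-based classifier.

import numpy as np
from PIL import Image

WINDOW_H, WINDOW_W = 128, 64  # base pattern size used throughout this chapter

def detect_people(image, classify_window, scales=(0.7, 1.0, 1.5), stride=4):
    """Brute force multi-scale search: resize the image, slide a 128x64
    window across and down it, and collect windows the classifier accepts.
    `classify_window` is a placeholder for the trained component-based
    classifier; it should return True for "person"."""
    detections = []
    for s in scales:  # placeholder scale factors, not the thesis values
        resized = image.resize((int(image.width * s), int(image.height * s)))
        arr = np.asarray(resized)
        for top in range(0, arr.shape[0] - WINDOW_H + 1, stride):
            for left in range(0, arr.shape[1] - WINDOW_W + 1, stride):
                window = arr[top:top + WINDOW_H, left:left + WINDOW_W]
                if classify_window(window):
                    # map the detection back to original image coordinates
                    detections.append((int(top / s), int(left / s),
                                       int(WINDOW_H / s), int(WINDOW_W / s)))
    return detections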

Details of System Architecture

This section of the chapter outlines the details of the component detectors and the combination classifier.

Stage One: Identifying Components of People in an Image

When a 128 x 64 window is evaluated by the system, the component detectors are applied only to specific areas of the window and at only particular scales.

Figure: Diagrammatic description of the operation of the system. Areas of the 128 x 64 window where it is possible to detect a head, legs, and arms are identified, and the respective component detectors (quadratic SVMs) are applied to all locations within these permissible areas. The "most suitable" head, legs, and arms are identified by the component detectors, and the component scores (the raw outputs of the component classifiers) are fed into the combination classifier (a support vector machine), which classifies the pattern as a "person" or "non-person". In the output, a solid rectangle outlines the detected person and dashed boxes mark the components.

Figure: It is very important to place geometric constraints on the location and scale of component detections. Even though a detection may be the strongest in a particular window examined, it might not be located properly. In this figure, the shadow of the person's head is detected with a higher score than the head itself. If we did not check for proper configuration and scale, component detections like these would lead to false alarms and/or missed detections of people.

This is because the arms, legs, and head of a person have a defined relative configuration (the head is found above the legs, with the left and right arms to either side), and the components must also be proportioned correctly. This is, in effect, prior information that is being integrated into our detection system. By placing these geometric constraints on the location and scale of the components, we ensure that they are arranged in the form of a human body, and thus improve the performance of the object detection system. This is necessary because even if a component detection is the strongest in a particular window under examination (i.e., it has the highest component score), it does not follow that it is in the correct position, as illustrated in the figure above.

Since the component detectors operate on rectangular areas of the image, the constraints placed on the location and scale of component detections are expressed in terms of the properties of the rectangular region examined. For example, the centroid and boundary of the rectangular area determine the location of a component detection, and the width of the rectangle is a measure of a component's scale. All coordinates are relative to the upper left hand corner of the 128 x 64 window.

We calculated the geometric constraints for each component from a sample of the training images. The constraints themselves, both in location and scale, are tabulated in the table below and shown in the accompanying figure. The values of quantities such as the location of the centroid and the top and bottom boundary edges of a component were determined by taking the statistical mean of those quantities over positive detections in the training set; the tolerances were set to include all positive detections in the training set. Permissible scales were also estimated from the training images.

Table: Geometric constraints placed on each component: the centroid (row and column), the minimum and maximum scale, and other criteria (a bottom edge row for the lower body, and top edge rows for the bent arms). The components are the head and shoulders, the lower body, the right arm (extended and bent), and the left arm (extended and bent). All coordinates are in pixels and relative to the upper left hand corner of the rectangle.

There are two sets of constraints for the arms: one intended for extended arms and the other for bent arms.
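For concreteness, a minimal sketch of checking a candidate component detection against this kind of constraint is shown below. The field layout and the example numbers are illustrative (loosely based on the head-and-shoulders panel of the constraints figure); the exact values used in the thesis did not survive extraction.

from dataclasses import dataclass

@dataclass
class ComponentConstraint:
    """Geometric constraints for one component, in pixel coordinates
    relative to the upper left corner of the 128x64 window."""
    centroid_row: float
    centroid_col: float
    row_tol: float
    col_tol: float
    min_width: int
    max_width: int

def satisfies_constraint(box, c: ComponentConstraint) -> bool:
    """box = (top, left, bottom, right) of a candidate component detection.
    The centroid must lie within the tolerances and the width (a proxy for
    scale) must lie between the permissible minimum and maximum."""
    top, left, bottom, right = box
    row = (top + bottom) / 2.0
    col = (left + right) / 2.0
    width = right - left
    return (abs(row - c.centroid_row) <= c.row_tol and
            abs(col - c.centroid_col) <= c.col_tol and
            c.min_width <= width <= c.max_width)

# Example with illustrative values (not the thesis table entries):
head = ComponentConstraint(centroid_row=23, centroid_col=32,
                           row_tol=2, col_tol=3, min_width=28, max_width=42)
print(satisfies_constraint((10, 18, 38, 46), head))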

For each of the component detectors, the pixel images are processed with the quadruple density Haar wavelet transform described earlier and in Appendix A. We use wavelets at two scales to represent the patterns. As in the case of the full body detection system, we compute the transform in each of the three color channels, and for each scale, location, and orientation of wavelet we use the coefficient that is maximum in absolute value among the three color channels. In this way, the information in the three color channels is collapsed into a single "virtual" color image.
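The following is a minimal sketch of this per-coefficient collapse across the R, G, and B channels; the array shapes are assumptions for illustration, not the layout used in the thesis code.

import numpy as np

def collapse_color_channels(coeffs_rgb: np.ndarray) -> np.ndarray:
    """coeffs_rgb has shape (3, ...) holding Haar wavelet coefficients
    computed independently on the R, G, and B channels.  For every scale,
    location, and orientation we keep the coefficient with the largest
    absolute value among the three channels, producing a single
    "virtual color" coefficient array."""
    channel_idx = np.abs(coeffs_rgb).argmax(axis=0)          # winning channel per coefficient
    return np.take_along_axis(coeffs_rgb,
                              channel_idx[np.newaxis, ...],  # add the channel axis back
                              axis=0).squeeze(0)

# Example: three channels of 2x2 coefficients
c = np.array([[[ 0.1, -0.9], [ 0.3, 0.0]],
              [[-0.5,  0.2], [ 0.1, 0.4]],
              [[ 0.2,  0.1], [-0.6, 0.2]]])
print(collapse_color_channels(c))   # picks -0.5, -0.9, -0.6, 0.4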

We use quadratic polynomial classifiers, trained using support vector machines, to classify the data vectors resulting from the Haar wavelet representation of the components. The optimal hyperplane is computed as a decision surface of the form

f(x) = \mathrm{sgn}(g(x))

where

g(x) = \sum_{i=1}^{l} y_i \alpha_i K(x, x_i^*) + b

In the equation above, K is one of many possible kernel functions, y_i \in \{-1, 1\} is the class label of the data point x_i^*, and \{x_i^*\}_{i=1}^{l} is a subset of the training data set. The x_i^* are called support vectors and are the points from the data set that fall closest to the separating hyperplane. Finally, the coefficients \alpha_i and b are determined by solving a large-scale quadratic programming problem. The kernel function K that is used in the component classifiers is a quadratic polynomial and has the form shown below.

Figure: Geometric constraints placed on the different components. All coordinates are in pixels and relative to the upper left hand corner of the rectangle; dimensions are also expressed in pixels. (a) illustrates the geometric constraints on the head, (b) the lower body, (c) an extended right arm, and (d) a bent right arm. Each panel specifies the minimum and maximum component size, the centroid location and its tolerance, and, where applicable, the permissible top or bottom edge rows.

Figure: The top row shows examples of the heads and shoulders and lower bodies of people that were used to train the respective component detectors. Similarly, the bottom row shows examples of the left arms and right arms that were used for training.

K(x, x_i) = (x \cdot x_i + 1)^2

The value f(x) \in \{-1, 1\} in the decision-surface equation above is referred to as the binary class of the data point x being classified by the SVM; values of 1 and -1 refer to the classes of the positive and negative training examples, respectively. As that equation shows, the binary class of a data point is the sign of the raw output g(x) of the SVM classifier. The raw output of an SVM classifier is the distance of a data point from the decision hyperplane: in general, the greater the magnitude of the raw output, the farther the point is from the decision surface, or the more likely the point belongs to the binary class it is grouped into by the SVM classifier.
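A minimal sketch of evaluating the raw output g(x) and the binary class f(x) from a stored set of support vectors is given below, assuming a degree-2 polynomial kernel of the form (x . x_i + 1)^2 as reconstructed above. The support vectors, coefficients, and bias used in the example are tiny made-up arrays for illustration only.

import numpy as np

def quadratic_kernel(x, xi):
    """Degree-2 polynomial kernel, K(x, x_i) = (x . x_i + 1)^2."""
    return (np.dot(x, xi) + 1.0) ** 2

def svm_raw_output(x, support_vectors, labels, alphas, b):
    """g(x) = sum_i y_i * alpha_i * K(x, x_i*) + b.  The magnitude of g(x)
    reflects the distance of x from the separating hyperplane and is what
    this chapter calls the component score."""
    k = np.array([quadratic_kernel(x, sv) for sv in support_vectors])
    return float(np.sum(labels * alphas * k) + b)

def svm_class(x, support_vectors, labels, alphas, b):
    """f(x) = sgn(g(x)): +1 for in-class patterns, -1 otherwise."""
    return 1 if svm_raw_output(x, support_vectors, labels, alphas, b) >= 0 else -1

# Toy example with made-up support vectors and coefficients:
svs = np.array([[0.2, 0.4], [-0.3, 0.1]])
ys = np.array([1.0, -1.0])
alphas = np.array([0.7, 0.7])
b = -0.1
print(svm_raw_output(np.array([0.25, 0.35]), svs, ys, alphas, b))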

The component classifiers are trained on positive and negative images for their respective classes. The positive examples are of arms, legs, and heads of people in various environments, both indoors and outdoors, and under various lighting conditions; the negative examples are taken from scenes that do not contain any people. Examples of positive images used to train the component classifiers are shown in the training-examples figure above.

Stage Two: Combining the Component Classifiers

Once the component detectors have been applied to all geometrically permissible areas within the 128 x 64 window, the highest component score for each component type is entered into a four-dimensional vector that serves as the input to the combination classifier. The component score is the raw output of the component classifier and is the distance of the test point from the decision hyperplane; this distance is a rough measure of how well a test point fits into its designated class. If the component detector does not find a component in the designated area of the window, then zero is placed in the data vector. A component score of zero corresponds to a test point that is classified as neither a component nor a non-component, because it lies on the hyperplane.
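The sketch below shows how the four component scores might be assembled for the combination classifier, with missing or negative maxima clamped to zero as just described. The score-producing component detectors and the linear weights are placeholders, not the trained models.

import numpy as np

COMPONENTS = ("head", "legs", "left_arm", "right_arm")

def combination_input(candidate_scores):
    """candidate_scores maps each component name to the list of raw SVM
    outputs obtained over its geometrically permissible candidate regions.
    Keep the highest score per component, clamping negative maxima to zero,
    to form the 4-dimensional input to the combination classifier."""
    vec = []
    for name in COMPONENTS:
        scores = candidate_scores.get(name, [])
        best = max(scores) if scores else 0.0
        vec.append(max(best, 0.0))   # a missing/negative component contributes 0
    return np.array(vec)

def classify_person(vec, w, b):
    """A linear combination classifier reduces to the sign of w . vec + b."""
    return float(np.dot(w, vec) + b) > 0.0

# Toy example with made-up scores and weights:
scores = {"head": [0.8, 0.1], "legs": [1.2], "left_arm": [-0.3], "right_arm": [0.05]}
print(combination_input(scores))                 # [0.8, 1.2, 0.0, 0.05]
print(classify_person(combination_input(scores), w=np.array([0.5, 0.6, 0.4, 0.4]), b=-0.7))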

Table: Examples of positive and negative data points used to train the combination classifier. Each data point consists of four component scores (head and shoulders, lower body, right arm, left arm); the component scores of the positive examples are generally higher.

The combination classifier is trained with a linear kernel, which has the following form:

K(x, x_i) = x \cdot x_i + 1

This type of hierarchical classification architecture, where learning occurs at multiple stages, is termed an Adaptive Combination of Classifiers (ACC). Positive examples were generated by processing images of people at one scale and taking the highest component score from the geometrically allowed detections for each component type. The table above shows examples of data vectors that were used to train the combination classifier.

Results

The performance of this system is compared to that of other component-based person detection systems that combine the component classifiers in different ways, as well as to the full-body person detection system. This framework allows us to determine the strengths of the component-based approach to detecting objects in images and the performance of various classifier combination algorithms.

Experimental Setup

All of the component-based detection systems that were tested in this experiment are two-tiered systems: they detect heads, legs, and arms at one level, and at the next level they combine the results of the component detectors to determine whether the pattern in question is a person.

Table: Number of positive and negative examples used to train each of the component classifiers (head and shoulders, lower body, left arm, and right arm).

The component detectors that were used in all of the component-based people detection systems are identical and were described earlier in this section. The positive examples for training these detectors were obtained from the same database that is used in the full-body system. The images of people were taken in Boston and Cambridge, Massachusetts, with different cameras, under different lighting conditions, and in different seasons. This database includes images of people who are rotated in depth and who are walking, in addition to frontal and rear views of stationary people. The positive examples of the lower body include images of women in skirts and people wearing full-length overcoats, as well as people dressed in pants. Similarly, the database of positive examples for the arms was varied in content and included arms at various positions relative to the body. The negative examples were obtained from images of natural scenery and buildings that did not contain any people. The number of positive and negative examples used to train the different component classifiers is given in the table above.

Adaptive Combination of Classifiers Based Systems

Once the component classifiers were trained, the next step in evaluating the Adaptive Combination of Classifiers (ACC) based systems was to train the combination classifier. Positive and negative examples for the combination classifier were collected from the same databases that were used to train the component classifiers. A positive example was obtained by processing each image of a person at a single appropriate scale: the four component detectors were applied to the geometrically permissible areas of the image at the allowable scales, and the greatest positive classifier output for each component was assembled into a vector to form a positive training example. If not all of the component scores were positive, then no vector was formed and the window examined did not yield an example. The negative examples were computed in a similar manner, except that this process was repeated over the entire image and at various scales; the images for the negative examples did not contain people.

We used these positive and negative examples to train the combination classifiers. First, second, third, and fourth degree polynomial classifiers were trained and tested.

The trained system was run over our positive test database of images of people to determine the positive detection rate; there was no overlap between these images and the ones used to train the system. The out-of-sample false alarm rate was obtained by running the system over a negative test database of images that do not contain any people; these images are pictures of natural scenery and buildings. In running the system over these images, windows were examined and classified as in our full-body detection system. We generate ROC curves that measure performance by shifting the decision surface.

Voting Combination of Classifiers Based System

The other method of combining the results of the component detectors that was tested is a Voting Combination of Classifiers (VCC). VCC systems combine classifiers by implementing a voting structure amongst them. One way of viewing this arrangement is that the component classifiers are "weak experts" in the matter of detecting people: VCC systems poll the weak experts and then, based on the results, decide if the pattern is a person. For example, in one possible implementation of VCC, if a majority of the weak experts classify a pattern as a person, then the system declares the pattern to be a person.

One motivating reason for trying VCC as an approach to combining the component classifiers is that, since VCC is one of the simplest classes of classifier combination algorithms, it affords the best opportunity to judge the strengths of a component-based object detection system that is not augmented with a powerful classifier combination method. Similarly, when compared to an ACC based system, one can determine the benefits of more sophisticated classifier combination methods. Since the computational complexity of these methods is known, and the experiment described in this section determines their performance, this framework characterizes the tradeoff between enhanced performance and greater computational complexity for these systems. The person detection systems that are evaluated here are the ACC based systems, the VCC based system, and the baseline full-body system.

In the incarnation of VCC that is implemented and tested in this experiment, a positive detection of the person class results only when all four component classes are detected in the proper configuration. The geometric constraints placed on the components are the same in the ACC and VCC based systems and were described earlier. For each pattern that the system classifies, it must evaluate the logic presented below:

person = Head ∧ Legs ∧ Left arm ∧ Right arm

where a state of true indicates that a pattern belonging to the person class has been detected.
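A minimal sketch of this voting rule is shown below: each component detector "votes" by thresholding its score, and all four must fire for a person to be declared. The threshold values here are placeholders chosen to illustrate the mechanism, not the operating points used in the experiments.

def vcc_person(component_scores, thresholds):
    """Voting combination of classifiers as used in this experiment: the
    pattern is declared a person only if every component detector fires
    (its score exceeds its threshold) within the permissible configuration.
    Both arguments map component names to values; the threshold values are
    placeholders chosen per the desired point on each component's ROC curve."""
    return all(component_scores[name] > thresholds[name]
               for name in ("head", "legs", "left_arm", "right_arm"))

print(vcc_person({"head": 0.4, "legs": 1.1, "left_arm": 0.2, "right_arm": 0.3},
                 {"head": 0.1, "legs": 0.1, "left_arm": 0.1, "right_arm": 0.1}))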

Figure: ROC curves illustrating the ability of the component detectors to correctly detect a person in an image: the complete person detector (baseline), the head and shoulders detector, the lower body detector, and the right arm detector. The positive detection rate is plotted as a percentage against the false alarm rate, which is measured on a logarithmic scale; the false alarm rate is the number of false positive detections per window inspected.

The detection threshold of the VCC based system is determined by selecting appropriate thresholds for the component detectors. The thresholds for the component detectors are chosen such that they all correspond to approximately the same positive detection rate; this information was estimated from the ROC curves of each of the component detectors, shown in the figure above. For example, if one wished to run the VCC based system at a threshold corresponding to a particular positive detection rate, the thresholds for the head, leg, and arm classifiers would each be chosen to yield that rate. These ROC curves were calculated in a manner similar to the procedure described earlier. A point of interest is that these ROC curves indicate how discriminating the individual components of a person are in the process of detecting the full body: the legs perform the best, followed by the arms and the head. The superior performance of the legs may be due to the fact that the background of the lower body in images is usually either street pavement or grass, and hence is relatively clutter free compared to the background of the head and arms.

Experimental Results

An analysis of the ROC curves suggests that a component-based person detection system performs very well, and significantly better than the baseline full-body system at all thresholds. This is noteworthy because the baseline system itself produces very accurate results. It should be emphasized that the baseline system uses the same image representation scheme (Haar wavelets) and classifier (SVM) as the component detectors used in the component-based systems.

Figure: ROC curves comparing the performance of various component-based people detection systems; the systems differ in the method used to combine the classifiers that detect the components of a person's body. The curves shown are for the complete person detector (baseline), the voting combination of classifiers, and adaptive combinations of classifiers using linear, quadratic, cubic, and fourth-degree polynomial SVMs. The positive detection rate is plotted as a percentage against the false alarm rate, which is measured on a logarithmic scale; the false alarm rate is the number of false positive detections per window inspected. The curves indicate that a system in which a linear SVM combines the results of the component classifiers performs best. The baseline system is a full-body person detector similar to the component detectors used in the component-based systems.

All of the component-based systems performed comparably to or better than the baseline system.

For the component-based systems, the ACC approach produces better results than VCC. In particular, the ACC based system that uses a linear classifier to combine the components is the most accurate. During the course of the experiment, the linear SVM system displayed a superior ability to detect people even when one of the components was not detected, in comparison to the higher degree polynomial SVM systems. A possible explanation for this observation is that the higher degree polynomial classifiers place a stronger emphasis on the presence of combinations of components, due to the structure of their kernels (Burges): the second, third, and fourth degree polynomial kernels include terms that are products of up to two, three, and four elements (component scores), which suggests that all of those elements must be person-like for the pattern to be classified as a person. The emphasis placed on the presence of combinations of components increases with the degree of the polynomial classifier, and the results show that the performance of the ACC based systems decreases as the degree of the polynomial classifier increases. In fact, the ROC curve for the ACC based system that employs a fourth degree polynomial classifier is very similar to that of the VCC based system. Interestingly, both of these systems look for all four components in a pattern: the VCC based system explicitly requires the presence of all four components, whereas the ACC based system with the fourth degree polynomial classifier makes this an implicit requirement through the design of its kernel. Two other possible reasons for the decrease in performance with higher degree polynomials are that higher degree classifiers may need more training data, or that they may be overfitting.

It is also worth mentioning that the database of test images used to generate the ROC curves did not include just frontal views of people, but also contained a variety of challenging images. Some of these classes of images portray exactly the situations in which we would expect a component-based approach to improve results: pictures of people walking and running, images in which the person is partially occluded or a part of their body has little contrast with the background, and a few images of people who are slightly rotated in depth. A selection of these images is shown in the figures below.

One of the figures below shows the results obtained when the system was applied to images of people who are partially occluded or whose body parts blend in with the background. In these examples, the system detects the person while running at a threshold that, according to the ROC curve shown earlier, corresponds to a very low rate of false alarms per pattern inspected.

Another figure shows the result of applying the system to sample images with clutter in the background; even under such circumstances the system performs very well. The lower four images were taken with different cameras than the ones used for the training set images, and the conditions and surroundings for these pictures are different as well.

Figure: Samples from the test image database. These images demonstrate the capability of the system: it can detect running people, people who are slightly rotated, people whose body parts blend into the background (bottom row, second from right: the person is detected even though the legs are not), and people under varying lighting conditions (top row, second from left: one side of the face is in light and the other in shadow).

Figure: Results of applying the system to images of partially occluded people and people whose body parts have little contrast with the background. In the first image, the person's legs are not visible; in the second image, her hair blends in with the curtain in the background; and in the last image, her right arm is hidden behind the column.

Figure: Results from the component-based person detection system. The solid boxes outline the complete pedestrian, while the dashed rectangles mark the components.

Chapter

A Real-Time System and Focus of Attention

While the core system is quite effective, its processing time, among other characteristics, limits its use in practical applications. Clearly, considering every pattern in an image is wasteful, and there may be ways of limiting the number of patterns that are processed. Taking people detection as our domain, we view the proper use of our detection system for practical applications as one where some focus of attention mechanism identifies areas in the image where there may be a person, and our detection system then processes patterns in only those regions.

This chapter presents:

- a real-time implementation of our people detection system as part of a driver assistance system, in which an obstacle detection module provides a bounding box for the detection system; and

- an integration of our people detection system with a biologically inspired focus of attention module that identifies salient areas of an image using several feature maps.

A Real-Time Application

As alluded to in the introduction, there are many possible applications of our object detection technology, ranging from automotive assistance systems to surveillance. The only factor inhibiting our system from being used right now in such systems is its relatively slow processing speed. It is important to note that our full system is, for the most part, an unoptimized research tool; we have not invested significant effort in improving its core speed.

As an alternative to dynamic detection strategies, we can use a modified version of our static detection system to achieve real-time performance. This section describes a real-time application of our technology as part of a larger system for driver assistance.

Figure: Side view of the DaimlerChrysler S Class vehicle with the Urban Traffic Assistant (UTA); our people detection system has been integrated into this vehicle.

The combined system, including our people detection module, is currently deployed live in the DaimlerChrysler S Class demonstration vehicle shown in the figure above. The remainder of this section describes the integrated system.

Speed Optimizations

Our original, unoptimized static detection system for people detection in color images processes sequences at a rate on the order of one frame per several minutes, which is clearly inadequate for any real-time automotive application. We have implemented optimizations that have yielded several orders of magnitude worth of speedup.

Subset of features: Instead of using the entire set of 1,326 wavelet features, we use just 29 of the more important features (manually chosen) that encode the outline of the body. This changes the 1,326-dimensional inner product in the decision function into a 29-dimensional dot product. The 29 features are shown overlaid on an example person in the accompanying figure.

Reduced set vectors: From the decision function in Appendix B, we can see that the computation time is also dependent on the number of support vectors, N_s, which in our system is typically large. We use results from Burges to obtain an equivalent decision surface expressed in terms of a small number of synthetic vectors. This method yields a new decision surface that is essentially equivalent to the original one but uses only a small number of reduced set vectors.
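The sketch below illustrates only the evaluation-time consequence of this idea: the decision function is the same weighted kernel sum, so shrinking the set of vectors shrinks the per-window cost. The vectors and weights here are random stand-ins; in the real system the reduced set vectors and their weights are computed offline with Burges' reduced set method (not shown), so that the two surfaces nearly coincide.

import numpy as np

def decision(x, vectors, weights, b, kernel):
    """g(x) = sum_i weights_i * K(x, vectors_i) + b.  For the full SVM the
    vectors are the support vectors with weights y_i * alpha_i; for the
    reduced-set machine they are a much smaller synthetic set with their
    own weights (computed offline, not shown here)."""
    return sum(w * kernel(x, v) for w, v in zip(weights, vectors)) + b

linear = lambda a, c: float(np.dot(a, c))

# Made-up numbers: the point is only that evaluation cost scales with the
# number of vectors, so a smaller set speeds up every window examined.
x = np.array([0.2, -0.1, 0.4])
full_vectors = np.random.randn(1000, 3); full_weights = np.random.randn(1000)
reduced_vectors = np.random.randn(20, 3); reduced_weights = np.random.randn(20)
print(decision(x, full_vectors, full_weights, 0.0, linear))
print(decision(x, reduced_vectors, reduced_weights, 0.0, linear))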

Gray level images: Our use of color images is predicated on the fact that the three color channels (RGB) contain a significant amount of information that gets washed out in gray-level images of the same scene.

Figure: A view of the cameras in UTA.

This use of color information, however, carries a significant computational cost: the resizing and Haar transform operations are performed separately on each color channel. In order to improve system speed, we modify the system to process gray-level intensity images.

The reduction in performance resulting from using gray-level images and 29 features is shown in the figure below: at a fixed false positive rate, the full 1,326-feature color system is noticeably more accurate than the 29-feature gray-level system. While this is a significant reduction in performance, this level of accuracy may still be adequate for certain applications.

Focus of Attention

To further enhance the processing speed of the system, we can use a focus of attention module that concentrates processing only on areas of an image that are likely to contain people. This focus of attention can key off of different characteristics, including motion, distance, local image complexity, shape, color, and so on.

Figure: ROC curves for people detection using color and gray-level images with 1,326 and 29 wavelet features. While going from 1,326 color features to 29 gray-level features results in a significant decrease in accuracy, the performance may still be adequate for some applications.

Integration With The DaimlerChrysler Urban Traffic Assistant

To this end, we have integrated our people detection system with a stereo-based obstacle detection system, in collaboration with DaimlerChrysler AG. DaimlerChrysler has an obvious interest in obstacle detection algorithms for automotive applications, as a means to aid driving and, ultimately, to allow for autonomous driving. One of the important requirements of the system is that it be able to deal with both highway and urban scenes, the latter being much more complex than the former.

The DaimlerChrysler Urban Traffic Assistant (UTA) is a real-time vision system for obstacle detection, recognition, and tracking (Franke and Kutbach; Franke et al.). The car has a binocular stereo vision system mounted on the rear-view mirror, a PowerPC in the trunk, and a flat panel display between the driver and passenger seats on which system processing and output are visualized. UTA currently has several processing modules implemented, including pedestrian motion recognition, lane detection, car detection, sign detection, and an automatic car following system.

Their system relies on 3D position and depth information derived from the stereo cameras. To overcome the expensive correspondence problem, they have developed a feature-based approach to stereo analysis that runs at a high frame rate. The system clusters feature points that correspond to the same object, thereby providing a rectangular bounding box around each obstacle in the scene.

Using this bounding box, which closely outlines the shape of the obstacle, we expand the area to provide a larger region of interest in which we run our people detection system; this is done to alleviate possible misalignments in the bounding box provided by the stereo system. Furthermore, the stereo system provides an accurate estimate of the distance to each object. Using this information, we can constrain the number of sizes at which we look for people to a small number, typically under three scales.

Within these regions of interest, we use our gray-level 29-feature system with the reduced set method that lowers the number of support vectors. In real-world test sequences processed while driving through Esslingen/Stuttgart, Germany, we are able to achieve processing rates of several frames per second. DaimlerChrysler expects to upgrade their onboard system to faster computers soon, so further gains in speed will follow.

Future Work

The portion of the total system time spent in our people detection module is on the order of milliseconds per obstacle. An analysis of how much time is taken by each portion of the people detection module shows that the smallest amount of time is spent in the SVM classification. This bodes well for improving performance: we should be able to use a much richer set of features than the 29 currently used, perhaps on the order of a few hundred features, without significantly degrading the speed of the system.

Focus of Attention Experiments

Our current metaphor for detection is one where we do a brute force search over the entire image; this is clearly inefficient. The integrated system presented in the previous section showcases one method for focusing processing only on important areas of an image, in that case defined by the results of an obstacle detection system. In this section we present another version of our system that uses a focus of attention module to direct processing to only certain areas of the image.

To detect objects in cluttered scenes, the human visual system rapidly focuses its attention on areas where there is compelling visual information that may indicate the presence of an interesting object. The information that such a mechanism keys off of could include features such as intensity, color, and orientation discontinuities. This type of model is developed in Itti et al. and Itti and Koch. Their system decomposes an input image into several feature maps, each of which keys off of different visual information; the visual features the method uses are intensity, color, and orientation (via Gabor filters). These feature maps are integrated into a single saliency map where, at a given point in the image, the individual feature responses are additively combined.

Figure: Looking out from UTA; processing results are shown on the flat panel display between the driver and passenger seats.

The mechanism that changes the focus of attention from one point to another is modeled as a dynamical neural network. The network outputs the ordered salient locations, in simulated time, as (x, y) positions in the image.

As in the case of our DaimlerChrysler integration, we can use this information to focus processing on only the important areas of an image. The saliency-based attention system does not provide information on the size of the salient regions, so for our test we define a fixed-size region centered on each salient point in which we try to find people; this region corresponds to the maximum body size of the people we will look for. Furthermore, we limit the number of regions processed to the top N salient regions per image and test N ∈ {5, 6, 7}.
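The following sketch shows how the top-N salient points could be turned into fixed-size regions of interest for the detector. It is a minimal sketch: the region size is a placeholder standing in for the maximum body size searched for, and the salient points are assumed to come, already ordered, from an external attention module such as that of Itti et al. (not reimplemented here).

def regions_of_interest(salient_points, image_shape, top_n=5, region=(128, 128)):
    """Take the top_n salient (x, y) points (ordered by the attention
    module) and return fixed-size regions, clipped to the image, in which
    the people detector will be run."""
    h, w = image_shape
    rh, rw = region
    boxes = []
    for (x, y) in salient_points[:top_n]:
        top = min(max(int(y - rh / 2), 0), max(h - rh, 0))
        left = min(max(int(x - rw / 2), 0), max(w - rw, 0))
        boxes.append((top, left, top + rh, left + rw))
    return boxes

print(regions_of_interest([(60, 100), (300, 200), (500, 80)], (480, 640), top_n=2))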

The ROC curves for this system compared to the base system are shown in the figure below.

Figure: A sequence showing the single most salient location in each frame, as processed by the system of Itti et al. and Itti and Koch. Our people detection system can use this focus of attention module as a preprocessor to target specific areas of an image.

Figure: ROC curves for people detection comparing the base static system to versions that preprocess an image by focusing attention on the top 5, 6, or 7 salient regions in the image.

The performance of the versions using the focus of attention mechanism is slightly worse than that of the base system; however, far fewer patterns are considered, resulting in a significantly faster system. The ROC curves also suggest that, beyond the first few salient regions, considering additional regions may not affect performance.

Integrating this saliency information into our detection framework is not a perfect fit, on account of the following:

- The center of the salient region is often not at the center of the person, and there is no scale information, so by default a large area around the salient spot is examined.

- While the module usually identifies one of the people in the sequence as the most salient location, the other person may rank much lower in saliency. Combined with the previous point, this means that a large portion of the image still ends up being examined.

Irrespective of these issues, the integration of a focus of attention mechanism results in a significant improvement in processing time. The table below lists the percentage of the total patterns that are processed in the versions of our system using this focus of attention module, as compared to the base system that processes all regions. These results indicate that approximately the same performance can be achieved while processing only about one third of the patterns.

Table: The percentage of the total possible patterns that are examined by each version of the system (base, FOA top 5, FOA top 6, and FOA top 7). The use of a focus of attention mechanism such as the one we consider here can result in a significant increase in processing speed.

Chapter

Integration Through Time

This thesis has focused mainly on the static detection problem, but with the increasing amount of video information available, it is of great interest to see how our system could be applied to video sequences. The straightforward method of applying our static system to video sequences is to analyze each frame as a static image. However, this brute force technique ignores all the dynamical information available, information that could be quite useful for the purposes of detection. The brute force version will be used as our benchmark in this chapter.

In our extensions of the system to handle the time domain, we do not drastically alter the core technology used in the static version. Rather, we either slightly modify it or augment it with various modules that use the static detection system in different ways. In this manner, each of the techniques we present in this chapter can be seen as a more general means of converting a static detection system into one with improved processing capabilities for video sequences.

Our first system is a pure pattern classification approach to dynamical object detection. Here we seek to circumvent the extensive engineering that is quite typical in current dynamical detection systems, and to avoid assuming particular underlying dynamics. We modify our base static detection approach to represent dynamic information by extending the static representation into the time domain. With this new representation, the system is able to learn limited dynamics of people, with or without motion: it learns what a person looks like and what constitutes valid dynamics over short time sequences, without the need for explicit models of either shape or dynamics.

The second system to take advantage of dynamical information is a rule-based module that integrates information through time as an approximation to a Kalman filter. Kalman filtering theory assumes an underlying linear dynamical model and, given measurements of the location of a person in one image, yields a prediction of the location of the person in the next image and the uncertainty in this prediction. Our heuristic smooths the information in an image sequence over time by taking advantage of the fundamental a priori knowledge that a person in one image will appear in a similar position in the next image. We smooth the results through time by automatically eliminating false positives, that is, detections that do not persevere over small subsequences.

Figure: Example image sequences used to train the pure pattern classification version of our dynamic detection system.

The third system we develop uses a new approach to propagating general multimodal densities through time, based on the so-called Condensation technique (Isard and Blake). This technique has a significant advantage over the Kalman filter: it is not constrained to model a single Gaussian density but can effectively model arbitrarily complex densities. Condensation is also able to gracefully handle changes in scene structure (objects entering or exiting the scene), whereas with a Kalman filter we would have to run our detection system every few frames and initialize a new Kalman filter on any newly detected object.

Pure Pattern Classification Approach

In this section, our goal is to develop a detection system for video sequences that makes as few assumptions as possible. We do not want to develop an explicit model of the shape of people or outwardly model their possible motions in any way. Instead of a dynamical model, we use very high dimensional feature vectors to describe patterns through time. These are used to train a support vector machine classifier, which we expect will implicitly model certain dynamics.

We would like a technique that implicitly generates a model of both the shape and the valid dynamical characteristics of people, at the same time, from a set of training data. This should be accomplished without assuming that human motion can be approximated by a linear Kalman filter, described by a hidden Markov model, or captured by any other explicit dynamical model. The only assumption we make is that five consecutive frames of an image sequence contain characteristic information regarding the dynamics of how people appear in video sequences. From a set of training data, the system will learn exactly what constitutes a person and how people typically appear in these short time sequences.

Instead of using a single pattern from one image as a training example, our new approach takes the patterns at a fixed location in five consecutive frames, computes the 1,326 wavelet features for each of these patterns, and concatenates them into a single 6,630-dimensional feature vector for use in the support vector training. We use the images at times t-4, t-3, t-2, t-1, and t, where the person is aligned in the center of the image at time t. The figure above shows several example sequences from the training set; the full training set is composed of both positive and negative example sequences.

The extension to detecting people in new images is straightforward: for each candidate pattern in a new image, we concatenate the wavelet features computed for that pattern with the wavelet features computed at that location in the previous four frames. The full feature vector is subsequently classified by the SVM.
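A minimal sketch of building this concatenated multi-frame feature vector is shown below; extract_wavelet_features is a placeholder for the per-frame Haar wavelet computation at the candidate window location.

import numpy as np

def multiframe_feature_vector(frames, location, extract_wavelet_features):
    """Given the current frame and the previous four frames, compute the
    per-frame wavelet feature vector at the same window location in each
    frame and concatenate them (5 x per-frame features) into the single
    vector that is classified by the SVM."""
    assert len(frames) == 5, "expects frames t-4 ... t"
    per_frame = [extract_wavelet_features(f, location) for f in frames]
    return np.concatenate(per_frame)

# Toy stand-in: 4 features per frame -> a 20-dimensional vector
fake_extract = lambda frame, loc: np.ones(4) * frame
print(multiframe_feature_vector([1, 2, 3, 4, 5], (0, 0), fake_extract))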

We emphasize that it is the implicit ability of the support vector machine classification technique to handle small sets of data that sparsely populate a very high dimensional feature space that allows us to tackle this problem.

In developing this type of representation, we expect that the following dynamical information will be evident in the training data, and therefore encapsulated in the classifier:

- people usually display smooth motion or are stationary;

- people do not spontaneously appear or disappear from one frame to the next;

- camera motion is usually smooth or the camera is stationary.

One of the primary benefits of this technique is that it extends our rich feature set into the time dimension, so the system should be able to detect people with high accuracy while eliminating the transient false positives that would normally appear when using the static detection system.

This is purely a data-driven pattern classification approach to dynamical detection. We compare this approach to the static detection system trained with the individual images corresponding to frame t in each of the sequences, i.e., with the corresponding positive and negative 1,326-dimensional feature vectors. Both the static and dynamic systems are tested on the same out-of-sample sequence.

Figure: ROC curves for the static and pure pattern classification (multi-frame) detection systems. The detection rate is plotted against the false positive rate, measured on a logarithmic scale; the false positive rate is defined as the number of false detections per inspected window.

The figure above shows the ROC curves for the two systems. From these curves, we see that the system incorporating dynamical information performs significantly worse than the static system at most points on the ROC curve. If we look at our training data, we see that right-to-left moving people are significantly underrepresented, while the people in the test sequence are moving from left to right. This bias in the training data, which is not reflected in the test data, may be causing the worse performance.

Regardless, this experiment illustrates that the dynamic system is capable of doing classification in this very high dimensional space with a comparatively small number of training examples. It is important to note that our features are not 3D wavelets in space and time: what we have done is take a set of 2D wavelet features spread through time and use these to develop our model. One extension of our system that would be interesting to pursue is to use true spatiotemporal (3D) wavelets as features; such a system would learn the dynamics as a set of displacements and might therefore generalize better.

Unimodal Density Tracking

This section describes an extension to the static system that incorporates a Kalman-filter-like heuristic to track and predict detections in video sequences using unimodal Gaussian densities.

Kalman Filter

Kalman filters are recursive algorithms that provide optimal estimates of a system's parameters by combining a linear system model prediction with actual measurements. The basic Kalman filter emerged from classical optimal estimation theory and has been heavily used in many fields; for background and derivations of the Kalman filter, see Maybeck.

The goal of the Kalman filter algorithm is to estimate or predict the state of the process one time step into the future, based on the underlying dynamics of the model captured in the state transition matrix, and then, upon receiving the sensory measurements at that next time step, to correct the prediction to yield the optimal estimate, in a linear sense, of the state at that time. In the context of tracking and prediction in computer vision, Kalman filters have been widely used (Broida and Chellappa; Dickmanns and Graefe; Deriche and Faugeras; Harris). In a typical Kalman filter tracking situation, a Kalman filter is initialized over each object of interest. Over time, the Gaussian density function over each object shifts, spreads, and peaks depending on the reinforcement it receives from the measurement model.
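For concreteness, here is a minimal scalar, constant-position Kalman filter sketch showing the predict-then-correct cycle described above. The process and measurement noise variances (q, r) are illustrative values, not parameters from the thesis.

def kalman_step(x_est, p_est, z, q=1e-2, r=1e-1):
    """One predict/correct cycle of a scalar Kalman filter with an
    identity (constant-position) state transition: predict the state and
    its uncertainty, then correct using the new measurement z.
    q and r are illustrative process and measurement noise variances."""
    # predict
    x_pred = x_est            # trivial dynamics: state carried forward
    p_pred = p_est + q        # uncertainty grows by the process noise
    # correct
    k = p_pred / (p_pred + r)             # Kalman gain
    x_new = x_pred + k * (z - x_pred)     # blend prediction and measurement
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for z in [0.9, 1.1, 1.0, 0.95]:
    x, p = kalman_step(x, p, z)
print(x, p)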

Poor Man's Kalman (PMK)

As a zeroth-order approximation to Kalman prediction and tracking for our person detection and tracking task, we approximate a Kalman filter with a simple rule-based heuristic called the Poor Man's Kalman (PMK). Our heuristic smooths the information in an image sequence over time by taking advantage of the fundamental a priori knowledge that a person in one image will appear in a similar position in the next image. We smooth the results through time by automatically eliminating false positives, that is, detections that do not persevere over a small subsequence.

The rule is extremely simple and assumes we are looking at subsequences of n consecutive frames, k, k+1, ..., k+n-1:

    If a pattern is labeled as a person in fewer than a fixed fraction of
    the frames in an n-frame subsequence, then we expect that this pattern
    is a false positive, since true detections persevere through time, so
    eliminate that detection.

That is, if a pattern is labeled as a person fewer than the threshold number of times in any n-frame subsequence, we relabel that pattern as a non-person, thereby eliminating that presumed false positive.

For our tests, we use a small value of n. Since the people and the camera may be moving, we allow some tolerance in the overlap of patterns: for a given tolerance level t, and assuming a pattern that has been rescaled to the base 128 x 64 size, we allow a tolerance proportional to t in the x positions of the top-left and bottom-right corners and a corresponding tolerance in the y positions.

Figure: An illustration of how the Poor Man's Kalman heuristic would eliminate a detection that appears at a certain location for only one frame in a three-frame subsequence (frames t-2, t-1, t). The top row shows the raw hypothesized detections; after processing with the PMK technique, the detection that does not persist is eliminated from the final output.

The tolerances we test are t ∈ {0.1, 0.2, 0.3}. The effect of this technique is illustrated in the figure above. Note that a rule as simple as this will fail when there are large motions of either the camera or the people. This problem could be addressed by trying to predict linear motion in local regions rather than just looking at a single location in consecutive images, but we do not address it here.
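The sketch below captures the PMK relabeling rule over an n-frame subsequence, with detections represented as bounding boxes matched under a position/scale tolerance. The min_fraction parameter is explicitly a placeholder, since the exact fraction used in the thesis did not survive extraction, and the tolerance here is in raw pixels for simplicity.

def boxes_match(a, b, tol):
    """a, b = (x_tl, y_tl, x_br, y_br) in the rescaled 128x64 base frame;
    they are treated as the same detection if every corner coordinate
    agrees to within the tolerance."""
    return all(abs(ca - cb) <= tol for ca, cb in zip(a, b))

def pmk_filter(detections_per_frame, n=3, tol=2, min_fraction=0.5):
    """Poor Man's Kalman heuristic: keep a detection in the last frame of
    an n-frame subsequence only if a matching detection appears in at
    least min_fraction of the n frames (min_fraction is a placeholder).
    True detections persevere; transient false positives are dropped."""
    window = detections_per_frame[-n:]
    kept = []
    for det in window[-1]:
        hits = sum(any(boxes_match(det, d, tol) for d in frame) for frame in window)
        if hits >= min_fraction * n:
            kept.append(det)
    return kept

frames = [[(10, 10, 74, 138)], [(11, 10, 75, 138), (40, 5, 90, 120)], [(10, 11, 74, 139)]]
print(pmk_filter(frames))   # the persistent box survives; the transient box does not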

To compare this system against the static version, and to facilitate comparisons with the other time integration approaches, we train on the same positive and negative subsequences (t-4, t-3, t-2, t-1, t) used to train the pure pattern classification approach. For a fair comparison with the other techniques that are more directly based on static detection, we train a static detection SVM with the positive and negative individual images corresponding to frame t in each of the subsequences. The resulting ROC curve for our PMK technique is shown in the figure below.

The ROC curve shows that, while at higher thresholds PMK performs the same as or slightly worse than the static system, at lower thresholds PMK performs better. The reason is that when the core detection threshold is strict, people patterns that lie close to the decision boundary may not be correctly classified, which can yield a subsequence in which a person appears and then disappears.

Figure: ROC curves for the static and Poor Man's Kalman detection systems. The number associated with each PMK curve (0.1, 0.2, 0.3) indicates the tolerance for persevering detections.

These are exactly the conditions under which the PMK technique dictates the removal of the detection, which in this case is a correct person and not a false positive.

Multimodal Density Tracking

Kalman filtering has been used extensively to track objects in video sequences through time. The key assumption of the Kalman filter is that the state density is Gaussian, which means that a Kalman filter can track only a single object through time. One straightforward way of achieving multi-object tracking with Kalman filters is to instantiate a Kalman filter for each object being tracked. Though this would certainly work, as is shown in our PMK experiments, this strategy is not as appealing as a single unified framework that can track multiple objects. Such an algorithm, called Condensation, has been proposed by Isard and Blake.

Condensation

The two components of Condensation are a model of the state dynamics and a measurement model. We denote the time history of the state density as X_t = \{x_1, x_2, \ldots, x_t\} and the time history of the measurement output as Z_t = \{z_1, z_2, \ldots, z_t\}. We assume that the state dynamics can be modeled as a first order Markov chain:

P(x_t \mid X_{t-1}) = P(x_t \mid x_{t-1})

The measurement component of the Condensation model assumes that the observations z_t are mutually independent and independent of the state dynamics; the observation process is therefore P(z|x). It is important to note that no other assumptions are made about the observation process, so in the general case it could be multimodal.

The prediction, or propagation, of the densities to the next time step is done by applying Bayes' rule, using as the prior for the current step the posterior state density transformed by one step of the underlying dynamical model:

P(x_t \mid Z_t) = k_t \, P(z_t \mid x_t) \, P(x_t \mid Z_{t-1})

where

P(x_t \mid Z_{t-1}) = \int_{x_{t-1}} P(x_t \mid x_{t-1}) \, P(x_{t-1} \mid Z_{t-1}) \, dx_{t-1}

and k_t is a normalization constant. In our case, x_t indicates the position and scale of a person in the image, and z_t is the support vector machine output at the given positions and scales.

The main problem here is to estimate and rebalance the multimodal state density through time as we obtain new measurements. Isard and Blake propose a factored sampling algorithm to efficiently estimate these densities: they first sample points from the prior and apply the dynamical model to them, and the new points are then weighted in proportion to the measured features, i.e., in proportion to P(z|x). This weighted point set serves as an efficient approximation to the posterior P(x|z). Since we are concerned with accuracy in detection and tracking, and less concerned with efficiency, we use a formulation of the Condensation algorithm that does not rely on factored sampling to estimate the state densities: our version represents the density, and takes measurements, at every point in space.

Our version of this density propagation approach is as follows. Using the posterior density from the previous image in the sequence, we directly transfer this density to be the prior for the current image; this imposes the assumption of a trivial underlying dynamical model for people, P(x_t) = P(x_{t-1}). We then run our person detection system over the current image and reinforce or inhibit the density in the locations where the support vector machine output is high or low, respectively. Formally, we use estimates of the density P(z|x), the probability relating support vector machine outputs to the confidence that there is a person at a given location, generated from a set of training data.

The result of this processing is that for each frame we have a posterior density reflecting the likelihood of the presence or absence of people at each location. To accomplish actual detection, we threshold these densities.

The derivation of the density propagation equation above is summarized here:

P(x_t \mid Z_t) = P(x_t \mid z_t, Z_{t-1})
               = \frac{P(z_t \mid x_t, Z_{t-1}) \, P(x_t \mid Z_{t-1})}{P(z_t \mid Z_{t-1})}     (by Bayes' rule)
               = \frac{P(z_t \mid x_t, Z_{t-1}) \, P(x_t \mid Z_{t-1})}{P(z_t)}                  (by independence of the z_i)
               = \frac{P(z_t \mid x_t) \, P(x_t \mid Z_{t-1})}{P(z_t)}                           (since x_t already encapsulates all information about Z_{t-1})
               = k_t \, P(z_t \mid x_t) \, P(x_t \mid Z_{t-1})                                   (since P(x_t \mid Z_t) is a distribution that integrates to 1, we can normalize by k_t)

Furthermore, we are assuming a trivial dynamical model, in which the prediction of a person's position in the next frame is simply the location in the current frame. This means

P(x_t \mid Z_{t-1}) = P(x_{t-1} \mid Z_{t-1})

so

P(x_t \mid Z_t) = k_t \, P(z_t \mid x_t) \, P(x_{t-1} \mid Z_{t-1})

The effect of this type of processing is illustrated in the figure below, where the density is visualized in two dimensions. Unlike Isard and Blake, who in effect obtain measurements from a fixed number of locations in the image, we obtain and measure candidate features at every point in the new image.

Since we are assuming a trivial dynamical model, the only quantity in the propagation equation that is of any real interest is the measurement process P(z_t | x_t). This function propagates the density based on the output of the support vector machine classifier at each location. The problem is that the SVM output is the distance of the pattern from the separating hyperplane, not a probability on x; the next section addresses this issue.

Estimating P(z|x)

In words, the measurement process P(z_t | x_t) denotes the probability of a certain SVM output conditioned on x, the presence of people at each location. We base our measurement process on the SVM output at each location; this value is a distance from the hyperplane that separates the people from the non-people. Assuming we have already estimated P(z|x) at each location in the image, we take the output of the SVM and compute this likelihood.

Figure: An illustration of Condensation in action. The posterior density p(x) from time t-1 is used as the prior at time t. This prior is then acted on by the effect of the SVM measurements at time t: strong SVM output z reinforces the prior and shifts the density. The result of this propagation is the rightmost density, the posterior for time t.

To counter the effects of possibly non-smooth SVM outputs, we average P(z|x) over local regions in our results. One of the key assumptions is that the output of the SVM peaks over a person and then decays relatively smoothly as we move away from the pattern, both in (x, y) and in scale; empirically, the support vector machine raw output does indeed decay fairly smoothly. We have not designed our system to be shift invariant, but in fact there is a small amount of shift invariance that the system seems to learn from the training set, possibly due to small misalignments in the data.

To transform this raw distance value into a distribution conditioned on x, we use a set of out-of-sample images that have been manually annotated with the locations of the people, x^{(i)}. At each location x^{(j)} in the image we obtain the SVM output for that pattern, z^{(j)}. This gives us a version of the density that reflects distances to people, i.e., P(z^{(j)} \mid d(x^{(i)}, x^{(j)})).

When running over a new image, we use this density as follows. Having computed the SVM output z^{(i)} for a given pattern x^{(i)}, we compute P(z^{(i)} | x) by smoothing over a local neighborhood, indexed as A_N with |A_N| = N:

P(z^{(i)} \mid x) = \frac{1}{N} \sum_{j \in A_N} P(z^{(i)} \mid d(x^{(i)}, x^{(j)}))

This gives us the likelihood that serves to reinforce the prior in areas where there are people.

As our function d(a, b), we need a measure that takes into account the differences in both position and scale between a and b. We use

d(a, b) = |a^{tl}_x - b^{tl}_x| + |a^{tl}_y - b^{tl}_y| + |a^{br}_x - b^{br}_x| + |a^{br}_y - b^{br}_y|

where the superscripts tl and br indicate the top-left and bottom-right coordinates, respectively, and the subscripts x and y denote the x and y components of the coordinates. This is a very simple measure that fits our criterion of encapsulating differences in both position and scale; other, more complex distance measures could be used, but we find that this one works effectively.
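The sketch below shows the box distance d(a, b) and the local averaging of P(z|x). The lookup from (SVM output, distance) to probability is a placeholder standing in for the density estimated from the annotated out-of-sample images, and the neighborhood size is an illustrative value.

import numpy as np

def box_distance(a, b):
    """d(a, b): L1 distance between the top-left and bottom-right corners
    of two boxes, capturing differences in both position and scale.
    Boxes are (x_tl, y_tl, x_br, y_br)."""
    return sum(abs(ca - cb) for ca, cb in zip(a, b))

def likelihood_at(x, z, annotated_people, p_z_given_d, n_neighbors=5):
    """Average P(z | d(x, x_j)) over the nearest annotated person boxes x_j
    (the local neighborhood A_N).  `p_z_given_d` is a placeholder for the
    density estimated from manually annotated out-of-sample images; here
    it must be a function of (z, distance).  n_neighbors is illustrative."""
    dists = sorted(box_distance(x, xj) for xj in annotated_people)
    nearest = dists[:min(n_neighbors, len(dists))]
    return float(np.mean([p_z_given_d(z, d) for d in nearest]))

# Toy stand-in for the estimated density: likelihood decays with distance
toy_density = lambda z, d: max(z, 0.0) * np.exp(-d / 50.0)
people = [(10, 10, 74, 138), (200, 40, 264, 168)]
print(likelihood_at((12, 11, 75, 140), z=1.3, annotated_people=people, p_z_given_d=toy_density))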

The motivation for the Condensation algorithm is to propagate densities through time, not to identify distinct locations where an object lies. In other words, the goal is to track densities, not to do pure pattern classification. This leads to a problem, when applying it to a detection task, in dealing with objects with unequal probability masses.

If there are two people in the scene and each is detected equally well (the same number of detections with the same strength by the underlying support vector machine), then the Condensation framework will assign equally strong probability masses to the two people. However, it is extremely unlikely that all the people in the scene will be detected with equal strength. This leads the Condensation algorithm to assign probability masses of varying size to different people. In turn, this means that thresholding the densities to localize the objects of interest may miss weaker peaks in the density.

Our solution to this is to first do a hill-climbing search in the density to find the distinct peaks. Then we individually threshold the peaks by specifying that a constant mass around each peak indicates a detection, which amounts to labeling the points closest to each peak as a detection. The figure on global versus local thresholding below illustrates this concept.
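The peak-finding and local-thresholding step can be sketched as follows on a 1D discretized density; the discretization, function names, and mass value are assumptions for illustration and not the thesis implementation.

```python
# Sketch only: find the distinct peaks (local maxima) of a discretized
# density, then label the points closest to each peak that together account
# for a fixed probability mass as one detection.
import numpy as np

def find_peaks(density: np.ndarray) -> list:
    """Indices that are local maxima of the 1D discretized density."""
    peaks = []
    for i in range(len(density)):
        left = density[i - 1] if i > 0 else -np.inf
        right = density[i + 1] if i < len(density) - 1 else -np.inf
        if density[i] >= left and density[i] >= right and density[i] > 0:
            peaks.append(i)
    return peaks

def local_threshold(density: np.ndarray, peak: int, mass: float) -> list:
    """Starting from a peak, greedily add the nearest points until the
    accumulated probability mass reaches `mass`; these points constitute
    one detection."""
    order = np.argsort(np.abs(np.arange(len(density)) - peak))  # closest first
    selected, accumulated = [], 0.0
    for idx in order:
        selected.append(int(idx))
        accumulated += density[idx]
        if accumulated >= mass:
            break
    return selected

# Usage: each peak is reported as a detection supported by a constant mass,
# regardless of how much total mass that peak carries.
density = np.array([0.02, 0.05, 0.30, 0.08, 0.02, 0.04, 0.25, 0.20, 0.04])
density = density / density.sum()
for p in find_peaks(density):
    print(p, local_threshold(density, p, mass=0.2))
```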

Results

We train and test the system in the same manner as the other time integration systems. The performance of the Condensation-based approach, as compared to the static system, is shown in the ROC figure below. The curves show that the Condensation-based technique significantly outperforms the static version in the low false positive portion of the ROC curve. This is not a surprising result, since the effect of false positives is magnified by our taking a constant volume as the detected patterns: when there are few false positives, the detected patterns are more likely to be people, and these detections will persevere on account of the propagation of the density.

Discussion of Time Integration Techniques

It is of interest to compare the different time integration techniques and to discuss their relative merits and drawbacks.

Pattern Classification Approach

The multi-frame pattern classification approach is the one most directly related to the core static system and is the system that makes the fewest assumptions about dynamical models. Here we assume that any of the dynamics are captured in the five-frame subsequences. The SVM learning engine is able to learn in this high-dimensional input space and is able to generalize quite well, as seen in the ROC curve. The main problem with our particular version is that the training data seems to be heavily biased toward left-moving people.

At run time, this system is quite expensive. The core classification step for each window uses on the order of ns multiplications, where n is the number of features in a single window and s is the number of support vectors. Summed over all the windows in an average image, this leads to a very large number of multiplications per image, with a constant number of accesses per pattern.

Poor Man's Kalman Heuristic

The PMK heuristic approximation to a Kalman filter is the simplest addition to the core static approach that takes advantage of information over time. This module assumes no underlying dynamics, but exploits the fact that the video sequences we typically use are captured at a high enough rate that people do not exhibit significant changes in position and scale from one frame to another. In fact, this is why our simple distance measure, essentially the L1 norm in position and scale, works effectively. In domains where the assumption of smooth motion is not valid, this approach would not work; cases like these are better served by an approach that is able to more directly capture direction and velocity in the model.
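As one plausible illustration of this kind of smoothness heuristic (not the exact PMK rule, which is specified earlier in the thesis; the box representation and threshold are assumptions), detections could be kept only when a nearby detection exists in the previous frame:

```python
# Sketch only: a smoothness heuristic in the spirit of the discussion above.
def l1_box_distance(a, b):
    # a, b are ((x_tl, y_tl), (x_br, y_br)); sum of corner differences in
    # position and scale (the L1 measure used in the previous section).
    (atl, abr), (btl, bbr) = a, b
    return (abs(atl[0] - btl[0]) + abs(atl[1] - btl[1]) +
            abs(abr[0] - bbr[0]) + abs(abr[1] - bbr[1]))

def smooth_detections(current, previous, threshold=16.0):
    """Keep a detection in the current frame only if it lies within a small
    L1 distance, in position and scale, of some detection in the previous
    frame; the threshold value is a placeholder."""
    kept = []
    for det in current:
        if any(l1_box_distance(det, prev) <= threshold for prev in previous):
            kept.append(det)
    return kept
```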

The run-time complexity of this system is again on the order of ns multiplications per window, with a small constant number of accesses per pattern assuming the proper data structures. In all, the core SVM computation accounts for on the order of ns multiplications per window over the image.

Condensation-Based Approach

The Condensation-based extension to the static system is the most elegant, but involves significantly more development and changes to the system. One of the main issues is the estimation of the likelihood P(z|x), which is nontrivial; in our case, this likelihood can be estimated from a set of data.

At run time, the Condensation-based approach is quite a bit more complex than the other techniques. For classification, each pattern needs on the order of ns multiplications, with an additional multiplication for propagating the prior; here, too, there are a constant number of additions per window, since to compute the likelihood we average over a small neighborhood, plus a division to normalize the density. This, combined with the necessary peak-finding step, means a constant number of visits per pattern. Though more complex than the other approaches, Condensation seems to be the method of choice due to its flexibility in modeling multimodal densities. In addition to the computational complexity, another minor drawback is that the framework is developed for tracking and not pure detection; our peak-finding heuristic is one simple way to circumvent this.


Figure: The top image is the original and the bottom images are the color-coded raw support vector machine outputs from the image when processed for four different scales of people. The output of the SVM peaks over the people patterns and decays smoothly as we shift the location and scale of the pattern.

Figure: Global thresholding may miss a peak due to unequally weighted portions of the density. To correct for this, we do a hill-climbing search to find the distinct peaks and then locally threshold the portions of the density around the peaks.

Figure: ROC curves (detection rate versus false positive rate, the latter on a logarithmic scale) for the static and Condensation-based detection systems. The numbers associated with the Condensation curves (1, 2, 5) are the number of points that the algorithm uses at the peaks of the density.

Chapter

Conclusion

This thesis has presented a general trainable framework for object detection that has been successfully applied to face, people, and car detection in static images. Along with some extensions to video sequences, we have investigated different representations and a component-based approach, and have implemented a real-time version of the system that is running in a DaimlerChrysler experimental vehicle. The system is robust, portable, and can be made efficient. While we have not pushed this system to be the best detection system in a particular domain (though this certainly may be possible), we have shown its applicability to a wide range of object classes. We feel that, in practical uses of this system, the proper architecture should combine a focus of attention module with our static detection system; we have presented preliminary results along these lines here as well.

Future work around this system should focus on the following:

- Component-based techniques. In the component-based system we present, the parts and geometries are manually specified. An important next step is to develop a method for automatically selecting the components from a large set of possible components.

- Time integration. This thesis has presented some promising first steps in augmenting the core static detection system with different modules that take advantage of dynamical information. These techniques, and others, should be investigated further and extended. One interesting direction would be to develop a system that uses wavelets in space and time.

- Representations. One of the big leaps we have taken in this thesis is our choice of representation: Haar wavelets. While we have shown that this wavelet representation achieves excellent results when compared to other representations, there are many other possible feature sets that could be used. Subsequent research in this area is imperative.

- Support vector machines. Though support vector machines are well founded in statistics and heavily studied these days, there are a couple of open questions whose solutions would benefit our work as well. First, how can we quickly train support vector machines with very large data sets? Second, are there ways of using support vector machines to do feature selection in a principled manner? We present one idea here, but this is largely a heuristic first step.

Appendix A

Wavelets

A.1 The Haar Wavelet

Wavelets provide a natural mathematical structure for describing our patterns; a more detailed treatment can be found in Mallat. These vector spaces form the foundations of the concept of a multiresolution analysis. We formalize the notion of a multiresolution analysis as the sequence of approximating subspaces

    V^0 ⊂ V^1 ⊂ ... ⊂ V^j ⊂ V^{j+1} ⊂ ...;

the vector space V^{j+1} can describe finer details than the space V^j, but every element of V^j is also an element of V^{j+1}. A multiresolution analysis also postulates that a function approximated in V^j is characterized as its orthogonal projection on the vector space V^j.

As a basis for the vector space V^j, we use the scaling functions

    φ^j_i(x) = √(2^j) φ(2^j x − i),   i = 0, ..., 2^j − 1,

where, for our case of the Haar wavelet,

    φ(x) = 1 for 0 ≤ x < 1,
    φ(x) = 0 otherwise.

Next, we define the vector space W^j that is the orthogonal complement of two consecutive approximating subspaces, V^{j+1} = V^j ⊕ W^j. The W^j are known as wavelet subspaces and can be interpreted as the subspace of details in increasing refinements. The wavelet space W^j is spanned by a basis of functions

    ψ^j_i(x) = √(2^j) ψ(2^j x − i),   i = 0, ..., 2^j − 1,

where, for Haar wavelets,

    ψ(x) = 1 for 0 ≤ x < 1/2,
    ψ(x) = −1 for 1/2 ≤ x < 1,
    ψ(x) = 0 otherwise.

The full set of wavelet functions forms a basis for L²(R). It can be shown, under the standard conditions of multiresolution analysis, that all the scaling functions can be generated from dilations and translations of one scaling function; similarly, all the wavelet functions are dilations and translations of the mother wavelet function. Figure (a) shows the scaling and wavelet functions. The approximation of some function f(x) in the space V^j is found to be

    A_j f(x) = Σ_{k∈Z} λ_{j,k} φ^j_k(x),   with λ_{j,k} = ⟨f(u), φ^j_k(u)⟩,

and similarly the projection of f(x) on W^j is

    D_j f(x) = Σ_{k∈Z} γ_{j,k} ψ^j_k(x),   with γ_{j,k} = ⟨f(u), ψ^j_k(u)⟩.

The structure of the approximating and wavelet subspaces leads to an efficient cascade algorithm for the computation of the scaling coefficients λ_{j,k} and the wavelet coefficients γ_{j,k}:

    λ_{j,k} = Σ_{n∈Z} h_{n−2k} λ_{j+1,n},
    γ_{j,k} = Σ_{n∈Z} g_{n−2k} λ_{j+1,n},

where {h_i} and {g_i} are the filter coefficients corresponding to the scaling and wavelet functions. Using this construction, the approximation of a function f(x) in the space V^j is

    A_j f(x) = √(2^j) Σ_{k∈Z} λ_{j,k} φ(2^j x − k),

and similarly the approximation of f(x) in the space W^j is

    D_j f(x) = √(2^j) Σ_{k∈Z} γ_{j,k} ψ(2^j x − k).

Since we use the Haar wavelet, the corresponding filters h and g are the two-tap averaging and differencing filters: the scaling coefficients at the coarser level are simply the averages of pairs of adjacent coefficients from the finer level, while the wavelet coefficients are the corresponding differences.

It is important to observe that the discrete wavelet transform (DWT) performs downsampling, or decimation, of the coefficients at the finer scales, since the filters h and g are moved in a step size of 2 for each increment of k.
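To make the cascade concrete, the following is a minimal sketch (an illustration, not the thesis implementation) of the 1D Haar decomposition under the averaging/differencing convention above; the exact filter normalization is an assumption and does not affect the structure of the algorithm.

```python
# Sketch only: one full 1D Haar decomposition by repeated averaging and
# differencing of adjacent pairs.
def haar_step(signal):
    """One cascade step: scaling coefficients are averages of adjacent pairs,
    wavelet coefficients are half-differences; stepping by 2 over the input is
    the downsampling/decimation noted above."""
    assert len(signal) % 2 == 0
    scaling = [(signal[2 * k] + signal[2 * k + 1]) / 2.0
               for k in range(len(signal) // 2)]
    wavelet = [(signal[2 * k] - signal[2 * k + 1]) / 2.0
               for k in range(len(signal) // 2)]
    return scaling, wavelet

def haar_dwt(signal):
    """Full decomposition: recurse on the scaling coefficients, collecting the
    wavelet coefficients at every scale (finest first)."""
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        current, detail = haar_step(current)
        coeffs.append(detail)
    return current, coeffs  # final scaling coefficient plus per-scale details

# Usage: an 8-sample signal yields 4 + 2 + 1 wavelet coefficients.
approx, details = haar_dwt([9, 7, 3, 5, 6, 10, 2, 6])
print(approx, details)
```

Each call to haar_step halves the number of coefficients, which is exactly the decimation by 2 noted above.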

A.2 2-Dimensional Wavelet Transform

The natural extension of wavelets to 2D signals is obtained by taking the tensor product of two 1D wavelet transforms. The result is three types of wavelet basis functions. The first type of wavelet is the tensor product of a wavelet by a scaling function, ψ(x, y) = ψ(x) ⊗ φ(y); this wavelet encodes a difference in the average intensity along a vertical border, and we will refer to its value as a vertical coefficient. Similarly, a tensor product of a scaling function by a wavelet, ψ(x, y) = φ(x) ⊗ ψ(y), is a horizontal coefficient, and a wavelet by a wavelet, ψ(x, y) = ψ(x) ⊗ ψ(y), is a diagonal coefficient, since this wavelet responds strongly to diagonal boundaries.

Since the wavelets that the standard transform generates have irregular support, we use the non-standard 2D DWT where, at a given scale, the transform is applied to each dimension sequentially before proceeding to the next scale (Stollnitz et al.). The results are Haar wavelets with square support at all scales, shown in Figure (b).
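A corresponding sketch of one scale of the non-standard 2D decomposition is given below; it is an illustration under the same averaging/differencing convention, with power-of-two image sizes assumed, and the three detail bands play the role of the vertical, horizontal, and diagonal coefficients named above.

```python
# Sketch only: one scale of the non-standard 2D Haar DWT.
import numpy as np

def nonstandard_2d_step(image):
    """Apply one 1D averaging/differencing step along x, then along y.
    Returns the approximation band (recursed on at the next scale) and the
    three detail bands."""
    img = np.asarray(image, dtype=float)
    # step along x: pair adjacent columns
    avg_x = (img[:, 0::2] + img[:, 1::2]) / 2.0
    dif_x = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # step along y: pair adjacent rows of each intermediate band
    approx     = (avg_x[0::2, :] + avg_x[1::2, :]) / 2.0  # averaged in x and y
    vertical   = (dif_x[0::2, :] + dif_x[1::2, :]) / 2.0  # difference along x
    horizontal = (avg_x[0::2, :] - avg_x[1::2, :]) / 2.0  # difference along y
    diagonal   = (dif_x[0::2, :] - dif_x[1::2, :]) / 2.0  # difference in both
    return approx, vertical, horizontal, diagonal

bands = nonstandard_2d_step(np.arange(16.0).reshape(4, 4))
print([b.shape for b in bands])  # four 2x2 bands
```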

A.3 Quadruple Density Transform

For the 1D Haar transform, the distance between two neighboring wavelets at level n (with support of size 2^n) is 2^n. To obtain a denser set of basis functions that provide better spatial resolution, we need a set of redundant basis functions, or an overcomplete dictionary, where the distance between the wavelets at scale n is (1/4)·2^n (Figure (c)). The straightforward approach of shifting the signal and recomputing the DWT will not generate the desired dense sampling. Instead, this can be achieved by modifying the DWT. To generate wavelets with double density, where wavelets of level n are located every (1/2)·2^n pixels, we simply do not downsample in the wavelet-coefficient recursion above. To generate the quadruple density dictionary, we first do not downsample in the scaling-coefficient recursion, giving us double density scaling coefficients. Next, we calculate double density wavelet coefficients on the two sets of scaling coefficients, even and odd, separately. By interleaving the results of the two transforms, we get quadruple density wavelet coefficients. For the next scale, we keep only the even scaling coefficients of the previous level and repeat the quadruple transform on this set only; the odd scaling coefficients are dropped. Since only the even coefficients are carried along at all the scales, we avoid an explosion in the number of coefficients, yet obtain a dense and uniform sampling of the wavelet coefficients at all the scales. As with the regular DWT, the time complexity is O(n) in the number of pixels n. The extension of the quadruple density transform to 2D is straightforward.
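The "do not downsample" modification can be illustrated with the following sketch of a double-density Haar step; the full quadruple-density construction (separate wavelet transforms on the even and odd scaling coefficients, interleaved, with only the even coefficients carried to the next scale) follows the recipe above and is not reproduced here. The normalization is again an assumption.

```python
# Sketch only: an undecimated (double-density) Haar step, where adjacent
# pairs are combined at every position (step 1) instead of every other
# position (step 2), so the number of coefficients is not halved.
def double_density_step(coeffs):
    scaling = [(coeffs[k] + coeffs[k + 1]) / 2.0 for k in range(len(coeffs) - 1)]
    wavelet = [(coeffs[k] - coeffs[k + 1]) / 2.0 for k in range(len(coeffs) - 1)]
    return scaling, wavelet

s, w = double_density_step([9, 7, 3, 5, 6, 10, 2, 6])
print(len(s), len(w))   # 7 and 7: no decimation
```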

Appendix B

Support Vector Machines

The second key component of our system is the use of a trainable pattern classifier that learns to differentiate between patterns in our object class and all other patterns. In general terms, these supervised learning techniques rely on having a set of labeled example patterns from which they derive an implicit model of the domain of interest. The particular learning engine we use is a support vector machine (SVM) classifier.

The support vector machine algorithm is a technique for training classifiers that is well founded in statistical learning theory (Vapnik; Burges). Here we provide some of the mathematical and practical aspects of support vector machines that are relevant to our work.

B.1 Theory and Mathematics

For a given learning task, we have a set of ℓ N-dimensional labeled training examples,

    (x_1, y_1), (x_2, y_2), ..., (x_ℓ, y_ℓ),   x_i ∈ R^N, y_i ∈ {−1, +1},

where the examples have been generated from some unknown pdf P(x, y). We would like the system to learn a decision function f_α : x → y that minimizes the expected risk,

    R(α) = ∫ (1/2) |f_α(x) − y| dP(x, y).

In most cases we will not know P(x, y); we simply see the data points that the distribution has generated. Thus, direct minimization of the expected risk is not possible. What we are able to minimize directly is the empirical risk, the actual error over the training set,

    R_emp(α) = (1/2ℓ) Σ_{i=1}^{ℓ} |f_α(x_i) − y_i|.

This is exactly the functional that many training techniques for classifiers minimize, but it can lead to overfitting the training data and poor generalization. For these reasons, we introduce the theory of uniform convergence in probability:

    R(α) ≤ R_emp(α) + √( ( h (ln(2ℓ/h) + 1) − ln(η/4) ) / ℓ ),

which holds with probability 1 − η. Here R(α) is the expected risk, R_emp(α) is the empirical risk, ℓ is the number of training examples, h is the VC dimension of the classifier that is being used, and the square-root term is the VC confidence of the classifier. Intuitively, what this means is that the uniform deviation between the expected risk and the empirical risk decreases with larger amounts of training data and increases with the VC dimension h. This leads us directly to the principle of structural risk minimization, whereby we can minimize both the actual error over the training set and the complexity of the classifier at the same time; this bounds the generalization error as in the inequality above. It is exactly this technique that support vector machines approximate.

In the simple case of finding the optimal linear hyperplane (w, b) that separates two separable classes, this problem is equivalent to solving

    minimize over (w, b):   (1/2) ||w||²

subject to the constraints

    y_i (w · x_i + b) ≥ 1,   i = 1, ..., ℓ.

Typically, the dual formulation of this quadratic programming problem is solved, leading to a decision surface of the form

    f(x) = Σ_i α_i y_i (x_i · x) + b,

where the α_i are Lagrange variables.

To extend this to the more general case of linearly non-separable data, we add in slack variables ξ_i and a cost C that penalizes misclassifications:

    minimize over (w, b):   (1/2) ||w||² + C Σ_{i=1}^{ℓ} ξ_i

subject to the constraints

    y_i (w · x_i + b) ≥ 1 − ξ_i,   i = 1, ..., ℓ,
    ξ_i ≥ 0,   i = 1, ..., ℓ.

Linear hyperplanes are a fairly restrictive set of decision surfaces, so ultimately we would like to use nonlinear decision surfaces. In this case, support vector machines work by projecting the input data into a higher dimensional feature space and finding the optimal linear separating hyperplane in this space. The data is mapped according to some function Φ as

    x → Φ(x) = (φ_1(x), φ_2(x), ..., φ_n(x)).

This leads to decision surfaces of the form

    f(x) = Σ_i α_i y_i (Φ(x_i) · Φ(x)) + b.

A more compact representation is possible by introducing the nonlinear kernel function K,

    K(x, y) = Φ(x) · Φ(y).

With this formulation, we obtain decision surfaces of the form

    f(x) = Σ_i α_i y_i K(x_i, x) + b.

One of the important characteristics of the solution is that there are typically only a small number of nonzero α_i; the separating hyperplane is therefore a linear combination of a small set of data points, called support vectors. Removing the non-support vectors and training again would yield exactly the same solution.

Using the SVM formulation, the classification rule for a pattern x using a polynomial of degree two, the classifier we use in our detection system, is as follows:

    f(x) = Σ_{i=1}^{N_s} α_i y_i (x · x_i + 1)² + b,

where N_s is the number of support vectors.
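Evaluating this classification rule is straightforward once the support vectors x_i, labels y_i, coefficients α_i, and bias b are available from training; the following sketch uses placeholder values purely for illustration.

```python
# Sketch only: evaluate the degree-two polynomial decision function; the
# numerical values below are placeholders, not trained parameters.
import numpy as np

def svm_decision(x, support_vectors, labels, alphas, b):
    """f(x) = sum_i alpha_i * y_i * (x . x_i + 1)^2 + b; the predicted class
    is the sign of f(x)."""
    kernels = (support_vectors @ x + 1.0) ** 2   # (x . x_i + 1)^2 for each i
    return float(np.sum(alphas * labels * kernels) + b)

# Placeholder data: two support vectors in a 3-dimensional feature space.
sv = np.array([[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]])
y = np.array([1.0, -1.0])
alpha = np.array([0.8, 0.8])
b = -0.05
print(svm_decision(np.array([0.1, 0.0, 0.3]), sv, y, alpha, b))
```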

This controlling of both the training set error and the classifier's complexity has allowed support vector machines to be successfully applied to very high dimensional learning tasks. Joachims presents results on SVMs applied to a high-dimensional text categorization problem, and Osuna et al. (b) show a high-dimensional face detection system.

B.2 Practical Aspects

The SVM package we use is that of Osuna et al., described in Osuna et al. (a) and Osuna et al. (b), which uses the MINOS quadratic programming solver (Murtagh and Saunders). To train a support vector machine, we first need a set of training data; the process by which we gathered our training set is described in the section on training data. We transform each training image into a feature vector of Haar wavelet coefficients, as in the section on the wavelet representation. The data is then ready to be used for training.

In the SVM framework, there are essentially only two tunable parameters: the type of classifier (e.g., linear, polynomial, etc.) and C, the penalty for misclassifications. Most of our experiments are run using a polynomial of degree two, as in the classification rule given in Section B.1. In penalizing misclassifications, we use a larger penalty C_pos for the positive examples than the penalty C_neg for the negative examples, to reflect the relative importance of having a low false negative rate; false negatives are thus penalized more heavily than false positives. Training time with the above settings is on the order of hours for our detection systems, on either an SGI Reality Engine or a PC running Linux. The output of the SVM training is a binary file containing the data for the decision surface, that is, x_i, y_i, and α_i for the data points i that have nonzero α_i, together with b; this data file is subsequently used by the detection system.
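For readers reproducing this setup with current tools, a roughly equivalent configuration (a degree-two polynomial kernel with a heavier misclassification penalty on the positive class) can be expressed, for example, with scikit-learn; this is an assumption for illustration only, not the MINOS-based package described above, and the penalty ratio, feature dimension, and data below are placeholders.

```python
# Sketch only: asymmetric class penalties with a degree-two polynomial kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))   # placeholder feature vectors (e.g. wavelets)
y = np.concatenate([np.ones(50, dtype=int), -np.ones(150, dtype=int)])

# kernel (x . x' + 1)^2; the positive class gets a larger effective C so that
# false negatives cost more than false positives (the 10:1 ratio is arbitrary).
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0,
          class_weight={1: 10.0, -1: 1.0})
clf.fit(X, y)

# The learned decision surface is stored as the support vectors, their dual
# coefficients (alpha_i * y_i), and the bias, analogous to the data file
# written out by the training step described above.
print(clf.support_vectors_.shape, clf.dual_coef_.shape, clf.intercept_)
```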

Appendix C

Raw Wavelet Data

Here we present the raw wavelet responses, averaged over the set of people images and presented in their proper spatial locations. The meaningful coefficients are those whose average values are much larger or much smaller than the overall average; values close to the overall average indicate neither the presence nor the absence of an intensity difference in the average, i.e., these are inconsistent features.

Table C.1: Vertical wavelets.

Table C.2: Horizontal wavelets.

Table C.3: Diagonal wavelets.

Table C.4: Vertical wavelets.

Table C.5: Horizontal wavelets.

Table C.6: Diagonal wavelets.

Bibliography

[Ali and Dagless] A. Ali and E. Dagless. Alternative practical methods for moving object detection. In IEE Colloquium on Electronic Images and Image Processing in Security and Forensic Science.

[Bauer and Kohavi] Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Machine Learning.

[Betke and Nguyen] M. Betke and H. Nguyen. Highway scene analysis from a moving vehicle under reduced visibility conditions. In Proceedings of Intelligent Vehicles, October.

[Betke et al.] M. Betke, E. Haritaoglu, and L. Davis. Highway scene analysis in hard real-time. In Proceedings of Intelligent Transportation Systems. IEEE, July.

[Beymer et al.] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik. A real-time computer vision system for measuring traffic parameters. In Proceedings of Computer Vision and Pattern Recognition. IEEE Computer Society Press.

[Bishop] C. Bishop. Neural Networks for Pattern Recognition. Clarendon Press.

[Braithwaite and Bhanu] R. Braithwaite and B. Bhanu. Hierarchical Gabor filters for object detection in infrared images. In Proceedings of Computer Vision and Pattern Recognition.

[Breiman] L. Breiman. Bagging predictors. Machine Learning.

[Broida and Chellappa] T. Broida and R. Chellappa. Estimation of object motion parameters from noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence, January.

[Burges] C. Burges. Simplified support vector decision rules. In Proceedings of the International Conference on Machine Learning.

[Burges] C. Burges. A tutorial on support vector machines for pattern recognition. In Usama Fayyad, editor, Proceedings of Data Mining and Knowledge Discovery.

[Burl and Perona] M. Burl and P. Perona. Recognition of planar object classes. In Proceedings of Computer Vision and Pattern Recognition, June.

[Burl et al.] M. Burl, T. Leung, and P. Perona. Face localization via shape statistics. In International Workshop on Automatic Face and Gesture Recognition, June.

[Campbell and Bobick] L. Campbell and A. Bobick. Recognition of human body motion using phase space constraints. Technical Report, MIT Media Laboratory.

[Chapa and Raghuveer] J. Chapa and M. Raghuveer. Matched wavelets: their construction and application to object detection. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.

[Colmenarez and Huang] A. Colmenarez and T. Huang. Maximum likelihood face detection. In Proceedings of the International Conference on Automatic Face and Gesture Recognition.

[Colmenarez and Huang] A. Colmenarez and T. Huang. Face detection with information-based maximum discrimination. In Proceedings of Computer Vision and Pattern Recognition.

[Colmenarez and Huang] A. Colmenarez and T. Huang. Pattern detection with information-based maximum discrimination and error bootstrapping. In Proceedings of the International Conference on Pattern Recognition.

[Compaq Computer Corp.] Compaq Computer Corp. www.altavista.com.

[Cortes and Vapnik] C. Cortes and V. Vapnik. Support vector networks. Machine Learning.

[Crowley and Berard] J. Crowley and F. Berard. Multi-modal tracking of faces for video communications. In Proceedings of Computer Vision and Pattern Recognition.

[Davis and Bobick] J.W. Davis and A.F. Bobick. The representation and recognition of human movement using temporal templates. In Proceedings of Computer Vision and Pattern Recognition. IEEE Computer Society Press.

[Deriche and Faugeras] R. Deriche and O. Faugeras. Tracking line segments. Image and Vision Computing.

[Dickmanns and Graefe] E. Dickmanns and V. Graefe. Applications of dynamic monocular machine vision. Machine Vision and Applications.

[Dumais et al.] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the International Conference on Information and Knowledge Management, November.

[Duta and Jain] N. Duta and A. Jain. Learning the human face concept in black and white images. In Proceedings of the International Conference on Pattern Recognition.

[Ebbecke et al.] M. Ebbecke, M. Ali, and A. Dengel. Real time object detection, tracking and classification in monocular image sequences of road traffic scenes. In Proceedings of the International Conference on Image Processing.

[Excite Inc.] Excite Inc. www.excite.com.

[Forsyth and Fleck] D. Forsyth and M. Fleck. Body plans. In Proceedings of Computer Vision and Pattern Recognition.

[Forsyth and Fleck] D. Forsyth and M. Fleck. Finding naked people. International Journal of Computer Vision, accepted for publication.

[Franke and Kutbach] U. Franke and I. Kutbach. Fast stereo based object detection for stop-and-go traffic. In Proceedings of Intelligent Vehicles.

[Franke et al.] U. Franke, S. Goerzig, F. Lindner, D. Nehren, and F. Paetzold. Steps towards an intelligent vision system for driver assistance in urban traffic. In Intelligent Transportation Systems.

[Franke et al.] U. Franke, D. Gavrila, S. Goerzig, F. Lindner, F. Paetzold, and C. Woehler. Autonomous driving goes downtown. IEEE Intelligent Systems, November-December.

[Freund and Schapire] Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth National Conference. Morgan Kaufmann.

[Garcia et al.] C. Garcia, G. Zikos, and G. Tziritas. Face detection in color images using wavelet packet analysis. In International Conference on Multimedia Computing and Systems.

[Gavrila and Philomin] D. Gavrila and V. Philomin. Real-time object detection for smart vehicles. In Proceedings of the International Conference on Computer Vision.

[Girosi et al.] F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation.

[Guarda et al.] A. Guarda, C. le Gal, and A. Lux. Evolving visual features and detectors. In Proceedings of the International Symposium on Computer Graphics, Image Processing and Vision.

[Haritaoglu et al.] I. Haritaoglu, D. Harwood, and L. Davis. W4: who, when, where, what: a real time system for detecting and tracking people. In Face and Gesture Recognition.

[Harris] C. Harris. Tracking with rigid models. In A. Blake and A. Yuille, editors, Active Vision. MIT Press.

[Heisele and Wohler] B. Heisele and C. Wohler. Motion-based recognition of pedestrians. In Proceedings of the International Conference on Pattern Recognition, in press.

[Heisele et al.] B. Heisele, U. Kressel, and W. Ritter. Tracking non-rigid moving objects based on color cluster flow. In Proceedings of Computer Vision and Pattern Recognition.

[Hogg] D. Hogg. Model-based vision: a program to see a walking person. Image and Vision Computing.

[Hoogenboom and Lew] R. Hoogenboom and M. Lew. Face detection using local maxima. In Proceedings of the International Conference on Automatic Face and Gesture Recognition.

[IBM] IBM. wwwqbic.almaden.ibm.com.

[Isard and Blake] M. Isard and A. Blake. Condensation: conditional density propagation for visual tracking. International Journal of Computer Vision.

[Itti and Koch] L. Itti and C. Koch. A comparison of feature combination strategies for saliency-based visual attention systems. In Human Vision and Electronic Imaging, SPIE, January.

[Itti et al.] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[Joachims] T. Joachims. Text categorization with support vector machines. Technical Report LS-8, University of Dortmund, November.

[Kahn and Swain] R. Kahn and M. Swain. Understanding people pointing: the Perseus system. In International Symposium on Computer Vision.

[Kalinke et al.] T. Kalinke, C. Tzomakas, and W. van Seelen. A texture-based object detection and an adaptive model-based classification. In Proceedings of Intelligent Vehicles.

[Kober et al.] R. Kober, J. Schiffers, and K. Schmidt. Model-based versus knowledge-guided representation of non-rigid objects: a case study. In Proceedings of the International Conference on Image Processing.

[Kotropoulos and Pitas] C. Kotropoulos and I. Pitas. Rule-based face detection in frontal views. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.

[Kruger et al.] N. Kruger, M. Potzsch, and C. von der Malsburg. Determination of face position and pose with a learned representation based on labelled graphs. Image and Vision Computing.

[Kwon and Lobo] Y. Kwon and N. Lobo. Face detection using templates. In Proceedings of the International Conference on Image Processing.

[Lakany and Hayes] H. Lakany and G. Hayes. An algorithm for recognising walkers. In J. Bigun, G. Chollet, and G. Borgefors, editors, Audio- and Video-based Biometric Person Authentication. IAPR, Springer.

[Leung and Yang (a)] M.K. Leung and Y.H. Yang. Human body motion segmentation in a complex scene. Pattern Recognition.

[Leung and Yang (b)] M.K. Leung and Y.H. Yang. A region based approach for human body analysis. Pattern Recognition.

[Leung et al.] T.K. Leung, M.C. Burl, and P. Perona. Finding faces in cluttered scenes using random labeled graph matching. In Fifth International Conference on Computer Vision, June.

[Lew and Huang] M. Lew and T. Huang. Optimal supports for image matching. In Digital Signal Processing Workshop Proceedings.

[Lipson et al.] P. Lipson, W. Grimson, and P. Sinha. Configuration based scene classification and image indexing. In Proceedings of Computer Vision and Pattern Recognition.

[Lipson] P. Lipson. Context and Configuration Based Scene Classification. PhD thesis, Massachusetts Institute of Technology.

[Lycos Inc.] Lycos Inc. www.lycos.com.

[MacKay] D. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation.

[Mallat] S.G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, July.

[Maybeck] P. Maybeck. Stochastic Models, Estimation, and Control. Mathematics in Science and Engineering, Academic Press, Inc.

[McKenna and Gong] S. McKenna and S. Gong. Non-intrusive person authentication for access control by visual tracking and face recognition. In J. Bigun, G. Chollet, and G. Borgefors, editors, Audio- and Video-based Biometric Person Authentication. IAPR, Springer.

[Micheli-Tzanakou and Marsic] E. Micheli-Tzanakou and I. Marsic. Object detection and recognition in a multiresolution representation. In International Joint Conference on Neural Networks.

[Mirhosseini and Yan] A. Mirhosseini and H. Yan. Neural networks for learning human facial features from labeled graph models. In Conference on Intelligent Information Systems.

[Moghaddam and Pentland] B. Moghaddam and A. Pentland. Probabilistic visual learning for object detection. In Proceedings of the International Conference on Computer Vision.

[Mohan et al.] A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. IEEE Transactions on Pattern Analysis and Machine Intelligence, in preparation.

[Mohan] A. Mohan. Robust Object Detection in Images by Components. Master's thesis, Massachusetts Institute of Technology, May.

[Murase and Nayar] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision.

[Murtagh and Saunders] B. Murtagh and M. Saunders. MINOS User's Guide. System Optimization Laboratory, Stanford University, February.

[Nelson and Illingworth] M. Nelson and W. Illingworth. A Practical Guide to Neural Nets. Addison-Wesley.

[Niyogi et al.] P. Niyogi, C. Burges, and P. Ramesh. Capacity control in support vector machines and their application to feature detection. To be submitted.

[Osuna et al. (a)] E. Osuna, R. Freund, and F. Girosi. Support Vector Machines: Training and Applications. AI Memo, MIT Artificial Intelligence Laboratory.

[Osuna et al. (b)] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection. In Proceedings of Computer Vision and Pattern Recognition.

[Papageorgiou et al.] C. Papageorgiou, T. Evgeniou, and T. Poggio. A trainable pedestrian detection system. In Proceedings of Intelligent Vehicles, October.

[Philips] P. Philips. Matching pursuit filter design. In Proceedings of the International Conference on Pattern Recognition.

[Platt] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press.

[Qian and Huang] R. Qian and T. Huang. Object detection using hierarchical MRF and MAP estimation. In Proceedings of Computer Vision and Pattern Recognition.

[Quinlan] J.R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence.

[Rajagopalan et al.] A. Rajagopalan, P. Burlina, and R. Chellappa. Higher order statistical learning for vehicle detection in images. In Proceedings of the International Conference on Computer Vision.

[Ratan and Grimson] A. Lakshmi Ratan and W.E.L. Grimson. Training templates for scene classification using a few examples. In Content-Based Access of Image and Video Libraries.

[Ratan et al.] A. Lakshmi Ratan, W.E.L. Grimson, and W. Wells III. Object detection and localization by dynamic template warping. In Proceedings of Computer Vision and Pattern Recognition, June.

[Reiter and Matas] M. Reiter and J. Matas. Object detection with a varying number of eigenspace projections. In Proceedings of the International Conference on Pattern Recognition.

[Rikert et al.] T. Rikert, M. Jones, and P. Viola. A cluster-based statistical model for object detection. In Proceedings of the International Conference on Computer Vision.

[Rohr] K. Rohr. Incremental recognition of pedestrians from image sequences. In Proceedings of Computer Vision and Pattern Recognition.

[Rowley et al.] H.A. Rowley, S. Baluja, and T. Kanade. Human face detection in visual scenes. Technical Report CMU-CS, School of Computer Science, Carnegie Mellon University, July/November.

[Rowley et al.] H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. Technical Report CMU-CS, School of Computer Science, Carnegie Mellon University, December.

[Rowley et al.] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, January.

[Saber and Tekalp] E. Saber and A. Tekalp. Face detection and facial feature extraction using color, shape and symmetry-based cost functions. In Proceedings of the International Conference on Pattern Recognition.

[Schapire et al.] R.E. Schapire, Y. Freund, P. Bartlett, and W.S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, to appear.

[Schneiderman and Kanade] H. Schneiderman and T. Kanade. Probabilistic modeling of local appearance and spatial relationships for object recognition. In Proceedings of Computer Vision and Pattern Recognition.

[Shams and Spoelstra] Ladan Shams and Jacob Spoelstra. Learning Gabor-based features for face detection. In Proceedings of the World Congress in Neural Networks, International Neural Network Society, September.

[Sinha (a)] P. Sinha. Object recognition via image invariants: a case study. In Investigative Ophthalmology and Visual Science, Sarasota, Florida, May.

[Sinha (b)] P. Sinha. Qualitative image-based representations for object recognition. AI Memo, MIT Artificial Intelligence Laboratory.

[Stollnitz et al.] E.J. Stollnitz, T.D. DeRose, and D.H. Salesin. Wavelets for computer graphics: a primer. Technical Report, Department of Computer Science and Engineering, University of Washington, September.

[Sullivan et al.] M. Sullivan, C. Richards, C. Smith, O. Masoud, and N. Papanikolopoulos. Pedestrian tracking from a stationary camera using active deformable models. In Proceedings of Intelligent Vehicles.

[Sung and Poggio] K.K. Sung and T. Poggio. Example-based learning for view-based human face detection. AI Memo, MIT Artificial Intelligence Laboratory, December.

[Sung and Poggio] K.K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, January.

[Sung] K.K. Sung. Learning and Example Selection for Object and Pattern Detection. PhD thesis, MIT Artificial Intelligence Laboratory, December.

[Takatoo et al.] M. Takatoo, C. Onuma, and Y. Kobayashi. Detection of objects including persons using image processing. In Proceedings of the International Conference on Pattern Recognition.

[Tsukiyama and Shirai] T. Tsukiyama and Y. Shirai. Detection of the movements of persons from a sparse sequence of TV images. Pattern Recognition.

[Vaillant et al.] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEE Proceedings on Vision, Image, and Signal Processing, August.

[Vapnik] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag.

[Vapnik] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York.

[Venkatraman and Govindaraju] M. Venkatraman and V. Govindaraju. Zero crossings of a non-orthogonal wavelet transform for object location. In Proceedings of the International Conference on Image Processing.

[Virage Inc.] Virage Inc. www.virage.com.

[Wahba] G. Wahba. Spline Models for Observational Data. Series in Applied Mathematics, SIAM, Philadelphia.

[Weber and Casasent] D. Weber and D. Casasent. Quadratic Gabor filters for object detection. In Proceedings of the Symposium on Communications and Signal Processing.

[Wolpert] D. Wolpert, editor. The Mathematics of Generalization: The Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning. Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley.

[Wren et al.] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: real-time tracking of the human body. Technical Report, MIT Media Laboratory.

[Wu et al.] H. Wu, Q. Chen, and M. Yachida. Face detection from color images using a fuzzy pattern matching method. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[Yahoo Inc.] Yahoo Inc. www.yahoo.com.

[Yang and Huang] G. Yang and T. Huang. Human face detection in a scene. In Proceedings of Computer Vision and Pattern Recognition.

[Yang and Huang] G. Yang and T. Huang. Human face detection in a complex background. Pattern Recognition.

[Yow and Cipolla (a)] K. Yow and R. Cipolla. A probabilistic framework for perceptual grouping of features for human face detection. In Proceedings of the International Conference on Automatic Face and Gesture Recognition.

[Yow and Cipolla (b)] Kin Choong Yow and R. Cipolla. Detection of human faces under scale, orientation and viewpoint variations. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, October.

[Yow and Cipolla] Kin Choong Yow and R. Cipolla. Feature-based human face detection. Image and Vision Computing, September.

[Yuille et al.] A. Yuille, P. Hallinan, and D. Cohen. Feature extraction from faces using deformable templates. International Journal of Computer Vision.

[Yuille] A. Yuille. Deformable templates for face recognition. Journal of Cognitive Neuroscience.