Application of Stochastic Grammars to Understanding Action

Application of sto chastic grammars to understanding action by Yuri A Ivanov MS Computer Science State AcademyofAir and Space Instrumentation St Petersburg Russia February Submitted to the Program in Media Arts and Sciences Scho ol of Architecture and Planning in partial fulllmentofthe requirements for the degree of MASTER OF SCIENCE IN MEDIA ARTS AND SCIENCES at the Massachusetts Institute of Technology February c Massachusetts Institute of Technology All Rights Reserved Signature of Author Program in Media Arts and Sciences January Certied by Aaron F Bobick LG Electronics Inc Career Development AssistantProfessor of Computational Vision Program in Media Arts and Sciences Thesis Sup ervisor Accepted by Stephen A Benton Chairp erson Departmental Committee on Graduate Students Program in Media Arts and Sciences Application of sto chastic grammars to understanding action by Yuri A Ivanov Submitted to the Program in Media Arts and Sciences Scho ol of Architecture and Planning on January in partial fulllmentofthe requirements for the degree of Master of Science in Media Arts and Sciences Abstract This thesis addresses the problem of using probabilistic formal languages to describ e and understand actions with explicit structure The thesis explores several mechanisms of pars ing the uncertain input string aided byastochastic contextfree grammar This metho d originating in sp eechrecognition lets us combine a statistical recognition approach with a syntactical one in a unied syntacticsemantic framework for action recognition The basic approachistodesignthe recognition system in a twolevel architecture The rst level a set of indep endently trained comp onentevent detectors pro duces the likeli ho o ds ofeach comp onentmodel The outputs of these detectors provide the input stream for a sto chastic contextfree parsing mechanism Any decisions ab out supp osed structure of the input are deferred to the parser whichattempts to combine the maximum amountof the candidate events into a mostlikely sequence according to a given Sto chastic ContextFree Grammar SCFG The grammar and parser enforce longer range temp oral constraints dis ambiguate or correct uncertain or mislab eled lowlevel detections and allowthe inclusion of a priori knowledge ab out the structure of temp oral events in a given domain The metho d takes into consideration the continuous character of the input and p erforms structural rectication of it in order to accountfor misalignments and ungrammatical symbols in the stream The presented technique of sucharectication uses the structure probability maximization to drivethesegmentation Wedescribea metho d of achieving this goal to increase recognition rates of complex sequential actions presenting the task of segmenting and annotating input stream in various sensor mo dalities Thesis Sup ervisor Aaron F Bobick Title Assistant Professor of Computational Vision Application of sto chastic grammars to understanding action by Yuri A Ivanov The following p eople served as readers for this thesis Reader Alex Pentland Toshiba Professor of Media Arts and Sciences Media Lab oratoryMassachusetts Institute of Technology Reader Walter Bender Asso ciate Director for Information Technology Media Lab oratoryMassachusetts Institute of Technology Acknowledgments First and foremost I would liketothank myadvisor professor Aaron Bobick for giving me the opp ortunitytojoin his team at MIT and intro ducing me to machine vision I really appreciated his op enmindedness readiness to help and listen and giving me the freedom to see things myown wayEvery discussion with him taughtmesomething new I consider myself very fortunate to have him as myadvisor b oth p ersonally and professionally he has given me more than I could ever ask for Thanks Aaron Iwould liketothank myreaders Walter Bender and Sandy Pentland for taking the time from their busy schedules to review the thesis and for their comments and suggestions This work would not b e p ossible without the input from all members of the High Level Vision group in no particular order Drew Stephen Lee Claudio Jim John Chris and Martin My sp ecial thanks to Andrew Wilson who had the misfortune of b eing my ocemate for the last year and a half and who had the patience to listen to all myramblings and answer all myendless questions to the degree I never exp ected But I guess this whole thing wouldnt even have happ en if George Slowhand Thomas hadnt come along and told me that while I am lo oking for b etter times they pass Thanks bro Thanks to Mom and Dad for giving me all their loveandalways b eing there for me regardless what mo o d I was in Thanks to Margie for going the extra ten miles every time I needed it And nally thanks to mywifeofhasitbeenthat long already years Sarah Thank you for b eing so supp ortive loving patient and understanding Thank you for putting up with all my crazy dreams and making them all worth trying For you I only have one question You know what Contents Intro duction and Motivation Motivation Structure and Content Bayesian View AI View Generative Asp ect Example I Explicit Structure and Disambiguation Example I I Cho osing the Appropriate Mo del Approach Outline of the Thesis Background and Related Work Motion Understanding Pattern Recognition Metho ds of Sp eech Recognition Related Work Sto chastic Parsing Grammars and Languages Notation Basic denitions ExpressivePower of Language Mo del Sto chastic ContextFree Grammars EarleyStolckeParsing Algorithm Prediction Scanning Completion Relation to HMM Viterbi Parsing Viterbi Parse in Presence of Unit Pro ductions Temp oral Parsing of Uncertain Input Terminal Recognition Uncertaintyin the Input String Substitution Insertion Enforcing Temp oral Consistency MisalignmentCost Online Pro cessing Pruning Limiting the Range of Syntactic Grouping Implementation and System Design Comp onent Level HMM Bank Syntax Level Parser Parser Structure Annotation Mo dule Exp erimental Results Exp erimentIRecognition and Disambiguation Recognition and semantic segmentation RecursiveModel Conclusions Summary Evaluation Extensions Final Thoughts List of Figures Example Parsing strategy Error correcting capabilities of syntactic constraints Simple action structure Secretary problem Left Corner Relation graph Viterbi parse tree algorithm Spurious symb ol problem Temp oral consistency Recognition system architecture Example of a single HMM output Comp onent level output SQUARE sequence segmentation STIVE setup Output of the HMM bank Segmentation of a conducting gesture Polhemus Segmentation of a conducting gesture STIVE Recursivemodel List of Tables Complexityofarecursivetask Left Corner Relation matrix and its RT closure Unit Pro duction matrix and its RTclosure Chapter Intro duction and Motivation In the last several years there has b een a tremendous growth in the amountofcomputer vision researchaimed at understanding action As noted byBobick these eorts have ranged from the interpretation of basic movements suchasrecognizing someone walking or sitting to the more abstract task of providing a Newtonian physics description of the motion of several ob jects In particular there has b een emphasis on activities or b ehaviors where the entitytobe recognized maybeconsidered as a sto chastically predictable sequence of states The greatest number of examplescomeform work in gesture recognition where analogies to sp eechand handwriting recognition inspired researchers to devise hidden Markov model metho ds for the classication of gestures The basic premise of the approachisthat the visual phenomena observed can b e considered Markovian in some feature space and that sucient training data exists to automatically learn a suitable mo del to characterize the data Hidden MarkovModels havebecome the weap on of choice of the computer vision communityin the areas where complex sequential signals havetobelearned and recognized The p opularityofHMMs is due to the richness of the mathematical foundation of the mo del its robustness and the fact that many imp ortantproblems are easily solved byapplication of this metho d While HMMs are well suited for mo deling parameters of a sto chastic pro cess with as sumed Markovian prop erties their capabilities are limited when it comes to capturing and expressing the structure of a pro cess If wehavea reason to b elieve that the pro cess has primitivecomponents it might be benecial to separate the interrelations of the pro cess comp onents from their internal statistical prop erties in the representation of the pro cess Suchaseparation lets us avoid overgeneralization and employ the metho ds whichare sp ecic to the analysis of either the structure of the signal or its content Motivation Here weestablish the motivation for our work based on four lines of argument First we show how

Application of Stochastic Grammars to Understanding Action

Computer Vision Stochastic Grammars for Scene Parsing

UNIVERSITY of CALIFORNIA Los Angeles Human Activity

Using an Annotated Corpus As a Stochastic Grammar

GRAMMAR IS GRAMMAR and USAGE IS USAGE Frederick J

W. G. M., a Stochastic Model of Language Change Through Social

The Plasticity of Grammar

Calibrating Generative Models: the Probabilistic Chomsky-Schutzenberger¨ Hierarchy∗

Stochastic Definite Clause Grammars

Stochastic Attribute-Value Grammars

Unsupervised Language Acquisition: Theory and Practice

A Stochastic Grammar of Images

Unsupervised Structure Learning of Stochastic And-Or Grammars