Tensor)Algebraic Frameworkfor Computer Graphics,Computer Vision, and Machine Learning
Total Page:16
File Type:pdf, Size:1020Kb
AMULTILINEAR (TENSOR)ALGEBRAIC FRAMEWORK FOR COMPUTER GRAPHICS,COMPUTER VISION, AND MACHINE LEARNING by M. Alex O. Vasilescu A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Computer Science University of Toronto Copyright c 2009 by M. Alex O. Vasilescu A Multilinear (Tensor) Algebraic Framework for Computer Graphics, Computer Vision, and Machine Learning M. Alex O. Vasilescu Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2009 Abstract This thesis introduces a multilinear algebraic framework for computer graphics, computer vision, and machine learning, particularly for the fundamental purposes of image synthesis, analysis, and recog- nition. Natural images result from the multifactor interaction between the imaging process, the scene illumination, and the scene geometry. We assert that a principled mathematical approach to disentan- gling and explicitly representing these causal factors, which are essential to image formation, is through numerical multilinear algebra, the algebra of higher-order tensors. Our new image modeling framework is based on (i) a multilinear generalization of principal compo- nents analysis (PCA), (ii) a novel multilinear generalization of independent components analysis (ICA), and (iii) a multilinear projection for use in recognition that maps images to the multiple causal factor spaces associated with their formation. Multilinear PCA employs a tensor extension of the conventional matrix singular value decomposition (SVD), known as the M-mode SVD, while our multilinear ICA method involves an analogous M-mode ICA algorithm. As applications of our tensor framework, we tackle important problems in computer graphics, com- puter vision, and pattern recognition; in particular, (i) image-based rendering, specifically introducing the multilinear synthesis of images of textured surfaces under varying view and illumination conditions, a new technique that we call “TensorTextures”, as well as (ii) the multilinear analysis and recognition of facial images under variable face shape, view, and illumination conditions, a new technique that we call “TensorFaces”. In developing these applications, we introduce a multilinear image-based rendering algorithm and a multilinear appearance-based recognition algorithm. As a final, non-image-based ap- plication of our framework, we consider the analysis, synthesis and recognition of human motion data using multilinear methods, introducing a new technique that we call “Human Motion Signatures”. ii Acknowledgements First and foremost, I am grateful to my advisor, Professor Demetri Terzopoulos. Although initially resisting the direction of my research, he eventually became its biggest supporter. This thesis would not have materialized in its present form without his valuable input and guidance. I am indeed fortunate to have had him as my advisor. I thank the other members of my PhD Thesis Committee, Professors Geoffrey Hinton, Alan Jep- son, and David Fleet, for their constructive critique, which improved the quality of this dissertation. I also appreciate their flexibility in accommodating my departmental and senate defenses into their busy schedules. I am grateful to Professor Amnon Shashua of The Hebrew University of Jerusalem for agreeing to serve as the external examiner of my thesis on very short notice and for reviewing my dissertation in record time. My thanks also go to Greg Ward, who provided encouragement and comments on a draft of my SIGGRAPH paper on “TensorTextures”. Many other people at the University of Toronto (UofT), New York University (NYU), the Mas- sachusetts Institute of Technology (MIT), and elsewhere deserve special mention and my appreciation: At the UofT, I thank Professor John Tsotsos for his interest in my work as well as for his kindness and encouragement. I had countless enjoyable discussions about computer vision and related topics with Chakra Chennubhotla. I also thank Genevieve` Arboit, Andrew Brown, Kiam Choo, Florin Cutzu, Petros Faloutsos, Steven Myers, Michael Neff, Victor Ng, Alberto Paccanaro, Sageev Oore, Faisal Qureshi, Brian Sallans, Corina Wang, and Howard Zhang for their comradery. Genevieve,` Howard, Kiam, and Steven generously volunteered their time and bodies for motion capture data acquisition. My work on human motion signatures started with an overambitious term project for a graphics course taught by Professors Eugene Fiume and James Stewart. Julie Weedmark kindly assisted in dealing with the administrative and financial obstacles that were unjustly imposed by the UofT School of Graduate Studies. At NYU, I thank the MRL/CAT faculty, Professors Davi Geiger, Ken Perlin, Chris Bregler, Denis Zorin, Yann LeCun, Jack Schwartz, and last but not least Mike Uretsky for encouragement and inter- esting discussions. Professor Michael Overton deserves a special thanks for pointing me to the work on tensors by Dr. Tamara Kolda at Sandia. Finally, I have fond memories of interactions with Aaron Hertzmann, Evgueni Parilov, Wei Shao, Elif Tosun, Lexing Ying, and many other members of the MRL and CAT. I particularly thank Jared Silver for his valuable assistance with 3D modeling, animation, and iii A/V production, and Svetlana Stenchikova for exploring some programming issues. At MIT, I am grateful to Professor Rosalind Picard for bringing me into the Media Lab and for her generous support during my two years there. I thank Professors John Maeda, Alex Pentland, and Marvin Minsky for helpful discussions. I also had memorable conversations with Ian Eslick, my office mate at the Media Lab. Several years earlier, while I was at the UofT, I was inspired by Professor Josh Tenenbaum’s bilinear models, and he generously provided me some of his code. I would like to gratefully acknowledge my amicable interactions with members of the “tensor de- composition” research community. In particular, I thank Tamara Kolda, Lieven de Lathauwer, Pietr Kroonenberg, Rasmus Bro, Pierre Comon, Jos ten Berge, Henk Kiers, Lek-Heng Lim, and Charlie van Loan. I am also grateful to (the late) Richard Harshman and Gene Golub for their interest in my work and their encouragement. Motion data were collected at the Gait Laboratory of the Bloorview MacMillan Medical Centre in Toronto. The data acquisition work was done with the permission of Professor Stephen Naumann, Di- rector of the Rehabilitation Engineering Department and with the helpful assistance of Mr. Alan Morris. The 3D Morphable faces database was obtained courtesy of Prof. Sudeep Sarkar of the University of South Florida as part of the USF HumanID 3D database. The portions of this research that were carried out at the UofT were funded in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada through grants to Demetri Terzopou- los, and at NYU in part by the Technical Support Working Group (TSWG) of the US Department of Defense through a grant to him and me. Finally, I am grateful for the essential financial support that my parents, George and Ioanna Vasilescu, provided to me while I was a student and I greatly appreciate their sacrifices. iv Contents Abstract ............................................. ii Acknowledgements ...................................... iii Contents ............................................ vi List of Tables .......................................... vii List of Figures ......................................... ix List of Algorithms ....................................... x Love and Tensor Algebra ................................... xi 1 Introduction 1 1.1 Data Analysis and Multilinear Representations ..................... 2 1.2 Thesis Contributions ................................... 4 1.3 Thesis Overview ..................................... 8 2 Related Work 10 2.1 Linear and Multilinear Algebra for Factor Analysis ................... 10 2.2 Bilinear and Multilinear Analysis in Vision, Graphics, and Learning .......... 13 2.3 Background on Image-Based Rendering ......................... 15 2.4 Background on Appearance-Based Facial Recognition ................. 16 2.5 Background on Human Motion Synthesis, Analysis, and Recognition .......... 17 3 Linear and Multilinear Algebra 19 3.1 Relevant Linear (Matrix/Vector) Algebra ......................... 20 3.2 Relevant Multilinear (Tensor) Algebra .......................... 21 3.2.1 Tensor Decompositions and Dimensionality Reduction ............. 25 4 The Tensor Algebraic Framework 32 4.1 PCA and ICA ....................................... 33 4.2 Multilinear PCA (MPCA) ................................. 36 4.3 Multilinear ICA (MICA) ................................. 40 4.4 Kernel MPCA/MICA, Multifactor LLE/Isomap/LE, and Other Generalizations .... 43 v 5 Multilinear Image Synthesis: TensorTextures 48 5.1 Multilinear BTF Analysis ................................. 51 5.2 Computing TensorTextures ................................ 54 5.3 Texture Representation and Dimensionality Reduction ................. 55 5.4 Multilinear Image-Based Rendering ........................... 57 5.5 Rendering on Curved Surfaces .............................. 58 5.6 Additional Results .................................... 59 6 Multilinear Image Analysis: TensorFaces 64 6.1 The TensorFaces Representation ............................. 65 6.2 Strategic Dimensionality Reduction ........................... 68 6.3 Application to a Morphable Faces Database ....................... 70 6.4 The Independent TensorFaces Representation .....................