Multi-Modal Approaches to Real-Time Video Analysis

Team Members: All team members are proposed as Key Personnel.
Garrett T. Kenyon* [email protected] 505-667-1900
Dylan Paiton** [email protected] 505-720-5074
Pete Schultz [email protected]
Vicente Malave [email protected] 619-800-4113
*lead developer, **alternate lead developer

Description of Capability to be Delivered
We intend to provide users with a high-performance (MPI-based, multi-core, GPU-accelerated) software tool that uses biologically inspired, hierarchical neural networks to detect objects of interest in streaming video by combining texture/color, shape, and motion/depth cues. The system will be capable of processing video data or still imagery. The software to be provided will build upon an existing high-performance, open-source, NSF-funded neural simulation toolbox called PetaVision.
Last fall, several members of our team worked closely together in an analogous boiler-room atmosphere while competing in the DARPA NeoVision2 shootout. The close environment and hard deadline allowed for scientific advances at an accelerated rate and forced us to find innovative ways to combine traditionally independent algorithms. In particular, we found that by combining previously independent efforts to model different aspects of cortical visual processing, including local color/texture patches, long-range shape processing, and relative depth from motion, we obtained better detection and localization accuracy than was possible using any of the individual processing techniques alone. By integrating a variety of new approaches to the implementation of these separate visual processing modalities and combining them in a single simulation environment, we expect to produce dramatic improvements in the state of the art of object detection and localization in high-definition video. This approach is inspired by the neuroscience literature, which has shown that each of these processing modalities is present as an anatomically and physiologically distinct pathway in the ventral processing stream.

Novelty of our Approach
It has been observed in the primate visual cortex that independent features are processed via separate pathways from early processing areas (V1, V2) through object classification areas (V4, PIT). Although biologically inspired “deep learning” models have been shown to achieve state-of-the-art results for classification, they are all essentially feedforward models. Our intent is to implement additional neuroscience theories in order to achieve improved object classification and localization. For example, intra-layer lateral interactions have traditionally been ignored. We will implement separate feature extraction models (i.e., color/texture, shape, motion) that incorporate lateral interactions between features. We have demonstrated the feasibility of this procedure with models that mimic processing in V1 and V2, and we wish to explore innovative ways to combine the pathways, similar to what is done in V4 and PIT.
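To make the notion of lateral interactions between feature detectors concrete, the following is a minimal sketch, not PetaVision code, of sparse coding in which overlapping dictionary elements compete with one another, in the spirit of a locally competitive algorithm. The function name, dictionary size, threshold, time constant, and iteration count are illustrative assumptions, not parameters of the proposed models.

```python
# Minimal sketch (not PetaVision code): sparse coding of an image patch with
# lateral competition between learned feature detectors, in the spirit of a
# locally competitive algorithm. All names and parameters are illustrative.
import numpy as np

def sparse_code(x, D, lam=0.1, tau=10.0, n_steps=200):
    """Infer sparse activations a such that D @ a approximates the patch x.

    x : (n_pixels,) flattened image patch
    D : (n_pixels, n_features) dictionary of learned feature detectors
    """
    b = D.T @ x                       # feedforward drive to each feature
    G = D.T @ D - np.eye(D.shape[1])  # lateral interaction: overlap between features
    u = np.zeros(D.shape[1])          # internal (membrane-like) state
    for _ in range(n_steps):
        a = np.where(np.abs(u) > lam, u, 0.0)  # thresholded (sparse) activations
        u += (b - u - G @ a) / tau             # active features suppress overlapping ones
    return np.where(np.abs(u) > lam, u, 0.0)

# Example: code a random patch with a random, column-normalized dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((256, 512))
D /= np.linalg.norm(D, axis=0)
a = sparse_code(rng.standard_normal(256), D)
print("active features:", np.count_nonzero(a))
```

In this sketch the lateral term G plays the role of intra-layer interactions: only a small, mutually consistent subset of detectors remains active after competition.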
Proposed Phase 1 Demonstration
At the end of Phase 1, we propose to deliver a proof of concept of the following hypotheses, all implemented and tested within the PetaVision framework.

Hypothesis I - Texture/Color Patch Classification (“thin stripes” in processing area V2): Better classification of color/texture features can be achieved by using lateral interactions between learned feature detectors to exploit complex spatial structure in the visual environment. The lateral interactions, which have been shown to be widely prevalent in visual cortex layers, allow the system to learn temporal co-occurrences between feature detectors, thus creating a more robust set of features than those generated by the traditional approach of adding a layer on top of the set of learned feature detectors. Processing of texture/color patches in the visual cortex is thought to be associated with V2 thin stripes, as identified by cytochrome oxidase staining reflecting relative metabolic activity and by optical and electrophysiological recording techniques, which indicate primarily color-opponent responses. Previous work by ourselves and others has shown that dictionaries of color/texture features can be learned by optimizing a cost function that rewards accurate reconstruction of the sensory input using only a sparsely activated subset of the learned dictionary elements. We and others have further shown that these sparsely activated feature detectors can be clustered into semantically meaningful categories, yielding state-of-the-art performance on object detection tasks. Here, we propose to extend these processing hierarchies to construct deep, generative architectures.

Hypothesis II - Shape Classification (“inter stripes” or “pale stripes” in V2): Better detection of object shapes can be achieved by hierarchically learning higher-order correlations between edge features within and between multiple spatial scales. Processing of shape in the visual cortex is thought to be associated with V2 inter stripes, where a preponderance of orientation-selective neurons has been observed. Previous work by ourselves and others has shown that lateral synaptic interactions between orientation-selective elements, often referred to as cortical association fields, can support the viewpoint-invariant detection of smooth contours. Here, we propose to extend these results by mimicking the ability of cortical neurons, via their extensive dendritic trees, to process highly non-linear combinations of features, and the ability of cortical networks to act at multiple spatial scales.

Hypothesis III - Relative Depth from Motion (“thick stripes” in V2): Better resolution of local pattern motion can be achieved by using lateral interactions to resolve aperture effects. Processing of relative depth in the visual cortex is thought to be associated with V2 thick stripes, where motion- and stereo-disparity-driven responses have been observed. Previous work by ourselves and others has shown that depth from relative motion can be extracted using cortically inspired models based on motion-energy filters. Here we wish to augment our existing model of the dorsal motion pathway of the primate cortex to include a “ventral stream”, based on observations of V2 and V4 neurons, to compute the relative motion of objects and their depth from the observer.

Proposed Phase 2 Demonstration
At the end of Phase 2, we propose to combine the three processing modalities into a package for object detection and localization in streaming video. The methodology for combining the processing modes will be inspired by neurophysiological observations in the ventral visual areas V4 and PIT.
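Purely as an illustration of what combining the three streams could look like at the output level, the sketch below fuses per-pixel confidence maps from the color/texture, shape, and motion/depth pathways into a single detection map. The weighted geometric-mean rule, the function name, and the weights are assumptions made for the example only; they are not the V4/PIT-inspired combination mechanism to be developed in Phase 2.

```python
# Illustrative sketch only: fusing per-pixel confidence maps from three
# independent processing streams (color/texture, shape, motion/depth) into a
# single detection map. The weighted geometric-mean rule and the weights are
# placeholder assumptions, not the proposed V4/PIT-inspired mechanism.
import numpy as np

def fuse_streams(color_map, shape_map, motion_map, weights=(1.0, 1.0, 1.0)):
    """Each *_map is an (H, W) array of detection confidences in [0, 1]."""
    eps = 1e-6                                                  # avoid log(0)
    maps = np.stack([color_map, shape_map, motion_map])         # (3, H, W)
    w = np.asarray(weights)[:, None, None]
    return np.exp((w * np.log(maps + eps)).sum(0) / w.sum())    # weighted geometric mean

# Example with random confidence maps standing in for real stream outputs.
rng = np.random.default_rng(1)
h, w = 72, 128
fused = fuse_streams(rng.random((h, w)), rng.random((h, w)), rng.random((h, w)))
print("peak detection confidence:", float(fused.max()))
```

A multiplicative fusion of this kind requires agreement across modalities before declaring a detection; the actual combination rule will be derived from the neurophysiological data described above.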
Technical Approach
Our overall technical approach has been described in several recent publications, listed below.

Team Qualifications
In addition to the attached CVs, our team qualifications are attested to by the following recent publications involving team members:
● Paiton, D.M., Kenyon, G.T., Brumby, S.P., Modeling Two Functionally Distinct Ventral Pathways Representing Static Form and Color/Texture, NIPS 2012 (submitted).
● Schultz, P.F., Bettencourt, L.M.A., Kenyon, G.T., A Symmetry-Breaking Generative Model of a Simple-Cell/Complex-Cell Hierarchy, SSIAI 2012.
● Paiton, D.M., Brumby, S.P., Kenyon, G.T., Kunde, G.J., Peterson, K.D., Ham, M.I., Schultz, P.F., George, J.S., Combining Multiple Visual Processing Streams for Locating and Classifying Objects in Video, SSIAI 2012.
● Gintautas, V., Ham, M.I., Kunsberg, B., Barr, S., Brumby, S.P., Rasmussen, C., George, J.S., Nemenman, I., Bettencourt, L.M.A., Kenyon, G.T., Model Cortical Association Fields Account for the Time Course and Dependence on Target Complexity of Human Contour Perception, PLoS Comput. Biol., 7(10), 2011.
● Brumby, S.P., Kenyon, G.T., Landecker, W., Rasmussen, C., Swaminarayan, S., Bettencourt, L.M.A., Large-Scale Functional Models of Visual Cortex for Remote Sensing, AIPR 2009.

Garrett Kenyon
MS-D454, P-21, Physics Division, LANL, Los Alamos, NM 87545, [email protected], 505-667-1900

Education
1990 Ph.D., Physics, University of Washington
1986 M.S., Physics, University of Washington
1984 B.A., Physics, University of California at Santa Cruz

Research and Professional Experience
2001-present Technical Staff Member, Biological and Quantum Physics (P-21), LANL
1992-2000 Postdoc, U. of Texas Med. School, Dept. of Neurobiology and Anatomy
1990-1992 Postdoc, Baylor College of Medicine, Division of Neuroscience

Publications
● Paiton, D.M., Kenyon, G.T., Brumby, S.P., Modeling Two Functionally Distinct Ventral Pathways Representing Static Form and Color/Texture, NIPS 2012 (submitted).
● Landecker, W., Thomure, M., Bettencourt, L.M.A., Kenyon, G.T., Mitchell, M., Brumby, S.P., Contribution Propagation: Explaining Classifications in Hierarchical Models, ICANN 2012 (submitted).
● Schultz, P.F., Bettencourt, L.M.A., Kenyon, G.T., A Symmetry-Breaking Generative Model of a Simple-Cell/Complex-Cell Hierarchy, SSIAI 2012.
● Paiton, D.M., Brumby, S.P., Kenyon, G.T., Kunde, G.J., Peterson, K.D., Ham, M.I., Schultz, P.F., George, J.S., Combining Multiple Visual Processing Streams for Locating and Classifying Objects in Video, SSIAI 2012.
● Gintautas, V., Ham, M.I., Kunsberg, B., Barr, S., Brumby, S.P., Rasmussen, C., George, J.S., Nemenman, I., Bettencourt, L.M.A., Kenyon, G.T., Model Cortical Association Fields Account for the Time Course and Dependence on Target Complexity of Human Contour Perception, PLoS Comput. Biol., 7(10), 2011.
● Ji, Z., Huang, W.,