Multi-Modal Approaches to Real-Time Video Analysis

Team Members: All team members are proposed as Key Personnel.

Garrett T. Kenyon*
[email protected], 505-667-1900

Dylan Paiton**
[email protected], 505-720-5074

Pete Schultz
[email protected]

Vicente Malave
[email protected], 619-800-4113

*lead developer, **alternate lead developer

Description of Capability to be Delivered

We intend to provide users with a high-performance (MPI-based, multi-core, GPU-accelerated) software tool that uses biologically inspired, hierarchical neural networks to detect objects of interest in streaming video by combining texture/color, shape, and motion/depth cues. The system will be capable of processing video data or still imagery. The software will build upon PetaVision, an existing high-performance, open-source, NSF-funded neural simulation toolbox.

Last fall, several members of our team worked closely together in an analogous boiler-room atmosphere while competing in the DARPA NeoVision2 shootout. The close environment and hard deadline accelerated scientific progress and forced us to find innovative ways to combine traditionally independent algorithms. In particular, we found that by combining previously independent efforts to model different aspects of cortical visual processing, including local color/texture patches, long-range shape processing, and relative depth from motion, we obtained better detection and localization accuracy than was possible using any of the individual processing techniques alone. By integrating a variety of new approaches to the implementation of these separate visual processing modalities and combining them into a single simulation environment, we expect to produce dramatic improvements in the state of the art of object detection and localization in high-definition video.
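To make the multi-modal combination concrete, the sketch below illustrates one simple way per-modality outputs could be fused: each modality (color/texture, shape, motion/depth) produces a per-pixel evidence map for an object class, and the maps are combined by a weighted sum and thresholded to propose candidate detections. This is a hypothetical illustration only; the function names, the weighted-sum fusion rule, and the fixed weights are assumptions made for exposition and do not describe the PetaVision implementation or the fusion scheme used in the proposed work.

import numpy as np

# Hypothetical sketch: late fusion of per-modality evidence maps.
# Each map is an HxW array of scores in [0, 1] for one object class,
# e.g. produced by separate color/texture, shape, and motion/depth models.
# The weighted-sum rule and the weights below are illustrative assumptions,
# not the PetaVision implementation.

def fuse_evidence_maps(maps, weights):
    """Combine per-modality evidence maps into a single fused score map."""
    maps = [np.asarray(m, dtype=float) for m in maps]
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize so fused scores stay in [0, 1]
    return sum(w * m for w, m in zip(weights, maps))

def propose_detections(score_map, threshold=0.5):
    """Return (row, col, score) tuples where the fused score exceeds the threshold."""
    rows, cols = np.nonzero(score_map > threshold)
    return [(int(r), int(c), float(score_map[r, c])) for r, c in zip(rows, cols)]

if __name__ == "__main__":
    h, w = 64, 64
    rng = np.random.default_rng(0)
    # Stand-ins for the three modality outputs on a single video frame.
    color_texture = rng.random((h, w))
    shape = rng.random((h, w))
    motion_depth = rng.random((h, w))

    fused = fuse_evidence_maps(
        [color_texture, shape, motion_depth],
        weights=[0.4, 0.4, 0.2],               # illustrative weights only
    )
    detections = propose_detections(fused, threshold=0.9)
    print(f"{len(detections)} candidate detections above threshold")

A per-pixel weighted sum is only one of many possible fusion strategies; the point of the sketch is simply that evidence from independently computed modalities can be combined into a single detection and localization decision, which is the integration the proposed system would perform within one simulation environment.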