IEEE Applied Imagery Pattern Recognition Workshop

Dedicated to facilitating the interchange of ideas between government, industry, and academia

45th AIPR Workshop

Imaging and Artificial Intelligence: Intersection and Synergy

Cosmos Club, Washington, D.C. October 18-20, 2016

AIPR 2016 is sponsored by the IEEE Computer Society Technical Committee on Pattern Analysis and Machine Intelligence with generous support from Cybernet Systems Corporation, Descartes Labs, Inc., and The Charles Stark Draper Laboratory, Inc.

AIPR Workshop Committee

Chairman: Peter Doucette, U.S. Geological Survey
Deputy Chair: Daniela Moody, Los Alamos National Laboratory
Program Chairs: Murray H. Loew, George Washington University; Kannappan Palaniappan, University of Missouri-Columbia
Secretary: Carlos Maraviglia, Naval Research Laboratory
Treasurer: Al Williams, Self-employed Engineer
Local Arrangements: Donald J. Gerson, Gerson Photography
Publicity: Peter Costianes, Air Force Research Laboratory, Emeritus
Webmaster: Charles J. Cohen, Cybernet
External Support: John Irvine, Draper Laboratory; Christoph Borel-Donohue, Army Research Laboratory
Registration Chair: Jeff Kretsch, Raytheon BBN Technologies
Proceedings Chair: Franklin Tanner, Hadron Industries
Student Paper Award Chair: Paul Cerkez, DCS Corp.

AIPR Committee Members:

Jim Aanstoos, Mississippi State University
Eamon Barrett, Retired
Bob Bonneau, AFOSR
Filiz Bunyak, University of Missouri-Columbia
John Caulfield, Cyan Systems
Paul Cerkez, DCS Corp.
Charles Cohen, Cybernet
Peter Costianes, AFRL, Emeritus
Peter Doucette, U.S. Geological Survey
Roger Gaborski, RIT
Donald J. Gerson, Gerson Photography
Neelam Gupta, ARL
Mark Happel, Johns Hopkins University
John Irvine, Draper Lab
Steve Israel, Integrity Applications/ONR
Michael D. Kelly, IKCS
Jeff Kretsch, Raytheon BBN Technologies
Murray H. Loew, GWU
Carlos Maraviglia, NRL
Paul McCarley, AFRL Eglin AFB
Robert Mericsko, Booz Allen Hamilton
Keith Monson, FBI
Carsten Oertel, MITRE
William Oliver, Brody School of Medicine
Kannappan Palaniappan, University of Missouri-Columbia
James Pearson, Remote Sensing Consultant
Surya Prasath, University of Missouri-Columbia
Amit Puntambekar, Intel Corporation
Mike Pusateri, LSI Corporation
Harvey Rhody, RIT
Alan Schaum, NRL
David Schaefer, GMU
Guna Seetharaman, NRL
Karl Walli, AFIT
Elmer "Al" Williams, Self-Employed Engineer

Emeritus:
Robert Haralick, City University of New York
Joan Lurie, GCC Inc.
J. Michael Selander, MITRE

2016 AIPR Visionary Sponsors

We are tackling the huge problem of teaching computers how to see. Our underlying technology, deep learning, models the way the human brain works. Though this was just science fiction even five years ago, advances in the speed of computers and the underlying software make it possible now. Further, because sensors can detect beyond the human visual spectrum, computers can see things that humans simply cannot.

Our first project is building a living atlas of the world. We process petabytes of satellite imagery and extract daily insights that drive decision making in agriculture, finance, natural resources, public policy, and more. We work to create real-time global awareness to better understand changes over time and what is happening in our world right now.

In the Press
"Deep-Learning, Image-Analysis Startup Descartes Labs Raises $3.3M After Spinning Out Of Los Alamos National Lab" (techcrunch.com – http://tcrn.ch/1bKgXIH): After being incubated in a government lab for seven years, Descartes Labs — a deep-learning startup — has raised $3.3 million of its own.
"A Cloud-Free Satellite Map of Earth" (technologyreview.com – http://bit.ly/1p2yzXy): Stitching together a live world map from many different satellite images lets algorithms keep an eye on the health of crops and problems like flooding.

We’re Hiring! Our team includes cosmologists, physicists, mathematicians, engineers, computer scientists, and even a philosopher. We need people who are brilliant coders and stellar problem solvers - we welcome different backgrounds. We are headquartered in the fabulous mountains of Los Alamos, NM, and also have an office in San Francisco, CA. If this intrigues you, please reach out to us at [email protected]. We’d love to hear from you!

45th IEEE AIPR Workshop Program, October 2016

Tuesday, October 18: Deep Learning & AI
8:00 AM  Registration opens
Note: For brevity, this schedule lists only the presenter of each paper; all authors and their affiliations are included with each abstract, indexed using the # in the schedule.
9:00-9:45 AM  Jason Matheny, Director, IARPA: Value of Machine Learning for National Security

AI and Image Processing I
9:45-10:05 AM  #20 Real-time detection and classification of traffic light signals; Asaad Said, Intel
10:05-10:25 AM  #79 Machine vision algorithms for robust pallet engagement and stacking; Charles Cohen, Cybernet Systems Corporation

10:25-10:40 AM Coffee Break

Deep Learning for Image Processing I
10:40-11:00 AM  #6 SPARCNN: SPAtially Related Convolutional Neural Networks; J.T. Turner, Knexus Research Corporation
11:00-11:20 AM  #47 Crossview convolutional networks; Nathan Jacobs, University of Kentucky

11:25 AM-12:00 PM  Trevor Darrell, EECS, University of California-Berkeley: Perceptual representation learning across diverse modalities and domains

12:00-1:30 PM Lunch on your own

1:30-2:15 PM  Christopher Rigano, Office of Science and Technology, National Institute of Justice: Applied Computer Science in Criminal Justice

AI and Image Processing II
2:15-2:35 PM  #19 Trajectory based event classification from UAV videos and its evaluation framework; Sung Chun Lee, Object Video, Inc.
2:35-2:55 PM  #56 Transfer learning for high resolution aerial image classification; Yilong Liang, Rochester Institute of Technology
2:55-3:15 PM  #22 Visualization of high dimensional image features for classification; Katie Rainey, SPAWAR Systems Center Pacific

3:15-3:30 PM Coffee Break

3:30-4:05 PM John Kaufhold, Deep Learning Analytics: Deep Learning Past, Present and Near Future

Deep Learning for Image Processing II

4:05-4:25 PM  #60 Hierarchical temporal memory for visual pattern recognition; Jianghao Shen, George Washington University
4:25-4:45 PM  #52 Hierarchical autoassociative polynomial network (HAP Net) for robust face recognition; Theus Aspiras, University of Dayton
4:45-5:05 PM  #63 Convolutional neural network application to plant detection, based on synthetic imagery; Larry Pearlstein, The College of New Jersey

5:10-5:30 PM  Poster Preview
5:30-6:00 PM  Cosmos Club Tour

6:00 PM Poster Session and Reception

Tuesday Poster Session: Poster numbers, titles, and presenting author (full list of authors appears on the Abstract page)

2  CSANG: continuous scale anisotropic Gaussians for robust linear structure extraction; V. B. Surya Prasath, University of Missouri-Columbia
7  Seismic signal analysis using multi-scale/multi-resolution transformations; Millicent Thomas, Northwest University
9  Assessment of three-dimensional printing using visible light sensing and a CAD model; Jeremy Straub, University of North Dakota
17  Object identification for inventory management using convolutional neural network; Nishchal K. Verma, Indian Institute of Technology Kanpur, India
30  Multi-scale transformation based image super-resolution; Soundararajan Ezekiel, IUP
33  Keypoint density-based region proposal for fine-grained object detection and classification using regions with convolutional neural network features; JT Turner, Knexus Research Corporation
35  General purpose object counting algorithm using cascaded classifier and learning; Ricardo Fonseca, Universidad Tecnica Federico Santa Maria
44  Adding a selective multiscale and color feature to LoFT baseline; Noor Al-Shakarji, University of Missouri-Columbia
45  Auto-calibration of multi-projector systems on arbitrary shapes; Mahdi Abbaspour Tehrani, UC-Irvine
54  Boosted ringlet features for robust object tracking; Evan Krieger, University of Dayton
58  Video haze removal and Poisson blending based mini-mosaics for wide area motion imagery; Rumana Aktar, University of Missouri-Columbia
62  Deep Learning and Artificial Intelligence Completeness; Mihaela Quirk, Quirk & Quirk Consulting
66  Stream implementation of the flux tensor motion flow algorithm using Gstreamer and CUDA; Dardo Kleiner, CIPS Corp. @ NRL
78  Enhanced GroupSAC: efficient and reliable RanSAC scheme in the presence of depth variation; Kyung-Min Han, University of Missouri

Wednesday, October 19: High-Performance Computing & Biomedical
8:00 AM  Registration opens
8:45-9:30 AM  Richard Linderman, SES, Office of the Assistant Secretary of Defense, Research and Engineering: DoD Interests in Autonomy and Artificial Intelligence

Scalable Image Processing
9:30-9:50 AM  #26 Scaling analytics for scientific images from experimental instruments; Daniela Ushizima, Lawrence Berkeley National Laboratory
9:50-10:10 AM  #73 Computationally efficient scene categorization in complex dynamic environments; Allison Mathis, Army Research Laboratory
10:10-10:30 AM  #53 Uncertainty of image segmentation algorithms for forensic attribution of materials; Diane Oyen, Los Alamos National Laboratory

10:30-10:45 AM  Coffee Break

Earth & Space Sensing Applications
10:45-11:05 AM  #49 Sparse multi-image control for remotely sensed planetary imagery: the Apollo 15 metric camera; Jason Laura, United States Geological Survey
11:05-11:25 AM  #57 On the utility of temporal data cubes for analyzing land dynamics; Peter Doucette, USGS
11:25-11:45 AM  #14 Training and evaluating detection pipelines with connected components; Reid Porter, Kitware Inc.

11:45 AM-12:20 PM  Vijayakumar Bhagavatula, Associate Dean, Carnegie Mellon University: Innovations in Correlation Filter Design Architectures for Robust Object Recognition

12:20-1:45 PM  Lunch on your own. Executive Committee meeting

1:45-2:30 PM  Patricia Brennan, Director, National Library of Medicine, National Institutes of Health: NLM Anticipating the Third Century: Image Informatics

Biomedical Image Analytics I
2:30-2:50 PM  #5 The National Library of Medicine Pill Image Recognition Challenge: an initial report; Ziv Yaniv, National Library of Medicine
2:50-3:10 PM  #1 Confocal vessel structure segmentation with optimized feature bank and random forests; V. B. Surya Prasath, University of Missouri-Columbia
3:10-3:30 PM  #24 Registration of serial-section electron microscopy image sets for large volume neural circuit reconstruction; Arthur Wetzel, Pittsburgh Supercomputing Center

3:30-3:45 PM  Coffee Break

3:45-4:30 PM  Nick Petrick, Center for Devices and Radiological Health, Food and Drug Administration: Validation of Quantitative Imaging and Computer-aided Diagnosis Tools

Biomedical Image Analytics II
4:30-4:50 PM  #12 Developing a retrieval based diagnostic aid for automated melanoma recognition of dermoscopic images; Mahmudur Rahman, Morgan State University
4:50-5:10 PM  #23 Exploiting the underlying cepstral coefficients for large scale and fine-tuned EKG time-imagery analysis; James LaRue, Jadco Signals

5:15-5:35 PM  Poster Preview
5:45-6:15 PM  Cosmos Club Tour

5:45-7:00 PM Poster Session and Reception

7:00-9:00 PM  Banquet. Terry Sejnowski: Deep Learning II

Wednesday Poster Session: Poster numbers, titles, and presenting author (all authors listed on the Abstract page)

3  Second stage PCA with real and interpolated data for pose angle and translation estimation in handshape recognition; Marlon Oliveira, Dublin City University
8  Use of visible light and infrared sensing capabilities to assess the safety of and detect damage to spacesuits; Jeremy Straub, University of North Dakota
11  Learning adaptable patch sizes for facial action unit detection; Duygu Cakir, Bahcesehir University
21  Gabor filter based entropy and energy features for basic image scene recognition; Balakrishna Gokaraju, University of West Alabama
27  Computer aided diagnostic system for automatic detection of brain tumor through MRI using clustering based segmentation technique and SVM classifier; Asim Ali Khan, Sant Longowal Institute of Engineering and Technology
29  Cloud-based interactive image segmentation using the firefly visualization tool; V. B. Surya Prasath, University of Missouri-Columbia
36  Morphological procedure for mammogram enhancement and registration; Huda Alghaib, Utah Valley University
38  Computational spectral capture and display; Aditi Majumder, UC-Irvine
39  Multi-feature fusion and PCA based approach for efficient human detection; Hussin Ragb, University of Dayton
40  Facial expression recognition based on video; Xin Song, Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology, Beijing
48  Dynamic eye misalignment retroversion system; Constantinos Glynos, National Center of Computer Animation, Bournemouth University
55  Vehicle-pedestrian dynamic interaction through tractography of relative movements and articulated pedestrian pose estimation; Lauren Christopher, Indiana University-Purdue University Indianapolis
71  Exploring image-based indicators of crime and economic well-being in sub-Saharan Africa; Payden McBee, Northeastern University and Draper
76  Face detection at multiple infrared bands; Zhan Yu, Adobe

Thursday, October 20: Big 3D Data & Image Quality
8:15 AM  Registration opens
8:45-9:25 AM  Joe Mundy, Brown University & Vision Systems: 3-D Reasoning from an AI Perspective: History and Future

3D & Big Data I - Session Chair: Jason Schwendenmann, NGA
9:30-9:50 AM  #61 A multiple view stereo benchmark for satellite imagery; Marc Ruiz, JHU
9:50-10:10 AM  #70 Reflectance retrieval from spatial high resolution multi-spectral and hyper-spectral imagery using point clouds; Christoph Borel-Donohue, ARL
10:10-10:30 AM  #77 Rapid development of scientific virtual reality applications; Scott Sorensen, University of Delaware

10:30-10:45 AM Coffee Break

3D & Big Data II - Session Chair: Jason Schwendenmann, NGA
10:45-11:05 AM  #68 Building a living atlas in the cloud to analyze and monitor global patterns; Daniela Moody, Descartes Labs
11:05-11:25 AM  #80 Real-time 3D object recognition and measurement applications with abstract based algorithm (logistics, transportation); SungHo Moon, CurvSurf
11:25-11:45 AM  #81 Effective dataset augmentation for pose and lighting invariant face recognition; Daniel Crispell, Vision Systems, Inc.

11:45 AM-12:45 PM  Lunch on your own

1:00-1:45 PM Steven Brumby, Descartes Labs: From Pixels to Answers: Cloud-based forecasting of global-scale systems using a living atlas of the world

Image Quality I - Session Chair: John Irvine, Draper
1:45-2:05 PM  #69 NIIRS prediction using in-scene estimation of parameters; John Irvine, Draper
2:05-2:25 PM  #67 Compression induced image quality degradation in terms of NIIRS; Genshe Chen, Intelligent Fusion Technology, Inc.
2:25-2:45 PM  #64 Video coding layer optimization for target detection; Larry Pearlstein, The College of New Jersey
Image Quality II - Session Chair: John Irvine, Draper
2:45-3:05 PM  #72 Optical flow on degraded imagery; Josh Harguess, SSCPAC
3:05-3:25 PM  #75 Automated video quality detection using scene segmentation and mover detection; Andrew Kalukin, National Geospatial-Intelligence Agency
3:25-3:45 PM  #74 Quality assessment of OCT-imagery of retina images in dry-AMD (Age-related Macular Degeneration) patients; Nastaran Ghadar, Draper
End of Workshop

KEYNOTE and INVITED TALKS and SPEAKER BIOGRAPHIES
In order of presentation

Jason Matheny, Director, IARPA

Value of Machine Learning for National Security

This talk will describe how machine learning has affected a range of national security missions, the value of past research, and priorities for future research.

Speaker Bio: Dr. Jason Matheny is Director of the Intelligence Advanced Research Projects Activity (IARPA), a U.S. Government organization that invests in high-risk, high-payoff research in support of national intelligence. Before IARPA, he worked at Oxford University, the World Bank, the Applied Physics Laboratory, the Center for Biosecurity and Princeton University, and is the co-founder of two biotechnology companies. His research has been published in Nature, Nature Biotechnology, Biosecurity and Bioterrorism, Clinical Pharmacology and Therapeutics, Risk Analysis, Tissue Engineering, and the World Health Organization’s Disease Control Priorities, among others. Dr. Matheny holds a PhD in applied economics from Johns Hopkins, an MPH from Johns Hopkins, an MBA from Duke, and a BA from the University of Chicago. He received the Intelligence Community’s Award for Individual Achievement in Science and Technology.


Trevor Darrell, EECS, University of California-Berkeley

Perceptual representation learning across diverse modalities and domains

Learning of layered or "deep" representations has provided significant advances in recent years, but has traditionally been limited to fully supervised settings with very large amounts of training data. New results show that such methods can also excel when learning in sparse/weakly labeled settings across modalities and domains. I'll review state-of-the-art models for fully convolutional pixel-dense segmentation from weakly labeled input, and will discuss new methods for adapting deep recognition models to new domains with few or no target labels for categories of interest. As time permits, I'll present recent results on long-term recurrent network models that can learn cross-modal descriptions and explanations.

Speaker Bio: Prof. Darrell is on the faculty of the CS Division of the EECS Department at UC Berkeley and he is also appointed at the UC-affiliated International Computer Science Institute (ICSI). Darrell’s group develops algorithms for large-scale perceptual learning, including object and activity recognition and detection, for a variety of applications including multimodal interaction with robots and mobile devices. His interests include computer vision, machine learning, computer graphics, and perception-based human computer interfaces. Prof. Darrell was previously on the faculty of the MIT EECS department from 1999-2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996-1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy's GRASP lab.


Christopher Rigano, Senior Computer Scientist, Office of Science and Technology, National Institute of Justice, U.S. Department of Justice

Applied Computer Science in Criminal Justice

Speaker Bio: Chris Rigano serves as a senior computer scientist at the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice, and is responsible for advancing the science of person-based analytics for the federal, state, local, and tribal criminal justice communities. His work includes research in image and video analytics, biometrics, and social network analysis. Prior to this assignment, Mr. Rigano worked on exploratory analytics as a contractor for the intelligence community. Mr. Rigano holds MS degrees in Computer Science from North Carolina State University and in Software Engineering from Monmouth University. He also holds a BS degree in Computer Science from Monmouth University and a BA degree in Criminal Justice from Iona College. He served in the US Army, reaching the rank of Captain. His experience includes working as a technical direction agent at the Office of Naval Research for the MITRE Corporation. His career spans multiple research areas in computer communications, cyberspace, and social network and media analytics.


John Kaufhold, Deep Learning Analytics

Deep Learning Past, Present and Near Future

In the past 5 years, deep learning has become one of the hottest topics at the intersection of data science, society, and business. Google, Facebook, Microsoft, Baidu and other companies have embraced the technology and, in domain after domain, deep learning is outperforming both people and competing algorithms at practical tasks. ImageNet Hit@5 object recognition error rates have fallen >85% since 2011, and deep networks can now recognize 1,000 different objects in photos faster and better than you can. All major speech recognition engines (Google’s, Baidu’s, Apple Siri, etc.) now use deep learning. In real time, deep learning can automatically translate a speaker’s voice in one language to the same voice speaking another language. Deep learning can now beat you at Atari and Go. These breakthroughs are visible both as product offerings and as competitive results on international open benchmarks. This recent disruptive history of deep learning has led to a student and startup stampede to master key elements of the technology, and this landscape is evolving rapidly. The abundance of open data, Moore’s law, Koomey’s law, Dennard scaling, an open culture of innovation, a number of key algorithmic breakthroughs in deep learning, and a unique investment at the intersection of hardware and software have all converged as factors contributing to deep learning’s recent disruptive successes. And continued miniaturization in the direction of internet-connected devices in the form of the “Internet of Things” promises to flood sensor data across new problem domains to an already large, innovative, furiously active, and well-resourced community of practice. But with disruptive AI technologies comes apprehension: we now enjoy deep learning benefits like Siri every day, but privacy concerns, economic dislocation, anxieties about self-driving cars, and military drones all loom on the horizon, and our legal system has struggled to keep pace with technology.

Speaker Bio: Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company named one of the four fastest growing companies in Arlington, Virginia in 2015. Dr. Kaufhold also serves as Secretary of the Washington Academy of Sciences and is a regular contributor to the DC Data Community, where he moderates the DC2 Deep Learning Discussion list. Prior to founding Deep Learning Analytics, Dr. Kaufhold investigated deep learning algorithms as a staff scientist at NIH. Prior to NIH, Dr. Kaufhold was the youngest member of the Technical Fellow Council at SAIC. Over 7 years at SAIC, Dr. Kaufhold served as principal investigator or technical lead on a number of large government contracts funded by NIH, DARPA and IARPA, among others, taking a sabbatical at MIT to study deep learning in 2010. Prior to joining SAIC, Dr. Kaufhold investigated machine learning algorithms for medical image analysis and image and video processing at GE's Global Research Center. On a Whitaker fellowship, Dr. Kaufhold earned his Ph.D. from Boston University's biomedical engineering department in 2001. Dr. Kaufhold is named inventor on >10 issued patents in image analysis, and author/coauthor on >40 publications in the fields of machine learning, image understanding and neuroscience.


Richard Linderman, SES, Office of the Assistant Secretary of Defense, Research and Engineering

DoD Interests in Autonomy and Artificial Intelligence

Recent developments in machine learning, neuromorphic computing, and emerging memory technologies indicate that Artificial Intelligence is experiencing a technological renaissance. The Department of Defense is intrigued by these developments because they have the potential to enhance existing research focusing on autonomy and artificial intelligence. These areas may be the basis of a new DoD offset strategy. These efforts are essential to the Department as it faces key operational challenges of the 21st Century, including rapid response, manpower inefficiencies, and harsh environments. The development and implementation of autonomous technologies poses unique challenges of its own, such as fostering collaboration and trust between humans and autonomous systems, engineering machine perception, reasoning, and intelligence, and enacting test, evaluation, validation and verification schemes.

Speaker Bio: Dr. Richard W. Linderman, a member of the Scientific and Professional Cadre of Senior Executives, is the Deputy Director for Information Systems and Cyber Technologies in the Office of the Assistant Secretary of Defense, Research and Engineering. In this position he provides the senior leadership and oversight of all Department of Defense science and technology programs in the areas of information technology; cyber security; autonomy; high performance computing and advanced computing; software and embedded systems; networks and communications; and large data and data-enabled decision making tools. He also has cognizance over the complete spectrum of efforts in computers and software technology, communications, information assurance, and information management and distribution. Dr. Linderman was commissioned as a second lieutenant in May 1980. Upon completing four years of graduate studies, he entered active-duty, teaching computer architecture courses and leading related research at the Air Force Institute of Technology. He was assigned to Rome Air Development Center in 1988, where he led surveillance signal processing architecture activities. In 1991, he transitioned from active-duty to civil service as a senior electronics engineer at Rome Laboratory, becoming a principal engineer in 1997. During these years, he pioneered three-dimensional packaging of embedded architectures and led the Department of Defense community exploring signal and image processing applications of high performance computers. He conceived and demonstrated the use of PS3 gaming consoles to architect supercomputers with outstanding affordability and power efficiency. Dr. Linderman holds seven U.S. patents with three pending U.S. Patent Applications and has published more than 90 journal, conference and technical papers.


Vijayakumar Bhagavatula, Associate Dean, Carnegie Mellon University

Innovations in Correlation Filter Design Architectures for Robust Object Recognition

In many computer vision problems, the main task is to match two appearances of an object (e.g., face, iris, vehicle, etc.) that may exhibit appearance differences due to factors such as translation, rotation, scale change, occlusion, illumination variation and others. One class of methods to achieve accurate object recognition in the presence of such appearance variations is one where features computed in a sliding window in the target image are compared to features computed in a stationary window of the reference image. Correlation filters are an efficient frequency-domain method to implement such sliding window matching. They also offer benefits such as shift-invariance (i.e., the object of interest doesn’t have to be centered), no need for segmentation, graceful degradation and closed-form solutions. While the origins of correlation filters go back more than thirty years, there have been some very interesting and useful recent advances in correlation filter designs and their applications. For example, the new maximum margin correlation filters (MMCFs) show how the superior localization capabilities of correlation filters can be combined with the generalization capabilities of support vector machines (SVMs). Another major research advance is the development of vector correlation filters that use features (e.g., HOG) extracted from the input image rather than just input image pixel values. While past application of correlation filters focused mainly on automatic target recognition, more recent applications include face recognition, iris recognition, palmprint recognition and visual tracking. This talk will provide an overview of correlation filter designs and applications, with particular emphasis on these more recent advances.
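
To make the sliding-window idea concrete, the following Python sketch (an illustration only, not the MMCF or vector correlation filter designs discussed in the talk) correlates a zero-mean template against a search image in the frequency domain; the peak of the correlation surface marks the most likely object location, which is the shift-invariance property mentioned above. The array sizes and the embedded-target example are made-up assumptions.

# Minimal sketch (not the MMCF/vector filter designs from the talk): plain
# frequency-domain correlation of a template against a search image. The
# peak of the correlation surface gives the most likely object location.
import numpy as np

def correlate_fft(search_image, template):
    """Cross-correlate a zero-mean template with a search image via the FFT."""
    h, w = search_image.shape
    t = template - template.mean()                # zero-mean template
    T = np.fft.fft2(t, s=(h, w))                  # pad template to image size
    S = np.fft.fft2(search_image)
    return np.real(np.fft.ifft2(S * np.conj(T)))  # correlation theorem

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=(128, 128))
    target = rng.normal(size=(16, 16))
    scene[40:56, 70:86] += 3.0 * target           # embed the target in clutter
    surface = correlate_fft(scene, target)
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    print("correlation peak at", peak)            # expected near (40, 70)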

Speaker Bio: Prof. Vijayakumar (“Kumar”) Bhagavatula received his Ph.D. in Electrical Engineering from Carnegie Mellon University (CMU), Pittsburgh and since 1982, he has been a faculty member in the Electrical and Computer Engineering (ECE) Department at CMU where he is now the U.A. & Helen Whitaker Professor of ECE and the Associate Dean for the College of Engineering. He served as the Associate Head of the ECE Department and also as its Acting Department Head. Professor Kumar's research interests include Pattern Recognition and Coding and Signal Processing for Data Storage Systems and for Digital Communications. He has authored or co-authored over 600 technical papers, twenty book chapters and one book entitled Correlation Pattern Recognition. He served as a Topical Editor for Applied Optics and as an Associate Editor of IEEE Trans. Information Forensics and Security. Professor Kumar is a Fellow of IEEE, a Fellow of SPIE, a Fellow of Optical Society of America (OSA) and a Fellow of the International Association of Pattern Recognition (IAPR).


Patricia Brennan, Director, National Library of Medicine, National Institutes of Health

NLM Anticipating the Third Century: Image Informatics

The US National Library of Medicine is one of the 27 institutes and centers of the National Institutes of Health. NLM is the world’s largest research library in medicine and the life sciences, and is also home to research and development centers focusing on large databases in genomics, imagery and bibliographic data, as well as analytical tools that enable the use of this ‘big’ data by the biomedical and computational communities, and the general public. This talk will address NLM’s current work and future directions in image informatics – the confluence of image processing, machine learning and natural language processing. Among the many motivators for this work are automation in disease screening in resource-poor countries (for malaria and tuberculosis), automated extraction of bibliographic data from biomedical articles to build MEDLINE/PubMed citations, public access to photorealistic versions of rare historical volumes in biomedicine, mitigation of effects of large scale disasters (family reunification), and other goals within the NLM mission. The talk will conclude with an overview of the strategic vision of the NLM and its role in fostering efficient use of large data.

Speaker Bio: Patricia Flatley Brennan, RN, PhD, is the Director of the National Library of Medicine (NLM). The NLM is the world’s largest biomedical library and the producer of digital information services used by scientists, health professionals and members of the public worldwide. She assumed the directorship in August 2016. Dr. Brennan came to NIH from the University of Wisconsin-Madison, where she was the Lillian L. Moehlman Bascom Professor at the School of Nursing and College of Engineering. She also led the Living Environments Laboratory at the Wisconsin Institutes for Discovery, which develops new ways for effective visualization of high dimensional data. Dr. Brennan is a pioneer in the development of information systems for patients. She developed ComputerLink, an electronic network designed to reduce isolation and improve self-care among home care patients. She directed HeartCare, a web-based information and communication service that helps home-dwelling cardiac patients recover faster, and with fewer symptoms. She also directed Project HealthDesign, an initiative designed to stimulate the next generation of personal health records. Dr. Brennan has also conducted external evaluations of health information technology architectures and worked to repurpose engineering methods for health care. She received a master of science in nursing from the University of Pennsylvania and a PhD in industrial engineering from the University of Wisconsin-Madison. Following seven years of clinical practice in critical care nursing and psychiatric nursing, Dr. Brennan held several academic positions at Marquette University, Milwaukee; Case Western Reserve University, Cleveland; and the University of Wisconsin-Madison. A past president of the American Medical Informatics Association, Dr. Brennan was elected to the Institute of Medicine of the National Academy of Sciences (now the National Academy of Medicine) in 2001. She is a fellow of the American Academy of Nursing, the American College of Medical Informatics, and the New York Academy of Medicine.


Nick Petrick, Center for Devices and Radiological Health, Food and Drug Administration

Validation of Quantitative Imaging and Computer-aided Diagnosis Tools

Advances in biology are improving our understanding of the mechanisms underlying diseases and response to therapeutic interventions. These advances provide new opportunities to match patients with diagnostics and therapies that are more likely to be safe and effective, and enable “precision medicine.” We are also seeing advances in medical imaging and computer technologies allowing the extraction of biologically relevant information from medical images, giving rise to a wide range of quantitative imaging (QI) and computer-aided diagnosis (CAD) tools for use in clinical medicine. QI tools extract specific features or metrics from medical images relevant to, for example, disease state and treatment response. Radiological CAD tools are computerized algorithms that incorporate pattern recognition and data analysis capabilities (i.e., combine values, measurements, or features extracted from patient radiological data to discover patterns in the data). CAD systems typically merge information from multiple QI features, either directly (e.g., support vector machine classifier) or indirectly (e.g., deep learning methods). A distinction between QI and CAD is that in QI, the emphasis is on establishing a specific imaging metric’s association with a disease condition while the emphasis in CAD is on how the output aids the clinician in decision-making. In order to advance QI and CAD tools into clinical use, there is a strong need to establish appropriate and widely accepted performance assessment methods. This talk will provide an overview of our latest ongoing regulatory research related to technical and clinical performance assessment methods for QI and CAD tools. These assessment techniques are not specific to medical image analysis but generalize to a wide range of pattern recognition and machine learning tools.

Speaker Bio: Nicholas Petrick is Acting Division Director for the Division of Imaging, Diagnostics and Software Reliability within the U.S. Food and Drug Administration, Center for Devices and Radiological Health and was appointed to the FDA Senior Biomedical Research Service. He earned his B.S. degree from Rochester Institute of Technology in Electrical Engineering and his M.S. and Ph.D. degrees from the University of Michigan in Electrical Engineering Systems. His interests include imaging biomarkers, CAD, image processing, assessment methods and x-ray imaging physics.


Terrence Sejnowski, Salk Institute and University of California at San Diego

Banquet Talk: Deep Learning II

Deep Learning is based on the architecture of the primate visual system, which has a hierarchy of visual maps with increasingly abstract representations. This architecture is based on what we knew about the properties of neurons in the visual cortex in the 1960s. The next generation of deep learning based on more recent understanding of the visual system will be more energy efficient and have much higher temporal resolution.

Speaker Bio: Dr. Terrence Sejnowski received his Ph.D. in physics from Princeton University. He was a postdoctoral fellow at Princeton University and the Harvard Medical School. He served on the faculty of Johns Hopkins University and was a Wiersma Visiting Professor of Neurobiology and a Sherman Fairchild Distinguished Scholar at Caltech. He is now an Investigator with the Howard Hughes Medical Institute and holds the Francis Crick Chair at The Salk Institute for Biological Studies. He is also a Professor of Biology at the University of California, San Diego, where he is co-director of the Institute for Neural Computation and co-director of the NSF Temporal Dynamics of Learning Center. He is a pioneer in computational neuroscience and his goal is to understand the principles that link brain to behavior. His laboratory uses both experimental and modeling techniques to study the biophysical properties of synapses and neurons and the population dynamics of large networks of neurons. New computational models and new analytical tools have been developed to understand how the brain represents the world and how new representations are formed through learning algorithms for changing the synaptic strengths of connections between neurons. He has published over 500 scientific papers and 12 books, including The Computational Brain, with Patricia Churchland. Sejnowski is the President of the Neural Information Processing Systems (NIPS) Foundation, which organizes an annual conference attended by over 2000 researchers in machine learning and neural computation and is the founding editor-in-chief of Neural Computation published by the MIT Press. He is a member of the Institute of Medicine, National Academy of Sciences and the National Academy of Engineering, one of only ten current scientists elected to all three national academies.


Joe Mundy, Brown University & Vision Systems

3-D Reasoning from an AI Perspective: History and Future

The field of artificial intelligence (AI) has from its very inception recognized the importance of grounding reasoning about the world in 3-d space. Scene understanding cannot be addressed without an underlying representation of 3-d space and its properties. In the first few decades of the research, the use of 3-d geometric models was prominent in describing scene content. At the same time, a broad base of logical reasoning was developed as the key approach to knowledge representation. The two research threads intersected in an area of research called geometric reasoning, as will be described in the presentation. Starting in the mid-90s, the role of 3-d representation began to be deemphasized in favor of relying on massive datasets of 2-d images for training classifiers without any particular internal representation of the scene. This trend was further accelerated in the last decade by the resurgence of neural networks for the third time in AI’s existence. In this case, significant success was achieved because of the virtually exhaustive datasets and fast GPU-based hardware to support adapting millions of network parameters. In spite of this success in data-driven methods, there is still a huge gap in understanding the underlying nature of scenes, and in expressing this understanding in an ontological/logical form. It is argued that such understanding is essential to deal with scene content and events that are not often visually observed but that can be characterized by generalization and abstract reasoning. The talk will provide examples and a conclusion about the way forward.

Speaker Bio: Dr. Mundy received his B.E.E. (1963) and M.Eng. (1966) and Ph.D. (1969) from Rensselaer Polytechnic Institute. He has published over 100 papers and articles in computer vision and solid state electronics. Dr. Mundy joined General Electric’s Research and Development Center (CRD) in 1963. His early projects at CRD include: High power microwave tube design, superconductive computer memory devices, the design of high density integrated circuit associative memory arrays, and the application of transform coding to image data compression. He is the co-inventor of varactor bootstrapping, a key technique still widely used today in the design of CMOS integrated circuits. From 1972 until 2002, Dr. Mundy led a group involved in the research and development of image understanding and computer vision systems. In the early 1970’s his group developed one of the first major applications of computer vision to industrial inspection – inspecting incandescent lamp filaments at the rate of 15 parts/sec. and achieved classification performance of less than one error per thousand. His more recent research themes at CRD included: industrial photogrammetry for machine control, theory of geometric invariance, change detection in satellite imagery and CT classification of lung cancer lesions. In 1988, Dr. Mundy was named a Coolidge Fellow, GE’s highest technical honor. He applied the fellowship to a sabbatical at Oxford University, working with Sir Michael Brady and Prof. Andrew Zisserman on applications of invariant theory to computer vision. This work led to the Marr Prize award in 1993. In 2002 Dr. Mundy joined the School of Engineering at Brown as Prof. of Engineering (research). His research at Brown, under sponsorship of DARPA and NGA, includes video and image analysis, with emphasis on change detection and 3-d volumetric modeling. In 2011, Dr. Mundy co-founded Vision Systems Inc. and is President and CEO. At VSI, Dr. Mundy has managed the DARPA Tailwind and Visual Media Reasoning projects that are aimed at aerial video processing and scene analysis. Dr. Mundy also provides general project management of current VSI efforts in 3-d modeling from satellite imagery, geo-location of ground-level imagery and facial recognition.


Steven Brumby, Descartes Labs

From Pixels to Answers: Cloud-based forecasting of global-scale systems using a living atlas of the world

Decades of Earth observation missions have accumulated petabytes (8×10^15 bits) of science-grade imagery, which are now available in commercial cloud storage. These cloud platforms enable far more than just efficient storage and dissemination – adjacent massive computing resources constitute on-demand super-computing at a scale that was previously only available to national laboratories, in principle enabling automated real-time analysis of all this imagery. The remaining limitations are the need for remote sensing science and machine learning algorithms and software that can exploit this data-rich environment, and ground data to train and validate models. While good sources of ground data remain elusive, recent advances in deep learning algorithms and scientific software tools have reached a critical threshold of new capability: we can now combine all the available satellite imagery into a living atlas of the world, and use that atlas to build a forecasting platform for global-scale systems.

We report here on work processing petabytes of compressed raw image data acquired by the NASA/USGS Landsat, NASA MODIS, and ESA Sentinel programs over the past 40 years. Using commodity cloud computing resources, we convert the imagery to a calibrated, georeferenced, multi-resolution format suited for machine learning analysis. We use this multi-sensor, multi-constellation dataset for global-scale agricultural forecasting, environmental monitoring and disaster analysis. We apply remote sensing and deep learning algorithms to detect and classify agricultural crops and then estimate crop yields. In regions with poor ground data, still the case for most of the world, we explore simpler indicators of crop health for famine early warning. Our dataset and approach supports monitoring water resources and forestry resources, and characterization of land use within cities.

Speaker Bio: Steven is Co-Founder and Chief Strategy Officer of Descartes Labs. Prior to Descartes, Steven was a Senior Research Scientist at Los Alamos National Laboratory (LANL) Information Sciences Group, and led LANL’s deep learning computer vision projects for satellite imagery, aerial video, and social media video analysis at petabyte scale. His PhD is in theoretical particle physics from the University of Melbourne.


AIPR 2016 ABSTRACTS

CONTRIBUTED PAPERS In numerical order

Note: Numbering of papers is monotonic but not strictly consecutive; all papers in the regular oral and poster sessions are listed below. A cross-reference of paper number to presentation session appears at the end of the list.

Paper 1

Confocal Vessel Structure Segmentation with Optimized Feature Bank and Random Forests

Yasmin M. Kassim, University of Missouri-Columbia; V. B. Surya Prasath*, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri

In this paper, we consider confocal microscopy based vessel segmentation with optimized features and random forest classification. By utilizing vessel-specific features tuned to capture curvilinear structures, such as the multiscale Frobenius norm of the Hessian eigenvalues and the Laplacian of Gaussians, we obtain better segmentation results in challenging imaging conditions. We obtain binary segmentations using a random forest classifier trained on physiologist-marked ground truth. Experimental results on mice dura mater confocal microscopy vessel segmentations indicate that we obtain better results compared to global segmentation approaches.
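
As a rough illustration of the kind of curvilinear-structure features the abstract mentions (not the paper's optimized feature bank or its random-forest training), the Python sketch below computes scale-normalized Hessian eigenvalues and their Frobenius norm at several Gaussian scales; stacking them per pixel yields a feature vector that a pixel classifier such as a random forest could be trained on. The scale values and the random test image are illustrative assumptions.

# Rough sketch (a simplification, not the paper's optimized feature bank):
# multiscale Hessian eigenvalue features of the kind described in the abstract.
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigen_features(image, sigmas=(1.0, 2.0, 4.0)):
    feats = []
    for s in sigmas:
        # scale-normalized second derivatives (gamma = 2)
        Ixx = s**2 * gaussian_filter(image, s, order=(0, 2))
        Iyy = s**2 * gaussian_filter(image, s, order=(2, 0))
        Ixy = s**2 * gaussian_filter(image, s, order=(1, 1))
        trace = Ixx + Iyy
        disc = np.sqrt((Ixx - Iyy) ** 2 + 4.0 * Ixy ** 2)
        lam1, lam2 = 0.5 * (trace + disc), 0.5 * (trace - disc)
        frob = np.sqrt(lam1 ** 2 + lam2 ** 2)   # Frobenius norm of eigenvalues
        feats.extend([lam1, lam2, frob])
    return np.stack(feats, axis=-1)              # H x W x (3 * number of scales)

# Example: per-pixel features for a random image, ready for a classifier.
img = np.random.rand(64, 64)
X = hessian_eigen_features(img).reshape(-1, 9)
print(X.shape)   # (4096, 9)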



Paper 2

CSANG: Continuous Scale Anisotropic Gaussians for Robust Linear Structure Extraction

V. B. Surya Prasath*, University of Missouri-Columbia; Sruthikesh Surineni, University of Missouri-Columbia; Ke Gao, University of Missouri-Columbia; Guna Seetharaman, US Naval Research Laboratory; Kannappan Palaniappan, University of Missouri

Robust estimation of linear structures such as edges and junction features in digital images is an important problem. In this paper, we use an adaptive robust structure tensor (ARST) method in which the local adaptation process uses a spatially varying adaptive Gaussian kernel that is initialized using the total least-squares structure tensor solution. An iterative scheme is designed with the size, orientation, and weights of the Gaussian kernel adaptively changed at each iteration step. Such an adaptation and continuous scale anisotropic Gaussian kernel change for local orientation estimation helps us obtain robust edge and junction features. We consider a GPU implementation, which obtained a 20x speed improvement over a traditional CPU implementation. Experimental results on noisy synthetic and real images indicate that we obtain robust edge detections, and further comparisons with other edge detectors show that our approach obtains better accuracy.
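
For orientation, the sketch below shows the fixed-scale, isotropic structure tensor and the total least-squares orientation estimate that, per the abstract, initializes the adaptive (ARST) scheme; the iterative anisotropic kernel adaptation and the GPU implementation are not reproduced here, and the smoothing scales are assumed values.

# Minimal sketch of the fixed-scale structure tensor used as initialization;
# the spatially varying anisotropic adaptation of ARST is not shown.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor_orientation(image, grad_sigma=1.0, window_sigma=2.0):
    smoothed = gaussian_filter(image, grad_sigma)
    Ix = sobel(smoothed, axis=1)
    Iy = sobel(smoothed, axis=0)
    # Tensor components averaged over a Gaussian window (isotropic here).
    Jxx = gaussian_filter(Ix * Ix, window_sigma)
    Jxy = gaussian_filter(Ix * Iy, window_sigma)
    Jyy = gaussian_filter(Iy * Iy, window_sigma)
    # Dominant local orientation (total least-squares estimate) and a
    # coherence measure that is high along edges and low in flat regions.
    orientation = 0.5 * np.arctan2(2.0 * Jxy, Jxx - Jyy)
    coherence = np.sqrt((Jxx - Jyy) ** 2 + 4.0 * Jxy ** 2) / (Jxx + Jyy + 1e-12)
    return orientation, coherence

img = np.random.rand(64, 64)
theta, coh = structure_tensor_orientation(img)
print(theta.shape, float(coh.max()))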


Paper 3

Second stage PCA with real and interpolated data for pose angle and translation estimation in handshape recognition

Marlon Oliveira*, Dublin City University; Alistair Sutherland, Dublin City University

Handshape recognition is a challenging task because hands are deformable objects. Some techniques for handshape recognition using Computer Vision have been studied. The key problem is how to make hand gestures understood by computers/mobile devices. In this paper we will present a study about Principal Component Analysis (PCA) used to reduce the dimensionality and extract features of images of the human hand. These hands represent the alphabet of Irish Sign Language. We propose to apply PCA in more than one stage, creating a second stage PCA with even lower dimensions. In this second stage, we will interpolate data using splines. These data are either manifolds or eigenspaces. Different blurring levels, using a Gaussian filter, are applied to these images in order to reduce the nonlinearity in the manifolds within the eigenspaces. Some comparison of the influence of blurring and the number of eigenvectors considered will be shown. Finally we will apply Nearest Neighbour (NN) in order to classify the correct shape, angle and translation and show the accuracy.
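
A minimal sketch of the two-stage PCA idea follows, on synthetic data only; the spline interpolation of the manifolds/eigenspaces and the Irish Sign Language images from the paper are not reproduced, and the component counts and blur level are illustrative assumptions. Blurred images are projected onto a first-stage eigenspace, a second PCA further reduces those coefficients, and a nearest-neighbour classifier operates in the final space.

# Hedged sketch of two-stage PCA followed by nearest-neighbour classification.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
images = rng.random((200, 32, 32))              # stand-in for hand images
labels = rng.integers(0, 5, size=200)           # stand-in handshape labels

# Gaussian blurring reduces nonlinearity of the manifolds in the eigenspace.
blurred = np.array([gaussian_filter(im, 2.0) for im in images])
X = blurred.reshape(len(images), -1)

stage1 = PCA(n_components=30).fit(X)            # first-stage eigenspace
Z1 = stage1.transform(X)
stage2 = PCA(n_components=5).fit(Z1)            # second stage, lower dimension
Z2 = stage2.transform(Z1)

clf = KNeighborsClassifier(n_neighbors=1).fit(Z2, labels)
print("training accuracy:", clf.score(Z2, labels))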



Paper 5

The National Library of Medicine Pill Image Recognition Challenge: An Initial Report

Ziv Yaniv*, National Library of Medicine

In January 2016 the U.S. National Library of Medicine announced a challenge competition calling for the development and discovery of high-quality algorithms and software that rank how well consumer images of prescription pills match reference images in its authoritative RxIMAGE database. Potential benefits of this capability include confirmation of the pill in settings where pill documentation and the pill itself have been separated, such as in a disaster or emergency; and confirmation of a pill when the prescribed pill changes from brand to generic, or for any other reason the shape and color of the pill change. NLM plans to use challenge submissions in developing a software system that matches the best RxIMAGE image to the submitted pill image, possibly acquired with a tablet or cellphone camera. The submitted picture must be of a single pill that lies on a uniformly colored or white surface.

The challenge looks at two kinds of images: high-quality “reference images” and “consumer quality images” like those from smart phones. For each pill there are two reference images – front and back or “side A” and “side B”. One thousand pills were used for the challenge training data set; for each pill there were two reference images and five consumer-quality images. A challenge submission is to construct a matrix in which for each consumer-quality image (row), column entries rank how well each reference image is likely to be of the same pill. A separate data set of similar size and distribution of pill shape and color will be used to judge the challenge. The Mean Average Precision (MAP) quality measure will be used to determine the winners. We will report on the results in the final manuscript.
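
The following sketch illustrates how the Mean Average Precision of such a ranking matrix can be computed, with rows as consumer-quality images and columns as reference images; it is a generic illustration with made-up scores and relevance labels, not the official NLM evaluation code.

# Hedged sketch of the scoring idea: MAP over per-query rankings induced by a
# score matrix, where relevance marks the reference images of the same pill.
import numpy as np

def mean_average_precision(scores, relevance):
    """scores: (n_queries, n_refs) match scores; relevance: boolean, same shape."""
    ap_values = []
    for row_scores, row_rel in zip(scores, relevance):
        order = np.argsort(-row_scores)              # best match first
        rel_sorted = row_rel[order]
        hits = np.cumsum(rel_sorted)                 # relevant items seen so far
        ranks = np.arange(1, len(rel_sorted) + 1)
        precision_at_hits = hits[rel_sorted] / ranks[rel_sorted]
        ap_values.append(precision_at_hits.mean() if row_rel.any() else 0.0)
    return float(np.mean(ap_values))

# Toy example: 3 consumer images, 6 reference images, 2 correct refs each.
rng = np.random.default_rng(1)
scores = rng.random((3, 6))
relevance = np.zeros((3, 6), dtype=bool)
relevance[[0, 0, 1, 1, 2, 2], [0, 1, 2, 3, 4, 5]] = True
print("MAP:", mean_average_precision(scores, relevance))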



Paper 6

SPARCNN: SPAtially Related Convolutional Neural Networks

JT Turner*, Knexus Research Corporation; Kalyan Gupta, Knexus Research Corp; David Aha, Navy Center for Applied Research in AI, Naval Research Laboratory

The ability to accurately detect and classify objects at varying pixel sizes in cluttered scenes is crucial to many Navy applications. However, the detection performance of existing state-of-the-art approaches such as convolutional neural networks (CNNs) degrades when applied to such cluttered, multi-object detection tasks. We conjecture that spatial relationships between objects in an image could be exploited to significantly improve detection accuracy, an approach not yet considered by any existing technique (to the best of our knowledge). We introduce a detection and classification technique called Spatially Related Detection with Convolutional Neural Networks (SPARCNN) that learns and exploits a probabilistic representation of interobject spatial configurations within images from training sets for more effective region proposals to use with state-of-the-art CNNs. Our empirical evaluation of SPARCNN on the VOC 2007 dataset shows that it increases classification accuracy by 8% when compared to a region proposal technique that does not exploit spatial relations. More importantly, we obtained a higher performance boost of 18.785% when task difficulty in the test set is increased by including highly obscured objects and increased image clutter.
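
As a toy illustration of the underlying idea (not the authors' SPARCNN implementation), the sketch below estimates from training annotations how often objects of one class appear in a coarse spatial relation to objects of another class; statistics of this kind could then be used to re-score a detector's region proposals. The class names and bounding boxes are invented.

# Toy sketch (not SPARCNN itself): empirical probabilities of coarse spatial
# relations between object classes, estimated from annotated bounding boxes.
from collections import Counter

def relation(box_a, box_b):
    """Coarse spatial relation of box_a relative to box_b; boxes are (x1, y1, x2, y2)."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    if abs(ax - bx) >= abs(ay - by):
        return "left-of" if ax < bx else "right-of"
    return "above" if ay < by else "below"        # image y axis points down

def relation_probabilities(annotations):
    """annotations: list of images, each a list of (class_name, box) pairs."""
    counts = Counter()
    for objects in annotations:
        for i, (cls_a, box_a) in enumerate(objects):
            for cls_b, box_b in objects[i + 1:]:
                counts[(cls_a, cls_b, relation(box_a, box_b))] += 1
    total = sum(counts.values()) or 1
    return {key: n / total for key, n in counts.items()}

# Made-up training annotations: a rider tends to appear above a bicycle.
train = [[("person", (40, 10, 60, 50)), ("bicycle", (35, 45, 70, 90))],
         [("person", (100, 20, 120, 60)), ("bicycle", (95, 55, 130, 100))]]
print(relation_probabilities(train))   # {('person', 'bicycle', 'above'): 1.0}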



Paper 7

Seismic Signal Analysis using Multi-Scale/Multi-resolution Transformations

Millicent Thomas*, Northwest University

Abstract: Fast Fourier Transforms have been used since the early 1960s as a method of processing signals. Since the 1990s wavelet transformation has also been routinely used as method for signal processing. Their limitations include the inability to detect contours, curves and directional information of a signal. In the past few years, new approaches such as multi-scale and multi-resolution transformations have become more prevalent as methods to extrapolate information from signals and images. With roots in the Fourier transform and wavelet, multi-scale/multi-resolution transformations have recently given rise to even more advanced methods such as the bandelet, Contourlet, ridgelet, and curvelet. Additionally, advancements have been made to the standard wavelet to improve their analytical capabilities, resulting in methods such as the dual-tree discrete wavelet transformation and the dual density discrete wavelet transformation. Many studies have been done to develop and validate algorithms for wavelet multi-scale analysis but other types of transformations have not been as thoroughly studied. We propose a novel method of seismic signal analysis using these multi-scale transformations to identify and extrapolate seismic information from them. Until now there has only been research done on applying Curvelets or Contourlets to image processing. The proposed analysis method makes Contourlets and Curvelets a viable alternative to the DWT and FFT during seismic signal processing. Further, the outcome can then be statistically analyzed for use in applications such as earthquake prediction. The initial results are promising, indicating that the Contourlet and Curvelet methods yield a better prediction than the DWT and FFT.
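
As a baseline illustration only, the sketch below applies a standard multi-level discrete wavelet decomposition (PyWavelets) to a synthetic seismic-like trace and reports per-band energies; the contourlet and curvelet transforms that the paper argues improve on the DWT and FFT require dedicated libraries and are not shown. The sampling rate, wavelet choice, and trace are assumptions.

# Baseline sketch only: multi-level DWT of a synthetic trace with PyWavelets.
import numpy as np
import pywt

fs = 200.0                                       # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
trace = np.sin(2 * np.pi * 1.5 * t)              # slow background oscillation
trace[1000:1100] += np.sin(2 * np.pi * 25 * t[1000:1100])  # short "event"
trace += 0.2 * np.random.default_rng(0).normal(size=t.size)

# coeffs[0] is the coarsest approximation; the rest are detail bands from
# coarse to fine. Band energies localize the scale of the transient event.
coeffs = pywt.wavedec(trace, "db4", level=5)
for i, c in enumerate(coeffs):
    print(f"band {i}: {c.size} coefficients, energy {np.sum(c**2):.1f}")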



Paper 8

Use of visible light and infrared sensing capabilities to assess the safety of and detect damage to spacesuits

Jeremy Straub*, University of North Dakota

Spacesuits are designed to protect their occupants from hostile environments. They can support the very low pressure and low temperature environment of Earth orbit, the surface of Mars and beyond. Through use, suits may become damaged or contaminated. Their assessment presents a number of challenges. Suits may need to be inspected before entering the airlock (and potentially exposing the occupants of the vehicle or habitat to the prospective contaminants). Other suits may be attached to an exterior hatch, removing the contamination concern, but making access for inspection problematic. In either case, the suit's bulk and movement limitations make human inspection problematic. This paper presents initial work on the development of a 3D scanning-based sensing and assessment solution for spacesuit assessment and damage detection. First, it presents a discussion of alternate approaches. Specifically, these include visual assessment and physical approaches (such as pressurizing / over-pressurizing the suits to detect current and prospective leaks). Each approach is considered and its benefits and issues are discussed. Next, three approaches for 3D scanning-based spacesuit assessment are presented. The first is an airlock-based solution that utilizes this chamber (and cameras built into it) to assess the spacesuit for damage (and potentially, for planetary applications where planetary protection is a concern, contamination) during the exit process, and for contaminants and damage on entry. Under this approach, the relevant action would be stopped if an issue was detected. A second approach scans the suits prior to airlock entry (both inside and outside the station). A third approach is then presented which uses externally positioned and actuated sensors to assess spacesuits stored outside the station or habitat. All three approaches are assessed. The paper concludes with a discussion of the importance of damage and contamination scanning and a discussion of ongoing and future work in this area.


Paper 9

Assessment of Three-Dimensional Printing Using Visible Light Sensing and a CAD Model

Jeremy Straub*, University of North Dakota

The assessment of additive manufacturing (commonly known as 3D printing) is an open challenge. Even the most expensive 3D printers have quality control and configuration problems, many of which require operators to perform manual actions. Less expensive printers have more plentiful problems and, due to these and other factors, a significant number of 3D print jobs end in failure, wasting energy and material. Previous work presented a solution for the assessment of 3D printed objects (both on an in-process and post-completion basis) using comparison to expected results in the form of a pristine object. While useful in demonstrating the assessment technique, this approach requires the production of a pristine object (including manual inspection to confirm this) and is problematic for interior surface and fill-material assessment. This technique is also not suitable for bespoke item production or mass customization applications. This paper presents an adaptation of this technique based on the use of a CAD model in place of the pristine object. After providing an overview of the technique, it then presents and assesses algorithms for the calibration of the printer-camera system and the coarse and precise alignment of sensed and actual objects. Algorithms for assessing completed and in-process objects are also presented. Using this anomaly data, an expert system-based approach is utilized to classify and devise a response to identified anomalies. This expert system is described in detail, with particular focus on the input data that is provided to it from the sensed data processing algorithms and the composition and operation of the expert system's rule-fact network. The translation of the expert system recommendation back into actionable controller commands is also discussed. The efficacy of this approach is assessed and a discussion of empirical success and failure examples is presented. The paper concludes by discussing ongoing and future work.



Paper 11

Learning Adaptable Patch Sizes for Facial Action Unit Detection

Duygu Cakir*, Bahcesehir University; Nafiz Arica, Bahcesehir University

Facial Action Coding System (FACS) provides the description of all visual facial changes in terms of Action Units (AU), which express the movements of facial muscles. In this study we introduce a sparse learning method for AU detection based on facial landmarks. The proposed method learns the active landmarks for each AU while finding their best representative patch sizes in a unified framework. The experiments performed on the publicly available CK+ dataset show that the proposed approach outperforms the state-of-the-art studies by gaining the highest F1 scores for most of the Action Units. The experiments also show that the subtle muscle movements belonging to the upper face require smaller landmark patches while the lower face AUs are detected better in larger patches.


Paper 12

Developing a Retrieval Based Diagnostic Aid for Automated Melanoma Recognition of Dermoscopic Images

Mahmudur Rahman*, Morgan State University

Skin cancer in the form of malignant melanoma is one of the most dangerous types, responsible for over 9,000 deaths each year in the US. Detection of malignant melanoma in its early stages from dermoscopic images reduces mortality considerably; hence this is a crucial issue for dermatologists. However, interpretation of these images is time consuming and subjective, even for trained dermatologists. Therefore, there is currently great interest in the development of computer-aided diagnosis (CAD) systems, enabled by recent advances in pattern recognition, that can assist the clinical evaluation of dermatologists. However, these systems are mainly non-interactive in nature and the prediction represents just a cue for the dermatologist, as the final decision regarding the likelihood of the presence of a melanoma is left exclusively to him/her. In the last several years, developing CAD schemes that use a content-based image retrieval (CBIR) approach to search for clinically relevant and visually similar medical images (or regions) depicting suspicious lesions has also been attracting research interest. CBIR-based CAD schemes have the potential to provide radiologists with a "visual aid" and increase their confidence in accepting CAD-cued results in decision making. Although preliminary studies have suggested that using CBIR might improve dermatologists' performance and/or increase their confidence in decision making, this technology is still in an early development stage, with a lack of benchmark evaluation on ground-truth data sets to compare retrieval effectiveness. This work focuses on addressing the various issues related to the development of such a CBIR system and on participating in the Skin Lesion Analysis Towards Melanoma Detection challenge of ISBI 2016. Besides serving as a smart computer-aided decision support tool for clinical diagnosis, it has huge potential for offering powerful services in teaching and research related to skin cancer.



Paper 14

Training and Evaluating Detection Pipelines with Connected Components

Reid Porter*, Kitware Inc. ; Arslan Basharat, Kitware Inc. ; Roddy Collins, Kitware Inc. ; Matt Turek, Kitware, Inc., NY; Anthony Hoogs, Kitware

A continuing trend in machine learning is end-to-end training, where system-level performance metrics, in combination with large training sets, are used to optimize the processing pipeline for the task at hand. The application where machine learning has perhaps been most successful is image classification, and in this case the system-level performance metric is relatively well defined in terms of the number of mistakes. In this paper we consider a more general class of applications related to analyzing wide-area overhead imagery and video. The system-level performance metrics for these applications are more varied and less well defined, and typically involve multiple algorithmic components within a larger processing pipeline. A common sub-pipeline used in many of these applications involves translating pixel-level predictions (heat maps) into point locations (often geo-coordinates) and associated bounding boxes (or object boundaries). A number of standard computer vision sub-pipelines are used for this task, but these are often applied as a post-processing step and are typically not included in end-to-end training. In this paper we suggest a framework based on the Rand Index that can provide a well-defined performance metric for this sub-pipeline. We provide an algorithm that can efficiently compute the metric, and show how it can be tailored to provide similar behavior to other metrics used in the literature. Perhaps most importantly, we show how the performance metric can be used in training to provide efficient end-to-end learning of this common sub-pipeline.
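For readers unfamiliar with the metric, the following minimal sketch computes the plain Rand index between two connected-component labelings of the same pixel grid via a contingency table. It is only an illustration of the underlying quantity, not the paper's tailored or trainable formulation.

```python
# Plain Rand index between two labelings: fraction of pixel pairs on which the
# two labelings agree (same component in both, or different components in both).
import numpy as np
from scipy.special import comb

def rand_index(labels_a, labels_b):
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    n = a.size
    contingency = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(contingency, (a, b), 1)                 # joint label counts
    same_both = comb(contingency, 2).sum()            # pairs together in both labelings
    same_a = comb(contingency.sum(axis=1), 2).sum()
    same_b = comb(contingency.sum(axis=0), 2).sum()
    total = comb(n, 2)
    return (total + 2 * same_both - same_a - same_b) / total

if __name__ == "__main__":
    truth = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
    pred  = np.array([[0, 0, 1, 2], [0, 0, 1, 2]])    # one component over-split
    print(f"Rand index: {rand_index(truth, pred):.3f}")
```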



Paper 17

Object Identification for Inventory Management using Convolutional Neural Network

Nishchal K. Verma*, Indian Institute of Technology Kanpur, India; Teena Sharma, Indian Institute of Technology Kanpur, India; Shreedharkumar D. Rajurkar, Indian Institute of Technology Kanpur, India; Al Salour, The BOEING Company, USA

In this paper, a real-time inventory monitoring system using a Convolutional Neural Network (CNN) is proposed. The objective is to develop a reliable and efficient method for object counting and localization in the inventory using a vision interface. First, the region of interest (ROI) is extracted from the image of the inventory using visual features, i.e., stable regions of the input image. The centroids of these stable regions are estimated using Connected Component Analysis (CCA). These centroids are used to obtain bounding boxes in the stable regions. The bounding boxes generated for the respective ROIs are fed to a CNN followed by a Softmax layer for classification. Herein, a single-layer CNN architecture is used. The Softmax layer is used to classify the bounding boxes, and a final box is drawn for the approximate location of the object, with its distance measured using a Kinect sensor. The CNN training has been performed using the STL-10 dataset. The Softmax layer is trained for four classes with their respective labels. Tests have been performed for the four different classes. The developed algorithm performed well irrespective of resolution and color variations in the input image and is able to count objects with respect to the depth parameter.
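A rough sketch of the candidate-generation front end is shown below: binarize a shelf image, run connected component analysis, and emit bounding boxes to hand to a classifier. The Otsu threshold and minimum-area filter are illustrative stand-ins for the stable-region extraction in the paper, and the CNN/Softmax classification stage is not shown.

```python
# Connected-component candidate regions for an inventory image (illustrative only).
import cv2
import numpy as np

def candidate_boxes(gray, min_area=100):
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    boxes = []
    for i in range(1, num):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h, tuple(centroids[i])))
    return boxes

if __name__ == "__main__":
    gray = np.zeros((200, 200), np.uint8)
    cv2.rectangle(gray, (20, 20), (60, 60), 255, -1)      # two synthetic "objects"
    cv2.rectangle(gray, (100, 120), (160, 180), 255, -1)
    print(f"{len(candidate_boxes(gray))} candidate regions")
```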



Paper 18

Stereo-Vision based Object Grasping using Robotic Manipulator

Aquib Mustafa, Indian Institute of Technology Kanpur, India; Nishchal K. Verma*, Indian Institute of Technology Kanpur, India; Al Salour, The BOEING Company, USA

This work presents a vision-based object grasping methodology for a robotic manipulator. The method allows the manipulator to grasp a stationary, randomly localized object in any environment. The first part of the work deals with the development of the kinematic model, viz. the forward and inverse kinematics of the robotic manipulator, through an analytical approach. The second part incorporates a Maximally Stable Extremal Regions (MSER) based feature detection method with density-based clustering and a homography transform to localize the object in the real world. Grasping an object requires detection of its centroid with respect to the manipulator base frame. This involves a two-step process: a) determining 3D coordinates with respect to the camera frame using intrinsic calibration parameters and pixel values, and b) transformation from the camera frame into the base frame through an extrinsic transformation matrix. The coordinates of the centroid are then used with the developed inverse kinematics for guiding the manipulator to reach the location of the given object. The accuracy and robustness of the algorithm are tested with a robotic manipulator of five degrees of freedom. The results obtained demonstrate the effectiveness of the grasping algorithm in real-world scenarios.
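The two-step localization described above can be sketched in a few lines: back-project a pixel with known depth into the camera frame using the intrinsic matrix, then map the point into the manipulator base frame with an extrinsic transform. The intrinsic values, extrinsic transform, pixel, and depth below are made-up placeholders, not the paper's calibration.

```python
# Pixel + depth -> camera frame -> manipulator base frame (illustrative values only).
import numpy as np

K = np.array([[525.0,   0.0, 320.0],     # fx, 0, cx
              [  0.0, 525.0, 240.0],     # 0, fy, cy
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                            # assumed camera-to-base rotation
t = np.array([0.10, 0.00, 0.50])         # assumed camera-to-base translation (m)

def pixel_to_base(u, v, depth):
    p_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])   # camera-frame point
    return R @ p_cam + t                                        # base-frame point

if __name__ == "__main__":
    print(pixel_to_base(400.0, 300.0, depth=0.80))
```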



Paper 19

Trajectory Based Event Classification from UAV Videos and its Evaluation Framework

Sung Chun Lee*, Object Video Inc.

Videos from Unmanned Aerial Vehicle (UAV) platforms have become more available in the surveillance community. As the number of video samples increases, automated analysis of these videos has become important. Information about the events detected in videos can be used to create a database that stores large quantities of video and makes it easy to find "video content of interest" (e.g., "find all of the footage where vehicles are making turns"), an approach known as "content-based searching". In this paper, I describe the task of vehicle movement event classification in videos taken from a UAV. I further limit the discussion to events involving a single vehicle, though many of the methods described can easily generalize to multiple-vehicle events. Given object track information from UAV video sequences, the system processes the noisy trajectories to make them reliable, so that it can classify each trajectory as a movement event using trajectory curvature and speed. I also implemented an evaluation methodology framework and tested the proposed method on a public UAV dataset.
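The two trajectory cues named above, curvature and speed, are straightforward to compute from a smoothed 2D track, as in the hedged sketch below. The final threshold rule is a toy stand-in for the paper's classifier, and the synthetic circular track is not UAV data.

```python
# Per-point speed and curvature of a 2D track (illustrative event cue extraction).
import numpy as np

def speed_and_curvature(xy, dt=1.0):
    xy = np.asarray(xy, dtype=float)
    dx, dy = np.gradient(xy[:, 0], dt), np.gradient(xy[:, 1], dt)
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
    speed = np.hypot(dx, dy)
    curvature = np.abs(dx * ddy - dy * ddx) / np.maximum(speed ** 3, 1e-9)
    return speed, curvature

if __name__ == "__main__":
    theta = np.linspace(0, np.pi / 2, 50)
    track = np.c_[10 * np.cos(theta), 10 * np.sin(theta)]    # a turning vehicle
    speed, curv = speed_and_curvature(track)
    label = "turn" if curv.mean() > 0.05 else "straight"     # toy decision rule
    print(label, round(speed.mean(), 3), round(curv.mean(), 3))
```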



Paper 20

Real-time Detection and Classification of Traffic Light Signals

Asaad Said*, Intel

Detection of Traffic Light Signals (TLS) is an important capability needed for Advanced Driver Assist Systems (ADAS) as well as autonomous vehicle systems. Urban scenes, in practice, often have complex background characteristics that can amplify distractions affecting the driver. Proper detection of TLS and perception of their status can ensure safe navigation at intersections. This paper introduces a novel, real-time solution to detect TLS and recognize their status in complex traffic scenes based solely on image processing techniques and an onboard camera. The proposed approach uses innovative techniques to significantly decrease the compute requirements for segmenting TLS colors in a given RGB image. Each segmented region is represented by significant features that are fed to a cascaded classifier. The developed classifier accurately detects TLS among all candidates. The proposed system was tested on seven TLS videos to collect statistics. The videos used cover a wide range of scenarios, including day-time, cloudy, rainy, and night-time conditions. The total number of candidate regions after segmentation is 113,827, comprising 33,564 positives and 80,263 negatives. The final precision and recall using the proposed approach are 95% and 94.7%, respectively. The final code has been implemented and optimized in C++ to process 720p and 1080p videos in real time (12 ms to 28 ms per frame) on an Intel NUC (2.4 GHz dual-core). The proposed system outperforms other existing approaches in both accuracy and speed.
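The color-segmentation stage can be illustrated with simple HSV thresholding for red/green candidates, as below. The paper's own segmentation and cascaded classifier are not public, so the color bounds here are rough, commonly used approximations rather than the authors' parameters, and the real system is implemented in C++ rather than Python.

```python
# Illustrative HSV thresholding for traffic-light candidate regions.
import cv2
import numpy as np

def tls_candidates(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    red1 = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255))     # red wraps around hue 0
    red2 = cv2.inRange(hsv, (170, 120, 120), (179, 255, 255))
    green = cv2.inRange(hsv, (45, 120, 120), (90, 255, 255))
    mask = red1 | red2 | green
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Each surviving contour is a candidate to be described and classified downstream.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 20]

if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), np.uint8)
    cv2.circle(frame, (160, 60), 8, (0, 255, 0), -1)           # synthetic green light
    print(tls_candidates(frame))
```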



Paper 21

Gabor Filter Based Entropy and Energy Features for Basic Image Scene Recognition

Balakrishna Gokaraju*, University of West Alabama; Anish Turlapaty, VR Siddhartha Engineering College

Recognition of basic image categories is effortless for the Human Visual System (HVS) and does not necessarily involve object classification. In the HVS, the receptor cells on the retinal surface are sensitive to light intensity and are responsible for visual information processing. The primary goal of computer vision research is to imitate HVS performance. The consensus of researchers is that Gabor-like linear spatial filtering plays a crucial role in the function of mammalian vision systems. Motivated by this biological finding, the 2D Complex Gabor (CG) function has been recognized as a very useful tool in this field. Our research objective is to develop a supervised-learning-based hierarchical classification framework built upon Gabor features. Specifically, we experimented on the Oliva-Torralba dataset from the Corel stock photo library. This dataset consists of 2688 natural and artificial scene color images, each of size 256×256×3, from 8 sub-categories. In this paper, we restrict our goal to categorization of images into natural and artificial groups. The methodology consists of feature extraction and binary classification stages. In the first stage, we extract Gabor filter based global features (depending on the overall layout and scene structure but invariant to object details) from each image. Initially, in the feature extraction process, a 2D-CG filter is applied to pre-processed images, producing Gabor images in terms of spatial frequency and orientation. Guided by the Gabor uncertainty principle, we choose the resolution of spatial frequencies and orientations; at each of these coordinates, energy and entropy features are computed from the image's real and imaginary components. We investigate the viability of using the global features with an SVM classifier for basic scene categorization. Next, after applying PCA-based dimensionality reduction, a two-class SVM discriminant function based on a quadratic kernel is applied to the Gabor features for image classification. Using 10-fold cross validation, we obtained a classification accuracy of 95.56% and a kappa accuracy of 0.9139.
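The feature-extraction stage can be sketched as below: filter an image with a small bank of complex Gabor filters and keep per-(frequency, orientation) energy and entropy of the response magnitude. The bank size, histogram binning, and test image are illustrative choices and do not reproduce the paper's filter design or its classification stages.

```python
# Gabor filter-bank energy/entropy features (illustrative parameters).
import numpy as np
from skimage.data import camera
from skimage.filters import gabor

def gabor_energy_entropy(image, frequencies=(0.1, 0.2, 0.4), n_orient=4):
    feats = []
    for f in frequencies:
        for theta in np.arange(n_orient) * np.pi / n_orient:
            real, imag = gabor(image, frequency=f, theta=theta)
            magnitude = np.hypot(real, imag)                    # complex response magnitude
            energy = float(np.sum(magnitude ** 2))
            counts, _ = np.histogram(magnitude, bins=32)
            p = counts[counts > 0] / counts.sum()
            entropy = float(-np.sum(p * np.log2(p)))
            feats.extend([energy, entropy])
    return np.array(feats)

if __name__ == "__main__":
    features = gabor_energy_entropy(camera().astype(float))
    print(features.shape)        # (24,): energy and entropy per (frequency, orientation)
```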


Paper 22

Visualization of high dimensional image features for classification

Marissa Dotter, Point Loma Nazarene University; Katie Rainey*, SPAWAR Systems Center Pacific; Donald Waagen, AMRDEC WDI

In image classification tasks, the image is rarely represented as only a collection of raw pixels. Myriad alternative representations, from Gaussian kernels to bags-of-words to layers of a convolutional neural network, have been proposed both to decrease the dimensionality of the task and, more importantly, to move into a space which better facilitates classification. This work explores several methods for visualizing high-dimensional image representations in two-dimensional space. These visualizations aid in understanding what classifiers might be better suited to a given data set. We demonstrate these visualizations on several image data sets and compare the results with accuracies obtained from classifiers.
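A minimal example of the kind of visualization discussed above is shown below: project high-dimensional features to 2D with PCA and t-SNE and plot them colored by class. The scikit-learn digits dataset stands in for the image representations and data sets used in the paper.

```python
# Two 2D embeddings of high-dimensional features, plotted side by side.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, init="pca", random_state=0).fit_transform(X),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, emb) in zip(axes, embeddings.items()):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
    ax.set_title(name)
plt.tight_layout()
plt.show()
```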



Paper 23

Exploiting the Underlying Cepstral Coefficients for Large Scale and Fine-Tuned EKG Time-Imagery Analysis, including R-R, P-R, R-T, and R-PVC interval imaging.

James LaRue*, Jadco Signals; Richard Tutwiler, Professor Emeritus Applied Research Lab Penn State University/CEO LiveMotion3D LLC; Dennison LaRue, Clemson University Biomedical Engineering Student

This paper presents a three-part expansion of feature extraction for the electrocardiogram (EKG) based on a deeper exploitation of what cepstral processing can reveal, specifically in two distinct forms of time-varying cepstral-power images. These two images have been reviewed for readability by qualified EKG technicians and have received positive reviews in light of their perception of a tremendous gain in large-scale, yet fine-tuned, EKG strip interpretation ability. Our demonstration will give a brief review of EKGs in the time domain before moving to the image domain, where we will emphasize the pattern recognition aspects of our approach. Part ONE of this paper will define, and give real biological applications for, the cepstrum, and introduce our framing style for visualization. Part TWO of this paper will use Record 105 from the Physionet MIT-BIH (mitdb) database to demonstrate this approach as an image of EKG cepstral-feature terrain in the context of R-R, P-R, R-T, as well as R-PVC intervals, since this record contains premature ventricular contractions (PVCs) in addition to normal sinus rhythm (NSR). Part THREE of this paper will apply what we have learned from the first two parts to a variation of recurrence plots. In this instance, the recurrence plot itself is embedded within a correlation matrix formed from a cepstral domain image. In this way we will point to our next stages of pattern recognition, which involve yet another layer of exploitation of (dynamic) sub-matrices through two-dimensional segmentation and extraction which, when placed back in the context of the time domain, may further reveal clues to understanding the complex and seemingly acyclical nature of Heart Rate reset.
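As a toy illustration of the cepstral idea the paper builds on, the real cepstrum (inverse FFT of the log-magnitude spectrum) turns a periodic pulse train into a peak at the pulse period, which for an EKG corresponds to the R-R interval. The synthetic signal and search window below are illustrative and are not MIT-BIH data or the authors' imaging pipeline.

```python
# Real cepstrum of a synthetic pulse train; the dominant quefrency recovers the period.
import numpy as np

def real_cepstrum(signal):
    spectrum = np.fft.fft(signal)
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

if __name__ == "__main__":
    fs = 360                                   # Hz, the MIT-BIH sampling rate
    ecg_like = np.zeros(10 * fs)
    ecg_like[:: fs // 2] = 1.0                 # one synthetic "R wave" every 0.5 s
    c = real_cepstrum(ecg_like)
    lo, hi = fs // 4, int(0.8 * fs)            # search 0.25-0.8 s for the R-R peak
    quefrency = lo + np.argmax(c[lo:hi])
    print(f"dominant quefrency: {quefrency / fs:.2f} s (simulated R-R interval)")
```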


Paper 24

Registration of serial-section electron microscopy image sets for large volume neural circuit reconstruction

Arthur Wetzel*, Pittsburgh Supercomputing Center

The recent convergence of improved tissue preparation and sectioning methods along with advances in serial-section electron microscopy (EM) enables production imaging of sample volumes up to cubic millimetre scales. This size is sufficient for full-brain studies of insects and larval fish or for localized analysis in mammalian brain circuits consisting of more than 100,000 neurons. The resulting EM image sets, consisting of many terabytes spanning thousands of sections, must be precisely registered into anatomically consistent volumes to permit accurate tracing of thin but long-range neural pathways. Even with the best data acquisition, both the registration and analysis computations must deal with wide variations of image content, distortions, and a variety of preparation and other artifacts. This paper describes the registration of a full-brain larval zebrafish dataset acquired in Dr. Florian Engert's laboratory at Harvard University. The full 16,000-section, 8 TB dataset, acquired over several months, was transmitted to Pittsburgh for alignment using a new Signal Whitening Fourier Transform Image Registration method (SWiFT-IR) under development at the Pittsburgh Supercomputing Center. Unlike other methods commonly used for EM connectomics registration, SWiFT-IR adjusts its spatial frequency response during the image matching steps to maximize a signal-to-noise measure that is the main indicator of alignment quality. This alignment signal is more robust to variations in biological content and typical data distortions than either phase-only or standard Pearson correlations, allowing more precise alignment and statistical confidence. These improvements in turn enable an iterative registration procedure based on projections through multiple sections rather than the more typical adjacent-pair matchings. The projection model approach iteratively drives the alignment into a smooth form with no separate spring tensioning or other smoothing steps. The presentation will show the resulting high-quality deformable registration over the entire zebrafish dataset at all scales, from full anatomy down to resolution-limited detail. We describe the SWiFT-IR processing steps along with potential failure modes, the remaining role of human-in-the-loop oversight, techniques for handling difficult areas, and future enhancements.
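As background, the sketch below shows generic Fourier-domain matching with full spectral whitening (phase correlation) to recover a translation between two sections. SWiFT-IR's adaptive frequency weighting is considerably more sophisticated, so this should be read only as an illustration of the underlying frequency-domain idea, not the authors' method.

```python
# Translation estimate via whitened (phase) correlation between two images.
import numpy as np

def whitened_shift(fixed, moving, eps=1e-8):
    F = np.fft.fft2(fixed)
    M = np.fft.fft2(moving)
    cross = F * np.conj(M)
    cross /= np.abs(cross) + eps                 # whiten: keep phase, drop magnitude
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peak coordinates into signed shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    section = rng.random((256, 256))
    shifted = np.roll(section, shift=(7, -12), axis=(0, 1))
    print(whitened_shift(shifted, section))      # expect (7, -12)
```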



Paper 26

Scaling Analytics for Scientific Images from Experimental Instruments

Daniela Ushizima*, Lawrence Berkeley National Laboratory; Alexander Hexemer, Lawrence Berkeley National Laboratory; Dilworth Parkinson, Lawrence Berkeley National Laboratory; Chao Yang, Lawrence Berkeley National Laboratory

Recent estimates for daily data generation are around 2.5 quintillion (10^18) bytes. As an example, the instrument upgrades at Department of Energy (DOE) facilities, including accelerator and detector technology, are yielding an unprecedented increase in the quantity and complexity of raw data, especially in multi-domain experimental facilities such as synchrotrons or neutron sources. The variety of equipment enables thousands of researchers to pursue investigations across a wide range of areas, from archeology and biology to physics and nano-science. While the science is diverse, the underlying image analyses required to measure fundamental quantities about the experiments follow a relatively small set of patterns. These patterns or motifs consist of statistical schemes and data formats which, when incorporated into emerging machine learning (ML) algorithms, can leverage large repositories of curated data and enable discovery at scientifically relevant solution spaces. Our presentation will describe computational motifs using ML algorithms and their application to image-centric data, including numerical schemes for automated characterization of materials components. Our use-cases include records of high-resolution data coming from scientific experiments, particularly those reliant on advanced instruments and simulations. We will address our three main research and development activities: 1) Data sources: by coupling Observational, Simulation, and User-interaction data, we are able to retrieve more relevant outcomes, even when using vast amounts of multimodal experimental records; 2) Software tools: we have deployed ML algorithms using deep learning, e.g., convolutional neural networks for conventional and heterogeneous architectures, such as the low-power-consumption IBM TrueNorth neuromorphic chip; 3) Use-cases: we have analyzed large datasets of x-ray scattering data (HipGISAXS), running on massively parallel machines, and, more generally, performed analysis of image-based experiments for quantifying material composition and structures (e.g. QuantCT/IDEAL), exploring graph-based methods. In addition, we will show how we have extended algorithms from materials science applications to human cell inspection and classification (e.g. SPVD), including the development of searchable databases, and exploratory tasks in remote collaborations using Azure. Finally, we will discuss how these images across domains can be processed using similar tools and/or motifs, and how higher levels of concurrency promoted by new computing systems may provide online feedback to steer experiments and collect more information from data.



Paper 27

Computer Aided Diagnostic System for Automatic Detection of Brain Tumor through MRI Using Clustering Based Segmentation Technique and SVM Classifier

Asim Ali Khan*, Sant Longowal Institute of Engineering And Technology

Due to the acquisition of huge numbers of brain tumor magnetic resonance images (MRI) in the clinics, it is very difficult for radiologists to manually interpret and segment these images within a reasonable span of time. Computer-aided diagnosis (CAD) systems increase the diagnostic abilities of radiologists and reduce the time required for an accurate diagnosis. An intelligent computer-aided technique is proposed in this paper for automatic detection of brain tumors from MR images. The proposed technique uses the following computational methods: K-means clustering for segmentation of the brain tumor from other brain parts, extraction of features from this segmented brain tumor portion using gray level co-occurrence matrices (GLCM), and a support vector machine (SVM) to classify input MRI images as normal or abnormal. The work is carried out on 64 images consisting of 22 normal images and 42 images with brain tumors (benign and malignant). The overall classification accuracy using this method is found to be 99.28%, which is significantly good.
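The three-stage pipeline named above (K-means segmentation, GLCM texture features, SVM classification) is sketched below on synthetic patches rather than clinical MRI; the cluster count, GLCM parameters, and "brightest cluster is the lesion" assumption are illustrative, and older scikit-image versions spell the GLCM functions greycomatrix/greycoprops.

```python
# K-means segmentation -> GLCM texture features -> SVM label, on synthetic patches.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch_u8):
    glcm = graycomatrix(patch_u8, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy", "correlation")]

def segment_and_describe(image_u8):
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(image_u8.reshape(-1, 1))
    bright = np.argmax(km.cluster_centers_.ravel())      # assume the "lesion" cluster is brightest
    mask = km.labels_.reshape(image_u8.shape) == bright
    region = np.where(mask, image_u8, 0).astype(np.uint8)
    return glcm_features(region)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    def fake_scan(with_blob):
        img = rng.integers(20, 80, (64, 64)).astype(np.uint8)
        if with_blob:
            img[20:40, 20:40] = rng.integers(180, 255, (20, 20))   # bright synthetic "lesion"
        return img
    X = [segment_and_describe(fake_scan(w)) for w in [True] * 20 + [False] * 20]
    y = [1] * 20 + [0] * 20
    clf = SVC(kernel="rbf").fit(X, y)
    print("training accuracy:", clf.score(X, y))
```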


Paper 29

Interactive Image Segmentation on Cloud Based User Friendly Visualization Tool

Urmila Sampathkumar, University of Missouri-Columbia; Sachin Meena, University of Missouri-Columbia; V. B. Surya Prasath*, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri

User-driven interactive image segmentation provides a flexible framework for obtaining supervised object detection in digital images. We have recently proposed a novel user-seed-point-based interactive segmentation approach with spline interpolation. In this work, we combine the interactive segmentation method with Firefly, a web-based tool for visualization, segmentation, and tracking. Firefly is based on a client-server model written in Flex and PHP, which uses a MySQL database for storing the annotations. It provides automatic analysis of segmentation results by executing MATLAB scripts in the background. We present use cases in segmenting natural and biomedical imagery, requiring minimal effort and no storage on the user side.



Paper 30

Multi-Scale Transformation Based Image Super-Resolution

Soundararajan Ezekiel*, IUP

Super-resolution imaging is a technique that can be used to construct high-resolution imagery from low-resolution images. Low-resolution images are often of the same scene but contain aliasing and different sub-pixel shifts, which when combined can increase high-frequency content while removing blurring and artifacts. Super-resolution reconstruction techniques include methods such as the Nonuniform Interpolation Approach (NIA) and the Frequency Domain Approach (FDA). The NIA and FDA methods make use of aliasing in low-resolution images as well as the shifting property of the Fourier transform. Problems arise with both approaches, such as the use of blurred images, which leads to non-optimal reconstructions. Many methods of super-resolution imaging use the Fourier transform or wavelets as transformations, but other multi-scale transformations such as the contourlet or bandelet have not been well investigated for super-resolution. In this paper, we propose a multi-scale based super-resolution technique for use in generating higher resolution imagery. The proposed algorithm makes use of multi-scale techniques as an alternative to the Fourier transform or wavelets by performing a super-resolution reconstruction on each sub-band level. We evaluate the performance and validity of our algorithm using several metrics, including the Spearman Rank Order Correlation Coefficient (SROCC), Pearson's Linear Correlation Coefficient (PLCC), the Structural Similarity Index Metric (SSIM), Root Mean Square Error (RMSE), and Peak Signal-to-Noise Ratio (PSNR). Initial results are promising, indicating that multi-scale methods produce a more robust high-resolution image than traditional methods.


Paper 33

Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features

JT Turner*, Knexus Research Corporation; Kalyan Gupta, Knexus Research Corp; Brendan Morris, UNLV; David Aha, Navy Center for Applied Research in AI, Naval Research Laboratory

Although recent advances in regional Convolutional Neural Networks (CNNs) enable them to outperform conventional techniques on standard object detection and classification tasks, their response time is still too slow for real-time performance. To address this issue, we propose a method for region proposal as an alternative to selective search, which is used in current state-of-the-art object detection algorithms. We evaluate our Keypoint Density-based Region Proposal (KDRP) approach and show that it speeds up detection and classification on fine-grained tasks by 100% relative to the existing selective search region proposal technique without compromising classification accuracy. KDRP makes the application of CNNs to real-time detection and classification feasible.



Paper 35

General purpose object counting algorithm using cascaded classifier and learning

Ricardo Fonseca*, Universidad Tecnica Federico Santa Maria; Werner Creixel, Universidad Técnica Federico Santa María

Image analysis is essential across a wide range of scientific areas, and most of them have one task in common: object identification and counting. In general this is tedious, and it is desirable to free scientists' attention for more demanding tasks. Thus automated counting algorithms have generated a lot of interest. This proposal identifies objects with similar features in a frame. The inputs are the image to be searched and a single appearance of the object being sought. The object is searched for using a sliding window of various sizes. A positive detection is given by a cascaded classifier that compares the windows against the object model using pixel comparisons. The object model is a collection of templates generated from scales and rotations of the first appearance, and new ones are added by a learning stage. We use an online learning algorithm responsible for improving detection performance, achieved by updating the object model with new object appearances. The counting algorithm is intended for general-purpose use, so the detection depends only on a threshold value fixed by the user or by the application. The algorithm is capable of handling changes in scale, in-plane rotation, illumination, partial occlusion, and background clutter. This approach could be integrated into different applications, because detecting all object occurrences in an image is a first step toward evaluating an object's properties, e.g., quality tests and failure detection. The proposed framework was tested on aerial images with highly cluttered backgrounds for identifying, geo-locating, and counting plants. Excellent results were achieved, suggesting this is a powerful tool for remote sensing image analysis with potential applications across a wide range of sciences that require image analysis.


Paper 36

Morphological Procedure for Mammogram Enhancement and Registration

Huda Alghaib*, Utah Valley University

Screening mammography often incorporates a computer aided diagnosis (CAD) scheme in its procedure to increase the detection rates of gradual changes in breast tissues. One method for detecting gradual changes in temporal mammograms is through the use of registration algorithms. Images of degraded resolution present an obstacle to accurate registration, however. The performance of registration algorithms and, hence, the performance of the CADs, are directly proportional to the quality of the input mammograms. In this paper, we present a novel approach for increasing the quality of the mammograms based on morphological operations and contrast enhancement prior to performing the registration. The registration algorithm is applied using a structural similarity (SSIM) index. Based on the radiologist’s evaluation, this work provided accurate results that were more helpful in detecting changes over time in the registered mammogram pair.



Paper 37

Interactive Spline Based Segmentation for Refining Globally Convex Active Contours

Sachin Meena, University of Missouri-Columbia; V. B. Surya Prasath*, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri

We study a globally convex active contour and elastic body spline (EBS) based interactive segmentation method. Automatic active contour methods often fail to obtain accurate object boundaries. We remedy this by incorporating sparse user-defined foreground-background seed points within the EBS interactive segmentation approach. By using a globally convex formulation of the active contours, we obtain a computationally efficient implementation using the split Bregman algorithm. Combined with interactive user feedback, we obtain efficient segmentation boundaries of objects with high precision. Experimental results on natural images and on polyp segmentation from capsule endoscopy show that we obtain better results than automatic and global segmentation approaches.


Paper 38

Computational Spectral Capture and Display

Hao Zhang, UC-Irvine; Yuqi Li, UC-Irvine; Gopi Meenakshisundaram, UC-Irvine; Aditi Majumder*, UC-Irvine

Spectroscopy, with current 2D sensors, is the study of the four-dimensional function F(x,y,t,l) that defines the intensity of light at the sensor location (x,y) at time t and at wavelength l. The goal of video-rate spectroscopy is to capture, process, and display this function at reasonable discretized resolutions of each of the parameters x, y, t, and l. Spectral imaging to date has been based on three core technologies: filter based, dispersion based, or interferometric. Filter- and dispersion-based spectral imaging is more general in nature, since these approaches try to recover the entire spectrum, as opposed to interferometric spectral imaging, which targets and captures a specific signature band. An efficient spectral imager can make interferometric spectral imaging redundant. Current spectral imagers use narrow, channel-independent spectral band primaries that have the fundamental advantage of simple and accurate spectral reconstruction via linear combination of independent spectral channels. However, general spectral imaging today suffers from two fundamental bottlenecks precisely because of the use of narrow band primaries: (a) light inefficiency and (b) inadequate spatial and/or temporal resolution. So, it cannot be used in darker or dynamic environments. Further, in terms of display, the use of narrow band primaries may create a large color gamut but not a spectrally accurate display. In other words, although a spectrally accurate display would alleviate the problems of color mismatches and observer metamerism, and could be used to create controlled spectral illumination in many applications, unfortunately such a display is non-existent today. Although narrow band primaries enable capture and reconstruction of high-frequency spectral data, most natural or manmade objects have relatively smooth spectra in the visible, NIR, and SWIR ranges. This brings into question the conventional wisdom of light-inefficient narrow band filters that deter capturing moving objects. We explore the design, limitations, and capabilities of capture and display algorithms and devices using a far smaller number of inter-dependent wide-band primaries. Wide band primaries inherently provide higher light efficiency and can reduce the required number of primaries, thereby addressing the bottlenecks of light throughput and spatial and temporal resolution. This research presents new concepts, techniques, and tools in computational dispersion-free wide-band spectroscopy, which include (a) selection of inter-dependent primaries to assure maximal information capture; (b) spectral reconstruction and reproduction methods to assure maximal accuracy and quality in capture and illumination, respectively; (c) reconstruction of different scene parameters (e.g., surface reflectance) using spectral illumination and capture in tandem; and (d) evaluating the performance and accuracy of this new spectroscopic imaging pipeline with respect to the traditional one that uses narrow band primaries. We show that such a spectroscopic pipeline can result in video-rate capture, reconstruction, and display of spectral data at high spatial resolution. Spectral data has many applications, such as detection of oil and minerals; historical manuscript imaging; geographical map creation; monitoring the health of crops, poultry, and meat; sorting good from bad in the food processing and pharmaceutical industries; surveillance against chemical hazards; astronomical imaging; and monitoring unique marine ecosystems, climate, and harmful emissions.
Current spectral imaging is a niche, expensive ($50,000) technology that operates at a low spatial resolution (0.25 megapixel), a low temporal resolution (a few minutes per frame), and a low dynamic range (1:1000). The proposed video-rate, light-efficient spectral sensor can have a tremendous impact in all of the aforementioned areas. A commodity spectrally accurate illumination source working in tandem with such a spectral sensor can open up several new ways for scene parameter reconstruction (e.g., surface reflectance). Finally, a light-efficient spectral sensor can augment existing sensor networks, enabling fusion of spectral data with regular 3D data for detection and high-resolution monitoring of amorphous objects (e.g., liquids, soot, fire) in a scene, contributing tremendously to scene acquisition and understanding.



Paper 39

Multi-feature fusion and PCA based approach for efficient human detection

Hussin Ragb*, University of Dayton; Vijayan Asari, University of Dayton

Detection of human beings in complex background environments is a great challenge in the area of computer vision. For such a difficult task, no single feature algorithm is usually rich enough to capture all the relevant information available in the image. To improve detection accuracy, we propose a new descriptor that fuses local phase information, image gradients, and texture features into a single descriptor, denoted fused phase, gradient and texture features (FPGT). The gradient and phase congruency concepts are used to capture shape features, and a center-symmetric local binary pattern (CSLBP) approach is used to capture the texture of the image. Fusing these complementary features yields the ability to localize a broad range of human structural information and different appearance details, which allows more robust detection performance. The proposed descriptor is formed by computing the phase congruency, the gradient, and the CSLBP value of each pixel with respect to its neighborhood. The histogram of oriented phase and the histogram of oriented gradients, in addition to the CSLBP histogram, are extracted for each local region. These histograms are concatenated to construct the FPGT descriptor. Principal component analysis (PCA) is performed to reduce the dimensionality of the resulting features. Several experiments were conducted to evaluate the detection performance of the proposed descriptor. A support vector machine (SVM) classifier is used in these experiments to classify the FPGT features. The results show that the proposed algorithm has better detection performance in comparison with state-of-the-art feature extraction methodologies.



Paper 40

Facial Expression Recognition Based on Video

Xin Song*, Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing; Hong Bao, Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing

In recent years, facial expression recognition based on the temporal characteristics of video has become a hot research topic. In this article, by analyzing changes in the feature data during real-time detection of facial expressions, and by combining temporal and spatial features to establish models, we put forward a more rapid and more efficient method to recognize facial expressions. First, a feature extraction method based on Bezier curves is adopted to extract the 2D feature points of each frame in the video stream. Then, we obtain temporal characteristic curves by connecting the feature points of each frame along the time axis. Finally, we use a nonlinear function to fit the changes in the temporal characteristic curves and establish models for each expression for classification and recognition. The results show that this method, which combines temporal and spatial changes for facial expression recognition, has the advantages of high speed and a high recognition rate.



Paper 41

Learning matte extraction in green screen images with classifiers and the back-propagation algorithm

Roberto Rosas-Romero*, UDLAP

Alpha matte extraction in green screen images requires user interaction to tune parameters in different pre-processing and post-processing stages to refine mattes. Various matting methods have been proposed to solve the matting problem. This paper considers the problem of automatically extracting a foreground element with its alpha matte (also known as pulling a matte) in green screen images by training a multi-layer perceptron (MLP) with the back-propagation algorithm. The classifier learns to identify green backgrounds, foreground object contours, and the corresponding alpha values for subsequent digital compositing. We developed our own dataset to train and test the MLP for alpha matte extraction. To speed up the generation of the training set, a second method for semi-automatic alpha matte extraction is proposed. Different experiments show that automatic matte extraction, based on the MLP, generates high-quality matting visually and it is also shown that results depend on the training and the architecture of the classifier. The proposed approach effectively handles alpha matte extraction under unsuitable conditions such as unevenly lit green backgrounds. To the best of our knowledge, this is the first effort that applies neural networks to the problem of alpha matte extraction.



Paper 44

Adding a Selective Multi-Scale and Color Feature to the LoFT Baseline

Noor Al-Shakarji*, University of Missouri Columbia; Filiz Bunyak, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri

Robust and accurate visual tracking is one of the most challenging computer vision problems. Given the diversity of videos to which tracking algorithms are applied, tracking an object properly with a robust approach is crucial. Because of fast changes in object appearance and illumination in most videos, many tracking algorithms fail to handle color information and scale estimation of the object. In this paper, new features, namely color information and a selective scale estimation for a multiscale Hessian edge detector, are added to the LoFT baseline to boost the tracker's fusion step and detect the object more accurately. Experiments are performed on VOT 2015 benchmark sequences with significant color and scale variations. The results show that the proposed approach significantly improves performance, by 6.06% in accuracy and 23.2% in robustness, compared to the LoFT baseline.



Paper 45

Auto-Calibration of Multi-Projector Systems on Arbitrary Shapes

Mahdi Abbaspour Tehrani, UC-Irvine; Aditi Majumder*, UC-Irvine; Gopi Meenakshisundaram, UC-Irvine

In this paper we present a new distributed technique to geometrically calibrate multiple casually aligned projectors on a fiducial-free arbitrary surface using multiple casually aligned cameras, where every point of the surface is seen by at least one camera. Using a multi-step method based on structured light patterns, we robustly estimate the display's 3D surface geometry, the cameras' extrinsic parameters, and the intrinsic and extrinsic parameters of the multiple projectors. The method uses a completely distributed approach in a loosely coupled, casually aligned general system, relying on cross-correlation and cross-validation of the device parameters and surface geometry to yield an accurate estimate of all the device parameters and the display shape. Auto-calibration of the projectors allows us to quickly recalibrate the system in the face of projector movements, by recalibrating only the moved projector and not the entire system. To the best of our knowledge, this is the first work that can achieve accurate geometric calibration of multiple projectors on an arbitrary surface without the constraint that every point on the surface be observed by two or more cameras. Thus, our work can enable easy deployment of large-scale augmented reality environments, playing a fundamental role in increasing their popularity in several applications like geospatial analysis, architectural lighting, cultural heritage restoration, theatrical lighting, training, simulation, and visualization.



Paper 47

Crossview Convolutional Networks

Nathan Jacobs*, University of Kentucky; Scott Workman, University of Kentucky; Menghua Zhai, University of Kentucky

Billions of geo-tagged ground-level images are available via social networks and Google StreetView. Recent work in Computer Vision has explored how these images could serve as a resource for understanding our world. However, most ground-level images are captured in cities and around famous landmarks; there are still very large geographic regions with few images. This leads to artifacts when estimating geospatial distributions. We propose to leverage satellite imagery, which has dense spatial coverage and increasingly high temporal frequency, to address this problem. This talk will introduce Crossview ConvNets (CCNNs), convolutional neural networks which fuse satellite and ground-level images to enable more accurate estimates than could be possible by using either modality in isolation.



Paper 48

Dynamic Eye Misalignment Retroversion System

Constantinos Glynos*, National Center of Computer Animation, Bournemouth University; Hammadi Nait-charif, University of Bournemouth

Strabismus is a medical term used to define eye misalignment conditions that prevent both eyes from focusing on the same target simultaneously. This is a disability that prevents the brain from successfully fusing the visual stimuli gathered from the left and right eyes, while also prohibiting the perception of depth. The purpose of treatment is to realign the "bad" (strabismic) eye so that it fixates on the same target as the "good" (correct) eye. According to specialists in the field of strabismus, the current medical treatment methods, such as surgery and vision therapy, have a low rate of treatment success. The most important constraints are the lack of diagnostic information and the inability of the eyes to converge on a non-straight-ahead target. This paper presents a novel system (Dynamic Eye Misalignment Retroversion System, DEMRS) that can dynamically adjust the light rays coming into each eye in order to stimulate both foveae, with the correct target, simultaneously. This system is to be used in diagnosis and treatment for children with strabismus under the age of eight and for patients of any age suffering from Convergence Insufficiency (CI) or constant diplopia. In order to test the DEMRS, we have created a simulated implementation of the system using a set of strabismus cases for a known target. The location of the fixation point ranges from 0.3 m to 6.0 m away in different directions, with the disparity distance assigned a constant value of 6 cm. At this stage, the system can successfully re-align the bifoveal light rays to the correct target.



Paper 49

Sparse Multi-Image Control for Remotely Sensed Planetary Imagery: The Apollo 15 Metric Camera

Jason Laura*, United States Geological Survey

Accurate control and registration of remotely sensed, non-Earth-targeting planetary imagery is both critical for science applications and challenging due to the lack of any, or sufficiently accurate, a priori positioning information. Applications of controlled imagery products include data mosaicing for the creation of base maps for scientific investigation, image registration as a precursor to dense stereo reconstruction to support missions such as landing site selection, and cross-instrument data fusion to support accuracy assessment and validation. Within the planetary science community, the currently leveraged control processes are human-time intensive, focusing on the pseudo-automated identification of image correspondences, the application of area-based matching techniques that assume an accurate sensor model, and finally block bundle adjustment. Significant challenges arise during this process due to complex viewing geometries, lighting conditions, and observation orbits that result in significant disparities in scale. This paper describes initial work in automated control of nadir-captured Apollo 15 Metric Camera data across multiple orbits of the Moon. We draw heavily from the computer vision literature for the identification of image correspondence information, focusing on robust feature detection (SIFT), outlier detection (a Maximum Likelihood Estimator of the fundamental matrix), and sub-pixel accuracy (the Ciratefi algorithm). A graph-based data adjacency method is presented that supports a reduction in bulk data processing by leveraging available a priori information and facilitates the identification of identical correspondences through multiple images. Finally, we explore the processing of the derived correspondence network in our block bundle adjustment software and propose potential iterative solutions for an integrated matching-bundle-matching approach ripe for deeper automation. Although this paper reports on analyses with Lunar imagery, the methodology may be extendable to similar imaging challenges for near-Earth objects of future economic interest (e.g., asteroids), as well as some Earth-based landscapes.
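The correspondence stage can be sketched with off-the-shelf tools as below: SIFT keypoints and descriptor matching with OpenCV, followed by fundamental-matrix estimation to reject outliers. RANSAC stands in here for the maximum-likelihood estimator mentioned in the abstract, the Ciratefi sub-pixel refinement is omitted, and the grayscale image pair is assumed to be supplied by the caller.

```python
# SIFT matching + fundamental-matrix outlier rejection (generic OpenCV sketch).
import cv2
import numpy as np

def matched_points(img1, img2):
    """Return inlier correspondences (pts1, pts2) and the fundamental matrix F."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe ratio test
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    keep = inlier_mask.ravel().astype(bool)
    return pts1[keep], pts2[keep], F
```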



Paper 51

Face Detection and Recognition based on Skin Segmentation and CNN

Dr. Nischal Verma, IIT Kanpur; Akhilesh Raj*, Indian Institute of Technology Kanpur; Soumay Gupta, Indian Institute of Technology Roorkee

Face detection and recognition is an active research area in the surveillance, biometric security, and computer vision communities. Several methods have been developed for face segmentation and recognition. Many of these are computationally time-consuming and expensive. This paper proposes another methodology for these tasks. For face detection, we use a conjunction of RGB and YCbCr color space bounds to segment skin pixels in the image. After noise removal, facial regions are identified by applying knowledge of the general geometric properties of human faces and a Haar cascade for eyes. The method is tested on the CMU database and is found to give accurate results with high detection rates. For recognition, we feed region proposals for faces into a Convolutional Neural Network for feature extraction and finally into a Softmax classifier for classification.
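An illustrative version of the skin-segmentation step is shown below, combining simple RGB and YCbCr bounds. The exact thresholds used in the paper are not given, so the ranges here are commonly cited approximations rather than the authors' values, and the subsequent geometric filtering, Haar eye cascade, and CNN stages are not shown.

```python
# Skin-pixel mask from joint RGB and YCbCr rules (approximate, illustrative bounds).
import cv2
import numpy as np

def skin_mask(bgr):
    b, g, r = cv2.split(bgr.astype(np.int32))
    # RGB rule: red dominant with sufficient spread between channels.
    rgb_rule = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (np.abs(r - g) > 15)
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    ycbcr_rule = (cr >= 135) & (cr <= 180) & (cb >= 85) & (cb <= 135)
    mask = (rgb_rule & ycbcr_rule).astype(np.uint8) * 255
    # Morphological opening removes isolated noise before region extraction.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

if __name__ == "__main__":
    frame = np.zeros((120, 120, 3), np.uint8)
    frame[30:90, 30:90] = (120, 140, 200)          # a skin-toned patch (BGR)
    print(int(skin_mask(frame).sum() / 255), "skin pixels detected")
```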


Paper 52

Hierarchical Autoassociative Polynomial Network (HAP Net) for Robust Face Recognition

Theus Aspiras*, University of Dayton; Vijayan Asari, University of Dayton

Neural network frameworks have been developed substantially to improve performance in the classification of faces. Networks such as convolutional neural networks (CNN) and extreme learning machines are reported in many research articles to be the best-performing networks for facial recognition and detection. Facial recognition methodologies require nonlinear classification capability with robustness against varying lighting, pose, and expression. Our new architecture, the hierarchical autoassociative polynomial network (HAP Net), can be applied to these different neural network frameworks to improve their effectiveness in facial recognition tasks. The network utilizes polynomial weighting and modularity concepts to strengthen the nonlinear capabilities of the neural network. The architecture also has the flexibility to be applied to various neural network architectures. The network has been applied to a simple feed-forward neural network for facial recognition and found to have low test error across many databases. Experiments have been performed on the UMIST, CMU AMP, and Essex Grimace databases to show the effectiveness of the proposed approach.



Paper 53

Uncertainty of Image Segmentation Algorithms for Forensic Attribution of Materials

Diane Oyen*, Los Alamos National Laboratory

The forensic analysis of nuclear materials relies on the characterization of irregular shapes in microscopy images with the overall goal to attribute the manufacturing process or location of origin of a given sample of material. As in all forensic problems, it is necessary to have a statistically reliable and robust method for quantifying uncertainty. The use of machine learning or artificially intelligent image processing algorithms, such as image segmentation and texture classification, has been limited in forensic analysis because of the difficulty of quantifying uncertainty. Yet, as the amount of image data grows rapidly, machine learning algorithms will increasingly play a critical role in helping forensic analysts to reliably characterize materials in a timely manner. We investigate the uncertainty of image segmentation and texture classification algorithms under varying microscope settings with respect to estimates of particle size, shape and texture of the original sample. In standard forensic analysis, a sample is imaged once using the best microscope settings as determined by technician expertise, but that practice does not offer any way to estimate uncertainty. To quantify uncertainty, we collect multiple images under varying microscopy settings of each of several samples of minerals. Using this data set, we measure the uncertainty of image segmentation tools and texture feature extractors, then quantify forensic features of interest and investigate the variability of results to develop statistically meaningful estimates of uncertainty due to the algorithm. This study contributes an important first step in quantifying uncertainty in machine learning image analysis to provide overall uncertainty quantification of forensic signatures for the attribution of materials.


Paper 54

Boosted Ringlet Features for Robust Object Tracking

Evan Krieger*, University of Dayton; Almabrok Essa, University of Dayton; Sidike Paheding, University of Dayton; Theus Aspiras, University of Dayton; Vijayan Asari, University of Dayton

Accurate and efficient object tracking is an important aspect of various security and surveillance applications. In object tracking solutions that utilize intensity-based histogram feature methods on wide area motion imagery (WAMI), there currently exist tracking challenges due to distortions of object structural information and pavement/background variations. The inclusion of structural target information, including edge features, in addition to the intensity features allows for more robust object tracking. To achieve this, we propose a feature extraction method that utilizes the Frei-Chen edge detector and Gaussian ringlet feature mapping. The Frei-Chen edge detector extracts edge, line, and mean features that can be used to represent the structural features of the target. Gaussian ringlet feature mapping is used to obtain rotationally invariant features that are robust to target and viewpoint rotation. These aspects are combined to create an efficient and robust tracking scheme. The proposed scheme is evaluated against state-of-the-art feature tracking methods using both temporal and spatial robustness metrics. The evaluations yield more accurate results for the proposed method on challenging WAMI sequences.



Paper 55

Vehicle-Pedestrian Dynamic Interaction through Tractography of Relative Movements and Articulated Pedestrian Pose Estimation

Rifat Mueid, Indiana University-Purdue University Indianapolis; Lauren Christopher*, Indiana University-Purdue University Indianapolis; Renran Tian, Indiana University-Purdue University Indianapolis

To design robust Pre-Collision Systems (PCS), we must develop new techniques that allow a better understanding of the vehicle-pedestrian dynamic relationship and that can predict future pedestrian movements. We have analyzed videos from the TASI (Transportation Active Safety Institute) 110-Car naturalistic driving dataset to extract two dynamic pedestrian semantic features. The dataset consists of videos recorded with forward-facing cameras from 110 cars over a year in all weather and illumination conditions. This paper focuses on potential-conflict situations where a collision may happen if no avoidance action is taken by the driver or the pedestrian. We have used 1000 such 15-second videos to find vehicle-pedestrian relative dynamic trajectories and the pose of pedestrians. An adaptive structural local appearance model and particle filter methods have been implemented to track the pedestrians. Along with that, computer vision techniques are integrated with 2D-to-3D projection geometry, taking advantage of a new automatically detected Focus of Expansion (FoE), and these are used to compute a time series of relative horizontal and lateral vehicle-pedestrian distances. We have obtained accurate tractography results for over 82% of the videos. For pose estimation, we have used a flexible mixture model, based on deep learning, for capturing co-occurrence between pedestrian body segments. Based on an existing single-frame human pose estimation model, we have implemented Kalman filtering with other new techniques to make stable stick-figure videos of the pedestrian dynamic motion. Using this, we have been able to reduce average frame-to-frame pixel offsets by 86% compared with the previous single-frame model. This dramatically improves the smoothness of the walking figure models. The paper enables new analysis of potential-conflict pedestrian cases with 2D tractography data and stick-figure pose representations of pedestrians, which provides significant insight into the vehicle-pedestrian dynamics that are critical for safe autonomous driving and transportation safety innovations.



Paper 56

Transfer learning for high resolution aerial image classification

Yilong Liang*, Rochester Institute of Technology; Sildomar Monteiro, Rochester Institute of Technology; Eli Saber, Rochester Institute of Technology

With developments in satellite and sensor technologies, high spatial resolution aerial images have become available. Classification of these images becomes an important step for many image understanding tasks, such as image retrieval and object detection. For example, in surveillance systems, image classification would help analysts focus on regions of interest in the images. In the computer vision community, convolutional neural networks (CNN) have been proposed for image classification and have been shown to achieve state-of-the-art performance on benchmark datasets. The great success of CNNs is partly due to the availability of large-scale datasets, such as ImageNet. However, publicly available high-resolution aerial image datasets are relatively small compared to ImageNet and, therefore, are not sufficient for training a deep neural network. In order to apply the CNN classification framework to aerial imagery, a possible approach is to use transfer learning, i.e., use a CNN model pre-trained on a larger dataset beforehand, instead of training a new CNN model. There are three major transfer learning approaches for applying CNNs to aerial image classification: 1) train a CNN model with a pre-defined architecture from scratch, 2) use a pre-trained CNN as it is, and 3) fine-tune the weights of some layers of a pre-trained CNN. In this paper, we apply these three approaches to classify aerial images using different architectures of pre-trained CNN models, including VGG, GoogLeNet, and AlexNet. In the first approach, we randomly initialize the weights of the three CNN architectures and train these weights from scratch using labeled data. In the second approach, we fix the weights of the three pre-trained CNNs and extract CNN features from the images. In the third approach, we retrain the weights of the fully connected layers of the neural network. Empirical results using AlexNet for the "Land Use Land Cover" dataset, which consists of 21 classes of images of size 256×256 pixels, demonstrate that the pre-trained CNN models perform better than conventional image classification using features such as bag of words (BoW). Additionally, removing dropout layers from pre-trained CNNs improves the performance. For example, the classification accuracy is 80% when the BoW feature is employed and increases to 90% (with dropout) and 93% (without dropout) when AlexNet is applied as a feature extractor. We plan to augment the image dataset with different operations on the original images to see if we can further improve classification performance. These operations, such as rotation and noise induction, leave the recognition task unchanged.
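The third transfer-learning option described above can be sketched in a few lines with PyTorch: freeze the convolutional layers of a pre-trained AlexNet and replace the final classification layer. This is a generic illustration, not the authors' training setup; it assumes the torchvision 0.13+ weights API, and the 21-class output simply mirrors the Land Use Land Cover class count.

```python
# Fine-tuning sketch: frozen AlexNet convolutional features, new 21-class head.
import torch
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes=21):
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    for param in model.features.parameters():          # freeze convolutional layers
        param.requires_grad = False
    model.classifier[6] = nn.Linear(4096, num_classes) # replace final fully connected layer
    return model

if __name__ == "__main__":
    model = build_finetune_model()
    dummy = torch.randn(1, 3, 224, 224)
    print(model(dummy).shape)        # torch.Size([1, 21])
```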


Paper 57

On the utility of temporal data cubes for analyzing land dynamics

Peter Doucette*, USGS

The analytic power of temporal data cubes (i.e., image data time series) is widely recognized for numerous remote sensing applications. Furthermore, the proliferation of free and open, global-scale image data sets at moderate spatial resolutions (e.g., Landsat, MODIS, Sentinel, ASTER) has spurred considerable innovation in data cube applications in recent years. As the study of interconnected Earth processes has expanded over larger geographic regions and longer time periods (e.g., land and climate change interactions), conventional (desktop) data cube processing has become computationally impractical. Hence the movement toward big-data processing architectures, particularly with the advent of cost-effective cloud storage and computing. With regard to data cube structure, key considerations include the “analysis-readiness” level of the data layers, content scalability, and the output options offered to the user. In this paper, we provide perspectives on the utility of data cube implementations to facilitate monitoring and analysis of land change processes, and recommend strategies to address broad community interests. The central issues considered include: data calibration and validation; data provenance and curation; change monitoring and analysis methods; assessing user needs and the data interface experience; platform scalability and product options; uncertainty propagation and visualization; capacity to enable predictive analytics for decision making; and peer review as a means to ensure scientific integrity and adherence to user needs.



Paper 58

Video haze removal and Poisson blending based mini-mosaics for wide area motion imagery

Rumana Aktar, University of Missouri-Columbia; Urmila Sampathkumar, University of Missouri-Columbia; V. B. Surya Prasath*, University of Missouri-Columbia; Hadi Aliakbarpour, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri

In this work we consider haze removal from videos using the dark channel prior approach, with applications to wide area motion imagery (WAMI). Most haze-free, non-sky outdoor images contain at least one color channel with average intensity close to zero; this dark channel provides a good estimate of the haze transmission, which is used to remove haze from the image. By exploiting neighborhood consistency in the temporal direction of the video, we apply the haze removal algorithm sequentially. Once the haze-free images are produced, we register them to generate mini-mosaics and apply Poisson blending to remove seam effects from the mini-mosaics. Experimental results on synthetic image sequences and WAMI indicate that we obtain good haze-free and mosaicked results.
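
For readers unfamiliar with the dark channel prior, the following Python sketch shows a per-frame dark channel and transmission estimate; the patch size and omega value are typical choices from the literature, not necessarily those used in this work.

# Minimal dark channel and transmission estimate (dark channel prior sketch).
# Window size and omega are typical literature values, assumed here.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """img: HxWx3 float array in [0,1]; returns the per-pixel dark channel."""
    min_rgb = img.min(axis=2)                     # darkest color channel per pixel
    return minimum_filter(min_rgb, size=patch)    # local minimum over a patch

def transmission(img, atmosphere, omega=0.95, patch=15):
    """Estimate haze transmission t = 1 - omega * dark_channel(I / A).

    atmosphere: length-3 estimate of the atmospheric light per channel."""
    norm = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(norm, patch)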


Paper 59

STEAD: Structure Tensor Edge Detector with Anisotropic Diffusion

V. B. Surya Prasath*, University of Missouri-Columbia; Ke Gao, University of Missouri-Columbia; Kannappan Palaniappan, University of Missouri; Guna Seetharaman, US Naval Research Laboratory

The structure tensor is widely used for feature detection in the image and video processing literature. The structure tensor eigenvalues encode local image properties, and traditionally the tensor components are computed using Gaussian kernel filtering to make the derivative computations robust to noise. In this work, we consider using anisotropic diffusion filtering instead of isotropic Gaussian filtering for the integration scale to obtain robust edge detection results on natural images. Experimental results indicate that we obtain better edge detection results on a variety of image databases compared with traditional edge detectors. Our proposed edge detector performs better than others in terms of robustness to noise and accuracy of edge locations.
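
For reference, a minimal Python sketch of a structure-tensor edge response with the conventional Gaussian integration scale is shown below; the anisotropic diffusion filtering that the paper substitutes for this step is not reproduced here.

# Structure-tensor edge strength with Gaussian integration (baseline sketch).
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor_edges(img, sigma_d=1.0, sigma_i=2.0):
    """Return the larger structure-tensor eigenvalue as an edge strength map."""
    smoothed = gaussian_filter(img.astype(float), sigma_d)
    Ix = sobel(smoothed, axis=1)
    Iy = sobel(smoothed, axis=0)
    # Integrate the tensor components over a neighborhood (integration scale).
    Jxx = gaussian_filter(Ix * Ix, sigma_i)
    Jxy = gaussian_filter(Ix * Iy, sigma_i)
    Jyy = gaussian_filter(Iy * Iy, sigma_i)
    # Eigenvalues of the 2x2 tensor [[Jxx, Jxy], [Jxy, Jyy]].
    trace = Jxx + Jyy
    disc = np.sqrt(np.maximum((Jxx - Jyy) ** 2 + 4 * Jxy ** 2, 0.0))
    return 0.5 * (trace + disc)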



Paper 60

Hierarchical temporal memory for visual pattern recognition

Jianghao Shen*, George Washington University; Murray Loew, George Washington University

Visual pattern recognition has been an important aspect of artificial intelligence, as intelligence is closely related to understanding and interacting with the visual world. Traditional pattern recognition techniques, however, have several limitations. The first is invariant pattern recognition: where humans can recognize objects easily despite their variations (e.g., translation), it is difficult for the computer to classify them as the same thing. Second, in traditional techniques, the knowledge learned from one object cannot help in learning another new object, whereas humans can learn and express novel objects in terms of previously seen patterns. This research uses and extends a technique called Hierarchical Temporal Memory (HTM), a biologically inspired approach developed by Numenta. Aimed at mimicking the way the human brain learns visual patterns, the HTM structure addresses these limitations using two ideas. For invariant representation, HTM exploits time continuity: two patterns, even when spatially dissimilar, are taken to originate from the same object if they occur close together in time. Also, the assumption that an object in the world is composed of a hierarchical structure of building blocks enables the HTM to learn and express a novel object in terms of more basic, previously learned building blocks. We demonstrate this by applying the HTM to a sequence of continuous translations of a car. It shows good performance in recognizing the car at a variety of scales and translations, and the subcomponents learned from the car are reused to learn a new object (a house). Further, HTM exhibits faster learning and better generalization than traditional methods; specifically, HTM uses the basic building blocks learned from "cars" to learn "house", and achieves greater learning speed by constructing only new configurations of the building blocks. With these two advantages, HTM can also be used for anomaly detection. It will have a lower false-positive rate than the traditional template-matching method, since template matching will treat a novel translation of a learned object as anomalous, whereas the HTM, having learned invariant representations, correctly recognizes the object and does not flag it.


Paper 61

A Multiple View Stereo Benchmark for Satellite Imagery

Marc Ruiz*, JHU; Zach Kurtz, JHU; Myron Brown, JHU

The availability of public multiple view stereo benchmark datasets throughout the last decade has been instrumental in enabling research to advance the state of the art in the field and to apply and customize methods to real-world problems such as autonomous driving. Until now, no public multiple view stereo benchmark dataset has been available for satellite imaging applications. In this work, we describe a public benchmark data set for multiple view stereo applied to 3D outdoor scene mapping using commercial satellite imagery. This data set includes fifty DigitalGlobe images of a 100 square kilometer area. We also provide high-resolution airborne lidar ground truth data for a 20 square kilometer subset of this area, and performance analysis software to assess accuracy and completeness metrics. We report initial results from the top performing available solutions, discuss lessons learned, and encourage continued research by making this benchmark data available to the research community.
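
For reference, one plausible form of the accuracy and completeness metrics is sketched below in Python; the distance threshold and the exact definitions implemented in the benchmark's analysis software are assumptions here.

# Illustrative accuracy/completeness metrics for a reconstructed point cloud
# versus lidar ground truth; threshold and definitions are assumptions.
import numpy as np
from scipy.spatial import cKDTree

def accuracy_completeness(recon_pts, truth_pts, threshold=1.0):
    """recon_pts, truth_pts: (N, 3) arrays of 3D points in meters."""
    truth_tree = cKDTree(truth_pts)
    recon_tree = cKDTree(recon_pts)
    # Accuracy: median distance from reconstructed points to the ground truth.
    d_recon, _ = truth_tree.query(recon_pts)
    accuracy = float(np.median(d_recon))
    # Completeness: fraction of ground-truth points with a reconstructed
    # point within the threshold.
    d_truth, _ = recon_tree.query(truth_pts)
    completeness = float(np.mean(d_truth < threshold))
    return accuracy, completeness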



Paper 62

Deep Learning and Artificial Intelligence Completeness

Mihaela Quirk*, Quirk & Quirk Consulting

This talk describes and exemplifies strategies used in deep learning and relates them to classes of artificial intelligence-complete problems. Metrics are proposed for evaluating the reliability and accuracy of classifications. The talk concludes with a quest for classes of problems where deep learning reliably serves the communities that use it.



Paper 63

Convolutional Neural Network Application to Plant Detection, Based on Synthetic Imagery

Larry Pearlstein*, The College of New Jersey; Mun Kim, The College of New Jersey

Deep convolutional neural networks (DCNNs) have shown great value in approaching highly challenging problems in image classification and have even been applied to natural language processing for filtering large sets of textual data. Our work involves the application of a DCNN to the relatively simple task of discriminating between a small number of plant species based on images. We have studied the effects of the choice of CNN hyper-parameters on accuracy and training convergence behavior. In order to generate a large labeled set of interesting data, we have generated realistic synthetic imagery.

Since our problem is somewhat constrained, we were able to run thousands of training experiments and accurately estimate the probability density function of the convergence rate. We also consider the possibility of using decision-directed learning to evolve a network that exhibits extremely high accuracy on synthetic data into one that works well on unlabeled natural data.



Paper 64

Video Coding Layer Optimization For Target Detection

Larry Pearlstein*, The College of New Jersey; Alexander Aved, US Air Force Research Lab

A live-video database management system (LVDBMS) has been proposed for enhancing the effectiveness of video surveillance tasks, in which analysts can currently become overloaded by unimportant information. The LVDBMS provides automated event detection (AED), which can be used to provide alerts and call-outs and thereby draw the analysts' attention appropriately. One challenge in a video surveillance system is the data rate required to represent digital video. The raw data rate for a single stream of video is typically several Gbits/sec – far too high for carriage on most wireless links and challenging even for powerful cloud computing networks. Accordingly, the use of lossy video compression at a compression ratio of 100:1 or higher is an essential part of any distributed live-video system. Although very high compression ratios are frequently necessitated by system limitations, their use can introduce significant picture distortion. This distortion can interfere with a mission by confounding both human analysis and AED. Previous research has focused on joint optimization of the video coding layer where several streams share the same bandwidth. That work was intimately tied to the use of the traditional Group of Pictures (GOP) structure and focused on tailoring the bitrate and forward error correction for the specific case of detecting human subjects. Based on that framework, the researchers concluded that more bits should be allocated to streams in which more humans had recently been detected. The traditional GOP structure is not compatible with low-delay applications, such as on unmanned airborne vehicles. The current study uses Gradual Decoder Refresh rather than the traditional GOP to enable low delay, and similarly avoids the use of B frames, which necessitate frame reordering. We extend the previous work by providing ROC curves for the detection of object motion as a function of the quantization parameter. We also consider the H.265 video coding standard in addition to H.264.



Paper 65 WITHDRAWN

A Novel Computer Aided Detection (CAD) System: assessment and comparative study

Waleed Yousef*, Helwan University

A novel Computer Aided Detection (CAD) system for detecting breast cancer has been designed using thousands of digital mammograms that have been collected from several local medical institutions.

This system is assessed and compared to commercial systems approved by the Food and Drug Administration (FDA) in terms of the Free-Response Receiver Operating Characteristic (FROC). The system outperforms many versions and competes with the most accurate one. In addition, this article discusses the lack of a unified, standard method of CAD assessment and comparison. Almost every CAD, whether in the literature or in industry, uses its own method of assessment, including its own database, FROC true positive (TP) criterion, and other choices. Some of the FROC TP criteria are so naive that they artificially exhibit high accuracy. The lack of a unified method makes it very difficult to compare the accuracy of two CADs side by side if they are not designed and assessed in the same setup or benchmark. As a step toward standardizing CAD assessment and comparison, a new assessment methodology is proposed for the special case of so-called pixel-based (or featureless) CADs, i.e., those CADs that classify each pixel of the mammogram before giving a final decision.



Paper 66

Stream Implementation of the Flux Tensor Motion Flow Algorithm Using GStreamer and CUDA

Dardo Kleiner*, CIPS Corp. @ NRL; Kannappan Palaniappan, University of Missouri; Guna Seetharaman, US Naval Research Lab

The flux tensor motion flow algorithm is a versatile computer vision technique for robustly detecting moving objects in cluttered scenes. The flux tensor represents the temporal variations of the optical flow field within a local 3D spatiotemporal volume. In this paper we present a prototype implementation of the flux tensor based on the GStreamer multimedia framework that leverages the computational performance of NVIDIA CUDA to achieve real-time motion detection in ultra-high-definition video streams (i.e., UHD 4K at 60 fps) on GPU systems. GStreamer is a pipeline-based software framework that links together a wide variety of media processing elements to complete complex workflows. CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. We assert that the modular nature of the GStreamer framework (a) allows the algorithm to be represented naturally and declaratively and (b) allows easy incorporation of the algorithm into other video processing systems. Several GStreamer and CUDA concepts are explored, with an emphasis on those relevant to the flux tensor implementation. A thorough performance analysis is presented, with comparisons to other implementations. We further discuss tradeoff decisions made within our implementation, and conclude with avenues for future work and improvements.
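
For reference, a CPU-only Python sketch of one common formulation of the flux-tensor trace (the temporal variation of the spatial gradient, integrated over a local window) is given below; the derivative filters and window size are assumptions, and the GStreamer/CUDA pipeline described in the paper is not reproduced.

# CPU sketch of a flux-tensor trace used as a per-pixel motion-energy map.
# Filters and window size are assumptions; not the paper's GPU implementation.
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def flux_tensor_trace(frames, window=5):
    """frames: (T, H, W) float array; returns per-pixel motion energy per frame."""
    Ix = np.stack([sobel(f, axis=1) for f in frames])
    Iy = np.stack([sobel(f, axis=0) for f in frames])
    # Temporal derivatives of the spatial gradients (central differences in t).
    Ixt = np.gradient(Ix, axis=0)
    Iyt = np.gradient(Iy, axis=0)
    energy = Ixt ** 2 + Iyt ** 2
    # Average over a local spatial window (the spatiotemporal integration).
    return np.stack([uniform_filter(e, size=window) for e in energy])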


Paper 67

Compression induced image quality degradation in terms of NIIRS

Erik Blasch, AFRL; Hua-mei Chen, Intelligent Fusion Technology, Inc.; Zhonghai Wang, Intelligent Fusion Technology, Inc.; Genshe Chen*, Intelligent Fusion Technology, Inc.

Image compression is an important component of modern imaging systems as the size of the raw data collected grows. For example, a commercial Wide Area Motion Imagery (WAMI) frame is over 144 megapixels (12,000 × 12,000 pixels), and the next generation of WAMI data will be at the level of 1.6 gigapixels (40,000 × 40,000 pixels). Transmitting such large files is impractical without some form of compression. Lossless compression preserves all the information but has limited reduction power. Lossy compression, on the other hand, can achieve very high compression ratios but suffers from information loss. In this paper, we model compression-induced information loss in terms of the Video National Imagery Interpretability Rating Scale (VNIIRS). VNIIRS is a subjective quantification of dynamic image collections (NIIRS is its counterpart for static images) widely adopted by the Geographic Information System (GIS) community. Specifically, we present the Compression Degradation Image Function Index (CoDIFI) framework, which determines the NIIRS degradation (i.e., a decrease on the NIIRS rating scale) for a given compression setting. The CoDIFI-NIIRS framework enables a user to broker the maximum compression setting while maintaining a specified VNIIRS rating.



Paper 68

Building a Living Atlas in the Cloud to Analyze and Monitor Global Patterns

Daniela Moody*, Descartes Labs; Steven Brumby, Descartes Labs; Mike Warren, Descartes Labs; Sam Skillman, Descartes Labs

The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Historical, multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes per year of high-resolution imagery with daily global coverage. Cloud computing and storage, combined with recent advances in machine learning and open software, are enabling understanding of the world at an unprecedented scale and detail. We have assembled all available satellite imagery from the USGS Landsat, NASA MODIS, and ESA Sentinel programs, as well as commercial PlanetScope and RapidEye imagery, and have analyzed over 2.8 quadrillion multispectral pixels. We leveraged the commercial cloud to generate a tiled, spatio-temporal mosaic of the Earth for fast iteration and development of new algorithms combining analysis techniques from remote sensing, machine learning, and scalable compute infrastructure. Our data platform enables processing at petabyte-per-day rates using multi-source data to produce calibrated, georeferenced imagery stacks at desired points in time and space that can be used for pixel-level or global-scale analysis. We demonstrate our data platform capability by using the European Space Agency's published 2006 and 2009 GlobCover 20+ category label maps to train and test a Land Cover Land Use (LCLU) classifier, and generate current self-consistent LCLU maps in Brazil.


We train a standard classifier on 2006 GlobCover categories using temporal imagery stacks, and we validate our results on co-registered 2009 GlobCover LCLU maps and 2009 imagery. We then extend the derived LCLU model to current imagery stacks to generate an updated, in-season label map. Changes in LCLU labels can now be seamlessly monitored for a given location across the years in order to track, for example, cropland expansion, forest growth, and urban development. An example of change monitoring is illustrated in the accompanying image, which shows rainfed cropland change in the Mato Grosso region of Brazil between 2006 and 2009.
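
Schematically, the per-pixel land-cover training step can be pictured as in the Python sketch below; the actual Descartes Labs platform, features, and classifier are not public, so the random forest and reshaping here are illustrative assumptions.

# Toy per-pixel land-cover classifier trained on a temporal imagery stack with
# GlobCover labels; classifier choice and feature layout are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_lclu(stack_2006, labels_2006):
    """stack_2006: (T, B, H, W) temporal multispectral stack;
       labels_2006: (H, W) GlobCover class map co-registered to the stack."""
    T, B, H, W = stack_2006.shape
    X = stack_2006.reshape(T * B, H * W).T        # one feature vector per pixel
    y = labels_2006.ravel()
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X, y)
    return clf

def predict_lclu(clf, stack_current):
    """Apply the trained model to a new temporal stack to get a label map."""
    T, B, H, W = stack_current.shape
    X = stack_current.reshape(T * B, H * W).T
    return clf.predict(X).reshape(H, W)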



Paper 69

NIIRS Prediction Using In-scene Estimation of Parameters

John Irvine*, Draper; Charles McPherson, Draper; Joshua Nolan, Draper; David Phillips, Draper; Richard Wood, Draper

The National Imagery Interpretability Rating Scale (NIIRS) is a useful standard for intelligence, surveillance, and reconnaissance (ISR) missions. Accurate prediction of the NIIRS level of an image is a useful capability that supports collection management, sensor design and trade studies, and image chain analysis. The general image quality equation (GIQE) is a standard model that predicts the NIIRS of an image from sensor parameters such as the ground sample distance (GSD), the relative edge response (RER), and the signal-to-noise ratio (SNR). Typically, these parameters are associated with a specific sensor and the metadata corresponding to specific image operations. For many situations, however, these sensor metadata are not available and it is necessary to estimate the parameters from information available in the imagery itself. For images acquired in the visible region, we present methods for estimating the GSD, RER, and SNR through analysis of the scene, and demonstrate the incorporation of the estimated parameter values into NIIRS predictions using the GIQE.
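
For context, one commonly published form of GIQE version 4 is sketched below in Python; coefficients differ across GIQE versions, so this should be read as illustrative rather than as the equation used in the paper.

# One commonly published form of the General Image Quality Equation (GIQE 4);
# treat the coefficients as illustrative, not as the paper's exact model.
import math

def giqe4(gsd_inches, rer, snr, overshoot=1.0, noise_gain=1.0):
    """Predict NIIRS from ground sample distance (inches), RER, and SNR."""
    if rer >= 0.9:
        a, b = 3.32, 1.559
    else:
        a, b = 3.16, 2.817
    return (10.251
            - a * math.log10(gsd_inches)
            + b * math.log10(rer)
            - 0.656 * overshoot
            - 0.344 * noise_gain / snr)

# Example: giqe4(12.0, 0.92, 50.0) is roughly 5.9 under these assumed coefficients.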



Paper 70

Reflectance retrieval from spatial high resolution multi-spectral and hyper-spectral imagery using point clouds

Christoph Borel-Donohue*, ARL

In order to analyze multispectral and hyperspectral images produced by imaging remote sensing systems, it is necessary to accurately estimate, pixel by pixel, the illumination, which is a complex combination of direct and indirect components. Applications where this is important include surface albedo estimation, target/material identification, small target detection, and change detection. This process is well established for projected pixel dimensions of 6 meters or larger by taking the surface orientation into account and, possibly, reflections from adjacent objects such as nearby slopes in mountain valleys.

With today's proliferation of high spatial resolution imagery of complex urban scenes from satellites, and increasingly from unmanned aerial systems (UAS), a new approach is necessary. We are currently researching algorithms that can be applied to multi- and hyperspectral imagery with the aid of co-registered point clouds obtained from multi-view structure from motion.



Paper 71

Exploring Image-based Indicators of Crime and Economic Well-Being in Sub-Saharan Africa

Payden McBee*, Northeastern University and Draper

Crime and economic well-being are important factors to consider when assessing the stability of a region. Gaining insight into these factors can be expensive and perilous, depending on the accessibility of the country and the cooperation of the government. However, the abundance of satellite imagery provides an avenue for analyzing crime and economic indicators. Using social science theory, we examine the ability of physical structures and surroundings to indicate societal issues. Draper's previous studies explored the ability of satellite image features to assess indicators of governance and economic well-being in Afghanistan and Sub-Saharan Africa. That research provided promising insight into identifying feature sets from satellite imagery that are useful in predicting the economic and political landscape of a country. Past work also described an automated technique to quantify the amount of built-up area in a village. In this research, we extend the quantification to multiple types of built-up areas. This quantification adds another layer of description to our library of spatial characteristics. We then use these new feature sets to deepen and enhance the understanding of indicators of crime and economic well-being. We use survey data to assess the improvement in predictive power obtained with the additional feature sets.



Paper 72

Optical Flow on Degraded Imagery

Josh Harguess*, SSCPAC; Chris Barngrover, SSCPAC; Michael Reese, SSCPAC

Motion estimation from video is an increasingly important problem with applications in ego-motion estimation of unmanned vehicles, segmentation from video, object detection and tracking, and others. Recent advances in optical flow have made motion estimation possible in many applications with high-resolution imagery. However, in the presence of noise and compression artifacts, these state-of-the-art optical flow algorithms fail, in various ways, to recover the motion where they would otherwise succeed. We study the effects of adding noise and compression artifacts to a well-known optical flow dataset when processed by several state-of-the-art algorithms, and present compelling qualitative and quantitative results characterizing these degradations.
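
The degradation study can be pictured with the short Python/OpenCV sketch below, which adds Gaussian noise and JPEG artifacts to a frame pair and compares Farneback optical flow against a clean reference; the noise level, JPEG quality, and flow parameters are illustrative assumptions, not the paper's settings.

# Degrade a frame, recompute dense optical flow, and measure the flow error.
import cv2
import numpy as np

def degrade(gray, sigma=10.0, jpeg_quality=20):
    """Add Gaussian noise then JPEG-compress an 8-bit grayscale frame."""
    noisy = np.clip(gray.astype(float) + np.random.normal(0, sigma, gray.shape), 0, 255)
    ok, buf = cv2.imencode(".jpg", noisy.astype(np.uint8),
                           [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    return cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

def flow(f0, f1):
    """Dense Farneback optical flow between two 8-bit grayscale frames."""
    return cv2.calcOpticalFlowFarneback(f0, f1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

def mean_endpoint_error(flow_est, flow_ref):
    """Average per-pixel endpoint error between two (H, W, 2) flow fields."""
    return float(np.mean(np.linalg.norm(flow_est - flow_ref, axis=2)))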


Paper 73

Computationally Efficient Scene Categorization in Complex Dynamic Environments

Allison Mathis*, Army Research Laboratory

We propose a computationally efficient method for object classification in a dynamic scene, suitable for a mobile platform. Certain phenomena of interest, such as trees waving or people walking, produce repeatable and recognizable patterns when represented by an optic flow vector field. These patterns are invariant to a mobile observer’s viewpoint. In this paper we identify such patterns and demonstrate a computationally efficient framework to differentiate between dynamic and static background elements, mobile agents and ego-motion produced by the platform. When navigating through a dynamic environment, it is useful to have an understanding of the nature of the obstacles encountered, which may be used to determine which disturbances are benign and which ought to be avoided, without requiring burdensome analysis. We offer experimental validation of the proposed methodology.



Paper 74

Quality assessment of OCT retinal imagery in dry-AMD (age-related macular degeneration) patients

Nastaran Ghadar*, Draper; John Irvine, Draper; David O'Dowd, Draper; Kristie Lin, Retina Institute; Tom Chang, Retina Institute

One of the major issues with automatic analysis of optical coherence tomography (OCT) images is verifying that the automated software can reliably assess the input images. If the quality of an image is low, the software is not expected to analyze it precisely, and the likelihood of misclassification will therefore be higher regardless of the algorithm. Having a filter that assesses image quality before the images are processed by the software can eliminate the candidates that are likely to be incorrectly classified, so that the efficacy and prediction results of the software reflect the algorithm itself rather than low-quality inputs. We investigated several image quality metrics in the context of analyzing age-related macular degeneration (AMD) using OCT imagery. Results based on 25 patients indicate that quality analysis prior to processing the OCT data can improve classification performance. Adding quality estimation eliminates the candidates that are most likely to be misclassified.


Paper 75

Automated video quality detection using scene segmentation and mover detection

Andrew Kalukin*, National Geospatial-Intelligence Agency; Andrew Maltenfort, National Geospatial-Intelligence Agency; Josh Harguess, SSCPAC; John Irvine, Draper

Automated mensuration of objects in scenes can be used as a strategy for automating the assessment of video quality. The Video National Imagery Interpretability Rating Scale (VNIIRS) provides criteria for rating the quality of video according to the level of detail that humans can detect in moving objects. In this research, a commercial off-the-shelf software package for detecting moving objects in video is used to generate image frames in which boxes are drawn around moving objects. A supervised learning algorithm is applied for segmentation of video frames, and vehicles and people are measured using an automated algorithm to find the size of the annotation boxes from the mover detector. The measurements provide an estimate of the ground sampling distance of each video frame. This system of mensuration can be used to rate video quality based on in-scene information only, without relying on source metadata. A crowd-sourcing methodology for comparing the output with human analysis is demonstrated; the method is based on video tagging, and is similar to techniques used to annotate video for sporting events.
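
The in-scene GSD estimate can be illustrated with the small Python sketch below, which converts a mover-detection box length into meters per pixel using an assumed nominal object size; the object-size priors and the mapping to VNIIRS levels used in the paper are not reproduced.

# Back-of-the-envelope GSD estimate from one mover annotation box.
# The nominal object lengths are assumed priors, not values from the paper.
NOMINAL_LENGTH_M = {"person": 1.7, "vehicle": 4.5}

def estimate_gsd(box_length_px, object_class):
    """Approximate ground sampling distance (meters/pixel) from one box."""
    return NOMINAL_LENGTH_M[object_class] / float(box_length_px)

# e.g., a 4.5 m vehicle spanning 30 pixels implies a GSD of about 0.15 m/pixel.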



Paper 76

Face Detection at Multiple Infrared Bands

Zhan Yu*, Adobe; S. Susan Young, ARL; Jingyi Yu, University of Delaware

Face detection is a long-standing problem in image processing and computer vision. Tremendous efforts have been made on both the sensor and algorithm fronts to robustly handle variations in scale, location, orientation, facial expression, occlusion, etc. To date, the de facto gold standard Viola-Jones algorithm still performs poorly under low light conditions. In this paper, we explore the face detection problem in the infrared spectrum at different wavebands. On the data front, we gather and utilize the NVESD-ARL multimodal face database, which contains over 100,000 images acquired at three different wavelengths under various environments. On the algorithm front, we tailor the Viola-Jones algorithm to handle IR imagery at different wavebands. We further explore data dependency between the different IR wavebands. Our analysis provides useful guidance for developing future multi-spectrum sensors suitable for face detection tasks under low light conditions.
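
For orientation, a minimal OpenCV Viola-Jones detection call is sketched below in Python; the waveband-specific cascades trained on the NVESD-ARL database are not public, so the stock frontal-face cascade is only a stand-in.

# Minimal Viola-Jones detection on a single-channel frame with OpenCV.
# The bundled frontal-face cascade is a stand-in for waveband-specific cascades.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(ir_frame_gray):
    """ir_frame_gray: single-channel 8-bit image (e.g., a normalized IR band)."""
    return cascade.detectMultiScale(ir_frame_gray,
                                    scaleFactor=1.1, minNeighbors=5)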


Paper 77

Rapid Development of Scientific Virtual Reality Applications

Scott Sorensen*, University of Delaware; Chandra Kambhamettu, University of Delaware

Visualizing and working with large scientific image data can benefit greatly from advances in consumer Virtual Reality and game development tools, but so far only limited applications have leveraged these. We present a set of techniques for mapping scientific images and data manipulation tasks into Virtual Reality applications, with an emphasis on short development time and a high-quality user experience. Existing approaches often require large teams to develop applications that suffer in quality or require extensive development time. Using off-the-shelf Virtual Reality hardware and freely available development tools, we present techniques for building immersive and interactive data visualization and manipulation applications. We demonstrate these techniques with a series of Virtual Reality apps that deliver high-quality immersive experiences built in very short periods of time. These applications show large datasets and extracted high-level information geared toward image understanding. We illustrate that these techniques can be integrated into existing data acquisition, image processing, and machine learning workflows. With these techniques we hope to encourage future development of scientific applications for emerging Virtual Reality platforms.



Paper 78

Enhanced GroupSAC: Efficient and Reliable RanSAC scheme in the presence of depth variation

Kyung-Min Han*, University of Missouri; Guilherme DeSouza, University of Missouri

Random SAmple Consensus (RanSAC) is one of the most widely used techniques in computer vision. Nevertheless, RanSAC is known to return imperfect solutions that can sometimes be improved depending on the target application. Hence, many variations of the RanSAC algorithm have been, and continue to be, proposed. Another important drawback of RanSAC is its potentially slow processing speed. For example, when the putative matches are dominated by mismatches or contain large noise, RanSAC requires many iterations to find the best inlier set. When full RanSAC matching is needed for an entire image set, speeding it up is therefore a matter of great concern. In this paper, we propose a method to speed up RanSAC without detriment to the number of final inliers. Our method is reminiscent of GroupSAC [NiDel09]. However, our method offers a few other novelties, such as the way in which it performs grouping of putative matches and the selection of samples from those groups. We compared the performance of our method against GroupSAC over a large number of images in order to validate the potential of the proposed method.
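
For terminology, a baseline RanSAC homography fit over putative matches is sketched below using OpenCV; the grouping and sampling strategy of the proposed enhanced GroupSAC method is not reproduced here.

# Baseline RanSAC homography estimation from putative correspondences.
import cv2
import numpy as np

def ransac_homography(pts_src, pts_dst, reproj_thresh=3.0):
    """pts_src, pts_dst: (N, 2) arrays of putative correspondences."""
    src = np.asarray(pts_src, dtype=np.float32)
    dst = np.asarray(pts_dst, dtype=np.float32)
    H, mask = cv2.findHomography(src, dst, method=cv2.RANSAC,
                                 ransacReprojThreshold=reproj_thresh)
    inliers = int(mask.sum()) if mask is not None else 0
    return H, inliers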


Paper 79

Machine Vision Algorithms for Robust Pallet Engagement and Stacking

Charles Cohen*, Cybernet Systems Corporation

The Army has a critical need for the automation of logistics assets that reduces the cost of operations while increasing their efficiency and safety. These vehicles sense the world around them and react to the environment, thereby minimizing the need to modify the working environment. An important capability that needs to be addressed is pallet engagement: the ability to find cargo pallets, localize them well enough that autonomous tactical forklifts can safely engage and interact with them, and then manipulate those containers for stacking and unstacking operations. In this paper, we detail our work using the Light Capability Rough Terrain Forklift (LCRTF) as the autonomous platform for safely engaging Joint Modular Intermodal Containers using machine vision techniques. We will:
• detail the requirements for the system;
• discuss the issues involved with the pallet, the vehicle, and the environment;
• discuss the sensors used;
• detail the machine vision techniques that failed to support this task, and why;
• detail the current effective methods for machine vision based pallet engagement; and
• discuss future efforts.
During the talk, video of the system's performance will be shown. This work is sponsored by ARDEC-LRED (Armament Research, Development and Engineering Center – Logistics Research & Engineering Directorate), the Army Logistics Innovation Agency, and ARDEC-ASEC (Armament Software Engineering Center).



Paper 80

Real-time 3D object recognition and measurement applications with abstract based algorithm (Logistics, Transportation)

SungHo Moon*, CurvSurf

Advances in 3D laser-vision and automation technology enable more practical, human-like 3D object recognition. However, a core limitation has been the lack of a fully automated 3D object detection and parameter estimation algorithm. Over a decade of research and development at CurvSurf has produced a sophisticated mathematical ("abstract-based") solution that can provide human-like 3D perception with accurate dimensional measurements and facilitate much faster shape/pattern recognition. Real-time recognition and measurement of basic geometric primitives, such as cuboid, plane, sphere, cylinder, cone, and torus, provides a full 6-DoF pose (3D position and orientation) together with dimensional information (size, radius, volume). Potential applications include reverse engineering, augmented reality (AR), robot vision, logistics management, deep learning, collision avoidance, UAVs, autonomous cars, and any 3D object recognition application that needs human-like robust perception in real time. Implementing an abstract-based algorithm that processes 3D point clouds and other large, complex 3D data at significantly faster speeds has also been one of the key challenges in conjunction with deep learning.


Paper 81

Effective Dataset Augmentation for Pose and Lighting Invariant Face Recognition

Daniel Crispell*, Vision Systems, Inc.

The performance of modern face recognition systems is a function of the dataset on which they are trained. Most datasets are largely biased toward "near-frontal" views with benign lighting conditions, negatively affecting recognition performance on images that do not meet these criteria. The proposed approach demonstrates how synthetic images with simulated pose and lighting variations can be efficiently rendered and effectively leveraged to improve the performance of deep neural network-based recognition systems at training time, without modifying the structure of the network itself. Specifically, a fast and robust 3-D shape estimation and rendering pipeline that includes the full head and background is presented, along with various approaches for incorporating the renderings into the training procedure in order to extract maximum utility from them. Quantitative results are presented on the challenging IJB-A dataset, as well as individual examples demonstrating improved robustness under challenging conditions.


Cross-reference of Paper Number to Session

Paper No. Session Paper No. Session Paper No. Session

1   WP      28  TP      60  TP
2   TS      29  WS      61  RA
3   WS      30  TS      62  TS
5   WP      33  TS      63  TP
6   TA      35  TS      64  RP
7   TS      36  WS      66  TS
8   WS      38  WS      67  RP
9   TS      39  WS      69  RP
11  WS      40  WS      70  RA
12  WP      42  TP      71  WS
13  WP      44  TS      72  RP
14  WA      45  TS      73  WA
17  TS      47  TA      74  RP
18  TP      48  WS      75  RP
19  TP      49  WA      76  WS
20  TA      52  TP      77  RA
21  WS      53  WA      78  TS
22  TP      54  TS      79  TA
23  WP      55  WS      80  RA
24  WP      56  TP      81  RA
26  WA      57  WA
27  WS      58  TS

Key: T = Tuesday, W = Wednesday, R = Thursday, A = A.M., P = P.M., S = Poster
