Scene Understanding for Mobile Robots Exploiting Deep Learning Techniques


Scene Understanding for Mobile Robots exploiting Deep Learning Techniques

José Carlos Rangel Ortiz
Instituto Universitario de Investigación Informática

Thesis presented to qualify for the degree of Doctor from the University of Alicante, with the International Doctorate Mention. Doctoral Programme in Computer Science. Supervised by Dr. Miguel Ángel Cazorla Quevedo and Dr. Jesús Martínez Gómez. 2017.

The thesis presented in this document has been reviewed and approved for the INTERNATIONAL PhD HONOURABLE MENTION. I would like to thank the external reviewers for their advice on and contributions to this thesis: Professor Eduardo Mario Nebot (University of Sydney) and Professor Cristian Iván Pinzón Trejos (Universidad Tecnológica de Panamá).

This thesis is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). You are free to: Share — copy and redistribute the material in any medium or format; Adapt — remix, transform, and build upon the material. The licensor cannot revoke these freedoms as long as you follow the license terms. Under the following terms: Attribution — you must give appropriate credit, provide a link to the license, and indicate if changes were made; you may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. NonCommercial — you may not use the material for commercial purposes. ShareAlike — if you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Document by José Carlos Rangel Ortiz.

To my parents. To Tango, Tomás and Julia, in their memory.

I turn my gaze back and at times feel dread
when I cannot see the path that must return me to you...
Perhaps I would never have known I loved you so,
had Fate not decreed that I should cross the sea!...

Patria (fragment), Ricardo Miró

Acknowledgments

On completing this doctoral thesis and looking back almost four years to when I began, I realize how much has happened, and I am deeply grateful for the good fortune of having found wonderful people willing to give me the best of themselves to help me improve as a researcher and as a person; to them I will be eternally grateful. Among these people I must thank my supervisors, Miguel and Jesús, for their patience, for the conversations, for always having a moment, for clarifying things again and again whenever the code went where it should not, and for sharing their knowledge and experience of both life and the profession. Many thanks to you both for taking on this challenge with me and for always having a word of encouragement to keep going.

To the RoVit research group and the Department of Computer Science and Artificial Intelligence. To my lab colleagues Fran, Sergio, Félix, Ed, Vicente and Manolo for those chats about TV series and their good advice for moving forward. To the members of the DTIC who always treated me as one of their own, for all those meals and moments that made me forget how far I was from my homeland. To Cristina and Ismael at the UCLM in Albacete for opening the doors of their group to me and sharing their knowledge. To the Data Intelligence group of the Jean Monnet University in Saint-Étienne, France, where I had the opportunity to carry out my research stay; to Marc and Elissa for their guidance and advice; and to María, Leo and Dennis, with whom I shared so many good moments.
Merci beaucoup! Likewise, I thank the Doctoral School of the University of Alicante, which provided the resources to carry out this stay.

I cannot put into words my gratitude to my flatmates, Edgardo and Ed, my adoptive family, who had to put up with this amateur saxophonist flooding the flat on weekends with his sometimes out-of-tune notes; thank you for those cinema outings, dinners, holidays and trips, and for letting me change the subject and forget the frustrations of a paper. To my friends in Santiago, for whom distance was never an excuse to lose touch.

To three people who saw me begin this journey but did not have the chance to reach its end with me: my grandparents Tango and Tomás, and my aunt Julia. Thank you for your conversations, your care, and your example of how to live and always keep moving forward.

To my family in Panama for their unconditional support, and especially to you, Quenda, who through your actions taught me never to lose my smile or the hope of a better future. To my parents, who have always instilled in me the drive to move forward and who, from my earliest memories, have supported my decisions no matter their doubts or how strange those decisions seemed to them; without you, not a single letter of this document would have been possible. To my brother and sister-in-law who, despite the distance, were always ready to help me with whatever I needed. And to that little person who was there in every video call, José Andrés: thank you for making me laugh and for letting me watch you grow.

To the professors of the Facultad de Ingeniería de Sistemas y Computación of the Universidad Tecnológica de Panamá, and especially those of the Centro Regional de Veraguas, where I studied and who always encouraged me to improve. Finally, I wish to thank the Universidad Tecnológica de Panamá and IFARHU, the institutions that placed their trust in me to carry out these studies. And to the translation service of the University of Alicante, which has checked that the ideas expressed in this document follow the rules of the language in which they are written.

Alicante, 8 June 2017
José Carlos Rangel Ortiz

Abstract

Robots are becoming more common in society every day. Consequently, they must possess certain basic skills to interact with human beings and with their environment. One of these skills is the ability to understand the places through which they can move. Computer vision is one of the techniques most commonly used for this purpose, and current technologies in this field offer outstanding solutions applied to data of ever-improving quality, making it possible to produce more accurate analyses of an environment.

With this in mind, the main goal of this research is to develop and validate an efficient scene understanding method based on the presence of objects, a method capable of helping to solve scene identification problems in mobile robotics. We seek to analyze the most advanced state-of-the-art methods in order to find the one best suited to our goals, as well as to select the most suitable type of data for tackling this problem.
Another goal of this research is to determine the most suitable type of data to use as a source of information for analyzing scenes, in order to find an accurate representation of them through the use of semantic labels or feature descriptors for point clouds. As a secondary goal, we will show the benefits of using semantic descriptors generated with pre-trained models in scene mapping and classification processes, as well as the use of deep learning models together with 3D feature description procedures to build a 3D object classification model, which is directly related to the representation goal of this research.

The research described in this thesis was motivated by the need for a robust system capable of understanding the locations in which a robot normally interacts. Likewise, the advent of better computational resources has made it possible to implement techniques defined in the past that demand high computational capacity and that offer a possible solution for tackling scene understanding problems. One of these techniques is the Convolutional Neural Network (CNN). These networks can classify an image based on its visual appearance, generating a list of lexical labels and, for each label, a probability that represents how likely the presence of that object in the scene is. The labels are derived from the training sets the networks were trained to recognize. This list of labels and probabilities can therefore be used as an efficient representation of the environment, from which a semantic category can be assigned to the locations a mobile robot is able to navigate, while at the same time allowing a semantic or topological map to be built from this semantic representation of the place (a minimal code sketch of this idea follows the abstract).

After analyzing the state of the art in scene understanding, we identified a set of approaches for developing a robust scene understanding procedure. Among these approaches there is an almost unexplored gap concerning scene understanding based on the presence of objects. Consequently, we propose to carry out an experimental study based on this approach, aimed at finding a way to fully describe a scene by taking into account the objects found in it.

Since the scene understanding task involves object detection and annotation, one of the first steps will be to determine the type of data to be used as input for our proposal. With this in mind, our proposal considers evaluating the use of 3D data. This type of data suffers from the presence of noise in its representations, so we propose using the Growing Neural Gas (GNG) algorithm to reduce the effect of noise in object recognition procedures that use 3D data (a second sketch below illustrates GNG).
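The label-based scene representation described above can be made concrete with a short sketch. The following is a minimal illustration, not the thesis implementation: it assumes torchvision's ResNet-18 with ImageNet weights purely as a stand-in for whatever pre-trained CNN is available, and turns the network's output probabilities into a compact list of (label, probability) pairs that could be attached to a location in a semantic or topological map.

```python
# Minimal sketch (not the thesis code): use a pre-trained CNN's label
# probabilities as a compact semantic descriptor of a scene image.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Downloads ImageNet weights on first use; any pre-trained model would do.
weights = models.ResNet18_Weights.IMAGENET1K_V1
model = models.resnet18(weights=weights)
model.eval()
labels = weights.meta["categories"]

def scene_descriptor(image_path, top_k=5):
    """Return the top-k (label, probability) pairs for an image.

    This label/probability list is the kind of representation the
    abstract describes: a compact semantic summary of a place that a
    scene classifier or a mapping procedure could consume directly.
    """
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)            # shape: (1, 3, 224, 224)
    with torch.no_grad():
        probs = torch.softmax(model(x)[0], dim=0)
    top = torch.topk(probs, top_k)
    return [(labels[int(i)], float(p)) for p, i in zip(top.values, top.indices)]

# e.g. scene_descriptor("kitchen.jpg") might yield pairs such as
# [("microwave", 0.31), ("dishwasher", 0.18), ...] (hypothetical output).
```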
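Likewise, a compact sketch of the Growing Neural Gas algorithm (following Fritzke's 1995 formulation) illustrates how a noisy 3D point cloud can be summarized by a small graph of nodes: after fitting, the node positions act as a denoised, downsampled version of the cloud. All hyper-parameter values here are assumptions for illustration, not the settings used in the thesis, and isolated-node removal is omitted for brevity.

```python
# Illustrative Growing Neural Gas (Fritzke, 1995) for point-cloud filtering.
# A sketch with assumed hyper-parameters, not the thesis configuration.
import numpy as np

def gng_fit(points, max_nodes=100, n_iter=20000, eps_b=0.05, eps_n=0.006,
            age_max=50, lam=100, alpha=0.5, d=0.995, seed=0):
    rng = np.random.default_rng(seed)
    # Start with two random nodes; edges[(i, j)] stores the edge age.
    nodes = [points[rng.integers(len(points))].astype(float) for _ in range(2)]
    error = [0.0, 0.0]
    edges = {}

    def connect(i, j):
        edges[(min(i, j), max(i, j))] = 0   # new/refreshed edge has age 0

    for step in range(1, n_iter + 1):
        x = points[rng.integers(len(points))]
        dists = [np.sum((x - w) ** 2) for w in nodes]
        s1, s2 = np.argsort(dists)[:2]                # two nearest nodes

        for e in list(edges):                          # age winner's edges
            if s1 in e:
                edges[e] += 1
        error[s1] += dists[s1]
        nodes[s1] += eps_b * (x - nodes[s1])           # pull winner toward x
        for (i, j) in list(edges):
            if s1 in (i, j):
                n = j if i == s1 else i
                nodes[n] += eps_n * (x - nodes[n])     # pull neighbors slightly

        connect(s1, s2)                                # refresh winner-runner edge
        edges = {e: a for e, a in edges.items() if a <= age_max}

        # Every lam steps, insert a node between the highest-error node q
        # and its highest-error neighbor f.
        if step % lam == 0 and len(nodes) < max_nodes:
            q = int(np.argmax(error))
            nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
            if nbrs:
                f = max(nbrs, key=lambda n: error[n])
                nodes.append(0.5 * (nodes[q] + nodes[f]))
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
                r = len(nodes) - 1
                edges.pop((min(q, f), max(q, f)), None)
                connect(q, r)
                connect(f, r)

        error = [e * d for e in error]                 # decay all errors

    return np.array(nodes), list(edges)

# Usage sketch: the fitted node positions serve as a denoised,
# downsampled stand-in for the raw cloud in later recognition steps.
# cloud = np.loadtxt("noisy_cloud.xyz")    # hypothetical (N, 3) input
# filtered, graph = gng_fit(cloud)
```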