A Multi-Modal Approach to Infer Image Affect


Ashok Sundaresan, Sugumar Murugesan, Sean Davis, Karthik Kappaganthu, ZhongYi Jin, Divya Jain, Anurag Maunder
Johnson Controls - Innovation Garage, Santa Clara, CA

arXiv:1803.05070v1 [cs.CV] 13 Mar 2018

ABSTRACT

The group affect or emotion in an image of people can be inferred by extracting features about both the people in the picture and the overall makeup of the scene. The state-of-the-art on this problem investigates a combination of facial features, scene extraction and even audio tonality. This paper combines three additional modalities, namely, human pose, text-based tagging and CNN extracted features / predictions. To the best of our knowledge, this is the first time all of the modalities were extracted using deep neural networks. We evaluate the performance of our approach against baselines and identify insights throughout this paper.

Our approach combines the top-down and bottom-up features. Due to the presence of multiple subjects in an image, a key problem that needs to be resolved is to combine individual human emotions to a group level emotion. The next step is to combine multiple group level modalities. We address the first problem by using a Bag-of-Visual-Words (BOV) based approach. Our BOV based approach comprises developing a code book using two well known clustering algorithms: k-means clustering and a Gaussian mixture model (GMM) based clustering. The code books from these clustering algorithms are used to fuse individual subjects' features to group image level features.

We evaluate the performance of individual features using 4 different classification algorithms, namely, random forests, extra trees, gradient boosted trees and SVM. Additionally, the predictions from these classifiers are used to create an ensemble of classifiers to obtain the final classification results. While developing the ensemble model, we also employ predictions from an individually trained convolutional neural network (CNN) model using the group level images.

KEYWORDS

Group affect, scene extraction, pose extraction, facial feature extraction, CENTRIST, feature aggregation, Gaussian mixture model, convolutional neural networks, deep learning, image captions

1 INTRODUCTION

Using machine learning to infer the emotion of a group of people in an image or video has the potential to enhance our understanding of social interactions and improve civil life. Using modern computer vision techniques, image retrieval can be made more intuitive, doctors will be able to better diagnose psychological disorders and security will be able to better respond to social unrest before it escalates to violence. Currently, even highly skilled professionals have a difficult time recognizing and categorizing the exact emotional state of a group. Consistent, automatic categorization of the affect of an image can only be achieved with the latest advances in machine learning coupled with precise feature extraction and fusion.

Inferring the overall emotion that an image elicits from a viewer requires an understanding of details from vastly differing scales in the image. Traditional approaches to this problem have focused on small-scale features, namely investigating the emotion of individual faces [13, 20, 33]. However, both large-scale features [8, 17] and additional action/scene recognition must be represented in the description of the image emotion [18, 25]. Each of these feature extraction methods, or modalities, cannot capture the full emotion of an image by itself.

Previous research has investigated a combination of facial features, scene extraction [17] and even audio tonality [27]. This paper combines three additional modalities with commonly used facial and scene modalities, namely, human pose, text-based tagging and CNN extracted features / predictions. To the best of our knowledge, this is the first time all of the modalities were extracted using deep neural networks, with the exception of the baseline CENsus TRansform hISTogram (CENTRIST) [31] model.

2 MODALITIES

2.1 Facial Feature Extraction

Facial features are extracted individually from isolated human images on an image-by-image basis. First, a Faster R-CNN [21] is employed to extract each of the full human frames from the image. These extracted frames are used for both the pose estimation and the facial feature extraction. Each face is then extracted and aligned with the frontal view using Deepface [29]. Once the faces are isolated and pre-processed, a deep residual network (ResNet) [14] is employed to extract facial features.

2.1.1 Human Frame Extraction. Isolation of each significant human frame is performed on the image using a Faster R-CNN [21]. The model is trained on the Pascal VOC 2007 data set [9] using the very deep VGG-16 model [26]. The procedure for building and training this model is presented in the original Faster R-CNN manuscript [21] and will not be detailed here.

2.1.2 Deepface. For extracting and aligning human faces in the images, Facebook's DeepFace [29] algorithm is utilized. The alignment is performed using explicit 3D facial modeling coupled with a 9-layer deep neural network. For more details on the implementation and training of the model, please see [29].

2.1.3 ResNet based feature extraction. The ResNet model is trained with the Face Emotion Recognition (FER13) dataset [13], which contains 35,887 aligned face images with a predefined training and validation set. Each face in FER13 is labeled with one of the following 7 emotions: anger, disgust, fear, happiness, sadness, surprise or neutral. To balance training time and efficacy, the ResNet-50 topology from [14] is implemented. The ResNet model is trained from scratch, with an initial learning rate of 0.1, a learning rate decay factor of 0.9 which decays every 2 epochs, and a batch size of 64. This network architecture is able to achieve a 62.8% top-1 accuracy on the FER13 validation set when predicting the emotion from an image. In lieu of a fully connected layer, ResNet-50 uses global average pooling as its penultimate layer. The feature set, which is a vector of 2048 elements, is the output from the global average pooling layer when running inference on the isolated faces from each image.

2.2 Centrist

The CENsus TRansform hISTogram (CENTRIST) [31] is a visual descriptor predominantly used to describe topological places and scenes in an image [4, 7, 32, 34]. The Census Transform, in essence, is calculated by comparing a pixel's gray-scale intensity with that of its eight neighboring pixels, encoding the results as an 8-bit binary codeword, and converting the codeword to decimal (Fig. 1). CENTRIST, in turn, is the histogram of the Census Transform calculated over various rectangular sections of the image via Spatial Pyramid techniques [16]. By construction, CENTRIST is expected to capture the high-level, global features of an image such as the background, the relative locations of the persons, etc. CENTRIST is adopted by the challenge organizers as the baseline algorithm for emotion detection. The baseline accuracy provided by the organizers is 52.97% on the Validation set and 53.62% on the Test set. To achieve these accuracies, a support vector regression model was trained using features extracted by CENTRIST. In our work, we use CENTRIST as a scene-level descriptor and build various modalities on top of it to extract complementary features from the image.

2.3 Human Pose

The intention of including the pose modality in this emotion detection task is to detect similar and dissimilar poses of the individuals in the image and capture any effect these poses may have on the group emotion.

2.4 Image Captions

2.4.1 ClarifAI. One of the modalities we used in our model is to generate captions for images and use the captions to infer emotion. As a first step towards that, we used the commercially available ClarifAI API [1] to answer the following preliminary questions:

• Whether reasonably accurate captions could be generated for group-level emotion images
• Whether captions are descriptive of underlying emotions

Fig. 2 illustrates the top four tags generated by Clarifai for some of the images in the Emotiw2017 Training Dataset. These results, along with those generated for several other images in the dataset, give us confidence that a deep learning model can be reasonably accurate in generating tags for images, thus answering the first question above. As for the second question, we refer to Fig. 3. Here, the distribution of top tags generated by Clarifai across the three emotion classes is plotted. Note that, for several tags such as 'administration', 'election' and 'competition', there is a notable difference in the prior probability of occurrence given each class. This could translate to discriminative probability of classes given these tags, assuming a uniform prior on the classes. This observation is even more pronounced for more obvious tags such as 'festival' and 'battle'. This answers the second question above, and motivates us to explore further.

2.4.2 im2txt. Having established that image tags can be of value in inferring emotions in a group-level image, we turned our attention to the more general 'image to caption' models. Towards this, we focused on the im2txt TensorFlow open-source model [24]. This model combines techniques from both computer vision and natural language processing to form a complete image captioning algorithm. More specifically, in machine translation between languages, a Recurrent Neural Network (RNN) is used to transform a sentence in the 'source language' into a vector representation [3, 5, 28]. This vector is then fed into a second RNN that generates a target sentence in the 'target language'. In [30], the authors derived from this approach and replaced the first RNN with a visual representation of the image produced by a deep Convolutional Neural Network (CNN) [23] up to the penultimate layer. This CNN was originally trained to detect objects, and the penultimate layer is used to feed the second RNN designed to produce phrases in the 'target language'.
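The Census Transform and CENTRIST descriptor described in Section 2.2 can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: the bit ordering and the >=-comparison convention are assumptions (conventions vary across CENTRIST implementations), and the spatial-pyramid pooling over rectangular sections is omitted for brevity.

```python
import numpy as np

def census_transform(img):
    """Census Transform: compare each interior pixel's gray-scale
    intensity with its eight neighbours, build an 8-bit codeword,
    and convert it to decimal. Convention assumed here: bit = 1
    when the centre is >= the neighbour, neighbours scanned
    left-to-right, top-to-bottom."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    ct = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets in scan order, excluding the centre itself
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        ct |= (centre >= neigh).astype(np.uint8) << (7 - bit)
    return ct

def centrist(img, bins=256):
    """CENTRIST for a single region: the normalised histogram of
    Census Transform values over that region."""
    ct = census_transform(img)
    hist, _ = np.histogram(ct, bins=bins, range=(0, bins))
    return hist / max(ct.size, 1)
```

A centre pixel brighter than all eight neighbours yields the codeword 11111111 = 255; one darker than all of them yields 0, matching the construction in Fig. 1 of the paper.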
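The BOV fusion step from the abstract — pooling a variable number of per-subject feature vectors into a single fixed-length group-level feature — can be sketched as below. This is a minimal illustration under assumed details: hard nearest-centre assignment against an already-fitted k-means codebook (the paper also uses a GMM-based variant), with toy low-dimensional features standing in for the 2048-element ResNet vectors.

```python
import numpy as np

def assign_to_codebook(face_features, codebook):
    """Hard-assign each per-face feature vector to the nearest
    codebook centre (Euclidean distance)."""
    feats = np.asarray(face_features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def group_bov_feature(face_features, codebook):
    """Fuse per-face features into one group-level feature:
    a normalised histogram of visual-word counts, whose length is
    fixed by the codebook size regardless of how many faces the
    image contains."""
    codebook = np.asarray(codebook, dtype=float)
    words = assign_to_codebook(face_features, codebook)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()
```

The fixed-length output is what allows images with different numbers of subjects to be fed to the same downstream classifiers (random forests, extra trees, gradient boosted trees, SVM).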