
Machine Vision and Industrial Automation: Accelerating Adoption in Industrial Automation

Jayavardhana Gubbi and Balamuralidhar P

IN BRIEF

Machine vision has harnessed deep learning admirably. Disruptive solutions are in the market for security and surveillance, autonomous vehicles, industrial inspection, digital twins, and augmented reality (AR). Historically, however, data acquisition within machine vision has been hamstrung by manual interpretation of captured image data. Fortunately, deep learning has generated data-driven models that can replace handcrafted preprocessing and feature extraction superbly. The future has already arrived: TCS, for instance, has not only delivered revolutionary deep learning-powered machine vision solutions, but is also using advanced software algorithms to design industry-grade spatial cognition-enabled computer vision applications.

A field established in the 1960s, machine vision sits where AI, imaging sensors, and image processing meet. Key technological developments in miniaturized cameras, embedded devices, and imaging sensors have historically enabled the adoption of machine vision in a multitude of applications—satellite image analysis and medical image interpretation, among others—across diverse domains.

But the game-changer today is the power and affordability of computing devices, including the advent of digital and mobile cameras. This has resulted in image processing spreading to other areas and in humongous volumes of real (as against synthetic) visual data becoming available for testing.

Yet it is only in the past 30 months or so that the machine vision industry has harnessed deep learning—with extremely good results. Machine vision is now making a massive impact in industrial applications, and this has brought disruptive industry-grade solutions to the market. Table 1 [1] sets out the key industry sectors in which computer vision has made inroads.

The key application trends bringing machine vision to the fore include security and surveillance, autonomous vehicles, industrial inspection, digital twins, and AR.


Fact File

TCS Research: Machine Vision
Outcomes: Vision for Drones, Warehouse Inventory Management, Insurance and Utilities use cases, Forestry, xVision
Principal Investigator: Jayavardhana Gubbi
Academic Partners: IISc, Bengaluru; IIIT Hyderabad
Techniques used: Machine Learning, Computer Vision
Industries benefited: Multiple industries where visual inspection and automation are an integral part
Patents: 27
Papers: 28 publications

Market Segment | Example Applications | Size of Market | Maturity of Solutions and API Availability | Current Market Uptake | Level of Sophistication Needed
Augmented reality (AR) and virtual reality (VR) | Consumer retail analytics | Extremely large | Early-stage availability | Improving | Low
Security and surveillance | Perimeter monitoring, theft detection | Large | Mature and available | High | Medium
Entertainment | Strategy in sports, intelligent advertisements | Small, niche | Early-stage availability | Low | Medium
Medical | Early disease detection, tumor detection, tracking, and gated therapy | Medium, niche | Mature | Medium | High
Automotive | Advanced driver assistance systems (ADAS), lane detection, and autonomous driving | Very large | Mature | Low | Very high
Robotics | Navigation and perception of the world | Very large, niche | Early-stage availability | Low | Very high

Table 1: Major sectors contributing to the computer vision boom

However, in most applications, the data acquisition process is saddled with manual interpretation of the captured image data. Variations in environmental conditions such as lighting or illumination, contrast, quality, resolution, noise, scale, pose (of the object), and occlusion (blocking of part of the field of view) necessitate human-based preprocessing and handcrafting for shadow removal, illumination normalization, and so on. These are among the most time-consuming tasks in enabling the machine to make holistic sense of the collected data.

This limitation becomes severe when one considers that modern digital cameras, which generate vast amounts of still and video-image data, together with wireless technology, which enables their easy deployment, make imperative the processing of zettabytes (trillions of gigabytes) of data. (In fact, this is the order of magnitude of the visual data already being collected by the Internet of Things (IoT) devices in smart cities!) Scale-, pose-, and occlusion-invariant image processing are still in their infancy.

Object detection is still the first phase in image processing, where an image is first preprocessed, followed by feature extraction (identification and tagging of image features, such as the shape, size, and color of areas within it) and evaluation. These evaluated features are then used in a machine learning framework for obtaining the final result. In addition to classification, other necessary image processing operations, such as image registration, image segmentation, and image stitching, are also addressed. For identification of non-rigid objects, deformable models are normally employed.

But the good news is that the emergence of deep learning has resulted in generating data-driven models that can replace handcrafting operations, such as preprocessing and feature extraction, with extremely good results. Further, the deep learning approach exploits an emerging area within AI—algorithm development for unsupervised machine learning—called generative adversarial networks (GANs), which allow data synthesis to cater to variations while enabling the design of more robust algorithms [2]. These also allow reuse of models in different applications, through transfer learning.

Let us look at some of the technologies businesses are employing today: pervasive AI, XR (augmented reality, virtual reality, mixed reality), and domain-based digital platforms. In these, more than 50% of the work involves computer vision as a fundamental horizontal step. For instance, autonomous vehicles and drones under AI Everywhere, augmented reality and human augmentation in XR, and digital twins in digital platforms all use computer vision at some stage of processing. Further, these technologies will be adopted over the near term as well as the long term, reflecting the complexity of the tasks that will have to be addressed in machine vision.
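Returning to the classical pipeline described above (preprocess, extract handcrafted features, classify), the following is a minimal sketch assuming OpenCV and scikit-learn; the file names and labels are hypothetical placeholders for illustration, not TCS code.

```python
# Minimal sketch of the classical pipeline: preprocess -> handcrafted
# features -> machine learning framework. File names and labels below
# are hypothetical placeholders.
import cv2
import numpy as np
from sklearn.svm import SVC

def extract_features(image_path):
    """Preprocess an image and compute handcrafted (HOG) features."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # drop color channels
    gray = cv2.equalizeHist(gray)                 # illumination normalization
    gray = cv2.resize(gray, (64, 128))            # fixed scale for the descriptor
    hog = cv2.HOGDescriptor()                     # histogram-of-oriented-gradients
    return hog.compute(gray).flatten()

# Hypothetical labeled images; real systems need thousands of annotated samples.
paths, labels = ["crate_01.png", "empty_01.png"], [1, 0]
X = np.array([extract_features(p) for p in paths])

clf = SVC(kernel="rbf")          # the machine learning stage of the pipeline
clf.fit(X, labels)
print(clf.predict(X[:1]))        # in practice, evaluate on held-out images
```

Every hand-chosen step above (equalization, rescaling, HOG) is exactly the kind of operation a deep network learns from data instead.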


Using drones for warehouse inventory monitoring

In million-plus Stock Keeping Unit (SKU) warehouses, taking an inventory count is an elaborate manual affair, where workers need to climb tall ladders to check labels. TCS has built a drone-based solution, which has been carrying out inventory counting at large (500,000–1,000,000 SKU) warehouses worldwide.

Designed and implemented by the Machine Vision Group and the Drones Incubation team at TCS Research, the solution programs commercial off-the-shelf drones to autonomously navigate 50-100 racks, each 20-50 ft. tall, across 400-500 ft. aisles.

The total time taken for a drone to complete such a mission is around 20 minutes. Inventory counting is over 98% accurate, which matches human accuracy but gets the job done more consistently and much faster.

Minimal manual intervention is required from a human inventory auditor, who just taps a tablet touchscreen for the drone to take off, fly, hover, record, report, and land. With a few more taps, the auditor specifies the number of racks and shelves to cover for a given mission. Each tablet app controls multiple drones simultaneously.

On mission completion, the drone finds its way back to its base station, lands safely, and powers off.

Each drone is programmed to make sense of the dimensions, floor plan, and layout of the warehouse. The drones use this data to calculate mission range. Inbuilt fail-safe features include obstacle avoidance (collision prevention with humans, inventory, or warehouse structural projections), gyrostabilization (when subjected to unexpected drafts of air or to rotor malfunction), nearest permissible emergency safe landing location determination, and systematic shutdown without data loss.
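As an illustration of how such fail-safes might be prioritized in software, here is a hedged sketch; the thresholds, the SensorFrame fields, and the action names are all assumptions for this example and do not describe TCS's actual flight controller.

```python
# A hedged sketch of fail-safe logic like that described above. All
# thresholds, field names, and actions are assumptions for illustration.
from dataclasses import dataclass

MIN_CLEARANCE_M = 1.5   # assumed obstacle-avoidance threshold
MAX_TILT_DEG = 25.0     # assumed attitude limit before stabilization kicks in
RESERVE_PCT = 20.0      # assumed battery reserve for a safe landing

@dataclass
class SensorFrame:
    obstacle_distance_m: float
    tilt_deg: float
    battery_pct: float

def failsafe_action(s: SensorFrame) -> str:
    """Map one sensor reading to a fail-safe action, mirroring the features above."""
    if s.obstacle_distance_m < MIN_CLEARANCE_M:
        return "hover_and_replan"           # collision prevention
    if abs(s.tilt_deg) > MAX_TILT_DEG:
        return "gyro_stabilize"             # unexpected drafts or rotor trouble
    if s.battery_pct < RESERVE_PCT:
        return "land_at_nearest_safe_spot"  # flush data, then systematic shutdown
    return "continue_mission"

print(failsafe_action(SensorFrame(0.8, 3.0, 76.0)))  # -> "hover_and_replan"
```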

On mission completion, the inventory data, hosted on a private cloud, becomes immediately retrievable. Integrating the drone-based solution with the warehouse management system—a one-time, 4-person-hour investment—enables report generation on the fly, with full data visibility.

The architecture of a machine vision system

A modern-day computer vision system comprises five modular architectural elements (Figure 1).

Image acquisition

This constitutes cameras mounted on drones, mobile phones, and other devices that map real-world 3D data to a 2D matrix. The cameras must be capable of recording across the entire electromagnetic spectrum (the visible 380–760 nm/400–790 terahertz range and the nonvisible 1 picometer–1,000 km/3 Hz–1 million terahertz range). Thus, depending on the application, image acquisition may contain LIDAR, hyperspectral, multispectral, thermal, and ultraviolet cameras, apart from visual cameras.
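The visible-band figures quoted above are consistent with the relation frequency = c / wavelength; this quick check (standard physics, not from the article) reproduces the roughly 400–790 terahertz range:

```python
# Sanity check: convert the visible wavelength band to frequency via nu = c / lambda.
C = 299_792_458.0  # speed of light in m/s

def nm_to_THz(wavelength_nm: float) -> float:
    return C / (wavelength_nm * 1e-9) / 1e12

print(f"{nm_to_THz(760):.0f} THz")  # red end, ~394 THz (the article rounds to 400)
print(f"{nm_to_THz(380):.0f} THz")  # violet end, ~789 THz (the article's 790)
```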


[Figure 1 shows five modules: Image Acquisition (visual, multispectral, hyperspectral, thermal); Data Store (centralized cloud, local store, edge store; data agnostic, edge analytics); Image Enhancement (filtering, illumination correction, shadow removal, super-resolution); Scene Understanding (object detection, object identification, object relationships); and Interpretation and Visualization (damage detection, change detection, semantic scene understanding).]

Figure 1: A bird’s-eye view of the modules in a Machine Vision system

Data storage and retrieval

A data-agnostic still- and video-image store is the next critical element. It should be capable of storing and archiving multimodal data (data captured via multiple, simultaneous imaging techniques) and the corresponding metadata available from the data acquisition devices. The store should also house algorithms for content- and location-based image retrieval at runtime, because machine vision systems typically require identifying the 3D and temporal coordinates of a given event. Therefore, tools that help data annotation through automation or human crowdsourcing (similar to Google Maps users labeling places) make the data store a complete platform.

Image enhancement

Low-level image processing operations (Photoshop, if you will)—filtering, contrast enhancement, illumination normalization, and more—are essential for further processing. Super-resolution image formation (increasing the resolution of the image) and other highly sophisticated image processing algorithms are in this module. The module should also condition the multimodal data inputted from the previous module, making it consumable by analytics frameworks. Geometric computer vision that can convert 2D images into 3D (the data transformation block in the module), for better interpretation and automatic navigation, also forms a part of this module. This will be key in creating 3D digital twins as well.

Scene understanding

The heart of the computer vision system is the analytics engine. Its content- and object-level analytics, which form the core analytics block, should include multiple approaches for segmentation, object detection (using feature extraction and deep learning algorithms to recognize objects), localization (identifying the x, y, and z coordinates of all the pixels comprising an object), super-pixel analysis (super-pixels are larger-than-pixel elements), and object relationships. The output is a domain-independent, uniform scene representation, for which graph theory provides abundant resources.

Data interpretation and visualization

Eventually, all the data should be provided to the user in a usable form. Therefore, irrespective of the application, effective information design algorithms with standard inferencing (decision-making) and semantic capability, including domain-dependent ontology, should become part of this final module. The module includes higher-level intelligence: change detection in scenes, diagnosis of faults in physical infrastructure, such as cracks, and visual metrology (precise measurement of digital image elements for decision-making). AR- and VR-enabled visualizations significantly raise effectiveness.
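To make the graph-based scene representation emitted by the scene understanding module concrete, here is a minimal sketch using networkx; the objects, coordinates, and relationship labels are invented for this warehouse-flavored example.

```python
# Minimal sketch of a graph-based, domain-independent scene representation.
# Objects, coordinates, and relationships below are invented examples.
import networkx as nx

scene = nx.DiGraph()
# Nodes: detected objects, carrying the localization output (object centroid x, y, z).
scene.add_node("pallet_12", label="pallet", xyz=(4.2, 7.1, 0.0))
scene.add_node("box_03", label="box", xyz=(4.2, 7.1, 1.3))
scene.add_node("rack_B2", label="rack", xyz=(4.0, 7.0, 0.0))

# Edges: spatial/semantic relationships from the object-relationship stage.
scene.add_edge("box_03", "pallet_12", relation="on_top_of")
scene.add_edge("pallet_12", "rack_B2", relation="stored_in")

# Downstream interpretation becomes a graph query, e.g. "what is stored in rack B2?"
for u, v, d in scene.in_edges("rack_B2", data=True):
    print(u, d["relation"], v)   # pallet_12 stored_in rack_B2
```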


The outlook on industrial machine vision solutions

Industrial machine vision focuses on niche applications for developing algorithms and systems for visual interpretation. For advanced industrial applications, spatial intelligence is the key. It helps understand the spatial and temporal properties of objects in the real world, providing better insight into events and object behavior.

Consider the example of inventory monitoring using drones in warehouses (see box). Indoor warehouses are GPS-denied environments that usually have poor lighting. (GPS works only with clear lines of sight between satellite/transmitter and object/receiver.) This environmental limitation is addressed through the visual navigation and object-of-interest identification features of the algorithms embedded in the drone hardware.

Most industrial machine vision challenges are addressed in three stages:

• Detecting objects and events in the surroundings with location data
• Using reasoning to build self-awareness (the ability of the drone to know its coordinates and those of objects around it and, consequently, how to behave) and situation assessment
• Building an AR platform and enterprise system integration for visualizing the results and actuating the next process

Thus, for spatial cognition-enabled industrial computer vision application development, advanced software algorithm research is underway along three themes:

Object detection and identification. Enabling the drone to understand its environment is the key to autonomous navigation. The input images should be conditioned for automatic self-enhancement. For this, machine learning-based object detection and identification, visual attention-based approaches, novel feature development, deep learning-based methods, shape analysis, image enhancement, background suppression, object tracking, event detection, super-resolution imaging, and compressive sensing (an efficient signal processing technique) must be addressed. This research is fairly advanced.

Robotic navigation and semantic SLAM. In GPS-denied environments, visual (or semantic) Simultaneous Localization and Mapping (SLAM), using monocular cameras, vision-based obstacle avoidance, structure from motion (estimating an object's 3D structure in a 2D scene by observing incremental changes during its movement), multiview analysis, and 3D scene reconstruction address navigation; a two-view sketch follows these themes. These benefit autonomous robotic inspection and monitoring. The uptake of this technology is in the medium term.

Cognitive architecture for contextual scene understanding. A holistic architecture based on cognitive principles is currently lacking in visual cognition. This theme therefore focuses on better understanding of the spatial properties of objects and events in the world through a novel cognitive architecture. Higher-level abstraction will be needed for computational photography, semantic video representation, affective computing, visual quality assessment, and quality-of-experience management. A 'first principles' approach to building visual cognitive systems can achieve this. The uptake of this technology is long term.
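As flagged under the robotic navigation theme, here is a hedged two-view structure-from-motion sketch using OpenCV's standard epipolar-geometry routines; the camera intrinsics K and the two frames are placeholders, and this is a generic textbook recipe rather than TCS's embedded implementation.

```python
# Two-view structure from motion: recover camera motion between frames
# from matched keypoints. K (3x3 intrinsics) and the frames are placeholders.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate rotation R and translation direction t between two views."""
    orb = cv2.ORB_create(2000)                 # fast binary keypoints
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:500]
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Epipolar geometry: essential matrix with RANSAC to reject outlier matches.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t   # chaining R, t per frame pair yields a visual-odometry track
```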


Data_Section_TCS-Checked0315.indd 31 29/01/1941 Saka 15:27 Vision Technology Area Example Scenarios Time for Uptake Involved

Fingerprint and iris Business and Detection, recognition, face Short and medium term Financial Services identification identification

Energy and Pipeline monitoring, Navigation, detection Medium and long term Resources rust identification

Detection, Surveillance, urban change Government identification, Short and medium term detection cognitive, tracking

Gaze tracking and emotion Detection, Hi-Tech recognition in consumer identification, Medium term electronics tracking

Abnormality detection, Detection identification, microscope imaging, Healthcare tracking, navigation, Medium and long term Medical imaging detection, cognitive infrastructure monitoring

Roof damage Navigation, Short, medium, and Insurance detection, structural detection, change long term changes detection

Detection, Life Microarray analysis, AR identification, data Medium term augmentation

Thermal data Manufacturing Defect detection Medium term analysis, change detection

Communication, Content retrieval, data Detection, segmentation, Media and Medium term gathering cognitive, navigation Information services

Planogram Detection, Short and Retail analysis, buyer identification, medium term sentiment analysis cognitive, navigation

GPU performance Detection, Short and Technology BU assessment identification medium term

Travel, Autonomous AR, detection, tracking, Short and Transportation and navigation, ADAS identification medium term Hospitality Navigation, Infrastructure Utilities identification, cognitive, Medium and long term damage detection change detection

Table 2: Ongoing and Potential Work on Machine Vision at TCS


The levels of granularity and accuracy—and, consequently, the algorithmic sophistication—vary across industries. In general, most robotics applications covering large areas need GPS navigation, with the option for augmentation using visual information. Thus, detection and identification are short-term needs for many applications across domains, although, in the long term, building human-like reasoning into robots is the goal.

Future directions

Computer vision and AI (and, consequently, machine learning and emerging deep learning approaches) are closely intertwined with advancements in machine vision and the cloud. However, the future of industrial computer vision applications lies in private clouds, which are expected to mature.

Another area that will open up diverse application scenarios is the embedded implementation of algorithms on general-purpose graphics processing units and field-programmable gate arrays.

Deep learning [3] can sidestep the entire feature learning process in computer vision. To increase its application viability, cutting-edge research is underway in the following areas:

• Capsule networks and hyperspectral modeling, which may have some answers for non-uniform, anomalous small-object detection [4]—for example, in the case of power lines, cracks, and rust in physical installations

• Shared representation learning for combining data from multiple sensors spatially

• Generative adversarial networks (GANs), to generate synthetic data that ensures deep networks are not fooled [5], and to help in generating human-like robust models

• Explainable AI, continuing the research on class activation maps, which will help understand the workings of deep learning models [6], enabling better outcomes (see the sketch after this list)

• Graphs within learning frameworks for better higher-level knowledge representation and for addressing challenges with spatiotemporal data such as videos, which could combine geometric computer vision with feature learning and deep learning, enabling better scene understanding and making it possible to create robots that can explore their environments

• Model compression and parameter pruning for implementing these systems on hardware, and picking the right data to enable processing at the edge, as these will fuel uptake of the emerging technology in multiple domains

• Cognitive active vision, rather than merely rule-based vision [7] (i.e., when computer vision systems try to get more contextual information from the surroundings, for instance, by making subtle movements), which will make cameras autonomous and enable truly autonomous robots

• Semantic scene understanding and reasoning [8], to take computer vision closer to natural language processing (NLP) and realize the tremendous potential of human-robot interaction

• Cross-application working pipeline creation for collaboration in domains, which will steer the current paradigm toward cognitive computing, enabling autonomous navigation and decision-making, which are imperative in robotics
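To illustrate the class activation map idea from the explainable AI bullet above [6], here is a minimal sketch assuming a torchvision ResNet-18; the random tensor stands in for a normalized 224x224 image batch.

```python
# Minimal class activation map (CAM) sketch for a ResNet-18 classifier.
# The random input below is a placeholder for a normalized image batch.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()

x = torch.randn(1, 3, 224, 224)  # placeholder input image
with torch.no_grad():
    # Forward through everything up to (but not including) global pooling.
    feats = torch.nn.Sequential(*list(model.children())[:-2])(x)  # (1, 512, 7, 7)
    cls = model(x).argmax(dim=1).item()           # predicted class index

# CAM: weight each final feature map by the classifier weight for that class.
w = model.fc.weight[cls]                          # (512,)
cam = F.relu(torch.einsum("c,chw->hw", w, feats[0]))
cam = cam / cam.max()                             # normalize to [0, 1]
print(cam.shape)  # 7x7 heatmap; upsample onto the image to show what drove the prediction
```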


Machine vision has highly complex tasks to address. But once solutions for the above technical challenges are available, many new categories of applications will emerge. Implementation of these advanced and complex algorithms on embedded devices will enable faster decision-making. Significant efforts continue worldwide on standardization, and with intellectual property and qualification being aggressively pursued, we look forward to continually accelerated adoption of the technology in consumer and industrial automation applications.

The future has a curious way of becoming the present before we know it. As the saying goes, the one who says it can't be done shouldn't interrupt the one doing it.

References

1. Computer Vision Technologies and Markets, Tractica, https://www.tractica.com/research/computer-vision-technologies-and-markets/, 2015.
2. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, Generative Adversarial Nets, NIPS 2014: 2672–2680, 2014.
3. Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Deep Learning, Nature 521: 436–444, 2015.
4. Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton, Dynamic Routing Between Capsules, NIPS 2017: 3859–3869, 2017.
5. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, Generative Adversarial Nets, NIPS 2014: 2672–2680, 2014.
6. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, Learning Deep Features for Discriminative Localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929, 2016.
7. John Aloimonos, Isaac Weiss, Amit Bandyopadhyay, Active Vision, International Journal of Computer Vision 1(4): 333–356, 1988.
8. A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, A Review on Deep Learning Techniques Applied to Semantic Segmentation, arXiv preprint arXiv:1704.06857, 2017.


Jayavardhana Gubbi

Jayavardhana Gubbi is a Senior Scientist at TCS Research and Innovation, coordinating the Machine Vision group, and a research lead for the Drones Innovation Program. His focus areas are deep learning and applied research, including the development of a workbench that enables rapid development and deployment of machine vision for multiple industry verticals. He has a bachelor's degree from Bangalore University (2000) and a Ph.D. from the University of Melbourne, Australia (2007). He has coauthored more than 75 peer-reviewed papers and has filed patents. In his spare time, Jay likes travelling and reading, particularly Kannada literature.

Balamuralidhar P

Balamuralidhar P is a Principal Scientist at TCS Research and Innovation and Head of TCS Innovation Lab, Bangalore. The major areas of his current research include different aspects of Cyber Physical Systems, Sensor Informatics & Internet of Things, Cognitive Computer Vision, Mobile Interactive Sensing and Model Driven Security. He has over 100 publications in various international journals and conferences and over 40 patent filings, out of which 20 have been granted. He is a senior member of the IEEE.

He obtained his Bachelor of Technology from Kerala University and his Master of Technology (MTech) from IIT Kanpur. His Ph.D. is from Aalborg University, Denmark, in the area of Cognitive Wireless Networks.


All content / information in Machine Vision and Industrial Automation is the exclusive property of Tata Consultancy Services Limited (TCS) and/or its licensors. This publication is made available to you for your personal, non-commercial use for reference purposes only; any other use of the work is strictly prohibited. Except as permitted under the Copyright law, this publication or any part or portion thereof may not be copied, modified, adapted, translated, reproduced, republished, uploaded, transmitted, posted, created as derivative work, sold, distributed or communicated in any form or by any means without prior written permission from TCS. Unauthorized use of the content/information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

TCS attempts to be as accurate as possible in providing information and insights through this publication, however, TCS and its licensors do not warrant that the content/information of this publication, including any information that can be accessed via QR codes, links, references or otherwise is accurate, adequate, complete, reliable, current, or error-free and expressly disclaim any warranty, express or implied, including but not limited to implied warranties of merchantability or fitness for a particular purpose. In no event shall TCS and/or its licensors be liable for any direct, indirect, punitive, incidental, special, consequential damages or any damages whatsoever including, without limitation, damages for loss of use, data or profits, arising out of or in any way connected with the use of this publication or any information contained herein.

©2019 Tata Consultancy Services Limited. All Rights Reserved.

Tata Consultancy Services (name and logo), TCS (name and logo), and related trade dress used in this publication are the trademarks or registered trademarks of TCS and its affiliates in India and other countries and may not be used without express written consent of TCS. All other trademarks used in this publication are property of their respective owners and are not associated with any of TCS’ products or services. Rather than put a trademark or registered trademark symbol after every occurrence of a trademarked name, names are used in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark.