INFUSE COMPUTER VISION INTO YOUR APPS: BALANCING TECHNOLOGY, SKILLS AND INVESTMENT

SOFTWARE ARCHITECTURE CONFERENCE 2019
JAKARTA, 2-3 AUGUST 2019

Hello World!

My name is: Gito

Interests
• Core Banking Operations & Optimization
• Card Payment & EMV Standards
• UNIX System Programming
• Performance Engineering
• Deep Learning, Computer Vision

Favorite Tools
gcc, g++, python, dbx, gdb, valgrind, gprof, purify, make, tensorflow, darknet, vim, powerpoint

http://www.github.com/ngito
http://www.slideshare.net/ngito/

COMPUTER VISION

[Figure: crowd photo overlaid with a grid of cells numbered 1 to 12. Image source: https://www.runsociety.com/event/6th-edition-jakarta-marathon-2018/]

Estimated: 68 PEOPLE
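The slide does not say how the estimate was produced. As a minimal sketch, assuming an object detector has already returned a list of labelled detections (the `label` and `confidence` field names and the 0.5 threshold are hypothetical, not taken from the talk), a people count could be derived like this:

```python
from typing import Dict, List

# Hypothetical detector output: one dict per detected object.
Detection = Dict[str, object]


def count_people(detections: List[Detection], min_confidence: float = 0.5) -> int:
    """Count detections labelled "person" at or above a confidence threshold."""
    return sum(
        1
        for det in detections
        if det["label"] == "person" and float(det["confidence"]) >= min_confidence
    )


# Made-up sample detections, for illustration only.
sample = [
    {"label": "person", "confidence": 0.91},
    {"label": "person", "confidence": 0.42},   # below threshold, ignored
    {"label": "bicycle", "confidence": 0.88},  # not a person, ignored
    {"label": "person", "confidence": 0.67},
]
print(f"Estimated: {count_people(sample)} PEOPLE")
```

Lowering `min_confidence` counts more people but also admits more false positives, so the threshold is usually tuned per camera and scene.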

Computer Vision in Public Sector

Computer Vision

Making computers gain high-level understanding from digital images or videos.

It tries to achieve what the human visual system can do.

Computer Vision Applications

• Face Recognition
• Machine Vision
• Self Driving Car
• Traffic Monitor & Enforcement
• Activity Recognition

Computer Vision Applications | 3D reconstruction from multiple images | 3D | Artificial intelligence for video surveillance | Audio-visual speech recognition | Augmented reality-assisted surgery | Automated Lip Reading | Automated optical inspection | Automatic image annotation | Automatic number-plate recognition | Automatic target recognition | Check weigher | Closed-circuit television | Computer stereo vision | Content-based image retrieval | Contextual image classification | DARPA LAGR Program | Deepfake | Digital video fingerprinting | Document mosaicing | Fingerprint recognition | Free viewpoint television | Fyuse | GazoPa | Geometric feature learning | Gesture recognition | Image collection exploration | Image retrieval | Image-based modeling and rendering | Intelligent character recognition | Iris recognition | Machine translation of sign languages | Machine vision | Mobile mapping | Morphing | Object Co-segmentation | Object detection | Optical braille recognition | Optical character recognition | Pedestrian detection | People counter | Physical computing | Positional tracking | Red light camera | Reverse image search | Scale-invariant feature operator | Smart camera | This Person Does Not Exist | Traffic enforcement camera | Traffic-sign recognition | Vehicle infrastructure integration | Velocity Moments | Video content analysis | View synthesis | Applications of virtual reality | Visual sensor network | Visual temporal attention | Visual Word | Water remote sensing

Computer Vision Subsystems

• Image enhancement
• Transformations
• Filtering
• Visual recognition
• Pose estimation
• Color vision
• Registration
• Feature extraction

We will discuss Image Classification & Object Detection further.

TERMINOLOGIES: AI vs Machine Learning vs Deep Learning

If it is written in Python, it's probably machine learning.
If it is written in PowerPoint, it's probably AI.
https://twitter.com/matvelloso/status/1065778379612282885?lang=en

Definitions

AI
1. Science and engineering of making intelligent machines and computer programs
2. There are many definitions of AI, but as a philosophy, AI is defined as an unattainable future vision that ensures continuous improvement across multiple disciplines

Machine Learning
1. Technique for realizing AI
2. Enables machines to learn from the provided data and make accurate predictions
3. Implemented in multiple algorithms to solve different problems

Deep Learning
1. Subset and next evolution of (Supervised) Machine Learning
2. Inspired by the pattern processing found in the human brain
3. Implemented in Neural Networks to solve different problems
4. Triggers new chip designs to handle Deep Learning workloads

https://plato.stanford.edu/entries/artificial-intelligence/
http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html

AI Evolution

https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon

AI General Categories (non-exhaustive list)

Robotics: Mechanical, Electrical, Control, Kinematics, Motion
Expert Systems: Knowledge Base, Inference Engine, Reasoning, …
Speech: Speech to Text, Text to Speech
Vision: Classification, Object Detection, Instance Segmentation, Face Recognition, …
Language: Classification, Extraction, Understanding, Translation, …
Machine Learning: Supervised Learning, Unsupervised Learning, Reinforced Learning
Deep Learning: Convolutional Neural Network, Generative Adversarial Network, Recurrent Neural Network, …

Machine Learning & Deep Learning Characteristics (non-exhaustive list)

Characteristic | Machine Learning | Deep Learning
Data Source | Tabular | Unstructured & Complex Structure (Image, Video, Speech, Text)
Number of Features | Dozens - Hundreds | Millions
Feature Engineering | By domain expert | Automated with Feature Extraction
Hardware | General purpose CPU | Custom chip (GPU, FPGA, ASIC)
Training Time | Minutes - Hours | Hours - Weeks
Technique | Multiple ML Algorithms | Neural Network Architecture

https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

Major Shift to Deep Learning (non-exhaustive list)

Hardware advancements, in particular the use of GPUs as Deep Learning accelerators since 2012, have triggered and accelerated the conversion of some applications from Machine Learning to Deep Learning.

Machine Learning techniques and their Deep Learning counterparts:

Domain | Machine Learning | Deep Learning
Vision | 1. Viola Jones 2. Scale Invariant Feature Transform (SIFT) 3. Histogram of Oriented Gradients (HOG) 4. … | 1. Region Proposals (R-CNN & Fast R-CNN) 2. Single Shot Multibox Detector (SSD) 3. You Only Look Once (YOLO) 4. …
Speech | 1. Speech Synthesis TTS 2. Hidden Markov Model based STT 3. … | 1. WaveNet TTS 2. DeepSpeech STT 3. …
Translate | 1. Statistical Machine Translation (SMT) 2. … | 1. Neural Machine Translation (NMT) 2. …

https://medium.com/finc-engineering/cnn-do-we-need-to-go-deeper-afe1041e263e

Deep Learning Influencer (non-exhaustive list)

• Yann LeCun: Chief Scientist; LeNet, first neural network to …
• Li Fei-Fei: Stanford; ImageNet Database
• Andrew Ng: Stanford, Baidu, Google; brain behind Google Brain
• Alex Krizhevsky: Google; AlexNet, first neural network to run on GPU
• Joseph Redmon: YOLO, Real-time Object Detection

Deep Learning Revolution

2012, AlexNet (Object Classification): first Convolutional Neural Network (CNN) implementation utilizing GPU, and winner of the ImageNet object classification competition.

[Chart: ImageNet (ILSVRC) Competition Winner, error rate and number of neural network layers per year]

Winner (Year) | Error Rate | Number of Layers
XRCE (2011) | 25.81% | -
AlexNet (2012) | 15.30% | 8
ZFNet (2013) | 14.80% | 8
GoogleNet (2014) | 6.70% | 22
ResNet (2015) | 3.60% | 152

Trends: deeper neural network layers, lower error rates.

1. More layers
2. More training data
3. More computing power
4. More accuracy

Why CPU IS NOT ENOUGH

CPUs are designed for a wide variety of applications and to provide fast response times to a single task.

GPUs, by contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism.

Processor | CPU, Intel Xeon Gold | GPU, Nvidia Tesla V100
Cores | 22 | 5120 (FP32), 640 (Tensor Core)
Transistor size | 14 nm | 12 nm
RAM Type | DDR4 | HBM2
Strength | Few complex cores for general processing | Thousands of simpler cores for parallel processing

Deep Learning Triggers new Chip Design

INFUSING COMPUTER VISION INTO YOUR APPS

Computer Vision runs on top of Deep Learning Stack (non-exhaustive list)

Layer | General Software Stack | Deep Learning Stack
Applications | Web Application | Computer Vision, Speech, Language
Runtime | Java, .NET, Python, PHP, NodeJS | Keras, TensorFlow, Caffe, PyTorch, CNTK + Model
Middleware | Application Server | Accelerator Library (NVIDIA CUDA, Intel OpenVino)
OS | Linux / Windows | Accelerator Driver (NVIDIA Driver/Intel FPGA Driver)
Virtualization | Hypervisor / Container | Accelerator Virtualization (NVIDIA/AMD Virtual GPU, Intel Virtual FPGA)
Infrastructure | CPU, RAM, Storage, Network | Accelerator (NVIDIA/AMD GPU, Intel FPGA)

Software stack illustration when adding Deep Learning capability to any general software, On Premise & IaaS

Computer Vision High Level Integration (highly simplified)

Enterprise Application
Owner : Development Team
Process : Software Development Lifecycle (Waterfall, Agile, other)
Type : Web App, Mobile App, other

Computer Vision System
Owner : Data Science Team
Process : Machine Learning Lifecycle (OSEMN, CRISP-DM, TDSP)

[Diagram: Training & Test Dataset -> Training Engine -> Trained Model -> Inference Engine; Your Applications send Image / Video to the Inference Engine through an API]

Computer Vision Life Cycle

1. Training at Server
[Diagram: Training & Test Data, Initial Weight, Training Parameters -> Neural Network Training -> Trained Model, Model Parameters]

Training & Test Data
• Public Data Set: COCO, ImageNet, Pascal VOC, Research Dataset
• Custom Data Set: Image Collection, Privately Labelled

Initial Weight
• Image Classification: GoogleNet, ResNet
• Object Detection: R-CNN Based, SSD, YOLO
• Object Segmentation: Mask R-CNN

Training Parameters
• Learning Rate
• Epoch
• Batch size

Model Parameters
• Intersection over Union
• Average Loss
• Mean Avg Precision

2. Inference at Server / Edge
[Diagram: Input Image / Video, Trained Model -> Neural Network Inference -> Predictions, Model Parameter]

Deep Learning Framework
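Intersection over Union, one of the measures listed above, compares a predicted bounding box with a ground-truth box. A minimal sketch follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the box format and function name are illustrative choices, not something the talk specifies:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersection_over_union(a: Box, b: Box) -> float:
    """Return the overlap area divided by the combined area of two boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 10.0, 10.0)
ground_truth = (5.0, 5.0, 15.0, 15.0)
# Intersection is 5 x 5 = 25, union is 100 + 100 - 25 = 175.
print(round(intersection_over_union(predicted, ground_truth), 4))  # 0.1429
```

A detection is commonly counted as correct when its IoU with a ground-truth box exceeds a chosen threshold such as 0.5; that threshold then feeds into measures like mean average precision.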

BALANCING TECHNOLOGY, SKILLS AND INVESTMENT

[Diagram: Technology, Skill, Investment, Business Value]

Identify Business Impact (highly simplified)

1. Understand the business impact in terms of economic dollar value (example: cost saving, revenue increase)
2. Business investment must consider the key capabilities the company plans to invest in, such as IT infrastructure, IT skills and time to market

Business Impact Example | Economic Value Example
Drive Sales | Sales increase 25% YoY
Increase Customer Loyalty | Higher repeat order by 15%
Improve Customer Experience | Longer engagement
Improve Operational Effectiveness | Reduce operational cost by 17%

Building Computer Vision Capability Considerations

Infrastructure
Rent
1. Rent deep learning infrastructure from a Cloud Provider
2. Choose between deep learning IaaS (full control over deep learning infrastructure) or PaaS (partial control up to model & datasets)
Buy
1. Buy infrastructure for on-premise deployment
2. Maintain hardware infrastructure & deep learning software stack

Model & Dataset
Custom
1. Collect / create image data sets
2. Create custom model based on custom data sets
Existing
1. Use existing image data sets from community/cloud provider
2. Use existing model from community/cloud provider

Skill
Data Science & Developer Team
1. Build a full data science team in house to support current or future data science capability
2. Data science and developer teams work together to integrate deep learning software with enterprise applications
Developer Team Only
1. No plan to build a data science team, since current or future requirements don't justify having one
2. Train the developer team to use deep learning software up to API integration, or use deep learning software that has been pre-configured specifically for users without prior data science knowledge

Computer Vision as a Service (non-exhaustive list)

Available options between technology stack, investment, skills, time to market

Layer | Skills | Buy On Premise | Rent IaaS / Rent PaaS (Training & Inference) | Rent PaaS (Inference)
API | API Integration, REST API | REST API / Stream API | REST API / Stream API | REST API / Stream API
Training Data | Image Labeling, Human or Automated | Custom Image Collection | Custom Image Collection | Provided
Neural Network Training Tools | Label Tools + Training Parameters | labelImg, OpenLabeling | Web based | N/A
Neural Network Model | Design, build or choose the right Neural Network | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO
Deep Learning Runtime | Develop custom deep learning code | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK

From Buy On Premise (left) to Rent PaaS Inference (right):
Higher investment cost | Lower investment
Higher barrier to entry | Lower barrier to entry
Higher CV skills | Lower CV skills
Rich Inference Capability | Limited Inference Capability
Total control of training data | No/Less control of training data

Legend: Client Managed, Cloud Provider Managed
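In every option above the application reaches the vision capability through a REST API. The sketch below only builds such a request and does not send it. It assumes a hypothetical endpoint URL and a hypothetical JSON body with a base64-encoded `image` field; real cloud services each define their own request format and authentication, so treat this purely as an illustration of the integration skill:

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; replace with the URL of your own inference service.
INFERENCE_URL = "http://localhost:8080/v1/detect"


def build_inference_request(image_bytes: bytes, url: str = INFERENCE_URL) -> urllib.request.Request:
    """Wrap raw image bytes in a JSON POST request for an inference API."""
    # The "image" field name is an assumption made for this sketch.
    payload = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_inference_request(b"\x89PNG placeholder bytes")
print(request.get_method(), request.full_url)

# Sending is left out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

Keeping request construction separate from sending makes the integration easy to unit test in the enterprise application without a running inference engine.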

Example offerings named on the slide: Deep Learning PC, Deep Learning Server, AWS Rekognition, Azure Custom Vision, Azure Vision, Google AutoML, Google Cloud Vision, IBM Watson Studio, IBM Visual Recognition

Build your own CV Enabled Apps

… Learn the Basics …
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition

Build your dataset
1. Use public dataset
2. Build image collection
3. Label image
4. Augment your image

Find your cool use cases
1. Image Search
2. Video Analytics
3. Pet Counter
4. Rat Trap
5. Parking Detection
6. Match making
7. … and more

… Train and Integrate…
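Augmenting a labelled image means transforming the picture and its labels together. A minimal sketch of one common augmentation, a horizontal flip, is shown here under simplifying assumptions: the image is a plain list of pixel rows and each label is a box (x_min, y_min, x_max, y_max) with x_max exclusive. Real pipelines would normally use an image library; none is assumed here.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), x_max exclusive


def flip_horizontal(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror an image left to right and move its bounding boxes to match."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) ends up spanning [width - x_max, width - x_min).
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped_image, flipped_boxes


# Tiny 4 x 2 image whose right half holds the labelled object (value 9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
boxes = [(2, 0, 4, 2)]

flipped_image, flipped_boxes = flip_horizontal(image, boxes)
print(flipped_image)  # [[9, 9, 0, 0], [9, 9, 0, 0]]
print(flipped_boxes)  # [(0, 0, 2, 2)]
```

Forgetting to transform the boxes along with the pixels is a frequent cause of a detector that trains without errors but learns the wrong locations.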

DEMO! OBJECT DETECTION, POSE ESTIMATION

Q&A

THANK YOU