INFUSE COMPUTER VISION INTO YOUR APPS: BALANCING TECHNOLOGY, SKILLS AND INVESTMENT
SOFTWARE ARCHITECTURE CONFERENCE 2019
JAKARTA, 2-3 AUGUST 2019

Hello World!
Interests
• Core Banking Operations & Optimization
• Card Payment & EMV Standards
• UNIX System Programming
• Performance Engineering
• Deep Learning, Computer Vision
My name is: Gito
Favorite Tools: gcc, g++, python, dbx, gdb, valgrind, gprof, purify, make, tensorflow, darknet, vim, powerpoint
http://www.github.com/ngito
http://www.slideshare.net/ngito/

COMPUTER VISION
[Figure: two grids of images numbered 1 to 12]
[Figure: photo from the 6th edition of the Jakarta Marathon 2018 (https://www.runsociety.com/event/6th-edition-jakarta-marathon-2018/), annotated "Estimated: 68 PEOPLE"]
Computer Vision in Public Sector

Computer Vision
Making computers gain high-level understanding from digital images or videos.
It seeks to automate tasks that the human visual system can do.
Computer Vision Applications
Traffic Monitor & Enforcement | Activity Recognition | Face Recognition | Machine Vision | Self Driving Car
Computer Vision Applications | 3D reconstruction from multiple images | 3D selfie | Artificial intelligence for video surveillance | Audio-visual speech recognition | Augmented reality | Augmented reality-assisted surgery | Automated Lip Reading | Automated optical inspection | Automatic image annotation | Automatic number-plate recognition | Automatic target recognition | Check weigher | Closed-circuit television | Computer stereo vision | Content-based image retrieval | Contextual image classification | DARPA LAGR Program | Deepfake | Digital video fingerprinting | Document mosaicing | Fingerprint recognition | Free viewpoint television | Fyuse | GazoPa | Geometric feature learning | Gesture recognition | Image collection exploration | Image retrieval | Image-based modeling and rendering | Intelligent character recognition | Iris recognition | Machine translation of sign languages | Machine vision | Mobile mapping | Morphing | Object Co-segmentation | Object detection | Optical braille recognition | Optical character recognition | Pedestrian detection | People counter | Physical computing | Positional tracking | Red light camera | Reverse image search | Scale-invariant feature operator | Smart camera | This Person Does Not Exist | Traffic enforcement camera | Traffic-sign recognition | Vehicle infrastructure integration | Velocity Moments | Video content analysis | View synthesis | Applications of virtual reality | Visual sensor network | Visual temporal attention | Visual Word | Water remote sensing

Computer Vision Subsystems
• Image Transformations
• Filtering
• Visual recognition
• Pose estimation
• Color vision
• Registration
• Feature extraction
We will discuss Image Classification & Object Detection in more detail.

AI vs Machine Learning vs Deep Learning
TERMINOLOGIES
If it is written in Python, it's probably machine learning.
If it is written in PowerPoint, it's probably AI.
(https://twitter.com/matvelloso/status/1065778379612282885?lang=en)

Definitions
AI
1. Science and engineering of making intelligent machines and computer programs.
2. There are many definitions of AI today, but as a philosophy, AI can be defined as a future vision that is never fully attained, which ensures continuous improvement in multiple disciplines.
Machine Learning
1. Technique for realizing AI.
2. Enables machines to learn from the provided data and make accurate predictions.
3. Implemented in multiple algorithms to solve different problems.

Deep Learning
1. Subset and next evolution of (supervised) Machine Learning.
2. Inspired by the pattern processing found in the human brain.
3. Implemented as neural networks to solve different problems.
4. Triggers new chip designs to handle Deep Learning workloads.
(Sources: https://plato.stanford.edu/entries/artificial-intelligence/; http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html)

AI Evolution
(Source: https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon)

AI General Categories
Non-exhaustive list
• Robotics: Mechanical, Electrical, Control, Kinematics, Motion, …
• Expert Systems: Knowledge Base, Inference Engine, Reasoning, …
• Speech: Speech to Text, Text to Speech, …
• Vision: Classification, Object Detection, Instance Segmentation, Face Recognition, …
• Language: Classification, Extraction, Understanding, Translation, …
• Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning
• Deep Learning: Convolutional Neural Network, Generative Adversarial Network, Recurrent Neural Network, …
Machine Learning & Deep Learning Characteristics
Non-exhaustive list
Characteristic | Machine Learning | Deep Learning
Data Source | Tabular | Unstructured & complex structure (Image, Video, Speech, Text)
Number of Features | Dozens - Hundreds | Millions
Feature Engineering | By domain expert | Automated with Feature Extraction
Hardware | General purpose CPU | Custom chip (GPU, FPGA, ASIC)
Training Time | Minutes - Hours | Hours - Weeks
Technique | Multiple ML Algorithms | Neural Network Architecture
(Source: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063)

Major Shift to Deep Learning
Non-exhaustive list
Hardware advancements, especially the use of GPUs as Deep Learning accelerators after 2012, have triggered and accelerated the conversion of some applications from Machine Learning to Deep Learning.
Area | Machine Learning | Deep Learning
Vision | 1. Viola Jones 2. Scale Invariant Feature Transform (SIFT) 3. Histogram of Oriented Gradients (HOG) 4. … | 1. Region Proposals (R-CNN & Fast R-CNN) 2. Single Shot MultiBox Detector (SSD) 3. You Only Look Once (YOLO) 4. …
Speech | 1. Speech Synthesis TTS 2. Hidden Markov Model based STT 3. … | 1. WaveNet TTS 2. DeepSpeech STT 3. …
Translate | 1. Statistical Machine Translation (SMT) 2. … | 1. Neural Machine Translation (NMT) 2. …

[Figure; source: https://medium.com/finc-engineering/cnn-do-we-need-to-go-deeper-afe1041e263e]

Deep Learning Influencer
Non-exhaustive list
• Yann LeCun: Chief Scientist, Facebook; LeNet, first neural network to …
• Li Fei Fei: Stanford, Google; ImageNet Database
• Andrew Ng: Stanford, Baidu, Google; brain behind Google Brain
• Alex Krizhevsky: Google; AlexNet, neural network trained on GPUs
• Joseph Redmon: YOLO Real-time Object Detection
Deep Learning Revolution
ImageNet (ILSVRC) Competition Winner (Object Classification)
2012: AlexNet, a Convolutional Neural Network (CNN) implementation utilizing GPUs, won the ImageNet object classification competition.
Winner (Year) | Number of Layers | Error Rate
XRCE (2011) | n/a | 25.81%
AlexNet (2012) | 8 | 15.30%
ZFNet (2013) | 8 | 14.80%
GoogleNet (2014) | 22 | 6.70%
ResNet (2015) | 152 | 3.60%
Deeper neural networks, lower error rates:
1. More layers
2. More training data
3. More computing power
4. More accuracy

Why CPU IS NOT ENOUGH
CPUs are designed for a wide variety of applications and to provide fast response times to a single task.
GPUs, by contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism.
Processor | CPU, Intel Xeon Gold | GPU, Nvidia Tesla V100
Cores | 22 | 5120 (FP32), 640 (Tensor Core)
Transistor size | 14 nm | 12 nm
RAM Type | DDR4 | HBM2
Strength | Few complex cores for general processing | Thousands of simpler cores for parallel processing
Deep Learning triggers new chip design.
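A back-of-the-envelope sketch (not from the original slides) makes the data-parallelism point concrete by counting multiply-accumulate (MAC) operations in a single convolutional layer. The layer shape in the usage lines (224 x 224 output map, 64 input and 64 output channels, 3 x 3 kernel) is a hypothetical example chosen for illustration, and bias additions, activations and memory traffic are ignored.

```python
def conv2d_macs(out_h: int, out_w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate (MAC) operations in one 2D convolution layer.

    Each of the out_h * out_w * c_out output values is a dot product over a
    k x k x c_in input window, i.e. k * k * c_in MACs. Bias additions,
    activations and memory traffic are ignored.
    """
    return out_h * out_w * c_out * (k * k * c_in)


def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Trainable weights (plus optional biases) of the same layer."""
    return c_out * (k * k * c_in) + (c_out if bias else 0)


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration:
    # 224 x 224 output map, 64 input and 64 output channels, 3 x 3 kernel.
    macs = conv2d_macs(224, 224, 64, 64, 3)
    params = conv2d_params(64, 64, 3)
    print(f"{macs:,} MACs from only {params:,} parameters")
    # prints: 1,849,688,064 MACs from only 36,928 parameters
```

The weights are few, but the arithmetic for one image through one layer is already in the billions, and every output value can be computed independently of the others. That is the data-parallel pattern that thousands of simpler GPU cores handle well and a few dozen general-purpose CPU cores handle far less efficiently.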
INFUSING COMPUTER VISION INTO YOUR APPS
Computer Vision runs on top of Deep Learning Stack
Non-exhaustive list
Layer | General Software Stack | Deep Learning Stack
Applications | Web Application | Computer Vision, Speech, Language
Runtime | Java, .NET, Python, PHP, NodeJS | Keras, TensorFlow, Caffe, PyTorch, CNTK + Model
Middleware | Application Server | Accelerator Library (NVIDIA CUDA, Intel OpenVINO)
OS | Linux / Windows | Accelerator Driver (NVIDIA Driver / Intel FPGA Driver)
Virtualization | Hypervisor / Container | Accelerator Virtualization (NVIDIA/AMD Virtual GPU, Intel Virtual FPGA)
Infrastructure | CPU, RAM, Storage, Network | Accelerator (NVIDIA/AMD GPU, Intel FPGA)
Software stack illustration when adding Deep Learning capability to any general software, on premise & IaaS

Computer Vision High Level Integration
Highly simplified
• Enterprise Application (Owner: Development Team; Process: Software Development Lifecycle such as Waterfall, Agile, other; Type: Web App, Mobile App, other)
• Computer Vision System (Owner: Data Science Team; Process: Machine Learning Lifecycle such as OSEMN, CRISP-DM, TDSP)
Training & Test Dataset → Training Engine → Trained Model → Inference Engine
Your Applications → API (Image / Video) → Inference Engine

Computer Vision Life Cycle
1. Training at Server: Training & Test Data + Initial Weight + Training Parameters → Neural Network Training → Trained Model (Model Parameters)
• Training & Test Data: Public Data Set (COCO, ImageNet, Pascal VOC), Research Dataset, Custom Data Set (Image Collection, Privately Labelled)
• Initial Weight / Model: Image Classification (GoogleNet, ResNet), Object Detection (R-CNN Based, SSD, YOLO), Object Segmentation (Mask R-CNN)
• Training Parameters: Learning Rate, Epoch, Batch size
• Training Metrics: Intersection over Union, Average Loss, Mean Avg Precision
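To show what the training parameters above actually control, here is a deliberately tiny sketch in plain Python: mini-batch gradient descent on a single sigmoid neuron (logistic regression), not a real CNN. `learning_rate` scales each update, `epochs` counts full passes over the data, `batch_size` sets how many samples contribute to each update, and the average loss of every epoch is recorded. The toy dataset, function names and default values are hypothetical; frameworks such as TensorFlow or Keras expose the same knobs but implement training very differently.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], int]  # ((x1, x2), label in {0, 1})

# Hypothetical toy data: class 0 at negative coordinates, class 1 at positive.
TOY_DATA: List[Sample] = [
    ((-1.0, -1.0), 0), ((-0.8, -1.0), 0), ((-1.0, -0.6), 0), ((-0.7, -0.9), 0),
    ((1.0, 1.0), 1), ((0.9, 0.8), 1), ((0.8, 1.0), 1), ((1.0, 0.7), 1),
]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data: Sequence[Sample], learning_rate: float = 0.5, epochs: int = 50,
          batch_size: int = 4, seed: int = 0):
    """Mini-batch gradient descent for one sigmoid neuron.

    Returns (weights, bias, average loss of each epoch).
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]  # initial weights; a real network might start from pretrained ones
    b = 0.0
    history: List[float] = []
    samples = list(data)
    for _ in range(epochs):  # one epoch = one full pass over the training data
        rng.shuffle(samples)
        total_loss = 0.0
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            grad_w = [0.0, 0.0]
            grad_b = 0.0
            for (x1, x2), y in batch:
                p = sigmoid(w[0] * x1 + w[1] * x2 + b)
                p_clip = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0) in the loss
                total_loss += -(y * math.log(p_clip) + (1 - y) * math.log(1.0 - p_clip))
                err = p - y  # gradient of cross-entropy w.r.t. the pre-activation
                grad_w[0] += err * x1
                grad_w[1] += err * x2
                grad_b += err
            n = len(batch)
            # One parameter update per mini-batch, scaled by the learning rate.
            w[0] -= learning_rate * grad_w[0] / n
            w[1] -= learning_rate * grad_w[1] / n
            b -= learning_rate * grad_b / n
        history.append(total_loss / len(samples))  # average loss for this epoch
    return w, b, history


def predict(w: Sequence[float], b: float, x: Tuple[float, float]) -> int:
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


if __name__ == "__main__":
    w, b, history = train(TOY_DATA)
    print(f"average loss: epoch 1 = {history[0]:.3f}, epoch {len(history)} = {history[-1]:.3f}")
```

A larger `batch_size` gives smoother but fewer updates per epoch, while a larger `learning_rate` takes bigger steps and can overshoot; watching the per-epoch average loss is the simplest way to tell whether those choices are working.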
2. Inference at Server / Edge: Input Image / Video + Trained Model (Model Parameters) → Neural Network Inference → Predictions
Both training and inference run on a Deep Learning Framework.
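Detectors such as SSD and YOLO typically produce several overlapping candidate boxes per object, and a common post-processing step after inference is greedy non-maximum suppression (NMS), which is built on the same Intersection over Union (IoU) overlap measure that appears in the training diagram above. The sketch below is a minimal, single-class, plain-Python version for illustration only; the (x_min, y_min, x_max, y_max) box format and the 0.5 threshold are assumptions, and deep learning frameworks ship their own optimized implementations.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop the remaining boxes that
    overlap it by more than iou_threshold, and repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Hypothetical detections: two near-duplicates of one object plus a separate object.
    detections = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    confidences = [0.9, 0.75, 0.8]
    print(non_max_suppression(detections, confidences))  # prints [0, 2]
```

The near-duplicate box (IoU of about 0.82 with the top box) is suppressed, while the distant box survives because it does not overlap at all.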
BALANCING TECHNOLOGY, SKILLS AND INVESTMENT
[Diagram: Technology, Skill, Business Investment Value]

Identify Business Impact
Highly simplified
1. Understand the business impact associated with economic dollar values (for example: cost saving, revenue increase).
2. Business investment must consider several key capabilities the company is planning to invest in, such as IT infrastructure, IT skills and time to market.

Business Impact Example | Economic Value Example
Drive Sales | Sales increase 25% YoY
Increase Customer Loyalty | Higher repeat orders by 15%
Improve Customer Experience | Longer engagement
Improve Operational Effectiveness | Reduce operational cost by 17%
Building Computer Vision Capability
Considerations
Infrastructure
Rent:
1. Rent deep learning infrastructure from a Cloud Provider.
2. Choose between deep learning IaaS (full control over deep learning infrastructure) or PaaS (partial control, up to model & datasets).
Buy:
1. Buy infrastructure for on-premise deployment.
2. Maintain the hardware infrastructure & deep learning software stack.
Model & Dataset
Custom:
1. Collect / create image data sets.
2. Create a custom model based on the custom data sets.
Existing:
1. Use existing image data sets from the community / cloud provider.
2. Use an existing model from the community / cloud provider.
Skill
Data Science & Developer Team:
1. Build a full data science team in house to support current or future data science capability requirements.
2. Data science and developer teams work together to integrate deep learning software with enterprise applications.
Developer Team Only:
1. No plan to build a data science team, since current or future data science capability requirements don't justify having one.
2. Train the developer team to use deep learning software up to API integration, or use deep learning software that has been pre-configured specifically for users without prior data science knowledge.
Computer Vision as a Service
Non-exhaustive list
Available options between technology stack, investment, skills and time to market
Component | Skills | Buy On Premise | Rent IaaS / Rent PaaS (Training & Inference) | Rent PaaS (Inference)
API | API Integration, REST API | Rest API / Stream API | Rest API / Stream API | Rest API / Stream API
Training Data | Image Labeling, Human or Automated | Custom Image Collection | Custom Image Collection | Provided
Training Tools | Label Tools + Training Parameters | labelImg, OpenLabeling | Web based | N/A
Neural Network Model | Design, build or choose the right Neural Network | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO | R-CNN, SSD, YOLO
Deep Learning Runtime | Develop custom deep learning code | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK | Keras, TensorFlow, Caffe, PyTorch, CNTK
From Buy On Premise toward Rent PaaS (Inference):
• Higher investment cost → Lower investment
• Client managed → Cloud provider managed
• Higher barrier to entry → Lower barrier to entry
• Higher CV skills → Lower CV skills
• Rich inference capability → Limited inference capability
• Total control of training data → No/less control of training data

Example offerings per option:
• Buy On Premise: PC / Deep Learning Server
• Rent IaaS / Rent PaaS (Training & Inference): AWS Deep Learning, Azure Custom Vision, Google AutoML, IBM Watson Studio
• Rent PaaS (Inference): AWS Rekognition, Azure Vision, Google Cloud Vision, IBM Visual Recognition

Build your own CV Enabled Apps …
• Learn the Basics … Stanford University CS231n: Convolutional Neural Networks for Visual Recognition
• Build your dataset: 1. Use public dataset 2. Build image collection 3. Label image 4. Augment your image
• Find your cool use cases: 1. Image Search 2. Video Analytics 3. Pet Counter 4. Rat Trap 5. Parking Detection 6. Match making 7. … and more
• Train … and Integrate …
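The "Label image" and "Augment your image" steps of building a dataset meet in the annotation files. labelImg, one of the labeling tools mentioned earlier, can save bounding boxes as Pascal VOC XML. The sketch below reads such a file with Python's standard `xml.etree.ElementTree` and applies a horizontal-flip augmentation to both a toy image (a list of pixel rows) and its boxes. The sample annotation and all function names are hypothetical, and box coordinates are treated as continuous pixel edges for simplicity (VOC's own convention is 1-based pixel indices).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

LabeledBox = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)

# Hypothetical annotation in the Pascal VOC XML layout that labelImg can write.
SAMPLE_VOC = """<annotation>
  <filename>pet_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>300</xmin><ymin>100</ymin><xmax>620</xmax><ymax>470</ymax></bndbox>
  </object>
</annotation>"""


def read_voc(xml_text: str) -> Tuple[int, int, List[LabeledBox]]:
    """Return (image width, image height, labeled boxes) from VOC-style XML."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes: List[LabeledBox] = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # Some tools write coordinates as floats, so parse via float first.
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return width, height, boxes


def hflip_image(pixels: List[List[int]]) -> List[List[int]]:
    """Mirror an image stored as a list of pixel rows, left to right."""
    return [row[::-1] for row in pixels]


def hflip_boxes(width: int, boxes: List[LabeledBox]) -> List[LabeledBox]:
    """Mirror boxes to match hflip_image: x maps to width - x, so the new
    xmin comes from the old xmax and vice versa. y coordinates are unchanged."""
    return [(label, width - xmax, ymin, width - xmin, ymax)
            for label, xmin, ymin, xmax, ymax in boxes]


if __name__ == "__main__":
    w, h, boxes = read_voc(SAMPLE_VOC)
    print(w, h, boxes)
    print(hflip_boxes(w, boxes))
```

Whatever augmentation is applied to the pixels has to be applied to the labels as well; a flipped image with unflipped boxes silently teaches the model wrong locations.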
DEMO!
OBJECT DETECTION
POSE ESTIMATION
Q&A
THANK YOU