INFUSE COMPUTER VISION INTO YOUR APPS: BALANCING TECHNOLOGY, SKILLS AND INVESTMENT
SOFTWARE ARCHITECTURE CONFERENCE 2019, JAKARTA, 2-3 AUGUST 2019

Hello World!
My name is: Gito
Interests
• Core Banking Operations & Optimization
• Card Payment & EMV Standards
• UNIX System Programming
• Performance Engineering
• Deep Learning, Computer Vision
Favorite Tools: gcc, g++, python, dbx, gdb, valgrind, gprof, purify, make, tensorflow, darknet, vim, powerpoint
http://www.github.com/ngito
http://www.slideshare.net/ngito/

COMPUTER VISION

[Figure: crowd photo from the 6th Edition Jakarta Marathon 2018 (https://www.runsociety.com/event/6th-edition-jakarta-marathon-2018/) overlaid with a counting grid numbered 1 to 12 on each axis. Estimated: 68 PEOPLE]

Computer Vision in the Public Sector

Computer Vision
Making computers gain a high-level understanding of digital images or videos. It tries to achieve what the human visual system can do.

Computer Vision Applications
• Traffic Monitor & Enforcement
• Activity Recognition
• Face Recognition
• Machine Vision
• Self-Driving Car

Computer Vision Applications
3D reconstruction from multiple images | 3D selfie | Artificial intelligence for video surveillance | Audio-visual speech recognition | Augmented reality | Augmented reality-assisted surgery | Automated Lip Reading | Automated optical inspection | Automatic image annotation | Automatic number-plate recognition | Automatic target recognition | Check weigher | Closed-circuit television | Computer stereo vision | Content-based image retrieval | Contextual image classification | DARPA LAGR Program | Deepfake | Digital video fingerprinting | Document mosaicing | Fingerprint recognition | Free viewpoint television | Fyuse | GazoPa | Geometric feature learning | Gesture recognition | Image collection exploration | Image retrieval | Image-based modeling and rendering | Intelligent character recognition | Iris recognition | Machine translation of sign languages | Machine vision | Mobile mapping | Morphing | Object Co-segmentation | Object detection | Optical braille recognition
| Optical character recognition | Pedestrian detection | People counter | Physical computing | Positional tracking | Red light camera | Reverse image search | Scale-invariant feature operator | Smart camera | This Person Does Not Exist | Traffic enforcement camera | Traffic-sign recognition | Vehicle infrastructure integration | Velocity Moments | Video content analysis | View synthesis | Applications of virtual reality | Visual sensor network | Visual temporal attention | Visual Word | Water remote sensing

Computer Vision Subsystems
Image enhancement, Transformations, Filtering, Visual recognition, Pose estimation, Color vision, Registration, Feature extraction
We will discuss Image Classification & Object Detection further.

AI vs Machine Learning vs Deep Learning
TERMINOLOGIES
"If it is written in Python, it's probably machine learning. If it is written in PowerPoint, it's probably AI." (https://twitter.com/matvelloso/status/1065778379612282885?lang=en)

Definitions
AI
1. Science and engineering of making intelligent machines and computer programs.
2. There are many definitions of AI today, but as a philosophy, AI is defined as a future vision that remains unattainable, which ensures continuous improvement across multiple disciplines.
Machine Learning
1. Technique for realizing AI.
2. Enables machines to learn from the provided data and make accurate predictions.
3. Implemented in multiple algorithms to solve different problems.
Deep Learning
1. Subset and next evolution of (supervised) Machine Learning.
2. Inspired by the pattern processing found in the human brain.
3. Implemented as Neural Networks to solve different problems.
4. Triggers new chip designs to handle Deep Learning workloads.
Sources: https://plato.stanford.edu/entries/artificial-intelligence/, http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html

AI Evolution
Source: https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon

AI General Categories (non-exhaustive list)
• Robotics: Mechanical, Electrical, Control, Kinematics, Motion, …
• Expert Systems: Knowledge Base, Inference Engine, Reasoning, …
• Speech: Speech to Text, Text to Speech, …
• Vision: Classification, Object Detection, Instance Segmentation, Face Recognition, …
• Language: Classification, Extraction, Understanding, Translation, …
• Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning, …
• Deep Learning: Convolutional Neural Network, Generative Adversarial Network, Recurrent Neural Network, …

Machine Learning & Deep Learning Characteristics (non-exhaustive list)
Aspect | Machine Learning | Deep Learning
Data source | Tabular | Unstructured & complex structure (image, video, speech, text)
Number of features | Dozens to hundreds | Millions
Feature engineering | By domain expert | Automated with feature extraction
Hardware | General-purpose CPU | Custom chip (GPU, FPGA, ASIC)
Training time | Minutes to hours | Hours to weeks
Technique | Multiple ML algorithms | Neural network architecture
Source: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

Major Shift to Deep Learning (non-exhaustive list)
Hardware advancements, particularly the use of GPUs as Deep Learning accelerators after 2012, have triggered and accelerated the conversion of some applications from Machine Learning to Deep Learning.
Vision
• Machine Learning: 1. Viola-Jones 2. Scale Invariant Feature Transform (SIFT) 3. Histogram of Oriented Gradients (HOG) 4. …
• Deep Learning: 1. Region Proposals (R-CNN & Fast R-CNN) 2. Single Shot MultiBox Detector (SSD) 3. You Only Look Once (YOLO) 4. …
Speech
• Machine Learning: 1. Speech Synthesis TTS 2. Hidden Markov Model based STT 3. …
• Deep Learning: 1. WaveNet TTS 2. DeepSpeech STT 3. …
Translate
• Machine Learning: 1. Statistical Machine Translation (SMT) 2. …
• Deep Learning: 1. Neural Machine Translation (NMT) 2. …

[Figure source: https://medium.com/finc-engineering/cnn-do-we-need-to-go-deeper-afe1041e263e]

Deep Learning Influencers (non-exhaustive list)
• Yann LeCun: Chief Scientist, Facebook; LeNet, first neural network to …
• Li Fei Fei: Stanford, Google; ImageNet Database
• Andrew Ng: Stanford, Baidu, Google; brain behind Google Brain
• Alex Krizhevsky: Google; AlexNet, first neural network to run on GPU
• Joseph Redmon: YOLO Real-time Object Detection

Deep Learning Revolution: ImageNet (ILSVRC) Competition Winners
2012, AlexNet (Object Classification): the first Convolutional Neural Network (CNN) implementation utilizing GPUs, and the winner of the ImageNet object classification competition.
Deeper Neural Network → Lower Error Rates
Winner (year) | Error rate | Number of layers
XRCE (2011) | 25.81% | -
AlexNet (2012) | 15.30% | 8
ZFNet (2013) | 14.80% | 8
GoogLeNet (2014) | 6.70% | 22
ResNet (2015) | 3.60% | 152
Trend: 1. More layers 2. More training data 3. More computing power 4. More accuracy

WHY CPU IS NOT ENOUGH
CPUs are designed for a wide variety of applications and to provide fast response times for a single task. GPUs, on the other hand, are built specifically for rendering and other graphics applications that have a large degree of data parallelism.
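Data parallelism means applying the same operation independently to many data elements. The plain-Python sketch below is an illustration of that idea, not GPU code: it assumes a grayscale image stored as a list of rows of 8-bit values, and the function names (brighten_pixel, brighten_row, brighten_sequential, brighten_parallel) are hypothetical. Because each output pixel depends only on its own input pixel, rows can be handed to separate workers and still produce exactly the same result as a sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_pixel(value: int, delta: int) -> int:
    """Add delta to one 8-bit pixel, clamping to the valid 0..255 range."""
    return max(0, min(255, value + delta))


def brighten_row(row: list[int], delta: int) -> list[int]:
    # Each output pixel depends only on the matching input pixel,
    # so rows (or even single pixels) can be processed independently.
    return [brighten_pixel(v, delta) for v in row]


def brighten_sequential(image: list[list[int]], delta: int) -> list[list[int]]:
    # CPU-style: a single worker walks through every row in order.
    return [brighten_row(row, delta) for row in image]


def brighten_parallel(image: list[list[int]], delta: int, workers: int = 4) -> list[list[int]]:
    # GPU-style decomposition: independent rows go to many workers.
    # With CPython threads this shows the structure, not a speedup,
    # because the GIL serializes pure-Python arithmetic.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so rows come back in place.
        return list(pool.map(brighten_row, image, [delta] * len(image)))


if __name__ == "__main__":
    image = [[0, 100, 250], [30, 200, 255]]
    print(brighten_parallel(image, 10))  # [[10, 110, 255], [40, 210, 255]]
```

On a GPU the same decomposition is pushed much further, with thousands of simpler cores each handling a pixel or a small tile instead of a handful of threads each handling a row.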
Processor | CPU, Intel Xeon Gold | GPU, NVIDIA Tesla V100
Cores | 22 | 5120 (FP32), 640 (Tensor Core)
Transistor size | 14 nm | 12 nm
RAM type | DDR4 | HBM2
Strength | Few complex cores for general processing | Thousands of simpler cores for parallel processing

Deep Learning Triggers New Chip Design

INFUSING COMPUTER VISION INTO YOUR APPS

Computer Vision Runs on Top of the Deep Learning Stack (non-exhaustive list)
Layer | General Software Stack | Deep Learning Stack
Applications | Web Application | Computer Vision, Speech, Language
Runtime | Java, .NET, Python, PHP, NodeJS | Keras, TensorFlow, Caffe, PyTorch, CNTK + Model
Middleware | Application Server | Accelerator Library (NVIDIA CUDA, Intel OpenVINO)
OS | Linux / Windows | Accelerator Driver (NVIDIA Driver / Intel FPGA Driver)
Virtualization | Hypervisor / Container | Accelerator Virtualization (NVIDIA/AMD Virtual GPU, Intel Virtual FPGA)
Infrastructure | CPU, RAM, Storage, Network | Accelerator (NVIDIA/AMD GPU, Intel FPGA)
Software stack illustration when adding Deep Learning capability to any general software, on premise & IaaS.

Computer Vision High-Level Integration (highly simplified)
Enterprise Application
• Owner: Development Team
• Process: Software Development Lifecycle (Waterfall, Agile, other)
• Type: Web App, Mobile App, other
Computer Vision System
• Owner: Data Science Team
• Process: Machine Learning Lifecycle (OSEMN, CRISP-DM, TDSP)
Flow: Training & Test Dataset → Training Engine → Trained Model → Inference Engine; Your Applications → API (Image / Video) → Inference Engine

Computer Vision Life Cycle
1. Training at Server (Neural Network Training): Training & Test Data, Initial Weight and Model Parameters → Trained Model
• Training & Test Data: Public Data Set (COCO, ImageNet, Pascal VOC); Research Dataset; Custom Data Set (Image Collection, Privately Labelled)
• Initial Weight: Image Classification (GoogLeNet, ResNet); Object Detection (R-CNN based, SSD, YOLO); Object Segmentation (Mask R-CNN)
• Model Parameters: Training Parameters (Learning Rate, Epoch, Batch size); Intersection over Union, Average Loss, Mean Average Precision
2. Inference at Server / Edge (Neural Network Inference): Input Image / Video → Trained Model (Model Parameters) → Predictions, on a Deep Learning Framework

BALANCING TECHNOLOGY, SKILLS AND INVESTMENT
[Diagram: Technology, Skill, Business Investment, Value]

Identify Business Impact (highly simplified)
1. Understand the business impact associated with economic dollar values (example: cost saving, revenue increase).
2. Business investment must consider several key capabilities the company is planning to invest in, such as IT infrastructure, IT skills and time to market.
Business Impact | Economic Value
Example: Drive Sales Increase | Example: Sales Increase 25% YoY