Supercomputing Power Booster for Artificial Intelligence
Dahua Lin
1 Artificial Intelligence: A Technical Revolution in Sight
Deep Learning
Internet Age of AI Electricity Age of Information Steam Engine Age of Electricity
Age of Steam
2 Deep Learning Enables AI Breakthroughs
Face Recognition
Voice Recognition Image Recognition
Game Playing
Deep Learning
Natural Language Processing
Autonomous Driving Intelligent Financial Services 3 World Economy: Artificial Intelligence Is The New Lever
Contribute to the Increase global GDP by Till 2030 world economy by AI will 14% $15.7trillion
Data source:PwC4 SenseTime - Largest AI Algorithm & Solution Provider in Asia
Company Profile Performance Leading Technologies
Valuation 20 years 2300+ Core Tech Research Highest Staff World Class Experience in the World AI+Fintech AI+Smart City AI+Mobile
700+ 150+ Proprietary Revenue Deep Learning Major Clients AI PhDs Industry No.1 AI+ Platform and Partners AI+Chips Autonomous Car AI+Health
5 Partnership with 700+ Major Corporations
§ Revenue & Market share #1 § Revenue & Market share #1 AI+Mobile Internet AI+Fintech
Strategic Partners
AI IOT AI+Smart City + AI+Car
AI+Smart Phones AI+ Remote Sensing
§ Revenue & Market share #1 AI+Retail
§ Revenue & Market share #1 6 The Breakthroughs in Face Recognition
Achieve better accuracy 6-digit 8-digit than human eyes DeepID era password era password era What’s Next?
98.52% DeepID3 99.55% 10-6 FAR 10-8 FAR Human Eyes 97.45% DeepID2 99.15% 95% Pass rate 97% Pass rate 2BN facial images 97.35% 300K facial images 60MM facial images 200MM individual
2014 2015 2016 2017 2018
7 Academic Publication in Comparison with Tech Giants
Dominance in No. of top papers published leads to rich transform into intellectual properties
250 Microsoft 44 papers @ CVPR 2018 200 191 37 papers @ ECCV 2018
152 SenseTime* 150 Google 119 Facebook 101 99 100 91 89 Baidu, Alibaba, 73 61 Tencent 46 50 33 25 21 18 8
0 Microsoft CMU SenseTime Stanford Berkeley Google MIT Adobe Oxford Facebook Cambridge IBM Tencent Baidu Alibaba
CVPR+ICCV 2015 CVPR+ECCV 2016 CVPR+ICCV 2017 Total *SenseTime co-R&D with CUHK CVPR、ICCV、ECCV are the top 3 computer vision conferences worldwide with highest impact factor. They accept the best work on AI and deep learning. 8 Top Performances on International Competitions
l Detection and Tracking :KITTI 2015 Winner l Depth Estimation :KITTI Stereo 2015 Benchmark No.1 l Action Recognition :ActivityNet Challenge 2016 No.1 l Scene segmentation :Cityscapes Challenge 2016 Winner 2015-2017 l Multiple Object Tracking :MOT Challenge 2016 Winner l Lane Detection :TuSimple in CVPR 2017 workshop Winner l Segmentation :Pascal VOC Dataset No.1 l Video Object Segmentation :DAVIS Challenge 2017 No.1
l COCO2018 :Object Detection Champion 2018 l VOT Challenge : Champion l Face Identification & Verification : MegaFace2018 Two Winners
9 Innovation in Infrastructure
SenseParrots SensePetrel
High performance Multiple Storage Types High computational speed, low consumption of memory Object Storage, File Storage, Middle-ware
High Scalability Linear speed up for training on hundreds of High System Stability GPU cards High Availability Architecture + QoS, High Stability
High flexibility Highly modular design, can effectively adopt novel Linear Expansion training tasks Expand capacity and performance as needed
High productivity Optimization for AI Rapid prototyping, easy deployment A large number of optimization such as RDMA and SSD 10 World’s Top Ten AI Lab | CUHK – SenseTime Joint Lab
Trevor Darrel Tomaso Poggio Top 4 Universities in Top 4 Universities in Computer Science Computer Science
Russ Yoshua Bengio Salakhutdinov Top 4 Universities in Top Four AI Labs Computer Science
Xiao’ou Sean Tang Nandode Freitas Founder of SenseTime Renowned Visual Lab
Geoff Hinton Yann LeCun Leader of Google Leader of Facebook deep learning AI Lab research Top Four AI Labs Top Four AI Labs
Fei-Fei Li Director of Stanford Jurgen Top 4 Universities in AI Lab, founder of Schmidhuber Computer Science ImageNet Competition Top Four AI Labs Renowned Visual Lab 11 SenseTime Becomes the 5th National Platform for Next-gen AI
National Open Innovation Platform for Next Generation Artificial Intelligence on Intelligent Vision On Sep. 20, 2018, the Ministry of Science and Technology of China declared SenseTime as the fifth National AI Platform. 12 The Technological Pyramid of AI
AI is not a magic
The success of an AI technology relies on large A Algorithm drives the process amount of annotated data and plenty of computational Computational power enables the process from resources. C data to intelligence
Data are the ultimate source D where AI is from
13 Tremendous computing power will be needed in the future
10x/yr. 300,000 Since 2012 Since 2012 The computation resources This metric has grown by more than required by a single model 300,000x training increases by over 10.85 per year.
Data From:OpenAI 14 Supercomputing: the Cornerstone of AI+
Ø Computing Power Ø Diverse Demand Ø Reduced Cost Industry-grade applications require Well-designed components and Computing infrastructures requires reliable infrastructures that can middle-wares are needed, in a highly professional skills and is costly. deliver the needed computing power. timely and scalable way.
Ø Top Performances on International Competitions
15 SenseTime | World-Class AI Platform
Banks & Financial Public Security & Union Pay China Mobile Qualcomm, NVIDIA Institutions OPPO, Xiaomi Surveillance companies
Application
Face Image Autonomous Human-machine Medical Image AI Chip Recognition Recognition Driving interaction
A variety of core AI technologies
SensePetrel PPL SenseParrots
Platform Layer
High Speed Comm System Virtualization High performance Computing System
Infrastructure Layer
Supercomputing Platform in SenseTime
16 SenseParrots: A Training System Designed for the Future
Dynamic Graph Parallelism
Embrace the future with high flexibility, supporting arbitrary models, defined in advance or created on the fly
Scalability over Thousand GPUs
Scale to 1000+ GPUs, with nearly linear speed-up and just in one click
Non-blocking Execution Engine
No longer limited by the interpreter — computation, communication, and IO, all running in parallel yet reliably
JIT Compilation
Compile, schedule, and execute codes just in time, exploiting the computing resources to the maximum
System Architecture
17 SenseParrots: Performance in Comparison
9000 8000 70 7000
) 6000 60
5000 50 4000
images/s 3000 40
( 2000
1000 30 0 20
Throughput 10
0 1 4 16 64 SenseParrots Pytorch TensorFlow Ideal Parrots2 Pytorch TenorFlow
Training throughput on 64GPUs (images/s) Speedup on InceptionResNetV2 (Batchsize = 32)
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.40GHz CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz GPU: NVIDIA 1080ti GPU: NVIDIA 1080ti
18 SensePetrel l High Performance Storage
Tech n ical pr o po s al
Object storage Unified Interface Expansion on Demand
Unifies Storage interfaces with Expands capacity and different types of storage system performance whenever needed Middle- ware
Enhanced Availability Optimized IO Performance and Stability Uses High Availability Architecture Aggregates small files, increases and QoS batch read-write interface, optimizes File client pre-read cache, storage
19 SensePetrel l High Performance Storage
1.3 millionQPS 100 billion Usually small file system are 100,000 QPS Horizontal expansion of architecture to store small files
6 times 36% Writing Speed of S3 interface of 10 billion 10K files Throughput compare with memory caching systems
20 Training Deep Networks in Minutes
1.5 min Training AlexNet 7.5 min Training ResNet50 ImageNet Dataset, 512 V100 GPUs, 56 Gbps Network
128 90%+ 512 86% GPUs Efficiency GPUs Efficiency
Lazy Allreduce State-of-the-Art Approach: Coarse-Grained sparsity the training speed of ImageNet reaches 1 epoch/s on GPU cluster. LARS & Wram up
21 Significance of Minute Level Training
Architecture Search Rapid Iteration Automated Adaptation
• Finding a good new architecture • Revolutionize the workflow of • You no longer need a team of takes thousands of trials. Minute- model design, allowing a new experts in response to a new level training makes automatic model to be tested in minutes customer. Everything can be done search a reasonable option and reducing the design period in one click, with the support of from months to hours the supercomputing
22 GPU-based Supercomputing Platform
ü 17 clusters ü 14,000+ GPUs ü Computing power :160PFLops+ ü Beijing, Shanghai, Shenzhen, Hong Kong, Tokyo, Singapore 23 Businesses Supported By Supercomputing Cluster
Smart healthcare Entertainment
Natural language Internet content analysis processing
Smart city Remote sensing image
Robot Mobile internet Supercomputing Text recognition AI education Cluster Smart building Smart retail
Identity authentication Live-streaming video
Smartphone Carrier
Autonomous driving AR advertising
24 Collaboration
Universities
Enterprises
25 Lead AI Innovation
Power the Future
26