
Supercomputing Power Booster for

Dahua Lin

1 Artificial Intelligence: A Technical Revolution in Sight

Steam Engine: Age of Steam
Electricity: Age of Electricity
Internet: Age of Information
Deep Learning: Age of AI

2 Enables AI Breakthroughs

Deep Learning: Face Recognition, Voice Recognition, Image Recognition, Game Playing, Natural Language Processing, Autonomous Driving, Intelligent Financial Services

3 World Economy: Artificial Intelligence Is The New Lever

By 2030, AI will increase global GDP by 14% and contribute $15.7 trillion to the world economy.

Data source: PwC

4 SenseTime - Largest AI Algorithm & Solution Provider in Asia

Company Profile, Performance, Leading Technologies

Valuation: Highest in the World
20 years Research Experience
2300+ Staff
700+ Major Clients and Partners
150+ AI PhDs
Proprietary Deep Learning Platform
Revenue: Industry No.1
Core Tech: World Class
AI+Fintech, AI+Mobile, AI+Chips, AI+Autonomous Car, AI+Health, AI+

5 Partnership with 700+ Major Corporations

§ Revenue & Market share #1 § Revenue & Market share #1 AI+Mobile Internet AI+Fintech

Strategic Partners

AI+IOT, AI+Smart City, AI+Car

AI+Smart Phones, AI+Remote Sensing

§ Revenue & Market share #1 AI+

§ Revenue & Market share #1

6 The Breakthroughs in Face Recognition

Achieve better accuracy than human eyes

DeepID era, 6-digit password era, 8-digit password era, What’s Next?

Accuracy figures shown: 98.52%, DeepID3 99.55%, Human Eyes 97.45%, DeepID2 99.15%, 97.35%
6-digit password era: 10^-6 FAR, 95% pass rate
8-digit password era: 10^-8 FAR, 97% pass rate
Data scale shown: 300K facial images, 60MM facial images, 2BN facial images, 200MM individual

2014 2015 2016 2017 2018

7 Academic Publication in Comparison with Tech Giants

Dominance in the number of top papers published translates into rich intellectual property

44 papers @ CVPR 2018
37 papers @ ECCV 2018

[Chart: number of papers per institution for CVPR+ICCV 2015, CVPR+ECCV 2016, CVPR+ICCV 2017, and Total. Labeled values, in the order shown: Microsoft 191, CMU 152, SenseTime* 119, Stanford 101, Berkeley 99, Google 91, MIT 89, Adobe 73, Oxford 61, Facebook 46, Cambridge 33, IBM 25, Tencent 21, Baidu 18, Alibaba 8]

*SenseTime co-R&D with CUHK

CVPR, ICCV, and ECCV are the top 3 conferences worldwide with the highest impact factor. They accept the best work on AI and deep learning.

8 Top Performances on International Competitions

2015-2017
• Detection and Tracking: KITTI 2015 Winner
• Depth Estimation: KITTI Stereo 2015 Benchmark No.1
• Action Recognition: ActivityNet Challenge 2016 No.1
• Scene Segmentation: Cityscapes Challenge 2016 Winner
• Multiple Object Tracking: MOT Challenge 2016 Winner
• Lane Detection: TuSimple in CVPR 2017 workshop Winner
• Segmentation: Pascal VOC Dataset No.1
• Video Object Segmentation: DAVIS Challenge 2017 No.1

2018
• COCO 2018: Object Detection Champion
• VOT Challenge: Champion
• Face Identification & Verification: MegaFace 2018 Two Winners

9 Innovation in Infrastructure

SenseParrots
• High performance: High computational speed, low consumption of memory
• High scalability: Linear speed-up for training on hundreds of GPU cards
• High flexibility: Highly modular design, can effectively adopt novel training tasks
• High productivity: Rapid prototyping, easy deployment

SensePetrel
• Multiple Storage Types: Object Storage, File Storage, Middle-ware
• High System Stability: High Availability Architecture + QoS, High Stability
• Linear Expansion: Expand capacity and performance as needed
• Optimization for AI: A large number of optimizations such as RDMA and SSD

10 World’s Top Ten AI Lab | CUHK – SenseTime Joint Lab

Trevor Darrell: Top 4 Universities in Computer Science
Tomaso Poggio: Top 4 Universities in Computer Science
Russ Salakhutdinov: Top 4 Universities in Computer Science
Yoshua Bengio: Top Four AI Labs
Xiao’ou Sean Tang: Founder of SenseTime, Renowned Visual Lab
Nando de Freitas: Renowned Visual Lab
Geoff Hinton: Leader of Google deep learning research, Top Four AI Labs
Yann LeCun: Leader of Facebook AI Lab, Top Four AI Labs
Fei-Fei Li: Director of Stanford AI Lab, founder of ImageNet Competition, Top 4 Universities in Computer Science
Jurgen Schmidhuber: Top Four AI Labs

11 SenseTime Becomes the 5th National Platform for Next-gen AI

National Open Innovation Platform for Next Generation Artificial Intelligence on Intelligent Vision

On Sep. 20, 2018, the Ministry of Science and Technology declared SenseTime the fifth National AI Platform.

12 The Technological Pyramid of AI

AI is not magic

The success of an AI technology relies on a large amount of annotated data and plenty of computational resources.

A: Algorithm drives the process from data to intelligence
C: Computational power enables the process
D: Data are the ultimate source where AI is from

13 Tremendous computing power will be needed in the future

10x/yr. since 2012: the computation resources required to train a single model increase by over 10x per year.

300,000x since 2012: this metric has grown by more than 300,000x.
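The two headline figures can be cross-checked with a little arithmetic. The sketch below assumes steady exponential growth, which the slide does not state explicitly; the function names are illustrative only.

```python
import math

# Figures quoted on the slide (attributed there to OpenAI).
ANNUAL_GROWTH = 10.0       # "10x/yr."
TOTAL_GROWTH = 300_000.0   # "300,000x" since 2012


def years_to_reach(total_growth: float, annual_growth: float) -> float:
    """Years of steady growth at `annual_growth` per year needed to reach `total_growth`."""
    return math.log(total_growth) / math.log(annual_growth)


def doubling_time_months(annual_growth: float) -> float:
    """Doubling time, in months, implied by a steady annual growth factor."""
    return 12.0 * math.log(2.0) / math.log(annual_growth)


if __name__ == "__main__":
    print(f"years to reach 300,000x at 10x/yr: {years_to_reach(TOTAL_GROWTH, ANNUAL_GROWTH):.2f}")
    print(f"implied doubling time: {doubling_time_months(ANNUAL_GROWTH):.2f} months")
```

Under that assumption, 10x per year reaches 300,000x in roughly five and a half years, so the two numbers on the slide are broadly consistent with each other.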

Data from: OpenAI

14 Supercomputing: the Cornerstone of AI+

• Computing Power: Industry-grade applications require reliable infrastructures that can deliver the needed computing power.
• Diverse Demand: Well-designed components and middle-ware are needed, in a timely and scalable way.
• Reduced Cost: Computing infrastructure requires highly professional skills and is costly.

Ø Top Performances on International Competitions

15 SenseTime | World-Class AI Platform

Application: Banks & Financial Institutions, Public Security & Surveillance companies, Union Pay

A variety of core AI technologies: Face Recognition, Image Recognition, Autonomous Driving, Human-machine interaction, Medical Image, AI Chip

Platform Layer: SensePetrel, PPL, SenseParrots

Infrastructure Layer: High Speed Comm System, Virtualization, High performance Computing System

Supercomputing Platform in SenseTime

16 SenseParrots: A Training System Designed for the Future

Dynamic Graph Parallelism

Embrace the future with high flexibility, supporting arbitrary models, defined in advance or created on the fly

Scalability over Thousand GPUs

Scale to 1000+ GPUs, with nearly linear speed-up and just in one click

Non-blocking Execution Engine

No longer limited by the interpreter — computation, communication, and IO, all running in parallel yet reliably

JIT Compilation

Compile, schedule, and execute code just in time, exploiting the computing resources to the maximum

System Architecture
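The idea behind the non-blocking execution engine, keeping IO and computation running at the same time, can be illustrated with a minimal prefetching loop. This is a generic sketch, not SenseParrots code: `load_batch` and `compute` are hypothetical stand-ins, and plain Python threads are used only to show the overlap pattern (a production engine would not depend on the Python interpreter for this).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def load_batch(i: int) -> List[int]:
    """Hypothetical stand-in for an IO step (e.g. reading batch i from storage)."""
    return list(range(i * 4, i * 4 + 4))


def compute(batch: List[int]) -> int:
    """Hypothetical stand-in for the compute step on one batch."""
    return sum(x * x for x in batch)


def run_pipelined(num_batches: int) -> List[int]:
    """Process batches while the next batch is loaded in a background thread."""
    results: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        future = io_pool.submit(load_batch, 0)  # start loading the first batch
        for i in range(num_batches):
            batch = future.result()  # wait only if the load has not finished yet
            if i + 1 < num_batches:
                # Kick off the next load before computing on the current batch.
                future = io_pool.submit(load_batch, i + 1)
            results.append(compute(batch))
    return results


if __name__ == "__main__":
    print(run_pipelined(3))
```

The results are identical to a sequential loop; only the waiting time changes, because loading batch i+1 proceeds while batch i is being processed.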

17 SenseParrots: Performance in Comparison

[Figure: Training throughput on 64 GPUs (images/s), comparing SenseParrots, PyTorch, and TensorFlow. CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.40GHz; GPU: NVIDIA 1080ti]

[Figure: Speedup on InceptionResNetV2 (Batchsize = 32), x-axis 1, 4, 16, 64, comparing Ideal, Parrots2, PyTorch, and TensorFlow. CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz; GPU: NVIDIA 1080ti]

18 SensePetrel | High Performance Storage

Technical proposal

Storage stack: Object storage, Middle-ware, File storage

• Unified Interface: Unifies storage interfaces with different types of storage system
• Expansion on Demand: Expands capacity and performance whenever needed
• Enhanced Availability and Stability: Uses High Availability Architecture and QoS
• Optimized IO Performance: Aggregates small files, increases batch read-write interface, optimizes client pre-read cache
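Aggregating small files is a common way to cut per-file overhead in training storage. The sketch below shows one simple form of the idea: many small files are packed into a single blob with an offset index, and a batch of them is read back through one call. It is an illustration under assumed names (`pack_small_files`, `read_batch`), not the SensePetrel format or API.

```python
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length) inside the blob


def pack_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate many small files into one blob and record where each one lives."""
    blob = bytearray()
    index: Index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_file(blob: bytes, index: Index, name: str) -> bytes:
    """Read one logical file back out of the aggregated blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


def read_batch(blob: bytes, index: Index, names: List[str]) -> List[bytes]:
    """Batch read: fetch several logical files through a single call."""
    return [read_file(blob, index, name) for name in names]


if __name__ == "__main__":
    files = {"a.jpg": b"abc", "b.jpg": b"", "c.jpg": b"hello"}
    blob, index = pack_small_files(files)
    print(read_batch(blob, index, ["c.jpg", "a.jpg"]))
```

With this layout the storage system handles one large object instead of many tiny ones, and the index lookup replaces a per-file metadata operation.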

19 SensePetrel | High Performance Storage

1.3 million QPS: small file systems usually reach 100,000 QPS
100 billion: horizontal expansion of the architecture to store small files
6 times: writing speed of the S3 interface for 10 billion 10K files
36%: throughput compared with memory caching systems

20 Training Deep Networks in Minutes

1.5 min: Training AlexNet
7.5 min: Training ResNet50
ImageNet Dataset, 512 V100 GPUs, 56 Gbps Network

128 GPUs: 90%+ Efficiency
512 GPUs: 86% Efficiency
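To put the efficiency figures in perspective, the sketch below converts them into an effective speedup. It assumes the usual definition of scaling efficiency (measured speedup divided by the number of GPUs), which the slide does not spell out.

```python
def effective_speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over a single GPU, assuming efficiency = speedup / num_gpus."""
    return num_gpus * efficiency


if __name__ == "__main__":
    # Figures from the slide: 90%+ at 128 GPUs (lower bound used here), 86% at 512 GPUs.
    for gpus, eff in [(128, 0.90), (512, 0.86)]:
        print(f"{gpus} GPUs at {eff:.0%} efficiency: about {effective_speedup(gpus, eff):.0f}x")
```

Under this definition, 512 GPUs at 86% efficiency deliver the work of about 440 ideal GPUs, and 128 GPUs at 90% or better deliver at least about 115.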

Techniques: Lazy Allreduce, Coarse-Grained sparsity, LARS & Warm up

State-of-the-Art Approach: the training speed of ImageNet reaches 1 epoch/s on GPU cluster.
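The slide names Lazy Allreduce without describing it. One common reading, assumed here, is that small gradient tensors are held back and fused into a larger buffer so that fewer collective operations are issued. The sketch below simulates that with plain Python lists; `allreduce_sum` is a stand-in for a real collective, and the `threshold` parameter is hypothetical.

```python
from typing import List, Optional, Tuple


def allreduce_sum(buffers: List[List[float]]) -> List[float]:
    """Stand-in for a collective: element-wise sum of one buffer per worker."""
    return [sum(values) for values in zip(*buffers)]


def lazy_allreduce(
    worker_grads: List[List[List[float]]], threshold: int
) -> Tuple[List[Optional[List[float]]], int]:
    """Fuse consecutive gradient tensors until `threshold` elements are pending,
    then reduce them with a single collective call.

    worker_grads[w][t] is the flattened tensor t on worker w.
    Returns the reduced tensors and the number of collective calls issued.
    """
    num_tensors = len(worker_grads[0])
    reduced: List[Optional[List[float]]] = [None] * num_tensors
    calls = 0
    pending: List[int] = []  # tensor indices waiting in the current bucket
    pending_size = 0

    def flush() -> None:
        nonlocal calls, pending, pending_size
        if not pending:
            return
        # Build one fused buffer per worker, then reduce it in a single call.
        fused = [[x for t in pending for x in grads[t]] for grads in worker_grads]
        summed = allreduce_sum(fused)
        calls += 1
        pos = 0
        for t in pending:  # split the fused result back into tensors
            n = len(worker_grads[0][t])
            reduced[t] = summed[pos:pos + n]
            pos += n
        pending = []
        pending_size = 0

    for t in range(num_tensors):
        pending.append(t)
        pending_size += len(worker_grads[0][t])
        if pending_size >= threshold:
            flush()
    flush()  # reduce whatever is left in the last bucket
    return reduced, calls


if __name__ == "__main__":
    grads = [
        [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0]],          # worker 0
        [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0], [70.0]],   # worker 1
    ]
    print(lazy_allreduce(grads, threshold=3))
```

A larger threshold means fewer, larger messages, which tends to use network bandwidth better; the trade-off is that reduction of the earliest gradients is delayed.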

21 Significance of Minute Level Training

• Architecture Search: Finding a good new architecture takes thousands of trials. Minute-level training makes automatic search a reasonable option.
• Rapid Iteration: Revolutionize the workflow of model design, allowing a new model to be tested in minutes and reducing the design period from months to hours.
• Automated Adaptation: You no longer need a team of experts in response to a new customer. Everything can be done in one click, with the support of supercomputing.

22 GPU-based Supercomputing Platform

• 17 clusters
• 14,000+ GPUs
• Computing power: 160+ PFLOPS
• Locations include Beijing, Shenzhen, and Tokyo

23 Businesses Supported By Supercomputing Cluster

Supercomputing AI Cluster: Smart healthcare, Entertainment, Natural language processing, Internet content analysis, Smart city, Remote sensing image, Robot, Mobile internet, Text recognition, Smart building, Smart retail, Identity authentication, Live-streaming video, Smartphone, Carrier, Autonomous driving, AR advertising

24 Collaboration

Universities

Enterprises

25 Lead AI Innovation

Power the Future
