DRIVE PX2: In-Vehicle Platform for Deep Learning and Autonomous Driving

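Toru Baji (馬路 徹), Technical Advisor and GPU Evangelist

AGENDA
• NVIDIA's automotive business
• Advanced image recognition with deep learning
• GPU: the engine for deep learning and massively parallel processing
• DRIVE PX2: the in-vehicle platform for deep learning and massively parallel processing
• DriveWorks: the software framework for ADAS and autonomous driving
• Visualizing autonomous driving in operation
• Recent autonomous driving applications (public information)

NVIDIA'S AUTOMOTIVE BUSINESS
• Automotive experience: 10 years
• Car models: 80
• Units shipped: 10+ million

NVIDIA SDK (SOFTWARE DEVELOPMENT KIT)
The essential resource for OEM, Tier 1, and ecosystem proliferation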

developer.nvidia.com | Available now

THE NEW REALIZATION
"Modules, modules and more modules. There's so many modules there. If we were to strip off this car, we'd probably have a basketful of modules -- little black boxes that do something. It's getting out of control. They're very expensive. They're tough to package. They're very complex."
"I'd like to see a monster module that controls the entire vehicle and that's easier to upgrade."
Ralph Gilles, Global Design Chief, Fiat Chrysler Automobiles (Automotive News, February 28, 2016)

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

THE FUTURE OF CAR COMPUTERS: ONLY TWO MAIN INTEGRATED MODULES
DRIVE CX (cockpit software)
• GPU virtualization
• AI: speech
• Surround view
• Smart mirror
DRIVE PX (self-driving software)
• Perception
• Localization
• Planning
• Visualization

• Two computers replace many ECUs
• Both have access to cameras/sensors
• Multiple OSs, displays
• Powered by artificial intelligence
• Upgradeable SW replaces HW ECUs
• Cockpit computer
• One architecture
• Self-driving computer
• Higher performance
• Lower total cost

ADVANCED IMAGE RECOGNITION WITH DEEP LEARNING

DL REVOLUTIONIZES CAR COMPUTER VISION
CONVENTIONAL
• Required separate algorithms/apps
  - Pedestrian: HOG etc.
  - Traffic sign: Hough transform + character recognition etc.
• Only simple context recognition
  - Pedestrian: yes/no only (no additional info)
  - Speed limit signs only
DEEP NEURAL NETWORK
• One deep neural net app can detect various objects
  - Pedestrians, cars, traffic signs, lanes
  - Also with many attributes (car: police car, van, sedan, truck, ambulance….)

(…)

VERY SHORT TIME TO GET TOP-CLASS SCORE
KITTI dataset: object detection
[Figure: NVIDIA DRIVENet object detection score on the KITTI dataset, 7/2015 to 12/2015 (vertical axis 30% to 100%). Labeled values: 39%, 55%, 72% and 88%, with a "Top Score" marker.]

EVERYBODY USING GPU! (Not the latest ranking)
Courtesy of Cityscape

Courtesy of Daimler

Courtesy of Audi

"Using NVIDIA DIGITS deep learning platform, in less than four hours we achieved over 96% accuracy using Ruhr University Bochum's traffic sign database. While others invested years of development to achieve similar levels of perception with classical computer vision algorithms, we have been able to do it at the speed of light."
Matthias Rudolph, Director of Architecture, Driver Assistance Systems, Audi

GPU: THE ENGINE FOR DEEP LEARNING AND MASSIVELY PARALLEL PROCESSING

NVIDIA GPU: BIG CONTRIBUTION TO SUPERCOMPUTERS USING CUDA (GPU MASSIVELY PARALLEL COMPUTING)
CUDA: Compute Unified Device Architecture
From SC TOP500, November 2015

LEAPS IN SUPERCOMPUTER GPU ADOPTION
[Figure: number of accelerated systems (axis 0 to 120) for Nov 2013, Nov 2014 and Nov 2015.]
• Accelerated systems doubled from 2013 to 2015
• 96% of new systems use NVIDIA GPUs

CUDA: A MASSIVELY PARALLEL PROGRAMMING ENVIRONMENT
CUDA (Compute Unified Device Architecture)

https://developer.nvidia.com/gpu-accelerated-libraries

Representative CUDA-enabled libraries
• cuDNN: deep learning
• cuBLAS: matrix operations (dense)
• cuSPARSE: matrix operations (sparse)
• cuFFT: Fourier transforms
• cuRAND: random number generation
• NPP: image processing primitives
• cuSOLVER: matrix solvers (y=Ax)
• Thrust: C++ template library
• …

SOLID GPU ROADMAP
[Figure: SGEMM per watt (axis 0 to 72) versus year (2008 to 2018) for the Tesla, Fermi, Kepler, Maxwell, Pascal (mixed precision, double precision, 3D memory, NVLink) and Volta architectures.]

NVIDIA ONE-ARCHITECTURE FROM SUPERCOMPUTER TO AUTOMOTIVE SOC
• Automotive
• Tesla: in supercomputers
• In workstations
• GeForce: in PCs
• Mobile GPU: in Tegra

PARALLEL PROCESSING AND AI/DL EVERYWHERE WITH ONE ARCHITECTURE OVER ALL PRODUCTS/PLATFORMS

NVIDIA Tegra/DRIVE PX

NVIDIA Tesla/Supercomputer, HPC

NVIDIA Tegra/Jetson

TITAN X/Graphics Card

WHAT A TRULY SCALABLE GPU ARCHITECTURE ENABLES: TIME-CONSUMING TRAINING ON SERVER AND REAL-TIME RECOGNITION ON EMBEDDED SYSTEM
[Diagram: a neural net model is trained on the NVIDIA GPU deep learning supercomputer. The trained model runs on the DRIVE PX auto-pilot car computer, which turns camera inputs into classified objects.]

DRIVE PX2: THE IN-VEHICLE PLATFORM FOR DEEP LEARNING AND MASSIVELY PARALLEL PROCESSING

DRIVE PX2 ENGAGEMENTS: >100
• Passenger car OEMs: ~25
• Commercial car OEMs: ~10
• Tier 1s: ~20
• TaaS (Transportation as a Service): ~10
• Ecosystem partners (R&D, universities, OS, sensor, ISV, etc.): ~50

DRIVE PX PLATFORM SOLUTION
• DRIVE PX is a computing platform for ADAS / autonomous driving
• End-to-end platform optimized for deep learning (supercomputer to DRIVE PX)
• Open and scalable SW stack: DriveWorks
• Scalable architecture from ADAS to autonomous driving (one Tegra to 2 x Tegra + 2 x discrete GPU)
[Diagram labels: training on workstation/supercomputer; DL: very fast development speed towards top DL score]

DRIVE PX
• Dual Tegra X1
• 8 CPU cores
• Maxwell GPU
• 850 GFLOPS (FP32)
• 12 simultaneous LVDS camera inputs
• 2 LVDS display ports
[Figure labels: display ports, camera inputs, car connector]

Proprietary & Confidential. All Information Subject to Change.

DRIVE PX HARNESS FROM CAR CONNECTOR
CAN, LIN, FlexRay and Ethernet supported
48-pin automotive-grade vehicle harness:
• Ethernet (x1)
• UART (x1)
• Power (x1)
• FlexRay (x2)
• LIN (x4)
• CAN 2.0 (x6)

DRIVE PX2
• Dual next-generation Tegra (dual Tegras on top)
• Dual discrete GPUs (dual discrete GPUs on the bottom)
• 12 CPU cores
• Pascal GPU
• 8 TFLOPS (FP32)
• 24 DL TOPS
• 12 simultaneous LVDS camera inputs
• Liquid cooled if all devices are used
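The 12-core figure can be cross-checked against the per-device breakdown on the DRIVE PX2 computation engines slide of this deck (quad ARM A57 plus dual Denver per Tegra, 8GB LPDDR4 per Tegra, 4GB GDDR5 per discrete GPU). The sketch below is only bookkeeping for the scalable configurations. `DrivePx2Config` is a hypothetical helper for illustration, not an NVIDIA API, and allowing any discrete GPU count between 0 and 2 is an assumption (the slides state only the minimum and maximum configurations).

```python
from dataclasses import dataclass

# Per-device figures as listed on the "DRIVE PX2 COMPUTATION ENGINES" slide.
A57_CORES_PER_TEGRA = 4      # quad ARM A57
DENVER_CORES_PER_TEGRA = 2   # dual Denver
LPDDR4_GB_PER_TEGRA = 8
GDDR5_GB_PER_DGPU = 4


@dataclass(frozen=True)
class DrivePx2Config:
    """Hypothetical description of one DRIVE PX2 build (illustration only)."""
    tegras: int
    discrete_gpus: int

    def __post_init__(self) -> None:
        # Slide: min is 1 Tegra, max is 2 Tegras + 2 dGPUs.
        # Assumption: any dGPU count from 0 to 2 is allowed in between.
        if not 1 <= self.tegras <= 2:
            raise ValueError("tegras must be 1 or 2")
        if not 0 <= self.discrete_gpus <= 2:
            raise ValueError("discrete_gpus must be between 0 and 2")

    @property
    def cpu_cores(self) -> int:
        return self.tegras * (A57_CORES_PER_TEGRA + DENVER_CORES_PER_TEGRA)

    @property
    def gpu_count(self) -> int:
        # Each Tegra carries one integrated Pascal GPU.
        return self.tegras + self.discrete_gpus

    @property
    def memory_gb(self) -> int:
        return (self.tegras * LPDDR4_GB_PER_TEGRA
                + self.discrete_gpus * GDDR5_GB_PER_DGPU)


full = DrivePx2Config(tegras=2, discrete_gpus=2)
minimal = DrivePx2Config(tegras=1, discrete_gpus=0)
print(full.cpu_cores, full.gpu_count, full.memory_gb)           # 12 4 24
print(minimal.cpu_cores, minimal.gpu_count, minimal.memory_gb)  # 6 1 8
```

The full configuration reproduces the 12 CPU cores quoted on the slide: 2 x (4 A57 + 2 Denver).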

DRIVE PX2 COMPUTATION ENGINES
TOTAL PERFORMANCE
- 8 TFLOPS (FP32)
- 24 DL TOPS
HIGH PERFORMANCE 12 CPUs
- 2 x quad ARM A57
- 2 x dual Denver (ARM 64b compatible)
SCALABLE
- Scalable platform
- Max: 2 Tegras + 2 dGPUs
- Min: 1 Tegra
REDUNDANCY
- For functional safety
DEDICATED MEMORY
- For each GPU
[Block diagram: TEGRA A and TEGRA B each contain two Denver cores, four A57 cores and an integrated Pascal GPU, with 8GB LPDDR4. Each Tegra connects over PCIe x4 to a discrete Pascal GPU (PASCAL A, PASCAL B) with 4GB GDDR5. Other labels: 128bit, UMA, 1Gb Ether.]

DRIVE PX2 INTERFACES
70 gigabits per second of I/O
• Sensor fusion interfaces: GMSL camera, CAN, GbE, BroadR-Reach, FlexRay, LIN, GPIO
• Displays/cockpit computer interfaces: HDMI, FPDLink III and GMSL
• Development and debug interfaces: HDMI, GbE, 10GbE, USB3, USB 2 (UART/debug), JTAG
[Block diagram: TEGRA A, PASCAL A, TEGRA B, PASCAL B and an ASIL-D safety MCU, with camera, CAN, LIN, FlexRay, BroadR-Reach, Gb Ether, 10Gb Ether, GPIO, USB3.0, USB2.0, display (HDMI) and JTAG connections.]

[Diagram legend: auto-grade connectors; debug/lab interfaces]

DRIVE PX2 SOFTWARE
A full stack of rich software components
• NVIDIA Vibrante Linux and comprehensive BSP
• Rich autonomous driving DriveWorks SDK
• SDK, samples and more

DRIVE PX ANALYSIS AS AN SEOOC (SAFETY ELEMENT OUT OF CONTEXT)
• DRIVE PX as an SEooC is developed based on "assumptions on use in vehicles", including external interfaces
• Safety manual, FMEDA: NVIDIA, as the developer of this SEooC, will provide the assumptions to the Tier 1s and OEMs
• To have a complete safety case, these "assumptions" are validated by OEMs and Tier 1s in the context of the actual vehicle system
• If the NVIDIA SEooC does not fulfill the vehicle requirements, "a modification needs to be made" to either the vehicle or the SEooC

SEooC: Safety Element out of Context

HARA: Hazard Analysis and Risk Assessment
FMEDA: Failure Modes, Effects and Diagnostic Analysis
FTA: Fault Tree Analysis
[Diagram labels: SEooC, quantitative analysis (FMEDA/FTA), done]

DRIVEWORKS: THE SOFTWARE FRAMEWORK FOR ADAS AND AUTONOMOUS DRIVING

NVIDIA DRIVEWORKS
AI/DL is now used in detection (perception). Other features are accelerated by CUDA (GPU massively parallel computing).

COMPUTEWORKS, GAMEWORKS, VRWORKS, DESIGNWORKS, DRIVEWORKS, JETPACK

DriveWorks
• Sensor fusion
• Detection
• Localization
• HD maps
• and other technologies such as driving and planning

AND OTHER SUPPORTING SDKS
• Deep Learning SDK
• DIGITS workflow
• VisionWorks
• and other technologies such as GIE (GPU Inference Engine), System Trace, Visual Profiler

The NVIDIA DriveWorks SDK gives developers a foundation to build applications across the self-driving pipeline: perception, localization, planning and visualization. And we can bring all of these technologies together into a beautiful cockpit visualization to give the driver confidence that the car is accurately seeing the world around him.

“As a leading provider of graphical hardware for gamers and researchers alike, NVIDIA has a lot of expertise in building systems that can make sense of video input and make it something understandable.”

— Business Insider

DRIVEWORKS

Perception

Localization

Planning

Visualization

VISUALIZING AUTONOMOUS DRIVING IN OPERATION

NEW AI DRIVING

MAPPING

KALDI

LOCALIZATION

DRIVENET

DAVENET

Training on NVIDIA DGX-1, driving with NVIDIA DRIVE PX
[Diagram labels: DGX-1, DriveWorks]

RECENT AUTONOMOUS DRIVING APPLICATIONS (PUBLIC INFORMATION)

As part of its Drive Me project, Volvo will run 100 autonomous driving test cars in 2017. These cars will be equipped with NVIDIA's deep learning car computer, DRIVE PX2.

WORLD'S FIRST AUTONOMOUS CAR RACE

. 10 teams, 20 identical cars

. DRIVE PX 2: The “brain” of every car

. 2016/17 Formula E season

HIGH-SPEED RACING ALGORITHM ALREADY THERE
Georgia Tech MPPI (Model Predictive Path Integral control) algorithm
• Calculates the optimized trajectory as the weighted average of 2,560 different trajectories (each looking 2.5 sec ahead), computed in parallel on the monster NVIDIA GPU 60 times per second.
• Using just one sampled trajectory would be very jerky, so the 2,560 trajectories are weight-averaged.
• The dynamics model is a linear function of 25 features based on an analytical vehicle model.
• The car does it by itself: counter steering, power slides…. Max speed: 100 km/h.
• The on-car GPU used is an NVIDIA GTX 750 Ti (640 cores, 1,305 GFLOPS).

THANK YOU
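As a supplementary sketch of the MPPI weighted-average step from the racing slide: sample many perturbed control sequences, score each rollout, and average the samples with weights that favor low cost. Only the trajectory count (2,560) and the 2.5 sec look-ahead come from the slide. The toy 1-D point-mass dynamics, the cost function, the time step `DT`, the noise level `sigma` and the temperature `lam` are all assumptions made for illustration. This is not the Georgia Tech implementation, which uses a 25-feature vehicle model and runs on the GPU.

```python
import math
import random

NUM_TRAJECTORIES = 2560  # from the slide
HORIZON_S = 2.5          # from the slide: each trajectory looks 2.5 s ahead
DT = 0.1                 # assumed integration step (not given on the slide)
STEPS = round(HORIZON_S / DT)


def rollout_cost(controls, target, dt=DT):
    """Cost of one sampled trajectory under an assumed toy 1-D point-mass model."""
    x = v = cost = 0.0
    for u in controls:
        v += u * dt                               # acceleration -> velocity
        x += v * dt                               # velocity -> position
        cost += (x - target) ** 2 + 0.01 * u * u  # tracking error + control effort
    return cost


def softmin_weights(costs, lam):
    """Normalized weights exp(-(cost - min_cost) / lam): lower cost, more weight."""
    best = min(costs)
    raw = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(raw)
    return [w / total for w in raw]


def mppi_step(nominal, target, rng, num_samples=NUM_TRAJECTORIES,
              sigma=1.0, lam=100.0):
    """One MPPI update: perturb the nominal controls, then weight-average the samples."""
    samples = [[u + rng.gauss(0.0, sigma) for u in nominal]
               for _ in range(num_samples)]
    weights = softmin_weights([rollout_cost(s, target) for s in samples], lam)
    return [sum(w * s[t] for w, s in zip(weights, samples))
            for t in range(len(nominal))]


# Start from an all-zero plan and ask the point mass to move toward x = 5.
plan = mppi_step([0.0] * STEPS, target=5.0, rng=random.Random(0))
print(len(plan), plan[0])
```

In a control loop running 60 times per second as on the slide, only the first control of `plan` would be applied before re-planning from the new state. Averaging over all samples rather than picking the single best one is what smooths out the jerkiness the slide mentions.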