Nvidia Tegra X2 Architecture and Design

Nvidia Tegra X2 Architecture and Design

Nvidia Tegra X2 Architecture and Design Jeff Barker Barry Wu Rochester Institute of Technology Spring 2017 Agenda I Overview I CPU Complex (4x ARM A57 + 2x Denver 2) I GPU Complex (Pascal GPU) I Applications I Nvidia Jetson TX2 I Nvidia Drive PX2 I Closing Remarks and Questions Nvidia Tegra X2 Overview I Announced in April 2017 as credit card sized system-on-module I Improved on previous generation Tegra X1 module with: I Two additional Denver 2 CPU Cores I 8GB 128-bit LPDDR4 RAM (compared to 4GB 64-bit) I 32GB eMMC storage (compared to 16GB) I Pascal GPU architecture (compared to Maxwell) I Improved video encoding and decoding Tegra X2 Block Diagram ARM CPU Cores I Quad Core ARM A57 64-bit Processor Cluster I Multi-thread optimized ARMv8 architecture I Superscalar, variable-length, out-of-order pipeline I Dynamic branch prediction with Branch Target Buffer (BTB) and Global History Buffer RAMs, a return stack, and an indirect predictor I 48KB 3-way set-associative L1 Instruction cache per core I 32KB 2-way set-associative L1 Data cache per core I 2MB 16-way set-associative shared L2 cache Denver 2 CPU Cores I Dual Core Denver 2 64-bit Processor Cluster I Single-thread optimized ARMv8 architecture I 7-wide Superscalar architecture I Dynamic branch prediction with Branch Target Buffer (BTB) and Global History Buffer RAMs, a return stack, and an indirect predictor I 128KB 4-way set-associative L1 Instruction cache per core I 64KB 4-way set-associative L1 Data cache per core I 2MB 16-way set-associative shared L2 cache Pascal GPU Micro-architecture Pascal GPU I 16-nm FinFET manufacturing process I 2 Streaming Multiprocessors (SM) I 4 Streaming Multiprocessor Blocks (SMP) per SM I 32 CUDA cores per SMP (256 CUDA cores total) I Simplified data path I Improved scheduling and overlapping load/store instruction I Improved addressing and page fault handling Jetson TX2 System-On-Module Block Diagram Jetson Carrier Board Jetson Carrier Board (cont.) I Interfaces I HDMI 2.0, 2x DSI display ports I CSI2 D-PHY 1.2 camera port I CAN, UART, SPI, I2C, I2S, GPIO I USB 3.0, USB 2.0 I Gigabit Ethernet, 802.11ac WLAN, Bluetooth I Applications I Computer Vision I Deep Learning/Neural Networks I Autonomous Drones I Robotics Nvidia Drive PX2 Nvidia Drive PX2 (cont.) I Fully autonomous driving I Auto-pilot Functionality I Perception I Reasoning I Mapping I Driving I Co-pilot Functionality I Face detection I Head tracking I Gaze tracking I Lip reading I Multimedia Control Closing Remarks I Extremely capable embedded platform processor I Two high power single thread optimized cores I Four low power multiple thread optimized cores I Nvidia Pascal GPU I High performance for low power draw I Claimed 1.5 TFLOPS at 15 Watts I Twice the performance per watt compared to the Tegra X1 I Wide range of applications, especially with Jetson I Higher cost for increased performance compared to other embedded platforms Questions?.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    15 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us