Tegra: Mobile & GPU Supercomputing Convergence | GTC 2013
Tegra – at the Convergence of Mobile and GPU Supercomputing
Neil Trevett, VP Mobile Content, NVIDIA
© 2012 NVIDIA - Page 1

Welcome to the Inaugural GTC Mobile Summit!
Tuesday afternoon, Room 210C
- Ecosystem broad view, including Ouya
- Development tools, including Tegra 4 and Shield
Wednesday morning, Marriott Ballroom 3
- Visualization, including using H.264 for still imagery
- Augmented device interaction, including a depth camera on Tegra
Wednesday afternoon, Room 210C
- Vision and computational photography, including Chimera
- Web: the fastest mobile browser
- Mobile panel: your chance to ask gnarly questions!
Select the Mobile Summit tag in your GTC Mobile App!

Why Mobile GPU Compute?
State-of-the-art augmented reality without GPU compute (courtesy Metaio): http://www.youtube.com/watch?v=xw3M-TNOo44&feature=related

Augmented Reality with GPU Compute
- Research today runs on CUDA-equipped laptop PCs.
- How will this GPU compute capability migrate from high-end PCs to mobile?
Source: P. Kán, H. Kaufmann, "High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence", Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria.

Mobile SOC Performance Increases
[Figure: CPU/GPU aggregate performance (log scale, 1 to 100) vs. device shipping dates, 2011-2015, showing a 100x performance increase in four years. Tegra 2: 1st dual A9. Tegra 3: 1st quad A9, 1st power-saver 5th core. Tegra 4: 1st quad A15, Chimera computational photography. Logan: full Kepler GPU, CUDA 5.0, OpenGL 4.3. Parker: Denver CPU, Maxwell GPU, FinFET. Core 2 Duo and Core i5 are shown for comparison, along with the Google Nexus 7 and HTC One X+.]

Power is the New Design Limit
- The Process Fairy keeps bringing more transistors...
  ...but the ‘End of Voltage Scaling’ means power is much more of an issue than in the past.
Here L is feature size, D transistor density, f clock frequency, V supply voltage, C switching capacitance (which scales with L), $E = CV^2$ the energy per switch, and P the power for the same chip area.

In the Good Old Days: leakage was not important, and voltage scaled with feature size.
$L' = L/2,\quad D' \propto 1/L'^2 = 4D,\quad f' = 2f,\quad V' = V/2,\quad E' = C'V'^2 = E/8,\quad P' = P$
Halve L and get 4x the transistors and 8x the capability for the same power.

The New Reality: leakage has limited threshold voltage, largely ending voltage scaling.
$L' = L/2,\quad D' \propto 1/L'^2 = 4D,\quad f' \approx 2f,\quad V' \approx V,\quad E' = C'V'^2 = E/2,\quad P' = 4P$
Halve L and get 4x the transistors and 8x the capability for 4x the power!!

Mobile Thermal Design Point
[Figure: typical max system power levels before thermal failure across device classes: 2-4W, 4-7W, 6-10W and 30-90W. Screen power: a 4-5" screen takes 250-500mW, a 7" screen takes 1W, and a 10" screen takes 1-2W.]
- Resolution makes a difference: the iPad3 screen takes up to 8W!
- Even as battery technology improves, these thermal limits remain.

How to Save Power?
Energy per operation, 40nm, 1V process:
- Write 32 bits to memory: 600pJ
- Send 32 bits off-chip: 50pJ
- Send 32 bits 2mm: 24pJ
- 32-bit float operation: 7pJ
- 32-bit integer add: 1pJ
- 32-bit register write: 0.5pJ
It is much more expensive to MOVE data than to COMPUTE it:
- Energy efficiency must now be a key metric during silicon AND software design.
- Be aware of where data lives, where computation happens, and how it is scheduled.
- Use hardware acceleration.
- Reduce data movement: lots of local processing in parallel, plus efficient caching and memory usage.

Dark Silicon, Mobile SOCs and Power Efficiency
- Lots of space for transistors, but we can't turn them all on at the same time!
- That would exceed the thermal design point.
- Dark silicon: specialized hardware is turned on only when needed.
- Dedicated units can increase the locality and parallelism of computation.
- GPUs are also much more power efficient than CPUs when exploiting data parallelism.
[Figure: power consumption (X1, X10, X100) vs. computation flexibility for dedicated hardware, GPU compute and multi-core CPU.]
- Enabling new mobile experiences requires pushing computation onto GPUs and dedicated hardware.

Mobile GPU Compute Adoption
- NVIDIA invented GPU computing.
- What we learned: it's not technology alone, it's USE CASES.
Mobile GPU compute use case pipeline: computational photography; face, body and gesture tracking; 3D scene/object reconstruction; augmented reality.

ISP: Dedicated Hardware for Sensor Processing
- The camera ISP (Image Signal Processor) typically has little or no programmability.
- It is scan-line-based: data flows through a compact hardware pipe.
- No global memory is used, to minimize power.
- BUT... computational photography apps now want to mix non-programmable ISP processing with more flexible GPU processing.
- Hence Chimera, the new NVIDIA computational photography architecture.
Camera ISP: ~760 math ops, ~42K values = 670Kb, ~250Gops @ 300MHz.

Flexible Use of ISP, GPU and CPU
- Flexible routing of image frames between computation engines.
- Potential to integrate more hardware blocks over time:
  - ISPs for different types of sensors, e.g.
    IR and depth cameras
  - 'Scanners': very low power and always on, to detect things in the environment to process

Tegra 4 Family
Tegra 4 ("Wayne"), the world's fastest mobile processor, for superphones and tablets:
- Quad CPU: Cortex A15, 4+1
- NVIDIA GPU: 72 cores
- LTE: optional, with i500
- Chimera
Tegra 4i ("Grey"), the 1st integrated Tegra 4 LTE processor, for smartphones:
- Quad CPU: Cortex A9 r4, 4+1
- NVIDIA GPU: 60 cores
- LTE: integrated i500
- Chimera

Android Three-Layer Ecosystem
- Apps and games: most use Java; cutting-edge apps and games use native APIs.
- Middleware and engines (from partners, and NVIDIA's VisX): use native APIs for power and performance.
- API drivers: Java (SDK) and native (NDK).
- VisX: turn-key vision middleware developed by NVIDIA, e.g. Tap-to-track and Panorama Paint.

APIs for Mobile Imaging and Vision
Java, camera and images: MediaCodec, SurfaceTexture, FilterScript (RenderScript subset)
Java, graphics: Java binding to OpenGL ES (similar to JSR239)
Native, camera and images:
- OpenCV4Tegra: the open source OpenCV vision library with GLSL, ARM multithreading and NEON optimizations
- An open source research project for advanced camera control
- An open standard under development at Khronos for optimized, power-efficient vision acceleration
Native, graphics:
- OpenGL ES: use GLSL shaders for imaging
- OpenGL 4.3 compute shaders provide general purpose computation on uniforms, images and textures for image and vision processing

APIs for Mobile GPU Compute
Java, GPU compute: RenderScript runs performance-critical sections as native C, automatically offloading C code segments to the GPU if possible
Java, graphics: Java binding to OpenGL ES (similar to JSR239)
Native, GPU compute: CUDA, to program GPUs in C; over 375 million CUDA-enabled GPUs in notebooks, workstations and supercomputers
Native, graphics:
- Use GLSL shaders for GPGPU compute
- OpenGL 4.3 compute shaders provide sufficient flexibility for physics, AI, global illumination and ray-tracing acceleration

CUDA 5.0 and OpenGL 4.3 on Tegra Today
Kayla: Tegra + discrete GPU development platforms
- Available to select developers
- OpenGL 4.3 and CUDA 5.0
- Full Kepler support on Linux
- PhysX, VisX...
- Enables early development of ARM-based applications with desktop-class graphics and compute
Talk to us if you are interested, or email [email protected]

Thank You!
- Powerful GPU compute is coming to a mobile device near you!
- New use cases need GPUs for acceptable battery consumption.
- Logan will bring a full Kepler-class GPU to mobile!
- Desktop APIs for full GPU compute: OpenGL 4.3 and CUDA 5.0.
- If you have apps that need mobile GPU compute, now is the time to be talking to us...
Questions? [email protected]
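The voltage-scaling comparison on the "Power is the New Design Limit" slide can be checked with exact fractions. This is a minimal sketch under a simplified model that is an assumption, not something the slide spells out: capacitance tracks feature size, density tracks 1/L^2, energy per switch is E = C*V^2, and power for a fixed chip area is P = D*f*E. The function name `scale_one_node` is hypothetical.

```python
from fractions import Fraction as F

def scale_one_node(voltage_scales: bool) -> dict:
    """Relative change in each quantity when the feature size L is halved.

    Simplified model (an assumption): capacitance C tracks L, density D
    tracks 1/L^2, energy per switch is E = C * V^2, and power for a fixed
    chip area is P = D * f * E.
    """
    L = F(1, 2)                          # feature size halves
    D = 1 / L**2                         # 4x transistors per unit area
    f = F(2)                             # clock roughly doubles in both regimes
    C = L                                # capacitance shrinks with L
    V = L if voltage_scales else F(1)    # voltage halves only while it still scales
    E = C * V**2                         # switching energy
    P = D * f * E                        # power for the same chip area
    return {"D": D, "f": f, "E": E, "P": P, "capability": D * f}

for label, voltage_scales in [("good old days", True), ("new reality", False)]:
    r = scale_one_node(voltage_scales)
    print(label, {k: str(v) for k, v in r.items()})
```

Both regimes deliver 8x the capability (4x the transistors at 2x the clock); only the voltage term decides whether power stays flat or quadruples.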
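To make the "much more expensive to MOVE data than COMPUTE data" point on the "How to Save Power?" slide concrete, here is a rough per-frame estimate built on that slide's energy table (40nm, 1V). The pipeline model is an assumption for illustration only: each stage performs a fixed number of 32-bit float operations per pixel and then stores one 32-bit result, either to external memory or to a register. Reads, caches and the ISP are ignored, and `frame_energy_pj` is a hypothetical helper.

```python
# Energy per 32-bit operation, 40nm / 1V process, from the "How to Save Power?" slide (pJ).
ENERGY_PJ = {
    "memory_write": 600.0,
    "send_offchip": 50.0,
    "send_2mm": 24.0,
    "float_op": 7.0,
    "int_add": 1.0,
    "register_write": 0.5,
}

def frame_energy_pj(pixels: int, stages: int, flops_per_stage: int,
                    spill_to_memory: bool) -> float:
    """Hypothetical pipeline model: every stage does `flops_per_stage` float ops
    per pixel, then stores one 32-bit result to memory or to a register."""
    store = ENERGY_PJ["memory_write"] if spill_to_memory else ENERGY_PJ["register_write"]
    per_pixel = stages * (flops_per_stage * ENERGY_PJ["float_op"] + store)
    return pixels * per_pixel

pixels = 1920 * 1080
for spill in (True, False):
    e = frame_energy_pj(pixels, stages=4, flops_per_stage=10, spill_to_memory=spill)
    # 1 mJ = 1e9 pJ; at 30 frames/s, pJ per frame * 30 / 1e9 gives mW.
    print(f"spill_to_memory={spill}: {e / 1e9:.2f} mJ/frame, {e * 30 / 1e9:.0f} mW at 30 fps")
```

For a 1920x1080 frame with four stages of ten float operations each, spilling every intermediate costs about 5.56 mJ per frame (roughly 167 mW at 30 fps), versus about 0.58 mJ (roughly 18 mW) when intermediates stay in registers. The arithmetic is identical in both cases; the 9.5x gap comes entirely from where the data lives.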
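The Camera ISP figures on the "ISP: Dedicated Hardware for Sensor Processing" slide are roughly self-consistent under two assumptions the slide does not state: the pipe retires one pixel per clock, and each stored value is 16 bits wide. Both assumptions, and the helper names below, are hypothetical.

```python
def isp_gops(ops_per_pixel: int, clock_hz: float, pixels_per_clock: float = 1) -> float:
    """Math operations per second, in Gops, if every op runs once per pixel."""
    return ops_per_pixel * clock_hz * pixels_per_clock / 1e9

def storage_kbits(num_values: int, bits_per_value: int) -> float:
    """On-chip state in kilobits (1 Kb = 1000 bits here)."""
    return num_values * bits_per_value / 1000

print(isp_gops(760, 300e6))       # ~760 math ops at 300MHz, one pixel per clock (assumed)
print(storage_kbits(42_000, 16))  # ~42K values at 16 bits each (assumed)
```

This gives 228 Gops, in the same range as the slide's "~250Gops", and 672 Kb, close to the stated 670Kb. The remaining throughput gap could be rounding or a pixel rate slightly above one per clock; the slide does not say which.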
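The ISP slide describes a scan-line-based pipe that uses no global memory. Below is a minimal Python sketch of that streaming style, assuming a 3-tap vertical box filter as the example operation (the slide names no specific filter, and `vertical_box3_streaming` is a hypothetical name). Rows arrive one at a time and only a three-row line buffer is ever held, never the whole frame.

```python
from collections import deque
from typing import Iterable, Iterator, List

Row = List[float]

def vertical_box3_streaming(rows: Iterable[Row]) -> Iterator[Row]:
    """3-tap vertical box filter that consumes a frame one scan line at a time.

    Only a three-row line buffer is held; the full frame is never stored.
    Output covers the valid region only, so H input rows give H - 2 output rows.
    """
    window = deque(maxlen=3)          # line buffer: the oldest row drops out automatically
    for row in rows:
        window.append(row)
        if len(window) == 3:
            top, mid, bot = window
            yield [(a + b + c) / 3 for a, b, c in zip(top, mid, bot)]

# Usage: rows could come straight from a sensor readout; here a small test frame.
frame = ([10 * r + c for c in range(4)] for r in range(5))
for out_row in vertical_box3_streaming(frame):
    print(out_row)
```

Memory use stays bounded by three rows regardless of frame height, which is the property that lets a fixed-function pipe avoid a frame buffer. The price is flexibility: any operation that needs data from far away in the frame does not fit this model, which is where the slide's case for mixing in GPU processing comes from.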