Silicon Acceleration Apis Embedded Technology 2016, Yokohama

Total Page:16

File Type:pdf, Size:1020Kb

Silicon Acceleration Apis Embedded Technology 2016, Yokohama Silicon Acceleration APIs Embedded Technology 2016, Yokohama Neil Trevett Vice President Developer Ecosystem, NVIDIA | President, Khronos [email protected] | @neilt3d November 2016 © Copyright Khronos Group 2016 - Page 1 Khronos and Open Standards Software Silicon Khronos is an Industry Consortium of over 100 companies We create royalty-free, open standard APIs for hardware acceleration of Graphics, Parallel Compute, Neural Networks and Vision © Copyright Khronos Group 2016 - Page 2 Accelerated API Landscape Vision Frameworks Neural Net Libraries OpenVX Neural Net Extension High-level Language-based Acceleration Frameworks 3D Graphics Explicit Kernels DSP Dedicated Hardware GPU FPGA © Copyright Khronos Group 2016 - Page 3 OpenCL – Low-level Parallel Programing • Low level programming of heterogeneous parallel compute resources - One code tree can be executed on CPUs, GPUs, DSPs and FPGA • OpenCL C language to write kernel programs to execute on any compute device - Platform Layer API - to query, select and initialize compute devices - Runtime API - to build and execute kernels programs on multiple devices • New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14 - Adaptable and elegant sharable code – great for building libraries - Templates enable meta-programming for highly adaptive software - Lambdas used to implement nested/dynamic parallelism GPU OpenCL CPU KernelOpenCL Kernel code DSP Runtime API CodeKernelOpenCL compiled for CPU loads and executes CPU CodeKernelOpenCL FPGA CodeKernel devices kernels across devices Devices Code Host © Copyright Khronos Group 2016 - Page 4 OpenCL Conformant Implementations 1.0 | May09 1.1 | Jul11 1.2 | Jun12 1.0 | Aug09 1.1 | Aug10 1.2 | May12 2.0 | Dec14 1.0 | May10 1.1 | Feb11 Desktop 1.1 |Mar11 1.2 | Dec12 2.0 | Jul14 2.1 | Jun15 1.0 | May09 1.1 | Jun10 1.2 | May15 1.1 | Aug12 1.2 | Mar16 1.0 | Feb11 1.2 | Sep13 Mobile 1.1 | Nov12 1.2 | Apr14 2.0 | Nov15 1.1 | Apr12 1.2 | Dec14 1.2 | Sep14 1.1 | May13 1.0 | Jan10 Embedded 1.2 | May15 1.2 | Aug15 1.0 | Jul13 FPGA 1.0 | Dec14 Vendor timelines are Dec08 Jun10 Nov11 Nov13 Nov15 first implementation of OpenCL 1.0 OpenCL 1.1 OpenCL 1.2 OpenCL 2.0 OpenCL 2.1 each spec generation Specification Specification Specification Specification Specification © Copyright Khronos Group 2016 - Page 5 SYCL for OpenCL • Single-source heterogeneous programming using STANDARD C++ - Use C++ templates and lambda functions for host & device code • Aligns the hardware acceleration of OpenCL with direction of the C++ standard - C++14 with open source C++17 Parallel STL hosted by Khronos Developer Choice The development of the two specifications are aligned so code can be easily shared between the two approaches C++ Kernel Language Single-source C++ Low Level Control Programmer Familiarity ‘GPGPU’-style separation of Approach also taken by device-side kernel source C++ AMP and OpenMP code and host code © Copyright Khronos Group 2016 - Page 6 Embedded Vision Acceleration Embedded Technology, Yokohama Shorin Kyo, Huawei © Copyright Khronos Group 2016 - Page 7 OpenVX – Low Power Vision Acceleration • Targeted at vision acceleration in real-time, mobile and embedded platforms - Precisely defined API for production deployment • Higher abstraction than OpenCL for performance portability across diverse architectures - Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware… • Extends portable vision acceleration to very low power domains - Doesn’t require high-power CPU/GPU Complex or OpenCL precision Vision Engine Vision Processing X100 Dedicated Efficiency Middleware Hardware Application Vision DSPs X10 GPU Compute Multi-core CPU Power Efficiency Power X1 GPU DSP Computation Flexibility Hardware © Copyright Khronos Group 2016 - Page 8 OpenVX Graphs • OpenVX developers express a graph of image operations (‘Nodes’) - Nodes can be on any hardware or processor coded in any language • Graphs can execute almost autonomously - Possible to Minimize host interaction during frame-rate graph execution • Graphs are the key to run-time optimization opportunities… Pyr Camera OpenVX Nodes t Rendering Input Output RGB YUV Gray Array of Frame Color Frame Channel Frame Image Optical Keypoints Harris Conversion Extract Pyramid Flow Track Array of Features OpenVX Graph Ftrt-1 Feature Extraction Example Graph © Copyright Khronos Group 2016 - Page 9 OpenVX Efficiency through Graphs.. Graph Memory Kernel Data Scheduling Management Merge Tiling Execute a sub- Split the graph Reuse Replace a sub- graph at tile execution across pre-allocated graph with a granularity instead the whole system: memory for single faster node of image CPU / GPU / multiple granularity dedicated HW intermediate data Faster execution Less allocation overhead, Better memory Better use of or lower power more memory for locality, less kernel data cache and consumption other applications launch overhead local memory © Copyright Khronos Group 2016 - Page 10 Example Relative Performance 10 Relative Performance 9 8.7 8 OpenCV (GPU accelerated) 7 OpenVX (GPU accelerated) NVIDIA 6 implementation experience. Geometric mean of 5 >2200 primitives, grouped into each 4 categories, 2.9 running at different 3 image sizes and 2.5 parameter settings 2 1.5 1.1 1 0 Arithmetic Analysis Filter Geometric Overall © Copyright Khronos Group 2016 - Page 11 Layered Vision Processing Ecosystem Implementers may use OpenCL or Compute Shaders to implement OpenVX nodes on programmable processors And then developers can use OpenVX to enable a developer to easily connect those nodes into a graph AMD OpenVX - Open source, highly optimized Application for x86 CPU and OpenCL for GPU - “Graph Optimizer” looks at entire processing pipeline and removes/replaces/merges functions to improve C/C++ performance and bandwidth - Scripting for rapid prototyping, without re-compiling, at Programmable Vision Dedicated Vision production performance levels Processors Hardware http://gpuopen.com/compute-product/amd-openvx/ OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs The OpenVX graph enables implementers to optimize execution across diverse hardware architectures an drive to lower power implementations © Copyright Khronos Group 2016 - Page 12 OpenVX 1.0 Shipping, OpenVX 1.1 Released! • Multiple OpenVX 1.0 Implementations shipping – spec in October 2014 - Open source sample implementation and conformance tests available • OpenVX 1.1 Specification released 2nd May 2016 - Expands node functionality AND enhances graph framework • OpenVX is EXTENSIBLE - Implementers can add their own nodes at any time to meet customer and market needs = shipping implementations © Copyright Khronos Group 2016 - Page 13 OpenVX Neural Net Extension • Convolution Neural Network topologies can be represented as OpenVX graphs - Layers are represented as OpenVX nodes - Layers connected by multi-dimensional tensors objects - Layer types include convolution, activation, pooling, fully-connected, soft-max - CNN nodes can be mixed with traditional vision nodes • Import/Export Extension - Efficient handling of network Weights/Biases or complete networks • The specification is provisional - Welcome feedback from the deep learning community Vision Node Native Vision Vision Downstream Camera Application Node Node Control CNN Nodes Processing An OpenVX graph mixing CNN nodes with traditional vision nodes © Copyright Khronos Group 2016 - Page 14 NNEF - Neural Network Exchange Format NN Authoring Framework 1 Inference Engine 1 NN Authoring Framework 2 Inference Engine 2 NN Authoring Framework 3 Inference Engine 3 NN Authoring Framework 1 Inference Engine 1 Neural Net NN Authoring Framework 2 Exchange Inference Engine 2 Format NN Authoring Framework 3 Inference Engine 3 NNEF encapsulates neural network structure, data formats, commonly used operations (such as convolution, pooling, normalization, etc.) and formal network semantics NNEF 1.0 is currently being defined OpenVX will import NNEF files © Copyright Khronos Group 2016 - Page 15 Tessellation and geometry shaders 32-bit integers and floats ASTC Texture Compression OpenGL ES NPOT, 3D/depth textures Floating point render targets Vertex and Texture arrays Compute Shaders Debug and robustness for security Fixed function Multiple Render Targets Pipeline fragment shaders Epic’s Rivalry demo using full Unreal Engine 4 https://www.youtube.com/watch?v=jRr-G95GdaM Driver Silicon Silicon Driver Silicon Driver Update Update Update Update Update Update 2003 2004 2007 2012 2014 2015 2016 ES 1.0 ES 1.1 ES 2.0 ES 3.0 ES 3.1 ES 3.2 AEP N L Close to 2 Billion OpenGL ES devices shipped in 2015 OpenGL ES 2.0 OpenGL ES 3.x Vertex and Fragment Shaders Compute Shaders http://hwstats.unity3d.com/mobile/gpu.html © Copyright Khronos Group 2016 - Page 16 The Key Principle of Vulkan: Explicit Control • Application tells the driver what it is going to do You have to supply a LOT of information! - In enough detail that driver doesn’t have to guess - When the driver needs to know it And, you have to supply it early • In return, driver promises to do - What the application asks for - When it asks for it - Very quickly Efficient, predictable performance • No driver magic – no surprises © Copyright Khronos Group 2016 - Page 17 Vulkan Explicit GPU Control Vulkan Benefits Resource management in app code: Less driver hitches and surprises Application Single thread per context Simpler drivers: Application Improved efficiency/performance Memory allocation Reduced CPU bottlenecks Thread management
Recommended publications
  • AMD Firepro™Professional Graphics for CAD & Engineering and Media & Entertainment
    AMD FirePro™Professional Graphics for CAD & Engineering and Media & Entertainment Performance at every price point. AMD FirePro professional graphics offer breakthrough capabilities that can help maximize productivity and help lower cost and complexity — giving you the edge you need in your business. Outstanding graphics performance, compute power and ultrahigh-resolution multidisplay capabilities allows broadcast, design and engineering professionals to work at a whole new level of detail, speed, responsiveness and creativity. AMD FireProTM W9100 AMD FireProTM W8100 With 16GB GDDR5 memory and the ability to support up to six 4K The new AMD FirePro W8100 workstation graphics card is based on displays via six Mini DisplayPort outputs,1 the AMD FirePro W9100 the AMD Graphics Core Next (GCN) GPU architecture and packs up graphics card is the ideal single-GPU solution for the next generation to 4.2 TFLOPS of compute power to accelerate your projects beyond of ultrahigh-resolution visualization environments. just graphics. AMD FireProTM W7100 AMD FireProTM W5100 The new AMD FirePro W7100 graphics card delivers 8GB The new AMD FirePro™ W5100 graphics card delivers optimized of memory, application performance and special features application and multidisplay performance for midrange users. that media and entertainment and design and engineering With 4GB of ultra-fast GDDR5 memory, users can tackle moderately professionals need to take their projects to the next level. complex models, assemblies, data sets or advanced visual effects with ease. AMD FireProTM W4100 AMD FireProTM W2100 In a class of its own, the AMD FirePro Professional graphics starts with AMD W4100 graphics card is the best choice FirePro W2100 graphics, delivering for entry-level users who need a boost in optimized and certified professional graphics performance to better address application performance that similarly- their evolving workflows.
    [Show full text]
  • PC-Grade Parallel Processing and Hardware Acceleration for Large-Scale Data Analysis
    University of Huddersfield Repository Yang, Su PC-Grade Parallel Processing and Hardware Acceleration for Large-Scale Data Analysis Original Citation Yang, Su (2009) PC-Grade Parallel Processing and Hardware Acceleration for Large-Scale Data Analysis. Doctoral thesis, University of Huddersfield. This version is available at http://eprints.hud.ac.uk/id/eprint/8754/ The University Repository is a digital collection of the research output of the University, available on Open Access. Copyright and Moral Rights for the items on this site are retained by the individual author and/or other copyright owners. Users may access full items free of charge; copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational or not-for-profit purposes without prior permission or charge, provided: • The authors, title and full bibliographic details is credited in any copy; • A hyperlink and/or URL is included for the original metadata page; and • The content is not changed in any way. For more information, including our policy and submission procedure, please contact the Repository Team at: [email protected]. http://eprints.hud.ac.uk/ PC-Grade Parallel Processing and Hardware Acceleration for Large-Scale Data Analysis Yang Su A thesis submitted to the University of Huddersfield in partial fulfilment of the requirements for the degree of Doctor of Philosophy School of Computing and Engineering University of Huddersfield October 2009 Acknowledgments I would like to thank the School of Computing and Engineering at the University of Huddersfield for providing this great opportunity of study and facilitating me throughout this project.
    [Show full text]
  • Mellanox Corporate Deck
    UCX Community Meeting SC’19 November 2019 Open Meeting Forum and Publicly Available Work Product This is an open, public standards setting discussion and development meeting of UCF. The discussions that take place during this meeting are intended to be open to the general public and all work product derived from this meeting shall be made widely and freely available to the public. All information including exchange of technical information shall take place during open sessions of this meeting and UCF will not sponsor or support any closed or private working group, standards setting or development sessions that may take place during this meeting. Your participation in any non-public interactions or settings during this meeting are outside the scope of UCF's intended open-public meeting format. © 2019 UCF Consortium 2 UCF Consortium . Mission: • Collaboration between industry, laboratories, and academia to create production grade communication frameworks and open standards for data centric and high-performance applications . Projects https://www.ucfconsortium.org Join • UCX – Unified Communication X – www.openucx.org [email protected] • SparkUCX – www.sparkucx.org • Open RDMA . Board members • Jeff Kuehn, UCF Chairman (Los Alamos National Laboratory) • Gilad Shainer, UCF President (Mellanox Technologies) • Pavel Shamis, UCF treasurer (Arm) • Brad Benton, Board Member (AMD) • Duncan Poole, Board Member (Nvidia) • Pavan Balaji, Board Member (Argonne National Laboratory) • Sameh Sharkawi, Board Member (IBM) • Dhabaleswar K. (DK) Panda, Board Member (Ohio State University) • Steve Poole, Board Member (Open Source Software Solutions) © 2019 UCF Consortium 3 UCX History https://www.hpcwire.com/2018/09/17/ucf-ucx-and-a-car-ride-on-the-road-to-exascale/ © 2019 UCF Consortium 4 © 2019 UCF Consortium 5 UCX Portability .
    [Show full text]
  • AMD Codexl 1.0 GA Release Notes (Version 1.0.)
    AMD CodeXL 1.0 GA Release Notes (version 1.0.) CodeXL 1.0 is finally here! Thank you for using CodeXL. We appreciate any feedback you have! Please use our CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and Milind Kukanur’s CodeXL blog at AMD Developer Central - Blogs This version contains: CodeXL Visual Studio package and Standalone application, for 32-bit and 64-bit Windows platforms CodeXL for 64-bit Linux platforms Kernel Analyzer v2 for both Windows and Linux platforms System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5xxx or newer) or APU is required o The AMD Catalyst Driver must be installed, release 12.8 or later. Catalyst 12.12 is the recommended version (will become available later in December 2012). For GPU API-Level Debugging, a working OpenCL/OpenGL configuration is required (AMD or other). CPU Profiling o Time Based Profiling can be performed on any x86 or AMD64 (x86-64) CPU/APU. o The Event Based Profiling (EBP) and Instruction Based Sampling (IBS) session types require an AMD CPU or APU processor. Supported platforms: Windows platforms: Windows 7, 32-bit and 64-bit are supported o Visual Studio 2010 must be installed on the station before the CodeXL Visual Studio Package is installed. Linux platforms: Red Hat EL 6 u2 64-bit and Ubuntu 11.10 64-bit New in this version The following items were not part of the Beta release and are new to this version: A unified installer for Windows which installs both CodeXL and APP KernelAnalyzer2 on 32 and 64 bit platforms, packaged in a single executable file.
    [Show full text]
  • AMD Codexl 1.1 GA Release Notes
    AMD CodeXL 1.1 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use our CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: CodeXL Visual Studio 2012 and 2010 packages and Standalone application, for 32-bit and 64-bit Windows platforms CodeXL for 64-bit Linux platforms Kernel Analyzer v2 for both Windows and Linux platforms Note about 32-bit Windows CodeXL 1.1 Upgrade Error On 32-bit Windows platforms, upgrading from previous version of CodeXL using the CodeXL 1.1 installer will remove the previous version and then display an error message without installing CodeXL 1.1. The recommended method is to uninstall previous CodeXL before installing CodeXL 1.1. If you ran the 1.1 installer to upgrade a previous installation and encountered the error mentioned above, ignore the error and run the installer again to install CodeXL 1.1. Note about installing CodeAnalyst after installing CodeXL for Windows CodeXL can be safely installed on a Windows station where AMD CodeAnalyst is already installed. However, do not install CodeAnalyst on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5xxx or newer) or APU is required o The AMD Catalyst Driver must be installed, release 12.8 or later.
    [Show full text]
  • GPU Computing with Python: Performance, Energy Efficiency And
    computation Article GPU Computing with Python: Performance, Energy Efficiency and Usability † Håvard H. Holm 1,2,* , André R. Brodtkorb 3,4 and Martin L. Sætra 4,5 1 Mathematics and Cybernetics, SINTEF Digital, P.O. Box 124 Blindern, NO-0314 Oslo, Norway 2 Department of Mathematical Sciences, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway 3 Research and Development Department, Norwegian Meteorological Institute, P.O. Box 43 Blindern, NO-0313 Oslo, Norway; [email protected] 4 Department of Computer Science, Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, NO-0130 Oslo, Norway; [email protected] 5 Information Technology Department, Norwegian Meteorological Institute, P.O. Box 43 Blindern, NO-0313 Oslo, Norway * Correspondence: [email protected] † This paper is an extended version of our paper published in International Conference on Parallel Computing (ParCo2019). Received: 6 December 2019; Accepted: 6 January 2020; Published: 9 January 2020 Abstract: In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU). We investigate the portability of performance and energy efficiency between Compute Unified Device Architecture (CUDA) and Open Compute Language (OpenCL); between GPU generations; and between low-end, mid-range, and high-end GPUs. Our findings showed that the impact of using Python is negligible for our applications, and furthermore, CUDA and OpenCL applications tuned to an equivalent level can in many cases obtain the same computational performance. Our experiments showed that performance in general varies more between different GPUs than between using CUDA and OpenCL.
    [Show full text]
  • Brochure 1.4 MB
    www.SapphirePGS.com Professional Graphics Solutions SAPPHIRE PGS (Professional Graphics Solutions) is a business unit within SAPPHIRE Technology for Professional Graphics. It provides various types of professional graphics display solutions for workstation and professional clients. SAPPHIRE PGS supports the full range of 3D professional applications for professional users. For industrial customers, SAPPHIRE PGS integrates display related graphics application solutions for broadcasting, digital signage, medical, surveillance, ATC (Air Traffic Control) and other markets. SAPPHIRE PGS is focused on providing our customers with highly appropriate solutions and outstanding pre and after sales consultancy and services. SAPPHIRE Technology About SAPPHIRE Technology SAPPHIRE Technology is a leading manufacturer and global supplier of a broad range of innovative technologies for PC enthusiasts, home users and professionals. Its origins rooted in graphics hardware design and manufacturing, the extensive SAPPHIRE product range has since grown from state-of-the-art graphics add-in boards—for which SAPPHIRE is recognized as the premiere AMD partner—to include motherboards, mini PCs, external graphics expanders, and Professional AV products. Founded in 2001, SAPPHIRE is a privately held global company headquartered in Hong Kong. Further information can be found at: www.sapphiretech.com. About SAPPHIRE PGS SAPPHIRE PGS (Professional Graphics Solutions) is a business unit within SAPPHIRE Technology for Professional Graphics. It provides various types of professional graphics display solutions for workstation and professional clients. SAPPHIRE PGS supports the full range of 3D professional applications for professional users. For industrial customers, SAPPHIRE PGS integrates display related graphics application solutions for broadcasting, digital signage, medical, surveillance, ATC (Air Traffic Control) and other markets.
    [Show full text]
  • 256 Color for Windows 2000 Driver Download 256 Color for Windows 2000 Driver Download
    256 color for windows 2000 driver download 256 color for windows 2000 driver download. Explorer Patch (256 color TrayIcons) Dr. Hoiby is the author of this little tool. All Dr. Hoiby's Freewares are certified without Java, without .NET and without any Virtual Machine Code. He can be reached at [email protected] . P atch TrayIcon Windows: (You can now use 256 colors icon in your TrayBar). -> BEFORE AFTER. F.A.Q Title: 256 [2008-03-24] How do I switch my computer to 256 colors mode from a different mode? ?? I don't understand the question, sorry :( [2008-03-26] Title: Windows NT 4.0 workstation? [2008-04-28] Do you have a version for NT 4.0-SP6a ? Sorry, no :( I'm patching Windows during my spare time. So, with my three childrens, I do not much. [2009-02-04] Title: 256 [2008-06-01] Hi, I'm wondering what I have to download in order to run this 256bit patch. I can't find the correct one to download for my system. I'm running US Windows 2000 SP4, what should I download? Sorry. All the patched systems are listed on this page. [2009-02-04] Title: Win2000 SP4 [2008-06-15] Hi ! I wonder how to do a manual patch of my explorer.exe v5.0.3900.6920 Thanks ;) Bedfford A.B. - Chile Please send to me your explorer.exe file. After that I'll can answer this question. [2009-02-04] Title: Windows 95 OSR2 Spanish and NT4 with ie6 [2008-07-11] I want to know if you can pacth this version of Windows When you compare the solutions for the german version and the english version : http://www.dr-hoiby.com/TrayIconIn256Color/v4.00.950B.de.txt http://www.dr- hoiby.com/TrayIconIn256Color/v4.00.950.us.txt It seems to be the same.
    [Show full text]
  • Amd Radeon 7000 Series Driver Download Amd Radeon 7000 Series Driver Download
    amd radeon 7000 series driver download Amd radeon 7000 series driver download. Completing the CAPTCHA proves you are a human and gives you temporary access to the web property. What can I do to prevent this in the future? If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware. If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices. Another way to prevent getting this page in the future is to use Privacy Pass. You may need to download version 2.0 now from the Chrome Web Store. Cloudflare Ray ID: 67a1b815bb7484d4 • Your IP : 188.246.226.140 • Performance & security by Cloudflare. DRIVERS AMD RADEONTM HD 7000 SERIES WINDOWS VISTA. Tech tip, updating drivers manually requires some computer skills and patience. Amd/ati drivers you have been having this graphic. This tool is designed to detect the model of amd graphics card and the version of microsoft windows installed in your system, and then provide the option to download and install the latest official amd driver. Compaq cq42-304au. Your system for the radeon hd 7000. For use with systems equipped with amd radeon discrete desktop graphics, mobile graphics, or amd processors with radeon graphics. Download drivers are running amd radeon drivers by jervis on topic. Hd 7000 series for amd 7000 series amd accelerated processing units. Integer Scaling. Developers working with the amd embedded r-series apu can implement remote management, client virtualization and security capabilities to help reduce deployment costs and increase security and reliability of their amd r-series based platform through amd das 1.0 featuring dash 1.1, amd virtualization and trusted platform module tpm 1.2 support.
    [Show full text]
  • Jot User Manual
    Jot user manual Robert D. Kalnins [email protected] Philip L. Davidson [email protected] David M. Bourguignon [email protected] Jot user manual by Robert D. Kalnins, Philip L. Davidson, and David M. Bourguignon Copyright © 2003 by Princeton University Here is the Jot user manual abstract. Table of Contents 1. Introduction ................................................................................................................1 2. License Information ..................................................................................................3 3. Tutorials .......................................................................................................................5 Basic Tutorial..........................................................................................................5 Intermediate Tutorial ............................................................................................9 Advanced Tutorial...............................................................................................16 Animation Tutorial..............................................................................................16 4. Reference guide........................................................................................................27 Camera Manipulation.........................................................................................27 Mouse ..........................................................................................................27 Tablet ...........................................................................................................27
    [Show full text]
  • AMD “Kabini” APU SOC DAN BOUVIER BEN BATES, WALTER FRY, SREEKANTH GODEY HOT CHIPS 25 AUGUST 2013
    AMD “Kabini” APU SOC DAN BOUVIER BEN BATES, WALTER FRY, SREEKANTH GODEY HOT CHIPS 25 AUGUST 2013 “KABINI” FLOORPLAN 28nm technology, 105 mm2, 914M transistors KABINI APU SOC | HOT CHIPS 25 | AUGUST 2013 | 2 "JAGUAR" CORE DESIGN GOALS IMPROVE ON “BOBCAT”: UPDATE THE INCREASE PROCESS PERFORMANCE IN A ISA/FEATURE SET PORTABILITY GIVEN POWER ENVELOPE “Jaguar” added: More IPC SSE4.1, SSE4.2 40-bit physical AES, CLMUL address-capable Better frequency at given voltage MOVBE Improved Improved power efficiency through AVX, virtualization clock gating and unit redesign XSAVE/XSAVEOPT F16C, BMI1 KABINI APU SOC | HOT CHIPS 25 | AUGUST 2013 | 3 "JAGUAR" ENHANCEMENTS 32KB Branch Improved IC prefetcher ICACHE Prediction Larger IB for improved New hardware divider fetch/decode decoupling Decode and New/improved COPs: Microcode ROMS CRC32/SSE4.2, BMI1, POPCNT, LZCNT More OOO resources Int Rename FP Decode Rename 128b native hardware Scheduler Scheduler 4 SP muls + 4 SP adds FP Scheduler Int PRF 1 DP mul + 2 DP adds FP PRF ALU ALU LAGU SAGU ISA: many new COPs VALU VALU Mul 256b AVX support VIMul St Conv. Ld/St Queues redesign Div New zero optimizations Enhanced tablewalks FPAdd FPMul 32KB Ld/St 128b data path to FPU DCache Queues Improved write combining To/From BU More outstanding Shared Cache Unit transactions KABINI APU SOC | HOT CHIPS 25 | AUGUST 2013 | 4 "JAGUAR" SHARED CACHE UNIT Shared cache is major design addition 32KB Branch ICACHE Prediction in “Jaguar” Decode and Microcode ROMS Int Rename FP Decode Rename Scheduler Scheduler FP Scheduler Supports 4 cores Int PRF FP PRF ALU ALU LAGU SAGU VALU VALU Mul VIMul St Conv.
    [Show full text]
  • AMD Codexl 1.3 GA Release Notes
    AMD CodeXL 1.3 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For Windows for 32-bit and 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio® 2010 extension o CodeXL Microsoft® Visual Studio® 2012 extension o CodeXL Remote Agent For 64-bit Linux platforms o CodeXL Standalone application o CodeXL Remote Agent Note about 32-bit Windows CodeXL 1.3 Upgrade Error On 32-bit Windows platforms, upgrading from CodeXL 1.0 using the CodeXL 1.3 installer will remove the previous version and then display an error message without installing CodeXL 1.3. The recommended method is to uninstall previous CodeXL versions before installing CodeXL 1.3. If you ran the 1.3 installer to upgrade a previous installation and encountered the error mentioned above, ignore the error and run the installer again to install CodeXL 1.3. Note about installing CodeAnalyst after installing CodeXL for Windows CodeXL can be safely installed on a Windows station where AMD CodeAnalyst is already installed. However, do not install CodeAnalyst on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5000 series or newer, desktop or mobile version) or APU is required.
    [Show full text]