GPU4S: Embedded Gpus in Space
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Reverse Engineering Power Management on NVIDIA Gpus - Anatomy of an Autonomic-Ready System Martin Peres
Reverse Engineering Power Management on NVIDIA GPUs - Anatomy of an Autonomic-ready System Martin Peres To cite this version: Martin Peres. Reverse Engineering Power Management on NVIDIA GPUs - Anatomy of an Autonomic-ready System. ECRTS, Operating Systems Platforms for Embedded Real-Time appli- cations 2013, Jul 2013, Paris, France. hal-00853849 HAL Id: hal-00853849 https://hal.archives-ouvertes.fr/hal-00853849 Submitted on 23 Aug 2013 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Reverse engineering power management on NVIDIA GPUs - Anatomy of an autonomic-ready system Martin Peres Ph.D. student at LaBRI University of Bordeaux Hobbyist Linux/Nouveau Developer Email: [email protected] Abstract—Research in power management is currently limited supported nor documented by NVIDIA. As GPUs are leading by the fact that companies do not release enough documentation the market in terms of performance-per-Watt [3], they are or interfaces to fully exploit the potential found in modern a good candidate for a reverse engineering effort of their processors. This problem is even more present in GPUs despite power management features. The choice of reverse engineering having the highest performance-per-Watt ratio found in today’s NVIDIA’s power management features makes sense as they processors. -
Computer Graphics on Mobile Devices
Computer Graphics on Mobile Devices Bruno Tunjic∗ Vienna University of Technology Figure 1: Different mobile devices available on the market today. Image courtesy of ASU [ASU 2011]. Abstract 1 Introduction Computer graphics hardware acceleration and rendering techniques Under the term mobile device we understand any device designed have improved significantly in recent years. These improvements for use in mobile context [Marcial 2010]. In other words this term are particularly noticeable in mobile devices that are produced in is used for devices that are battery-powered and therefore physi- great amounts and developed by different manufacturers. New tech- cally movable. This group of devices includes mobile (cellular) nologies are constantly developed and this extends the capabilities phones, personal media players (PMP), personal navigation devices of such devices correspondingly. (PND), personal digital assistants (PDA), smartphones, tablet per- sonal computers, notebooks, digital cameras, hand-held game con- soles and mobile internet devices (MID). Figure 1 shows different In this paper, a review about the existing and new hardware and mobile devices available on the market today. Traditional mobile software, as well as a closer look into some of the most important phones are aimed at making and receiving telephone calls over a revolutionary technologies, is given. Special emphasis is given on radio link. PDAs are personal organizers that later evolved into de- new Application Programming Interfaces (API) and rendering tech- vices with advanced units communication, entertainment and wire- niques that were developed in recent years. A review of limitations less capabilities [Wiggins 2004]. Smartphones can be seen as a that developers have to overcome when bringing graphics to mobile next generation of PDAs since they incorporate all its features but devices is also provided. -
Comparative Study of Various Systems on Chips Embedded in Mobile Devices
Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol.4, No.7, 2013 - National Conference on Emerging Trends in Electrical, Instrumentation & Communication Engineering Comparative Study of Various Systems on Chips Embedded in Mobile Devices Deepti Bansal(Assistant Professor) BVCOE, New Delhi Tel N: +919711341624 Email: [email protected] ABSTRACT Systems-on-chips (SoCs) are the latest incarnation of very large scale integration (VLSI) technology. A single integrated circuit can contain over 100 million transistors. Harnessing all this computing power requires designers to move beyond logic design into computer architecture, meet real-time deadlines, ensure low-power operation, and so on. These opportunities and challenges make SoC design an important field of research. So in the paper we will try to focus on the various aspects of SOC and the applications offered by it. Also the different parameters to be checked for functional verification like integration and complexity are described in brief. We will focus mainly on the applications of system on chip in mobile devices and then we will compare various mobile vendors in terms of different parameters like cost, memory, features, weight, and battery life, audio and video applications. A brief discussion on the upcoming technologies in SoC used in smart phones as announced by Intel, Microsoft, Texas etc. is also taken up. Keywords: System on Chip, Core Frame Architecture, Arm Processors, Smartphone. 1. Introduction: What Is SoC? We first need to define system-on-chip (SoC). A SoC is a complex integrated circuit that implements most or all of the functions of a complete electronic system. -
Innovative AMD Handheld Technology – the Ultimate Visual Experience™ Anywhere –
MEDIA BACKGROUNDER Innovative AMD Handheld Technology – The Ultimate Visual Experience™ Anywhere – AMD Vision AMD has a vision of a new era of mobile entertainment, bringing all the capabilities of a camera, camcorder, music player and 3D gaming console to mobile phones, smart phones and tomorrow’s converged portable devices. This vision is quickly becoming reality. Mass adoption of image and video sharing sites like YouTube, as well as the growing popularity of camera phones and personalized media services, are several trends that demonstrate ever-increasing consumer demand for “always connected” multimedia. And consumers have demonstrated a willingness to pay for sophisticated devices and services that deliver immersive, media-rich experiences. This increasing appetite for mobile multimedia makes it more important than ever for device manufacturers to quickly deliver the latest multimedia features – without significantly increasing design and manufacturing costs. AMD in Mobile Multimedia With the acquisition of ATI Technologies in 2006, AMD expanded beyond its traditional realm of PC computing to become a powerhouse in multimedia processing technologies. Building on more than 20 years of graphics and multimedia expertise, AMD is a leading supplier of media processors to the handheld market with nearly 250 million AMD Imageon™ media processors shipped to date. Furthermore, AMD is a significant source of mobile intellectual property (IP), licensing graphics technology to semiconductor suppliers. AMD provides customers with a top-to-bottom family of cutting-edge audio, video, imaging, graphics and mobile TV products. The scalable AMD technology platforms are based on open industry standards, and are designed for maximum performance with low power consumption. -
Thermal Covert Channels Leveraging Package-On-Package DRAM
Thermal Covert Channels Leveraging Package-On-Package DRAM Shuai Chen∗, Wenjie Xiongy, Yehan Xu∗, Bing Li∗ and Jakub Szefery ∗Southeast University, Nanjing, China fchenshuai ic,220174472, bernie [email protected] yYale University, New Haven, CT, USA fwenjie.xiong, [email protected] Abstract—Package-on-Package (PoP) is an intergraded circuit observation of execution time [2], [3], or on observation packaging technique where multiple separate packages are of physical emanation, such as heat [4] or electromagnetic mounted vertically one on top of the other, allowing for more (EM) field [5], for example. This work focuses on thermal compact system design and reduction in the distance between channels, and shows how the heat, or temperature, can modules. However, this can introduce security vulnerabilities. be measured without special measurement equipment or In particular, this work shows that due to the close physical physical access in commodity SoC devices. Especially, this proximity of a System-on-a-Chip (SoC) package and a PoP work focuses on an SoC DRAM Package-on-Package (PoP) DRAM that is on top of it, a thermal covert channel exists configuration [6] where two chips (SoC and DRAM) are between the SoC and the PoP DRAM. The thermal covert stacked on top of each other. The heat transfers between the channel can allow multiple cores in the SoC to communicate two can be observed by measuring decay rate of DRAM by modulating the temperature of the PoP DRAM. Especially, cells, which is the basis for this work. it is possible for one core in the SoC to generate heat patterns, Previously, heat-based or thermal covert channels have which encode data that is to be transmitted, and another been demonstrated in data centers [7], or in multicore core to observe the transmitted pattern by measuring the processors [8], but these require dedicated thermal sensors decay rate of DRAM cells. -
Applications and Implementations
Applications and Implementations Hwanyong LEE CTO and Technical Marketing Director HUONE © Copyright Khronos Group, 2010 - Page 1 Khronos Family of Standards 3D Digital Asset Plugin-free Mobile OS accessibility Exchange format 3D Web Content Abstraction Authoring and Authoring A coordinated ecosystem of compute, graphics and media Cross platform Parallel Context and Surface Embedded 3D desktop 3D Computing Management standards and APIs Application Acceleration Acceleration Safety Critical 3D Steaming Media Advanced Audio Vector 2D Video, Audio and Window System System System Codec Creation Image Acceleration Acceleration Integration Integration Hundreds of man years invested by industry experts to create a coordinated visual computing ecosystem for accelerated parallel computation, 3D, video, audio and image processing on desktop, embedded and mobile systems © Copyright Khronos Group, 2010 - Page 2 OpenVG • Royalty-free open standard API • Low-level 2D vector graphics rendering API • OpenGL-style programming model • Advanced feature set enables - SVG, - Flash, - PDF, Postscript, Applications and UI - Java (JSR 287, 271, 226) SVG, Vector and - etc. Font Packages etc.. • Portable content • Map Applications • Hardware Acceleration Hardware Acceleration © Copyright Khronos Group, 2010 - Page 3 OpenVG Rendering Pipeline © Copyright Khronos Group, 2010 - Page 4 OpenVG with Native Graphics Processor • CPU sending data and commands to OpenVG hardware • OpenVG rendering pipeline is in the hardware CPU Native Vector Graphics Hardware © Copyright -
Accelerating Augmented Reality Video Processing with Fpgas
Accelerating Augmented Reality Video Processing with FPGAs A Major Qualifying Project Submitted to the Faculty of Worcester Polytechnic Institute in partial fulfillment of the requirements for the Degree of Bachelor of Science 4/27/2016 Anthony Dresser, Lukas Hunker, Andrew Weiler Advisors: Professor James Duckworth, Professor Michael Ciaraldi This report represents work of WPI undergraduate students submitted to the faculty as evidence of a degree requirement. WPI routinely publishes these reports on its web site without editorial or peer review. For more information about the projects program at WPI, see http://www.wpi.edu/Academics/Projects. Abstract This project implemented a system for performing Augmented Reality on a Xilinx Zync FPGA. Augmented and virtual reality is a growing field currently dominated by desktop computer based solutions, and FPGAs offer unique advantages in latency, performance, bandwidth, and portability over more traditional solutions. The parallel nature of FPGAs also create a favorable platform for common types of video processing and machine vision algorithms. The project uses two OV7670 cameras mounted on the front of an Oculus Rift DK2. A video pipeline is designed around an Avnet ZedBoard, which has a Zynq 7020 SoC/FPGA. The system aimed to highlight moving objects in front of the user. Executive Summary Virtual and augmented reality are quickly growing fields, with many companies bringing unique hard- ware and software solutions to market each quarter. Presently, these solutions generally rely on a desktop computing platform to perform their video processing and video rendering. While it is easy to develop on these platforms due to their abundant performance, several issues arise that are generally discounted: cost, portability, power consumption, real time performance, and latency. -
Paul Waring @Pwaring [email protected]
Low Cost Computing with Linux Paul Waring @pwaring [email protected] Low cost? £25-160 Creator CI20 Image: http://elinux.org/MIPS_Creator_CI20 Creator CI20 Release date May 2015 (v2) Cost £55-65 CPU Dual core 1.2Ghz MIPS 512 KB L2 cache GPU PowerVR SGX540 RAM 1 GB USB 2 ports (1 x OTG, 1 x Host) Networking 10/100Mbps Ethernet 802.11 b/g/n wireless Bluetooth 4.0 Onboard storage 8 GB flash 1 x SD MIPS Microprocessor without Interlocked Pipeline Stages Imagination Technologies – UK quoted company (LSE) MIPS Reduced Instruction Set Computer 32 and 64 bit MIPS Tend to be at the low/cheap end of the market DSL routers often have a MIPS CPU (e.g. Technicolour TG582n) Business model: License designs Raspberry Pi Image: https://www.raspberrypi.org/blog/raspberry-pi-3-on-sale/ Raspberry Pi B+ Release date February 2012 Cost £25-30 CPU Single core 700Mhz ARM11 128 KB L2 cache (shared with GPU) GPU Broadcom VideoCore IV RAM 512 MB (shared with GPU) USB 4 ports (via on-board hub) Networking 10/100Mbps Ethernet (USB) Onboard storage 1 x SD Raspberry Pi 2 Release date February 2015 Cost £25-30 CPU Quad core 900Mhz ARM Cortex-A7 256 KB L2 cache GPU Broadcom VideoCore IV RAM 1 GB USB 4 ports (via on-board hub) Networking 10/100Mbps Ethernet (USB) Onboard storage 1 x MicroSD Raspberry Pi 3 Release date February 2016 Cost £25-30 CPU Quad core 1.2Ghz ARM Cortex-A53 512 KB L2 cache GPU Broadcom VideoCore IV (at higher clock frequencies than B+ and 2) RAM 1 GB USB 4 ports (via on-board hub) Networking 10/100Mbps Ethernet 802.11 b/g/n wireless Bluetooth 4.1 Onboard -
Rapid Prototyping of an FPGA-Based Video Processing System
Rapid Prototyping of an FPGA-Based Video Processing System Zhun Shi Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Science In Computer Engineering Peter M. Athanas, Chair Thomas Martin Haibo Zeng Apr 29th, 2016 Blacksburg, Virginia Keywords: FPGA, Computer Vision, Video Processing, Rapid Prototyping, High-Level Synthesis Copyright 2016, Zhun Shi Rapid Prototyping of an FPGA-Based Video Processing System Zhun Shi (ABSTRACT) Computer vision technology can be seen in a variety of applications ranging from mobile phones to autonomous vehicles. Many computer vision applications such as drones and autonomous vehicles requires real-time processing capability in order to communicate with the control unit for sending commands in real time. Besides real-time processing capability, it is crucial to keep the power consumption low in order to extend the battery life of not only mobile devices, but also drones and autonomous vehicles. FPGAs are desired platforms that can provide high-performance and low-power solutions for real-time video processing. As hardware designs typically are more time consuming than equivalent software designs, this thesis proposes a rapid prototyping flow for FPGA-based video processing system design by taking advantage of the use of high performance AXI interface and a high level synthesis tool, Vivado HLS. Vivado HLS provides the convenience of automatically synthesizing a software implementation to hardware implementation. But the tool is far from being perfect, and users still need embedded hardware knowledge and experience in order to accomplish a successful design. -
Performance Characterization of Mobile GP-Gpus
Performance Characterization of Mobile GP-GPUs Fitsum Assamnew Andargie Jonathan Rose School of Electrical and Computer Engineering The Edward Roger Sr. Department of Electrical and Addis Ababa University Computer Engineering Addis Ababa, Ethiopia University of Toronto [email protected] Toronto, Canada [email protected] Abstract— As smartphones and tablets have become more efficient algorithms [4]. The microarchitectural parameters of sophisticated, they now include General Purpose Graphics desktop/server GP GPUs are well understood and revealed by Processing Units (GP GPUs) that can be used for computation the vendors [4], whereas mobile GP GPU microarchitectures beyond driving the high-resolution screens. To use them are far less well documented. The purpose of this paper is to effectively, the programmer needs to have a clear sense of their measure key aspects of the microarchitecture and micro- microarchitecture, which in some cases is hidden by the communication channels for the Qualcomm Adreno 320 [5] manufacturer. In this paper we unearth key microarchitectural and 420 GPUs, which exist in the widely used Snapdragon parameters of the Qualcomm Adreno 320 and 420 GP GPUs, series of SoCs [6] used in many tablets and phones. present in one of the key SoCs in the industry, the Snapdragon Understanding these GP GPUs will enable high performance series of chips. applications to be developed for platforms that harbor them. Keywords—smartphones; GP GPU; microarchitecture; Adreno GPU;OpenCL II. BACKGROUND Recent smartphones are equipped with many kinds of I. INTRODUCTION compute modalities that can be used to enhance application Smartphones have progressed dramatically in the last few performance. -
Effective Opengl 5 September 2016, Christophe Riccio
Effective OpenGL 5 September 2016, Christophe Riccio Table of Contents 0. Cross platform support 3 1. Internal texture formats 4 2. Configurable texture swizzling 5 3. BGRA texture swizzling using texture formats 6 4. Texture alpha swizzling 7 5. Half type constants 8 6. Color read format queries 9 7. sRGB texture 10 8. sRGB framebuffer object 11 9. sRGB default framebuffer 12 10. sRGB framebuffer blending precision 13 11. Compressed texture internal format support 14 12. Sized texture internal format support 15 13. Surviving without gl_DrawID 16 14. Cross architecture control of framebuffer restore and resolve to save bandwidth 17 15 Building platform specific code paths 18 16 Max texture sizes 19 17 Hardware compression format support 20 18 Draw buffers differences between APIs 21 19 iOS OpenGL ES extensions 22 20 Asynchronous pixel transfers 23 Change log 24 0. Cross platform support Initially released on January 1992, OpenGL has a long history which led to many versions; market specific variations such as OpenGL ES in July 2003 and WebGL in 2011; a backward compatibility break with OpenGL core profile in August 2009; and many vendor specifics, multi vendors (EXT), standard (ARB, OES), and cross API extensions (KHR). OpenGL is massively cross platform but it doesn’t mean it comes automagically. Just like C and C++ languages, it allows cross platform support but we have to work hard for it. The amount of work depends on the range of the application- targeted market. Across vendors? Eg: AMD, ARM, Intel, NVIDIA, PowerVR and Qualcomm GPUs. Across hardware generations? Eg: Tesla, Fermi, Kepler, Maxwell and Pascal architectures. -
Reverse Engineering Power Management on NVIDIA Gpus - a Detailed Overview
Reverse engineering power management on NVIDIA GPUs - A detailed overview Martin Peres Ph.D. student at LaBRI University of Bordeaux Hobbyist Linux/Nouveau Developer Email: [email protected] Abstract—Power management in Open Source operating sys- by Linus Torvalds [2], creator of Linux. With the exception of tems is currently limited by the fact that hardware companies the ARM-based Tegra, NVIDIA has never released enough do not release enough documentation to write the most power- documentation to provide 3D acceleration to a single GPU efficient drivers. This problem is even more present in GPUs after the Geforce 4 (2002). Power management was never despite having the highest performance-per-Watt ratio found in supported nor documented by NVIDIA. Reverse engineering today’s processors. This paper presents an overview of GPUs from a power management point of view and what power management the power management features of NVIDIA GPUs would allow features have been found when reverse engineering modern to improve the efficiency of the Linux community driver for NVIDIA GPUs. This paper finally discusses about the possibility NVIDIA GPUs called Nouveau [3]. This work will allow of achieving self-management on NVIDIA GPUs and also dis- driving the hardware more efficiently which should reduce the cusses the future challenges that will be faced by the community temperature, lower the fan noise and improve the battery-life when autonomic systems like Broadcom’s videocore R will become of laptops. mainstream. I. INTRODUCTION Nouveau is a fork of NVIDIA’s limited Open Source driver, xf86-video-nv [4] by Stephane Marchesin aimed at delivering Historically, CPU and GPU manufacturers were aiming Open Source 3D acceleration along with a port to the new to increase performance as it was the usual metric used by graphics Open Source architecture DRI [5].