High Performance, Ultra-Low Power Streaming Systems


TECHNICAL UNIVERSITY OF CATALONIA
School of Informatics
Master on Computer Architecture, Networks and Systems

Master Thesis

Student: José María Arnau
Advisor: Joan-Manuel Parcerisa
Advisor: Polychronis Xekalakis
Advisor: Antonio González
Date: September 20, 2011

Abstract

Smartphones are emerging as one of the fastest growing markets, with new devices and improvements in their operating systems appearing every few months. Designing a CPU/GPU for such mobile devices is challenging for two reasons: users demand a truly mobile computing experience, including highly responsive user interfaces, uncompromised web browsing performance and visually compelling gaming experiences, and the limited capacity of the battery imposes tight power constraints. In recent years, the power demand of these mobile devices has increased much faster than batteries have improved. Our key ambition is to design a CPU/GPU for such a system that minimizes the power consumed while achieving the highest possible performance.

We first analyze commercial Android workloads and establish that the most demanding applications in terms of performance are, as expected, games. We show that, because these systems are based on OpenGL ES, the CPU is idle the vast majority of the time. In fact, we find that the GPU is much more active than the CPU and that the major performance limitation for these systems is the use of memory by the GPU. We thus focus on the GPU, and more specifically on its memory behavior. We show that for most of the caches employed in these systems, traditional prefetchers provide significant benefits. The exception is the texture cache, for which the access patterns are irregular, especially for 3D games. We then demonstrate how this issue can be alleviated by using a decoupled access/execute-like architecture.

We also show that an important part of the power consumed can be saved by carefully moving data around and by orchestrating the accesses to the L2 cache. The end design achieves performance similar to that of a more traditional many-warp system while consuming only a fraction of its power. Our experimental results, obtained with the latest version of Android and a commercial set of games, prove this claim. More specifically, our proposed system achieves a 29% improvement over state-of-the-art prefetchers while consuming 6% less power.

Keywords: Prefetching, GPU, Android, rasterization, smartphones.

Contents

1 Introduction
  1.1 Motivation
  1.2 Objectives and contributions
  1.3 Organization
2 Related work
  2.1 Rasterization
  2.2 Android
    2.2.1 Android Software Renderer
  2.3 State of the art architectures for mobile devices
    2.3.1 Qualcomm Snapdragon
    2.3.2 PowerVR chipsets
    2.3.3 NVIDIA Tegra 2
  2.4 Data Cache Prefetching
    2.4.1 CPU prefetchers
    2.4.2 GPU prefetchers
3 Problem statement: Memory Wall for Low Power GPUs
  3.1 Hiding memory latency on a modern low power mobile GPU
4 Proposal: Decoupled Access Execute Prefetching
  4.1 Ultra-low power decoupled prefetcher
    4.1.1 Baseline GPU Architecture
    4.1.2 Decoupled prefetcher
    4.1.3 Decoupled prefetcher improvements
5 Evaluation methodology
  5.1 Simulation infrastructure
    5.1.1 GPU trace generation
    5.1.2 Cycle accurate GPU simulator
6 Experimental results
  6.1 Workload characterization
  6.2 State of the art prefetchers performance
  6.3 Ultra-low power decoupled prefetcher performance
7 Conclusions

List of Figures

1.1 Smartphones sales vs Desktop and Notebook sales. Data obtained from [1].
1.2 Energy need vs energy available in a standard size battery. Two days of battery life cannot be achieved with current batteries and the gap is getting bigger. Data obtained from [2].
2.1 Initial scene, intermediate results produced by the different stages of the rasterization process and final result.
  (a) 3D triangles plot
  (b) 2D triangles plot
  (c) Clipped 2D triangles plot
  (d) Pixels after rasterization
  (e) Visible pixels after Z-test
  (f) Shaded and textured pixels after Pixel stage
2.2 Rasterization pipeline.
2.3 Android architecture.
2.4 Qualcomm Snapdragon System on Chip.
2.5 PowerVR GPU architecture.
2.6 NVIDIA Tegra 2 architecture.
2.7 Ultra-low power GeForce architecture.
2.8 Stride prefetching table.
2.9 Markov prefetching. The left side of the figure shows the state of the correlation table after processing the miss address stream shown at the top of the figure. The right side illustrates the Markov transition graph that corresponds to the example miss address stream.
2.10 Distance prefetching. The address delta stream corresponds to the sequence of addresses used in the example of figure 2.9.
2.11 Global History Buffer.
2.12 Distance prefetcher implemented by using a Global History Buffer. The Head Pointer points to the last inserted address in the GHB.
2.13 An overview of the baseline GPGPU architecture.
2.14 An example of memory address with/without warp interleaving.
  (a) Accesses by warps
  (b) Accesses by a hardware prefetcher
2.15 Many-thread aware hardware prefetcher.
2.16 Throttling heuristics.
2.17 Baseline architecture for texture mapping.
2.18 Texture cache prefetcher architecture.
3.1 Effectiveness of multithreading for hiding memory latency. As we increase the number of warps on each processor we obtain better performance.
3.2 Power consumed by the GPU main register file for different configurations.
4.1 Baseline GPU architecture (based on the ultra-low power GeForce GPU in the NVIDIA Tegra 2 chipset).
4.2 Decoupled prefetcher architecture.
4.3 Improved decoupled prefetcher.
5.1 GPU trace generation system.
5.2 GPU architecture modelled by the cycle accurate simulator.
6.1 CPU configuration for the experiments.
6.2 CPI stacks for several Android applications. iCommando, Shooting Range 3D and PolyBreaker 3D are commercial games from the Android market.
6.3 Misses per 1000 instructions for the different caches in the GPU.
6.4 Texture and pixel cache analysis.
6.5 Analysis of the strides of the cache misses in the Pixel and Texture cache of one Streaming processor when running the 2D game iCommando. In the Sequitur grammars, non-terminal symbols (rules) are represented by numbers and terminal symbols (strides) are represented by numbers in square brackets. After each rule we show the number of times the rule is applied to form the input sequence of strides. We only show the 5 most frequent rules of the grammar.
6.6 Analysis of the strides of the cache misses in the Pixel and Texture cache of one Streaming processor when running the 3D game PolyBreaker 3D. For each cache the figure shows the 5 most frequent rules of the grammar and the 5 most frequent strides.
6.7 GPU configuration for the experiments. The baseline GPU architecture is the one illustrated in figure 5.2.
6.8 Speedups for different state-of-the-art prefetchers.
6.9 Normalized power consumption for different state-of-the-art prefetchers.
6.10 Ultra-low power decoupled prefetcher compared with state-of-the-art prefetchers.
6.11 Ultra-low power decoupled prefetcher compared with the distance prefetcher implemented with GHB.
6.12 Decoupled prefetcher power consumption.
6.13 Normalized energy-delay product.
6.14 Prefetch queue size evaluation. The graph shows the speedup achieved by the decoupled prefetcher over the baseline GPU without prefetching for different sizes of the prefetch queue, for the game shooting
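Several of the captions above (figures 2.10, 6.5 and 6.6) refer to the stride, or address delta, stream of cache misses. As a minimal sketch of that idea, and not the tooling actually used in the thesis, the following Python computes the deltas between consecutive miss addresses and reports the most frequent ones. The function names and the example miss addresses are hypothetical and chosen only for illustration.

```python
from collections import Counter


def miss_strides(miss_addresses):
    """Return the deltas between consecutive miss addresses."""
    return [b - a for a, b in zip(miss_addresses, miss_addresses[1:])]


def most_frequent_strides(miss_addresses, top=5):
    """Return up to `top` (stride, count) pairs, most frequent first."""
    return Counter(miss_strides(miss_addresses)).most_common(top)


# Hypothetical miss address stream: two runs of 64-byte strides
# separated by one large jump.
misses = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2000, 0x2040, 0x2080]
print(most_frequent_strides(misses))
```

A stream dominated by a single stride, as in this made-up example, is the kind of regular pattern that stride-based prefetchers can exploit; a flat histogram with many distinct strides would correspond to the irregular behavior the abstract attributes to the texture cache.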