Download.Nvidia.Com/Pdf/Tegra/Tegra-X1-Whitepaper

Total Page:16

File Type:pdf, Size:1020Kb

Models and Techniques for Designing Mobile System-on-Chip Devices

by Ayub Ahmed Gubran

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

August 2020

© Ayub Ahmed Gubran, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Models and Techniques for Designing Mobile System-on-Chip Devices, submitted by Ayub Ahmed Gubran in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering.

Examining Committee:
Tor M. Aamodt, Electrical and Computer Engineering, Supervisor
Steve Wilton, Electrical and Computer Engineering, Supervisory Committee Member
Alan Hu, Computer Science, University Examiner
Andre Ivanov, Electrical and Computer Engineering, University Examiner
John Owens, The University of California-Davis, Electrical and Computer Engineering, External Examiner

Additional Supervisory Committee Members:
Sidney Fels, Electrical and Computer Engineering, Supervisory Committee Member

Abstract

Mobile SoCs have become ubiquitous computing platforms, and, in recent years, they have become increasingly heterogeneous and complex. A typical SoC today includes CPUs, GPUs, image processors, video encoders/decoders, and AI engines. This dissertation addresses some of the challenges associated with SoCs in three pieces of work.

The first piece of work develops a cycle-accurate model, Emerald, which provides a platform for studying system-level SoC interactions while including the impact of graphics.
Our cycle-accurate infrastructure builds upon well-established tools, GPGPU-Sim and gem5, with support for graphics and GPGPU workloads, and full system simulation with Android. We present two case studies using Emerald. First, we use Emerald’s full-system mode to highlight the importance of system-wide interactions by studying and analyzing memory organization and scheduling in SoCs. Second, we use Emerald’s standalone mode to evaluate a dynamic mechanism for balancing the shading work assigned to GPU cores. Our dynamic mechanism speeds up frame rendering by 7.3–19% compared to static load-balancing.

The second work highlights the time-variant traffic asymmetry in heterogeneous SoCs. We analyze the impact of this asymmetry on network performance and propose interleaved source injection (ISI), an interconnect topology and associated flow control mechanism to manage time-varying asymmetric network traffic. We evaluate ISI using stochastic traffic patterns and a set of traces that emulate mobile use cases with traffic from various IP blocks. We show that ISI increases saturation throughput by 80–184% for a 12% increase in NoC area.

In the last piece of work, we study the compression properties of framebuffer surfaces and highlight the characteristics of surfaces generated by different applications. We use our analysis to propose Dynamic Color Palettes (DCP), a hardware scheme that dynamically constructs color palettes and employs them to efficiently compress framebuffer surfaces. We evaluated DCP against a set of 124 workloads and found that DCP improves compression rates by 91% for UI and 20% for 2D applications compared to previous proposals. We also propose a hybrid scheme (HDCP) that combines DCP with a generic compression scheme. HDCP outperforms previous proposals by 161%, 124% and 83% for UI, 2D, and 3D applications, respectively.

Lay Summary

Mobile devices, like phones and tablets, have become interwoven into the fabric of our daily lives.
These devices are powered by integrated mobile chips, or system-on-chips (SoCs). Modern SoCs consist of a dozen or more specialized modules, where each module specializes in a specific task, like processing audio or video, rendering graphics, computing AI decisions, encrypting data, etc.

As SoCs have become larger and more complicated, challenges have followed. These challenges include figuring out how to design hardware and software for such systems to deliver the best possible user experience. This dissertation contributes a set of tools that allow us to improve our understanding of how SoCs work and how their modules interact. The work in this dissertation also studies and proposes techniques to improve how SoC modules communicate, and techniques to reduce the energy consumption of SoCs’ memory system.

Preface

The work presented in this dissertation was carried out by the author, Ayub A. Gubran, under the supervision of Prof. Tor M. Aamodt at the University of British Columbia, Point Grey campus.

The research material presented in this dissertation, namely, Chapter 2, Chapter 3, and Chapter 4, is based on work that has either been published or is currently under submission, as detailed below.

A version of Chapter 2 has been presented at the 2019 International Symposium on Computer Architecture (ISCA’19) and was included in the proceedings as:

• Ayub A. Gubran and Tor M. Aamodt. 2019. Emerald: graphics modeling for SoC systems. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA ’19). ACM, New York, NY, USA, 169-182. DOI: https://doi.org/10.1145/3307650.3322221.

Ayub A. Gubran was the lead investigator, responsible for all major areas of concept formation, implementation, data collection and analysis, as well as manuscript composition. Prof. Tor M. Aamodt provided early concept formation, technical guidance and feedback, and contributed to editing the final version of the manuscript.
A version of Chapter 3 is currently under submission as “Ayub A. Gubran and Tor M. Aamodt. The Case for Interleaved Source Injection in Networks with Asymmetric Loads”. Ayub A. Gubran was the lead investigator, responsible for all major areas of concept formation, implementation, data collection and analysis, as well as manuscript composition. Francois Demoullin implemented the scripts that were used to carry out the experiments in Section 3.6.3.1. He also carried out a number of experiments, and collected their data, in the early stages of the work under the supervision of A. Gubran. Prof. Tor M. Aamodt provided technical guidance and feedback throughout the process and aided in writing the manuscript.

A version of Chapter 4 was presented as a peer reviewed poster at the 7th Conference on High-Performance Graphics (HPG ’15) as “Framebuffer Compression Using Dynamic Color Palettes”. A pre-print version of this work is also published as:

• Ayub A. Gubran, Felix Huang, and Tor M. Aamodt. “Surface Compression Using Dynamic Color Palettes.” arXiv preprint arXiv:1903.06658 (2019).

Ayub A. Gubran was the lead investigator, responsible for all major areas of concept formation, implementation, data collection and analysis. Felix Huang worked under the supervision of Ayub Gubran; he collected the workloads listed in Table 4.2, evaluated replacement policies (Section 4.7.3.2), and worked on collecting the results of the hybrid compression scheme (Figure 4.18). Prof. Tor M. Aamodt provided technical guidance and feedback throughout the process.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
Dedication
1 Introduction
  1.1 Motivation
  1.2 Research Challenges
  1.3 Research Objectives and Contributions
  1.4 Dissertation Organization
2 Emerald: Graphics Modeling for SoC Systems
  2.1 Introduction
  2.2 Background
    2.2.1 Computer Architecture Simulators
    2.2.2 Graphics APIs
      2.2.2.1 The OpenGL Pipeline
      2.2.2.2 Recent Trends in Graphics APIs
      2.2.2.3 Example: Life of a Triangle
    2.2.3 Hardware Graphics Pipeline Optimizations
      2.2.3.1 Early Elimination of Invisible Primitives and Fragments
        2.2.3.1.a Early Depth Testing
        2.2.3.1.b Hierarchical Depth Testing
        2.2.3.1.c Deferred Shading
        2.2.3.1.d Primitive Culling and Clipping
        2.2.3.1.e Face Culling
        2.2.3.1.f View Frustum Culling
        2.2.3.1.g Clipping
      2.2.3.2 Hierarchical Position Rasterization
      2.2.3.3 Summary
    2.2.4 Graphics Hardware Architectures
      2.2.4.1 Tile-Based Rendering Architectures
        2.2.4.1.a TBR Performance Bottlenecks
        2.2.4.1.b Contemporary TBR Architectures
      2.2.4.2 Immediate Tiled Rendering Architectures
    2.2.5 GPU Compute Architecture
  2.3 Emerald SoC Architecture
  2.4 Emerald Graphics Architecture
    2.4.1 Emerald Graphics Pipeline
      2.4.1.1 Pipeline Overview
      2.4.1.2 Description of Pipeline Stages
    2.4.2 Emerald GPU Architecture
    2.4.3 Vertex Shading
    2.4.4 Primitive Processing
    2.4.5 Hierarchical-Z (Hi-Z) Operations
    2.4.6 The TC Stage
      2.4.6.1 Tile Coalescing Example
      2.4.6.2 Out-of-order Primitive Rendering
    2.4.7 Model Validation
    2.4.8 Model Limitations and Future Work
  2.5 Emerald Software Design
    2.5.1 Emerald Standalone Mode
    2.5.2 Emerald Full-system Mode
  2.6 Case Study I: Memory Organization and Scheduling on Mobile SoCs
    2.6.1 Implementation
      2.6.1.1 DASH Scheduler
        2.6.1.1.a Clustering Bandwidth
      2.6.1.2 HMC Controller
    2.6.2 Evaluation
      2.6.2.1 Regular-load Scenario
      2.6.2.2 High-load Scenario
      2.6.2.3 Summary and Discussion
  2.7 Case Study II: Dynamic Fragment Shading Load-Balancing (DFSL)
    2.7.1 Experimental Setup
    2.7.2 Load-Balance vs.
Recommended publications
  • High End Visualization with Scalable Display System
    HIGH END VISUALIZATION WITH SCALABLE DISPLAY SYSTEM. Dinesh M. Sarode*, Bose S.K.*, Dhekne P.S.*, Venkata P.P.K.*, Computer Division, Bhabha Atomic Research Centre, Mumbai, India.
    Abstract: Today we can have huge datasets resulting from computer simulations (CFD, physics, chemistry etc) and sensor measurements (medical, seismic and satellite). There is exponential growth in computational requirements in scientific research. Modern parallel computers and Grid are providing the required computational power for the simulation runs. The rich visualization is essential in interpreting the large, dynamic data generated from these simulation runs. The visualization process maps these datasets onto graphical representations and then generates the pixel representation. The large number of pixels shows the picture in greater details and interaction with it enables the greater insight on the part of user in understanding the data more quickly, picking out small anomalies that could
    [...] display, then the large number of pixels shows the picture in greater details and interaction with it enables the greater insight in understanding the data. However, the memory constraints, lack of the rendering power and the display resolution offered by even the most powerful graphics workstation makes the visualization of this magnitude difficult or impossible. While the cost-performance ratio for the component based on semiconductor technologies doubling in every 18 months or beyond that for graphics accelerator cards, the display resolution is lagging far behind. The resolutions of the displays have been increasing at an annual rate of 5% for the last two decades. The ability to scale the components: graphics accelerator and display by combining them is the most cost-effective way to meet the ever-increasing demands for high resolution.
  • Download Speeds Performance and Optimized for Long Battery Life
    QUALCOMM® SNAPDRAGON™ 820 MOBILE PROCESSOR. FEATURING THE LATEST IN MOBILE TECHNOLOGY. Enabling a more immersive, intuitive and connected experience.
    Capture sharper, higher-quality images, in challenging lighting situations: Qualcomm’s Spectra 14-bit dual image signal processors tap into the enhanced performance and feature enhancements that Hexagon 680 DSP’s HVX performance adds with amazing features like Low Light Photo and Video and Touch-to-Track where ISP and DSP work intelligently together to enhance imaging as well as track movements and improve zoom.
    Immersive, life-like virtual reality: Experience realistic, visual and audio immersion and smooth VR action enabled by Snapdragon 820’s Heterogeneous compute platform, designed for high performance and optimized for long battery life.
    Next-generation computer vision: Drive more safely with object detection and enhanced navigation and enhance your smartphone camera capability with features that can track faces and objects for a more intelligent mobile experience.
    Deeply immersive 3D gaming: The combination of Snapdragon 820’s Adreno 530 GPU and Kryo CPU creates enough compute performance to enable console quality games and exciting, next generation virtual reality applications.
    The Snapdragon 820 mobile processor offers many advantages:
      • New X12 LTE: Industry leading connectivity with LTE download speeds of up to 600 Mbps and multi-gigabit 802.11ad Wi-Fi
      • New Qualcomm® Kryo CPU: Delivering maximum performance and low power consumption. Kryo is QTI’s first custom 64-bit quad-core CPU, manufactured in advanced 14nm FinFET LPP process
      • New Qualcomm® Adreno 530: Up to 40% better graphics and compute performance with the Adreno 530 GPU
      • Qualcomm Spectra™ 14-bit dual image signal processors (ISPs) deliver high resolution DSLR-quality images using heterogeneous compute for advanced processing and additional power savings
      • New Hexagon 680 DSP includes Hexagon Vector eXtensions and Sensor
  • Overview: Graphics Processing Units
    Overview: Graphics Processing Units
      • advent of GPUs
      • GPU architecture
          • the NVIDIA Fermi processor
      • the CUDA programming model
          • simple example, threads organization, memory model
          • case study: matrix multiply using shared memory
          • memories, thread synchronization, scheduling
          • case study: reductions
          • performance considerations: bandwidth, scheduling, resource conflicts, instruction mix
              • host-device data transfer: multiple GPUs, NVLink, Unified Memory, APUs
      • the OpenCL programming model
      • directive-based programming models
    refs: Lin & Snyder Ch 10, CUDA Toolkit Documentation, An Even Easier Introduction to CUDA (tutorial); NCI NF GPU page, Programming Massively Parallel Processors, Kirk & Hwu, Morgan-Kaufman, 2010; Cuda By Example, by Sanders and Kandrot; OpenCL web page, OpenCL in Action, by Matthew Scarpino
    COMP4300/8300 L18,19: Graphics Processing Units 2021
    Advent of General-purpose Graphics Processing Units
      • many applications have massive amounts of mostly independent calculations
          • e.g. ray tracing, image rendering, matrix computations, molecular simulations, HDTV
          • can be largely expressed in terms of SIMD operations
              • implementable with minimal control logic & caches, simple instruction sets
      • design point: maximize number of ALUs & FPUs and memory bandwidth to take advantage of Moore’s Law (shown here)
          • put this on a co-processor (GPU); have a normal CPU to co-ordinate, run the operating system, launch applications, etc
      • architecture/infrastructure development requires a massive economic base for its development (the gaming industry!)
          • pre 2006: only specialized graphics operations (integer & float data)
          • 2006: ‘General Purpose’ (GPGPU): general computations but only through a graphics library (e.g.
  • GPU Developments 2018
    GPU Developments 2018 2018 GPU Developments 2018 © Copyright Jon Peddie Research 2019. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR.
  • An Evolution of Mobile Graphics
    AN EVOLUTION OF MOBILE GRAPHICS. Michael C. Shebanow, Vice President, Advanced Processor Lab, Samsung Electronics. July 20, 2013
    DISCLAIMER
      • The views herein are my own
      • They do not represent Samsung’s vision nor product plans
    Outline
      • The Mobile Market
      • Review of GPU Tech
      • GPU Efficiency
      • User Experience
      • Tech Challenges
      • Summary
    The Rise of the Mobile GPU & Connectivity: A NEW WORLD COMING?
    DISCRETE GPU MARKET: Flattening
    MOBILE GPU MARKET (figure labels: Smart Phones, “Phablets”, Tablets)
      • In 2012, an estimated 800+ million mobile GPUs shipped
          • ~123M tablets
          • ~712M smart phones
      • Will easily exceed 1B in the coming years
      • Trend:
          • Discrete GPU relatively flat
          • Mobile is growing rapidly
    WW INTERNET TRAFFIC
      • Source: Cisco VNI
      • Internet traffic growth rate is staggering
      • 2012 total traffic is 13.7 GB per person per month
      • 2012 smart phone traffic at 0.342 GB per person per month
      • 2017 smart phone traffic expected at 2.7 GB per person per month
      Year   INET IP Traffic (TB/sec)   growth rate   Mobile Traffic (TB/sec)
      2005   0.9                                      0.00
      2006   1.5                        65%           0.00
      2007   2.5                        61%           0.01
      2008   3.8                        54%           0.01
      2009   5.6                        45%           0.04
      2010   7.8                        40%           0.10
      2011   10.6                       36%           0.23
      2012   12.4                       17%           0.34
    WHERE ARE WE HEADED?…
      • Enormous quantity of GPUs
      • Large amount of interconnectivity
      • Better I/O
    GPU Pipelines: A BRIEF REVIEW OF GPU TECH
    MOBILE GPU PIPELINE ARCHITECTURES
      • Tile-based immediate mode rendering (TBIMR): IA, VS, CCV, RS, PS, ROP
      • Tile-based deferred scene rendering (TBDR): IA, VS, CCV, RS, PS, ROP
      IA = input assembler, VS = vertex shader, CCV = cull, clip, viewport transform, RS = rasterization,
  • Comparison of Technologies for General-Purpose Computing on Graphics Processing Units
    Master of Science Thesis in Information Coding, Department of Electrical Engineering, Linköping University, 2016
    Comparison of Technologies for General-Purpose Computing on Graphics Processing Units
    Torbjörn Sörman
    LiTH-ISY-EX–16/4923–SE
    Supervisor: Robert Forchheimer, isy, Linköpings universitet; Åsa Detterfelt, MindRoad AB
    Examiner: Ingemar Ragnemalm, isy, Linköpings universitet
    Organizational unit: Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden
    Copyright © 2016 Torbjörn Sörman
    Abstract: The computational capacity of graphics cards for general-purpose computing has progressed fast over the last decade. A major reason is computational heavy computer games, where standard of performance and high quality graphics constantly rise. Another reason is better suitable technologies for programming the graphics cards. Combined, the product is high raw performance devices and means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Technologies are primarily compared by means of benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance. The benchmark application found the difference in execution time of the fastest technology, CUDA, compared to the slowest, OpenCL, to be twice a factor of two. The benchmark application also found out that the older technologies, OpenGL and DirectX, are competitive with CUDA and OpenCL in terms of resulting raw performance.
    Acknowledgments: I would like to thank Åsa Detterfelt for the opportunity to make this thesis work at MindRoad AB.
  • Gscale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics
    gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space
    Mochi Xue, Shanghai Jiao Tong University and Intel Corporation; Kun Tian, Intel Corporation; Yaozu Dong, Shanghai Jiao Tong University and Intel Corporation; Jiacheng Ma, Jiajun Wang, and Zhengwei Qi, Shanghai Jiao Tong University; Bingsheng He, National University of Singapore; Haibing Guan, Shanghai Jiao Tong University
    {xuemochi, mjc0608, jiajunwang, qizhenwei, hbguan}@sjtu.edu.cn {kevin.tian, eddie.dong}@intel.com [email protected]
    https://www.usenix.org/conference/atc16/technical-sessions/presentation/xue
    This paper is included in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16). June 22–24, 2016 • Denver, CO, USA. 978-1-931971-30-0. Open access to the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC ’16) is sponsored by USENIX.
    Abstract: With increasing GPU-intensive workloads deployed on cloud, the cloud service providers are seeking for practical and efficient GPU virtualization solutions. However, the cutting-edge GPU virtualization techniques such as gVirt still suffer from the restriction of scalability, which constrains the number of guest virtual GPU instances.
    [...] As one of the key enabling technologies of GPU cloud, GPU virtualization is intended to provide flexible and scalable GPU resources for multiple instances with high performance. To achieve such a challenging goal, several GPU virtualization solutions were introduced, i.e., GPUvm [28] and gVirt [30]. gVirt, also known as GVT-g, is a full virtualization solution with mediated pass-
  • Order Independent Transparency in Opengl 4.X Christoph Kubisch – [email protected] TRANSPARENT EFFECTS
    Order Independent Transparency In OpenGL 4.x. Christoph Kubisch – [email protected]
    TRANSPARENT EFFECTS
      • Photorealism: glass, transmissive materials; participating media (smoke...); simplification of hair rendering
      • Scientific Visualization: reveal obscured objects; show data in layers
    THE CHALLENGE
      • Blending Operator is not commutative
      • Front to Back
      • Back to Front: sorting objects not sufficient; sorting triangles not sufficient
      • Very costly, also many state changes
      • Need to sort „fragments“
    RENDERING APPROACHES
      • OpenGL 4.x allows various one- or two-pass variants
      • Previous high quality approaches: Stochastic Transparency [Enderton et al.]; Depth Peeling [Everitt]; caveat: multiple scene passes required
      (figure labels: 3 peel layers; peak ~84 layers; model courtesy of PTC)
    RECORD & SORT
      • Render Opaque: depth-buffer rejects occluded fragments
      • Render Transparent: record color + depth
      • Resolve Transparent: fullscreen sort & blend per pixel
      Shader fragments on the slide: layout (early_fragment_tests) in; uvec2(packUnorm4x8 (color), floatBitsToUint (gl_FragCoord.z) );
    RESOLVE
      • Fullscreen pass: not efficient to globally sort all fragments per pixel; sort K nearest correctly via register array; blend fullscreen on top of framebuffer
      Pseudocode on the slide: uvec2 fragments[K]; // encodes color and depth
      n = load (fragments); sort (fragments,n); vec4 color = vec4(0); for (i < n) { blend (color, fragments[i]); } gl_FragColor = color;
    TAIL HANDLING
      • Tail Handling: discard fragments > K; blend below sorted and hope error is not obvious [Salvi et al.]
      • Many close low alpha values are problematic
      • May not be frame-coherent (flicker) if blend is not primitive-ordered
      (figure labels: K = 4; K = 4 Tailblend; K = 16)
    RECORD TECHNIQUES
      • Unbounded: record all fragments that fit in scratch buffer; find & sort K closest later
          + fast record
          - slow resolve
          - out of memory issues
    HOW TO STORE .
  • CES 2016 Exhibitor Listing As of 1/19/16
    CES 2016 Exhibitor Listing as of 1/19/16 Name Booth * Cosmopolitan Vdara Hospitality Suites 1 Esource Technology Co., Ltd. 26724 10 Vins 80642 12 Labs 73846 1Byone Products Inc. 21953 2 the Max Asia Pacific Ltd. 72163 2017 Exhibit Space Selection 81259 3 Legged Thing Ltd 12045 360fly 10417 360-G GmbH 81250 360Heros Inc 26417 3D Fuel 73113 3D Printlife 72323 3D Sound Labs 80442 3D Systems 72721 3D Vision Technologies Limited 6718 3DiVi Company 81532 3Dprintler.com 80655 3DRudder 81631 3Iware Co.,Ltd. 45005 3M 31411 3rd Dimension Industrial 3D Printing 73108 4DCulture Inc. 58005 4DDynamics 35483 4iiii Innovations, Inc. 73623 5V - All In One HC 81151 6SensorLabs BT31 Page 1 of 135 6sensorlabs / Nima 81339 7 Medical 81040 8 Locations Co., Ltd. 70572 8A Inc. 82831 A&A Merchandising Inc. 70567 A&D Medical 73939 A+E Networks Aria 36, Aria 53 AAC Technologies Holdings Inc. Suite 2910 AAMP Global 2809 Aaron Design 82839 Aaudio Imports Suite 30-116 AAUXX 73757 Abalta Technologies Suite 2460 ABC Trading Solution 74939 Abeeway 80463 Absolare USA LLC Suite 29-131 Absolue Creations Suite 30-312 Acadia Technology Inc. 20365 Acapella Audio Arts Suite 30-215 Accedo Palazzo 50707 Accele Electronics 1110 Accell 20322 Accenture Toscana 3804 Accugraphic Sales 82423 Accuphase Laboratory Suite 29-139 ACE CAD Enterprise Co., Ltd 55023 Ace Computers/Ace Digital Home 20318 ACE Marketing Inc. 59025 ACE Marketing Inc. 31622 ACECAD Digital Corp./Hongteli, DBA Solidtek 31814 USA Acelink Technology Co., Ltd. Suite 2660 Acen Co.,Ltd. 44015 Page 2 of 135 Acesonic USA 22039 A-Champs 74967 ACIGI, Fujiiryoki USA/Dr.
  • 3D Graphics on the ADS512101 Board Using Opengl ES By: Francisco Sandoval Zazueta Infotainment Multimedia and Telematics (IMT)
    Freescale Semiconductor Application Note. Document Number: AN3793. Rev. 0, 12/2008
    3D Graphics on the ADS512101 Board Using OpenGL ES, by: Francisco Sandoval Zazueta, Infotainment Multimedia and Telematics (IMT)
    1 Introduction
    OpenGL is one of the most widely used graphic standard specifications available. OpenGL ES is a reduced adaptation of OpenGL that offers a powerful yet limited version for embedded systems. One of the main features of the MPC5121e is its graphics co-processor, the MBX core. It is a wide spread standard of mobile 3D graphics acceleration for mobile solutions. Together with Imagination Technologies OpenGL ES 1.1 SDK, the ADS512101 board can be used to produce eye-catching graphics. This document is an introduction, after the development environment is ready, it is up to the developer’s skills to exploit the board’s graphics capabilities.
    Contents
      1 Introduction
      2 Preparing the Environment
        2.1 Assumptions on the Environment
        2.2 Linux Target Image Builder (LTIB)
        2.3 Installing PowerVR Software Development Kit
      3 The PowerVR SDK
        3.1 Introduction to SDK
        3.2 PVRShell
        3.3 PVRtools
      4 Developing Example Application
        4.1 3D Model Loader
      5 Conclusion
      6 References
      7 Glossary
      Appendix A Applying DIU Patches to the Kernel
        A.1 Applying the MBXpatch2.patch
        A.2 byte_flip Application
      Appendix B Brief Introduction to OpenGL ES
        B.1 OpenGL ES
        B.2 Main Differences Between OGLES 1.1 and .
        B.3 Obtaining Frustum Numbers
        B.4 glFrustum.
  • Vivante Corporation Introduction
    World’s Smallest OpenGL ES 2.0 Silicon Introduction to Vivante Graphics Processor IP Khronos DevU - December 2009 2009 Silicon Proven in the Marketplace Vivante GPUs to ship in tier one consumer products four key market segments Smart Phone Digital Picture & Frame Mobile MID Gaming Netbook Home Embedded Entertainment 2009 World’s Smallest OpenGL ES 2.0 GPU Technology Leader in GPU IP Delivers 2x Advantages • Most efficient solution • 2x Performance per mm2 • Support 1080HD and higher • 30+ Licensees • Silicon proven in multiple applications 2009 smaller faster cooler 30+ Licensees Vivante Marketplace GC2000 in 65LP 500 MHz 100 Mtri/s 1.0 Gpix/s 18 GFLOPS GC400 in 65LP 2.5 mm2 silicon 49 mW active 13 Mtri/s 125 Mpix/s CAMERA ∗ MOBILE PHONE ∗ NAVIGATION ∗ PRINTER ∗ AUTOMOTIVE ∗ NETBOOK ∗ MID ∗ SET-TOP BOX ∗ HDTV ∗ MOBILE GAMING DTV Digital Smartphone MID Blu-ray Picture Camera GPS Netbook Frame Set-top box 2009 Vivante Differentiation ScalarMorphicTM Architecture GC GPU Core Multiple dimensions of AHB AXI scalability for optimal Host Interface cost/area/power balance Memory Controller Unified shaders Scalable multicore design 3-D Pipeline Texture Hyper-threading virtually Engine Engine 3-D RenderingEngine eliminates latency effects Engine Simple integration Ultra-threaded Unified Shader Ultra-threadedUltra-threaded Unified Unified Shader Shader Low bus bandwidth Graphics Up to 256 threads per Shader Pipeline Pixel Produces a small area, Front Engine high performance End 2-D Pipeline 2-D Drawing and Scaling Engine 2009 Complete Graphics
  • The Openvx™ Specification
    The OpenVX™ Specification. Version 1.0.1. Document Revision: r31169. Generated on Wed May 13 2015 08:41:43. Khronos Vision Working Group. Editor: Susheel Gautam. Editor: Erik Rainey.
    Copyright ©2014 The Khronos Group Inc. All Rights Reserved. This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part.
    Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be re-formatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group web-site should be included whenever possible with specification distributions.