Lecture 9 – OpenGL

2018/2019, 4th quarter
INFOGR: Graphics / NeedToKnow
Author: Jacco Bikker

TL;DR

In this installment of ‘NeedToKnow’ we switch to rasterization. After a brief introduction of the graphics pipeline we investigate the GPU architecture, and OpenGL as a means to control this architecture.

Figure 1: A typical game scene: static geometry, objects, and objects attached to objects. And bullets.

Engine

A 3D engine is the software component that produces images for a game. Early engines such as id Tech 1 (Doom), the Quake Engine and Unreal Engine 1 did little beyond that. Later engines such as the Source Engine (Half-Life 2, Portal), Frostbite and Unity added quite a bit: physics, audio, scripting and multi-core aware job management. Such an engine is called a game engine; the part of it that produces images is still a separate block, which we refer to as the rendering engine.

Figure 2: From left to right: Doom 1, Quake, Unreal 1.

The task of the rendering engine is thus to produce images. The engine visualizes a virtual world, which typically consists of triangles. The triangles are grouped into meshes, and meshes are organized in a scene graph, which we will discuss in a minute.

Visualization starts with the transform stage. The input for this stage is a stream of vertices and a 4x4 matrix; the output is a stream of vertices transformed into camera space. These vertices are then projected into screen space. Rasterization takes the final vertex positions and connectivity data (three vertex indices per triangle) to determine which pixels are affected by each triangle. These pixels are then shaded and drawn.

Scene Graph

Objects in the real world typically do not move independently. My hand is attached to my wrist, and I rotate when my chair rotates. The Earth revolves around the Sun, and the Moon around the Earth (Figure 3). When I jump during a moonwalk, I move relative to the Moon (Figure 4).

Figure 3: Like this.
Figure 4: Like so.
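The Sun, Earth and Moon relation can be sketched in code. What follows is a minimal C# sketch, not the implementation of any particular engine: the SceneNode class, its members and the distances are hypothetical, and the System.Numerics math types are an assumption. Each node stores a transform relative to its parent, and its transform in the world follows by combining that transform with those of its ancestors.

```csharp
using System;
using System.Numerics;

// Hypothetical scene graph node; all names here are illustrative.
public class SceneNode
{
    public Matrix4x4 Local = Matrix4x4.Identity; // transform relative to the parent
    public SceneNode Parent;                      // null for the root node

    // World transform: this node's matrix combined with those of its ancestors.
    // System.Numerics treats vectors as rows (v * M), so the local matrix is on the left.
    public Matrix4x4 World()
    {
        return Parent == null ? Local : Local * Parent.World();
    }
}

public static class SceneGraphDemo
{
    // Builds Sun -> Earth -> Moon and returns the Moon's position in the world.
    public static Vector3 MoonPosition()
    {
        var sun = new SceneNode();
        var earth = new SceneNode { Parent = sun, Local = Matrix4x4.CreateTranslation(10, 0, 0) };
        var moon = new SceneNode { Parent = earth, Local = Matrix4x4.CreateTranslation(1, 0, 0) };
        return Vector3.Transform(Vector3.Zero, moon.World());
    }

    public static void Main()
    {
        // The Moon sits 1 unit from the Earth, which sits 10 units from the Sun.
        Console.WriteLine(MoonPosition());
    }
}
```

Moving the Earth only requires changing the Earth's own matrix: the Moon follows, because its world transform is derived from its parent each time it is requested.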
Object relations are conveniently stored as a hierarchy: the scene graph. The spatial relations themselves are conveniently expressed using matrices*. If we store a matrix for each object in the world, the transform we should use for an individual scene graph node is the recursive concatenation of its matrix with the matrices of its ancestors. The camera is a special case: to get objects into camera space, we transform everything by the inverse of the camera matrix.

*: Note that quaternions are insufficient: these do not store translations.

GPU Architecture

Once upon a time (before 1997) graphics were simply produced by CPUs. Video processors did exist, but they merely provided a frame buffer, and a way to get this frame buffer to a television set or monitor*. Drawing a textured triangle using a CPU means: calculating the outline of the triangle, interpolating texture coordinates over this outline and finally filling the horizontal spans of pixels that make up the triangle.

Early GPUs, such as the 3dfx Voodoo, contained dedicated hardware for just the triangle filling. Transform, lighting and clipping all remained the responsibility of the CPU. The rasterization hardware was not programmable: it was fixed function hardware.

NVIDIA's GeForce 256 also implemented transform and lighting in hardware. It implemented the following flow, which is still ‘fixed function’, and not programmable:

[Figure: flow diagram]

This flow illustrates an important property of GPU architectures: a stream of data (vertices) is processed by a functional block that applies a single operation to all input elements. Likewise, the stream of pixels that leaves the rasterizer is processed by several blocks that together implement shading. The elements in the stream are independent of each other, so they can safely be processed in parallel, or in arbitrary order. This type of processing is called the streaming model, and it makes it relatively easy to design the GPU as a massively parallel unit.
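The streaming model can be imitated on the CPU. The sketch below is an illustration under assumptions, not a description of how a GPU is built: the TransformStage class and the camera placement are hypothetical, and System.Numerics plus Parallel.For stand in for the hardware. One operation (a multiplication by a 4x4 matrix) is applied to every vertex in the stream, and because no vertex depends on another, the iterations may run in parallel and in any order.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

public static class TransformStage
{
    // Applies the same 4x4 matrix to every vertex in the input stream.
    // Each iteration writes only its own output slot, so running in parallel is safe.
    public static Vector3[] Run(Vector3[] input, Matrix4x4 matrix)
    {
        var output = new Vector3[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            output[i] = Vector3.Transform(input[i], matrix);
        });
        return output;
    }

    public static void Main()
    {
        // Hypothetical camera placed at (0, 0, 5); objects move by its inverse.
        Matrix4x4 camera = Matrix4x4.CreateTranslation(0, 0, 5);
        Matrix4x4.Invert(camera, out Matrix4x4 toCameraSpace);

        var triangle = new[]
        {
            new Vector3(0, 1, 0), new Vector3(-1, -1, 0), new Vector3(1, -1, 0)
        };
        foreach (Vector3 v in Run(triangle, toCameraSpace))
            Console.WriteLine(v);
    }
}
```

The result is the same whether the loop runs sequentially or in parallel, which is exactly the property that lets a GPU spread the work over many units.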
Shaders

Modern GPUs keep the streaming model. They do however allow programming of certain parts of the functional flow. A vertex shader is a programmable transform and lighting block, and a pixel shader replaces many of the blocks that follow the rasterization phase. Rasterization itself remains fixed function hardware, even on the latest GPUs.

*: On video processors before 1997: exceptions existed obviously, such as the Amiga blitter chip, the N64 and some arcade hardware.

OpenGL

Today there is a rather diverse collection of graphics processors from several vendors. AMD, Intel, NVIDIA and Imagination all produce GPUs, and typically renew their lineup every other year. Most vendors offer GPUs for mobile devices as well as for high-end rendering.

It is impractical to write software that targets each of these GPUs directly, let alone future devices. For that reason, we use a hardware abstraction layer or HAL. The HAL offers applications a common programming interface to the hardware. OpenGL, Vulkan, DirectX and Metal are such interfaces, and are also known as graphics APIs.

OpenGL’s history, in bullets:

▪ 1985: Silicon Graphics develops IRIS GL, which became the predecessor of OpenGL.
▪ 1992: An industry consortium, the ARB, is formed to steer development.
▪ 1995: Direct3D, the main competitor of OpenGL, is introduced by Microsoft.
▪ 1997: Microsoft and SGI attempt to unify APIs in Fahrenheit.
▪ 2006: Development of OpenGL is transferred to the Khronos Group.

Some other important developments, to put things in perspective:

▪ 1997: Rise of the GPU, with 3dfx’s Voodoo cards and GLQuake (so: Direct3D started without hardware 3D support!).
▪ 2001: Rise of the Shader, with NVIDIA’s GeForce3.
▪ 2014: Multiple vendors introduce lower level APIs: Metal, Mantle, DirectX12.

The history of OpenGL more or less ends in 2016, with the introduction of Vulkan. Vulkan, like Metal, Mantle and DirectX12, aims to give much lower level control over the hardware.
This means more responsibilities for the developer, but also better performance.

Practical

Early versions of OpenGL operated in immediate mode. An example of this:

    // Assumes the OpenTK binding: using OpenTK.Graphics.OpenGL;
    public void TickGL()
    {
        // Start a list of triangles: every three vertices form one triangle.
        GL.Begin( PrimitiveType.Triangles );
        // Each vertex takes the color that was set most recently.
        GL.Color3( 1.0f, 0.0f, 0.0f ); GL.Vertex2( 0.0f, 1.0f );    // red, top
        GL.Color3( 0.0f, 1.0f, 0.0f ); GL.Vertex2( -1.0f, -1.0f );  // green, bottom left
        GL.Color3( 0.0f, 0.0f, 1.0f ); GL.Vertex2( 1.0f, -1.0f );   // blue, bottom right
        GL.End();
    }

Note that this code draws one triangle. We can put a loop around the three central lines to draw many triangles. The vertices are however passed one by one to OpenGL, which doesn’t suit the streaming data model at all. OpenGL will try to batch operations to improve performance, but this is high-level behavior, and not under programmer control. An additional problem is the constant communication between CPU and GPU. Transfers are slow, so this communication can easily become the bottleneck for an application.

Modern OpenGL

Although modern OpenGL still allows the use of immediate mode through the compatibility profile (which is sometimes really convenient), the use of the core profile is encouraged. The core profile requires the use of vertex buffer objects (VBOs) to render graphics. A VBO is a buffer that resides (in principle*) on the GPU. Rendering anything with the data in the buffer thus does not require transfers from the host to the rendering device. The execution flow now becomes:

[Figure: flow diagram]

The main difference with the earlier shown flows is that all data is already on the GPU. Of course, we need to get it there first, but from there, we only send updates. This means that most textures and meshes are transferred only once. This is how all modern graphics APIs operate.

State Machine

OpenGL operates as a state machine. The state is a combination of:

▪ the active transformation matrix
▪ the active shaders and textures
▪ whether or not the z-buffer is enabled
▪ the render target (e.g. the screen, or a texture)
▪ …and many other properties.

Drawing a mesh is strongly affected by this state.
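The idea of a state machine can be shown with a toy model. This is a sketch only: ToyRenderer and its methods are hypothetical and are not part of OpenGL or OpenTK, and the state is reduced to two of the properties listed above. The draw call receives just the mesh; everything else is taken from whatever state was set most recently.

```csharp
using System;

// Toy model of a graphics API that operates as a state machine.
// All names are hypothetical; this is not the OpenGL or OpenTK API.
public static class ToyRenderer
{
    // The state: set once, then used by every draw call until it is changed.
    static string activeTexture = "none";
    static bool depthTestEnabled = false;

    public static void BindTexture(string name) { activeTexture = name; }
    public static void EnableDepthTest(bool enabled) { depthTestEnabled = enabled; }

    // Drawing takes only the mesh; everything else comes from the current state.
    public static string Draw(string mesh)
    {
        return mesh + " drawn with texture '" + activeTexture +
               "', depth test " + (depthTestEnabled ? "on" : "off");
    }

    public static void Main()
    {
        Console.WriteLine(Draw("teapot"));   // uses the initial state
        BindTexture("marble");
        EnableDepthTest(true);
        Console.WriteLine(Draw("teapot"));   // same call, different result
    }
}
```

The two identical Draw calls produce different results, which is why code that draws a mesh has to be sure of the state it runs in.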
*: On VBOs residing on the GPU: OpenGL does not guarantee this, but the programmer may assume it. In low-level APIs, such as Vulkan, such decisions are explicitly left to the programmer.

A similar state mechanism is found in other APIs. This makes sense: the streaming model which GPUs use suffers whenever the state changes. A state change temporarily stops the vertex stream to e.g. adjust a matrix or some other constant (‘uniform’) data. In addition, drawing all polygons that use texture X at once maximizes cache efficiency.

The scene graph matches this model. Between scene graph nodes the state changes. Inside a scene graph node, the state is constant.

THE END

That’s it for this installment. If you have any questions, feel free to ask by email or on Slack!