Versapipe: a Versatile Programming Framework for Pipelined Computing on GPU

Total Page:16

File Type:pdf, Size:1020Kb

Versapipe: a Versatile Programming Framework for Pipelined Computing on GPU VersaPipe: A Versatile Programming Framework for Pipelined Computing on GPU Zhen Zheng Chanyoung Oh Jidong Zhai Tsinghua University University of Seoul Tsinghua University [email protected] [email protected] [email protected] Xipeng Shen Youngmin Yi Wenguang Chen North Carolina State University University of Seoul Tsinghua University [email protected] [email protected] [email protected] ABSTRACT 1 INTRODUCTION Pipeline is an important programming pattern, while GPU, designed Pipeline parallel programming is commonly used to exert the full mostly for data-level parallel executions, lacks an ecient mecha- parallelism of some important applications, ranging from face detec- nism to support pipeline programming and executions. This paper tion to network packet processing, graph rendering, video encoding provides a systematic examination of various existing pipeline exe- and decoding [11, 31, 34, 42, 46]. A pipeline program consists of a cution models on GPU, and analyzes their strengths and weaknesses. chain of processing stages, which have a producer-consumer rela- To address their shortcomings, this paper then proposes three new tionship. The output of a stage is the input of the next stage. Figure 1 execution models equipped with much improved controllability, shows Reyes rendering pipeline [12], a popular image rendering al- including a hybrid model that is capable of getting the strengths gorithm in computer graphics. It uses one recursive stage and three of all. These insights ultimately lead to the development of a soft- other stages to render an image. In general, any programs that have ware programming framework named VersaPipe. With VersaPipe, multiple stages with the input data going through these stages in users only need to write the operations for each pipeline stage. succession could be written in a pipeline fashion. Ecient supports VersaPipe will then automatically assemble the stages into a hybrid of pipeline programming and execution are hence important for a execution model and congure it to achieve the best performance. broad range of workloads [5, 22, 26, 51]. Experiments on a set of pipeline benchmarks and a real-world face detection application show that VersaPipe produces up to 6.90 N ⇥ (2.88 on average) speedups over the original manual implementa- ⇥ Y tions. Primitives Bound Split Dice Shade Image diceable? CCS CONCEPTS Figure 1: Reyes rendering: an example pipeline workload. • General and reference → Performance; • Computing method- On the other hand, modern graphics processing unit (GPU) ologies → Parallel computing methodologies; • Computer shows a great power in general-purpose computing. Thousands of systems organization → Heterogeneous (hybrid) systems; cores make data-parallel programs gain a signicant performance boost with GPU. While GPU presents powerful performance po- KEYWORDS tential for a wide range of applications, there are some limitations GPU, Pipelined Computing for pipeline programs to make full use of GPU’s power. With its basic programming model, each major stage of a pipeline is put ACM Reference Format: into a separate GPU kernel, invoked one after one. It creates an Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, ill-t design for the pipeline nature of the problem. In the Reyes and Wenguang Chen. 2017. VersaPipe: A Versatile Programming Framework rendering pipeline, for instance, the recursive process would need for Pipelined Computing on GPU. In Proceedings of MICRO-50, Cambridge, a large number of kernel calls, which could introduce substantial MA, USA, October 14–18, 2017, 13 pages. https://doi.org/10.1145/3123939.3123978 overhead. And as stages are separated by kernel boundaries, no par- allelism would be exploited across stages, which further hurts the performance. In the Reyes example, as the recursive depth varies for dierent data items in the pipeline, a few items that have deep Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed recursive depth will cause the whole pipeline to stall. for prot or commercial advantage and that copies bear this notice and the full citation How to make GPU best accelerate pipelined computations re- on the rst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, mains an open question. Some work has devoted manual eorts to post on servers or to redistribute to lists, requires prior specic permission and/or a in developing some specic pipeline applications, such as graphic fee. Request permissions from [email protected]. processing [34, 35, 37, 45] and packet processing [46]. They provide MICRO-50, October 14–18, 2017, Cambridge, MA, USA some useful insights, but do not oer a programming support to © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-4952-9/17/10...$15.00 general pipeline problems. Some programming models have been https://doi.org/10.1145/3123939.3123978 proposed recently trying to simplify the development of ecient 587 MICRO-50, October 14–18, 2017, Cambridge, MA, USA Z. Zheng et al. pipeline applications on GPU. Most of them employ a Megakernel- It proposes VersaPipe, a versatile programming framework • based approach [3, 16], in which, the code for all stages of a pipeline that provides automatic support to create the combination is put into one large GPU kernel that branches to dierent sub- of execution models that best suit a given pipeline problem. functions based on which stage the current data item needs to go It evaluates VersaPipe on a set of pipeline workloads and • through. At launching time, the method tries to launch as many compares its performance with several alternative methods. blocks as can be concurrently scheduled on GPU and schedule the Results show that VersaPipe outperforms the state-of-art so- tasks through software managed task queues [10, 43, 44, 47]. lutions signicantly (by up to 1.66 ), accelerates the original ⇥ Although these eorts have made good contributions to advance manual implementations by 2.88 on average (up to 6.9 ). ⇥ ⇥ the state of the art, they have some serious limitations. Putting Our approach targets NVIDIA GPUs, but the general principles all stages into one kernel, Megakernel-based methods cause large behind the approach could be applied to any model based on an register and shared memory pressure on GPU. As a result, many equivalent of NVIDIA’s streaming multiprocessors, so long as the programs can have only a small number of threads running con- model provides a way to gain access to the id of the SM on which a currently on GPU, leaving the massive parallelism of GPU largely thread is executing. untapped (detailed in Section 8). An important observation made in this work is that, fundamen- tally, the prior eorts have been limited by the lack of software 2 BACKGROUND controllability of executions on GPU. GPU uses hardware sched- This section provides background on GPU that is relevant to this ulers to decide when and where (on which computing unit of a work. We use NVIDIA CUDA [30] terminology for the discussion. GPU) a GPU thread runs. All prior pipeline model designs have Graphic processing unit (GPU) is widely used in a variety of taken this limitation for granted, which has largely limited the general purpose computing domains now. It adopts a very wide Sin- design exibilities. gle Instruction Multiple Data (SIMD) architecture, where hundreds In this paper, by combining two recent software techniques (per- of units can process instructions simultaneously. Each unit in the sistent threads [3] and SM-centric method [50], explained later), SIMD of GPU is called Streaming Processor (SP). Each SP has its we show that the limitation can be largely circumvented with- own PC and registers, and can access and process a dierent instruc- out hardware extensions. The software controllability released by tion in the kernel code. A GPU device has multiple clusters of SPs, that method opens up a much greater exibility for the designs of called Streaming Multiprocessors (SMs). Each SM has indepen- pipeline execution models. dent hardware resources of shared memory, L1 caches, and register Leveraging the exibility, we propose three new execution mod- les, while all SMs share L2 cache. For more ecient processing and els with much improved congurability. They signicantly expand data mapping, threads on GPU are grouped as CTAs (Cooperative the selection space of pipeline models. By examining the various Thread Arrays, also called Thread Block). Threads in the same models, we reveal their respective strengths and weaknesses. We CTA can communicate with each other via global memory, shared then develop a software framework named VersaPipe to translate memory and barrier synchronization. The number of threads in a these insights and models into an enhanced pipeline programming CTA is set by programs, but also limited by the architecture. All system. VersaPipe consists of an API, a library that incorporates CTAs run independently and dierent CTAs can run concurrently. the various execution models and the scheduling mechanisms, and GPU is a complex architecture in which resources are dynami- an auto-tuner that automatically nds the best congurations of a cally partitioned depending on the usage. Since an SM has a limited pipeline. With VersaPipe, users only need to write the operations size of registers and shared memory, any of them can be a lim- for each pipeline stage. VersaPipe will then automatically assemble iting factor that could result in under-utilization of GPU. When the stages into a hybrid execution model and congure it to achieve hardware resource usage of a thread increases, the total number of the best performance. threads that can run concurrently on GPU decreases. Due to this For its much improved congurability and much enriched ex- nature of GPU architecture, capturing the workload concurrency ecution model space, VersaPipe signicantly advances the state into a kernel should be done carefully considering the resource of the art in supporting pipeline executions on GPU.
Recommended publications
  • How to Manually Switch Graphics Cards Macbook Pro
    How To Manually Switch Graphics Cards Macbook Pro How to manually switch between dedicated and integrated graphics Is there a "hack" to switch between graphics processors on the Retina Display MacBook Pro makes it possible to easily switch between graphics cards manually on these. gpu-switch is an application that allows to switch between the graphic cards of dual-GPU Macbook Pro models. On these computers, the "automatic graphics switching" option is turned on by how to determine which graphics card is in use on a 15" or 17" MacBook Pro. MacBook Pro models that were affected by this problem would often have visual the picture above for a half a second while switching between graphics cards). capsule backup in this mode so have to keep manually backing up my data. When using a Mac that supports automatic graphics switching, the system will Two entries should appear under Video Card: "Intel HD Graphics 4000". Installing Arch Linux on a MacBook (Air/Pro) or an iMac is quite similar to installing it on any other computer. Boot installation medium and switch to a free tty. and work through until you get to Prepare Hard Drive and use the "Manually configure block devices. Different MacBook models have different graphic cards. How To Manually Switch Graphics Cards Macbook Pro Read/Download Several support threads discussing issue related to GPU switching. Some early Retina-equipped MacBook Pro users are seeing graphical issues in Safari. Earlier "Late 2008" and "Mid-2009" MacBook Pro models equipped with dual graphics A tool called gfxCardStatus allows to switch it manually: gfx.io/.
    [Show full text]
  • Op E N So U R C E Yea R B O O K 2 0
    OPEN SOURCE YEARBOOK 2016 ..... ........ .... ... .. .... .. .. ... .. OPENSOURCE.COM Opensource.com publishes stories about creating, adopting, and sharing open source solutions. Visit Opensource.com to learn more about how the open source way is improving technologies, education, business, government, health, law, entertainment, humanitarian efforts, and more. Submit a story idea: https://opensource.com/story Email us: [email protected] Chat with us in Freenode IRC: #opensource.com . OPEN SOURCE YEARBOOK 2016 . OPENSOURCE.COM 3 ...... ........ .. .. .. ... .... AUTOGRAPHS . ... .. .... .. .. ... .. ........ ...... ........ .. .. .. ... .... AUTOGRAPHS . ... .. .... .. .. ... .. ........ OPENSOURCE.COM...... ........ .. .. .. ... .... ........ WRITE FOR US ..... .. .. .. ... .... 7 big reasons to contribute to Opensource.com: Career benefits: “I probably would not have gotten my most recent job if it had not been for my articles on 1 Opensource.com.” Raise awareness: “The platform and publicity that is available through Opensource.com is extremely 2 valuable.” Grow your network: “I met a lot of interesting people after that, boosted my blog stats immediately, and 3 even got some business offers!” Contribute back to open source communities: “Writing for Opensource.com has allowed me to give 4 back to a community of users and developers from whom I have truly benefited for many years.” Receive free, professional editing services: “The team helps me, through feedback, on improving my 5 writing skills.” We’re loveable: “I love the Opensource.com team. I have known some of them for years and they are 6 good people.” 7 Writing for us is easy: “I couldn't have been more pleased with my writing experience.” Email us to learn more or to share your feedback about writing for us: https://opensource.com/story Visit our Participate page to more about joining in the Opensource.com community: https://opensource.com/participate Find our editorial team, moderators, authors, and readers on Freenode IRC at #opensource.com: https://opensource.com/irc .
    [Show full text]
  • Hard Disk Drive
    V10.1.00 Preface Notice The company reserves the right to revise this publication or to change its contents without notice. Information contained herein is for reference only and does not constitute a commitment on the part of the manufacturer or any subsequent vendor. They assume no responsibility or liability for any errors or inaccuracies that may appear in this publication nor are they in anyway responsible for any loss or damage resulting from the use (or misuse) of this publication. This publication and any accompanying software may not, in whole or in part, be reproduced, translated, trans- mitted or reduced to any machine readable form without prior consent from the vendor, manufacturer or creators of this publication, except for copies kept by the user for backup purposes. Brand and product names mentioned in this publication may or may not be copyrights and/or registered trade- marks of their respective companies. They are mentioned for identification purposes only and are not intended as an endorsement of that product or its manufacturer. ©May 2010 Trademarks Intel and Intel Core are trademarks/registered trademarks of Intel Corporation. I Preface R&TTE Directive This device is in compliance with the essential requirements and other relevant provisions of the R&TTE Direc- tive 1999/5/EC. This device will be sold in the following EEA countries: Austria, Italy, Belgium, Liechtenstein, Denmark, Lux- embourg, Finland, Netherlands, France, Norway, Germany, Portugal, Greece, Spain, Iceland, Sweden, Ireland, United Kingdom, Cyprus, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Malta, Slovakia, Poland, Slov- enia. II Preface FCC Statement (Federal Communications Commission) You are cautioned that changes or modifications not expressly approved by the party responsible for compliance could void the user's authority to operate the equipment.
    [Show full text]
  • Intel740™ Graphics Accelerator
    Intel740™ Graphics Accelerator Software Developer’s Manual September 1998 Order Number: 290617-003 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. The Intel740 graphics accelerator may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available upon request. I2C is a two-wire communications bus/protocol developed by Philips. SMBus is a subset of the I2C bus/protocol and was developed by Intel. Implementations of the I2C bus/protocol or the SMBus bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from: http://www.intel.com or call 1-800-548-4725 Copyright © Intel Corporation, 1997-1998 *Third-party brands and names are the property of their respective owners.
    [Show full text]
  • Windows Certification Program Hardware Certification Taxonomy & Requirements
    Windows Certification Program Hardware Certification Taxonomy & Requirements December 2014 This document is provided “as-is”. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. © 2012 Microsoft. All rights reserved. Microsoft, Windows and Windows Server are trademarks of the Microsoft group of companies. UPnP™ is a certification mark of the UPnP™ Implementers Corp. All other trademarks are property of their respective owners. Page 1 of 254 Microsoft Corporation Technical Documentation License Agreement READ THIS! THIS IS A LEGAL AGREEMENT BETWEEN MICROSOFT CORPORATION ("MICROSOFT") AND THE RECIPIENT OF THESE MATERIALS, WHETHER AN INDIVIDUAL OR AN ENTITY ("YOU"). IF YOU HAVE ACCESSED THIS AGREEMENT IN THE PROCESS OF DOWNLOADING MATERIALS ("MATERIALS") FROM A MICROSOFT WEB SITE, BY CLICKING "I ACCEPT", DOWNLOADING, USING OR PROVIDING FEEDBACK ON THE MATERIALS, YOU AGREE TO THESE TERMS. IF THIS AGREEMENT IS ATTACHED TO MATERIALS, BY ACCESSING, USING OR PROVIDING FEEDBACK ON THE ATTACHED MATERIALS, YOU AGREE TO THESE TERMS. For good and valuable consideration, the receipt and sufficiency of which are acknowledged, You and Microsoft agree as follows: 1. You may review these Materials only (a) as a reference to assist You in planning and designing Your product, service or technology ("Product") to interface with a Microsoft Product as described in these Materials; and (b) to provide feedback on these Materials to Microsoft. All other rights are retained by Microsoft; this agreement does not give You rights under any Microsoft patents.
    [Show full text]
  • Sound Blaster Cinema EQ
    V13.3.00 Preface Notice The company reserves the right to revise this publication or to change its contents without notice. Information contained herein is for reference only and does not constitute a commitment on the part of the manufacturer or any subsequent vendor. They assume no responsibility or liability for any errors or inaccuracies that may appear in this publication nor are they in anyway responsible for any loss or damage resulting from the use (or misuse) of this publication. This publication and any accompanying software may not, in whole or in part, be reproduced, translated, trans- mitted or reduced to any machine readable form without prior consent from the vendor, manufacturer or creators of this publication, except for copies kept by the user for backup purposes. Brand and product names mentioned in this publication may or may not be copyrights and/or registered trade- marks of their respective companies. They are mentioned for identification purposes only and are not intended as an endorsement of that product or its manufacturer. ©July 2013 Trademarks Intel and Intel Core are trademarks/registered trademarks of Intel Corporation. I Preface R&TTE Directive This device is in compliance with the essential requirements and other relevant provisions of the R&TTE Direc- tive 1999/5/EC. This device will be sold in the following EEA countries: Austria, Italy, Belgium, Liechtenstein, Denmark, Lux- embourg, Finland, Netherlands, France, Norway, Germany, Portugal, Greece, Spain, Iceland, Sweden, Ireland, United Kingdom, Cyprus, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Malta, Slovakia, Poland, Slov- enia. ErP Off Mode Power Consumption Statement: The figures below note the power consumption of this computer in compliance with European Commission (EC) regulations on power consumption in off mode • Off Mode < 0.5W II Preface CE Marking This device has been tested to and conforms to the regulatory requirements of the European Union and has at- tained CE Marking.
    [Show full text]
  • Memory Hierarchies for Future Hpc Architectures
    MEMORY HIERARCHIES FOR FUTURE HPC ARCHITECTURES V´ICTOR GARC´IA FLORES ADISSERTATION SUBMITTED IN FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY /DOCTOR PER LA UPC TO THE DEPARTMENT OF COMPUTER ARCHITECTURE UNIVERSITAT POLITECNICA` DE CATALUNYA ADVISOR:NACHO NAVARRO CO-ADVISORS:ANTONIO J. PENA˜ ,EDUARD AYGUADE BARCELONA SEPTEMBER 2017 Abstract Efficiently managing the memory subsystem of modern multi/manycore architectures is in- creasingly becoming a challenge as systems grow in complexity and heterogeneity. From multicore architectures with several levels of on-die cache to heterogeneous systems com- bining graphics processing units (GPUs) and traditional general-purpose processors, the once simple Von Neumann machines have transformed into complex systems with a high reliance on an efficient memory subsystem. In the field of high performance computing (HPC) in particular, where massively parallel architectures are used and input sets of several terabytes are common, careful management of the memory hierarchy is crucial to exploit the full com- puting power of these systems. The goal of this thesis is to provide computer architects with valuable information to guide the design of future systems, and in particular of those more widely used in the field of HPC, i.e., symmetric multicore processors (SMPs) and GPUs. With that aim, we present an analysis of some of the inefficiencies and shortcomings of current memory management techniques and propose two novel schemes leveraging the opportunities that arise from the use of new and emerging programming models and computing paradigms. The first contribution of this thesis is a block prefetching mechanism for task-based pro- gramming models.
    [Show full text]
  • TPS59640EVM-751 Evaluation Module User's Guide
    User's Guide SLUU796–January 2012 Using the TPS59640EVM-751 IMVP-7, 3-Phase CPU/1-Phase GPU SVID Power System The TPS59640EVM-751 evaluation module (EVM) is a complete solution for the Intel™ IMVP-7 Serial VID (SVID) Power System from a 9-V to 20-V input bus. This EVM uses the TPS59640 for IMVP-7 3-Phase CPU and 1-Phase GPU Vcore, TPS51219 for 1.05VCCIO, TPS51916 for DDR3L/DDR4 memory rail (1.2VDDQ, 0.6VTT, and 0.6VTTREF). The TPS59640EVM-751 also uses the 5-mm x 6-mm TI power block MOSFET (CSD87350Q5D) for high-power density and superior thermal performance. Contents 1 Description ................................................................................................................... 4 1.1 Typical Applications ................................................................................................ 4 1.2 Features ............................................................................................................. 4 2 TPS59640EVM-751 Power System Block Diagram .................................................................... 5 3 Electrical Performance Specifications .................................................................................... 6 4 Schematic .................................................................................................................... 8 5 Test Setup .................................................................................................................. 21 5.1 Test Equipment ..................................................................................................
    [Show full text]
  • Low-Cost, High-Speed Computer Vision Using NVIDIA's CUDA
    Low-Cost, High-Speed Computer Vision Using NVIDIA’s CUDA Architecture Seung In Park, Sean P. Ponce, Jing Huang, Yong Cao and Francis Quek† Center of Human Computer Interaction, Virginia Polytechnic Institute and University Blacksburg, VA 24060, USA †[email protected] Abstract— In this paper, we introduce real time image processing To support this trend of GPGPU (General-Purpose techniques using modern programmable Graphic Processing Units Computing on GPUs) computation, graphics card vendors (GPU). GPUs are SIMD (Single Instruction, Multiple Data) device have provided programmable GPUs and high-level that is inherently data-parallel. By utilizing NVIDIA’s new GPU languages to allow developers to generate GPU-based programming framework, “Compute Unified Device Architecture” applications. (CUDA) as a computational resource, we realize significant acceleration in image processing algorithm computations. We show This research aimed at accomplishing efficient and cheaper that a range of computer vision algorithms map readily to CUDA application of image processing for intensive computation with significant performance gains. Specifically, we demonstrate using GPUs. We show that a range of computer vision the efficiency of our approach by a parallelization and optimization algorithms map readily to CUDA by using edge detection of Canny’s edge detection algorithm, and applying it to a and motion tracking algorithm as our example application. computation and data-intensive video motion tracking algorithm The remainder of the paper is organized as follows. In known as “Vector Coherence Mapping” (VCM). Our results show Section 2, we describe the recent advances in GPU hardware the promise of using such common low-cost processors for and programming framework, discuss previous efforts on intensive computer vision tasks.
    [Show full text]
  • TPS59650EVM-753 Evaluation Module (EVM) Is a Complete Solution for Intel™ IMVP7 Serial VID(SVID) Power System from a 9V-20V Input Bus
    User's Guide SLUU896–March 2012 Using the TPS59650EVM-753 Intel™ IMVP-7 3-Phase CPU/2-Phase GPU SVID Power System The TPS59650EVM-753 evaluation module (EVM) is a complete solution for Intel™ IMVP7 Serial VID(SVID) Power System from a 9V-20V input bus. This EVM uses the TPS59650 for IMVP7 - 3-Phase CPU and 2-Phase GPU Vcore controller, the TPS51219 for 1.05VCCIO, TPS51916 for DDR3L/DDR4 Memory rail (1.2VDDQ, 0.6VTT and 0.6VTTREF) and also uses the (CSD87350Q5D) a 5mm x 6mm TI’s power block MOSFETs that uses Powerstack™ technology with high-side and low-side MOSFETs for high power density and superior thermal performance. Contents 1 Description ................................................................................................................... 5 1.1 Typical Applications ................................................................................................ 5 1.2 Features ............................................................................................................. 5 2 TPS59650EVM-753 Power System Block Diagram .................................................................... 6 3 Electrical Performance Specifications .................................................................................... 7 4 Test Setup ................................................................................................................... 8 4.1 Test Equipment ..................................................................................................... 8 4.2 Recommended Wire Gauge .....................................................................................
    [Show full text]
  • Parallel Algorithms for Querying Spatial Properties in the Protein Data Bank
    Parallel algorithms for querying spatial properties in the Protein Data Bank Joshua Selvan A research report submitted to the Faculty of Engineering and the Built Environ- ment, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science in Engineering. Johannesburg, December 2019 1 Declaration I declare that this research report is my own, unaided work, except where other- wise acknowledged. It is being submitted for the degree of Master of Science in Engineering to the University of the Witwatersrand, Johannesburg. It has not been submitted before for any degree or examination to any other university. Signed this day of 20 Joshua Selvan 2 Contents Declaration1 Contents2 Glossary7 Abstract9 1 Introduction 10 1.1 Overview and motivation........................ 10 1.2 Research Objectives........................... 11 1.3 Overview of approach.......................... 12 1.4 Structure of Report............................ 13 2 Background 15 2.1 Overview................................. 15 2.2 Proteins.................................. 16 2.2.1 The roles and composition of proteins in cells......... 16 2.2.2 Describing protein structures in four levels........... 16 2.2.3 Examples of bio-molecular research featuring spatial data.. 20 3 2.2.4 Protein Data Bank (PDB) files................. 24 2.3 Spatial data structures.......................... 25 2.3.1 Spatial indexing......................... 25 2.3.2 Binary trees............................ 26 2.3.3 Kd-trees.............................. 27 2.3.4 Range searching with kd-trees.................. 29 2.3.5 Other types of spatial structures................ 29 2.4 Approaches to increasing compute power vs increasing data load sizes 31 2.5 Measuring performance gains in parallel systems..........
    [Show full text]
  • Intel740 Graphics Accelerator
    Intel740 Graphics Accelerator Software Developer’s Manual February 1998 Order Number: 290617-001 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The Intel740 may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of MPEG CODECs, or MPEG enabled platforms may require licenses from various entities, including Intel Corporation. I2C is a two-wire communications bus/protocol developed by Philips. SMBus is a subset of the I2C bus/protocol and was developed by Intel. Implementations of the I2C bus/protocol or the SMBus bus/protocol may require licenses from various entities, including Philips Electronics N.V.
    [Show full text]