Heterogeneous Computing with OpenCL 2.0, Third Edition

David Kaeli
Perhaad Mistry
Dana Schaa
Dong Ping Zhang

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Acquiring Editor: Todd Green
Editorial Project Manager: Charlie Kent
Project Manager: Priya Kumaraguruparan
Cover Designer: Matthew Limbert

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2015, 2013, 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-801414-1

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

For information on all MK publications visit our website at www.mkp.com

Contents

List of Figures
List of Tables
Foreword
Acknowledgments

CHAPTER 1 Introduction
1.1 Introduction to Heterogeneous Computing
1.2 The Goals of This Book
1.3 Thinking Parallel
1.4 Concurrency and Parallel Programming Models
1.5 Threads and Shared Memory
1.6 Message-Passing Communication
1.7 Different Grains of Parallelism
  1.7.1 Data Sharing and Synchronization
  1.7.2 Shared Virtual Memory
1.8 Heterogeneous Computing with OpenCL
1.9 Book Structure
References

CHAPTER 2 Device Architectures
2.1 Introduction
2.2 Hardware Trade-offs
  2.2.1 Performance Increase with Frequency, and its Limitations
  2.2.2 Superscalar Execution
  2.2.3 Very Long Instruction Word
  2.2.4 SIMD and Vector Processing
  2.2.5 Hardware Multithreading
  2.2.6 Multicore Architectures
  2.2.7 Integration: Systems-on-Chip and the APU
  2.2.8 Cache Hierarchies and Memory Systems
2.3 The Architectural Design Space
  2.3.1 CPU Designs
  2.3.2 GPU Architectures
  2.3.3 APU and APU-like Designs
2.4 Summary
References

CHAPTER 3 Introduction to OpenCL
3.1 Introduction
  3.1.1 The OpenCL Standard
  3.1.2 The OpenCL Specification
3.2 The OpenCL Platform Model
  3.2.1 Platforms and Devices
3.3 The OpenCL Execution Model
  3.3.1 Contexts
  3.3.2 Command-Queues
  3.3.3 Events
  3.3.4 Device-Side Enqueuing
3.4 Kernels and the OpenCL Programming Model
  3.4.1 Compilation and Argument Handling
  3.4.2 Starting Kernel Execution on a Device
3.5 OpenCL Memory Model
  3.5.1 Memory Objects
  3.5.2 Data Transfer Commands
  3.5.3 Memory Regions
  3.5.4 Generic Address Space
3.6 The OpenCL Runtime with an Example
  3.6.1 Complete Vector Addition Listing
3.7 Vector Addition Using an OpenCL C++ Wrapper
3.8 OpenCL for CUDA Programmers
3.9 Summary
Reference

CHAPTER 4 Examples
4.1 OpenCL Examples
4.2 Histogram
4.3 Image Rotation
4.4 Image Convolution
4.5 Producer-Consumer
4.6 Utility Functions
  4.6.1 Reporting Compilation Errors
  4.6.2 Creating a Program String
4.7 Summary

CHAPTER 5 OpenCL Runtime and Concurrency Model
5.1 Commands and the Queuing Model
  5.1.1 Blocking Memory Operations
  5.1.2 Events
  5.1.3 Command Barriers and Markers
  5.1.4 Event Callbacks
  5.1.5 Profiling Using Events
  5.1.6 User Events
  5.1.7 Out-of-Order Command-Queues
5.2 Multiple Command-Queues
5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges
  5.3.1 Synchronization
  5.3.2 Work-Group Barriers
  5.3.3 Built-In Work-Group Functions
  5.3.4 Predicate Evaluation Functions
  5.3.5 Broadcast Functions
  5.3.6 Parallel Primitive Functions
5.4 Native and Built-In Kernels
  5.4.1 Native kernels
  5.4.2 Built-in kernels
5.5 Device-Side Queuing
  5.5.1 Creating a Device-Side Queue
  5.5.2 Enqueuing Device-Side Kernels
5.6 Summary
Reference

CHAPTER 6 OpenCL Host-Side Memory Model
6.1 Memory Objects
  6.1.1 Buffers
  6.1.2 Images
  6.1.3 Pipes

