CUDA Application Design and Development

Rob Farber

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier

Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: Danielle S. Miller
Designer: Dennis Schaeffer

Morgan Kaufmann is an imprint of Elsevier 225 Wyman Street, Waltham, MA 02451, USA

© 2011 NVIDIA Corporation and Rob Farber. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data Application submitted.

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-388426-8

For information on all MK publications visit our website at www.mkp.com

Typeset by: diacriTech, Chennai, India

Printed in the United States of America
11 12 13 14 15    10 9 8 7 6 5 4 3 2 1

Dedication

This book is dedicated to my wife Margy and son Ryan, who could not help but be deeply involved as I wrote it. In particular to my son Ryan, who is proof that I am the older model – thank you for the time I had to spend away from your childhood. To my many friends who reviewed this book and especially those who caught errors, I cannot thank you enough for your time and help. In particular, I'd like to thank everyone at ICHEC (the Irish Center for High-End Computing) who adopted me as I finished the book's birthing process and completed this manuscript. Finally, thank you to my colleagues and friends at NVIDIA, who made the whole CUDA revolution possible.

Contents

FOREWORD
PREFACE

CHAPTER 1 First Programs and How to Think in CUDA
    Source Code and Wiki
    Distinguishing CUDA from Conventional Programming with a Simple Example
    Choosing a CUDA API
    Some Basic CUDA Concepts
    Understanding Our First Runtime Kernel
    Three Rules of GPGPU Programming
    Big-O Considerations and Data Transfers
    CUDA and Amdahl's Law
    Data and Task Parallelism
    Hybrid Execution: Using Both CPU and GPU Resources
    Regression Testing and Accuracy
    Silent Errors
    Introduction to Debugging
    UNIX Debugging
    Windows Debugging with Parallel Nsight
    Summary

CHAPTER 2 CUDA for Machine Learning and Optimization
    Modeling and Simulation
    Machine Learning and Neural Networks
    XOR: An Important Nonlinear Machine-Learning Problem
    Performance Results on XOR
    Performance Discussion
    Summary
    The C++ Nelder-Mead Template

CHAPTER 3 The CUDA Tool Suite: Profiling a PCA/NLPCA Functor
    PCA and NLPCA
    Obtaining Basic Profile Information
    Gprof: A Common UNIX Profiler
    The NVIDIA Visual Profiler: Computeprof
    Parallel Nsight for Microsoft Visual Studio
    Tuning and Analysis Utilities (TAU)
    Summary

CHAPTER 4 The CUDA Execution Model
    GPU Architecture Overview
    Warp Scheduling and TLP
    ILP: Higher Performance at Lower Occupancy
    Little's Law
    CUDA Tools to Identify Limiting Factors
    Summary

CHAPTER 5 CUDA Memory
    The CUDA Memory Hierarchy
    GPU Memory
    L2 Cache
    L1 Cache
    CUDA Memory Types
    Global Memory
    Summary

CHAPTER 6 Efficiently Using GPU Memory
    Reduction
    Utilizing Irregular Data Structures
    Sparse Matrices and the CUSP Library
    Graph Algorithms
    SoA, AoS, and Other Structures
    Tiles and Stencils
    Summary

CHAPTER 7 Techniques to Increase Parallelism
    CUDA Contexts Extend Parallelism
    Streams and Contexts
    Out-of-Order Execution with Multiple Streams
    Tying Data to Computation
    Summary

CHAPTER 8 CUDA for All GPU and CPU Applications
    Pathways from CUDA to Multiple Hardware Backends
    Accessing CUDA from Other Languages
    Libraries
    CUBLAS
    CUFFT
    Summary

CHAPTER 9 Mixing CUDA and Rendering
    OpenGL
    GLUT
    Introduction to the Files in the Framework
    Summary

CHAPTER 10 CUDA in a Cloud and Cluster Environments
    The Message Passing Interface (MPI)
    How MPI Communicates
    Bandwidth
    Balance Ratios
    Considerations for Large MPI Runs
    Cloud Computing
    A Code Example
    Summary

CHAPTER 11 CUDA for Real Problems
    Working with High-Dimensional Data
    PCA/NLPCA
    Force-Directed Graphs
    Monte Carlo Methods
    Molecular Modeling
    Quantum Chemistry
    Interactive Workflows
    A Plethora of Projects
    Summary

CHAPTER 12 Application Focus on Live Streaming Video
    Topics in Machine Vision
    FFmpeg
    TCP Server
    Live Stream Application
    The simpleVBO.cpp File
    The callbacksVBO.cpp File
    Building and Running the Code
    The Future
    Summary
    Listing for simpleVBO.cpp

WORKS CITED

INDEX

Foreword

GPUs have recently burst onto the scientific computing scene as an innovative technology that has demonstrated substantial performance and energy efficiency improvements for numerous scientific applications. These initial applications were often pioneered by early adopters, who went to great effort to make use of GPUs. More recently, the critical question facing this technology is whether it can become pervasive across the multiple, diverse algorithms in scientific computing, and useful to a broad range of users, not only the early adopters. A key barrier to this wider adoption is software development: writing and optimizing massively parallel CUDA code, using new performance and correctness tools, leveraging libraries, and understanding the GPU architecture. Part of this challenge will be solved by experts sharing their knowledge and methodology with other users through books, tutorials, and collaboration. CUDA Application Design and Development is one such book.

In this book, the author provides clear, detailed explanations of implementing important algorithms on GPUs, such as algorithms in quantum chemistry, machine learning, and computer vision. Not only does the book describe the methodologies that underpin GPU programming, but it describes how to recast algorithms to maximize the benefit of GPU architectures. In addition, the book provides many case studies, which are used to explain and reinforce important GPU concepts like CUDA threads, the GPU memory hierarchy, and scalability across multiple GPUs, including an MPI example that demonstrated near-linear scaling to 500 GPUs.

Lastly, no programming language stands alone. Arguably, for any language to be successful, it must be surrounded by an ecosystem of powerful compilers, performance and correctness tools, and optimized libraries. These pragmatic aspects of software development are often the most important factor in developing applications quickly. CUDA Application Design and Development does not disappoint in this area, as it devotes multiple chapters to describing how to use CUDA compilers, debuggers, performance profilers, libraries, and interoperability with other languages. I have enjoyed learning from this book, and I am certain you will also.

Jeffrey S. Vetter
Distinguished Research Staff Member, Oak Ridge National Laboratory;
Professor, Georgia Institute of Technology
20 September 2011

Preface

Timing is so very important in technology, as well as in our academic and professional careers. We are an extraordinarily lucky generation of programmers who have the initial opportunity to capitalize on inexpensive, generally available, massively parallel computing hardware. The impact of GPGPU (General-Purpose Graphics Processing Units) technology spans all aspects of computation, from the smallest cell phones to the largest supercomputers in the world. They are changing the commercial application landscape, scientific computing, cloud computing, computer visualization, games, and robotics and are even redefining how computer programming is taught. Teraflop (trillion floating-point operations per second) computing is now within the economic reach of most people around the world. Teenagers, students, parents, teachers, professionals, small research organizations, and large corporations can easily afford GPGPU hardware, and the software development kits (SDKs) are free.

NVIDIA estimates that more than 300 million of their programmable GPGPU devices have already been sold. Programmed in CUDA (Compute Unified Device Architecture), those third of a billion NVIDIA GPUs present a tremendous market opportunity for commercial applications, and they provide a hardware base with which to redefine what is possible for scientific computing. Most importantly, CUDA and massively parallel GPGPU hardware are changing how we think about computation. No longer limited to performing one or a few operations at a time, CUDA programmers write programs that perform many tens of thousands of operations simultaneously!

This book will teach you how to think in CUDA and harness those tens of thousands of threads of execution to achieve orders-of-magnitude increased performance for your applications, be they commercial, academic, or scientific. Further, this book will explain how to utilize one or more GPGPUs within a single application, whether on a single machine or across a cluster of machines.


In addition, this book will show you how to use CUDA to develop applications that can run on multicore processors, making CUDA a viable choice for all application development. No GPU required!

Not concerned with just syntax and API calls, the material in this book covers the thought behind the design of CUDA, plus the architectural reasons why GPGPU hardware can perform so spectacularly. Various guidelines and caveats will be covered so that you can write concise, readable, and maintainable code. The focus is on the latest CUDA 4.x release.

Working code is provided that can be compiled and modified because playing with and adapting code is an essential part of the learning process. The examples demonstrate how to get high performance from the Fermi architecture (NVIDIA 20-series) of GPGPUs because the intention is not just to get code working but also to show you how to write efficient code. Those with older GPGPUs will benefit from this book, as the examples will compile and run on all CUDA-enabled GPGPUs. Where appropriate, this book will reference text from my extensive Doctor Dobb's Journal series of CUDA tutorials to highlight improvements over previous versions of CUDA and to provide insight on how to achieve good performance across multiple generations of GPGPU architectures.

Teaching materials, additional examples, and reader comments are available on the http://gpucomputing.net wiki. Any of the following URLs will access the wiki:

■ My name: http://gpucomputing.net/RobFarber.
■ The title of this book as one word: http://gpucomputing.net/CUDAapplicationdesignanddevelopment.
■ The name of my series: http://gpucomputing.net/supercomputingforthemasses.

Those who purchase the book can download the source code for the examples at http://booksite.mkp.com/9780123884268. To accomplish these goals, the book is organized as follows:

Chapter 1. Introduces basic CUDA concepts and the tools needed to build and debug CUDA applications. Simple examples are provided that demonstrate both the Thrust C++ and C runtime APIs. Three simple rules for high-performance GPU programming are introduced.

Chapter 2. Using only techniques introduced in Chapter 1, this chapter provides a complete, general-purpose machine-learning and optimization framework that can run 341 times faster than a single core of a conventional processor. Core concepts in machine learning and numerical optimization are also covered, which will be of interest to those who desire the domain knowledge as well as the ability to program GPUs.

Chapter 3. Profiling is the focus of this chapter, as it is an essential skill in high-performance programming. The CUDA profiling tools are introduced and applied to the real-world example from Chapter 2. Some surprising bottlenecks in the Thrust API are uncovered. Introductory data-mining techniques are discussed and data-mining functors for both Principal Components Analysis and Nonlinear Principal Components Analysis are provided, so this chapter should be of interest to users as well as programmers.

Chapter 4. The CUDA execution model is the topic of this chapter. Anyone who wishes to get peak performance from a GPU must understand the concepts covered in this chapter. Examples and profiling output are provided to help understand both what the GPU is doing and how to use the existing tools to see what is happening.

Chapter 5. CUDA provides several types of memory on the GPU. Each type of memory is discussed, along with the advantages and disadvantages.

Chapter 6. With over three orders-of-magnitude in performance difference between the fastest and slowest GPU memory, efficiently using memory on the GPU is the only path to high performance. This chapter discusses techniques and provides profiler output to help you understand and monitor how efficiently your applications use memory. A general functor-based example is provided to teach how to write your own generic methods like the Thrust API.

Chapter 7. GPUs provide multiple forms of parallelism, including multiple GPUs, asynchronous kernel execution, and a Unified Virtual Address (UVA) space. This chapter provides examples and profiler output to understand and utilize all forms of GPU parallelism.

Chapter 8. CUDA has matured to become a viable platform for all application development for both GPU and multicore processors. Pathways to multiple CUDA backends are discussed, and examples and profiler output to effectively run in heterogeneous multi-GPU environments are provided. CUDA libraries and how to interface CUDA and GPU computing with other high-level languages like Python, Java, R, and FORTRAN are covered.

Chapter 9. With the focus on the use of CUDA to accelerate computational tasks, it is easy to forget that GPU technology is also a splendid platform for visualization. This chapter discusses primitive restart and how it can dramatically accelerate visualization and gaming applications. A complete working example is provided that allows the reader to create and fly around in a 3D world. Profiler output is used to demonstrate why primitive restart is so fast. The teaching framework from this chapter is extended to work with live video streams in Chapter 12.

Chapter 10. To teach scalability, as well as performance, the example from Chapter 3 is extended to use MPI (Message Passing Interface). A variant of this example code has demonstrated near-linear scalability to 500 GPGPUs (with a peak of over 500,000 single-precision gigaflops) and delivered over one-third petaflop (10^15 floating-point operations per second) using 60,000 x86 processing cores.

Chapter 11. No book can cover all aspects of the CUDA tidal wave. This is a survey chapter that points the way to other projects that provide free working source code for a variety of techniques, including Support Vector Machines (SVM), Multi-Dimensional Scaling (MDS), mutual information, force-directed graph layout, molecular modeling, and others. Knowledge of these projects—and how to interface with other high-level languages, as discussed in Chapter 8—will help you mature as a CUDA developer.

Chapter 12. A working real-time video streaming example for vision recognition based on the visualization framework in Chapter 9 is provided. All that is needed is an inexpensive webcam or a video file so that you too can work with real-time vision recognition. This example was designed for teaching, so it is easy to modify. Robotics, augmented reality games, and data fusion for heads-up displays are obvious extensions to the working example and technology discussion in this chapter.

Learning to think about and program in CUDA (and GPGPUs) is a wonderful way to have fun and open new opportunities. However, performance is the ultimate reason for using GPGPU technology, and as one of my university professors used to say, "The proof of the pudding is in the tasting."

Figure 1 illustrates the performance of the top 100 applications as reported on the NVIDIA CUDA Showcase (http://developer.nvidia.com/cuda-action-research-apps) as of July 12, 2011. They demonstrate the wide variety of applications that GPGPU technology can accelerate by two or more orders of magnitude (100-times) over multi-core processors, as reported in the peer-reviewed scientific literature and by commercial entities. It is worth taking time to look over these showcased applications, as many of them provide freely downloadable source code and libraries.

GPGPU technology is a disruptive technology that has redefined how computation occurs. As NVIDIA notes, "from super phones to supercomputers." This technology has arrived during a perfect storm of opportunities, as traditional multicore processors can no longer achieve significant speedups through increases in clock rate.

FIGURE 1 Top 100 NVIDIA application showcase speedups. The chart plots the reported speedup against the rank in a list sorted from highest to lowest speedup (minimum 100x, maximum 2,600x, median 1,350x).

The only way manufacturers of traditional processors can entice customers to upgrade to a new computer is to deliver speedups two to four times faster through the parallelism of dual- and quad-core processors. Multicore parallelism is disruptive, as it requires that existing software be rewritten to make use of these extra cores. Come join the cutting edge of software application development and research as the computer and research industries retool to exploit parallel hardware! Learn CUDA and join in this wonderful opportunity.

CHAPTER 1

First Programs and How to Think in CUDA

The purpose of this chapter is to introduce the reader to CUDA (the parallel computing architecture developed by NVIDIA) and differentiate CUDA from programming conventional single and multicore processors. Example programs and instructions will show the reader how to compile and run programs as well as how to adapt them to their own purposes. The CUDA Thrust and runtime APIs (Application Programming Interface) will be used and discussed. Three rules of GPGPU programming will be introduced as well as Amdahl's law, Big-O notation, and the distinction between data-parallel and task-parallel programming. Some basic GPU debugging tools will be introduced, but for the most part NVIDIA has made debugging CUDA code identical to debugging any other C or C++ application. Where appropriate, references to introductory materials will be provided to help novice readers. At the end of this chapter, the reader will be able to write and debug massively parallel programs that concurrently utilize both a GPGPU and the host processor(s) within a single application that can handle a million threads of execution. At the end of the chapter, the reader will have a basic understanding of:

■ How to create, build, and run CUDA applications.
■ Criteria to decide which CUDA API to use.
■ Amdahl's law and how it relates to GPU computing.
■ Three rules of high-performance GPU computing.
■ Big-O notation and the impact of data transfers.
■ The difference between task-parallel and data-parallel programming.
■ Some GPU-specific capabilities of the Linux, Mac, and Windows CUDA debuggers.
■ The CUDA memory checker and how it can find out-of-bounds and misaligned memory errors.


SOURCE CODE AND WIKI

Source code for all the examples in this book can be downloaded from http://booksite.mkp.com/9780123884268. A wiki (a website collaboratively developed by a community of users) is available to share information, make comments, and find teaching material; it can be reached at any of the following aliases on gpucomputing.net:

■ My name: http://gpucomputing.net/RobFarber.
■ The title of this book as one word: http://gpucomputing.net/CUDAapplicationdesignanddevelopment.
■ The name of my series: http://gpucomputing.net/supercomputingforthemasses.

DISTINGUISHING CUDA FROM CONVENTIONAL PROGRAMMING WITH A SIMPLE EXAMPLE

Programming a sequential processor requires writing a program that specifies each of the tasks needed to compute some result. See Example 1.1, "seqSerial.cpp, a sequential C++ program":

Example 1.1
//seqSerial.cpp
#include <iostream>
#include <vector>
using namespace std;

int main()
{
  const int N=50000;

  // task 1: create the array
  vector<int> a(N);

  // task 2: fill the array
  for(int i=0; i < N; i++) a[i]=i;

  // task 3: calculate the sum of the array
  int sumA=0;
  for(int i=0; i < N; i++) sumA += a[i];

  // task 4: calculate the sum of 0 .. N-1
  int sumCheck=0;
  for(int i=0; i < N; i++) sumCheck += i;

  // task 5: check the results agree
  if(sumA == sumCheck) cout << "Test Succeeded!" << endl;
  else {cerr << "Test FAILED!" << endl; return(1);}

  return(0);
}

Example 1.1 performs five tasks:
1. It creates an integer array.
2. A for loop fills the array a with integers from 0 to N−1.
3. The sum of the integers in the array is computed.
4. A separate for loop computes the sum of the integers by an alternate method.
5. A comparison checks that the sequential and parallel results are the same and reports the success of the test.

Notice that the processor runs each task consecutively one after the other. Inside of tasks 2–4, the processor iterates through the loop starting with the first index. Once all the tasks have finished, the program exits. This is an example of a single thread of execution, which is illustrated in Figure 1.1 for task 2 as a single thread fills the first three elements of array a. This program can be compiled and executed with the following commands:

■ Linux and Cygwin users (Example 1.2, “Compiling with g++”):

Example 1.2
g++ seqSerial.cpp -o seqSerial
./seqSerial

■ Utilizing the command-line interface for Microsoft Visual Studio users (Example 1.3, “Compiling with the Visual Studio Command-Line Interface”):

Example 1.3
cl.exe seqSerial.cpp -o seqSerial.exe
seqSerial.exe

FIGURE 1.1 A single thread of execution: one thread fills array a element by element, so at time 0 only element 0 is set, at time 1 elements 0–1, and at time 2 elements 0–2.

■ Of course, all CUDA users (Linux, Windows, MacOS, Cygwin) can utilize the NVIDIA nvcc compiler regardless of platform (Example 1.4, “Compiling with nvcc”):

Example 1.4
nvcc seqSerial.cpp -o seqSerial
./seqSerial

In all cases, the program will print "Test Succeeded!" For comparison, let's create and run our first CUDA program, seqCuda.cu, in C++. (Note: CUDA supports both C and C++ programs. For simplicity, the following example was written in C++ using the Thrust data-parallel API, as will be discussed in greater depth in this chapter.) CUDA programs utilize the file extension suffix ".cu" to indicate CUDA source code. See Example 1.5, "A Massively Parallel CUDA Code Using the Thrust API":

Example 1.5
//seqCuda.cu
#include <iostream>
using namespace std;

#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

int main()
{
  const int N=50000;

  // task 1: create the array
  thrust::device_vector<int> a(N);

  // task 2: fill the array
  thrust::sequence(a.begin(), a.end(), 0);

  // task 3: calculate the sum of the array
  int sumA= thrust::reduce(a.begin(),a.end(), 0);

  // task 4: calculate the sum of 0 .. N-1
  int sumCheck=0;
  for(int i=0; i < N; i++) sumCheck += i;

  // task 5: check the results agree
  if(sumA == sumCheck) cout << "Test Succeeded!" << endl;
  else { cerr << "Test FAILED!" << endl; return(1);}

  return(0);
}

FIGURE 1.2 Parallel threads of execution: at time 0, many threads fill the elements of array a concurrently.

Example 1.5 is compiled with the NVIDIA nvcc compiler under Windows, Linux, and MacOS. If nvcc is not available on your system, download and install the free CUDA tools, driver, and SDK (Software Development Kit) from the NVIDIA CUDA Zone (http://developer.nvidia.com). See Example 1.6, "Compiling and Running the Example":

Example 1.6
nvcc seqCuda.cu -o seqCuda
./seqCuda

Again, running the program will print "Test Succeeded!" Congratulations: you just created a CUDA application that uses 50,000 software threads of execution and ran it on a GPU! (The actual number of threads that run concurrently on the hardware depends on the capabilities of the GPGPU in your system.) Aside from a few calls to the CUDA Thrust API (prefaced by thrust:: in this example), the CUDA code looks almost identical to the sequential C++ code. The highlighted lines in the example perform parallel operations. Unlike the single-threaded execution illustrated in Figure 1.1, the code in Example 1.5 utilizes many threads to perform a large number of concurrent operations, as is illustrated in Figure 1.2 for task 2 when filling array a.

CHOOSING A CUDA API

CUDA offers several APIs to use when programming. They are, from highest to lowest level:
1. The data-parallel C++ Thrust API
2. The runtime API, which can be used in either C or C++
3. The driver API, which can be used with either C or C++

Regardless of the API or mix of APIs used in an application, CUDA can be called from other high-level languages such as Python, Java, FORTRAN, and many others. The calling conventions and details necessary to correctly link vary with each language.

Which API to use depends on the amount of control the developer wishes to exert over the GPU. Higher-level APIs like the C++ Thrust API are convenient, as they do more for the programmer, but they also make some decisions on behalf of the programmer. In general, Thrust has been shown to deliver high computational performance, generality, and convenience. It also makes code development quicker and can produce easier to read source code that many will argue is more maintainable. Without modification, programs written in Thrust will most certainly maintain or show improved performance as Thrust matures in future releases. Many Thrust methods like reduction perform significant work, which gives the Thrust API developers much freedom to incorporate features in the latest hardware that can improve performance. Thrust is an example of a well-designed API that is simple yet general and that has the ability to be adapted to improve performance as the technology evolves.

A disadvantage of a high-level API like Thrust is that it can isolate the developer from the hardware and expose only a subset of the hardware capabilities. In some circumstances, the C++ interface can become too cumbersome or verbose. Scientific programmers in particular may feel that the clarity of simple loop structures can get lost in the C++ syntax.

Use a high-level interface first and choose to drop down to a lower-level API when you think the additional programming effort will deliver greater performance or to make use of some lower-level capability needed to better support your application. The CUDA runtime in particular was designed to give the developer access to all the programmable features of the GPGPU with a few simple yet elegant and powerful syntactic additions to the C-language. As a result, CUDA runtime code can sometimes be the cleanest and easiest API to read; plus, it can be extremely efficient. An important aspect of the lowest-level driver interface is that it can provide very precise control over both queuing and data transfers.

Expect code size to increase when using the lower-level interfaces, as the developer must make more API calls and/or specify more parameters for each call. In addition, the developer needs to check for runtime errors and version incompatibilities. In many cases when using low-level APIs, it is not unusual for more lines of the application code to be focused on the details of the API interface than on the actual work of the task.

Happily, modern CUDA developers are not restricted to use just a single API in an application, which was not the case prior to the CUDA 3.2 release in 2010. Modern versions of CUDA allow developers to use any of the three APIs in their applications whenever they choose. Thus, an initial code can be written in a high-level API such as Thrust and then refactored to use some special characteristic of the runtime or driver API.

Let's use this ability to mix various levels of API calls to highlight and make more explicit the parallel nature of the sequential fill task (task 2) from our previous examples. Example 1.7, "Using the CUDA Runtime to Fill an Array with Sequential Integers," also gives us a chance to introduce the CUDA runtime API:

Example 1.7
//seqRuntime.cu
#include <iostream>
using namespace std;

#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

__global__ void fillKernel(int *a, int n)
{
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  if (tid < n) a[tid] = tid;
}

void fill(int* d_a, int n)
{
  int nThreadsPerBlock= 512;
  int nBlocks= n/nThreadsPerBlock + ((n%nThreadsPerBlock)?1:0);

  fillKernel <<< nBlocks, nThreadsPerBlock >>> (d_a, n);
}

int main()
{
  const int N=50000;

  // task 1: create the array
  thrust::device_vector<int> a(N);

  // task 2: fill the array using the runtime
  fill(thrust::raw_pointer_cast(&a[0]),N);

  // task 3: calculate the sum of the array
  int sumA= thrust::reduce(a.begin(),a.end(), 0);

  // task 4: calculate the sum of 0 .. N-1
  int sumCheck=0;
  for(int i=0; i < N; i++) sumCheck += i;

  // task 5: check the results agree
  if(sumA == sumCheck) cout << "Test Succeeded!" << endl;
  else { cerr << "Test FAILED!" << endl; return(1);}

  return(0);
}

The modified sections of the code are in bold. To minimize changes to the structure of main(), the call to thrust::sequence() is replaced by a call to a routine fill(), which is written using the runtime API. Because array a was allocated with thrust::device_vector<>(), a call to thrust::raw_pointer_cast() is required to get the actual location of the data on the GPU. The fill() subroutine uses a C-language calling convention (e.g., passing an int* and the length of the vector) to emphasize that CUDA is accessible using both C and C++. Astute readers will note that a better C++ programming practice would be to pass a reference to the Thrust device vector for a number of reasons, including: better type checking, as fill() could mistakenly be passed a pointer to an array in host memory; the number of elements in the vector a can be safely determined with a.size() to prevent the parameter n from being incorrectly specified; and a number of other reasons.
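As an illustration of that alternative, the wrapper below passes the device vector by reference. This is a sketch only, not code from the book; the helper name fillVector is hypothetical and it simply forwards to the same fillKernel() shown in Example 1.7:

void fillVector(thrust::device_vector<int> &a)
{
  int nThreadsPerBlock = 512;
  int n = a.size();                                   // length comes from the vector itself
  int nBlocks = (n + nThreadsPerBlock - 1)/nThreadsPerBlock;

  // raw_pointer_cast() is still needed to hand the device address to the kernel
  fillKernel <<< nBlocks, nThreadsPerBlock >>> (thrust::raw_pointer_cast(&a[0]), n);
}

Passing the vector by reference rules out accidentally handing the routine a host pointer and removes the separate length parameter entirely.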

SOME BASIC CUDA CONCEPTS

Before discussing the runtime version of the fill() subroutine, it is important to understand some basic CUDA concepts:

■ CUDA-enabled GPUs are separate devices that are installed in a host computer. In most cases, GPGPUs connect to a host system via a high-speed interface like the PCIe (Peripheral Component Interconnect Express) bus. Two to four GPGPUs can be added to most workstations or cluster nodes. How many depends on the host system capabilities such as the number of PCIe slots plus the available space, power, and cooling within the box. Each GPGPU is a separate device that runs asynchronously to the host processor(s), which means that the host processor and all the GPGPUs can be busily performing calculations at the same time. The PCIe bus is used to transfer both data and commands between the devices. CUDA provides various data transfer mechanisms that include: ■ Explicit data transfers with cudaMemcpy() (the most common runtime data transfer method). Those who use Thrust can perform an assignment to move data among vectors (see Example 1.8, “Code Snippet Illustrating How to Move Data with Thrust”):

Example 1.8
//Use thrust to move data from host to device
d_a = h_a;

//or from device to host
h_a = d_a;

■ Implicit transfers with mapped, pinned memory. This interface keeps a region of host memory synchronized with a region of GPU memory and does not require programmer intervention. For example, an application can load a data set on the host, map the memory to the GPU, and then use the data on the GPU as if it had been explicitly initialized with a copy operation. The use of mapped, pinned memory can increase the efficiency of a program because the transfers are asynchronous. Some low-power GPGPUs utilize host memory to save cost and power. On these devices, using mapped, pinned memory results in a zero-copy operation, as the GPU will access the data without any copy operation. (A minimal sketch of this mechanism appears at the end of this list.) At the very lowest level, the host and GPGPU hardware interact through a software component called a device driver. The device driver manages the hardware interface so the GPU and host system can interact to communicate, calculate, and display information. It also supports many operations, including mapped memory, buffering, queuing, and synchronization. The components of the CUDA software stack are illustrated in Figure 1.3.

FIGURE 1.3 Components of the CUDA software stack on the host: application, CUDA libraries, CUDA Thrust API, CUDA runtime API, CUDA driver API, and the device driver.

■ GPGPUs run in a memory space separate from the host processor. Except for a few low-end devices, all GPGPUs have their own physical memory (e.g., RAM) that has been designed to deliver significantly higher memory bandwidth than traditional host memory. Many current


GPGPU memory systems deliver approximately 160–200 GB/s (gigabytes or billions of bytes per second) as compared to traditional host memory systems that can deliver 8–20 GB/s. Although CUDA 4.0 provides Unified Virtual Addressing (UVA), which joins the host and GPGPU memory into a single unified address space, do not forget that accessing memory on another device will require some transfer across the bus—even between GPGPUs. However, UVA is important because it provides software running on any device the ability to access the data on another device using just a pointer. ■ CUDA programs utilize kernels, which are subroutines callable from the host that execute on the CUDA device. It is important to note that kernels are not functions, as they cannot return a value. Most applications that perform well on GPGPUs spend most of their time in one or a few computational routines. Turning these routines into CUDA kernels is a wonderful way to accelerate the application and utilize the computational power of a GPGPU. A kernel is defined with the __global__ declaration specifier, which tells the compiler that the kernel is callable by the host processor. ■ Kernel calls are asynchronous, meaning that the host queues a kernel for execution only on the GPGPU and does not wait for it to finish but rather continues on to perform some other work. At some later time, the kernel actually runs on the GPU. Due to this asynchronous calling mechanism, CUDA kernels cannot return a function value. For efficiency, a pipeline can be created by queuing a number of kernels to keep the GPGPU busy for as long as possible. Further, some form of synchronization is required so that the host can determine when the kernel or pipeline has completed. Two commonly used synchronization mechanisms are: ■ Explicitly calling cudaThreadSynchronize(), which acts as a barrier causing the host to stop and wait for all queued kernels to complete. ■ Performing a blocking data transfer with cudaMemcpy() as cudaThreadSynchronize() is called inside cudaMemcpy(). ■ The basic unit of work on the GPU is a thread. It is important to understand from a software point of view that each thread is separate from every other thread. Every thread acts as if it has its own processor with separate registers and identity (e.g., location in a computational grid) that happens to run in a shared memory environment. The hardware defines the number of threads that are able to run concurrently. The onboard GPU hardware thread scheduler decides when a group of threads can run and has the ability to switch between threads so quickly that from a software point of view, thread switching and scheduling happen for free. Some simple yet elegant additions to the C language allow threads to Understanding Our First Runtime Kernel 11

communicate through the CUDA shared memory spaces and via atomic memory operations. A kernel should utilize many threads to perform the work defined in the kernel source code. This utilization is called thread-level parallelism, which is different than instruction-level parallelism, where parallelization occurs over processor instructions. Figure 1.2 illustrates the use of many threads in a parallel fill as opposed to the single-threaded fill operation illustrated in Figure 1.1. An execution configuration defines both the number of threads that will run the kernel plus their arrangement in a 1D, 2D, or 3D computational grid. An execution configuration encloses the configuration information between triple angle brackets "<<<" and ">>>" that follow after the name of the kernel and before the parameter list enclosed between the parentheses. Aside from the execution configuration, queuing a kernel looks very similar to a subroutine call. Example 1.7 demonstrates this syntax with the call to fillKernel() in subroutine fill().
■ The largest shared region of memory on the GPU is called global memory. Measured in gigabytes of RAM, most application data resides in global memory. Global memory is subject to coalescing rules that combine multiple memory transactions into single large load or store operations that can attain the highest transfer rate to and from memory. In general, best performance occurs when memory accesses can be coalesced into 128 consecutive byte chunks. Other forms of onboard GPU programmer-accessible memory include constant, cache, shared, local, texture, and register memory, as discussed in Chapter 5. The latency in accessing global memory can be high, up to 600 times slower than accessing a register variable. CUDA programmers should note that although the bandwidth of global memory seems high, around 160–200 GB/s, it is slow compared to the teraflop performance capability that a GPU can deliver. For this reason, data reuse within the GPU is essential to achieving high performance.
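As promised in the list above, here is a minimal sketch of the mapped, pinned (zero-copy) memory mechanism. It is illustrative only and not code from the book; error checking is omitted and the variable names are arbitrary:

#include <cuda_runtime.h>

int main()
{
  const int N=50000;

  cudaSetDeviceFlags(cudaDeviceMapHost);      // enable mapped memory before any other CUDA work

  int *h_a, *d_a;
  cudaHostAlloc((void**)&h_a, N*sizeof(int), cudaHostAllocMapped); // pinned + mapped host memory
  cudaHostGetDevicePointer((void**)&d_a, h_a, 0);                  // device-side alias of h_a

  for(int i=0; i < N; i++) h_a[i] = i;        // fill on the host ...
  // ... kernels can now read and write d_a without any explicit cudaMemcpy()

  cudaFreeHost(h_a);
  return(0);
}

On devices that use host memory as their global memory this is a true zero-copy path; on discrete GPUs the data still crosses the PCIe bus, but the transfers happen asynchronously as described above.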

UNDERSTANDING OUR FIRST RUNTIME KERNEL

You now have the basic concepts needed to understand our first kernel. From the programmer's point of view, execution starts on the host processor in main() of Example 1.7. The constant integer N is initialized and the Thrust method device_vector is used to allocate N integers on the GPGPU device. Execution then proceeds sequentially to the fill() subroutine.

A simple calculation is performed to define the number of blocks, nBlocks, based on the 512 threads per block defined by nThreadsPerBlock. The idea is to provide enough threads at a granularity of nThreadsPerBlock to dedicate one thread to each location in the integer array d_a. By convention, device variables are frequently noted by a preceding "d_" before the variable name and host variables are denoted with an "h_" before the variable name. The CUDA kernel, fillKernel(), is then queued for execution on the GPGPU using the nBlocks and nThreadsPerBlock execution configuration. Both d_a and n are passed as parameters to the kernel. Host execution then proceeds to the return from the fill() subroutine. Note that fillKernel() must be preceded by the __global__ declaration specifier or a compilation error will occur. At this point, two things are now happening:

1. The CUDA driver is informed that there is work for the device in a queue. Within a few microseconds (μsec or millionths of a second), the driver loads the executable on the GPGPU, defines the execution grid, passes the parameters, and—because the GPGPU does not have any other work to do—starts the kernel.
2. Meanwhile, the host processor continues its sequential execution to call thrust::reduce() to perform task 3. The host performs the work defined by the Thrust templates and queues the GPU operations. Because reduce() is a blocking operation, the host has to wait for a result from the GPGPU before it can proceed.

Once fillKernel() starts running on the GPU, each thread first calculates its particular thread ID (called tid in the code) based on the grid defined by the programmer. This example uses a simple 1D grid, so tid is calculated using three constant variables that are specific to each kernel and defined by the programmer via the execution configuration:

■ blockIdx.x: This is the index of the block that the thread happens to be part of in the grid specified by the programmer. Because this is a 1D grid, only the x component is used; the y and z components are ignored.
■ blockDim.x: The dimension or number of threads in each block.
■ threadIdx.x: The index within the block where this thread happens to be located.

The value of tid is then checked to see if it is less than n, as there might be more threads than elements in the a array. If tid contains a valid index into the a array, then the element at index tid is set to the value of tid. If not, the thread waits at the end of the kernel until all the threads complete. This example was chosen to make it easy to see how the grid locations are converted into array indices.
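To make the arithmetic concrete, plugging the numbers from Example 1.7 into the execution configuration shows why the (tid < n) test matters. The following small host-side check is illustrative only and is not code from the book:

#include <cstdio>

int main()
{
  const int n = 50000, nThreadsPerBlock = 512;               // values from Example 1.7
  int nBlocks = n/nThreadsPerBlock + ((n%nThreadsPerBlock) ? 1 : 0);
  printf("nBlocks=%d threads launched=%d excess threads=%d\n",
         nBlocks, nBlocks*nThreadsPerBlock, nBlocks*nThreadsPerBlock - n);
  // prints: nBlocks=98 threads launched=50176 excess threads=176
  // the 176 excess threads are exactly why fillKernel() tests (tid < n)
  return(0);
}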

After all the threads finish, fillKernel() completes and returns control to the device driver so that it can start the next queued task. In this example, the GPU computes the reduction (the code of which is surprisingly complicated, as discussed in Chapter 6), and the sum is returned to the host so that the application can finish sequentially processing tasks 4 and 5. From the preceding discussion, we can see that a CUDA kernel can be thought of in simple terms as a parallel form of the code snippet in Example 1.9, "A Sequential Illustration of a Parallel CUDA Call":

Example 1.9
// setup blockIdx, blockDim, and threadIdx based on the execution
// configuration
for(int i=0; i < (nBlocks * nThreadsPerBlock); i++)
  fillKernel(d_a, n);

Though Thrust relies on the runtime API, careful use of C++ templates allows a C++ functor (or function object, which is a C++ object that can be called as if it were a function) to be translated into a runtime kernel. It also defines the execution configuration and creates a kernel call for the programmer. This explains why Example 1.5, which was written entirely in Thrust, does not require specification of an execution configuration or any of the other details required by the runtime API.
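To see what such a functor looks like in practice, here is a minimal sketch (illustrative only; not code from the book) in which Thrust turns a small function object into a GPU kernel and picks the execution configuration itself:

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

// a functor: __host__ __device__ lets Thrust run it on either processor
struct scaleBy2 {
  __host__ __device__ int operator()(int x) const { return 2 * x; }
};

int main()
{
  thrust::device_vector<int> a(8);
  thrust::sequence(a.begin(), a.end(), 0);                      // 0,1,2,...,7
  thrust::transform(a.begin(), a.end(), a.begin(), scaleBy2()); // runs as a GPU kernel
  return(0);
}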

THREE RULES OF GPGPU PROGRAMMING

Observation has shown that there are three general rules to creating high-performance GPGPU programs:
1. Get the data on the GPGPU and keep it there.
2. Give the GPGPU enough work to do.
3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations.

These rules make sense, given the bandwidth and latency limitations of the PCIe bus and GPGPU memory system as discussed in the following subsections.

Rule 1: Get the Data on the GPU and Keep It There

GPGPUs are separate devices that are plugged into the PCI Express bus of the host computer. The PCIe bus is very slow compared to the GPGPU memory system, as can be seen by the 20-times difference highlighted in Table 1.1.

Table 1.1 PCIe vs. GPU Global Memory Bandwidth

                              Bandwidth (GB/s)   Speedup over PCIe Bus
PCIe x16 v2.0 bus (one-way)   8                  1x
GPU global memory             160 to 200         20x to 25x

Rule 2: Give the GPGPU Enough Work to Do

The adage "watch what you ask for because you might get it" applies to GPGPU performance. Because CUDA-enabled GPUs can deliver teraflop performance, they are fast enough to complete small problems faster than the host processor can start kernels. To get a sense of the numbers, let's assume this overhead is 4 μsec for a 1 teraflop GPU that takes 4 cycles to perform a floating-point operation. To keep this GPGPU busy, each kernel must perform roughly 1 million floating-point operations to avoid wasting cycles due to the kernel startup latency. If the kernel takes only 2 μsec to complete, then 50 percent of the GPU cycles will be wasted. One caveat is that Fermi GPUs can run multiple small kernels at the same time on a single GPU.

Rule 3: Focus on Data Reuse within the GPGPU to Avoid Memory Bandwidth Limitations

All high-performance CUDA applications exploit internal resources on the GPU (such as registers, shared memory, and so on, discussed in Chapter 5) to bypass global memory bottlenecks. For example, multiplying a vector by a scale factor in global memory and assigning the result to a second vector also in global memory will be slow, as shown in Example 1.10, "A Simple Vector Example":

Example 1.10
for(int i=0; i < N; i++) c[i] = scaleFactor * b[i];

Assuming the vectors require 4 bytes to store a single-precision 32-bit floating-point value in each element, then the memory subsystem of a teraflop capable computer would need to provide at least 8 terabytes per second of memory bandwidth to run at peak performance—assuming the constant scale factor gets loaded into the GPGPU cache. Roughly speaking, such bandwidth is 40 to 50 times the capability of current GPU memory subsystems and around 400 times the bandwidth of a 20-GB/s commodity processor. Double-precision vectors that require 8 bytes of storage per vector element will

double the bandwidth requirement. This example should make it clear that CUDA programmers must reuse as much data as possible to achieve high performance. Please note that data reuse is also important to attaining high performance on conventional processors as well.
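As a quick back-of-the-envelope check of those numbers (the values are illustrative, not from the book), note that each single-precision multiply in Example 1.10 reads 4 bytes and writes 4 bytes of global memory:

#include <cstdio>

int main()
{
  const double flops = 1.0e12;          // assume a 1 teraflop computation rate
  const double bytesPerOp = 8.0;        // read b[i] (4 bytes) + write c[i] (4 bytes)
  const double gpuBandwidth = 200.0e9;  // roughly 200 GB/s of global memory bandwidth

  double needed = flops * bytesPerOp;   // bytes per second required to stay busy
  printf("needed %.0f TB/s, available %.1f TB/s, shortfall %.0fx\n",
         needed/1.0e12, gpuBandwidth/1.0e12, needed/gpuBandwidth);
  // prints: needed 8 TB/s, available 0.2 TB/s, shortfall 40x
  return(0);
}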

BIG-O CONSIDERATIONS AND DATA TRANSFERS

Big-O notation is a convenient way to describe how the size of the problem affects the consumption by an algorithm of some resource such as processor time or memory as a function of its input. In this way, computer scientists can describe the worst case (or, when specified, the average case behavior) as a function to compare algorithms regardless of architecture or clock rate. Some common growth rates are:

■ O(1): These are constant time (or space) algorithms that always consume the same resources regardless of the size of the input set. Indexing a single element in a Thrust host vector does not vary in time with the size of the data set and thus exhibits O(1) runtime growth.
■ O(N): Resource consumption with these algorithms grows linearly with the size of the input. This is a common runtime for algorithms that loop over a data set. However, the work inside the loop must require constant time.
■ O(N^2): Performance is directly proportional to the square of the size of the input data set. Algorithms that use nested loops over an input data set exhibit O(N^2) runtime. Deeper nested iterations commonly show greater runtime (e.g., three nested loops result in O(N^3), O(N^4) when four loops are nested, and so on).

There are many excellent texts on algorithm analysis that provide a more precise and comprehensive description of big-O notation. One popular text is Introduction to Algorithms (by Cormen, Leiserson, Rivest, and Stein; The MIT Press, 2009). There are also numerous resources on the Internet that discuss and teach big-O notation and algorithm analysis.

Most computationally oriented scientists and programmers are familiar with the BLAS (the Basic Linear Algebra Subprograms) library. BLAS is the de facto programming interface for basic linear algebra. NVIDIA provides a GPGPU version of BLAS with their CUBLAS library. In fact, GPGPU computing is creating a resurgence of interest in new high-performance math libraries, such as the MAGMA project at the University of Tennessee Innovative Computing Laboratory (http://icl.cs.utk.edu/magma/). MAGMA in particular utilizes both the host and a GPU device to attain high performance on matrix operations. It is available for free download.

BLAS is structured according to three different levels with increasing data and runtime requirements:

■ Level-1: Vector-vector operations that require O(N) data and O(N) work. Examples include taking the inner product of two vectors or scaling a vector by a constant multiplier.
■ Level-2: Matrix-vector operations that require O(N^2) data and O(N^2) work. Examples include matrix-vector multiplication or a single right-hand-side triangular solve.
■ Level-3: Matrix-matrix operations that require O(N^2) data and O(N^3) work. Examples include dense matrix-matrix multiplication.

Table 1.2 illustrates the amount of work that is performed by each BLAS level, assuming that N floating-point values are transferred from the host to the GPU. It does not take into account the time required to transfer the data back to the host.

Table 1.2 tells us that level-3 BLAS operations should run efficiently on graphics processors because they perform O(N) work for every floating-point value transferred to the GPU. The same work-per-datum analysis applies to non-BLAS-related computational problems as well.

To illustrate the cost of a level-1 BLAS operation, consider the overhead involved in moving data across the PCIe bus to calculate cublasSscal() on the GPU and then return the vector to the host. The BLAS Sscal() method scales a vector by a constant value. CUBLAS includes a thunking interface for FORTRAN compatibility. It works by transferring data from the host to the GPU, performing the calculation, and then transferring the data back to the host from the GPU. Thunking is inefficient, as it requires moving 4N bytes of data (where N is the number of floats in the vector) twice across the PCIe bus to perform N multiplications, as shown in Example 1.10.

Table 1.2 Work per Datum by BLAS Level

BLAS Level    Data       Work       Work per Datum
1             O(N)       O(N)       O(1)
2             O(N^2)     O(N^2)     O(1)
3             O(N^2)     O(N^3)     O(N)

The best possible performance would be the transfer bandwidth divided by 8 (to account for two transfers of 4N bytes of data each way), as the time to perform the multiplication would be tiny compared to the data transfers. Such an application might achieve 1 Gflop (1 billion floating-point operations per second) floating-point performance assuming an 8-GB/s transfer rate between the host and GPU. (Asynchronous data transfers can improve performance because the PCIe bus is full duplex, meaning that data can be transferred both to and from the host at the same time; at best, full-duplex asynchronous PCIe transfers would double the performance to two Gflop.) Modest laptop processors and even some cell phones can exceed this calculation rate.

This analysis applies to all programs, not just level-1 and level-2 BLAS calculations. Getting good performance requires keeping as much data as possible on the GPU. After that, attaining high performance requires performing as many calculations per datum as possible, like the level-3 BLAS operations. Creating a pipeline of many lower arithmetic density computations can help, but this will only increase performance when each operation can keep the GPU busy long enough to overcome the kernel startup latency. Alternatively, it is possible to increase performance—sometimes significantly—by combining multiple low-density operations like BLAS level-1 and level-2 operations into a single functor or kernel.
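For comparison with the thunking interface, the following minimal sketch (illustrative only, not code from the book) uses the cublas_v2 interface to keep the vector on the GPU while CUBLAS scales it, so the data crosses the PCIe bus only once in each direction:

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <vector>

int main()
{
  const int N = 50000;
  const float alpha = 2.0f;
  std::vector<float> h_x(N, 1.0f);

  float *d_x;
  cudaMalloc(&d_x, N*sizeof(float));
  cudaMemcpy(d_x, h_x.data(), N*sizeof(float), cudaMemcpyHostToDevice); // one transfer in

  cublasHandle_t handle;
  cublasCreate(&handle);
  cublasSscal(handle, N, &alpha, d_x, 1);   // x = alpha * x, computed entirely on the GPU
  // ... further GPU work can reuse d_x here without touching the PCIe bus ...
  cublasDestroy(handle);

  cudaMemcpy(h_x.data(), d_x, N*sizeof(float), cudaMemcpyDeviceToHost); // one transfer out
  cudaFree(d_x);
  return(0);
}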

CUDA AND AMDAHL'S LAW

Amdahl's law is named after computer architect Gene Amdahl. It is not really a law but rather an approximation that models the ideal speedup that can happen when serial programs are modified to run in parallel. For this approximation to be valid, it is necessary for the problem size to remain the same when parallelized. In other words, assume that a serial program is modified to run in parallel. Further, assume that the amount of work performed by the program does not change significantly in the parallel version of the code, which is not always true. Obviously, those portions of the application that do not parallelize will not run any faster, and the parallel sections can run much, much faster depending on the hardware capabilities. Thus, the expected speedup of the parallel code over the serial code when using n processors is dictated by the proportion of a program that can be made parallel, P, and the portion that cannot be parallelized, (1 − P). This relationship is shown in Equation 1.1, "Amdahl's law":

    S(n) = 1 / ((1 − P) + P/n)                                    (1.1)

Amdahl's law tells us that inventive CUDA developers have two concerns in parallelizing an application:
1. Express the parallel sections of code so that they run as fast as possible. Ideally, they should run N times faster when using N processors.
2. Utilize whatever techniques or inventiveness they have to minimize the (1 − P) serial time.
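As a quick numeric illustration of Equation 1.1 (the values are illustrative, not from the book), the following small program evaluates the speedup for a few combinations of P and n and shows how the serial fraction caps the achievable speedup no matter how many processors are used:

#include <cstdio>

// Equation 1.1: ideal speedup for parallel fraction P on n processors
double amdahlSpeedup(double P, double n) { return 1.0 / ((1.0 - P) + P/n); }

int main()
{
  printf("P=0.90, n=16   -> %.1fx\n", amdahlSpeedup(0.90, 16));   // ~6.4x
  printf("P=0.90, n=1024 -> %.1fx\n", amdahlSpeedup(0.90, 1024)); // ~9.9x
  printf("P=0.99, n=1024 -> %.1fx\n", amdahlSpeedup(0.99, 1024)); // ~91x
  return(0);
}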


Part of the beauty of CUDA lies in the natural way it encapsulates the parallelism of a program inside computational kernels. Applications that run well on GPGPU hardware tend to have a few computationally intensive sections of the code (e.g., hotspots) that consume most of the runtime. According to Amdahl's law, these are programs with kernels that can deliver good parallel speedups (P >> 1 − P). Observation has also shown that refactoring an application to use CUDA also tends to speed up performance on moderately parallel hardware such as multicore processors because CUDA allows developers to better see parallelism so that they can restructure the code to reduce the time spent in the serial (1 − P) sections. Of course, this observation assumes that the overhead consumed in transferring data to and from the GPGPU does not significantly affect application runtime.

DATA AND TASK PARALLELISM

The examples in this chapter have thus far demonstrated data parallelism or loop-level parallelism that parallelized data operations inside the for loops. Task parallelism is another form of parallelization that reduces the (1 − P) serial time by having multiple tasks executing concurrently. CUDA naturally supports task parallelism by running concurrent tasks on the host and GPU. As will be discussed in Chapter 7, CUDA provides other ways to exploit task parallelism within a GPU or across multiple GPUs.

Even the simple GPU examples, like Examples 1.5 and 1.7, can be modified to demonstrate task parallelism. These examples performed the following tasks:
1. Create an array.
2. Fill the array.
3. Calculate the sum of the array.
4. Calculate the sum of 0 .. N−1.
5. Check that the host and device results agree.

The representative timeline in Figure 1.4 shows that the previous examples do not take advantage of the asynchronous kernel execution when task 2 is queued to run on the GPU. However, there is no reason why task 4 cannot be started after task 2 queues the kernel to run on the GPU. In other words, there is no dependency that

FIGURE 1.4 Sequential timeline. (Host: task 1 create array, task 4 host sum of integers, task 5 check; GPU: task 2 fill array, task 3 reduce.)

FIGURE 1.5 Asynchronous timeline. (Task 4 on the host overlaps with tasks 2 and 3 on the GPU.)

In other words, task 5 depends on the results from task 4. Similarly, task 2 must run sometime after task 1, which allocates the array on the GPU. Switching tasks 3 and 4 allows both the host and GPU to run in parallel, as shown in Figure 1.5, which demonstrates that it is possible to leverage CUDA asynchronous kernel execution to exploit both task and data parallelism in even this simple example. The benefit is a further reduction in application runtime.

HYBRID EXECUTION: USING BOTH CPU AND GPU RESOURCES
The following example demonstrates a hybrid application that runs simultaneously on both the CPU and GPU. Modern multicore processors are also parallel hardware that supports both data parallel and task parallel execution. OpenMP (Open Multi-Processing) is an easy way to utilize multithreaded execution on the host processor. The NVIDIA nvcc compiler supports OpenMP. Though this book is not about the OpenMP API, CUDA programmers need to know that they can exploit this host processor parallelism as well as the massive parallelism of the GPGPU. After all, the goal is to deliver the fastest application performance. It is also important to make benchmark comparisons between CPUs and GPUs as fair as possible by optimizing the application to achieve the highest performance on both systems. Example 1.11, "An Asynchronous CPU/GPU Source Code," is the source code that switches the order of execution between tasks 3 and 4. The OpenMP parallel for loop pragma was used in task 4 to exploit data parallelism on the host processor:

Example 1.11
//seqAsync.cu
#include <iostream>
using namespace std;

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>

int main()
{
  const int N=50000;

  // task 1: create the array
  thrust::device_vector<int> a(N);

  // task 2: fill the array
  thrust::sequence(a.begin(), a.end(), 0);

  // task 4: calculate the sum of 0 .. N-1
  int sumCheck=0;
#pragma omp parallel for reduction(+ : sumCheck)
  for(int i=0; i < N; i++) sumCheck += i;

  // task 3: calculate the sum of the array
  int sumA = thrust::reduce(a.begin(), a.end(), 0);

  // task 5: check the results agree
  if(sumA == sumCheck) cout << "Test Succeeded!" << endl;
  else { cerr << "Test FAILED!" << endl; return(1); }

  return(0);
}

To compile with OpenMP, add "-Xcompiler -fopenmp" to the nvcc command line, as shown in Example 1.12, which compiles Example 1.11. Because we are timing the results, the "-O3" optimization flag was also utilized.

Example 1.12
nvcc -O3 -Xcompiler -fopenmp seqAsync.cu -o seqAsync

The output from seqAsync (Example 1.13, “Results Showing a Successful Test”) shows that the sum is correctly calculated as it passes the golden test:

Example 1.13
$ ./a.out
Test Succeeded!

REGRESSION TESTING AND ACCURACY
As always in programming, the most important metric for success is that the application produces the correct output. Regression testing is a critical part of this evaluation process that is unfortunately left out of most computer science books. Regression tests act as sanity checks that allow the programmer to use the computer to detect if some error has been introduced into the code. The computer is impartial and does not get tired. For these reasons, small amounts of code should be written and tested to ensure everything is in a known working state. Additional code can then be added along with more regression tests. This greatly simplifies debugging efforts, as the errors generally occur in the smaller amounts of new code. The alternative is to write a large amount of code and hope that it will work. If there is an error, it is challenging to trace through many lines of code to find one or more problems. Sometimes you can get lucky with the latter approach. If you try it, track your time to see if there really was a savings over taking the time to use regression testing and incremental development. The examples in this chapter use a simple form of regression test, sometimes called a golden test or smoke test, which utilizes an alternative method to double-check the parallel calculation of the sum of the integers. Such simple tests work well for exact calculations, but become more challenging to interpret when using floating-point arithmetic due to the numerical errors introduced into a calculation when using floating-point. As GPGPU technology can perform several trillion floating-point operations per second, numerical errors can build up quickly. Still, using a computer to double-check the results beats staring at endless lines of numbers. For complex programs and algorithms, the regression test suite can take as much time to plan, be as complex to implement, and require more lines of code than the application code itself! While hard to justify to management, professors, or to yourself (especially when working against a tight deadline), consider the cost of delivering an application that does not perform correctly. For example, the second startup I co-founded delivered an artificial learning system that facilitated the search for candidate drug leads using some of the technology discussed later in this book. The most difficult question our startup team had to answer was posed by the research executive committee of a major investor. That question was "how do we know that what you are doing on the computer reflects what really happens in the test tube?" The implied question was, "how do we know that you aren't playing some expensive computer game with our money?" The solution was to perform a double-blind test showing that our technology could predict the chemical activity of relevant compounds in real-world experiments using only a very small set of measurements and no knowledge of the correct result. In other words, our team

utilized a form of sanity checking to demonstrate the efficacy of our model and software. Without such a reality check, it is likely that the company would not have received funding.

SILENT ERRORS
A challenge with regression testing is that some errors can still silently pass through the test suite and make it to the application that the end-user sees. For example, Andy Regan at ICHEC (the Irish Center for High-End Computing) pointed out that increasing the value of N in the examples in this chapter can cause the sum to become so large that it cannot be contained in an integer. Some systems might catch the error and some systems will miss the error. Given that GPUs can perform a trillion arithmetic operations per second, it is important that CUDA programmers understand how quickly numerical errors can accumulate or computed values overflow their capacity. A quick introduction to this topic is my Scientific Computing article "Numerical Precision: How Much Is Enough?" (Farber, 2009), which is freely available on the Internet. Thrust provides the ability to perform a reduction in which smaller data types are converted into a larger data type. The next example, seqBig.cu, demonstrates how to use a 64-bit unsigned integer to sum all the 32-bit integers in the vector. The actual values are printed at the end of the test. See Example 1.14, "A Thrust Program that Performs a Dual-Precision Reduction":

Example 1.14
//seqBig.cu
#include <iostream>
using namespace std;

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

int main()
{
  const int N=1000000;

  // task 1: create the array
  thrust::device_vector<int> a(N);

  // task 2: fill the array
  thrust::sequence(a.begin(), a.end(), 0);

  // task 3: calculate the sum of the array
  unsigned long long sumA = thrust::reduce(a.begin(), a.end(),
                                           (unsigned long long) 0,
                                           thrust::plus<unsigned long long>());

  // task 4: calculate the sum of 0 .. N-1
  unsigned long long sumCheck=0;
  for(int i=0; i < N; i++) sumCheck += i;

  cerr << "host " << sumCheck << endl;
  cerr << "device " << sumA << endl;
  // task 5: check the results agree
  if(sumA == sumCheck) cout << "Test Succeeded!" << endl;
  else { cerr << "Test FAILED!" << endl; return(1); }
  return(0);
}

The nvcc compiler has the ability to compile and run an application from a single command-line invocation. Example 1.15, “Results Showing a Successful Test” shows the nvcc command line and the successful run of the seqBig.cu application:

Example 1.15
$ nvcc seqBig.cu -run
host 499999500000
device 499999500000
Test Succeeded!

Can you find any other silent errors that might occur as the size of N increases in these examples? (Hint: what limitations does tid have in the runtime example?)

INTRODUCTION TO DEBUGGING
Debugging programs is a fact of life—especially when learning a new language. With a few simple additions, CUDA enables GPGPU computing, but simplicity of expression does not preclude programming errors. As when developing applications for any computer, finding bugs can be complicated. Following the same economy of change used to adapt C and C++, NVIDIA has extended several popular debugging tools to support GPU computing. These are tools with which most Windows and UNIX developers are already proficient and comfortable, such as gdb, ddd, and Visual Studio. Those familiar with building, debugging, and profiling software under Windows, Mac, and UNIX should find the transition to CUDA straightforward. For the most part, NVIDIA has made debugging CUDA code identical to debugging any other C or C++ application. All CUDA tools are freely available on the NVIDIA website, including the professional edition of Parallel Nsight for Microsoft Visual Studio.

UNIX DEBUGGING
NVIDIA's cuda-gdb Debugger
To use the CUDA debugger, cuda-gdb, the source code must be compiled with the -g -G flags. The -g flag specifies that the host-side code is to be compiled for debugging. The -G flag specifies that the GPU-side code is to be compiled for debugging (see Example 1.16, "nvcc Command to Compile for cuda-gdb"):

Example 1.16
nvcc -g -G seqRuntime.cu -o seqRuntime

Following is a list of some common commands, including the single-letter abbreviation and a brief description:
■ breakpoint (b): set a breakpoint to stop the program execution at selected locations in the code. The argument can either be a method name or line number.
■ run (r): run the application within the debugger.
■ next (n): move to the next line of code.
■ continue (c): continue to the next breakpoint or until the program completes.
■ backtrace (bt): shows contents of the stack including calling methods.
■ thread: lists the current CPU thread.
■ cuda thread: lists the current active GPU threads (if any).
■ cuda kernel: lists the currently active GPU kernels and also allows switching "focus" to a given GPU thread.
Use cuda-gdb to start the debugger, as shown in Example 1.17, "cuda-gdb Startup":

Example 1.17
$ cuda-gdb seqRuntime
NVIDIA (R) CUDA Debugger
4.0 release
Portions Copyright (C) 2007-2011 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"…
Using host libthread_db library "/lib/libthread_db.so.1".

Use the "l" command to show the source for fill, as shown in Example 1.18, "cuda-gdb List Source Code":

Example 1.18
(cuda-gdb) l fill
11      int tid = blockIdx.x*blockDim.x + threadIdx.x;
12      if (tid < n) a[tid] = tid;
13    }
14
15    void fill(int* d_a, int n)
16    {
17      int nThreadsPerBlock= 512;
18      int nBlocks= n/nThreadsPerBlock + ((n%nThreadsPerBlock)?1:0);
19
20      fillKernel <<< nBlocks, nThreadsPerBlock >>> (d_a, n);

Set a breakpoint at line 12 and run the program. We see that a thrust kernel runs and then the breakpoint is hit in fillKernel(), as shown in Example 1.19, "cuda-gdb Set a Breakpoint":

Example 1.19
(cuda-gdb) b 12
Breakpoint 1 at 0x401e30: file seqRuntime.cu, line 12.
(cuda-gdb) r
Starting program: /home/rmfarber/foo/ex1-3
[Thread debugging using libthread_db enabled]
[New process 3107]
[New Thread 139660862195488 (LWP 3107)]
[Context Create of context 0x1ed03a0 on Device 0]
Breakpoint 1 at 0x20848a8: file seqRuntime.cu, line 12.
[Launch of CUDA Kernel 0 (thrust::detail::device::cuda::detail::launch_closure_by_value, unsigned int, thrust::detail::generate_functor>>><<<(28,1,1),(768,1,1)>>>) on Device 0]
[Launch of CUDA Kernel 1 (fillKernel<<<(1954,1,1),(512,1,1)>>>) on Device 0]
[Switching focus to CUDA kernel 1, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Breakpoint 1, fillKernel<<<(1954,1,1),(512,1,1)>>> (a=0x200100000, n=50000)
    at seqRuntime.cu:12
12      if (tid < n) a[tid] = tid;

Print the value of the thread ID, tid, as in Example 1.20, "cuda-gdb Print a Variable":

Example 1.20
(cuda-gdb) p tid
$1 = 0

Switch to thread 403 and print the value of tid again. The value of tid is correct, as shown in Example 1.21, “cuda-gdb Change Thread and Print a Variable”:

Example 1.21
(cuda-gdb) cuda thread(403)
[Switching focus to CUDA kernel 1, grid 2, block (0,0,0), thread (403,0,0), device 0, sm 0, warp 12, lane 19]
12      if (tid < n) a[tid] = tid;
(cuda-gdb) p tid
$2 = 403

Exit cuda-gdb (Example 1.22, “cuda-gdb Exit”):

Example 1.22
(cuda-gdb) quit
The program is running.  Exit anyway? (y or n) y

This demonstrated only a minimal set of cuda-gdb capabilities. Much more information can be found in the manual "CUDA-GDB NVIDIA CUDA Debugger for Linux and Mac," which is updated and provided with each release of the CUDA tools. Parts 14 and 17 of my Doctor Dobb's Journal tutorials discuss cuda-gdb techniques and capabilities in much greater depth. Also, any good gdb or ddd tutorial will help novice readers learn the details of cuda-gdb.

The CUDA Memory Checker
Unfortunately, it is very easy to make a mistake when specifying the size of a dynamically allocated region of memory. In many cases, such errors are difficult to find. Programming with many threads compounds the problem because an error in thread usage can cause an out-of-bounds memory access. For example, neglecting to check whether tid is less than the size of the vector in the fill routine will cause an out-of-bounds memory access. This is a subtle bug because the number of threads utilized is specified in a different program

location in the execution configuration. See Example 1.23, “Modified Kernel to Cause an Out-of-Bounds Error”:

Example 1.23
__global__ void fillKernel(int *a, int n)
{
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  // if(tid < n) // Removing this comparison introduces an out-of-bounds error
  a[tid] = tid;
}

The CUDA tool suite provides a standalone memory check utility called cuda-memcheck. As seen in Example 1.24, "Example Showing the Program Can Run with an Out-of-Bounds Error," the program appears to run correctly:

Example 1.24
$ cuda-memcheck badRuntime
Test Succeeded!

However, cuda-memcheck correctly flags that there is an out-of-bounds error, as shown in Example 1.25, "Out-of-Bounds Error Reported by cuda-memcheck":

Example 1.25
$ cuda-memcheck badRuntime
========= CUDA-MEMCHECK
Test Succeeded!
========= Invalid __global__ write of size 4
=========   at 0x000000e0 in badRuntime.cu:14:fillKernel
=========   by thread (336,0,0) in block (97,0,0)
=========   Address 0x200130d40 is out of bounds
========= ERROR SUMMARY: 1 error

Notice that no special compilation flags were required to use cuda-memcheck.

Use cuda-gdb with the UNIX ddd Interface
The GNU ddd (Data Display Debugger) provides a visual interface for cuda-gdb (or gdb). Many people prefer a visual debugger to a plain-text interface. To use cuda-gdb, ddd must be told to use a different debugger with the

“-debugger” command-line flag as shown in Example 1.26, “Command to Start ddd with cuda-gdb”:

Example 1.26
ddd -debugger cuda-gdb badRuntime

Breakpoints can be set visually, and all cuda-gdb commands can be manually entered. The following screenshot shows how to find an out-of-bounds error by first specifying set cuda memcheck on. As will be demonstrated in Chapter 3, ddd also provides a useful machine instruction window to examine and step through the actual instructions the GPU is using. See Example 1.27, "Using cuda-gdb Inside ddd to Find the Out-of-Bounds Error":

WINDOWS DEBUGGING WITH PARALLEL NSIGHT
Parallel Nsight provides a debugging experience familiar to Microsoft Visual Studio users, yet includes powerful GPU features like thread-level debugging and the CUDA memory checker. It installs as a plug-in within Microsoft Visual Studio. Parallel Nsight also provides a number of features:

■ Integrated into Visual Studio 2008 SP1 or Visual Studio 2010.
■ CUDA C/C++ debugging.
■ DirectX 10/11 shader debugging.
■ DirectX 10/11 frame debugging.
■ DirectX 10/11 frame profiling.
■ CUDA kernel trace/profiling.
■ OpenCL kernel trace/profiling.
■ DirectX 10/11 API & HW trace.
■ Data breakpoints for CUDA C/C++ code.
■ Analyzer/system trace.
■ Tesla Compute Cluster (TCC) support.
Parallel Nsight allows both debugging and analysis on the local machine as well as on remote machines that can be located at a customer's site. The capabilities of Parallel Nsight vary with the hardware configuration, as seen in Table 1.3.
Figure 1.6 shows Parallel Nsight stopped at a breakpoint on the GPU when running the fillKernel() kernel in seqRuntime.cu. The locals window shows the values of various variables on the GPU. A screenshot does not convey the interactive nature of Parallel Nsight, as many of the fields on the screen are interactive and can be clicked to gain more information. Visual Studio users should find Parallel Nsight comfortable, as it reflects and utilizes the look and feel of other aspects of Visual Studio. Parallel Nsight is an extensive package that is growing and maturing quickly. The most current information, including videos and the user forums, can be found on the Parallel Nsight web portal at http://www.nvidia.com/ParallelNsight as well as the help section in Visual Studio.

Table 1.3 Parallel Nsight Capabilities According to Machine Configuration

Hardware Configuration        Single GPU  Dual GPU  Two Systems,     Dual-GPU System
                              System      System    Each with a GPU  SLI MultiOS
CUDA C/C++ parallel debugger              +         +                +
Direct3D shader debugger                            +                +
Direct3D graphics inspector   +           +         +                +
Analyzer                      +           +         +                +

FIGURE 1.6 A Parallel Nsight debug screen.

SUMMARY
This chapter covered a number of important applied topics such as how to write, compile, and run CUDA applications on the GPGPU using both the runtime and Thrust APIs. With the tools and syntax discussed in this chapter, it is possible to write and experiment with CUDA using the basics of the Thrust and runtime APIs. Audacious readers may even attempt to write real applications, which is encouraged, as they can be refactored based on the contents of later chapters and feedback from the performance profiling tools. Remember, the goal is to become a proficient CUDA programmer and writing code is the fastest path to that goal. Just keep the three basic rules of GPGPU programming in mind:
1. Get the data on the GPGPU and keep it there.
2. Give the GPGPU enough work to do.
3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations.

Understanding basic computer science concepts is also essential to achieving high performance and improving yourself as a computer scientist. Keep Amdahl's law in mind to minimize serial bottlenecks, and exploit both task and data parallelism. Always try to understand the big-O implications of the algorithms that you use and seek out alternative algorithms that exhibit better scaling behavior. Always attempt to combine operations to achieve the highest computational density on the GPU while minimizing slow PCIe bus transfers.

CHAPTER 2

CUDA for Machine Learning and Optimization

GPGPUs are powerful tools that are well-suited to unraveling complex real-world problems. Using only the simple CUDA capabilities introduced in Chapter 1, this chapter demonstrates how to greatly accelerate nonlinear optimization problems using the derivative-free Nelder-Mead and Levenberg-Marquardt optimization algorithms. Single- and double-precision application performance will be measured and compared between an Intel Xeon e5630 processor and an NVIDIA C2070 GPU as well as an older 10-series NVIDIA GTX 280 gaming GPU. Working example code is provided that can train the classic nonlinear XOR machine-learning problem 85 times faster than a modern quad-core Intel Xeon processor (341 times faster than single-core performance) under Linux with comparable performance under Windows 7. At the end of the chapter, the reader will have a basic understanding of:

■ Two popular optimization techniques, including GPU scalability limitations of the Levenberg-Marquardt algorithm
■ How a CUDA-literate programmer can make significant contributions to modeling and data-mining efforts
■ Machine learning and why the XOR problem is important to computationally universal devices
■ C++ functors and how to write a single Thrust functor using __host__ and __device__ qualifiers that can run on both host and GPGPU devices
■ Some example programs that demonstrate orders-of-magnitude increased performance over a conventional processor on techniques that can be applied to problems in machine learning, signal processing, statistics, and many other fields


MODELING AND SIMULATION
Mathematical modeling and numerical simulation are two important, distinct, and closely linked aspects of applied mathematics. A mathematical model is an abstraction of reality that can be used for analysis and prediction. Numerical simulation is based on applications that map mathematical models onto the computer. In combination, modeling and simulation are powerful techniques to advance human understanding of complex phenomena.
This book does not attempt to duplicate or shed new insight on the tremendous volume of work that has already been published concerning modeling and simulation. Depending on your particular area of interest, there are a number of excellent texts available that provide both precise and detailed introductions. Our focus is to provide the tools needed to exploit massively parallel hardware with CUDA so that readers can make their own contributions in their field of choice. Two general approaches are utilized to create models:
1. Human-derived models based on first-principle analysis and other techniques
When available, these models provide deep insight into the phenomena being investigated. A literate CUDA programmer can contribute by efficiently mapping the calculations to the parallel hardware to achieve the highest performance and scale to the largest number of processing elements possible. The literature shows that a well-designed and written CUDA application can provide two orders of magnitude increased performance (Hwu, 2011; Stone et al., 2010). Such performance is disruptive, as simulations that previously would have taken a year can finish in a few days. Greater simulation accuracy is also possible as more accurate and detailed approximations can be utilized. Nonlinear problems particularly benefit from the NVIDIA Special Function Units (SFU) that calculate several transcendental functions (such as log(), exp(), sin(), cos(), and others) approximately 25 times faster than conventional processors.

2. Parameterized models derived by fitting data
Computationally derived models from data are relatively simple to construct compared to human-derived models. Many techniques exist that can create accurate models that generalize well. Neural networks are one example (Lapedes & Farber, 1987b). In general, the process of fitting a model to data is a computationally expensive process, with runtimes that grow by O(N^2) and higher, where N is the number of data items. Parallelism can make many of these methods tractable and even interactive by reducing the

runtime by some factor close to the number of processing elements. Containing hundreds of processing elements, a single GPU has the potential to reduce runtimes by two or more orders of magnitude. Clever partitioning of the problem across multiple GPUs can scale the number of available processing elements by the number of GPUs. Particular challenges arise when modeling systems that exhibit nonlinear behavior. A nonlinear system does not always respond proportionally to an input stimulus, which means its behavior cannot be modeled solely on the basis of some linear combination of the input or system stimulus. Although challenging, nonlinear systems give rise to many interesting phenomena including self-organizing systems, chaotic behavior, and life.

Fitting Parameterized Models
Model fitting can be phrased as a form of function optimization in which a set of model parameters, P, are adjusted to fit some data set with a minimum error. The error is determined by an objective function, sometimes called a cost function, which evaluates how well a model fits a data set for some set of model parameters. A common technique is to fit a curve to a set of N data points to minimize the sum of the squares of the distances between the predicted and known points on the curve. See Equation 2.1, "Sum of squares of differences error."

Error = ∑_{i=1}^{N} (Known_i − Predicted_i)^2    (2.1)

Because the sum of the squares of the differences is always positive, a perfect fit will result in a zero error. Unfortunately, this rarely occurs because most numerical techniques are subject to local minima, which means that the numerical technique somehow gets stuck at a low point in the cost function from which it cannot escape. As a result, no guarantee can be made that the global minimum or best overall fit has been found. There are a number of popular libraries and tools that can be used to find the minimum of a function over many variables. The book Numerical Recipes is an excellent source of information (Press et al., 2007), which also provides working source code.1 Many free and licensed numerical toolkits are available, including SLATEC, NAG (Numerical Algorithms Group), MINPACK, the GNU scientific library, MATLAB, Octave, scipy, gnuplot, SAS, Maple, Mathematica, STATLIB, and a plethora of others.
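As a minimal illustration of Equation 2.1 (a sketch for this discussion, not one of the book's example codes), the host-side template below evaluates the sum-of-squares error for an arbitrary model function:

#include <vector>

// Sum-of-squares error between known values and model predictions.
// 'Model' is any callable y = model(x); the names here are placeholders.
template<typename Real, typename Model>
Real sumSquaredError(const std::vector<Real>& x,
                     const std::vector<Real>& known,
                     const Model& model)
{
  Real error = 0;
  for (size_t i = 0; i < x.size(); i++) {
    Real diff = known[i] - model(x[i]); // Known_i - Predicted_i
    error += diff * diff;               // accumulate the squared difference
  }
  return error;                         // zero only for a perfect fit
}

An optimization method such as Nelder-Mead or Levenberg-Marquardt repeatedly calls a function like this with trial parameter sets, accepting parameter changes that lower the returned error.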

1 The Numerical Recipes source code is copyrighted and will not be used, as we prefer to provide complete working examples.

Nelder-Mead Method
The Nelder-Mead method is a commonly used direct search nonlinear optimization technique (Nelder & Mead, 1965). The algorithm performs a search using a simplex, which is a generalized triangle in N dimensions. The method evaluates a user-provided function at each of the vertices and then iteratively shrinks the simplex as better points are found. The method terminates when a desired bound or other termination condition is reached. With some limitations (McKinnon & McKinnon, 1999; Kolda, Lewis, & Torczon, 2007), the Nelder-Mead method has proven to be effective over time, plus it is computationally compact. The original FORTRAN implementation was made available through STATLIB. John Burkhardt created a clean C-language implementation that he made freely available on his website.2 The C++ template adaptation of his code at the end of this chapter allows easy comparison of both single- and double-precision host and GPU performance.
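To make the simplex operations concrete, the following minimal sketch (an illustration only, using the common reflection, expansion, contraction, and shrink coefficients of 1, 2, 0.5, and 0.5; it is not the asa047-derived nelmin template used later in this chapter) minimizes a simple two-parameter function:

#include <vector>
#include <algorithm>
#include <cstdio>

using Point = std::vector<double>;

// Simple convex test objective with its minimum at (3, -2).
double quad(const Point& p) {
  double a = p[0] - 3.0, b = p[1] + 2.0;
  return a*a + b*b;
}

Point nelderMead(double (*f)(const Point&), Point start, int iters) {
  const int n = (int)start.size();
  // Initial simplex: the start point plus one perturbed vertex per dimension.
  std::vector<Point> s(n + 1, start);
  for (int i = 0; i < n; i++) s[i + 1][i] += 0.5;
  auto byValue = [&](const Point& a, const Point& b) { return f(a) < f(b); };

  for (int it = 0; it < iters; it++) {
    std::sort(s.begin(), s.end(), byValue);   // s[0] is best, s[n] is worst
    Point c(n, 0.0);                          // centroid of all but the worst
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++) c[j] += s[i][j] / n;
    auto move = [&](double t) {               // point at c + t*(c - worst)
      Point p(n);
      for (int j = 0; j < n; j++) p[j] = c[j] + t*(c[j] - s[n][j]);
      return p;
    };
    Point refl = move(1.0);                   // reflect the worst vertex
    if (f(refl) < f(s[0])) {                  // new best point: try to expand
      Point expd = move(2.0);
      s[n] = (f(expd) < f(refl)) ? expd : refl;
    } else if (f(refl) < f(s[n - 1])) {
      s[n] = refl;                            // accept the reflection
    } else {
      Point con = move(-0.5);                 // contract toward the centroid
      if (f(con) < f(s[n])) s[n] = con;
      else                                    // shrink toward the best vertex
        for (int i = 1; i <= n; i++)
          for (int j = 0; j < n; j++)
            s[i][j] = s[0][j] + 0.5*(s[i][j] - s[0][j]);
    }
  }
  std::sort(s.begin(), s.end(), byValue);
  return s[0];
}

int main() {
  Point best = nelderMead(quad, {0.0, 0.0}, 200);
  std::printf("minimum near (%.3f, %.3f)\n", best[0], best[1]); // close to (3, -2)
  return 0;
}

The asa047 code used for xorNM.cu implements the same idea with additional bookkeeping, such as convergence checks, restarts, and the konvge and kcount controls that appear later in Example 2.9.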

Levenberg-Marquardt Method
The Levenberg-Marquardt algorithm (LMA) is a popular trust region algorithm that is used to find a minimum of a function (either linear or nonlinear) over a space of parameters. Essentially, a trusted region of the objective function is internally modeled with some function such as a quadratic. When an adequate fit is found, the trust region is expanded. As with many numerical techniques, the Levenberg-Marquardt method can be sensitive to the initial starting parameters. An excellent technical overview of Levenberg-Marquardt with references is on the levmar website.3 Another excellent resource is Numerical Recipes. In traditional Levenberg-Marquardt implementations, finite differences are used to approximate the Jacobian. The Jacobian is a matrix of all first-order partial derivatives of the function being optimized. This matrix is convenient, as the user need only supply a single function to the library. The original FORTRAN public domain MINPACK routine lmdif has proven to be a reliable piece of software over the decades, so much so that a variety of implementations have been created in many computer languages. Two excellent C/C++ implementations are levmar and lmfit. For comparison purposes, the levmar library has been used in the examples because it provides both single- and double-precision routines. The reader will have to download and install the levmar package to run those examples. The levmar website provides installation instructions for both Microsoft and UNIX-based systems.
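As a rough sketch of the finite-difference idea (an illustration with placeholder names, not the levmar API or its internal routine), a forward-difference Jacobian can be built from one baseline evaluation plus one perturbed evaluation per parameter:

#include <vector>

// Forward-difference approximation of the Jacobian J[i][j] = d f_i / d p_j
// for a residual function func(p, fvec, m, n) with m parameters and n residuals.
template<typename Real>
void approxJacobian(void (*func)(const Real* p, Real* fvec, int m, int n),
                    const Real* p, int m, int n, Real h,
                    Real* J)                    // J is n x m, row-major
{
  std::vector<Real> f0(n), f1(n), pt(p, p + m);
  func(p, &f0[0], m, n);                        // baseline residuals
  for (int j = 0; j < m; j++) {
    Real saved = pt[j];
    pt[j] += h;                                 // perturb one parameter
    func(&pt[0], &f1[0], m, n);
    for (int i = 0; i < n; i++)
      J[i*m + j] = (f1[i] - f0[i]) / h;         // forward difference
    pt[j] = saved;                              // restore the parameter
  }
}

Each Jacobian approximation therefore costs m + 1 objective-function evaluations, which is one reason expensive objective functions dominate the runtime of Levenberg-Marquardt style methods.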

2 http://people.sc.fsu.edu/~jburkardt/cpp_src/asa047/asa047.html.
3 http://www.ics.forth.gr/~lourakis/levmar/index.html#download.

To support both single- and double-precision performance testing, wrappers were created that utilize C++ polymorphism to make the appropriate library call depending on the type of the parameters passed to the wrapper. These wrappers also provide the necessary interface to pass an objective function to the levmar library, which expects a pointer to a function. See Example 2.1, "Using a Wrapper to Account for Variable Type."

Example 2.1
//wrapper around the levmar single- and double-precision calls
inline int levmar_dif(
    void (*func)(float *, float *, int, int, void *),
    float *p, float *x, int m, int n, int itmax, float *opts,
    float *info, float *work, float *covar, void* data)
{
  return slevmar_dif(func, p, x, m, n, itmax, opts, info, work, covar, data);
}
inline int levmar_dif(
    void (*func)(double *, double *, int, int, void *),
    double *p, double *x, int m, int n, int itmax, double *opts,
    double *info, double *work, double *covar, void* data)
{
  return dlevmar_dif(func, p, x, m, n, itmax, opts, info, work, covar, data);
}

Algorithmic Speedups
For the purposes of this book, only the evaluation of the objective function will be discussed. The reader should note that approximating the gradient numerically can impose a time to solution penalty. Many optimization techniques, such as conjugate gradient, can greatly accelerate the process of finding a minimum by using a function, dfunc(), that calculates the derivative of the objective function. Saunders, Simon, and Yip pointed out that conjugate gradient is guaranteed to terminate after a finite number of steps (in exact arithmetic), that some measure of the error is decreased at every step of the method, and that the computational requirements for each step are constant (Saunders, Simon, & Yip, 1988). In practice, accumulated floating-point roundoff errors cause a gradual loss of accuracy, which affects the convergence rate. Even so, conjugate gradient is widely used for problems that are out of reach of exact algorithms. Many symbolic differentiation tools exist to help the programmer write or automatically generate derivatives using symbolic math. One example is

GiNaC (GiNaC stands for "GiNaC is Not a CAS," where CAS stands for Computer Algebra System). GiNaC is a freely downloadable C++ library that programmers can use to incorporate symbolic math into their applications. Unlike other symbolic mathematical packages, GiNaC takes C++ as input. The programmer can then use the algebraic capabilities of the library to perform useful tasks such as symbolic differentiation. The GiNaC website (http://www.ginac.de) claims it was designed to be a drop-in replacement for the symbolic math engine that powers the popular Maple CAS. Other packages include Maple and Mathematica.
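As a small illustration of this kind of symbolic differentiation (a sketch based on GiNaC's basic documented usage, not code from this book), GiNaC can differentiate an expression written directly in C++:

#include <iostream>
#include <ginac/ginac.h>
using namespace GiNaC;

int main()
{
  symbol x("x"), w("w");
  ex f = tanh(w * x) * tanh(w * x);   // an expression written in C++ syntax
  ex dfdw = f.diff(w);                // symbolic derivative with respect to w
  std::cout << dfdw << std::endl;     // prints the derivative expression
  return 0;
}

The program links against the GiNaC and CLN libraries (for example, with -lginac -lcln); the printed derivative could then be pasted into a dfunc() routine for use with gradient-based optimizers.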

MACHINE LEARNING AND NEURAL NETWORKS
Artificial Neural Networks (ANN) is a machine-learning technique that infers a function (a form of parameterized model) based on observed data. Supervised learning occurs when a teacher associates known values that reflect a desired or known outcome with each training example. Unsupervised learning occurs when a metric of goodness or fit is provided. Functions inferred by neural networks have predictive power, meaning they can correctly forecast future values in a time series, respond and adapt to complex and unforeseen stimuli, and perform classification tasks. A famous early example is nettalk, which trained a neural network to read English text aloud (Sejnowski & Rosenberg, 1987). The nettalk data is still available for download.4
Training artificial neural networks can be expressed as a function optimization problem that seeks to determine the best network parameters (e.g., the internal network weights and biases) that will minimize the error on an initial data set. The fitting or training process is computationally expensive, as it requires repeatedly calling an objective function with sets of parameters that are evaluated over every example in the training data. The runtime is O(Nparam × Ndata) for each objective function evaluation. In most cases, the number of parameters is small relative to the size of the training data, which means the overall runtime is dominated by the size of the data set.
During the training process, the neural net is attempting to fit a multidimensional surface to the training data (Lapedes & Farber, 1987a). Unfortunately, the curse of dimensionality tells us that the volume of space that must be searched, sampled, and modeled increases exponentially with the dimension of the data. For example, uniformly sampling a 1D unit interval between 0 and 1 at steps of 0.01 requires 100 samples, and sampling a 10-dimensional unit hypercube would require (100)^10, or 10^20, samples.5 Even with sparse sampling,

4 http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Nettalk+Corpus).
5 This example was adapted from the Wikipedia.org "Curse of Dimensionality" example: http://en.wikipedia.org/wiki/Curse_of_dimensionality.

smooth high-dimensional surfaces can require many data points. Many interesting phenomena require fitting convoluted, bumpy, high-dimensional surfaces, which can dramatically increase the amount of training data needed to fit a model that can approximate the multidimensional surface with acceptable accuracy.
Expensive objective functions tend to dominate the runtime when using popular optimization techniques like Nelder-Mead, Levenberg-Marquardt, Powell's method, conjugate gradient, and others. Under these circumstances, it is best to focus on reducing the runtime of the objective function. Mapping the problem efficiently to a GPGPU with hundreds of parallel threads can potentially reduce the runtime by two orders of magnitude over a single-threaded system, as shown by Equation 2.2, "Parallel runtime of an ANN-based objective function." Conversely, a quad-core processor can reduce the runtime only by a factor of 4 at best over a serial implementation.

O((N_data × N_param) / N_processors)    (2.2)

The examples in this chapter demonstrate that CUDA can reduce the runtime by 50 to 100 times over an implementation that runs on a conventional multicore processor, even when taking into account all of the GPU communications overhead. Although this is a known result for parallel systems in general (Farber, 1992; Thearling, 1995), it is amazing that this level of performance can be achieved even using a gaming GPU.

XOR: AN IMPORTANT NONLINEAR MACHINE-LEARNING PROBLEM Andreas Weigend noted in Introduction to the Theory of Neural Computation:

To contrast "learning" without generalization from learning with generalization, let us consider the widely and wildly celebrated fact that neural networks can learn to implement exclusive OR (XOR). But—what kind of learning is this? When four out of four cases are specified, no generalization exists! Learning a truth table is nothing but rote memorization: learning XOR is as interesting as memorizing the phone book. More interesting—and more realistic—are real-world problems, such as the prediction of financial data. In forecasting, nobody cares how well a model fits the training data—only the quality of future predictions counts, i.e., the performance on novel data or the generalization ability. Learning means extracting regularities from training examples that transfer to new examples. (Hertz, Krogh, & Palmer, 1991, p. 5)

FIGURE 2.1 An example of a perceptron.

In 1969, a famous book entitled Perceptrons (Minsky & Papert, 1969) showed that it was not possible for a single-layer network, at that time called a perceptron (Figure 2.1), to represent the fundamental XOR logical function. The impact was devastating, as funding and interest in the nascent field of neural networks evaporated. More than ten years passed until neural network research suddenly experienced a resurgence of interest. In the mid-1980s, a paper demonstrated that neural models could solve the Traveling Salesman Problem (Hopfield & Tank, 1985). Shortly thereafter, the backpropagation algorithm was created (Rummelhardt, Hinton, & Williams, 1986; Rummelhart, McClelland, & the PDP Research Group, 1987). A key demonstration showed it was possible to train a multilevel ANN to replicate an XOR truth table. In the following years, the field of neural network research and other related machine-learning fields exploded. The importance of being able to learn and simulate the XOR truth table lies in understanding the concept of a computationally universal device. Universal Turing machines are an example of a computationally universal device (Hopcroft & Ullman, 2006) because they can, in theory, be used to simulate any other computational device. Distinguishing ANN from perceptrons, Andreas Weigend observes:

The computational power drastically increases when an intermediate layer of nonlinear units is inserted between inputs and outputs. The example of XOR nicely emphasizes the importance of such hidden units: they re-represent the input such that the problem becomes linearly separable. Networks without hidden units cannot learn to memorize XOR, whereas networks with hidden units can implement any Boolean function. (Hertz, Krogh, & Palmer, 1991, p. 6)
In other words, the ability to simulate XOR is essential to making the argument that multilayer ANNs are universal computational devices and perceptrons without any hidden neurons are limited computational devices. It is the extra nonlinear hidden layers that give ANNs the ability to simulate other computational devices. In fact, only two layers are required to perform modeling and prediction tasks as well as symbolic learning (Lapedes & Farber, 1987a). However, clever human-aided design can create smaller networks that utilize fewer parameters, albeit with more than two hidden layers (Farber et al., 1993). Figure 2.2 shows an XOR neural network with one hidden neuron that implements a sigmoid, as shown in Figure 2.3.

FIGURE 2.2 An XOR network.

FIGURE 2.3 An example sigmoidal G function: tanh(x).

An Example Objective Function
The thrust::transform_reduce template makes the implementation of an objective function both straightforward and easy. For example, an ANN least mean squares objective function requires the definition of a transform operator that calculates the square of the error that the network makes on each example in the training data. A reduction operation then calculates the sum of the squared errors.

Thrust utilizes functors to perform the work of the transform and reduction operations as well as other generic methods. In C++, a functor overloads the function call operator "()" so that an object may be used in place of an ordinary function. This has several important implications:
1. Functors can be passed to generic algorithms like thrust::transform_reduce much like C-language programmers utilize pointers to functions.
2. Passing and using functors is efficient, as C++ can inline the functor code. This eliminates function call overhead and allows the compiler to perform more extensive optimizations.
3. C++ functors can maintain a persistent internal state. As you will see, this approach can be very useful when working with different device, GPU, and thread memory spaces.
Inlining of functors is essential to performance because common functors such as thrust::plus only perform tiny amounts of computation. Without the ability to inline functors, generic methods like thrust::transform_reduce would not be possible because function call overhead would consume nearly as much (or more) time than the functor itself.
The Thrust documentation speaks of kernel fusion, which combines multiple functors into a single kernel. Per our second rule of GPGPU coding, kernel fusion increases GPGPU utilization per kernel invocation, which can avoid kernel startup latencies. Just as important, many generic methods like the thrust::transform_reduce template avoid performance-robbing storage of intermediate values to memory as the functor calculations are performed on the fly. The latter point meets the requirements of our third rule of GPGPU programming: focus on data reuse within the GPGPU to avoid memory bandwidth limitations. The SAXPY example in the thrust Quick Start guide provides a clear and lucid demonstration of the performance benefits of kernel fusion.
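As a brief stand-alone sketch of this kernel fusion (an illustration, not one of the book's numbered examples), the functor below squares each element while thrust::transform_reduce sums the results in the same fused operation, so no intermediate vector of squared values is ever written to GPU memory:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>

struct square {
  __host__ __device__ float operator()(float x) const { return x * x; }
};

int main()
{
  thrust::device_vector<float> v(1000, 2.0f);
  // The square functor is inlined into the reduction kernel.
  float sumSq = thrust::transform_reduce(v.begin(), v.end(),
                                         square(), 0.0f,
                                         thrust::plus<float>());
  std::cout << sumSq << std::endl;   // prints 4000
  return 0;
}

Compiled with nvcc, this executes the squaring and summation without storing intermediate values, which is the behavior the CalcError functor used later in this chapter relies on.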

A Complete Functor for Multiple GPU Devices and the Host Processors
The following example, Example 2.2, "An XOR functor for CalcError.h," is a complete XOR functor. The section in bold shows how simple the code is to calculate the XOR neural network illustrated in Figure 2.2, plus the error for each example in the training set.

Example 2.2
// The CalcError functor for XOR
static const int nInput = 2;
static const int nH1 = 1;
static const int nOutput = 1;
static const int nParam = (nOutput+nH1)     // Neuron Offsets
                        + (nInput*nH1)      // connections from I to H1
                        + (nH1*nOutput)     // connections from H1 to O
                        + (nInput*nOutput); // connections from I to O
static const int exLen = nInput + nOutput;

struct CalcError {
  const Real* examples;
  const Real* p;
  const int nInput;
  const int exLen;

  CalcError( const Real* _examples, const Real* _p,
             const int _nInput, const int _exLen)
    : examples(_examples), p(_p), nInput(_nInput), exLen(_exLen) {};

  __device__ __host__ Real operator()(unsigned int tid) {
    const register Real* in = &examples[tid * exLen];
    register int index=0;
    register Real h1 = p[index++];
    register Real o = p[index++];

    h1 += in[0] * p[index++];
    h1 += in[1] * p[index++];
    h1 = G(h1);

    o += in[0] * p[index++];
    o += in[1] * p[index++];
    o += h1 * p[index++];

    // calculate the square of the diffs
    o -= in[nInput];
    return o * o;
  }
};

The nonlinear transfer function, G, in the previous example utilizes a hyperbolic tanh to define a sigmoid as illustrated in Figure 2.3. Other popular sigmoidal functions include the logistic function, piece-wise linear, and many others. It is easy to change the definition of G to experiment with these other functions. It is important to note that two CUDA function qualifiers, __device__ and __host__, highlighted in the previous code snippet, specify a functor that can run on both the host processor and one or more GPU devices. This

dual use of the functor makes it convenient to create hybrid CPU/GPU applications that can run on all the available computing resources in a system. It also makes performance comparisons between GPU- and host-based implementations easier. This functor also uses persistence to encapsulate the data on the device, which ties the data to the calculation and allows work to be distributed across multiple devices and types of devices. Of course, the pointers to the vectors used by the functor should also reside on the device running the functor or undesired memory transfers will be required. CUDA 4.0 provides a unified virtual address space that allows the runtime to identify the device that contains a region of memory solely from the memory pointer. Data can then be directly transferred among devices. While the direct transfer is efficient, it still imposes PCIe bandwidth limitations.

Brief Discussion of a Complete Nelder-Mead Optimization Code
The following discussion will walk through a complete Nelder-Mead Thrust API GPU example that provides single- and double-precision object instantiation, data generation, runtime configuration of the Nelder-Mead method, and a test function to verify that the network trained correctly. If desired, the code snippets in this section can be combined to create a complete xorNM.cu source file. Alternatively, a complete version of this example can be downloaded from the book's website.6
The objective function calculates the energy. In this example, the energy is the sum of the square of the errors made by the model for a given set of parameters, as shown in Equation 2.3.

energy = objfunc(p1, p2, …, pn)    (2.3)

An optimization method is utilized to fit the parameters to the data with minimal error. In this example, the Nelder-Mead method is utilized, but many other numerical techniques can be used, such as Powell's method. If the derivative of the objective function can be coded, more efficient numerical algorithms such as conjugate gradient can be used to speed the minimization process. Some of these algorithms can, in the ideal case, reduce the runtime by a factor equal to the number of parameters.

6 http://booksite.mkp.com/9780123884268

FIGURE 2.4 Optimization mapping: the optimization method (Powell, conjugate gradient, or other) broadcasts the parameters p1, p2, … pn to each GPU (step 1), each GPU calculates a partial error over its slice of the examples (step 2), and the partials are summed to get the energy (step 3).

The calculation performs three steps, as illustrated in Figure 2.4. For generality, multiple GPUs are shown, although the example in this chapter will work on only one GPU:
1. Broadcast the parameters from the host to the GPU(s).
2. Evaluate the objective function on the GPU. If the run utilizes multiple GPUs, then each GPU calculates a partial sum of the errors. The data set is initially partitioned across all the GPUs, where the number of examples per GPU can be used to load-balance across devices with different performance characteristics.
3. The sum (or sum of the partial sums in the multi-GPU case) is calculated via a reduce operation. In this case, thrust::transform_reduce is used to sum the errors in one operation.
The following example source code uses preprocessor conditional statements to specify the type of machine and precision of the run, which can be specified via the command line at compile time. Though conditional compilation does complicate the source code and interfere with code readability, it is a fact of life in real applications, as it facilitates code reuse and supporting multiple machine types. For this reason, conditional compilation is utilized in this book. The following conditionals are utilized in Example 2.3, "Part 1 of xorNM.cu":

■ USE_HOST: Selects the code that will run the objective function on the host processor. The default is to run on the GPU.

■ USE_DBL: Specifies that the compiled executable will use double-precision variables and arithmetic. The default is to use single-precision variables and arithmetic. Note that this is for convenience because C++ templates make it easy to have both single- and double-precision tests compiled by the same source code.

Example 2.3
#include <iostream>
#include <iomanip>
#include <vector>
#include <cstdlib>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <omp.h>

// define USE_HOST for a host-based version
// define USE_DBL for double precision
using namespace std;

#include "nelmin.h"

C++ polymorphism is used to make the appropriate single- and double-precision call to tanh(). See Example 2.4, "Part 2 of xorNM.cu":

Example 2.4
// Define the sigmoidal function
__device__ __host__ inline float G(float x) { return(tanhf(x)); }
__device__ __host__ inline double G(double x) { return(tanh(x)); }

For convenience, a simple class, ObjFunc, was defined to combine the functor with some simple data management methods.7 C++ templates are utilized in the example code, so the performance of the functor and overall application using single- or double-precision floating point can be evaluated. The CalcError functor defined previously is included after the comment. This use of a C-preprocessor include is not recommended for production applications. For training purposes, the functor was purposely isolated to make it easy to modify by the reader. Notice in this code section that CalcError is called directly in an OpenMP reduction loop when running on the host processor when USE_HOST is

7 C++ objects encapsulate both data and methods that operate on the data into a single convenient package.

defined, which demonstrates the ability of Thrust to generate code for both GPU- and host-based functors. See Example 2.5, “Part 3 of xorNM.cu”:

Example 2.5
// This is a convenience class to hold all the examples and
// architecture information. Most is boilerplate. CalcError
// is where all the work happens.
template<typename Real>
class ObjFunc {
private:
  double objFuncCallTime;
  unsigned int objFuncCallCount;
protected:
  int nExamples;
#ifdef USE_HOST
  thrust::host_vector<Real> h_data;
#else
  thrust::device_vector<Real> d_data;
#endif
  thrust::device_vector<Real> d_param;

public:
  // The CalcError functor goes here
#include "CalcError.h"

  // Boilerplate constructor and helper classes
  ObjFunc() {nExamples = 0; objFuncCallCount=0; objFuncCallTime=0.;}

  double aveObjFuncWallTime() {return(objFuncCallTime/objFuncCallCount);}
  double totalObjFuncWallTime() {return(objFuncCallTime);}
  int get_nExamples() {return(nExamples);}

  void setExamples(thrust::host_vector<Real>& _h_data) {
#ifdef USE_HOST
    h_data = _h_data;
#else
    d_data = _h_data;
#endif
    nExamples = _h_data.size()/exLen;
    d_param = thrust::device_vector<Real>(nParam);
  }

#ifdef USE_HOST
  Real objFunc(Real *p) {
    if(nExamples == 0) {cerr << "data not set" << endl; exit(1);}

    double startTime=omp_get_wtime();

    Real sum = 0.;
    CalcError getError(&h_data[0], p, nInput, exLen);

#pragma omp parallel for reduction(+ : sum)
    for(int i=0; i < nExamples; ++i) {
      Real d = getError(i);
      sum += d;
    }

    objFuncCallTime += (omp_get_wtime() - startTime);
    objFuncCallCount++;
    return(sum);
  }
#else
  Real objFunc(Real *p) {
    if(nExamples == 0) {cerr << "data not set" << endl; exit(1);}

    double startTime=omp_get_wtime();

    thrust::copy(p, p+nParam, d_param.begin());

    CalcError getError(thrust::raw_pointer_cast(&d_data[0]),
                       thrust::raw_pointer_cast(&d_param[0]),
                       nInput, exLen);
    Real sum = thrust::transform_reduce(
                   thrust::counting_iterator<unsigned int>(0),
                   thrust::counting_iterator<unsigned int>(nExamples),
                   getError, (Real) 0., thrust::plus<Real>());
    objFuncCallTime += (omp_get_wtime() - startTime);
    objFuncCallCount++;
    return(sum);
  }
#endif
};

The call to the objective function must be wrapped to return the appropriate type (e.g., float or double), as seen in Example 2.6, "Part 4 of xorNM.cu." C++ polymorphism selects the appropriate method depending on the type of the parameters passed to the wrapper.

Example 2.6
// Wrapper so the objective function can be called
// as a pointer to function for C-style libraries.
// Note: polymorphism allows easy use of
// either float or double types.
void* objFunc_object=NULL;
float func(float* param) {
  if(objFunc_object)
    return ((ObjFunc<float>*) objFunc_object)->objFunc(param);
  return(0.);
}
double func(double* param) {
  if(objFunc_object)
    return ((ObjFunc<double>*) objFunc_object)->objFunc(param);
  return(0.);
}

// get a uniform random number between -1 and 1
inline float f_rand() {
  return 2.*(rand()/((float)RAND_MAX)) - 1.;
}

A test function is defined to calculate the output of the function based on a set of input values. See Example 2.7, “Part 5 of xorNM.cu”:

Example 2.7
template<typename Real>
void testNN(const Real *p, const Real *in, Real *out)
{
  int index=0;
  Real h1 = p[index++];
  Real o = p[index++];

  h1 += in[0] * p[index++];
  h1 += in[1] * p[index++];
  h1 = G(h1);

  o += in[0] * p[index++];
  o += in[1] * p[index++];
  o += h1 * p[index++];

  out[0]=o;
}

For simplicity, the XOR truth table is duplicated to verify that the code works with large data sets and to provide the basis for a reasonable performance comparison between the GPGPU and host processor(s). As shown in Example 2.8, a small amount of uniformly distributed noise with a variance of 0.01 has been added to each training example, as most training data sets are noisy.

By simply defining an alternative functor and data generator, this same code framework will be adapted in Chapter 3 to solve other optimization problems. See Example 2.8, “Part 6 of xorNM.cu”:

Example 2.8
template<typename Real>
void genData(thrust::host_vector<Real> &h_data, int nVec, Real xVar)
{
  // Initialize the data via replication of the XOR truth table
  Real dat[] = { 0.1, 0.1, 0.1,
                 0.1, 0.9, 0.9,
                 0.9, 0.1, 0.9,
                 0.9, 0.9, 0.1};

  for(int i=0; i < nVec; i++)
    for(int j=0; j < 12; j++)
      h_data.push_back(dat[j] + xVar * f_rand());
}

The testTraining() method defined in Example 2.9, “Part 7 of xorNM.cu,” sets up the Nelder-Mead optimization and calls the test function to verify the results:

Example 2.9
template<typename Real>
void testTraining()
{
  ObjFunc<Real> testObj;
  const int nParam = testObj.nParam;
  cout << "nParam " << nParam << endl;

  // generate the test data
  const int nVec=1000 * 1000 * 10;
  thrust::host_vector<Real> h_data;
  genData<Real>(h_data, nVec, 0.01);
  testObj.setExamples(h_data);
  int nExamples = testObj.get_nExamples();
  cout << "GB data " << (h_data.size()*sizeof(Real)/1e9) << endl;

  // set the Nelder-Mead starting conditions
  int icount, ifault, numres;
  vector<Real> start(nParam);
  vector<Real> step(nParam,1.);
  vector<Real> xmin(nParam);

  srand(0);
  for(int i=0; i < start.size(); i++) start[i] = 0.2 * f_rand();

  Real ynewlo = testObj.objFunc(&start[0]);
  Real reqmin = 1.0E-18;
  int konvge = 10;
  int kcount = 5000;

  objFunc_object = &testObj;
  double optStartTime=omp_get_wtime();
  nelmin(func, nParam, &start[0], &xmin[0], &ynewlo, reqmin,
         &step[0], konvge, kcount, &icount, &numres, &ifault);
  double optTime=omp_get_wtime()-optStartTime;

  cout << endl << " Return code IFAULT = " << ifault << endl << endl;
  cout << " Estimate of minimizing value X*:" << endl << endl;
  cout << " F(X*) = " << ynewlo << endl;
  cout << " Number of iterations = " << icount << endl;
  cout << " Number of restarts = " << numres << endl << endl;

  cout << "Average wall time for ObjFunc " << testObj.aveObjFuncWallTime() << endl;
  cout << "Total wall time in optimization method " << optTime << endl;
  cout << "Percent time in objective function "
       << (100.*(testObj.totalObjFuncWallTime()/optTime)) << endl;

  int index=0, nTest=4;
  cout << "pred known" << endl;
  thrust::host_vector<Real> h_test;
  thrust::host_vector<Real> h_in(testObj.nInput);
  thrust::host_vector<Real> h_out(testObj.nOutput);
  genData<Real>(h_test, nTest, 0.0); // note: no variance for the test
  for(int i=0; i< nTest; i++) {
    h_in[0] = h_test[index++];
    h_in[1] = h_test[index++];

    testNN(&xmin[0],&h_in[0],&h_out[0]);
    cout << setprecision(1) << setw(4) << h_out[0]
         << " " << h_test[index] << endl;
    index++;
  }
}

The call to main() is very simple; see Example 2.10, “Part 8 of xorNM.cu”:

Example 2.10
int main ()
{
#ifdef USE_DBL
  testTraining<double>();
#else
  testTraining<float>();
#endif
  return 0;
}

Before building, put the Nelder-Mead template from the end of this chapter into the file nelmin.h in the same directory as the xorNM.cu source code, and ensure that CalcError.h is in that directory as well. Building all variants of xorNM.cu is then straightforward. The option -use_fast_math tells the compiler to use native GPU functions such as exp() that are fast but might not be as accurate as the software routines. Under Linux, type the commands in Example 2.11, “Compiling xorNM.cu under Linux”:

Example 2.11
# single-precision GPU test
nvcc -O3 -Xcompiler -fopenmp -use_fast_math xorNM.cu -o xorNM_GPU32
# single-precision HOST test
nvcc -D USE_HOST -O3 -Xcompiler -fopenmp xorNM.cu -o xorNM_CPU32

# double-precision GPU test
nvcc -D USE_DBL -O3 -Xcompiler -fopenmp -use_fast_math xorNM.cu -o xorNM_GPU64
# double-precision HOST test
nvcc -D USE_DBL -D USE_HOST -O3 -Xcompiler -fopenmp xorNM.cu -o xorNM_CPU64

Windows- and MacOS-based computers can utilize similar commands once nvcc is installed and the build environment is correctly configured. It is also straightforward to compile this example with Microsoft Visual Studio. Please utilize one of the many tutorials available on the Internet to import and compile CUDA applications using the Visual Studio GUI (graphical user interface). Example 2.12, “Compiling xorNM.cu under cygwin,” is an example command to build a single-precision version of xorNM.cu for a 20-series GPU under cygwin:

Example 2.12
nvcc -arch=sm_20 -Xcompiler "/openmp /O2" -use_fast_math xorNM.cu -o xor_gpu32.exe

Running the program shows that the ANN is indeed trained to solve the XOR problem, as illustrated by the golden test results highlighted in Example 2.13, “Example Output,” which shows that the XOR truth table is correctly predicted using the optimized parameters:

Example 2.13
sizeof(Real) 4
Gigabytes in training vector 0.48

  Return code IFAULT = 2

  Estimate of minimizing value X*:

  F(X*) = 4.55191e-08
  Number of iterations = 5001
  Number of restarts = 0

Average wall time for ObjFunc 0.00583164
Total wall time in optimization method 29.5887
Percent time in objective function 98.5844
-- XOR Golden test --
pred known
 0.1 0.1
 0.9 0.9
 0.9 0.9
 0.1 0.1

PERFORMANCE RESULTS ON XOR

Table 2.1 provides a sorted list of the average time to calculate the XOR objective function under a variety of precision, optimization technique, and operating system configurations. Performance was measured using a 2.53 GHz quad-core Xeon e5630 and an NVIDIA C2070 with ECC turned off. (ECC, or Error Checking and Correcting memory, means that the memory subsystem checks for memory errors, which can slow performance.) The performance of an inexpensive GeForce GTX280 gaming GPU performing single-precision Nelder-Mead optimization is also shown in the table.

PERFORMANCE DISCUSSION

The average times to calculate the XOR objective function reported in Table 2.1 are based on wall clock time collected with a call to the OpenMP omp_get_wtime() method.

Table 2.1 Observed Timings when Training an XOR Neural Network

OS     Machine         Opt Method           Precision  Average Obj     % Func  Speedup over  Speedup over
                                            (bits)     Func Time (s)   Time    Quad-Core     Single-Core
Linux  NVIDIA C2070    Nelder-Mead          32         0.00532         100.0   85            341
Win7   NVIDIA C2070    Nelder-Mead          32         0.00566         100.0   81            323
Linux  NVIDIA GTX280   Nelder-Mead          32         0.01109          99.2   41            163
Linux  NVIDIA C2070    Nelder-Mead          64         0.01364         100.0   40            158
Win7   NVIDIA C2070    Nelder-Mead          64         0.01612         100.0   22             87
Linux  NVIDIA C2070    Levenberg-Marquardt  32         0.04313           2.7   10             38
Linux  NVIDIA C2070    Levenberg-Marquardt  64         0.08480           4.4    6             23
Linux  Intel e5630     Levenberg-Marquardt  32         0.41512          21.1
Linux  Intel e5630     Levenberg-Marquardt  64         0.49745          20.8
Linux  Intel e5630     Nelder-Mead          32         0.45312         100.0
Linux  Intel e5630     Nelder-Mead          64         0.53872         100.0

Wall clock time is an unreliable performance-measuring tool, as any other process running on the host can affect the accuracy of the timing.8 Chapter 3 discusses more advanced CUDA performance analysis tools such as the NVIDIA visual profiler for Linux and Parallel Nsight for Windows. All tests ran on an idle system to attain the most accurate results. Reported speedups are based on the average wall clock time consumed during the calculation of the objective function, including all host and GPU data transfers required by the objective function. OpenMP was used to multithread the host-only tests to utilize all four cores of the Xeon processor.
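As a worked check (derived from the entries in Table 2.1, not additional data), the 85x quad-core speedup reported for the Linux single-precision C2070 run is consistent with the ratio of the tabulated average objective-function times: 0.45312 s for the quad-core Nelder-Mead host run divided by 0.00532 s for the GPU run is approximately 85.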

8 It is worth noting that even tiny delays incurred by other processes briefly starting and stopping can introduce significant performance degradation in large programs running at scale, as discussed in the paper “The Case of the Missing Supercomputer Performance” (Petrini, Kerbyson, & Pakin, 2003). Cloud-based computing environments, particularly those that utilize virtual machines, are especially sensitive to this jitter.

As Table 2.1 shows, the best performance was achieved under Linux when running a single-precision test on an NVIDIA C2070 GPGPU. Improvements in the 20-series Fermi architecture have increased double-precision performance to within a factor of 2 of single-precision performance, which is significantly better than in previous generations of NVIDIA GPGPUs. Still, these older GPUs are quite powerful, as demonstrated by the 41-times-faster single-precision performance of an NVIDIA 10-series GeForce GTX280 gaming GPU over all four cores of a quad-core Xeon processor. This same midlevel gaming GPU also delivers roughly half the single-precision performance of the C2070.

The Levenberg-Marquardt results are interesting because the order-of-magnitude performance decrease compared to the Nelder-Mead results highlights the cost of transferring the measurement vector from the GPU to the host, as well as the cost of the host-based computations needed by Levenberg-Marquardt. This particular partitioning of a Levenberg-Marquardt optimization problem requires a large data transfer. The size of the transfer scales with the size of the data and thus introduces a scaling barrier that can make the method too expensive for larger problems and multi-GPU data sets. Still, Levenberg-Marquardt is a valid technique for GPU computing, as it can find good minima when other techniques fail. Counterintuitively, it can also run faster, because it may find a minimum with just a few function calls.

In contrast, Nelder-Mead does not impose a scalability limitation, as it requires only that the GPU return a single floating-point value. Other optimization techniques such as Powell's method and conjugate gradient share this desirable characteristic.

Using alternative optimization methods such as Powell's method and conjugate gradient, variants of this example code have been shown to scale to more than 500 GPGPUs on the TACC Longhorn GPGPU cluster and to tens of thousands of processors on both the TACC Ranger and Thinking Machines supercomputers (Farber, 1992; Thearling, 1995). Figure 2.5 shows the minimum and maximum speedup observed when running on 500 GPUs.

FIGURE 2.5 Scaling results to 500 GPUs: maximum and minimum speedup relative to one GPU over five runs on the TACC Longhorn GPU cluster.

It is believed that the network connecting the distributed nodes on the Longhorn GPU cluster is responsible for the variation; the benchmark was run on an idle system. More about multi-GPU applications will be discussed in Chapter 7 and in the application chapters, Chapters 9–12, at the end of this book.

The advantage of the generic approach utilized by many numerical libraries is that, simply by defining a new function of interest, an application programmer can use the library to solve many problems. For example, the source code in this chapter will be modified in Chapter 3 to use a different functor that performs a PCA (Principal Components Analysis). PCA, along with its nonlinear variant NLPCA (Nonlinear Principal Components Analysis), has wide applicability in vision research and handwriting analysis (Hinton & Salakhutdinov, 2006)9 as well as biological modeling (Scholz, 2011), financial modeling, Internet search, and many other commercial and academic fields.

Other powerful machine-learning techniques such as Hidden Markov Models (HMM), genetic algorithms, Bayesian networks, and many others also parallelize very well on GPGPU hardware. As with neural networks, each has various strengths and weaknesses. By redefining the objective function, this same example code can implement other popular techniques (a sketch of one alternative functor follows the list below), such as:

■ Optimization
■ Locally weighted linear regression (LWLR)
■ Naive Bayes (NB)
■ Gaussian Discriminative Analysis (GDA)
■ K-means
■ Logistic regression (LR)
■ Independent component analysis (ICA)
■ Expectation maximization (EM)
■ Support vector machine (SVM)
■ Others, such as multidimensional scaling (MDS), ordinal MDS, and other variants
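By way of illustration, the following is a minimal sketch (not code from the book or its website) of how a logistic-regression objective could be dropped into the same framework simply by writing a different error functor. The class name CalcLRError, the parameter layout (a bias in p[0] followed by one weight per input), and the example layout (nInput features followed by a 0/1 label) are all assumptions made for this sketch; Real, __device__, and __host__ are used exactly as in the CalcError functors shown in this book.

template<typename Real>
struct CalcLRError {
  const Real* examples;   // flattened examples, exLen values per example
  const Real* p;          // parameters: bias, then one weight per input (assumed layout)
  const int nInput;
  const int exLen;        // assumed: nInput features followed by a 0/1 label

  CalcLRError(const Real* _examples, const Real* _p, const int _nInput, const int _exLen)
    : examples(_examples), p(_p), nInput(_nInput), exLen(_exLen) {};

  __device__ __host__ Real operator()(unsigned int tid) {
    const Real* in = &examples[tid * exLen];
    Real z = p[0];                                   // bias
    for(int i=0; i < nInput; i++) z += in[i] * p[i+1];
    Real yhat = 1. / (1. + exp(-z));                 // logistic response
    Real y = in[nInput];                             // 0/1 label stored after the features
    // per-example negative log-likelihood
    return -( y * log(yhat) + (1. - y) * log(1. - yhat) );
  }
};

Summing this value over all examples with thrust::transform_reduce, exactly as the XOR code does, would turn the same Nelder-Mead optimizer into a logistic-regression trainer.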

SUMMARY

GPGPUs today possess a peak floating-point capability that was beyond the machines available in even the most advanced high-performance computing (HPC) centers until a Sandia National Laboratories supercomputer performed a trillion floating-point operations per second in December 1996. Much of the leading-edge research proposed for such a teraflop supercomputer can be performed today by students anywhere in the world using a few GPGPUs in a workstation combined with a fast RAID (Redundant Array of Independent Disks) disk subsystem.

9 Available online at http://www.cs.toronto.edu/~hinton/science.pdf.

The techniques and examples discussed in this chapter can be applied to a multitude of data fitting, data analysis, dimension reduction, vision, and classification problems. Conveniently, they are able to scale from laptops up to efficient use of the largest supercomputers in the world.

CUDA-literate programmers can bring this new world of computational power to legacy projects. Along with faster application speeds, GPGPU technology can advance the state of the art by allowing more accurate approximations and computational techniques to be utilized and, ultimately, more accurate models to be created. Competition is fierce in both commercial and academic circles, which is why GPU computing is making such a huge impact on both commercial products and scientific research. The competition is global; this hardware is accessible to anyone in the world who wishes to compete, as opposed to the past, when competition was restricted to a relatively small community of individuals with access to big, parallel machines.

THE C++ NELDER-MEAD TEMPLATE

#ifndef NELMIN_H
#define NELMIN_H

// Nelder-Mead Minimization Algorithm ASA047
// from the Applied Statistics Algorithms available
// in STATLIB. Adapted from the C version by J. Burkardt
// http://people.sc.fsu.edu/~jburkardt/c_src/asa047/asa047.html

template<typename Real>
void nelmin ( Real (*fn)(Real*), int n, Real start[], Real xmin[],
              Real *ynewlo, Real reqmin, Real step[], int konvge, int kcount,
              int *icount, int *numres, int *ifault )
{
  const Real ccoeff = 0.5;
  const Real ecoeff = 2.0;
  const Real eps = 0.001;
  const Real rcoeff = 1.0;
  int ihi, ilo, jcount, l, nn;
  Real del, dn, dnn;
  Real rq, x, y2star, ylo, ystar, z;

  // Check the input parameters.
  if ( reqmin <= 0.0 ) { *ifault = 1; return; }
  if ( n < 1 ) { *ifault = 1; return; }
  if ( konvge < 1 ) { *ifault = 1; return; }

  vector<Real> p(n*(n+1));
  vector<Real> pstar(n);
  vector<Real> p2star(n);
  vector<Real> pbar(n);
  vector<Real> y(n+1);

  *icount = 0;
  *numres = 0;

  jcount = konvge;
  dn = ( Real ) ( n );
  nn = n + 1;
  dnn = ( Real ) ( nn );
  del = 1.0;
  rq = reqmin * dn;

  // Initial or restarted loop.
  for ( ; ; ) {
    for (int i = 0; i < n; i++ ) { p[i+n*n] = start[i]; }
    y[n] = (*fn)( start );
    *icount = *icount + 1;

    for (int j = 0; j < n; j++ ) {
      x = start[j];
      start[j] = start[j] + step[j] * del;
      for (int i = 0; i < n; i++ ) { p[i+j*n] = start[i]; }
      y[j] = (*fn)( start );
      *icount = *icount + 1;
      start[j] = x;
    }
    // The simplex construction is complete.
    //
    // Find highest and lowest Y values. YNEWLO = Y(IHI) indicates
    // the vertex of the simplex to be replaced.
    ylo = y[0];
    ilo = 0;

    for (int i = 1; i < nn; i++ ) {
      if ( y[i] < ylo ) { ylo = y[i]; ilo = i; }
    }

    // Inner loop.
    for ( ; ; ) {
      if ( kcount <= *icount ) { break; }
      *ynewlo = y[0];
      ihi = 0;

      for (int i = 1; i < nn; i++ ) {
        if ( *ynewlo < y[i] ) { *ynewlo = y[i]; ihi = i; }
      }
      // Calculate PBAR, the centroid of the simplex vertices
      // excepting the vertex with Y value YNEWLO.
      for (int i = 0; i < n; i++ ) {
        z = 0.0;
        for (int j = 0; j < nn; j++ ) { z = z + p[i+j*n]; }
        z = z - p[i+ihi*n];
        pbar[i] = z / dn;
      }
      // Reflection through the centroid.
      for (int i = 0; i < n; i++ ) {
        pstar[i] = pbar[i] + rcoeff * ( pbar[i] - p[i+ihi*n] );
      }
      ystar = (*fn)( &pstar[0] );
      *icount = *icount + 1;
      // Successful reflection, so extension.
      if ( ystar < ylo ) {
        for (int i = 0; i < n; i++ ) {
          p2star[i] = pbar[i] + ecoeff * ( pstar[i] - pbar[i] );
        }
        y2star = (*fn)( &p2star[0] );
        *icount = *icount + 1;
        // Check extension.
        if ( ystar < y2star ) {
          for (int i = 0; i < n; i++ ) { p[i+ihi*n] = pstar[i]; }
          y[ihi] = ystar;
        } else {
          // Retain extension or contraction.
          for (int i = 0; i < n; i++ ) { p[i+ihi*n] = p2star[i]; }
          y[ihi] = y2star;
        }
      } else {
        // No extension.
        l = 0;
        for (int i = 0; i < nn; i++ ) {
          if ( ystar < y[i] ) l += 1;
        }

        if ( 1 < l ) {
          for (int i = 0; i < n; i++ ) { p[i+ihi*n] = pstar[i]; }
          y[ihi] = ystar;
        }
        // Contraction on the Y(IHI) side of the centroid.
        else if ( l == 0 ) {
          for (int i = 0; i < n; i++ ) {
            p2star[i] = pbar[i] + ccoeff * ( p[i+ihi*n] - pbar[i] );
          }
          y2star = (*fn)( &p2star[0] );
          *icount = *icount + 1;
          // Contract the whole simplex.
          if ( y[ihi] < y2star ) {
            for (int j = 0; j < nn; j++ ) {
              for (int i = 0; i < n; i++ ) {
                p[i+j*n] = ( p[i+j*n] + p[i+ilo*n] ) * 0.5;
                xmin[i] = p[i+j*n];
              }
              y[j] = (*fn)( xmin );
              *icount = *icount + 1;
            }
            ylo = y[0];
            ilo = 0;

            for (int i = 1; i < nn; i++ ) {
              if ( y[i] < ylo ) { ylo = y[i]; ilo = i; }
            }
            continue;
          }
          // Retain contraction.
          else {
            for (int i = 0; i < n; i++ ) { p[i+ihi*n] = p2star[i]; }
            y[ihi] = y2star;
          }
        }
        // Contraction on the reflection side of the centroid.
        else if ( l == 1 ) {
          for (int i = 0; i < n; i++ ) {
            p2star[i] = pbar[i] + ccoeff * ( pstar[i] - pbar[i] );
          }
          y2star = (*fn)( &p2star[0] );
          *icount = *icount + 1;
          //
          // Retain reflection?
          //
          if ( y2star <= ystar ) {
            for (int i = 0; i < n; i++ ) { p[i+ihi*n] = p2star[i]; }
            y[ihi] = y2star;
          } else {
            for (int i = 0; i < n; i++ ) { p[i+ihi*n] = pstar[i]; }
            y[ihi] = ystar;
          }
        }
      }
      // Check if YLO improved.
      if ( y[ihi] < ylo ) { ylo = y[ihi]; ilo = ihi; }
      jcount = jcount - 1;

      if ( 0 < jcount ) { continue; }
      // Check to see if minimum reached.
      if ( *icount <= kcount ) {
        jcount = konvge;

        z = 0.0;
        for (int i = 0; i < nn; i++ ) { z = z + y[i]; }
        x = z / dnn;

        z = 0.0;
        for (int i = 0; i < nn; i++ ) { z = z + pow ( y[i] - x, 2 ); }

        if ( z <= rq ) { break; }
      }
    }
    // Factorial tests to check that YNEWLO is a local minimum.
    for (int i = 0; i < n; i++ ) { xmin[i] = p[i+ilo*n]; }
    *ynewlo = y[ilo];

    if ( kcount < *icount ) { *ifault = 2; break; }

    *ifault = 0;

    for (int i = 0; i < n; i++ ) {
      del = step[i] * eps;
      xmin[i] = xmin[i] + del;
      z = (*fn)( xmin );
      *icount = *icount + 1;
      if ( z < *ynewlo ) { *ifault = 2; break; }
      xmin[i] = xmin[i] - del - del;
      z = (*fn)( xmin );
      *icount = *icount + 1;
      if ( z < *ynewlo ) { *ifault = 2; break; }
      xmin[i] = xmin[i] + del;
    }

    if ( *ifault == 0 ) { break; }
    // Restart the procedure.
    for (int i = 0; i < n; i++ ) { start[i] = xmin[i]; }
    del = eps;
    *numres = *numres + 1;
  }
  return;
}
#endif

CHAPTER 3

The CUDA Tool Suite: Profiling a PCA/NLPCA Functor

CUDA enables efficient GPGPU computing with just a few simple additions to the C language. Simplicity of expression, however, does not equate to simplicity of program execution. As when developing applications for any computer, identifying performance bottlenecks can be complicated.

Following the same economy of change used to adapt C and C++, NVIDIA has extended several popular profiling tools to support GPU computing. These are tools that most Windows and UNIX developers are already proficient and comfortable with, such as gprof and Visual Studio. Additional tools, such as hardware-level GPU profiling and a visual profiler, have been added. Those familiar with building, debugging, and profiling software under Windows and UNIX should find the transition to CUDA straightforward. All CUDA tools are freely available on the NVIDIA website, including the professional edition of Parallel Nsight for Microsoft Visual Studio.

In this chapter, the Microsoft and UNIX profiling tools for CUDA will be used to analyze the performance and scaling of the Nelder-Mead optimization technique when optimizing a PCA and an NLPCA functor. At the end of the chapter, the reader will have a basic understanding of:

■ PCA and NLPCA analysis, including the applicability of these techniques to data mining, dimension reduction, feature extraction, and other data analysis and pattern recognition problems.
■ How the CUDA profiling tools can be used to identify bottlenecks and scaling issues in algorithms.
■ Scalability issues, and how even a simple dynamic memory allocation can have dramatic performance implications for an application.


PCA AND NLPCA

Principal Components Analysis (PCA) is extensively used in data mining and data analysis to (1) reduce the dimensionality of a data set and (2) extract features from a data set. Feature extraction refers to the identification of the salient aspects or properties of data to facilitate its use in a subsequent task, such as regression or classification (Duda & Hart, 1973). Obviously, extracting the principal features of a data set can help with interpretation, analysis, and understanding.

PCA accounts for the maximum amount of variance in a data set using a set of straight lines, where each line is defined by a weighted linear combination of the observed variables. The first line, or principal component, accounts for the greatest amount of variance; each succeeding component accounts for as much of the remaining variance as possible while being orthogonal to (uncorrelated with) the preceding components. For example, a single line (or principal component) would account for most of the variance in the cigar-shaped distribution illustrated in Figure 3.1.

PCA utilizes straight lines; NLPCA can utilize continuous open or closed curves to account for variance in data (Hsieh, 2004). A circle is one example of a closed curve that joins itself so that there are no end points. As a result, NLPCA has the ability to represent nonlinear problems in a lower-dimensional space. Figure 3.2 illustrates an open curve and a data set in which a single curve could account for most of the variance.
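Stated a little more formally (a standard characterization of PCA, not text from this book): the first principal component is the unit-length weight vector w that maximizes the variance of the projected data wᵀx, and each later component maximizes the variance that remains subject to being orthogonal to all previously found components.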

FIGURE 3.1 A linear PCA data set.

FIGURE 3.2 An NLPCA data set.

NLPCA has wide applicability to numerous challenging problems, including image and handwriting analysis (Hinton & Salakhutdinov, 2006; Schölkopf & Müller, 1998) as well as biological modeling (Scholz, 2007), climate (Hsieh, 2001; Monahan, 2000), and chemistry (Kramer, 1991). Geoffrey E. Hinton has an excellent set of tutorials on his website at the University of Toronto (Hinton, 2011). Chapter 10 expands this example with MPI (Message Passing Interface) so that it can run across hundreds of GPUs, and Chapter 12 provides a working framework for real-time vision analysis that can be used as a starting point to explore these and more advanced techniques.

Autoencoders

In 1982, Erkki Oja proposed the use of a restricted number of linear hidden neurons, or bottleneck neurons, in a neural network to perform PCA analysis (Oja, 1982). Figure 3.3 illustrates a simple linear network with two inputs and one hidden neuron.

Such networks were later generalized to perform NLPCA. Figure 3.4 illustrates an architecture recommended by Kramer (1991) that uses multiple hidden layers with a sigmoidal nonlinear operator. The bottleneck neuron can be either linear or nonlinear. Hsieh (2001) extended this architecture to closed curves by using a circular node at the bottleneck.

FIGURE 3.3 A two-input PCA network.

FIGURE 3.4 An NLPCA network with one bottleneck neuron.

During training, the ANN essentially teaches itself, because the input vectors are used as the known output vectors. This forces the lower half of the network to transform each input vector from a high- to a low-dimensional space (from two dimensions to one in Figure 3.4) in such a way that the upper half of the network can correctly reconstruct the original high-dimensional input vector at the output neurons. A least-mean-squared error over all the vectors is commonly used to measure how well the network does this. Such networks are sometimes called autoencoders. A perfect encoding would result in a zero error on the training set, as each vector would be exactly reconstructed at the output neurons. During analysis, the lower half of the network is used to calculate the low-dimensional encoding. Autoencoders have been studied extensively in the literature (Hinton & Salakhutdinov, 2006; Kramer, 1991) and are part of several common toolkits, including MATLAB. One example is the nlpca toolkit on nlpca.org.
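Written as an equation (a restatement of the training criterion just described, not text from the book): for training vectors x_k and network output f(x_k; p), the parameters p are chosen to minimize E(p) = Σ_k ||x_k − f(x_k; p)||², which is exactly the sum of squared reconstruction errors that the CalcError functors in this chapter accumulate.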

An Example Functor for PCA Analysis

Example 3.1, “A PCA Functor Implementing the Network in Figure 3.3,” demonstrates the simplicity of the functor that describes the PCA network illustrated in Figure 3.3. This functor is designed to be substituted into the Nelder-Mead optimization source code from Chapter 2, which contains the typename definition for Real and the variables used in this calculation.

Example 3.1
static const int nInput = 2;
static const int nH1 = 1;
static const int nOutput = nInput;
static const int nParam = (nOutput+nH1)    // Neuron Offsets
                        + (nInput*nH1)     // connections from I to H1
                        + (nH1*nOutput);   // connections from H1 to O
static const int exLen = nInput;

struct CalcError {
  const Real* examples;
  const Real* p;
  const int nInput;
  const int exLen;

  CalcError( const Real* _examples, const Real* _p,
             const int _nInput, const int _exLen)
    : examples(_examples), p(_p), nInput(_nInput), exLen(_exLen) {};

  __device__ __host__ Real operator()(unsigned int tid) {
    const register Real* in = &examples[tid * exLen];
    register int index=0;

    register Real h0 = p[index++];

    for(int i=0; i < nInput; i++) {
      register Real input=in[i];
      h0 += input * p[index++];
    }

    register Real sum = 0.;
    for(int i=0; i < nInput; i++) {
      register Real o = p[index++];
      o += h0 * p[index++];
      o -= in[i];
      sum += o*o;
    }
    return sum;
  }
};

Example 3.2, “A Linear Data Generator,” defines the subroutine genData(), which creates a two-dimensional data set based on the linear relationship between the two variables z1 and z2 shown in Equation 3.1. Noise is superimposed onto t according to two uniformly distributed, zero-mean sequences e1 and e2. A variance of 0.1 was used to generate the training data and the data shown in Figure 3.1.

Example 3.2
template<typename Real>
void genData(thrust::host_vector<Real> &h_data, int nVec, Real xVar)
{
  Real xMax = 1.1;
  Real xMin = -xMax;
  Real xRange = (xMax - xMin);
  for(int i=0; i < nVec; i++) {
    Real t = xRange * f_rand();
    Real z1 = t + xVar * f_rand();
    Real z2 = t + xVar * f_rand();
    h_data.push_back( z1 );
    h_data.push_back( z2 );
  }
}

z1 = t + e1,   e1 = N{0, 0.1}
z2 = t + e2,   e2 = N{0, 0.1}          (3.1)

Figure 3.5 shows how the PCA network was able to reconstruct 100 randomly generated data points using genData with a zero variance (xVar equals zero). The complete source code for this example can be downloaded from the book's website. Those who wish to adapt the xorNM.cu example from Chapter 2 can use the functor in Example 3.1 and the data generator in Example 3.2; in addition, the test for correctness will need to be modified. To attain high accuracy, this example required that kcount be set to 50000.
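For those adapting the XOR code, the following is a minimal sketch of such a modified correctness test (not code from the book or its website): it simply mirrors the parameter ordering of the CalcError functor in Example 3.1 and assumes the nInput and nOutput constants defined there.

template<typename Real>
void testNN(const Real *p, const Real *in, Real *out)
{
  int index=0;
  Real h0 = p[index++];                          // bottleneck neuron offset
  for(int i=0; i < nInput; i++) h0 += in[i] * p[index++];

  for(int i=0; i < nOutput; i++) {               // linear reconstruction of each input
    Real o = p[index++];                         // output neuron offset
    o += h0 * p[index++];                        // connection from bottleneck to output
    out[i] = o;
  }
}

Calling this routine for a handful of points generated with zero variance, as the XOR golden test does, lets the reconstructed outputs be compared directly against the inputs.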

An Example Functor for NLPCA Analysis

The NLPCA functor in Example 3.3 implements the network shown in Figure 3.4.

FIGURE 3.5 Nelder-Mead-trained PCA network reconstructing 100 random points.

This functor is designed to be substituted into the Nelder-Mead optimization source code from Chapter 2, which contains the typename definition for Real and the variables used in this calculation.

Example 3.3
static const int nInput = 2;
static const int nH1 = 4;
static const int nH2 = 1;
static const int nH3 = 4;
static const int nOutput = nInput;
static const int nParam = (nOutput+nH1+nH2+nH3)  // Neuron Offsets
                        + (nInput*nH1)           // connections from I to H1
                        + (nH1*nH2)              // connections from H1 to H2
                        + (nH2*nH3)              // connections from H2 to H3
                        + (nH3*nOutput);         // connections from H3 to O
static const int exLen = nInput;

struct CalcError {
  const Real* examples;
  const Real* p;
  const int nInput;
  const int exLen;

  CalcError( const Real* _examples, const Real* _p,
             const int _nInput, const int _exLen)
    : examples(_examples), p(_p), nInput(_nInput), exLen(_exLen) {};

  __device__ __host__ Real operator()(unsigned int tid) {
    const register Real* in = &examples[tid * exLen];
    register int index=0;

    register Real h2_0 = p[index++]; // bottleneck neuron
    {
      register Real h1_0 = p[index++];
      register Real h1_1 = p[index++];
      register Real h1_2 = p[index++];
      register Real h1_3 = p[index++];
      for(int i=0; i < nInput; i++) {
        register Real input=in[i];
        h1_0 += input * p[index++];
        h1_1 += input * p[index++];
        h1_2 += input * p[index++];
        h1_3 += input * p[index++];
      }
      h1_0 = G(h1_0);
      h1_1 = G(h1_1);
      h1_2 = G(h1_2);
      h1_3 = G(h1_3);

      h2_0 += p[index++] * h1_0;
      h2_0 += p[index++] * h1_1;
      h2_0 += p[index++] * h1_2;
      h2_0 += p[index++] * h1_3;
    }
    register Real h3_0 = p[index++];
    register Real h3_1 = p[index++];
    register Real h3_2 = p[index++];
    register Real h3_3 = p[index++];
    h3_0 += p[index++] * h2_0;
    h3_1 += p[index++] * h2_0;
    h3_2 += p[index++] * h2_0;
    h3_3 += p[index++] * h2_0;
    h3_0 = G(h3_0);
    h3_1 = G(h3_1);
    h3_2 = G(h3_2);
    h3_3 = G(h3_3);

    register Real sum = 0.;
    for(int i=0; i < nOutput; i++) {
      register Real o = p[index++];
      o += h3_0 * p[index++];
      o += h3_1 * p[index++];
      o += h3_2 * p[index++];
      o += h3_3 * p[index++];
      o -= in[i];
      sum += o*o;
    }
    return sum;
  }
};

The genData method is modified to define a nonlinear relationship between z1 and z2, with e1 and e2 as described previously for the PCA data generator. Example 3.4 implements a nonlinear data generator using Equation 3.2.

z1 = t + e1
z2 = t^3 + e2          (3.2)

Example 3.4
template<typename Real>
void genData(thrust::host_vector<Real> &h_data, int nVec, Real xVar)
{
  Real xMax = 1.1;
  Real xMin = -xMax;
  Real xRange = (xMax - xMin);
  for(int i=0; i < nVec; i++) {
    Real t = xRange * f_rand();
    Real z1 = t + xVar * f_rand();
    Real z2 = t*t*t + xVar * f_rand();
    h_data.push_back( z1 );
    h_data.push_back( z2 );
  }
}


FIGURE 3.6 NLPCA Nelder-Mead-trained curve overlaid with 100 random test points.

Figure 3.6 shows that the network found a reasonable solution with a bottleneck of one neuron. The scatterplot reconstructs the high-dimensional data for 100 randomly chosen points on the curve. The complete source code can be downloaded from the book's website.

OBTAINING BASIC PROFILE INFORMATION

The collection of low-level GPU profiling information can be enabled through the environment variable CUDA_PROFILE. Setting this variable to 1 tells CUDA to collect runtime profiling information from hardware counters on the GPU. Gathering this information does not require special compiler flags, nor does it affect application performance. Setting CUDA_PROFILE to 0 disables the collection of the profile information.

By default, the CUDA profiler writes information to the file cuda_profile_0.log in the current directory. The name of the destination file can be changed through the environment variable CUDA_PROFILE_LOG. Further, the user can specify an options file to select the specific information that is to be gathered from the hardware counters with the environment variable CUDA_PROFILE_CONFIG. The types and amounts of profile information that can be collected differ between GPU generations, so please consult the NVIDIA profiler documentation to see what information can be gathered from your GPGPU.

Examining the contents of cuda_profile_0.log after running nlpcaNM_GPU32 with CUDA_PROFILE=1 shows that indeed this is the raw profiler output.

Just looking at the first few lines, one can see that several data transfers occurred from the host to the device after the application started; we can infer that these are the movement of the training set, parameters, and other information to the GPGPU. Highlighted in Example 3.5, “CUDA GPU Profile for Single-Precision Nelder-Mead NLPCA Optimization,” is a reported occupancy value, which is a measure of GPU thread-level parallelism (TLP). High occupancy is generally good, but high application performance can also be achieved by applications with lower occupancy that attain high instruction-level parallelism (ILP). To understand occupancy, you need to understand some basics about the GPU hardware and execution model, as discussed in the next chapter.

Example 3.5
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 Tesla C2070
# TIMESTAMPFACTOR fffff6faa0f3f368
method,gputime,cputime,occupancy
method=[ memcpyHtoD ] gputime=[ 1752.224 ] cputime=[ 1873.000 ]
method=[ memcpyDtoD ] gputime=[ 179.296 ] cputime=[ 20.000 ]
method=[ _ZN6thrust6detail6device4cuda6detail23launch_closure_by_valueINS2_18for_each_n_closureINS_10device_ptrIyEEjNS0_16generate_functorINS0_12fill_functorIyEEEEEEEEvT_ ] gputime=[ 5.056 ] cputime=[ 7.000 ] occupancy=[ 0.500 ]
method=[ _ZN6thrust6detail6device4cuda6detail23launch_closure_by_valueINS2_18for_each_n_closureINS_10device_ptrIfEEjNS0_16generate_functorINS0_12fill_functorIfEEEEEEEEvT_ ] gputime=[ 2.144 ] cputime=[ 5.000 ] occupancy=[ 0.500 ]
method=[ memcpyDtoD ] gputime=[ 2.048 ] cputime=[ 5.000 ]
method=[ memcpyHtoD ] gputime=[ 1.120 ] cputime=[ 5.000 ]
method=[ memcpyHtoD ] gputime=[ 1.088 ] cputime=[ 3.000 ]
method=[ _ZN6thrust6detail6device4cuda6detail23launch_closure_by_valueINS3_21reduce_n_smem_closureINS_18transform_iteratorIN7ObjFuncIfE9CalcErrorENS_17counting_iteratorIjNS_11use_defaultESB_SB_EEfSB_EElfNS_4plusIfEEEEEEvT_ ] gputime=[ 850.144 ] cputime=[ 5.000 ] occupancy=[ 0.625 ]
method=[ _ZN6thrust6detail6device4cuda6detail23launch_closure_by_valueINS3_21reduce_n_smem_closureINS0_15normal_iteratorINS_10device_ptrIfEEEElfNS_4plusIfEEEEEEvT_ ] gputime=[ 5.952 ] cputime=[ 4.000 ] occupancy=[ 0.500 ]
method=[ memcpyDtoH ] gputime=[ 1.952 ] cputime=[ 856.000 ]
method=[ memcpyHtoD ] gputime=[ 1.120 ] cputime=[ 4.000 ]
method=[ memcpyHtoD ] gputime=[ 1.088 ] cputime=[ 4.000 ]
...

GPROF: A COMMON UNIX PROFILER

To gain insight into host-side performance, a CUDA application can be compiled with the --profile command-line option, as seen in the nvcc command in Example 3.6:

Example 3.6
nvcc --profile -arch=sm_20 -O3 -Xcompiler -fopenmp nlpcaNM.cu

The Linux gprof profiler can then be used to examine the host-side behavior. The output in Example 3.7 shows that the Nelder-Mead optimization method, nelmin() (highlighted in the example), consumed only 10.79% of the runtime. Aside from two host-side methods supporting Thrust on the GPU, the next largest consumer of host-only time is the allocation of the four STL (Standard Template Library) vectors in nelmin(). Gprof reports that nelmin() is called only once, allowing us to conclude that the Nelder-Mead optimization code does not make significant demands on the host processor. See Example 3.7, “Example gprof Output”:

Example 3.7
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
10.79      0.22      0.22         1     0.22     1.08  void nelmin<float>(float (*)(float*), int, float*, float*, float*, float, float*, int, int, int*, int*, int*)
 9.80      0.42      0.20  10508880     0.00     0.00  unsigned long thrust::detail::util::divide_ri(unsigned long, unsigned long)
 8.33      0.59      0.17   3502960     0.00     0.00  thrust::detail::device::cuda::arch::max_active_blocks_per_multiprocessor(cudaDeviceProp const&, cudaFuncAttributes const&, unsigned long, unsigned long)
 4.41      0.68      0.09  12001144     0.00     0.00  void thrust::advance(thrust::detail::normal_iterator&, unsigned long)
 3.92      0.76      0.08  69730562     0.00     0.00  std::vector<float>::operator[](unsigned long)
...

The combined output of the CUDA profiler and gprof demonstrates that the optimization functor dominates the runtime of this application. Although much useful information can be gathered with gprof and the CUDA profiler, it still takes considerable work and some guesswork to extract and analyze the information. More information can be found in the gprof manual, and the NVIDIA documentation on the visual profiler is an excellent source of information about the low-level CUDA profiler.

THE NVIDIA VISUAL PROFILER: COMPUTEPROF

NVIDIA provides the visual profiler, computeprof, for UNIX, Windows, and Mac to collect and analyze the low-level GPU profiler output for the user. As with the low-level profiler, the application does not need to be compiled with any special flags. By default, the visual profiler runs the application 15 times for 30 seconds per run to gather an ensemble of measurements. The visual profiler is simple to use:

■ Launch the CUDA visual profiler using the computeprof command.
■ In the dialog that comes up, press the “Profile application” button in the “Session” pane.
■ In the next dialog, type the full path to your compiled CUDA program in the “Launch” text area.
■ Provide any arguments to your program in the “Arguments” text area. Leave this blank if your code doesn't take any arguments.
■ Make sure the “Enable profiling at application launch” and “CUDA API Trace” settings are checked.
■ Press the “Launch” button at the bottom of the dialog to begin profiling.

Table 3.1 shows the information presented by default on the visual profiler summary screen after profiling the binary for nlpcaNM.cu. Thrust generates some obscure names, but we can infer that launch_closure_by_value-2 is transform_reduce performing the transform with the CalcError functor and that the launch_closure_by_value-3 method is the subsequent reduction operation. The number of transform and reduce operations should be the same; we can infer that the different number of calls in the report (58 vs. 57) was caused by termination of the application by the visual profiler after 30 seconds of runtime.

Table 3.1 Output Summary from the Visual Profiler on the nlpcaNM.cu Binary

Method                     #Calls  GPU Time (us)  CPU Time (us)  %GPU Time  Glob Mem Read       Glob Mem Write      IPC       l1 Gld
                                                                            Throughput (GB/s)   Throughput (GB/s)             Hit Rate %
launch_closure_by_value-2      58       49919.4        50240.4      83.86       10.6109             0.10275         1.77138   94.4175
launch_closure_by_value-3      57        325.248         572.552     0.54       33.4966            12.1854          0.658475   0
launch_closure_by_value-0       1          5.312           8         0          27.3012             0.0783133       0.240059   0
launch_closure_by_value-1       1          2.144           6         0          64.3582             0.134328        0.541946   0
memcpyHtoD                    117       1897.09         2406         3.18
memcpyDtoD                      2        181.792          208.744    0.3
memcpyDtoH                     57         88.896        48717        0.14

According to the three rules of efficient GPGPU programming introduced in Chapter 1, Table 3.1 shows that our application is very efficient:

1. Get the data on the GPGPU and keep it there. This application is not limited by PCIe bus speed, as less than 4% of the runtime is consumed by the combined data transfer operations (memcpyHtoD(), memcpyDtoD(), and memcpyDtoH()).
2. Give the GPGPU enough work to do. This application is not bound by kernel startup latency, as the dominant method consumes 83.86% of the runtime and averages roughly 861 microseconds per call, far beyond the nominal kernel startup time of four microseconds.
3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations. The runtime is dominated by a method with an instructions-per-clock (IPC) count of 1.77, which is close to the peak theoretical limit of two instructions per clock for a 20-series Fermi GPGPU like the C2070. Further, this kernel is not limited by memory bandwidth, as the global memory bandwidth usage of 10.63 GB/s is less than 10% of the 143 GB/s the C2070 memory subsystem can provide. Finally, the l1 gld (global load) hit rate shows that the cache is being utilized effectively 94% of the time.

The visual profiler will automatically perform much of the preceding analysis for the user when a method name in the Method field is clicked. The report in Example 3.8, “Visual Profiler Kernel Analysis,” was generated after clicking on launch_closure_by_value-2:

Example 3.8
Analysis for kernel launch_closure_by_value-2 on device Tesla C2070

Summary profiling information for the kernel:
    Number of calls: 58
    Minimum GPU time(us): 794.37
    Maximum GPU time(us): 951.78
    Average GPU time(us): 860.68
    GPU time (%): 83.86
    Grid size: [14 1 1]
    Block size: [960 1 1]

Limiting Factor
    Achieved Instruction Per Byte Ratio: 45.92 ( Balanced Instruction Per Byte Ratio: 3.58 )
    Achieved Occupancy: 0.63 ( Theoretical Occupancy: 0.62 )
    IPC: 1.77 ( Maximum IPC: 2 )
    Achieved global memory throughput: 10.71 ( Peak global memory throughput(GB/s): 143.42 )

Hint(s)
    The achieved instructions per byte ratio for the kernel is greater than the
    balanced instruction per byte ratio for the device. Hence, the kernel is
    likely compute bound. For details, click on Instruction Throughput Analysis.

Clicking the “Instruction Throughput Analysis” button shows that this kernel is indeed achieving a very high instruction throughput. There is some branching in the kernel, which most likely happens inside the tanh() function; this suggests trying another sigmoidal function to increase performance (a sketch of one alternative follows Example 3.9). See Example 3.9, “Visual Profiler Instruction Analysis”:

Example 3.9
IPC: 1.77
Maximum IPC: 2
Divergent branches(%): 13.47
Control flow divergence(%): 12.62
Replayed Instructions(%): 1.08
Global memory replay(%): 0.55
Local memory replays(%): 0.00
Shared bank conflict replay(%): 0.00
Shared memory bank conflict per shared memory instruction(%): 0.00
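One way to act on that hint is to swap the sigmoid. The sketch below is an assumption-laden example, not code from the book: it keeps the G() name used by the functors in this chapter but replaces the tanh() curve with a branch-free rational "softsign" response. Whether this actually improves performance, and whether the network still trains well, must be verified by re-profiling and retraining.

// a minimal sketch of an alternative sigmoid; the shape differs from tanh()
template<typename Real>
__device__ __host__ inline Real G(Real x) { return x / ((Real)1 + fabs(x)); }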

The NVIDIA visual profiler (computeprof) collects and provides a wealth of additional information and analysis not discussed here. The Visual Profiler manual is an excellent source of information,1 as is the help tab on the startup screen: just type computeprof and click on help in the GUI.

1 Downloadable along with CUDA at http://developer.nvidia.com/cuda-downloads.

Later chapters will reference important computeprof profiler measurements to aid in the understanding and interpretation of GPU performance. Because the Visual Profiler runs on all CUDA-enabled operating systems (Windows, UNIX, and Mac OS X), these discussions should benefit all readers.

PARALLEL NSIGHT FOR MICROSOFT VISUAL STUDIO

Parallel Nsight is the debugging and analysis tool that NVIDIA provides for Microsoft developers; it installs as a plug-in within Microsoft Visual Studio. Parallel Nsight allows both debugging and analysis on the local machine as well as on remote machines, which can be located at a customer's site. Be aware that the capabilities of Parallel Nsight vary with the hardware configuration, as can be seen in Table 3.2.

Table 3.2 Parallel Nsight Capabilities According to Machine Configuration

Hardware Configuration        Single-GPU  Dual-GPU  Two Systems,      Dual-GPU SLI
                              System      System    Each with a GPU   MultiOS System
CUDA C/C++ Parallel Debugger              +         +                 +
Direct3D Shader Debugger                            +                 +
Direct3D Graphics Inspector   +           +         +                 +
Analyzer                      +           +         +                 +

The following discussion highlights only a few features of Parallel Nsight as part of our analysis of nlpcaNM.cu. Parallel Nsight is an extensive package that is growing and maturing quickly. The most current information, including videos and user forums, can be found on the Parallel Nsight web portal (http://www.nvidia.com/ParallelNsight) as well as in the help section in Visual Studio.

The Nsight Timeline Analysis

As can be seen in the trace timeline in Figure 3.7, Parallel Nsight provides a tremendous amount of information that is easily accessible via mouseover and zooming operations, as well as various filtering operations. Given the volume of information available in these traces, it is essential to know that regions of the timeline can be selected by clicking the mouse on the screen at the desired starting point in time. A vertical line will appear on the screen.

FIGURE 3.7 Parallel Nsight timeline.

Then press the Shift key and move the mouse (with the button pressed) to the end of the region of interest. This action will result in a gray overlay, as shown in Figure 3.7. A nice feature is that the time interval for the selected region is calculated and displayed. The timeline can be used to determine whether the application is CPU-bound, memory-bound, or kernel-bound:

■ CPU-bound: There will be large areas where no kernel is shown to be running.
■ PCIe transfer-limited: Kernel execution is blocked while waiting on memory transfers to or from the device, which can be seen by looking at the Memory row. If much time is being spent waiting on memory copies, consider using the Streams API to pipeline the application (a sketch follows this list). This API allows data transfers and kernel execution to overlap. Before modifying code, compare the duration of the transfers and kernels to ensure that a performance gain will be realized.
■ Kernel-bound: If the majority of the application time is spent waiting on kernels to complete, then switch to the “Profile CUDA” activity and rerun the application to collect information from the hardware counters. This information can help guide the optimization of kernel performance.
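The following sketch is not code from the book; every name used here (myKernel, d_in, d_out, h_in, h_out, chunk, nChunks, nBlocks, nThreads) is a placeholder. It shows the general shape of the stream-based pipelining suggested in the PCIe bullet above. Note that the host buffers must be allocated with cudaHostAlloc() for cudaMemcpyAsync() to overlap with kernel execution.

cudaStream_t stream[2];
for(int i=0; i < 2; i++) cudaStreamCreate(&stream[i]);

for(int c=0; c < nChunks; c++) {
  int i = c & 1;                                  // ping-pong between the two streams
  cudaMemcpyAsync(d_in + c*chunk, h_in + c*chunk, chunk*sizeof(float),
                  cudaMemcpyHostToDevice, stream[i]);
  myKernel<<<nBlocks, nThreads, 0, stream[i]>>>(d_in + c*chunk, d_out + c*chunk, chunk);
  cudaMemcpyAsync(h_out + c*chunk, d_out + c*chunk, chunk*sizeof(float),
                  cudaMemcpyDeviceToHost, stream[i]);
}
for(int i=0; i < 2; i++) cudaStreamSynchronize(stream[i]);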

Zooming into a region of the timeline view allows Parallel Nsight to provide the names of the functions and methods as sufficient space becomes available in each region, which really helps the readability of the traces.

The NVTX Tracing Library

The NVTX library provides a powerful way to label sections of the computation and thus an easy-to-follow link to what the actual code is doing. Annotating Parallel Nsight traces can greatly help in understanding what is going on and can provide information that would be missed by other CUDA tracing tools. The two simplest NVTX methods are:

■ nvtxRangePushA(char*): This method pushes a string onto the NVTX stack that will be visible in the timeline. Nested labels are allowed, to annotate asynchronous events.
■ nvtxRangePop(): This method pops the topmost label off the stack so that it is no longer visible on the timeline trace.

The nvToolsExt.h header file is included in those source files that use the NVTX library. In addition, the 32- or 64-bit version of the nvToolsExt library must be linked with the executable. The simplicity of these library calls is shown in Example 3.10, “NVTX Instrumented Transform_Reduce,” which annotates the thrust::transform_reduce call in nlpcaNM.cu:

Example 3.10
nvtxRangePushA("Transform Reduce");
Real sum = thrust::transform_reduce(
             thrust::counting_iterator<unsigned int>(0),
             thrust::counting_iterator<unsigned int>(nExamples),
             getError, (Real) 0., thrust::plus<Real>());
nvtxRangePop();

Figure 3.7 is a screenshot showing the amount of detail that is available on a timeline trace. This timeline shows three transform_reduce operations; the middle operation has been highlighted in gray to help separate the three annotated operations. The traces at the top show GPU activity and API calls; the bottom traces show system activity and application-related activity on all the processor cores. Parallel Nsight makes excellent use of color to aid interpretation, which is not apparent in this grayscale image. A large monitor is suggested when using Parallel Nsight, as the timeline contains much information, including labels and scrollable, interactive regions.


Scaling Behavior of the CUDA API

Looking at the timeline for the NLPCA example (Figure 3.8), we see that the functor occupies most of the GPU runtime. For three runs, using the full-size data set and data sets 10 times and 100 times smaller, the timelines show that the repeated allocation and deallocation of a small scratch region of memory for the reduction consumes an ever greater percentage of the runtime. This behavior represents a scaling challenge in the current CUDA 4.0 implementation. The NVTX library was used to annotate the timeline.


FIGURE 3.8 nlpcaNM.cu full data size.

Reducing the data by a factor of 10 (Figure 3.9) shows that cudaMalloc() occupies more of the transform_reduce runtime.


FIGURE 3.9 nlpcaNM.cu 1/10 data size. Parallel Nsight for Microsoft Visual Studio 81

At 100 times smaller, when using 10,000 examples (Figure 3.10), the time to allocate and free temporary space for transform_reduce consumes a significant amount of the runtime.


FIGURE 3.10 nlpcaNM.cu 10k examples.

Even with the inefficiency of the allocation, Parallel Nsight graphically shows that the NLPCA functor achieves very high efficiency, as it performs nearly two operations per clock on all the multiprocessors of a C2050 (Figure 3.11).


FIGURE 3.11 Parallel Nsight instructions per clock for nlpcaNM.cu.

Parallel Nsight also reports that all the multiprocessors on the GPU are fully utilized (Figure 3.12), which rounds out the analysis that the CalcError functor makes good use of the GPU capabilities.


FIGURE 3.12 Multiprocessor activity on the GPU for nlpcaNM.cu.

TUNING AND ANALYSIS UTILITIES (TAU)

NVIDIA is not the only provider of analysis and debugging tools for CUDA. Tuning and Analysis Utilities (TAU) is a program and performance analysis tool framework being developed for the Department of Energy (DOE) Office of Science that is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements. All C++ language features are supported, including templates and namespaces. TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the GUI. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver, or JumpShot trace visualization tools.

The paper “Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs” is the latest publication about the CUDA implementation (Malony et al., 2011; Malony, Biersdorff, Spear, & Mayanglamba, 2010). The software is available for free download from the Performance Research Laboratory (PRL).2

2 http://www.cs.uoregon.edu/research/tau.

SUMMARY

With this accessible computational power, many past supercomputing techniques are now available for everyone to use, even on a laptop. In the 2006 paper “Reducing the Dimensionality of Data with Neural Networks,” Hinton and Salakhutdinov noted:

It has been obvious since the 1980s that backpropagation through deep autoencoders would be very effective for nonlinear dimensionality reduction, provided that computers were fast enough, data sets were big enough, and the initial weights were close enough to a good solution. All three conditions are now satisfied. (Hinton & Salakhutdinov, 2006, p. 506)

The techniques discussed in this chapter, including autoencoders, can be applied to a host of data fitting, data analysis, dimension reduction, vision, and classification problems. Conveniently, they are able to scale from laptops to the largest supercomputers in the world.

As this chapter showed, achieving high performance inside a functor does not guarantee good performance across all problem sizes. Sometimes the bottlenecks that limit performance are as subtle as the dynamic memory allocation of temporary space, as identified in the thrust::transform_reduce method. Without being able to see what the GPU is doing, performance monitoring is guesswork. Having the right performance analysis tool is key to finding performance limitations. For this reason, the CUDA tool suite contains a number of common profiling and debugging tools that have been adapted to GPU computing. These are tools that Windows, UNIX, and Mac developers are already comfortable with. Although each tool has both strengths and weaknesses, they all provide a useful view into GPU performance while requiring only a minimal learning investment. In this way, CUDA offers a smooth transition to massively parallel GPU programming.

CHAPTER 4

The CUDA Execution Model

The heart of CUDA performance and scalability lies in the execution model and the simple partitioning of a computation into fixed-sized blocks of threads in the execution configuration. CUDA was created to map the parallelism within an application naturally onto the massive parallelism of the GPGPU hardware. From the high-level language expression of the kernel to the replication of the lowest-level hardware units, on-board GPU scalability is preserved while many common parallel programming pitfalls are avoided. The result is massive thread scalability and high application performance across GPGPU hardware generations. The CUDA toolkit provides the programmer with the tools needed to exploit parallelism at both the thread level and the instruction level within the processing cores. Even the first CUDA program presented in Chapter 1 has the potential to run a million concurrent hardware threads of execution on some future generation of GPU. Meanwhile, the functor in Chapter 2 demonstrated high performance across multiple types and generations of GPUs. At the end of this chapter, the reader will have a basic understanding of:

■ How CUDA expresses massive parallelism without imposing scaling limitations.
■ The meaning and importance of a warp and a half-warp.
■ Scheduling, warp divergence, and why GPUs are considered a SIMT architecture.
■ How to use both TLP and ILP in a CUDA program.
■ The advantages and disadvantages of TLP and ILP for arithmetic and memory transactions.
■ The impact of warp divergence and general rules on how to avoid it.
■ The application of Little's law to streaming multiprocessors.
■ The nvcc compiler, SM-related command-line options, and cuobjdump.
■ Those profiler measurements relevant to occupancy, TLP, ILP, and the SM.


GPU ARCHITECTURE OVERVIEW

The heart of CUDA performance and scalability lies in the simple partitioning of a computation into fixed-sized blocks of threads in the execution configuration. These thread blocks provide the mapping between the parallelism of the application and the massive replication of the GPGPU hardware.

Massive GPU hardware parallelism is achieved through replication of a common architectural building block called a streaming multiprocessor (SM). By altering the number of SMs, GPU designers can create products to meet market needs at various price and performance points. The block diagram in Figure 4.1 illustrates the massive replication of multiprocessors on a GF100 (Fermi) series GPGPU.

The software abstraction of a thread block translates into a natural mapping of the kernel onto an arbitrary number of SMs on a GPGPU. Thread blocks also act as containers for thread cooperation, as only threads in a thread block can share data.1 Thus a thread block becomes both a natural expression of the parallelism in an application and a way of partitioning that parallelism into multiple containers that run independently of each other.

The translation of the thread block abstraction is straightforward; each SM can be scheduled to run one or more thread blocks. Subject only to device limitations, this mapping:

■ Scales transparently to an arbitrary number of SMs.
■ Places no restriction on the location of the SMs (potentially allowing CUDA applications to transparently scale across multiple devices in the future).
■ Gives the hardware the ability to broadcast both the kernel executable and the user parameters to the hardware. A parallel broadcast is the most scalable, and generally the fastest, communication mechanism that can move data to a large number of processing elements.

Because all the threads in a thread block execute on an SM, GPGPU designers are able to provide high-speed memory inside the SM, called shared memory, for data sharing. This elegant solution also avoids known scaling issues with maintaining coherent caches in multicore processors. A coherent cache is guaranteed to reflect the latest state of all the variables in the cache regardless of how many processing elements may be reading or updating a variable.

1 For some problems, and with good reason, a developer may choose to use atomic operations to communicate between threads of different thread blocks. This approach breaks the assumption of independence between thread blocks and may introduce programming errors as well as scalability and performance issues.
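A minimal sketch (not a program from the book) shows how this partitioning appears in code: the execution configuration sizes the grid by the problem, not by the number of SMs, so the same launch runs unchanged on GPUs with any number of multiprocessors. The kernel and variable names below are placeholders.

__global__ void scaleVector(float *d_a, float alpha, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
  if(tid < n) d_a[tid] *= alpha;
}

// host-side launch:
//   int nThreads = 256;
//   int nBlocks  = (n + nThreads - 1) / nThreads;   // enough blocks to cover n elements
//   scaleVector<<<nBlocks, nThreads>>>(d_a, 2.f, n);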

In toto, the abstraction of a thread block and the replication of SM hardware work in concert to transparently provide unlimited and efficient scalability. The challenge for the CUDA programmer is to express their application kernels in such a way as to exploit this parallelism and scalability.

Thread Scheduling: Orchestrating Performance and Parallelism via the Execution Configuration

Distributing work to the streaming multiprocessors is the job of the GigaThread global scheduler (highlighted in Figure 4.1). Based on the number of blocks and the number of threads per block defined in the kernel execution configuration, this scheduler allocates one or more blocks to each SM. How many blocks are assigned to each SM depends on how many resident threads and thread blocks an SM can support.

NVIDIA categorizes its CUDA-enabled devices by compute capability. An important specification in the compute capability is the maximum number of resident threads per multiprocessor. For example, an older G80 compute capability 1.0 device has the ability to manage 768 total resident threads per multiprocessor. Valid block combinations for these devices include three blocks of 256 threads, six blocks of 128 threads, and other combinations not exceeding 768 threads per SM. A newer GF100, or compute capability 2.0, GPU can support 1,536 resident threads, twice the number of a compute 1.0 multiprocessor.

The GigaThread global scheduler was enhanced in compute 2.0 devices to support concurrent kernels. As a result, compute 2.0 devices can better utilize the GPU hardware when confronted with a mix of small kernels or kernels with unbalanced workloads that stop using SMs over time. In other words, multiple kernels may run at the same time on a single GPU as long as they are issued into different streams. Kernels are executed in the order in which they were issued, and only if there are SM resources still available after all thread blocks for preceding kernels have been scheduled.

Each SM is independently responsible for scheduling its internal resources, cores, and other execution units to perform the work of the threads in its assigned thread blocks. This decoupled interaction is highly scalable, as the GigaThread scheduler needs to know only when an SM is busy.

FIGURE 4.1 Block diagram of a GF100 (Fermi) GPU (host interface, GigaThread scheduler, L2 cache, DRAM interfaces, and the replicated SMs).

On each clock tick, the SM warp schedulers decide which warp to execute next, choosing from those not waiting for:

■ Data to come from device memory.
■ Completion of earlier instructions.

As illustrated in Figure 4.2, each GF100 SM has 32 SIMD (Single Instruction Multiple Data) cores. The use of a SIMD execution model means that the scheduler on the SM dictates that all the cores it controls will run the same instruction. Each core, however, may use different data. Basing the design of the streaming multiprocessors on a SIMD execution model is a beautiful example of "just enough and no more" in processor design because SIMD cores require less space and consume less power than non-SIMD designs. These benefits are multiplied by the massive replication of SM inside the GPU. GPU hardware architects have been able to capitalize on the SIMD savings by devoting more power and space to additional ALUs (Arithmetic and Logic Units), floating-point units, and Special Function Units for transcendental functions. As a result, GPGPU devices have a high flop per watt ratio compared to conventional processors (Vuduc, 2010), as seen in Table 4.1.

To this point, we have correctly spoken about programming a GPU in terms of individual threads. From a performance perspective, it is necessary to start thinking in terms of the number of SIMD threads inside a warp or a half-warp on GF100 SM with dual schedulers. A warp is a block of 32 SIMD threads on most GPUs. Just like a thread block for the GigaThread scheduler, a warp is the basic unit for scheduling work inside a SM.

Because each warp is by definition a block of SIMD threads, the scheduler does not need to check for dependencies within the instruction stream. Figure 4.2 shows that each GF100 SM has two warp schedulers and two instruction dispatch units, which allows two warps to be issued and executed concurrently.

FIGURE 4.2 Block diagram of a GF100 SM.

Table 4.1 Flops Per Watt for Various Devices

                          Intel Nehalem x5550   NVIDIA T10P C1060   NVIDIA GT200 GTX 285   NVIDIA Fermi C2050
GHz                       2.66                  1.44                1.47                   1.15
Sockets                   2                     1                   1                      1
Cores/socket (SM/GPU)     4                     (30)                (30)                   (14)
Peak Gflop (single)       170.6                 933                 1060                   1030
Peak Gflop (double)       85.3                  78                  88                     515
Peak GB/s                 51.2                  102                 159                    144
Sockets only watts        200                   200                 204                    247
64-bit flops/watt         0.4265                0.39                0.431372549            2.08502
32-bit flops/watt         0.853                 4.665               5.196078431            4.17004

Using this dual-issue model, GF100 streaming multiprocessors can achieve two operations per clock by selecting two warps and issuing one instruction from each warp to a group of sixteen cores, sixteen load/store units, or four SFUs. Most instructions can be dual-issued: two integer instructions, two floating-point instructions, or a mix of integer, floating-point, load, store, and SFU instructions. For example:

■ The first unit can execute 16 FMA FP32s while the second concurrently processes 16 ADD INT32s, which appears to the scheduler as if they executed in one cycle.
■ The quadruple SFU unit is decoupled, and the scheduler can therefore send instructions to two SIMD units once it is engaged, which means that the SFUs and SIMD units can be working concurrently. This setup can provide a big performance win for applications that use transcendental functions.

GF100 GPUs do not support dual-dispatch of double-precision instructions, but a high IPC is still possible when running double-precision instructions because integer and other instructions can execute when double-precision operations are stalled waiting for data.

Utilizing a decoupled global and local scheduling mechanism based on thread blocks has a number of advantages:

■ It does not limit on-board scalability, as only the active state of the SMs must be monitored by the global scheduler.
■ Scheduling a thread block per SM limits the complexity of thread resource allocation and any interthread communication to the SM, which partitions each CUDA kernel in a scalable fashion so that no other SM or even the global scheduler needs to know what is happening within any other SM.
■ Future SM can be smarter and do more as technology and manufacturing improve with time.

Relevant computeprof Values for a Warp

active warps/active cycle: The average number of warps that are active on a multiprocessor per cycle, which is calculated as: (active warps)/(active cycles)

Warp Divergence
SIMD execution does have drawbacks, but they affect only code running inside an SM. GPGPUs are not true SIMD machines because they are composed of many SMs, each of which may be running one or more different instructions. For this reason, GPGPU devices are classified as SIMT (Single Instruction Multiple Thread) devices.

Programmers must be aware that conditionals (if statements) can greatly decrease performance inside an SM, as each branch of each conditional must be evaluated. Long code paths in a conditional can cause a 2-times slowdown for each conditional within a warp and a 2^N slowdown for N nested conditionals. A maximum 32-times slowdown can occur when each thread in a warp executes a separate condition.

Fermi architecture GPUs utilize predication to run short conditional code segments efficiently with no branch instruction overhead. Predication removes branches from the code by executing both the if and else parts of a branch in parallel, which avoids the problem of mispredicted branches and warp divergence. Section 6.2 of "The CUDA C Best Practices Guide" notes:

When using branch predication, none of the instructions whose execution depends on the controlling condition is skipped. Instead, each such instruction is associated with a per-thread condition code or predicate that is set to true or false according to the controlling condition. Although each of these instructions is scheduled for execution, only the instructions with a true predicate are actually executed. Instructions with a false predicate do not write results, and they also do not evaluate addresses or read operands. (NVIDIA, CUDA C Best Practices Guide, May 2011, p. 56)

The code in Example 4.1, "A Short Code Snippet Containing a Conditional," contains a conditional that would run on all threads computing the logical predicate and two predicated instructions as shown in Example 4.2:

Example 4.1
if (x < 0.0)
    z = x - 2.0;
else
    z = sqrt(x);

The sqrt has a false predicate when x < 0, so no error occurs from attempting to take the square root of a negative number.

Example 4.2
p = (x < 0.0);      // logical predicate
 p: z = x - 2.0;    // true predicated instruction
!p: z = sqrt(x);    // false predicated instruction

Per section 6.2 of "The CUDA C Best Practices Guide," the length of the predicated instructions is important:

The compiler replaces a branch instruction with predicated instructions only if the number of instructions controlled by the branch condition is less than or equal to a certain threshold: If the compiler determines that the condition is likely to produce many divergent warps, this threshold is 7; otherwise it is 4. (NVIDIA, CUDA C Best Practices Guide, May 2011, p. 56)

If the code in the branches is too long, the nvcc compiler inserts code to perform warp voting to see if all the threads in a warp take the same branch. If all the threads in a warp vote the same way, there is no performance slowdown. In some cases, the compiler can determine at compile time that all the threads in the warp will go the same way. In Example 4.3, "Example When Voting Is Not Needed," there is no need to vote even though case is a non-constant variable:

Example 4.3
// The variable case has the same value across all threads
if (case == 1)
    z = x*x;
else
    z = x+2.3;

Guidelines for Warp Divergence
Sometimes warp divergence is unavoidable. Common application examples include PDEs (Partial Differential Equations) with boundary conditions, graphs, trees, and other irregular data structures. In the worst case, a 32-times slowdown can occur when:

■ One thread needs to perform an expensive computational task.
■ Each thread performs a separate task.

Avoiding warp divergence is a challenge. Though there are many possible solutions, none will work for every situation. Following are some guidelines:

■ Try to reformulate the problem to use a different algorithm that either does not cause warp divergence or expresses the problem so that the compiler can reduce or eliminate warp divergence.
■ Consider creating separate lists of expensive vs. inexpensive operations and using each list in a separate kernel. Hopefully, the bulk of the work will occur in the inexpensive kernel. Perhaps some of the expensive work can be performed on the CPU.
■ Order the computation (or list of computations) to group computations into blocks that are multiples of a half-warp.
■ If possible, use asynchronous kernel execution to exploit the GPU SIMT execution model.
■ Utilize the host processor(s) to perform part of the work that would cause a load imbalance on the GPU.

The book GPU Computing Gems (Hwu, 2011) is a single-source reference to see how many applications handle problems with irregular data structures and warp divergence. Each chapter contains detailed descriptions of the problem, solutions, and reported speedups. Examples include:

■ Conventional and novel approaches to accelerate an irregular tree-based data structure on GPGPUs.
■ Avoiding conditional operations that limit parallelism in a string similarity algorithm.
■ Using GPUs to accelerate a dynamic quadrature grid method and avoid warp divergence as grid points move over the course of the calculation.
■ Approaches to irregular meshes that involve conditional operations and approaches to regularize computation.

Relevant computeprof Values for Warp Divergence

Divergent branches (%): The percentage of branches that are causing divergence within a warp amongst all the branches present in the kernel. Divergence within a warp causes serialization in execution. This is calculated as: (100 * divergent branch)/(divergent branch + branch)

Control flow divergence (%): Control flow divergence gives the percentage of thread instructions that were not executed by all threads in the warp, hence causing divergence. This should be as low as possible. This is calculated as: 100 * ((32 * instructions executed) - (threads instruction executed))/(32 * instructions executed)

WARP SCHEDULING AND TLP
Running multiple warps per SM is the only way a GPU can hide both ALU and memory latencies to keep the execution units busy.

From a software perspective, the hardware SM scheduler is fast enough that it basically has no overhead. Inside the SM, the hardware can detect those warps whose next instruction is ready to run because all resource and data dependencies have been resolved. From this pool of eligible warps, the SM scheduler selects one based on an internal prioritized schedule. It then issues the instruction from that warp to the SIMD cores. If all the warps are stalled, meaning they all have some unresolved dependency, then no instruction can be issued resulting in idle hardware and decreased performance.

The idea behind TLP is to give the scheduler as many threads as possible to choose from to minimize the potential for a performance loss. Occupancy is a measure of TLP, which is defined as the number of warps running concurrently on a multiprocessor divided by the maximum number of warps that can be resident on an SM. A high occupancy implies that the scheduler on the SM has many warps to choose from and can thus hide both ALU and data latencies. Chances are at least one warp should be ready to run because all of its dependencies are resolved.

Though simple in concept, occupancy is complicated in practice, as it can be limited by on-chip SM memory resources such as registers and shared memory. NVIDIA provides the CUDA Occupancy Calculator to help choose execution configurations.
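To make the occupancy arithmetic concrete, the small helper below computes the ratio for a compute 2.0 SM, which can hold at most 48 resident warps (1,536 threads); the function and its inputs are illustrative rather than part of the CUDA toolkit:

// Occupancy = resident warps / maximum resident warps per SM.
float occupancy(int threadsPerBlock, int residentBlocksPerSM)
{
    const int maxWarpsPerSM = 48;                      // compute 2.0 limit (1,536 threads / 32)
    int warpsPerBlock = (threadsPerBlock + 31) / 32;   // round up to whole warps
    int residentWarps = warpsPerBlock * residentBlocksPerSM;
    return (float)residentWarps / maxWarpsPerSM;       // e.g., 6 blocks of 256 threads -> 1.0 (100%)
}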

Common execution configuration (blocks per grid) heuristics include:

■ Specify more blocks than the number of SM so all the multiprocessors have at least one block to execute.
  ■ This is a lower bound, as specifying fewer blocks than SM will clearly not utilize all the resources of the GPU.
  ■ To exploit asynchronous kernel execution for small problems, the developer may purposely underutilize the GPU. GF100 and later architectures can utilize concurrent kernel execution to increase application performance by running blocks from different kernels on unused SM. These GPUs accelerate those applications that have multiple kernels that are too small to run on all the SM of the GPU but take too much time to export to the host processor.
■ Specify multiple blocks per SM to run concurrently inside a SM.
  ■ Choose the number of blocks to be a multiple of the number of SM on the GPU to fully utilize all the SM. This approach ensures that all the SM have a balanced workload.
  ■ Blocks that are not waiting at a __syncthreads will keep the hardware busy.
  ■ The number of blocks that can run is subject to resource availability on the SM including register and shared memory space.

■ When possible, specify a very large number of blocks per grid (e.g., thousands).
  ■ Doing so will help the application across multiple GPGPU generations.
  ■ It will also keep the SM fully loaded with resident thread blocks.
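A minimal sketch of the "multiple of the number of SM" heuristic, using the standard runtime device query, follows; the blocksPerSM factor and the commented launch are placeholders chosen for illustration:

#include <cuda_runtime.h>

// Choose a grid size that is a multiple of the SM count on device 0.
int chooseBlocksPerGrid(int blocksPerSM)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    return blocksPerSM * prop.multiProcessorCount;
}

// Example launch (someKernel is a hypothetical __global__ function):
//   someKernel<<<chooseBlocksPerGrid(4), 256>>>(...);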

Relevant computeprof Values for Occupancy

Achieved kernel occupancy: This ratio provides the actual occupancy of the kernel based on the number of warps executing per cycle on the SM. It is the ratio of active warps and active cycles divided by the max number of warps that can execute on an SM. This is calculated as: (active warps/active cycles)/48

ILP: HIGHER PERFORMANCE AT LOWER OCCUPANCY
High occupancy does not necessarily translate into the fastest application performance. Instruction-level parallelism (ILP) can be equally effective in hiding arithmetic latency by keeping the SIMD cores busy with fewer threads that consume fewer resources and introduce less overhead.

The reasoning for ILP is simple and powerful: using fewer threads means that more registers can be used per thread. Registers are a precious resource, as they are the only memory fast enough to attain peak GPU performance. The larger the bandwidth gap between the register store and other memory, the more data that must come from registers to achieve high performance. Sometimes having a few more registers per thread can prevent register spilling and preserve high performance. Although sometimes necessary, register spilling violates the programmer's expectation of high performance from register memory and can cause catastrophic performance decreases.

Utilizing fewer threads also benefits kernels that use shared memory by reducing the number of shared memory accesses and by allowing data reuse within a thread (Volkov, 2010). A minor benefit includes a reduction in some of the work that the GPU must perform per thread. The following loop, for example, would consume 2048 bytes of register storage and require that the loop counter, i, be incremented 512 times in a block with 512 threads. A thread block containing only 64 threads would require only 256 bytes of register storage and reduce the number of integer increment-in-place operations by a factor of 8. See Example 4.4, "Simple for Loop to Demonstrate ILP Benefits":

Example 4.4
for(int i=0; i < n; i++)
    ...

Reading horizontally across the columns, Table 4.2 encapsulates how the number of registers per thread increases as the occupancy decreases for various compute generations.

Table 4.2 Increasing Registers Per Thread as Occupancy Decreases

           Maximum Occupancy         Maximum Registers            Increase
GF100      20 at 100% occupancy      63 at 33% occupancy          3x more registers per thread
GT200      16 at 100% occupancy      ≈128 at 12.5% occupancy      8x more registers per thread

ILP Hides Arithmetic Latency
As with TLP, multiple threads provide the needed parallelism. For example, the shaded row in Table 4.3 highlights four independent operations that happen in parallel across four threads.

Table 4.3 A Set of TLP Arithmetic Operations

Thread 1     Thread 2     Thread 3     Thread 4
x = x + c    y = y + c    z = z + c    w = w + c
x = x + b    y = y + b    z = z + b    w = w + b
x = x + a    y = y + a    z = z + a    w = w + a

Due to warp scheduling, parallelism can also happen among the instructions within a thread, as long as there are enough threads to create two or more warps within a block.

Table 4.4 Instructions Rearranged for ILP

Thread
Instructions    w = w + b
                z = z + b     Four independent operations
                y = y + b
                x = x + b
                w = w + a
                z = z + a     Four independent operations
                y = y + a
                x = x + a

The following example demonstrates ILP by creating two or more warps that run on a single SM. As can be seen in Example 4.5, "Arithmetic ILP Benchmark," the execution configuration specifies only one block. The number of warps resident on the SM is increased as the number of threads within the block is increased from 32 to 1024, and the performance is reported. This example will run to completion on a compute 2.0 device that can support 1024 threads per block. Earlier devices will detect a runtime error via the call to cudaGetLastError, which will stop the test when the maximum number of threads per block exceeds the number that the GPU can support. Because kernel launches are asynchronous, cudaThreadSynchronize is used to wait for kernel completion.

Example 4.5
#include <iostream>   // for cout, cerr
#include <cmath>      // for ceil
using namespace std;
#include <omp.h>      // for omp_get_wtime

// create storage on the device in gmem
__device__ float d_a[32], d_d[32];
__device__ float d_e[32], d_f[32];

#define NUM_ITERATIONS (1024 * 1024)

#ifdef ILP4
// test instruction level parallelism
#define OP_COUNT 4*2*NUM_ITERATIONS
__global__ void kernel(float a, float b, float c)
{
    register float d=a, e=a, f=a;
#pragma unroll 16
    for(int i=0; i < NUM_ITERATIONS; i++) {
        a = a*b + c;
        d = d*b + c;
        e = e*b + c;
        f = f*b + c;
    }
    // write to gmem so the work is not optimized out by the compiler
    d_a[threadIdx.x] = a;
    d_d[threadIdx.x] = d;
    d_e[threadIdx.x] = e;
    d_f[threadIdx.x] = f;
}
#else
// test thread level parallelism
#define OP_COUNT 1*2*NUM_ITERATIONS
__global__ void kernel(float a, float b, float c)
{
#pragma unroll 16
    for(int i=0; i < NUM_ITERATIONS; i++) {
        a = a*b + c;
    }
    // write to gmem so the work is not optimized out by the compiler
    d_a[threadIdx.x] = a;
}
#endif

int main()
{
    // iterate over number of threads in a block
    for(int nThreads=32; nThreads <= 1024; nThreads += 32) {
        double start = omp_get_wtime();
        kernel<<<1, nThreads>>>(1., 2., 3.); // async kernel launch
        if(cudaGetLastError() != cudaSuccess) {
            cerr << "Launch error" << endl;
            return(1);
        }
        cudaThreadSynchronize(); // need to wait for the kernel to complete
        double end = omp_get_wtime();
        cout << "warps " << ceil(nThreads/32) << " " << nThreads << " "
             << (nThreads*(OP_COUNT/1.e9)/(end - start)) << " Gflops " << endl;
    }
    return(0);
}

As seen in Figure 4.3, ILP increases performance just by increasing the number of independent instructions per thread. The best performance on a compute 2.0 device occurs when 576 threads are resident on the SM. As noted in the presentation "Better Performance at Lower Occupancy," the upper bound currently appears to be an ILP of 4 (Volkov, 2010). It is suspected that the scoreboard on the SM that tracks memory usage is the limiting factor. The patent "Tracking Register Usage During Multithreaded Processing Using a Scoreboard" (Coon, Mills, Oberman, & Siu, 2008) might provide additional insight.

FIGURE 4.3 Comparison of ILP1 vs. ILP4 performance on a C2070 (Gflop vs. number of threads).

Table 4.5 Minimum Parallelism to Achieve 100 Percent Utilization

Compute Generation   GPU Architecture   Latency (Cycles)   Throughput (Cores/SM)   Parallelism (Operations/SM)
Compute 1.x          G80-GT200          ≈24                8                       ≈192
Compute 2.0          GF100              ≈18                32                      ≈576
Compute 2.1          GF104              ≈18                48                      ≈864

Table 4.5 shows the minimum arithmetic parallelism needed to achieve 100 percent throughput (Volkov, 2010).

ILP Hides Data Latency
It is important to note that threads don't stall in the SM on a memory access—only on data dependencies. From the perspective of a warp, a memory access requires issuing a load or store instruction to the LD/ST units.2 The warp can then continue issuing other instructions until it reaches one that depends on the completion of a memory transaction. At that point, the warp will stall. To increase performance, the compiler can reorder instructions to:

■ Keep the largest number of memory transactions "in flight" and thus best utilize memory bandwidth and the LD/ST units.
■ Supply other, nondependent instructions required by the thread to keep the computational cores busy.
■ Minimize the time that dependent instructions must remain in the queue by positioning the data-dependent instruction in the queue so that it reaches the top of the queue close to the expected data arrival time.

As a result of this complex choreography, ILP can also hide memory latency. Vasily Volkov notes that ILP can achieve 84 percent of peak memory bandwidth while requiring only 4 percent occupancy when copying 14 float4 values per thread (Volkov, 2010). A float4 is a structure of four 32-bit floating-point variables that fits perfectly into a 128-bit cache line. These structures exploit the bit-level parallelism of a cache line memory transaction as well as ILP parallelism.
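A minimal sketch of this style of copy kernel is shown below; each thread issues several independent float4 loads before any store depends on them. The batch size and names are illustrative and are not Volkov's actual code:

#define ELEMS_PER_THREAD 4   // illustrative batch; Volkov reports results for up to 14

__global__ void batchedCopy(const float4 *src, float4 *dst, int n)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * ELEMS_PER_THREAD;
    float4 v[ELEMS_PER_THREAD];

    // Independent loads: all ELEMS_PER_THREAD transactions can be in flight at once.
    for (int i = 0; i < ELEMS_PER_THREAD; i++)
        if (base + i < n) v[i] = src[base + i];

    // Stores only become dependent on the loads here.
    for (int i = 0; i < ELEMS_PER_THREAD; i++)
        if (base + i < n) dst[base + i] = v[i];
}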

ILP in the Future
The current GF100 architecture encourages the use of smaller blocks because it can schedule more blocks due to the additional resources per SM. This

2 Although there is no difference in performing a store or load operation, the literature discusses ILP writes in terms of multiple outputs—meaning multiple write operations to memory.

approach presents a new way of thinking about problems to achieve both high utilization and performance. It is worth noting that compute 2.0 devices and even 1.x GPUs can sometimes issue a second instruction from the same warp in parallel to the special function unit. CPU developers will recognize ILP as a form of superscalar execution that executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. NVIDIA has added superscalar execution to the scheduler in compute 2.1 devices. The warp scheduler in these devices has the ability to analyze the next instruction in a warp to determine if that instruction is ILP-safe and whether there is an execution unit available to handle it. The result is that compute 2.1 devices can execute a warp in a superscalar fashion for any CUDA code without requiring explicit programmer actions to force ILP.

ILP has been incorporated into the CUBLAS 2.0 and CUFFT 2.3 libraries. Performing a single-precision level-3 BLAS matrix multiply (SGEMM) on two large matrices demonstrates the following performance increases (Volkov, 2010):

Table 4.6 CUBLAS ILP Speedup Reported for SGEMM

                            CUBLAS 1.1    CUBLAS 2.0
Threads per block           512           64             8x smaller thread blocks
Occupancy (Compute 1.0)     67%           33%            2x lower occupancy
Performance (Compute 1.0)   128 Gflop/s   204 Gflop/s    1.6x faster performance

Similarly, ILP benefits batched 1024-point complex-to-complex single-precision Fast Fourier Transforms (FFTs):

Table 4.7 CUFFT ILP Speedup Reported for Batched FFTs

                            CUFFT 2.2    CUFFT 2.3
Threads per block           256          64            4x smaller thread blocks
Occupancy (Compute 1.0)     33%          17%           2x lower occupancy
Performance (Compute 1.0)   45 Gflop/s   93 Gflop/s    2x faster performance

ILP benefits arithmetic problems: Current work on the MAGMA BLAS libraries demonstrates up to 838 Gflop/s using 33 percent occupancy and 2 thread blocks per SM (Volkov, 2010). The MAGMA team has concluded that dense linear algebra methods are now a better fit on GPU architectures than on traditional multicore architectures (Nath, Tomov, & Dongarra, 2010).

ILP benefits memory bandwidth problems: To saturate the bus on a Fermi C2050 requires keeping 30–50 128-byte transactions in flight per SM (Micikevicius, 2010). Volkov recommends keeping 100 KB in flight to hide memory latency—less if the kernel is compute-bound.

Relevant computeprof Values for Instruction Rates

Instruction throughput: This value is the ratio of achieved instruction rate to peak single-issue instruction rate. The achieved instruction rate is calculated using the profiler counter "instructions." The peak instruction rate is calculated based on the GPU clock speed. In the case of instruction dual-issue coming into play, this ratio shoots up to greater than 1. This is calculated as: (instructions)/(gpu_time * clock_frequency)

Ideal instruction/byte ratio: This value is a ratio of the peak instruction throughput and the peak memory throughput of the CUDA device. This is a property of the device and is independent of the kernel.

Instruction/byte: This value is the ratio of the total number of instructions issued by the kernel and the total number of bytes accessed by the kernel from global memory. If this ratio is greater than the ideal instruction/byte ratio, then the kernel is compute-bound, and if it's less, then the kernel is memory-bound. This is calculated as: (32 * instructions issued * #SM)/(32 * (l2 read requests + l2 write requests + l2 read texture requests))

IPC (instructions per cycle): This value gives the number of instructions issued per cycle. This should be compared to the maximum IPC possible for the device. The range provided is for single-precision floating-point instructions. This is calculated as: (instructions issued/active cycles)

Replayed instructions (%): This value gives the percentage of instructions replayed during kernel execution. Replayed instructions are the difference between the number of instructions that are actually issued by the hardware and the number of instructions that are to be executed by the kernel. Ideally, this should be zero. This is calculated as: 100 * (instructions issued - instructions executed)/(instructions issued)

LITTLE'S LAW
Little's law (Little, 1961) is derived from general information theory but has important application to understanding the performance of memory hierarchies. Generalized to multiprocessor systems, Little's law for concurrency (Equation 4.1) can be expressed as:

Concurrency = bandwidth × latency     (4.1)

where concurrency is the aggregate system concurrency and bandwidth is the aggregate memory bandwidth. Nearly all CUDA applications will be limited by memory bandwidth. The performance of global memory is of particular concern, as can be seen in

the bandwidths reported in “Better Performance at Lower Occupancy” (Volkov, 2010):

■ Register memory (≈8 TB/s)
■ Shared memory (≈1.6 TB/s)
■ Global memory (177 GB/s)

From a TLP point of view, Little's law tells us that the number of memory transactions "in flight," N (see Equation 4.2, "TLP memory transactions in flight"), is the product of the arrival rate λ and the memory latency, L:

N = λL     (4.2)

where the arrival rate, λ, is the product of the IPC (the desired instruction rate) and the density of load instructions. In other words, as additional threads are added and multiplexed over the same hardware resources, greater latency can be hidden. As we have observed, this is an overly simplistic view, as complex data dependencies introduce stalls and hardware limitations create bottlenecks.

From an ILP point of view, independent memory transactions can be batched (Equation 4.3, "ILP-batched memory transactions in flight"):

N = λL − B     (4.3)

where B is the batch size of the independent loads.

Saturating the bus on a Fermi C2050 requires keeping 30–50 128-byte transactions in flight per SM (Micikevicius, 2010). Volkov recommends that the minimum parallelism for peak arithmetic performance on a C2050 be 576 threads per block and that 100 KB of data be kept in flight for peak memory performance on memory-bound kernels. Table 4.8 summarizes these results.

Table 4.8 C2050 Minimum Parallelism for Peak Data and Arithmetic Performance

              Latency        Throughput     Parallelism
Arithmetic    ≈18            32             ≈576
Memory        <800 cycles    <177 GB/s      <100 KB
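As a rough cross-check of the memory row, Little's law reproduces the 100 KB figure; the ≈700 ns latency used below is an assumption (roughly 800 cycles at the C2050's 1.15 GHz clock) and is not taken from the tables:

// Concurrency = bandwidth * latency (Equation 4.1) for a C2050.
double bandwidth = 144e9;               // bytes/s, from Table 4.1
double latency   = 700e-9;              // seconds (assumed: ~800 cycles at 1.15 GHz)
double inFlight  = bandwidth * latency; // ~100,800 bytes, i.e., about 100 KB in flight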

To increase the concurrency in your applications, consider the following:

■ Increase occupancy.
■ Maximize the available registers using the nvcc command-line option -maxrregcount, or giving the compiler additional per-kernel help with the __launch_bounds__ specification in the kernel declaration.
■ Adjust the thread block dimensions to best utilize the SM warp scheduler(s).
■ Modify the code to use ILP and process several elements per thread.

■ Pay careful attention to the instruction "mix":
  ■ For example, the math-to-memory operation ratios.
  ■ Don't bottleneck on one function unit, causing the other units to stall.
■ Don't "traffic-jam" kernel code:
  ■ Try to create lots of small thread blocks containing a uniform distribution of operation densities (e.g., int, floating-point, memory, and SFU).
  ■ Don't bunch operations of a similar type in one section of a kernel, as doing so could cause a load imbalance in the SM.

CUDA TOOLS TO IDENTIFY LIMITING FACTORS
CUDA provides several tools to work with your code and the concurrency of your kernels. The CUDA Occupancy Calculator is a good tool to use during the planning stages to understand occupancy across devices. As shown in Figure 4.4, "Screenshot of the CUDA Occupancy Calculator," this is a spreadsheet that the programmer can use to ask "what if" questions based on register and shared memory usage.

FIGURE 4.4 Screenshot of the CUDA Occupancy Calculator.

The nvcc Compiler
The nvcc compiler provides a common compilation framework for the UNIX, Windows, and Mac OS X operating systems.

FIGURE 4.5 nvcc for UNIX.
FIGURE 4.6 nvcc for Windows.

As shown in Example 4.5, the nvcc compiler provides the #pragma unroll directive, which can be used to control unrolling of any given loop. It must be placed immediately before the loop and applies only to that loop. By default, the compiler unrolls small loops with a known trip count. If the loop is large or the trip count cannot be determined, the user can specify how many times the loop is to be unrolled. The #pragma unroll 16 used in Example 4.5 tells the compiler to unroll the loop 16 times to prevent the control logic of the for loop from interfering with the ILP test results. The primary benefits gained from loop unrolling are:

■ Reduced dynamic instruction count due to a smaller number of compare and branch operations for the same amount of work done.
■ Better scheduling opportunities, as the availability of additional independent instructions can improve ILP and help hide pipeline and memory access latencies.

■ Opportunities for exploiting register and memory hierarchy locality when outer loops are unrolled and inner loops are fused (unroll-and-jam transformation).

Loop unrolling is not a panacea, as it can degrade performance due to excessive register usage and spilling to local memory.

The following nvcc command-line switches are also important for optimization and code generation:

■ -arch=sm_20 generates code for compute 2.0 devices.
■ -maxrregcount=N specifies the maximum number of registers kernels can use at a per-file level.
  ■ The __launch_bounds__ qualifier can be inserted in the declaration of the kernel as shown in Example 4.6 to control the number of registers on a per-kernel basis.
■ --ptxas-options=-v or -Xptxas=-v lists per-kernel register, shared, and constant memory usage.
■ -use_fast_math coerces every functionName() call to the equivalent native hardware function call name, __functionName(). This approach makes the code run faster at the cost of slightly diminished precision and accuracy.

Launch Bounds
Controlling register usage is key to achieving high performance in CUDA. To minimize register usage, the compiler utilizes a set of heuristics. A heuristic is a rule-of-thumb guideline generally derived from observation and trial and error. The developer can aid the compiler through the use of a __launch_bounds__ qualifier in the definition of a __global__ function as shown in Example 4.6, "Example Launch Bounds Usage," where:

Example 4.6
__global__ void
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
kernel(float a, float b, float c)
{
    ...
}

■ maxThreadsPerBlock specifies the maximum number of threads per block that the application will use.
■ minBlocksPerMultiprocessor is optional and specifies the desired minimum number of resident blocks per multiprocessor. It compiles to the .minnctapersm PTX directive.

If launch bounds are specified, the compiler has the opportunity to increase register usage to better hide instruction latency. A kernel will fail to launch if the execution configuration specifies more threads per block than allowed by the launch bounds directive.
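Putting several of the switches from this section together, a typical tuning build might look like the following command line; the file name, output name, and register count are placeholders to adapt per kernel:

nvcc -arch=sm_20 --ptxas-options=-v -maxrregcount=32 -o ilp_test ilp_test.cu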

The Disassembler
The cuobjdump disassembler is a useful tool to examine the instructions generated by the compiler. It can be utilized to examine the mix of instructions provided in a warp to the SM as well as to check what the compiler is doing. Example 4.7, "cuobjdump without loop unrolling," is the code for Example 4.5 without loop unrolling. Note that only one fused multiply-add (FFMA) instruction is utilized.

Example 4.7
code for sm_20
    Function : _Z6kernelfff
/*0000*/  /*0x00005de428004404*/  MOV R1, c [0x1] [0x100];
/*0008*/  /*0x80001de428004000*/  MOV R0, c [0x0] [0x20];
/*0010*/  /*0xfc009de428000000*/  MOV R2, RZ;
/*0018*/  /*0x04209c034800c000*/  IADD R2, R2, 0x1;
/*0020*/  /*0x9000dde428004000*/  MOV R3, c [0x0] [0x24];
/*0028*/  /*0x0421dc231a8e4000*/  ISETP.NE.AND P0, pt, R2, c [0x10] [0x0], pt;
/*0030*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0038*/  /*0x600001e74003ffff*/  @P0 BRA 0x18;
/*0040*/  /*0x84009c042c000000*/  S2R R2, SR_Tid_X;
/*0048*/  /*0x0000dde428007800*/  MOV R3, c [0xe] [0x0];
/*0050*/  /*0x10211c032007c000*/  IMAD.U32.U32 R4.CC, R2, 0x4, R3;
/*0058*/  /*0x10209c435000c000*/  IMUL.U32.U32.HI R2, R2, 0x4;
/*0060*/  /*0x10215c4348007800*/  IADD.X R5, R2, c [0xe] [0x4];
/*0068*/  /*0x00401c8594000000*/  ST.E [R4], R0;
/*0070*/  /*0x00001de780000000*/  EXIT;
......

The impact of the loop unrolling can be seen in Example 4.8, “cuobjdump showing loop unrolling,” through the replication of the FFMA instruction:

Example 4.8
code for sm_20
    Function : _Z6kernelfff
/*0000*/  /*0x00005de428004404*/  MOV R1, c [0x1] [0x100];
/*0008*/  /*0x80001de428004000*/  MOV R0, c [0x0] [0x20];
/*0010*/  /*0xfc009de428000000*/  MOV R2, RZ;
/*0018*/  /*0x9000dde428004000*/  MOV R3, c [0x0] [0x24];
/*0020*/  /*0x40209c034800c000*/  IADD R2, R2, 0x10;
/*0028*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0030*/  /*0x0421dc231a8e4000*/  ISETP.NE.AND P0, pt, R2, c [0x10] [0x0], pt;
/*0038*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0040*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0048*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0050*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0058*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0060*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0068*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0070*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0078*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0080*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0088*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0090*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*0098*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*00a0*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*00a8*/  /*0xa0301c0030008000*/  FFMA R0, R3, R0, c [0x0] [0x28];
/*00b0*/  /*0x800001e74003fffd*/  @P0 BRA 0x18;
/*00b8*/  /*0x84009c042c000000*/  S2R R2, SR_Tid_X;
/*00c0*/  /*0x0000dde428007800*/  MOV R3, c [0xe] [0x0];
/*00c8*/  /*0x10211c032007c000*/  IMAD.U32.U32 R4.CC, R2, 0x4, R3;
/*00d0*/  /*0x10209c435000c000*/  IMUL.U32.U32.HI R2, R2, 0x4;
/*00d8*/  /*0x10215c4348007800*/  IADD.X R5, R2, c [0xe] [0x4];
/*00e0*/  /*0x00401c8594000000*/  ST.E [R4], R0;
/*00e8*/  /*0x00001de780000000*/  EXIT;
......
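Listings such as Examples 4.7 and 4.8 can be generated by pointing cuobjdump at the compiled executable; the file name below is a placeholder, and option spellings vary slightly between CUDA releases, so consult cuobjdump --help:

cuobjdump -sass ilp_test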

As of CUDA 4.0, the nvcc compiler has the ability to include inline PTX assembly language. PTX is the low-level parallel thread execution virtual machine and instruction set architecture (ISA). The most current information on PTX can be found in the document "PTX: Parallel Thread Execution ISA" that is included with each release.

PTX Kernels
Giving developers the ability to disassemble and create assembly language kernels provides everything an adventurous programmer needs to directly program the SM on a GPU. The PTX kernel in Example 4.9 was taken from the NVIDIA ptxjit sample provided with the SDK samples:

Example 4.9
/*
 * PTX is equivalent to the following kernel:
 *
 * __global__ void myKernel(int *data)
 * {
 *     int tid = blockIdx.x * blockDim.x + threadIdx.x;
 *     data[tid] = tid;
 * }
 *
 */
char myPtx[] = "\n\
.version 1.4\n\
.target sm_10, map_f64_to_f32\n\
.entry _Z8myKernelPi (\n\
.param .u64 __cudaparm__Z8myKernelPi_data)\n\
{\n\
.reg .u16 %rh<4>;\n\
.reg .u32 %r<5>;\n\
.reg .u64 %rd<6>;\n\
cvt.u32.u16 %r1, %tid.x;\n\
mov.u16 %rh1, %ctaid.x;\n\
mov.u16 %rh2, %ntid.x;\n\
mul.wide.u16 %r2, %rh1, %rh2;\n\
add.u32 %r3, %r1, %r2;\n\
ld.param.u64 %rd1, [__cudaparm__Z8myKernelPi_data];\n\
cvt.s64.s32 %rd2, %r3;\n\
mul.wide.s32 %rd3, %r3, 4;\n\
add.u64 %rd4, %rd1, %rd3;\n\
st.global.s32 [%rd4+0], %r3;\n\
exit;\n\
}\n\
";

GPU Emulators
GPU simulators such as ocelot have unique abilities to characterize the runtime behavior of CUDA kernel code that are not available to other tools in the GPU toolchain (Farooqui, Kerr, Diamos, Yalamanchili, & Schwan, 2011). Features include:

■ Workload characterization
■ Load imbalance
■ "Hot-spots" in the PTX code

The ocelot tool can be freely downloaded.

SUMMARY
Understanding the SM is the key to understanding GPGPU programming. The twin concepts of a thread block and warp of SIMD threads encompass the scalability, performance, and power efficiency of GPU computing. Knowing how instructions execute in parallel within an SM, as well as the factors that stall the instruction pipeline, is fundamental to understanding GPGPU application performance. Little's law, and queuing theory in general, provide the theoretical foundation upon which very detailed GPU and application models can be based.

Empirical studies have shown that exploiting both ILP and TLP provides the highest application performance. The benefits have been so significant that both ILP and TLP are now utilized in the CUBLAS and CUFFT libraries, which are the keystone of many applications. Knowledgeable CUDA programmers have the ability to incorporate both ILP and TLP in their applications. NVIDIA has provided the necessary tools so that UNIX, Windows, and Mac OS X developers can examine, experiment with, and alter the instruction mix in their kernels to best exploit ILP and capture TLP with controlled register usage. Exploiting these tools can make the difference between good and great performance. Similarly, understanding and avoiding warp divergence can make all the difference when programming with irregular data structures or for applications that have irregular boundaries.

CHAPTER 5

CUDA Memory

High-performance GPGPU applications require reuse of data inside the SM. The reason is that on-board global memory is simply not fast enough to meet the needs of all the streaming multiprocessors on the GPU. Data transfers from the host and other GPGPUs further exacerbate the problem, as all DMA (Direct Memory Access) operations go through global memory, which consumes additional memory bandwidth. CUDA exposes the memory spaces within the SM and provides configurable caches to give the developer the greatest opportunity for data reuse. Managing the significant performance difference between on-board and on-chip memory to attain high performance is of paramount importance to a CUDA programmer. At the end of this chapter, the reader will have a basic understanding of:

■ Why memory bandwidth is a key gating factor for application performance.
■ The different CUDA memory types and how CUDA exposes the memory on the SM.
■ The L1 cache and its importance for register spilling, global memory accesses, recursion, and divide-and-conquer algorithms.
■ Important memory-related profiler measurements.
■ Limitations on hardware memory design.

THE CUDA MEMORY HIERARCHY
The CUDA programming model assumes that all threads execute on a physically separate device from the host running the application. Implicit is the assumption that the host and all the devices maintain their own


separate memory spaces, referred to as host and device memory,1 and that some form of bulk transfer is the mechanism of data transport. The line between the host and device memory becomes somewhat blurred when host memory is mapped into the address space of the GPU. In this way, modifications by either the device or host will be reflected in the memory space of all the devices mapping a region of memory. Pages are transparently transferred asynchronously between the host and GPU(s). Experienced programmers will recognize that mapped memory is analogous to the mmap() system call.

Uniform Virtual Addressing (UVA) is a CUDA 4.0 feature that simplifies multi-GPU programming by giving the runtime the ability to determine upon which device a region of memory resides purely on the basis of the pointer address. Semantically, UVA gives CUDA developers the ability to perform direct GPU-to-GPU data transfers with cudaMemcpy(). To use it, just register the memory region with cudaHostRegister(). Bulk data transfers can then occur between devices by calling cudaMemcpy(). The runtime will ensure that the appropriate source and destination devices are used in the transfer. The method cudaHostUnregister(ptr) terminates the use of UVA data transfers for that region of memory. See Figure 5.1.

FIGURE 5.1 The CUDA memory hierarchy.
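A minimal sketch of the register/copy/unregister sequence described above is shown below; it uses only the runtime calls named in the text plus cudaMemcpyDefault, which lets the runtime infer the transfer direction from the UVA pointer values (the buffer names and nBytes are placeholders):

// h_buf is an ordinary host allocation; d_buf was obtained with cudaMalloc.
cudaHostRegister(h_buf, nBytes, cudaHostRegisterPortable); // page-lock and register the region
cudaMemcpy(d_buf, h_buf, nBytes, cudaMemcpyDefault);       // runtime infers source/destination
cudaHostUnregister(h_buf);                                 // stop using UVA transfers for h_buf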

1 Some low-end devices actually share the same memory as the host, but they do not change the programming model.

Table 5.1 Bandwidth of Various GPU Memory

Register memory    ≈8,000 GB/s
Shared memory      ≈1,600 GB/s
Global memory      177 GB/s
Mapped memory      ≈8 GB/s one-way

GPU MEMORY
CUDA-enabled GPGPUs have both on-chip and on-board memory. The fastest and most scalable is the highly desirable on-chip SM memory. These are limited memory stores measured in kilobytes (KB) of storage. The on-board global memory is a shared memory system accessible by all the SM across the GPU. It is measured in gigabytes (GB) of memory, which is by far the largest, most commonly used, and slowest memory store on the GPU.

Benchmarks have shown the significant bandwidth differences between on-chip and off-chip memory systems (see Table 5.1). Only registers internal to the SM have the bandwidth needed to keep the SM fully loaded (without stalls) to achieve peak performance. Although the bandwidth of shared memory can greatly accelerate applications, it is still too slow to achieve peak performance (Volkov, 2010). Managing the significant performance difference between on-board and on-chip memory is the primary concern of a CUDA programmer.

To put the performance implications in perspective, consider how memory bandwidth limits the performance of the following simple calculation when it resides in global memory, in Example 5.1, "A Simple Memory-Bandwidth-Limited Calculation":

Example 5.1
for(i=0; i < N; i++)
    c[i] = a[i] * b[i];

Each floating-point multiply requires two memory reads and a write. Assuming that single-precision (32-bit or 4-byte) floating-point values are being used, a teraflop (trillion floating-point operations per second) GPU would require 12 terabytes per second (TB/s) of memory bandwidth for this calculation to run at full speed. Said another way, a GPU with 177 GB/s of memory bandwidth could only deliver 14 Gflop, or approximately 1.4 percent of the performance of a teraflop GPU. When the extra precision of 64-bit (8-byte) floating-point arithmetic is required, the reader can halve the effective computational rate.2

2 Data reuse is important on conventional processors as well. The impact tends to be less dramatic when an application becomes memory-bound because conventional systems have fewer processing cores.

Clearly, it is necessary to reuse the data within the SM to achieve high performance. Only by exploiting data locality can a programmer minimize global memory transactions and keep data in fast memory. GPGPUs support two types of locality as they accelerate both computational and rendering applications:

■ Temporal locality: assumes that a recently accessed data item is likely to be used again in the near future. Many computational applications demonstrate this LRU (Least Recently Used) behavior.
■ Spatial locality: neighboring data is cached with the expectation that spatially adjacent memory locations will be used in the near future. Rendering operations tend to have high 2D spatial locality.

As can be seen in Figure 5.2, the SM also contains constant (labeled Uniform cache in the figure) and texture memory caches. Though desirable, the L1 and L2 caches in compute 2.0 devices have subsumed much of the capability of these memory spaces. When programming compute 1.x devices, constant memory must be used when data needs to be efficiently broadcast to all the threads. Texture memory can be used as a form of cache to avoid global memory bandwidth limitations and handle some small irregular memory accesses (Haixiang, Schmidt, Weiguo, & Müller-Wittig, 2010). However, the texture cache is relatively small—on the order of 8 KB. Of course, visualization is the intended usage and greatest value for texture memory across all compute device generations.

Compute 2.0 devices added an L1 cache to each SM and a unified L2 cache that fits between all the SM and global memory, as shown in Figure 5.1.

FIGURE 5.2 A GF100 streaming multiprocessor.

L2 CACHE
The unified L2 (Level 2) cache is a tremendous labor-saving device that works as a fast data store accessible by all the SM on the GPU. A great

many applications will suddenly run faster on compute 2.0 devices because of the L2 cache. Two reasons are:

■ Without requiring any intervention by the CUDA programmer, the L2 will cache data in an LRU fashion that allows many CUDA kernels to avoid global memory bandwidth bottlenecks.
■ The L2 cache greatly speeds irregular memory access patterns that otherwise would exhibit extremely poor GPU performance. For many algorithms, this characteristic will determine whether an application can be used on compute 1.x hardware.

Fermi GPUs provide a 768 KB unified L2 cache that is guaranteed to present a coherent view to all SMs. In other words, any thread can modify a value held in the L2 cache. At a later time, any other thread on the GPU can read that particular memory address and receive the correct, updated value. Of course, atomic operations must be used to guarantee that the store (or write) transaction completes before other threads are allowed read access.

Previous GPGPU architectures had challenges with this very common read/update operation because two separate data paths were utilized—specifically, the read-only texture load path and the write-only pixel data output path. To ensure data correctness, older GPU architectures required that all participating caches along the read path be potentially invalidated and flushed after any thread modified the value of an in-cache memory location. The Fermi architecture's unified L2 cache eliminated this bottleneck, along with the need for the separate texture and Raster Output (ROP) caches of earlier-generation GPUs.

All data loads and stores go through the L2 cache, including CPU/GPU memory copies, which is worth emphasizing because host data transfers might unexpectedly affect cache hits and thus application performance. Similarly, asynchronous kernel execution can also pollute the cache.

Relevant computeprof Values for the L2 Cache

Table 5.2 Visual Profiler Values for the L2 Cache

L2 cache texture memory read throughput (GB/s): This value gives the throughput achieved while reading data from L2 cache when a request for data residing in texture memory is made. This is calculated as (l2 read tex requests * 32)/(gpu time * 1000)

L2 cache global memory read throughput (GB/s): This value gives the throughput achieved while reading data from L2 cache when a request for data residing in global memory is made by L1. This is calculated as (l2 read requests * 32)/(gpu time * 1000)

L2 cache global memory write throughput (GB/s): This value gives the throughput achieved while writing data to L2 cache when a request to store data in global memory is made by L1. This is calculated as (l2 write requests * 32)/(gpu time * 1000)

L2 cache global memory throughput (GB/s): This value is the combined L2 cache read and write memory throughput. This is calculated as (L2 cache global memory read throughput + L2 cache global memory write throughput)

L2 cache read hit ratio (%): Percentage of hits that occur in L2 cache while reading from global memory. This is calculated as 100 * (L2 cache global memory read throughput - glob mem read throughput)/(L2 cache global memory read throughput)

L2 cache write hit ratio (%): Percentage of hits that occur in L2 cache while writing to global memory. This is calculated as 100 * (L2 cache global memory write throughput - glob mem write throughput)/(L2 cache global memory write throughput)

L1 CACHE
Compute 2.0 devices have 64 KB of L1 memory that can be partitioned to favor shared memory or dynamic read/write operations. Note that the L1 cache:

■ Is designed for spatial and not temporal reuse. Most developers expect processor caches that behave as an LRU cache. On a GPU, this mistaken assumption can lead to unexpected cache misses, as frequently accessing a cached L1 memory location does not guarantee that the memory location will stay in the cache.
■ Will not be affected by stores to global memory, as store operations bypass the L1 cache.
■ Is not coherent. The volatile keyword must be used when declaring shared memory that can be modified by threads in other blocks to guarantee that the compiler will not load the shared memory location into a register. Private data (registers, stack, etc.) can be used without concern.
■ Has a latency of 10–20 cycles.

The L1 caches per-thread local data structures such as the per-thread stack. The addition of a stack allows compute 2.0 devices to support recursive routines (routines that call themselves). Many problems can be naturally expressed in a recursive form. For example, divide-and-conquer methods repeatedly break a larger problem into smaller subproblems. At some point, the problem becomes simple enough to solve directly. The solutions to the subproblems are combined to give a solution to the initial problem. The stack can consume up to 1 KB of the L1 cache.

CUDA also uses an abstract memory type called local memory. Local memory is not a separate memory system per se but rather a memory location used to hold spilled registers. Register spilling occurs when a thread block requires more register storage than is available on an SM. Pre-Fermi GPUs spilled registers to global memory, which caused a dramatic drop in application performance, as three-orders-of-magnitude-slower GB/s global memory accesses replaced TB/s

register memory. Compute 2.0 and later devices spill registers to the L1 cache, which minimizes the performance impact of register spills.

If desired, the Fermi L1 cache can be deactivated with the -Xptxas -dlcm=cg command-line argument to nvcc. Even when deactivated, both the stack and local memory still reside in the L1 cache memory.

The beauty in this configurability is that applications that reuse data or have misaligned, unpredictable, or irregular memory access patterns can configure the L1 cache as a 48 KB dynamic cache (leaving 16 KB for shared memory), while applications that need to share more data amongst threads inside a thread block can assign 48 KB as shared memory (leaving 16 KB for the cache). In this way, the NVIDIA designers empowered the application developer to configure the memory within the SM to achieve the best performance; a sketch of how this split can be selected follows Example 5.2.

The L1 cache is utilized when the compiler generates an LDU (LoaD Uniform) instruction to cache data that needs to be efficiently broadcast to all the threads within the SM. Previous generations of GPUs could broadcast information efficiently among all the threads in an application only from constant memory. Compute 2.0 devices can broadcast data from global memory without requiring explicit programmer intervention, subject to the following conditions:

1. The pointer is prefixed with the const keyword.
2. The memory access is uniform across all the threads in the block, as in Example 5.2, "A Uniform Memory Access Example":

Example 5.2
__global__ void kernel( float *g_dst, const float *g_src )
{
    g_dst[blockIdx.x] = g_src[0] + g_src[blockIdx.x]; // both loads are uniform across the block
}
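The 48 KB/16 KB split described earlier is selected per kernel through the runtime API. The call below is a minimal sketch (kernel refers to the function in Example 5.2; choosing between the two preferences is application-dependent):

// Favor a 48 KB L1 cache and 16 KB of shared memory for this kernel:
cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferL1);
// Or favor 48 KB of shared memory and a 16 KB L1 cache:
//   cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferShared);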

Relevant computeprof Values for the L1 Cache

Table 5.3 Visual Profiler Values for the L1 Cache

L1 gld hit rate (%): This value is calculated as 100 * (L1 global load hit count)/((L1 global load hit count) + (L1 global load miss count))

L1 cache read throughput (GB/s): This value gives the throughput achieved while accessing data from L1 cache. This is calculated as [(L1 global load hit + L1 local load hit) * 128 * #SM + L2 read requests * 32]/(gpu time * 1000)

L1 cache global hit ratio (%): Percentage of hits that occur in L1 cache while accessing global memory. This statistic will be zero when L1 cache is disabled. This is calculated as (100 * L1 global load hit)/(L1 global load hit + L1 global load miss)

CUDA MEMORY TYPES
Table 5.4 summarizes the characteristics of the various CUDA memory spaces for compute 2.0 and later devices.

Table 5.4 CUDA Memory Types and Characteristics

Memory      Location                   Cached    Access        Scope
Register    On-chip                    No        Read/write    One thread
Local       On-chip                    Yes       Read/write    One thread
Shared      On-chip                    N/A       Read/write    All threads in a block
Global      Off-chip (unless cached)   Yes       Read/write    All threads + host
Constant    Off-chip (unless cached)   Yes       Read          All threads + host
Texture     Off-chip (unless cached)   Yes       Read/write    All threads + host

Registers Registers are the fastest memory on the GPU. They are a very precious resource because they are the only memory on the GPU with enough bandwidth and a low enough latency to deliver peak performance. Each GF100 SM supports 32 K 32-bit registers. The maximum number of registers that can be used by a CUDA kernel is 63, due to the limited number of bits available for indexing into the register store. The number of available registers varies on a Fermi SM:

■ If the SM is running 1,536 threads, then only 21 registers can be used.
■ The number of available registers degrades gracefully from 63 to 21 as the workload (and hence resource requirements) increases by number of threads.
Register spilling on a GF100 SM increases the importance of the L1 cache because it can preserve high performance. Be aware that pressure from register spilling and the stack (which can consume 1 KB of L1 storage) can increase the cache miss rate by forcing data to be evicted.
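Register pressure can also be bounded explicitly. The following sketch is an assumption, not the book's code: the __launch_bounds__ qualifier lets the compiler limit per-thread register use for an anticipated launch configuration, and the same effect can be applied to a whole compilation unit with the nvcc option --maxrregcount.

// Hypothetical kernel: __launch_bounds__(256, 4) tells the compiler this
// kernel is launched with at most 256 threads per block and that at least
// 4 resident blocks per SM are desired, which caps the registers per thread.
__global__ void
__launch_bounds__(256, 4)
boundedKernel(float *d_out, const float *d_in, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n)
    d_out[tid] = 2.0f * d_in[tid];
}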

Local memory Local memory accesses occur for only some automatic variables. An automatic variable is declared in the device code without any of the __device__, __shared__, or __constant__ qualifiers. Generally, an automatic variable resides in a register except for the following:

■ Arrays that the compiler cannot determine to be indexed with constant quantities.
■ Large structures or arrays that would consume too much register space.
■ Any variable the compiler decides to spill to local memory when a kernel uses more registers than are available on the SM.

The nvcc compiler reports total local memory usage per kernel (lmem) when compiling with the --ptxas-options=-v option. These reported values may be affected by some mathematical functions that access local memory.
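As a concrete illustration (a sketch under the assumptions noted in the comments, not code from the book), the automatic array below is indexed with a runtime value, so the compiler generally places it in local memory; compiling the file with --ptxas-options=-v reports the resulting lmem usage.

// The automatic array 'buf' cannot be kept in registers because it is
// indexed with a value known only at runtime, so it is placed in local
// memory (cached in L1 on compute 2.0 devices).
__global__ void localMemoryDemo(float *d_out, const int *d_idx, int n)
{
  float buf[64];                         // automatic array -> local memory
  for (int i = 0; i < 64; i++)
    buf[i] = (float)i;

  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n)
    d_out[tid] = buf[d_idx[tid] & 63];   // runtime (nonconstant) index
}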

Relevant computeprof Values for Local Memory Cache

Table 5.5 Visual Profiler Values for the Local Memory

Local memory bus traffic (%): Percentage of bus traffic caused due to accesses to local memory. This is calculated as (2 * L1 local load miss * 128 * 100)/((L2 read requests + L2 write requests) * 32/#SMs)

Shared Memory Shared memory (also referred to as smem) can be either 16 KB or 48 KB per SM arranged in 32 banks that are 32 bits wide. Contrary to early NVIDIA documentation, shared memory is not as fast as register memory. Shared memory can be allocated three different ways: 1. Statically within the kernel or globally within the file as shown in the declaration in Example 5.3, “A Static Shared Memory Declaration”:

Example 5.3 __shared__ int s_data[256];

2. Dynamically within the kernel by calling the driver API function cuFuncSetSharedSize. 3. Dynamically via the execution configuration. Only a single block of shared memory can be allocated via the execution configuration. Using more than one dynamically allocated shared memory variable in a kernel requires manually generating the offsets for each variable. Example 5.4, "Multiple Variables in a Dynamically Allocated Shared Memory Block," shows how to allocate and utilize two dynamically allocated shared memory vectors a and b:

Example 5.4
__global__ void kernel(int aSize)
{
  extern __shared__ float sData[];
  float *a, *b;

  a = sData; // a starts at the beginning of the dynamically allocated smem block

  b = &a[aSize]; // b starts immediately following the end of a in the smem block
}

The kernel call would look like Example 5.5, “Execution Configuration that Dynamically Allocates Shared Memory”:

Example 5.5
// The third execution configuration argument gives the dynamic shared memory size in bytes
kernel<<<nBlocks, nThreadsPerBlock, (aSize + bSize) * sizeof(float)>>>(aSize);

Shared memory is arranged in 32 four-byte-wide banks on the SM. Under ideal circumstances, 32 threads will be able to access shared memory in parallel without performance degradation. Unfortunately, bank conflicts occur when multiple requests are made by different threads for data within the same bank. These requests can either be for the same address or for multiple addresses that map to the same bank. When this happens, the hardware serializes the memory operations. If n threads within a warp cause a bank conflict, then n accesses are executed serially, causing an n-times slowdown on that SM. The size of the memory request can also cause a bank conflict. Shared memory in compute 2.0 devices has been improved to support double-precision variables in shared memory without causing warp serialization. Previous-generation GPUs required a workaround that involved splitting 64-bit data into two 32-bit values and storing them separately in shared memory. Be aware that the majority of 128-bit memory accesses (e.g., float4) will still cause a two-way bank conflict in shared memory on compute 2.0 devices. Padding shared memory to avoid bank conflicts represents a portability challenge. Example 5.6, "Shared Memory Padded for a GT200/Tesla C1060," illustrates an older code that

Example 5.6
__shared__ float tile[16][17];

must change both tile size and padding to warp size for compute 2.0 devices, as in Example 5.7, “Shared Memory Padded for a Compute 2.0 Device”:

Example 5.7
__shared__ float tile[32][33];

Notice that a column was added in the previous example to prevent a bank conflict. In this case, wasting space is preferable to accepting the

performance slowdown. Without the padding, every consecutive column index access within a warp would serialize. Shared memory has the ability to multicast, which means that if n threads within a warp access the same word at the same time, then only a single shared memory fetch occurs. Compute 2.0 devices broadcast the entire word, which means that multiple threads can access different bytes within the broadcast word without affecting performance. Compute 1.x devices supported only whole-word broadcast; subword accesses within the broadcast word, or to the same bank, caused serialization. Threads can communicate via shared memory without using the __syncthreads() barrier, as long as they all belong to the same warp. See Example 5.8, "A Barrier Is Not Required When Sharing Data within a Warp":

Example 5.8 if (tid < 32) { ... }
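Expanding on Example 5.8, the following sketch (an assumption, not the book's code) sums 32 values entirely within the first warp. No __syncthreads() barrier is needed between steps because every participating thread belongs to the same warp, and the volatile qualifier keeps the compiler from caching s_data in registers:

__global__ void warpSum(float *d_blockSums, const float *d_in)
{
  // Assumes at least 32 threads per block and 32 input values per block.
  volatile __shared__ float s_data[32];
  int tid = threadIdx.x;

  if (tid < 32) {
    s_data[tid] = d_in[blockIdx.x * 32 + tid];      // one value per lane
    if (tid < 16) s_data[tid] += s_data[tid + 16];  // no barrier needed:
    if (tid <  8) s_data[tid] += s_data[tid +  8];  // all active threads
    if (tid <  4) s_data[tid] += s_data[tid +  4];  // are in the same warp
    if (tid <  2) s_data[tid] += s_data[tid +  2];
    if (tid <  1) s_data[tid] += s_data[tid +  1];
    if (tid == 0) d_blockSums[blockIdx.x] = s_data[0];
  }
}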

If shared memory is used to communicate between warps in a thread block, make certain to have volatile in front of the shared memory declaration. The volatile keyword eliminates the possibility that the compiler might silently cache the previously loaded shared memory value in a register and fail to reload it again on next reference. Due to architectural changes:

■ Compute 1.x devices access shared memory only directly as an operand.
■ Compute 2.0 devices have a load/store architecture that can bring data into registers.
Legacy codes need to add a volatile keyword to avoid errors when running on compute 2.0 devices. As shown in the following example, a simple __shared__ declaration was sufficient on compute 1.x devices. The legacy code needs to be changed, as shown in Example 5.9, "A Volatile Must Be Added to Codes that Communicate across Thread Blocks with Shared Memory":

Example 5.9 __shared__ int cross[32]; // acceptable for 1.x devices

// Compute 2.0 devices require a volatile
volatile __shared__ int cross[32];

Volkov notes that the trend in parallel architecture design is towards an inverse memory hierarchy where the number of registers is increasing

compared to cache and shared memory. Scalability and performance are the reasons, as registers can be replicated along with the SM (or the processor core in a multicore processor). More registers means that more data can be kept in high-speed memory, which means that more instructions can run without external dependencies, resulting in higher performance (Volkov, 2010). He also notes that the performance gap between shared memory and arithmetic throughput has increased with Fermi, raising concerns that the current shared memory hardware on the Fermi architecture is a step backward (Volkov, 2010). Table 5.6 shows the ratio of shared memory banks to thread processors compared to the number of registers per thread.

Table 5.6 Trends Toward an Inverse Memory Hierarchy

Architecture   Banks vs. Thread Processors          Ratio of Banks to Thread Processors   Registers per Thread
G80-GT200      16 banks vs. 8 thread processors     2:1                                   128
GF100          32 banks vs. 32 thread processors    1:1                                   21–64

For portability, performance, and scalability reasons, it is highly recommended that registers be used instead of shared memory whenever possible.

Relevant computeprof Values for Shared Memory

Table 5.7 Visual Profiler Values for smem

Shared memory bank conflict per shared memory instruction (%): This value gives an indication of the number of bank conflicts caused per shared memory instruction. This value may exceed 100% if there are n-way bank conflicts or the data accessed is double precision. This is calculated as 100 * (L1 shared bank conflict)/(shared load + shared store)

Shared bank conflict replay (%): Percentage of replayed instructions caused due to shared memory bank conflicts. This is calculated as 100 * (L1 shared conflict)/instructions issued

Constant Memory For compute 1.x devices, constant memory is an excellent way to store and broadcast read-only data to all the threads on the GPU. The constant cache is limited to 64 KB. It can broadcast 32 bits per warp per two clocks per multiprocessor and should be used when all the threads in a warp read the same address. Otherwise, the accesses will serialize on compute 1.x devices.

Compute 2.0 and higher devices allow developers to access global memory with the efficiency of constant memory when the compiler can recognize and use the LDU instruction. Specifically, the data must:

■ Reside in global memory.
■ Be read-only in the kernel (the programmer can enforce this using the const keyword).
■ Must not depend on the thread ID.
See Example 5.10, "Examples of Uniform and Nonuniform Constant Memory Accesses":

Example 5.10
__global__ void kernel( const float *g_a )
{
  float x = g_a[15];              // uniform
  float y = g_a[blockIdx.x + 5];  // uniform
  float z = g_a[threadIdx.x];     // not uniform!
}

There is no need for a __constant__ declaration; plus, there is no fixed limit to the amount of data, as is the case with constant memory. Still, constant memory is useful on compute 2.0 devices when there is enough pressure on the cache to cause eviction of the data that is to be broadcast. Constant memory is statically allocated within a file. Only the host can write to constant memory, which can be accessed via the runtime library methods: cudaGetSymbolAddress(), cudaGetSymbolSize(), cudaMemcpyToSymbol(), and cudaMemcpyFromSymbol(), plus cuModuleGetGlobal() from the driver API.
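A minimal sketch (assumed names, not the book's code) of the static declaration and the host-side copy looks like the following; every thread in a warp reads the same address, so the constant cache broadcasts the value:

__constant__ float c_coeff[16];          // statically allocated constant memory

__global__ void applyCoeff(float *d_data, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n)
    d_data[tid] *= c_coeff[0];           // uniform read broadcast to the warp
}

void setCoeff(const float *h_coeff)      // h_coeff points to 16 host floats
{
  // Only the host can write constant memory
  cudaMemcpyToSymbol(c_coeff, h_coeff, 16 * sizeof(float));
}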

Texture Memory Textures are bound to global memory and can provide both cache and some limited, 9-bit processing capabilities. How the global memory that the texture binds to is allocated dictates some of the capabilities the texture can provide. For this reason, it is important to distinguish between three memory types that can be bound to a texture (see Table 5.8). For CUDA programmers, the most salient points about using texture memory are:

■ Texture memory is generally used in visualization.
■ The cache is optimized for 2D spatial locality.
■ It contains only 8 KB of cache per SM.

Table 5.8 How Memory Was Created Defines the Texture Capability

Linear memory (created with cudaMalloc()). Texture capability: acts as a linear cache. Texture update: free to write to the global memory from threads if the incoherence is safe.

CUDA arrays (created with cudaMallocArray() or cudaMalloc3D()). Texture capability: cache optimized for spatial locality; interpolation, wrapping, and clamping. Texture update: writing to arrays from a kernel is not allowed.

2D pitch linear memory (created with cudaMallocPitch()). Texture capability: cache optimized for spatial locality; interpolation, wrapping, and clamping. Texture update: free to write to the global memory from threads if the incoherence is safe.

■ Textures have limited processing capabilities that can efficiently unpack and broadcast data. Thus, a single float4 texture read is faster than four separate 32-bit reads.
■ Textures have separate 9-bit computational units that perform out-of-bounds index handling, interpolation, and format conversion from integer types (char, short, int) to float.
■ A thread can safely read some texture or surface memory location only if this memory location has been updated by a previous kernel call or memory copy, but not if it has been previously updated by the same thread or another thread from the same kernel call.
It is important to distinguish between textures bound to memory allocated with cudaMalloc() and those bound to padded memory allocated with cudaMallocPitch().

■ When using the texture only as a cache: In this case, programmers might consider binding the texture to memory created with cudaMalloc(), because the texture unit cache is small and caching the padding added by cudaMallocPitch() would be wasteful.
■ When using the texture to perform some processing: In this case, it is important to bind the texture to padded memory created with cudaMallocPitch() so that the texture unit boundary processing works correctly. In other words, don't bind linear memory created with cudaMalloc() to a texture and attempt to manually set the pitch, because unexpected things might happen—especially across device generations.
Depending on how the global memory bound to the texture was created, there are several possible ways to fetch from the texture that might also invoke some form of processing by the texture unit. The simplest way to fetch data from a texture is by using tex1Dfetch() because:

■ Only integer addressing is supported.
■ No additional filtering or addressing modes are provided.

Use of the methods tex1D(), tex2D(), and tex3D() are more complicated because the interpretation of the texture coordinates, what processing occurs during the texture fetch, and the return value delivered by the texture fetch are all controlled by setting the texture reference’s mutable (runtime) and immutable (compile-time) attributes:

■ Immutable parameters (compile-time):
  ■ Type: type returned when fetching
    - Basic integer and float types
    - CUDA 1-, 2-, 4-element vectors
  ■ Dimensionality:
    - Currently 1D, 2D, or 3D
  ■ Read mode:
    - cudaReadModeElementType
    - cudaReadModeNormalizedFloat (valid for 8- or 16-bit integers). It returns [–1,1] for signed, [0,1] for unsigned
■ Mutable parameters (runtime, only for array textures and pitch linear memory):
  ■ Normalized:
    - Nonzero = addressing range [0,1]
  ■ Filter mode:
    - cudaFilterModePoint
    - cudaFilterModeLinear
  ■ Address mode:
    - cudaAddressModeClamp
    - cudaAddressModeWrap
By default, textures are referenced using floating-point coordinates in the range [0,N) where N is the size of the texture in the dimension corresponding to the coordinate. Specifying that normalized texture coordinates will be used implies all references will be in the range [0,1). The wrap mode specifies what happens for out-of-bounds addressing:

■ Wrap: out-of-bounds coordinates are wrapped (via modulo arithmetic), as shown in Figure 5.3. ■ Clamp: out-of-bounds coordinates are replaced with the closest boundary, as shown in Figure 5.4.

FIGURE 5.3 Example of a texture wrapping an out-of-bounds coordinate (5.5, 1.5).
FIGURE 5.4 Example of a texture clamping an out-of-bounds coordinate (5.5, 1.5).

Table 5.9 Visual Profiler Values for Texture Memory

Texture hit rate (%): This value is calculated as 100 * (tex_cache_requests – tex_cache_misses)/(tex_cache_requests)

Texture cache memory throughput (GB/s): This value gives the memory throughput achieved while reading data from texture memory. This statistic will be zero when texture memory is not used. This is calculated as (#SM * tex cache sector queries * 32)/(gpu time * 1000)

Texture cache hit rate (%): Percentage of hits that occur in texture cache while accessing data from texture memory. This statistic will be zero when texture memory is not used. This value is calculated as 100 * (tex cache requests – tex cache misses)/tex cache requests

Linear texture filtering may be performed only for textures that are configured to return floating-point data. A texel, short for "texture element," is an element of a texture array. Thus, linear texture filtering performs low-precision (9-bit fixed-point with 8 bits of fractional value) interpolation between neighboring texels. When enabled, the texels surrounding a texture fetch location are read and the return value of the texture fetch is interpolated by the texture hardware based on where the texture coordinates fell between the texels. Simple linear interpolation is performed for one-dimensional textures, as shown in Equation 5.1, "Texture linear interpolation":

tex(x) = (1 − α) T[i] + α T[i+1]    (5.1)

Similarly, the dedicated texture hardware will perform bilinear and trilinear filtering for higher-dimensional data. As long as the 9 bits of accuracy can be tolerated, the dedicated texture units offer an innovative opportunity to gain even greater performance from GPU computing. One example is "GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis" (Townsend, Sankaralingam, & Sinclair, 2011).
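Tying the pieces together, the following sketch (an assumption, not the book's code) declares a 2D texture reference, sets the mutable attributes discussed above (normalized coordinates, wrap addressing, linear filtering), binds it to pitch linear memory allocated with cudaMallocPitch(), and samples it with tex2D():

// File-scope texture reference: the immutable attributes are template arguments
texture<float, 2, cudaReadModeElementType> texRef;

__global__ void sampleKernel(float *d_out, int width, int height)
{
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < width && y < height)   // fetch with normalized coordinates in [0,1)
    d_out[y * width + x] = tex2D(texRef, x / (float)width, y / (float)height);
}

void bindTexture(float *d_pitchPtr, size_t pitch, int width, int height)
{
  texRef.normalized     = 1;                      // use [0,1) coordinates
  texRef.filterMode     = cudaFilterModeLinear;   // 9-bit linear interpolation
  texRef.addressMode[0] = cudaAddressModeWrap;    // wrap out-of-bounds x
  texRef.addressMode[1] = cudaAddressModeWrap;    // wrap out-of-bounds y
  cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
  cudaBindTexture2D(NULL, texRef, d_pitchPtr, desc, width, height, pitch);
}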

Relevant computeprof Values for Texture Memory Complete working examples utilizing texture memory can be found in Part 13 of my Doctor Dobb's Journal tutorial series (http://drdobbs.com/cpp/218100902).

GLOBAL MEMORY Understanding how to efficiently use global memory is an essential requirement to becoming an adept CUDA programmer. Focusing on data reuse within the SM and caches avoids memory bandwidth limitations. This is the third most important rule of high-performance GPGPU programming, as introduced in Chapter 1: 1. Get the data on the GPGPU and keep it there.

2. Give the GPGPU enough work to do. 3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations. At some point, it is not possible to avoid global memory, in which case it is essential to understand how to use global memory effectively. In particular, the Fermi architecture made some important changes in how CUDA programmers need to think about and use global memory. From the developer's perspective, it cannot be stressed too strongly that all global memory accesses need to be perfectly coalesced. A coalesced memory access means that the hardware can coalesce, or combine, the memory requests from the threads into a single wide memory transaction. Best performance occurs when:

■ The memory address is aligned. The NVIDIA CUDA C Best Practices Guide points out that misaligned accesses can cause an 8 times reduction in global memory bandwidth on older devices. A Fermi GPU with caching enabled would see around a 15 percent drop (Micikevicius, 2010). The authors of the NVIDIA guide note that, “Memory allocated through the runtime API, such as via cudaMalloc(), is guaranteed to be aligned to at least 256 bytes. Therefore, choosing sensible thread block sizes, such as multiples of 16, facilitates memory accesses by half warps that are aligned to segments. In addition, the qualifiers __align__(8) and __align__(16) can be used when defining structures to ensure alignment to segments.” (CUDA C Best Practices Guide p. 27) ■ A warp accesses all the data within a contiguous region, which means that the wider memory transaction is 100 percent efficient because every byte retrieved is utilized.
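As a sketch of the difference (assumed kernels, not from the book), the first kernel below lets each warp read a contiguous, aligned 128-byte region, while the second strides through memory so the same warp touches many segments; the __align__ qualifier mentioned in the quote keeps structures on segment boundaries:

struct __align__(16) float4Like { float x, y, z, w; };  // forces 16-byte alignment

__global__ void coalescedCopy(float *d_dst, const float *d_src, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n)
    d_dst[tid] = d_src[tid];          // warp reads a contiguous 128-byte region
}

__global__ void stridedCopy(float *d_dst, const float *d_src, int n, int stride)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid * stride < n)
    d_dst[tid] = d_src[tid * stride]; // each thread lands in a different segment
}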

As discussed in Chapter 4, try to keep enough memory requests in flight to fully utilize the global memory subsystem:

■ From an ILP perspective, attempt to process several elements per thread to pipeline multiple loads. A side benefit is that indexing calculations can often be reused.
■ From a TLP perspective, launch enough threads to maximize throughput.
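A sketch of the ILP approach (assumed kernel, not from the book; it assumes n is a multiple of four times the total thread count) has each thread issue four independent, still-coalesced loads so more memory transactions are in flight and the index arithmetic is reused:

__global__ void scaleBy2_ILP4(float *d_data, int n)
{
  int tid    = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  // Four elements per thread, separated by the grid stride, so every load
  // instruction is still fully coalesced across the warp while four
  // independent loads per thread can overlap.
  int i0 = tid, i1 = tid + stride, i2 = tid + 2*stride, i3 = tid + 3*stride;
  if (i3 < n) {
    float a = d_data[i0], b = d_data[i1], c = d_data[i2], d = d_data[i3];
    d_data[i0] = 2.0f*a;
    d_data[i1] = 2.0f*b;
    d_data[i2] = 2.0f*c;
    d_data[i3] = 2.0f*d;
  }
}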

Analyze the memory requests in your application via the source code and profiler output. Experiment with the caching configurations and shared memory vs. L1 cache configuration to see what works best. From a hardware perspective, memory requests are issued in groups of 32 threads (as opposed to 16 in previous architectures), which matches the instruction issue width. Thus, the 32 addresses of a warp should ideally address a contiguous, aligned region to stream data from global memory at the highest bandwidth.

There are two types of loads from global memory:

■ Caching loads: This is the default mode. A memory fetch transaction attempts to find the data in the L1 and then the L2 caches. Failing that, a 128-byte cache line load is issued.
■ Noncaching loads: When lots of data needs to be fetched but not from consecutive addresses, better performance might be achieved by turning off the L1 cache with the nvcc command-line option -Xptxas -dlcm=cg. In this case, the SM does not look to see whether the data is in the L1, but it will invalidate the cache line if it is already in the L1. If the data is not in the L2, then a 32-byte global memory load is issued. This can deliver better data utilization when a 128-byte cache line fetch would be wasteful.
Global memory store transactions occur by invalidating the L1 cache line and then writing to the L2. Only when it's evicted is the L2 data actually written to global memory. Most applications will benefit from the cache because it performs coalesced global memory loads and stores in terms of a 128-byte cache line size. Once data is inside the L2 cache, applications can reuse data, perform irregular memory accesses, and spill registers without incurring the dramatic slowdown seen in older-generation GPUs caused by having to rely on round trips to the much slower global memory. For performance and transparency reasons, the L1 and L2 caches in compute 2.0 devices are a very good thing.

Common Coalescing Use Cases Some common use cases for accessing global memory are shown with caching enabled (Table 5.10) and disabled (Table 5.11).

Table 5.10 Common Cached Global Memory Use Cases

Cache Enabled   Case                                                                            Bytes Needed by the Warp   Bytes Fetched from Gmem   Efficiency
Y               Broadcast access consecutive 4-byte words to all threads in the warp (N ≤ 32)   N*128                      128                       3200%
Y               Warp accesses 32 aligned, consecutive 4-byte words                              128                        128                       100%
Y               Warp accesses 32 aligned, permuted 4-byte words                                 128                        128                       100%
Y               Warp accesses 32 misaligned, consecutive 4-byte words                           128                        256                       50%
Y               Warp accesses 32 misaligned, permuted 4-byte words                              128                        256                       50%
Y               Warp accesses N scattered 4-byte words (N ≤ 32)                                 128                        N*128                     1/N or 3.125% worst case

Table 5.11 Common Noncached Global Memory Use Cases

Cache Enabled   Case                                                    Bytes Needed by the Warp   Bytes Fetched from Gmem   Efficiency
N               Warp accesses 32 aligned, consecutive 4-byte words      128                        128                       100%
N               Warp accesses 32 aligned, permuted 4-byte words         128                        128                       100%
N               Warp accesses 32 misaligned, consecutive 4-byte words   128                        128 or 256                80–100%, depending on pattern
N               Warp accesses N scattered 4-byte words (N ≤ 32)         128                        N*32                      4/N or 12.5% worst case

Global memory on the GPU was designed to quickly stream memory blocks of data into the SM. Unfortunately, loops that perform indirect indexing, utilize pointers to varying regions in memory, or that have an irregular or a large stride break this assumption. As can be seen in Table 5.10, scattered reads can reduce global memory throughput to only 3.125% of the hardware capability. Turning off the cache can provide a 4-times speed improvement, which is good but still starves the SM for data, as it provides only 12.5% of the potential global memory bandwidth.

Allocation of Global Memory Memory can be statically allocated in device memory with a declaration:

Example 5.11 __device__ int gmemArray[SIZE];

When using the runtime API, linear (or 1D) regions of global memory can be dynamically allocated with cudaMalloc() and freed with cudaFree().The Thrust API internally uses cudaMalloc().

■ Memory is aligned on 256-byte boundaries.
■ For 2D accesses to be fully coalesced, both the width of the thread block and the width of the array must be a multiple of the warp size (or only half the warp size, for devices of compute capability 1.x).
The runtime cudaMallocPitch() and driver API cuMemAllocPitch() methods pad the array allocation appropriately for the destination device. The associated memory copy functions described in the reference manual must be used with pitch linear memory.
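A brief sketch (assumed names, not the book's code) of allocating and indexing pitch linear memory follows; note that rows are indexed by the returned pitch in bytes rather than by the logical width:

void allocate2D(float **d_ptr, size_t *pitch,
                const float *h_data, int width, int height)
{
  // pitch (in bytes) may be larger than width*sizeof(float) due to padding
  cudaMallocPitch((void**)d_ptr, pitch, width * sizeof(float), height);
  cudaMemcpy2D(*d_ptr, *pitch,                 // padded destination
               h_data, width * sizeof(float),  // tightly packed host source
               width * sizeof(float), height,
               cudaMemcpyHostToDevice);
}

__global__ void rowAccess(float *d_ptr, size_t pitch, int width, int height)
{
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < width && y < height) {
    // index rows by the pitch (in bytes), not by the logical width
    float *row = (float*)((char*)d_ptr + y * pitch);
    row[x] *= 2.0f;
  }
}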

Memory can be dynamically allocated in the kernel using the standard C-language malloc() and free(). It is aligned on 16-byte boundaries. Dynamic global memory allocation on the device is supported only by devices of compute capability 2.x. Memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). Any thread can use memory allocated by any other CUDA thread – even in later kernel launches. Be aware that any CUDA thread may free memory allocated by another thread, which means that care must be taken to ensure that the same pointer is not freed more than once. The CUDA memory checker, cuda-memcheck, is a useful tool to help find memory errors. The device heap must be created before any kernel can dynamically allocate memory. By default, CUDA creates a heap of 8 MB. Unlike the heap on a conventional processor, the heap on the GPU does not resize dynamically. Further, the size of the heap cannot be changed once a kernel module has loaded. Memory reserved for the device heap consumes space just like memory allocated through host-side CUDA API calls such as cudaMalloc(). The following API functions get and set the heap size:

■ Driver API:
  ■ cuCtxGetLimit(size_t* size, CU_LIMIT_MALLOC_HEAP_SIZE)
  ■ cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, size_t size)
■ Runtime API:
  ■ cudaDeviceGetLimit(size_t* size, cudaLimitMallocHeapSize)
  ■ cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t size)
The heap size granted will be at least size bytes. cuCtxGetLimit and cudaDeviceGetLimit return the currently requested heap size. The CUDA C Programming Guide Version 4.0 provides simple working examples of per-thread and per-thread-block allocation, and of allocation persistence across kernel invocations.
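A minimal sketch (an assumption, not one of the guide's examples) of sizing the heap with the runtime API and allocating from a kernel follows; it requires a compute 2.x device and compilation with -arch=sm_20:

__global__ void mallocDemo()
{
  int *p = (int*)malloc(64 * sizeof(int));   // per-thread device-heap allocation
  if (p != NULL) {
    p[0] = threadIdx.x;
    free(p);                                 // free each pointer exactly once
  }
}

int main()
{
  size_t heapBytes = 32 * 1024 * 1024;                     // request a 32 MB heap
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes);  // before any kernel launch
  mallocDemo<<<1, 32>>>();
  cudaDeviceSynchronize();
  return 0;
}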

Limiting Factors in the Design of Global Memory Global memory does represent a scaling challenge for GPGPU architects. Although multiple memory subsystems can be combined to deliver blocks of data at the aggregate performance of the combined memory systems, limiting factors such as cost, power, heat, space, and reliability prevent memory bandwidth from scaling as fast as computational throughput. The Fermi memory subsystem provides the combined memory bandwidth of six partitions of GDDR5 memory on GF100 hardware. With this design, the GPGPU hardware architects were able to increase memory bandwidth

by a factor of six over a single partition. There is no longer a linear mapping between addresses and partitions, so typical access patterns are unlikely to all fall into the same partition. This design avoids partition camping (bottlenecking on a subset or even a single controller). The Fermi memory system supports ECC memory on high-end cards, but this feature is disabled on consumer cards. ECC is also used on memory internal to the SM. Using error correcting memory with ECC is a "must have" when deploying large numbers of GPUs in datacenter and supercomputer installations to ensure that data-sensitive applications like medical imaging, financial options pricing, and scientific simulations are protected from memory errors. ECC can be turned off at the driver level to gain an additional 20 percent in memory bandwidth and added memory capacity, which can benefit global memory bandwidth-limited applications and can be an acceptable optimization for noncritical applications. The Linux nvidia-smi command added a -e option for controlling ECC. There is a control panel option to enable or disable ECC in Windows. There are three ways to increase the hardware bandwidth of memory in a system:
1. Increase the memory clock rate. Faster memory is more expensive, it consumes more power (which means that heat is generated), and faster memory can be more error-prone.
2. Increase the bus width. This option requires that the GPU chip have lots of pins for the memory interface. No matter how small the lithography of the manufacturing process, it is possible to fit only a limited number of physical pin connectors in a given space. More pins means that the size of the chip must be increased, which leads to a vicious cycle, as manufacturing larger chips means that fewer chips can be made per wafer, thereby driving up the cost. In a competitive market for consumer products, higher costs quickly make products unattractive so they do not sell well. Consumer products are the market that is really driving the economics of GPGPU development, which makes cost a critical factor. Optical connectors offer the potential to break this vicious cycle, but this technology has not yet matured enough to be commonly used in manufacturing.
3. Transmit more data per pin per clock. This is the magic behind GDDR5 (Graphics Double Data Rate version 5) memory and the hope behind optical connectors. Basically, the channel capacity can be calculated from the physical properties of the channel. For example, the Nyquist sampling theorem lets us determine the maximum possible data rate based on the frequency of the channel in the absence of noise. Increasing the frequency of a channel means that

more data can be transmitted per unit time. Unfortunately, high-frequency electrical signals are prone to noise. The Shannon theorem tells us the maximum theoretical information transfer rate in the presence of noise, but it is up to the engineers and standards committees to make the magic of higher bandwidth data transmission happen.

Relevant computeprof Values for Global Memory

Table 5.12 Visual Profiler Values for gmem

glob mem read throughput (GB/s): Global memory read throughput in gigabytes per second. For compute capability < 2.0, this is calculated as (((gld_32*32) + (gld_64*64) + (gld_128*128)) * TPC)/(gputime * 1000). For compute capability >= 2.0, this is calculated as ((DRAM reads) * 32)/(gputime * 1000).

glob mem write throughput (GB/s): Global memory write throughput in gigabytes per second. For compute capability < 2.0, this is calculated as (((gst_32*32) + (gst_64*64) + (gst_128*128)) * TPC)/(gputime * 1000). For compute capability >= 2.0, this is calculated as ((DRAM writes) * 32)/(gputime * 1000). This derived statistic is also shown as "Achieved global memory write throughput (GB/s)" in the kernel analysis window for Fermi.

glob mem overall throughput (GB/s): Global memory overall throughput in gigabytes per second. This is calculated as global memory read throughput + global memory write throughput.

kernel-requested global memory read throughput (GB/s): The actual number of bytes requested in terms of loads by the kernel from global memory divided by the kernel execution time. These requests are made in terms of global load instructions, which can be of varying word sizes of 8, 16, 32, 64, or 128 bits. This is calculated as (gld instructions 8bit + 2 * gld instructions 16bit + 4 * gld instructions 32bit + 8 * gld instructions 64bit + 16 * gld instructions 128bit)/(gpu time * 1000).

kernel-requested global memory write throughput (GB/s): The actual number of bytes requested in terms of stores by the kernel from global memory divided by the kernel execution time. These requests are made in terms of global store instructions, which can be of varying word sizes of 8, 16, 32, 64, or 128 bits. This is calculated as (gst instructions 8bit + 2 * gst instructions 16bit + 4 * gst instructions 32bit + 8 * gst instructions 64bit).

kernel-requested global memory throughput (GB/s): The combined kernel-requested read and write memory throughput. This is calculated as (kernel-requested global memory read throughput + kernel-requested global memory write throughput).

global memory excess load (%): The percentage of excess data that is fetched while making global memory load transactions. Ideally, 0% excess loads will be achieved when the kernel-requested global memory read throughput is equal to the L2 cache read throughput, that is, the number of bytes requested by the kernel in terms of reads is equal to the number of bytes actually fetched by the hardware during kernel execution to service the kernel. If this statistic is high, it implies that the access pattern for fetch is not coalesced and many extra bytes are getting fetched while serving the threads of the kernel. This is calculated as 100 – (100 * kernel-requested global memory read throughput/L2 read throughput).

Table 5.12 Visual Profiler Values for gmem—cont’d

global memory excess store (%): The percentage of excess data that is accessed while making global memory store transactions. Ideally, 0 percent excess stores will be achieved when the kernel-requested global memory write throughput is equal to the L2 cache write throughput, that is, the number of bytes requested by the kernel in terms of stores is equal to the number of bytes actually accessed by the hardware during kernel execution to service the kernel. If this statistic is high, it implies that the access pattern for store is not coalesced and many extra bytes are getting accessed during execution of the threads of the kernel. This is calculated as 100 – (100 * kernel-requested global memory write throughput/L2 write throughput).

peak global memory throughput (GB/s): The peak memory throughput or bandwidth that can be achieved on the present CUDA device. This is a device property, and the kernel-achieved memory throughput should be as close as possible to this peak.

global memory replay (%): Percentage of replayed instructions caused due to global memory accesses. This is calculated as 100 * (L1 global load miss)/instructions issued.

SUMMARY CUDA makes various hardware spaces available to the programmer. It is essential that the CUDA programmer utilize the available memory spaces to best advantage given the three orders of magnitude difference in bandwidth (from 8 TB/s register bandwidth to 8 GB/s for PCIe-limited mapped memory) between the various CUDA memory types. Failure to do so can result in poor performance. CUDA provides a wealth of measured and derived profile information to help track down memory bottlenecks. Understanding the characteristics of each memory type is a prerequisite to adept CUDA programming. Automated analysis by the CUDA profilers can point the developer in the right direction. Knowing how to read the profiler output is a core skill in creating high-performance applications. Otherwise, finding application bottlenecks becomes a matter of guesswork. Similarly, the CUDA memory checker can help find errors in using memory. CHAPTER 6

Efficiently Using GPU Memory

The importance of efficiently using GPU memory cannot be overstated. With roughly three-orders-of-magnitude difference in speed between the fastest on-chip register memory and mapped host memory that must traverse the PCIe bus, literate CUDA developers must understand the most efficient ways to use memory. Latency hiding through ILP or TLP is essential to application performance. Prefetching can keep more memory transactions in flight to move data to fast memory and speed even memory bandwidth-limited reduction operations. Irregular data structures are a challenge with current GPU technology, but some techniques can preserve performance even with random memory accesses. However, finding more and better ways to utilize GPU memory is an area of active research as new libraries become available that support irregular data structures such as graphs and sparse matrices. At the end of this chapter, the reader will have a basic understanding of:

■ Using prefetch to better utilize global memory. ■ Efficient use of registers, shared, and global memory in an ILP-based reduction kernel. ■ How to write generic methods that utilize functors. ■ Techniques to speed problems that have irregular and random memory accesses. ■ Generic approaches and libraries for sparse matrices. ■ Graph centrality metrics and codes that can provide 10- to 50-times speedups. ■ Performance reasons to use SoA. ■ Stencils and tiles.


REDUCTION Reduction operations perform common tasks such as finding the minimum, maximum, or sum of a vector. Writing a high-performance reduction for GPU computing is surprisingly complex because it requires a detailed understanding of CUDA memory spaces and the execution model. The thrust API provides a simple interface that hides all the complexity of a reduction, making it both flexible and easy to use. Thrust uses a reduction designed by Mark Harris at NVIDIA. The paper and example code demonstrating various optimizations for reduction are included with the NVIDIA SDK in the reduction directory. Both the paper and code are recommended reading. This chapter provides a reduction example that ties together much of the discussion in previous chapters and extends the reduction created by Mark Harris:

■ C++ templates extend the generality of the reduction code to use user-defined types such as floats, doubles, integers, long integers, and others.
■ New features of compute 2.0 devices such as the PTX prefetch instruction and inline assembly code are used to increase global memory read performance.
■ Temporary storage is passed to the reduction method so that the programmer can eliminate redundant cudaMalloc() and cudaFree() operations that slow the thrust implementation as noted in Chapter 3.
■ The passing and use of inline functors is demonstrated to create a generic reduction operation that is applicable to more than just finding the sum of a vector.
■ ILP, discussed in Chapter 4, is utilized both to make the code more understandable and to free SM resources for complicated reductions such as the objective functions defined in Chapters 3 and 4.
The Reduction Template The following walkthrough of the code for the generic reduction template functionReduce.h covers the key concepts and thinking behind each section of code. The code snippets can be combined to construct the complete functionReduce.h source file. The #ifndef check of the preprocessor variable REDUCE_H protects against compiler errors, should the template file ever be included multiple times; see Example 6.1, "Part 1 of functionReduce.h":

Example 6.1
#ifndef REDUCE_H
#define REDUCE_H

For simplicity, the following section of the template defines the number of blocks and threads for use on a C2050 or C2070 GPU. These definitions can be set manually from the information determined by the NVIDIA SDK code deviceQuery. A production version of this template might query the device properties and set these values automatically, as in Example 6.2, “Part 2 of functionReduce.h”:

Example 6.2
// Define the number of blocks as a multiple of the number of SM
// and the number of threads as the maximum resident on the SM
#define N_BLOCKS (1*14)
#define N_THREADS 1024
#define WARP_SIZE 32

Example 6.3 starts the definition of the _functionReduce() method:

Example 6.3
template<typename T, typename UnaryFunction, typename BinaryFunction>
__global__ void _functionReduce(T *g_odata, unsigned int n, T initVal,
                                UnaryFunction fcn, BinaryFunction fcn1)
{

Notice that:

■ All variables are defined in terms of the template variable type T for generality. For example, T can be defined as a float, double, int, or char type.
■ A scratch region of memory is provided so that each SM can write a single partial result of type T to global memory. Passing a pointer allows reuse of the scratch space to avoid the repetitive allocate and free overhead observed in the thrust reduction operation in Chapter 3.
■ The number of calls to the unary function fcn() is passed via the variable n.
■ The unary function fcn() can be defined to fetch data from memory or calculate some result on the fly. For example, the test code that follows this template defines fcn() as a functor that fetches data from a vector in global memory. Alternative functors can fetch data from complex data structures in global memory, calculate a result based on numerous data structures in memory, perform a table lookup, or avoid the use of global memory entirely by returning some constant or computed value.
■ As with the thrust reduction call, an initial value is passed. This can be the starting value for a sum, or an initial value to use in a minimum() or maximum() reduction.

■ The binary function fcn1() defines a generic operation on two of the values returned by fcn(). Examples include the thrust::plus() functor when a sum is desired. Similarly, the thrust::minimum() or thrust::maximum() functors can be used. Alternatively, the user can provide his or her own binary functor.
Each thread on the GPU starts out with the register variable myVal set to initVal. Then each thread iterates through and calculates a partial result in fast register memory. If fcn1() were a plus() operation and fcn() fetched data from a vector in global memory, then myVal would contain the partial sums of all the vector elements as shown in Figure 6.1. This step reduces the data from n values (which could be on the order of millions) to N_BLOCKS*N_THREADS partial values. Note that the loop traverses the vector in reverse order, starting at the end and working toward the beginning. This approach gives the author of fcn() the opportunity to confirm that any prefetching does not access data prior to the start of the vector. Thus fcn() does not need to know the end or length of the vector, which saves memory and a register. See Example 6.4, "Part 4 of functionReduce.h":

Example 6.4 T myVal = initVal;

  { // 1) Use fastest memory first.
    const int gridSize = blockDim.x*gridDim.x;
    for(int i = n-1 -(blockIdx.x * blockDim.x + threadIdx.x); i >= 0; i -= gridSize)
      myVal = fcn1(fcn(i), myVal);
  }

After this loop completes, the register variable myVal contains N_BLOCKS*N_THREADS partial values. In CUDA, register variables are not accessible to other threads. For this reason, it is necessary to move the partial values stored in myVal to slower shared memory so that they can be accessed

FIGURE 6.1 Iteratively fetching gridDim regions of data from global memory.

by other threads. The transfer from register to shared memory happens in step 2 shown in Example 6.5. This is a parallel transfer by all the threads in each thread block to the shared memory variable smem inside each SM. Shared memory is a valuable resource, so only the minimum amount is allocated. Based on the discussion of ILP in Chapter 4, the parallelism of only one warp of threads is used to perform the reduction from N_THREADS partial values on each SM to WARP_SIZE partial values. The register variable myVal already contains the correct values for the first warp. Thus, the shared memory smem variable needs to contain only the contents of the myVal register variables in threads with a threadIdx.x greater than or equal to WARP_SIZE. As a result, smem can be allocated to contain WARP_SIZE fewer elements than N_THREADS. Because the transfer from register to shared memory occurs in parallel, the CUDA __syncthreads() method must be called after the assignment to ensure that all the elements of smem have been written. All of this happens in the few lines of Example 6.5, "Part 5 of functionReduce.h":

Example 6.5
  // 2) Use the second fastest memory (shared memory) in a warp
  // synchronous fashion.
  // Create shared memory for per-block reduction.
  // Reuse the registers in the first warp.
  volatile __shared__ T smem[N_THREADS-WARP_SIZE];

  // put all the register values into a shared memory
  if(threadIdx.x >= WARP_SIZE)
    smem[threadIdx.x - WARP_SIZE] = myVal;
  __syncthreads(); // wait for all threads in the block to complete.

At this point in the kernel, there is no reason to use more than the number of threads in one warp because the SM can issue only a single instruction per warp. Because there are no unresolved data dependencies, a single warp is sufficient to keep the SM busy in reducing the contents of shared memory from (N_THREADS – WARP_SIZE) elements to WARP_SIZE partial values on each SM.1 That is why a conditional is used to keep only threads with a threadIdx.x less than WARP_SIZE active. See Example 6.6, "Part 6 of functionReduce.h":

Example 6.6
  if(threadIdx.x < WARP_SIZE) { // now using just one warp. The SM can only run one warp at a time

1 Depending on the compute generation, the SM might be able to issue instructions on the half-warp.

#pragma unroll
    for(int i=threadIdx.x; i < (N_THREADS - WARP_SIZE); i += WARP_SIZE)
      myVal = fcn1(myVal,(T)smem[i]);
    smem[threadIdx.x] = myVal; // save myVal in this warp to the start of smem
  }

What remains is to reduce the values in one warp to a single value on each SM. This equates to N_BLOCKS partial values, as N_BLOCKS was defined to use one block per SM. It is worth noting that the parallelism within the warp at this point does not contribute much to the performance, as only five calls to fcn1() are made in Example 6.7. An alternative implementation can use a simple loop over WARP_SIZE running on a single thread (e.g., when threadIdx.x == 0) to create the single partial value per SM. If fcn1() were the plus() functor, this serial version would perform only 27 extra additions that would consume only a trivial amount of extra time—on the order of 100 nanoseconds. Thus, it is sometimes unnecessary to exploit all the parallelism in a system. Still, the parallel code was simple, and it might benefit us in a future compute generation, so we use the implementation in Example 6.7, "Part 7 of functionReduce.h":

Example 6.7
  // reduce shared memory.
  if (threadIdx.x < 16)
    smem[threadIdx.x] = fcn1((T)smem[threadIdx.x],(T)smem[threadIdx.x + 16]);
  if (threadIdx.x < 8)
    smem[threadIdx.x] = fcn1((T)smem[threadIdx.x],(T)smem[threadIdx.x + 8]);
  if (threadIdx.x < 4)
    smem[threadIdx.x] = fcn1((T)smem[threadIdx.x],(T)smem[threadIdx.x + 4]);
  if (threadIdx.x < 2)
    smem[threadIdx.x] = fcn1((T)smem[threadIdx.x],(T)smem[threadIdx.x + 2]);
  if (threadIdx.x < 1)
    smem[threadIdx.x] = fcn1((T)smem[threadIdx.x],(T)smem[threadIdx.x + 1]);
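For reference, the serial alternative mentioned above might look like the following sketch (an assumption, not part of the book's template); it would replace Example 6.7 inside the kernel and relies on the kernel's smem, fcn1, and WARP_SIZE definitions:

  // Single-thread alternative: thread 0 folds the WARP_SIZE partial values
  // in smem into smem[0] with a simple serial loop.
  if (threadIdx.x == 0) {
    T v = smem[0];
    for (int i = 1; i < WARP_SIZE; i++)
      v = fcn1(v, (T)smem[i]);
    smem[0] = v;
  }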

The final task requires reducing the remaining N_BLOCKS partial values into a single value that completes the reduction. At this point, the code can either:

■ Transfer the N_BLOCKS (14 for this example) partial values to the host, where the final reduction will occur.
■ Utilize atomic operations as described in section B.5 of the NVIDIA CUDA C Programming Guide to ensure that all the partial values are written to global memory before completing the reduction.

Because both implementations require the allocation of scratch space for the N_BLOCKS partial values, Example 6.8, "Part 8 of functionReduce.h," performs the transfer to the host because it demonstrates the use of the fcn1() functor on both the host and device:

Example 6.8
  // 3) Use global memory as a last resort to transfer results to the host
  // write result for each block to global mem
  if (threadIdx.x == 0) g_odata[blockIdx.x] = smem[0];
  // Can put the final reduction across SM here if desired.
}
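The atomic-operation alternative listed above can also finish the reduction entirely on the device. The following standalone sketch is an assumption (shown only for a float sum, not the book's code): each block reduces its slice in shared memory and then folds its partial result into a single global value with atomicAdd(). The value pointed to by d_total must be zeroed (e.g., with cudaMemset()) before the launch, and other reduction operators would need a different atomic or a compare-and-swap loop.

__global__ void atomicFinishSum(float *d_total, const float *d_in, int n)
{
  __shared__ float smem[256];                // assumes 256 threads per block
  int tid = threadIdx.x;
  int i   = blockIdx.x * blockDim.x + tid;

  smem[tid] = (i < n) ? d_in[i] : 0.0f;
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // per-block tree reduction
    if (tid < s) smem[tid] += smem[tid + s];
    __syncthreads();
  }
  // float atomicAdd() requires a compute 2.0 device
  if (tid == 0) atomicAdd(d_total, smem[0]);      // fold into the global total
}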

The host method partialReduce() allocates the partial sums if needed and calls the CUDA kernel, as in Example 6.9, “Part 8 of functionReduce.h”:

Example 6.9
template<typename T, typename UnaryFunction, typename BinaryFunction>
inline void partialReduce(const int n, T** d_partialVals, T initVal,
                          UnaryFunction const& fcn, BinaryFunction const& fcn1)
{
  if(*d_partialVals == NULL)
    cudaMalloc(d_partialVals, (N_BLOCKS+1) * sizeof(T));

_functionReduce<<< N_BLOCKS, N_THREADS>>>(*d_partialVals, n, initVal, fcn, fcn1); }

As can be seen in Example 6.10, "Part 9 of functionReduce.h," the N_BLOCKS partial values are transferred from the GPU to the host, where a host-based version of fcn1() is used to complete the reduction. An #endif completes the preprocessor #ifndef statement to protect against compiler errors, should the header file be included multiple times. See Example 6.10, "Part 9 of functionReduce.h":

Example 6.10
template<typename T, typename UnaryFunction, typename BinaryFunction>
inline T functionReduce(const int n, T** d_partialVals, T initVal,
                        UnaryFunction const& fcn, BinaryFunction const& fcn1)
{
  T h_partialVals[N_BLOCKS];
  partialReduce(n, d_partialVals, initVal, fcn, fcn1);

  if(cudaMemcpy(h_partialVals, *d_partialVals, sizeof(T)*N_BLOCKS,
                cudaMemcpyDeviceToHost) != cudaSuccess) {
    cerr << "_functionReduce copy failed!" << endl;
    exit(1);
  }
  T val = h_partialVals[0];
  for(int i=1; i < N_BLOCKS; i++) val = fcn1(h_partialVals[i],val);
  return(val);
}
#endif

A Test Program for functionReduce.h The test program for the functionReduce.h template creates a vector of sequential integers in memory similar to the examples in the first chapter. The size of the vector can be specified via the command line. The user can also provide a numerical option to run with a functor that utilizes the CUDA PTX prefetch.global assembly instruction, a functor that reads from memory (without any prefetching), and a thrust::reduce() call. The fcn1() method can be specified at compile time. By default, the application performs a sum. Other functors such as thrust::minimum() or thrust::maximum() can be used by changing the definition of FCN1 either in the code or via the nvcc command. The type of the run can be changed by changing the preprocessor variable RUNTYPE. Specifying the preprocessor variable DO_CHECK performs a verification test against the thrust reduction code. Example 6.11, "Part 1 of testPre.cu," is a walkthrough of the test code. If desired, the individual code snippets can be combined to create a complete working test. The necessary preprocessor and namespace definitions occur at the beginning of the file:

Example 6.11
#include <iostream>
using namespace std;

#include <omp.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>
#include <thrust/random.h>
#include <thrust/iterator/counting_iterator.h>

#include "functionReduce.h"

#ifndef RUNTYPE
#define RUNTYPE int
#endif
#ifndef FCN1

#define FCN1 plus
#endif
#include <thrust/functional.h>
using namespace std;

The prefetch functor is a persistent functor that keeps a pointer to device memory. Each call to the functor returns the vector element indexed by i. Prior to returning the vector element, the index is checked to see whether it is possible to prefetch the next grid of data values (e.g., N_BLOCKS*N_THREADS values) from global memory. The prefetch index can be tested for validity because _functionReduce() traverses the vector in reverse order. Testing that the index is greater than or equal to zero ensures that only valid elements in the array will be prefetched. If the index is valid, the PTX prefetch.global.L2 assembly instruction is used to perform the prefetch. This instruction is valid only for compute 2.0 devices. See Example 6.12, "Part 2 of testPre.cu":

Example 6.12
template<typename T1, typename T2>
struct prefetch : public thrust::unary_function<T2,T1> {
  const T1* data;
  prefetch(T1* _data) : data(_data) {};
  __device__
  // This method prefetches the previous grid of data points into the L2.
  T1 operator()(T2 i) {
    if( (i-N_BLOCKS*N_THREADS) > 0) { //prefetch the previous grid
      const T1 *pt = &data[i - (N_BLOCKS*N_THREADS)];
      asm volatile ("prefetch.global.L2 [%0];"::"l"(pt) );
    }
    return data[i];
  }
};

The memFetch() functor is similar to the prefetch() functor except that it does not perform any prefetching. This functor can run on all compute devices. See Example 6.13, “Part 3 of testPre.cu”:

Example 6.13
template<typename T1, typename T2>
struct memFetch : public thrust::unary_function<T2,T1> {
  const T1* data;

  memFetch(T1* _data) : data(_data) {};
  __host__ __device__

  T1 operator()(T2 i) { return data[i]; }
};

This test utilizes a random sequence of zeros and ones based on the lowest bit of a random number. These numbers are created with the functor shown in the following code:

Example 6.14
// a parallel random number generator
// http://groups.google.com/group/thrust-users/browse_thread/thread/dca23bfa678689a5
struct parallel_random_generator
{
  __host__ __device__
  unsigned int operator()(const unsigned int n) const
  {
    thrust::default_random_engine rng;
    // discard n numbers to avoid correlation
    rng.discard(n);
    // return a random number
    return rng() & 0x01;
  }
};

The doTest() routine is straightforward C++. The pointer to the scratch space, d_partialVals, is set to NULL, which means that the first call to functionReduce() will allocate the needed space. The device vector d_data is allocated via the thrust API according to the size passed to this method in the variable nData. The variable op selects the test that will be performed. A 0 specifies that no prefetching will be used; 1 selects the prefetching test. Any other value specifies that the thrust reduction method is called. See Example 6.15, "Part 4 of testPre.cu":

Example 6.15 /****************************************************************/ /* The test routine */ /****************************************************************/

#define NTEST 100
template<typename T>

void doTest(const long nData, int op)
{
  T* d_partialVals=NULL;

  thrust::device_vector<T> d_data(nData);
  //fill d_data with random numbers (either zero or one)
  thrust::counting_iterator<unsigned int> index_sequence_begin(0);
  thrust::transform(index_sequence_begin, index_sequence_begin + nData,
                    d_data.begin(), parallel_random_generator());
  cudaThreadSynchronize(); // wait for all the queued tasks to finish

  thrust::FCN1<T> fcn1;
  double startTime, endTime;
  T d_sum;
  T initVal = 0;
  switch(op) {
    case 0: {
      memFetch<T,int> fcn(thrust::raw_pointer_cast(&d_data[0]));
      startTime=omp_get_wtime();
      for(int loops=0; loops < NTEST; loops++)
        d_sum = functionReduce(nData, &d_partialVals, initVal, fcn, fcn1);
      endTime=omp_get_wtime();
      cout << "NO prefetch ";
    } break;
    case 1: {
      prefetch<T,int> fcnPre(thrust::raw_pointer_cast(&d_data[0]));
      startTime=omp_get_wtime();
      for(int loops=0; loops < NTEST; loops++)
        d_sum = functionReduce(nData, &d_partialVals, initVal, fcnPre, fcn1);
      endTime=omp_get_wtime();
      cout << "Using prefetch ";
    } break;
    default:
      startTime=omp_get_wtime();
      for(int loops=0; loops < NTEST; loops++)
        d_sum = thrust::reduce(d_data.begin(), d_data.end(), initVal, fcn1);
      endTime=omp_get_wtime();
      cout << "Thrust ";
  }
  cout << "Time for transform reduce " << (endTime-startTime)/NTEST << endl;
  cout << (sizeof(T)*nData/1e9) << " GB " << endl;
  cout << "d_sum " << d_sum << endl;

  cudaFree(d_partialVals);

#ifdef DO_CHECK
  T testVal = thrust::reduce(d_data.begin(), d_data.end(), initVal, fcn1);
  cout << "testVal " << testVal << endl;
  if(testVal != (d_sum)) {cout << "ERROR " << endl;}
#endif
}

The main() routine simply parses the command line and runs the test. See Example 6.16, “Part 5 of testPre.cu”:

Example 6.16
int main(int argc, char* argv[])
{
  if(argc < 3) {
    cerr << "Use: nData(M) op(0:no prefetch, 1:prefetch, 2:thrust)" << endl;
    return(1);
  }
  int nData=(atof(argv[1])*1000000);
  int op=atoi(argv[2]);

  doTest<RUNTYPE>(nData, op);
  return 0;
}

Results The results in Table 6.1 were generated on an NVIDIA C2070. The time per reduction, averaged over 100 runs, is reported. For smaller vector sizes, functionReduce() clearly outperforms the thrust implementation with a maximum 8-times speedup. As noted in Chapter 3, much of this speedup can be attributed to the fact that the time to allocate scratch space occurs only once in the functionReduce() implementation. Care must be exercised when interpreting the timings of small runs because they finish very quickly. Even operating system daemon processes briefly waking up can affect performance, as noted in the paper "The Case of the Missing Supercomputer Performance" (Petrini, Kerbyson, & Pakin, 2003). Use of the PTX prefetch instruction clearly benefits larger problems. The reason is that it makes better use of the available global memory bandwidth. Figure 6.2, a comparison plot created with the visual profiler, shows that the prefetch global memory read throughput (the top line) is 21.8 percent higher than that of the nonprefetch version of the code. The higher global memory read performance benefits larger vector problems, as the prefetch version is always faster than the nonprefetch version.

Table 6.1 Speedups over Thrust::Reduce for Several Problem Sizes

Number of Elements   No Prefetch (sec)   Prefetch (sec)   Thrust (sec)   No Prefetch Speedup over Thrust   Prefetch Speedup over Thrust
1,000 M              0.043434            0.033777         0.035986       0.8                               1.1
100 M                0.004314            0.003387         0.003758       0.9                               1.1
10 M                 0.000447            0.000360         0.000536       1.2                               1.5
1 M                  0.000063            0.000055         0.000197       3.1                               3.6
100 K                0.000021            0.000021         0.000160       7.8                               7.7
10 K                 0.000018            0.000018         0.000156       8.4                               8.5

[Figure 6.2 data: Visual Profiler comparison of global memory read throughput for the no-prefetch and prefetch sessions; total global memory read throughput of 118.28 versus 92.41, a 21.87 percent difference for _functionReduce.]

FIGURE 6.2
Visual Profiler comparison showing that prefetch achieves higher global memory bandwidth.

The higher global memory read performance benefits larger vector problems, as the prefetch version is always faster than the nonprefetch version. Prefetching is also slightly faster than the thrust implementation, as shown in Table 6.1. The cost of calculating the prefetch address does incur a slight performance penalty over the nonprefetch version for small vector length reductions. Prefetching data is beneficial only when the increased global memory bandwidth lets the kernel run fast enough to overcome the extra costs associated with the prefetch calculation. Keeping the cost of calculating the prefetch indexes low is the reason why the example prefetch() functor used such a simple calculation.
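For reference, the following is a minimal sketch of what a prefetch functor along these lines might look like. It is not the chapter's actual prefetch() implementation: the one-grid-stride look-ahead, the operator signature, and 64-bit pointers are assumptions.

// Hypothetical sketch only; the book's prefetch() functor may differ.
template <typename T>
struct prefetchSketch {
  const T* data;
  const long n;
  prefetchSketch(const T* _data, long _n) : data(_data), n(_n) {}

  __device__ T operator()(const long i) const {
    // simple index calculation: look one grid-stride ahead (assumption)
    long next = i + blockDim.x * gridDim.x;
#if __CUDA_ARCH__ >= 200
    if(next < n) // only prefetch addresses that are in bounds
      asm volatile("prefetch.global.L1 [%0];" :: "l"(data + next)); // 64-bit pointer assumed
#endif
    return data[i];
  }
};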

UTILIZING IRREGULAR DATA STRUCTURES
The preceding examples utilize very regular access patterns that stream information from global memory. A large body of computational problems, such as graph and tree algorithms, represents a worst-case scenario for coalescing parallel memory accesses on GPUs. Most of these algorithms exhibit irregular and even random memory access patterns. Graph algorithms are common in social media analysis (Corley, Farber, & Reynolds, 2011), biology (Jones & Pavel, 2004), and many other fields. Tree algorithms are commonly used for fast data storage and retrieval. Similarly, vector gather and scatter operations are commonly used in sparse matrix and numerical calculations. A vector gather operation "gathers" data from arbitrary vector elements. See Example 6.17, "A Vector Gather Operation":

Example 6.17
for(int i = 0; i < n; i++)
   a[i] = b[index[i]];

A vector scatter operation is one that “scatters” data throughout a vector in memory, as in Example 6.18, “A Vector Scatter Operation”:

Example 6.18
for(int i = 0; i < n; i++)
   a[index[i]] = b[i];

Irregular memory accesses are a challenge for massively parallel computers because increasing memory bandwidth does not necessarily increase performance. Coalesced memory accesses imply that memory accesses can be grouped together into a single memory transaction that works on a number of consecutive bytes of information. Tables 5.10 and 5.11 show the coalesced memory efficiencies for various use cases with caching enabled and disabled on compute 2.0 GPUs. Irregular memory accesses break the assumption that memory transactions can be coalesced into one or a few large memory transactions. For example, the index vector in Example 6.17 can contain random index values that will cause each thread accessing b[index[i]] to generate a separate global memory transaction. This is a worst-case scenario for the SM (along with the GPU memory subsystem), as each warp will have to wait for all 32 memory accesses to complete before the warp can issue an instruction.

In the absolute worst case, all the warps and SM will issue memory transactions that fall on a single memory partition in global memory, which will decrease the available memory bandwidth to that of a single memory subsystem. The L2 cache in compute 2.0 and later devices provides the best single solution to accelerate algorithms that perform irregular memory accesses. Though not a general solution, the L2 cache will transparently speed most applications just because it provides a high-speed region of memory where the threads on each SM can request small, irregular memory accesses. Localizing memory accesses can make a tremendous difference in application performance because it allows the L2 cache to work more effectively on behalf of the application threads. Sorting the index array is a reasonable method to use for random data, assuming some reordering of the indices is allowed. Of course, much better performance can be achieved when the programmer works to exploit the locality of reference within the algorithm. The following program, testGather.cu, implements Example 6.17 in a CUDA test code. The thrust API was used to conveniently transfer data to and from the host as well as fill and sort the index array. The first part of the program—Example 6.19, "Part 1 of testGather.cu"—defines a gather functor:

Example 6.19
#include <iostream>
#include <cstdlib>
using namespace std;

#include <omp.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

struct gather_functor {
  const int* index;
  const int* data;

  gather_functor(int* _data, int* _index) : data(_data), index(_index) {};

  __host__ __device__
  int operator()(int i) { return data[index[i]]; }
};

Example 6.20, “Part 2 of testGather.cu,” parses the command-line arguments and performs the test:

Example 6.20
int main(int argc, char *argv[])
{
  if(argc < 4) {
    cerr << "Use: size (k) nLoops sequential" << endl;
    return(1);
  }
  int n = atof(argv[1])*1e3;
  int nLoops = atof(argv[2]);
  int op = atoi(argv[3]);
  cout << "Using " << (n/1.e6) << "M elements and averaging over "
       << nLoops << " tests" << endl;

  thrust::device_vector<int> d_a(n), d_b(n), d_index(n);
  thrust::sequence(d_a.begin(), d_a.end());
  thrust::fill(d_b.begin(), d_b.end(), -1);
  thrust::host_vector<int> h_index(n);

  switch(op) {
    case 0: // Best case: sequential indices
      thrust::sequence(d_index.begin(), d_index.end());
      cout << "Sequential data " << endl;
      break;
    case 1: // Mid-performance case: sorted random indices
      for(int i=0; i < n; i++) h_index[i] = rand()%(n-1);
      d_index = h_index; // transfer to device
      thrust::sort(d_index.begin(), d_index.end());
      cout << "Sorted random data " << endl;
      break;
    default: // Worst case: random indices
      for(int i=0; i < n; i++) h_index[i] = rand()%(n-1);
      d_index = h_index; // transfer to device
      cout << "Random data " << endl;
      break;
  }

  double startTime = omp_get_wtime();
  for(int i=0; i < nLoops; i++)
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(n),
                      d_b.begin(),
                      gather_functor(thrust::raw_pointer_cast(&d_a[0]),
                                     thrust::raw_pointer_cast(&d_index[0])));
  cudaDeviceSynchronize();
  double endTime = omp_get_wtime();

  // Double-check the results
  thrust::host_vector<int> h_b = d_b;
  thrust::host_vector<int> h_a = d_a;
  h_index = d_index;
  for(int i=0; i < n; i++) {
    if(h_b[i] != h_a[h_index[i]]) {
      cout << "Error!" << endl;
      return(1);
    }
  }
  cout << "Success!" << endl;
  cout << "Average time " << (endTime-startTime)/nLoops << endl;
}

This program requires that the user specify:

■ The size of the vector in thousands of elements.
■ The number of tests to perform in calculating the average runtime.
■ An integer value that specifies what type of test testGather.cu should perform. The program understands the following values:
  ■ A 0 value fills the index vector with sequential values. All memory accesses are sequential and coalesced.
  ■ A 1 specifies that index contains a sorted list of random index values. This option shows the performance that can be achieved by regularizing the index values to exploit any locality.
  ■ The default is to fill index with random values. This is a worst-case scenario for the GPU memory system.

Running testGather.cu on an NVIDIA C2070 GPU shows that the L2 cache does a remarkably good job when handling small problems. It can provide an order of magnitude of increased performance when the random accesses are localized by sorting. Of course, sorting is good only in the average case. Worst-case performance will not be any different from the random case. This test assumes that index will be reused, so the time to perform the sort was not included in the runtimes reported in Table 6.2.
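As a concrete illustration, the program might be built and run as follows. The nvcc flags are assumptions based on the other OpenMP-timed examples in this book, and the exact arguments are only one possible choice.

$ nvcc -arch sm_20 -O3 -Xcompiler -fopenmp testGather.cu -o testGather
$ ./testGather 100000 1000 2    # 100M elements, 1000 tests, random indices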

SPARSE MATRICES AND THE CUSP LIBRARY
Sparse matrix structures arise in numerous computational disciplines. For many applications, sparse matrix methods are often the rate-limiting methods that dictate application performance. In particular, sparse matrix-vector multiplication (SpMV) represents the dominant cost in many iterative methods for solving large linear systems and eigenvalue problems that arise in a wide variety of scientific and engineering applications.

Table 6.2 Performance of testGather.cu on Various Problem Sizes

Size     Op          nTests   Time       Slowdown Relative to Sequential Performance
0.01 M   Sequential  1000     3.37E-06
0.01 M   Sorted      1000     3.44E-06   1.0
0.01 M   Random      1000     7.46E-06   2.2
0.1 M    Sequential  1000     1.39E-05
0.1 M    Sorted      1000     1.42E-05   1.0
0.1 M    Random      1000     6.94E-05   5.0
1 M      Sequential  1000     0.000107
1 M      Sorted      1000     0.000106   1.0
1 M      Random      1000     0.000972   9.1
10 M     Sequential  1000     0.001077
10 M     Sorted      1000     0.00105    1.0
10 M     Random      1000     0.011418   10.6
100 M    Sequential  1000     0.011553
100 M    Sorted      1000     0.013233   1.1
100 M    Random      1000     0.132465   11.5

The CUSP library (Generic Parallel Algorithms for Sparse Matrix and Graph Computations) is a thrust-based project for running sparse matrix and graph computations on the GPU. It provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. The source code for the library can be downloaded from Google Code, where the project is hosted (http://code.google.com/p/cusp-library/). This library uses a variety of common sparse matrix formats with various advantages, as described in the documentation. Results in the literature show a compute 1.x GPU can deliver an order of magnitude increased performance over an Intel quad-core Clovertown system (Bell & Garland, 2009). This is an active area of research in which people are investigating optimal use of the hardware (El Zein & Rendell, 2011) and sparse matrix representations (Cao, Yao, Li, Wang, & Wang, 2010). CUSP provides a straightforward interface for sparse matrix operations, as can be seen in Example 6.21, which determines the maximal independent set: an independent set that is not a subset of any other independent set. It is an important metric used in social network analysis to identify groups or cliques of people.

Example 6.21
#include <cusp/coo_matrix.h>
#include <cusp/gallery/poisson.h>
#include <cusp/graph/maximal_independent_set.h>
#include <iostream>

// This example computes a maximal independent set (MIS) // for a 10x10 grid. The graph for the 10x10 grid is // described by the sparsity pattern of a sparse matrix // corresponding to a 10x10 Poisson problem. // // [1] http://en.wikipedia.org/wiki/Maximal_independent_set

int main(void) { size_t N = 10;

// initialize matrix representing 10x10 grid
cusp::coo_matrix<int, float, cusp::device_memory> G;
cusp::gallery::poisson5pt(G, N, N);

// allocate storage for the MIS
cusp::array1d<int, cusp::device_memory> stencil(G.num_rows);

// compute the MIS cusp::graph::maximal_independent_set(G, stencil);

// print MIS as a 2d grid
std::cout << "maximal independent set (marked with Xs)\n";
for (size_t i = 0; i < N; i++) {
  std::cout << " ";
  for (size_t j = 0; j < N; j++) {
    std::cout << ((stencil[N * i + j]) ? "X" : "0");
  }
  std::cout << "\n";
}

return 0; }
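The same containers also cover the SpMV operation discussed at the start of this section. The following sketch is not from the book; it simply assumes a 5-point Poisson test matrix as a stand-in for a real sparse system and multiplies it against a vector of ones with cusp::multiply().

#include <cusp/csr_matrix.h>
#include <cusp/array1d.h>
#include <cusp/multiply.h>
#include <cusp/gallery/poisson.h>

int main(void)
{
  // build a sparse test matrix (assumption: a 512x512 5-point Poisson stencil)
  cusp::csr_matrix<int, float, cusp::device_memory> A;
  cusp::gallery::poisson5pt(A, 512, 512);

  // y = A * x
  cusp::array1d<float, cusp::device_memory> x(A.num_cols, 1.0f);
  cusp::array1d<float, cusp::device_memory> y(A.num_rows, 0.0f);
  cusp::multiply(A, x, y);

  return 0;
}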

GRAPH ALGORITHMS
Efficient graph algorithm implementations are also an active area of research on GPUs and for parallel computing in general. Instead of using the sparse matrix approach taken by CUSP, these efforts implement a graph data structure composed of nodes and edges. Figure 6.3 shows an example of a labeled graph containing six vertices and their edges.

FIGURE 6.3
An example of a graph.

Typical higher-level operations performed on a graph include finding a path between two nodes via either depth-first or breadth-first search and finding the shortest path between two nodes. Graph similarity is an important problem in pattern recognition. For example, chemical compounds can be represented as a graph. In searching chemical databases, it is frequently necessary to compare two graphs to see if they are equal. This leads to interesting computational problems such as how to canonically label a graph for exact search. With a canonical label, it is possible to find graph structures via a string search. Alternatively, graph isomorphism is an important method used to find graphs that have the same or similar structure. Centrality in a graph is a measure of the relative importance of a vertex within the graph. Examples include how important a person is within a social network and how critical a road is within a traffic network. The principal centrality metrics are as follows (Corley, Farber, & Reynolds, 2011):

■ Degree centrality: A metric of the connectedness of a node. It is simply a count of the number of edges that attach to a node. For a graph G with n vertices and edges e, the degree centrality $C_D(v)$ for vertex v is given in Equation 6.1 (a thrust-based sketch of this metric follows the list):

$C_D(v) = \frac{\deg(v)}{n - 1}$    (6.1)

■ Closeness centrality: A metric based on the average distance of a node to all other nodes. Closeness can be productive in communicating information among the nodes or actors in a graph. It is defined in Equation 6.2 as the average shortest path or geodesic distance from node v to all reachable nodes (t in V\v):

$C_C(v) = \frac{\sum_{t \in V \setminus v} d_G(v, t)}{n - 1}$    (6.2)

■ Betweenness centrality: A metric that measures how often paths between nodes must traverse a given node. Betweenness centrality (Equation 6.3) measures internode influence. In social media analysis, for example, an individual or blog is central if it lies between other individuals or blogs on their geodesics – the blog is "between" many others. Here $g_{jk}$ is the number of geodesics linking blog j and blog k, and $g_{jk}(n_i)$ is the number of those geodesics that pass through node $n_i$ (Wasserman & Faust, 1994):

$C_B(n_i) = \sum_{j < k} \frac{g_{jk}(n_i)}{g_{jk}}$    (6.3)

■ Page rank: Page rank (Equation 6.4) is an example of eigenvector centrality that measures the importance of a node by assuming links from more central nodes contribute more to its ranking than links from less central nodes (Brin & Page, 1998). Let d be a damping factor (usually 0.85), N be the total number of nodes, $p_n$ be the node of interest, $M(p_n)$ be the set of nodes linking to $p_n$, and $L(p_j)$ be the out-link count of page $p_j$:

$PR(p_n) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_n)} \frac{PR(p_j)}{L(p_j)}$    (6.4)
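As a small illustration of the simplest of these metrics, the following sketch (not from the book) computes degree centrality with thrust. It assumes the graph arrives as a flat device vector listing both endpoints of every undirected edge; the function and variable names are hypothetical.

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>
#include <thrust/iterator/constant_iterator.h>

// scale deg(v) by 1/(n-1) to form C_D(v)
struct degreeToCentrality {
  float inv;
  degreeToCentrality(int nVertices) : inv(1.0f/(nVertices - 1)) {}
  __host__ __device__ float operator()(int degree) const { return degree * inv; }
};

// d_endpoints holds both endpoints of every edge (assumed input layout)
void degreeCentrality(thrust::device_vector<int>& d_endpoints, int nVertices,
                      thrust::device_vector<int>& d_vertex,
                      thrust::device_vector<float>& d_CD)
{
  thrust::sort(d_endpoints.begin(), d_endpoints.end());

  thrust::device_vector<int> d_degree(nVertices);
  d_vertex.resize(nVertices);
  d_CD.resize(nVertices);

  // deg(v): count how many times each vertex id appears in the edge list
  int nFound = thrust::reduce_by_key(d_endpoints.begin(), d_endpoints.end(),
                                     thrust::constant_iterator<int>(1),
                                     d_vertex.begin(),
                                     d_degree.begin()).first - d_vertex.begin();

  // C_D(v) = deg(v) / (n-1)
  thrust::transform(d_degree.begin(), d_degree.begin() + nFound,
                    d_CD.begin(), degreeToCentrality(nVertices));
}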

Certain complex metrics (e.g., betweenness, eigenvector centrality) can become intractable when presented with large volumes of data unless appropriate machines and algorithms are utilized. Developers and users must understand the runtime performance, accuracy, and problem size trade-offs between exact and approximate centrality algorithms (Ediger et al., 2010). It is possible for a GPU to deliver an order of magnitude or more of increased performance on graph centrality metrics. The gpu-fan (GPU-based Fast Analysis of Networks) project at Vanderbilt provides a working software package that includes methods for computing four shortest-path-based centrality metrics. This project reports that the GPU speeds up the application by 10 to 50 times on real-world protein interaction and gene co-expression networks as well as simulated scale-free networks (Shi & Zhang, 2011). The nascent thrust-graph library that is hosted on Google Code is attempting to create a common graph API and set of algorithms for CUDA-enabled GPUs. Programming graph algorithms on GPUs is in a particularly early stage of development. The paper "Exploring the Limits of GPUs with Parallel Graph Algorithms" (Dehne & Yogaratnam, 2010) is a recent survey of the field. A more dated but still relevant paper is "Accelerating Large Graph Algorithms on the GPU Using CUDA" (Harish & Narayanan, 2007).

SoA, AoS, AND OTHER STRUCTURES
Many legacy applications store data as arrays of structures (AoS), a layout that can lead to coalescing issues. From a GPU performance perspective, it is preferable to store data as a structure of arrays (SoA). Example 6.22, "An AoS Example," creates an AoS:

Example 6.22
struct S {
   float x;
   float y;
};
struct S myData[N];

Arranging data in this fashion leads to coalescing issues as the data are interleaved. Performing an operation that only requires the variable x will result in a 50 percent loss of bandwidth and waste of L2 cache memory. Example 6.23, "An SoA Example," shows how to allocate an SoA:

Example 6.23
struct S {
   float x[N];
   float y[N];
};
struct S myData;

Arranging data as an SoA makes full use of the memory bandwidth even when individual elements of the structure are utilized. There is no data interleaving; this data structure should provide coalesced memory accesses and achieve high global memory performance. The CUDA program sorting_aos_vs_soa.cu is included in the thrust teaching examples that are available for free download from Google Code. It demonstrates how to sort SoA and AoS structures with thrust. The comments in the code note that a 5-times speedup can be achieved by using an SoA data structure.
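For reference, the following is a minimal sketch of the SoA sorting idea (it is not the sorting_aos_vs_soa.cu example itself): thrust::sort_by_key() orders the x array while a zip_iterator carries the remaining member arrays along.

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Sort an SoA with three member arrays by x, keeping (x,y,z) triples together.
void sortSoA(thrust::device_vector<float>& x,
             thrust::device_vector<float>& y,
             thrust::device_vector<float>& z)
{
  thrust::sort_by_key(
      x.begin(), x.end(),
      thrust::make_zip_iterator(thrust::make_tuple(y.begin(), z.begin())));
}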

TILES AND STENCILS
The computational grid defined in the kernel execution configuration can be used to break a computation into subproblems that execute in parallel. Tiles and stencils are an abstraction used in the creation of these multidimensional grids.

In particular, these abstractions help the programmer group data accesses into common patterns plus define shared memory usage for interthread communications within a thread block. Matrix multiplication provides the textbook example of the use of 2D regions, or tiles, on the GPU. The book Programming Massively Parallel Processors: A Hands-on Approach (Kirk & Hwu, 2010) has a detailed discussion of matrix multiplication and the use of tiles. However, tiles are a common design paradigm used in many problems aside from matrix multiplication. The CUDA N-body SDK example is another excellent demonstration of the use of tiles to solve a complicated problem on the GPU with high performance. Stencils are a generalization of the concept of a tile to n dimensions. A stencil computation:

■ Operates on each point in a discrete n-dimensional space.
■ Uses neighboring points in its computation.
■ Is often surrounded by a time loop.
■ Can have diverse boundary conditions.

A minimal stencil kernel sketch appears at the end of this section. Both tiles and stencils help the CUDA developer formulate problems to best utilize shared memory and registers in the SM as well as exploit parallelism across all the SM. The paper "3D Finite Difference Computation on GPUs using CUDA" (Micikevicius, 2010) provides a detailed discussion of stencils that can be freely downloaded from the NVIDIA website. Volkov demonstrates how to use ILP to accelerate stencil problems in "Programming Inverse Memory Hierarchy: Case of Stencils on GPUs" (Volkov, 2010). Tiles and stencils are also important in performing GPU computations with quadtrees and octrees. A quadtree is a tree-based structure in which each internal node has four children. It is used to partition a two-dimensional space by recursively dividing it into quadrants. An octree has eight children per internal node and is used to recursively divide a 3D space into regions. Both of these data structures can exhibit irregular global memory accesses. The book GPU Computing Gems (Hwu, 2011) contains several detailed examples of how experts in the field have used these and other irregular data structures in CUDA to solve complex scientific problems.
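The following is a minimal sketch (not from the book) of a 1D three-point stencil that stages a tile of the input in shared memory. The 256-thread block size and the stencil weights are arbitrary assumptions.

__global__ void stencil1D(const float *in, float *out, int n)
{
  __shared__ float tile[256 + 2];                // tile plus one halo cell per side (assumes 256 threads/block)
  int gid = blockIdx.x * blockDim.x + threadIdx.x;
  int lid = threadIdx.x + 1;

  if (gid < n) tile[lid] = in[gid];
  if (threadIdx.x == 0 && gid > 0)                    tile[0]       = in[gid - 1]; // left halo
  if (threadIdx.x == blockDim.x - 1 && gid < n - 1)   tile[lid + 1] = in[gid + 1]; // right halo
  __syncthreads();

  if (gid > 0 && gid < n - 1)
    out[gid] = 0.25f*tile[lid-1] + 0.5f*tile[lid] + 0.25f*tile[lid+1];
}

// example launch for n elements:
//   stencil1D<<<(n+255)/256, 256>>>(d_in, d_out, n);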

SUMMARY
This chapter introduced techniques and examples to efficiently use GPU memory. The three-orders-of-magnitude performance difference between the slowest and fastest GPU memory systems means that GPU programmers have the opportunity to capitalize on the extreme performance that GPU hardware offers.

What makes CUDA so special is that it exposes the features of the underlying hardware so that the full potential of the hardware can be realized. As the reduction example in this chapter showed, it is possible to delve down into the lowest levels of the hardware execution model to attain high performance. Thrust, on the other hand, bundled this complexity into a simple API call that was used in the very first program in this book. As demonstrated in this chapter, generic programming lets CUDA programmers create simple, generic methods that fully exploit the capability of the GPU. Much of the future in CUDA development lies in creating generic libraries and APIs like thrust and CUSP. As these interfaces mature, CUDA programmers will be able to do more in less time. The concept is simple:

■ Make your life easy and write your code with the highest-level API that you feel comfortable using.
■ Profile and see where the bottlenecks occur. In most cases, the efficient use of global memory will likely dominate the application performance.
■ Drop down to a lower-level API to get the performance needed.
■ When possible, write generic methods that can potentially be combined into a generic library for others to use.

CHAPTER 7
Techniques to Increase Parallelism

CUDA was designed to exploit the massive parallelism inside the GPU as well as the parallelism available through the use of concurrent streams of execution to utilize multiple GPUs, asynchronous data transfers, and simultaneous kernel execution on a single device. By default, CUDA creates a single stream of execution on a single GPU, which is usually device 0.1 All data transfers and kernel invocations are queued on this single stream and processed sequentially in the order they were queued. By explicitly creating and using multiple streams of execution, a CUDA programmer can perform more work per unit time to make applications run faster. For example, multiple GPUs can be utilized by simply changing the device with cudaSetDevice(). Work can then be queued on each device, which can potentially increase application performance by the number of GPUs in the system. CUDA programmers can overlap computation and data transfers to reduce application runtime plus enable real-time data processing on both single and multi-GPU systems. Under special circumstances, greater efficiency per device can be achieved when running multiple streams on a single device to exploit concurrent kernel execution. The addition of UVA in CUDA 4.0 simplifies data management to facilitate the use of these techniques to increase parallelism and application performance. At the end of this chapter, the reader will have a basic understanding of:

■ How to run on multiple GPUs in a system.
■ UVA and how it makes multi-GPU applications simpler.
■ How to use asynchronous data transfers to speed application performance.
■ The use and performance implications of using mapped memory.

1 As will be discussed, the default GPU device can be changed by either the programmer or the systems administrator.


■ How to use asynchronous kernel execution; plus, how it can benefit application performance and potentially decrease kernel performance. ■ The use of the profiler to understand performance and identify bottlenecks in multi-GPU systems.

CUDA CONTEXTS EXTEND PARALLELISM
A CUDA application interacts with the GPU hardware through the device driver as shown in Figure 7.1. The driver supports multiple concurrent applications by creating a separate context for each GPU-based application that runs on the system. The context contains all of the driver state information required by the application such as the virtual address space, streams, events, allocated blocks of memory, and other data necessary to run a GPU-based application. By switching between contexts, the device driver acts like a small operating system that can multitask multiple GPU applications. For example, a user can run multiple OpenGL rendering applications plus multiple CUDA computational applications at the same time. In a similar fashion, the GPU device driver lets individual CUDA applications utilize multiple devices simply by giving the application access to multiple contexts in the device driver. Only one context can be active at a time, which is why the CUDA driver incorporates a timer to detect GPU applications that hang while performing some operation on the GPU. Mistakenly running a CUDA kernel that contains an infinite loop is one example of an application that will cause a time out. The good news is that control eventually reverts to the user so that he or she can correct the problem without rebooting the system. By default, CUDA creates a context during the first call to a function that changes the state of the driver. Calling cudaMalloc() is one such call that changes the context state. Many CUDA programmers rely on this default behavior to transparently utilize a single GPU. Note that a context is usually created on GPU zero by default unless another GPU is selected by the programmer prior to context creation with cudaSetDevice().

FIGURE 7.1 GPUs and the host communicate via the device driver.


The context is destroyed either by calling cudaDeviceReset() or when the controlling host process exits. Starting with CUDA 2.2, a Linux administrator can select exclusive mode via the SMI (System Management Interface) tool. In exclusive mode, a context is no longer created by default on GPU 0, but rather on a GPU that does not have an active context. If there are no available GPUs, or if cudaSetDevice() specifies a GPU that already has an active context, the first CUDA call that attempts to change the device state will fail and return an error. This capability can be used to run compute or memory intensive applications on unused GPUs in a system.
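Because that failure surfaces only through the return code of the first state-changing call, a minimal defensive check might look like the following sketch (not from the book; the allocation size is arbitrary).

cudaSetDevice(0);                       // request a device (may already be busy in exclusive mode)
int *d_buf = NULL;
cudaError_t err = cudaMalloc(&d_buf, 1024*sizeof(int)); // first call that changes device state
if(err != cudaSuccess) {
   fprintf(stderr, "No usable GPU: %s\n", cudaGetErrorString(err));
   exit(1);
}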

STREAMS AND CONTEXTS
CUDA applications manage work and concurrency by queuing operations onto a stream. CUDA implicitly creates a stream when it creates a context so commands can be queued for execution on the device. For example, calling cudaMemcpy() queues a blocking data transfer on the current stream. Similarly, calling a CUDA kernel queues the kernel invocation on the stream associated with the current device. If desired, the programmer can specify the stream in the execution configuration as shown in Example 7.1, "An Execution Configuration Including a Stream Specification":

Example 7.1
Kernel<<<nBlocks, nThreadsPerBlock, sharedMemBytes, stream>>>(parameters)

All operations queued on a stream execute in order, which means that each operation is pulled off the queue in the order it was placed on the queue. In other words, the queue acts as a FIFO (first-in, first-out) buffer whereby operations are sequentially pulled off the queue in the order they appeared for execution on the device. Multiple streams are required for concurrent execution across devices or to run multiple kernels concurrently on a single device.

Multiple GPUs
The simplest way to use multiple GPUs in a single application is to implicitly create a single stream per context per device as shown in the following example. The method cudaGetDeviceCount() is used to determine the number of devices in the system. Calling cudaSetDevice() sets the device. In this code snippet, cudaMalloc() was used to induce the creation of the context by causing a change of context state. If desired, additional device properties can be enumerated via the cudaDeviceProp variable passed to cudaGetDeviceProperties().

See Example 7.2, "Creating Contexts on Multiple Devices":

Example 7.2
cudaGetDeviceCount(&nGPU);

int *d_A[nGPU];
for(int i=0; i < nGPU; i++) {
   cudaSetDevice(i);
   cudaMalloc(&d_A[i], n*sizeof(int));
}
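As a side note, the following is a minimal sketch (not from the book) of enumerating device properties with cudaGetDeviceProperties(); the fields printed are just a sample of what cudaDeviceProp exposes.

int nGPU;
cudaGetDeviceCount(&nGPU);
for(int i=0; i < nGPU; i++) {
   cudaDeviceProp prop;
   cudaGetDeviceProperties(&prop, i);
   printf("Device %d: %s, %d multiprocessors, %.1f GB global memory\n",
          i, prop.name, prop.multiProcessorCount, prop.totalGlobalMem/1.e9);
}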

Work can then be queued on the default stream associated with each device. Again, cudaSetDevice() is used to select the device context. All GPU operations queued after a cudaSetDevice() are implicitly queued on the stream associated with the device unless a stream is explicitly specified in the API call or kernel execution.

Explicit Synchronization
There are various ways to explicitly synchronize streams with each other. Events are a way for the programmer to create a placeholder in a stream. The event can then be monitored to determine when a group of tasks have completed. Note that:

■ CUDA 4.0 also allows events to be shared across contexts, which gives events the ability to coordinate tasks across multiple devices in a system.
■ Events in stream 0 complete after all tasks in all streams have completed.
■ For profiling purposes, the elapsed time between two events can be determined with cudaEventElapsedTime().

The following example shows how to create two events in the variables stop and start. The start event is placed on the queue, after which one or more tasks are queued. The stop event is then pushed on the queue. The host then stops at the call to cudaEventSynchronize(). Execution will not continue on the host until after the stop event has been marked complete. Meanwhile, the driver asynchronously works through the queue in order. This means that it processes and marks the start event as complete. Work proceeds through the rest of the tasks on the queue. Eventually the driver processes the stop event and marks that it is completed. This wakes up the host thread, which continues on to process the code after the call to cudaEventSynchronize(). As shown in this example, time is a property associated with an event. The difference between the times when the start and stop events were marked complete is retrieved with the call to cudaEventElapsedTime().

Thus, Example 7.3, "Timing Tasks with Events," demonstrates how to time a group of tasks:

Example 7.3
// create two events
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
// Queue some tasks ...
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);

// get the elapsed time
float elapsedTime;
cudaEventElapsedTime(&elapsedTime, start, stop);

// destroy the events
cudaEventDestroy(start);
cudaEventDestroy(stop);

Following are runtime methods for explicitly synchronizing streams and events:

■ cudaDeviceSynchronize() waits until all preceding commands in all streams of all host threads have completed.
■ cudaStreamSynchronize() takes a stream as a parameter and waits until all preceding commands in the given stream have completed. It can be used to synchronize the host with a specific stream, allowing other streams to continue executing on a device.
■ cudaStreamWaitEvent() takes a stream and an event as parameters and makes all the commands added to the given stream after the call to cudaStreamWaitEvent() delay their execution until the given event has completed. The stream can be 0, in which case all the commands added to any stream after the call to cudaStreamWaitEvent() wait on the event. (A short sketch follows this list.)
■ cudaStreamQuery() checks whether all preceding commands in a stream have completed.
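The following is a minimal sketch of cross-stream ordering with cudaStreamWaitEvent(); kernelA and kernelB are hypothetical kernels, and the launch configurations and d_data are placeholders.

cudaStream_t s0, s1;
cudaStreamCreate(&s0);
cudaStreamCreate(&s1);

cudaEvent_t done;
cudaEventCreate(&done);

kernelA<<<nBlocks, nThreads, 0, s0>>>(d_data);   // hypothetical producer kernel on s0
cudaEventRecord(done, s0);                       // placeholder marking the end of A's work

cudaStreamWaitEvent(s1, done, 0);                // commands queued on s1 after this wait for 'done'
kernelB<<<nBlocks, nThreads, 0, s1>>>(d_data);   // hypothetical consumer kernel on s1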

Implicit Synchronization
Tasks queued on different streams generally run concurrently. Some host-based operations force all streams to pause until the host operation completes. Care must be taken when performing the following host operations, as they will stop all concurrent operations and negatively impact application performance:

■ A page-locked host memory allocation.
■ A device memory allocation.
■ A device memory set.
■ A device–device memory copy.
■ A switch between the L1/shared memory configurations.

The Unified Virtual Address Space
UVA provides a single address space for all host and GPU devices in a system. UVA is available to all 64-bit applications on Windows Vista/7 running in TCC mode, on Windows XP, and on Linux. UVA does not work on 32-bit systems. On supported systems, the pointer returned from any allocation made with cudaHostAlloc() or any of the cudaMalloc*() methods (cudaMalloc(), cudaMallocPitch(), and others) uniquely identifies both the region of memory and the device upon which the memory resides. If desired, the CUDA programmer can determine where the memory resides with cudaPointerGetAttributes(). As a consequence of UVA:

■ The cudaMemcpy() method no longer pays attention to the cudaMemcpyKind parameter. For compatibility with non-UVA environments, the direction of the transfer (host to device or device to host) can still be specified. If portability is not a concern, cudaMemcpyDefault can be used for convenience.
■ High-performance GPU-to-GPU transfers are now possible by simply specifying pointers to memory on the two devices.
■ On UVA systems, host memory pointers returned by cudaHostAlloc() can be used directly by device kernels. There is no need to obtain a device pointer via cudaHostGetDevicePointer(). This includes mapped memory created by passing the flag cudaHostAllocMapped to cudaHostAlloc() or cudaHostRegisterMapped to cudaHostRegister(). For compatibility with compute 1.x devices and 32-bit applications, cudaHostGetDevicePointer() can still be used. Thrust-based applications will need to cast the pointer with thrust::device_pointer_cast().

Applications may query whether the unified address space is used for a particular device by checking that the unifiedAddressing device property is set.
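A minimal sketch of the first two consequences follows (not from the book); d_src, d_dst, and nBytes are assumed to be existing allocations on two different GPUs and their size.

// with UVA, a device-to-device copy needs only the two pointers
cudaMemcpy(d_dst, d_src, nBytes, cudaMemcpyDefault);

// cudaPointerGetAttributes() reports where a pointer resides
cudaPointerAttributes attr;
cudaPointerGetAttributes(&attr, d_src);
printf("pointer resides on device %d\n", attr.device);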

A Simple Example
The following example demonstrates how to concurrently run one or more GPUs. It:

1. Allocates space for n integers on each GPU. In this example, n is one million.
2. Concurrently fills the vectors on each GPU to create a single large vector of consecutive integers.

3. Asynchronously transfers the GPU memory to the host memory.
4. Checks the result for correctness.

The following walkthrough discusses the multi-GPU and concurrent aspects of the code. All the code segments can be combined into a single source file that can be compiled and executed. The CUDA kernel, fillKernel(), writes the sequential integers offset+0 to offset+n to the vector on the device. Each integer is written 100 times to global memory to increase the runtime of fillKernel() to better illustrate the concurrent execution of this kernel on two GPUs, as shown in Figure 7.2. See Example 7.4, "Part 1 of multiGPU.cu":

Example 7.4
#include <stdio.h>

__global__ void fillKernel(int *a, int n, int offset)
{
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  if (tid < n)
    for(int i=0; i < 100; i++)
      a[tid] = offset+tid;
}

The main() routine starts by calling cudaGetDeviceCount() to determine the number of GPUs in the system. The value is saved in the variable nGPU. See Example 7.5, “Part 2 of multiGPU.cu”:

Example 7.5
int main(int argc, char* argv[])
{
  int nGPU;
  int n = 1000000;
  int size = n*sizeof(int);

cudaGetDeviceCount(&nGPU) ;

Memory is allocated on the host with cudaHostAlloc(). The flag cudaHostAllocPortable specifies that the host memory will be page-locked. The page-locked memory permits the following:

■ Copies between page-locked host memory and device memory can be performed concurrently with kernel execution.
■ On systems with a front-side bus, bandwidth between host memory and device memory is higher if host memory is allocated as page-locked.

The method cudaSetDevice() is then used to change the context to each GPU device. The context is actually created with the call to cudaMalloc(), which changes the state of the context and implicitly creates a stream per device, as in Example 7.6, “Part 3 of multiGPU.cu”:

Example 7.6
int *d_A[nGPU];
for(int i=0; i < nGPU; i++) {
   cudaSetDevice(i);
   cudaMalloc(&d_A[i], size);
}

The fillKernel() kernel is then queued on each device along with a call to cudaMemcpyAsync() to transfer the values back to the host. The method cudaDeviceSynchronize() is used to ensure that the work on all devices has completed. See Example 7.7, “Part 4 of multiGPU.cu”:

Example 7.7
int *h_A;
cudaHostAlloc(&h_A, nGPU*n*sizeof(int), cudaHostAllocPortable);

for(int i=0; i < nGPU; i++) {
   int nThreadsPerBlock = 512;
   int nBlocks = n/nThreadsPerBlock + ((n%nThreadsPerBlock)?1:0);
   cudaSetDevice(i);
   fillKernel<<<nBlocks, nThreadsPerBlock>>>(d_A[i], n, i*n);
   cudaMemcpyAsync(&h_A[i*n], d_A[i], size, cudaMemcpyDeviceToHost);
}
cudaDeviceSynchronize();

The host then checks the vector for correctness, as in Example 7.8, “Part 5 of multiGPU.cu”:

Example 7.8
for(int i=0; i < nGPU*n; i++)
   if(h_A[i] != i) {
      printf("Error h_A[%d] = %d\n", i, h_A[i]);
      exit(1);
   }
printf("Success!\n");

All the device resources are freed, as in Example 7.9, “Part 6 of multiGPU.cu”:

Example 7.9
cudaFreeHost(h_A);
for(int i=0; i < nGPU; i++) {
   cudaSetDevice(i); // to be safe, set the context for the free
   cudaFree(d_A[i]);
}
return(0);
}

This source code can be saved to a file multiGPU.cu. The application can be compiled and executed with the following nvcc command. Note the use of the -run command-line option. The application reports that multiple GPUs were used to successfully fill the vector with consecutive integers.

Example 7.10
$ nvcc multiGPU.cu -run
Success!

Profiling Results
Figure 7.2, a width plot from the Visual Profiler, clearly shows that fillKernel() runs concurrently on both device 0 and device 1. Further, the asynchronous memory transfers also run concurrently. Due to variations in when operations start on each queue, one of the data transfers finishes slightly later than the other.


FIGURE 7.2
Visual Profiler width plot showing concurrent device execution.

OUT-OF-ORDER EXECUTION WITH MULTIPLE STREAMS
CUDA developers can also explicitly create streams with cudaStreamCreate(). As shown in Example 7.11, "Example Showing the Creation of Multiple Streams," cudaSetDevice() can be called to set the device (and context) in which the stream will be created.

Example 7.11
for(int i=0; i < nGPU; i++) {
   cudaSetDevice(i);
   if(cudaStreamCreate(&streams[i]) != 0) {
      fprintf(stderr, "Stream create failed!\n");
      exit(1);
   }
}

A kernel launch or memory copy will fail if it is issued to a stream that is not associated with the current device, as illustrated in Example 7.12, "Example Showing that the Correct Context Must Be Used," taken from the NVIDIA CUDA C Programming Guide:

Example 7.12
cudaSetDevice(0);                 // Set device 0 as current
cudaStream_t s0;
cudaStreamCreate(&s0);            // Create stream s0 on device 0
MyKernel<<<100, 64, 0, s0>>>();   // Launch kernel on device 0 in s0
cudaSetDevice(1);                 // Set device 1 as current
cudaStream_t s1;
cudaStreamCreate(&s1);            // Create stream s1 on device 1
MyKernel<<<100, 64, 0, s1>>>();   // Launch kernel on device 1 in s1

// This kernel launch will fail:
MyKernel<<<100, 64, 0, s0>>>();   // Launch kernel on device 1 in s0

As demonstrated in multiGPU.cu, multiple streams can be created in different contexts to perform out-of-order execution to support multiple GPUs in a single host thread. In other words, there is no guarantee that the commands on different streams will run in the same order relative to each other. Similarly, multiple streams can be created within a single context to support out-of-order execution within a single context. Asynchronous kernel execution is one example of out-of-order execution within a single context, where multiple kernels run concurrently on the same device.

The following source code (Example 7.13) modifies multiGPU.cu to demon- strate concurrent kernel execution on a single GPU. Changes are highlighted in the source code:

■ The number of loops in fillKernel() was increased to better highlight the difference between synchronous versus concurrent kernel runtime.
■ The value of n was decreased to 1024, so only two blocks are utilized by fillKernel() to process each vector.
■ Five streams are created that run concurrent fillKernel() instances.
■ For timing comparison, all the kernels will run sequentially on stream[0] when the C preprocessor variable USE_SINGLE_STREAM is defined.

Example 7.13
#include <stdio.h>
#include <omp.h>

__global__ void fillKernel(int *a, int n, int offset)
{
  int tid = blockIdx.x*blockDim.x + threadIdx.x;
  if (tid < n) {
    register int delay=1000000;
    while(delay > 0) delay--;
    a[tid] = delay + offset+tid;
  }
}

int main(int argc, char* argv[])
{
  int nStreams=5;
  int n = 1024;
  int size = n * sizeof(int);
  cudaStream_t streams[nStreams];
  int *d_A[nStreams];

  for(int i=0; i < nStreams; i++) {
    cudaMalloc(&d_A[i], size);
    if(cudaStreamCreate(&streams[i]) != 0) {
      fprintf(stderr, "Stream create failed!\n");
      exit(1);
    }
  }

  int *h_A;
  cudaHostAlloc(&h_A, nStreams*size, cudaHostAllocPortable);

  int nThreadsPerBlock = 512;
  int nBlocks = n/nThreadsPerBlock + ((n%nThreadsPerBlock)?1:0);

  double startTime = omp_get_wtime();
  for(int i=0; i < nStreams; i++) {
#ifdef USE_SINGLE_STREAM
    fillKernel<<<nBlocks, nThreadsPerBlock, 0, streams[0]>>>(d_A[i], n, i*n);
#else
    fillKernel<<<nBlocks, nThreadsPerBlock, 0, streams[i]>>>(d_A[i], n, i*n);
#endif
  }
  cudaDeviceSynchronize();
  double endTime = omp_get_wtime();
  printf("runtime %f\n", endTime-startTime);

  for(int i=0; i < nStreams; i++) {
    cudaMemcpyAsync(&h_A[i*n], d_A[i], size, cudaMemcpyDefault, streams[i]);
  }
  cudaDeviceSynchronize();

  for(int i=0; i < nStreams*n; i++)
    if(h_A[i] != i) {
      printf("Error h_A[%d] = %d\n", i, h_A[i]);
      exit(1);
    }
  printf("Success!\n");

  for(int i=0; i < nStreams; i++) {
    cudaFree(d_A[i]);
  }
  return(0);
}

Example 7.13 can be saved to a file called asyncGPU.cu. The following com- mands demonstrate how to build and run the code. A comparison of the runtime between the sequential and parallel version shows that asynchro- nous kernel execution does speed this application according to the number of concurrent kernels. See Example 7.14, “asyncGPU.cu Results”:

Example 7.14
$ nvcc -D USE_SINGLE_STREAM -arch sm_20 -O3 -Xcompiler -fopenmp asyncGPU.cu -run
runtime 4.182832
Success!

$ nvcc -arch sm_20 -O3 -Xcompiler -fopenmp asyncGPU.cu -run
runtime 0.836606
Success!

In CUDA 4.0, the visual profiler cannot profile concurrent kernel execution. For this reason, wallclock time as reported by omp_get_wtime() is utilized to detect a speedup. This example demonstrates a nearly perfect speedup by a factor of 5, as the runtime decreased according to the number of concurrent streams that ran on the GPU.

Tip for Concurrent Kernel Execution on the Same GPU
The linear speedup according to the number of kernels exhibited by asyncGPU.cu demonstrates that concurrent kernel execution can be an important tool to increase performance when running small compute-bound kernels. It is important to consider how the multiple kernels will interact with global memory. For example, a single kernel may access global memory in a cache-friendly, high-performance coalesced manner. Running multiple kernels may change the locality of reference, increase L2 cache misses, and reduce or eliminate the effectiveness of this cache. Concurrently running different kernels can exacerbate this problem and introduce additional problems with bank conflicts and memory partition camping, as discussed in Chapter 5. As a result, performance will degrade. Although the overall speedup when running multiple concurrent kernels will likely be better than running each kernel sequentially, a linear speedup may not always be possible. The following guidelines should improve the potential for concurrent kernel execution on devices that support concurrent kernel execution (a short sketch follows the list):

■ All independent operations should be issued before dependent operations.
■ Synchronization of any kind should be delayed as long as possible.
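The following sketch (not from the book) illustrates both guidelines: queue every independent kernel across all streams first, then the dependent copies, and synchronize only once at the end. doWork() and the arrays are hypothetical.

for(int i=0; i < nStreams; i++)                       // all independent kernels first
   doWork<<<nBlocks, nThreads, 0, streams[i]>>>(d_A[i], n);

for(int i=0; i < nStreams; i++)                       // then the dependent transfers
   cudaMemcpyAsync(&h_A[i*n], d_A[i], size,
                   cudaMemcpyDeviceToHost, streams[i]);

cudaDeviceSynchronize();                              // synchronize once, as late as possible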

Atomic Operations for Implicitly Concurrent Kernels
CUDA is designed to let each SM run independently of the others. In this way, the CUDA model does not impose any scalability limit on the number of devices. The gating factor for scalability within a kernel is the number of thread blocks. With concurrent kernel execution, the scalability of an application is limited by the number of blocks of all the independent tasks that can run at one time. The functionReduce() example from Chapter 6 demonstrates that the CUDA model does introduce some complexity for reduction types of operations. By definition, a reduction operation must provide a single value that is based on computations performed by all the SM on the GPU, which requires that some form of synchronization happen between computational units. This is antithetical to the CUDA programming model. The simple solution used by the functionReduce() example was to move the data to the host, where the final step of the reduction is performed.

This approach works but presents challenges when programming with multiple devices or streams because host-side operations are not queued on a CUDA stream. There are two options that allow the complete reduction to a single value to happen via a CUDA stream:

1. Write a separate kernel that runs after functionReduce(). CUDA guarantees that all global memory transactions will be completed prior to the start of the next kernel. Because global memory has the lifetime of the application, the partial sums stored in global memory can be used by the second kernel to complete the reduction operation.
2. Utilize atomic operations to synchronize operations within a kernel. Basically, an atomically incremented counter is used to determine which SM is the last to finish. An atomic operation performed on a memory location is guaranteed to complete before any other processing element can access the result of the operation. The atomic increment lets the CUDA programmer determine when all the SM on the GPU have finished performing their part of the reduction.

Atomic operations force each SM to serially access a single memory location, which imposes obvious scaling limitations. However, atomic operations do allow kernels such as functionReduce() to perform a reduction to one value in a single kernel call. The NVIDIA SDK includes threadFenceReduction, a well-documented example that utilizes atomic operations to synchronize all the SM on a GPU. This SDK example is rather long and complicated. Example 7.15, "Using Atomics to Complete a Reduction Inside a Kernel," is a concise and highly abbreviated example that utilizes an atomicInc() in the same fashion as the SDK example:

Example 7.15
#include <iostream>
using namespace std;

__global__ void gmem_add(int *a, int n, unsigned int *counter, int *result)
{
  bool finishSum = false;

  if(threadIdx.x == 0) {
    // introduce some variable delay based on blockIdx.x
    register int delay = blockIdx.x * 1000000;
    while(delay > 0) delay--;

    // write blockIdx.x to global memory
    a[blockIdx.x] = blockIdx.x;
    __threadfence();
  }

  // Use an atomic increment to find the last SM to finish.
  // The counter must start at zero!
  if(threadIdx.x == 0) {
    unsigned int ticket = atomicInc(counter, gridDim.x);
    finishSum = (ticket == gridDim.x-1);
  }
  if(finishSum) {
    register int sum = a[0];
#pragma unroll
    for(int i=1; i < n; i++) sum += a[i];
    result[0] = sum;
    *counter = 0; // reset the counter for the next launch
  }
}

#define N_BLOCKS 1400
int main(int argc, char *argv[])
{
  int *d_a, *d_result;
  unsigned int *d_counter;
  cudaMalloc(&d_a, sizeof(int)*N_BLOCKS);
  cudaMalloc(&d_result, sizeof(int));
  cudaMalloc(&d_counter, sizeof(unsigned int));

  int zero=0;
  cudaMemcpy(d_counter, &zero, sizeof(int), cudaMemcpyHostToDevice);

  gmem_add<<<N_BLOCKS, 32>>>(d_a, N_BLOCKS, d_counter, d_result); // only thread 0 in each block participates

  int h_a[N_BLOCKS], h_result;
  cudaMemcpy(h_a, d_a, sizeof(int)*N_BLOCKS, cudaMemcpyDeviceToHost);
  cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);

  int sum=0;
  for(int i=0; i < N_BLOCKS; i++) sum += h_a[i];
  cout << "should be " << sum << " got " << h_result << endl;
}

In Example 7.15:

1. The first thread in each thread block is delayed by a variable amount.
2. The first thread in each thread block atomically increments a counter to show that the thread block has completed all prior work.
3. Only the first thread in the last thread block will see that the counter equals the number of thread blocks. That indicates it is safe for it to perform the final sum.

TYING DATA TO COMPUTATION
Tying data to computation is essential to attaining program correctness and performance in a distributed multi-GPU environment. CUDA programmers have the following options to make data available to kernels running on multiple GPUs:

■ Map the memory into the memory space of all the GPUs. In this case, data will be transparently transferred between the host and GPUs.
■ Manually allocate space and transfer the data.
■ For applications that will run in a distributed MPI (Message Passing Interface) environment, space can be allocated on each device and transferred directly to the GPU via MPI send and receive calls. This is discussed further in Chapter 10.

Manually Partitioning Data
The most flexible, scalable, and highest-performance method to tie data to computation is manually partitioning and transferring data among devices. With this technique, the programmer can control and optimize all aspects of the computation. Unlike mapped memory, the programmer assumes the responsibility of ensuring that all data is on the GPU when it is needed. The multiGPU.cu example (starting with Example 7.4) provided a simple demonstration that manually partitioned data. Effectively partitioning data across many devices is a hard problem. The decision-making process of how to distribute data across devices does create a very deep and detailed insight into the computational problem being solved. Most important from a performance perspective, this design process highlights how asynchronous data transfers and overlapped kernel execution can speed performance. The good news is that the technical and scientific literature contains numerous examples of excellent parallelization schemes that have been created by very bright people. Look to these sources early in your design process to see how others have addressed parallelism in problems similar to the one to be solved. The discussion in Chapter 6 concerning tiles, stencils, and quad- and octrees provides some good starting points for research. Data-parallel APIs such as Thrust implement common parallel design patterns that can be used to simplify and implement many high-performance applications. Again, the good news is that these general libraries are rapidly expanding and improving. For all applications, the three rules of high-performance GPU programming discussed in this book should be used as a basic starting point in all your design efforts:

1. Get the data on the GPGPU and keep it there.
2. Give the GPGPU enough work to do.
3. Focus on data reuse within the GPGPU to avoid memory bandwidth limitations.

Mapped Memory
Simplicity is the advantage of mapping memory among the devices in a system:

■ There is no need to partition data. All devices see the complete memory image.
■ There is no need to allocate space in device memory or to manually copy data. All data transfers are implicitly performed by the kernel as needed.
■ There is no need to use streams to overlap data transfers with kernel execution. All data transfers originate from the kernel and are asynchronous.

■ If the contents of the mapped memory are modified, the application must synchronize memory accesses using streams or events to avoid any potential read-after-write, write-after-read, or write-after write hazards. ■ The host memory needs to be page aligned. The simplest and most portable way to enforce this is to use cudaAllocHost() when allocating mapped host memory. The simplicity of using mapped memory is illustrated by the following example (Example 7.16), which fills a mapped memory vector using one or more GPUs in the system. The highlighted command cudaHostAlloc() creates a mapped region of memory when passed the cudaHostAllocMapped flag. This region is freed at the end of the program with cudaFreeHost(). Thrust was used to make this code concise and easy to read. The device_ pointer_cast() method was used to correctly cast the mapped host memory for the thrust sequence() method. The highlighted call to cudaDeviceSynchronize() ensures that the mapped data is synchronized between the host and devices prior to checking the 174 CHAPTER 7: Techniques to Increase Parallelism

All data transfers occur transparently and asynchronously. Finally, the contents of the mapped region of memory are checked for correctness on the host and the mapped region is freed.

Example 7.16
#include <iostream>
using namespace std;

#include <thrust/sequence.h>
#include <thrust/device_ptr.h>

int main(int argc, char* argv[])
{
  int nGPU;

  if(argc < 2) {
    cerr << "Use: number of integers" << endl;
    return(1);
  }

  cudaGetDeviceCount(&nGPU);

  int n = atoi(argv[1]);
  int size = nGPU * n * sizeof(int);

  cout << "nGPU " << nGPU << " "
       << (n*nGPU*sizeof(int)/1e6) << "MB" << endl;

  int *h_A;
  cudaHostAlloc(&h_A, size, cudaHostAllocMapped);

  for(int i=0; i < nGPU; i++) {
    cudaSetDevice(i);
    thrust::sequence(thrust::device_pointer_cast(h_A + i*n),
                     thrust::device_pointer_cast(h_A + (i+1)*n),
                     i*n);
  }
  cudaDeviceSynchronize(); // synchronize the writes

  for(int i=0; i < nGPU*n; i++)
    if(h_A[i] != i) {
      cout << "Error " << h_A[i] << endl;
      exit(1);
    }

  cout << "Success!\n" << endl;
  cudaFreeHost(h_A);
  return(0);
}

Compiling and running this example on a system containing two GPUs shows that the vector h_A is correctly initialized for both very small and large problems (Example 7.17, "Sample Output from the Mapped Memory Example"):

Example 7.17
$ ./mappedGPUsthrust 2
nGPU 2 1.6e-05MB
Success!

$ ./mappedGPUsthrust 200000000
nGPU 2 1600MB
Success!

The result when using two integers per GPU illustrates a very important characteristic of mapped memory: it allows multiple devices to correctly update adjacent, nonoverlapping locations in memory! This makes mapped memory a very valuable tool, as programmers need to ensure only that their codes write to nonoverlapping addresses in memory. Of course, concurrent writes to the same memory location are undefined. Also, writes will become visible across devices only after synchronization.

How Mapped Memory Works
All CUDA threads operate in a virtual address space. This means that every address generated by a CUDA kernel is translated by the MMU (Memory Management Unit) into a physical address that is used by the hardware to actually read data from the physical memory. Virtual memory makes life simple for application programmers, as they can use a single virtual memory address to correctly access data within a kernel or across devices – even when that same data resides at a different physical address on each separate device. When presented with a virtual address, the MMU consults an internal cache (called a TLB or Translation Lookaside Buffer) to find the correct value needed to translate the virtual address into a physical address. The MMU views memory in terms of fixed-sized blocks of memory called pages. If the MMU does not find the page offset for a given virtual address, it will load the correct offset into the TLB from a data structure in physical memory called a page table. No address translation is required for page table access by the MMU. Once the MMU has the correct offset in the TLB, the address translation completes, which lets the application memory transaction proceed at the correct location in physical memory. Microbenchmarks indicate that the size of a page of memory in a GPU can vary according to the hardware and the CUDA driver. Generally, 4 KB is the accepted size of a GPU page, but 64 KB pages have also been observed (Wong, Papadopoulou, Sadooghi-Alvandi, & Moshovos, 2010). The lesson learned is that the page size can vary even across device driver updates.

In addition to translating virtual addresses to physical memory, the MMU also keeps track of other information in the page table. For example, each page in the page table contains a bit that specifies whether the page is resident in memory. When the MMU is asked to translate an address for a page that is not resident, it generates a page fault that informs the device driver that it needs to fetch some data on behalf of the GPU. Address translation resumes only once the page has been loaded into the GPU memory. When a region of memory is mapped, none of the pages are marked as resident. As a result, the first access to each page of mapped memory will be slow, as the GPU must wait while the GPU and device driver interact to transfer the required page of memory. Later accesses will be very fast, as the page will already be resident on the GPU. Of course, the GPU can decide at any time to free memory for other purposes, in which case a follow-on page access will again be slow. The successful two-integer test case using the code in Example 7.16 tells us that the device driver has implemented a mechanism that correctly modifies adjacent, nonoverlapping regions of mapped memory even when multiple devices modify addresses that reside in the same page. The implication for the CUDA programmer is that data can be modified in mapped memory as long as the programmer:

■ Takes care not to modify the same memory location on different devices.
■ Synchronizes as needed to make updates visible to all devices.
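A minimal sketch of this pattern follows. It is not the mappedGPUsthrust program from Example 7.16; the kernel, buffer size, and two-GPU assumption are illustrative only.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void fillHalf(int *p, int n, int val) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] = val;                       // each device writes only its own half
}

int main() {
  const int n = 1 << 20;                       // integers per GPU
  int *h;                                      // host pointer to the mapped allocation

  for (int dev = 0; dev < 2; dev++) {          // enable mapped memory on both devices
    cudaSetDevice(dev);
    cudaSetDeviceFlags(cudaDeviceMapHost);
  }
  cudaSetDevice(0);
  cudaHostAlloc((void**)&h, 2 * n * sizeof(int),
                cudaHostAllocMapped | cudaHostAllocPortable);

  for (int dev = 0; dev < 2; dev++) {          // each GPU updates a nonoverlapping half
    cudaSetDevice(dev);
    int *d;
    cudaHostGetDevicePointer((void**)&d, h, 0);
    fillHalf<<<(n + 255) / 256, 256>>>(d + dev * n, n, dev + 1);
  }
  for (int dev = 0; dev < 2; dev++) {          // synchronize so the writes become visible
    cudaSetDevice(dev);
    cudaDeviceSynchronize();
  }
  printf("h[0]=%d h[%d]=%d\n", h[0], n, h[n]); // expect 1 and 2
  cudaFreeHost(h);
  return 0;
}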

SUMMARY

This chapter introduced multi-GPU programming, which is one of the most exciting areas of research and application development in GPU computing. Current technology allows up to 16 GPU devices to be installed in a single workstation or computational node. Such a workstation has a potential computational capability that is three times greater than the $30 million supercomputer at Pacific Northwest National Laboratory that was replaced in 2006. Data partitioning and scalability are key challenges for multi-GPU application development. Ideally, CUDA programmers should be able to achieve a linear speedup according to the number of devices in the system. This is where understanding the computational problem and creative thinking can really make a difference in application performance.

Still to be discussed is MPI programming for distributed GPU clusters. With this technology, CUDA programmers can scale to literally thousands of GPU devices to address big computational problems. It also creates opportunities to perform “leadership”-class computations on some of the largest supercomputers in the world. To accelerate performance for distributed applications, NVIDIA has introduced GPUDirect technology, which allows GPUs to communicate directly with each other across a distributed network. Chapter 10 discusses this technology and the use of GPUs for distributed computing and supercomputers.

CHAPTER 8

CUDA for All GPU and CPU Applications

Software development takes time and costs money. CUDA has evolved from a solid platform to accelerate numerical computation into a platform that is appropriate for all application development. What this means is that a single source tree of CUDA code can support applications that run exclusively on conventional x86 processors, exclusively on GPU hardware, or as hybrid applications that simultaneously use all the CPU and GPU devices in a system to achieve maximal performance. The Portland Group, Inc. (PGI)1 native CUDA-x86 compiler is one realization of this maturation process that has made CUDA C/C++ a viable source platform for generic application development, just like C++ and Java. Unlike Java and other popular application languages, CUDA can efficiently support tens of thousands of concurrent threads of execution. CUDA also supports the conventional approach to cross-language development that uses language bindings to interface with existing languages. As with other languages, libraries simplify application development and handle commonly used methods such as linear algebra, matrix operations, and the Fast Fourier Transform (FFT). Dynamic compilation is also blurring the distinction between CUDA and other languages, as exemplified by the Copperhead project (discussed shortly), which lets Python programmers write their code entirely in Python. The Copperhead runtime then dynamically compiles the Python methods to run on CUDA-enabled hardware. At the end of this chapter, the reader will have a basic understanding of:

■ Tools to transparently build and run CUDA applications on non-GPU systems and GPUs manufactured by any vendor.
■ The Copperhead project, which dynamically compiles Python for CUDA execution.

1 http://www.pgroup.com/.


■ How to incorporate CUDA with most languages, including Python, FORTRAN, R, Java, and others.
■ Important numerical libraries such as CUBLAS, CUFFT, and MAGMA.
■ How to use CUFFT concurrently on multiple GPUs.
■ CURAND and problems with naïve approaches to random number generation.
■ The effect of PCIe hardware on multi-GPU applications.

PATHWAYS FROM CUDA TO MULTIPLE HARDWARE BACKENDS

Example 8.1 illustrates the paths that are currently available to run CUDA C/C++ on x86 and GPU hardware. These are capabilities that give CUDA programmers the ability—with a single source tree—to create applications that can reach both those customers who own the third of a billion CUDA-enabled GPUs sold to date as well as the massive base of customers who already own x86-based systems. Example 8.1, “Pathway to use CUDA source code on CPUs, GPUs, and other vendor GPUs.”

Example 8.1

CUDA source can reach an NVIDIA GPU, an x86_64 CPU, or an AMD GPU through NVIDIA’s nvcc, the PGI CUDA-x86 compiler, MCUDA (CUDA-to-C translation), Swan (CUDA-to-OpenCL translation), and Ocelot (PTX emulation and translation).

The PGI CUDA x86 Compiler

The concept behind native x86 compilation is to give CUDA programmers the ability—with a single source tree—to create applications that can reach most customers with a computer. When released, the PGI unified binary will greatly simplify product support and delivery as a single application binary can be shipped to customers. At runtime, the PGI unified binary will query the hardware and selectively run on any CUDA-enabled GPUs or the host multicore processor. What distinguishes the PGI effort from source translators such as the Swan2 CUDA to OpenCL translator; the MCUDA3 CUDA to C translator; the Ocelot4 open source emulator and PTX translation project; and the ability of the NVIDIA nvcc compiler to generate both x86 and GPU based code is:

■ Speed: The PGI CUDA C/C++ compiler is a native compiler that transparently compiles CUDA to run on x86 systems even when a GPU is not present in the system. This gives the compiler the opportunity to perform x86 specific optimization to best use the multiple cores of the x86 processor as well as the SIMD parallelism in the AVX or SSE instructions within each core. (AVX is an extension of SSE to 256-bit operation.)
■ Transparency: Both NVIDIA and PGI state that even CUDA applications utilizing proprietary features of the GPU texture units will exhibit identical behavior on both x86 and GPU hardware.
■ Convenience: In 2012, the PGI compiler will be able to create a unified binary, which will simplify the software distribution process tremendously as mentioned previously. The simplicity of shipping a single binary to customers reflects the completeness of the thought behind the PGI CUDA C/C++ project.

From a planning perspective, CUDA for x86 dramatically impacts the software development decision-making process. Rather than CUDA filling the role of a niche development platform for GPU-based products, CUDA is now a platform for all product development—even for applications that are not intended to be accelerated by GPUs! The motivation is clearly exemplified by the variety and number of projects on the NVIDIA showcase that have achieved 100 times or greater performance over commodity processors, as summarized in Figure 8.1.

2 http://www.multiscalelab.org/swan.
3 http://impact.crhc.illinois.edu/mcuda.php.
4 http://code.google.com/p/gpuocelot/.

FIGURE 8.1 NVIDIA’s top 100 fastest reported speedups over conventional processors (as of July 2011: minimum 100, maximum 2,600, median 1,350).

In short, the performance of applications that fail to capitalize on the parallel performance capabilities of multicore and GPU devices will plateau at or near current levels and not increase with future hardware generations. Such applications risk stagnation and a loss of competitiveness (Farber, 2010). From a software development point of view, prudence dictates the selection of a platform that works well right now. Foresight requires picking a software framework that keeps the application running competitively on future hardware platforms without requiring a substantial rewrite or additional software investment. Following are the top reasons to use CUDA for all application development:

■ CUDA is based on standard C and C++. Both of these languages have a solid history of application development spanning decades.
■ Applications written in CUDA and compiled with CUDA-x86 can potentially run faster on x86 platforms than code written in traditional languages through better use of parallelism and the multicore SIMD units.
■ Multicore CUDA codes will contain fewer bugs because the CUDA execution model precludes many common parallel programming errors including race conditions and deadlock.
■ CUDA will future-proof the application because CUDA was designed to scale effectively to tens of thousands of concurrent threads of execution. This benefit can save future software development dollars and allow fast penetration into new markets.
■ GPU acceleration comes for free, which opens the door for order of magnitude application acceleration on the third of a billion CUDA-enabled GPUs that have already been sold worldwide.

■ CUDA has a large and rapidly expanding base of educated developers; CUDA is currently taught at more than 450 universities and colleges worldwide, and the number of institutions teaching it continues to grow.

In other words, the future looks bright for literate CUDA programmers!

The PGI CUDA x86 Compiler

Using the PGI CUDA-x86 compiler is straightforward. Currently, PGI offers a free evaluation period for the compiler. Just download and install it per the instructions on the PGI website.5 Setup is straightforward and well described in the installation guide. Example 8.2, “Setting the Environment for the PGI Compiler,” contains the commands to set the environment using bash under Linux:

Example 8.2
PGI=/opt/pgi; export PGI
MANPATH=$MANPATH:$PGI/linux86-64/11.5/man; export MANPATH
LM_LICENSE_FILE=$PGI/license.dat; export LM_LICENSE_FILE
PATH=$PGI/linux86-64/11.5/bin:$PATH; export PATH

It is quite easy to use the software. For example, copy the PGI NVIDIA SDK samples to a convenient location and build them, as in Example 8.3, “Building the PGI SDK”:

Example 8.3
cp -r /opt/pgi/linux86-64/2011/cuda/cudaX86SDK .
cd cudaX86SDK ; make

Example 8.4, “Output of deviceQuery When Running on a Quad-Core CPU,” shows the output of deviceQuery on an Intel Xeon E5560:

Example 8.4 CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

5 http://pgroup.com.

Device 0: "DEVICE EMULATION MODE"
CUDA Driver Version: 99.99
CUDA Runtime Version: 99.99
CUDA Capability Major revision number: 9998
CUDA Capability Minor revision number: 9998
Total amount of global memory: 128000000 bytes
Number of multiprocessors: 1
Number of cores: 0
Total amount of constant memory: 1021585952 bytes
Total amount of shared memory per block: 1021586048 bytes
Total number of registers available per block: 1021585904
Warp size: 1
Maximum number of threads per block: 1021585920
Maximum sizes of each dimension of a block: 32767 × 2 × 0
Maximum sizes of each dimension of a grid: 1021586032 × 32767 × 1021586048
Maximum memory pitch: 4206313 bytes
Texture alignment: 1021585952 bytes
Clock rate: 0.00 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Unknown
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 99.99, CUDA Runtime Version = 99.99, NumDevs = 1, Device = DEVICE EMULATION MODE

PASSED

Press to Quit... ------

Similarly, the output of bandwidthTest shows that device transfers work as expected (Example 8.5, “Output of bandwidthTest When Running on a Quad-Core CPU”):

Example 8.5 Running on...

Device 0: DEVICE EMULATION MODE Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                4152.5

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                4257.0

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                8459.2

[bandwidthTest] - Test results: PASSED

Press to Quit... ------

Just as with NVIDIA’s nvcc compiler, it is easy to use the PGI pgCC compiler to build an executable from a CUDA source file. For example, the arrayReversal_multiblock_fast.cu code from part 3 of my Doctor Dobb’s article series just compiles and runs.6 To compile and run it under Linux, type the code in Example 8.6, “Output of arrayReversal_multiblock_fast.cu When Running on a Quad-Core CPU”:

Example 8.6
pgCC arrayReversal_multiblock_fast.cu
./a.out
Correct!

An x86 core as an SM

In CUDA-x86, thread blocks are efficiently mapped to the processor cores. Thread-level parallelism is mapped to the SSE (Streaming SIMD Extensions) or AVX SIMD units, as shown in Figure 8.2. CUDA programmers should note:

■ The size of a warp will be different from the expected 32 threads per warp for a GPU. For x86 computing, a warp might be the size of the SIMD units on the x86 core (either four or eight) or one thread per warp when SIMD execution is not utilized.
■ In many cases, the PGI CUDA C compiler will remove explicit synchronization of the thread processors when the compiler can determine that it is safe to split loops where the synchronization calls occur.

6 http://drdobbs.com/high-performance-computing/207603131.

FIGURE 8.2 Mapping GPU computing onto a CPU.

■ CUDA programmers must consider data transfer times, as explicit movement of data between host and device memory and global to shared memory will still happen in CUDA-x86.

The NVIDIA NVCC Compiler

The NVIDIA nvcc compiler can generate both host- and device-based kernels. Chapter 2 utilized this capability for testing and performance analysis purposes. Though very useful, this approach requires manual effort on the part of the CUDA programmer to set up memory correctly and to call the functor appropriately. Thrust can also transparently generate code for different backends such as x86 processors just by passing some additional command-line options to nvcc. No source code modification is required. The following nvcc command line demonstrates building the NVIDIA SDK Monte Carlo example to run on the host processor:

Example 8.7
nvcc -O2 -o monte_carlo monte_carlo.cu -Xcompiler -fopenmp \
  -DTHRUST_DEVICE_BACKEND=THRUST_DEVICE_BACKEND_OMP -lcudart -lgomp

Table 8.1 Reported Time When Running a Thrust Example Using OpenMP

Device              Seconds
GPU                 0.222
4 OpenMP threads    2.090
2 OpenMP threads    4.168
1 OpenMP thread     8.333

Timing reported on the Thrust website shows that the performance is acceptable (see Table 8.1). Be aware that Thrust is not optimized to produce the best x86 runtimes.
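For readers who want to experiment, the following minimal Thrust program (a parallel sort rather than the SDK Monte Carlo example; the array size is arbitrary) builds unchanged either for the GPU with plain nvcc or for the OpenMP backend with a command line like the one in Example 8.7.

#include <cstdio>
#include <cstdlib>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

int main() {
  thrust::host_vector<int> h(1 << 20);           // fill a host vector with random data
  for (size_t i = 0; i < h.size(); i++) h[i] = rand();

  thrust::device_vector<int> d = h;              // "device" is the GPU or the OMP backend
  thrust::sort(d.begin(), d.end());              // parallel sort on the selected backend

  thrust::copy(d.begin(), d.end(), h.begin());   // bring the sorted data back
  printf("smallest %d largest %d\n", (int)h[0], (int)h[h.size() - 1]);
  return 0;
}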

Ocelot

Ocelot is a popular, actively maintained package with a large user base. The website notes that Ocelot can run CUDA binaries on NVIDIA GPUs, AMD GPUs, and x86 processors at full speed without recompilation. It can be freely downloaded and is licensed under a new BSD license. A paper by Gregory Diamos, “The Design and Implementation of Ocelot’s Dynamic Binary Translator from PTX to Multi-Core x86,”7 is recommended reading (Diamos, 2009), as is his chapter in GPU Computing Gems (Hwu, 2011). Ocelot’s core capabilities consist of:

■ An implementation of the CUDA Runtime API.
■ A complete internal representation of PTX kernels coupled to control- and data-flow analysis procedures for analysis.
■ A functional emulator for PTX.
■ A PTX translator to multicore x86-based CPUs for efficient execution.
■ A backend to NVIDIA GPUs via the CUDA driver API.
■ Support for an extensible trace generation framework in which application behavior can be observed at instruction-level granularity.

Ocelot has three backend execution targets:

■ A PTX emulator.
■ A translator from PTX to multicore instructions.
■ A CUDA-enabled GPU.

Ocelot has been validated against more than 130 applications taken from the CUDA SDK, the UIUC Parboil benchmark, the Virginia Rodinia benchmarks (Che et al., 2009, 2010), the GPU-VSIPL signal and image processing library,8 the Thrust library, and several domain-specific applications. It is an exemplary

7 http://code.google.com/p/gpuocelot/.
8 http://gpu-vsipl.gtri.gatech.edu/.

tool to use in profiling and analyzing the behavior of CUDA applications that can also be used as a platform for CUDA application portability.

Swan

Swan is a freely available source-to-source translation tool that converts an existing CUDA code to use the OpenCL model. Note that the conversion process requires human intervention and is not automatic. The authors report that CUDA applications ported to OpenCL run about 50 percent slower (Harvey & De Fabritiis, 2010). The authors attribute the performance reduction to the immaturity of the OpenCL compilers. They conclude that OpenCL is a viable platform for developing portable GPU applications but also that the more mature CUDA tools continue to provide the best performance. It is not clear how active the development effort is on the Swan project, as the most recent update was in December 2010. The current version of Swan does not support:

■ CUDA C++ templates in kernel code.
■ OpenCL Images/Samplers (analogous to Textures)—texture interpolation done in software.
■ Multiple device management in a single process.
■ Compiling kernels for the CPU.
■ CUDA device-emulation mode.

MCUDA

MCUDA is an academic effort by the IMPACT Research Group at the University of Illinois. It is available for free download. This project does not appear to be actively maintained. The paper “MCUDA: An Efficient Implementation of CUDA Kernels for Multi-Core CPUs” is interesting reading (Stratton, Stone, & Hwu, 2008). A related paper, “FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs,” discusses translating CUDA to FPGAs (Field Programmable Gate Arrays) (Papakonstantinou et al., 2009).

ACCESSING CUDA FROM OTHER LANGUAGES

CUDA can be incorporated into any language that provides a mechanism for calling C or C++. To simplify this process, general-purpose interface generators have been created that will create most of the boilerplate code automatically. One of the most popular interface generators is SWIG. An alternative approach is to seamlessly integrate CUDA into the language, which is being investigated by the Copperhead Python project.

SWIG

SWIG (Simplified Wrapper and Interface Generator) is a software development tool that connects programs written in C and C++—including CUDA C/C++ applications—with a variety of high-level programming languages. SWIG is actively supported and widely used. It can be freely downloaded from the SWIG website.9 As of the current 2.0.4 release, SWIG generates interfaces for the following languages:

■ AllegroCL
■ C# – Mono
■ C# – MS.NET
■ CFFI
■ CHICKEN
■ CLISP
■ D
■ Go language
■ Guile
■ Java
■ Lua
■ MzScheme/Racket
■ Ocaml
■ Octave
■ Perl
■ PHP
■ Python
■ R
■ Ruby
■ Tcl/Tk

Part 9 of my Doctor Dobb’s Journal “Supercomputing for the Masses” tutorial series10 provides a complete working example that uses SWIG to interface a CUDA matrix operation with Python. This example can interface with other languages as well (Farber, 2008).

Copperhead

Copperhead is an early-stage research project to bring data parallelism to the Python language. It defines a small, data-parallel subset of Python that is dynamically compiled to run on parallel platforms. Right now, NVIDIA GPGPUs are the only parallel backend.

9 http://swig.org.
10 http://drdobbs.com/high-performance-computing/211800683.

Example 8.8, “Example Copperhead Python Code,” is a simple example from the Copperhead website:

Example 8.8
from copperhead import *
import numpy as np

@cu
def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

x = np.arange(100, dtype=np.float64)
y = np.arange(100, dtype=np.float64)

with places.gpu0:
    gpu = axpy(2.0, x, y)

with places.here:
    cpu = axpy(2.0, x, y)

Copperhead organizes computations around data parallel arrays. In this example, the Copperhead runtime intercepts the call to axpy() and compiles the function to CUDA. The runtime converts the input arguments to a type of parallel array, CuArrays, that are managed by the runtime. The programmer specifies where the execution is to take place using the with construct shown in the previous example. Data is lazily copied to and from the execution location. Copperhead currently supports GPU execution and Python interpreter execution. Use of the Python interpreter is intended for algorithm prototyping.

EXCEL

Microsoft Excel is a widely adopted commercial spreadsheet application written and distributed by Microsoft for Windows and Mac OS. It features calculation, graphing tools, and a variety of other tools. Functionality can be programmed in Visual Basic. NVIDIA distributes the “Excel 2010 CUDA Integration Example”11 on their website, which shows how to use CUDA in Excel. This SDK example is not included in the standard SDK samples.

MATLAB

MATLAB is a commercial application developed by MathWorks. The MATLAB software allows matrix manipulation, plotting of functions and data, and a wealth of other functionality. Developers can implement algorithms and user interfaces as well as integrate MATLAB into other

11 http://developer.nvidia.com/cuda-cc-sdk-code-samples.

applications. MATLAB GPU support is available in the Parallel Computing Toolbox. It supports NVIDIA CUDA-enabled GPUs of compute capability 1.3 and higher. Third-party products such as those by Accelereyes provide both MATLAB access to GPU computing as well as matrix libraries for CUDA.12

LIBRARIES

The use of optimized libraries is often an easy way to improve the performance of an application. When porting large legacy projects, libraries may be the only real way to optimize for a new platform because code changes would require extensive validation efforts. Essentially, libraries are convenient and can greatly accelerate code development as well as application performance, but they cannot be blindly utilized. GPU-based libraries in particular require the developer to think carefully about data placement and how the library is used, or poor performance and excessive memory consumption will result.

CUBLAS

The Basic Linear Algebra Subprograms (BLAS) package is the de facto programming interface for basic linear algebra operations such as vector and matrix multiplication. NVIDIA supports this interface with their own library for the GPU called CUBLAS. The basic model by which applications use the CUBLAS library is to create matrix and vector objects in GPU memory space, fill these objects with data, call a sequence of CUBLAS functions, and return the result(s) to the host. CUBLAS provides helper functions to create and destroy objects in GPU space and to utilize the data in these objects. CUDA programmers should note that CUBLAS uses column-major storage and 1-based indexing for maximum FORTRAN compatibility. C and C++ applications need to use macros or inline functions to facilitate access to CUBLAS-created objects. Chapter 1 contains a discussion of the BLAS runtime levels and the importance of minimizing or eliminating data movement. The simpleBLAS example, which can be downloaded from the NVIDIA website, provides a basic demonstration of how to use CUBLAS.13
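To make the create/fill/call/return pattern concrete, here is a minimal sketch (not the simpleBLAS example itself) that uses the legacy cublas.h interface to run a SAXPY on the GPU; the vector length and data are arbitrary, and error checking is omitted.

#include <stdio.h>
#include <cublas.h>

int main() {
  const int n = 8;
  float x[8], y[8];
  for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = (float)i; }

  cublasInit();                                     // initialize the library

  float *d_x, *d_y;                                 // objects in GPU memory space
  cublasAlloc(n, sizeof(float), (void**)&d_x);
  cublasAlloc(n, sizeof(float), (void**)&d_y);
  cublasSetVector(n, sizeof(float), x, 1, d_x, 1);  // fill the objects with data
  cublasSetVector(n, sizeof(float), y, 1, d_y, 1);

  cublasSaxpy(n, 2.0f, d_x, 1, d_y, 1);             // y = 2*x + y on the GPU

  cublasGetVector(n, sizeof(float), d_y, 1, y, 1);  // return the result to the host
  printf("y[0]=%g y[%d]=%g\n", y[0], n - 1, y[n - 1]);

  cublasFree(d_x);
  cublasFree(d_y);
  cublasShutdown();
  return 0;
}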

CUFFT

CUFFT is another NVIDIA-supported library that provides a GPU-based implementation of the FFT, a commonly used method in scientific and signal processing applications. CUFFT is modeled after FFTW, a very highly optimized and popular FFT package for general-purpose processors.

12 http://www.accelereyes.com/.
13 http://www.nvidia.com/object/cuda_sample_linear_algebra.html.

CUFFT utilizes a plan, which is a simple configuration mechanism that specifies the best “plan” of execution for a particular algorithm given a specified problem size, data type, and destination hardware platform. The advantage of this approach is that once a plan is created, it can be used for the remainder of the application lifetime. This is a commonly used programming pattern to perform runtime configuration in libraries and other high-performance portable codes. The NVIDIA CUFFT library uses this configuration model because different sizes and types of FFTs require different thread configurations and GPU resources. The simpleCUFFT example can be downloaded from the NVIDIA website.14 It covers the basics of how to use the CUFFT library. CUFFT also provides the method cufftPlanMany(), which creates a plan that will run multiple FFT operations at the same time. This can be both a performance boon and a memory hog. The following test program demonstrates the use of CUFFT on one or more GPUs. It performs a 3D complex-to-complex forward and inverse in-place transform and calculates the error introduced by these transforms. Correct usage will result in a small error. The user can specify the following runtime characteristics via the command line:

■ The number of GPUs to use. This number must be less than or equal to the number of GPUs in the system.
■ The size of each dimension.
■ The total number of FFTs to perform.
■ An optional value that specifies how many FFTs to perform on each call to CUFFT.

In addition, this test code makes use of C++ type defines. To create a double-precision executable, simply specify -DREAL=double on the nvcc command line. General include files and constants are defined at the beginning of the example file. The preprocessor variable REAL is set to default to float if not otherwise defined during compilation. Example 8.9, “Part 1 of fft3Dtest.cu,” contains a walkthrough of the test code:

Example 8.9
// headers required by the host code in this example
#include <iostream>
#include <vector>
#include <cassert>
using namespace std;

14 http://www.nvidia.com/object/cuda_sample_basic_topics.html.

#define CUFFT_FORWARD -1
#define CUFFT_INVERSE 1

#include "thrust/host_vector.h" #include "thrust/device_vector.h"

#include <cufft.h>   // cufftPlanMany, cufftSetStream, cufftExecC2C/Z2Z
#include <omp.h>     // omp_get_wtime

#ifndef REAL
#define REAL float
#endif

The template class DistFFT3D is defined and a number of variables are created in Example 8.10, “Part 2 of fft3Dtest.cu”:

Example 8.10
template <typename Real>
class DistFFT3D {
protected:
  int nGPU;
  cudaStream_t *streams;
  int nPerCall;
  vector<cufftHandle> fftPlanMany;
  int dim[3];
  Real *h_data;
  long h_data_elements;
  long nfft, n2ft3d, h_memsize, nelements;
  long totalFFT;
  vector<Real*> d_data;
  long bytesPerGPU;

The public constructor takes a vector of host data and partitions it equally across the user-defined number of GPUs. These vectors are kept in the variable d_data. For generality, an array of streams is passed to the constructor, which allows the user to queue work before and after the test. This capability is not used but is provided in case the reader wishes to use this example in another code. It is assumed that one stream is associated with each GPU. A vector of CUFFT handles, fftPlanMany, is created in which one plan is associated with each stream. This setup implies that there will be one plan per GPU, as each stream is associated with a different GPU. The number of multiplan FFTs is defined in the call to set_NperCall(). The method initFFTs() creates the handles. Similarly, a vector of device vectors

is created per GPU. The pointers to device memory are held in the vector d_data. See Example 8.11, “Part 3 of fft3Dtest.cu”:

Example 8.11
public:
 DistFFT3D(int _nGPU, Real* _h_data, long _h_data_elements,
           int *_dim, int _nPerCall, cudaStream_t *_streams) {
   nGPU = _nGPU;
   h_data = _h_data;
   h_data_elements = _h_data_elements;
   dim[0] = _dim[0]; dim[1] = _dim[1]; dim[2] = _dim[2];
   nfft = dim[0]*dim[1]*dim[2];
   n2ft3d = 2*dim[0]*dim[1]*dim[2];
   totalFFT = h_data_elements/n2ft3d;

   set_NperCall(_nPerCall);
   bytesPerGPU = nPerCall*n2ft3d*sizeof(Real);
   h_memsize = h_data_elements*sizeof(Real);
   assert( (totalFFT/nPerCall*bytesPerGPU) == h_memsize);

   streams = _streams;
   fftPlanMany = vector<cufftHandle>(nGPU);

   initFFTs();

   // Allocate each device's share of the data. (The body of this loop was
   // lost in extraction; it must create one bytesPerGPU buffer per GPU,
   // since the destructor frees d_data[i] on each device.)
   for(int i=0; i < nGPU; i++) {
     cudaSetDevice(i);
     Real *d;
     cudaMalloc((void**)&d, bytesPerGPU);
     d_data.push_back(d);
   }
   cudaSetDevice(0);
 }

 void set_NperCall(int n) {
   cerr << "Setting nPerCall " << n << endl;
   nPerCall = n;
   if( (nGPU * nPerCall) > totalFFT) {
     cerr << "Too many nPerCall specified! max " << (totalFFT/nGPU) << endl;
     exit(1);
   }
 }

The destructor frees the data allocated on the devices in Example 8.12, “Part 4 of fft3Dtest.cu”:

Example 8.12
~DistFFT3D() {
  for(int i=0; i < nGPU; i++) {
    cudaSetDevice(i);
    cudaFree(d_data[i]);
  }
}

Example 8.13, “Part 5 of fft3Dtest.cu,” initializes the multiplan FFTs and provides the template wrappers to correctly call the CUFFT method for float and double variable types. Note that the stream is set with cufftSetStream():

Example 8.13
void inline initFFTs() {
  if((nPerCall*nGPU) > totalFFT) {
    cerr << "nPerCall must be a multiple of totalFFT" << endl;
    exit(1);
  }
  // Create a batched 3D plan
  for(int sid=0; sid < nGPU; sid++) {
    cudaSetDevice(sid);
    if(sizeof(Real) == sizeof(float) ) {
      cufftPlanMany(&fftPlanMany[sid], 3, dim, NULL, 1, 0, NULL, 1, 0,
                    CUFFT_C2C, nPerCall);
    } else {
      cufftPlanMany(&fftPlanMany[sid], 3, dim, NULL, 1, 0, NULL, 1, 0,
                    CUFFT_Z2Z, nPerCall);
    }
    if(cufftSetStream(fftPlanMany[sid], streams[sid])) {
      cerr << "cufftSetStream failed!" << endl;
    }
  }
  cudaSetDevice(0);
}

inline void _FFTerror(int ret) {
  switch(ret) {
    case CUFFT_SETUP_FAILED: cerr << "SETUP_FAILED" << endl; break;
    case CUFFT_INVALID_PLAN: cerr << "INVALID_PLAN" << endl; break;
    case CUFFT_INVALID_VALUE: cerr << "INVALID_VALUE" << endl; break;
    case CUFFT_EXEC_FAILED: cerr << "EXEC_FAILED" << endl; break;
    default: cerr << "UNKNOWN ret code " << ret << endl;
  }
}

//template specialization to handle different data types (float, double)
inline void cinverseFFT_(cufftHandle myFFTplan, float* A, float* B) {
  int ret = cufftExecC2C(myFFTplan, (cufftComplex*)A, (cufftComplex*)B, CUFFT_INVERSE);
  if(ret != CUFFT_SUCCESS) {
    cerr << "C2C FFT failed! ret code " << ret << endl;
    _FFTerror(ret); exit(1);
  }
}

inline void cinverseFFT_(cufftHandle myFFTplan, double *A, double *B) {
  int ret = cufftExecZ2Z(myFFTplan, (cufftDoubleComplex*)A, (cufftDoubleComplex*)B, CUFFT_INVERSE);
  if(ret != CUFFT_SUCCESS) {
    cerr << "Z2Z FFT failed! ret code " << ret << endl;
    _FFTerror(ret); exit(1);
  }
}

inline void cforwardFFT_(cufftHandle myFFTplan, float* A, float* B) {
  int ret = cufftExecC2C(myFFTplan, (cufftComplex*)A, (cufftComplex*)B, CUFFT_FORWARD);
  if(ret != CUFFT_SUCCESS) {
    cerr << "C C2C FFT failed!" << endl;
    _FFTerror(ret); exit(1);
  }
}

inline void cforwardFFT_(cufftHandle myFFTplan, double *A, double *B) {
  int ret = cufftExecZ2Z(myFFTplan, (cufftDoubleComplex*)A, (cufftDoubleComplex*)B, CUFFT_FORWARD);
  if(ret != CUFFT_SUCCESS) {
    cerr << "Z2Z FFT failed!" << endl;
    _FFTerror(ret); exit(1);
  }
}

Calculate the error on the host using scaled values of the FFT results and the original host vector, as in Example 8.14, “Part 6 of fft3Dtest.cu”:

Example 8.14
double showError(Real* h_A1) {
  double error=0.;
#pragma omp parallel for reduction (+ : error)
  for(int i=0; i < h_data_elements; i++) {
    h_data[i] /= (Real)nfft;
    error += abs(h_data[i] - h_A1[i]);
  }
  return error;
}

Example 8.15, “Part 7 of fft3Dtest.cu,” performs the actual test:

Example 8.15
void doit() {
  double startTime = omp_get_wtime();
  long h_offset=0;
  for(int i=0; i < totalFFT; i += nGPU*nPerCall) {
    for(int j=0; j < nGPU; j++) {
      cudaSetDevice(j);
      cudaMemcpyAsync(d_data[j], ((char*)h_data)+h_offset, bytesPerGPU,
                      cudaMemcpyDefault, streams[j]);
      cforwardFFT_(fftPlanMany[j], d_data[j], d_data[j]);
      cinverseFFT_(fftPlanMany[j], d_data[j], d_data[j]);
      cudaMemcpyAsync(((char*)h_data)+h_offset, d_data[j], bytesPerGPU,
                      cudaMemcpyDefault, streams[j]);
      h_offset += bytesPerGPU;
    }
  }
  cudaDeviceSynchronize();
  cudaSetDevice(0);

  double endTime = omp_get_wtime();

  cout << dim[0] << " " << dim[1] << " " << dim[2]
       << " nFFT/s " << 1./(0.5*(endTime-startTime)/totalFFT)
       << " average 3D fft time " << (0.5*(endTime-startTime)/totalFFT)
       << " total " << (endTime-startTime) << endl;
 }
};

The outer loop ensures that all the FFTs are performed by iterating from zero to totalFFT in steps of the number of multiplan FFTs performed by all the GPUs. The inner loop sets the device and queues:

■ The transfer of data to the device from the host. Note that the transfer with cudaMemcpyAsync() is asynchronous to the host but not to the other tasks in the queue.
■ The forward and inverse FFT transforms.
■ The asynchronous transfer of data from the device back to the host.

The increment of h_offset at the end of the inner loop ensures that each GPU (and queued FFT operation) operates on different, nonoverlapping regions of host memory, so no synchronization between GPUs is required. Finally, the overall performance of this method is measured and the average number of FFTs performed per second is reported. The main() routine parses the command line and performs some basic checks. It also creates the streams array and associates one stream per GPU. Of particular interest is the use of pinned memory on the host for fast asynchronous data transfers between the host and devices. This memory is created with cudaHostAlloc(), using the flag cudaHostAllocPortable. The host vector is filled with recurring sequential values. See Example 8.16, “Part 8 of fft3Dtest.cu”:

Example 8.16 main(int argc, char *argv[]) { if(argc < 6) { cerr << "Use nGPU dim[0] dim[1] dim[2] numberFFT [nFFT per call]" << endl; exit(1); }

int nPerCall = 1; int nGPU = atoi(argv[1]); int dim[] = { atoi(argv[2]), atoi(argv[3]), atoi(argv[4])}; int totalFFT=atoi(argv[5]); nPerCall = totalFFT; if( argc > 6) { Libraries 199

nPerCall = atoi(argv[6]); if(totalFFT % nPerCall != 0) { cerr << "nPerCall must be a multiple of totalFFT!" << endl; return(1); } }

int systemGPUcount; cudaGetDeviceCount(&systemGPUcount); if(nGPU > systemGPUcount) { cerr << "Attempting to use too many GPUs!" << endl; return(1); }

cerr << "nGPU = " << nGPU << endl; cerr << "dim[0] = " << dim[0] << endl; cerr << "dim[1] = " << dim[1] << endl; cerr << "dim[2] = " << dim[2] << endl; cerr << "totalFFT = " << totalFFT << endl; cerr << "sizeof(REAL) is " << sizeof(REAL) << " bytes" << endl;

cudaStream_t streams[nGPU]; for(int sid=0; sid < nGPU; sid++) { cudaSetDevice(sid); if(cudaStreamCreate(&streams[sid]) != 0) { cerr << "Stream create failed!" << endl; } } cudaSetDevice(0);

long nfft = dim[0]*dim[1]*dim[2]; long n2ft3d = 2*dim[0]*dim[1]*dim[2]; long nelements = n2ft3d*totalFFT;

REAL *h_A, *h_A1; if(cudaHostAlloc(&h_A, nelements*sizeof(REAL), cudaHostAllocPortable) != cudaSuccess) { cerr << "cudaHostAlloc failed!" << endl; exit(1); } h_A1 = (REAL*) malloc(nelements*sizeof(REAL)); if(!h_A1) { cerr << "malloc failed!" << endl; exit(1); }

// fill the test data #pragma omp parallel for for(long i=0; i < nelements; i++) h_A1[i] = h_A[i] = i%n2ft3d;

DistFFT3D dfft3d(nGPU, h_A, nelements, dim, nPerCall, streams); dfft3d.doit(); 200 CHAPTER 8: CUDA for All GPU and CPU Applications

double error = dfft3d.showError(h_A1); cout << "average error per fft " << (error/nfft/totalFFT) << endl; cudaFreeHost(h_A1); }

This source code can be compiled with the command lines in Example 8.17, “Command line to compile fft3Dtest.cu,” which assume that the file is saved in fft3Dtest.cu:

Example 8.17
nvcc -Xcompiler -fopenmp -O3 -arch sm_20 fft3Dtest.cu -lcufft \
  -o fft3Dtest_float
nvcc -Xcompiler -fopenmp -O3 -arch sm_20 -D REAL=double fft3Dtest.cu \
  -lcufft -o fft3Dtest_double

Running the test shows that the test program correctly utilizes CUFFT, as the error is small even when using single precision. For example, a 32×32×32 run that performs 128 FFT tests generates an error of 0.00225. Further, there is a speedup benefit when running on two GPUs, as seen in Figure 8.3, which shows plotted results for both single- and double-precision FFTs. The second GPU appears to provide limited additional performance in the double-precision runs, probably due to doubling the amount of data that must be transferred across the PCIe bus. Performing more than a few FFTs per CUFFT call also appears to have a minimal impact on performance. A computeprof width plot shows that the application is concurrently running on both GPUs. The following width plot (Figure 8.4) was captured on a Dell Precision 7500 with two NVIDIA C2050 GPUs.15

FIGURE 8.3 Single- and double-precision 32×32×32 FFT performance (number of FFTs performed per second versus the number of FFTs performed per multiplan, for one and two GPUs).

FIGURE 8.4 Width plot for a Dell Precision 7500 with two NVIDIA C2050 GPUs.

The original color screenshot was converted to grayscale for this book. The H2D (Host to Device) and D2H (Device to Host) transfers are noted, as are the computational kernels that implement the 3D FFT.


FIGURE 8.5 Width plot for an HP Z800 with two GPUs.

Results from the same run on an HP Z800 illustrate the importance of the PCIe hardware. Slow performance in the asynchronous transfer of data to one of the GPUs can dramatically decrease performance.

15 Access to this machine was provided by the ICHEC (Irish Center for High-End Computing) NVIDIA CUDA Research Center.

FIGURE 8.6 Height plot for device 0 on an HP Z800.

FIGURE 8.7 Height plot for device 1 on an HP Z800.

This observation is confirmed via the computeprof height plot for each GPU on the HP Z800, as there are clear differences between the two GPU devices as well as variations in the transfers performed on the device.

MAGMA

The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures. The initial focus is

to provide a package that will concurrently run on multicore processors and a single GPU to deliver the combined performance of this hybrid environment. Later efforts will combine multiple GPUs with multiple cores. The MAGMA team has concluded that dense linear algebra (DLA) “has become a better fit for the new GPU architectures, to the point where DLA can run more efficiently on GPUs than on current, high-end homogeneous multicore-based systems” (Nath, Stanimire, & Dongarra, 2010, p. 1). According to Volkov, the current MAGMA BLAS libraries achieve up to 838 GF/s (Volkov, 2010). The MAGMA software is freely downloadable from the University of Tennessee Innovative Computing Laboratory website.16

phiGEMM Library

The phiGEMM library is a freely available open source library that was written by Ivan Girotto and Filippo Spiga at ICHEC.17 It performs matrix–matrix multiplication (e.g., GEMM operations) on heterogeneous systems containing multiple GPUs and multicore processors. The phiGEMM library extends the mapping of Fatica (Fatica, 2009) to support single-precision, double-precision, and complex matrices. The TOP500 LINPACK (HPL) benchmark suite uses this mapping for a heterogeneous matrix multiply as one of the core methods to evaluate the fastest supercomputers in the world.18 The phiGEMM library is able to deliver comparable performance to the LINPACK benchmark using a single GPU and multicore processor. As can be seen in Figure 8.8, even greater performance can be achieved using two or more GPUs. The phiGEMM library transparently manages memory and the asynchronous data transfers amongst all the devices in the system (i.e., multiple GPUs and multicore processors). It can process matrices that are much larger than the memory capacity of a single GPU by recursively multiplying smaller submatrices. Again, the library makes this process transparent to the user. Figure 8.9 shows performance using up to four GPUs for a 15 GB matrix. The phiGEMM library is being considered for inclusion in the MAGMA library discussed earlier.

CURAND

Generating random numbers on parallel computers is challenging. The naïve approach is to have each thread supply a different seed to a common

16 http://icl.cs.utk.edu/.
17 http://qe-forge.org/projects/phigemm/.
18 http://www.nvidia.com/content/GTC-2010/pdfs/2057_GTC2010.pdf.

FIGURE 8.8 phiGEMM performance using one and two GPUs (data obtained on two Intel Xeon X5680 3.33 GHz and two Tesla NVIDIA C2050 GPUs).

FIGURE 8.9 Performance of phiGEMM for a 15 GB matrix, M = K = N = 25000 (DP) (data obtained on two Intel Xeon X5670 2.93 GHz and four Tesla NVIDIA C2050 GPUs).

random number generation method. This approach is not guaranteed to produce independent sequences and can inadvertently introduce correlations into the “random” data (Coddington, 1997). The CURAND library provides a simple and efficient API to generate high-quality pseudorandom and quasirandom numbers. A pseudorandom sequence of numbers satisfies most of the statistical properties of a truly random sequence but is generated by a deterministic algorithm. A quasirandom sequence of n-dimensional points is generated by a deterministic algorithm designed to fill an n-dimensional space evenly. Application areas include:

■ Simulation: Random numbers are required to make things realistic. In the movies, computer-generated characters start and move in slightly different ways. People also arrive at airports at random intervals.
■ Sampling: For many interesting problems, it is impractical to examine all possible cases. However, random sampling can provide insight into what constitutes “typical” behavior.
■ Computer programming: Testing an algorithm with random numbers is an effective way to find problems and programmer biases.
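The following minimal sketch (illustrative only; the buffer size and seed are arbitrary, and error checking is omitted) shows the CURAND host-API pattern implied above: one generator, seeded once, producing a single stream of uniform pseudorandom numbers in GPU memory rather than handing every thread its own ad hoc seed.

#include <stdio.h>
#include <cuda_runtime.h>
#include <curand.h>

int main() {
  const size_t n = 1024;
  float *d_rand, h_rand[8];
  cudaMalloc((void**)&d_rand, n * sizeof(float));

  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);  // pseudorandom generator
  curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);        // one seed for one stream
  curandGenerateUniform(gen, d_rand, n);                   // fill the device buffer

  cudaMemcpy(h_rand, d_rand, 8 * sizeof(float), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 8; i++) printf("%f\n", h_rand[i]);   // inspect a few values

  curandDestroyGenerator(gen);
  cudaFree(d_rand);
  return 0;
}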

SUMMARY

Commercial and research projects must now have parallel applications to compete for customer and research dollars. This need translates into pressure on software development efforts to control costs while supporting a range of rapidly evolving parallel hardware platforms. CUDA has been rapidly evolving to meet this need. Unlike current programming languages, CUDA was designed to create applications that run on hundreds of parallel processing elements and manage many thousands of threads. This design makes CUDA an attractive choice compared with current development languages like C++ and Java. It also creates a demand for CUDA developers. Projects like Copperhead are expanding the boundaries of what can be done with dynamic compilation and CUDA. Solid library support makes CUDA attractive for numerical and signal-processing applications. General interface packages like SWIG create the opportunity to add massively parallel CUDA support to many existing languages.

CHAPTER 9

Mixing CUDA and Rendering

With the focus on the use of CUDA to accelerate computational tasks, it is easy to forget that GPU technology is also a splendid platform for visualization. In combination with the Open Graphics Library (OpenGL), CUDA-enabled GPUs become visualization supercomputers. Even highly experienced OpenGL programmers will find this chapter both new and informative, as the examples use primitive restart—a feature recently added to the OpenGL 3.1 standard—to render high-performance, high-quality graphics even when the images require irregular meshes. Example kernels demonstrate data sharing between CUDA and OpenGL through buffer sharing. Profiling will show that primitive restart is 60 times faster than the optimized OpenGL multiDraw() method because it avoids performance-robbing transfers across the PCIe bus. This chapter will also discuss how primitive restart can produce images of higher quality than other OpenGL methods and optimize texture rendering as well. Readers should note that this chapter is intended only to teach how to mix CUDA and OpenGL in the same application and demonstrate the speed of the methods used. The provided software framework is quite general and can be used for experimentation merely by changing the CUDA kernel. For example, Chapter 12 will use this framework with live video streams from a webcam. Those who wish for a more detailed discussion of OpenGL should look to the many other, far more detailed books and Internet tutorials that teach OpenGL and computer graphics. At the end of this chapter, the reader will have a basic understanding of:

■ Mixing OpenGL and CUDA in the same application.
■ How to use primitive restart and why it can generate images of higher quality than other methods.
■ The performance implications of primitive restart for CUDA and in comparison to other OpenGL rendering techniques.


■ How to use the simple general OpenGL framework in this chapter for your own kernels.
■ The use of Perlin noise to generate artificial terrain.
■ The difference between a PBO and VBO.

OPENGL

OpenGL is one of the most common programming interfaces used in visual applications from games to HPC (high-performance computing). OpenGL is standards-based and gives developers the ability to create graphics and special effects that appear nearly identical on any operating system running OpenGL-compliant hardware, making it possible for developers of 3D games and programs to port their software to multiple platforms. OpenGL is controlled by an Architectural Review Board (ARB) composed of members from many institutions, including NVIDIA, SGI, Microsoft, AMD, HP, and others. The intention of the board is to:

■ Keep the API stable.
■ Ensure that the standard evolves to reflect new hardware capabilities.
■ Allow for platform-specific features through extensions.

There are two very clear benefits of the separation (yet efficient interoperability) between CUDA and OpenGL:

■ From a programming view: When not mapped into the CUDA memory space, OpenGL gurus are free to exploit existing legacy code bases, their expertise, and the full power of all the tools available to them, such as GLSL (the OpenGL Shading Language) and Cg. CUDA programmers can demonstrate their computational prowess when the buffer is mapped into the CUDA memory space.
■ From an investment view: The mapped approach allows efficient exploitation of existing legacy OpenGL software investments. Essentially, CUDA code can be gradually added into existing legacy libraries and applications just by mapping the buffer into the CUDA memory space. This feature allows organizations to test CUDA code without significant risk and then enjoy the benefits of CUDA once they are confident of the performance and productivity rewards delivered by this programming model.
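To preview the map, launch, and unmap pattern before the details covered later in this chapter, here is a rough sketch built from the CUDA graphics interoperability runtime calls; the kernel name, resource handle, and launch configuration are placeholders, and creation of the GL context and buffer object is omitted.

#include <GL/gl.h>
#include <cuda_gl_interop.h>

// Placeholder kernel that fills a mapped vertex buffer.
__global__ void fillVertices(float4 *pos, unsigned int n) {
  unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) pos[i] = make_float4(i, 0.f, 0.f, 1.f);
}

// 'resource' is assumed to have been registered earlier with
// cudaGraphicsGLRegisterBuffer() against an OpenGL buffer object.
void runCudaOnBuffer(cudaGraphicsResource_t resource, unsigned int n) {
  float4 *dptr = 0;
  size_t numBytes = 0;

  cudaGraphicsMapResources(1, &resource, 0);               // borrow the buffer from OpenGL
  cudaGraphicsResourceGetMappedPointer((void**)&dptr, &numBytes, resource);

  fillVertices<<<(n + 255) / 256, 256>>>(dptr, n);         // treat it as ordinary device memory

  cudaGraphicsUnmapResources(1, &resource, 0);             // hand the buffer back to OpenGL
}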

GLUT

The OpenGL Utility Toolkit (GLUT) is a programming interface for writing window system–independent OpenGL programs. Applications that utilize GLUT can be compiled on many platforms. NVIDIA uses GLUT in the CUDA SDK examples.

The GLUT toolkit provides various functionalities, but only a small subset will be used in this chapter:

■ Windows for OpenGL rendering.
■ Callback-driven event processing.
■ Mouse and keyboard input devices.
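For readers new to GLUT, the bare-bones skeleton below (not the book's simpleGLmain.cpp; the window size, title, and callbacks are illustrative) shows how those three facilities fit together.

#include <GL/glut.h>
#include <cstdlib>

static void display(void) {
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  // ... issue OpenGL draw calls here ...
  glutSwapBuffers();
}

static void keyboard(unsigned char key, int x, int y) {
  if (key == 27) exit(0);          // ESC quits
}

int main(int argc, char **argv) {
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH);
  glutInitWindowSize(512, 512);
  glutCreateWindow("CUDA + OpenGL");
  glutDisplayFunc(display);        // rendering callback
  glutKeyboardFunc(keyboard);      // keyboard callback
  glutMainLoop();                  // event loop; the callbacks do the work
  return 0;
}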

Mapping GPU Memory with OpenGL

From a CUDA programmer’s point of view, OpenGL creates and manages regions of memory on the GPU in generic buffers called buffer objects. The CUDA/OpenGL interoperability happens when a CUDA kernel maps a buffer into CUDA memory space. Control of the buffer is returned to OpenGL when the buffer is released, or unmapped. Mapping is a low-overhead operation that happens quickly and provides high-speed interoperability with CUDA without requiring any memory copies. Interoperability with OpenGL requires that the CUDA device be specified by cudaGLSetGLDevice() before any other runtime calls. Note that cudaSetDevice() and cudaGLSetGLDevice() are mutually exclusive.1 Once a resource is registered to CUDA, it can be mapped and unmapped as many times as necessary using cudaGraphicsMapResources() and cudaGraphicsUnmapResources(). The method cudaGraphicsResourceSetMapFlags() can be called to provide hints (e.g., read-only, write-only) that the CUDA driver can use to optimize resource management. There are two principal OpenGL memory objects that CUDA programmers will manipulate:

1. Pixel buffer objects (PBOs): A region of memory used by OpenGL to store pixels. A 2D image is composed of multiple pixels, or dots of color. CUDA applications map a PBO to create or modify images on a pixel-by-pixel basis and display them using OpenGL.
2. Vertex buffer objects (VBOs): A region of memory that OpenGL uses for 3D vertices. CUDA applications map a VBO to generate or modify 3D information that OpenGL can render as a colored surface, wireframe image, or set of 3D points.

The following is an outline of the key OpenGL calls associated with VBO usage (excerpted from the OpenGL VBO whitepaper on http://spec.org):

■ glBindBuffer(): This function allows client-state functions to use binding buffers instead of working in absolute memory on the client side. Buffer object names are unsigned integers. The value zero is reserved. Setting

1 Please see section 3.2.7.1 of the NVIDIA C Programming Guide (May 2011), p. 38.

the buffer name to zero effectively unbinds any buffer object previously bound, and restores client memory usage for that buffer object target.
■ glBufferData(), glBufferSubData(), and glGetBufferSubData(): These functions control the size of the buffer data, provide usage hints, and allow copying to a buffer.
■ glMapBuffer() and glUnmapBuffer(): These functions lock and unlock buffers, allowing data to be loaded into them or relinquishing control to the server. A temporary pointer is returned as an entry to the beginning of the buffer, which also maps the buffer into client memory. OpenGL is responsible for how this mapping into the client’s absolute memory occurs. Because of this responsibility, mapping must be done for a short operation, and the pointer is not persistent and should not be stored for further use.

More detailed information about the CUDA API and OpenGL calls used when mixing CUDA with OpenGL can be found in parts 15 and 18 of my Doctor Dobb’s tutorial series.2,3 Another excellent source of information is Joe Stam’s 2009 NVIDIA GTC conference presentation, “What Every CUDA Programmer Should Know about OpenGL” (Stam, 2009), which is available in both PDF and video formats.4

Using Primitive Restart for 3D Performance

As mentioned in the introduction to this chapter, the examples herein utilize an OpenGL extension called primitive restart to minimize communications across the PCIe bus and to speed rendering. Primitive restart gives the programmer the ability to specify a data value that is interpreted by the OpenGL state machine as a token indicating that the current graphics primitive has completed. The next data item is assumed to be at the start of another graphics primitive of the same type. Valid graphics primitives include GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, GL_LINE_STRIP, and others. Figure 9.1 illustrates this process for two lines containing different numbers of vertices. The figure shows that glPrimitiveRestartIndexNV() is first called to specify the value of TAG to be the primitive restart token. The routine glEnableClientState() is then called to tell the OpenGL state machine to start using primitive restart. The lines are then drawn with glDrawElements(). The advantages of the primitive restart approach include:

■ All control tokens and data for viewing can be generated and kept on the GPU.

2 http://drdobbs.com/cpp/222600097.
3 http://drdobbs.com/open-source/225200412.
4 http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf.

FIGURE 9.1 Drawing two lines with primitive restart. The code in the figure is:

glPrimitiveRestartIndexNV(TAG);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
glDrawElements(GL_LINE_STRIP, qIndexSize, GL_UNSIGNED_INT, qIndices);

The index array qIndices[] = {A, B, TAG, C, D, E} defines a first primitive through vertices A and B and a second primitive through vertices C, D, and E, where pos[] holds the (x,y,z) coordinates of each vertex.

■ Variable numbers of items can be specified between the primitive restart tokens. This allows irregular grids and surfaces to be drawn, as arbitrary numbers of line segments, triangle strips, triangle fans, and so on, can be specified depending on the drawing mode passed to glDrawElements().
■ Rendering performance can be optimized by arranging the indices to achieve the highest reuse of data cache in the texture units.
■ Higher-quality images can be created by alternating the direction of tessellation as noted in the primitive restart specification and illustrated in Figures 9.2 and 9.3. The centers of the triangle fan are marked with dots in Figure 9.3.

FIGURE 9.2 Two triangle strips showing aliasing artifacts.

FIGURE 9.3 Triangle fans (center marked with filled circle).

OpenGL offers other optimized rendering methods aside from primitive restart, such as multiDraw(). However, these methods, as the primitive restart specification notes, “still remain more expensive than one would like” (Craighead, 2002). Following is a performance comparison of primitive restart against other OpenGL rendering methods. Primitive restart is clearly faster. These tests were performed using an Intel 2.3 GHz Core 2 Duo processor running Linux with an NVIDIA GTX 280 CUDA-enabled gaming GPU running the Perlin kernel from this chapter to generate a virtual terrain map. When interpreting these numbers, it is important to understand that these frame rates include the time required to recompute the 3D position and color of every vertex in the image. This represents a worst-case frame-rate scenario that demonstrates the power and speed possible with hybrid CUDA/OpenGL applications. Real applications will undoubtedly deliver much higher performance by recalculating only the minimum data necessary to render the scene.

Table 9.1 Approximate Performance Numbers on a GTX 280

Method              Observed FPS    Rough Average (FPS)
Simple one by one   470–500         500
MultiDraw           490–510         508
Primitive Restart   550–590         560

More details can be found in parts 18 and 20 of my freely available “Supercomputing for the Masses” CUDA tutorials on the Doctor Dobb’s Journal website.5,6 The timeline in Figure 9.4 from Parallel Nsight shows that the Perlin kernel consumes very little time compared to the OpenGL buffer swapping. The OpenGL API Call Summary in Parallel Nsight reports the following times for each rendering method.

■ Primitive restart: around 60 μs (microseconds).
■ Multidraw: around 3,900 μs.
■ Iteratively drawing each triangle fan: approximately 1,100,000 μs.

5 http://drdobbs.com/open-source/225200412.
6 http://drdobbs.com/tools/227400145.

FIGURE 9.4 Parallel Nsight timeline (one row for the CUDA Perlin kernel, one for the OpenGL SwapBuffer call) showing computation versus rendering time when using primitive restart.

INTRODUCTION TO THE FILES IN THE FRAMEWORK
Readers should note that care was taken in the design of the software framework so that it could be adapted to new applications. For example, this same framework was used in Chapter 12 to display and modify live video streams. To compartmentalize operations, the framework was broken into four separate files. Merely by changing the CUDA kernel, this example code can be used to render an animated sinusoidal surface or an artificial terrain that the user can explore and fly around in. The examples are known to compile and run on Linux and Windows. For clarity and flexibility, separate 3D vertex and color arrays are used within the example code. This helps speed understanding and makes data visualization as easy as writing a new kernel or loading data from disk to alter the 3D vertex array, color array, or both. Those readers who choose to create their own CUDA kernels should gain a strong practical sense of how easy and flexible visualization can be with a combined CUDA/OpenGL approach. The relationship between the four files used in the framework discussed in this chapter is illustrated in Figure 9.5. Each of the files is discussed in more detail below.

The Demo and Perlin Example Kernels
Two example kernels are provided in this chapter, demo and Perlin. Each kernel generates both 3D vertices and colors.

FIGURE 9.5 Organization of files and activities: simpleGLmain.cpp (boilerplate OpenGL and GLUT code), simpleVBO.cpp (generic OpenGL and CUDA setup and memory management), callbacksVBO.cpp (defines keyboard, mouse, and display callbacks), and kernelVBO.cu (the CUDA data producer, which may be the demo kernel, the Perlin kernel, or your own kernel).

The Demo Kernel
The demo kernel creates an animated sinusoidal surface very similar to the NVIDIA simpleGL SDK example. It is a good test case to confirm that the code is working and to evaluate the speed of a GPU. Figure 9.6 shows a grayscale screenshot of the highly colorful surface generated with the demo kernel. Also, note that the colors will evolve with time. Keyboard input allows selecting between rendering the surface with triangles (shown in Figure 9.6), lines (not shown), or dots (shown in Figure 9.7).
The Demo Kernel to Generate a Colored Sinusoidal Surface
The demo kernel draws a time-varying sinusoidal surface by calculating a height value for each location in a 2D mesh. The height value varies with time. See Example 9.1, "Calculating the Heights of the Sinusoidal Surface":

Example 9.1
// calculate simple sine wave pattern
float freq = 4.0f;
float w = sinf(u*freq + time) * cosf(v*freq + time) * 0.5f;

FIGURE 9.6 Grayscale example of a surface created with the sinusoidal-surface VBO.

FIGURE 9.7 Grayscale of a sinusoid of points (the image colors were inverted to increase visibility).

The position and height information is stored in the float4 position, or pos, array, as shown in Example 9.2:

Example 9.2
// write output vertex
pos[y*width+x] = make_float4(u, w, v, 1.0f);

Similarly, the colors are calculated based on position in the mesh and the animation time, as shown in Example 9.3, “Calculating the Colors of the Sinusoidal Surface”:

Example 9.3
// write the color
colorPos[y*width+x].w = 0;
colorPos[y*width+x].x = 255.f *0.5*(1.f+sinf(w+x));
colorPos[y*width+x].y = 255.f *0.5*(1.f+sinf(x)*cosf(y));
colorPos[y*width+x].z = 255.f *0.5*(1.f+sinf(w+time/10.f));

The complete listing for kernelVBO is shown in Example 9.4, "The Complete Source for the demo Kernel":

Example 9.4
// sinusoidal kernel (Rob Farber)
// Simple kernel to modify vertex positions in sine wave pattern
__global__ void kernel(float4* pos, uchar4 *colorPos,
                       unsigned int width, unsigned int height, float time)
{
  unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
  unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

  // calculate uv coordinates
  float u = x/(float) width;
  float v = y/(float) height;
  u = u*2.0f - 1.0f;
  v = v*2.0f - 1.0f;

  // calculate simple sine wave pattern
  float freq = 4.0f;
  float w = sinf(u*freq + time) * cosf(v*freq + time) * 0.5f;

  // write output vertex
  pos[y*width+x] = make_float4(u, w, v, 1.0f);
  colorPos[y*width+x].w = 0;
  colorPos[y*width+x].x = 255.f *0.5*(1.f+sinf(w+x));
  colorPos[y*width+x].y = 255.f *0.5*(1.f+sinf(x)*cosf(y));
  colorPos[y*width+x].z = 255.f *0.5*(1.f+sinf(w+time/10.f));
}

The launch_kernel() method calculates the execution configuration and queues the launch of the demo kernel with the appropriate parameters, as

shown in Example 9.5, “The Source Showing the Logic to Launch the demo Kernel”:

Example 9.5
// Wrapper for the __global__ call that sets up the kernel call
extern "C" void launch_kernel(float4* pos, uchar4* colorPos,
                              unsigned int mesh_width,
                              unsigned int mesh_height, float time)
{
  // execute the kernel
  dim3 block(8, 8, 1);
  dim3 grid(mesh_width / block.x, mesh_height / block.y, 1);
  kernel<<< grid, block>>>(pos, colorPos, mesh_width, mesh_height, time);
}

Perlin Noise
Many people use random number generators to add variation and unpredictability to their applications. Landscapes exhibit both variation and seeming unpredictability, but not purely at random. Instead they vary at different scales, meaning that they have various levels of detail. A mountain range demonstrates this variation in scale:

■ Large scale: the outline of the mountain range.
■ Medium scale: various hills, valleys, and other features.
■ Small variations: boulders and rock outcroppings are common examples.
■ Tiny variations: stones and the bumps you see when hiking on a trail.
Perlin noise is a function for generating coherent noise over a space. Coherent noise means that for any two points in the space, the value of the noise function changes smoothly as you move from one point to the other; that is, there are no discontinuities. Natural phenomena tend to exhibit the same pattern of large and small variations. The Perlin noise function recreates this natural effect by simply adding up noisy functions at a range of different scales. In 1997, Ken Perlin received an Academy Award for developing the Perlin noise generator. Perlin noise has a multitude of uses ranging from the creation of natural textures to artificial terrain and even worlds! Numerous websites discuss Perlin noise. Ken Perlin's homepage is an excellent place to start.

7 http://cs.nyu.edu/~perlin/.

The Perlin noise kernel will be used to create a height map of the virtual terrain shown in Figures 9.8 through 9.10. The user can fly around this virtual world and dynamically alter it with the keyboard commands defined in callbacksVBO.cu. Figure 9.8 shows the virtual terrain rendered as a surface with triangles. Figure 9.9 shows the surface rendered as a wireframe with lines, and Figure 9.10 shows a pilot's-eye view of the artificial terrain during a virtual "flight."

FIGURE 9.8 Grayscale example of a 3D surface created with the Perlin noise kernel.

FIGURE 9.9 Grayscale of a terrain wireframe.

FIGURE 9.10 Grayscale version of a pilot’s-eye view.

Using the Perlin Noise Kernel to Generate Artificial Terrain
The following example uses the Improved Perlin Noise generator from parts 15 and 18 of my "Supercomputing for the Masses" tutorial series on the Doctor Dobb's Journal website. For implementation simplicity, fBm (Fractal Brownian Motion) was chosen to generate the fractal terrain, which is simply a weighted sum of multiple scales of an arbitrary basis function, such as noise. Better methods exist to create more realistic landscapes, as noted in "Fractal Landscape and Texture Generation" on the Max Planck Institute website (Max Planck Institute, 2004). The kernel starting in Example 9.6 is slightly more complicated than the demo kernel. The first part of the kernel specifies the include files, variables, and methods used in the Perlin noise calculation. A discussion and link to the reference paper is found on the NYU Media Research Lab website.
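Before diving into the kernel, the weighted-sum idea behind fBm can be sketched in a few lines of host code. This is purely a conceptual illustration: the noise2D() stand-in below (a simple product of sines) is an assumption made for the sketch and is not the Perlin basis function the kernel actually uses.

#include <cmath>

// Stand-in basis function for illustration only; the real kernel below uses
// a device implementation of Improved Perlin Noise (inoise()).
static float noise2D(float x, float y) { return sinf(x) * cosf(y); }

// fBm as a weighted sum of the basis function: each octave samples the basis
// at a higher frequency (scaled by lacunarity) and adds it with a smaller
// amplitude (scaled by gain).
static float fBmSketch(float x, float y, int octaves,
                       float lacunarity = 2.0f, float gain = 0.5f)
{
  float freq = 1.0f, amp = 0.5f, sum = 0.0f;
  for (int i = 0; i < octaves; ++i) {
    sum  += noise2D(x * freq, y * freq) * amp;
    freq *= lacunarity;
    amp  *= gain;
  }
  return sum;
}

Increasing octaves adds finer and finer detail, which is exactly the large-to-tiny variation in scale described above for a mountain range.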

8 http://drdobbs.com/cpp/222600097.
9 http://drdobbs.com/open-source/225200412.
10 http://www.mpi-inf.mpg.de/departments/irg3/ws0405/cg/rcomp/29/x173.html.
11 http://vlg.cs.nyu.edu/.

Example 9.6
//Perlin kernel (Rob Farber)
#include
#include
#include
#include

extern float gain, xStart, yStart, zOffset, octaves, lacunarity;
#define Z_PLANE 50.f

__constant__ unsigned char c_perm[256];
__shared__ unsigned char s_perm[256]; // shared memory copy of permutation array
unsigned char* d_perm=NULL; // global memory copy of permutation array
// host version of permutation array
const static unsigned char h_perm[] = {151,160,137,91,90,15,
  131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,
  37,240,21,10,23,190, 6,148,247,120,234,75,0,26,197,62,94,252,
  219,203,117,35,11,32,57,177,33,88,237,149,56,87,174,20,125,
  136,171,168, 68,175,74,165,71,134,139,48,27,166,77,146,158,
  231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,
  40,244,102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,
  208, 89,18,169,200,196,135,130,116,188,159,86,164,100,109,198,
  173,186, 3,64,52,217,226,250,124,123,5,202,38,147,118,126,255,
  82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,223,183,
  170,213,119,248,152,2,44,154,163, 70,221,153,101,155,167, 43,
  172,9,129,22,39,253, 19,98,108,110,79,113,224,232,178,185,
  112,104,218,246,97,228,251,34,242,193,238,210,144,12,191,179,
  162,241, 81,51,145,235,249,14,239,107,49,192,214, 31,181,199,
  106,157,184,84,204,176,115,121,50,45,127, 4,150,254,138,236,
  205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180 };

__device__ inline int perm(int i) { return(s_perm[i&0xff]); }
__device__ inline float fade(float t) { return t*t*t*(t*(t*6.f - 15.f) + 10.f); }
__device__ inline float lerpP(float t, float a, float b) { return a + t*(b - a); }
__device__ inline float grad(int hash, float x, float y, float z)
{
  int h = hash & 15;      // CONVERT LO 4 BITS OF HASH CODE
  float u = h<8 ? x : y,  // INTO 12 GRADIENT DIRECTIONS.
        v = h<4 ? y : h==12||h==14 ? x : z;
  return ((h&1) == 0 ? u : -u) + ((h&2) == 0 ? v : -v);
}

__device__ float inoise(float x, float y, float z)
{
  int X = ((int)floorf(x)) & 255, // FIND UNIT CUBE THAT
      Y = ((int)floorf(y)) & 255, // CONTAINS POINT.
      Z = ((int)floorf(z)) & 255;

  x -= floorf(x); // FIND RELATIVE X,Y,Z
  y -= floorf(y); // OF POINT IN CUBE.
  z -= floorf(z);
  float u = fade(x), // COMPUTE FADE CURVES
        v = fade(y), // FOR EACH OF X,Y,Z.
        w = fade(z);
  int A = perm(X)+Y,   AA = perm(A)+Z, AB = perm(A+1)+Z,  // HASH COORDINATES OF
      B = perm(X+1)+Y, BA = perm(B)+Z, BB = perm(B+1)+Z;  // THE 8 CUBE CORNERS,

  return lerpP(w, lerpP(v, lerpP(u, grad(perm(AA  ), x    , y    , z     ),  // AND ADD
                                    grad(perm(BA  ), x-1.f, y    , z     )), // BLENDED
                           lerpP(u, grad(perm(AB  ), x    , y-1.f, z     ),  // RESULTS
                                    grad(perm(BB  ), x-1.f, y-1.f, z     ))),// FROM 8
                  lerpP(v, lerpP(u, grad(perm(AA+1), x    , y    , z-1.f ),  // CORNERS
                                    grad(perm(BA+1), x-1.f, y    , z-1.f )), // OF CUBE
                           lerpP(u, grad(perm(AB+1), x    , y-1.f, z-1.f ),
                                    grad(perm(BB+1), x-1.f, y-1.f, z-1.f ))));
}

__device__ float fBm(float x, float y, int octaves,
                     float lacunarity = 2.0f, float gain = 0.5f)
{
  float freq = 1.0f, amp = 0.5f;
  float sum = 0.f;
  // weighted sum of octaves (the standard fBm loop described above)
  for(int i=0; i < octaves; i++) {
    sum += inoise(x*freq, y*freq, Z_PLANE)*amp;
    freq *= lacunarity;
    amp *= gain;
  }
  return sum;
}

The colorElevation() method returns a pixel color based on the elevation in the terrain. The colors were chosen to give the user a sense of looking at a map. See Example 9.7, “Part 2 of the Improved Perlin Noise Kernel”:

Example 9.7
__device__ inline uchar4 colorElevation(float texHeight)
{
  uchar4 pos;

  // color textel (r,g,b,a)
  if      (texHeight < -1.000f) pos = make_uchar4(  0,   0, 128, 255); //deeps
  else if (texHeight < -.2500f) pos = make_uchar4(  0,   0, 255, 255); //shallow
  else if (texHeight < 0.0000f) pos = make_uchar4(  0, 128, 255, 255); //shore
  else if (texHeight < 0.0125f) pos = make_uchar4(240, 240,  64, 255); //sand
  else if (texHeight < 0.1250f) pos = make_uchar4( 32, 160,   0, 255); //grass
  else if (texHeight < 0.3750f) pos = make_uchar4(224, 224,   0, 255); //dirt
  else if (texHeight < 0.7500f) pos = make_uchar4(128, 128, 128, 255); //rock
  else                          pos = make_uchar4(255, 255, 255, 255); //snow

  return(pos);
}

A method to check for errors is shown in Example 9.8, “Part 3 of the Improved Perlin Noise Kernel”:

Example 9.8
void checkCUDAError(const char *msg)
{
  cudaError_t err = cudaGetLastError();
  if( cudaSuccess != err) {
    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );
    exit(EXIT_FAILURE);
  }
}

The k_perlin() kernel calls the Perlin noise function to generate the terrain map. Regions that are below sea level are set to zero. The call to cudaThreadSynchronize() is important because it causes the host to wait until after the kernel has completed updating the OpenGL buffers. See Example 9.9, "Part 4 of the Improved Perlin Noise Kernel":

Example 9.9
//Simple kernel fills an array with perlin noise
__global__ void k_perlin(float4* pos, uchar4 *colorPos,
                         unsigned int width, unsigned int height,
                         float2 start, float2 delta, float gain, float zOffset,
                         unsigned char* d_perm, float octaves, float lacunarity)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  float xCur = start.x + ((float) (idx%width)) * delta.x;
  float yCur = start.y + ((float) (idx/width)) * delta.y;

  if(threadIdx.x < 256) // Optimization: this causes bank conflicts
    s_perm[threadIdx.x] = d_perm[threadIdx.x];
  // this synchronization can be important if there are more than 256 threads
  __syncthreads();

  // Each thread creates one pixel location in the texture (textel)
  if(idx < width*height) {
    float w = fBm(xCur, yCur, octaves, lacunarity, gain) + zOffset;
    colorPos[idx] = colorElevation(w);
    float u = ((float) (idx%width))/(float) width;
    float v = ((float) (idx/width))/(float) height;
    u = u*2.f - 1.f;
    v = v*2.f - 1.f;
    w = (w>0.f)?w:0.f; // don't show regions underwater
    pos[idx] = make_float4( u, w, v, 1.0f);
  }
}

uchar4 *eColor=NULL;
// Wrapper for the __global__ call that sets up the kernel call
extern "C" void launch_kernel(float4 *pos, uchar4* posColor,
                              unsigned int image_width,
                              unsigned int image_height, float time)
{
  int nThreads=256; // must be equal or larger than 256! (see s_perm)
  int totalThreads = image_height * image_width;
  int nBlocks = totalThreads/nThreads;
  nBlocks += ((totalThreads%nThreads)>0)?1:0;

  float xExtent = 10.f;
  float yExtent = 10.f;
  float xDelta = xExtent/(float)image_width;
  float yDelta = yExtent/(float)image_height;

  if(!d_perm) { // for convenience allocate and copy d_perm here
    cudaMalloc((void**) &d_perm,sizeof(h_perm));
    cudaMemcpy(d_perm,h_perm,sizeof(h_perm), cudaMemcpyHostToDevice);
    checkCUDAError("d_perm malloc or copy failed!");
  }

  k_perlin<<< nBlocks, nThreads>>>(pos, posColor, image_width, image_height,
                                   make_float2(xStart, yStart),
                                   make_float2(xDelta, yDelta),
                                   gain, zOffset, d_perm,
                                   octaves, lacunarity);

  // make certain the kernel has completed
  cudaThreadSynchronize();
  checkCUDAError("kernel failed!");
}

The simpleGLmain.cpp File
The simpleGLmain.cpp file opens a window on the screen and sets some basic viewing transforms. The call to gluPerspective() places a camera in a three-dimensional location from which to view the data generated with CUDA. Three-dimensional rendering occurs in OpenGL when the programmer:

■ Specifies objects in 3D space using simple triangles, vertices, and lines.
■ Defines a virtual camera position and viewing angle.
OpenGL can then identify and update the display pixels as the data and/or viewing position changes. Rendering requires the following 3D transforms:
1. Position and point the camera at the scene (a view transformation).
2. Arrange the scene composition (a model transform).
3. Adjust the camera zoom (a projection transform).
4. Choose the final size (a viewport transform).
OpenGL view, model, projection, and viewport transforms plus specification of the coordinate system require very detailed thinking and explanation. Song Ho Ahn has an excellent set of tutorials, including visual aids to help understand the details of OpenGL transforms and the OpenGL rendering pipeline, the differences between pixel and geometry rendering, the OpenGL projection matrix, and much more. Numerous other excellent sources are also available, including the online version of the OpenGL Red Book.

12 http://songho.ca/opengl.

FIGURE 9.11 Schematic of GLUT VBO code interactions: main() calls initGL() (create the OpenGL window and set up viewing coordinates) and initCuda() (select the CUDA device, create the color and vertex VBOs, and register the VBO buffers), registers the application-specific display, keyboard, mouse, and motion callbacks, and then enters glutMainLoop(), which dispatches display(), keyboard(), mouse(), and motion() until exit().

The schematic in Figure 9.11 summarizes how the VBO example code interacts with GLUT. The following example is the complete source code for the VBO version of simpleGLmain.cpp. This is fairly generic code that should not require modification. The start of the file specifies the needed include files and prototypes for the external methods. The main() routine initializes a timer to calculate frame rates, calls a user-defined method to initialize the CUDA kernel(s), registers the user-defined callbacks, and calls the GLUT main loop, as shown in Example 9.10, "Part 1 of simpleGLmain.cpp":

13 http://www.opengl.org/documentation/red_book/.

Example 9.10
// simpleGLmain (Rob Farber)
#include
#include
#include
#include
#include
#include

// GLUT specific constants
const unsigned int window_width = 512;
const unsigned int window_height = 512;

// The user must create the following routines:
void initCuda(int argc, char** argv);
CUTBoolean initGL(int argc, char** argv);
void fpsDisplay(), display();
void keyboard(unsigned char key, int x, int y);
void mouse(int button, int state, int x, int y);
void motion(int x, int y);

unsigned int timer = 0; // a timer for FPS calculations
int sleepTime=0, sleepInc=100;

// Main program
int main(int argc, char** argv)
{
  // Create the CUTIL timer
  cutilCheckError( cutCreateTimer( &timer));

  if (CUTFalse == initGL(argc, argv)) {
    return CUTFalse;
  }

  initCuda(argc, argv);
  CUT_CHECK_ERROR_GL();

  // register callbacks
  glutDisplayFunc(fpsDisplay);
  glutKeyboardFunc(keyboard);
  glutMouseFunc(mouse);
  glutMotionFunc(motion);

  // start rendering mainloop
  glutMainLoop();

  // clean up
  cudaThreadExit();
  cutilExit(argc, argv);
}

Example 9.11, “Part 2 of simpleGLmain.cpp,” computes the frame rate and displays it in the window title.

Example 9.11
// Simple method to display the frames per second in the window title
void computeFPS()
{
  static int fpsCount=0;
  static int fpsLimit=100;

  fpsCount++;

  if (fpsCount == fpsLimit) {
    char fps[256];
    float ifps = 1.f / (cutGetAverageTimerValue(timer) / 1000.f);
    if(sleepTime)
      sprintf(fps, "CUDA Interop (Rob Farber): %3.1f fps sleepTime %3.1f ms ",
              ifps, sleepTime/1000.);
    else
      sprintf(fps, "CUDA Interop (Rob Farber): %3.1f fps ", ifps);

    glutSetWindowTitle(fps);
    fpsCount = 0;

    cutilCheckError(cutResetTimer(timer));
  }
}

void fpsDisplay()
{
  cutilCheckError(cutStartTimer(timer));
  display();
  cutilCheckError(cutStopTimer(timer));
  computeFPS();
}

The GLUT and OpenGL initialization creates a window and specifies a viewing location in the 3D space, as shown in Example 9.12, “Part 3 of simpleGLmain.cpp”:

Example 9.12
float animTime = 0.0; // time the animation has been running

// Initialize OpenGL window
CUTBoolean initGL(int argc, char **argv)
{
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
  glutInitWindowSize(window_width, window_height);
  glutCreateWindow("CUDA GL Interop Demo (adapted from NVIDIA's simpleGL)");
  glutDisplayFunc(fpsDisplay);
  glutKeyboardFunc(keyboard);
  glutMotionFunc(motion);

  // initialize necessary OpenGL extensions
  glewInit();
  if (! glewIsSupported("GL_VERSION_2_0 ")) {
    fprintf(stderr, "ERROR: Support for necessary OpenGL extensions missing.");
    return CUTFalse;
  }

  // default initialization
  glClearColor(0.0, 0.0, 0.0, 1.0);
  glDisable(GL_DEPTH_TEST);

  // viewport
  glViewport(0, 0, window_width, window_height);

  // set view matrix
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();

  // projection
  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  gluPerspective(60.0, (GLfloat)window_width/(GLfloat)window_height, 0.10, 10.0);

  return CUTTrue;
}

The simpleVBO.cpp File
The simpleVBO.cpp file contains the logic that creates and maps both the color PBO and the vertex VBO. The start of the file specifies the include files and the variables used in the file. The variables mesh_width and mesh_height specify the size of the mesh calculated on the GPU. The variable RestartIndex specifies the integer value used by the OpenGL state machine to restart the user-defined graphics primitive. See Example 9.13, "Part 1 of simpleVBO.cpp":

Example 9.13
//simpleVBO (Rob Farber)
#include
#include
#include
#include
#include
#include
#include

extern float animTime;

//////////////////////////////////////////////////////////////////
// VBO specific code
#include

// constants
const unsigned int mesh_width = 256;
const unsigned int mesh_height = 256;
const unsigned int RestartIndex = 0xffffffff;

typedef struct {
  GLuint vbo;
  GLuint typeSize;
  struct cudaGraphicsResource *cudaResource;
} mappedBuffer_t;

extern "C" void launch_kernel(float4* pos, uchar4* posColor,
                              unsigned int mesh_width,
                              unsigned int mesh_height, float time);

// vbo variables
mappedBuffer_t vertexVBO = {NULL, sizeof(float4), NULL};
mappedBuffer_t colorVBO  = {NULL, sizeof(uchar4), NULL};
GLuint* qIndices=NULL; // index values for primitive restart
int qIndexSize=0;

The createVBO() method performs the actual allocation of the graphics buffer on the GPU through the call to glBufferData(). The GL_DYNAMIC_DRAW flag lets OpenGL know that this data store will be repeatedly modified and used. The buffer object is registered for access with CUDA with the call to cudaGraphicsGLRegisterBuffer(). The deleteVBO() method unregisters and frees the memory held by the OpenGL buffer object. See Example 9.14, "Part 2 of simpleVBO.cpp":

Example 9.14
/////////////////////////////////////////////////////////////////
//! Create VBO
/////////////////////////////////////////////////////////////////
void createVBO(mappedBuffer_t* mbuf)
{
  // create buffer object
  glGenBuffers(1, &(mbuf->vbo) );
  glBindBuffer(GL_ARRAY_BUFFER, mbuf->vbo);

  // initialize buffer object
  unsigned int size = mesh_width * mesh_height * mbuf->typeSize;
  glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);

  glBindBuffer(GL_ARRAY_BUFFER, 0);

  cudaGraphicsGLRegisterBuffer( &(mbuf->cudaResource), mbuf->vbo,
                                cudaGraphicsMapFlagsNone );
}

//////////////////////////////////////////////////////////////////
//! Delete VBO
//////////////////////////////////////////////////////////////////
void deleteVBO(mappedBuffer_t* mbuf)
{
  glBindBuffer(1, mbuf->vbo );
  glDeleteBuffers(1, &(mbuf->vbo) );

  cudaGraphicsUnregisterResource( mbuf->cudaResource );
  mbuf->cudaResource = NULL;
  mbuf->vbo = NULL;
}

void cleanupCuda()
{
  if(qIndices) free(qIndices);
  deleteVBO(&vertexVBO);
  deleteVBO(&colorVBO);
}

The runCuda() method performs all the work of mapping and retrieving the pointer to both the color PBO and the vertex VBO. These addresses are passed to the launch_kernel() method for use by the user-defined kernel. Note that launch_kernel() waits for the kernel to complete before returning, which is why it is safe to unmap the OpenGL resources after this method returns. See Example 9.15, "Part 3 of simpleVBO.cpp":

Example 9.15
//////////////////////////////////////////////////////////////////
//! Run the CUDA part of the computation
//////////////////////////////////////////////////////////////////
void runCuda()
{
  // map OpenGL buffer object for writing from CUDA
  float4 *dptr;
  uchar4 *cptr;
  uint *iptr;
  size_t start;

  cudaGraphicsMapResources( 1, &vertexVBO.cudaResource, NULL );
  cudaGraphicsResourceGetMappedPointer( ( void ** )&dptr, &start,
                                        vertexVBO.cudaResource );
  cudaGraphicsMapResources( 1, &colorVBO.cudaResource, NULL );
  cudaGraphicsResourceGetMappedPointer( ( void ** )&cptr, &start,
                                        colorVBO.cudaResource );

  // execute the kernel
  launch_kernel(dptr, cptr, mesh_width, mesh_height, animTime);

  // unmap buffer object
  cudaGraphicsUnmapResources( 1, &vertexVBO.cudaResource, NULL );
  cudaGraphicsUnmapResources( 1, &colorVBO.cudaResource, NULL );
}

The initCuda() method chooses the fastest device according to the NVIDIA documentation. It makes the appropriate calls to allocate the OpenGL buffers. The qIndices array needed for rendering with primitive restart is allocated and initialized. See Example 9.16, “Part 4 of simpleVBO.cpp”:

Example 9.16
void initCuda(int argc, char** argv)
{
  // First initialize OpenGL context, so we can properly set the GL
  // for CUDA. NVIDIA notes this is necessary in order to achieve
  // optimal performance with OpenGL/CUDA interop. Use the command-line
  // specified CUDA device; otherwise use device with highest Gflops/s
  if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") ) {
    cutilGLDeviceInit(argc, argv);
  } else {
    cudaGLSetGLDevice( cutGetMaxGflopsDeviceId() );
  }

  createVBO(&vertexVBO);
  createVBO(&colorVBO);

  // allocate and assign triangle fan indices
  qIndexSize = 5*(mesh_height-1)*(mesh_width-1);
  qIndices = (GLuint *) malloc(qIndexSize*sizeof(GLint));
  int index=0;
  for(int i=1; i < mesh_height; i++) {
    for(int j=1; j < mesh_width; j++) {
      qIndices[index++] = (i)*mesh_width + j;
      qIndices[index++] = (i)*mesh_width + j-1;
      qIndices[index++] = (i-1)*mesh_width + j-1;
      qIndices[index++] = (i-1)*mesh_width + j;
      qIndices[index++] = RestartIndex;
    }
  }

  // make certain the VBO gets cleaned up on program exit
  atexit(cleanupCuda);

  runCuda();
}

The renderCuda() method binds the buffers with the appropriate type and size information for use in rendering. Rendering is performed as defined by drawMode. Note the simplicity of the call when rendering triangles with primitive restart. The OpenGL state machine is informed of the value of the restart index via the glPrimitiveRestartIndexNV() method, after which primitive restart is enabled in the OpenGL client state machine. The call to glDrawElements() causes data to be rendered. Once completed, primitive restart is disabled in the OpenGL state machine. See Example 9.17, “Part 5 of simpleVBO.cpp”:

Example 9.17
void renderCuda(int drawMode)
{
  glBindBuffer(GL_ARRAY_BUFFER, vertexVBO.vbo);
  glVertexPointer(4, GL_FLOAT, 0, 0);
  glEnableClientState(GL_VERTEX_ARRAY);

  glBindBuffer(GL_ARRAY_BUFFER, colorVBO.vbo);
  glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
  glEnableClientState(GL_COLOR_ARRAY);

  switch(drawMode) {
    case GL_LINE_STRIP:
      for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
        glDrawArrays(GL_LINE_STRIP, i, mesh_width);
      break;
    case GL_TRIANGLE_FAN: {
      glPrimitiveRestartIndexNV(RestartIndex);
      glEnableClientState(GL_PRIMITIVE_RESTART_NV);
      glDrawElements(GL_TRIANGLE_FAN, qIndexSize, GL_UNSIGNED_INT, qIndices);
      glDisableClientState(GL_PRIMITIVE_RESTART_NV);
    } break;
    default:
      glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
      break;
  }

  glDisableClientState(GL_VERTEX_ARRAY);
  glDisableClientState(GL_COLOR_ARRAY);
}

The callbacksVBO.cpp File
The keyboard routine is very simple. Basically, it allows the user to toggle through the display modes (point, line, surface) by pressing the "d" or "D" key. The mouse and motion routines work in concert with each other to modify the values of the rotate_x and rotate_y variables based on user mouse movements and the state of the mouse buttons. Most of the work occurs in the display routine that defines the view transforms, as shown in Example 9.18, "Code Snippet Showing Where Most of the Work Occurs in the callbacksVBO.cu":

Example 9.18
// set view matrix
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0, 0.0, translate_z);
glRotatef(rotate_x, 1.0, 0.0, 0.0);
glRotatef(rotate_y, 0.0, 1.0, 0.0);

// run CUDA kernel to generate vertex positions
runCuda();

// render the data
renderCuda(drawMode);

The CUDA kernel is then called to create the data with runCuda() and render it with renderCuda(). The buffers are swapped so that the latest version can be made visible, and GLUT is informed that the display needs to be updated. The animation time is also incremented. See Example 9.19, "Logic to Swap the OpenGL Buffers and Post a Redisplay Event to OpenGL":

Example 9.19
glutSwapBuffers();
glutPostRedisplay();

animTime += 0.01;

Table 9.2 lists the keyboard commands and the applicable kernels.

Table 9.2 Keyboard Commands for the Demo and Perlin Examples

Key   Applicable Kernel   Action
+     Perlin              Lower the ocean level
-     Perlin              Raise the ocean level
k     Perlin              Vi-type key command to move terrain up
j     Perlin              Vi-type command to move terrain down
h     Perlin              Vi-type command to move terrain left
l     Perlin              Vi-type command to move terrain right
d     Perlin/Demo         Toggle draw mode
D     Perlin/Demo         Toggle draw mode
I     Perlin              Increase gain by 0.25
i     Perlin              Decrease gain by 0.25
O     Perlin              Increase octaves (number of frequencies in the fBm) by 1
o     Perlin              Decrease octaves by 1
P     Perlin              Increase lacunarity (the gap between successive frequencies) by 0.25
p     Perlin              Decrease lacunarity by 0.25
S     Perlin/Demo         Slow down rendering by 100 microseconds per frame
s     Perlin/Demo         Speed rendering by 100 microseconds per frame

The complete source code is provided in Example 9.20, "Part 1 of callbacksVBO.cpp." The start of the file defines the include files and required variables:

Example 9.20
//callbacksVBO (Rob Farber)
#include
#include
#include
#include
#include

// The user must create the following routines:
void initCuda(int argc, char** argv);
void runCuda();
void renderCuda(int);

// Callback variables
extern float animTime;
extern int sleepTime, sleepInc;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -3.0;

// break the file modularity so that both Perlin and demo kernels build
// some initial values for Perlin
float gain=0.75f, xStart=2.f, yStart=1.f;
float zOffset = 0.0f, octaves = 2.f, lacunarity = 2.0f;

The GLUT display callback is called whenever there is a need to draw the screen. The OpenGL calls define the 3D transformation and rotation operations. The runCuda() method is called to generate the data and renderCuda() is called to render the data. The OpenGL buffers are then swapped to utilize the latest update. This use of double buffering allows the screen to be updated without causing visual artifacts such as screen flicker. See Example 9.21, "Part 2 of callbacksVBO.cpp":

Example 9.21
// GLUT callbacks display, keyboard, mouse
void display()
{
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

  // set view matrix
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  glTranslatef(0.0, 0.0, translate_z);
  glRotatef(rotate_x, 1.0, 0.0, 0.0);
  glRotatef(rotate_y, 0.0, 1.0, 0.0);

  runCuda(); // run CUDA kernel to generate vertex positions

  renderCuda(drawMode); // render the data

  glutSwapBuffers();
  glutPostRedisplay();

  animTime += 0.01;
}

The keyboard callback is straightforward code that implements the logic to be performed on each keystroke, as shown in Example 9.22, "Part 3 of callbacksVBO.cpp":

Example 9.22
void keyboard(unsigned char key, int x, int y)
{
  switch(key) {
    case('q') : case(27) : // exit
      exit(0);
      break;
    case 'd': case 'D': // Drawmode
      switch(drawMode) {
        case GL_POINTS: drawMode = GL_LINE_STRIP; break;
        case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; break;
        default: drawMode=GL_POINTS;
      } break;
    case '+': // Perlin: lower the ocean level
      zOffset += 0.01;
      zOffset = (zOffset > 1.0)? 1.0:zOffset; // guard input
      break;
    case '-': // Perlin: raise the ocean level
      zOffset -= 0.01;
      zOffset = (zOffset < -1.0)? -1.0:zOffset; // guard input
      break;
    case 'k': // move within the Perlin function
      yStart -= 0.1;
      break;
    case 'j': // move within the Perlin function
      yStart += 0.1;
      break;
    case 'l': // move within the Perlin function
      xStart += 0.1;
      break;
    case 'h': // move within the Perlin function
      xStart -= 0.1;
      break;
    case 'I': // Perlin: change gain
      gain += 0.25;
      break;
    case 'i': // Perlin: change gain
      gain -= 0.25;
      gain = (gain < 0.25)?0.25:gain; // guard input
      break;
    case 'O': // Perlin: change octaves
      octaves += 1.0f;
      octaves = (octaves > 8)?8:octaves; // guard input
      break;
    case 'o': // Perlin: change octaves
      octaves -= 1.0f;
      octaves = (octaves<2)?2:octaves; // guard input
      break;
    case 'P': // Perlin: change lacunarity
      lacunarity += 0.25;
      break;
    case 'p': // Perlin: change lacunarity
      lacunarity -= 0.25;
      lacunarity = (lacunarity<0.2)?0.2:lacunarity; // guard input
      break;
    case 'S': // Slow the simulation down
      sleepTime += 100;
      break;
    case 's': // Speed the simulation up
      sleepTime = (sleepTime > 0)?sleepTime -= sleepInc:0;
      break;
  }
  glutPostRedisplay();
}

The mouse and motion callbacks handle mouse events to enable rotation and scaling. The motion callback for a window is called when the mouse moves within the window while one or more mouse buttons are pressed, as seen in Example 9.23, “Part 4 of callbacksVBO.cpp”:

Example 9.23
void mouse(int button, int state, int x, int y)
{
  if (state == GLUT_DOWN) {
    mouse_buttons |= 1<<button;
  } else if (state == GLUT_UP) {
    mouse_buttons = 0; // clear the button state on release
  }

  mouse_old_x = x;
  mouse_old_y = y;
  glutPostRedisplay();
}

void motion(int x, int y)
{
  float dx, dy;
  dx = x - mouse_old_x;
  dy = y - mouse_old_y;

  if (mouse_buttons & 1) {
    rotate_x += dy * 0.2;
    rotate_y += dx * 0.2;
  } else if (mouse_buttons & 4) {
    translate_z += dy * 0.01;
  }
  rotate_x = (rotate_x < -60.)?-60.:(rotate_x > 60.)?60:rotate_x;
  rotate_y = (rotate_y < -60.)?-60.:(rotate_y > 60.)?60:rotate_y;

  mouse_old_x = x;
  mouse_old_y = y;
}

The testDemo example can be built with a variation of the bash script in Example 9.24, “An Example Script to Build testDemo.” The bold items specify that this is a demo example that will create the testDemo executable. This script assumes that the callbacks and kernel are kept in the demo directory:

Example 9.24
#!/bin/bash
DIR=demo
SDK_PATH=…/cuda/4.0
SDK_LIB0=$SDK_PATH/C/lib
SDK_LIB1=…/4.0/CUDALibraries/common/lib/linux
echo $SDK_PATH
nvcc -O3 -L $SDK_LIB0 -L $SDK_LIB1 -I $SDK_PATH/C/common/inc \
  simpleGLmain.cpp simpleVBO.cpp $DIR/callbacksVBO.cpp $DIR/kernelVBO.cu \
  -lglut -lGLEW_x86_64 -lGLU -lcutil_x86_64 -o testDemo

The testPerlin example can be built with a variation of the bash script in Example 9.25, "An Example Script to Build testPerlin." The bold items specify that this is the Perlin example that will create the testPerlin executable. This script assumes that the callbacks and kernel are kept in the perlin directory:

Example 9.25
#!/bin/bash
DIR=perlin
SDK_PATH=/ichec/packages/cuda/4.0
SDK_LIB0=$SDK_PATH/C/lib
SDK_LIB1=/ichec/packages/cuda/4.0/CUDALibraries/common/lib/linux
echo $SDK_PATH
nvcc -O3 -L $SDK_LIB0 -L $SDK_LIB1 -I $SDK_PATH/C/common/inc \
  simpleGLmain.cpp simpleVBO.cpp $DIR/callbacksVBO.cpp $DIR/kernelVBO.cu \
  -lglut -lGLEW_x86_64 -lGLU -lcutil_x86_64 -o testPerlin

SUMMARY
Mixing CUDA and visualization opens tremendous opportunities for commercial games and visual products as well as scientific applications. The examples in this chapter demonstrate that the current generation of CUDA-enabled graphics processors can both render and generate very complex data at hundreds of frames per second. In particular, this chapter attempts to point the way to an extraordinarily simple and flexible way for CUDA developers to generate and render 3D images

using the OpenGL standards-compliant primitive restart capability so that minimal host processor interaction is required. As a result, PCIe bottlenecks and latencies can be avoided to deliver high-performance, high-quality graphics even when the images require irregular meshes and/or computationally expensive data generation. Of course, this issue is of interest when generating very realistic images on high-end GPUs, but do not forget that this same technique can enable product penetration into the mid- and lower-performance markets as well.

CHAPTER 10

CUDA in a Cloud and Cluster Environments

Distributed GPUs are powering the fastest supercomputers as CUDA scales to thousands of nodes using MPI, the de facto standard for distributed computing. NVIDIA's GPUDirect has accelerated the key operations in MPI, send (MPI_Send) and receive (MPI_Recv), for communication over the network instead of the PCIe bus. As the name implies, GPUDirect moves data between GPUs without involving the processor on any of the host systems. This chapter focuses on the use of MPI and GPUDirect so that CUDA programmers can incorporate these APIs to create applications for cloud computing and computational clusters. The performance benefits of distributed GPU computing are very real but dependent on the bandwidth and latency characteristics of the distributed communications infrastructure. For example, the Chinese Nebulae supercomputer can deliver a peak 2.98 PFlops (or 2,980 trillion floating-point operations per second) to run some of the most computationally intensive applications ever attempted. Meanwhile, commodity GPU clusters and cloud computing provide turn-key resources for users and organizations. To run effectively in a distributed environment, CUDA developers must develop applications and use algorithms that can scale given the limitations of the communications network. In particular, network bandwidth and latency are of paramount importance. At the end of this chapter, the reader will have a basic understanding of:

■ The Message Passing Interface (MPI).
■ NVIDIA's GPUDirect 2.0 technology.
■ How balance ratios act as an indicator of application performance on new platforms.
■ The importance of latency and bandwidth to distributed and cloud computing.
■ Strong scaling.


THE MESSAGE PASSING INTERFACE (MPI)
MPI is a standard library based on the consensus of the MPI Forum (http://www.mpi-forum.org/), which has more than 40 participating organizations, including vendors, researchers, software library developers, and users. The goal of the forum is to establish a portable, efficient, and flexible standard that will be widely used in a variety of languages. Wide adoption has made MPI the "industry standard" even though it is not an IEEE (Institute of Electrical and Electronics Engineers) or ISO (International Organization for Standardization) standard. It is safe to assume that some version of MPI will be available on most distributed computing platforms regardless of vendor or operating system. Reasons for using MPI:

■ Standardization: MPI is considered a standard. It is supported on virtually all HPC platforms.
■ Portability: MPI library calls will not need to be changed when running on platforms that support a version of MPI that is compliant with the standard.
■ Performance: Vendor implementations can exploit native hardware features to optimize performance. One example is the optimized data transport provided by GPUDirect.
■ Availability: A variety of implementations are available from both vendors and public domain software.

The MPI Programming Model
MPI is an API that was originally designed to support distributed computing for C and FORTRAN applications. It was first implemented in 1992, with the first standard appearing in 1994. Since then, language bindings and wrappers have been created for most application languages, including Perl, Python, Ruby, and Java. C bindings use the format MPI_Xxxx in which "Xxxx" specifies the operation. The methods MPI_Send() and MPI_Recv() are two examples that use this binding. Just as with a CUDA execution configuration, MPI defines a parallel computing topology to connect groups of processes in an MPI session. It is important to note that with MPI the session size is fixed for the lifetime of the application. This differs from MapReduce (discussed later in this chapter), which is a popular framework for cloud computing that was designed for fault-tolerant distributed computing. All parallelism is explicit with MPI, which means that the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms with the MPI constructs.

The MPI Communicator
A fundamental concept in MPI is the communicator, a distributed object that supports both collective and point-to-point communication. As the name implies, collective communications refers to those MPI functions involving all the processors within the defined communicator group. Point-to-point communications are used by individual MPI processes to send messages to each other. By default, MPI creates the MPI_COMM_WORLD communicator immediately after the call to MPI_Init(). MPI_COMM_WORLD includes all the MPI processes in the application. An MPI application can create multiple, separate communicators to separate messages associated with one set of tasks or group of processes from those associated with another. This chapter uses only the default communicator. Look to the many MPI books and tutorials on the Internet for more information about the use of communicator groups.
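For the curious, the short sketch below shows one way additional communicators can be created with MPI_Comm_split(); it is illustrative only and is not used by any example in this chapter.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int worldRank, subRank;
  MPI_Comm subComm;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);

  // Split MPI_COMM_WORLD into two groups (even and odd world ranks);
  // processes that pass the same "color" end up in the same communicator.
  MPI_Comm_split(MPI_COMM_WORLD, worldRank % 2, worldRank, &subComm);
  MPI_Comm_rank(subComm, &subRank);

  printf("World rank %d has rank %d in its sub-communicator\n",
         worldRank, subRank);

  MPI_Comm_free(&subComm);
  MPI_Finalize();
  return 0;
}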

MPI Rank
Within a communicator, every process has its own unique, integer identifier assigned by the system when the process initializes. A rank is sometimes also called a "task ID." Ranks are contiguous and begin at 0. They are often used in conditional operations to control execution of the MPI processes. A general paradigm in MPI applications is to use a master process, noted as rank 0, to control all other slave processes of rank greater than 0. MPI programs are generally structured as shown in Figure 10.1.

FIGURE 10.1 General structure of an MPI program: the MPI include file and declarations, serial code, initialization of the MPI environment, parallel code that does work and makes MPI calls, termination of the MPI environment, more serial code, and program end.

Example 10.1, “A Basic MPI Program,” illustrates a C implementation of this framework by having each MPI process print out its rank:

Example 10.1
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int numtasks, rank, ret;

  ret = MPI_Init(&argc,&argv);
  if (ret != MPI_SUCCESS) {
    printf ("Error in MPI_Init()!\n");
    MPI_Abort(MPI_COMM_WORLD, ret);
  }

  MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  printf ("Number of tasks= %d My rank= %d\n", numtasks,rank);

  /******* do some work *******/

  MPI_Finalize();
}

MPI is usually installed and configured by the systems administrator. See the documentation for your cluster on how to compile and run this application. NVIDIA, for example, makes the suggestion shown in Example 10.2, "NVIDIA Comment in the simpleMPI SDK Example," to build their simpleMPI SDK example:

Example 10.2
* simpleMPI.cpp: main program, compiled with mpicxx on linux/Mac platforms
* on Windows, please download the Microsoft HPC Pack SDK 2008

To build an application using CUDA and MPI in the same file, the nvcc command line in Example 10.3, "nvcc Command Line to Build basicMPI.cu," works under Linux. The command links against the MPI library:

Example 10.3
nvcc -I $MPI_INC_PATH basicMPI.cu -L $MPI_LIB_PATH -lmpich -o basicMPI

Usually mpiexec is used to start an MPI application. Some legacy implementations use mpirun. Example 10.4 shows the command and output when running the example with two MPI processes:

Example 10.4
$ mpiexec -np 2 ./basicMPI
Number of tasks= 2 My rank= 1
Number of tasks= 2 My rank= 0

Master-Slave
A common design pattern in MPI programs is a master-slave paradigm. Usually the process that is designated as rank 0 is defined as the master. This process then directs the activities of all the other processes. The code snippet in Example 10.5, "A Master-Slave MPI Snippet," illustrates how to structure master-slave MPI code:

Example 10.5
MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
if( rank == 0){
  // put master code here
} else {
  // put slave code here
}

This chapter uses the master-slave programming pattern. MPI supports many other design patterns. Consult the Internet or one of the many MPI books for more information. A good reference is Using MPI (Gropp, Lusk, & Skjellum, 1999).

Point-to-Point Basics
MPI point-to-point communication sends messages between two different MPI processes. One process performs a send operation while the other performs a matching read. MPI guarantees that every message will arrive intact without errors. Care must be exercised when using MPI, as deadlock will occur when the send and receive operations do not match. Deadlock means that neither the sending nor receiving process can proceed until the other completes its action, which will never happen when the send and receive operations do not match. CUDA avoids deadlock by sharing data between the threads of a thread block with shared memory. This frees the CUDA

programmer from having to explicitly match read and write operations but still requires the programmer to ensure that data is updated appropriately. MPI_Send() and MPI_Recv() are commonly used blocking methods for sending messages between two MPI processes. Blocking means that the sending process will wait until the complete message has been correctly sent and the receiving process will block while waiting to correctly receive the complete message. More complex communications methods can be built upon these two methods. Message size is determined by the count of an MPI data type and the type of the data. For portability, MPI defines the elementary data types. Table 10.1 lists data types required by the standard.

Table 10.1 Data Types Required by the MPI Standard

C Data Types:
MPI_CHAR             signed char
MPI_SHORT            signed short int
MPI_INT              signed int
MPI_LONG             signed long int
MPI_UNSIGNED_CHAR    unsigned char
MPI_UNSIGNED_SHORT   unsigned short int
MPI_UNSIGNED         unsigned int
MPI_UNSIGNED_LONG    unsigned long int
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double
MPI_BYTE             8 binary digits
MPI_PACKED           For data used with MPI_Pack()/MPI_Unpack()

Fortran Data Types:
MPI_CHARACTER        character(1)
MPI_INTEGER          integer
MPI_REAL             real
MPI_DOUBLE_PRECISION double precision
MPI_COMPLEX          complex
MPI_DOUBLE_COMPLEX   double complex
MPI_LOGICAL          logical
MPI_BYTE             8 binary digits
MPI_PACKED           For data used with MPI_Pack()/MPI_Unpack()
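To make the blocking send/receive pattern concrete, the sketch below passes a small array of MPI_FLOAT values from rank 0 to rank 1; the array length and tag value are arbitrary choices made for this illustration.

#include "mpi.h"
#include <stdio.h>

#define N   4
#define TAG 0   /* arbitrary message tag used by both sides */

int main(int argc, char *argv[])
{
  int rank;
  float data[N] = {1.f, 2.f, 3.f, 4.f};
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {        /* blocking send to rank 1 */
    MPI_Send(data, N, MPI_FLOAT, 1, TAG, MPI_COMM_WORLD);
  } else if (rank == 1) { /* blocking receive from rank 0 */
    MPI_Recv(data, N, MPI_FLOAT, 0, TAG, MPI_COMM_WORLD, &status);
    printf("Rank 1 received %g %g %g %g\n",
           data[0], data[1], data[2], data[3]);
  }

  MPI_Finalize();
  return 0;
}

Note that the send and receive specify the same count, data type, and tag; a mismatch is exactly the kind of error that leads to the deadlock described above.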

HOW MPI COMMUNICATES
MPI can use optimized data paths to send and receive messages depending on the network communications hardware. For example, MPI applications running within a computational node can communicate via fast shared memory rather than send data over a physical network interface card (NIC). Minimizing data movement is an overarching goal for MPI vendors and those who develop the libraries because moving data simply wastes precious time and degrades MPI performance.

In collaboration with vendors and software organizations, NVIDIA created GPUDirect to accelerate MPI for GPU computing. The idea is very simple: take the GPU data at the pointer passed to MPI_Send() and move it to the memory at the pointer to GPU memory used in the MPI_Recv() call. Don't perform any other data movements or rely on other processors. This process is illustrated in Figure 10.2. Although simple in concept, implementation requires collaboration between vendors, drivers, operating systems, and hardware. InfiniBand (IB) is a commonly used high-speed, low-latency communications link in HPC. It is designed to be scalable and is used in many of the largest supercomputers in the world. Most small computational clusters also use InfiniBand for price and performance reasons. Mellanox is a well-known and respected vendor of InfiniBand products. NVIDIA lists the following changes in the Linux kernel, the NVIDIA Linux CUDA driver, and the Mellanox InfiniBand driver:

■ Linux kernel modifications:
  ■ Support for sharing pinned pages between different drivers.
  ■ The Linux Kernel Memory Manager (MM) allows NVIDIA and Mellanox drivers to share the host memory and provides direct access for the latter to the buffers allocated by the NVIDIA CUDA library, thus providing Zero Copy of data and better performance.
■ NVIDIA driver:
  ■ Allocated buffers by the CUDA library are managed by the NVIDIA driver.
  ■ We have added the modifications to mark these pages to be shared so the Kernel MM will allow the Mellanox InfiniBand drivers to access them and use them for transportation without the need for copying or repinning them.
■ Mellanox OFED drivers:
  ■ We have modified the Mellanox InfiniBand driver to query memory and to be able to share it with the NVIDIA Tesla drivers using the new Linux Kernel MM API.
  ■ In addition, the Mellanox driver registers special callbacks to allow other drivers sharing the memory to notify any changes performed during runtime in the shared buffers state in order for the Mellanox driver to use the memory accordingly and to avoid invalid access to any shared pinned buffers.

FIGURE 10.2 MPI GPUDirect data path: an MPI_Send() message travels from GPU memory on the sending host (GPUS) through the device driver and NIC, across the network to the NIC and device driver on the receiving host, and into GPU memory on the receiving GPU (GPUR).

The result is the direct data transfer path between the send and receive buffers that provides a 30 percent increase in MPI performance. Figure 10.2 shows that a registered region of memory can be directly transferred to a buffer in the device driver. Both the NIC and the device driver on the sending host know about this buffer, which lets the NIC send the data over the network interconnect without any host processor intervention. Similarly, the device driver and NIC on the receiving host know about a common receive buffer. When the GPU driver is notified that the data has arrived intact, it is transferred to the receiving GPU (GPUR in Figure 10.2). There are two ways for a CUDA developer to allocate memory in this model:

■ Use cudaHostRegister() (a short sketch follows this list).
■ With the 4.0 driver, just set the environmental variable CUDA_NIC_INTEROP=1. This will tell the driver to use an alternate path for cudaMallocHost() that is compatible with the IB drivers. There is no need to use cudaHostRegister().
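A minimal sketch of the first option is shown below. It pins an ordinary host allocation that could later be handed to cudaMemcpy() and to MPI_Send()/MPI_Recv(); the buffer size and the page alignment are illustrative assumptions for this sketch, not requirements quoted from the GPUDirect documentation.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  const size_t nBytes = 1 << 20;   /* 1 MB buffer (size chosen arbitrarily) */
  void *buf = NULL;

  /* Allocate a page-aligned host buffer; some driver versions expect page
     alignment for registered memory. */
  if (posix_memalign(&buf, 4096, nBytes)) {
    fprintf(stderr, "allocation failed\n");
    return EXIT_FAILURE;
  }

  /* Pin the existing allocation so the GPU (and, with GPUDirect-capable
     drivers, the InfiniBand driver) can access it directly. */
  if (cudaHostRegister(buf, nBytes, cudaHostRegisterDefault) != cudaSuccess) {
    fprintf(stderr, "cudaHostRegister failed\n");
    return EXIT_FAILURE;
  }

  /* ... fill buf, cudaMemcpy to/from the GPU, call MPI_Send()/MPI_Recv() ... */

  cudaHostUnregister(buf);
  free(buf);
  return EXIT_SUCCESS;
}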

BANDWIDTH
Network bandwidth is a challenge in running distributed computations. The bandwidths listed in Table 10.2 show that most network interconnects transfer data significantly slower than the PCIe bus. (QDR stands for Quad Data Rate InfiniBand. DDR and SDR stand for double and single data rate, respectively.) A poor InfiniBand network can introduce an order of magnitude slowdown relative to PCIe speeds, which emphasizes the first of the three rules of efficient GPU programming introduced in Chapter 1: "Get the data on the GPGPU and keep it there." It also raises portability issues for those who

1 OpenFabrics Enterprise Distribution.

wish to ship distributed MPI applications. Unfortunately, users tend to blame the application first for poor performance rather than the hardware.

Table 10.2 Data Accessibility, Including InfiniBand Messages

Data Access          Memory Type      Bandwidth
Internal to the SM   Register memory  ≈8,000 GB/s
Internal to the SM   Shared memory    ≈1,600 GB/s
GPU                  Global memory    177 GB/s
PCIe                 Mapped memory    ≈8 GB/s one-way
MPI 12x InfiniBand   QDR              12 GB/s
                     DDR              6 GB/s
                     SDR              3 GB/s
MPI 4x InfiniBand    QDR              4 GB/s
                     DDR              2 GB/s
                     SDR              1 GB/s
MPI 1x InfiniBand    QDR              1 GB/s
                     DDR              0.5 GB/s
                     SDR              0.25 GB/s

BALANCE RATIOS
Deciding whether an application or algorithm will run effectively across a network is a challenge. Of course, the most accurate approach is to port the code and benchmark it. This may be impossible due to lack of funding and time. It may also result in a lot of needless work. Balance ratios are a commonsense approach that can provide a reasonable estimation about application or algorithm performance that does not require first porting the application. These metrics are also used to evaluate hardware systems as part of a procurement process (Farber, 2007). Balance ratios define a system balance that is quantifiable and—with the right choice of benchmarks—provides some assurance that an existing application will run well on a new computer. The challenge lies in deciding what characteristics need to be measured to determine whether an application will run well on a GPU or network of GPUs. Most GPU applications are numerically intensive, which suggests that all GPU-based metrics should be tied to floating-point performance. Comparisons can then be made between systems and cards with different floating-point capabilities. Bandwidth limitations are an important metric to tie to numerical performance, as global memory, the PCIe bus, and network connectivity are known to bottleneck GPU performance for most applications.

As shown in Equation 10.1, the ratios between floating-point performance and the bandwidths available to keep the floating-point processors busy can be calculated from vendor and hardware information. This ratio makes it easy to see that a hardware configuration that moves data over a four-times DDR InfiniBand connection can cause an order of magnitude decrease in application performance versus the same application running on hardware that moves data across the PCIe bus.

    Float vs. bandwidth ratio = (flop/s) / (bandwidth in bytes/s)        (10.1)

These observations are not new in HPC. The January 2003 Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, known as the Atkins Report, specifies some desirable metrics that are tied to floating-point performance:

■ At least 1 byte of memory per flop/s.
■ Memory bandwidth (byte/s/flop/s) ≥ 1.
■ Internal network aggregate link bandwidth (bytes/s/flop/s) ≥ 0.2.
■ Internal network bi-section bandwidth (bytes/s/flop/s) ≥ 0.1.
■ System sustained productive disk I/O bandwidth (byte/s/flop/s) ≥ 0.001.
Keep in mind that numerical values presented in the Atkins Report are for desirable ratios based on their definition of a "representative" workload. Your application may have dramatically different needs. For this reason, it is worthwhile to evaluate your applications to see what ratios currently work and what ratios need to be improved. A good approach is to use benchmarks to evaluate system performance. This is the thought behind the Top 500 list that is used to rank the fastest supercomputers in the world. The HPC Challenge (HPCC) website provides a more extensive test suite. It will also generate a Kiviat diagram (Kiviat diagrams are similar to the radar plots in Microsoft Excel) to compare systems based on the standard HPCC benchmarks. A well-balanced system looks symmetrical on these plots because it performs well on all tests. High-performance balanced systems visually stand out because they occupy the outermost rings. Out-of-balance systems are distorted, as can be seen in Figure 10.3. The HPCC website is a good source for benchmarks and comparative data on different computational systems. Be aware that synthetic benchmarks such as those on the HPCC website are very good at stressing certain aspects of machine performance, but they do represent a narrow view into machine performance.

2 http://www.nsf.gov/cise/sci/reports/atkins.pdf.
3 http://icl.cs.utk.edu/hpcc.

FIGURE 10.3 An example Kiviat diagram of HPC Challenge results (benchmarks are normalized so that the highest performance has a value of 1). The axes include PP-HPL, PP-PTRANS, PP-RandomAccess, PP-FFTE, SN-DGEMM, SN-STREAM Triad, RandomRing Bandwidth, and RandomRing Latency.

To provide a more realistic estimate, most organizations utilize benchmark evaluation suites that include some sample production codes to complement synthetic benchmarks—just to see if there is any unexpected performance change, either good or bad. Such benchmarks can also provide valuable insight into how well the processor and system components work together on real-world applications; plus, they can uncover issues that can adversely affect performance such as immature compilers and/or software drivers.

The key point behind balance ratios is that they can help the application programmer or system designer decide whether some aspect of a new hardware system will bottleneck an application. Although floating-point performance is important, other metrics such as memory capacity, storage bandwidth, and storage capacity might be a gating factor. The memory capacity of current GPUs, for example, can preclude the use of some applications or algorithms.

CONSIDERATIONS FOR LARGE MPI RUNS

In designing distributed applications, it is important to consider the scalability of all aspects of the application—not just of the computation! Many legacy MPI applications were designed at a time when an MPI run that used from 10 to 100 processing cores was considered a "large" run. (Such applications might be good candidates to port to CUDA so that they can run on a single GPU.) A common shortcut taken in these legacy applications is to have the master process read data from a file and distribute it to the clients according to some partitioning scheme. This type of data load cannot scale, as the master node simply cannot transfer all the data for potentially tens of thousands of other processes. Even with modern hardware, the master process will become a bottleneck, which can cause the data load to take longer than the actual calculation.

Scalability of the Initial Data Load

A better approach is to have each process read its own data from a file on a distributed file system. Each process is provided with the filename of a data file that contains the data. Each process then opens the file and performs whatever seek and other I/O operations are needed to access and load its data. All the clients then close the file to free system resources. This simple technique can load hundreds of gigabytes of data into large supercomputers very quickly. The process is illustrated in Figure 10.4. With this I/O model, much depends on the bandwidth capabilities of the file system that holds the data.

FIGURE 10.4 A scalable MPI data load for massive data sets: the data is created on disk, the master MPI process partitions the data file across all slaves, and each slave reads its own portion (Data 0..N, Data N+1..2N, Data 2N+1..3N, and so on) directly from the data file.

FIGURE 10.5 Observed peak effective rate (TF/s) versus the number of Ranger Barcelona cores: an example of near-linear scaling to 386 TF/s using 60,000 processing cores.

Modern parallel distributed file systems can deliver hundreds of gigabytes per second of storage bandwidth when concurrently accessed by multiple processes. The freely available Lustre file system4 can deliver hundreds of gigabytes per second of file system bandwidth using this technique and has demonstrated scaling to 60,000 MPI processes, as shown in Figure 10.5. Commercial file systems such as GPFS (General Parallel File System) by IBM and PanFS by Panasas also scale to hundreds of gigabytes per second and tens of thousands of MPI processes. To achieve high performance, these file systems need to be accessed by a number of concurrent distributed processes. It is not recommended to use this technique with large numbers of MPI clients on older NFS and CIFS file systems.
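The complete data load used in this chapter appears later in Example 10.12; the following stripped-down sketch shows only the essential per-rank pattern. It assumes a binary file whose size divides evenly among the MPI ranks; the file name and the plain C file I/O calls are placeholder choices for illustration.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  int rank, numtasks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

  const char *filename = (argc > 1) ? argv[1] : "data.bin"; // placeholder name
  FILE *fp = fopen(filename, "rb");
  if(!fp) MPI_Abort(MPI_COMM_WORLD, 1);

  // Determine the file size and this rank's equal share of it.
  fseek(fp, 0L, SEEK_END);
  long fileBytes  = ftell(fp);
  long chunkBytes = fileBytes / numtasks;   // assumes an even division

  // Every rank seeks to its own offset and reads only its portion.
  std::vector<char> myData(chunkBytes);
  fseek(fp, rank * chunkBytes, SEEK_SET);
  fread(&myData[0], 1, chunkBytes, fp);
  fclose(fp);

  // ... hand myData off to the GPU (e.g., via a thrust::device_vector) ...

  MPI_Finalize();
  return 0;
}

Because every rank touches a disjoint region of the file, the aggregate read bandwidth is limited by the parallel file system rather than by any single master process.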

Using MPI to Perform a Calculation

The general mapping introduced in Chapter 2 has been used in an MPI distributed environment to scale to 500 GPUs on the TACC Longhorn supercomputer. The same mapping has been used to scale to more than 60,000 processing cores on both the CM-200 Connection Machine (Farber, 1992) and the TACC Ranger Supercomputer (Farber & Trease, 2008). During these runs, the rank 0 process was designated as the master process in a conventional master-slave configuration. The master then:

■ Initiated the data load as described previously. Even when using hundreds of gigabytes of data, the data load took only a few seconds using a modern parallel distributed file system.

4 http://lustre.org.

■ Performed other initializations.
■ Ran the optimization procedure and directed the slave processes.
  ■ To make the best use of the available resources, the master node also evaluates the objective function along with the slave processes.
  ■ Most production runs used a variant of Powell's method, as described in Numerical Recipes (Press, Teukolsky, & Vetterling, 2007a), or Conjugate Gradient (also described in Numerical Recipes).
■ On each evaluation of the objective function, the master:
  ■ Uses MPI_Bcast() to broadcast the parameters to all the slave processes. Each slave process (along with the master) calculates the partial sum of the objective function. This does not require any communication and can be optimized to run at full speed on the device as discussed in Chapters 2 and 3.
  ■ Performs its own local calculation of the objective function.
  ■ Calls MPI_Reduce() to retrieve the total sum of all partial results. In general, the MPI_Reduce() operation is highly optimized and scales according to O(log2(ntasks)), where ntasks is the number reported by MPI_Comm_size().
  ■ Supplies the result of the objective function to the optimization method, which causes either:
    ■ Another evaluation of the objective function, or
    ■ Completion of the optimization process.

The slave processes (those processes with a rank greater than 0):

■ Initiate the data load as described previously.
■ Remain in an infinite loop until the master indicates it is time to exit the application. Inside the loop, each slave:
  ■ Reads the parameters from the master node.
  ■ Calculates the partial sum of the objective function.
  ■ Sends the partial results to the master when requested.

Check Scalability

The scaling behavior of a distributed application is a key performance metric. Applications that exhibit strong scaling run faster as processors are added to a fixed-size problem. Linear scaling is a goal for distributed applications because the runtime of the application then scales linearly with the amount of computational resources available (e.g., a two-times larger system will run the problem in half the time, a four-times larger system will run the problem in one-fourth the time, etc.). Embarrassingly parallel problems can exhibit linear runtime behavior because each computation is independent of the others. Applications that require communication to satisfy some dependency can generally achieve at best near-linear scaling. For example,

an O(log2(N)) reduction operation will cause the scaling behavior to slowly diverge from a linear speedup as the processor count grows. Figure 10.5 illustrates near-linear scaling to 60,000 processing cores using the PCA objective function discussed in Chapter 3.

The 386 teraflop per second effective rate for the AMD Barcelona-based TACC Ranger supercomputer reported in Figure 10.5 includes all communications overhead. A 500-GPU supercomputer run could deliver nearly 500 teraflops of single-precision performance. A colleague refers to these as "honest flops," as they reflect the performance a real production run would deliver. The effective rate for the PCA mapping is defined in Equation 10.2, "Definition of effective rate for the PCA parallel mapping":

    Effective rate = TotalOpCount / (Tbroadcast + Tobjectfunc + Treduce)    (10.2)

where:

■ TotalOpCount is the number of floating-point operations performed in a call to an objective function.
■ Tbroadcast is the time required to broadcast the parameters to all the slave processes.
■ Tobjectfunc is the time consumed in calculating the partial results across all the clients. This is generally the time taken by the slowest client.
■ Treduce is the time required by the reduction of the partial results.
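To make Equation 10.2 concrete, the short calculation below plugs in illustrative timings. These numbers are assumptions chosen for the example, not measurements from Ranger or Longhorn; note how the broadcast and reduction times cap the achievable rate no matter how fast the objective function itself becomes.

#include <cstdio>

int main()
{
  // Illustrative values (assumptions for this example only).
  double totalOpCount = 2.0e11;   // flop performed per objective function call
  double tBroadcast   = 0.0005;   // seconds to broadcast the parameters
  double tObjectFunc  = 0.0040;   // seconds for the slowest client's partial sum
  double tReduce      = 0.0010;   // seconds for the MPI_Reduce()

  // Equation 10.2
  double effectiveRate = totalOpCount / (tBroadcast + tObjectFunc + tReduce);
  printf("Effective rate: %g flop/s\n", effectiveRate);   // ~3.6e13 flop/s

  // Even if the objective function time dropped to zero, communication alone
  // would limit the rate to totalOpCount/(tBroadcast + tReduce).
  printf("Communication-bound ceiling: %g flop/s\n",
         totalOpCount / (tBroadcast + tReduce));
  return 0;
}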

CLOUD COMPUTING

Cloud computing is an infrastructure paradigm that moves computation to the Web in the form of a web-based service. The idea is simplicity itself: institutions contract with Internet vendors for computational resources instead of providing those resources themselves through the purchase and maintenance of computational clusters and supercomputer hardware. As expected with any computational platform—especially one utilized for high performance and scientific computing—performance limitations within the platform define which computational problems will run well. Unlike MPI frameworks, cloud computing requires that applications be tolerant of both failures and high latency in point-to-point communications. Still, MPI is such a prevalent API that most cloud computing services provide and support it. Be aware that bandwidth and latency issues may cause poor performance when running MPI on a cloud computing infrastructure.

MPI utilizes a fixed number of processes that participate in the distributed application. This number is defined at application startup—generally with mpiexec. In MPI, the failure of any one process in a communicator affects all processes in the communicator, even those that are not in direct communication with the failed process. This factor contributes to the lack of fault tolerance in MPI applications.

MapReduce, in contrast, is a fault-tolerant framework for distributed computing that has become very popular and is widely used (Dean & Ghemawat, 2010). In a MapReduce application, the failure of a client has no significant effect on the server. Instead, the server can continue to service other clients. What makes the MapReduce structure robust is that all communication occurs in a two-party context where one party (e.g., process) can easily recognize that the other party has failed and can decide to stop communicating with it. Moreover, each party can easily keep track of the state held by the other party, which facilitates failover.

A challenge with cloud computing is that many service providers use virtual machines that run on busy internal networks. As a result, communications time can increase. In particular, the time Treduce is very latency-dependent, as can be seen in Equation 10.2. For tightly coupled applications (e.g., applications with frequent communication between processes), it is highly recommended that dedicated clusters be utilized. In particular, reduction operations will be rate-limited by the slowest process in the group. That said, papers like "Multi-GPU MapReduce on GPU Clusters" (Stuart & Owens, 2011) demonstrate that MapReduce is an active area of research.

A CODE EXAMPLE

The objective function from the nlpcaNM.cu example in Chapter 3 was adapted to use MPI and this data load technique.

Data Generation

A simple data generator was created that uses the genData() method from nlpcaNM.cu. This program simply writes a number of examples to a file. Both the filename and the number of examples are specified on the command line. The complete source listing is given in Example 10.6, "Source for genData.cu":

Example 10.6
#include <cstdio>
#include <cstdlib>
#include <fstream>
using namespace std;

// get a uniform random number between -1 and 1
inline float f_rand() {
  return 2*(rand()/((float)RAND_MAX)) -1.;
}

template <typename Real>
void genData(ofstream &outFile, int nVec, Real xVar)
{
  Real xMax = 1.1; Real xMin = -xMax;
  Real xRange = (xMax - xMin);
  for(int i=0; i < nVec; i++) {
    Real t = xRange * f_rand();
    Real z1 = t + xVar * f_rand();
    Real z2 = t*t*t + xVar * f_rand();
    outFile.write((char*) &z1, sizeof(Real));
    outFile.write((char*) &z2, sizeof(Real));
  }
}

int main(int argc, char *argv[])
{
  if(argc < 3) {
    fprintf(stderr,"Use: filename nVec\n");
    exit(1);
  }
  ofstream outFile (argv[1], ios::out | ios::binary);
  int nVec = atoi(argv[2]);
#ifdef USE_DBL
  genData<double>(outFile, nVec, 0.1);
#else
  genData<float>(outFile, nVec, 0.1);
#endif
  outFile.close();
  return 0;
}

The program can be saved in a file and compiled as shown in Example 10.7, “Building genData.cu.” The C preprocessor variable USE_DBL specifies whether the data will be written as 32-bit or 64-bit binary floating-point numbers:

Example 10.7
nvcc -D USE_DBL genData.cu -o bin/genData64
nvcc genData.cu -o bin/genData32

Data sets can be written via the command line. For example, Example 10.8, “Creating a 32-Bit File with genData.cu,” writes 10M 32-bit floats to a file nlpca32.dat.

Example 10.8
bin/genData32 nlpca32.dat 10000000

It is very easy to specify the wrong file during runtime. This simple example does not provide version, size, or any other information useful for sanity checking. The Google ProtoBufs project is recommended for a production-quality file and streaming data format.5 Google uses this project for almost all of its internal RPC protocols and file formats.

The nlpcaNM.cu example code from Chapter 3 was modified to run in an MPI environment. The changes are fairly minimal but are distributed throughout the code. Only code snippets are provided in the following discussion. The entire code can be downloaded from the book website.6

The variable nGPUperNode was defined at the beginning of the file. It is assumed that the number of MPI processes per node will equal the number of GPUs in each node (meaning one MPI process per GPU). The value of nGPUperNode allows each MPI process to call cudaSetDevice() correctly to use a different device. Other topologies are possible, including using one MPI process per node for all GPUs. See Example 10.9, "Modification of the Top of nlpcaNM.cu to Use Multiple GPUs per Node":

Example 10.9
#include "mpi.h"
const static int nGPUperNode=2;

The main() method of nlpcaNM.cu has been modified with the MPI framework shown in Figure 10.1 using the code shown in Example 10.1 (see Example 10.10, "Modified main() method for nlpcaNM.cu"):

Example 10.10
#include <mpi.h>

int main(int argc, char *argv[])

5 http://code.google.com/p/protobuf. 6 http://booksite.mkp.com/9780123884268.

{
  int numtasks, rank, ret;
  if(argc < 2) {
    fprintf(stderr,"Use: filename\n");
    exit(1);
  }
  ret = MPI_Init(&argc,&argv);
  if (ret != MPI_SUCCESS) {
    printf ("Error in MPI_Init()!\n");
    MPI_Abort(MPI_COMM_WORLD, ret);
  }

  MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  printf ("Number of tasks= %d My rank= %d\n", numtasks,rank);

  /******* do some work *******/
#ifdef USE_DBL
  trainTest<double> ( argv[1], rank, numtasks );
#else
  trainTest<float> ( argv[1], rank, numtasks);
#endif

  MPI_Finalize();
  return 0;
}

The setExamples() method was modified to allocate memory on different GPUs. See Example 10.11, "Modified setExamples Method for nlpcaNM.cu":

Example 10.11
void setExamples(thrust::host_vector<Real>& _h_data)
{
  nExamples = _h_data.size()/exLen;

  // copy data to the device
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  cudaSetDevice(rank%nGPUperNode);

  d_data = thrust::device_vector<Real>(nExamples*exLen);
  thrust::copy(_h_data.begin(), _h_data.end(), d_data.begin());
  d_param = thrust::device_vector<Real>(nParam);
}

Example 10.12, “Modified trainTest Method for nlpcaNM.cu,” shows how the data is loaded from disk:

Example 10.12
#include <fstream>

template <typename Real>
void trainTest(char* filename, int rank, int numtasks)
{
  ObjFunc testObj;
  const int nParam = testObj.nParam;
  cout << "nParam " << nParam << endl;

  // read the test data
  ifstream inFile (filename, ios::in | ios::binary);
  // position 0 bytes from end
  inFile.seekg(0, ios::end);
  // determine the file size in bytes
  ios::pos_type size = inFile.tellg();
  // allocate number of Real values for this task
  // (assumes a multiple of numtasks)
  int nExamplesPerGPU = (size/(sizeof(Real)*testObj.exLen))/numtasks;
  thrust::host_vector<Real> h_data(nExamplesPerGPU*testObj.exLen);
  // seek to the byte location in the file
  inFile.seekg(rank*h_data.size()*sizeof(Real), ios::beg);
  // read bytes from the file into h_data
  inFile.read((char*)&h_data[0], h_data.size()*sizeof(Real));
  // close the file
  inFile.close();

  testObj.setExamples(h_data);
  int nExamples = testObj.get_nExamples();

  if(rank > 0) {
    testObj.objFunc( NULL );
    return;
  }

  ...Nelder-Mead code goes here

  int op=0; // shut down slave processes
  MPI_Bcast(&op, 1, MPI_INT, 0, MPI_COMM_WORLD);
}

In this example, a binary input stream, inFile, is opened to read the data in the file filename. After that:

■ The size of the file is determined.

■ Each of the numtasks processes is assigned an equal amount of the data based on the file size and the number of data per example. The logic is simplified by assuming that the number of examples is a multiple of numtasks.
■ The host vector h_data is allocated.
■ A seek operation moves to the appropriate position in the file.
■ The data is read into h_data and handed off to setExamples() to be loaded onto the device.
■ As shown in Figure 10.6, the rank of the process is tested:
  ■ If the rank is greater than zero, this process is a slave. The objective function is called with a NULL parameters argument and the training finishes when that method exits.
  ■ If the process is a master, the Nelder-Mead optimization code is called.

FIGURE 10.6 Logic separating the master from the slave processes (if the process is a slave, it runs the slave objective function; otherwise it calls the optimization method).

The objective function implements the logic for both the master and slave processes, as illustrated in Figure 10.7.

FIGURE 10.7 MPI control flow. The master MPI process runs main(), which calls MPI_Init() to start the slaves, calls the optimization method (Nelder-Mead), and finishes with MPI_Finalize(). On each objective function evaluation the master (1) broadcasts opcode=1 to indicate there is work to do, (2) calculates its partial of the optimization function, (3) performs MPI_Reduce(), and (4) returns the total reduction from the master and all slaves, then goes back to step 1. Each slave MPI process (1) receives the opcode and, if it is 1, (2) calculates its partial of the optimization function, (3) performs MPI_Reduce(), and (4) collects the total reduction from the master and all slaves and returns to wait for the next opcode. When the optimization completes, the master broadcasts opcode=0 and each slave exits on receiving it.

The objective function (Example 10.13) belongs to a slave process when the MPI rank is greater than 0. In this case, the slave state machine:
1. Allocates space in pinned memory for the parameters.
2. Waits for an operation code to be broadcast from the master.
   a. The master tells the slave that all work is done with a zero op code and that it should return.
   b. Otherwise, the slave moves to the next step.
3. The slave waits for the host to broadcast the parameters.
4. After the master broadcasts the parameters, the slave runs the objective function on its data and blocks waiting for the master to ask for the result.
5. After the master requests the reduction result, the slave returns to wait for an op code in step 2.
This process continues until the master sends a 0 op code.

The objective function (Example 10.13) belongs to the master process when the rank is 0. In this case, the master:
1. Broadcasts a nonzero op code to let all the slaves know there is work to do.
2. Broadcasts the parameters to all the slaves.
3. Runs the objective function on its data.
4. Performs a reduction to sum the partial values from the slave processes.
5. Returns to continue running the optimization method.

Example 10.13
Real objFunc(Real *p)
{
  int rank,op;
  Real sum=0.;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  cudaSetDevice(rank%nGPUperNode);

  if(nExamples == 0) {
    cerr << "data not set " << endl;
    exit(1);
  }

  CalcError getError(thrust::raw_pointer_cast(&d_data[0]),
                     thrust::raw_pointer_cast(&d_param[0]),
                     nInput, exLen);

  if(rank > 0) { // slave objective function
    Real *param;
    cudaHostAlloc(&param, sizeof(Real)*nParam, cudaHostAllocPortable);
    for(;;) { // loop until the master says I am done.
      MPI_Bcast(&op, 1, MPI_INT, 0, MPI_COMM_WORLD);
      if(op==0) {
        cudaFreeHost(param);
        return(0);
      }
      if(sizeof(Real) == sizeof(float))
        MPI_Bcast(&param[0], nParam, MPI_FLOAT, 0, MPI_COMM_WORLD);
      else
        MPI_Bcast(&param[0], nParam, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      thrust::copy(param, param+nParam, d_param.begin());
      Real mySum = thrust::transform_reduce(
                       thrust::counting_iterator<unsigned int>(0),
                       thrust::counting_iterator<unsigned int>(nExamples),
                       getError, (Real) 0., thrust::plus<Real>());
      if(sizeof(Real) == sizeof(float))
        MPI_Reduce(&mySum, &sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
      else
        MPI_Reduce(&mySum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }
  } else { // master process
    double startTime=omp_get_wtime();

    op=1;
    MPI_Bcast(&op, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if(sizeof(Real) == sizeof(float))
      MPI_Bcast(&p[0], nParam, MPI_FLOAT, 0, MPI_COMM_WORLD);
    else
      MPI_Bcast(&p[0], nParam, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    thrust::copy(p, p+nParam, d_param.begin());

    Real mySum = thrust::transform_reduce(
                     thrust::counting_iterator<unsigned int>(0),
                     thrust::counting_iterator<unsigned int>(nExamples),
                     getError, (Real) 0., thrust::plus<Real>());

    if(sizeof(Real) == sizeof(float))
      MPI_Reduce(&mySum, &sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    else
      MPI_Reduce(&mySum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

Table 10.3 Table of Results7

Number of GPUs    Time per Objective Function    Speedup over one GPU
1                 0.0102073
2                 0.00516948                     1.98

    objFuncCallTime += (omp_get_wtime() - startTime);
    objFuncCallCount++;
  }
  return(sum);
}

The application completes after the optimization method returns in the master process. The final line in trainTest() broadcasts a zero opcode to all the slaves to tell them to exit. The master process then returns to main() to shut down MPI normally.

Table 10.3 shows that running the MPI version of nlpcaNM.cu on two GPUs nearly doubles performance. No network was used, as both GPUs were inside the same system. The example code from this chapter ran unchanged to deliver near-linear scalability to 500 GPUs on the Texas Advanced Computing Center (TACC) Longhorn supercomputer cluster. Many thanks to TACC for providing access to this machine.8

SUMMARY

MPI and GPUDirect give CUDA programmers the ability to run high-performance applications that far exceed the capabilities of a single GPU or collection of GPUs inside a single system. Scalability and performance within a node are key metrics to evaluate distributed applications. Those applications that exhibit linear or near-linear scalability have the ability to run on the largest current and future machines. For example, the parallel mapping discussed in this chapter, plus Chapters 2 and 3, is now nearly 30 years old. At the time it was designed, a teraflop of computing power was decades beyond the reach of even the largest government research organizations. Today, it runs very effectively on GPU devices, clusters, and even the latest supercomputers. It will be interesting to see how many yottaflops (10^24 flop/s) will be available 30 years from now. Meanwhile, this mapping provides a scalable solution for large data-mining and optimization tasks.

7 More current and extensive results reported on the wiki. 8 http://gpucomputing.net/CUDAapplicationdesignanddevelopment. CHAPTER 11

CUDA for Real Problems

CUDA gives developers the ability to accelerate applications on the GPU by one to three orders of magnitude over conventional processors. Knowing how to think and program in CUDA is the necessary first step in the process of growing and maturing into an adept CUDA programmer. Look to the Internet and technical literature to find projects that provide general solutions and design patterns that can be adapted to your projects. CUDA is fast; reinventing the wheel is slow. The chapters in this book were designed to teach and provide some initial frameworks for future applications. For example, the simple C++ functor framework in Chapter 2 implemented a general parallel mapping for pattern recognition and optimization that runs 85 times faster on a GPU than on a high-end quad-core processor and delivers a 341-times increase over single-core performance. Chapter 10 extended this example to use MPI to run on computational clusters containing hundreds of GPUs. The 500-GPU TACC Longhorn cluster is one system that can provide nearly half a petaflop (500,000 gigaflops) of single-precision floating-point performance using this parallel mapping. These examples turn your GPU into an engine to find patterns in complex data that is more powerful than the largest supercomputer that existed prior to 1996.1

Data visualization is a natural extension of the computational capabilities of GPUs. As shown in Chapter 9, mixing CUDA with OpenGL interoperability means that the data does not even need to move off the GPU. That said, online information and technical literature extends far beyond these examples and the pages of this book. This chapter points the way to other projects for advanced usage and programming.

Do not consider this chapter to be an exhaustive list. Instead, it provides a broad introduction to some of the popular techniques and packages that use

1 The United States Sandia National Laboratory built the first supercomputer able to perform more than a trillion floating-point operations per second in 1996.


CUDA. Most of the methods discussed will provide a link to a downloadable CUDA source package that can be reviewed, built, and used. There are many other excellent resources available on the Internet and in technical literature. A good starting point is the CUDA showcase in the developer zone that contains hundreds of links to CUDA techniques and papers.2 At the end of this chapter, the reader will have been introduced to:

■ A few techniques to reduce high-dimensional data to low dimensions.
■ Force-directed graphs for visualization.
■ Multidimensional scaling (MDS).
■ Mutual information.
■ Monte Carlo methods.
■ Molecular modeling.

WORKING WITH HIGH-DIMENSIONAL DATA

We live in the age of data. Automatic systems collect the most minute details of transactions and interactions in both physical and social systems. The patterns are there, but where? With a simple database query or Internet search, it is possible to extract an interesting data set that concatenates a number of different measures (or observations) for each event. The events might be customer purchases or the spectral data of hundreds of stars. Multiple events can be represented as the rows of a matrix in which each row is an event and the columns of each row contain measurements concerning the event. A common question to ask is, "Which rows are similar to each other?" Phrased another way, the question becomes, "Which events, or vectors, are close to each other?"

For many problems, it is possible to calculate the distance between each row in the matrix using a metric such as Euclidean distance. Euclidean distance is a measure of distance that one would obtain with a ruler and the Pythagorean formula. Those rows that are close in distance can be considered "similar" to each other and those that are far away are "not similar." There are many other distance measures aside from Euclidean distance, but Euclidean distance is intuitively easy to understand.

The curse of dimensionality discussed in Chapter 3 is why working in a high-dimensional space is not practical for many problems. For example, asking for all rows that are closer than some distance from a given point in high-dimensional space can return either no values (because the sampling is very sparse) or nearly all the data (because the volume of the search region

2 http://developer.nvidia.com/cuda-action-research-apps.

is so large that it encompasses most or all of the known data). In either case, the answer is meaningless. Decreasing the dimensionality of a search exponentially decreases the volume of the search region. This is one of the reasons why data miners like working in lower-dimensional spaces—it increases the chance of getting a meaningful answer to the question, "Which points in the low-dimensional space are close to each other?" Of course, the projection used to convert the high-dimensional data to the low-dimensional space must preserve the distance relationships between the points.
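As a reminder of how naturally this kind of row-wise distance computation maps onto the GPU, the following sketch computes the Euclidean distance from one query vector to every row of a matrix, one thread per row. It is illustrative only; the row-major layout and the kernel name are assumptions of this example.

#include <cmath>

// One thread per row: Euclidean distance from the query vector to that row.
__global__ void rowDistances(const float *matrix, const float *query,
                             float *dist, int nRows, int nCols)
{
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if(row >= nRows) return;

  float sum = 0.f;
  for(int c=0; c < nCols; c++) {
    float d = matrix[row*nCols + c] - query[c];
    sum += d*d;                       // Pythagorean sum of squares
  }
  dist[row] = sqrtf(sum);             // Euclidean distance for this row
}

A launch such as rowDistances<<<(nRows+255)/256, 256>>>(dMatrix, dQuery, dDist, nRows, nCols) computes all the distances in one pass; sorting or thresholding the dist array then answers the "which rows are close?" question.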

PCA/NLPCA

As discussed in Chapter 3, autoencoders, Principal Components Analysis (PCA), and Nonlinear Principal Components Analysis (NLPCA) are some of the techniques used to represent high-dimensional data in a lower-dimensional space. The example code in Chapter 3 essentially passed high-dimensional data through an information bottleneck (the bottleneck neurons) in such a way that the high-dimensional data can be reconstructed. Measuring the error between the original and reconstructed high-dimensional data provides a measure of success. A low error implies that the projection to low dimensions preserved the distance relationships between the points in the high-dimensional data.

The example code in Chapter 3 can be adapted to work with your own data sets simply by modifying the genData() method. Similarly, the data generator in Chapter 10 can be changed to one that writes your own data to a file. In either case, modifying the CalcError.h file will let you define your own PCA or NLPCA architecture. Using one, two, or three bottleneck neurons means that the low-dimensional data can be visualized with most 2D and 3D graphics packages such as gnuPlot, Excel, MATLAB, and many others. For more information on autoencoders and other neural network based approaches, see the work of Geoffrey Hinton (Hinton & Salakhutdinov, 2006)3 and many others. The website http://nlpca.org is a good starting point to learn about NLPCA, especially for MATLAB users.

Multidimensional Scaling

An alternative perspective on dimensionality reduction is offered by multidimensional scaling (MDS). MDS is another classical approach that maps a high-dimensional data set to a lower-dimensional space, but does so in an attempt to preserve pairwise distances. Numerous variants of the classical method have been developed. An excellent reference is Multidimensional Scaling (Cox & Cox, 2008).

3 http://www.cs.toronto.edu/~hinton/.

The Glimmer package implements MDS on the GPU (Ingram, Munzner, & Olano, 2009). Speedups vary with data size, ranging from 10 to 15 times over a conventional processor. The source code can be freely downloaded from the University of British Columbia website.4

K-Means Clustering

K-means clustering partitions a number of events into k clusters in which each event belongs to the cluster with the nearest mean. Data sets with a billion points are common in today's real-world applications. With CUDA GPU acceleration, Hewlett-Packard reports that a data set with a billion data points can be clustered within minutes (Wu, Zhang, & Hsu, 2009). The HP software claims an order of magnitude increased performance over a highly optimized CPU-only version running on eight cores, and about a 300-times performance boost over the popular MineBench benchmark (Narayanan et al., 2006) running on a single core. Many other researchers (Hong-tao, Li-li, Dan-tong, Zhan-shan, & He, 2009; Ma & Agrawal, 2009; Shalom, Dash, & Tue, 2008) report large speedups on K-means. A freely downloadable version of K-means written by Serban Giuroiu is available from GitHub.5 It is reportedly 50 times faster than a sequential version.
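The data-parallel heart of K-means is the assignment step: every point independently finds its nearest cluster mean. The sketch below shows that step as a simple CUDA kernel; the centroid update that follows it (and any use of shared memory for the centroids) is omitted, and the row-major layouts and names are assumptions of this example rather than code from the packages cited above.

// Assign each point to the cluster whose centroid is nearest (squared
// Euclidean distance). points is nPoints x nDims, centroids is k x nDims.
__global__ void assignClusters(const float *points, const float *centroids,
                               int *membership, int nPoints, int nDims, int k)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if(i >= nPoints) return;

  float bestDist = 3.402823466e38f;   // FLT_MAX
  int   bestCluster = 0;
  for(int c=0; c < k; c++) {
    float dist = 0.f;
    for(int d=0; d < nDims; d++) {
      float diff = points[i*nDims + d] - centroids[c*nDims + d];
      dist += diff*diff;
    }
    if(dist < bestDist) { bestDist = dist; bestCluster = c; }
  }
  membership[i] = bestCluster;
}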

Expectation-Maximization

K-means and MDS are similar to the expectation-maximization (EM) algorithm for mixtures of Gaussians in that they attempt to find the centers of natural clusters in the data and in the iterative refinement approach employed by these algorithms. The EM algorithm is widely used in the fields of signal processing and data mining. A freely downloadable version of the EM method with a reported 170-times increase over a reference CPU implementation is available.6 NVIDIA published a paper with a similar reported speedup over a naïve implementation (Kumar, Satoor, & Buck, 2009). The EM method can also be implemented as a functor in the optimization framework provided in this book. The CUDA-MEME package discussed later in this chapter is another freely downloadable EM package.

The expectation-maximization algorithm is also used to discover motifs in social network and biological data. GPU-MEME is reported to provide an order of magnitude speedup on a single GPU and two orders of magnitude speedup on a compute cluster (Chen, Schmidt, Weiguo, & Müller-Wittig,

4 http://www.cs.ubc.ca/~sfingram/glimmer/. 5 https://github.com/serban/kmeans. 6 http://andrewharp.com/gmmcuda.

2008). CUDA-MEME is a freely downloadable motif discovery software package7 based on the GPU-MEME paper.

Support Vector Machines

Support vector machines (SVM) are a popular machine-learning method to analyze data and recognize patterns. An SVM performs classification by constructing an N-dimensional hyperplane (a plane generalized into N dimensions) that optimally separates the data into two categories. They are used for classification and regression analysis, among other tasks. SVM models are closely related to neural networks. In fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. Speedups of 150 times on a GPU have been reported in the literature (Catanzaro, Sundaram, & Keutzer, 2008). SVM can also be implemented as a functor in the optimization framework provided in this book.

The cuSVM package can be freely downloaded from the Internet.8 This same package is accessible from the R statistical language via the gpuSvmTrain() method in the gputools package. Speedups between 22 and 172 times the rate of state-of-the-art software are reported by the author (Carpenter, 2011). The CUDA-SVM package from Nanyang Technological University is available on http://sourceforge.net. Another SVM package, multisvm, can be freely downloaded from http://code.google.com.

Bayesian Networks

A Bayesian belief net is a directed graph, together with an associated set of probability tables. The nodes represent variables, which can be discrete or continuous. The arcs in the graph represent causal/influential relationships between variables. The key feature of Bayesian networks is that they can be used to reason about uncertainty. When used in conjunction with statistical techniques, these graphical models have several advantages for data analysis:

■ Because the model encodes dependencies among all variables, it can handle missing data.
■ A Bayesian network can be used to learn causal relationships, and hence can be used to gain an understanding about a problem domain and to predict the consequences of intervention.
■ A Bayesian model has both a causal and probabilistic semantics, which makes it a natural representation to combine prior knowledge (which often comes in causal form) and data.
■ Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data.

7 http://www.nvidia.com/object/meme_on_tesla.html. 8 http://patternsonascreen.net/cuSVM.html.

Duke University provides freely available Bayesian software that can also be accessed via MATLAB and the R statistical language.9 Speedups of 160 times over a conventional multicore processor are reported (Suchard et al., 2010). CUDA allows large data sets to be analyzed with Bayesian techniques (Chen, Schmidt, Weiguo, & Müller-Wittig, 2008).

BEAGLE (Broad-platform Evolutionary Analysis General Likelihood Evaluator) is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. Phylogenetics is the study of evolutionary relatedness among various groups, species, or populations of organisms that is discovered through molecular sequencing data and morphological data matrices. The authors achieved a 90-times speedup over an optimized CPU version and a 140-times speedup over the general CPU version (Suchard & Rambaut, 2009).10 This library is used in the Bayesian phylogenetics framework (BEAST).11

Mutual Information

The concept of mutual information has its origins in information theory and is widely used in many disciplines. For example, mutual information can be used to determine sites in the AIDS virus that are related for structural or functional reasons (Korber, Farber, Wolpert, & Lapedes, 1993). In particular, the paper by Korber et al. demonstrates how bootstrapping can be used to determine a confidence that high pairwise mutual information did not arise by chance. (Bootstrapping in statistics can be implemented by various techniques, including random sampling with replacement from the original data set.) Mutual information can also be used to improve prediction in machine learning (Farber, Lapedes, & Sirotkin, 1992). There are numerous texts on this topic (Cover & Thomas, 2006; Stanisław, Carbonell, & Mitchell, 1985) and many sources of information on the Internet.

Pairwise mutual information provides a measure of the amount of information shared between two random variables. Generally, this measure is expressed in number of bits of information. Mutual information is defined in terms of entropies involving the joint probability distribution, P(si, sj′), of the occurrence of symbol s at position i and s′ at position j. It can be expressed as the relation to the log-likelihood ratio of the expected occurrence of pairs (under the

9 http://www.stat.duke.edu/research/software/west/gpu/software.html. 10 http://beagle-lib.googlecode.com. 11 http://beast.bio.ed.ac.uk.

assumption of independence) to the observed occurrence. See Equation 11.1, “A definition of mutual information.”

    M(i, j) = Σ_{si, sj′} P(si, sj′) log [ P(si, sj′) / ( P(si) P(sj′) ) ]    (11.1)

In medical image analysis, mutual information is used as a similarity measure for multimodal registration. CUDA provides a 50-times speedup, allowing real-time registration of medical images and use in robotics (Ines & Hirschmüller, 2008). There are many freely downloadable implementations on the Internet. CUDA-MI12 is one; another is the version by Shams at the Australian National University.13
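To show what Equation 11.1 looks like in code, the sketch below computes the pairwise mutual information of two discrete symbol columns from their joint histogram. It is a plain host-side C++ illustration (the log base 2 for bits and the argument names are choices of this example), not code from the CUDA-MI or Shams packages mentioned above.

#include <cmath>
#include <vector>

// Mutual information (in bits) of two discrete columns x[n], y[n]
// drawn from alphabets of size kx and ky (Equation 11.1).
double mutualInformation(const std::vector<int> &x, const std::vector<int> &y,
                         int kx, int ky)
{
  int n = x.size();
  std::vector<double> joint(kx*ky, 0.), px(kx, 0.), py(ky, 0.);
  for(int i=0; i < n; i++) {           // build the joint and marginal histograms
    joint[x[i]*ky + y[i]] += 1.0/n;
    px[x[i]] += 1.0/n;
    py[y[i]] += 1.0/n;
  }
  double mi = 0.;
  for(int a=0; a < kx; a++)
    for(int b=0; b < ky; b++) {
      double pab = joint[a*ky + b];
      if(pab > 0.)                     // 0 log 0 contributes nothing
        mi += pab * log2(pab/(px[a]*py[b]));
    }
  return mi;
}

For example, two identical columns of equally likely symbols yield log2(k) bits, while two independent columns yield approximately zero.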

FORCE-DIRECTED GRAPHS

Many graph-based problems only provide information about the relationships between nodes in a graph. Such problems arise when people send messages to each other, talk on the phone, and travel to various locations. Visualizing mass amounts of such information can be challenging. Excellent surveys on drawing graphs can be found in Di Battista, Eades, Tamassia, and Tollis (1999), and Brandes (2001).

In the absence of other information, force-directed placement algorithms for graph layout based on Hooke's law for springs have been used to create pleasing and informative graph layouts as described by Eades' spring-mass equations (Eades, 1984) and later adapted by Fruchterman and Reingold to emulate particle physics in a simulated annealing algorithm (Fruchterman & Reingold, 1991). Graph placement and layout remains an active area of research, with Frishman and Ayellet (Frishman & Ayellet, 2008) using GPUs to speed up incremental graph layout to provide results 17 times faster than a CPU. Godiyal et al. (Godiyal, Hoberock, Garland, & Hart, 2009) use a variation of the Fast Multipole Method (FMM) to estimate the long-distance repulsive forces in force-directed layout. (The FMM method was developed to speed the calculation of long-range forces in the N-body problem.) Multipole computations are efficiently supported with a GPU-based k-d tree (a space-partitioning data structure for organizing points in a k-dimensional space). The authors report their technique achieves impressive speedup over previous CPU and

12 https://sites.google.com/site/liuweiguohome/cuda-mi. 13 http://users.cecs.anu.edu.au/~ramtin/cuda.htm.

GPU methods, drawing graphs with hundreds of thousands of vertices within a few seconds via CUDA.

MONTE CARLO METHODS

Monte Carlo methods rely on repeated random sampling to compute their results. They are often used in simulating physical and mathematical systems and are especially useful for modeling systems with many coupled degrees of freedom (such as fluids) as well as systems with significant uncertainty in inputs, such as the calculation of risk in business. Hubbard notes that Monte Carlo predictions of failures, cost overruns, and schedule overruns are routinely better than human intuition or alternative "soft" methods (Hubbard, 2009). Monte Carlo methods are widely used in mathematics to evaluate multidimensional definite integrals with complicated boundary conditions.

For example, the Metropolis algorithm is a Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution or to compute an integral to get an expected value. I had the personal pleasure of knowing Nick Metropolis during my career in the theoretical division at Los Alamos National Laboratory, which demonstrates how knowledge of parallel computing can indirectly enrich your life. He was a wonderful person. The Metropolis algorithm is considered to be among the ten algorithms that have had the greatest influence on the development and practice of science and engineering in the twentieth century (Andrieu, de Freitas, Doucet, & Jordan, 2003; Beichel & Sullivan, 2000). For some applications, Markov chain Monte Carlo (MCMC) simulation is the only known general approach for providing a solution within a reasonable time (Andrieu, de Freitas, Doucet, & Jordan, 2003; Dyer, Frieze, & Kannan, 1991; Jerrum & Sinclair, 1996).

The large number of CUDA projects implementing Monte Carlo methods demonstrates the power of CUDA and interest in this technique. Speedups range from 100 times to over 1,000 times on multi-GPU implementations. A good starting reference is "Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures" (Suchard, Wang, Chan, Frelinger, Cron, & West, 2010). A number of Monte Carlo packages are available for free download:

■ The MCX (Monte Carlo eXtreme) package quotes 300- to 400-times speedups (Fang & Boas, 2009). The code can be freely downloaded from http://mcx.sourceforge.net.

■ GPUSS (GPU Stochastic Simulation for data analysis) is freely available from Oxford University. The authors report a speedup of 500 times on the NVIDIA Community Showcase.14 This package is accessible from PyCUDA for statistical GPU computing. Please see the Oxford website for an extensive list of publications that use this package.
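To make the basic idea of Monte Carlo sampling concrete before moving on, the following minimal sketch estimates π by sampling random points in the unit square and counting how many fall inside the quarter circle. It uses Thrust in the style of the earlier chapters; the sample count and per-sample seeding scheme are arbitrary choices for this illustration, not part of any package listed above.

#include <thrust/transform_reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <cstdio>

// Return 1 if a random point (x,y) in the unit square falls inside the
// quarter circle of radius 1, otherwise 0.
struct insideQuarterCircle {
  __host__ __device__
  int operator()(unsigned int tid) const {
    thrust::default_random_engine rng(tid * 2654435761u); // per-sample seed
    thrust::uniform_real_distribution<float> u01(0.f, 1.f);
    float x = u01(rng), y = u01(rng);
    return (x*x + y*y <= 1.f) ? 1 : 0;
  }
};

int main()
{
  const unsigned int nSamples = 10000000;
  int hits = thrust::transform_reduce(
      thrust::counting_iterator<unsigned int>(0),
      thrust::counting_iterator<unsigned int>(nSamples),
      insideQuarterCircle(), 0, thrust::plus<int>());
  printf("Monte Carlo estimate of pi: %f\n", 4.0*hits/nSamples);
  return 0;
}

The same transform_reduce pattern used throughout this book carries over directly: each sample is independent, so the GPU evaluates millions of them in parallel and a single reduction produces the estimate.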

MOLECULAR MODELING

Molecular modeling has also achieved significant performance benefits using CUDA (Eastman & Pande, 2010; Hampton, Agarwal, Alam, & Crozier, 2010; Roberts, Stone, Sepulveda, Hwu, & Luthey-Schulten, 2009; Rodrigues, Hardy, Stone, Schulten, & Hwu, 2008), including packages such as:

■ The NAMD/VMD molecular modeling system (Humphrey, Dalke, & Schulten, 1996; Laxmikant et al., 1999; Stone, Hardy, Ufimtsev, & Schulten, 2010).
■ HOOMD-Blue (Anderson, Lorenz, & Travesset, 2008), which was written from the ground up for GPUs.
■ The OpenMM library for molecular dynamics. The authors report speedups of 100 times over commodity processors.

All of these packages are freely downloadable from the Internet. The paper "GPU-Accelerated Molecular Modeling Coming of Age" (Stone, Hardy, Ufimtsev, & Schulten, 2010) is required reading for those who are interested in molecular modeling. CUDA and GPU computing have also stimulated the development of new approximations for electrostatics that can provide up to three orders of magnitude speedup (Anandakrishnan et al., 2010). Over the past few years, molecular modeling has been a vibrant and growing area of CUDA research.

QUANTUM CHEMISTRY

Quantum chemistry applies quantum mechanics to explain and predict the behavior of chemical processes. Understanding the electronic structure of matter is an important question in materials design and the creation of more efficient batteries, membranes, solar panels, and a multitude of common materials. Many approaches use approximate solutions to the Schrödinger equation for determining the energy levels of molecules and properties such as conductance, charge distribution, and reactivity. Other quantum chemical results include molecular geometry, the strengths and other characteristics of chemical bonds, optical and other spectra, intermolecular forces, and many other chemical properties and features of chemical behavior.

14 http://developer.nvidia.com/cuda-action-research-apps.

Quantum ESPRESSO is a freely downloadable integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft). A good starting article is "Speeding Up Plane-Wave Electronic-Structure Calculations Using Graphics-Processing Units" (Maintz, Eck, & Dronskowski, 2011). ICHEC has collaborated with the Quantum ESPRESSO project to create a GPU-based version of this code. Current speedups are reported at eight times that of a single-core processor. It is expected that this performance will increase as the GPU project matures.

GPU-enabled quantum chemistry packages such as TeraChem (Ufimtsev & Martinez, 2008, 2009a,b), now called PetaChem, are being deployed at major supercomputing centers such as the National Center for Supercomputing Applications (NCSA). A number of researchers and organizations report excellent speedups on quantum chemistry simulations (Hwu, 2011), including Gaussian and GAMESS (Ufimtsev & Martinez, 2008, 2009). Pseudospectral methods run well on both GPUs and multicore processors. One implementation is BigDFT (BigDFT) (Genovese et al., 2009).

INTERACTIVE WORKFLOWS

GPGPUs provide additional performance benefits for dedicated usage, visualization, and interactive workflows, as they have been designed to be plugged into computers situated next to an instrument or an individual's desk. The recent scientific literature demonstrates applications of this technology to a number of instruments and projects. For example, visualization packages such as Visual Molecular Dynamics (VMD) help make molecular modeling more interactive (Stone, Hardy, Ufimtsev, & Schulten, 2010) through the use of haptic feedback (Stone, Kohlmeyer, Vandivort, & Schulten, 2010). Researchers Bianchi and Di Leonardo report a 350-times speedup that allowed them to incorporate real-time hologram generation into an optical micromanipulation workflow (Bianchi & Di Leonardo, 2010).

A PLETHORA OF PROJECTS

The previous examples represent but a few of many research efforts reporting significant benefits from the application of GPGPU technology. Bioinformatics (Schmidt, 2010), systems biology (Dematte & Prandi, 2010), multicellular biological modeling (Christley, Lee, Dai, & Nie, 2010), and chemical and protein search (Haque, Pande, & Walters, 2010; Stivala, Stuckey, & Wirth,

2010) are but a few additional broad areas. Use of a good Internet search engine can help identify research projects specific to your interests. Many projects make their software freely available. As discussed in Chapter 8, several general-purpose libraries to facilitate scientific computation such as NVIDIA’s CUBLAS and CUFFT libraries along with the MAGMA (Matrix Algebra on GPU and Multicore Architectures) hybrid CPU/GPU library are also included in the SDK or available for download for free.

SUMMARY

The wealth of CUDA-based applications that have been and are being developed makes this chapter incomplete. The intention is to provide links to some (and certainly not all!) important concepts and projects. Most of the referenced projects have software that can be freely downloaded, built, and used. Learning the basics of CUDA can be done quickly. The process of maturation takes much longer and involves reading papers, examining source code, and talking with your peers.

The focus in this chapter has been on projects that can be freely downloaded from the Internet. Numerous for-sale projects are also out there. For example, AMBER is a commercial molecular dynamics application that can run on GPUs.15 Commercial drug discovery software also runs on GPUs, such as the OpenEye scientific software.16 Along with learning CUDA, such products show that there is work to be had in commercial CUDA/GPU development, as well as opportunities to start your own technology company.

15 http://ambermd.org/gpus/. 16 http://eyesopen.com.

CHAPTER 12

Application Focus on Live Streaming Video

CUDA lets developers write applications that can interact in real time with the user. This chapter modifies the source code from Chapter 9 to process and display live video streams. For those who do not have a webcam, live video can be imported from other machines via a TCP connection, or on-disk movie files can be used. Aside from being just plain fun, real-time video processing with CUDA opens the door to new markets in augmented reality, games, heads-up displays, face and other generic vision recognition tasks. The example code in this chapter teaches the basics of isolating faces from background clutter, edge detection with a Sobel filter, and morphing live video streams in three dimensions.

The teraflop computing capability of CUDA-enabled GPUs now gives everyone who can purchase a webcam and a gaming GPU the ability to write his or her own applications to interact with the computer visually. This includes teenagers, students, professors, and large research organizations around the world. With the machine-learning techniques discussed in this book, readers have the tools required to move far beyond cookbook implementations of existing vision algorithms. It is possible to think big, as the machine-learning techniques presented in this book scale from a single GPU to the largest supercomputers in the world. However, the real value in machine learning is how it can encapsulate the information from huge (potentially terabyte) data sets into small parameterized methods that can run on the smallest cellphones and tablets. At the end of this chapter, the reader will have a basic understanding of:

■ Managing real-time data with CUDA on Windows and UNIX computers
■ Face segmentation
■ Sobel edge detection


■ Morphing live video image data with 3D effects
■ How to create data sets from live streaming data for machine learning

TOPICS IN MACHINE VISION

Machine vision is a well-established field with a large body of published material. What is new is the computational power that GPUs bring to interactive visual computing. Recall that the first teraflop supercomputer became available to the computer science elite in December 1996. Now even inexpensive NVIDIA gaming GPUs can be programmed to deliver greater performance.

Commodity webcams and freely available software revolutionize people's ability to experiment with and develop products based on live video streams. Instead of being limited to research laboratories, students can perform homework assignments using live video feeds from their webcams or based on movie files provided by their teacher. Of course, this can happen between stimulating gaming sessions using the same hardware. It is likely that these same assignments can be performed on a cell phone in the near future, given the rapid pace of development of SoC (System on a Chip) technology such as the Tegra multicore chipsets for cell phones and tablets. For example, cell phones are already being used to track faces (Tresadern, Ionita, & Cootes, 2011).1

This chapter provides a framework for study, along with several example kernels that can be used as stepping stones to visual effects, vision recognition, and augmented reality, thus bringing the world of vision research to your personal computer or laptop. There are many resources on the Web, including a number of freely available computer vision projects, a few of which include:

■ OpenVIDIA: The OpenVIDIA package provides freely downloadable computer vision algorithms written in OpenGL, Cg, and CUDA-C.2 The CUDA Vision workbench is a part of OpenVIDIA that provides a Windows-based application containing many common image-processing routines in a framework convenient for interactive experimentation. Additional OpenVIDIA projects include Stereo Vision, Optical Flow, and Feature Tracking algorithms.
■ GPU4Vision: This is a project founded by the Institute for Computer Graphics and Vision, Graz University of Technology. They provide publicly available CUDA-based vision algorithms.

1 A video can be seen at http://personalpages.manchester.ac.uk/staff/philip.tresadern/proj_facerec.htm. 2 http://openvidia.sourceforge.net.

■ OpenCV: OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. The software is free for both academic and commercial use. OpenCV has C++, C, Python, and other interfaces running on Windows, Linux, Android, and Mac OS X. The library boasts more than 2,500 optimized algorithms and is used around the world for applications ranging from interactive art to mine inspection, stitching maps on the Web, and advanced robotics.

3D Effects

The sinusoidal demo kernel from Chapter 9 is easily modified to provide 3D effects on live data. This kernel stores the color information in the colorVBO variable. Defining the mesh to be of the same size and shape as the image allows the red, green, and blue color information in each pixel of the image to be assigned to each vertex. Any changes in the height, specified in vertexVBO, will distort the image in 3D space. Figure 12.1 shows a grayscale version of how one frame from a live stream is distorted when mapped onto the sinusoidal surface. Images produced by this example code are in full color.

FIGURE 12.1 An image morphed onto a 3D sinusoidal surface.
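The kernel sketch below illustrates the idea of combining image color with a height field. It assumes a float4 position VBO and a uchar4 color VBO mapped into CUDA (as in typical CUDA/OpenGL interop demos), an rgb24 image of the same width and height as the mesh, and a simple time-varying sine wave for the height; it is a hypothetical illustration of the mapping described above, not the book's testLive kernel.

// One thread per mesh vertex / image pixel. The image is tightly packed
// rgb24 (3 bytes per pixel); pos and color are the mapped OpenGL VBOs.
__global__ void imageOnSinusoid(float4 *pos, uchar4 *color,
                                const unsigned char *rgb24,
                                int width, int height, float time)
{
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if(x >= width || y >= height) return;

  int idx = y*width + x;

  // Copy the pixel color onto the corresponding mesh vertex.
  const unsigned char *pix = &rgb24[3*idx];
  color[idx] = make_uchar4(pix[0], pix[1], pix[2], 255);

  // Place the vertex on a time-varying sinusoidal surface over [-1,1] x [-1,1].
  float u = x/(float)width  * 2.f - 1.f;
  float v = y/(float)height * 2.f - 1.f;
  float w = 0.1f * sinf(u*10.f + time) * cosf(v*10.f + time);
  pos[idx] = make_float4(u, w, v, 1.f);
}

Setting w to zero instead of the sine product produces the flat image in a 3D coordinate system mentioned later in this chapter.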

Segmentation of Flesh-colored Regions

Isolating flesh-colored regions in real time from video data plays an important role in a wide range of image-processing applications including game controllers that use a video camera, face detection, face tracking, gesture analysis, sentiment analysis, and many other human-computer interaction domains. Skin color has proven to be a useful and robust cue to use as a first step in human computer processing systems (Kakumanu, Makrogiannis, & Bourbakis, 2007; Vezhnevets, Sazonov, & Andreeva, 2003). Marián Sedláček provides an excellent description of the steps and assumptions used to detect human faces by first isolating skin-colored pixels (Sedláček, 2004).

The red, green, and blue (RGB) colors that represent the color of each pixel obtained from a webcam or other video source are not an appropriate representation to identify skin color because the RGB values vary so much when presented with varying lighting conditions and various ethnicities' skin colors. Instead, vision researchers have found that chromatic colors, normalized RGB, or "pure" colors for human skin

tend to cluster tightly together regardless of skin color or luminance. Normalized pure colors—in which r, g, and b correspond to the red, green, and blue components of each pixel—are defined in Equation 12.1, "The pure red value of a pixel," and Equation 12.2, "The pure green value of a pixel":

    PureR = r / (r + g + b)    (12.1)

    PureG = g / (r + g + b)    (12.2)

FIGURE 12.2 Three faces isolated by skin-colored pixels.

The normalized blue color is redundant because PureR + PureG + PureB = 1. As Vezhnevets et al. note, "A remarkable property of this representation is that for matte surfaces, while ignoring ambient light, normalized RGB is invariant (under certain assumptions) to changes of surface orientation relative to the light source" (Vezhnevets, Sazonov, & Andreeva, 2003, p. 1). Though Sedláček notes that the region occupied by human skin in the normalized color space is an ellipse, the example code in this chapter utilizes a rectangular region to keep the example code small. Even so, this first step in the image processing pipeline works well, as shown in Figure 12.2. Consult Sedláček to see how additional steps can refine the segmentation process.3
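The following host-side helper is a minimal sketch of the same normalized-color test that the kernelSkin() kernel later in this chapter applies to every pixel; the function name and its bounds parameters are illustrative and are not part of the book's example code. It can be handy for spot-checking kernel results on the CPU.

#include <stdbool.h>

// Return true when a pixel's pure colors fall inside the rectangular
// skin-tone region [lowPureR, highPureR] x [lowPureG, highPureG].
// Bounds use the 0-255 scale employed by the CUDA kernel in Example 12.8.
static bool isSkinPixel(int r, int g, int b,
                        int lowPureR, int highPureR,
                        int lowPureG, int highPureG)
{
   int sum = r + g + b;
   if (sum == 0) return false;            // avoid dividing by zero on black pixels
   int pureR = (int)(255.0f * r / sum);   // Equation 12.1 scaled to 0-255
   int pureG = (int)(255.0f * g / sum);   // Equation 12.2 scaled to 0-255
   return (pureR > lowPureR) && (pureR < highPureR) &&
          (pureG > lowPureG) && (pureG < highPureG);
}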

Edge Detection

Finding edges is a fundamental problem in image processing, as edges define object boundaries and represent important structural properties in an image. There are many ways to perform edge detection. The Sobel method, or Sobel filter, is a gradient-based method that looks for strong changes in the first derivative of an image. The Sobel edge detector uses a pair of 3 × 3 convolution masks, one estimating the gradient in the x-direction and the other in the y-direction. This edge detector maps well to CUDA, as each thread can apply the 3 × 3 convolution masks to its pixel and adjoining pixels in the image. Tables 12.1 and 12.2 contain the values for the Sobel masks in each direction. The approximate magnitude of the gradient, G, can be calculated using:

|G| ≈ |Gx| + |Gy|

Figure 12.3 is a color-inverted grayscale image created using a Sobel filter.

3 http://www.cescg.org/CESCG-2004/papers/44_SedlacekMarian.pdf.

Table 12.1 Sobel Convolution Mask for the x-direction (Gx)
–1   0   +1
–2   0   +2
–1   0   +1

Table 12.2 Sobel Convolution Mask for the y-direction (Gy)
+1   +2   +1
 0    0    0
–1   –2   –1

FIGURE 12.3 An image demonstrating edge detection with a Sobel filter.
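The per-pixel work is small enough to verify by hand. The following CPU sketch computes the approximate gradient magnitude at one interior pixel of an 8-bit grayscale image stored row-major in img (the function and parameter names are illustrative); it mirrors the arithmetic of the CUDA kernel in Example 12.9, except that the kernel also scales each sum by 1/8 before combining them.

#include <stdlib.h>

// Approximate Sobel gradient magnitude |G| ~ |Gx| + |Gy| at interior pixel (x, y).
static unsigned char sobelMagnitude(const unsigned char *img,
                                    int width, int x, int y)
{
   static const int gx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
   static const int gy[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} };
   int sumx = 0, sumy = 0;
   for (int k = -1; k <= 1; k++)
      for (int l = -1; l <= 1; l++) {
         int p = img[(y + k) * width + (x + l)];
         sumx += gx[k + 1][l + 1] * p;
         sumy += gy[k + 1][l + 1] * p;
      }
   int mag = abs(sumx) + abs(sumy);
   return (unsigned char)(mag > 255 ? 255 : mag); // clamp to the 8-bit range
}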

FFmpeg

FFmpeg is a free, open source program that can record, convert, and stream digital audio and video in various formats. It has been described as the multimedia version of a Swiss Army knife because it can convert between most common formats. The name "FFmpeg" comes from the MPEG video standards group combined with "FF" for "fast forward." The source code can be freely downloaded from http://ffmpeg.org and compiled under most operating systems, computing platforms, and microprocessor instruction sets.
The FFmpeg command-line driven application, ffmpeg, is used in this chapter, as most readers will be able to download and use a working version for their system. As illustrated in Figure 12.4, ffmpeg is used to capture the output from a webcam or other video device and convert it to the rgb24 format, which represents each pixel in each image of the video with a separate red, green, and blue 8-bit value. The testLive example program reads the data stream from a TCP socket and writes it to a buffer on the GPU for use with CUDA. Figure 12.4 shows representative images of testLive morphing a video stream in 3D, a flat image displayed in a 3D coordinate system, segregation of faces from the background, and real-time edge detection with a Sobel filter.

FIGURE 12.4 Video data flow: a file or webcam feeds ffmpeg, which sends converted frames over a TCP connection to the example code.

The benefit of using the ffmpeg application is that no programming is required to convert the video data that is sent to the GPU. The disadvantage is that the conversion from a compressed format to rgb24 greatly expands the amount of data sent over the TCP socket. For example, a 640 × 480 rgb24 stream at 25 frames per second requires roughly 23 MB/s (640 × 480 pixels × 3 bytes × 25 frames), far more than the corresponding compressed stream. For this reason, it is recommended that the conversion to rgb24 happen on the system containing the GPU. A more efficient implementation would use the FFmpeg libraries to directly read the compressed video stream and convert it to rgb24 within the example source code. In this way, the transport of large amounts of rgb24 data over a socket would be avoided. Those who need to stream data from a remote webcam should:

■ Modify the example to read and perform the video conversion inside the application.
■ Use a TCP relay program to read the compressed video stream from the remote site and pass it to an instance of ffmpeg running on the system that contains the GPU. One option is socat (SOcket CAT), a freely downloadable general-purpose relay application that runs on most operating systems (a minimal relay sketch appears after Example 12.3).

The following bash scripts demonstrate how to write a 640 × 480 stream of rgb24 images to port 32000 on localhost. Localhost is the name of the loopback device used by applications that need to communicate with each other on the same computer. The loopback device acts as an Ethernet interface that does not transmit information over a physical network device. Note that the ffmpeg command-line interface is evolving over time, so these scripts might need to be modified at a later date.

■ Stream from a Linux webcam (Example 12.1, “Linux Webcam Capture”):

Example 12.1
FFMPEG=ffmpeg
DEVICE="-f video4linux2 -i /dev/video0"
PIX="-f rawvideo -pix_fmt rgb24"
SIZE="-s 640x480"

$FFMPEG -an -r 50 $SIZE $DEVICE $PIX tcp:localhost:32000

■ Stream from a Windows webcam (Example 12.2, “Windows Webcam Capture”):

Example 12.2
../ffmpeg-git-39dbe9b-win32-static/bin/ffmpeg.exe -r 25 -f vfwcap -i 0 -f rawvideo -pix_fmt rgb24 -s 640x480 tcp:localhost:32000

■ Stream from file (Example 12.3, “Stream from File”):

Example 12.3
FFMPEG=ffmpeg
# can be any file name or format: myfile.avi, myfile.3gp, …
FILE="myfile.mpeg"
PIX="-f rawvideo -pix_fmt rgb24"
SIZE="-s 640x480"

$FFMPEG -i $FILE $SIZE $PIX tcp:localhost:32000

Streaming from a file will run at the fastest rate that ffmpeg can convert the frames, which will likely be much faster than real-time.
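For the remote-webcam case mentioned earlier, the following is one possible socat-based sketch; the listening port (5000), the host name gpuhost, and the choice of MPEG-TS as the compressed container are assumptions for illustration and are not part of the book's example code.

# On the remote machine: send a compressed stream to the GPU host.
ffmpeg -f video4linux2 -i /dev/video0 -f mpegts tcp:gpuhost:5000

# On the GPU host: relay the compressed stream into a local ffmpeg that
# converts it to rgb24 and forwards it to the testLive server on port 32000.
socat TCP-LISTEN:5000,reuseaddr - | \
  ffmpeg -i - -f rawvideo -pix_fmt rgb24 -s 640x480 tcp:localhost:32000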

TCP SERVER

The following code implements a simple, asynchronous TCP server that:

■ Listens on a port specified in the variable port
■ Accepts a client when one attempts to connect

■ Reads datasize bytes of data into an array when the client sends data to this server
■ Shuts down the connection when the client dies or there is a socket error and waits for another client connection

Any program can provide the rgb24 data, but it is assumed in this example that ffmpeg will supply the data.
This server is asynchronous, meaning that it uses the select() call to determine whether there is a client waiting to connect or if there is data to be read. If select() times out, then control returns to the calling routine, which can perform additional work such as animating images on the screen, changing colors, or other visual effects. This process is discussed in more detail in the rest of this section and is illustrated in Figure 12.5. The variable timeout defines how long select() will spend waiting for data to arrive on a socket. The default value is 100 microseconds, which was chosen to balance the amount of CPU time consumed by the application program versus the number of iterations of the graphics pipeline per frame of real-time data. More information on socket programming can be found on the Internet or in excellent books such as Advanced Programming in the UNIX Environment (Stevens, 2005).

FIGURE 12.5 Timeline of asynchronous loads and the kernel pipeline: runCuda() (in simpleVBO.cpp) calls tcpRead() and, when a complete frame has arrived, copies the host data to liveData on the GPU; control returns when the read completes, times out, or no client is connected. runCuda() then copies liveData to the color buffer and calls launch_kernel() (in kernelVBO.cpp), which runs kernelWave() if doWave is set (otherwise kernelFlat()), followed by kernelSkin() if doSkin is set, kernelSobel() if doSobel is set, and any other kernels or animations, before the new image is drawn on the screen.

The first part of this file, tcpserver.cpp, specifies the necessary include files and variables. The initTCPserver() method binds a port that will be used to listen for client connections. It was designed to be added to the initCuda() method in simpleVBO.cpp. See Example 12.4, “Part 1 of tcpserver.cpp”:

Example 12.4
// socket, select(), and I/O headers
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <limits.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>

int listenfd, connfd=0;
struct sockaddr_in servaddr,cliaddr;
socklen_t clilen;
struct timeval timeout = {0,100};

void initTCPserver(int port)
{
  listenfd=socket(AF_INET,SOCK_STREAM,0);

  bzero(&servaddr,sizeof(servaddr));
  servaddr.sin_family = AF_INET;
  servaddr.sin_addr.s_addr=htonl(INADDR_ANY);
  servaddr.sin_port=htons(port); // bind the requested port
  bind(listenfd,(struct sockaddr *)&servaddr,sizeof(servaddr));
}

The tcpRead() method asynchronously connects to one client at a time or performs a blocking read of datasize bytes of data when available. If no client or client data appears before select() times out, then control returns to the calling method. The debugging fprintf() statements were kept to help in understanding this code. If desired, the fprintf() statements can be uncommented to show how this code works in practice. It is recommended that the timeout should be greatly increased if the “read timeout” statement is uncommented. See Example 12.5, “Part 2 of tcpserver.cpp”:

Example 12.5
int tcpRead(char *data, int datasize)
{
  int n;
  fd_set dataReady;

if(!connfd) { // there is no client: listen until timeout or accept FD_ZERO(&dataReady); FD_SET(listenfd,&dataReady); if(select(listenfd+1, &dataReady, NULL,NULL, &timeout) == -1) { fprintf(stderr,"listen select failed!\n"); exit(1); } listen(listenfd,1); // listen for one connection at a time

clilen=sizeof(cliaddr); if(FD_ISSET(listenfd, &dataReady)) { fprintf(stderr,"accepting a client!\n"); connfd = accept(listenfd,(struct sockaddr *)&cliaddr,&clilen); } else { //fprintf(stderr,"no client!\n"); return(0); // no client so no work } }

if(!connfd) return(0);

// read the data FD_ZERO(&dataReady); FD_SET(connfd,&dataReady); if(select(connfd+1, &dataReady, NULL,NULL, &timeout) == -1) { fprintf(stderr,"data select failed!\n"); exit(1); }

if(FD_ISSET(connfd, &dataReady)) {
    FD_CLR(connfd, &dataReady);
    for(n=0; n < datasize;) {
      int size = ((datasize-n) > SSIZE_MAX)?SSIZE_MAX:(datasize-n);
      int ret = read(connfd, data+n, size);
      if(ret <= 0) break; // error
      n += ret;
    }
    if(n < datasize) {
      fprintf(stderr,"Incomplete read %d bytes %d\n", n, datasize);
      perror("Read failure!");
      close(connfd); connfd=0;
      return(0);
    }
    return(1);
  } else {
    //fprintf(stderr, "read timeout\n");
  }
  return(0);
}

LIVE STREAM APPLICATION

The live stream application queues CUDA kernels to perform image analysis and animated effects. Which kernels get queued depends on user input. As can be seen in Figure 12.5, new data will be transferred across the PCIe bus to the liveData buffer on the GPU only when tcpRead() returns a nonzero result, indicating a new frame of data has been read. As discussed previously, tcpRead() can return zero as a result of a timeout, in which case no new data needs to be transferred.
Because the kernel pipeline can modify the contents of colorVBO, the original video data is copied from liveData prior to the start of the pipeline. The copy from liveData to colorVBO is very fast because it moves data only within the GPU. The launch_kernel() method is then called to queue the various kernels depending on the flags set by user keystrokes in the keyboard callback.
The timeout from tcpRead() means that the launch_kernel() pipeline can be called many times between reads of new images from the live stream. Animated kernels such as kernelWave(), which produces the animated wave effect, depend on this asynchronous behavior to create their visual effects. Though not used in this example, the animTime variable can be used to control the speed and behavior of the animations.

kernelWave(): An Animated Kernel

The kernelWave() method implements the animated sinusoidal 3D surface discussed in Chapter 9. In this kernel (Example 12.6), each pixel in the image is considered as the color of a vertex in a 3D mesh. These vertex colors are held in colorVBO for the current frame in the live image. Rendering with triangles generates a smoothly colored surface, regardless of how the image is resized. For demonstration purposes, the surface can also be rendered with lines or dots depending on the user's input via the keyboard.

Example 12.6 // live stream kernel (Rob Farber) // Simple kernel to modify vertex positions in sine wave pattern __global__ void kernelWave(float4* pos, uchar4 *colorPos, unsigned int width, unsigned int height, float time) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

// calculate uv coordinates
  float u=x/(float) width;

float v=y/(float) height; u = u*2.0f - 1.0f; v = v*2.0f - 1.0f; // calculate simple sine wave pattern float freq = 4.0f; float w = sinf(u*freq + time) * cosf(v*freq + time) * 0.5f; // write output vertex pos[y*width+x] = make_float4(u, w, v, 1.0f); }

kernelFlat(): Render the Image on a Flat Surface

kernelFlat() simply specifies a flat surface for rendering. This allows viewing of other rendering effects without the distraction of the animated sinusoidal surface. See Example 12.7, “Part 2 of kernelVBO.cu”:

Example 12.7 __global__ void kernelFlat(float4* pos, uchar4 *colorPos, unsigned int width, unsigned int height) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y; // calculate uv coordinates float u=x/(float) width; float v=y/(float) height; u = u*2.0f - 1.0f; v = v*2.0f - 1.0f; // write output vertex pos[y*width+x] = make_float4(u, 1.f, v, 1.0f); }

kernelSkin(): Keep Only Flesh-colored Regions

This kernel simply calculates the normalized red (PureR) and green (PureG) values for each pixel in the image. The original RGB colors are preserved if these values fall within the rectangle of flesh-colored tones. Otherwise, they are set to black. See Example 12.8, “Part 3 of kernelVBO.cu”:

Example 12.8
__global__ void kernelSkin(float4* pos, uchar4 *colorPos,
                           unsigned int width, unsigned int height,
                           int lowPureG, int highPureG,
                           int lowPureR, int highPureR)

{ unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y; int r = colorPos[y*width+x].x; int g = colorPos[y*width+x].y; int b = colorPos[y*width+x].z; int pureR = 255*( ((float)r)/(r+g+b)); int pureG = 255*( ((float)g)/(r+g+b)); if( !( (pureG > lowPureG) && (pureG < highPureG) && (pureR > lowPureR) && (pureR < highPureR) ) ) colorPos[y*width+x] = make_uchar4(0,0,0,0); }
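As noted in the discussion of segmentation, Sedláček models the skin region as an ellipse in the pure-color plane rather than a rectangle. A device function along the following lines could replace the rectangular test in kernelSkin(); the ellipse center and semi-axes shown here are simply derived from the default rectangle in callbacksVBO.cpp and would need to be tuned to real data.

// Hypothetical elliptical membership test in (pureR, pureG) space.
// The center and semi-axes are placeholders on the 0-255 scale.
__device__ bool insideSkinEllipse(int pureR, int pureG)
{
  const float centerR = 128.5f, centerG = 75.5f; // midpoints of the default PureR/PureG ranges
  const float axisR   = 16.5f,  axisG   = 13.5f; // half-widths of those ranges
  float dr = (pureR - centerR)/axisR;
  float dg = (pureG - centerG)/axisG;
  return (dr*dr + dg*dg) <= 1.0f;                // standard ellipse equation
}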

kernelSobel(): A Simple Sobel Edge Detection Filter

The kernel in Example 12.9, “Part 4 of kernelVBO.cu,” simply applies a Sobel edge detection filter as discussed at the beginning of this chapter:

Example 12.9 __device__ unsigned char gray(const uchar4 &pix) { // convert to 8-bit grayscale return( .3f * pix.x + 0.59f * pix.y + 0.11f * pix.z); } __global__ void kernelSobel(float4 *pos, uchar4 *colorPos, uchar4 *newPix, unsigned int width, unsigned int height) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

const int sobelv[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1}}; const int sobelh[3][3] = { {-1,0,1},{-2,0,2},{-1,0,1}};

int sumh=0, sumv=0;
  if( (x > 1) && x < (width-1) && (y > 1) && y < (height-1)) {
    for(int l= -1; l < 2; l++) {
      for(int k= -1; k < 2; k++) {
        register int g = gray(colorPos[(y+k)*width+x+l]);
        sumh += sobelh[k+1][l+1] * g;
        sumv += sobelv[k+1][l+1] * g;
      }
    }
    unsigned char p = abs(sumh/8)+ abs(sumv/8);
    newPix[y*width+x] = make_uchar4(0,p,p,p);

} else { newPix[y*width+x] = make_uchar4(0,0,0,0); } }

The launch_kernel() Method

As illustrated in Figure 12.5, the launch_kernel() method queues kernel calls to perform the various transformations requested by the user. This method was written for clarity and not for flexibility. For example, the use of a static pointer to newPix is correct but not necessarily good coding practice. It was kept in this example code to demonstrate how the reader can allocate scratch space for his or her own methods (an alternative sketch follows Example 12.10). Once the Sobel filter completes, the scratch data is copied via a fast device-to-device transfer into the colorVBO. See Example 12.10, “Part 5 of kernelVBO.cu”:

Example 12.10 extern int PureR[2], PureG[2], doWave, doSkin,doSobel; // Wrapper for the __global__ call that sets up the kernel call extern "C" void launch_kernel(float4* pos, uchar4* colorPos, unsigned int mesh_width, unsigned int mesh_height,float time) { // execute the kernel dim3 block(8, 8, 1); dim3 grid(mesh_width / block.x, mesh_height / block.y, 1); if(doWave) kernelWave<<< grid, block>>>(pos, colorPos, mesh_width, mesh_height, time); else kernelFlat<<< grid, block>>>(pos, colorPos, mesh_width, mesh_height);

if(doSkin)
    kernelSkin<<< grid, block>>>(pos, colorPos, mesh_width, mesh_height,
                                 PureG[0], PureG[1], PureR[0], PureR[1]);
  if(doSobel) {
    static uchar4 *newPix=NULL;
    if(!newPix)
      cudaMalloc(&newPix, sizeof(uchar4)*mesh_width*mesh_height);

kernelSobel<<< grid, block>>>(pos, colorPos, newPix, mesh_width, mesh_height); cudaMemcpy(colorPos, newPix, sizeof(uchar4)*mesh_width *mesh_height, cudaMemcpyDeviceToDevice); } }
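One alternative to the static pointer, sketched below under the assumption that a global sobelScratch pointer (a name invented for this sketch) is managed the same way as liveData, is to allocate the scratch buffer in initCuda() and release it in cleanupCuda():

#include <cuda_runtime.h>

static uchar4 *sobelScratch = NULL; // assumed global scratch buffer for kernelSobel()

// Called from initCuda() once the mesh size is known.
void allocSobelScratch(unsigned int mesh_width, unsigned int mesh_height)
{
  cudaMalloc((void**)&sobelScratch, sizeof(uchar4)*mesh_width*mesh_height);
}

// Called from cleanupCuda() so the buffer is released on program exit.
void freeSobelScratch()
{
  if(sobelScratch) { cudaFree(sobelScratch); sobelScratch = NULL; }
}

launch_kernel() would then pass sobelScratch to kernelSobel() instead of the locally defined newPix pointer.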

THE simpleVBO.cpp FILE

Only minor changes were made to the simpleVBO.cpp file from Chapter 9, but these changes occur in several of the methods. To minimize confusion, the entire file is included at the end of this chapter with the changes highlighted. These changes include:

■ The definition of the C-preprocessor variables MESH_WIDTH and MESH_HEIGHT. These variables should be used at compile time to match the size of the mesh with the image width and height from ffmpeg, as shown in the build-line sketch after this list. By default, the mesh size is set to 640 × 480 pixels, which corresponds to the size specified in the ffmpeg command-line scripts at the beginning of this chapter.
■ A buffer on the GPU, liveData, is created to contain the current frame of video data. By convention, this data is not modified by the GPU between frame loads from the host.
■ The tcpRead() method is called. As can be seen in the code in the runCuda() method later in this chapter, tcpRead() has new data that needs to be transferred to the GPU when a nonzero value is returned. The runCuda() method then initializes the graphics pipeline by copying the video data to colorVBO.
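For example, a small change to the build line defines the mesh dimensions on the nvcc command line so that they match whatever -s size is passed to ffmpeg; the 1280 × 720 size below is only an illustration, and the remaining arguments are the same as in Example 12.12:

# Build testLive for a 1280 x 720 stream instead of the default 640 x 480.
nvcc -DMESH_WIDTH=1280 -DMESH_HEIGHT=720 -arch=sm_20 -O3 ... -o testLive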

THE callbacksVBO.cpp FILE

Only minor changes were made to the callbacksVBO.cpp file from Chapter 9; a few new variables needed to be defined. Most of the changes occur in the keyboard callback method keyboard(). Table 12.3 shows the valid keystrokes.

Table 12.3 Keyboard Commands for testLive
Key    Action
Q      Quit the application
Esc    Quit the application
D      Toggle draw mode
d      Toggle draw mode
S      Slow down rendering by 100 microseconds per frame
s      Speed up rendering by 100 microseconds per frame
z      Toggle the 3D sinusoidal animation (on) or make the surface flat (off)
c      Toggle Sobel filtering
x      Toggle segmentation of flesh-colored regions
R      Increase the PureR upper boundary
r      Decrease the PureR upper boundary
E      Increase the PureR lower boundary
e      Decrease the PureR lower boundary
G      Increase the PureG upper boundary
g      Decrease the PureG upper boundary
F      Increase the PureG lower boundary
f      Decrease the PureG lower boundary

As mentioned in the discussion on segmentation, a rectangular region is used in the kernelSkin() method to identify human skin colors. The actual region occupied by human skin colors is an ellipse in the pure color space. A rectangular region was used because it kept the kernelSkin() method small, but with an admitted loss of fidelity. The use of a rectangle does allow the user to change the size of the rectangle with a few keystrokes to explore the effect of filtering in the color space and to see the effects of different environments and lighting conditions.
Example 12.11, “callbacksVBO.cpp,” is the modified version of the Chapter 9 callbacksVBO.cpp file:

Example 12.11
// OpenGL/GLUT and standard headers used by the callbacks
#include <GL/glew.h>
#include <GL/glut.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// The user must create the following routines: void initCuda(int argc, char** argv); void runCuda(); void renderCuda(int);

// Callback variables
extern float animTime;
extern int sleepTime, sleepInc;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
int mouse_old_x, mouse_old_y;

int mouse_buttons = 0; float rotate_x = 0.0, rotate_y = 0.0; float translate_z = -3.0;

// some initial values for face segmentation int PureG[2]={62,89}, PureR[2]={112,145}; int doWave=1, doSkin=0, doSobel=0;

// GLUT callbacks display, keyboard, mouse void display() { glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

// set view matrix glMatrixMode(GL_MODELVIEW); glLoadIdentity(); glTranslatef(0.0, 0.0, translate_z); glRotatef(rotate_x, 1.0, 0.0, 0.0); glRotatef(rotate_y, 0.0, 1.0, 0.0);

runCuda(); // run CUDA kernel to generate vertex positions

renderCuda(drawMode); // render the data

glutSwapBuffers(); glutPostRedisplay();

// slow the rendering when the GPU is too fast.
  if(sleepTime) usleep(sleepTime);
  animTime += 0.01;
}
void keyboard(unsigned char key, int x, int y)
{
  switch(key) {
  case('q') : case(27) : // exit
    exit(0);
    break;
  case 'd': case 'D': // Drawmode
    switch(drawMode) {
    case GL_POINTS: drawMode = GL_LINE_STRIP; break;
    case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; break;
    default: drawMode=GL_POINTS;
    } break;
  case 'S': // Slow the simulation down
    sleepTime += sleepInc;
    break;
  case 's': // Speed the simulation up
    sleepTime = (sleepTime > 0)?sleepTime -= sleepInc:0;
    break;

case 'z': doWave = (doWave > 0)?0:1; break;
  case 'x': doSkin = (doSkin > 0)?0:1; break;
  case 'c': doSobel = (doSobel > 0)?0:1; break;
  case 'R': PureR[1]++; if(PureR[1] > 255) PureR[1]=255; break;
  case 'r': PureR[1]--; if(PureR[1] <= PureR[0]) PureR[1]++; break;
  case 'E': PureR[0]++; if(PureR[0] >= PureR[1]) PureR[0]--; break;
  case 'e': PureR[0]--; if(PureR[0] <= 0 ) PureR[0]=0; break;
  case 'G': PureG[1]++; if(PureG[1] > 255) PureG[1]=255; break;
  case 'g': PureG[1]--; if(PureG[1] <= PureG[0]) PureG[1]++; break;
  case 'F': PureG[0]++; if(PureG[0] >= PureG[1]) PureG[0]--; break;
  case 'f': PureG[0]--; if(PureG[0] <= 0 ) PureG[0]=0; break;
  }
  fprintf(stderr,"PureG[0] %d PureG[1] %d PureR[0] %d PureR[1] %d\n",
          PureG[0],PureG[1],PureR[0],PureR[1]);
  glutPostRedisplay();
}

void mouse(int button, int state, int x, int y)
{
  if (state == GLUT_DOWN) {
    mouse_buttons |= 1<<button;  // record which button is pressed
  } else if (state == GLUT_UP) {
    mouse_buttons = 0;           // clear the button state on release
  }

mouse_old_x = x; mouse_old_y = y; glutPostRedisplay(); }

void motion(int x, int y) { float dx, dy; dx = x - mouse_old_x; dy = y - mouse_old_y;

if (mouse_buttons & 1) { rotate_x += dy * 0.2; rotate_y += dx * 0.2; } else if (mouse_buttons & 4) { translate_z += dy * 0.01; } rotate_x = (rotate_x < -60.)?-60.:(rotate_x > 60.)?60:rotate_x; rotate_y = (rotate_y < -60.)?-60.:(rotate_y > 60.)?60:rotate_y;

mouse_old_x = x;
  mouse_old_y = y;
}

BUILDING AND RUNNING THE CODE

The testLive application is built in the same fashion as the examples in Chapter 9. Example 12.12, “A Script to Build testLive,” is the bash build script under Linux:

Example 12.12
#!/bin/bash
DIR=livestream
SDK_PATH=…/cuda/4.0
SDK_LIB0=$SDK_PATH/C/lib
SDK_LIB1=…/4.0/CUDALibraries/common/lib/linux
echo $SDK_PATH
nvcc -arch=sm_20 -O3 -L $SDK_LIB0 -L $SDK_LIB1 -I $SDK_PATH/C/common/inc \
  simpleGLmain.cpp simpleVBO.cpp $DIR/callbacksVBO.cpp $DIR/kernelVBO.cu \
  tcpserver.cpp -lglut -lGLEW_x86_64 -lGLU -lcutil_x86_64 -o testLive

Running the testLive application is straightforward:
1. Start the testLive executable in one window. The visualization window will appear on your screen.
2. In a second window, run the script that will send video data to port 32000 on localhost. Example scripts were provided at the beginning of this chapter for Linux and Windows webcams and for converting a video file and sending it to the testLive example.

THE FUTURE

The three examples in this chapter demonstrate real-time image processing on CUDA. However, they are but a starting point for further experimentation and research.

Machine Learning

A challenge that remains after image segmentation is identification. Simple but valuable forms of identification can be coded as rules. For example, a game developer can look for two small flesh-colored blobs (hands) that are near a large flesh-colored blob (a face). Height differences between the hands can be used to change the angle of a simulated aircraft to make it bank and turn. Similarly, the distance between the hands can control an action such as zooming into or out of an image.
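As a concrete sketch of such a rule, the bank angle can be derived directly from the vertical offset between the two hand blobs; the structure and sensitivity constant below are invented for illustration and are not part of the example code.

typedef struct { float x, y; } blobCentroid_t; // hypothetical blob centroid in image coordinates

// Map the height difference between the hands to a bank angle in degrees.
// kDegreesPerPixel is an arbitrary sensitivity constant chosen by the developer.
float bankAngle(blobCentroid_t leftHand, blobCentroid_t rightHand)
{
  const float kDegreesPerPixel = 0.25f;
  return kDegreesPerPixel * (leftHand.y - rightHand.y);
}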

Many other tasks are not as easy to code as rules or logic in a CUDA kernel. For example, people can easily identify faces in a picture. In these cases, the machine-learning techniques presented in Chapters 2 and 3 can “learn” the algorithm for the developer using the GPU “supercomputer.” Examples abound in the literature on the use of machine learning to segment images using skin-colored pixels under general lighting conditions (Kakumanu, Makrogiannis, & Bourbakis, 2007), faces (Mitchell, 1997), and other objects. Hinton notes that machine-learning techniques like transforming autoencoders offer a better alternative than rules-based systems (Hinton, Krizhevesky, & Want, 2011). With a webcam, readers can collect their own images that can be edited and sorted into “yes” and “no” data sets to train a classifier. Both the literature and the Internet contain many detailed studies about how to use such data sets to address many problems in vision recognition.

The Connectome

The Harvard Connectome (http://cbs.fas.harvard.edu) is a project that combines GPGPU computational ability with some excellent robotic technology in an effort that will create 3D wiring diagrams of the brains of various model animals such as the cat and lab mouse. This project exemplifies early recognition of the potential inherent in GPU technology. Data from this project might prove to be the basis for seminal and disruptive scientific research. Instead of guessing, vision researchers and neurologists will finally have a Galilean first opportunity to see, study, and model cortical networks using extraordinarily detailed data created by the Connectome project.
In a 2009 paper, “The Cat Is Out of the Bag: Cortical Simulations with 10^9 Neurons, 10^13 Synapses,” researchers demonstrated that they were able to effectively model cortical networks at an unprecedented scale. In other words, these scientists feel they demonstrated sufficient computational capability to model the entire cat brain:

The simulations, which incorporate phenomenological spiking neurons, individual learning synapses, axonal delays, and dynamic synaptic channels, exceed the scale of the cat cortex, marking the dawn of a new era in the scale of cortical simulations. (Ananthanarayanan, Esser, Simon, & Modha, 2009, p. 1)

It is clear that massively parallel computational technology will open the door to detailed vision models that will allow neural research to unravel the mystery of how nature processes vision, audio, and potentially memory and thought in the brain. GPUs provide sufficient computational capability that even individuals and small research organizations have the ability to contribute to these efforts, if in no other aspect than to train the people who will make the discoveries.

SUMMARY

CUDA is an enabling technology. As always, the power of the technology resides in the individual who uses it to solve problems. CUDA made it possible to provide working examples in the limited pages of this book to teach a range of techniques starting with the simple program in Chapter 1 that filled a vector in parallel with sequential integers, to massively parallel mappings for machine learning, and finally to the example in this chapter that performs real-time video processing. It is hoped that these examples have taught you how to use the simplicity and power of CUDA to express your thoughts concisely and clearly in software that can run on massively parallel hardware. Coupled with scalability and the teraflop floating-point capability provided by even budget GPUs, CUDA-literate programmers can leverage the power of their minds and GPU technology to do wonderful things in the world.

LISTING FOR simpleVBO.cpp

//simpleVBO (Rob Farber)
// OpenGL, CUDA runtime, and CUDA SDK utility headers
#include <GL/glew.h>
#include <GL/glut.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cutil_gl_inline.h>
#include <stdio.h>
#include <stdlib.h>

extern float animTime;

//////////////////////////////////////////////////////////////////
// VBO specific code
#include <cuda_gl_interop.h>

#ifndef MESH_WIDTH #define MESH_WIDTH 640 #endif #ifndef MESH_HEIGHT #define MESH_HEIGHT 480 #endif // constants const unsigned int mesh_width = MESH_WIDTH; const unsigned int mesh_height = MESH_HEIGHT; const unsigned int RestartIndex = 0xffffffff;

typedef struct {
  GLuint vbo;
  GLuint typeSize;

struct cudaGraphicsResource *cudaResource; } mappedBuffer_t;

extern "C" void launch_kernel(float4* pos, uchar4* posColor, unsigned int mesh_width, unsigned int mesh_height, float time);

// vbo variables mappedBuffer_t vertexVBO = {NULL, sizeof(float4), NULL}; mappedBuffer_t colorVBO = {NULL, sizeof(uchar4), NULL}; GLuint* qIndices=NULL; // index values for primitive restart int qIndexSize=0; uchar4 *liveData;

////////////////////////////////////////////////////////////////// //! Create VBO ////////////////////////////////////////////////////////////////// void createVBO(mappedBuffer_t* mbuf) { // create buffer object glGenBuffers(1, &(mbuf->vbo) ); glBindBuffer(GL_ARRAY_BUFFER, mbuf->vbo);

// initialize buffer object unsigned int size = mesh_width * mesh_height * mbuf->typeSize; glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);

glBindBuffer(GL_ARRAY_BUFFER, 0);

cudaGraphicsGLRegisterBuffer( &(mbuf->cudaResource), mbuf->vbo, cudaGraphicsMapFlagsNone ); }

///////////////////////////////////////////////////////////////// //! Delete VBO ///////////////////////////////////////////////////////////////// void deleteVBO(mappedBuffer_t* mbuf) { glBindBuffer(1, mbuf->vbo ); glDeleteBuffers(1, &(mbuf->vbo) );

cudaGraphicsUnregisterResource( mbuf->cudaResource ); mbuf->cudaResource = NULL; mbuf->vbo = NULL; }

void cleanupCuda()
{
  if(qIndices) free(qIndices);
  deleteVBO(&vertexVBO);

deleteVBO(&colorVBO);
  if(liveData) cudaFree(liveData); // liveData was allocated with cudaMalloc()
}

///////////////////////// // Add the tcp info needed for the live stream ///////////////////////////////////////////////////////////////// #define TCP_PORT 32000 typedef struct { unsigned char r; unsigned char g; unsigned char b; } rgb24_t; uchar4* argbData; int argbDataSize; extern int tcpRead(char*, int); void initTCPserver(int);

///////////////////////////////////////////////////////////////// //! Run the Cuda part of the computation ///////////////////////////////////////////////////////////////// void runCuda() { // map OpenGL buffer object for writing from CUDA float4 *dptr; uchar4 *cptr; uint *iptr; size_t start; cudaGraphicsMapResources( 1, &vertexVBO.cudaResource, NULL ); cudaGraphicsResourceGetMappedPointer( ( void ** )&dptr, &start, vertexVBO.cudaResource ); cudaGraphicsMapResources( 1, &colorVBO.cudaResource, NULL ); cudaGraphicsResourceGetMappedPointer( ( void ** )&cptr, &start, colorVBO.cudaResource );

rgb24_t data[mesh_width*mesh_height];
  if(tcpRead((char*)data, mesh_width*mesh_height*sizeof(rgb24_t) )) {
    // have data
#pragma omp parallel for
    for(int i=0; i < mesh_width*mesh_height; i++) {
      argbData[i].w=0;
      argbData[i].x=data[i].r;
      argbData[i].y=data[i].g;
      argbData[i].z=data[i].b;
    }
    cudaMemcpy(liveData, argbData, argbDataSize, cudaMemcpyHostToDevice);

}

//copy the GPU-side buffer to the colorVBO
  cudaMemcpy(cptr, liveData, argbDataSize, cudaMemcpyDeviceToDevice);

// execute the kernel launch_kernel(dptr, cptr, mesh_width, mesh_height, animTime);

// unmap buffer object cudaGraphicsUnmapResources( 1, &vertexVBO.cudaResource, NULL ); cudaGraphicsUnmapResources( 1, &colorVBO.cudaResource, NULL ); }

void initCuda(int argc, char** argv) { // First initialize OpenGL context, so we can properly set the GL // for CUDA. NVIDIA notes this is necessary in order to achieve // optimal performance with OpenGL/CUDA interop. use command-line // specified CUDA device, otherwise use device with highest Gflops/s if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") ) { cutilGLDeviceInit(argc, argv); } else { cudaGLSetGLDevice( cutGetMaxGflopsDeviceId() ); }

createVBO(&vertexVBO); createVBO(&colorVBO);

// allocate and assign trianglefan indicies qIndexSize = 5*(mesh_height-1)*(mesh_width-1); qIndices = (GLuint *) malloc(qIndexSize*sizeof(GLint)); int index=0; for(int i=1; i < mesh_height; i++) { for(int j=1; j < mesh_width; j++) { qIndices[index++] = (i)*mesh_width + j; qIndices[index++] = (i)*mesh_width + j-1; qIndices[index++] = (i-1)*mesh_width + j-1; qIndices[index++] = (i-1)*mesh_width + j; qIndices[index++] = RestartIndex; } }

cudaMalloc((void**)&liveData,sizeof(uchar4) *mesh_width*mesh_height);

// make certain the VBO gets cleaned up on program exit atexit(cleanupCuda);

// setup TCP context for live stream
  argbDataSize = sizeof(uchar4)*mesh_width*mesh_height;

cudaHostAlloc((void**) &argbData, argbDataSize, cudaHostAllocPortable);
  initTCPserver(TCP_PORT);

runCuda(); } void renderCuda(int drawMode) { glBindBuffer(GL_ARRAY_BUFFER, vertexVBO.vbo); glVertexPointer(4, GL_FLOAT, 0, 0); glEnableClientState(GL_VERTEX_ARRAY);

glBindBuffer(GL_ARRAY_BUFFER, colorVBO.vbo); glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0); glEnableClientState(GL_COLOR_ARRAY);

switch(drawMode) { case GL_LINE_STRIP: for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width) glDrawArrays(GL_LINE_STRIP, i, mesh_width); break; case GL_TRIANGLE_FAN: { glPrimitiveRestartIndexNV(RestartIndex); glEnableClientState(GL_PRIMITIVE_RESTART_NV); glDrawElements(GL_TRIANGLE_FAN, qIndexSize, GL_UNSIGNED_INT, qIndices); glDisableClientState(GL_PRIMITIVE_RESTART_NV); } break; default: glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height); break; }

glDisableClientState(GL_VERTEX_ARRAY);
  glDisableClientState(GL_COLOR_ARRAY);
}

Works Cited

Anandakrishnan, R., Scogland, T. R., Fenley, A. T., Gordon, J. C., Feng, W.-c., & Onufriev, A. V. (2010). Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units. Journal of Molecular Graphics and Modelling, 28(8), 904–910. Ananthanarayanan, R., Esser, S. K., Simon, H. D., & Modha, D. S. (2009). The cat is out of the bag: cortical simulations with 10^9 neurons, 10^13 synapses. Supercomputing 2009. Anderson, J. A., Lorenz, C. D., & Travesset, A. (2008). General purpose molecular dynamics simulations fully implemented on graphics processing units. 227(10), 5342–5359. Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. I. (2003). An Introduction to MCMC for Machine Learning. In Machine Learning, Volume 50 (pp. 5–43). The Netherlands: Kluwer Academic Publishers. Beichel, I., & Sullivan, F. (2000). The Metropolis algorithm. Computing in Science & Engineering,65–69. Bell, N., & Garland, M. (2009). Implementing sparse matrix-vector multiplication on throughput- oriented processors Networking, Storage and Analysis. Proceeding SC ’09 Proceedings of the Conference on High Performance Computing. New York, NY: ACM. Bianchi, S., & Di Leonardo, R. (2010). Real-time optical micro-manipulation using optimized holograms generated on the GPU. Computer Physics Communications, 181(8), 1444–1448. BigDFT. (n.d.). Retrieved from Institut Nanosciences et Cryogénie: http://inac.cea.fr/L_Sim/BigDFT/. Botelho, S. S., Lautenschlger, W., de Figueiredo, M. B., Centeno, T. M., & Mata, M. M. (2005). Dimensional Reduction of Large Image Datasets Using Non-linear Principal Components. In M. Gallagher, J. Hogan, & F. Maire (Eds.), Intelligent Data Engineering and Automated Learning - IDEAL 2005 (Vol. 3578, pp. 31–40). Springer Berlin/Heidelberg. Botelho, S. S., Lautenschlger, W., de, M. B., Centeno, T. M., & Mata, M. M. (2005). Dimensional Reduction of Large Image Datasets Using Non-linear Principal Components. In M. Gallagher, J. Hogan, & F. Maire (Eds.), Intelligent Data Engineering and Automated Learning - IDEAL 2005 (Vol. 3578, pp. 31–40). Springer Berlin/Heidelberg. Brandes, U. (2001). Drawing Graphs. In M. Kaufmann, & D. Wagner, Drawing on physical analo- gies (pp. 71–86). Springer-Verlag. Cao, W., Yao, L., Li, Z., Wang, Y., & Wang, Z. (2010). Implementing Sparse Matrix-Vector multi- plication using CUDA based on a hybrid sparse matrix format. Computer Application and Sys- tem Modeling (ICCASM) (pp. V11-161–V11-165 ). Taiyuan: IEEE. Carpenter, A. (2011). http://patternsonascreen.net/cuSVM.html. Retrieved July 2011, from http:// patternsonascreen.net/cuSVM.html: http://patternsonascreen.net/cuSVM.html. Catanzaro, B., Sundaram, N., & Keutzer, K. (2008). Fast support vector machine training and classification on graphics processors. Proceedings of the 25th international conference on Machine learning. New York: ACM. 303 304 Works Cited

Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S., et al. (2009). Rodinia: A Benchmark Suite for Heterogeneous Computing. Proceedings of the IEEE International Symposium on Work- load Characterization (IISWC) (pp. 44–54). IEEE. Che, S., Sheaffer, J., Boyer, M., Szafaryn, L. G., Szafaryn, L., Wang, L., et al. (2010). A Characteri- zation of the Rodinia Benchmark Suite with Comparison to Contemporary. IEEE Interna- tional Symposium on Workload. IEEE. Chen,C.,Schmidt,B.,Weiguo,L.,&Müller-Wittig, W. (2008). GPU-MEME: Using Graphics Hardware to Accelerate Motif Finding in DNA Sequences. In M. Chetty, A. Ngom, & S. Ahmad, Pattern Recognition in Bioinformatics (pp. 448–459). Heidelberg: Springer Berlin. Christley, S., Lee, B., Dai, X., & Nie, Q. (2010). Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms. BMC Systems Biology, 4(1), 107–. Coddington, P. (1997). Random Number Generators for Parallel Computers. The NHSE Review. Coon,B.W.,Mills,P.C.,Oberman,S.F.,&Siu,M.Y.(2008).Patent No. 7434032. United States of America. Corley, C. D., Farber, R. M., & Reynolds, W. N. (2011). Thought Leaders During Crises in Massive Social Networks. Statistical Analysis and Data Mining, to be published. Cormen,T.H.,Leiserson,C.E.,&Rivest,R.L.(2001).{Introduction to algorithms.} (2nd ed.). The MIT Press. Cormen, T. H., Leiserson, C. E., & Rivest, R. L. (2001). Introduction to algorithms (2nd ed.). Cam- bridge, MA: The MIT Press. Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. John Wiley and Sons. Cox, M. A., & Cox, T. F. (2008). Multidimensional Scaling. Springer Handbooks of Computational Statistics. Craighead, M. (2002). NV_primitive_restart. Retrieved June 2011, from opengl.org: http://www .opengl.org/registry/specs/NV/primitive_restart.txt. Dean, J., & Ghemawat, S. (2010). MapReduce: a flexible data processing tool. ACM. Dehne, F., & Yogaratnam, K. (2010, Feb). http://arxiv.org/abs/1002.4482. Retrieved June 2011, from Cornell University: http://arxiv.org/abs/1002.4482. Dematte, L., & Prandi, D. (2010). GPU computing for systems biology. Brief Bioinform, 11(3), 323–333. Di Battista, G., Eades, P., Tamassia, R., & Tollis, I. G. (1999). Graph Drawing: Algorithms for the Visualization of Graphs. Englewood Cliffs, NJ: Prentice Hall. Diamantras, K. I., & Kung, S. Y. (1996). Principal Component Neural Networks. John Wiley and Sons. Diamos, G. (2009). The Design and Implementation of Ocelot’s Dynamic Binary Translator from PTX to Multi-Core x86. CERCS Tech Report. Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. New Yourk: Wiley. Dyer, M., Frieze, A., & Kannan, R. (1991). A random polynomial-time algorithm for approxi- mating the volume. Journal of the ACM,1–17. Eades, P. (1984). A heuristic for graph drawing. Congressus Nutnerantiunt, 149–160. Eastman, P., & Pande, V. S. (2010). Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem., 31(6), 1268–1272. Ediger, D., Jiang, K., Riedy, J., Bader, D. A., Corley, C., Farber, R., et al. (2010). Massive Social Network Analysis: Mining Twitter for Social Good. 39th International Conference on Parallel Processing (pp. 583–593). IEEE. El Zein, A. H., & Rendell, A. P. (2011). Generating optimal CUDA sparse matrix–vector product implementations for evolving GPU hardware. Concurrency and Computation: Practice and Experience. Works Cited 305

Fang, Q., & Boas, D. A. (2009). Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units. Optics Express, 20178–20190. Farber, R. (2007, February). HPC balance and common sense. Scientific Computing, p. 12+. Farber, R. (2008, November 1). Extending High-level languages with CUDA. Retrieved June 2011, from Doctor Dobb’s Journal: http://drdobbs.com/high-performance-computing/211800683. Farber, R. (2009, July/August). Numerical Precision: How much is enough? Scientific Computing, p. 14+. Farber, R. (2010, November). Redefining What is Possible. Scientific Computing. Farber, R. M. (1992). Efficiently Modeling Neural Networks on Massively Parallel Computers. Proceedings of the Third International Workshop on Neural Networks and Fuzzy Logic (pp. 3–11). Houston: NASA. Farber, R. M., Lapedes, A. S., Rico-Martinez, R., & Kevrekidis, I. G. (1993). Identification of continuous-time dynamical systems: Neural network based algorithms and parallel imple- mentation. Society for Industrial and Applied Mathematics (SIAM) conference on parallel proces- sing for scientific computing. Norfolk, VA. Farber, R. M., Lapedes, A. S., Rico-Martinez, R., & Kevrekidis, I. G. (1993). Identification of continuous-time dynamical systems: Neural network based algorithms and parallel imple- mentation. Society for Industrial and Applied Mathematics (SIAM) conference on parallel proces- sing for scientific computing. Norfolk, VA: American Mathematical Society. Farber, R., & Trease, H. (2008). ssively Parallel Near-Linear Scalability Algorithms with Applica- tion to Unstructured Video Analysis. TACC TeraGrid08 Conference. Las Vegas. Farber,R.,Lapedes,A.,&Sirotkin,K.(1992).Determination of Eukaryotic Protein Coding Regions Using Neural Networks and Information Theory. J. Mol. Biology, 471–479. Farooqui, N., Kerr, A., Diamos, G., Yalamanchili, S., & Schwan, K. (2011). A Framework for Dynamically Instrumenting GPU. GPGPU-4 Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. New York, NY: ACM. Fatica, M. (2009). Accelerating linpack with CUDA on heterogenous clusters. Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ACM. Frishman, Y., & Ayellet, T. (2008). Online Dynamic Graph Drawing. IEEE Transactions on Visuali- zation and Computer Graphics. Fruchterman, T. M., & Reingold, E. M. (1991). Graph Drawing by Force-directed Placement. Journal Software—Practice & Experience, 1129–1164. Genovese, L., Ospici, M., Deutsch, T., Méhaut, J.-F., Neelov, A., & Goedecker, S. (2009). Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures. Godiyal, A., Hoberock, J., Garland, M., & Hart, J. C. (2009). Rapid Multipole Graph Drawing on theGPU.InI.G.Tollis,&M.Patrignani,Graph Drawing (pp. 90–101). Berlin, Heidelberg: Springer-Verlag. Gropp, W., Lusk, E. L., & Skjellum, A. (1999). Using MPI. The MIT Press. Haixiang, S., Schmidt, B., Weiguo, L., & Müller-Wittig, W. (2010). A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware. Journal of Computational Biology, 17(4), 603–615. Hampton, S., Agarwal, P. K., Alam, S. R., & Crozier, P. S. (2010). Towards microsecond biologi- cal molecular dynamics simulations on hybrid processors., (pp. 98–107). Haque, I. S., Pande, V. S., & Walters, W. P. (2010). SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs. Journal of Chemical Information and Modeling, 50(4), 560–564. 

Harish, P., & Narayanan, P. J. (2007). Accelerating large graph algorithms on the GPU using CUDA. Proceeding HiPC’07 Proceedings of the 14th international conference on High performance computing. Berlin: Springer-Verlag. Harvey, M. J., & De Fabritiis, G. (2010). Swan: A tool for porting CUDA programs to OpenCL. Computer Physics Communications, 1093–1099. Hertz, J. A., Krogh, A. S., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley. Hinton,G.E.(2011).Geoffrey E. Hinton. Retrieved 2011, from University of Toronto: http:// www.cs.toronto.edu/~hinton/. Hinton,G.E.,&Salakhutdinov,R.R.(2006,July28).ReducingtheDimensionalityofData with Neural Networks. SCIENCE, pp. 504–507. Hinton, G. E., Krizhevesky, A., & Want, S. D. (2011). Transforming Auto-encoders. ICANN, (pp. 44–51). Espoo, Finland. Hong-tao, B., Li-li, H., Dan-tong, O., Zhan-shan, L., & He, L. (2009). K-Means on Commodity GPUs with CUDA. World Congress on Computer Science and Information Engineering,(pp.651–655). Hopcroft, J. E., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computa- tion. Reading, MA: Addison-Wesley. Hopfield, J. J., & Tank, D. W. (1985). “Neural” Computation of Decisions in Optimization Pro- blems. Biological Cybernetics, 141–152. Hsieh, W. W. (2001). Nonlinear principal component analysis by neural networks. Tellus,599–615. Hsieh, W. W. (2004). Nonlinear multivariate and time series analysis by neural network meth- ods. Rev. Geophys.,1–25. Hubbard, D. (2009). The Failure of Risk Management: Why It’s Broken and How to Fix It. Wiley. Humphrey, W., Dalke, A., & Schulten, K. (1996). VMD - Visual Molecular Dynamics. 14,33–38. Hwu, W.-m. W. (2011). GPU Computing Gems. Morgan Kaufmann. Hwu, W.-m. W. (Ed.). (2011). GPU Computing Gems Emerald Edition. Morgan Kaufmann. Hwu, W.-m. W. (2011). GPU Computing Gems Emerald Edition. Morgan Kaufmann. Ines, E., & Hirschmüller, H. (2008). Mutual Information Based Semi-Global Stereo Matching on theGPU.InG.Bebis,R.Boyle,B.Parvin,D.Koracin,P.Remagnino,F.Porikli,etal., Advances in Visual Computing (pp. 228–239). Heidelberg: Springer Berlin. Ingram, S., Munzner, T., & Olano, M. (2009). Glimmer: Multilevel MDS on the GPU. IEEE Transactions on Visualization and Computer Graphics, 249–261. Jerrum, M., & Sinclair, A. (1996). The Markov chain Monte Carlo method: an approach to approximate counting. In D. Hochbaum, Approximation algorithms for NP-hard problems (pp. 482–519). PWS Publishing. Kakumanu, P., Makrogiannis, S., & Bourbakis, N. (2007). A survey of skin-color modeling and detection methods. Pattern Recognition, 1106–1122. Kirk, d., & Hwu, W.-m. W. (2010). Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann. Kolda,T.G.,Lewis,R.M.,&Torczon,V.(2007).Optimizationbydirectsearch:newperspec- tives on some classical and modern methods. SIAM J. Sci. Comput, 2507–2530. Korber, B. T., Farber, R. M., Wolpert, D. H., & Lapedes, A. S. (1993). Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. PNAS, 7176–7180. Kramer, M. A. (1991). Nonlinear Principle Component Analysis Using Autoassociative Neural Netowrks. AIChE Journal, 233–243. Works Cited 307

Kumar, N., Satoor, S., & Buck, I. (2009). Fast Parallel Expectation Maximization for Gaussian Mixture Models on GPUs Using CUDA. High Performance Computing and Communications, 2009. HPCC ’09. 11th IEEE International Conference on (pp. 103–109 ). Seoul: IEEE. Lapedes, A. S., & Farber, R. (1987). How Neural Networks Work. Proceeding of IEEE Denver Con- ference on Neural Netorks. Denver: IEEE. Lapedes, A., & Farber, R. (1987). Nonlinear signal processing using neural networks: Prediction and system modelling. Nonlinear signal processing using neural networks: Prediction and system modelling. San Diego. Laxmikant, K., Skeel, R., Bhandarkar, M., Brunner,R.,Gursoy,A.,Krawetz,N.,etal.(1999). NAMD2: Greater scalability for parallel molecular dynamics. 151, 283–312. Little, J. (1961). proof for the queuing formula: L = w. Operations research, 383–387. Ma, W., & Agrawal, G. (n.d.). A translation systemforenablingdataminingapplicationson GPUs. Proceeding ICS ’09 Proceedings of the 23rd international conference on Supercomputing. New York, NY: ACM. MAGMA. (n.d.). (The University of Tennessee) Retrieved from Innovative Computing Labora- tory: http://icl.cs.utk.edu/magma. Maintz, S., Eck, B., & Dronskowski, R. (2011). Speeding up plane-wave electronic-structure calculations using graphics-processing units. Computer Physic Communications, 1421–1427. Malony, A. D., Biersdorff, S., Shende, S., Jagode, H., Tomov, S., Jukeland, G., et al. (2011). Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs. ICPP2011. Malony, A. D., Biersdorff, S., Spear, W., & Mayanglamba, S. (2010). An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. Proceed- ings of the 24th ACM International Conference on Supercomputing, 127–136. McKinnon, & McKinnon, K. I. (1999). Convergence of the Nelder–Mead simplex method to a non-stationary point. SIAM J Optimization, 148–158. Micikevicius, P. (2010). 3D Finite Difference Computation on GPUs using CUDA. Proceeding GPGPU-2 Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. New York, NY: ACM. Micikevicius, P. (2010). Analysis-Driven Optimization (GTC 2010). Retrieved 2011, from 2010 NVIDIA GTC: http://www.nvidia.com/content/GTC-2010/pdfs/2012_GTC2010.pdf. Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cam- bridge, MA: The MIT Press. Mitchell, T. (1997). Machine Learning. McGraw Hill. Monahan, A. H. (2000). Nonlinear Principal Component Analysis by Neural Networks: Theory and Application to the Lorenz System. Journal of Climate, 821–835. Narayanan, R., Ozisikyilmaz, B., Zambreno, J., Jayaprakash, P., Memik, G., & Choudhary, A. (2006). MineBench: A Benchmark Suite for Data Mining Workloads. Proceedings of the Inter- national Symposium on Workload Characterization (IISWC) (pp. 182–188). San Jose: IEEE. Nath,R.,Stanimire,T.,&Dongerra,J.(2010,July20).An Improved MAGMA GEmm for Fermi. Retrieved April 2011, from http://icl.cs.utk.edu: http://icl.cs.utk.edu/projectsfiles/magma/ pubs/fermi_gemm.pdf. Nelder, J. A., & Mead, R. (1965). A Simplex Method for Function Minimization. The Computer Journal, 308–313. Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathe- matical Biology, 267–273. Papakonstantinou, A., Gururaj, K., Stratton, J. A., Chen, D., Cong, J., & Hwu, W.-m. W. (2009). FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs. Proceedings of the Symposium on Application Specific Processors. 

Petrini, F., Kerbyson, D. J., & Pakin, S. (2003). The Case of the Missing Supercomputer Perfor- mance. SC03. ACM. Petrini, F., Kerbyson, D. J., & Pakin, S. (2003). The Case of the Missing Supercomputer Perfor- mance: Achieving. Proceedings of Supercomputing 2003. Phoneix. Press, W. H., Teukolsky, S. A., & Vetterling, W. T. (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press. Roberts, E., Stone, J. E., Sepulveda, L., Hwu, W.-M. W., & Luthey-Schulten, Z. (2009). Long time-scale simulations of in vivo diffusion using GPU hardware., (pp. 1–8). Rodrigues, C. I., Hardy, D. J., Stone, J. E., Schulten, K., & Hwu, W.-M. W. (2008). GPU accelera- tion of cutoff pair potentials for molecular modeling applications. (pp. 273–282). ACM. Rummelhardt, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back- propagating errors. Nature, 323, 533–536. Rummelhart,D.E.,McClelland,J.L.,&ThePDPResearchGroup.(1987).Parallel Distributed Processing. The MIT Press. Saunders, M., Simon, H., & Yip, E. (1988). Two Conjugate Gradient-Type Methods of Unsym- metric Linear Equations. SIAM J. Num. Anal., 927–940. Schmidt, B. (Ed.). (2010). Bioinformatics: High Performance Parallel Computer Architectures. Francis and Taylor. Schölkopf, B., & Klaus-Robert Müller, A. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1299–1319. Scholz, M. (2007). Analysing Periodic Phenomena by Circular PCA. Lecture Notes in Computer Science, 4414/2007,38–47. Scholz, M. (2011). Nonlinear PCA. Retrieved 2011, from nlpca: nlpca.org. Sedlácˇek, M. (2004). Evaluation of RGB and HSV Models in Human Faces Detection. Central European Seminar on Computer Graphics, Budmerice. CompSysTech’2004, (pp. 125–131). Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 145–168. Shalom, S. A., Dash, M., & Tue, M. (2008). Efficient K-means Clustering Using Accelerated Gra- phics Processors. In I.-Y. Song, J. Eder, & T. Nguyen, Data Warehousing and Knowledge Discov- ery (pp. 166–175). Heidelberg: Springer Berlin. Shi, Z., & Zhang, B. (2011, June). http://bioinfo.vanderbilt.edu/gpu-fan/. Retrieved June 2011, from Vanderbilt.edu: http://bioinfo.vanderbilt.edu/gpu-fan/. Stanislaw, M. R., Carbonell, J. G., & Mitchell, T. M. (1985). Machine learning: an artificial intelli- gence approach. Morgan Kaufmann. Stevens, R. W. (2005). Advanced Programming in the UNIX Environment. Addison-Wesley Professional. Stivala, A., Stuckey, P., & Wirth, A. (2010). Fast and accurate protein substructure searching with simulated annealing and GPUs. BMC bioinformatics, 11(1), 446–. Stone, J. E., Hardy, D. J., Ufimtsev, I. S., & Schulten, K. (2010). GPU-accelerated molecular modeling coming of age. Journal of Molecular Graphics and Modelling, 29(2), 116–125. Stone, J. E., Ufimtsev, I. S., Schulten, K., & Hardy, D. J. (2010). GPU-accelerated molecular modeling coming of age. Journal of Molecular Graphics and Modelling, 116–125. Stone, J., Kohlmeyer, A., Vandivort, K., & Schulten, K. (2010). Immersive Molecular Visualiza- tion and Interactive Modeling with Commodity Hardware. In G. Bebis, R. Boyle, B. Parvin, D. Koracin, R. Chung, R. Hammound, et al. (Eds.), Advances in Visual Computing (Vol. 6454, pp. 382–393). Springer Berlin/Heidelberg. Works Cited 309

Stratton, J. A., Stone, S. S., & Hwu, W.-m. W. (2008). MCUDA: An Efficient Implementation of CUDA Kernels for Multi-Core CPUs. The 21st International Workshop on Languages and Compi- lers for Parallel Computing (pp. 16–30). Springer LNCS 2008. Stuart, J. A., & Owens, J. D. (2011). Multi-GPU MapReduce on GPU Clusters. Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium. Anchorage, AK: IEEE. Suchard,M.A.,Wanq,Q.,Chan,C.,Frelinger,J.,Cron,A.,&West,M.(2010,Jun1).Under- standing GPU Programming for Statistical Computation: Studies in Massively Parallel Mas- sive Mixtures. J Comput Graph Stat., 19(2), 419–438. Suchard, M., & Rambaut, A. (2009). Many-Core Algorithms for Statistical Phylogenetics. Bio- informatics, 1370–1376. Suchard, M., Wang, Q., Chan, C., Frelinger, J., Cron, A., & West, M. (2010). Understanding GPU Programming for Statistical Computation:. Journal of Computational & Graphical Statistics,419–438. The Max Planck Institute. (2004). Fractal Landscape and Texture Generation. Retrieved June 2011, from mpi-inf.mpg.de: http://www.mpi-inf.mpg.de/departments/irg3/ws0405/cg/rcomp/29/ x173.html. Thearling, K. (1995). Massively Parallel Architectures and Algorithms for Time Series Analysis. In L. Nadel, & D. Stein, 1993 Lectures in Complex Systems. Addison-Wesley. Townsend, R., Sankaralingam, K., & Sinclair, M. D. (2011). Leveraging the untapped computa- tion power of GPUs: fast spectral synthesis using texture interpolation. In W.-m. W. Hwu, GPU Computing Gems (p. 886). Morgan Kaufmann. Tresadern, P., Ionita, M. C., & Cootes, T. (2011). Real-Time Facial Feature Tracking on a Mobile Device. International Journal of Computer Vision. Ufimtsev, I. S., & Martinez, T. J. (2008). Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. Journal of Chemical Theory and Computation, 4(2), 222–231. Ufimtsev, I. S., & Martinez, T. J. (2009). Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation. Journal of Chemical Theory and Computation, 5(4), 1004–1015. Ufimtsev, I. S., & Martinez, T. J. (2009). Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics. Journal of Chemical Theory and Computation, 5(10), 2619–2628. Vezhnevets, V., Sazonov, V., & Andreeva, A. (2003). A Survey on Pixel-Based Skin Color Detec- tion Techniques. GRAPHICON03, (pp. 85–92). Volkov, V. (2010). Programming inverse memory hierarchy: case of stencils on GPUs. GPU Workshop for Scientific Computing, International Conference. Volkov,V.(2010,September22).Volkov 10-GTC. Retrieved April 21, 2011, from cs.berkeley .edu: http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf. Volkov,V.(2010,June30).volkov 10-PMAA. Retrieved April 2011, from http://eech.berkeley .edu: http://www.eecs.berkeley.edu/~volkov/volkov10-PMAA.pdf. Vuduc, R. (2010, August 2). Teragrid Conference 2010. Retrieved April 2011, from Analysis and Tuning Case Study: http://www.hpcgarage.org/tg10–gpu-tutorial/. Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., & Moshovos, A. (2010). Demystifying GPU Microarchitecture through microbenchmarking. 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS) (pp. 235–246). IEEE. Wu, R., Zhang, B., & Hsu, M. (2009). Clustering billions of data points using GPUs. Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop. ACM. 