CUDA Programming
CUDA Programming
A Developer’s Guide to Parallel Computing with GPUs

Shane Cook

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: Andre Cuello
Designer: Kristen Davis

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2013 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-415933-4

For information on all MK publications visit our website at http://store.elsevier.com

Printed in the United States of America
13 14 10 9 8 7 6 5 4 3 2 1

Contents

Preface

CHAPTER 1 A Short History of Supercomputing
    Introduction
    Von Neumann Architecture
    Cray
    Connection Machine
    Cell Processor
    Multinode Computing
    The Early Days of GPGPU Coding
    The Death of the Single-Core Solution
    NVIDIA and CUDA
    GPU Hardware
    Alternatives to CUDA
        OpenCL
        DirectCompute
        CPU alternatives
        Directives and libraries
    Conclusion

CHAPTER 2 Understanding Parallelism with GPUs
    Introduction
    Traditional Serial Code
    Serial/Parallel Problems
    Concurrency
        Locality
    Types of Parallelism
        Task-based parallelism
        Data-based parallelism
    Flynn’s Taxonomy
    Some Common Parallel Patterns
        Loop-based patterns
        Fork/join pattern
        Tiling/grids
        Divide and conquer
    Conclusion

CHAPTER 3 CUDA Hardware Overview
    PC Architecture
    GPU Hardware
    CPUs and GPUs
    Compute Levels
        Compute 1.0
        Compute 1.1
        Compute 1.2
        Compute 1.3
        Compute 2.0
        Compute 2.1

CHAPTER 4 Setting Up CUDA
    Introduction
    Installing the SDK under Windows
    Visual Studio
        Projects
        64-bit users
        Creating projects
    Linux
        Kernel base driver installation (CentOS, Ubuntu 10.4)
    Mac
    Installing a Debugger
    Compilation Model
    Error Handling
    Conclusion

CHAPTER 5 Grids, Blocks, and Threads