CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

Comparative Effectiveness of CPU and GPU Ray Tracing With Parallel Computation

A thesis submitted in partial fulfillment of the requirements

For the degree of Master of Science in Computer Science

By

Dustin Patrick Delmer

May 2017

Copyright © 2017 Dustin Patrick Delmer

The thesis of Dustin Patrick Delmer is approved:

______Dr. Robert McIlhenny Date

______Dr. John Noga Date

______Dr. G. Michael Barnes, Chair Date

California State University, Northridge

In Loving Memory of my Father,

Daniel William Christian Delmer

Table of Contents

Copyright ...... ii

Signature Page ...... iii

Dedication ...... iv

List of Figures ...... vi

Abstract ...... vii

1. Introduction ...... 1

2. Related Work ...... 3

3. Technology Overview ...... 5

3.1. CPU vs GPU ...... 5

3.2. ISPC ...... 5

3.3. CUDA ...... 6

4. Implementation ...... 8

4.1. Ray Tracing Algorithm ...... 8

4.2. C++ Serial Implementation ...... 10

4.3. ISPC Implementation ...... 10

4.4. CUDA Implementation ...... 13

4.5. Dynamic Scene Generation ...... 16

5. Results/Comparison ...... 17

5.1. Hardware ...... 17

5.2. Rendering Results from Default Scene ...... 19

5.3. Dynamic Scene: Sphere Count and Resolution Costs ...... 20

5.4. Data Collection Techniques and Results ...... 21

6. Conclusion ...... 32

References ...... 33

List of Figures

Figure 4.1: Ray tracing figure ...... 9

Figure 4.2: Reflection and refraction ...... 9

Figure 4.3: foreach_tiled loop ...... 11

Figure 4.4: ISPC Multi-core - the task function ...... 12

Figure 4.5: ISPC multi-core - launch[nTasks] ...... 12

Figure 4.6: CUDA - malloc and memcpy ...... 14

Figure 4.7: CUDA - Kernel function ...... 15

Figure 4.8: CUDA - Kernel call ...... 16

Figure 4.9: Code snippet from common/rand_sphere.h ...... 17

Figure 5.1: Default config, 640x480, five static spheres and a light ...... 19

Figure 5.2: Sphere counts (8, 64, 216, 512, 1000) ...... 22

Figure 5.3: Sphere Count vs Time - All Techniques ...... 24

Figure 5.4: Sphere Count vs Time - ISPC Only ...... 25

Figure 5.5: Sphere Count vs Time - CUDA vs ISPC ...... 26

Figure 5.6: Resolution vs Time - All Techniques ...... 27

Figure 5.7: Resolution vs Time - ISPC Only ...... 28

Figure 5.8: Resolution vs Time - CUDA vs ISPC ...... 29

Figure 5.9: Bar Chart Comparison - ISPC vs CUDA ...... 30

Abstract

Comparative Effectiveness of CPU and GPU Ray Tracing With Parallel Computation

By

Dustin Patrick Delmer

Master of Science in Computer Science

In this thesis, a comparison of GPU and CPU based computation using consumer grade hardware and parallel programming languages in a raytracer is presented. The raytracers presented make use of C++, Intel's SIMD CPU language and compiler, ISPC, and Nvidia's GPGPU language/compiler, CUDA. Performance was measured for three levels of image resolution (256², 512², 1024²) and five levels of image complexity (sphere counts: 8, 64, 216, 512, 1000). Image resolution had the greatest impact on performance, while image complexity had a constant effect on performance. As image resolution increased, the parallel GPU solution offered the best results. This thesis discusses the advantages and disadvantages of CPU vs GPU parallel programming.

1. Introduction

Parallel programming has become increasingly relevant over the past decade. Manufacturers have shifted their focus from increasing clock speeds to augmenting the core and thread counts in each generation of new CPUs and GPUs. In this environment, writing applications that take full advantage of these multi-core components has become more and more essential. Several languages and compilers exist to help programmers write code capable of generating efficient parallel programs. In this thesis two of these languages will be discussed: ISPC and CUDA. This thesis will show how these languages can be utilized to augment a raytracer, and will compare and contrast them.

Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects. These algorithms are used heavily in 3D rendering, animation, and computer graphics.

ISPC is an Intel developed compiler with extensions for "single program, multiple data" (SPMD) programming [1]. ISPC simplifies the task of spawning multiple parallel program instances across a CPU, providing a thin abstraction layer between the programmer and the hardware to spare the programmer the burden of writing extremely low-level intrinsics to achieve high performance. ISPC methods are exposed to C++ code through *.isph headers, and objects produced by .ispc files are compiled into libraries and executables along with C++ code using conventional C++ compilers.

CUDA is an Nvidia developed GPGPU (general purpose programmability on the graphics processing unit) compiler and language that allows programmers to allocate massively parallel tasks directly to the GPU [2]. Through CUDA, parallel functions can be written in a seemingly serial manner and instanced in many threads, across many blocks, within a GPU. When using CUDA, the CPU's role is allocating memory, copying data, and launching Kernel functions. CUDA programs take advantage of the large number of cores present in modern GPUs to do the bulk of their computation.

This report has six sections. Section 2 presents work related to comparisons of serial and parallel programming. Section 3 presents CPU and GPU architectures and the ISPC and CUDA technologies used in this report. Section 4 presents the implementation of the raytracer in C++, ISPC, and CUDA. Section 5 presents the results of running these implementations with five sphere counts, over three image resolutions. Lastly, section 6 briefly discusses the results.

2. Related Work

Parallel programming is widely used in the world of heavy computation. In this chapter, different works related to ISPC, CUDA, and parallel programming will be discussed.

Gil Rapaport et al. presented a comparison between OpenCL, ISPC, and CHORuS for vectorization performance boosts on modern CPU hardware in C/C++ [8]. CHORuS is a lightweight, static extension to C in which the programmer expresses computations as composable vector operations applied to scalar Kernels. CHORuS has the advantage of not introducing a whole new language, compiler, and file extension the way that ISPC and CUDA do. CHORuS was tested against OpenCL and ISPC using the sample programs shipped with ISPC. Overall, ISPC outperformed OpenCL and CHORuS in the single-core results. Curiously, the authors included multicore implementations of OpenCL and CHORuS, but only a single-core implementation of ISPC.

Chris Gregg et al. presented a comparison between GPU and CPU computing with respect to memory-transfer overhead [9]. This paper places heavy emphasis on the relative cost of memory transfers in GPU based programs and their contribution to total computation time. The paper proposes a taxonomy for categorizing GPU Kernels, where Kernels are GPU functions callable by the CPU. The five proposed categories are: Non-Dependent (ND), Dependent-Streaming (DS), Single-Dependent-Host-to-Device (SDH2D), Single-Dependent-Device-to-Host (SDD2H), and Dual-Dependent (DD). These categories are based on what kind of memory transfer a given application requires. Under this taxonomy, a raytracer would be both Single-Dependent-Host-to-Device and Single-Dependent-Device-to-Host, since it requires the pixels to be transferred from the ray tracing Kernel back to the CPU, and the spheres, or scene, to be transferred to the Kernel, but not back to the CPU.

Stamos Katsigiannis et al. presented work comparing the performance of multicore CPU and GPU algorithms for video compression [10]. This comparison used CUDA for the GPU implementation and C for the CPU implementation. The hardware used was an AMD Phenom II X4 965 CPU and an NVIDIA Tesla C2070 GPU. Though it is not clear whether the CPU implementation was properly vectorized, the GPU implementation showed a 2.8 to 11x speed-up in video encoding and a 3.3 to 21x speed-up in video decoding, a massive difference in performance. It is worth noting that the Tesla C2070 is a very high-end GPU, while the CPU used was not quite as high-end; the results of this work may have been closer if hardware of a similar level had been compared.

3. Technology Overview

3.1 CPU vs GPU computation

In general, computation is done on either the CPU or the GPU. When writing a parallel program, one decision that must be made is where the computation will take place. Not all systems have GPUs; this is something to consider when choosing a target processor. CPUs are generally much less powerful than GPUs when measured in terms of maximum possible floating point computations per second, but CPUs offer more control. It can be difficult to fully utilize a GPU due to the parallel nature of the hardware: a GPU is optimized to run the same instruction on many cores at the same time, and if there aren't enough items to run the instruction on, many cores are left idle. Another disadvantage of the GPU is that memory must sometimes be copied from main memory to the GPU and back between each cycle of computation; this can reduce performance. The flow of memory between the CPU and GPU depends on the algorithm in question [9]. Compared to GPUs, CPUs excel at serial tasks.

3.2. ISPC

To generate the parallel, CPU based raytracer, ISPC was used. ISPC is an open source, Intel developed compiler/language with extensions for "single program, multiple data" (SPMD) programming. ISPC simplifies the task of spawning multiple parallel program instances across a CPU, providing a thin abstraction layer between the programmer and the hardware to spare the programmer the burden of writing extremely low-level intrinsics to achieve high performance [1]. ISPC utilizes the LLVM compiler infrastructure for back-end code generation and optimization [11].

ISPC is tightly coupled with C/C++ code. Code syntax in ISPC is essentially identical to C/C++; this is ideal since the goal of ISPC is to assist developers in writing parallel C/C++ code without intrinsics. In ISPC, functions are exposed to C++ code through ISPC stub headers. A stub header is automatically generated from its parent ISPC file: once compiled, foo.ispc will produce foo.o and foo_ispc.h. This stub header contains the declarations for any functions in the ispc file that include the export keyword in their definitions. Objects produced by *.ispc files are compiled into libraries and executables along with C++ code using conventional C++ compilers. In general, a program that utilizes ISPC has two phases of compilation. The first phase uses the ISPC compiler to generate object files and stub headers from ispc files. In the second phase, the artifacts generated by the ISPC compiler are fed to a C++ compiler along with conventional C++ source code. C++ files can access functions defined in ISPC objects by including their corresponding stub headers. The resulting compiled library or executable yields a program with parallel instructions for the CPU.

ISPC is aware of several hardware architectures. It currently supports the SSE2, SSE4, AVX1, AVX2, AVX-512, and Xeon Phi "Knights Corner" instruction sets [1]. By explicitly passing a specific instruction set to ISPC via the --target compiler flag, ISPC can be further tuned to emit instructions ideal for a machine's particular processor. Changing the instruction set can have a significant impact on a program's performance.

ISPC is used in third party libraries such as Embree [12]. Embree is a collection of high-performance ray tracing Kernels, and is used by multiple high-profile movie studios.

3.3. CUDA

CUDA is an Nvidia developed GPGPU (general purpose programmability on the graphics processing unit) compiler and language that allows programmers to allocate massively parallel tasks directly to the GPU. Through CUDA, parallel functions can be written in a seemingly serial manner and instanced in many threads, across many blocks, within a GPU. When using CUDA, the CPU's role is allocating memory, copying data, and launching Kernel functions. CUDA programs take advantage of the large number of cores present in modern GPUs to do the bulk of their computation.

CUDA is used in the gaming industry for real time physics and other graphical effects, such as explosions, fire, and liquid/gas simulations. CUDA has also found applications in fields unrelated to graphics, such as machine learning and AI. Through CUDA, GPUs can be used to train neural networks, and to run the models those networks generate for classification and prediction in the cloud [13].

4. Implementation

4.1. Ray Tracing Algorithm

As a metric of comparison for the parallel programming languages discussed, a ray tracing algorithm was used. Ray tracing is a technique used to generate a 2D representation of a 3D scene. Ray tracing has been around for decades, but is still used heavily in computer graphics.

In order for a raytracer to generate an image, a scene must contain at least one light, and one object. A simple scene could contain a single sphere, and a single light source. Each element in a scene has several properties:

● center/position - an (x, y, z) coordinate representing the object's position in the scene.

● transparency - a 0 to 1 value representing the level of transparency of the object, 0 being completely transparent and 1 being opaque.

● reflectivity - a 0 to 1 value representing the level of reflectivity of the object. As this value increases, the amount of light that bounces off the surface also increases.

● emission color - an RGB value representing the color emitted by the object. This value is used for light sources.

● surface color - an RGB value representing the color of the surface itself.

These properties, along with a radius, allow us to represent spheres and lights in a 3D scene.
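The object record described above can be sketched as a small C++ struct. This is an illustrative sketch only; the field names and the isLight helper are hypothetical, not the thesis' actual definitions:

```cpp
#include <cassert>

// A minimal sketch of the scene-object record described above.
struct Vec3 { float x, y, z; };

struct Sphere {
    Vec3  center;        // x, y, z position in the scene
    float radius;
    float transparency;  // 0 = completely transparent, 1 = opaque
    float reflectivity;  // higher values bounce more light
    Vec3  surfaceColor;  // RGB color of the surface itself
    Vec3  emissionColor; // RGB color emitted; non-zero only for lights
};

// A light source is simply a sphere with a non-zero emission color.
bool isLight(const Sphere &s) {
    return s.emissionColor.x > 0 || s.emissionColor.y > 0 || s.emissionColor.z > 0;
}
```

With this representation, the simple one-sphere, one-light scene mentioned above is just two Sphere values, one of which has a non-zero emission color.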

Once a scene is defined, we can apply the algorithm to generate an image.

To create an image we must assign a camera, or eye, position within the scene. This position is the origin from which we cast rays to populate the pixel colors of the 2D image that represents our scene from the perspective of the camera. Each cell in the 2D grid represents a pixel in our result, and each pixel is an RGB value. To determine the color of a given pixel, a ray is drawn from the camera, through the pixel, into the scene. If the ray collides with an object, a color value is calculated, and two new rays are generated: a refraction ray and a reflection ray.

Figure 4.1: High level view of a raytracer

A refraction ray is a child of the initial ray; it represents the light that passes through the surface, and its potency is a function of the surface's translucency. A reflection ray is also a child of the initial ray, but is a function of the surface's reflectivity; it represents the light bouncing off the surface. Each of these rays is computed recursively, and each can spawn its own reflection and refraction rays. To prevent these rays from causing an infinite loop, a maximum recursion depth is assigned.

Figure 4.2: Reflection and Refraction
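The collision test at the heart of this process can be sketched with the standard geometric ray-sphere intersection. This is a hedged reconstruction under that standard formulation, not the thesis' exact sphereIntersect() code; the parameter layout is illustrative:

```cpp
#include <cassert>
#include <cmath>

// Geometric ray-sphere intersection: returns true and the nearest
// positive hit distance t if a ray (origin o, normalized direction d)
// hits a sphere (center c, given radius).
bool intersectSphere(float ox, float oy, float oz,
                     float dx, float dy, float dz,
                     float cx, float cy, float cz,
                     float radius, float &t) {
    float lx = cx - ox, ly = cy - oy, lz = cz - oz;  // origin -> center
    float tca = lx * dx + ly * dy + lz * dz;         // projection onto ray
    if (tca < 0) return false;                       // sphere is behind ray
    float d2 = lx * lx + ly * ly + lz * lz - tca * tca; // squared center-to-ray distance
    float r2 = radius * radius;
    if (d2 > r2) return false;                       // ray misses the sphere
    float thc = std::sqrt(r2 - d2);
    t = tca - thc;                                   // near intersection point
    if (t < 0) t = tca + thc;                        // origin inside the sphere
    return true;
}
```

For example, a ray cast from the origin straight at a radius-2 sphere ten units away hits at distance 8 (the near surface).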

For this thesis, a simple serial C++ raytracer was used as a basis from which parallel raytracers written in ISPC and CUDA were created. Specifically, the Scratchapixel raytracer was used [6].

Ray tracing is trivial to parallelize because each pixel in the grid is independent of every other pixel in the grid. This means, in theory, every single pixel could be calculated in parallel, if enough cores were provided. This makes ray tracing an ideal metric for comparing parallel languages.

4.2. C++ Serial Implementation

C++ was the base language used for this thesis. C++ is heavily used in programs and applications that are computationally intensive. Compared to similar object oriented languages such as Java and C#, C++ offers better performance at the cost of higher complexity. The serial implementation of the ray tracing algorithm used in this thesis is single threaded C++.

4.3. ISPC parallel implementation

To create a parallel CPU based raytracer equivalent to the serial C++ one, ISPC was used. The ISPC compiler was obtained from the ISPC website [1]. To start, the original C++ serial renderer was copied to a new directory. In this directory, normalize(), dot(), and cross() functions were added to a spheres.ispc file for vector computation. float3 and sphere struct definitions were also added to spheres.ispc to represent colors, rays, and spheres. Render(), trace(), sphereIntersect(), and mix() were moved from the C++ file to the spheres.ispc file and translated to ISPC.

Figure 4.3: foreach_tiled loop

As seen in Figure 4.3, foreach_tiled was used within the render method to force parallel execution. The heart of ISPC's SPMD vectorization lies in its foreach and foreach_tiled constructs; in this loop, each (x, y) combination represents a single pixel in the image. These constructs create simple loops that spawn parallel program instances, in the form of gangs, across the CPU. In ISPC, a gang is a set of program instances that run concurrently on a single core (SPMD). The size of the gang depends on the target machine's CPU architecture and the instruction set specified when compiling with ISPC. For this thesis the avx2-i32x16 target was used, which forms gangs 16 lanes wide. This means that on a single CPU core, 16 pixels are computed in parallel, each pixel being handled by a single program instance within the gang. The render function was defined with the export keyword so that it could be called from the main C++ file.
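The lane-to-pixel mapping can be visualized with a plain C++ sketch. This is illustrative only, not ISPC: each step of the inner loop stands in for one 16-wide gang shading 16 consecutive pixels on a core (the real foreach_tiled iterates over 2D tiles rather than row segments):

```cpp
#include <cassert>

const int GANG_WIDTH = 16;  // lanes per gang for the avx2-i32x16 target

// Count how many gang-sized work units a width x height image needs,
// assuming the width is a multiple of the gang width.
int countGangs(int width, int height) {
    int gangs = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; x += GANG_WIDTH)  // one gang per 16 pixels
            ++gangs;
    return gangs;
}
```

A 640x480 image therefore decomposes into 19,200 gang executions, each shading 16 pixels in lockstep.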

A second version of the ISPC raytracer was constructed to use multiple cores. This version of the raytracer is very similar to the single-core implementation, but utilizes features from ISPC to further parallelize the code to run on all of the CPU’s cores. This required the introduction of a task method for partitioning the work into smaller pieces.

Figure 4.4: ISPC Multi-core - the task function

On line 188 of Figure 4.4 we can see the task qualifier, which marks a function that can be launched asynchronously as an independent unit of work. In this code block, one render_tile_task covers a 16 by 16 pixel tile, whose index is determined by taskIndex on line 202. Notice that taskIndex is not explicitly set: each launched task is given a unique taskIndex at run-time, and this index is used to map the current 16x16 tile to a unique block of pixels in the image.

Figure 4.5: ISPC multi-core - launch[nTasks]

On line 227 of Figure 4.5, in the multi-core implementation, render_tile_task is called. A task method must be called using launch. The nTasks value is calculated from the number of tiles required to draw the requested image. For example, an image with a 256 by 256 resolution requires a 16 by 16 grid of tiles, or 256 tiles, since each tile contains 16x16 pixels. This means 256 tasks are spawned, each of which can be assigned to one of the CPU's cores.
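The task-count arithmetic above can be sketched as follows; the function and constant names are hypothetical, not the thesis' actual code:

```cpp
#include <cassert>

const int TILE = 16;  // each task renders a 16x16-pixel tile

// Number of tasks to launch: one per tile, rounding up so
// partial tiles at the image edges are still covered.
int nTasks(int width, int height) {
    int tilesX = (width  + TILE - 1) / TILE;
    int tilesY = (height + TILE - 1) / TILE;
    return tilesX * tilesY;
}
```

For a 256x256 image this yields the 256 tasks described above; a 640x480 image would need 40 x 30 = 1200 tasks.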

The launch and sync methods must be defined. Fortunately, ISPC ships with a tasksys.cpp file which can be compiled with an application to define launch and sync methods for scheduling tasks across multiple cores. The tasksys.cpp file, along with Linux pthreads [14], was used to support task delegation across the CPU.

For both the single and multi-core programs, the spheres.ispc file was fed to the ISPC compiler to generate spheres.o and spheres_ispc.h. The spheres_ispc.h stub header was included by the main spheres.cc file, giving spheres.cc access to the render function and spheres struct.

The sphere and light definitions were left in the main spheres.cc file, but were re-worked slightly for compatibility with the ISPC defined sphere struct. These spheres, along with three float vectors representing the red, green, and blue channels of the image pixels, were fed to the ISPC render function. The spheres.cc file was compiled against spheres.o and spheres_ispc.h using gcc to generate the final program.

To measure the computation time, the timing.h header included with the ISPC distribution was used. This header was used in all of the raytracers to measure the execution times fairly.

4.4. CUDA Implementation

The CUDA implementation of the ray tracing algorithm differs from the ISPC implementation in a few ways. One difference is the extra steps required for copying memory to and from the GPU. This was done using cudaMalloc, cudaMemcpy, and cudaFree.

Figure 4.6: CUDA - malloc and memcpy

Nearly all CUDA programs use these three functions. cudaMalloc allocates memory on the GPU; in this thesis it was used to allocate the red, green, and blue arrays representing the pixel RGB values, shown on lines 366-368 of Figure 4.6. cudaMemcpy copies memory between the host (CPU) and the device (GPU); in this thesis it was used to copy the spheres from the CPU to the GPU (line 379) and to copy the resolved pixels from the GPU back to the CPU (lines 386-388). cudaFree releases memory allocated to the program on the GPU; in this thesis it was used to free the GPU RGB pixel arrays.

The parallelization of the CUDA implementation lies in its Kernel function. The Kernel is the function launched by the CPU that runs on the GPU; in this ray tracing implementation, the Kernel is the render function.

Figure 4.7: CUDA - Kernel function

The Kernel function is always marked with the __global__ qualifier, which makes it visible to the CPU. Within the render Kernel, the position of the pixel being rendered by a given thread is derived from the block index and thread index. As seen in Figure 4.7, the x position of a thread's pixel is computed as the block index times the block dimension, plus the thread index. For example, if a thread is the third thread in the third block along the x dimension, then its block index would be 2 and its thread index would be 2. If these blocks were 4 threads wide along the x dimension, the equation would yield an x position of 2 * 4 + 2 = 10. The y position is determined in a very similar manner. Using these positions, each thread can be assigned a unique pixel. A Kernel function call is easily identified by the "<<<" and ">>>" marks that accompany it (line 303 in Figure 4.8).
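The index arithmetic above can be written out in plain C++ for illustration (in a real Kernel, CUDA supplies blockIdx, blockDim, and threadIdx implicitly; here they are ordinary parameters):

```cpp
#include <cassert>

// Pixel coordinate along one axis for a given CUDA thread:
// blockIdx * blockDim + threadIdx. The same formula is applied
// independently to the x and y axes.
int pixelCoord(int blockIdx, int blockDim, int threadIdx) {
    return blockIdx * blockDim + threadIdx;
}
```

Plugging in the worked example from the text, the third thread of the third block in 4-wide blocks lands on pixel column 10.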

Figure 4.8: CUDA - Kernel call

The values inside these triple angle brackets represent the grid dimensions and block dimensions. The gridSize argument tells the Kernel how many threading blocks to spawn; in this dimension notation, a gridSize of (2, 2, 2) would generate a 2x2x2 cube of thread blocks. Similarly, the blockSize dimensions tell the Kernel how many threads should be spawned within each threading block. The total number of threads generated can be computed by taking the product of all six dimensions (the three integers composing the gridSize and the three composing the blockSize).

Once the render Kernel is called, threads are spawned and run across the GPU in parallel. When the render Kernel finishes, the GPU RGB arrays are copied back to the CPU, and the resulting image is drawn.

The CUDA implementation of the ray tracing algorithm was built from a single source file, sphere.cu. This file was compiled with NVCC, the NVIDIA CUDA Compiler. NVCC uses a host C++ compiler under the hood and hides much of the compilation from the user. Overall, the CUDA implementation felt higher level and easier to work with.

4.5 Dynamic Scene Generation

For data collection purposes, all three implementations of the program allow the user to assign the resolution of the image (width and height), as well as the sphere count within the image. A common C++ header was created and shared between all three implementations for the generation of random spheres within the scene. To place spheres within the scene efficiently, a viewing-frustum calculation assists in the placement of the randomly generated spheres.

Figure 4.9: Code snippet from common/rand_sphere.h

The viewing frustum calculation on lines 14 and 15 of Figure 4.9 allows the rand_sphere method to place spheres within the viewing frustum of the camera; no randomly generated sphere will appear off screen. The camera sits at (0, 0, 0), and the z variable represents the distance of the randomly generated sphere from the camera in the Z direction (depth). The x and y variables similarly represent positions along the X and Y axes, and are bounded by the frustum calculation.
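The placement idea can be sketched as follows. This is a hedged reconstruction, not the thesis' actual rand_sphere.h: it assumes a camera at the origin looking down -Z, a square image, and a symmetric frustum, so a point at depth z stays on screen while |x| and |y| are below z * tan(fov/2). All names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

struct Pos { float x, y, z; };

// Half-width of the visible region at depth z for a given field of view.
float frustumHalfWidth(float z, float fovDegrees) {
    return z * std::tan(fovDegrees * 0.5f * 3.14159265f / 180.0f);
}

// Pick a random on-screen position between depths zNear and zFar.
Pos randSpherePos(float zNear, float zFar, float fovDegrees) {
    float z = zNear + (zFar - zNear) * (std::rand() / (float)RAND_MAX);
    float half = frustumHalfWidth(z, fovDegrees);
    float x = -half + 2 * half * (std::rand() / (float)RAND_MAX);
    float y = -half + 2 * half * (std::rand() / (float)RAND_MAX);
    return {x, y, -z};  // negative Z: in front of the camera
}
```

Bounding x and y by the half-width at the sphere's own depth is what keeps every randomly generated sphere inside the camera's view.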

5. Results

5.1. Hardware Used

All compilation and run-time results gathered for this thesis were done on one machine. Because this thesis is comparing relative performance of GPU and CPU based programs, it is important to discuss the hardware used.

CPU

Name: Intel Core i7-4790K
Cores: 4
Threads: 8
Processor Base Frequency: 4.00 GHz
Cache: 8 MB SmartCache
Instruction Set Extensions: SSE4.1/4.2, AVX 2.0
Original Retail Price: $339-350 (Q2'14)

Table 5.1: CPU specs

GPU

Name: Gigabyte GeForce GTX 970 G1 Gaming
Memory: 4 GB GDDR5
GPU Clock: 1.178 GHz
Memory Interface: 256-bit
Original Retail Price: $329 (September 2014)

Table 5.2: GPU specs

Neither the CPU nor the GPU was overclocked for the tests performed. As shown in the tables above, the CPU and GPU had similar price points and were released within a few months of each other. Note the clock speed discrepancy between the two processors: 4.00 GHz vs 1.178 GHz, nearly a 4:1 ratio in favor of the CPU. The GPU makes up for this with the sheer number of cores it can run in parallel.

5.2. Rendering Results from Default Scene

To compare the rendering times of the three approaches (serial C++, ISPC, and CUDA), the timing header included with the ISPC distribution was used. The timer was started immediately before and stopped immediately after the render call in the C++ and ISPC implementations. The unit of time used by this header is clock cycles, measured in millions. In the CUDA implementation, the cudaMemcpy calls were included in the render time, since that step is not necessary in the pure CPU based implementations. All of the raytracer implementations were set to a maximum recursion level of 5. This means at most 2⁵ = 32 reflection and refraction rays were spawned for each pixel.

Figure 5.1: Default config, 640x480, five static spheres and a light

          C++ Serial   ISPC single-core   ISPC multi-core     CUDA
Trial 1     2627.837             91.012            21.078   15.673
Trial 2     2635.000             90.836            20.261   15.731
Trial 3     2614.023             89.467            19.073   15.824
Trial 4     2611.201             90.802            20.401   15.730
Trial 5     2612.016             88.417            20.129   15.765
Average     2620.015             90.107            20.188   15.745

(All values are clock cycles, in millions.)

Table 5.3: Rendering times for the default scene.

As seen in Table 5.3, both ISPC and CUDA showed significantly faster rendering times than the serial C++ implementation. The ISPC single-core times were on average 29 times faster than the serial program (2620.015 / 90.107). The ISPC multi-core results were about 130 times faster. The CUDA times were about 166 times faster than the average serial C++ times, so CUDA ran in roughly three quarters of the time of multi-core ISPC. Again, these times include the overhead of copying the pixel and sphere arrays to and from the GPU.
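The speedup ratios quoted above can be recomputed directly from the Table 5.3 averages (millions of clock cycles):

```cpp
#include <cassert>

// Speedup of a parallel implementation over the serial baseline,
// using the average cycle counts from Table 5.3.
double speedup(double serialCycles, double parallelCycles) {
    return serialCycles / parallelCycles;
}
```

Using the averages, 2620.015 / 90.107 ≈ 29 for single-core ISPC, 2620.015 / 20.188 ≈ 130 for multi-core ISPC, and 2620.015 / 15.745 ≈ 166 for CUDA.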

5.3 Dynamic Scene: Sphere Count and Resolution Costs

To analyze the ISPC, CUDA, and Serial approaches further, features were added to the programs to allow for variable resolution and sphere counts.

Increasing the number of spheres in the scene increases the amount of work done by the raytracer in two ways. The first computational cost is paid when the trace method of the algorithm checks for intersections: each ray drawn must be checked against each sphere in the scene to see whether an intersection is present. Because of this, the cost of finding each ray's intersection with the scene increases linearly with the number of spheres in the scene.

The second affected portion of the ray tracing algorithm is the number of rays spawned. For each pixel in the generated image, the algorithm draws at least one ray. If this initial ray intersects with a sphere, two new rays are drawn based on the refraction and reflection angles and recursively fed back into the trace method. Because the algorithm is capped at five levels of recursion, each pixel in a generated image is the product of up to 1 + 2⁵ rays. As the amount of free volume in the scene decreases, the probability of a spawned ray intersecting with a sphere and recursing increases. For these dynamically generated scenes all spheres were set to a fixed radius of 2, but were allowed to overlap. Even so, it is easy to see that the probability of a ray intersecting with a sphere increases as the number of spheres increases.

The cost of increasing the resolution is more obvious: as the resolution increases, the number of pixels increases. An image with a resolution of 256x256 has four times as many pixels as a 128x128 image, and therefore ends up with roughly four times the workload.

5.4 Data Collection Techniques and Results

For the analysis, rendering times were gathered from each of the four implementations, at three different resolutions, using five different sphere counts. The resolutions used were 256x256, 512x512, and 1024x1024. Each step in resolution yields a 4x increase in pixel count, since both the width and height double. The sphere counts tested were 8, 64, 216, 512, and 1000. Each resolution was tested with each sphere count, for all four implementations of the raytracer.

To gather data with as little interference from outside processes as possible, the tests were executed overnight through Ubuntu's virtual console, with lightdm (Ubuntu's display manager) disabled. A light wrapper script was written to run the serial, ISPC single-core, ISPC multi-core, and CUDA implementations with each configuration. Ten trials were run for each combination, and the results of each trial were piped to a corresponding text file (e.g., ./results/cuda/512x512_64.txt).

The text files generated by these test renders were parsed and averaged. The data gathered was used to generate the graphs and tables below.

Figure 5.2: Sphere counts (8, 64, 216, 512, 1000)

Figure 5.2 shows images generated at each of the five sphere count levels. These particular samples were done using the CUDA implementation, at a resolution of 512x512. Despite the similarity in appearance between the 216, 512, and 1000 sphere count images, there is a significant difference in their render times.

Figure 5.3: Sphere Count vs Time - All Techniques

Figure 5.3 plots the average render times for serial C++, ISPC (single-core), ISPC (multi-core), and CUDA, measured at 512 by 512 pixels, with sphere counts of 8, 64, 216, 512, and 1000. Because the ISPC and CUDA implementations are so much faster than the serial implementation, their relative performance is not really visible in this figure. It is immediately clear, however, that both ISPC and CUDA offer huge performance gains over non-vectorized, serial C++ code.

Figure 5.4: Sphere Count vs Time - ISPC Only

Figure 5.4 uses the same axes and data as Figure 5.3, but isolates the single-core and multi-core ISPC implementations of the raytracer. As expected, the multi-core implementation outperforms the single-core implementation. An interesting takeaway from this graph is that the multi-core implementation is about four times faster than the single-core implementation. This is satisfying: the CPU used for this thesis has four cores, so in an optimal scenario a multi-core program should be about four times faster than a single-core program on this CPU.

Figure 5.5: Sphere Count vs Time - CUDA vs ISPC

Figure 5.5 isolates the multi-core ISPC implementation and the CUDA implementation, using the same data referenced in Figure 5.3. This graph gives an idea of how closely CUDA and ISPC perform at 512 by 512 pixels. At all five sphere counts, ISPC and CUDA are quite similar.

At sphere counts of 64 and above, CUDA maintains roughly a 10-25% lead over ISPC at this resolution; at 8 spheres the two are within a few percent of each other (see Table 5.4 for reference).
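The size of CUDA's lead can be computed from the 512x512 rows of Table 5.4, here measured as (ISPC − CUDA)/ISPC. Note the 8-sphere scene, where CUDA is actually marginally slower:

```python
# 512x512 multi-core ISPC and CUDA times from Table 5.4 (millions of cycles).
ispc = {8: 12.38, 64: 314.37, 216: 1104.38, 512: 2017.97, 1000: 3289.21}
cuda = {8: 12.77, 64: 234.70, 216: 831.97, 512: 1660.32, 1000: 2899.22}

for spheres in ispc:
    # Positive percentage => CUDA finished faster than multi-core ISPC.
    lead = (ispc[spheres] - cuda[spheres]) / ispc[spheres] * 100
    print(f"{spheres:4d} spheres: CUDA lead {lead:+.1f}%")
```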

Figure 5.6: Resolution vs Time - All Techniques

In Figure 5.6, three resolutions are compared: 256 by 256, 512 by 512, and 1024 by 1024 pixels. In Figures 5.6 through 5.8, the sphere count is fixed at 512. This figure plots all four raytracers, similar to Figure 5.3, and like Figure 5.3 it mostly serves to show the gap between the C++ serial implementation and the parallel ISPC/CUDA implementations. The following figures isolate the parallel implementations.

Figure 5.7: Resolution vs Time - ISPC Only

Figure 5.7 isolates the ISPC implementations from Figure 5.6. It maintains the roughly 4:1 performance ratio between the multi-core and single-core ISPC implementations seen in Figure 5.4.

Figure 5.8: Resolution vs Time - CUDA vs ISPC

Figure 5.8 isolates the ISPC and CUDA implementations of the raytracer from Figure 5.6. Here we can see CUDA pull away from ISPC as the pixel count increases. At 1024 by 1024 pixels, with 512 spheres, ISPC takes about 50% longer than CUDA to complete the render (7388.55 vs. 5045.91 million clock cycles, from Table 5.4). In Figure 5.5, increasing the sphere count had very similar effects on ISPC and CUDA, but here we can see that as the resolution increases, CUDA becomes more favorable.
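The widening gap is easy to quantify from the 512-sphere rows of Table 5.4 (values copied from the table); the ISPC/CUDA time ratio grows from near parity at 256x256 to roughly 1.5 at 1024x1024:

```python
# Multi-core ISPC vs CUDA at 512 spheres, across resolutions (Table 5.4).
ispc = {256: 640.60, 512: 2017.97, 1024: 7388.55}
cuda = {256: 599.99, 512: 1660.32, 1024: 5045.91}

for res in ispc:
    # Ratio > 1 means ISPC took longer than CUDA at this resolution.
    print(f"{res}x{res}: ISPC/CUDA time ratio {ispc[res] / cuda[res]:.2f}")
```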

Figure 5.9: Bar Chart Comparison - ISPC vs CUDA

In Figure 5.9, every resolution-sphere combination tested is shown for ISPC (multi-core) and CUDA. Here we can see that the columns are almost identical at 256 by 256 pixels. ISPC maintains similar performance through the 512 by 512 trials, but at 1024 by 1024 CUDA is the clear victor. It is also apparent that the race between ISPC and CUDA is not affected much by the sphere count: as the sphere count increases, the percentage difference between ISPC and CUDA remains fairly static.

Resolution - Spheres        Serial    ISPC-single    ISPC-multi       CUDA
256x256 - 8                 307.44        13.9443          5.51       3.72
256x256 - 64               9917.03         397.76         81.02      80.64
256x256 - 216             40457.18        1350.01        338.83     285.46
256x256 - 512             82500.22        2579.54        640.60     599.99
256x256 - 1000           129422.17        4255.04        954.42    1015.38
512x512 - 8                1662.14          68.81         12.38      12.77
512x512 - 64              38980.59        1281.24        314.37     234.70
512x512 - 216            167628.16        4833.70       1104.38     831.97
512x512 - 512            321159.87        9096.01       2017.97    1660.32
512x512 - 1000           516826.77       14637.50       3289.21    2899.22
1024x1024 - 8              5138.14         222.05         42.26      34.35
1024x1024 - 64           156560.96        4597.48        987.17     721.40
1024x1024 - 216          654972.35       17224.46       3801.04    2476.48
1024x1024 - 512         1298173.63       31700.10       7388.55    5045.91
1024x1024 - 1000        2099290.93       51857.37      12110.69    8422.52

Table 5.4: Rendering Times in Millions of Clock Cycles

Table 5.4 contains the data used to produce the graphs shown in Figures 5.3 through 5.9. All times are recorded in millions of clock cycles. The single-core ISPC implementation of the raytracer is at least 20 times faster than the pure C++ serial implementation in every trial, despite the fact that both programs run on a single CPU core. This is a testament to the CPU vectorization ISPC performs under the hood.
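The at-least-20x claim can be verified mechanically; the lists below copy the Serial and ISPC-single columns of Table 5.4 row by row:

```python
# Serial and single-core ISPC times from Table 5.4, in millions of cycles,
# in row order (256x256, 512x512, 1024x1024; 8 to 1000 spheres each).
serial = [307.44, 9917.03, 40457.18, 82500.22, 129422.17,
          1662.14, 38980.59, 167628.16, 321159.87, 516826.77,
          5138.14, 156560.96, 654972.35, 1298173.63, 2099290.93]
ispc_single = [13.9443, 397.76, 1350.01, 2579.54, 4255.04,
               68.81, 1281.24, 4833.70, 9096.01, 14637.5,
               222.05, 4597.48, 17224.46, 31700.10, 51857.37]

# Per-trial speedup of single-core ISPC over serial C++.
speedups = [s / i for s, i in zip(serial, ispc_single)]
print(f"min speedup {min(speedups):.1f}x, max speedup {max(speedups):.1f}x")
```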

6. Conclusion

As shown in the results, CUDA generally outperformed ISPC in a simple raytracer when compared on a system using an Intel Core i7-4790K and a Gigabyte GeForce GTX 970. These processors launched within the same year at almost identical price points. This data suggests that GPU-based computation will outperform CPU-based computation for parallel applications running on machines with comparable CPUs and GPUs.

It would be interesting to test different algorithms with these languages to see how the results change, or if they change at all. Based on the taxonomy table in [9], it would be interesting to test a Non-Dependent Kernel algorithm on ISPC and CUDA, to see how much further CUDA could pull ahead. Similarly, a Directly-Dependent algorithm might close the gap between CUDA and ISPC, or even favor ISPC, depending on how much memory-transfer overhead was involved.

At a fixed, relatively low resolution, ISPC and CUDA have similar performance. As scene resolution increases, approaching the resolutions of standard monitor and TV displays, the CUDA/GPU implementation shows a significant advantage. This gap in performance will likely widen as resolutions continue to increase, especially if GPUs maintain a faster rate of core-count growth than CPUs.

As someone new to both languages, I found CUDA easier to work with. It felt a bit simpler, and there are vastly more resources available for learning, debugging, and optimizing CUDA due to its popularity.

References

1. ISPC: Intel SPMD Program Compiler. https://ispc.github.io/

2. CUDA Parallel Computing Platform. http://www.nvidia.com/object/cuda_home_new.html

3. Rademacher, P., Ray Tracing: Graphics for the Masses. https://www.cs.unc.edu/~rademach/xroads-RT/RTarticle.html

4. Pharr, M., Mark, W.R., ispc: A SPMD Compiler for High-Performance CPU Programming. Proceedings of Innovative Parallel Computing (InPar), San Jose, CA, May 2012. https://cloud.github.com/downloads/ispc/ispc/ispc_inpar_2012.pdf

5. Good, S., Little Performance Explorations: ISPC. http://www.palladiumconsulting.com/2014/10/little-performance-explorations-ispc/

6. Scratchapixel, raytracer.cpp. https://www.scratchapixel.com/code.php?id=3&origin=/lessons/3d-basic-rendering/introduction-to-ray-tracing

7. Barney, B., Introduction to Parallel Computing, Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/parallel_comp/, viewed 1/29/2017

8. Rapaport, Zaks, Ben-Asher, Streamlining Whole Function Vectorization in C Using Higher Order Vector Semantics. https://www.computer.org/csdl/proceedings/ipdpsw/2015/7684/00/7684a718-abs.html

9. Gregg, Hazelwood, Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. http://ieeexplore.ieee.org/abstract/document/5762730/

10. Katsigiannis, Dimitsas, Maroulis, A GPU vs CPU performance evaluation of an experimental video compression algorithm. http://ieeexplore.ieee.org/document/7148134/

11. LLVM. http://llvm.org/

12. Embree. https://embree.github.io/

13. CUDA Machine Learning. http://www.nvidia.com/object/machine-learning.html

14. POSIX thread (pthread) Library. http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
