CS87 Project Report: Ray Marching 3-Dimensional Fractals on GPU Clusters
Jonah Langlieb, Kei Imada, Liam Packer
Computer Science Department, Swarthmore College, Swarthmore, PA 19081
December 16, 2018

Abstract

Fractals are self-similar mathematical structures which play an important role in chaos theory and which serve as visually stunning examples of the complexity that can stem from simple mathematical definitions. They are especially amenable to parallel computing because they only require the (embarrassingly parallel) iteration of a complex function at each point. However, with the advent of 3D 'Mandelbulb' fractals, new techniques are required to visualize these structures efficiently. Ray marching offers one such technique: it combines the traditional approach of 3D ray tracing with mathematically informed 'jumps' between steps, allowing for drastically improved performance. In this paper, we explore ray marching 'Mandelbulb' fractals with modern CUDA GPGPU programming, extended to run on the commodity GPU cluster available at Swarthmore College, in order to understand the trade-offs involved in large-scale CUDA programming. Our program, which uses C++ CUDA to interface with the GPU and MPI for inter-node communication, demonstrates robust scaling across resolution and number of nodes. Our hypotheses, that additional GPUs would improve run-time only when the image size was larger than a single GPU's memory and that partitioning which considered the load of each node would improve run-time, were both empirically supported by our data.

1 Introduction

Fractals are mathematical structures defined by self-similarity that serve as a foundational illustration of, and model for, chaos theory. Unlike calculus-based models, which assume that the closer an object is examined, the more similar it becomes to a smooth Euclidean ideal, fractals are designed to stay 'rough' and complex at small scales. While the fundamental mathematical idea can be traced back earlier, the term 'fractal' was coined and popularized by Benoit Mandelbrot in his seminal 1982 work The Fractal Geometry of Nature, which used fractals as a way of modeling natural processes that similarly require the rejection of simplification at small scales. Such modeling has found widespread and productive use in fields such as Biology [10] and Physics [9, 4], along with continued research in Mathematical Chaos Theory.

And, because of the elegant and succinct equations which underlie them, fractals are particularly useful in Computer Graphics as astonishing illustrations of the complexity that can be achieved with computers. One type of fractal which became popular in the Computer Graphics field in the late 2000s is the 3-dimensional (3D) fractal, especially the Mandelbulb, a variant of the 2-dimensional (2D) Mandelbrot set, which is itself generated by recursively iterating a complex-valued 'escape-time' function. Unlike 2D fractals, which are relatively straightforward to display because the value of the fractal can simply be calculated for each point on the plane, 3D fractals require a different approach to display efficiently. The naïve approach would be to use standard graphics techniques, like ray tracing, to generate an image. In ray tracing, rays of light are extended in small, fixed-size steps from the viewpoint of the 'eye' until they hit a part of the object, simulating the way we naturally see. One advantage of this approach is that it does not require rendering any non-visible section of the object. However, for fractals this approach is computationally expensive (though embarrassingly parallel) and memory-inefficient: unlike typical graphical constructs, which are simple enough to make it easy to detect whether a point is on the object's surface, the calculation that determines whether a point lies on a fractal is itself computationally expensive, and it must be repeated at every step. Therefore, another approach is necessary. One such approach, ray marching, was pioneered as early as 1989 by Hart et al. [2] and extensively refined by Inigo Quilez [7]. In this approach, analytically derived distance functions provide an estimate of the maximum distance a ray can travel without reaching the object. Using these functions, instead of ray tracing in fixed-size steps, we can ray march in variable-sized 'jumps' of this maximum distance. This dramatically improves performance, since rays far from the object cover large distances in a single step and only slow down near its surface.

Because ray marching is far more efficient than naïve ray tracing while remaining embarrassingly parallel, we thought it could be applied to graphics processing units (GPUs). The architecture of these external units is designed for (and forces) embarrassingly parallel tasks, and fractal generation has a long tradition of taking advantage of them [5]. In recent years, GPUs have also become commonplace enough that many commodity machines (laptops and desktops) include one. Given this prevalence, we wanted to use (commodity) GPU clusters to take further advantage of the parallelism inherent in Mandelbulb generation. Our work explores how to parallelize 3D fractal rendering on a commodity GPU cluster so as to maximize the speedup and image resolution. In this experiment, we used the networked, GPU-equipped computers of the Swarthmore College Computer Science Department, which has both heterogeneous nodes and, due to broad student use, widely varying load.

We used a custom CUDA C++ program to interface with the GPUs and MPI to distribute the sub-tasks and communicate between nodes. In order to measure different aspects of the parallelized runtime, we varied the resolution of the output image, the number of nodes, and the partitioning scheme with which we split up the nodes' work. We hypothesized that, due to the tradeoff between communication costs and limited memory, the GPU cluster would have a faster runtime only when the resulting image was larger than a single GPU's memory. Additionally, due to the heterogeneity of nodes, we expected a partitioning scheme that respected such differences to outperform a simple 'even' partitioning. Supporting our hypotheses, we found that our solution scales well with additional GPUs, especially when the resolution exceeds the capacity of a single GPU's memory. Additionally, the load-respecting partition was markedly faster than the even partition.

2 Related Work

We were particularly inspired by Hart's 1989 paper Ray Tracing Deterministic 3-D Fractals, which ray marched the Julia set (another family of 3D fractals) in order to render highly detailed images, even on constrained hardware [3]. This is one of the first papers to apply ray marching (which they call 'unbounding volumes') to fractals, and it is an especially good primer for understanding more complicated contemporary ray marching algorithms. Additionally, analogous to our work, they used the AT&T Pixel Machine, a predecessor of modern GPUs, which consisted of 64 parallel processors dedicated to graphics processing [6]. Even though the processing speed was quite different (≈ 1 hour for a 1280 × 1024 image) and their memory was far more constrained, we found their techniques helpful in optimizing our own algorithm for speed and memory consumption. It is also inspiring and humbling to read a paper almost three decades old which remains exceptionally relevant to modern computer science.

Additionally, the work of Inigo Quilez, who championed the use of ray marching in graphics, lays the foundation of this work, not only for ray marching in general but also for the Mandelbulb. He has many blog posts about how to render the Mandelbulb [7] and provides a fully functioning web-based version of his code [8]. This online version was especially invaluable in implementing our code. He also demonstrates extra features, such as color and rotations.

Finally, in Fractal Art Generation Using GPUs, Mayfield et al. motivate many of our experiments into the speedup possible from ray marching fractals with their analysis of 2D fractals. Their analysis of the GPU vs. CPU speedup of fractal generation was helpful in understanding our own speedup tradeoffs, even though they only used 1 GPU [5].

3 The Problem and Solution

3.1 The Problem

The problem that we have explored is that of 3D fractal generation through ray marching at large resolutions. Ray marching is an ex- […] we are very close to the surface, we consider the ray to be on the surface. In the case of the "Mandelbulb" fractal, the details of the distance function are beyond the scope of this paper but have been previously derived by other posts and papers [2].

While ray marching is an embarrassingly parallel problem that can be implemented on a single GPU for fast rendering at relatively small resolutions, computing large resolutions (on the scale of 2^16 × 2^16 or more, i.e. at least 2^32 ≈ 4.3 billion pixels) is not feasible on a single GPU. This is due to the limited memory of a single GPU, along with the limited number of threads available to assign to different pixel indices. We propose a scalable solution to this problem.

3.2 Our Solution

Our solution to this problem of large-resolution ray marching is a CUDA/MPI program that uses the messaging capabilities of MPI to assign the pixel indices of a large-resolution image to various nodes in the network with usable GPUs. This remedies the problem of limited GPU memory, since the total GPU memory of a cluster scales linearly with the number of nodes.

More specifically, we are leveraging the OpenMPI installation in the Swarthmore Computer Science labs to use various connected lab machines with GPUs to perform these large computations. Note that the GPUs provided by Swarthmore have drastically different computational power and, due to general student use, constantly fluctuating load.
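The report contrasts an 'even' partition with one that respects each node's load, but the text above does not specify how load is measured or how the image is divided. The following is a minimal C++ sketch under stated assumptions, not the authors' implementation: each node is given a hypothetical positive weight (for example, an estimate of its currently available GPU throughput), and the image rows are split into contiguous blocks proportional to those weights. `RowRange` and `partition_rows` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous block of image rows [begin, end) handed to one node (hypothetical type).
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical load-aware partition: split `height` rows across nodes in
// proportion to `weights`, where weights[i] is an assumed estimate of node i's
// available GPU throughput. Passing equal weights yields the 'even' partition.
std::vector<RowRange> partition_rows(std::size_t height, const std::vector<double>& weights) {
    assert(!weights.empty());
    double total = 0.0;
    for (double w : weights) {
        assert(w > 0.0);  // weights must be positive
        total += w;
    }

    std::vector<RowRange> ranges;
    ranges.reserve(weights.size());
    double cumulative = 0.0;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        cumulative += weights[i];
        // Round the cumulative boundary; the last node always ends at `height`,
        // so the ranges tile [0, height) with no gaps or overlaps.
        std::size_t end = (i + 1 == weights.size())
            ? height
            : static_cast<std::size_t>(static_cast<double>(height) * (cumulative / total) + 0.5);
        ranges.push_back({begin, end});
        begin = end;
    }
    return ranges;
}
```

For example, `partition_rows(1000, {1.0, 1.0, 1.0, 1.0})` gives every node 250 rows (the even partition), while `partition_rows(1000, {2.0, 1.0, 1.0})` gives the first node rows [0, 500) and the other two nodes 250 rows each. In an MPI program, every rank could compute the same table and render only its own range, with the pieces collected afterwards (for instance with `MPI_Gatherv`, which accepts per-rank counts); this is one possible design, not necessarily the one used in the report.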
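The ray-marching idea described in the introduction (advance each ray by a distance estimate rather than a fixed increment, and stop once the estimate says the ray is very close to the surface) can be sketched in plain C++ as below. This is a host-side illustration, not the report's CUDA code; in a CUDA kernel the same functions would typically be marked `__device__` and evaluated once per pixel. The distance estimator is the widely used power-8 Mandelbulb estimate 0.5 · log(r) · r / dr; the power, iteration count, bailout radius, epsilon, and maximum distance are assumed values, and `Vec3`, `mandelbulb_de`, and `ray_march` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-vector for the sketch (CUDA code would more likely use float3).
struct Vec3 {
    double x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Estimated distance from `pos` to the power-8 Mandelbulb, using the common
// 0.5 * log(r) * r / dr estimator. The parameter defaults are assumptions.
double mandelbulb_de(Vec3 pos, int max_iter = 16, double power = 8.0, double bailout = 2.0) {
    Vec3 z = pos;
    double dr = 1.0;  // running derivative of the iteration
    double r = 0.0;
    for (int i = 0; i < max_iter; ++i) {
        r = length(z);
        if (r > bailout) break;  // the orbit escapes: pos is outside the set
        // Spherical angles of z; guard the 0/0 case and clamp rounding error.
        double cos_theta = r > 0.0 ? std::fmax(-1.0, std::fmin(1.0, z.z / r)) : 1.0;
        double theta = std::acos(cos_theta) * power;
        double phi = std::atan2(z.y, z.x) * power;
        dr = std::pow(r, power - 1.0) * power * dr + 1.0;
        // z <- z^power + pos, with the power applied in spherical coordinates.
        double zr = std::pow(r, power);
        z = Vec3{std::sin(theta) * std::cos(phi),
                 std::sin(theta) * std::sin(phi),
                 std::cos(theta)} * zr + pos;
    }
    // r == 0 means the orbit sits at the origin (bounded), so treat pos as inside.
    if (r <= 0.0) return 0.0;
    return 0.5 * std::log(r) * r / dr;
}

// Sphere tracing: advance along the unit-length ray `dir` by the distance
// estimate each step. Returns the distance travelled to the hit, or -1.0 if
// the ray leaves the scene or runs out of steps.
double ray_march(Vec3 origin, Vec3 dir, int max_steps = 256,
                 double epsilon = 1e-4, double max_dist = 10.0) {
    assert(std::fabs(length(dir) - 1.0) < 1e-9);  // the step logic assumes a unit direction
    double t = 0.0;
    for (int step = 0; step < max_steps; ++step) {
        double d = mandelbulb_de(origin + dir * t);
        if (d < epsilon) return t;  // close enough: treat the ray as on the surface
        t += d;
        if (t > max_dist) break;
    }
    return -1.0;
}
```

As a sanity check, a ray fired from (0, 0, -3) toward +z should report a hit near t ≈ 1.9, because on the negative z-axis the power-8 bulb reaches z = -2^(1/7) ≈ -1.104; the same ray fired toward -z takes ever larger steps, passes the maximum distance, and reports a miss.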