Cuburn, a GPU-Accelerated Fractal Flame Renderer
Steven Robertson, Michael Semeniuk, Matthew Znoj, and Nicolas Mejia

School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida, 32816

Abstract: This paper describes cuburn, an implementation of a GPU-accelerated fractal flame renderer. The flame algorithm is an example of a process which is theoretically well-suited to the abstract computational model used on GPUs, but which is difficult to efficiently implement on current-generation platforms. We describe specific challenges and solutions, as well as possible generalizations which may prove useful in similar problem domains.

I. Introduction

The fractal flame algorithm is a method for visualizing the strange attractor described in a mathematical structure known as an iterated function system, or IFS. In broad terms, the algorithm plots a 2D histogram of the trajectories of a point cloud as it cycles through the function system. Due to the self-similarity inherent in well-defined iterated function systems, the resulting images possess considerable detail at every scale, a fact which is particularly evident when animations are produced via time-varying parameters on the IFS.

Capturing this multi-scale detail requires a considerable amount of computation. The contractive nature of IFS transform functions means that histogram energy is often concentrated in a small part of the image, leaving the lower-energy image regions susceptible to sampling noise. When rendering natural scenes using related techniques such as ray-tracing, strong filtering is often sufficient to suppress this noise in non-foreground regions, but the flame algorithm intentionally amplifies darker regions using dynamic range compression in order to highlight the multi-scale detail of an attractor. As a result, flames require a very large number of samples per pixel. Further, each sample on its own is often computationally complex. Unlike many other fractal visualization algorithms, flames incorporate a number of operations that are slow on many CPUs, such as trigonometric functions.

Despite requiring an atypically large number of challenging samples, the flame algorithm is still fundamentally similar to ray-tracing algorithms, and is thus a potential target for acceleration using graphics processing units. Nevertheless, past attempts at doing so have been challenged by specific implementation details which constrain performance or reduce image fidelity.

Cuburn overcomes these challenges by what is essentially a brute-force approach. We have abandoned best practices, compatibility guidelines, and even safety considerations in cuburn in order to get as close to the hardware as possible and wring out every last bit of performance we can. This has resulted in an implementation which, while considerably more complicated than its peers, is also faster, more complete, and more correct. It has also resulted in new insights about the hardware on which it runs that are generally applicable, including a few algorithms for accomplishing commonly-used tasks more quickly.

II. The fractal flame algorithm

At its simplest, the fractal flame algorithm is a method for colorizing the result of running the chaos game on a particular iterated function system.

An iterated function system, or IFS, consists of two or more functions, each typically being contractive (having a derivative whose magnitude is below a certain constant across the area being rendered) and having fixed points or fixed point sets that differ from one another. The Sierpinski gasket is a commonly-used example, having the three sets of affine transforms presented in (1).

\[
\begin{aligned}
x_0 &= \frac{x}{2}, & y_0 &= \frac{y}{2} \\
x_1 &= \frac{x + 1}{2}, & y_1 &= \frac{y}{2} \\
x_2 &= \frac{x}{2}, & y_2 &= \frac{y + 1}{2}
\end{aligned}
\tag{1}
\]

The chaos game is simple. Starting from any point in the set of points along the attractor, choose one equation in the IFS and apply it to the point to obtain a new point. Repeat this process, recording each generated point, until a sufficient number of samples have been generated. The recorded point trajectories can be analyzed to reveal the shape of the attractor. The result, in the case of Sierpinski's gasket, is shown in Figure 1.

The flame algorithm expands and codifies this algorithm in a few important ways, some of which will be described below.

To select a valid starting point for iteration, select any point within the region of interest, and iterate without recording that trajectory until the error is sufficiently small. In fractal flame implementations, this process is called fusing with the attractor, one of many terms chosen, arguably poorly, by the original implementors and retained for consistency. Since it is difficult to determine distance to attractor locations without first having evaluated them, most implementations simply run a fixed, conservatively large number of fuse rounds before recording begins.

Figure 1. Sierpinski's gasket, as rendered by cuburn.

IFS transform functions in flames are considerably more expressive than the traditional affine transformations employed in fractal compression and other uses of IFS. While each of these xforms (another historically-motivated misnomer) does begin with an affine transform, that transformed value is then passed through any number of variations, a predefined set of functions which range from a simple identity function to a parameterized implementation of the Möbius transform. The weighted sum of the output coordinates of these variations is then passed through a second affine transform to produce the final transformed coordinates.

The method of selecting which xform to apply at each round is similarly augmented. Each xform is assigned a density value, which affects how often it is selected. Some flames also make use of chaos, which is in fact not chaotic but a relatively orderly Markov process depending only on the immediately previously selected xform.

Xforms are also augmented with a separate 1D transformation which is applied to the color coordinate, an independently tracked value with a valid domain of [0,1]. The color coordinate is used to record the point trajectories: as each point is generated, the (x, y) coordinates are used to identify a memory cell in a 2D accumulation buffer, while the color is used to select a tristimulus value (traditionally RGB, though any additive color space will do) from a palette defined by the flame. The tristimulus value is then added to the appropriate cell in the accumulation buffer, and the accumulation density for that cell is incremented. In this way, a sort of histogram is constructed from the chaos game point trajectories.

After a prespecified number of iterations, the values in the accumulation buffer are read, and the tristimulus values are scaled logarithmically in proportion to the density of that cell. This is followed by density estimation, which has nothing to do with estimation, and is more accurately termed "noise filtering in extremely low density areas via a variable-width Gaussian filter proportional to accumulation cell density", although admittedly that's less catchy. A small amount of postprocessing handles such things as color values which exceed the representable range, and then an image is created.

Fractal flame images can be beautiful on their own, but are at their best when in motion. Animation of a fractal flame is accomplished by defining transform values as functions of time, either implicitly (as in the standard, frame-based XML genome format) or explicitly (as used in cuburn's more powerful animation-focused JSON representation). Each frame of an animation is generated by computing thousands of control points (which are not control points but rather the result of interpolating between the actual control points) covering the frame's display time, and then running the chaos game for each of these control points to obtain the appearance of smooth motion.

III. Implementation overview

To render a flame, cuburn first loads a genome file and analyzes it to determine which features the genome makes use of, as well as estimates of certain constants such as the size of auxiliary buffers that may be required during the rendering process. The feature set and constant estimates are supplied to the code generator, which combines annotated code snippets to produce kernels that will execute on the compute device. Estimated values are fixed based on the results of code generation, and corrected constants are injected into the kernels before uploading to the device.

Device buffers are then allocated and initialized, including a copy of the genome packed into a runtime-fixed layout determined in the previous step. For each frame of the animation, an interpolation kernel is applied to the genome data to produce a control point for each temporal step of the frame.

For most flames, the next step is to perform the chaos game. A kernel is launched which performs a portion of the requisite iterations. After each iteration, the current position is examined, and, if it is within the region of interest, the linear accumulator pixel address and color coordinate are packed in a 32-bit integer and appended to a log. The log is then sorted based on a key extracted from between 8 and 10 bits of the log entries, to obtain a degree of linear memory locality for histogram accumulation. An accumulation kernel reads the sorted log in multiple passes, storing information in shared memory before dispatching it to a device buffer. This process of iterate, sort, and accumulate is repeated until a sufficient number of samples have been collected in the accumulation buffer.
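The iterate, sort, and accumulate loop can be illustrated with a small CPU-side sketch. This is not cuburn's kernel code. It is a plain Python model under several stated assumptions: a hypothetical 64x64 accumulator, a 12-bit address field and an 8-bit color field inside each log entry, a sort key taken from the top 8 address bits, uniform xform selection, and a color update that averages the current color with a per-xform target. The three transforms are those of (1). All names (`pack`, `unpack`, `sort_key`, `iterate`, `accumulate`, `render`) are illustrative.

```python
import random

WIDTH = HEIGHT = 64                 # hypothetical accumulator dimensions
ADDR_BITS = 12                      # 64 * 64 = 4096 linear addresses
COLOR_BITS = 8                      # assumed quantization of the color coordinate
KEY_BITS = 8                        # the text gives a range of 8 to 10 bits
KEY_SHIFT = ADDR_BITS - KEY_BITS    # assumption: key = top bits of the address

# The three Sierpinski transforms from (1), each paired with an
# assumed color target in [0, 1].
XFORMS = [
    (lambda x, y: (x / 2, y / 2), 0.0),
    (lambda x, y: ((x + 1) / 2, y / 2), 0.5),
    (lambda x, y: (x / 2, (y + 1) / 2), 1.0),
]

def pack(addr, color):
    """Pack a linear pixel address and a color coordinate into one integer."""
    c = min(int(color * (1 << COLOR_BITS)), (1 << COLOR_BITS) - 1)
    return (c << ADDR_BITS) | addr

def unpack(entry):
    """Return (address, quantized color) from a packed log entry."""
    return entry & ((1 << ADDR_BITS) - 1), entry >> ADDR_BITS

def sort_key(entry):
    """Extract the KEY_BITS-wide key; each key covers 16 adjacent addresses."""
    return (entry >> KEY_SHIFT) & ((1 << KEY_BITS) - 1)

def iterate(n, fuse=20, seed=0):
    """Run the chaos game, discarding `fuse` rounds, and log n samples."""
    rng = random.Random(seed)
    x, y, color = rng.random(), rng.random(), rng.random()
    log = []
    for i in range(n + fuse):
        f, target = rng.choice(XFORMS)   # uniform selection (assumption)
        x, y = f(x, y)
        color = (color + target) / 2     # assumed 1D color transform
        if i < fuse:
            continue                     # still fusing: do not record
        if 0 <= x < 1 and 0 <= y < 1:    # region of interest
            addr = int(y * HEIGHT) * WIDTH + int(x * WIDTH)
            log.append(pack(addr, color))
    return log

def accumulate(log, density, color_sum):
    """Add each entry to the histogram. The quantized color index stands in
    for the palette lookup a real renderer would perform."""
    for entry in log:
        addr, c = unpack(entry)
        density[addr] += 1
        color_sum[addr] += c

def render(batches=4, per_batch=5000):
    density = [0] * (WIDTH * HEIGHT)
    color_sum = [0] * (WIDTH * HEIGHT)
    for b in range(batches):
        log = iterate(per_batch, seed=b)
        log.sort(key=sort_key)           # group entries with nearby addresses
        accumulate(log, density, color_sum)
    return density, color_sum
```

In this model the sort has no effect on the final histogram; it only changes the order of writes. On a GPU, that ordering is what would provide the memory locality described above.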
A density estimation kernel is launched, which performs log-scaling of the accumulated points, antialiasing postprocessing, and selective blurring to provide a depth-of-field effect and correct a limited amount of sampling noise.

around those variations, allowing them to be merged into a single basic block and optimized more effectively. While this technique adds a considerable amount of complexity to the