
GPU-Assisted Ray Casting of Large Scenes

Daniel A. Balciunas∗ Lucas P. Dulley† Marcelo K. Zuffo‡

Laboratory of Integrable Systems, University of São Paulo

ABSTRACT

We implemented a pipelined rendering system that pre-renders a reduced set of a scene using the raster method built into the graphics hardware. The computation performed by the graphics card is used as an estimate for evaluating the initial traversal points for a ray caster running on the CPU. This procedure replaces the use of complex spatial subdivision structures for primary rays, offloading work that would traditionally be executed by the CPU and leaving additional system memory available for loading extra scene data. The ray traversal algorithm skips even narrow empty spaces, which are usually hard to map using conventional spatial subdivision. We achieved interactive frame rates (3-10 frames/s) running the system on a single computer with conventional hardware.

Keywords: Ray casting, graphics hardware, height field.

Index Terms: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Raytracing; I.3.8 [Computer Graphics]: Applications; I.3.3 [Computer Graphics]: Methodology and Techniques

∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]

Figure 1: Interactive frame rate achieved for a 268-million-voxel height field (equivalent to 537 million triangles) with a 16,384 x 16,384 texture.

1 INTRODUCTION

For historical reasons, graphics hardware uses the raster method for 3D rendering. Its execution on current graphics cards is much faster than any ray caster running on a CPU, and it became the most popular choice for generic image rendering.

However, rendering large scenes can be an arduous task for graphics hardware, particularly when the amount of data is far greater than its available memory. This may restrict quick camera displacement movements, hindering the visualization of a scene with fine details, because scene visualization relies on scene paging algorithms [8], occlusion culling [5] and the use of level of detail [30] to increase performance.

Ray casting systems are an excellent choice for the visualization of huge data sets [9]. They render scenes with billions of polygons at interactive frame rates. However, these systems usually have to run on computer clusters to achieve interactive frame rates, and they do not make intensive use of a valuable and common hardware resource: the graphics card. Even when using a ray tracer, the power of graphics hardware processing is essential for real-time visualization.

Our work proposes and implements a ray casting acceleration system that makes intensive use of the graphics hardware raster algorithm to evaluate the starting ray traversal points for primary rays. The starting traversal points are positioned very close to the intersection point, avoiding the need to traverse complex spatial subdivisions. A simple implementation achieved interactive frame rates for large scene data sets, running the test program on a single computer (Figure 1).

This paper is organized as follows: Section 2 presents the motivation for this paper. Section 3 discusses the related work. Section 4 presents a case study, the visualization of high definition height fields. Section 5 describes the use of bounding volumes in our system. Section 6 discusses the system architecture. Finally, some results are presented in Section 7 and conclusions in Section 8.

2 MOTIVATION

Since graphics hardware has become a commodity, we decided to propose and implement a rendering system that makes use of this common computational resource and the CPU to render large scene data sets at interactive frame rates. These scenes do not fit in the graphics card memory, but they do fit in the main memory.

We had basically two choices: either to implement a common rendering system using the graphics hardware as the main renderer, or a ray caster rendering system running on the CPU, using the graphics hardware as a secondary resource for speedup. The first approach is widely used, but requires complex additional algorithms, such as occlusion culling [5], scene paging [8] and level of detail [30]. The second approach would use the graphics card's computational power to replace a spatial subdivision algorithm that would usually be implemented on the CPU to speed up the ray casting time.

We chose the ray casting approach instead of the graphics card rendering approach. This work shows the speedup achieved when using a rendering system that makes use of the graphics card's original rendering capabilities in order to accelerate a ray caster running on the CPU, without the use of complex algorithms or spatial subdivision for the ray caster. We did not focus on the graphics card rendering approach, as it has already been widely tested. Our aim is to compare the acceleration of our method with other ray casting acceleration techniques.

Our acceleration method uses the GPU to render a distance buffer of a simplified set of bounding volumes of the objects of a large scene. This is done by computing the distance from the camera to the nearest bounding volume for each pixel. The distance buffer is then used to evaluate the starting ray traversal points by the ray caster running on the CPU; it places these starting traversal points very close to the intersection, skipping most of the empty regions of the scene.

The speedup comes from the fact that the CPU does not have to traverse each primary ray through any complex spatial subdivision. When the bounding volumes are a good approximation of the represented objects, rays do not traverse long distances across the scene before hitting an object, strongly reducing the total ray casting time. For the most generic scenes, with a large number and variety of objects, a simple grid [1] would be enough for object culling.

The architecture is basically a pipeline: the graphics card processes the next frame while the ray caster renders the current frame (we implemented this using multi-thread parallelism). As the graphics card has its own rendering method, initially designed for performance and not for visual accuracy, it becomes a good choice to render an approximate scene (stored in the GPU), outputting an approximate distance buffer as the result of its computation. The distance buffer is then used for ray casting optimization, rendering the full resolution scene (stored in the system memory) with maximum quality. This procedure is illustrated in Figure 2.

Figure 2: The stages of our rendering system: from simplified scene geometry to depth-buffer and then ray casting.

The proposed system as described is flexible and can be reproduced easily, as it uses only the basics of GPU and thread programming techniques. It requires only hardware that is readily available on recent computers. It does not use any aggressive or relaxed algorithm that would generate artifacts. It also preserves storage space by avoiding the use of complex spatial subdivision for primary rays.

Since the whole scene is available for the ray caster renderer, the camera displacement movements are not restricted: we can quickly navigate through different parts of the scene, and position the camera to display the entire scene (with fine details) without having to wait for long cache operation delays.

The ray caster may also handle object data in its native form: it is not necessary to transform objects into a triangle mesh for the ray-object intersection test. This saves a considerable amount of memory for the ray caster scene. It is possible to create specific surface reconstruction intersection tests [15], and choose the appropriate one depending on the distance between the camera and the object (surface reconstruction level of detail). These are advantages over the graphics card raster method rendering approach.

3 RELATED WORK

Rendering large scenes using the raster method of current graphics hardware relies basically on the use of frustum culling, scene paging and level of detail algorithms.

Scene paging [8] is limited by the transfer rate between the graphics card and the system memory, which might hinder the rendering response time of the first frame after a quick camera move or any request for a reasonable amount of scene data. This can result in a camera movement restriction imposed by the visualization application.

Level of detail [30] helps to decrease the amount of data processed by the GPU, increasing the frame rate, but it may severely increase the total amount of data stored in memory and increase the traffic between CPU and GPU when rendering large scenes, especially when soft transitions between the levels are required. Level of detail algorithms might not mimic the scene with fidelity when it is necessary to use higher levels with fewer triangles to preserve the frame rate.

Occlusion culling [5] is a feature unavailable in the graphics card raster method. It must be implemented separately and consumes extra processing power. Simple implementations of ray casters already have native object occlusion culling.

A simple approach for rendering large terrains on the GPU is the use of geometry clipmaps [2], combining the techniques presented above and the fact that terrains can be treated as 2D elevation images, resulting in real-time frame rates. An interesting solution for detailed terrain rendering using displacement mapped triangles has been proposed in [26], but interactive frame rates could not be achieved then.

The use of rasterization to find primary ray intersections is introduced in [28]. Our work uses the graphics card rasterization method to obtain an approximation for the starting ray traversal points, a subtle but important difference.

An outstanding solution for huge scene data visualization has been proposed by [9]. An iterative ray tracer uses scene paging techniques to provide interactive frame rates. Billions of polygons can be visualized while moving through the scene. The fine details are loaded when the camera stops moving quickly through the scene. Although the scene data used for this solution does not even fit in the system memory, the time required to obtain a fine detail frame is considerable. It also requires the use of several cluster nodes to render at interactive frame rates.

An implementation of a ray tracer on the GPU [4, 21] would be limited for our purpose: interactive frame rates were achieved only for scenes with fewer than one million triangles. The graphics hardware was not originally designed for this purpose, and ray tracing may not fit well on the GPU's streaming architecture due to the high number of branches. In addition, if one renders a large scene with a GPU-implemented ray tracer, the local graphics card memory would not be enough for loading the entire scene, and scene paging would be required, as in a GPU rendering approach.

Simple implementations of ray tracers on the CPU usually do not make any use of the GPU. The graphics card is an important computational resource commonly present in most computers nowadays, with considerable processing power, fast local memory access and a recently improved communication interface with the motherboard: PCI Express technology [29]. Currently, to achieve real-time performance goals with ray tracing applications, the use of graphics hardware as an additional computational resource is essential.

4 CASE STUDY: HEIGHT FIELDS

We chose the visualization of high definition height fields as an example to test our system because it was a faster way to obtain preliminary results without developing a complete bounding volume generator for generic objects. In future work we plan to extend our system to GPU-accelerated ray tracing of generic huge scenes. Our aim here is to obtain the frame rate speedup of the proposed system through the case study of height fields, using a specific bounding volume generator.

High definition height fields are frequently used for the tridimensional visualization of large satellite images of the surface, combining them with the topographic height map of the same region.

Spatial subdivision (e.g., quadtree [7], Figure 3) and object-specific acceleration algorithms (e.g., vertical coherence [6], Figure 4) allow us to accelerate the ray-object intersection test and reduce the main memory access, positioning the starting traversal point after large empty regions.

Figure 3: Quadtree spatial subdivision for height fields.

Figure 4: A height field specific algorithm: vertical coherence among rays belonging to the same frame.

Because the number of sampling points of a height field grows quadratically with its dimensions, polygon reduction techniques [10, 23] would allow us to load larger height fields. However, in a ray casting approach, the sampling points may be easily stored in the form of a height matrix and interpreted as a surface during the intersection test and surface reconstruction process, saving considerable memory space. Thus, we do not use triangle reduction techniques for the ray caster height field scene.

To keep this solution and still reduce the height field definition in particular regions, it would be necessary to use an additional spatial subdivision structure such as a quadtree. But this would not improve the ray casting time for our rendering system, which already provides empty space skipping. It would just load an extra unnecessary structure into the system memory, reducing the space previously reserved for the scene. The quadtree speedup is also limited by the efficiency of its ray traversal, which depends basically on the memory access rate, usually the bottleneck.

In the case of the traditional vertical coherence algorithm, some parts of the terrain that would intersect the ray are skipped, while in fact they should have been intersected [12]. This usually happens when the camera direction is not parallel to the height field's base plane, so it may be considered an aggressive algorithm in these cases. Another disadvantage of this algorithm is the restricted camera movement [27]: it is not possible to rotate the camera around its own direction axis (roll), because the up vector must always be perpendicular to the height field's base plane.

5 BOUNDING VOLUME SCENE

A bounding volume is a closed volume that completely contains an object. In our system, a set of bounding volumes is created from the original scene's objects to compose a new simplified scene. The main objective of this operation is to create a scene with a reduced number of polygons, so that it fits in the graphics card memory. Fewer polygons also increase the efficiency when the GPU renders the distance buffer.

The secondary purpose is to approximate the objects with bounding volumes as closely as possible, avoiding empty spaces inside them (Figure 5 and Figure 6). This should increase the efficiency of our system, because the starting traversal points will be placed closer to the intersection point. The concept of a good bounding volume is shown in Figure 6.

Figure 5: A simple bounding box leaves large empty spaces inside the volume.

Figure 6: A bounding triangle mesh is a better approximation for the object.

However, the requirements above are conflicting: raising the number of triangles of the bounding volumes leads to a better approximation of the object, but increases the distance buffer rendering time, and vice versa when reducing the number of triangles. Evidently, there is a strong dependency on the hardware specification.

To create a height field bounding triangle mesh, we used a simple method, which reduces the number of triangles by four on each level of iteration, similar to a quadtree constructor, starting from the full-resolution height field. We basically compute the maximum value of a height element and its eight neighbors to get the new height value for the reduced height field. This ensures that the triangles of the reduced height field will stay above the original height field triangles. This can be combined with triangle reduction techniques [10, 23] to improve the performance of the GPU module. The mathematical analysis of the optimality of this algorithm is an interesting subject for future research.

6 SYSTEM ARCHITECTURE

Since the graphics card is capable of evaluating the distances from the camera's position to the projected triangles for each pixel, one can use this information to accelerate the primary rays of a ray caster rendering a similar scene from the same point of view.

We propose a system with a composite rendering method, mixing the graphics card rendering and ray casting rendering. It replaces the use of complex spatial subdivision for primary rays, using the graphics hardware raster method computation for this purpose.

The system is subdivided into three modules: the GPU module, the Ray Casting module and the Display module. The GPU module runs mainly on the graphics card, while the Ray Casting module runs on the CPU. The Display module basically shows a color buffer on screen.

6.1 Data Flow: Workspaces

We decided to create workspaces to implement data flow and synchronization between the modules. Each workspace is associated with a frame. It is merely a data structure composed of a framebuffer (a color buffer and a distance buffer), camera attributes (position, direction and field of view), window attributes (width, height), and scene changes.
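As an illustration only (not the authors' code), the workspace record described above can be sketched as a plain data structure; the field names are our own, chosen to match the attributes the paper lists:

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """One frame's worth of shared pipeline state (illustrative sketch).

    The paper specifies the contents: a framebuffer (color buffer and
    distance buffer), camera attributes, window attributes, and pending
    scene changes. Field names here are our own.
    """
    width: int = 800
    height: int = 600
    color_buffer: list = field(default_factory=list)     # per-pixel RGB
    distance_buffer: list = field(default_factory=list)  # per-pixel camera-to-surface distance
    camera_position: tuple = (0.0, 0.0, 0.0)
    camera_direction: tuple = (0.0, 0.0, -1.0)
    field_of_view: float = 60.0
    scene_changes: list = field(default_factory=list)    # edits to apply next cycle

    def clear(self):
        """Reset buffers and pending changes so the GPU module can reuse this workspace."""
        self.color_buffer = []
        self.distance_buffer = []
        self.scene_changes = []
```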
The workspaces are necessary because the GPU module and the Display module run in parallel with the Ray Casting module, and each module processes its own frame. Workspaces can be easily implemented and avoid data mismatching and memory reallocation. Workspace swapping is fast because it is just a pointer assignment operation. Data is synchronized before the execution of the modules, using the swapping process described below (Figure 7).

Figure 8: Modules' workspace swapping: an efficient way to transfer a module's output and the scene data changes to the next one.
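A minimal sketch of the net per-cycle workspace rotation between the three modules, under the assumption that only references are exchanged (the module names follow the paper; the code and function name are ours):

```python
def rotate_workspaces(gpu_ws, raycast_ws, display_ws):
    """One rendering cycle's net workspace rotation (illustrative sketch):
    the Ray Casting module takes over the GPU module's workspace, the
    Display module takes over the Ray Casting module's, and the GPU module
    receives the Display module's workspace after it has been cleared.
    Only references move; no buffer contents are copied."""
    # The Display module clears its workspace before handing it back.
    display_ws["color_buffer"] = []
    display_ws["distance_buffer"] = []
    # Returned as (new GPU, new Ray Cast, new Display) workspace references.
    return display_ws, gpu_ws, raycast_ws
```

Because the swap is only a reassignment of references, its cost is independent of the framebuffer size, which is what makes the synchronization step negligible compared to rendering.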

Figure 7: Parallel execution of modules (Initialization; GPU Module first run; then, per cycle: Workspace Swapping on the CPU, Display Module and GPU Module on the GPU, Ray Casting Module on the CPU).

Figure 9: Data flow along modules.

The data flow starts on the GPU module, whose workspace receives the latest changes in the scene (window, camera and scene changes). The GPU module renders the distance buffer (the distance between the camera and the nearest object, for a given pixel) and stores it in its workspace.

At this point, workspaces are swapped: the Ray Casting module receives its workspace from the GPU module, which in turn receives a clear workspace from the Display module. The Ray Casting module reads the workspace information to collect scene change information and renders the scene using the distance buffer previously output by the GPU module. The resulting color buffer rendered by the Ray Casting module is then stored in the workspace.

Again, the workspaces are swapped, and the Display module receives the workspace previously used by the Ray Casting module. It displays the color buffer on screen and clears the workspace. This cleanup is needed because the GPU module will receive this workspace on the next workspace swap.

The whole swapping process is shown in Figure 8. Data flows as in Figure 9.

6.2 The Pipeline

As both the GPU module and the Display module use graphics card resources, they do not need to run in parallel. This also avoids GPU resource sharing, which might degrade performance.

The Ray Casting module uses only the CPU and can run in parallel with the GPU and Display modules. This is easily implemented through the use of threads. Both the CPU and the GPU can be used simultaneously, taking advantage of all processing resources available in the computer system (Figure 7).

In Table 1 it is possible to follow the rendering of different frames along the pipeline. The greatest disadvantage of this approach is the delay in rendering scene changes. The pipeline needs two rendering cycles (instead of one) to transmit the scene changes to the output image. This issue is more noticeable when frame rates are lower than eight frames per second.

Table 1: Frames processed along the rendering pipeline.

Cycle  GPU       Ray Cast     Display
0      Frame 1   Blank Frame  Idle
       Idle      Frame 1      Blank Frame
1      Frame 2   Frame 1      Idle
       Idle      Frame 2      Frame 1
2      Frame 3   Frame 2      Idle
       Idle      Frame 3      Frame 2
3      Frame 4   Frame 3      Idle
       Idle      Frame 4      Frame 3
4      Frame 5   Frame 4      Idle
...    ...       ...          ...

6.3 GPU Module

The GPU module is essentially responsible for the early evaluation of the distance estimate of each frontmost (visible) object to the camera position, to be used by the Ray Casting module as ray traversal starting points. The general idea of this module is to estimate the distance of each visible object to the camera position by rendering the scene from the camera's point of view and using a GPU-generated depth-buffer to evaluate that distance. The distances are stored in the distance-buffer, which is the desired output of the GPU module (Figure 10).

The initialization stage of the GPU module is executed once at our rendering system's start-up. During initialization, we load the height field from a file and create a reduced bounding volume for it (a triangle mesh), which is stored in a display list, as the height field is a static object. There is no need to load any texture for the display list geometry on the GPU, as we are only interested in the geometry itself. This leaves memory available for extra geometry.

The operation of the GPU module can be subdivided into three stages: the first GPU rendering pass, the second GPU rendering pass and the distance-buffer readback. These stages are executed on every rendering cycle of our system. The first rendering pass is responsible for rendering the depth of each pixel to a texture using the GPU's original functionality. The second one uses a GLSL [22, 24] fragment shader program running on the GPU's fragment processor. It accesses the previously generated depth texture and evaluates an estimate of the distance from the triangle associated with each pixel to the camera position. The output of the second GPU rendering pass is written to another texture, the distance-buffer. Finally, this distance-buffer is read back to the system memory to be used by the Ray Casting module running on the CPU.

In the first pass, OpenGL's fixed functionality renders the scene directly to a depth-texture-object. In fact, only the depth test is performed, and its results are stored in the depth-texture-object. The depth-texture-object is attached to a framebuffer object (the first FBO) [17]. The first FBO has no color buffer attached to it, as nothing is rendered to a color buffer (GL_DRAW_BUFFER set to GL_NONE).

In the second pass, the drawing of a screen-sized quadrilateral (quad) [19] triggers the shader execution. For each screen pixel, the fragment shader evaluates an estimate of the distance between the frontmost triangle associated with the pixel and the camera position. The shader evaluates this estimate from the supplied near and far values and from the depth value of the related pixel stored in the depth texture, obtained in the first pass. For each pixel, the object's distance to the camera (z in camera coordinates) is evaluated by Equation 1.

dist(x, y) = (near · far) / (far − depth · (far − near))    (1)

Due to the GPU's architecture, many of these operations are processed in parallel [20]. The obtained distance value is stored in a single-channel color buffer (the distance-buffer) attached to a second FBO. Moreover, if a pixel's distance value is greater than or equal to far, it is tagged for ray casting bypass. Finally, the distance values from the distance texture are read back to main memory to be used by the Ray Casting module to evaluate the starting points for the ray traversal.

In our implementation, we use OpenGL 2.0 [25, 24] and GLSL. At the present time, the best way to do proper direct render-to-texture in OpenGL is using the EXT_framebuffer_object (FBO) extension [17], which allows texture objects to be used as render targets. We also use the GLEW library [16], which is used only to easily enable and initialize the required OpenGL extensions [18] available in the graphics hardware. We use two FBOs: the first one has only one texture attached (a 32-bit depth texture buffer) and is a depth-only framebuffer; this way, the depth buffer data from OpenGL's fixed functionality is rendered directly to the depth texture. The second FBO also has only one texture attached to it, but this one is a 32-bit single-color-channel buffer that the shader program renders to, storing the 32-bit floating point distance value to be read back.

As an alternative method, the generation of the distance-buffer might also be done using standard OpenGL calls: we would render the scene on the GPU to a regular framebuffer (e.g., the back buffer) and then read it back to main memory with glReadPixels. But, as is widely known, the range of depth buffer values is from 0.0 to 1.0, representing the near and far planes respectively, and it is not linear. So we would have to do the depth-to-distance transformation for each pixel on the CPU. The additional computational cost of this operation, added to the readback cost, would hinder the speedup gains obtained by using the GPU. In our system the evaluation is done on the GPU rather than on the CPU, relieving the CPU of performing the depth-to-distance evaluation for the whole depth buffer.

Figure 10: Screenshot of the pre-rendered distance-buffer; white areas are marked with a special flag for ray casting bypass.

6.3.1 Graphics Hardware Dependency

The readback transfer rate (GPU to CPU) for the GeForce 7900 GTX is 1.3 GB/s, while the download rate (CPU to GPU) is 575 MB/s [3], although according to the PCIe specification [29] the nominal maximum PCIe x16 data transfer rate in each direction is 4 GB/s. The AGP 8x readback rate is 170 MB/s, while the download rate is 500 MB/s. We measured the total time spent on readback and download operations (Table 2). There is a reasonable difference between AGP 8x and PCIe x16 total transfer times; thus the performance of this module is directly linked to the graphics card communication interface transfer rate and, obviously, to the card's performance.

Table 2: Data transfer time in our rendering system for a 800 × 600 frame size (*for a 10 frames/s rate).

Interface   Readback   Download   Total transfer   Rendering time %*
AGP (8x)    10.77 ms   3.66 ms    14.43 ms         14.4
PCIe (16x)   1.41 ms   3.18 ms     4.59 ms          4.6

6.3.2 Static Scene Data Optimization

From [14] and from initial tests we performed, the geometry submission relative performance, from best to worst, is: display lists, vertex buffer objects, vertex arrays, immediate mode (glBegin/glEnd). This knowledge allows us to minimize CPU-GPU copies, function call overheads and command buffer traffic.

The solution for the time-consuming data transfer is the use of display lists, which let us cache OpenGL commands on the graphics hardware, since display lists allow the storage of OpenGL commands on the server side (the graphics hardware). Since we are handling static vertex data, their use is a suitable solution for optimizing the drawing of the height map mesh. The known drawbacks of using display lists are not an issue for our solution, because the time overhead to compile a display list appears only at program initialization. As the scene in our test is static, there is no need to use vertex buffer objects, hence we opted to create a display list for it. This optimizes the GPU module scene rendering, as display lists store a group of OpenGL commands in the graphics card memory that can be optimized by its driver [25].

6.3.3 Two-pass vs. Single-pass

Although our first and successful choice was the two-pass approach, we could not disregard trying a single-pass one.
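For reference, the depth-to-distance mapping of Equation 1, which the second-pass shader applies per pixel in either approach, can be checked numerically; this is an illustrative sketch in plain code, not the GLSL shader itself:

```python
def depth_to_distance(depth, near, far):
    """Equation 1: recover the eye-space distance from a nonlinear [0, 1]
    depth-buffer value, given the near and far plane distances.
    depth = 0.0 maps to the near plane, depth = 1.0 to the far plane."""
    return (near * far) / (far - depth * (far - near))
```

Note that the mapping is nonlinear: equal steps in depth-buffer value near 1.0 cover much larger distance intervals than steps near 0.0, which is exactly why doing this conversion per pixel on the CPU would be wasteful compared to evaluating it in the fragment shader.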
We did face major drawbacks when using a single pass with a fragment shader: the performance was worse than expected and, more importantly, it was worse than that of the original two-pass solution.

In a single pass, the fragment shader has to use the depth data of the fragment (gl_FragCoord.z) to evaluate the distance estimates and store the result in a color buffer which has an associated depth buffer for depth testing. Since shader computing equals drawing [11], the fragment shader is run for every primitive (inside the view frustum) stored in a display list, whether visible or occluded. This can introduce a substantial overhead and loss of performance. Even one of the simplest fragment shaders, such as gl_FragColor.x = -1.0, runs slower than OpenGL fixed functionality (Table 3).

One might think that an occlusion query may help. But an occlusion query test prior to an eventual single pass, with the purpose of reducing the amount of geometry to be processed by the fragment shader, would be pointless, if not redundant, as the data we actually need is the depth value of the frontmost polygons.

Table 3: GPU Module Performance Test: two-pass vs. single-pass comparison (*very simple shader).

Height Field Size                           Single-pass     Single-pass*    Two-pass
2 Million triangles, 1024 × 1024 voxels     20.34 frames/s  42.81 frames/s  43.17 frames/s
8 Million triangles, 2048 × 2048 voxels      6.33 frames/s  13.67 frames/s  14.74 frames/s
33 Million triangles, 4096 × 4096 voxels     3.19 frames/s   4.12 frames/s   4.63 frames/s

In the two-pass approach, the inherent hardware optimization of OpenGL's fixed functionality, combined with the direct rendering to a depth-texture-object, makes the first pass very efficient. In the second pass, the fragment shader, which evaluates the camera-object distance estimate, is executed for each screen pixel. The only geometry involved in this pass is the quad drawn for a one-to-one texel-to-pixel mapping; thus the second pass is independent of the geometric complexity of the scene.

For the reasons above, the original two-pass approach was used, since it suits our needs, runs faster and is more optimized than the single-pass approach.

6.4 Ray Casting Module

After the ray direction generation, the Ray Casting module sets the ray origin. Instead of using the camera position, it sets the origin of each ray using the distance buffer generated by the GPU module, as in Equation 2:

origin(x, y) = camera + direction · dist(x, y)    (2)

where (x, y) are the coordinates of each pixel on the screen. This procedure skips all the empty space between the camera and the closest bounding volume of an object, since the scene in the GPU module is essentially a set of bounding volumes of the Ray Casting module's scene.

To reconstruct the height field surface near the camera, we implemented a bilinear analytical intersection test. For regions far from the camera, the intersection test is a simple height plane reconstruction test, which allows a quick intersection test without hindering the image quality. By implementing this kind of reconstruction level of detail we can efficiently save ray casting time. A screenshot of the resulting image is shown in Figure 11.

Figure 11: Ray casting resulting image.

As there is no dependency between rays, it is possible to use any number of threads to render different parts of the screen, significantly raising the resulting frame rate on multi-processor CPUs.

For better image quality, it is possible to use MIP mapping for the height field's texture. This is a good choice when there are too many fine details in the texture, as in satellite images of large cities, preventing texture aliasing. It also reduces the access to the main memory when rendering regions far from the camera. Shadows may also be pre-processed on the texture using a ray-tracing algorithm, adding more realism to the scene.

The Ray Casting module's main output is the color buffer (Figure 11). Its performance is fundamentally linked to the CPU's performance and to the data transfer rate of the system memory bus.

6.5 Display Module

This module downloads the color buffer rendered by the Ray Casting module to the graphics card to show it on screen. Due to this operation, this module's performance is directly linked to the graphics card communication interface transfer rate.

7 RESULTS

Different height field sizes were used to test our rendering system (Figure 12). The adopted screen size for these tests was 800 × 600 pixels. The height field object covers approximately 90% of the resulting image (Figure 11), and it is possible to view the entire scene from the rendered point of view. The test computer is a Pentium D 3.0 GHz, with 4 GB of DDR 400 MHz system memory and a GeForce 7900 GTX 512 MB graphics card.

We did not implement our system using the specific MMX/SSE instruction set, as this is not the main objective of this work. We just compiled it using MMX/SSE optimization flags, substantially increasing the performance.

In our case study, we compared our method to a naïve ray caster [13], to a classic object-specific ray traversal acceleration algorithm (vertical coherence [6]) and to a hierarchical spatial subdivision tree algorithm (quadtree [7]). We implemented all of them using multi-thread parallelism. In all cases we used the same surface reconstruction method (described in Section 6.4). The proposed method achieved better frame rates than any of these algorithms.

The frame rate values obtained for the naïve ray traversal exceeded expectations, relative to the other algorithms. This happens because the ray caster accesses the same amount of main memory to get the texture color of the intersection point, for each pixel in the color buffer. In other words, even using a ray caster, the shading process still has its weight in the final frame rate measure.

Figure 12: Frame rate comparison for different height field sizes: naïve ray traversal, vertical coherence, quadtree and the proposed method.

In future work, an algorithm for dynamic scene complexity adjustment of the GPU module could also be proposed, removing the bottleneck from the Ray Casting module and resulting in better usage of processing power for the whole system. It is also possible to implement this method for dynamic scenes, simply by extending the module scene data synchronization code.

5 ACKNOWLEDGEMENTS

4 We would like to thank Felipe Miguel Pait, Bruno Barberi Gnecco and Marcelo de Paiva Guimaraes˜ for many useful suggestions, help- Frame Rate (fps) 3 ful discussions and revision, and the anonymous reviewers for their feedback. This work was supported by FINEP (Studies and Projects 2 Funding Body, an agency linked to the Brazilian Ministry of Sci- ence and Technology) and by the CAVERNA Digital (a facility 1 located at Laboratory of Integrable Systems at University of Sao˜ Paulo). 0 224 225 226 227 228 229 REFERENCES Height Field Size (voxels) [1] J. Amanatides and A. Woo. A fast traversal algorithm for ray tracing. In Proceedings of Eurographics’87, pages 3–10, Amsterdam, Figure 12: Resulting frame rates of different height field ray casting North-Holland, 1987. Elsevier Science Publishers. algorithms and our implemented system. [2] A. Asirvatham and H. Hoppe. Terrain rendering using gpu-based geometry clipmaps. In GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, this way, we should consider the gain of our method at least higher pages 27–44, Massachusetts, USA, march 2005. Addison Wesley. than the gain shown through the frame rate measured. [3] I. Buck, K. Fatahalian, and P. Hanrahan. Gpubench: Evaluating gpu The performance of the vertical coherence algorithm raises as the performance for numerical and scientific applications. In Proceed- screen resolution grows and the height fields shrinks. It gets worse ings of the 2004 ACM Workshop on General-Purpose Computing on as the intersected terrain becomes more distant from the camera, Graphics Processors, 2004. Go online to http://graphics. stanford.edu/projects/gpubench/. as expected, because each subsequent ray must traverse longer dis- [4] N. A. Carr, J. Hoberock, K. Crane, and J. C. Hart. Fast gpu ray trac- tances. The performance of the quadtree is slightly better than the ing of dynamic meshes using geometry images. 
In Proceedings of vertical coherence one, as memory caching works better for it, re- Graphics Interface 2006, Amsterdam, North-Holland, 2006. ducing system memory access, as expected too. [5] D. Cohen-Or, Y. Chrysanthou, C. Silva, and F. Durand. A survey of For the given test computer, we used a reduced scene with 4 Mil- visibility for walkthrough applications. IEEE Transactions on Visual- lion triangles for the GPU, as the resulting GPU module frame rate ization and Computer Graphics, 2002. for this number of triangles (Table 3) is always above the rendering [6] D. Cohen-Or, E. Rich, U. Lerner, and V. Shenkar. A real-time photo- system frame rate (Figure 12). This is vital, because the bottleneck realistic visual flythrough. IEEE Transactions on Visualization and of our system cannot be the GPU Module, as its main functionality Computer Graphics, 2(3):255–265, 1996. is to accelerate the Ray Casting module. Thus, although there was [7] D. Cohen-Or and A. Shaked. Photo-realistic imaging of digital ter- a frame rate speedup, the bottleneck of our system must always be rains. Comput. Graph. Forum, 12(3):363–373, 1993. the Ray Casting module, which is the core of our rendering system. [8] W. T. Correa,ˆ J. T. Klosowski, and C. T. Silva. Visibility-based prefetching for interactive out-of-core rendering. In Proceedings of 8 CONCLUSION AND FUTURE WORK PVG 2003 (6th IEEE Symposium on Parallel and Large-Data Visual- ization and Graphics), pages 1–8, 2003. The cost of the GPU module operation is balanced by the gain ob- [9] A. Dietrich, I. Wald, and P. Slusallek. Large-Scale CAD Model Visu- tained when positioning the ray traversal starting points close to the alization on a Scalable Shared-Memory Architecture. In Proceedings intersection, resulting in interactive frame rate. The main disadvan- of 10th International Fall Workshop - Vision, Modeling, and Visual- tage of the proposed method is that it accelerates only primary rays. 
ization (VMV) 2005, pages 303–310, Erlangen, Germany, November As secondary rays have different starting traversal points, it would 2005. Akademische Verlagsgesellschaft Aka. be necessary to change the camera position on the GPU module and [10] L. D. Floriani and E. Puppo. Hierarchical triangulation for multireso- reduce the depth buffer size / the field of view for each secondary lution surface description. ACM Trans. Graph., 14(4):363–411, 1995. ray. It would also be necessary to increase the number of steps [11] D.Goddeke.¨ Gpgpu–basic math tutorial. Technical report, FB Math- (modules) in the pipeline to support multiple reflections/refractions, ematik, Universitat¨ Dortmund, Nov. 2005. Ergebnisberichte des In- stituts fur¨ Angewandte Mathematik, Nummer 300, http://www. increasing the total rendering time of a frame, loosing the interac- ∼ tivity. mathematik.uni-dortmund.de/ goeddeke/gpgpu. Since our method requires only a small portion of the system [12] L. C. Guedes, M. Gattass, and P. C. P. Carvalho. Real-time rendering memory, it does not exclude the use of additional spatial subdivi- of photo-textured terrain height fields. In Computer Graphics and Im- age Processing, 1997. Proceedings. X Brazilian Symposium on Com- sion algorithms — this would be useful to accelerate the ray cast- puter Graphics and Image Processing, Outubro 1997. ing of generic scenes. The method also supports ray oversampling [13] L. C. Guedes, M. Gattass, and P. C. P. Carvalho. Real-time rendering for border anti-aliasing purposes (this could be easily implemented of photo-textured terrain height fields. In Computer Graphics and Im- simply by increasing the distance-buffer size). This method may age Processing, 1997. Proceedings. X Brazilian Symposium on Com- also accelerate ray tracing of complex objects through the use of puter Graphics and Image Processing, Outubro 1997. bounding triangle meshes. [14] E. Hart, B. Licea-Kane, S. Green, M. Harris, and C. Everitt. 
Our work proposed the use of GPU raster method and shader to Opengl performance tuning, mar 2006. Go online to http: assist ray tracers runing on the CPU. Ray casting of generic scenes //www.ati.com/developer/gdc/2006/GDC06-OpenGL and shadow mapping using a similar technique is currently being Tutorial Day-Hart-OpenGL 03 Performance.pdf. [15] C. Henning and P. Stephenson. Accelerating the ray tracing of height fields. In GRAPHITE ’04: Proceedings of the 2nd international con- ference on Computer graphics and interactive techniques in Australa- sia and South East Asia, pages 254–258, New York, NY, USA, 2004. ACM Press. [16] M. Ikits and M. Magallon. The opengl extension wrangler library. Go online to http://glew.sourceforge.net/. [17] J. Juliano and J. Sandmel. Opengl framebuffer object extension. Go online to http://www.opengl.org/registry/specs/ EXT/framebuffer object.txt. [18] OpenGL. Opengl extentions. Go online to http://opengl.org/ documentation/extensions/. [19] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. In Eurographics 2005, State of the Art Reports, pages 21–51, Aug. 2005. [20] M. Pharr and F. Randima, editors. GPU Gems 2: Programming Tech- niques for High-Performance Graphics and General-Purpose Compu- tation, pages 453–545. Addison Wesley, Massachusetts, USA, march 2005. [21] T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan. Ray tracing on programmable graphics hardware. ACM Transactions on Graphics, 21(3):703–712, July 2002. ISSN 0730-0301 (Proceedings of ACM SIGGRAPH 2002). [22] R.J.Rost. OpenGL Language. Addison-Wesley Professional, 2nd edition, January 2006. [23] S.Rottger,¨ W. Heidrich, P. Slusallek, and H.-P. Seidel. Real-time gen- eration of continuous levels of detail for height fields. In WSCG’98 Conference Proceedings, 1998. [24] M. Segal and K. Akeley. The opengl graphics sys- tem: A specification (version 2.0), October 2004. 
http://www.opengl.org/documentation/specs/version2.0/glspec20.pdf. [25] D. Shreiner, M. Woo, J. Neider, and T. Davis. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 2. Addison- Wesley Professional, 5th edition, July 2005. [26] B. E. Smits, P. Shirley, and M. M. Stark. Direct ray tracing of displace- ment mapped triangles. In Proceedings of the Eurographics Workshop on Rendering Techniques 2000, pages 307–318, London, UK, 2000. Springer-Verlag. [27] M. Wan, H. Qu, and A. Kaufman. Virtual flythrough over a voxel- based terrain. In VR ’99: Proceedings of the IEEE Virtual Reality, page 53, Washington, DC, USA, 1999. IEEE Computer Society. [28] H. Weghorst, G. Hooper, and D. P. Greenberg. Improved computa- tional methods for ray tracing. ACM Trans. Graph., 3(1):52–69, 1984. [29] A. H. Wilen, J. P. Shade, and R. Thornburg. Introduction to PCI Ex- press: A Hardware and Software Developer’s Guide. Intel Press, Ore- gon, USA, 2003. [30] J. C. Xia, J. El-Sana, and A. Varshney. Adaptive real-time level- of-detail-based rendering for polygonal models. IEEE Transactions on Visualization and Computer Graphics, 3(2):171–183, Apr. 1997. ISSN 1077-2626.