Reducem: Interactive and Memory Efficient Ray Tracing of Large Models

Eurographics Symposium on Rendering 2008 Volume 27 (2008), Number 4 Steve Marschner and Michael Wimmer (Guest Editors) ReduceM: Interactive and Memory Efficient Ray Tracing of Large Models Christian Lauterbach1 Sung-eui Yoon2 Ming Tang3 Dinesh Manocha1 1University of North Carolina at Chapel Hill 2Korea Advanced Institute of Science and Technology 3Zhejiang University Abstract We present a novel representation and algorithm, ReduceM, for memory efficient ray tracing of large scenes. Re- duceM exploits the connectivity between triangles in a mesh and decomposes the model into triangle strips. We also describe a new stripification algorithm, Strip-RT, that can generate long strips with high spatial coherence. Our approach uses a two-level traversal algorithm for ray-primitive intersection. In practice, ReduceM can sig- nificantly reduce the storage overhead and ray trace massive models with hundreds of millions of triangles at interactive rates on desktop PCs with 4-8GB of main memory. 1. Introduction I/O performance. There is considerable prior work on re- ducing the memory overhead of ray tracing algorithms. At Ray tracing has recently emerged as a viable alternative to a broad level these algorithms use a more compactrepresen- rasterization for interactive applications. This is mainly due tation of the geometric primitives or the acceleration struc- to the improvement in processing speed along with algorith- ture, perform memory reordering operations, use precom- mic developments that use optimized hierarchical represen- puted levels-of-detail, or use representations that implicitly tations and ray coherence techniques. Current ray tracers can represent the connectivity between the primitives (e.g. trian- render large models composed of a few millions of triangle strips). However, it still remains a challenge to ray trace gles at interactive rates (i.e. 10 frames per second) on cur- large data sets (e.g. Boeing 777) on current desktop or laptop rent laptop or desktop systems. In this paper, we address the systems with only 4 − 8 GB of main memory at interactive problem of interactive ray tracing of massive data sets on rates. commodity desktop systems. Models with tens or hundreds of millions of triangles are commonly used in different ap- Main Results: We present a novel representation and a con- plications including scientific visualization, terrain render- struction algorithm for memory efficient ray tracing of large ing, CAD/CAM, virtual environments, etc. The complexity scenes. Our formulation exploits the connectivity informa- of these models entails new challenges in terms of storing tion between the mesh triangles and represents them as tri- these data sets as well as ray tracing them in real time. angle strips. The two novel aspects of our work include: Current ray tracers use spatial partitioning or bounding 1. ReduceM representation: We represent the model using volume hierarchies as acceleration structures. These hierar- a two level hierarchy. The lower levels correspond to strip chies result in logarithmic behavior of ray tracing as a func- hierarchies, and each strip hierarchy implicitly encodes a hi- tion of the number of primitives of models. At the same erarchy generated on a triangle strip. The higher level hier- time, the use of hierarchies introduces many issues related archy is a global hierarchical acceleration structure (e.g. a to memory efficiency. The memory access pattern of a hier- kd-tree), whose leaf nodes contain the strip hierarchies. This archy traversal is non-local and can result in large working two level formulation is compact and requires less memory set sizes. If the size of the original model and the hierar- footprint than other hierarchies used for interactive ray trac- chy exceeds the available main memory, the performance ing. Moreover, the ReduceM representation provides coher- of ray tracing will degrade considerably due to slow disk ent access to the encoded vertices and thereby results in im- c 2008 The Author(s) Journal compilation c 2008 The Eurographics Association and Blackwell Publishing Ltd. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. C. Lauterbach & S. Yoon & M. Tang & D. Manocha / ReduceM: Interactive and Memory Efficient Ray Tracing of Large Models proved run-time performance. Our overall representation is tization [Mah05, CSE06], using a compact object hierarchy lossless and we describe an efficient algorithm for traversing with lazy building [WK06], or efficient culling methods to the two-level hierarchies and performing intersection com- allow a less detailed and thus smaller hierarchy [Res07]. putations. These methods could be combined with our proposed methods to further reduce the memory requirement of ray tracing 2. Strip-RT stripification algorithm: In order to achieve massive models. low memory overhead and high run-time performance for ray tracing, we present a new stripification method, Strip- Mesh Representations: There is considerable work on RT, to generate triangle strips. Our method is based on two computing memory-efficient mesh representations for inter- criteria: maximizing the length of the strip and increasing the active rasterization. These include triangle strips [Dee95], spatial coherence of the acceleration hierarchy implicitly en- rendering sequences [Hop99], and cache-oblivious mesh coded in the triangle strip. We use a hierarchical method to layouts [YLPM05]. In the ray tracing literature, there compute the triangle strips based on the surface-area heuris- is work on improved representations for subdivision ∗ ∗ tic (SAH) metric. We show that Strip-RT results in improved meshes [KDS98, MTF03, CLF 03, SMD 06]. Amanatides performance as compared to prior stripification algorithms and Choi [AC97] present an edge-based ray-mesh intersec- that were designed primarily for fast rasterization. tion method for regular meshes using Plücker coordinates. For general triangular meshes, the most recent work uses tri- We have tested the performance of ReduceM on mas- angle strips [LYM07]. sive models with tens and hundreds of millions of triangles. These include CAD data sets, scanned models and iso- 3. Overview surfaces generated from scientific simulation. As compared to the currently fastest ray tracing algorithms based on kd- In this section, we first describe the main issues in designing trees for massive models [RSH05, WDS04], ReduceM de- memory efficient representations for ray tracing. Then, we creases the total storage overhead by up to 5 times and the present our representation, ReduceM, and describe how the runtime memory requirement for hierarchy and connectiv- representation can be used to accelerate ray tracing on large ity by over 10 times. This makes it possible to represent the models. complex mesh and the hierarchy of Boeing 777 model (with 362M triangles) using only 9GB of memory and ray trace 3.1. Memory Issues for Interactive Ray Tracing it at 10 fps on a multi-core workstation. On the other hand, Most interactive ray tracing approaches have considerable prior interactive algorithms using kd-trees needed more than ∗ memory overhead. This is mainly due to the use of hier- 30GB of main memory for such models [SBB 06]. As com- archical acceleration structures for efficient ray-intersection pared to ray-strips [LYM07], the combination of of Re- tests. These hierarchies affect the performance in several duceM and Strip-RT improves the frame rate by 50 − 80% ways: First, the memory overhead of hierarchical accelera- and at the same time reduces the overall storage complex- tion structures can be high in addition to that of scene primi- ity by 20 − 30% (i.e. up to 3 GB for massive models). In tives. For example, hierarchies (e.g., kd-trees) optimized for practice, our approach can ray trace such massive models at run-time performance according to the surface area heuris- interactive rates on desktop workstations. tic (SAH) [GS87, MB90, Hav00], tend to have more nodes 2. Related work and may take almost as much memory as the triangle primitives. Second, hierarchy traversal for ray tracing can havea The development of memory-efficient representations has a complex memory access pattern. Given the block-based data long history in graphics. In this section, we briefly survey fetching memory hierarchy, the working set size of the ray related work in the context of ray tracing and mesh repre- tracer can thus be high. Finally, ray tracers typically require sentations. random access to the primitives since any triangle can in- tersect with the ray during traversal. Therefore, triangles are Ray tracing large models: Ray tracing of large data commonly stored in an indexed triangle list or similar repre- sets has been an active research area. There are many in- sentation, which can have large memory overhead in contrast core algorithms that use large, shared memory systems ∗ to more efficient storage methods used in rasterization. to render these data sets [SBB 06, DSW07] while using standard ray tracing representations. Other methods to im- The combination of large working set size and non-local prove the performance of out-of-core ray tracing include memory access pattern during the hierarchy traversal signif- reordering rays [PKGH97, DGP04, EMAM07] and latency icantly affects the overall performance of ray tracing. When hiding [WDS04]. However, these methods may not re- the model and the acceleration structure fit in the main mem- duce the storage overhead. Many level-of-detail algorithms ory, ray tracing has been shown to be very cache coherent for ∗ [CLF 03, WDS04, YLM06] have been proposed to reduce in-core operation and ray packets [Wal04]. However, when the working set size of ray tracing. However, they may re- the model and its hierarchy do not fit into main memory, sult in visual artifacts. Other work has concentrated on re- its performance is dominated by the slow disk I/O perfor- ducing the size of the ray tracing hierarchy by either quan- mance [WDS04, YLM06]. c 2008 The Author(s) Journal compilation c 2008 The Eurographics Association and Blackwell Publishing Ltd.

Reducem: Interactive and Memory Efficient Ray Tracing of Large Models

Memory Interference Characterization Between CPU Cores and Integrated Gpus in Mixed-Criticality Platforms

A Cache-Efficient Sorting Algorithm for Database and Data Mining

How to Solve the Current Memory Access and Data Transfer Bottlenecks: at the Processor Architecture Or at the Compiler Level?

Prefetching for Complex Memory Access Patterns

Eth-28647-02.Pdf

Detecting False Sharing Efficiently and Effectively

A Hybrid Analytical DRAM Performance Model

Performance Analysis of Cache Memory

Classifying Memory Access Patterns for Prefetching

STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud

Embedded Memory Architecture for Low-Power Application Processor

A Visual Performance Analysis Tool for Memory Bound GPU Kernels