High Quality Interactive Rendering of Massive Point Models Using Multi-Way Kd-Trees
Total Page:16
File Type:pdf, Size:1020Kb
High Quality Interactive Rendering of Massive Point Models using Multi-way kd-Trees Prashant Goswami Yanci Zhang Renato Pajarola Enrico Gobbetti University of Zurich Sichuan University University of Zurich CRS4 Switzerland China Switzerland Italy goswami@ifi.uzh.ch [email protected] [email protected] [email protected] Abstract—We present a simple and efficient technique for sized nodes. Since all blocks are equal sized, on-board out-of-core multiresolution construction and high quality visu- memory management is particularly simple and effective. alization of large point datasets. The method introduces a novel The presented approach results in fast pre-processing and hierarchical LOD data organization based on multi-way kd-trees that simplifies memory management and allows controlling the high quality rendering of massive point clouds at hundreds LOD tree’s height. The technique is incorporated in a full end- of millions of splats/second on modern GPUs, improving to-end system, which is evaluated on complex models made of the quality vs. performance ratio compared to previous large hundreds of millions of points. point data rendering methods. Keywords-point based rendering; gpu-based; multi-way kd- tree; II. RELATED WORK While the use of points as rendering primitives has been I. INTRODUCTION introduced as early as in [1], [2], only over the last decade they have reached the significance of fully established ge- Modern 3D scanning systems can generate massive data ometry and graphics primitives, see also [3]–[5]. Several sets with hundreds of millions of points. Despite the ad- techniques have since been proposed for improving upon vances in point-based modeling and rendering, these models the display quality, LOD rendering, as well as for efficient can still be a challenge to be processed and in particular out-of-core rendering of large point models. We concentrate rendered with high quality at interactive frame rates. Massive here on discussing only the approaches most closely related models are tackled with level-of-detail (LOD) and out- to ours. For more details, we refer the reader to the survey of-core techniques. However, since modern GPUs sustain literature [5]–[7]. very high primitive rendering rates, a slow CPU based QSplat [8] has for long been the reference system in adaptive data selection process can easily lead to starvation massive point rendering. It is based on a hierarchy of of the graphics pipeline. Hence the CPU-GPU bottleneck bounding spheres maintained out-of-core, that is traversed becomes the prominent challenge to be addressed as a whole, at runtime to generate points. This algorithm is CPU bound, from the choice of data structures used for preprocessing because all the computations are made per point, and CPU- and rendering the models, to the design of the display GPU communication requires a direct rendering interface, algorithm. In recent years, this consideration has lead to the thus the graphics board is never exploited at its maximum emergence of a number of coarse-grained multiresolution performance. A number of authors have also proposed approaches based on a hierarchical partitioning of the model various ways to push the rendering performance limits, into clouds consisting of thousands of points. These methods mostly through coarse grained structures and efficient usage successfully amortize data structure traversal overhead over of retained mode rendering interfaces. rendering time of large numbers of primitives, and effec- Sequential Point Trees [9] introduced a sequential adap- tively exploit on-board caching and object based rendering tive high performance GPU oriented structure for points lim- APIs. ited to models that can fit on the graphics board. XSplat [10] In this paper, we further improve the state-of-the-art in and Instant points [11] extend this approach for out-of-core the domain with a novel end-to-end system for out-of-core rendering. XSplat is limited in LOD adaptivity due to its multiresolution construction and high quality visualization sequential block building constraints, while Instant points of large point datasets. We exploit the properties of multi- mostly focuses on rapid but moderate quality rendering of way kd-trees to make high quality rendering more GPU raw point clouds. Both systems suffer from a non-trivial im- oriented. The level-of-detail (LOD) tree, constructed bottom plementation complexity. Layered point clouds (LPC) [12] up using a fast high-quality simplification method, is fully and Wand et al.’s out-of core renderer [13] are prominent balanced, has controllable depth, and contains uniformly examples of high performance GPU rendering systems based on a hierarchical model decomposition into large sized uniform data distribution also implies wasted cache memory blocks maintained out-of-core. LPC is based on adaptive due to irregular sizes of nodes. BSP subdivision, and subsamples the point distribution at Although kd-trees can be constructed so as to be balanced each level. In order to refine a LOD, it adds points from the and symmetric, they too have their limitations. Since the next level at runtime. This composition model and the pure fanout factor is two, leaf nodes of the hierarchy cannot be subsampling approach limits the applicability to uniformly made to all contain the desired number of points. Using sampled models and produces moderate quality simplifica- a fixed split strategy at the middle of the bounding box tion at coarse LODs. In [14] these limitations are removed by produces wildly different VBO sizes, leading to the same making all BSP nodes self-contained and using an iterative problems encountered with octrees. By adaptively moving edge collapse simplification to produce node representations. the split plane so as to always produce equal sized children We propose here a faster high quality simplification method can make the situation more controlled (e.g., see [12]). based on adaptive clustering that provides more control over However, even in this case, VBO sizes can vary by as much the LOD tree height through the use of N-ary trees. Wand et as 50%. al.’s approach [13] is based on an out-of-core octree of grids, The challenge, thus, is how a simple to implement hierar- and deals primarily with grid based hierarchy generation and chical multiresolution data structure can be designed so as editing of the point cloud. This method’s limitation is in the to have an even and constant point distribution among all quality of lower resolutions created by the grid no matter its nodes, while being fully balanced, spatially selective and how fine it is. parametrized by a desired target VBO size. In our work, we All the mentioned pipe-lines for massive model rendering propose to reach this goal by exploiting the properties of create coarser LOD nodes through a simplification process. multi-way kd-tree, i.e., kd-trees with a fanout factor of N at Some systems, e.g. [12], [13], are inherently forced to use each level (see also Figure 1). fast but low-quality methods based on pure subsampling or In order to maintain symmetry, at each level a node is grid-based clustering. Others, e.g. [10], [14], can use higher divided along its longest axis into N children containing quality simplification methods, see also [15]. In this context, equal number of points each. If the original model has n we propose a fast high quality technique which combines points, s is the target VBO size and m the number of clustering, greedy selection, and delayed point combination. leaf nodes in the tree then m = n=s. For a given N, The method produces high quality simplifications on large l = dlog m= log Ne is the maximum level or depth of the non-uniform point clouds. multi-way kd-tree. r2 r3 r4 r5 III. MULTIRESOLUTION DATA STRUCTURE E A. Multi-way kd-Tree A I M F Our goal is to produce a system which is at the same time B J efficient, both in rendering and pre-processing, and simple to N implement. We follow the approach of organizing the system r1 around a hierarchical coarse grained structure maintained G K O out-of-core. Most LOD point rendering approaches use C either a kd-tree or a regular octree for basic hierarchical H P data organization. D L The main benefits of octrees are that they subdivide the 3D data uniformly in space and are simple to implement. r1 However, octrees also have a number of drawbacks, espe- cially when considering GPU rendering constraints. First of r2 r3 r4 r5 all, octrees are inflexible, because of their fixed fanout factor and space partitioning. Due to this, no direct control over the number of internal nodes is possible. Octrees can thus A B C D E F G H I J K L M N O P be highly imbalanced and therefore, deeper than necessary. Figure 1. Multi-way kd-tree example for N = 4. Each of the leaf regions Moreover, due to the strict spatial subdivision strategy, the contains almost the same number of points. number of points in a (leaf) node and hence the number of nodes itself cannot be controlled directly. This leads to With the proper choice of parameters s and N (see next suboptimal distribution of points per octree node given a section) one can construct an efficient multi-way kd-tree target VBO size, which in turn leads to increased number of such that each node contains approximately s points. The context switches at runtime. Most on-board caching schemes value of N decides the trade-off between LOD adaptivity make use of similar sized cache blocks and thus a non- and traversal efficiency. For larger N, the tree height will be lower and hence traversal will be faster.