Efficient Visual Hull Computation for Real-Time 3D Reconstruction Using
Total Page:16
File Type:pdf, Size:1020Kb
Efficient Visual Hull Computation for Real-Time 3D Reconstruction using CUDA Alexander Ladikos Selim Benhimane Nassir Navab Department of Computer Science, Technische Universitat¨ Munchen¨ Boltzmannstr. 3 , 85748 Garching, Germany [email protected], [email protected], [email protected] Abstract [21, 9, 17]. In [9], for instance, the silhouette contours are first discretized into line segments. Then the silhou- In this paper we present two efficient GPU-based visual ette vertices are backprojected to 3D viewing lines which hull computation algorithms. We compare them in terms are restricted to so called viewing edges lying on the visual of performance using image sets of varying size and differ- hull by clipping the lines based on their projections in the ent voxel resolutions. In addition, we present a real-time other silhouette images. The initial viewing edges are then 3D reconstruction system which uses the proposed GPU- connected by considering the intersection of the backpro- based reconstruction method to achieve real-time perfor- jections of the silhouette line segments, so that a complete mance (30 fps) using 16 cameras and 4 PCs. mesh is obtained. The accuracy of the reconstructed visual hull depends on the discretization of the silhouette contours. The computational cost increases with finer silhouette dis- 1. Introduction cretization and an increasing number of images. One short- coming of this approach is that it is susceptible to errors The visual hull [16], defined as the shape which is max- in the silhouette images and calibration errors, since then imally consistent with the silhouette projections of an ob- lines might not intersect exactly. To achieve real-time per- ject, lies at the heart of almost all 3D reconstruction systems formance a method for distributing the computations over aimed at real-time performance. Other 3D reconstruction multiple processors has been proposed [10]. methods (see [25] for a recent overview) which are targeted In the volumetric approach the visual hull is computed at producing high quality results are too computationally using a voxel representation [24, 29]. The reconstruction expensive, since they require costly optimizations over the volume is discretized into voxels and each voxel is projected shape of the object. The visual hull on the other hand is into the silhouette images. If the projection of a voxel lies straightforward to compute from a set of calibrated silhou- inside a silhouette in all images it is marked as occupied, ette images. The downside is that it does not recover con- otherwise it is marked as empty. This method is very ro- cavities which are not seen in the silhouettes. However, this bust, since it does not rely on the silhouette contours but is sufficient for many application areas (for instance body on the occupied/non-occupied image regions. Depending pose estimation [7]) since the visual hull still captures the on the application it might also be more useful to obtain a essence of most shapes. In addition, once the reconstruc- voxel representation of the visual hull, for instance to com- tion is textured the difference to the true reconstruction is pute the volume of the intersection between an object and a even less apparent. region of space. In addition, the accuracy of the reconstruc- The visual hull is computed by building the general- tion can be easily limited to the desired accuracy, which is ized cones created by the viewing rays emanating from the much harder in the case of the geometric approach, since the camera center and passing through the silhouette contours. discretization is done in image space and not in the volume. Having computed the generalized cones for every silhou- To speed this approach up octree representations of space ette image, they are intersected in 3D producing the visual are used. In addition, the computations can be distributed hull. There are two different approaches for performing this over multiple processors [27]. computation, namely the polyhedral and the volumetric ap- In [6] a method is proposed which combines both the proach. polyhedral and the volumetric approach. There are also In the polyhedral approach geometric properties are other approaches [22, 18, 19, 20] which do not compute the used to compute a mesh representation of the visual hull visual hull explicitly, but only render images of the visual 1 hull from a given viewpoint. 2. GPU-based Visual Hull Computation The voxel-based visual hull computation is a perfect ex- In the last decade many systems for 3D reconstruction ample of a highly parallel algorithm. Therefore, it is well mostly using the visual hull have been proposed. One early suited to be implemented on the GPU. However, there are system which was using stereo instead of the visual hull was still different design choices for performing the computa- the CMU dome [23]. However, the computations were per- tions. We evaluated two different approaches. In the first formed offline. Cheung et al. [7] present a voxel-based sys- approach we precompute the bounding boxes of the voxel tem which achieves a frame rate of 16 fps using 5 cameras projections in each image and store them in a lookup ta- at a voxel resolution of 643. Borovikov et al. [4, 5] propose ble which is used during the visual hull computation. In a system using a voxel-based reconstruction method. They the second approach we downsample the images so that a report frame rates of about 10 fps using 14 cameras at a vol- voxel approximately projects into one pixel. This allows us ume resolution of 643. Arita et al. [3] present a system with to only project the voxel center point to obtain the corre- a resolution of 1003 and 14 fps. Wu et al. [30, 31] propose sponding pixel in the image. a system which is using a plane intersection test for the re- The first approach is most useful when the camera con- construction. They report values of 12 fps with 9 cameras. figuration is static which is the case for a 3D reconstruction system. If on the other hand one just wants to compute the More recently there have been several systems which are visual hull once for a given camera configuration, the over- capable of running at frame rates of 30 fps. Franco et al. head in terms of memory and time that the precomputation [10] parallelize the polyhedral approach over a cluster of 8 requires, make this approach less attractive. The second ap- dual Xeon 2.66 GHz PCs to achieve frame rates of 30 fps proach does not perform any precomputations and is there- using four cameras. Allard et al. [1, 2] also use the poly- fore suited for both cases. hedral reconstruction algorithm in a system with 11 dual- We implemented both an octree version and a non-octree Xeon 2.6 GHz PCs. They report frame rates of 30 fps using version of each approach. Using an octree representation 6 cameras. Soares et al. [27] compute the visual hull of an 8 of space speeds up the computations, but limits the size of image data set at octree level 8 in 16.82 ms using a PC with the reconstruction volume to cubes with power of 2 side 8 AMD Opteron dual core 2.2 GHz CPUs. Hasenfratz et al. lengths. The non-octree version also allows to compute the [13, 14] also report frame rates of 30 fps with four cameras 3 visual hull on non-cubic volumes but requires a longer com- using a voxel-based method (resolution 64 ) implemented putation time. on the GPU of a SGI Onyx IR-3000 with 8 R12000 proces- sors. 2.1. Precomputation-based Algorithm One thing all these systems have in common is that they In this algorithm, called GPU1, we first compute the are using very powerful and expensive hardware to achieve bounding boxes of the voxel projections in each image and real-time frame rates. We on the other hand would like store them in a lookup table. This is done in an offline step. to achieve equivalent results using more readily available The memory required by the lookup table is proportional to hardware. Specifically we are using only four PCs with the number of images and the voxel resolution used. For the 2.6 GHz Quad-Core CPUs and Geforce 8800 GTX graph- octree version we compute a lookup table for every octree ics adapters to obtain 30 fps using 16 cameras. We achieve level. The lookup table is copied onto the GPU during an this by performing the visual hull computation on the GPU. initialization step. Previous GPU-based methods make use of hardware tex- For every new set of silhouette images, we first com- turing. Hasenfratz et al. [13, 14] use the GPU to map the pute the corresponding integral images. This allows us to silhouette images as textures onto slices through the recon- efficiently evaluate the number of occupied pixels given a struction volume and intersect them using image blending. bounding box in the image by using only four memory ac- Hornung et al. [15] suggest to project every voxel into the cesses instead of looking at every pixel in the bounding box. images using a fragment shader and texture mip-mapping. The time required to compute the integral images is signifi- We propose a method making use of CUDA [8] to perform cantly shorter than the overhead of looking at every pixel in the reconstruction by using kernels which compute the pro- the bounding box. All the before mentioned steps are per- jection of every voxel and accumulate the occupancy infor- formed on the CPU. To speed them up, the computations mation.