J. Vis. Commun. Image R. 21 (2010) 427–441
Contents lists available at ScienceDirect
J. Vis. Commun. Image R.
journal homepage: www.elsevier.com/locate/jvci
A quality controllable multi-view object reconstruction method for 3D imaging systems
Wen-Chao Chen a, Hong-Long Chou b, Zen Chen a,* a Department of Computer Science, National Chiao Tung University, 1001 University Road, Hsinchu 300, Taiwan b Altek Corporation, Science-Based Industrial Park, Hsinchu, Taiwan article info abstract
Article history: This paper addresses a novel multi-view visual hull mesh reconstruction for 3D imaging with a system Received 2 July 2009 quality control capability. There are numerous 3D imaging methods including multi-view stereo algo- Accepted 22 February 2010 rithms and various visual hull/octree reconstruction methods known as modeling from silhouettes. Available online 21 April 2010 The octree based reconstruction methods are conceptually simple to implement, while encountering a conflict between model accuracy and memory size. Since the tree depth is discrete, the system perfor- Keywords: mance measures (in terms of accuracy, memory size, and computation time) are generally varying rapidly 3D imaging system with the pre-specified tree depth. This jumping system performance is not suitable for practical applica- Modeling from silhouettes tions; a desirable 3D reconstruction method must have a finer control over the system performance. The Octree model XOR projection error proposed method aims at the visual quality control along with better management of memory size and System performance computation time. Furthermore, dynamic object modeling is made possible by the new method. Also, Dynamic modeling progressive transmission of the reconstructed model from coarse to fine is provided. The reconstruction Progressive transmission accuracy of the 3D model acquired is measured by the exclusive OR (XOR) projection error between the Multi-camera system pairs of binary images: the reconstructed silhouettes and the true silhouettes in the multiple views. Inter- esting properties of the new method and experimental comparisons with other existing methods are reported. The performance comparisons are made under either a comparable silhouette inconsistency or a similar triangle number of the mesh model. It is shown that under either condition the new method requires less memory size and less computation time. Ó 2010 Published by Elsevier Inc.
1. Introduction can provide high quality images by interpolation, the viewing an- gle is limited by the initial camera positions. The three-dimensional imaging technology is an emerging re- In contrast to view interpolation, multi-view 3D modeling ap- search topic for capturing, processing and displaying the true 3D proaches construct the 3D geometry of the scene and, therefore, information of a scene object. The next generation of 3D imaging are more suitable for applications in the holographic 3D display systems allows spectators to view from any desired viewpoint, just or 3DTV systems, which require to render views from all direc- like in the real world. Two common multi-view solutions to the tions, not just the in-between views. The 3D mesh with texture next generation of 3D imaging systems are: (1) novel view synthe- mapping is the most common approach for producing the photore- sis and (2) multi-view 3D modeling [1–5]. The main challenges of alistic objects. In recent years, there has been an increasing amount the emerging 3D imaging systems include 3D model reconstruc- of literature on image-based 3D modeling. Multi-view stereo algo- tion, video transmission rate, and 3D free viewpoint rendering, etc. rithms were proposed to reconstruct the high quality 3D model The stereoscopic display based on view synthesis is a practical from images captured at multiple viewpoints [9–14]. Seitz et al. approach for 3D imaging nowadays. The depth-image-based ren- [15] used high quality multi-view datasets as the benchmark to dering (DIBR) methods for free viewpoint TV (FTV) were intro- evaluate the performances of different reconstruction algorithms duced in [3–5]. Virtual images were rendered from a small based on accuracy and completeness of the reconstructed objects. number of views with 3D warping. Many researchers also utilized Multi-view stereo algorithms are generally based on photometric video interpolation technique to synthesize novel views without consistency measurement which is a time consuming procedure. building the 3D shape [6–8]. Although these kinds of methods On the other hand, visual hull is an alternative approach to 3D modeling using multi-camera systems [16–19]. Franco and Boyer [21] addressed the exact polyhedral visual hull reconstruction by * Corresponding author. Fax: +886 3 573 1875. E-mail addresses: [email protected] (W.-C. Chen), hlchou@altek. cutting and joining the visual rays passing through silhouettes. com.tw (H.-L. Chou), [email protected] (Z. Chen). Despite the complexity in joining the visual ray segments, the
1047-3203/$ - see front matter Ó 2010 Published by Elsevier Inc. doi:10.1016/j.jvcir.2010.03.004 428 W.-C. Chen et al. / J. Vis. Commun. Image R. 21 (2010) 427–441 resultant visual hull is highly accurate in terms of silhouette con- processing time of our method including the conversion of sistency. However, the constructed polyhedral visual hull tends the octant-base octree to a surface mesh representation is to produce ill-formed triangles (irregular and narrow planar sur- faster than the conventional octree method with the stan- face strips) [35]. Liang and Wong presented a simple and efficient dard marching cubes method for generating the final surface way to compute the exact visual hull vertices by the exact intersec- mesh model (the ‘‘Conv + MC” method) and the method of tion computation which replaces the interpolation value used in generating an exact visual hull from marching cubes based the conventional marching cubes (MC) approach [35]. This exact on the standard octree model (the ‘‘Conv + ExMC” method visual hull from marching cubes was reported to significantly im- or simply the ExMC method) by a factor from 2 to 3. Further- prove the quality of the reconstructed visual hull, comparable to more, the reconstructed visual hull contains a triangle num- that of the polyhedral approaches and requiring less computational ber for the surface representation which is less than that of time. Nevertheless, the method demands a pre-specified subdivi- the existing methods. sion level number for the octree reconstruction. If the level number (3) The proposed method can generally achieve better accuracy, is too small, the octants generated intend to have a larger volume, while spending less computation time through the introduc- resulting in lower accuracy, while if the level number is too large, tion of new octant types to the conventional octree ones. requiring a tremendous computation time and memory space. On (4) Due to the application of exact marching cubes the recon- the other hand, the parallel execution of the voxel-based visual structed visual hull mesh is relatively smooth, so it does hull reconstruction is feasible. In [25] Ladikos et al. proposed a not need further surface smoothing for texture mapping. method making use of CUDA to perform the reconstruction by using kernels which perform the projection of every voxel and The rest of the paper is organized as follows. Section 2 briefly gather the occupancy information. They also showed a real-time describes the framework of a proposed multi-view capturing and 3D reconstruction system which used the GPU-based visual hull processing integrated system for generating the 3D texture computation algorithm. There is a website where the information mapped objects. Section 3 presents useful properties of the new on CUDA is available [26]. Starck et al. [22] matched surface fea- reconstruction method with regard to its system performance. In tures between views to modify the visual hull and applied a global Section 4 the experimental setup is described. Synthetic and real optimization approach to do a dense reconstruction. Vlasic et al. objects of different geometry complexity are used in the visual hull [23] and de Aguiar et al. [24] acquired a template body mesh which reconstruction. The system performance measures of the proposed could be generated by a laser scanner and then tracked the motion reconstruction method are given. Comparisons of our reconstruc- sequences by deforming the non-rigid body mesh shape to match tion method with other existing methods are provided. Conclu- faithfully with a video stream of silhouettes. sions and further research directions are given in Section 5. This paper proposes a novel octree reconstruction method. The conventional approach often faces with a conflict between model 2. The proposed system accuracy and memory size. Since the tree depth or the subdivision level is discrete, these system performance measures (accuracy, To fulfill the requirements of FTV systems or 3DTV applications, memory size, and computation time) are varying rapidly. To rem- a multi-view capturing and processing system generates the edy this drawback, we modify the conventional method to attain photorealistic appearance of 3D dynamic objects from multi-view much finer control over the system performance. In the new meth- images based on some image-based reconstruction algorithm. od the visual quality of the octree reconstructions is controlled by a Fig. 1 shows the overall system organization. Yemez and Schmitt specified exclusive OR (XOR) projection error upper bound; the also presented a similar efficient approach to progressively trans- resultant XOR projection error reflects the inconsistency between mit the octree structure and then perform triangulation on the the binary projected silhouette of the reconstructed object and fly [38]. In our system, the preprocessing module first comprises the true object silhouette in each image. The introduction of new camera calibration, background estimation, and color normaliza- types of octants in the new reconstruction method indicates a mix- tion, etc. After capturing a synchronized video sequence, the sys- ture of protrusions and indents on the reconstructed object surface tem applies background subtraction for 3D model reconstruction, which is no longer a bounding volume of the true object. Both use- and finally the multi-view texture mapping is derived for photore- ful properties and computer simulations of the system perfor- alistic display of the 3D object. Only octree structure information mance are presented. Furthermore, dynamic object modeling is and multi-view images are transmit from a server to clients. Here, made possible based on the new method. Also, progressive trans- the focus is placed on the extension of the conventional octree mission of the reconstructed model with an increasing degree of reconstruction and the application of exact marching cubes to con- model accuracy is provided. Since the proposed method is fast vert the octant-based volumetric representation to surface mesh and has the dynamic and progressive nature, together with a con- representation of the object under reconstruction. Also, dynamic trollable system performance measure, the method is suitable to modeling and progressive transmission based on the new method be incorporated into 3D imaging systems. Comparisons between are to be presented. Texture mapping and Z buffer will be imple- the proposed reconstruction method and other existing methods mented with the commercial graphic cards. are made to illustrate the merits of the new method. The main features of our work include 2.1. A multi-view capturing system (1) The proposed method does not enforce a maximum tree depth for the octant generation process; instead an XOR pro- We implement a multi-view capturing system consisting of jection error upper bound parameter is imposed. This eight IEEE 1394b synchronized color cameras which are connected parameter value selection depends roughly on the intended to eight PCs. The PCs are synchronized with an IEEE 1394a daisy level of detail. Therefore, the user has some sort of control chain. The system configuration is shown in Fig. 2. Only a single over the visual reconstruction quality. computer command is needed to trigger the capturing process (2) Under a comparable XOR projection error (i.e., silhouette through the communication interface mechanism. The cameras inconsistency) constraint the proposed octree reconstruc- capture a video sequence at a rate of 30 fps with a frame resolution tion method is faster than the conventional octree recon- of 1024 768. We also set up a blue screen-like studio of dimen- struction method by a factor from 10 to 40. The total sion 3 m 6m 2 m and all the cameras are mounted on the W.-C. Chen et al. / J. Vis. Commun. Image R. 21 (2010) 427–441 429
Fig. 1. System block diagram.
Fig. 2. A multi-view capturing system. ceiling to acquire the images of the objects from different viewing 2.2. A novel octree reconstruction method with a projection error positions. Fig. 3 shows our studio configuration. control A color checker chart is used to estimate the true color informa- tion for color normalization and a custom-made calibration board In the conventional octree reconstruction a coarse-to-fine oc- is used for acquiring the projection matrix for each camera based tant subdivision process is recursively applied to a bounding root on the Bouguet’s method [28]. Although objects are captured in a cell of an object under reconstruction. During the subdivision pro- blue screen environment, foreground extraction is still not easy cess the black nodes indicate the octants lying inside the object, due to shadows casted by the objects. This paper combines the while the white nodes indicate the octants lying outside the object methods proposed in [30–31] with the manually selected thresh- and the gray nodes indicating the octant lying partially outside and old to extract foreground objects for the 3D reconstruction. partially inside the object. The octant subdivision process is
Fig. 3. Multi-view images captured. 430 W.-C. Chen et al. / J. Vis. Commun. Image R. 21 (2010) 427–441 , XM XM repeated until no gray octants remain. A typical octree representa- O T T tion of an object model reconstruction is shown in Fig. 4.In the con- ErrðO; TÞ¼ Sv Sv AreaðSv Þð2Þ ventional octree reconstruction method, the type of an octant is v¼1 v¼1 classified by checking the intersection relation between the pro- where area (ST ) is the area (or the total pixel number) of the region jected octant image and the real object silhouette in each view. v of the true silhouette. For a good object reconstruction this ratio In order to speed up the intersection test, an octant is approxi- should be much smaller than one. mated by its bounding sphere and the intersection test is executed Note that the error caused by the approximation with a bound- using precompiled signed distance maps derived from the object ing circle becomes negligible as the subdivision level increases to a silhouettes [27]. The signed Chebyshev distance map for each sil- sufficiently large final level. Also, the computation of the above houette view can be generated with a graphic card to reduce com- XOR projection error uses the actual hexagonal projected images putation time using the method proposed by Hoff et al. [32]. The of the reconstructed object octants rather than using the bounding distance value at the circle center c of the bounding circle of a pro- circles of the octants. jected octant image is denoted by DistMap(c). The radius r of the In the new octree reconstruction method a pre-specified XOR bounding circle of the projected octant image is calculated and octant projection error bound is denoted by P. There are five types then compared with DistMap(c) to determine the intersection rela- of octants at each resolution level l in the new method: B (black), tionship. A positive DistMap(c) distance value indicates the circle l,N W (white), GB (gray–black), GW (gray–white), and GG center is inside the object silhouette, and a negative distance value l,N l,N l,N l,N (gray–gray). The black and white octants are defined in the con- for a circle center being outside the silhouette, and a zero value for ventional way and the three new types of gray octants are defined a circle center being on the silhouette boundary. The spatial rela- below: tionship checking is carried out on the view-by-view basis. Table A conventional gray octant G at level l is redefined as a gray– 1 gives the intersection relationship between the projected octant l,C white (GW ) octant in the new method if its bounding circle cen- image and its associated silhouette. l,N ter c has a negative distance value, DistMap(c) < 0 and the black ex- To measure the model accuracy of the reconstructed visual hull O tent of the octant satisfies the constraint: 0 < r + DistMap(c) 6 P in O relative to the true object T, we use their binary silhouettes Sv T any particular view v 2½1; M . and S across all views (v =1,..., M) to define the following aver- v A conventional gray octant G at level l is redefined as a gray– age exclusive OR error: l,C black (GBl,N) octant in the new generalized method if its bounding 1 XM circle center c has a non-negative distance value, DistMap(c) P 0 XOR error ¼ SO ST ð1Þ M v v and the white extent of the octant satisfies the constraint: v¼1 0