3D Reconstruction Using Multiple Depth Cameras

Maximum Wilder-Smith
California State Polytechnic University, Pomona
[email protected]

Abstract—In this paper, we present an approach to creating 3D models from the RGB-D data captured by multiple aligned Azure Kinects surrounding an object. To improve the quality of the point clouds of the entire object, a background subtraction method is employed to isolate the high-fidelity point clouds from scan frames. We then construct an aligned 3D model using the point clouds' geometry.

Keywords—Point cloud registration, Azure Kinects, 3D reconstruction

I. INTRODUCTION

Creating realistic 3D models of physical objects plays an important role in a variety of real-world applications including game development, computer-generated art, and digital visualization. 3D scanning with depth cameras is one of the most popular ways to digitize an object; it allows people to rapidly obtain the geometry and texture information of real objects without having to manually model every detail and paint the textures from scratch.

Much of the current research into this application uses older scanners, such as the Kinect v2, or presents methods based around a single camera. One of the most popular applications for 3D reconstruction using Kinects is the Kinfu application [1]. This program builds textured models from a single Kinect moving through a space. While this program creates a complete textured model, it does not leverage multiple capture angles from multiple cameras. Other works, such as that of Alexiadis, Zarpalas, and Daras, present systems that utilize multiple cameras, but use last-generation hardware to do so [2]. This means they miss out on leveraging the newer capabilities and higher resolutions of the Azure cameras. In terms of reconstruction and alignment, the Open3D library contains fast implementations of a variety of algorithms [3]. This library features extensive I/O support as well as point cloud and mesh operations.

In this project, we present a 3D scanning pipeline that digitizes a real object using the RGB-D data captured by multiple depth cameras. In particular, we place multiple depth cameras at different angles around a real object. While scanning, the depth cameras are synchronized with overlapping views to capture RGB-D images of the object at different viewpoints. This helps to reduce the number of scan operations required for complete scanning coverage to estimate the shape and surface appearance of the scanned object. The results demonstrate the effectiveness of the presented pipeline in reconstructing a 3D surface from a single scan from each camera.

II. METHODOLOGY

The proposed pipeline consists of three main stages: initial alignment, cropping, and surface reconstruction.

A. Initial Alignment

The initial alignment stage involves taking a single capture from each camera in the system. We use two Azure Kinects, each connected directly to the computer. As each camera uses an infrared projection to register depth, there needs to be some delay between each camera's capture to avoid interference. When testing with more cameras, it is preferable to use a daisy-chain or subordinate configuration between the cameras with a preprogrammed capture offset [4]. The captured data is loaded into the Open3D library, where it is decoded into the RGBDImage structure [3].
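A minimal sketch of this capture step, assuming an Open3D build compiled with Azure Kinect (k4a) support; the device indices, the 0.2 s stagger, and the function name are illustrative rather than the exact configuration used:

import time
import open3d as o3d

def capture_rgbd_from_kinects(device_indices=(0, 1), delay_s=0.2):
    # Capture one RGBDImage from each connected Azure Kinect.
    rgbd_images = []
    for idx in device_indices:
        config = o3d.io.AzureKinectSensorConfig()   # default sensor settings
        sensor = o3d.io.AzureKinectSensor(config)
        if not sensor.connect(idx):
            raise RuntimeError(f"Could not connect to Azure Kinect {idx}")
        rgbd = None
        while rgbd is None:                         # capture_frame returns None until a frame is ready
            rgbd = sensor.capture_frame(True)       # align depth to the color camera
        rgbd_images.append(rgbd)
        time.sleep(delay_s)                         # stagger captures so the IR projectors do not interfere
    return rgbd_images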

Figure 1. RGBDImages from both cameras. The left side shows the color image captured, while the right side shows the depth data captured.

From these RGBDImages we read each Kinect's intrinsic matrix to obtain the transformations needed to combine the color data and the depth data into a single colored point cloud with accurate depth information. As the intrinsic matrix is slightly different for each camera, both must be polled for their calibration data. Using the intrinsic matrix, the RGBDImages are converted into colored point clouds. These point clouds display the geometry from the depth data as a series of points in 3D space, with each point's color determined by the color data from the camera. As the point density of the captures is very high, the point clouds often appear as solid objects in the visualizer. When first loaded into the visualizer, there is no data about how the point clouds line up, meaning their geometries are misaligned. This is shown in Figure 2.

Figure 2. Misaligned point clouds from the two cameras.

These point clouds can then be aligned using a variety of techniques. The first is Fast Global Registration [5]. This method produces a rough alignment between the point clouds using common features. The alignment transformation from this step is then passed as the initial transform matrix for the Colored ICP registration stage [6]. This method of iterative closest point registration uses the color data as well as the geometry of the two point clouds to align them more accurately. The more overlapping and distinguishing colored objects there are between the two point clouds, the better this registration is. Below is the output point cloud after performing Colored ICP.

Figure 3. Point clouds are aligned after Fast Global Registration and Colored ICP registration.
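A sketch of the alignment stage described above, assuming each camera's calibration has already been read into an o3d.camera.PinholeCameraIntrinsic object; the voxel size and distance thresholds are illustrative values:

import open3d as o3d

VOXEL = 0.02  # meters; controls feature radii and the ICP correspondence distance

def rgbd_to_point_cloud(rgbd, intrinsic):
    # Back-project the aligned color + depth pair into a colored point cloud.
    return o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

def preprocess(pcd):
    # Downsample and compute normals and FPFH features for Fast Global Registration.
    down = pcd.voxel_down_sample(VOXEL)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 5, max_nn=100))
    return down, fpfh

def align(source, target):
    src_down, src_fpfh = preprocess(source)
    tgt_down, tgt_fpfh = preprocess(target)
    # Rough feature-based alignment (Fast Global Registration [5]).
    fgr = o3d.pipelines.registration.registration_fgr_based_on_feature_matching(
        src_down, tgt_down, src_fpfh, tgt_fpfh,
        o3d.pipelines.registration.FastGlobalRegistrationOption(
            maximum_correspondence_distance=VOXEL * 1.5))
    # Refinement with Colored ICP [6], seeded with the FGR transform.
    result = o3d.pipelines.registration.registration_colored_icp(
        src_down, tgt_down, VOXEL, fgr.transformation)
    return result.transformation, result.fitness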

The alignment shown above has an accuracy of 0.5420. It is important to note that as each camera sees a slightly different image, the two point clouds cannot match 100%. The more extreme the angle and placement between the cameras in the setup, the lower the initial alignment accuracy will be. Ensuring that the scenes captured by each camera overlap a fair amount and are equally exposed, so as to provide similar colors between the point clouds, helps to achieve a more accurate alignment. After aligning the point clouds, we merge them into one point cloud by combining the points and performing a small voxel downsampling to ensure duplicated points are removed. If the setup has more than two cameras, the same alignment and merging process would be continued using the merged point cloud and the initial capture from each successive camera. This is repeated until a transformation matrix is calculated for each camera and the merged point cloud contains the sum of the captures from all of the cameras. If the scanning setup requires cameras to be moved to capture different angles of the subject without the use of additional cameras, then the transformation matrices are discarded, and this alignment process can be repeated.
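A minimal sketch of this merge step, where transformation is the matrix returned by the registration above and the 5 mm voxel size is an illustrative value:

import copy
import open3d as o3d

def merge_point_clouds(source, target, transformation, voxel_size=0.005):
    aligned = copy.deepcopy(source)
    aligned.transform(transformation)   # bring the source capture into the target's frame
    merged = target + aligned           # concatenate the two colored point clouds
    # A small voxel downsampling removes duplicated points in the overlapping regions.
    return merged.voxel_down_sample(voxel_size)

With more than two cameras, the same call would be repeated with the running merged cloud as the target and each successive capture as the source.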

B. Cropping

The next stage is to isolate the subject from the background. To this end, a bounding box needs to be generated around the subject. This bounding box can be found by performing object segmentation or by manually closing in on the subject. For the following example, manual subject selection is used, as it is more reliable for complex objects and incomplete point clouds.

To begin selecting the object, a bounding box is generated around the entire point cloud. The user can then move and scale this bounding box to close in on the subject.

Figure 4. Bounding box around the total capture.

Figure 5. Bounding box moved around the target.

Once the bounding box has been closed around the subject being scanned, the point cloud can be cropped with the built-in method. This removes the background and the additional point cloud points that were only needed to get the alignment of the cameras. This leaves only the subject's point cloud and some of the floor around it.
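A minimal sketch of this crop, where the corner coordinates stand in for the box the user has moved and scaled around the subject (the example values are hypothetical):

import numpy as np
import open3d as o3d

def crop_to_subject(merged, min_corner, max_corner):
    box = o3d.geometry.AxisAlignedBoundingBox(
        np.asarray(min_corner, dtype=float),
        np.asarray(max_corner, dtype=float))
    return merged.crop(box)   # keep only the points inside the bounding box

# e.g. subject = crop_to_subject(merged, (-0.3, -0.3, 0.2), (0.3, 0.3, 1.0))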

Figure 6. The cropped point cloud.

Cropping the point cloud can be useful if the subject is spinning or on a turntable, in which case further alignment can be performed between captures that ignores the features and alignment of the background and only aligns the captures of the subject (see Discussion A).

C. Surface Reconstruction

The final stage of the reconstruction is creating a solid surface from the point cloud. Once the subject has been isolated through cropping, the pipeline performs post-processing on the point cloud to remove outlier points as well as any extra or internal points. It is often useful to calculate the normals of the point cloud data for the surface reconstruction algorithms.
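A sketch of this clean-up using Open3D's outlier-removal and normal-estimation routines; the neighbor count, standard-deviation ratio, and search radius are illustrative:

import open3d as o3d

def clean_point_cloud(pcd):
    # Drop points that sit far from their neighbors (statistical outlier removal).
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    # Poisson and Ball Pivot reconstruction both require oriented normals.
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(30)
    return pcd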

Figure 7. Cleaned up point cloud which has been further cropped.

Once the point cloud has been cleaned up, we begin surface reconstruction to generate a triangle mesh based on the colored point cloud. We investigate three different surface reconstruction algorithms from the Open3D library: Poisson reconstruction [7], Alpha Shape reconstruction [8], and Ball Pivot reconstruction [9]. Depending on the scanning situation, certain methods may work better than others. Below is a comparison for the previous scans.

Figure 8. Poisson reconstruction.

Figure 9. Alpha Shape reconstruction.

Figure 10. Ball Pivot reconstruction.
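The three reconstruction calls compared in Figures 8-10 can be sketched as follows; the depth, alpha, and ball radii are illustrative parameters rather than the values used for the figures:

import open3d as o3d

def reconstruct_surfaces(pcd):
    # Poisson surface reconstruction [7]: smooth, watertight output.
    poisson, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    # Alpha Shape reconstruction [8]: carves a mesh from the alpha complex of the points.
    alpha = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.03)
    # Ball Pivot reconstruction [9]: rolls balls of the given radii over the points.
    radii = o3d.utility.DoubleVector([0.005, 0.01, 0.02])
    ball_pivot = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)
    return poisson, alpha, ball_pivot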

One can find that for the above example, Poisson produces a smooth, watertight mesh at the cost of surface resolution. Alpha Shape was unable to compute an efficient convex mesh, resulting in an unusable mesh. Ball Pivot was able to produce the highest-resolution mesh of the point cloud, though it had missing faces and took an incredibly long time to calculate.

III. DISCUSSION

A. Turntable Sequence Processing

The next milestone for this program will be turntable sequence processing. Following the cropping stage, the bounding box as well as the transformation matrices for the cameras will be saved for quick use in aligning a capture sequence. This extension of the program is designed for scanning scenarios in which the subject is turning in front of the cameras and a fixed number of cameras remain stationary. By using the already calculated transformation matrices to orient each camera's capture with the capture of the primary camera, we can quickly align captures across cameras with the same time stamp. Further, by removing the backgrounds of the captures using the bounding box, we are left with partially aligned captures of the subject at different rotations. These fragments can then be aligned to each other using multiway registration to generate a more complete model of the subject [10].
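A sketch of how this planned multiway registration step [10] could look with Open3D's pose-graph optimization, assuming fragments is the list of cropped captures at different rotations and pairwise_icp(src, tgt) is a hypothetical helper returning a transformation and information matrix for a pair of fragments; the thresholds are illustrative:

import copy
import numpy as np
import open3d as o3d

def multiway_register(fragments, pairwise_icp, max_dist=0.02):
    reg = o3d.pipelines.registration
    pose_graph = reg.PoseGraph()
    pose_graph.nodes.append(reg.PoseGraphNode(np.identity(4)))
    odometry = np.identity(4)
    # Chain pairwise registrations between consecutive fragments into a pose graph.
    for i in range(len(fragments) - 1):
        trans, info = pairwise_icp(fragments[i], fragments[i + 1])
        odometry = np.dot(trans, odometry)
        pose_graph.nodes.append(reg.PoseGraphNode(np.linalg.inv(odometry)))
        pose_graph.edges.append(reg.PoseGraphEdge(i, i + 1, trans, info, uncertain=False))
    # Globally optimize the poses so errors do not accumulate around the turntable.
    reg.global_optimization(
        pose_graph,
        reg.GlobalOptimizationLevenbergMarquardt(),
        reg.GlobalOptimizationConvergenceCriteria(),
        reg.GlobalOptimizationOption(max_correspondence_distance=max_dist, reference_node=0))
    # Apply the optimized poses and merge all fragments into one model.
    combined = o3d.geometry.PointCloud()
    for pcd, node in zip(fragments, pose_graph.nodes):
        aligned = copy.deepcopy(pcd)
        aligned.transform(node.pose)
        combined += aligned
    return combined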

B. Instant Scene Snapshot

This extension would rely on the precomputed transformation matrices to provide a live, aligned point cloud visualization of the scene in front of the cameras. After aligning the point clouds and cropping in on the subject, the program would perform captures across devices in rapid succession to provide a rendering of the point cloud from multiple cameras at an instant in time. The user would then have the ability to save an instance, like taking a photo on a camera; however, the 3D scene would be processed and exported as a 3D model of the scene at that instant. This would be best suited for reconstruction scenarios where there are chaotic changes in the scene, such as moving objects.

C. Parallelization

This final extension of the program would be a series of optimizations to allow the program to more efficiently utilize multi-core processors and compute clusters. While the core library, Open3D, already performs some parallelized optimizations in visualization and reconstruction techniques, this milestone would seek to turn this program into a more scalable solution for 3D reconstruction.

IV. CONCLUSION

In this work, we propose a 3D scanning pipeline for 3D digitization of a real object using multiple depth-sensing cameras. The proposed pipeline presents a scalable configuration scheme that allows for more complete scans in a single pass than with a single camera alone. By saving the alignment of stationary cameras, we perform rapid realignment of captures without the need for additional computational power. This is useful for recapturing shots from a specific camera or, as outlined in the Discussion, allowing for a wider variety of scanning scenarios.

REFERENCES

[1] S. Izadi, A. Davison, A. Fitzgibbon, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, and D. Freeman, "KinectFusion," Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11), Oct. 2011.

[2] D. S. Alexiadis, D. Zarpalas, and P. Daras, "Real-Time, Full 3-D Reconstruction of Moving Foreground Objects From Multiple Consumer Depth Cameras," IEEE Transactions on Multimedia, vol. 15, no. 2, pp. 339-358, Feb. 2013, doi: 10.1109/TMM.2012.2229264.

[3] Q.-Y. Zhou, J. Park, and V. Koltun, Open3D: A Modern Library for 3D Data Processing, arXiv:1801.09847, 2018.

[4] Azure Kinect documentation, available at https://docs.microsoft.com/en-us/azure/kinect-dk/ [accessed 07/14/2020].

[5] Q.-Y. Zhou, J. Park, and V. Koltun, Fast Global Registration, ECCV, 2016.

[6] J. Park, Q.-Y. Zhou, and V. Koltun, Colored Point Cloud Registration Revisited, ICCV, 2017.

[7] M. Kazhdan and M. Bolitho and H. Hoppe: Poisson surface reconstruction, Eurographics, 2006.

[8] H. Edelsbrunner and D. G. Kirkpatrick and R. Seidel: On the shape of a set of points in the plane, IEEE Transactions on Information Theory, 29 (4): 551–559, 1983

[9] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin: The ball-pivoting algorithm for surface reconstruction, IEEE Transactions on Visualization and Computer Graphics, 5(4), 349-359, 1999.

[10] S. Choi, Q.-Y. Zhou, and V. Koltun, Robust Reconstruction of Indoor Scenes, CVPR, 2015.