3D Reconstruction Using Multiple Depth Cameras

Maximum Wilder-Smith
California State Polytechnic University, Pomona
[email protected]

Abstract—In this paper, we present an approach to creating 3D models from the RGB-D data captured by multiple aligned Azure Kinects surrounding an object. To improve the quality of the point clouds of the entire object, a background subtraction method is employed to isolate the high-fidelity point clouds from scan frames. We then construct an aligned 3D model using the point clouds' geometry.

Keywords—Point cloud registration, Azure Kinects, 3D reconstruction

I. INTRODUCTION

Creating realistic 3D models of physical objects plays an important role in a variety of real-world applications including game development, computer-generated art, and digital visualization. 3D scanning with depth cameras is one of the most popular ways to digitize an object; it allows people to rapidly obtain the geometry and texture information of real objects without having to manually model every detail and paint the textures from scratch.

Much of the current research into this application uses older scanners, such as the Kinect v2, or presents methods based around a single camera. One of the most popular applications for 3D reconstruction using Kinects is the Kinfu application [1]. This program builds textured models from a single Kinect moving through a space. While this program creates a complete textured model, it does not leverage multiple capture angles from multiple cameras. Other works, such as that of Alexiadis, Zarpalas, and Daras, present systems that utilize multiple cameras, but use last-generation hardware to do so [2]. This means they miss out on leveraging the newer capabilities and higher resolutions of the Azure cameras. In terms of reconstruction and alignment, the Open3D library contains fast implementations of a variety of algorithms [3]. This library features extensive I/O support as well as point cloud and mesh operations.

In this project, we present a 3D scanning pipeline that digitizes a real object using the RGB-D data captured by multiple depth cameras. In particular, we place multiple depth cameras at different angles around a real object. While scanning, the depth cameras are synchronized with overlapping views to capture RGB-D images of the object at different viewpoints. This helps to reduce the number of scan operations required for complete scanning coverage to estimate the shape and surface appearance of the scanned object. The results demonstrate the effectiveness of the presented pipeline in reconstructing a 3D surface from a single scan from each camera.

II. METHODOLOGY

The proposed pipeline consists of three main stages: initial alignment, cropping, and surface reconstruction.

A. Initial Alignment

The initial alignment stage involves taking a single capture from each camera in the system. We use two Azure Kinects, each connected directly to the computer. As each camera uses an infrared projection to register depth, there needs to be some delay between each camera's capture to avoid interference. When testing with more cameras, it is preferable to use a daisy-chain or subordinate configuration between the cameras with a preprogrammed capture offset [4]. The captured data is loaded into the Open3D library, where it is decoded into the RGBDImage structure [3].
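A minimal sketch of this capture step, assuming an Open3D build compiled with Azure Kinect (k4a) support; the device indices, the 0.2 s stagger, and the function name are illustrative rather than the exact configuration used:

import time
import open3d as o3d

def capture_rgbd_from_kinects(device_indices=(0, 1), delay_s=0.2):
    # Capture one RGBDImage from each connected Azure Kinect.
    rgbd_images = []
    for idx in device_indices:
        config = o3d.io.AzureKinectSensorConfig()   # default sensor settings
        sensor = o3d.io.AzureKinectSensor(config)
        if not sensor.connect(idx):
            raise RuntimeError(f"Could not connect to Azure Kinect {idx}")
        rgbd = None
        while rgbd is None:                         # capture_frame returns None until a frame is ready
            rgbd = sensor.capture_frame(True)       # align depth to the color camera
        rgbd_images.append(rgbd)
        time.sleep(delay_s)                         # stagger captures so the IR projectors do not interfere
    return rgbd_images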

Figure 1. RGBDImages from both cameras. The left side shows the color image captured, while the right side shows the depth data captured.

From these RGBDImages we read each Kinect's intrinsic matrix to obtain the transformations needed to combine the color data and the depth data into a single colored point cloud with accurate depth information. As the intrinsic matrix is slightly different for each camera, both must be polled for their calibration data. Using the intrinsic matrix, the RGBDImages are converted into colored point clouds. These point clouds display the geometry from the depth data as a series of points in 3D space, with each point's color determined by the color data from the camera. As the point density of the captures is very high, the point clouds often appear as solid objects in the visualizer. When first loaded into the visualizer, there is no data about how the point clouds line up, meaning their geometries are misaligned. This is shown in Figure 2.

Figure 2. Misaligned point clouds from the two cameras.

These point clouds can then be aligned using a variety of techniques. The first is Fast Global Registration [5]. This method produces a rough alignment between the point clouds using common features. The alignment transformation from this step is then passed as the initial transform matrix for the Colored ICP registration stage [6]. This method of iterative closest point registration uses the color data as well as the geometry of the two point clouds to align them more accurately. The more overlapping and distinguishing colored objects there are between the two point clouds, the better this registration is. Below is the output point cloud after performing Colored ICP.

Figure 3. Point clouds are aligned after Fast Global Registration and Colored ICP registration.
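A sketch of the alignment stage described above, assuming each camera's calibration has already been read into an o3d.camera.PinholeCameraIntrinsic object; the voxel size and distance thresholds are illustrative values:

import open3d as o3d

VOXEL = 0.02  # meters; controls feature radii and the ICP correspondence distance

def rgbd_to_point_cloud(rgbd, intrinsic):
    # Back-project the aligned color + depth pair into a colored point cloud.
    return o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

def preprocess(pcd):
    # Downsample and compute normals and FPFH features for Fast Global Registration.
    down = pcd.voxel_down_sample(VOXEL)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 5, max_nn=100))
    return down, fpfh

def align(source, target):
    src_down, src_fpfh = preprocess(source)
    tgt_down, tgt_fpfh = preprocess(target)
    # Rough feature-based alignment (Fast Global Registration [5]).
    fgr = o3d.pipelines.registration.registration_fgr_based_on_feature_matching(
        src_down, tgt_down, src_fpfh, tgt_fpfh,
        o3d.pipelines.registration.FastGlobalRegistrationOption(
            maximum_correspondence_distance=VOXEL * 1.5))
    # Refinement with Colored ICP [6], seeded with the FGR transform.
    result = o3d.pipelines.registration.registration_colored_icp(
        src_down, tgt_down, VOXEL, fgr.transformation)
    return result.transformation, result.fitness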

The alignment shown above has an accuracy of 0.5420. It is important to note that as each camera sees a slightly different image, the two point clouds cannot match 100%. The more extreme the angle and placement between the cameras in the setup, the lower the initial alignment accuracy will be. Ensuring that the scenes captured by each camera overlap a fair amount and are equally exposed, so as to provide similar colors between the point clouds, helps to achieve a more accurate alignment. After aligning the point clouds, we merge them into one point cloud by combining the points and performing a small voxel downsampling to ensure duplicated points are removed. If the setup has more than two cameras, the same alignment and merging process would be continued using the merged point cloud and the initial capture from each successive camera. This is repeated until a transformation matrix is calculated for each camera and the merged point cloud contains the sum of the captures from all of the cameras. If the scanning setup requires cameras to be moved to capture different angles of the subject without the use of additional cameras, then the transformation matrices are discarded, and this alignment process can be repeated.
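A minimal sketch of this merge step, where transformation is the matrix returned by the registration above and the 5 mm voxel size is an illustrative value:

import copy
import open3d as o3d

def merge_point_clouds(source, target, transformation, voxel_size=0.005):
    aligned = copy.deepcopy(source)
    aligned.transform(transformation)   # bring the source capture into the target's frame
    merged = target + aligned           # concatenate the two colored point clouds
    # A small voxel downsampling removes duplicated points in the overlapping regions.
    return merged.voxel_down_sample(voxel_size)

With more than two cameras, the same call would be repeated with the running merged cloud as the target and each successive capture as the source.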

B. Cropping

The next stage is to isolate the subject from the background. To this end, a bounding box needs to be generated around the subject. This bounding box can be found by performing object segmentation or by manually closing in on the subject. For the following example, manual subject selection is used, as it is more reliable for complex objects and incomplete point clouds.

To begin selecting the object, a bounding box is generated around the entire point cloud. The user can then move and scale this bounding box to close in on the subject.

Figure 4. Bounding box around the total capture.

Figure 5. Bounding box moved around the target.

Once the bounding box has been closed around the subject being scanned, the point cloud can be cropped with the built-in method. This removes the background and the additional point cloud points that were only needed to get the alignment of the cameras. This leaves only the subject's point cloud and some of the floor around it.
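A minimal sketch of this crop, where the corner coordinates stand in for the box the user has moved and scaled around the subject (the example values are hypothetical):

import numpy as np
import open3d as o3d

def crop_to_subject(merged, min_corner, max_corner):
    box = o3d.geometry.AxisAlignedBoundingBox(
        np.asarray(min_corner, dtype=float),
        np.asarray(max_corner, dtype=float))
    return merged.crop(box)   # keep only the points inside the bounding box

# e.g. subject = crop_to_subject(merged, (-0.3, -0.3, 0.2), (0.3, 0.3, 1.0))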

Figure 6. The cropped point cloud.

Cropping the point cloud can be useful if the subject is spinning or on a turntable, in which case further alignment can be performed between captures that ignores the features and alignment of the background and only aligns the captures of the subject (see Discussion A).

C. Surface Reconstruction

The final stage of the reconstruction is creating a solid surface from the point cloud. Once the subject has been isolated through cropping, the pipeline performs post-processing on the point cloud to remove outlier points as well as any extra or internal points. It is often useful to calculate the normals of the point cloud data for the surface reconstruction algorithms.
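A sketch of this clean-up using Open3D's outlier-removal and normal-estimation routines; the neighbor count, standard-deviation ratio, and search radius are illustrative:

import open3d as o3d

def clean_point_cloud(pcd):
    # Drop points that sit far from their neighbors (statistical outlier removal).
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    # Poisson and Ball Pivot reconstruction both require oriented normals.
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(30)
    return pcd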

Figure 7. Cleaned up point cloud which has been further cropped.

Once the point cloud has been cleaned up, we begin surface reconstruction to generate a triangle mesh based on the colored point cloud. We investigate three different surface reconstruction algorithms from the Open3D library: Poisson reconstruction [7], Alpha Shape reconstruction [8], and Ball Pivot reconstruction [9]. Depending on the scanning situation, certain methods may work better than others. Below is a comparison for the previous scans.

Figure 8. Poisson reconstruction.

Figure 9. Alpha Shape reconstruction.

Figure 10. Ball Pivot reconstruction.
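The three reconstruction calls compared in Figures 8-10 can be sketched as follows; the depth, alpha, and ball radii are illustrative parameters rather than the values used for the figures:

import open3d as o3d

def reconstruct_surfaces(pcd):
    # Poisson surface reconstruction [7]: smooth, watertight output.
    poisson, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
    # Alpha Shape reconstruction [8]: carves a mesh from the alpha complex of the points.
    alpha = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.03)
    # Ball Pivot reconstruction [9]: rolls balls of the given radii over the points.
    radii = o3d.utility.DoubleVector([0.005, 0.01, 0.02])
    ball_pivot = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)
    return poisson, alpha, ball_pivot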

One can find that for the above example, Poisson produces a smooth, watertight mesh at the cost of surface resolution. Alpha Shape was unable to compute an efficient convex mesh, resulting in an unusable mesh. Ball Pivot was able to produce the highest-resolution mesh of the point cloud, though it had missing faces and took an incredibly long time to calculate.

III. DISCUSSION

A. Turntable Sequence Processing

The next milestone for this program will be turntable sequence processing. Following the cropping stage, the bounding box as well as the transformation matrices for the cameras will be saved for quick use in aligning a capture sequence. This extension of the program is designed for scanning scenarios in which the subject is turning in front of the cameras and a fixed number of cameras remain stationary. By using the already calculated transformation matrices to orient each camera's capture with the capture of the primary camera, we can quickly align captures across cameras with the same time stamp. Further, by removing the backgrounds of the captures using the bounding box, we are left with partially aligned captures of the subject at different rotations. These fragments can then be aligned to each other using multiway registration to generate a more complete model of the subject [10].
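A sketch of how this planned multiway registration step [10] could look with Open3D's pose-graph optimization, assuming fragments is the list of cropped captures at different rotations and pairwise_icp(src, tgt) is a hypothetical helper returning a transformation and information matrix for a pair of fragments; the thresholds are illustrative:

import copy
import numpy as np
import open3d as o3d

def multiway_register(fragments, pairwise_icp, max_dist=0.02):
    reg = o3d.pipelines.registration
    pose_graph = reg.PoseGraph()
    pose_graph.nodes.append(reg.PoseGraphNode(np.identity(4)))
    odometry = np.identity(4)
    # Chain pairwise registrations between consecutive fragments into a pose graph.
    for i in range(len(fragments) - 1):
        trans, info = pairwise_icp(fragments[i], fragments[i + 1])
        odometry = np.dot(trans, odometry)
        pose_graph.nodes.append(reg.PoseGraphNode(np.linalg.inv(odometry)))
        pose_graph.edges.append(reg.PoseGraphEdge(i, i + 1, trans, info, uncertain=False))
    # Globally optimize the poses so errors do not accumulate around the turntable.
    reg.global_optimization(
        pose_graph,
        reg.GlobalOptimizationLevenbergMarquardt(),
        reg.GlobalOptimizationConvergenceCriteria(),
        reg.GlobalOptimizationOption(max_correspondence_distance=max_dist, reference_node=0))
    # Apply the optimized poses and merge all fragments into one model.
    combined = o3d.geometry.PointCloud()
    for pcd, node in zip(fragments, pose_graph.nodes):
        aligned = copy.deepcopy(pcd)
        aligned.transform(node.pose)
        combined += aligned
    return combined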

B. Instant Scene Snapshot

This extension would rely on the precomputed transformation matrices to provide a live, aligned point cloud visualization of the scene in front of the cameras. After aligning the point clouds and cropping in on the subject, the program would perform captures across devices in rapid succession to provide a rendering of the point cloud from multiple cameras at an instant in time. The user would then have the ability to save an instance, like taking a photo on a camera; however, the 3D scene would be processed and exported as a 3D model of the scene at that instant. This would be best suited for reconstruction scenarios where there are chaotic changes in the scene, such as moving objects.

C. Parallelization

This final extension of the program would be a series of optimizations to allow the program to more efficiently utilize multi-core processors and compute clusters. While the core library, Open3D, already performs some parallelized optimizations in visualization and reconstruction techniques, this milestone would seek to turn this program into a more scalable solution for 3D reconstruction.

IV. CONCLUSION

In this work, we propose a 3D scanning pipeline for 3D digitization of a real object using multiple depth-sensing cameras. The proposed pipeline presents a scalable configuration scheme that allows for more complete scans in a single pass than with a single camera alone. By saving the alignment of stationary cameras, we perform rapid realignment of captures without the need for additional computational power. This is useful for recapturing shots from a specific camera or, as outlined in the Discussion, allowing for a wider variety of scanning scenarios.

REFERENCES

[1] S. Izadi, A. Davison, A. Fitzgibbon, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, and D. Freeman, "KinectFusion," Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11), Oct. 2011.

[2] D. S. Alexiadis, D. Zarpalas, and P. Daras, "Real-Time, Full 3-D Reconstruction of Moving Foreground Objects From Multiple Consumer Depth Cameras," IEEE Transactions on Multimedia, vol. 15, no. 2, pp. 339-358, Feb. 2013, doi: 10.1109/TMM.2012.2229264.

[3] Q.-Y. Zhou, J. Park, and V. Koltun, Open3D: A Modern Library for 3D Data Processing, arXiv:1801.09847, 2018.

[4] Azure Kinect documentation, available at https://docs.microsoft.com/en-us/azure/kinect-dk/ [accessed 07/14/2020].

[5] Q.-Y. Zhou, J. Park, and V. Koltun, Fast Global Registration, ECCV, 2016.

[6] J. Park, Q.-Y. Zhou, and V. Koltun, Colored Point Cloud Registration Revisited, ICCV, 2017.

[7] M. Kazhdan and M. Bolitho and H. Hoppe: Poisson surface reconstruction, Eurographics, 2006.

[8] H. Edelsbrunner and D. G. Kirkpatrick and R. Seidel: On the shape of a set of points in the plane, IEEE Transactions on Information Theory, 29 (4): 551–559, 1983

[9] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin: The ball-pivoting algorithm for surface reconstruction, IEEE Transactions on Visualization and Computer Graphics, 5(4), 349-359, 1999.

[10] S. Choi, Q.-Y. Zhou, and V. Koltun, Robust Reconstruction of Indoor Scenes, CVPR, 2015.