Real-Time Re-Textured Geometry Modeling Using HoloLens

Samuel Dong*   Tobias Höllerer†

University of California, Santa Barbara

*e-mail: samuel [email protected]
†e-mail: [email protected]

ABSTRACT

We implemented live-textured geometry model creation with immediate coverage feedback visualizations in AR on the Microsoft HoloLens. A user walking and looking around a physical space can create a textured model of the space, ready for remote exploration and AR collaboration. Out of the box, a HoloLens builds a triangle mesh of the environment while scanning and being tracked in a new environment. The mesh contains vertices, triangles, and normals, but not color. We take the video stream from the color camera and use it to color a UV texture that is mapped onto the mesh. Due to the limited graphics memory of the HoloLens, we use a fixed-size texture. Since the mesh generation changes dynamically in real time, we use an adaptive mapping scheme that evenly distributes every triangle of the dynamic mesh onto the fixed-size texture and adapts to new geometry without compromising existing color data. Occlusion is also considered. The user can walk around their environment and continuously fill in the texture while growing the mesh in real time. We describe our texture generation algorithm and illustrate benefits and limitations of our system with example modeling sessions. Having first-person immediate AR feedback on the quality of the modeled physical infrastructure, both in terms of mesh resolution and texture quality, helps the creation of high-quality colored meshes with this standalone wireless device and a fixed memory footprint in real time.

Index Terms: Computing methodologies—Computer graphics—Graphics systems and interfaces—Mixed / augmented reality; Computing methodologies—Computer graphics—Image manipulation—Texturing

1 INTRODUCTION

Microsoft released the HoloLens, a standalone augmented reality headset with a stereoscopic 3D display, in early 2016. Equipped with an infrared depth sensor and an inertial measurement unit, the HoloLens continuously builds up a model of the environment it is in, allowing objects that have been placed to remain fixed in their locations no matter how far or where the user moves. Due to its standalone nature, it is meant to be worn as the user walks around their environment, expanding the stored model as they go. It splits its processing into three units: the CPU, the GPU, and the HPU, or holographic processing unit. This allows the HoloLens to dedicate hardware to continuously tracking and modeling the environment while also supporting intensive 3D graphical applications.

The environment is scanned using the infrared laser and sensor, generating an internal point cloud. The HoloLens automatically processes the point cloud and converts it into a triangular mesh. This mesh is used in a variety of manners, ranging from showing a 3D cursor to providing anchor points for 3D objects. The mesh is also commonly used to provide occlusion for virtual objects, making them feel truly positioned in the real space. Games can also use this information to build a custom experience based on the structure of the surrounding environment, such as rendering decals or choosing where units spawn.

While the HoloLens does scan the geometry of the environment, it does not automatically apply any color detail. The mesh only contains positions and normals. This prevents the out-of-the-box HoloLens from being used for 3D scanning and other applications that use mesh color and/or texture for exporting or rendering. Examples of potential uses of color include scanning an object with texture and rendering virtual objects with real-world reflections and ambient color.

In this paper, we propose a real-time solution using the HoloLens's built-in camera. Because the HoloLens is a device with limited memory, we did not want a memory requirement that scales with the size of the environment, so we utilize a fixed-size texture. This texture operates as a standard UV texture map, making exporting and integrating into existing applications very straightforward. We show the results of modeling both a small and a large environment to demonstrate the effects of the adaptive texture, and we consider the time required to capture these environments and the quality of the outcome.

2 RELATED WORK

Interactively rendering realistic image-based models of the real environment has been a research topic for a while. Early research focused on using image-based techniques to extract the geometric data and provide a virtual model. With the advent of depth cameras and laser scanning, and the popularity of structure-from-motion techniques [5], acquiring geometric data is extremely practical. Color data can be acquired by taking pictures of the geometry from various angles, and the camera position and orientation is sometimes available as well.

Newcombe et al.'s KinectFusion [12] shows building a dense geometric model of interior environments in real time using an inexpensive depth sensor. This has been improved upon in Whelan et al.'s Kintinuous paper [16], allowing for growing environments and conversion to a triangular mesh.

Once a model has been built, image-based rendering still needs to be done to make it look realistic. Debevec et al. [4] improved upon their previous view-dependent image-based rendering technique by introducing a real-time method for rendering geometric data from a set of images and poses. In their paper, the best images were chosen based on the current view angle and projected onto the geometry. Because projective texturing colors geometry regardless of its visibility, occlusion was dealt with by clipping the geometry with occluding volumes. Buehler et al. [2] introduced unstructured lumigraph rendering, a combination of view-dependent image-based rendering with light field techniques to support arbitrary view locations and directions. It factors in the resolution of each view and creates a blend of the best views based on their quality. Chen et al. [3] use KinectFusion to acquire a high-quality mesh from the scene along with image frames. After some offline processing, view-dependent texture projection is used to render the scene while factoring in occlusion. Care is taken to balance smooth transitions, by blending many frames, with the preservation of view-dependent features, such as specular highlights and reflections, by selecting fewer frames. These methods render the model in real time with highly accurate results, but they rely on a provided model and a dense set of images.


If the real-time constraint is lifted, further optimizations for texture quality can be performed. Bernardini et al. [1] used both the depth and color information when blending together multiple views, based on a weight calculated from the confidence of individual pixels. Views were also aligned by utilizing features detected in the color information to prevent ghosting. Similarly, Zhou and Koltun [17] reduced artifacts by optimizing the camera location and orientation and the image distortion based on photometric consistency. This allows for better texturing when geometric errors or lens distortion cause misalignment. Maier et al. [11], on the other hand, improve visual quality by merging multiple lower-resolution color frames into higher-resolution ones with some filtering. This allows the texture to have higher-resolution data for an object than what is provided by the original RGB camera frame.

Using RGB-D (color and depth) information, Fechteler et al. [7] were able to texture a provided articulated model with the Kinect. The articulated mesh provides UV coordinates for a texture file. The images come from a continuous stream of RGB data. Because not all triangles are visible in any one frame, a set of frames is kept based on their angle and resolution. Each triangle references only one frame, and frames that do not provide much data are discarded to save memory.

The Kinect has also been used to generate both color and geometry data. By using RANSAC and feature extraction to align views, Henry et al. [8] use frame-to-frame RGB-D data to construct a sparse surface patch array. Their method also detects loops and closes them. It runs in real time and provides a compact representation of indoor environments, but must store a set of keyframes for loop detection. Instead of using image features, Kerl et al. [9] use the photometric error between all pixels in subsequent frames to provide higher-quality pose estimation. By selecting keyframes and closing loops, accumulated drift between frames is minimized. Steinbrücker et al. [15] extend that work to use the tracking data to merge depth and color into a sparse octree data structure for compact storage. The octree allows for real-time reconstruction and viewing of both the geometry and color information by marching through the tree and rendering the voxels to create the image.

An alternative to dense reconstructions of the environment is the use of predefined 3D models. By maintaining a database of common objects, a scene can be represented by a set of these objects, removing sensor artifacts and providing semantic labeling. Sankar and Seitz [14] use interaction with the user to guide a program in placing 3D models of objects. Salas-Moreno et al. [13] extend simultaneous localization and mapping (SLAM) with these predefined objects, allowing for both automatic detection and placement of the objects, as well as more efficient pose estimation. Dou and Fuchs [6] utilize several RGB-D sensors mounted on the walls of a room in an effort to provide an extremely robust model of potentially dynamic scenes. By utilizing both existing 3D models for rigid and static geometry and reconstructed models for dynamic geometry, higher-quality scene reconstruction is possible for potentially dynamic environments.

The Microsoft HoloLens provides the geometric data and modeling in real time automatically. Our work focuses on using a video stream augmented with pose information to color this dynamic mesh in real time. For easy integration, we use a UV texture approach, which is almost universally accepted by modeling and model viewing programs and is simple to render. We also maintain a fixed memory footprint, storing only the latest frame of the camera stream.

3 METHODOLOGY

The HoloLens API provides its geometric data in the form of a set of meshes that are identified by a GUID. Each mesh contains a position buffer, a normal buffer, and an index buffer. We keep the set of meshes stored in a dictionary using the GUID as a key. To UV texture the set of meshes, UV coordinates must be provided for each vertex in every mesh. While there are many automated methods of unwrapping a 3D mesh onto a UV map, we chose a simple, efficient method to cope with regular changes in geometry as the HoloLens is building up the environment.

3.1 Mapping Triangles to a Fixed-Size Texture

Our method first assigns each triangle in the set of meshes a global index. We iterate over the dictionary in order by GUID, which ensures a consistent ordering of the meshes. Each mesh has an internal ordering of triangles. Since we need to index each triangle globally with respect to the whole set of meshes, an offset must be calculated to apply to the internal ordering. The offset of a mesh is calculated as the sum of the number of triangles of every mesh before it. Our global index is then defined as the offset plus the internal triangle index.

Given a fixed-size square texture, we divide it equally into squares based on the total number of triangles that need to fit in it. By using two triangles to form one square, the number of squares on one side can be computed as

    numberOfSquares = ⌈ √(totalNumberOfTriangles / 2) ⌉    (1)

We will refer to the number of squares on one side as the size of the layout. Given the global index and the size, one can calculate the triangular region reserved for a specific triangle by first computing which square the triangle lies in, and then which half of the square the triangle is mapped to, based on the index being even or odd. An example is shown in Fig. 1.

Figure 1: Example layout with 22 triangles. The numbers show where the triangles with specific global indices would be mapped to. The size is 4, which holds a maximum of 32 triangles.

In our approach, assuming the UV origin is in the top-left corner, squares are laid out in a left-to-right, top-to-bottom order. Even indices map to the top-left half while odd indices map to the bottom-right half.
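To make the layout concrete, the following minimal C++ sketch (our own illustration, not code from the HoloLens API; the names layoutSize and squareCornerForTriangle are hypothetical) computes the layout size of Eq. 1 and the top-left UV corner of the square reserved for a given global triangle index. Whether the triangle occupies the top-left or bottom-right half of that square is decided by the parity of the index, as described above.

    #include <cmath>
    #include <cstdint>

    struct LayoutUV { float u, v; };

    // Number of squares per side needed to hold all triangles (Eq. 1),
    // assuming two triangles per square and a square texture.
    inline uint32_t layoutSize(uint32_t totalTriangles) {
        return static_cast<uint32_t>(std::ceil(std::sqrt(totalTriangles / 2.0)));
    }

    // Top-left corner, in UV space, of the square a triangle occupies.
    // Even global indices use the top-left half of the square, odd indices
    // the bottom-right half (cf. Alg. 1 later in this section).
    inline LayoutUV squareCornerForTriangle(uint32_t globalIndex, uint32_t size) {
        const uint32_t squareID = globalIndex / 2;            // two triangles per square
        const uint32_t col = squareID % size;                 // left-to-right
        const uint32_t row = squareID / size;                 // top-to-bottom
        const float width = 1.0f / static_cast<float>(size);  // square width in UV space
        return { col * width, row * width };
    }

For the 22-triangle example of Fig. 1, layoutSize(22) returns 4, and triangle 21 falls into square 10 (third column, third row), on its bottom-right half.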


Figure 2: The issue with using the outer triangles as is, due to rasterization rules. Parts of the light purple triangle near the diagonal are teal. If a UV coordinate lands in those teal parts, the color they sample comes from the teal triangle instead of the light purple triangle.

Figure 3: The coordinates and distances of the inner and outer triangles relative to the top-left corner of a square. w is the width of a square in UV space, which is the reciprocal of the size. A pixel in UV space is the reciprocal of the width of the texture in pixels.

3.2 Accounting for Rasterization Issues

Due to rasterization rules, pixels whose centers are not within a triangle are not rasterized as that triangle. Rasterization rules ensure that every pixel is uniquely mapped to one triangle when two triangles are adjacent. This causes some parts of some triangles to contain the color of an adjacent triangle, as shown in Fig. 2. To prevent this, the triangle's UV coordinates are mapped to an inner triangle with a sufficient gap from the reserved triangle. Fig. 3 shows the positions of the vertices relative to the top-left corner of the square. To ensure that every part of the triangle is rasterized, a form of overestimated conservative rasterization must be used, in which any pixel that is partially inside a triangle is rasterized. DirectX 11.3 is the first feature level that allows conservative rasterization, but it is not supported by the HoloLens. Instead, we use the geometry shader to build a one-pixel border out of triangles, duplicating the interpolants at the corners to ensure the border matches the edges of the triangle. One such construction is shown in Fig. 4.
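For reference, the inner-triangle placement of Fig. 3 can be computed on the CPU as in the following sketch, which mirrors Alg. 2 given further below (resolution is the texture width in pixels; the struct and function names are our own and only illustrative).

    #include <cstdint>

    struct Float2 { float u, v; };

    // UV coordinates of vertex `vertexID` (0..2) of the inner triangle of the
    // triangle with the given global index: a one-pixel gap to the square's
    // outer edges and a two-pixel gap toward the diagonal, per Fig. 3.
    Float2 innerUV(uint32_t globalIndex, uint32_t vertexID,
                   uint32_t size, uint32_t resolution) {
        const float p = 1.0f / resolution;   // one pixel in UV space
        const float w = 1.0f / size;         // one square in UV space
        const Float2 even[3] = { { p, p }, { w - 2.0f * p, p }, { p, w - 2.0f * p } };
        const Float2 odd[3]  = { { w - p, 2.0f * p }, { w - p, w - p }, { 2.0f * p, w - p } };

        const uint32_t squareID = globalIndex / 2;
        const float cornerU = (squareID % size) * w;   // left-to-right
        const float cornerV = (squareID / size) * w;   // top-to-bottom
        const Float2& o = (globalIndex % 2 == 0) ? even[vertexID] : odd[vertexID];
        return { cornerU + o.u, cornerV + o.v };
    }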

3.3 Projecting Camera Images Onto the Texture

The HoloLens provides video streaming capabilities from its camera. The color buffer is encoded in the NV12 YUV color format, and both a view and a projection matrix are provided. This information is used to project colors onto the UV texture. First, blurry frames should be rejected. This is done by detecting when the rotational or translational velocity of the HoloLens is greater than a threshold. Because the HoloLens does not expose its inertial measurement unit (IMU) data, we cannot use the accelerometer or gyroscope. Instead, the previous and current view matrices are used to transform both the origin and the forward unit vector. The difference between the previous and current origin represents the translation, while the angle between the previous and current forward unit vectors represents the rotation.
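A minimal sketch of this stability test, assuming the camera origin and forward unit vector have already been extracted from the previous and current view matrices (e.g., via their inverses), and with placeholder threshold values:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
    static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Returns true if the frame is considered stable enough to project.
    // prevPos/curPos are camera origins, prevFwd/curFwd are unit forward
    // vectors for the previous and current frames; dt is the frame time in seconds.
    bool isFrameStable(const Vec3& prevPos, const Vec3& curPos,
                       const Vec3& prevFwd, const Vec3& curFwd, float dt) {
        const float maxTranslationSpeed = 0.35f;  // m/s, placeholder threshold
        const float maxRotationSpeed    = 0.60f;  // rad/s, placeholder threshold

        const Vec3 delta = { curPos.x - prevPos.x, curPos.y - prevPos.y, curPos.z - prevPos.z };
        const float translationSpeed = length(delta) / dt;

        // Angle between the previous and current forward unit vectors.
        float c = dot(prevFwd, curFwd);
        c = c > 1.0f ? 1.0f : (c < -1.0f ? -1.0f : c);  // clamp for acos
        const float rotationSpeed = std::acos(c) / dt;

        return translationSpeed < maxTranslationSpeed && rotationSpeed < maxRotationSpeed;
    }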

If a frame is stable enough, it needs to be projected onto the UV texture. Projective texturing normally causes occluded objects to share the color of the object in front of them. To prevent the color from bleeding through objects, the projection is done in two passes.

3.3.1 Depth-Only Pass

The first pass uses the camera's view and projection matrix to perform a depth-only pass onto a separate depth texture using the set of meshes. This is analogous to creating a shadow map from the perspective of the camera. It must be noted that the projection matrix provided by the HoloLens does not properly map the z-coordinate and must be modified to factor in a near and far plane.

3.3.2 Projection Pass

The second pass is the actual projection. The set of meshes is passed into the pipeline. The output of the vertex stage contains a clip-space position and a world-space position.

• Vertex Stage: Outputs an arbitrary clip-space position (this will be set in the geometry shader) and a world-space position obtained by multiplying the vertex position by the mesh's model matrix.

• Geometry Stage: The offset and size are passed in as a constant buffer. For each input triangle, the geometry stage outputs 13 triangles (cf. Fig. 4). One of the triangles is the inner triangle, while the rest form the one-pixel border needed to conservatively rasterize the inner triangle. First, we loop over every vertex of the input triangle and use the primitive ID, the vertex ID (the loop parameter), the offset, and the size to calculate both the outer triangle's and the inner triangle's UV coordinates using Alg. 1 and Alg. 2. We then project the inner triangle's vertices onto the outer triangle's edges to construct the 12 border triangles. The clip-space position corresponds to the UV coordinates mapped onto clip space, so that the triangles are rasterized at the correct location in the texture. The world-space position is copied to the extra vertices to ensure the border contains the same colors as the inner triangle.

• Pixel Stage: The camera's view and projection matrices, the camera's luminance and chrominance textures, and the depth texture from the first pass are passed in. The world position is first transformed into UV coordinates by projecting it into clip space with the view and projection matrices, taking care to reject coordinates outside of normalized device coordinates (NDC) and those with a negative w-coordinate. Then, the x and y coordinates of the NDC are mapped to UV coordinates for both the depth and camera textures. The depth texture is used to check whether the pixel is occluded, discarding it if it is. If it is not, the pixel shader returns the RGB color converted from the NV12 YUV camera textures. A CPU-side sketch of this rejection test is given below.
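The following CPU-side sketch illustrates that rejection logic; it is not the shader itself. We assume a combined view-projection matrix applied to a homogeneous world position (column-vector convention) and a depth-lookup callback standing in for the depth texture of the first pass; the small depth bias is an illustrative guard against self-occlusion.

    #include <functional>

    struct Vec4 { float x, y, z, w; };
    struct Mat4 { float m[4][4]; };  // m[row][col]; vectors are treated as columns

    static Vec4 mul(const Mat4& M, const Vec4& v) {
        return { M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w,
                 M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w,
                 M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w,
                 M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w };
    }

    // Maps a world-space position into the camera's UV space and tests it
    // against the depth-only pass. Returns false if the point lies outside
    // the camera frustum or is occluded; otherwise fills uv for color sampling.
    bool projectAndTestOcclusion(const Mat4& cameraViewProj, const Vec4& worldPos,
                                 const std::function<float(float, float)>& sampleDepth,
                                 float uv[2]) {
        const Vec4 clip = mul(cameraViewProj, worldPos);
        if (clip.w <= 0.0f) return false;                    // behind the camera

        const float ndcX = clip.x / clip.w;
        const float ndcY = clip.y / clip.w;
        const float ndcZ = clip.z / clip.w;
        if (ndcX < -1.0f || ndcX > 1.0f || ndcY < -1.0f || ndcY > 1.0f)
            return false;                                    // outside the camera image

        uv[0] = 0.5f * (ndcX + 1.0f);   // NDC x -> [0, 1]
        uv[1] = 0.5f * (1.0f - ndcY);   // NDC y -> [0, 1], flipped for texture space
        const float bias = 1e-3f;       // illustrative bias against self-occlusion
        return ndcZ <= sampleDepth(uv[0], uv[1]) + bias;     // fails if behind stored depth
    }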


Algorithm 1 Calculating Outer UV Coordinates
Parameters: primitiveID, vertexID, offset, size
  width ← 1.0 / size
  evenOffsets ← [(0.0, 0.0), (width, 0.0), (0.0, width)]
  oddOffsets ← [(width, 0.0), (width, width), (0.0, width)]
  globalID ← primitiveID + offset
  squareID ← globalID / 2
  position ← (squareID mod size, squareID / size)
  topLeftCorner ← position ∗ width
  if globalID is even then
    return topLeftCorner + evenOffsets[vertexID]
  else
    return topLeftCorner + oddOffsets[vertexID]
  end if

Algorithm 2 Calculating Inner UV Coordinates
Parameters: primitiveID, vertexID, offset, size
  p ← 1.0 / resolution
  w ← 1.0 / size
  even ← [(p, p), (w − 2.0 ∗ p, p), (p, w − 2.0 ∗ p)]
  odd ← [(w − p, 2.0 ∗ p), (w − p, w − p), (2.0 ∗ p, w − p)]
  globalID ← primitiveID + offset
  squareID ← globalID / 2
  position ← (squareID mod size, squareID / size)
  topLeftCorner ← position ∗ w
  if globalID is even then
    return topLeftCorner + even[vertexID]
  else
    return topLeftCorner + odd[vertexID]
  end if

Figure 4: The construction of the 12 border triangles and inner triangle to be output by the geometry stage. Given the vertices of the inner and outer triangles, extra vertices are created by projecting the inner vertices onto the outer triangle's edges. The outer vertex and the two projections of the inner vertex share the same interpolants as the inner vertex. These are indicated with the same shape and color. The diagonals shown inside the border rectangles are arbitrary.

3.4 Updating the Texture when Geometry Changes

When geometry is added by the HoloLens, two changes can occur to the layout. One, the GUID of the added mesh falls between existing meshes or before all meshes, causing the global indices of the existing meshes to shift. Two, the total number of triangles exceeds the amount allowed by the size of the grid being used. In these cases, the old texture cannot continue to be used. To preserve as much existing color data as possible, we transfer the data from the old texture to a new texture with the updated layout. Because the transfer requires both the old and new textures to be distinct, two textures are used in a double-buffered style: one is the current texture, used in both projection and rendering; the other is the new texture to be copied to and then swapped.

Meshes can also be updated and refined by the HoloLens. Due to the lack of spatial coherency in our texture mapping scheme, this would cause that mesh's texture data to be invalidated upon update. Because most meshes are updated continuously by the HoloLens, we decided to ignore updates to existing meshes to avoid having gathered texture data invalidated immediately upon looking elsewhere, which would cause an unsatisfactory experience for users trying to scan a large environment.

The procedure for texture updates on geometry changes is very similar to the projection step. First, two sets of offsets and sizes are calculated for both the old and new layouts, based on the old and new sets of meshes. The old texture is bound as an input while the new texture is bound as the render target. Then, for each mesh that is in both the old and new sets of meshes, each triangle is rendered in the new layout's position while sampling colors from the old layout's position of the same triangle; the CPU-side offset bookkeeping is sketched after the stage descriptions below.

• Vertex Stage: Outputs an arbitrary clip-space position (this is the minimum required output of a vertex shader).

• Geometry Stage: Both the old offset and size and the new offset and size are passed in as a constant buffer. The new offset and size are used to calculate the clip-space positions of the inner triangle and the 12 border triangles, as in the projection step. The output vertex format also includes the UV coordinates with which to access the old texture. These are calculated as the inner triangle using the old offset and size, and they are duplicated in the same fashion as in the projection step.

• Pixel Stage: Samples the old texture at the UV coordinate and returns it.
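A sketch of that CPU-side bookkeeping follows, using an ordered std::map keyed by GUID as a stand-in for the HoloLens mesh dictionary (the key type and all names here are illustrative):

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <vector>

    // Where a surviving mesh's triangles start in the old layout and in the new one.
    struct CopyRange {
        uint32_t oldOffset;
        uint32_t newOffset;
        uint32_t triangleCount;
    };

    // GUID -> triangle count; the ordered map mirrors the consistent
    // GUID-ordered iteration of Sect. 3.1. (Illustrative key type.)
    using TriangleCounts = std::map<uint64_t, uint32_t>;

    std::vector<CopyRange> buildCopyRanges(const TriangleCounts& oldMeshes,
                                           const TriangleCounts& newMeshes) {
        // Running offsets for both layouts.
        std::map<uint64_t, uint32_t> oldOffsets, newOffsets;
        uint32_t acc = 0;
        for (const auto& [guid, count] : oldMeshes) { oldOffsets[guid] = acc; acc += count; }
        acc = 0;
        for (const auto& [guid, count] : newMeshes) { newOffsets[guid] = acc; acc += count; }

        // Only meshes present in both the old and new sets keep their color data;
        // newly added meshes have nothing to copy and are textured later.
        std::vector<CopyRange> ranges;
        for (const auto& [guid, count] : newMeshes) {
            const auto it = oldMeshes.find(guid);
            if (it == oldMeshes.end())
                continue;
            ranges.push_back({ oldOffsets[guid], newOffsets[guid], std::min(count, it->second) });
        }
        return ranges;
    }

Each CopyRange then drives one draw of that mesh's triangles into the new texture, sampling the old texture, before the two textures are swapped.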

3.5 Rendering and Exporting the Mesh

When using the texture to render the current set of meshes, the UV coordinates are calculated using the coordinates of the inner triangle. In our experiments, we render the mesh back in front of the user, allowing the user to compare the mesh with the real environment in AR. Parts of the environment that have not been mapped are transparent and therefore not visible to the user. When the mesh is textured, it appears in stereoscopic 3D and approximately overlays the real objects. This occurs on the fly as the user looks around, and the textured mesh grows to match their visible field of view. Artifacts of digital video capture, such as noise and color differences, and geometric inaccuracies, which cause inaccurate colors from different viewpoints, are visible in this overlay. By comparing this 3D textured mesh with the expected color of the real objects being modeled, the user can determine the quality of the current texture and which parts of the mesh still need to be textured or improved.

We pass the set of meshes into the pipeline. Two view-projection matrices are provided by the HoloLens to render stereoscopic 3D. By using hardware instancing, one draw call is sufficient to draw to both render targets simultaneously.

• Vertex Stage: Outputs the position of the vertex transformed with the model matrix of the mesh and the view-projection matrix array indexed with the instance ID. The SV_RenderTargetArrayIndex semantic is used to specify which render target to render to and is filled with the instance ID as well.

• Geometry Stage: The UV coordinates of the inner triangle are calculated from the offset and size passed in and appended to the vertex format.

• Pixel Stage: The texture is sampled using those UV coordinates for the final color. This is equivalent to texturing a mesh with no lighting.

We also provide functionality to export the mesh by looping over the vertices on the CPU and calculating the UV coordinates of the inner triangle. These are then used to export a Wavefront OBJ file. The texture is exported as an image file, and a simple material file is created to specify that the texture is to be used by the OBJ file.
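As a sketch of this export path, under the assumption that positions have already been gathered into a flat triangle list and that the innerUV() helper from the earlier sketch implements the inner-triangle UVs (the material and file names are made up for illustration):

    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <vector>

    struct Float3 { float x, y, z; };
    struct Float2 { float u, v; };

    // One exported triangle: three world-space vertices plus its global index.
    struct ExportTriangle {
        Float3 v[3];
        uint32_t globalIndex;
    };

    // Assumed helper from the earlier layout sketch (Alg. 2 on the CPU).
    Float2 innerUV(uint32_t globalIndex, uint32_t vertexID,
                   uint32_t size, uint32_t resolution);

    void exportObj(const std::vector<ExportTriangle>& tris, uint32_t size,
                   uint32_t resolution, const char* objPath) {
        std::ofstream out(objPath);
        out << "mtllib scan.mtl\nusemtl scan_texture\n";  // material file references the exported image
        // Write three positions and three UVs per triangle (no index sharing, for simplicity).
        for (const auto& t : tris) {
            for (int i = 0; i < 3; ++i)
                out << "v " << t.v[i].x << ' ' << t.v[i].y << ' ' << t.v[i].z << '\n';
            for (uint32_t i = 0; i < 3; ++i) {
                const Float2 uv = innerUV(t.globalIndex, i, size, resolution);
                // OBJ's vt origin is bottom-left, while the layout assumes a top-left origin.
                out << "vt " << uv.u << ' ' << 1.0f - uv.v << '\n';
            }
        }
        // Faces: OBJ indices are 1-based; vertex k of triangle n has index 3n + k + 1.
        for (std::size_t n = 0; n < tris.size(); ++n) {
            const std::size_t b = 3 * n + 1;
            out << "f " << b << '/' << b << ' ' << b + 1 << '/' << b + 1
                << ' ' << b + 2 << '/' << b + 2 << '\n';
        }
    }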


Table 1: Small and Large Environment Results

              # of Triangles    Time for capture (seconds)
    Small     3,656             40
    Large     80,501            120

4 RESULTS

In our experiments, we use the application to scan both a small and a large environment. We use a texture size of 4096 by 4096 pixels. Care is taken by the operator to try to scan every portion of the environment. The time taken is noted as well.

Currently, interaction with our application is done through voice commands. The user can pause camera and geometry updates independently, which allows them to check on the current status of their scan. They can also export the mesh to the HoloLens's local storage.

In our first environment, we scan a section of a wall with a chest of drawers in front of it. The mesh ended up containing 3,656 triangles. Because of the lower number of triangles, the amount of resolution available to each triangle is fairly high. The HoloLens failed to pick up parts of the chest, but the objects on top of it and the wall are mostly filled in. While there is slight warping and blurriness, the overall quality of the scan is decent. This scan took 40 seconds, but with the geometry of the drawers not picked up, the actual time scanning the rest of the mesh was 30 seconds.

In our second environment, we scan a sizable living room. The mesh ended up consisting of 80,501 triangles. Most of the environment, with the exception of glass, the dark chest of drawers, and the sofas, was filled in with geometry. Because the geometry for the sofas was missing, the color was projected onto the wall behind them, requiring time to rescan those areas. In total, it took 2 minutes to complete the scan. Even though the number of triangles is significantly higher, the texture still provides a reasonable amount of detail to the mesh.

As shown by these two examples in Fig. 5, our method allows the user to easily scan even a large environment in a small amount of time. The adaptive texture mapping allows both environments to take the same amount of memory by spreading out the resolution between the triangles. This is evident when comparing the quality of a close-up of both meshes, as shown in Fig. 6. In the UV texture itself, the triangles for the smaller mesh are significantly larger than their counterparts in the larger mesh. Even so, using just 64 MB for an 8-bit RGBA texture, the quality of the texture is satisfactory for the larger mesh.

On the other hand, because we ignore mesh updates from the HoloLens, once an area has been scanned, its geometry will not be improved during the operation of our program. While changes in existing meshes are handled correctly algorithmically, in practice, due to the near-constant nature of these updates, as well as how they invalidate our texture's data for that mesh, it is nearly impossible for a user to texture an environment without pausing geometric changes. Currently, the only way to update existing meshes is to exit the program, allow the HoloLens to continue to refine the mesh, and then relaunch the application. An alternative would be to allow the user to choose when to start ignoring mesh updates and to perform one more full sweep of the environment to texture it.

In order to solve this issue, a mapping from the original mesh to the updated mesh is required. One potential method would involve projecting the triangles from the updated mesh onto the original mesh and retrieving the list of triangles, some of which are clipped, from which to copy the data to each new triangle. This can also be interpreted as clipping the original mesh by the extrusion of each triangle of the updated mesh in both directions. This could potentially be calculated on the CPU with an acceleration structure and then executed in a single draw call on the GPU for real-time performance. Given the high rate of mesh updates, along with the number of meshes updated per frame, finding a mapping that allows for fast copying and high data retention is an area for future improvement.

Another drawback arises when the triangles in the mesh differ significantly from the half-square shape they map to in the texture. This is caused by two different phenomena. First, while most triangles generated by the HoloLens are fairly regular in size and shape, long, thin triangles are oversampled in the short direction and undersampled in the long direction. This leads to blurriness in the long direction. Second, because the orientation of the triangle in the texture is arbitrary, the pixel axes do not necessarily line up between adjacent triangles in the mesh. This leads to visible differences in larger environments, where each triangle has a lower resolution, which allows the edges of pixels to be discerned. Bilinear filtering alleviates some of this issue but does not solve it. In addition, the spatially incoherent mapping of the triangles poses challenges for traditional hardware techniques such as mipmapping and anisotropic filtering. Mipmapping can still be utilized by manually building the mip chain using the same method as copying the texture data when the geometry changes. The number of mip levels must be limited based on the size of individual triangles rather than the resolution of the entire texture. This can be thought of as having a separate mip chain for each triangle. Since the border must remain one pixel wide, there are increasing levels of wasted texture space at higher mip levels. Anisotropic filtering is extremely challenging because the size of the sample area can extend particularly far at glancing angles. Exploration into more spatially coherent mappings, such as automatic UV unwrapping algorithms that minimize distortion and waste, could provide a more elegant solution to these issues.

Due to the real-time performance and fixed memory footprint, our method is viable for integration into other rendering applications. The algorithm works independently of the texture size and even aspect ratio, although the more the actual mesh triangles differ from their mapped counterparts, the more distorted the distribution of resolution is.

The quality of the mesh generated by the HoloLens has a large impact on the quality of the scan. Missing geometry causes projections to bleed through their objects, overwriting the correct data. Our current method also does not take into account the quality of the camera frame when projecting onto geometry. This allows glancing angles and distant views to overwrite potentially better data from past frames.

User experience with the application can also be improved. Currently, the application does not guide the user toward areas of the mesh that have not been textured yet. Possible solutions include heads-up-display-style arrows that point in the direction the user should look, and so forth. The status and quality of the scan are also not reported to the user. Potentially, our application could show a coverage amount or a simplified image that shows the relative quality of the texture so far in that area. Also, our application does not differentiate between lack of texture and lack of geometry data. Both show up as transparent, which might cause confusion or make the user waste time trying to scan an area with no geometry. Because the HoloLens treats black as transparent, dark areas are also hard to differentiate between the overlay and the real world. Fixing this issue would require balancing color accuracy and user visibility.


Figure 5: Scan of the small (top) and large (bottom) environments. From left to right: textured mesh, close-up of textured mesh, mesh geometry. Notice the difference in resolution in the two close-ups.

Figure 6: Comparison of textures for the small (right) and large (left) environments. The textures are the same size, but because there are significantly more triangles in the large environment, the triangles are smaller relative to the small environment.

5 CONCLUSION

In this paper, we have demonstrated a real-time method to generate a standard UV texture of fixed size to color a dynamically changing mesh provided by the HoloLens. By utilizing the mobility and convenience of a headset device such as the HoloLens, we demonstrate modeling both small and fairly sizable environments in short periods of time. Because the user can view the creation of the mesh in real time, the system is also very simple to use, as they can see which portions of the mesh still need to be scanned. Because we provide a standard UV texture, it is trivial to integrate the mesh into common rendering pipelines and 3D applications, while also providing the fastest possible render time with no texture state changes necessary. The method is also fast enough to be integrated directly into a real-time pipeline.

There are still several shortcomings and extensions that we have observed. One, we currently do not apply any quality criteria for projecting camera frames onto the mesh. By utilizing a quality measure, as was done in [2], we can prevent low-quality frames from overwriting better past data. We would still recommend factoring in how stale the data is, to keep the texturing as up-to-date as possible. If such a method is integrated, adjacent triangles would most likely come from different frames, causing potential seams due to differences in color or quality. A seamless mosaicing technique, such as [10], can be used to alleviate most of these issues.

Our method is heavily influenced by the quality of the geometry, especially because of its use as a proxy for both occlusion and rendering. Due to a variety of reasons, the HoloLens has difficulty scanning geometry for specific materials and complex objects. When projecting, occlusion is approximated with the proxy geometry, which can lead to projections onto occluded geometry. The quality of the output is also inferior to view-dependent methods such as [4], because access to multiple views allows for the illusion of more detailed proxy geometry and multiple accurate viewpoints.

Potential extensions include using the mesh as environment input for image-based lighting (IBL) techniques. Reflections can be dynamic where the user is looking, while past data can fill in reflections in the other directions. By using IBL, and possibly a combination of local cubemaps and screen-space reflections, it could be possible to more realistically render virtual objects in the real world in real time.

ACKNOWLEDGMENTS

The authors wish to thank Dr. Pradeep Sen for discussions and comments. This work was supported in part by ONR grant N00014-16-1-3002.

REFERENCES

[1] F. Bernardini, I. Martin, and H. Rushmeier. High-quality texture reconstruction from multiple scans. IEEE Trans. on Visualization and Computer Graphics, 7(4):318–332, 2001. doi: 10.1109/2945.965346
[2] C. Buehler, M. Bosse, L. McMillan, S. Gortler, and M. Cohen. Unstructured lumigraph rendering. In Proc. of the 28th Annual Conf. on Computer Graphics and Interactive Techniques - SIGGRAPH '01, pp. 425–432. ACM Press, New York, New York, USA, 2001. doi: 10.1145/383259.383309
[3] C. Chen, M. Bolas, and E. S. Rosenberg. View-dependent virtual reality content from RGB-D images. In IEEE Int. Conf. on Image Processing, pp. 2931–2935, 2017.
[4] P. Debevec, Y. Yu, and G. Borshukov. Efficient view-dependent image-based rendering with projective texture-mapping. pp. 105–116. Springer, Vienna, 1998. doi: 10.1007/978-3-7091-6453-2_10
[5] F. Dellaert, S. Seitz, C. Thorpe, and S. Thrun. Structure from motion without correspondence. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662), vol. 2, pp. 557–564. IEEE Comput. Soc. doi: 10.1109/CVPR.2000.854916
[6] M. Dou and H. Fuchs. Temporally enhanced 3D capture of room-sized dynamic scenes with commodity depth cameras. In 2014 IEEE Virtual Reality (VR). IEEE, Mar. 2014. doi: 10.1109/vr.2014.6802048
[7] P. Fechteler, W. Paier, and P. Eisert. Articulated 3D model tracking with on-the-fly texturing. In 2014 IEEE Int. Conf. on Image Processing (ICIP), pp. 3998–4002. IEEE, Oct. 2014. doi: 10.1109/ICIP.2014.7025812


[8] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 31(5):647–663, Feb. 2012. doi: 10.1177/0278364911434148
[9] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In 2013 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. IEEE, Nov. 2013. doi: 10.1109/iros.2013.6696650
[10] V. Lempitsky and D. Ivanov. Seamless mosaicing of image-based texture maps. In 2007 IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–6. IEEE, Jul. 2007. doi: 10.1109/CVPR.2007.383078
[11] R. Maier, J. Stuckler, and D. Cremers. Super-resolution keyframe fusion for 3D modeling with high-quality textures. In 2015 Int. Conf. on 3D Vision. IEEE, Oct. 2015. doi: 10.1109/3dv.2015.66
[12] R. A. Newcombe, A. J. Davison, S. Izadi, P. Kohli, O. Hilliges, J. Shotton, D. Molyneaux, S. Hodges, D. Kim, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE Int. Symp. on Mixed and Augmented Reality, pp. 127–136. IEEE, Oct. 2011. doi: 10.1109/ISMAR.2011.6092378
[13] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison. SLAM++: Simultaneous localisation and mapping at the level of objects. In 2013 IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1352–1359. IEEE, Jul. 2013. doi: 10.1109/CVPR.2013.178
[14] A. Sankar and S. M. Seitz. In situ CAD capture. In Proc. of the 18th Int. Conf. on Human-Computer Interaction with Mobile Devices and Services - MobileHCI '16, pp. 233–243. ACM Press, New York, New York, USA, 2016. doi: 10.1145/2935334.2935337
[15] F. Steinbrucker, C. Kerl, and D. Cremers. Large-scale multi-resolution surface reconstruction from RGB-D sequences. In 2013 IEEE Int. Conf. on Computer Vision. IEEE, Dec. 2013. doi: 10.1109/iccv.2013.405
[16] T. Whelan, J. McDonald, M. Kaess, M. Fallon, H. Johannsson, and J. J. Leonard. Kintinuous: Spatially extended KinectFusion. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras. Pittsburgh, PA, Jul. 2012.
[17] Q.-Y. Zhou and V. Koltun. Color map optimization for 3D reconstruction with consumer depth cameras. ACM Trans. Graph., 33(4):155:1–155:10, Jul. 2014. doi: 10.1145/2601097.2601134

