Creating Simplified 3D Models with High Quality Textures

2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA 2015), pp. 264-271. IEEE.

Song Liu, Wanqing Li, Philip Ogunbona, Yang-Wai Chow
Advanced Multimedia Research Lab, University of Wollongong, Wollongong, NSW, Australia, 2522
{sl796, wanqing, philipo, caseyc}@uow.edu.au

Abstract—This paper presents an extension to the KinectFusion algorithm which allows creating simplified 3D models with high quality RGB textures. This is achieved through (i) creating model textures using images from an HD RGB camera that is calibrated with the Kinect depth camera, (ii) using a modified scheme to update model textures in an asymmetrical colour volume that contains a higher number of voxels than that of the geometry volume, (iii) simplifying the dense polygon mesh model using a quadric-based mesh decimation algorithm, and (iv) creating and mapping 2D textures to every polygon in the output 3D model. The proposed method is implemented in real-time by means of GPU parallel processing. Visualization via ray casting of both the geometry and colour volumes provides users with real-time feedback of the currently scanned 3D model. Experimental results show that the proposed method is capable of keeping the model texture quality even for a heavily decimated model and that, when reconstructing small objects, photorealistic RGB textures can still be reconstructed.

I. INTRODUCTION

Generating 3D models based on real-world environments and with high quality textures is of great significance to many fields including civil engineering, 3D printing, game design, movies, virtual reality and the preservation of cultural heritage artefacts. Various computer vision-based approaches have been proposed to create 3D models and deal with the associated classical problems such as simultaneous localization and mapping (SLAM) and structure-from-motion (SFM). To date, impressive progress has been made in this domain [6][2][8]. Largely, many approaches use visual key points to build 3D models, leading to sparse point cloud based 3D reconstruction. Conventional dense reconstruction methods [9][28], on the other hand, usually require professional sensors such as high-fidelity laser scanners or time-of-flight (ToF) depth cameras, which are very expensive.

The release of commodity RGB-D cameras such as the Microsoft Kinect™ and Asus Xtion™ has made dense 3D reconstruction possible at an affordable cost. This, along with the KinectFusion algorithm [17][15], has enabled real-time dense 3D reconstruction using a low-cost RGB-D camera and GPU parallel processing. Subsequent efforts by other researchers have led to the development of several KinectFusion-based methods [26][21][4][18] that allow efficient 3D reconstruction on a large scale and with higher reconstruction quality. However, current KinectFusion-based methods tend to deliver 3D models with high quality geometry but low quality texture. In other words, the work on improving model textures is less advanced. Moreover, 3D models created by dense 3D reconstruction usually contain significant redundant information, which increases the model complexity and lowers the rendering efficiency. In many cases, it is necessary to simplify a dense 3D model to achieve higher rendering efficiency, especially for large scale rendering or on platforms with limited processing power such as cell phones and tablets. It is noteworthy that for many 3D models generated from existing 3D reconstruction systems, the quality of the model texture is directly related to the model complexity. Furthermore, decimation of the polygon mesh model will degrade the model texture.

II. RELATED WORK

The problem of reconstructing the geometry and texture of the real world has remained an active challenge in the field of computer vision for decades. We now review some of the extant approaches and the associated results.

Conventional 3D reconstruction approaches usually do not consider model texture information, or represent model texture in a simple way. Chen and Medioni [5] average overlapping range images and connect points based on simple surface topology to create polygon mesh models; model textures are totally ignored. Turk and Levoy [23] propose mesh zippering as an extension to Chen and Medioni's work; they stitch polygon meshes to create 3D models without textures. Some point-based 3D reconstruction methods [20][24][12][22] use simple unstructured point representations that are directly captured from many range imaging devices. These methods do not model connected surfaces, which usually requires post-processing to generate meshes. In most popular point-based 3D model rendering techniques [11][19][30], textures are simply represented by colours attached to each point in the model.

The release of low-cost RGB-D cameras like the Microsoft Kinect™ and Asus Xtion™ opens up new opportunities for 3D reconstruction by providing easy access to depth imaging. KinectFusion [17][15] adopts a volumetric data structure to store the reconstructed scene surface [7][14] and realises real-time reconstruction using the GPU. Although model textures are not considered in the original KinectFusion algorithm, it has inspired multiple volumetric 3D reconstruction methods using commodity RGB-D cameras that try to create dense polygon meshes with RGB textures. Among these KinectFusion-based methods, an open source C++ implementation of KinectFusion from the Point Cloud Library (PCL) [1], Whelan et al. [26] and Bylow et al. [4] use a colour volume to store and update RGB texture information. In their methods, model textures on reconstructed 3D models are represented by colours on the model vertices, and these colours are linearly interpolated within each polygon. This popular 3D model texture representation can also be easily found in many other 3D reconstruction methods [21][25][18]. While this texture representation is straightforward and easy to implement, simplifying the model inevitably degrades the texture quality because the quality is determined by the number of vertices in the model.

Zhou and Koltun [29] reconstruct 3D models with high quality colour textures using a commodity depth camera and an HD RGB camera. HD model textures are refined using optimized camera poses in tandem with non-rigid correction functions for all images. Because model textures are also represented by colours assigned to each vertex, the generation of high quality textures requires increasing the number of vertices and polygons of a 3D model. This results in increased model complexity.

In order to create simplified 3D models, Whelan et al. [27] decimate the dense polygon mesh model by means of planar simplification and simultaneously preserve model colour information by 2D texture mapping. However, their work focuses on scene reconstruction, where flat floors and walls tend to be over-represented by millions of polygons. For 3D models of arbitrary shapes, planar simplification is not suitable. Moreover, their texture preserving method does not consider generating HD RGB textures with a resolution beyond what a Kinect RGB camera currently provides.

In this paper, a KinectFusion-based method for creating simplified 3D models with high quality textures is presented. Texture information is stored and updated in a colour volume with a higher dimension than that of the truncated signed distance function (TSDF) volume. Two-dimensional texture images are extracted from the colour volume and are mapped to the reconstructed 3D model so that the model texture can retain its quality even on a simplified 3D model with a much smaller number of polygons.

III. BACKGROUND

Our method is based on the open source C++ implementation of the KinectFusion algorithm provided by PCL [1]. In KinectFusion, the retrieved RGB image and registered depth map are taken as inputs. A vertex map and normal map pyramid based on the input depth map are calculated; these are then used to estimate the camera motion using the iterative closest point (ICP) algorithm [3] in conjunction with a predicted surface vertex map derived from the currently stored depth information. With the estimated camera motions, depth and RGB information from the input frame is used to update the current model. In KinectFusion, scanned depth information is stored in a TSDF volume, in which each voxel stores the distance to its closest surface. Since the TSDF volume contains the geometric information of the scanned 3D model and is updated using the depth map, in our method it is referred to as the geometry volume. The predicted surface vertex map used for camera tracking is obtained by ray casting the geometry volume. The original KinectFusion does not consider reconstructing model textures. Its C++ implementation in PCL captures and saves texture information using a separate colour volume that has the same dimension and size as the TSDF volume. In our paper, this colour integration method is adopted and extended. Given the updated geometry and colour volumes, a 3D polygon mesh model with textures can be extracted using the marching cubes algorithm [16], a method that aims to produce polygons for visualization.

IV. IMPROVED METHOD

The workflow of our improved method is shown in Fig. 1. The similarity with the KinectFusion process is noticeable. However, in order to create simplified 3D models with high quality textures, the following major improvements are made:

• an HD RGB camera is added to achieve higher model texture quality;
• the colour volume integration scheme is revised to provide an asymmetrical colour volume;
• the dense polygon mesh model is simplified without losing much geometry information;
• the simplified model is textured so that details are retained.

In the subsequent subsections each major improvement is presented in more detail.

Fig. 1. Block diagram showing the workflow of the improved method.

A. HD Camera Setup and RGB-D Calibration

The on-board RGB camera of the Kinect sensor can only return RGB images in VGA resolution (640 × 480) and this is insufficient for generating high quality textures. This shortcoming is mitigated, in order to achieve high quality texture mapping, by rigidly attaching an external HD RGB camera to the Kinect sensor assembly. The resulting assembly delivers high-definition RGB images (see Fig. 2).

Fig. 2. Kinect sensor with HD RGB camera.

The HD RGB image and depth map are misaligned due to the different placements and fields of view of the HD RGB camera and the depth camera of the Kinect sensor. Hence high quality texture mapping will depend on an accurate mapping relation between the HD RGB camera space and the Kinect depth camera space.

Herrera C.'s RGB-D calibration toolbox [13] is adopted to calculate the camera intrinsic and extrinsic parameters. Camera intrinsic parameters include the camera principal point (image centre) and the focal length in pixel-related units, which are used to project points between image space and camera space. Camera intrinsic parameters can be compactly represented in a matrix, K ∈ R^{3×3}. Let the coordinate vector in image space be denoted by (x, y, z)^T and the coordinate vector in camera space be (X, Y, Z)^T. The transformation from camera space to image space can be written as

(x, y, z)^T = K (X, Y, Z)^T.  (1)
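For concreteness, the following minimal C++ sketch illustrates the pinhole projection of Eq. (1). It is not taken from the authors' implementation; the Vec3/Mat3 types and the intrinsic values used in main() are illustrative assumptions only.

```cpp
// Minimal sketch (not from the paper's code) of the pinhole projection in Eq. (1):
// homogeneous image coordinates are K * (X, Y, Z)^T; dividing by the third
// component gives the pixel location.
#include <array>
#include <cstdio>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3 intrinsic matrix K

// Eq. (1): (x, y, z)^T = K (X, Y, Z)^T, then (u, v) = (x/z, y/z).
Vec3 projectToImage(const Mat3& K, const Vec3& P) {
    Vec3 p{};
    for (int r = 0; r < 3; ++r)
        p[r] = K[r][0] * P[0] + K[r][1] * P[1] + K[r][2] * P[2];
    return p;  // still homogeneous; divide by p[2] to obtain pixel coordinates
}

int main() {
    // Hypothetical intrinsics: fx = fy = 525, principal point (319.5, 239.5).
    Mat3 K = {{{525.0, 0.0, 319.5}, {0.0, 525.0, 239.5}, {0.0, 0.0, 1.0}}};
    Vec3 P = {0.1, -0.05, 1.2};                 // a point in camera space
    Vec3 p = projectToImage(K, P);
    std::printf("pixel: (%.1f, %.1f)\n", p[0] / p[2], p[1] / p[2]);
    return 0;
}
```

The inverse mapping, multiplying by K^{-1} after scaling a pixel by its depth, is what Eqs. (2) and (7) below rely on.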

Camera extrinsic parameters describe the relative position of the camera with respect to another object (another camera in our case), and can be written as a rotation plus a translation. The extrinsic parameters from the Kinect depth camera space to the HD RGB camera space are denoted by the rotation matrix R_calib ∈ R^{3×3} and the translation vector t_calib ∈ R^3. Let the intrinsic parameters of the Kinect depth camera and the HD RGB camera be denoted as K_c ∈ R^{3×3} and K_hd ∈ R^{3×3} respectively.

Given the calibrated camera intrinsic and extrinsic parameters, a mapping from the depth image to the HD RGB image is calculated as follows. Assume a point in the real world is captured by the depth camera and its coordinate in depth image space is (x_c, y_c, z_c). Let its 3D coordinate in depth camera space be denoted as (X_c, Y_c, Z_c); its 3D coordinate in HD RGB camera space as (X_hd, Y_hd, Z_hd); and its coordinate in HD RGB image space as (x_hd, y_hd, z_hd). The equations governing the respective transformations can be written as follows:

(X_c, Y_c, Z_c)^T = K_c^{-1} (x_c, y_c, z_c)^T;  (2)
(X_hd, Y_hd, Z_hd)^T = R_calib (X_c, Y_c, Z_c)^T + t_calib;  (3)
(x_hd, y_hd, z_hd)^T = K_hd (X_hd, Y_hd, Z_hd)^T.  (4)

The mapped 2D coordinate (x_hd, y_hd) in the HD RGB image will be considered when updating the colour volume.

B. Colour Volume Integration

In the implementation of KinectFusion from PCL, RGB texture information is saved in a separate volume with the same dimension and size as the TSDF geometry volume. This establishes a one-to-one correspondence between voxels in the colour volume and the geometry volume. In our method, modifications are made to improve PCL's technique to deliver higher quality model textures.

Since all texture information is saved in the colour volume, a colour volume with a higher dimension will result in better texture quality. As mentioned earlier, most KinectFusion-based methods tend to deliver 3D models with high quality geometry but low quality textures. Based on this observation and the limited GPU memory available, a strategy to achieve a higher colour volume dimension is to make the geometry and colour volumes asymmetrical. Specifically, the dimension of the colour volume is made higher than that of the geometry volume while their sizes remain the same. In this way, the actual space in the real world taken by a colour voxel is much smaller than that taken by a geometry voxel; this results in capturing more texture details.

The colour volume containing the updated texture information from the 1st frame to the nth frame is denoted by C_n(v), where v ∈ N^3 is the 3D coordinate representing the location of a voxel in the colour volume. Each voxel of the colour volume stores a 3 × 1 RGB colour vector C_n(v) and a weight W_n(v).

When updating the colour volume, for each voxel v in the colour volume, its 3D coordinates (X_v, Y_v, Z_v) in the depth camera space of the first frame are considered. Given all camera motions between consecutive frames so far, by composing all the motions, the camera motion (R_n, t_n) from the first frame to the current nth frame can be calculated. Therefore, the coordinates of this voxel in the current camera coordinate system, (X_vc, Y_vc, Z_vc), can be calculated as

(X_vc, Y_vc, Z_vc)^T = R_n (X_v, Y_v, Z_v)^T + t_n.  (5)

Using the depth camera intrinsic matrix K_c, the 3D point (X_vc, Y_vc, Z_vc) can be mapped to its depth image coordinates using the equation

(x_vc, y_vc, z_vc)^T = K_c (X_vc, Y_vc, Z_vc)^T.  (6)

Based on the 2D coordinate (x_vc, y_vc), if voxel v is outside the current depth camera frustum, the updating process terminates and the algorithm moves on to the next voxel. If it is inside the current depth camera frustum, its actual depth value D_n(x_vc, y_vc) is retrieved from the nth depth map D_n.

If D_n(x_vc, y_vc) is non-zero, the valid depth map pixel (x_vc, y_vc, D_n(x_vc, y_vc)) and its 3D point (X_vd, Y_vd, Z_vd) in the current camera coordinate system are calculated as

(X_vd, Y_vd, Z_vd)^T = K_c^{-1} (x_vc, y_vc, D_n(x_vc, y_vc))^T.  (7)

If the Euclidean distance between (X_vc, Y_vc, Z_vc) and (X_vd, Y_vd, Z_vd) is greater than a threshold σ, the process terminates. In our experiments, the threshold σ is set to 20 (equivalently 20 mm).

The weight of a point, W, is the dot product between the surface normal at this point and the viewing direction from the camera to this point. The normal vector for the current point (X_vd, Y_vd, Z_vd) is computed using neighbouring reprojected points. Assuming (X_1vd, Y_1vd, Z_1vd) and (X_2vd, Y_2vd, Z_2vd) are the camera coordinates of (x_vc + 1, y_vc, D_n(x_vc + 1, y_vc)) and (x_vc, y_vc + 1, D_n(x_vc, y_vc + 1)), the normal is calculated using the formula

N_v = ((X_1vd, Y_1vd, Z_1vd) − (X_vd, Y_vd, Z_vd)) × ((X_2vd, Y_2vd, Z_2vd) − (X_vd, Y_vd, Z_vd)).  (8)

If the weight, W, exceeds the previous weight times 0.8 (i.e. 0.8 · W_pre), the voxel is updated. Since HD RGB frames are used to deliver high-resolution RGB images, for a colour voxel v to be updated, its corresponding pixel in the HD RGB image, HD(x_hd, y_hd), is located using the camera intrinsic and extrinsic parameters from the RGB-D calibration (Section IV-A). Then, the new colour C_new in this colour voxel is updated as

C_new = (C_pre · W_pre + W · HD(x_hd, y_hd)) / (W_pre + W)  (9)

where C_pre is the previous RGB value in this colour voxel and HD(x_hd, y_hd) is the RGB value of pixel (x_hd, y_hd) in the HD RGB image. The new weight, W_new, is updated as

W_new = W_pre + W · (1 − W_pre).  (10)

Finally, the new RGB value C_new and the new weight W_new are saved in the colour voxel v.
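To make the update rule above easier to follow, here is a condensed CPU-side sketch of the per-voxel colour integration of Eqs. (5)-(10). It is our reading of the text rather than the authors' GPU kernel: the Vec3/Mat3 types and the depth/hdColourAt callbacks are assumed helpers, units are millimetres, and the absolute value of the normal-view dot product is used to sidestep normal orientation.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct ColourVoxel { Vec3 rgb{}; double weight = 0.0; };

static Vec3 mul(const Mat3& M, const Vec3& v) {
    return { M[0][0]*v[0] + M[0][1]*v[1] + M[0][2]*v[2],
             M[1][0]*v[0] + M[1][1]*v[1] + M[1][2]*v[2],
             M[2][0]*v[0] + M[2][1]*v[1] + M[2][2]*v[2] };
}
static Vec3 add(const Vec3& a, const Vec3& b) { return {a[0]+b[0], a[1]+b[1], a[2]+b[2]}; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static double norm(const Vec3& a) { return std::sqrt(a[0]*a[0] + a[1]*a[1] + a[2]*a[2]); }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0] };
}

// Back-projection of a pixel (u, v) with depth d, i.e. K_c^{-1} (u*d, v*d, d)^T
// for a pinhole model; this is how Eq. (7) is applied here (assumed helper).
static Vec3 backProject(const Mat3& KcInv, double u, double v, double d) {
    return mul(KcInv, Vec3{u * d, v * d, d});
}

// Update one colour voxel whose first-frame camera-space position is Pv (in mm).
// Rn, tn: accumulated camera motion (Eq. (5)); depth(u, v): the depth map D_n;
// hdColourAt(P): RGB sampled from the HD image through Eqs. (2)-(4).
template <class DepthFn, class HdFn>
void updateColourVoxel(ColourVoxel& vox, const Vec3& Pv,
                       const Mat3& Rn, const Vec3& tn,
                       const Mat3& Kc, const Mat3& KcInv,
                       DepthFn depth, HdFn hdColourAt,
                       int width, int height, double sigma = 20.0 /* mm */) {
    const Vec3 Pvc = add(mul(Rn, Pv), tn);              // Eq. (5)
    const Vec3 p   = mul(Kc, Pvc);                      // Eq. (6)
    if (p[2] <= 0.0) return;
    const int u = static_cast<int>(p[0] / p[2]);
    const int v = static_cast<int>(p[1] / p[2]);
    if (u < 0 || v < 0 || u + 1 >= width || v + 1 >= height) return;   // outside frustum

    const double d = depth(u, v);
    if (d <= 0.0 || depth(u + 1, v) <= 0.0 || depth(u, v + 1) <= 0.0) return;  // invalid depth
    const Vec3 Pvd = backProject(KcInv, u, v, d);       // Eq. (7)
    if (norm(sub(Pvc, Pvd)) > sigma) return;            // distance test against sigma

    // Normal from neighbouring reprojected points, Eq. (8); W = |n . view direction|.
    const Vec3 P1 = backProject(KcInv, u + 1, v, depth(u + 1, v));
    const Vec3 P2 = backProject(KcInv, u, v + 1, depth(u, v + 1));
    Vec3 n = cross(sub(P1, Pvd), sub(P2, Pvd));
    const double nlen = norm(n), plen = norm(Pvd);
    if (nlen == 0.0 || plen == 0.0) return;
    n = { n[0] / nlen, n[1] / nlen, n[2] / nlen };
    const Vec3 view = { Pvd[0] / plen, Pvd[1] / plen, Pvd[2] / plen };
    const double W = std::fabs(n[0]*view[0] + n[1]*view[1] + n[2]*view[2]);
    if (W <= 0.8 * vox.weight) return;                  // only update for strong views

    const Vec3 hd = hdColourAt(Pvd);                    // HD pixel via Eqs. (2)-(4)
    for (int c = 0; c < 3; ++c)                         // Eq. (9)
        vox.rgb[c] = (vox.rgb[c] * vox.weight + W * hd[c]) / (vox.weight + W);
    vox.weight = vox.weight + W * (1.0 - vox.weight);   // Eq. (10)
}
```

In the real system this routine would be launched once per voxel of the asymmetrical colour volume for every incoming frame.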
C. Mesh Extraction and Simplification

The marching cubes algorithm is used to extract 3D polygon mesh models from the TSDF volume in KinectFusion. However, dense polygon mesh models obtained directly from the marching cubes algorithm tend to contain a large number of redundant polygons, which increase the model complexity without providing much geometric detail. When memory and rendering efficiency are considered, simplification of the reconstructed dense polygon mesh is necessary in many circumstances. In order to achieve model complexity reduction, quadric-based mesh decimation [10] is adopted. Fig. 3 shows the results of simplifying a dense chair model. The original dense chair model (Fig. 3(a)) is decimated to contain different numbers of polygons. It can be observed that all simplified models (Fig. 3(b) - Fig. 3(f)) still keep their shape as a chair. The results indicate that the quadric-based mesh decimation algorithm is effective in simplifying a polygon mesh model while preserving its geometric features.

Fig. 3. Simplification of a chair model using quadric-based mesh decimation. (a): original dense model containing 137037 polygons. (b): simplified model containing 13703 polygons (10%). (c): simplified model containing 6851 polygons (5%). (d): simplified model containing 1370 polygons (1%). (e): simplified model containing 685 polygons (0.5%). (f): simplified model containing 343 polygons (0.25%).

In passing, note that the reduction rate parameter of the decimation algorithm specifies the degree to which the current mesh model will be decimated. For instance, a reduction rate of 10% implies that the simplified mesh model after polygon reduction will contain 10% of the polygons of the original model. This reduction rate can be adjusted according to the actual requirements on the generated 3D model.
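As background for the decimation step, the sketch below illustrates the quadric error metric of Garland and Heckbert [10] on which quadric-based mesh decimation is built; it is a generic illustration, not the paper's implementation.

```cpp
// Minimal illustration of the quadric error metric [10]: each vertex accumulates
// a 4x4 quadric Q = sum(p p^T) over the planes p = (a, b, c, d) of its incident
// triangles, and the cost of moving/contracting it to position v is v^T Q v.
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quadric = std::array<std::array<double, 4>, 4>;   // symmetric 4x4 matrix

// Plane (a, b, c, d) of a (non-degenerate) triangle with unit normal, a*x + b*y + c*z + d = 0.
static std::array<double, 4> trianglePlane(const Vec3& A, const Vec3& B, const Vec3& C) {
    const Vec3 e1 = {B[0]-A[0], B[1]-A[1], B[2]-A[2]};
    const Vec3 e2 = {C[0]-A[0], C[1]-A[1], C[2]-A[2]};
    Vec3 n = { e1[1]*e2[2]-e1[2]*e2[1], e1[2]*e2[0]-e1[0]*e2[2], e1[0]*e2[1]-e1[1]*e2[0] };
    const double len = std::sqrt(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);
    n = { n[0]/len, n[1]/len, n[2]/len };
    return { n[0], n[1], n[2], -(n[0]*A[0] + n[1]*A[1] + n[2]*A[2]) };
}

// Accumulate the outer product p p^T into Q.
static void addPlane(Quadric& Q, const std::array<double, 4>& p) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            Q[i][j] += p[i] * p[j];
}

// Quadric error v^T Q v for a candidate position v (homogeneous coordinate 1).
static double quadricError(const Quadric& Q, const Vec3& v) {
    const double h[4] = { v[0], v[1], v[2], 1.0 };
    double e = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            e += h[i] * Q[i][j] * h[j];
    return e;
}
```

During decimation, edge contractions are ranked by the combined quadric error of their endpoints and the cheapest contractions are applied first, until the requested reduction rate is reached.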

Fig. 4. Generating 2D texture map, each block represents one pixel in the 2D texture map. (a): a polygon in the model to be textured. (b): a 2D texture map for the polygon in (a).

D. Texture Generation

1) 2D Texture Map Generation for Each Polygon: Compared with a dense polygon mesh, a simplified model contains far fewer polygons. Texturing by the coloured vertex method would incur a loss of texture detail, which in turn reduces the texture quality. To maintain texture quality on a model with fewer polygons, each polygon should be textured with more RGB detail. To this end, different 2D texture maps are generated and mapped to all polygons in a mesh model; one 2D texture map for one polygon.

In the reconstructed 3D models, each polygon is a triangle with 3 vertices. For an arbitrary polygon in a 3D model, assume its 3 vertices are (V1, V2, V3) (Fig. 4(a)); V1, V2, V3 are points with 3D space coordinates in the real world. When generating a 2D texture map (Fig. 4(b)) for this particular polygon, the upper left triangle in the texture map (triangle VaVbVc in Fig. 4(b)) will be mapped onto this polygon. The mapping relations are defined as follows: Va ↦ V1; Vb ↦ V2; and Vc ↦ V3. The resolution of the texture image is calculated as the actual size of the polygon divided by a given pixel size parameter Size_pixel. The value of Size_pixel represents the size of a pixel when mapped onto the 3D model, and should be set according to the HD RGB camera resolution and the scanning distance. In Fig. 4(b), the resolution of the 2D texture map is w × h pixels, where

w = distance(Va, Vb) / Size_pixel,  (11)
h = distance(Va, Vc) / Size_pixel,  (12)

and distance(Vi, Vj) is the Euclidean distance between the 3D coordinates Vi and Vj.

In order to complete the texture generation process, two per-pixel step vectors, VaVgreen and VaVblue (the vectors from Va to the pixels marked green and blue in Fig. 4(b)), are required:

VaVgreen = VaVb / (w − 1),  (13)
VaVblue = VaVc / (h − 1).  (14)

For any pixel in the texture map, its 3D world position can be expressed as a combination of these two vectors. If the pixel marked with a red bounding box in Fig. 4(b) is considered as an example, its 3D world position can be expressed as Va + VaVred, which is equivalent to Va + 2·VaVgreen + 3·VaVblue. Given the 3D world positions, the RGB value of any pixel can be found by accessing the colour volume. By repeating the process above for every pixel in the texture map, a complete 2D texture map can be obtained.

2) 2D Texture Map Merging: A regular 3D model (including a simplified one) usually contains hundreds or thousands of polygons. The generation of a 2D texture for each polygon will therefore produce a very large number of 2D texture maps. Loading each texture map one at a time is time consuming and requires a speed-up strategy. An efficient method is to load multiple textures from one or a few 2D images. This strategy is implemented in our 2D texture merging scheme. 2D texture images that contain the texture maps for multiple polygons are created. Each texture map for one polygon is added and placed in the 2D image, starting from the top-left corner to the bottom-right corner in a column-wise order.

Fig. 5 depicts one 2D texture image file created by merging multiple texture maps. Fig. 5(a) shows the merged texture image and Fig. 5(b) is a 2D texture map for one polygon. The upper left triangle of each 2D texture map (the yellow triangle in Fig. 5(b)) will be rendered on a polygon as the actual model texture.

Fig. 5. (a): a merged 2D texture image file. (b): a 2D texture map for one polygon. Best viewed in colour.
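The per-polygon texture generation just described can be summarised by the following sketch, which follows Eqs. (11)-(14) and the pixel-position rule above. It is not the authors' code: sampleColourVolume() is an assumed lookup into the colour volume, and the max(2, ...) guard for very small polygons is our own addition.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
struct RGB { unsigned char r, g, b; };

static double dist(const Vec3& a, const Vec3& b) {
    const double dx = a[0]-b[0], dy = a[1]-b[1], dz = a[2]-b[2];
    return std::sqrt(dx*dx + dy*dy + dz*dz);
}

// Returns a w x h texture image (row-major); only the upper-left triangle of
// the image is actually mapped onto the model triangle (V1, V2, V3).
template <class SampleFn>
std::vector<RGB> makeTextureMap(const Vec3& V1, const Vec3& V2, const Vec3& V3,
                                double sizePixel, SampleFn sampleColourVolume,
                                int& w, int& h) {
    w = std::max(2, static_cast<int>(std::ceil(dist(V1, V2) / sizePixel)));   // Eq. (11)
    h = std::max(2, static_cast<int>(std::ceil(dist(V1, V3) / sizePixel)));   // Eq. (12)
    const Vec3 du = { (V2[0]-V1[0])/(w-1), (V2[1]-V1[1])/(w-1), (V2[2]-V1[2])/(w-1) };  // Eq. (13)
    const Vec3 dv = { (V3[0]-V1[0])/(h-1), (V3[1]-V1[1])/(h-1), (V3[2]-V1[2])/(h-1) };  // Eq. (14)

    std::vector<RGB> image(static_cast<size_t>(w) * h);
    for (int j = 0; j < h; ++j) {
        for (int i = 0; i < w; ++i) {
            // 3D world position of pixel (i, j) as a combination of the two step vectors.
            const Vec3 p = { V1[0] + i*du[0] + j*dv[0],
                             V1[1] + i*du[1] + j*dv[1],
                             V1[2] + i*du[2] + j*dv[2] };
            image[static_cast<size_t>(j) * w + i] = sampleColourVolume(p);
        }
    }
    return image;
}
```

The resulting per-polygon maps would then be packed column-wise into one or a few large texture images, as described in the merging scheme above.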
V. EXPERIMENTS AND RESULTS

The HD RGB camera used in our experiments is a Point Grey GS3-U3-41C6C-C camera with an 8mm F1.4 wide angle lens. Our 3D reconstruction method is implemented in C++ and run on a PC with an Nvidia GTX680 GPU and 4GB of graphics memory.

Experiments are carried out on two test cases to verify the effectiveness of our texturing method in terms of:

• texture improvements from using 2D texture mapping compared with using coloured vertices; and
• texture improvements from using HD RGB images compared with using Kinect RGB images.

The test cases selected include a lunch bag and a backpack, which are small and medium-sized objects with textures of noticeable detail (e.g. a brand logo). In our experiments, the dimension of the geometry volume is 384 × 384 × 384 and the asymmetrical colour volume is set to 768 × 768 × 768. The volume size is variable to suit the size of the object to be reconstructed.

To obtain a visual evaluation of texture quality, reconstructed 3D models with textures are loaded and visualized in MeshLab. A snapshot from a fixed viewpoint is taken for each model. An objective evaluation of our algorithm measures the level of texture quality. For each test case, image patches are cropped from the same location where patterns containing detailed texture can be found (e.g. a brand logo). An image patch from a high quality texture tends to be sharp, so that more texture details are captured. Therefore, the image sharpness of the image patch is measured as the indicator of texture quality using the gradient magnitude; that is, the sum of all gradient norms divided by the number of pixels. A higher gradient magnitude implies more sharpness and more texture details.

A. RGB-D Camera Calibration Results

Efforts to ensure correct RGB texture mapping require an accurate calibration between the HD RGB camera and the Kinect depth camera. An evaluation of the RGB-D camera calibration is therefore performed before our experiments.

In the process of RGB-D calibration, 100 sample shots were taken, which include 100 HD RGB images and 100 depth maps. These samples can be further divided into 3 groups according to the average shooting distance (the distance between the camera and the chessboard pattern): short range (0.5m-1m), middle range (1m-2.5m) and long range (2.5m-4m). In each group, there are more than 30 RGB-D image pairs taken from different viewpoints of the chessboard. RGB-D calibration is then performed using all sample shots.

The accuracy of the RGB-D camera calibration is ascertained in a straightforward way by visually checking the overlapped HD RGB image and visualised depth map for misalignment. The overlaid image can be generated by overlaying the aligned semi-transparent depth map onto its corresponding HD RGB image (Fig. 6). Judging by the overlaid images, it can be observed that the depth maps and HD RGB images are reasonably well aligned.

Fig. 6. Overlapped HD RGB image with registered depth map.

B. 3D Model Texture Results

1) Improvements from using 2D texture mapping: Most KinectFusion-based methods texture the reconstructed models using coloured vertices; that is, an RGB value is assigned to each vertex in the model. The texture of a polygon is the linear interpolation of the 3 RGB values on its vertices, and this can lead to a loss of detail due to mesh simplification. On the other hand, our proposed texturing method generates 2D texture maps directly from the colour volume, and this maintains texture details even on a simplified 3D polygon mesh.

Fig. 7 and Fig. 8 depict simplified models that are textured with these two methods. Table I presents the texture qualities of the reconstructed models as measured by the gradient magnitudes of their image patches. From the table, the quality of the texture generated using coloured vertices drops drastically as the level of model simplification increases. However, our 2D texture mapping method is able to maintain the texture quality at a relatively high level with only a small texture degradation.

Assessments based on visual checks also support the texture results shown in Table I. It can be observed that texture details on the coloured vertex models are greatly "washed" away by mesh simplification. Texture details such as letters and brand logos are difficult to distinguish on those models. On the other hand, the letters and brand logos are clear and easy to distinguish on the models textured using our 2D texture mapping. Based on the results shown, it can be verified that our proposed texturing method is effective in terms of preserving texture quality on a simplified polygon mesh model.

Fig. 9. Textured model and cropped image patch of a lunch bag. (a): texture using Kinect RGB. (b): texture using HD RGB.

TABLE I. TEXTURE QUALITIES (COLOURED VERTEX VS. 2D TEXTURE MAPPING)

Test Case: Lunch Bag
  Coloured Vertex                       2D Texture Mapping
  Image Patch    Gradient Magnitude     Image Patch    Gradient Magnitude
  Fig. 7(g)      1.1292                 Fig. 7(j)      2.1064
  Fig. 7(h)      0.6633                 Fig. 7(k)      1.9707
  Fig. 7(i)      0.0870                 Fig. 7(l)      1.8888

Test Case: Backpack
  Coloured Vertex                       2D Texture Mapping
  Image Patch    Gradient Magnitude     Image Patch    Gradient Magnitude
  Fig. 8(g)      3.9389                 Fig. 8(j)      5.7487
  Fig. 8(h)      2.6984                 Fig. 8(k)      5.4861
  Fig. 8(i)      0.4366                 Fig. 8(l)      5.3571

Fig. 10. Textured model and cropped image patch of a backpack. (a): texture using Kinect RGB. (b): texture using HD RGB.
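For reference, the gradient magnitudes reported in Table I (and later in Table II) can be computed as the mean gradient norm over a greyscale image patch. The sketch below is our reading of the measure described in Section V and assumes simple central differences; the authors do not specify the exact gradient operator.

```cpp
// Sharpness indicator: sum of gradient norms over a patch divided by the number
// of pixels, with central differences on a row-major greyscale patch.
#include <cmath>
#include <vector>

double gradientMagnitude(const std::vector<double>& grey, int width, int height) {
    double sum = 0.0;
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            const double gx = 0.5 * (grey[y * width + (x + 1)] - grey[y * width + (x - 1)]);
            const double gy = 0.5 * (grey[(y + 1) * width + x] - grey[(y - 1) * width + x]);
            sum += std::sqrt(gx * gx + gy * gy);   // gradient norm at (x, y)
        }
    }
    return sum / (static_cast<double>(width) * height);   // averaged over the patch
}
```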

2) Improvements from using HD RGB Images: The HD RGB camera can deliver RGB images with a higher resolution than those from the Kinect RGB camera. Thus, such a camera is able to capture more detail of the real world. Based on this observation, model textures generated using HD RGB images should also contain more texture detail. To verify this assumption, the following experiment is carried out.

For each test case, a model textured using Kinect RGB images and a model textured using HD RGB images are generated and compared. The geometry of these two models is exactly the same, and both models are textured using the 2D texture mapping method to keep the texture quality high.

Table II shows the difference in texture quality between using Kinect RGB and HD RGB. For both test cases, the textures generated using HD RGB possess higher quality. Similar conclusions can also be made by checking the model textures visually. As shown in Fig. 9 and Fig. 10, the product brand logos on the models with HD textures (Fig. 9(b), Fig. 10(b)) are visually clearer and sharper. However, the logos reconstructed using Kinect RGB images are quite blurry and hard to recognise, especially when the logo size is small (letters in Fig. 9(a)).

It is also worth noting that when reconstructing small-sized objects, photorealistic textures can be achieved. Fig. 11(a) and Fig. 11(c) are reconstructed model textures using HD RGB images, while Fig. 11(b) and Fig. 11(d) are the corresponding parts directly cropped from HD RGB images. A similar level of detail can be observed from both the model textures and the cropped HD RGB images.

Fig. 11. Textures using the HD RGB camera vs. actual HD RGB images. (a): HD texture of a lunch bag. (b): HD image of a lunch bag. (c): HD texture of a backpack. (d): HD image of a backpack.

TABLE II. TEXTURE QUALITIES (KINECT RGB VS. HD RGB)

Test Case: Lunch Bag
  Kinect RGB                            HD RGB
  Image Patch    Gradient Magnitude     Image Patch    Gradient Magnitude
  Fig. 9(a)      0.9467                 Fig. 9(b)      1.8462

Test Case: Backpack
  Kinect RGB                            HD RGB
  Image Patch    Gradient Magnitude     Image Patch    Gradient Magnitude
  Fig. 10(a)     4.3045                 Fig. 10(b)     5.8296

VI. CONCLUSION AND DISCUSSION

To the best of our knowledge, our method is the first aimed at creating simplified 3D models with high quality textures. To deliver high quality model textures, an HD RGB camera is added to work with the depth camera of the Microsoft Kinect™ sensor. An improved texture update scheme on an asymmetrical colour volume with a higher dimension than the geometry volume is presented. Given an output decimated 3D model, our 2D texturing method is able to maintain a high level of texture quality regardless of the degree of model simplification.

However, the texture quality is still limited by the dimension and size of the colour volume, which is in turn constrained by the GPU memory. Our future work includes improving texture quality, especially for large scene reconstruction, and exploring better 2D texture map generation methods to achieve higher model rendering efficiency.

ACKNOWLEDGEMENT

This work is supported by Smart Services CRC, Australia.

REFERENCES

[1] Point Cloud Library. http://pointclouds.org/, 2015. [Online; accessed 24-June-2015].
[2] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building Rome in a day. Communications of the ACM, 54(10):105–112, 2011.
[3] P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
[4] E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers. Real-time camera tracking and 3D reconstruction using signed distance functions. In Proceedings of Robotics: Science and Systems, 2013.
[5] Y. Chen and G. Medioni. Object modelling by registration of multiple range images. Image and Vision Computing, 10(3):145–155, 1992.
[6] A. Chiuso, P. Favaro, H. Jin, and S. Soatto. 3-D motion and structure from 2-D motion causally integrated over time: Implementation. In Proceedings of the European Conference on Computer Vision, volume 1843, pages 734–750, 2000.
[7] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 303–312, 1996.
[8] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard. An evaluation of the RGB-D SLAM system. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1691–1696, 2012.
[9] C. Fruh and A. Zakhor. 3D model generation for cities using aerial photographs and ground level laser scans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages II-31–II-38, 2001.
[10] M. Garland and P. S. Heckbert. Surface simplification using quadric error metrics. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, pages 209–216, 1997.
[11] M. Gross and H. Pfister. Point-Based Graphics. Morgan Kaufmann Publishers Inc., 2007.
[12] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In Proceedings of the RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2010.
[13] D. Herrera C., J. Kannala, and J. Heikkilä. Joint depth and color camera calibration with distortion correction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10):2058–2064, 2012.
[14] A. Hilton, A. Stoddart, J. Illingworth, and T. Windeatt. Reliable surface reconstruction from multiple range images. In Proceedings of the European Conference on Computer Vision, volume 1064 of Lecture Notes in Computer Science, pages 117–126. Springer Berlin Heidelberg, 1996.
[15] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 559–568, 2011.
[16] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. Computer Graphics, 21(4):163–169, 1987.
[17] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, 2011.
[18] M. Niessner, M. Zollhofer, S. Izadi, and M. Stamminger. Real-time 3D reconstruction at scale using voxel hashing. ACM Transactions on Graphics, 32(6):169:1–169:11, 2013.
[19] H. Pfister, M. Zwicker, J. van Baar, and M. Gross. Surfels: Surface elements as rendering primitives. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 335–342, 2000.
[20] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. ACM Transactions on Graphics, 21(3):438–446, 2002.
[21] F. Steinbrucker, C. Kerl, D. Cremers, and J. Sturm. Large-scale multi-resolution surface reconstruction from RGB-D sequences. In Proceedings of the IEEE International Conference on Computer Vision, pages 3264–3271, 2013.
[22] J. Stuckler and S. Behnke. Integrating depth and color cues for dense multi-resolution scene mapping using RGB-D cameras. In Proceedings of the IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 162–167, 2012.
[23] G. Turk and M. Levoy. Zippered polygon meshes from range images. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, pages 311–318, 1994.
[24] T. Weise, T. Wismer, B. Leibe, and L. Van Gool. In-hand scanning with online loop closure. In Proceedings of the IEEE International Conference on Computer Vision, pages 1630–1637, 2009.
[25] T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald. Robust real-time visual odometry for dense RGB-D mapping. In Proceedings of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013.
[26] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. In Proceedings of the RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
[27] T. Whelan, L. Ma, E. Bondarev, P. H. N. de With, and J. McDonald. Incremental and batch planar simplification of dense point cloud maps. Robotics and Autonomous Systems (RAS), ECMR '13 Special Issue, 2014.
[28] Y.-Q. Yang, Q. Xiao, and Y.-H. Song. The investigation of 3D scene reconstruction algorithm based on laser scan data. In Proceedings of the International Conference on Machine Learning and Cybernetics, volume 2, pages 819–823, 2010.
[29] Q.-Y. Zhou and V. Koltun. Color map optimization for 3D reconstruction with consumer depth cameras. ACM Transactions on Graphics, 33(4):155:1–155:10, 2014.
[30] M. Zwicker, H. Pfister, J. van Baar, and M. Gross. Surface splatting. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 371–378, 2001.


Fig. 7. Texture results of a lunch bag model. (a): coloured vertex model with 13360 polygons. (b): coloured vertex model with 1336 polygons. (c): coloured vertex model with 134 polygons. (d): 2D textured model with 13360 polygons. (e): 2D textured model with 1336 polygons. (f): 2D textured model with 134 polygons. (g): image patch of (a). (h): image patch of (b). (i): image patch of (c). (j): image patch of (d). (k): image patch of (e). (l): image patch of (f).


Fig. 8. Texture results of a backpack model. (a): coloured vertex model with 39176 polygons. (b): coloured vertex model with 3918 polygons. (c): coloured vertex model with 392 polygons. (d): 2D textured model with 39176 polygons. (e): 2D textured model with 3918 polygons. (f): 2D textured model with 392 polygons. (g): image patch of (a). (h): image patch of (b). (i): image patch of (c). (j): image patch of (d). (k): image patch of (e). (l): image patch of (f).