On-the-fly Dense 3D Surface Reconstruction for Geometry-Aware Augmented Reality
by Long Chen
Faculty of Science & Technology, Bournemouth University

Supervised by Prof. Wen Tang, Prof. Nigel W. John and Prof. Jian J. Zhang

A thesis submitted in partial fulfilment of the requirements of Bournemouth University for the degree of Doctor of Philosophy

Oct. 2018

Copyright Statement

This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author, and that due acknowledgement must always be made of the use of any material contained in, or derived from, this thesis.

Abstract

Augmented Reality (AR) is an emerging technology that makes seamless connections between virtual space and the real world by superimposing computer-generated information onto the real-world environment. AR can deliver additional information in a more intuitive and natural way than conventional information-delivery methods. Camera tracking is the core enabling technology for AR and has been studied extensively over the last few decades. Beyond tracking, however, sensing and perceiving the surrounding environment are equally important and challenging problems. Although existing hardware solutions such as the Microsoft Kinect and HoloLens can sense and reconstruct the structure of the environment, they are either too bulky or too expensive for widespread AR use.

In this thesis, challenging real-time dense 3D surface reconstruction techniques are studied and reformulated to advance AR from basic position awareness towards geometry awareness, with an outlook towards context-aware AR. We initially propose to reconstruct the dense environmental surface from the sparse points produced by Simultaneous Localisation and Mapping (SLAM), but this approach is prone to failure in challenging Minimally Invasive Surgery (MIS) scenes, for example in the presence of deformation or surgical smoke. We subsequently combine stereo vision with SLAM for more accurate and robust results. Building on the recent success of deep learning, we then present a learning-based single-image reconstruction method that achieves state-of-the-art results. Moreover, we propose context-aware AR, a step beyond purely geometry-aware AR towards high-level conceptual interaction modelling in complex AR environments for an enhanced user experience. Finally, a learning-based smoke removal method is proposed to ensure accurate and robust reconstruction under extreme conditions such as the presence of surgical smoke.

Contents

1 Introduction
  1.1 Background
  1.2 Main Challenges
  1.3 Research Aims and Contributions
  1.4 Structure of the Following Chapters
  1.5 List of Publications
2 Research Topic Classification, Trend Analysis and Technology Review
  2.1 Automatic Classification of AR using Data-Mining
    2.1.1 Data Source
    2.1.2 Selection Criteria
    2.1.3 Text Mining
    2.1.4 Topic Generation
  2.2 AR Trend Analysis
    2.2.1 Application Trends
    2.2.2 Technology Trends
  2.3 Review of Enabling Technologies for AR
    2.3.1 Interaction
      2.3.1.1 Gesture-based Interfaces
      2.3.1.2 Haptic Devices
      2.3.1.3 Other Hand-Held Controllers
      2.3.1.4 Brain-Computer Interfaces
    2.3.2 Display Applications
      2.3.2.1 Head-Mounted Displays
      2.3.2.2 Mobile Displays
      2.3.2.3 Spatial Augmented Reality
    2.3.3 Mobile AR
      2.3.3.1 Hand-Held Displays
      2.3.3.2 Smartphone and Tablet Applications
    2.3.4 Tracking
      2.3.4.1 Marker-based Tracking
      2.3.4.2 Markerless Tracking
    2.3.5 Registration Techniques
  2.4 Conclusion
3 Problem Statement & Literature Review
  3.1 Problem Statement
  3.2 Research Hypothesis
  3.3 Camera Tracking for AR
    3.3.1 Feature-based 2D Tracking
    3.3.2 SLAM-based 3D Tracking
  3.4 3D Dense Surface Reconstruction
    3.4.1 Stereo Depth Estimation
    3.4.2 Monocular Depth Estimation
    3.4.3 DCNN-based Monocular Depth Learning
    3.4.4 Unsupervised Monocular Depth Learning
  3.5 Summary
4 Monocular-based Online Dense Surface Reconstruction for GA-AR
  4.1 Introduction
  4.2 Methodology
    4.2.1 Introduction of the Surface Coordinate
    4.2.2 Monocular Endoscopic Camera Tracking and Mapping
      4.2.2.1 Initialization
      4.2.2.2 Training of Data Sets
      4.2.2.3 Parameter Tuning and Increasing Surface Points
    4.2.3 Intra-operative 3D Surface Reconstruction
      4.2.3.1 Point Cloud Pre-processing
      4.2.3.2 Moving Least Squares Point Smoothing
      4.2.3.3 Poisson Surface Reconstruction
  4.3 Results
    4.3.1 System Setup
    4.3.2 Ground Truth Study using Simulation Data
      4.3.2.1 Camera Trajectory Evaluation
      4.3.2.2 3D Surface Reconstruction Evaluation
    4.3.3 Real Endoscopic Video Evaluation
  4.4 Discussion
  4.5 Conclusions
5 Stereo-based Online Global Surface Reconstruction for GA-AR
  5.1 Introduction
  5.2 Methods
    5.2.1 Landmark Point Detection and Triangulation
    5.2.2 Frame-by-Frame Camera Pose Estimation
    5.2.3 Keyframe-based Bundle Adjustment
    5.2.4 ZNCC Dense Stereo Matching
    5.2.5 Incremental Building of Geometric Mesh
  5.3 Results and Discussion
    5.3.1 System Setup
    5.3.2 Ground Truth Study using Simulation Data
    5.3.3 Real Endoscopic Video Evaluation
  5.4 Conclusions
6 Learning-based Monocular Image Depth Estimation and 3D Reconstruction
  6.1 Introduction
  6.2 Novelty Compared to Previous Work
  6.3 Method
    6.3.1 Framework Overview
    6.3.2 Depth Synthesis Network
    6.3.3 Warping-based Stereo View Reconstruction
    6.3.4 Disparity-guided Patch Sampling
    6.3.5 Loss Function Construction
      6.3.5.1 Patch Matching Loss
      6.3.5.2 View Reconstruction Loss
      6.3.5.3 Disparity Smoothness Loss
      6.3.5.4 Disparity Consistency Loss
    6.3.6 Confidence Estimation Network
  6.4 Experiments
    6.4.1 Implementation Details
    6.4.2 KITTI Dataset
    6.4.3 Results
      6.4.3.1 Quantitative Evaluation
      6.4.3.2 Qualitative Evaluation
      6.4.3.3 Confidence Map Evaluation
      6.4.3.4 Reconstruction Results
  6.5 Discussion
7 From Geometry-Aware AR to Context-Aware AR
  7.1 Introduction
  7.2 Previous Work
    7.2.1 Geometry-based MR Interaction
    7.2.2 Deep Semantic Understanding
    7.2.3 Context and Semantic Awareness in XR Environments
  7.3 Framework Overview
    7.3.1 Input Sensor
    7.3.2 Camera Tracking & Reconstruction Stream
    7.3.3 Context Detection & Fusion Stream
    7.3.4 Interactive MR Interface
  7.4 Implementation
    7.4.1 Camera Tracking and Model Reconstruction
    7.4.2 Deep Learning for Material Recognition
    7.4.3 Bayesian Fusion for 3D Semantic Label Fusion
    7.4.4 3D Structural CRF Label Refinement
    7.4.5 Interaction Interface
  7.5 Example Applications
  7.6 Experimentation
    7.6.1 Accuracy Study
    7.6.2 User Experience Evaluation
      7.6.2.1 Participants
      7.6.2.2 Results
    7.6.3 User Feedback
  7.7 Conclusion and Discussion
8 Increasing Tracking and Reconstruction Robustness: Learning-based Image Smoke Removal
  8.1 Introduction
  8.2 Related Work
    8.2.1 Atmospheric Scattering Model
    8.2.2 Dark Channel Prior-based De-smoking ...