Mobile Augmented Reality for Semantic 3D Models - A Smartphone-based Approach with CityGML -
Christoph Henning Blut
Veröffentlichung des Geodätischen Instituts der Rheinisch-Westfälischen Technischen Hochschule Aachen Mies-van-der-Rohe-Straße 1, 52074 Aachen
Nr. 70
2019 ISSN 0515-0574
Mobile Augmented Reality for Semantic 3D Models - A Smartphone-based Approach with CityGML -
Von der Fakultät für Bauingenieurwesen der Rheinisch-Westfälischen Technischen Hochschule Aachen zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften genehmigte Dissertation
vorgelegt von
Christoph Henning Blut
Berichter:
Univ.-Prof. Dr.-Ing. Jörg Blankenbach
Univ.-Prof. Dr.-Ing. habil. Christoph van Treeck
Tag der mündlichen Prüfung: 24.05.2019
Diese Dissertation ist auf den Internetseiten der Universitätsbibliothek online verfügbar.
Acknowledgments
This thesis was written during my employment as a research associate at the Geodetic Institute and Chair for Computing in Civil Engineering & Geoinformation Systems of RWTH Aachen University. First and foremost, I would like to express my sincere gratitude to my supervisor Univ.-Prof. Dr.-Ing. Jörg Blankenbach for his excellent support, the scientific freedom he gave me and the inspirational suggestions that helped me succeed in my work. I would also like to thank Univ.-Prof. Dr.-Ing. habil. Christoph van Treeck for his interest in my work and his willingness to take over the second appraisal. Many thanks go to my fellow colleagues for their valuable ideas towards my research and the fun after-work activities that will be remembered. Last but not least, I am grateful to my family for the support and motivation they gave me. A very special thank you goes to my brother Timothy Blut for the inspiring, extensive discussions about my work throughout this journey, sometimes until late into the night, which were very helpful in finding great new ideas.
Aachen, June 2019
Christoph Henning Blut
Abstract
The increasing popularity of smartphones over the past 10 years has drastically propelled mobile technology forward, enabling innovative applications and experiences, for example in the form of mobile virtual reality (VR) and mobile augmented reality (AR). While in earlier days mobile AR systems were constructed from multiple large and costly external components carried in bulky and heavy backpacks, today low-cost off-the-shelf mobile devices, such as smartphones, are sufficient, since they provide all the necessary technology right out of the box. However, the realization of highly accurate and performant systems on such devices poses a challenge, since the inexpensive parts (e.g. sensors) are often prone to inaccuracies. Many AR systems are developed for entertainment purposes, but mobile AR potentially also has beneficial applications in more serious fields, such as archaeology, education, medicine and the military. For civil engineering and city planning, mobile AR is also promising, as it could be used to enhance typical workflows and planning processes. A real-life example application is the visualization of planned building parts to simplify planning processes and to optimize the communication between the participating decision makers. In this thesis, a concept for a mobile AR system aimed at the mentioned scenarios is presented, implemented and evaluated. For this, on the one hand a suitable mobile AR system and on the other hand appropriate data are necessary. A problem is that much digital 3D building data typically lacks the required spatial referencing and important additional information, like semantics or topology. Some exceptions can be found in the construction sector and in the geographic information domain with the IFC and CityGML formats.
While the focus of IFC primarily lies on individual, highly detailed building models, CityGML emphasizes more general, less detailed models in a broader context, thus enabling city- and room-scale visualizations. A proof-of-concept system was realized on an Android-based smartphone using CityGML models. It is fully self-sufficient and operates without external infrastructure. To process the CityGML data, a mobile data processing unit, consisting of a SpatiaLite database, a data importer and a data selection method, was implemented. The importer is based on an XML pull parser which reads CityGML 1.0 and CityGML 2.0 data and writes it into the SpatiaLite-based CityGML database that is modelled according to the CityGML schema. The selection algorithm efficiently filters the data that is relevant to the user at the current location from the entirety of data in the database. To visualize the data and make the information of each object accessible, a customized rendering solution was implemented that aims at preserving the object information while maximizing the rendering performance. For preparing the geometry data for rendering, a customized polygon triangulation algorithm was implemented, based on the ear-clipping method. To superimpose the virtual elements onto the physical objects, a fine-grained (indoor) pose tracking system was implemented, using a combination of image-based and inertial measurement unit (IMU)-based methods. The IMU is utilized to determine initial coarse pose estimates, which are then optimized by the CityGML model-based optical pose estimation methods. For this, a 2D image-based door detector and a 3D corner extraction method that return accurate corners of the door were implemented. These corners are then used for the pose estimations. Lastly, the mobile CityGML AR system was evaluated in terms of data processing/visualization performance and accuracy/stability of the pose tracking solution.
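The ear-clipping triangulation mentioned above can be illustrated with a minimal sketch for simple, hole-free polygons; the thesis' variant additionally links interior rings (holes) to the outer ring before clipping, which this sketch omits. All function names and conventions below are illustrative, not taken from the thesis implementation:

```python
# Minimal ear-clipping sketch (hypothetical, not the thesis code).
# Input: a simple polygon as a list of (x, y) tuples in counter-clockwise
# order, without holes or duplicate vertices. Output: a list of triangles.

def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the CCW triangle (a, b, c)."""
    return (cross(a, b, p) >= 0 and cross(b, c, p) >= 0
            and cross(c, a, p) >= 0)

def ear_clip(polygon):
    verts = list(polygon)
    triangles = []
    while len(verts) > 3:
        n = len(verts)
        for i in range(n):
            a, b, c = verts[i - 1], verts[i], verts[(i + 1) % n]
            if cross(a, b, c) <= 0:
                continue  # reflex (concave) vertex, cannot be an ear
            # an ear triangle must not contain any other polygon vertex
            others = [v for v in verts if v not in (a, b, c)]
            if any(point_in_triangle(p, a, b, c) for p in others):
                continue
            triangles.append((a, b, c))
            del verts[i]  # clip the ear and restart the scan
            break
        else:
            raise ValueError("degenerate or non-simple polygon")
    triangles.append(tuple(verts))  # the remaining triangle
    return triangles
```

A convex quadrilateral yields two triangles; a concave pentagon such as `[(0,0), (2,0), (2,2), (1,1), (0,2)]` yields three, with the reflex vertex at `(1,1)` correctly skipped until it becomes convex.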
The results show that off-the-shelf, low-cost mobile devices, such as smartphones, are sufficient to realize a fully fledged, self-sufficient, location-based mobile AR system that qualifies for numerous AR scenarios, like the one described earlier.
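The location-based data selection summarized in this abstract can be sketched as a simple bounding-box query around the user's position. The table layout, object names and coordinates below are hypothetical; a plain in-memory SQLite table stands in for the SpatiaLite CityGML database used in the thesis:

```python
# Hypothetical sketch of location-based filtering: select only objects
# whose bounding box intersects a query window centred on the user.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE building (
    id INTEGER PRIMARY KEY, name TEXT,
    min_x REAL, min_y REAL, max_x REAL, max_y REAL)""")
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Nearby Building", 0, 0, 40, 25),        # invented sample data
     (2, "Remote Building", 900, 900, 950, 960)])

def select_visible(conn, x, y, radius):
    """Return buildings whose bounding box overlaps the square window
    [x - radius, x + radius] x [y - radius, y + radius]."""
    return conn.execute(
        """SELECT name FROM building
           WHERE max_x >= ? AND min_x <= ?
             AND max_y >= ? AND min_y <= ?""",
        (x - radius, x + radius, y - radius, y + radius)).fetchall()
```

A query near the origin, e.g. `select_visible(conn, 10.0, 10.0, 50.0)`, returns only the nearby object, so the renderer never has to load the whole database. A spatial index (such as SpatiaLite's R-tree support) would accelerate this for large models.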
Zusammenfassung
Die zunehmende Popularität von Smartphones über die vergangenen 10 Jahre hat die mobile Technologie entscheidend vorangetrieben und ermöglicht innovative Anwendungen und Erfahrungen, wie zum Beispiel in Form von mobiler Virtual Reality (VR) und mobiler Augmented Reality (AR). Wurden zuvor für mobile AR-Systeme noch eine Vielzahl von großen und teuren externen Komponenten benötigt, die in sperrigen und schweren Rucksäcken transportiert wurden, reichen heute preiswerte, handelsübliche mobile Geräte, wie Smartphones, aus, da diese bereits alle erforderlichen Technologien beinhalten. Die Realisierung hochgenauer und performanter Systeme auf Basis solcher Geräte stellt jedoch eine Herausforderung dar, da die kostengünstigen Komponenten (z.B. Sensoren) oft zu Ungenauigkeiten neigen. Vorrangig werden mobile AR-Systeme für den Entertainmentbereich entwickelt, mobile AR hat jedoch auch ein vielversprechendes Potential in anderen Bereichen, wie beispielsweise der Archäologie, Bildung, Medizin oder dem Militär. Auch im Bauwesen und in der Stadtplanung ist mobile AR äußerst vielversprechend, da es zur Optimierung einiger typischer Arbeitsabläufe und Planungsprozesse verwendet werden könnte. Ein Beispiel für eine reale Anwendung ist die Visualisierung von geplanten Bauwerksteilen, um Planungsprozesse zu vereinfachen und die Kommunikation zwischen den beteiligten Entscheidungsträgern zu optimieren. In dieser Arbeit wird ein Konzept für ein AR-System vorgestellt, implementiert und evaluiert, das auf die genannten Szenarien abzielt.
Dazu sind einerseits ein geeignetes mobiles AR-System und andererseits entsprechende Daten notwendig. Problematisch sind die benötigten, jedoch häufig fehlenden räumlichen Bezüge der digitalen 3D-Gebäudedaten und fehlende wesentliche attributive Daten, wie Semantik oder Topologie. Einige Ausnahmen finden sich im Bau- und Geoinformationssektor mit dem IFC- und CityGML-Format. Während der Fokus von IFC in erster Linie auf einzelnen, hochdetaillierten Gebäudemodellen liegt, legt CityGML den Schwerpunkt auf allgemeinere, weniger detaillierte Modelle in einem breiteren Kontext und ermöglicht so Visualisierungen im Stadt- und Raummaßstab. Ein Demonstrator wurde auf einem Android-basierten Smartphone und mit entsprechenden CityGML-Modellen realisiert. Dieser ist vollständig autark und funktioniert ohne externe Infrastrukturen. Zur Verarbeitung der CityGML-Daten wurde eine mobile Datenverarbeitungskomponente implementiert, die aus einer SpatiaLite-Datenbank, einem Datenimporter und einer Datenselektionsmethode besteht. Der Importer basiert auf einem XML Pull-Parser, der CityGML 1.0- und CityGML 2.0-Daten liest und in die SpatiaLite-basierte CityGML-Datenbank schreibt, die nach dem CityGML-Schema modelliert ist. Der Selektionsalgorithmus ermöglicht ein effizientes Filtern der Daten abhängig von der aktuellen Position des Nutzers, sodass nur relevante Daten aus der Datenbank exportiert werden. Für die Visualisierung der Daten und Bereitstellung der Objektinformationen wurde eine spezialisierte Rendering-Lösung implementiert, die es ermöglicht, die Objektinformationen zu erhalten, aber gleichzeitig die Rendering-Leistung zu maximieren. Zur Vorbereitung der Geometriedaten für das Rendering wurde ein angepasster Polygontriangulationsalgorithmus, basierend auf der Ear-Clipping-Methode, implementiert.
Um die physischen Objekte mit diesen virtuellen Elementen zu überlagern, wurde ein Lagebestimmungssystem, unter Verwendung einer Kombination von bildbasierten und inertialen Messverfahren, implementiert. Die inertiale Messeinheit (IMU) wird verwendet, um erste grobe Posen zu ermitteln, die nachfolgend durch die CityGML-Modell-basierten optischen Verfahren optimiert werden. Dafür wurden ein 2D-bildbasierter Türdetektor und ein 3D-Eckenextraktionsverfahren implementiert, um die Ecken der entsprechenden Tür präzise zurückzugeben und für die Lageschätzung zu verwenden. Schließlich wurde das mobile CityGML-AR-System in Bezug auf die Datenverarbeitungs- und Datenvisualisierungsleistung und die Genauigkeit/Stabilität des Lagebestimmungssystems evaluiert. Die Ergebnisse zeigen, dass kostengünstige Standard-Mobilgeräte wie Smartphones ausreichen, um ein vollwertiges, autarkes, standortbasiertes mobiles AR-System zu realisieren, das für zahlreiche AR-Szenarien, wie das zuvor beschriebene, geeignet ist.
Table of Contents
Acknowledgments
Abstract
Zusammenfassung
Table of Contents
List of Figures
List of Tables
List of Abbreviations
1 Introduction
   1.1 Thesis Goal
   1.2 Related Work
2 Background
   2.1 Fundamentals
      2.1.1 Transforms
      2.1.2 Camera Models
   2.2 3D Real-Time Rendering
      2.2.1 Graphics Rendering Pipeline
      2.2.2 Tessellation
      2.2.3 Graphics Libraries
   2.3 Data for Augmented Reality
      2.3.1 Geospatial Data
      2.3.2 Building Information Modeling
      2.3.3 Data Parsers
      2.3.4 Databases
   2.4 Pose Tracking
      2.4.1 Coordinate Systems for Positioning
      2.4.2 Coordinate Systems for Orientation
      2.4.3 Pose Tracking Methods
      2.4.4 Sensor Fusion
      2.4.5 Optical Pose Tracking Methods
3 Requirement Analysis
   3.1 Mobile Device
   3.2 CityGML vs. IFC
   3.3 Data Processing Options
      3.3.1 Memory-based Option
      3.3.2 Web-based Option
      3.3.3 Local Database-based Option
   3.4 Polygon Triangulation
   3.5 Data Rendering
   3.6 Pose Tracking Methods
      3.6.1 Sensor-based Tracking
      3.6.2 Infrastructure-based Tracking
      3.6.3 Optical Tracking
4 Implementation
   4.1 General Solution Architecture
   4.2 CityGML Viewer
      4.2.1 Data Processor
      4.2.2 Visualizing CityGML
   4.3 Pose Tracking
      4.3.1 Sensor-based Pose Tracker
      4.3.2 Optical Pose Tracker
      4.3.3 Fusion
   4.4 Android App
5 Evaluation of AR System
   5.1 System Calibration
   5.2 Data Performance
      5.2.1 Data Processing
      5.2.2 Data Visualization
   5.3 Door Detection
      5.3.1 Image Dataset
      5.3.2 Influence of Image Resolution
      5.3.3 Influence of Environmental Conditions
   5.4 Pose Tracking
      5.4.1 Optical Pose
      5.4.2 Sensor Pose Stability
   5.5 General AR System Evaluation
6 Conclusions
   6.1 Data Processing
   6.2 Rendering
   6.3 Door Detection
   6.4 Pose Tracking
   6.5 BIM as Extension to AR System
Bibliography
List of Figures XI
List of Figures
Figure 1.1: Trend of AR in the last 14 years, based on search statistics from Google [5]. The values indicate the search interest relative to the highest interest in the specified time period.
Figure 1.2: Concept image of outdoor AR system.
Figure 1.3: Concept image of indoor AR system.
Figure 1.4: Reality-virtuality continuum according to [12].
Figure 2.1: An example of a translation of a geometrical object.
Figure 2.2: An example of a rotation of a geometrical object.
Figure 2.3: The three DoF (yaw, pitch and roll).
Figure 2.4: An example of gimbal lock when using Euler angles.
Figure 2.5: An example of scaling a geometrical object.
Figure 2.6: Orthographic projection according to [40].
Figure 2.7: View frustum for a perspective projection according to [40].
Figure 2.8: The pinhole camera model according to [41].
Figure 2.9: The relationship between world coordinates and pixel coordinates according to [43].
Figure 2.10: (a) Undistorted image (left); (b) image with pincushion distortion (middle); (c) image with barrel distortion (right).
Figure 2.11: Camera calibration chessboard photographed from different angles.
Figure 2.12: Encoded field of control points for camera calibration.
Figure 2.13: General conceptual parts of the graphics rendering pipeline according to [37].
Figure 2.14: Steps of the geometry stage.
Figure 2.15: The axes of the OpenGL coordinate system.
Figure 2.16: A screen with a coordinate system in which the x-axis points right and the y-axis points downwards.
Figure 2.17: Shapes of a polygon. (a) Convex polygon (left); (b) concave polygon (middle); (c) convex polygon with holes (right).
Figure 2.18: Polygon triangulated by connecting a vertex with all other vertices of the polygon, except its direct neighbours.
Figure 2.19: Progress of the ear-clipping algorithm. (a) Non-triangulated polygon (top-left); (b) polygon with clipped ear (top-right); (c) polygon with two clipped ears (bottom-left); (d) three clipped ears (bottom-right).
Figure 2.20: Two interior polygons (holes) linked to the outer polygon by edges.
Figure 2.21: Delaunay triangulation with empty circumcircles.
Figure 2.22: A triangulation that does not meet the Delaunay condition: the sum of the two opposite angles is larger than 180° and the circumcircle of one triangle contains a vertex of the other.
Figure 2.23: Delaunay triangulation with newly inserted point s.
Figure 2.24: The gap of a concave polygon should not be closed.
Figure 2.25: Example structure of a scene graph.
Figure 2.26: Two boundary surfaces of a wall modelled using B-Rep.
Figure 2.27: City model of Berlin. Courtesy of Berlin Partner für Wirtschaft und Technologie GmbH [56].
Figure 2.28: An example of a DTM based on data from Aachen [58].
Figure 2.29: An example of a DSM based on data from Aachen [58].
Figure 2.30: A wall modelled using CSG.
Figure 2.31: Boolean operations used to construct new objects. The top-right geometry is subtracted from the top-left geometry, resulting in the bottom-right geometry.
Figure 2.32: MBRs R1 to R6 are arranged for the R-tree structure.
Figure 2.33: Tree data structure of an octree. Each node has eight children. A minimum bounding cuboid is subdivided into octants.
Figure 2.34: Geocentric Reference System.
Figure 2.35: Geographic Reference System.
Figure 2.36: Earth's surface represented by an ellipsoid and geoid. Ellipsoidal height and geoid model can be used to calculate the orthometric height.
Figure 2.37: Android's sensor world coordinate system, according to [71].
Figure 2.38: An Android smartphone with its coordinate axes.
Figure 2.39: CV is based on image analysis, which in turn is based on image processing.
Figure 2.40: Original image (left); dilation (middle); erosion (right).
Figure 2.41: Gaussian blur applied to an image (left) and the resulting image (right).
Figure 2.42: Original image and its schematic pixel-value representation (left); vertical edges detected with the Sobel operator (middle); horizontal edges detected with the Sobel operator (right).
Figure 2.43: Image created with the Canny edge detector.
Figure 2.44: Sweep window to find corners based on high variations of intensity. If there are no variations in any direction, there is no feature (case 1/left); if there are variations in one direction, an edge has been found (case 2/middle); if there are variations in all directions, a corner has been found (case 3/right).
Figure 2.45: Features found by the Harris corner detector.
Figure 2.46: Geometry of the three-point space resection problem.
Figure 2.47: Example of a fiducial marker.
Figure 2.48: Miniature digital 3D model of a building interior visualized using a fiducial marker.
Figure 2.49: Different types of model-based tracking based on [137].
Figure 2.50: 3D wireframe model of a building.
Figure 2.51: Cube with visible and hidden edges. The thin gray lines in the back are covered by the front surfaces.
Figure 2.52: A projected edge of a 3D model (black), sample points (red), normals (blue) searching for edges of the door in the image (orange).
Figure 2.53: Example result of the SURF detector of [152]. The red circles represent the found interest points and their scale.
Figure 2.54: Model captured using the Google Tango tablet.
Figure 3.1: Required components of an AR system.
Figure 3.2: Custom-created LOD4 model of the civil engineering building of RWTH Aachen University (Model 1).
Figure 3.3: An office of the Model 1 building.
Figure 3.4: Custom-created LOD4 building (Model 2).
Figure 3.5: The kitchen of Model 2.
Figure 3.6: The living room of Model 2.
Figure 3.7: Part of the LOD2 model of Aachen based on data from [161] (Model 3).
Figure 3.8: Average RAM consumption for DOM and the pull parser [55].
Figure 3.9: Average loading times for DOM and the pull parser [55].
Figure 3.10: Average runtimes for typical spatial queries. The Oracle and PostGIS queries were run on a desktop PC (Intel Core CPU, 8 GB RAM); SpatiaLite was run on a Google Nexus 5 [55].
Figure 3.11: Percentage of convex and concave polygons in the three models M1, M2 and M3.
Figure 3.12: Example of sensor drift for the x-axis of the accelerometers of the Google Nexus 5, Sony Xperia Z2 and Google Pixel 2 XL.
Figure 3.13: Example of sensor drift for the y-axis of the accelerometers of the Google Nexus 5, Sony Xperia Z2 and Google Pixel 2 XL.
Figure 3.14: Example of sensor drift for the z-axis of the accelerometers of the Google Nexus 5, Sony Xperia Z2 and Google Pixel 2 XL.
Figure 3.15: Orientation error in indoor and outdoor areas due to the magnetometer.
Figure 3.16: Position error when using GNSS.
Figure 3.17: The virtual objects are shifted and do not fit the physical objects due to inaccuracies of the calculated pose.
Figure 3.18: The Polhemus G4 becomes increasingly inaccurate with growing distance.
Figure 3.19: Mean translation error of each method when Gaussian noise is increasingly added to the 2D image points.
Figure 3.20: Mean rotation error of each method when Gaussian noise is increasingly added to the 2D image points.
Figure 3.21: Mean translation error of each method when increasing the number of points using an image heavily distorted with Gaussian noise.
Figure 3.22: Mean rotation error of each method when increasing the number of points using an image heavily distorted with Gaussian noise.
Figure 3.23: Mean translation error of each method when Gaussian noise is increasingly added to the 3D object points.
Figure 3.24: Mean rotation error of each method when Gaussian noise is increasingly added to the 3D object points.
Figure 3.25: Mean translation error of each method when increasing the number of 3D object points heavily distorted with Gaussian noise.
Figure 3.26: Mean rotation error of each method when increasing the number of 3D object points heavily distorted with Gaussian noise.
Figure 3.27: Comparison of the mean translation error when using methods with and without RANSAC on 2D image points that include outliers.
Figure 3.28: Comparison of the mean translation error when using methods with and without RANSAC on 3D object points that include outliers.
Figure 3.29: Edge-based pose tracking with IMU.
Figure 3.30: An example of the edge-based pose tracking algorithm.
Figure 4.1: General AR system architecture.
Figure 4.2: Activity diagram of the AR system.
Figure 4.3: Architecture of the CityGML viewer component of the AR system according to [55].
Figure 4.4: Example of a relationship between different classes in the database [55].
Figure 4.5: Activity diagram of the implemented selection algorithm [55].
Figure 4.6: Activity diagram of the implemented ear-clipping algorithm according to [55].
Figure 4.7: Process of connecting multiple holes (inner polygons) to the outer polygon.
Figure 4.8: A fully triangulated LOD4 building.
Figure 4.9: Hierarchy of CityGML objects in a scene graph [55].
Figure 4.10: Ray casting-based picking using a view frustum of the perspective projection [55].
Figure 4.11: Activity diagram of the pose tracking system.
Figure 5.1: The PHIDIAS markers used for the calibration process, captured from different angles.
Figure 5.2: Time to perform the queries Q1 - Q5 for each smartphone.
Figure 5.3: Loading times for each smartphone for the positions P1, P2 and P5.
Figure 5.4: Loading times for each smartphone for the positions P3 and P4.
Figure 5.5: The time spent to select data and export it from the database for each smartphone in positions P1, P2 and P5.
Figure 5.6: The time spent to select data and export it from the database for each smartphone in positions P3 and P4.
Figure 5.7: Required time to prepare the exported CityGML data for visualization.
Figure 5.8: Required time to prepare the exported CityGML data for visualization.
Figure 5.9: The average draw calls that were required for the rendered scene in P1 - P5.
Figure 5.10: Average FPS in the different positions for each smartphone.
Figure 5.11: Setup for testing optical pose estimation.
Figure 5.12: Some examples of doors used for evaluating the door detection algorithm.
Figure 5.13: Example of a detected door. The red dots are corner points and the green rectangle is the correctly detected door.
Figure 5.14: Example of a partially detected door.
Figure 5.15: True Positive detection rate of the door detection algorithm for images with different resolutions.
Figure 5.16: True Negative detection rate of the door detection algorithm for images with different resolutions.
Figure 5.17: Time required to detect a door in cases using the same images in different resolutions.
Figure 5.18: Time required to detect a door in cases M1 - M6 using downscaled images with a resolution of 480×360 pixels.
Figure 5.19: Accuracy of the automatically derived door corner coordinates from downscaled images with a resolution of 480×360 pixels.
Figure 5.20: Accuracy of the automatically estimated position.
Figure 5.21: Accuracy of the automatically estimated orientation.
Figure 5.22: Quality of relative orientation using the Rotation Vector over time.
Figure 5.23: Quality of relative orientation using the Game Rotation Vector over time.
Figure 5.24: Quality of relative orientation using the Gyroscope over time.
Figure 5.25: Quality of relative orientation using the Rotation Vector when the AR system is rotated.
Figure 5.26: Quality of relative orientation using the Game Rotation Vector when the AR system is rotated.
Figure 5.27: Quality of relative orientation using the Gyroscope when the AR system is rotated.
Figure 5.28: The time that the smartphone battery lasts when using the AR framework.
Figure 6.1: Door augmented using the AR system.
Figure 6.2: Building augmented using the AR system.
Figure 6.3: AR view using Google Tango and a custom-created BIM model of RWTH Aachen University [208].
List of Tables
Table 2.1: Positioning/pose tracking technologies according to [74].
Table 3.1: Specifications of the three smartphones.
Table 3.2: Statistics of the three models used in the project.
Table 3.3: Maximum capabilities of SpatiaLite ([175]).
Table 3.4: Information about the accelerometer, magnetometer and gyroscope of the Google Nexus 5.
Table 3.5: Information about the accelerometer, magnetometer and gyroscope of the Sony Xperia Z2.
Table 3.6: Information about the accelerometer, magnetometer and gyroscope of the Google Pixel 2 XL.
Table 3.7: Comparison of corner detectors.
Table 4.1: Simplifications of the CityGML classes for the database schema [55].
Table 5.1: The calibration parameters of the Google Nexus 5 obtained from the calibration process using PHIDIAS in calibration C1 and calibration C2.
Table 5.2: The calibration parameters of the Sony Xperia Z2 obtained from the calibration process using PHIDIAS in calibration C1 and calibration C2.
Table 5.3: The calibration parameters of the Google Pixel 2 XL obtained from the calibration process using PHIDIAS in calibration C1 and calibration C2.
Table 5.4: Statistics about the test CityGML database containing the models Model 1, Model 2 and Model 3.
Table 5.5: Positions P1 - P5 representing the possible locations that can occur with the AR system and the average number of polygons and objects that the selection algorithm loaded.
Table 5.6: Detection rate of the door detection algorithm using downscaled images with a resolution of 480×360 pixels.
List of Abbreviations
AAA-Model   AFIS-ALKIS-ATKIS-Model
AC   alternating current
AdV   Arbeitsgemeinschaft der Vermessungsverwaltungen der Länder der Bundesrepublik Deutschland
AEC   architecture, engineering and construction
AoA   angle of arrival
AR   augmented reality
AV   augmented virtuality
B3DM   Batched 3D Model
BIM   building information modeling
BLOB   Binary Large Object
BMVI   Federal Ministry of Transport and Digital Infrastructure
bpp   bit per pixel
B-Rep   boundary representation
CDT   Constrained Delaunay Triangulation
CityGML   City Geography Markup Language
COLLADA   Collaborative Design Activity
CPU   central processing unit
CSG   constructive solid geometry
CV   computer vision
DBMS   database management system
DBS   database system
DC   pulsed direct current
DEM   digital elevation model
DGPS   differential global positioning system
DLT   direct linear transformation
DoF   degrees of freedom
DOM   document object model
DR   dead reckoning
DSM   digital surface model
DT   Delaunay triangulation
DTM   digital terrain model
ECEF   Earth-centered, Earth-fixed
ENU   east, north, up
ER   Entity-Relationship
ETRS89   European Terrestrial Reference System 1989
EWMA   exponentially weighted moving average
FOV   field of view
FPS   frames per second
GB   gigabyte
GDI-DE   Spatial Data Infrastructure Germany
GIS   geographic information system
glTF   GL Transmission Format
GML   Geography Markup Language
GNSS   global navigation satellite system
GPS   global positioning system
GPU   graphics processing unit
GUI   graphical user interface
HMD   head-mounted display
IFC   industry foundation classes
IMU   inertial measurement unit
IPS   indoor positioning system
IR   infrared
IRLS   Iterative Re-weighted Least Squares
JAXB   Java Architecture for XML Binding
JTS   JTS Topology Suite
KB   kilobyte
LOD   level of detail
LOS   line-of-sight
MB   megabyte
MBR   minimum bounding rectangle
MEMS   micro electro mechanical sensor
MP   megapixel
MR   mixed reality
NED   north, east, down
NRW   North Rhine-Westphalia
ODB   object database
OGC   Open Geospatial Consortium
OHMD   optical head-mounted display
OpenGL   Open Graphics Library
OpenGL ES   OpenGL for Embedded Systems
OS   operating system
OSM   OpenStreetMap
P3P   Perspective-Three-Point
PDA   personal digital assistant
PnP   Perspective-n-Point
POI   point of interest
RAM   random access memory
RANSAC   Random Sample Consensus
RDBMS   relational database management system
RE   real environment
RGB   red, green, blue
ROS   region of support
RSS   received signal strength
SAX   Simple API for XML
SDK   software development kit
SIFT   Scale-Invariant Feature Transform
SIG 3D   Special Interest Group 3D
SLAM   Simultaneous Localization and Mapping
SQL   structured query language
SSID   service set identifier
SURF   Speeded Up Robust Features
SVD   singular value decomposition
TDoA   time difference of arrival
TIN   triangulated irregular network
ToA   time of arrival
ToF   time of flight
UI   user interface
UTM   Universal Transverse Mercator
UWB   ultra-wideband
VE   virtual environment
VIO   visual-inertial odometry
VO   visual odometry
VR   virtual reality
WFS   Web Feature Service
WGS84   World Geodetic System 1984
WKB   well-known binary
WKT   well-known text
WLAN   wireless local area network
μm   micrometer
1 Introduction
In human history, geospatial data has always played an important role. Some of the earliest known maps date back to approximately 25,000 BC, depicting mountains, rivers, valleys and routes [1]. Today, people are still just as interested in such spatial information. With more sophisticated data acquisition and visualization techniques driven by the rapid evolution of technology, the data has grown in detail and complexity. In recent years, a strong trend towards mobile computing has arisen, mainly propelled by mobile devices such as smartphones, which have had a strong impact on worldwide markets since Apple introduced the first iPhone in 2007 [2]. In 2017, about 1.54 billion smartphones were sold [3]. Next to Apple, other major companies, like Google, Samsung, Sony and LG, have also developed numerous models of smartphones and mobile devices, propelling the evolution of mobile technology and allowing for more powerful and yet smaller devices (e.g. wearables like smartwatches). The term ubiquitous computing, coined in [4], describes the main direction in which technology is moving: computing anytime and anywhere. Modern mobile devices essentially offer capabilities equal to desktop computers, although some design- and hardware-specific aspects must be considered, for example the typically much smaller displays in comparison to a desktop PC or the limited capabilities of low-cost mobile hardware. Therefore, new ways of interaction and visualization are of importance.
Head-mounted displays (HMD), for example, work around the issue of constrained visualization space by bringing displays closer to the user’s eyes. This technique is, for instance, used by virtual reality (VR). The goal of VR is to fully immerse the user in the virtual world. VR has been a topic of numerous projects and research efforts since the 1980s, but only recently has interest increased significantly, especially since the Oculus Rift 1 was presented in 2012. Along with VR, the interest in mixed reality (MR), and augmented reality (AR) in particular, has increased (Figure 1.1). AR adds additional virtual information to the user’s real-world perception instead of immersing them in a virtual world.
Figure 1.1: Trend of AR over the last 14 years (2004–2018), based on search statistics from Google [5]. The values (in percent) indicate the search interest relative to the highest interest in the specified time period.
1 https://www.oculus.com/rift/
As one of the first global companies, Google announced the development of a wearable vision-based AR solution combining a miniature computer with an optical head-mounted display (OHMD), named Google Glass 2. For now, however, development has been postponed due to hardware restrictions and user experience issues. Next to Google, the entertainment and gaming industry has recently also focused research activities on AR applications. However, instead of developing special hardware, many solutions rely on smartphones. This is not surprising, since modern smartphones already provide all the necessary components of an AR system. But not only developers and users from the entertainment and gaming industry have found application areas for smartphone-based AR; use cases in the geospatial domain have also proven to be promising. For instance, a location-based mobile AR system can be utilized for the georeferenced visualization of points of interest (POI) or for navigational purposes. The advantage is that the user can view the information in a much more natural way from a first-person perspective. From a civil engineering point of view, the visualization of buildings is highly interesting, as some examples named by [6] show:
• Real estate and planning offices could offer their clients the possibility to visualize planned buildings on parcels of land. With the help of the AR system, clients could inspect the georeferenced virtual 3D building models freely on-site and easily compare colors, sizes, look and feel or overall integration into the cityscape.
2 https://x.company/glass/
• Physical buildings could be augmented to enable the visualization of hidden building parts, such as cables, pipes and beams. • Tourist information centers could offer tourists visual historic city tours by visualizing historic buildings on-site and displaying additional information and facts about the location and the historic building.
In the following, a location-based mobile AR system is defined as a system that utilizes geospatial data and operates in a global reference system. For such AR solutions, not only suitable hardware is obligatory, but also appropriate data. There are many graphics formats for modelling 3D objects, but these typically lack spatial references and the possibility to store additional information about the object. A promising alternative to these formats are semantic information models, which have become fairly popular in geographic information systems (GIS). A concrete realization of such a model is the Geography Markup Language 3 (GML)-based XML encoding schema City Geography Markup Language 4 (CityGML). It enables modelling, storing and exchanging city models by employing a modular structure and a level of detail (LOD) system. It provides classes for objects that are generally found in cities, such as buildings, roads and water. Many use cases of CityGML are built around desktop PC environments and typically focus on overview-like presentations of the data. Some typical desktop tasks are large-scale analyses, solar potential analyses, shadow analyses and disaster
3 http://www.opengeospatial.org/standards/gml
4 http://www.opengeospatial.org/standards/citygml
analyses [7]. However, applications of CityGML in mobile environments are still scarce. AR promises to be a suitable technology for mobile CityGML applications. In this work, a mobile location-based AR system for CityGML data is presented.
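To illustrate the semantic nature of CityGML, the following sketch reads building attributes from a minimal CityGML 2.0 fragment using only Python's standard library. The XML fragment, the building name and the height value are hand-written illustrations, not data from any real city model; only the CityGML 2.0 namespaces are taken from the standard.

```python
# Minimal sketch: extracting semantic attributes (name, measured height)
# from a hand-written CityGML 2.0 fragment with Python's standard library.
import xml.etree.ElementTree as ET

CITYGML = """<?xml version="1.0" encoding="UTF-8"?>
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
                xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
                xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember>
    <bldg:Building gml:id="B1">
      <gml:name>Town Hall</gml:name>
      <bldg:measuredHeight uom="m">21.5</bldg:measuredHeight>
    </bldg:Building>
  </core:cityObjectMember>
</core:CityModel>"""

NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml": "http://www.opengis.net/gml",
}

def building_heights(xml_text):
    """Map each building's gml:name to its measured height in meters."""
    root = ET.fromstring(xml_text)
    result = {}
    for b in root.iterfind(".//bldg:Building", NS):
        name = b.findtext("gml:name", default=b.get(f"{{{NS['gml']}}}id"),
                          namespaces=NS)
        height = b.findtext("bldg:measuredHeight", namespaces=NS)
        result[name] = float(height) if height is not None else None
    return result

print(building_heights(CITYGML))  # {'Town Hall': 21.5}
```

Unlike plain graphics formats, the geometry here could carry arbitrary thematic attributes alongside it, which is what makes the model "semantic".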
1.1 Thesis Goal
The goal of this thesis is to develop a concept for a location-based mobile AR system that can visualize buildings or building parts to allow the realization of use cases as described in the previous section. Furthermore, a fully functional, self-sufficient mobile AR system prototype should be implemented to evaluate whether modern mobile devices, such as smartphones, satisfy the requirements of an AR system for such scenarios. Figure 1.2 and Figure 1.3 depict exemplary reference visualizations for the targeted application. To develop a system as proposed, various state-of-the-art methods must be combined from different disciplines, such as 3D modelling, real-time rendering, computer vision (CV), etc., and appropriate hardware and data must be found. For this, a detailed literature review and a familiarization with the practical implementation of each topic are mandatory. To find suitable devices, data and methods for this specific AR system, empirical comparisons between the different possibilities are essential. Lastly, the finished proof-of-concept system should be evaluated.
Figure 1.2: Concept image of outdoor AR system.
Figure 1.3: Concept image of indoor AR system.
1.2 Related Work
Concept of Augmented Reality
Already at the end of the 1960s, Ivan Sutherland presented the first stationary AR system with a stereoscopic HMD. The poor capabilities of processing units at the time only enabled very simple wireframe models of rooms to be placed over the real world, though. The position and orientation (pose) of the user’s head was determined using a heavy mechanical head position sensor. Due to the weight of the entire system, it was fittingly named “The Sword of Damocles” [8]. But what exactly is AR? Basically, AR adds additional virtual information to the user’s real-world perception. Generally speaking, this not only includes augmentations of the visual sense (ophthalmoception), but all senses, including hearing (audioception), taste (gustaoception), smell (olfacoception) and touch (tactioception). In most cases when referring to AR, the focus lies on visual augmentations, as the majority of past research projects has shown. In this thesis, the focus also lies on visual-based AR. According to [9], AR ideally enables a virtual object and a real object to coexist in the same space. As is often the case, ideas exist long before they are actually realized. For AR, this also holds true. Already back in the early 1900s, the novelist L. Frank Baum mentioned spectacles that overlay data on real-world people [10], but a first AR system was not developed until the 1960s. The actual term Augmented Reality is credited to [11], who coined it in the early 1990s.
Widely accepted definitions of AR were given by Milgram and Azuma. [12] first presented their reality-virtuality continuum in 1994. As depicted in Figure 1.4, the continuum reaches from the real environment (RE) to the virtual environment (VE), with AR and augmented virtuality (AV) being two realms in between, both part of MR. While VR places the user in an entirely computer-generated world, MR environments combine virtual and real-world elements to enhance the user’s perception. AR adds virtual elements to real-world perceptions; AV, in contrast, complements virtual environments with real-world information.
Figure 1.4: Reality-virtuality continuum according to [12].
A later, also widely accepted, but more precise definition of AR was given by [9]. According to this definition, AR is the extension of perception by integrating additional virtual information and can be characterized by three main characteristics:
• It combines real and virtual elements
• It must be registered in 3D
• It is interactive in real time
The combination of “real and virtual elements” defines the actual description of AR and is extended by the necessity of “registration in 3D”, which separates it from 2D virtual video overlays. “Interactive in real time” further delimits AR from virtual computer graphic augmentations, as used in pictures or television (e.g. the overlay of virtual lines in soccer games), by requiring the system to update in real time and respond to the user. In this thesis, AR is understood as defined by Milgram [12] and Azuma [9].
Types of Visual Augmented Reality
A common interaction scheme for today’s computers is the point-and-click action. AR needs a different kind of interaction, especially in terms of mobile AR. [13] states that AR research aims at the development of “intuitive user interfaces”, or, in the words of [14], the transformation of the world into the user’s interface. According to [15], the advantages of AR are obvious: when the real and virtual environment are viewed separately, the information of both environments must be combined by the user. If real and virtual information is perceived already combined, the experience is simplified, making the information likely more comprehensible. [16] gives an overview of possible applications for AR: for example, medical visualizations for surgery training with the help of x-ray views of patients, manufacturing and repair work for the assembly, maintenance and repair of complex machinery, robot path planning, or entertainment systems. According to [15], five types of technical realizations of visual AR exist: optical see-through HMD AR, video see-through HMD AR, handheld display AR, projection-based AR with video augmentation and projection-based AR with physical surface augmentation. While optical see-through HMD AR uses a transparent surface through which the user can see the real world, video see-through HMD AR uses displays on which captured images are projected. In both cases, additional virtual information is displayed into the field of view (FOV) of the user. Handheld display AR captures the real world with one or multiple cameras and displays the video images combined with virtual information on the video display. Projection-based AR with video augmentation also uses video cameras to capture the real world and a surface onto which the video information, plus its additional virtual information, is projected. Projection-based AR with physical surface augmentation projects virtual information directly onto real-world objects, creating enhanced object perceptions. Generally, AR systems can be categorized into marker-based and markerless systems. While markerless systems generally use sensor-based position and orientation trackers, marker-based systems rely on image-based methods with 2D fiducial markers to determine the pose of the device. For marker-based tracking, the environment must be prepared ahead of time by distributing the markers. Markerless tracking, however, needs no such preparations. Therefore, the majority of AR applications use marker-based tracking for small-scale environments and markerless tracking for large-scale environments, such as outdoor scenarios.
Augmented Reality Research Projects
While the first AR system by Sutherland [8] only allowed movements in a constrained area with a diameter of roughly 2 m, the full potential of AR first unfolds when the system is mobile. The first mobile augmented reality system, named MARS, was presented by [17] and relied on a portable computer carried in a backpack, a personal digital assistant (PDA) with touchpad, an orientation tracker, a differential global positioning system (DGPS) receiver and a wireless web access point. It was a campus information system that assisted the user in navigating to POI and allowed querying information about these. [18] also developed multiple mobile AR systems for indoor and outdoor scenarios, inter alia creating a guided campus tour by overlaying models of historic buildings and georeferenced media, such as hypermedia news stories. Around the same time as the first mobile AR systems, one of the first fiducial-marker-based AR systems was presented by [19]. A popular open source library for tracking with six degrees of freedom (DoF) based on fiducial markers, named ARToolKit, was presented in [20]. The framework has been used in multiple projects and applications and is still maintained and in development today. It has been ported to major platforms such as Windows 5, Linux 6, MacOS 7 and the mobile platforms iOS 8 and Android 9. [21] first presented a backpack-based system which was developed into Tinmith [22]. It evolved into a combination of a small computer worn on the belt, a helmet with HMD, wireless input devices, and a freeware software framework. Pose tracking was done using the global positioning system (GPS) and an InterSense orientation tracker. [23] presented Archeoguide, a mobile AR system consisting of mobile computers, wireless access points and GPS. The system offered navigation and visualizations of ancient constructions on the historical Olympia site in Greece. [24] also presented an AR system named GEIST for interactive narrative tours through historical
5 https://www.microsoft.com/de-de/windows
6 https://www.linux.org/
7 https://www.apple.com/de/macos/mojave/
8 https://www.apple.com/de/ios/ios-12/
9 https://www.android.com/intl/de_de/
locations, where historical facts and events were told by fictional avatars. Efforts towards indoor solutions were made by [25]. They presented a mobile AR building navigation guide based on the ARToolKit. The system displayed directional information in the see-through heads-up display by overlaying a registered wire-frame model of the building. The overall position of the user in the building could be viewed on a wrist-worn pad displaying a 3D world-in-miniature (WIM) map, which also acted as an input device [25]. Tracking was done using fiducial markers, which were distributed across the building, and an inertial measurement unit (IMU). Another indoor guidance system was developed by [26], which was also based on the ARToolKit and fiducial marker recognition to visualize and overlay 3D models of the environment. The major improvement here was that the system ran completely on a PDA, minimizing the need for infrastructure. [27] presented a mobile outdoor AR system for urban environments which overlays virtual wireframe models over the real buildings. In comparison to their predecessors, they used an edge-based visual matching method and a combination of gyroscope, magnetometer and gravimeter sensors for tracking. The disadvantage is that a good initial pose is required, in contrast to the approach utilized in this thesis. Next to fiducial-marker-based approaches, other visual tracking methods have gained interest. Not only is no preparation of the environment required, as would be the case when using fiducial markers, but the methods can be used equally in both small- and large-scale environments. This, for instance, enables the possibility of precise pose tracking indoors and outdoors. Such methods have been used frequently, especially indoors, where positional information is hardly or not at all available, due to the lack of global navigation satellite system (GNSS) availability. [28] introduced SiteLens, a mobile AR system for urban planning purposes. The system visualizes referenced 3D objects, mainly buildings. In 2010, a handheld AR system named Vidente that visualizes underground infrastructures, like pipes, was introduced by [29]. [30] presented the LARA system, an assistive AR system that is also able to visualize underground infrastructures. It uses the GeoTools 10 open source library and PostGIS 11. For pose tracking, GNSS and IMU data is fused. [31] presented a mobile system that enables the visualization of CityGML data in different modes, like VR and AR. The data is stored in the PostgreSQL 12 -based 3D City Database 13 (3DCityDB) and is transmitted using a client-server model. The client is implemented on the iOS platform using the Glob3 Mobile API 14, a map and 3D globe framework. Next to the mentioned research projects, open source, freeware and commercial frameworks were developed, which commonly provide readily implemented low-level methods to realize AR projects more easily. Every framework has a certain focus though, so one must be chosen depending on the type of project. An overview can be found in [32].
10 http://geotools.org/
11 https://postgis.net/
12 https://www.postgresql.org/
13 https://www.3dcitydb.org
14 http://glob3mobile.com/
Commercial Augmented Reality Systems
Furthermore, complete commercial solutions that combine hardware with software frameworks were developed by, for example, Google and Microsoft. First developer versions of Google Glass, an OHMD in the shape of eyeglasses with a miniature computer attached to it, were sold in 2013, but consumer versions were postponed due to technical limitations. Nevertheless, advancements in AR were still made with Google’s Project Tango [33], which consists of a specialized tablet device and a software development kit (SDK). For sensing, the device includes a 4 megapixel (MP) red, green, blue (RGB)-infrared (IR) pixel sensor camera, a motion tracking camera, a 3D depth sensor, an accelerometer, an ambient light sensor, a barometer, a compass, GPS and a gyroscope. The sensors are backed up by suitable hardware, such as an NVIDIA Tegra K1 15 (192 CUDA cores), 128 gigabytes (GB) of internal storage and 4 GB of random access memory (RAM). 3D pose tracking is realized using three core technologies: motion tracking, area learning and depth perception. Motion tracking uses visual-inertial odometry (VIO), a combination of visual odometry (VO) and IMU measurements, to estimate the device’s pose relative to a starting pose. VIO utilizes a series of camera images in which the relative positions of different image features are tracked to determine pose changes. Even with an IMU incorporated into the process, drift errors accumulate over time. To correct these, Tango uses its second core technology, area learning. With this, the device gains the ability to record key visual features of the physical 3D world, like edges and corners, which can be used to recognize the area at a later time. The features are mathematically described and saved in a
15 https://www.nvidia.com/object/tegra-k1-processor.html
searchable index. When finding known features, the system can apply drift corrections by adjusting its path according to previous observations made before the drift errors occurred. The third core technology, depth perception, uses IR rays to create point clouds of the surrounding environment. From these point clouds, 3D coordinates can be calculated which enable positional tracking forwards, backwards, up and down. Since 2016, Microsoft has also made its head-mounted see-through holographic headset, named HoloLens [34], available. For displaying holographic images into the user’s sight, it features two HD 16:9 light engines and uses a custom-built holographic processing unit, 64 GB of flash storage and 2 GB of RAM. For tracking, it utilizes an IMU, four environment understanding cameras, one depth camera, one 2 MP camera, four microphones and one ambient light sensor. It also offers human interaction by sound, gaze, gesture and voice.
Commercial Augmented Reality Frameworks
Next to hardware-coupled solutions, the major companies are also developing more general approaches that enable AR on non-specialized devices. Apple, for instance, bought the German AR startup company Metaio in 2015 [35] and recently released ARKit 16 as part of iOS 11. It allows creating AR experiences on regular iPhones and iPads using monocular VIO. Google accordingly announced ARCore 17, a similar framework that enables AR experiences on regular Android smartphones, also using monocular VIO. The
16 https://developer.apple.com/arkit/
17 https://developers.google.com/ar/
disadvantage of both solutions is that the pose is only estimated in a local coordinate frame, relative to an initial pose. In this thesis the focus lies on developing an AR system that determines absolute poses in a global reference frame.
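The distinction between a local and a global frame can be illustrated with a small sketch. Assuming a known global anchor pose (e.g. estimated from GNSS and a compass reading), a locally tracked position can be mapped into a global frame by a rigid-body transform. The anchor coordinates, heading and function names below are purely illustrative and not part of any framework's API.

```python
# Hedged sketch: mapping a position tracked in a local AR frame (relative
# to the starting pose, as in ARKit/ARCore) into a global frame, given a
# known anchor. Anchor values are fictitious, for illustration only.
import numpy as np

def local_to_global(p_local, anchor_global, heading_rad):
    """Rotate a local position about the vertical axis by the anchor
    heading, then shift it by the anchor's global coordinates
    (e.g. UTM easting/northing/height)."""
    c, s = np.cos(heading_rad), np.sin(heading_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return R @ np.asarray(p_local) + np.asarray(anchor_global)

anchor = np.array([294300.0, 5628100.0, 215.0])  # fictitious UTM anchor
p = local_to_global([10.0, 0.0, 0.0], anchor, np.pi / 2)
print(p)  # a 10 m step along local x becomes a 10 m step in global y
```

In a real system, the quality of the global pose is bounded by the quality of this anchor estimate, which is precisely why absolute pose determination is the harder problem.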
2 Background
AR combines numerous research fields and highly specialized methods. In general, an AR application can be split into three essential components: the reality component, the virtual component and a method to combine these two components. The following sections discuss methods fundamental to this thesis and related topics of AR systems. Since different domains (e.g. CV, photogrammetry, 3D real-time rendering) commonly share certain vocabulary, but in some cases mean different things by it, the terminology in the following sections should be understood in the current context. Related methods or terminology are additionally provided where required.
2.1 Fundamentals
This section provides details on fundamentals that are essential to an AR system. These basics are found in AR-related sub-topics, such as 3D real-time rendering, CV, etc.
2.1.1 Transforms
Some of the most fundamental parts of AR are the transforms. They are essential for important tasks such as 3D real-time rendering, pose tracking, etc. With the help of a transform, objects can be altered in position, orientation or size. A translation is such a transform. It refers to changing the x, y and z values and thus the location of an object. Transforms that preserve vector addition and scalar multiplication are called linear transforms. The following is true then (equation (2.1)):

$f(\mathbf{x} + \mathbf{y}) = f(\mathbf{x}) + f(\mathbf{y}), \quad f(k\mathbf{x}) = k f(\mathbf{x})$   (2.1)

Some typical linear transforms are rotation and scaling. However, it is advantageous to be able to combine multiple transforms, such as a translation and a rotation. With an affine transform it is possible to perform a linear transform followed by a translation by combining these. In computer graphics, affine transforms are typically represented as 4×4 matrices, using the homogeneous notation, so that translations and perspective projections can be expressed as matrix multiplications, allowing single transforms, like rotation, scaling and translation, to be combined into a single matrix. Especially in computer graphics this uniform representation of transforms is desirable, since hardware can be optimized towards handling 4×4 matrices. This has the advantage that the coordinates of a vertex only must be multiplied once. According to [36] and [37], a rotation matrix $\mathbf{R}$ and a translation matrix $\mathbf{T}$ can be concatenated to a matrix $\mathbf{M}$ which can be multiplied with a vertex $\mathbf{p}$, resulting in the same rotation and translation as if $\mathbf{p}$ had been multiplied with them separately (see equations (2.3) and (2.2)).

$\mathbf{M}\mathbf{p} = \mathbf{T}(\mathbf{R}\mathbf{p}) = (\mathbf{T}\mathbf{R})\mathbf{p}$   (2.2)

where:

$\mathbf{M} = \mathbf{T}\mathbf{R}$   (2.3)

The following sections give an overview of typical transforms using the homogeneous form:
Translation
Figure 2.1: An example for a translation of a geometrical object.
A translation is used to move points of a geometry by the same amount in a certain direction. The translation matrix is given by $\mathbf{T}(\mathbf{t})$:

$\mathbf{T}(\mathbf{t}) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$

With this matrix, a vertex $\mathbf{p} = (p_x, p_y, p_z, 1)^T$ is translated by the vector $\mathbf{t} = (t_x, t_y, t_z)^T$, resulting in a new vertex $\mathbf{p}' = (p_x + t_x, p_y + t_y, p_z + t_z, 1)^T$. Like this, a shift from one location to another can be accomplished, as depicted in Figure 2.1.
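The translation matrix and the concatenation of equations (2.2) and (2.3) can be sketched numerically. NumPy is an illustrative choice here, not something the thesis prescribes; the vertex and angle values are arbitrary.

```python
# Sketch of equations (2.2)/(2.3) with NumPy: concatenating a rotation R
# and a translation T into one 4x4 matrix M = T @ R, then checking that
# applying M to a homogeneous vertex equals applying R and T separately.
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])      # homogeneous vertex
T = translation(2.0, 3.0, 0.0)
R = rotation_z(np.pi / 2)               # 90 degrees about the z-axis

M = T @ R                               # single concatenated transform
assert np.allclose(M @ p, T @ (R @ p))  # equation (2.2) holds
print(M @ p)                            # approximately [2., 4., 0., 1.]
```

This is exactly why graphics hardware prefers the homogeneous 4×4 form: an arbitrary chain of transforms collapses into one matrix, so each vertex is multiplied only once.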
Rotation
Figure 2.2: An example for a rotation of a geometrical object.
The following matrices $\mathbf{R}_x(\phi)$, $\mathbf{R}_y(\phi)$ and $\mathbf{R}_z(\phi)$ are used to rotate an instance by $\phi$ radians about the x-, y- and z-axis. Figure 2.2 shows an example for a rotation of a shape.

$\mathbf{R}_x(\phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi & 0 \\ 0 & \sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$\mathbf{R}_y(\phi) = \begin{pmatrix} \cos\phi & 0 & \sin\phi & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\phi & 0 & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

$\mathbf{R}_z(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi & 0 & 0 \\ \sin\phi & \cos\phi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Euler Transform
In virtual worlds, it is important to be able to orient objects, such as the virtual camera. Yaw, pitch and roll are defined by the Euler transform $\mathbf{E}(\text{yaw}, \text{pitch}, \text{roll})$, which is a concatenation of the three rotation matrices $\mathbf{R}_x$, $\mathbf{R}_y$ and $\mathbf{R}_z$ (Figure 2.3).
Figure 2.3: The three DoF (yaw, pitch and roll).
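The axis rotations and one possible Euler concatenation can be sketched as follows. For brevity, the sketch uses 3×3 matrices (the upper-left block of the homogeneous 4×4 form), and the rotation order $R_z(\text{roll}) R_x(\text{pitch}) R_y(\text{yaw})$ is an illustrative convention choice, not one mandated by the text; other orders are equally valid.

```python
# Illustrative 3x3 rotation matrices about the x-, y- and z-axis, and one
# possible Euler concatenation E = Rz(roll) @ Rx(pitch) @ Ry(yaw).
import numpy as np

def rot_x(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler(yaw, pitch, roll):
    return rot_z(roll) @ rot_x(pitch) @ rot_y(yaw)

# Any rotation matrix is orthonormal: its inverse is its transpose,
# and its determinant is +1.
E = euler(0.3, 0.2, 0.1)
assert np.allclose(E @ E.T, np.eye(3))
assert np.isclose(np.linalg.det(E), 1.0)
```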
A disadvantage of the Euler transform is that a gimbal lock can occur, in which case one DoF is lost. This is caused by two axes aligning in such a way that a rotation about either of these axes results in a rotation about the other. Given a three-gimbal mechanism in its initial state, with the three gimbal axes mutually perpendicular, this can for example occur when a 90° change about the pitch axis is applied.
The yaw axis gimbal and the roll axis gimbal become aligned (Figure 2.4) so that changes to roll and yaw then essentially apply the same rotation.
Figure 2.4: An example for gimbal lock when using Euler angles.
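The loss of a degree of freedom can also be shown numerically. The sketch below assumes the illustrative Euler order $E(\text{yaw}, \text{pitch}, \text{roll}) = R_z(\text{roll}) R_x(\text{pitch}) R_y(\text{yaw})$; under this (hypothetical) convention, a 90° pitch makes a yaw by some angle and a roll by the same angle produce the identical rotation matrix.

```python
# Numeric sketch of gimbal lock, assuming the Euler order
# E(yaw, pitch, roll) = Rz(roll) @ Rx(pitch) @ Ry(yaw).
import numpy as np

def rot_x(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler(yaw, pitch, roll):
    return rot_z(roll) @ rot_x(pitch) @ rot_y(yaw)

a = 0.7  # arbitrary test angle in radians
only_yaw = euler(a, np.pi / 2, 0.0)   # yaw by a at 90 deg pitch
only_roll = euler(0.0, np.pi / 2, a)  # roll by a at 90 deg pitch
assert np.allclose(only_yaw, only_roll)  # yaw and roll have collapsed
```

At 90° pitch, the yaw and roll gimbals are aligned, so the two parameters control the same physical rotation: one degree of freedom is gone.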
Quaternions
A solution to the gimbal lock problem is the use of quaternions. A quaternion is an extension of the complex numbers, originally described by [38]. It can simply be thought of as a delta rotation that describes the shortest path between two rotations. This is advantageous in comparison to rotations in Euler angles, which are represented as a sequence of steps. While Euler angles are quite intuitive compared to quaternions, they also require some questions to be answered: for example, in which order the rotations are applied or what the reference frame is. Another advantage is that quaternions have a smaller memory footprint, using only 4 floating point numbers in contrast to rotation matrices with 9 (3×3 matrix) or 16 (4×4 matrix) numbers respectively. The mathematical definition of a quaternion is given by equation (2.4):