Low-latency head-tracking for AR

An exploration into open source head tracking

Konrad Urdahl Halnum

Thesis submitted for the degree of Master in Networks and Distributed Systems (ND) 60 credits

Department of Informatics
Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

Autumn 2020

© 2020 Konrad Urdahl Halnum

Low-latency head-tracking for AR: An exploration into open source head tracking

http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo

Abstract

Augmented Reality (AR) and Virtual Reality (VR) are technologies gaining traction these days, with several actors trying to find their niche and grab the market. The common denominator between many of these actors is proprietary software. There are also two main pathways within AR: mobile devices or headsets, where the latter is the focus of this thesis. Mobile AR typically overlays rendered content onto the screen of the mobile device, which shows the surroundings recorded by its camera. AR headsets are in contrast often equipped with clear reflective elements that let the user see through them while the rendered augmentation is reflected onto these, or with a see-through screen. This thesis explores the possibilities of developing Open Source Software (OSS) AR tracking software for such headsets. In order to do this, an Inertial Measurement Unit (IMU) has been employed for fast tracking of the user's head, while a camera is required for enabling the tracking software to gain spatial awareness. These components have been mounted on an open-source 3D-printed AR headset, and a simple game level has been developed in a suitable game engine to test its performance. In addition, two experiments have been conducted that look into the delay between said IMU and the computer or game engine. This has been an interesting process, where several hurdles had to be overcome, such as finding compatible hardware and software, interoperability between the game engine and external APIs, and software development. In the end, this project hopes to be a solid bedrock for further development within this field, where headset, IMU, camera, and game engine have been married together and made available for further development on low-latency head-tracking for AR.

Contents

Part I: Introduction

1 Motivation
  1.1 Purpose and goals
  1.2 Outline

2 Background
  2.1 Rotations
    2.1.1 Euler angles
    2.1.2 Quaternions
  2.2 Previous Work
  2.3 Tracking
    2.3.1 Inertial Measurement Unit (IMU)
    2.3.2 Marker-based tracking
    2.3.3 Edge detection
  2.4 Rendering in AR
    2.4.1 Hand tracking
    2.4.2 Cockpit detection

Part II: Planning

3 Introduction
  3.1 Project outline
    3.1.1 Goals
    3.1.2 Latency testing
    3.1.3 Rendering
  3.2 Process
  3.3 Equipment

4 Augmented Reality (AR) headset
  4.1 Project North Star Modular AR Headset
    4.1.1 Hardware
    4.1.2 Software
    4.1.3 Assembly
  4.2 Alternatives
    4.2.1 Microsoft HoloLens 1
    4.2.2 Microsoft HoloLens 2
    4.2.3 Meta 2
    4.2.4 Magic Leap
    4.2.5 Aryzon

5 Inertial Measurement Unit (IMU)
  5.1 Tinkerforge IMU Brick 2.0
    5.1.1 Specifications
    5.1.2 Tinkerforge Brick Daemon
    5.1.3 Tinkerforge Brick Viewer
    5.1.4 Tinkerforge API
  5.2 Alternatives

6 Camera
  6.1 Camera
    6.1.1 Universal Serial Bus (USB) Camera
    6.1.2 Power over Ethernet (POE) Camera

7 Game Engines
  7.1 Unity 3D
    7.1.1 Overview
    7.1.2 Scripting
  7.2 Unreal Engine

8 Other Software
  8.1 Visual Studio
  8.2 Kdenlive
  8.3 Blender

9 Integration
  9.1 Mount
  9.2 USB hub
  9.3 IMU
  9.4 Camera
  9.5 Cables
  9.6 Room for improvement

Part III: Development

10 Latency
  10.1 Equipment
    10.1.1 Huawei P30 Pro
    10.1.2 Asus VG24QE
    10.1.3 Tripod
    10.1.4 Phone holder
  10.2 USB stack
    10.2.1 Setup
    10.2.2 Software
    10.2.3 Conducting the experiment
    10.2.4 Results
  10.3 Game engine
    10.3.1 Setup
    10.3.2 Software
    10.3.3 Conducting the experiment
    10.3.4 Results
  10.4 Concluding thoughts

11 Tracking
  11.1 Rotation
    11.1.1 Introduction
    11.1.2 Transforming objects in Unity
    11.1.3 IMU input
    11.1.4 Calculating rotation
    11.1.5 Result
  11.2 Movement
    11.2.1 Development
  11.3 Visual
    11.3.1 CCTags
    11.3.2 ARToolKit
    11.3.3 Vuforia
    11.3.4 OpenCV
    11.3.5 OpenCVSharp
    11.3.6 EmguCV
    11.3.7 OpenCV plus Unity

Part IV: Conclusion

12 Results

13 Future work

Bibliography

A Source Code
  A.1 Delay experiment code
  A.2 Unity assets
  A.3 Project North Star repository
  A.4 Videos & Unity package

List of Figures

2.1 Occlusion example

4.1 North Star AR headset w/ calibration stand [20]
4.2 North Star headset final assembly (front view)
4.3 Microsoft HoloLens 1 [67]
4.4 Microsoft HoloLens 2 [93]
4.5 Meta 2 [68]
4.6 Magic Leap [66]
4.7 Aryzon AR/MR [74]

5.1 Tinkerforge IMU Brick 2.0 [18]

6.1 Universal Serial Bus (USB) Camera
6.2 Basler Power Over Ethernet (POE) Camera [5]

7.1 Unity 3D interface overview

9.1 North Star headset attachments
9.2 North Star headset (rear view)

10.1 USB latency test, an overview
10.2 USB latency test, detail
10.3 Delay measurements USB stack
10.4 Unity 3D latency test, detail
10.5 Delay measurements Unity 3D

11.1 Scene overview
11.2 Scene interior
11.3 Marker 2×2

List of Tables

4.1 BOE VS035ZSM-NW0 technical specifications [85]
4.2 Display Driver Board technical specifications [82]
4.3 Combiner lenses technical specifications [81]
4.4 HoloLens 1 technical specifications [67]
4.5 HoloLens 2 technical specifications [93]
4.6 Meta 2 technical specifications [68]
4.7 Magic Leap technical specifications [66]
4.8 Aryzon technical specifications [86]

5.1 Tinkerforge IMU Brick 2.0 technical specifications [108]
5.2 CTiSensors CS-IM200 IMU technical specifications [26]
5.3 Osmium MIMU22BL IMU technical specifications [75]

6.1 Basler Ace acA2000-50gc technical specifications [5]

9.1 Power and bandwidth requirements headset attachments

10.1 Equipment overview
10.2 Delay: USB-stack
10.3 Delay: Unity 3D

11.1 Overview of sensor fusion modes, IMU [31]

Glossary

3D-print  Create a plastic object in 3D.

Asset  All components of a Unity project, be it code, 3D models, or imported plugins.

Script  What Unity calls code written in C#.

Acronyms

API  Application Programming Interface
AR  Augmented Reality
CPU  Central Processing Unit
DDB  Display Driver Board
DOF  Degrees of Freedom
DRM  Digital Rights Management
FOV  Field of View
FPS  Frames per Second
GPU  Graphics Processing Unit
GTG  Gray-To-Gray
Hz  Hertz
IC  Integrated Circuit
IDE  Integrated Development Environment
IMU  Inertial Measurement Unit
LED  Light Emitting Diode
LTS  Long Term Support
MCU  Micro Controller Unit
OS  Operating System
OSS  Open Source Software
POE  Power over Ethernet
PPI  Pixels per Inch
Px  Pixel(s)
RGB  Red Green Blue
SDK  Software Development Kit
SLAM  Simultaneous Localization and Mapping
USB  Universal Serial Bus
VR  Virtual Reality

Part I

Introduction


Chapter 1

Motivation

One of the main issues with using Virtual Reality (VR) and Augmented Reality (AR) headsets today is the delay between head movement and the representation of the corresponding movement in the headset. The motivation behind this thesis is to reduce this latency by combining sensors and cameras.

There has been development within positional tracking for AR [25][61][120][48][63][30][50][59], where several methods, discussed in detail below, have been used to various degrees of user satisfaction [50]. The common denominator is the use of cameras to track the environment and the user, which is a reasonable starting point, but one that also introduces issues of its own.

When considering the number of steps an algorithm must go through to accomplish tracking (simplified), it is quite easy to see the reason behind the delay:

• A camera records a frame

• The frame is passed to a machine

• The frame is loaded into memory

• The frame is analysed for positional information with image recognition

• The frame may have to be manipulated to reveal blurred information

• The markers/objects are recognised

• The position is calculated

• The position is passed to the rendering engine

• The frame is rendered and presented to the screen

This process must be done several times a second, preferably as close to 25 frames per second as possible, for a smooth experience. To reduce this latency, the use of an Inertial Measurement Unit (IMU) mounted on the headset could be beneficial. This unit can record movement much faster than any camera but cannot place the headset in 3D space. If a camera is employed as a general tracker for placement in the room and for correcting any drift or error from the IMU, one might end up with a much smoother experience.

To test this tracking, an AR headset is needed, where several options may be considered, all with their differences. Some are focused on mobile AR, where a phone is inserted, and are a bit too simple, while others are intended for enterprise solutions. Both already have tracking implemented, with source code unavailable to us. There are also headsets intended for research and development by the community that are more suitable for this project due to their Open Source Software (OSS). Placing a camera and an IMU on a headset should not pose an issue to the user's comfort, as there exist several such devices that are compact enough not to be a burden. Finally, some software needs to be written that combines the output of the sensors used for tracking and feeds it to rendering software, where a game engine is a logical beginning.

As a proof of concept, this setup could be tested in a racing simulator. The general idea would be to create a 3D model of the cockpit interior which would overlay the control inputs, such as the steering wheel and pedals, allowing the user to reap the benefits of AR: an undisturbed view of the elements of the real world that need interaction. However, the tracking needs to be so precise that the rendered items around the cockpit stay anchored; otherwise, head movements would move the interior around in a possibly quite discomforting manner. This testing environment could prove challenging to develop if no suitable open source (or readily available) 3D models of cockpits and a functioning engine for simulation are found. Building what is essentially a complete game might not be feasible, and thus a more straightforward method for testing could be employed.

The main task will be to implement tracking algorithms with data gathered from the selected sensors, such as an IMU and a camera, and to combine them in a sufficiently timely manner into a position that may be used by rendering software. Creating algorithms that are as efficient as possible will be beneficial to cut down on frame rendering time, resulting in a higher Frames per Second (FPS). This topic can be considered one of the core features of AR and VR that needs further research, and it has many more use-cases than just a racing simulator. Hopefully, this project will be an essential starting point for anyone interested in open source AR development.

1.1 Purpose and goals

The goal of this thesis is to create software that can estimate the position of a user's head and its movement in an efficient and precise manner through the use of an IMU and a camera mounted on an AR headset.

The following points can summarise the general goals of this thesis:

• Interpret rotational information from an IMU in the form of quaternions.

• Interpret positional information from an IMU in the form of linear acceleration and cope with its highly imprecise nature.

• Record frames from a camera and use computer vision to track markers, creating an understanding of 3D space.

• Stream this data into a game engine that is compatible with an available AR headset and render a cockpit for testing the developed algorithms and, finally, the user experience.

1.2 Outline

This thesis is split into four main parts, with individual chapters and sections, and an appendix.

Part I: Introduction

This part will introduce this thesis by discussing motivation and background.

• Chapter 1: Motivation, presents the motivation, purpose and goals of this thesis.

• Chapter 2: Background, presents theoretical background and ideas needed to follow this project.

Part II: Planning

This part concerns itself with the planning stage of this thesis: which devices, hardware, and software are applicable to this project.

• Chapter 3: Introduction, presents the outline, process and overview of equipment needed for this project.

• Chapter 4: Augmented Reality (AR) headsets, discusses the AR headset used in this project: Project North Star Modular AR Headset, some alternatives considered, and why they were not chosen.

• Chapter 5: Inertial Measurement Unit (IMU), discusses the IMU used in this project: Tinkerforge IMU Brick 2.0, some alternatives considered, and why they were not chosen.

• Chapter 6: Camera, discusses the two types of cameras considered for this project, the generic Universal Serial Bus (USB) camera that was selected, and why.

• Chapter 7: Game engines, discusses the game engine selected for this project (and why): Unity 3D and the functions used. An alternative engine is also introduced.

• Chapter 8: Other software, introduces some software used in support of this project.

• Chapter 9: Integration, explains the process of assembling the AR headset and attaching our equipment to it, with its troubles.

Part III: Development

In this part all software development and experiments are discussed.

• Chapter 10: Latency, describes the two experiments conducted to find the latency between the IMU and a simple program in C, and Unity 3D, respectively, by summarising the equipment needed, how to set up the experiments, the software developed, and the results.

• Chapter 11: Tracking, describes the development that has gone into tracking the user's head through rotational input, movement, and visual tracking, respectively. It discusses the challenges with obtaining rotational data from the IMU, some ideas on handling movement, and finally introduces some libraries for implementing visual tracking.

Part IV: Conclusion

The final part discusses what this thesis has achieved, and what may lie ahead as future work.

• Chapter 12: Results, discusses what this project has achieved.

• Chapter 13: Future work, discusses what remains as future work for this project.

• Appendix A: Source Code, contains information about how to get the source code developed in this thesis, and the video clips generated from the experiments.

Chapter 2

Background

This chapter will present the theoretical background and ideas pertinent to this thesis. First, Section 2.1 will discuss some theory regarding calculating rotations, then Section 2.2 introduces some previous work, followed by Section 2.3 which introduces some issues with tracking in AR, and finally Section 2.4 which explains some issues with rendering in AR.

2.1 Rotations

This project will require the use of rotational operations to calculate the new position of the user's head. There are two common ways of doing this, both supported by the Unity game engine, and they will be introduced in the following sections.

2.1.1 Euler angles

The following is a summary of the relevant parts concerning Euler angles from Wolfram [118]. According to Euler's rotation theorem, any rotation may be described using three angles. If the rotations are written in terms of rotation matrices D, C, and B, then the general rotation A can be written as A = BCD. The three angles giving the three rotation matrices are called Euler angles. There are several conventions for Euler angles, where the x-convention is the most common definition. Given the matrix A as

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

In this convention, the rotation given by Euler angles (φ, θ, ψ) is obtained as follows:

1. the first rotation is by an angle φ about the z-axis using D,

2. the second rotation is by an angle θ ∈ [0, π] about the former x-axis using C, and

3. the third rotation is by an angle ψ about the former z-axis using B.

Therefore, Euler angles represent a 3D rotation by performing three separate rotations around individual axes. It is worth noting that the Unity engine does this differently from the x-convention, and performs these rotations first around the z-axis, then the x-axis, and finally the y-axis [105]. This may be called the game axis layout, where the z-axis points into or out of the screen, while the x- and y-axes lie within the screen plane.

One benefit of Euler angles is that they are intuitive in the sense that the rotations are human-readable, but a major limitation is that they suffer from Gimbal Lock. When applying three rotations in turn, the first or second rotation can result in the third axis pointing in the same direction as one of the previous axes [102]. Due to this, there exists an alternative representation called quaternions, which is also the underlying representation of rotations in Unity.
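
To make Unity's rotation order concrete, the following C# sketch (an illustration written for this text, not code from the project) builds the same orientation once with Quaternion.Euler and once as an explicit composition of single-axis rotations in the Z, X, Y order described above; the angle values are arbitrary.

```csharp
using UnityEngine;

// Minimal illustration of Unity's Euler angle convention (Z, then X, then Y).
// Attach to any GameObject; the angles below are arbitrary example values.
public class EulerOrderExample : MonoBehaviour
{
    void Start()
    {
        Vector3 euler = new Vector3(30f, 45f, 60f); // degrees around x, y, z

        // Unity applies the rotations in the order z, x, y:
        Quaternion fromEuler = Quaternion.Euler(euler);

        // The same orientation composed explicitly in that order
        // (the rightmost factor is applied first):
        Quaternion composed = Quaternion.AngleAxis(euler.y, Vector3.up)
                            * Quaternion.AngleAxis(euler.x, Vector3.right)
                            * Quaternion.AngleAxis(euler.z, Vector3.forward);

        transform.rotation = fromEuler;
        Debug.Log(Quaternion.Angle(fromEuler, composed)); // approximately 0 degrees
    }
}
```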

2.1.2 Quaternions

Quaternions are four-dimensional complex numbers that can be used to describe rotations in space. Rotations are expressed as a rotation angle around a rotation axis, as opposed to Euler angles where rotation is described as rotation components around x-, y- and z-axes [97].

This representation consists of four numbers, represented in Unity as x, y, z, and w. These numbers should not be manipulated directly and are not intuitive for the user. However, the representation does not suffer from Gimbal Lock [102].

Quaternions display several properties, but this project's focus is on calculating the rotation from one quaternion to another, which may be done by multiplying the quaternions together. If the rotation from frame a to frame b, $q^b_a$, is composed of $q_1$ followed by $q_2$, the total rotation is given by [97]:

\[ q^b_a = q_2 q_1 = q_w + q_x i + q_y j + q_z k \]

where:

\[ q_w = q_{2w} q_{1w} - q_{2x} q_{1x} - q_{2y} q_{1y} - q_{2z} q_{1z} \]
\[ q_x = q_{2w} q_{1x} + q_{2x} q_{1w} - q_{2y} q_{1z} + q_{2z} q_{1y} \]
\[ q_y = q_{2w} q_{1y} + q_{2x} q_{1z} + q_{2y} q_{1w} - q_{2z} q_{1x} \]
\[ q_z = q_{2w} q_{1z} - q_{2x} q_{1y} + q_{2y} q_{1x} + q_{2z} q_{1w} \]
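
For illustration, the component formulas above translate directly into the following C# sketch (written for this text, not taken from the thesis code). Note that the sign convention follows the cited formulas; depending on convention, this may correspond to the reverse argument order of Unity's built-in * operator.

```csharp
using UnityEngine;

// Component-wise quaternion product following the formulas above,
// with the arguments named q2 and q1 as in the text.
public static class QuaternionProduct
{
    public static Quaternion Multiply(Quaternion q2, Quaternion q1)
    {
        float w = q2.w * q1.w - q2.x * q1.x - q2.y * q1.y - q2.z * q1.z;
        float x = q2.w * q1.x + q2.x * q1.w - q2.y * q1.z + q2.z * q1.y;
        float y = q2.w * q1.y + q2.x * q1.z + q2.y * q1.w - q2.z * q1.x;
        float z = q2.w * q1.z - q2.x * q1.y + q2.y * q1.x + q2.z * q1.w;

        // Unity's Quaternion constructor takes (x, y, z, w).
        return new Quaternion(x, y, z, w);
    }
}
```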

2.2 Previous Work

As AR has been an emerging field for a while, there has been significant progress within it [42][25][61][120][48][63][30][50][59][41][117][95][94][60][54] in recent years.

In my experience, while there are copious amounts of literature, almost all of it is based upon one specific part of tracking, such as detecting markers or tracking a user's hands. Also, much of the research is based on mobile devices or glasses, while this thesis is focused on a headset with reflective lenses. In contrast, mobile AR seldom uses a headset, and the AR is often presented through the screen. One exception to this is Aryzon, which is a cardboard headset with a reflective screen, but this seems to be the exception rather than the norm. More on this later. AR glasses seem to be more focused on industry or on adding small pieces of information about the environment for the user. An example of this could be Google Glass, which aimed to give navigational information or alerts from the user's phone.

Almost none of the literature that I have seen addresses AR used in a cockpit, or in other terms, a highly stationary situation where most of the environment is rendered over. Our goal is rather to decide what not to render than where to render an object. Also, I have not seen any literature that covers the whole solution: headset, game engine, and tracking. They are all fragmented, and therefore there is value in exploring this aspect of AR.

2.3 Tracking

The following sections will introduce some of the key aspects regarding tracking that are relevant to this thesis.

2.3.1 Inertial Measurement Unit (IMU)

The movement of a user's head may be summarised as which way the user's head is rotating, and in which direction the user's head is accelerating. An Inertial Measurement Unit (IMU) can estimate its orientation through the use of components such as gyroscopes, accelerometers, and magnetometers. The gyroscope is used for angular velocity and orientation, the accelerometer is used for outputting linear acceleration, and the magnetometer is used to measure magnetic fields. If three of each component are placed to represent each axis in 3D space, i.e. x, y, and z, the IMU can combine the information from all of these components and output its estimated orientation [97].

By placing this IMU on the user's head, one should be able to track the user's movements. There are, however, issues with precision and drift, which need to be corrected and handled.

2.3.2 Marker-based tracking

An IMU is not enough for tracking, as discussed, due to drift. By adding a camera to the setup, it will be possible to keep track of the user with more precision, and mainly due to the static nature of the user in this project, a camera is even more valuable. This is because the surroundings will be relatively static, so that one can use the environment as an anchor through some methods discussed below.

Many existing AR implementations, such as games for mobile phones or the Nintendo 3DS, already use tags to manipulate elements in the game or application. This includes Aryzon, which is an AR implementation discussed elsewhere in this thesis. The tags used by these implementations are often quite complex, usually 2D barcodes or square blocks on a contrasting background, reminiscent of QR codes. These tags are often sufficient for simple games, but the detection suffers from rapid camera movement and motion blur, where the detection algorithm fails and has to be reset. For these games, this simply means that an object disappears off-screen until the tag is re-detected. In this project, this re-detection would probably be quite jarring for the user experience, as the whole interface – or cockpit – would disappear and reappear in the worst case, or judder and move about in the best case. Thus, a method for compensating for motion blur is beneficial.

The idea is to use several tags that the camera can recognise, and through calculating relative distances and some predetermined knowledge of the tags, the system should be able to know in which direction the user is looking. Several parameters need to be considered, such as the frame (sample) rate of the camera, and the size, location, and design of the tags.

The method proposed by L. Calvet et al. [21] uses Concentric Circle Tags (CCTags). These circles are tolerant of the motion blur created by a moving camera, increasing robustness compared to other tag designs. This allows for rapid camera movement and more acute viewing angles, meaning that the camera does not have to point directly at the tag. By changing the thickness of the circles in the tags, each tag is given a unique ID through software encoding. The benefit of this approach is its ability to detect tags with motion blur in the frame, which reduces the time needed for detection by not needing to un-blur the image. In addition, the project has a promising open-source implementation called AliceVision/CCTag [29].

2.3.3 Edge detection

Methods for optical tracking without trackers exist, i.e. without any pre-existing knowledge about the scene. This could help if the marker-based detection system is unable to track any markers for various reasons: the markers are obscured by the user, the camera is pointing in a different direction, the user is moving too fast, or the algorithm fails.

One such tracking method is edge detection [47]. This is an established method, often called Canny edge detection after the original author [22], and there are large amounts of literature available on the subject [46][99][64][16]. Several tutorials also exist that might be of use [23][24]. As we are not that interested in high-definition edge detection, and only need a precision good enough for consistent recognition of basic objects in the user's environment, the added computational cost should not impact performance. There has been work on using this type of computer vision for optical tracking [79][14][15], where some recent efforts have been able to use it efficiently [101], and it might serve as a helpful addition to the tracking system.
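
To give an impression of what a single edge detection pass looks like in practice, the following C# sketch runs Canny edge detection on one frame using OpenCvSharp, one of the OpenCV wrappers discussed later in this thesis; the file name and thresholds are placeholder example values, not settings used by the project.

```csharp
using OpenCvSharp;

// Minimal Canny edge detection on a single frame with OpenCvSharp.
// "frame.png" and the two hysteresis thresholds are placeholder values.
class CannyExample
{
    static void Main()
    {
        using var gray = Cv2.ImRead("frame.png", ImreadModes.Grayscale);
        using var edges = new Mat();

        // Pixels with gradients above the high threshold become edges;
        // weaker gradients are kept only if connected to strong edges.
        Cv2.Canny(gray, edges, 50, 150);

        Cv2.ImWrite("edges.png", edges);
    }
}
```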

2.4 Rendering in AR

Presenting a virtual cockpit in AR is quite different in the amount and type of rendering from most AR projects, which in my experience seem more focused on having stationary objects and free-moving users. An example could be exploring an architectural concept in detail, where the user(s) is free to walk around the object, observe it from several angles and distances, or even move inside the object. In a cockpit, the user will be mostly stationary, sitting in a chair, where the main focus of tracking is the head. Regarding rendering, the user will observe the world through a window, be it a racetrack, sky, or submarine environment.

We have to ensure that when the user moves about, the rendered part appears in the correct place and is perceived accurately relative to other virtual and physical objects in the scene [30]. Not only do we need to track the rotation of the user's head, but also sideways motion. This is a challenging task, as the 3D world we want to present is shown in 2D on a screen or reflected surface in close proximity to the user's eyes. There are several challenges here, such as perspective, lighting, and shadowing, which all contribute to the plausibility of the rendered scene.

The proposed system has to be able to decide what not to render, which is quite a contrast to most other implementations in my experience, where the question is where to render. One should assume that the system will take the entire cockpit view and subtract the elements the user will want to interact with directly, such as the controls of the vehicle being simulated or any other physical part of the cockpit. Some sort of object detection is therefore needed. If the cockpit consists mainly of straight edges, this would fit well with the edge detection approach discussed above, where one suggested algorithm matches lines extracted from a frame against a database containing 3D models of the environment [79].

2.4.1 Hand tracking

First of all, the system needs to know where the user's hands are to avoid rendering on top of them. The article by O. Akman et al. [6] goes well into depth regarding the issue of hand tracking in AR, and a summary follows.

Although the paper is mostly focused on detecting hand movements for user interaction, it highlights several relevant issues with tracking hands. O. Akman et al. [6] implement this through stereo vision, i.e. two cameras, and by using several cues they are able to increase performance and reliability in challenging conditions without the use of any markers. There are, however, issues with relying on vision only when attempting to track hands. One method with a single camera is using skin colour information to extract the hands from the image, but this method suffers when there are objects in the image of the same colour as the skin. One may also use background subtraction by detecting motion in the picture, but again, this does not work when neither the foreground nor the background is static, as is the case for AR. Of course, these two methods may be combined, and J. MacLean et al. [65] used facial recognition software to pinpoint skin colour for more precise skin colour detection, and also to keep the detection to regions close to the face. Nevertheless, the face we want to track in this case is behind the camera. O. Akman et al. [6] also mention and reference other attempts at tracking using 3D models of the hand gathered from 2D images, but all of them rely more or less on static elements that are not available to us.

O. Akman et al. [6] proposed a method using colour, gradient, intensity, and depth cues gathered from a stereoscopic vision setup where the pose of the hand in the camera coordinate frame is approximated, and trajectory is subsequently calculated between frames. Without going into depth on the several methods the paper used to implement the tracking, they offer concrete methods for skin colour modelling, including the updating of its colour space in varying scenes. O. Akman et al. [6] also implement tracking by estimating the trajectory of the hand through probability models and the difference between frames, and as such are able to predict movement even if tracking fails.

In conclusion, O. Akman et al. [6] aim at a much higher precision of tracking than this project, but it is an excellent paper to keep in mind to ensure a comfortable and realistic simulation. The use of gloves or other devices is not ideal, as the user expects to interact with the cockpit unencumbered, and avoiding them will simplify the setup and its cost.

Another possibility is to use the Leap Motion hand tracking module designed for the headset, which is further discussed in Section 4.1.

2.4.2 Cockpit detection

Related to the subject of hand tracking in Section 2.4.1, one must be able to detect which elements of the real cockpit should be allowed to be seen by the user. Dynamic occlusion is one method for detecting real objects that are in front of virtual objects. This is important in the world of AR due to the jarring effect of a rendered object being partially in front of and behind something real. It will break the suspension of disbelief – a user's willingness to "temporarily allow oneself to believe something that is not true, especially in order to enjoy a work of fiction" [100] – quite quickly if not handled correctly.

The main challenge of dynamic occlusion is detecting which objects should be occluded, where such an object might also be partially covered by something else [120][38]. Say the steering wheel has to be in front of the rendered world, but whatever system is used for detecting the wheel might be obscured by the user's hands and arms. Please see Figure 2.1 for a simple example of the issue of occlusion. The small figurine is obviously in front of the green droid, even though the droid is rendered over the figurine. These are screenshots from the demo project given by ARCore [87].

Figure 2.1: Occlusion example

Therefore, some sort of object detection based on pre-existing knowledge of the environment has to be implemented. We are fortunate enough that the user will be in a mostly stationary cockpit, nevertheless with a noisy, potentially unknown, background. One would be able to give support for select wheels or other cockpit components. A. Holynski and J. Kopf [48] explored this issue to some success but struggled with a high failure rate and delay.

Part II

Planning


Chapter 3

Introduction

This part of the thesis will discuss the various elements related to planning the project. Section 3.1 will introduce the project's structure, Section 3.2 will explain the process, and Section 3.3 will give an overview of the equipment discussed in Chapters 4 to 8. For each piece of equipment, these chapters describe the selected item before presenting the alternatives considered and why they were not chosen for this thesis. Finally, Chapter 9 will describe the process of integrating the add-ons for the headset, which is this thesis's contribution to the assembly of the headset. Integrating these components was not a trivial task, and an investigation into what caused the issues was needed.

3.1 Project outline

The first steps of this project were to gather information and investigate what equipment, tools, and software were available to us. The following sections will introduce the overall goals of this thesis, the experiments conducted, and finally the challenges with rendering.

3.1.1 Goals

Our initial goal is to establish that we are able to use an IMU for tracking by creating communication between the IMU and a game engine. Following this, we need to investigate what information this IMU will give us, and how quickly and precisely it does so. Finally, we must test how the game engine handles this input and whether there are adjustments that need to be made or obstacles that need to be overcome. When this is done, measuring the capabilities of the IMU is necessary to quantify its limitations.

Subsequently, the game engine needs to be investigated: are there any AR frameworks available to us? Are there any dedicated engines for AR? These questions need to be answered, and in association with this, compatibility with AR headsets must also be explored.

When we are able to render into a headset with a compatible game engine, the input from the IMU needs to be handled. How do we handle rotation and movement of the user's head and represent this reliably in the game engine? Considering the limitations of the IMU operating alone, how to use optical tracking and computer vision with the game engine also needs to be researched. To be able to use optical tracking, there are two things that need to be done. First, find a suitable camera for the headset. Second, establish an efficient way of using computer vision within the game engine. For this to happen, the connection between camera and computer needs to be established, and computer vision needs to be integrated into the game engine. To tie all this together, the chosen AR headset must be assembled, and all its components need to be integrated.

3.1.2 Latency testing

An experiment will be conducted to investigate the delay between the IMU and the machine, as well as between the IMU and the game engine. First, the value of this is knowing how much latency there is in the communication alone between the devices we have selected: whether the speed is good enough or whether there are obvious limitations. Also, is the developed system able to track the user sufficiently on its own, or is motion prediction required in addition? As an example, a game knows where it is rendering, so if an event occurs in the upper left corner of the user's viewpoint, it is likely that the user will look in that direction. Any data that correlates with movement in this direction should be prioritised over data saying otherwise.

Second, what is good enough needs to be quantified. For that to be done, we need to explore how much latency users will tolerate before they do not accept the tracking, either from an unacceptable user experience or from more subtle discomfort. To be able to conduct such an experiment, a testing bed with a complete enough tracking solution needs to be developed. As will become apparent throughout this thesis, a complete tracking system was not possible to develop within our time-frame for various reasons.

3.1.3 Rendering

There are many game engines out there aimed at many different targets, be it platform, ease of use, or game style. This thesis needs a state-of-the-art game engine that is efficient enough to handle the low-latency nature of this project; capable enough to render in 3D; flexible enough to be adapted to our needs; and, crucially, available to us, preferably for free or with student licensing.

First, we need to establish that we are able to use external software with this engine, such as the Application Programming Interface (API) for the IMU and computer vision software. Secondly, any headset selected needs to be compatible with this engine, either with software developed by an existing AR framework or plugin, or by us. This project's experience is that there are many AR frameworks available, but most are not applicable for various reasons, be it compatibility with game engines, age, or usability.

3.2 Process

This section will describe the process of attempting to reach the goals outlined above.

1. Establishing a connection with the IMU. A small program was written in C to test the API and explore what data the IMU is able to give us.

2. Integrating the IMU into a game engine. Again, a small program was written as a proof of concept for incorporating IMU data in a game engine. We established communication and proved that the API could be compiled by the engine.

3. Writing code for the experiments and conducting them. The programs mentioned above were extended into solutions that would enable the delay experiments. Research into how an OS renders was conducted and exploited, and a simple game level was developed. Suitable equipment, such as a fast screen and a camera able to record at a high FPS, was procured, and the experiments were conducted.

4. Exploring AR frameworks. Before this project had access to an AR headset, various AR frameworks were tested and discarded for various reasons, mostly due to incompatibility with the selected game engine, lacking or complicated documentation, or because they targeted mobile devices. In the end, none were selected, as the headset came with its own plugin.

5. Testing various computer vision frameworks. We also knew that computer vision with OpenCV or an alternative marker detection framework would be necessary. Again, several frameworks were tested with the game engine, and many were discarded due to compatibility issues or complexity and time constraints. In the end, the plugin provided by the headset's manufacturer came with OpenCV integrated.

6. Constructing an AR headset and integrating its components. A headset was procured but was delayed due to force majeure. The headset needed assembly and testing with its plugin, and it was thankfully compatible with the code developed so far. In addition, the included plugin provided much of the necessary framework for further development. With the fully constructed headset, additions could be made to it, and the IMU and a camera were attached.

7. Developing tracking code. Now that a headset could be used to test the tracking, some progress was made with developing tracking code. Rotation through quaternion multiplication was finalised, and movement through linear acceleration was attempted. Using computer vision to identify markers and using this for tracking remains future work.

8. Ensuring the feasibility of future work. The final stages of this project were devoted to making sure that further development would be facilitated. The headset and its plugin, the tracking code developed so far, and the testing game level are all available to anyone as Open Source Software (OSS). A package may be downloaded and imported into the game engine, and further research into tracking may use this as a starting point.

3.3 Equipment

This section will introduce the various equipment needed for this thesis.

• AR headset. An AR headset is used for testing the tracking experience, and for understanding what needs to be implemented. The headset is discussed in Section 4.1.

• IMU. For fast motion tracking, an IMU needs to be attached to the headset. The IMU is discussed in Section 5.1.

• Camera. A camera enables optical tracking and is attached to the headset. Section 6.1 discusses the camera used.

• Equipment for the experiments. The delay experiments require the following equipment: a fast computer monitor, a camera capable of high-speed recording, and a tripod. These are discussed in further detail in Section 10.1.

• Game engine. A game engine is required to render the environment and evaluate the tracking. This is discussed in Chapter 7.

• Various software. An Integrated development environment (IDE) is beneficial for developing against the various big frameworks this project uses; a video editing suite is required for measuring the time in the delay experiments; a 3D modelling program is used for importing models into the game engine. All of these are explained in Chapter 8.

Chapters 4 to 8 will discuss the equipment needed in further detail. Chapter 9 will describe the assembly process for the AR headset.

Chapter 4

Augmented Reality (AR) headset

This chapter will in Section 4.1 discuss the headset selected for this project. First a brief overview is given, followed by short explanations of the various hardware components (Section 4.1.1) and software solutions (Section 4.1.2). Finally, some alternative headsets that were not used are discussed in Section 4.2. Why they were not selected for this thesis will also be explained.

4.1 Project North Star Modular AR Headset

Figure 4.1: North Star AR headset w/ calibration stand [20]

This is the AR headset we selected to work with for this project. It is a 3D-printed headset made from open source components and hardware readily available to the consumer. Originally designed by Leap Motion, they decided to make the schematics open source and put the design into the hands of the hacker community [83], in the hope of accelerating experimentation. It is intended to be used with the Leap Motion Controller hand tracking device [110], but as this was out of scope for this project, we did not include this device.

It features two 120 FPS screens at 1440×1600 Px resolution each, which are combined by a custom circuit board and presented to a computer as one screen. It uses reflectors which are capable of covering a 100° FOV. Most of the components are relatively easy to 3D-print and manufacture by anyone with access to the source code on GitHub [55], but some are special, including the two screens, the reflectors, the Display Driver Board, and the headband size adjuster. Some experience in 3D-printing is advisable if attempting to manufacture the headset from scratch. Furthermore, some of the components are hard to get, such as the screens and reflectors. We therefore selected to use a supplier, Smart Prototyping, which offers kits at several levels of manufacture [84]. At the time of writing, they offer the following levels of manufacture, where we selected Kit B:

• Kit A: no 3D-printed parts

• Kit B: all parts included, assembly required

• Kit C: assembled and calibrated

Kit B includes all the necessary parts and equipment to get an up-and-running headset, where all the 3D-printed parts are printed and all components included. They also include any tools needed, such as screwdrivers and a scalpel. Assembly instructions are also included.

For optimal use, a calibration stand is required, which can be seen in Figure 4.1 together with the headset. This calibration will compensate for minute differences in the manufacture of the combiner lenses [56]. However, in our experience, this was not necessary for testing. This headset was mostly painless to assemble, but we did have some issues with getting it to run. Please refer to Chapter 9 for more details.

4.1.1 Hardware

This section will present an overview of the hardware of the headset. Experiences and further details on how to construct the headset can be found in Section 4.1.3.

Screens

The headset uses two screens that project onto two combiner lenses, acting as one screen. They were designed by Leap Motion specifically for AR and the North Star headset [85] and are of quite high quality. Please refer to Table 4.1 for full technical specifications.

Display Driver Board

The two screens mentioned above are connected to a custom display driver board which is specifically designed for this headset and these screens. It acts as a motherboard by integrating a display driver and controller, and presents itself to the connected computer as one display [82].

Brand: BOE
Model No.: VS035ZSM-NW0
Displaying Method: Active Matrix TFT
Display Mode: Transmission mode, normally black
Screen Size: 3.5 inch
Video Resolution: 1440×1600 Px with 615 PPI
Frame Rate: 120 FPS
Active Area: 59.4 (H) × 66.0 (V) mm
Pixel Pitch: 13.75 (H) × 41.25 (V) µm
Display Colours: 16.7M colours, 8 bit
NTSC Ratio: 85%
Driver IC: NT57860
Interface: MIPI DSI (Video Mode)
Surface Treatment: HC, >= 3H
Display Outline Dimension: 62.2 × 73.6 × 1.7 mm
Display Weight: 17.0 g

Table 4.1: BOE VS035ZSM-NW0 technical specifications [85]

Input Connector: Mini DisplayPort
Output Connector: 2 connectors for Project North Star BOE Display
Video Resolution: 2880×1600
Frame Rate: 120 FPS (defaults to 90 FPS)
Dimensions: 127.9 × 27.2 mm
Weight: 10.8 g
PCB Thickness: 1.0 mm (new version)
Display Driver IC: Analogix ANX7530
MCU: Atmel ATSAMD21G18

Table 4.2: Display Driver Board technical specifications [82]

The board handles power and image connections to the headset through a couple of connectors. One is a male USB connector used only for power, so any 5 V power supply could, in theory, be used, but this has not been tested. The second is a female Mini DisplayPort connector.

There are a couple of things to note about this board. First, it is susceptible to power instability, which in our experience manifests itself as screen flickering. More on this in Chapter 9. Secondly, this board does not work with SteamVR due to the lack of the required Digital Rights Management (DRM) keys, as these are in general not distributed to open source projects [82]. The original intent of these keys is to prevent users from recording copyrighted content, such as films from streaming services. Please refer to Table 4.2 for technical specifications.

Coating stack: 50% reflective concave surface; broadband anti-reflective convex surface; oleophobic (anti-fingerprint) coating
Weight: 30.7 g / lens
Substrate material: Polycarbonate

Table 4.3: Combiner lenses technical specifications [81]

Combiner lenses

The combiner lenses let the user see the world through them, but also reflect the contents of the screens. Any portion of the screen that is black is not reflected on the lens, allowing the user to see through it. This, of course, limits the range of colours available to the user (imagine showing a black car on the road: the black would end up being see-through and show whatever is in the user's environment). Due to minute differences in manufacturing, the curve of the lenses might differ. These differences require calibration to be compensated for, and Leap Motion provides a way to do this [56]. Please refer to Table 4.3 for technical specifications.
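
Because anything rendered as pure black simply disappears on such an additive combiner, a Unity scene targeting this kind of headset is typically set up with a solid black camera background so that everything not explicitly rendered stays transparent. The following C# sketch illustrates this general technique; it is not taken from the North Star plugin, which handles the camera rig itself.

```csharp
using UnityEngine;

// General illustration: on an additive (reflective) AR display, black is
// effectively "transparent", so the camera clears to solid black and only
// the virtual objects appear over the real world.
public class AdditiveDisplayCamera : MonoBehaviour
{
    void Start()
    {
        Camera cam = GetComponent<Camera>();
        cam.clearFlags = CameraClearFlags.SolidColor;
        cam.backgroundColor = Color.black; // unrendered areas stay see-through
    }
}
```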

Other components

Apart from the 3D-printed components, which will be elaborated upon in Section 4.1.3, there are a couple of components included in our Kit B (Section 4.1) that are worthy of note:

• Headband adjuster. This piece of equipment is a welder helmet replacement part [70], allowing the user to adjust the fit of the headset.

• Metal bars. There are two metal bars that run along each side of the headset. These provide the main structural support.

• Various tools. The kit included various tools needed for constructing the headset. More on this in Section 4.1.3.

• Cables. DisplayPort to Mini DisplayPort cable and USB 3.1 extension cable.

4.1.2 Software

There are a couple of pieces of software available to the user which enable the use of the North Star headset. The project has only used the Unity 3D sample scene. Since we have not used the Leap Motion hand tracking module or the calibration stand, these will not be explained in any further detail. Following is a brief explanation of how to use the headset with a game engine.

24 Leap Motion North Star sample scene

Leap Motion provides a Unity 3D package (.unitypackage) that may be imported directly into Unity 3D [58]. This will enable the user to develop towards this headset from scratch with everything set up.

The steps are described as follows in the repo [58], but are replicated here for posterity:

• Make sure your North Star AR Headset is plugged in

• Create a new Project in Unity 2018.2 or above

• Import “LeapAr.unitypackage”

• Click on the ARCameraRig game object and look for the WindowOffsetManager component

• Here, you can adjust the X and Y Shift that should be applied to the Unity Game View for it to appear on the North Stars display

• When you’re satisfied with the placement; press “Move Game View To Headset”

• With the Game View on the Headset, you should be able to preview your experience in play mode!

Due to the limitations introduced by the OSS nature of the display driver board discussed in Section 4.1.1, the plugin requires the coordinates of the top-left pixel of the screen’s placement in the screen arrangement of the computer it is connected to. This is for the plugin to be able to render the game on the headset screens correctly.

As a final note, the plugin is available on Leap Motion’s GitHub [58], but is also forked for posterity and compatibility with this project if necessary [45].

4.1.3 Assembly

Leap Motion provides assembly instructions in the form of blueprint drawings [57]. These proved simple to follow, and assembly required two days in total, including connecting the headset to a machine and Unity. Figure 4.2 shows the headset in its final state with its IMU, camera, and USB hub.

In the kit that was procured, some tools were included: a screwdriver kit and a scalpel. It is worth noting that several of the 3D-printed pieces required adjustments, such as snapping off pieces left over from the 3D printing in the form of either support structures or extraneous material. Most notably, some spots needed to be cut with the scalpel, either to make parts fit or to remove said pieces.

Figure 4.2: North Star headset final assembly (front view)

Figure 4.3: Microsoft HoloLens 1 [67]

4.2 Alternatives

When this project started, we had to find a suitable headset to develop against. The following sections will briefly discuss some of the headsets we considered, their specifications and why they were not selected.

4.2.1 Microsoft HoloLens 1

The Microsoft HoloLens 1, as seen in Figure 4.3, was marketed as the world's first fully independent AR headset, released in March 2016. It is a complete solution for AR with tracking and an included Software Development Kit (SDK), and features an IMU, several cameras and microphones, an ambient light sensor, speakers, and more. It also features advanced functions such as gaze tracking, gesture input, and voice support [67]. Please refer to Table 4.4 for specifications. It is intended to be a complete solution for AR, where tracking is abstracted away from the developer. This project aims to create its own tracking solution, and it is unlikely that we would be able to access the raw data from its IMUs. Furthermore, it was not available for purchase when this project started.

Resolution: 2.3M total light points
Refresh rate: 60 Hz [69]
FOV: 30° [76]
Sensors: IMU; 4 cameras; a depth-sensing camera; and more
Capabilities: Gaze tracking; gesture input; voice support

Table 4.4: HoloLens 1 technical specifications [67]

Figure 4.4: Microsoft HoloLens 2 [93]

4.2.2 Microsoft HoloLens 2

The Microsoft HoloLens 2, seen in Figure 4.4, is a generational improvement over the HoloLens 1 discussed in Section 4.2.1. It was announced in early 2019 but was not made generally available before early 2020, and it tried to refine the experience by being more comfortable and immersive. In addition to improving on the functions of the first headset, this model also adds hand tracking [93]. Again, this solution is intended to be used as a complete package and is out of scope for this project, and again, it was not available for purchase in the planning phase of this project. Table 4.5 contains its specifications.

4.2.3 Meta 2

The Meta 2 headset, seen in Figure 4.5, was developed to be an alternative to the HoloLens 1 (Section 4.2.1), featuring similar capabilities such as hand gesture support and positional tracking through the use of optical and inertial sensors, including SLAM tracking at 400 Hz [68]. Please refer to Table 4.6 for specifications. This was a promising headset until the company went bankrupt in early 2019 [89].

Resolution: 2k 3:2 light engines
Refresh rate: 120 Hz [98]
FOV: 52° [107]
Sensors: IMU; 4 cameras; 2 infrared cameras; time-of-flight depth sensor
Capabilities: Eye tracking; hand tracking; voice support; spatial mapping

Table 4.5: HoloLens 2 technical specifications [93]

Figure 4.5: Meta 2 [68]

Resolution: 2560×1440 Px
Refresh rate: 75 Hz
FOV: 90°
Sensors: "Optical and inertial sensors for positional tracking"; SLAM @ 400 Hz
Capabilities: Hand gesture tracking
Software: OS: Windows 10; 3D engine: Unity 3D

Table 4.6: Meta 2 technical specifications [68]

Figure 4.6: Magic Leap [66]

Resolution: 1.3M Px per eye
Refresh rate: 120 Hz
FOV: 50°
Capabilities: 6-DOF controller; hand tracking; eye tracking
Software: Lumin OS (custom)

Table 4.7: Magic Leap technical specifications [66]

4.2.4 Magic Leap

Magic Leap, seen in Figure 4.6, was an ambitious crowd-funded headset which tried to do things a little differently, such as placing the processing power at the user's hip and providing options for people who wear glasses [66]. Again, a headset that sadly never got into the hands of users [90], and there have been no reports of its performance. Please refer to Table 4.7 for technical specifications.

4.2.5 Aryzon

The Aryzon AR headset, shown in Figure 4.7, is a cardboard construction meant to accept any smartphone that has support for AR, either through ARCore, which is developed by Google for Android phones, or ARKit, developed by Apple for iPhones [86]. Table 4.8 shows some specifications. We tested it with both an Android phone and an iPhone and deemed it surprisingly capable for its relatively low price, albeit somewhat uncomfortable and bulky in extended use. However, over time, the accuracy of this headset will degrade due to its cardboard construction.

29 Figure 4.7: Aryzon AR/MR [74]

FOV: 35°
Capabilities: Supports all phones supporting ARCore (Google) or ARKit (Apple)
Software: Aryzon OSS SDK

Table 4.8: Aryzon technical specifications [86]

In the end, the same conclusion was drawn for this headset as for all the others: tracking was already included through an SDK, defeating the purpose of this project.

Chapter 5

Inertial Measurement Unit (IMU)

As previously discussed, an Inertial Measurement Unit (IMU) is required for the precise and low-latency (i.e. fast) tracking of the user's head movement. An IMU is a device that detects movement and rotation through the use of accelerometers and gyroscopes, where a typical configuration is one accelerometer per axis to detect linear acceleration; one gyroscope for each plane to detect rotation; and often a magnetometer per axis to detect heading [52]. If each detection device per axis is considered one Degree of Freedom (DOF), an IMU that can detect rotation by having a gyroscope on each of the three axes in 3D space would be called a 3-DOF IMU.

These devices vary in precision, where each axis has its own precision, and there are a few known problems, especially concerning drift. Drift is where the output of the IMU suffers from either imprecise detection hardware, poor firmware, or just noise from the environment [78]. The implications of imprecise data will be further elaborated in section TODO. In many off-the-shelf, commercially available IMUs, the information about movement, rotation, and heading is combined into one signal, or data output, which is an estimate of the IMU's orientation [52].

The following sections will describe the IMU selected for this project: its specifications are described in Section 5.1, the tools and API in Sections 5.1.2 through 5.1.4, and finally, some alternative IMUs are discussed in Section 5.2.

5.1 Tinkerforge IMU Brick 2.0

When selecting hardware for a project, there are a couple of things one needs to consider: availability, suitability, and complexity. In essence: can one get the device, does it fit the project, and is it likely that one can use it? The first aspect was easy: we already had a Tinkerforge kit, including an IMU, in our lab.

Figure 5.1: Tinkerforge IMU Brick 2.0 [18]

Secondly, the IMU suits our project well, as it was considered precise enough for a proof of concept, although not necessarily the best option for a real, commercial solution. More on this in Section 5.2. Finally, Tinkerforge is known for their excellent documentation and ease of use. Consider how they describe themselves:

Tinkerforge is the affordable system of building blocks that will drastically simplify the implementation of your project. The modules are pluggable and they have an intuitive API that is available for many programming languages. Through the high degree of abstraction the actual purpose of the project comes into focus while the technical implementation becomes a side issue. Tinkerforge is open source and Made in Germany. ([109])

The following sections will go into more detail as to why this IMU was applicable.

5.1.1 Specifications

The Tinkerforge IMU Brick 2.0, shown in Figure 5.1, is equipped with a 3-axis accelerometer, magnetometer (compass) and gyroscope and works as a USB Inertial Measurement Unit (IMU). It can measure 9 Degrees of Freedom (DOF) and computes quaternions, linear acceleration, the gravity vector as well as independent heading, roll and pitch angles. It is a complete attitude and heading reference system [108]. Please refer to Table 5.1 for its technical specifications. It connects to the computer through USB, and Tinkerforge provides APIs in several languages [34] as well as other software solutions, discussed in Sections 5.1.2 through 5.1.4. This IMU was chosen for its low weight, small size and accuracy, and not least for its documentation. It uses a sensor developed by Bosch [108], the BNO055, which is marketed as the perfect choice for AR, immersive gaming and more [17].

Property                                        Value
Acceleration, Magnetic, Angular Velocity Res.   14 bit, 16 bit, 16 bit
Heading, Roll, Pitch Resolution                 0.0625° steps
Quaternion Resolution                           16 bit
Sampling Rate                                   100 Hz
Dimensions (W×D×H)                              40×40×19 mm
Weight                                          12 g
Current Consumption                             415 mW (83 mA @ 5 V)

Table 5.1: Tinkerforge IMU Brick 2.0 technical specifications [108]

Coincidentally, it is also almost the same size as the camera discussed in Section 6.1, making it very suitable for this project. There are three main tools used in this project: first, the daemon is installed to establish a connection with the IMU, then the Brick Viewer can be used to test the connection and verify that the device works, and finally the API is used to program against the device. The following sections will describe this in more detail.

5.1.2 Tinkerforge Brick Daemon

The Brick Daemon is a daemon (or service on Windows) that acts as a bridge between the Bricks/Bricklets and the API bindings for the different programming languages. The daemon routes data between the USB connections and TCP/IP sockets. When using the API bindings, a TCP/IP connection to the daemon is established. This concept allows the creation of bindings for almost every language without any dependencies. Therefore it is possible to program Bricks and Bricklets from embedded devices that only support specific languages, such as smartphones. ([32])

This daemon is easy to install on whatever OS is used for development, and installing it is the first step that should be taken.

5.1.3 Tinkerforge Brick Viewer

The Brick Viewer provides a graphical interface for testing Bricks and Bricklets. Each device has its own view that shows the main features and allows to control them. Additionally, brickv can be used to calibrate the analogue-to-digital converter (ADC) of the Bricks to improve measurement quality and to flash Brick firmware and Bricklet plugins. ([33])

The Brick Viewer is not necessary for the IMU to work, but it is a nice tool for testing the device.

5.1.4 Tinkerforge API

Tinkerforge supports many programming languages [36], including C/C++, C#, Java, and Python. It is highly recommended to look through the example code to get familiar with the API and run the equivalent "Hello World". This project has used the C and C# interfaces. Installation is as simple as downloading the source code and including the files as necessary in the applicable project [34]. One should particularly take note of the callback functions, as they have much better performance:

Using callbacks for recurring events is always preferred compared to using getters. It will use less USB bandwidth and the latency will be a lot better, since there is no round trip time. ([34])

What follows is a brief introduction to the main aspects of using the Tinkerforge API.

Overview

There are a couple of files that are common between our usage of the C and C# bindings:

• ip_connection.h or IPConnection.cs
• brick_imu_v2 or BrickIMUV2.cs

The former establishes the connection to any Tinkerforge Brick or Bricklet that uses an IP connection, while the latter exposes the specific features and methods available for the IMU. Please refer to the full documentation available at www.tinkerforge.com for a complete introduction.

Configuration and setup

When using the API, there are a couple of things one needs to configure. The following code snippets use the C# bindings.

private static string HOST = "localhost";
private static string UID = "64tUkb";
private static int PORT = 4223;
private static IPConnection ipcon;
private static BrickIMUV2 imu;

First, one needs to set the IP address and port number of the machine that the IMU is connected to, as well as the unique identifier (UID) of the device; this makes it possible to have multiple devices connected to the same machine. The default values for the IMU are shown above.

Property             Value
Resolution           <0.005° steps
Noise                0.001°/√Hz
Sampling Rate        1 Hz – 2000 Hz
Dimensions (W×D×H)   42×55×25 mm
Current Consumption  400 mW (80 mA @ 5 V)

Table 5.2: CTiSensors CS-IM200 IMU technical specifications [26]

Having the IPConnection and BrickIMUV2 variables available to the class is necessary for later calls in other methods. When one wants to establish a connection to the IMU, do as follows:

ipcon = new IPConnection();
imu = new BrickIMUV2(UID, ipcon);
ipcon.Connect(HOST, PORT);
imu.QuaternionCallback += QuaternionCB;   // QuaternionCB is the method that will handle the callback
imu.SetQuaternionPeriod(period);          // period in milliseconds

The first order of business is connecting the IMU with the included methods from Tinkerforge. Once that is done, since we are using callback methods for their performance, as mentioned earlier, we need to register which method will handle the callback [35]. Finally, the callback period, i.e. how often the IMU should report new values, can be set (in milliseconds).
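To tie these snippets together, the following is a minimal, self-contained sketch of a console program using the C# bindings, following the pattern documented by Tinkerforge. The UID is the placeholder value used above, the 16383 scale factor comes from the Tinkerforge documentation for the quaternion callback, and the 10 ms period matches the IMU's 100 Hz sampling rate; adjust these to your own setup.

using System;
using Tinkerforge;

class ImuQuaternionExample
{
    private static string HOST = "localhost";
    private static int PORT = 4223;       // default Brick Daemon port
    private static string UID = "64tUkb"; // placeholder: UID of your IMU Brick 2.0

    // Callback handler; the components arrive as scaled 16-bit integers,
    // divide by 16383 to get values in the range [-1, 1]
    static void QuaternionCB(BrickIMUV2 sender, short w, short x, short y, short z)
    {
        Console.WriteLine("w={0:F3} x={1:F3} y={2:F3} z={3:F3}",
                          w / 16383.0, x / 16383.0, y / 16383.0, z / 16383.0);
    }

    static void Main()
    {
        IPConnection ipcon = new IPConnection();
        BrickIMUV2 imu = new BrickIMUV2(UID, ipcon);
        ipcon.Connect(HOST, PORT);        // connect to the Brick Daemon

        imu.QuaternionCallback += QuaternionCB;
        imu.SetQuaternionPeriod(10);      // report a new quaternion every 10 ms

        Console.WriteLine("Press enter to exit");
        Console.ReadLine();
        ipcon.Disconnect();
    }
}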

5.2 Alternatives

Even though we had an IMU readily available, some research went into alternative IMUs. A brief discussion of the IMUs considered follows.

CTiSensors CS-IM200 IMU

The CTiSensors CS-IM200 is a highly industrial IMU, with a solid enclosure, water-resistant interfacing and high specifications [26], and an undisclosed price. Please refer to Table 5.2 for technical specifications. One could say that for this project, it is slightly too much.

Raspberry Pi Sense HAT

The Sense HAT is an add-on for the Raspberry Pi and features not only an IMU, but also a joystick and an 8×8 grid of Red Green Blue (RGB) Light Emitting Diodes (LEDs) [19]. This module is intended to be placed on a Raspberry Pi, which is not very applicable for natural reasons (wearing a Raspberry Pi is less than ideal).

Property             Value
Sampling Rate        250 Hz
Weight               3.5 g
Dimensions (W×D×H)   21×23×5 mm
Current Consumption  64 mA @ 5 V

Table 5.3: Osmium MIMU22BL IMU technical specifications [75]


Osmium MIMU22BL

The Osmium MIMU22BL is the same size as the Tinkerforge IMU and features an optional battery. It is listed as having an array of four nine-axis IMUs (accelerometers, gyroscopes, magnetometers), wireless communication and a barometer. Refer to Table 5.3 for specifications. It is also one of the thinnest packages out there [71]. However, it seems to be more oriented towards step counters, even if the data output, using their API, would be the same. To summarise, it was very comparable to the Tinkerforge IMU, but not so much better that it was necessary to replace it.

SparkFun 9DoF Razor IMU

The SparkFun 9DoF Razor IMU seems to be very competent, with performance comparable to the Tinkerforge IMU [75], and it is very well documented [3]. Most notably, the accelerometer and gyroscope may be polled at 1000 Hz, with the magnetometer at 100 Hz. However, this says nothing about its precision. It also uses the same processor as an Arduino Zero, which could prove beneficial for tuning the firmware to the needs of a specific project. However, all of this is somewhat out of scope, and the board seems more oriented towards hardware projects. In conclusion, neither this IMU nor any of the others provided anything radically different from what the IMU from Tinkerforge offered, and none could beat its ease of use. Furthermore, as is further elaborated in section TODO, Tinkerforge provides a C# API, which is a profound benefit when developing against the Unity engine.

Chapter 6

Camera

This chapter will discuss the two cameras considered for this project in Section 6.1: one USB camera and one Power over Ethernet (POE) camera.

6.1 Camera

As previously discussed, an IMU is good at quickly detecting the movements of a user's head but lacks spatial awareness. In essence, there is quite a lot of drift and noise in the input from the IMU, which needs to be compensated for. This is where the camera is needed. Recording frames and using them for spatial awareness is a slower process, but highly valuable. Sadly, due to time constraints and other factors, this topic was not explored as much as we had initially hoped. Nevertheless, this section will briefly discuss the two different cameras considered.

6.1.1 Universal Serial Bus (USB) Camera

The camera selected for this project was a simple USB camera bought online [7], which can be seen in Figure 6.1. The main goal was to find a camera with a USB 3.0 interface, allowing higher bandwidth and FPS than USB 2.0. In our experience, most commodity USB cameras found online are able to give 50 FPS at 1920×1080 Px of uncompressed video (which is beneficial due to latency concerns) over USB 3.0, compared to 30 FPS over USB 2.0. The maximum output of the Sony IMX291 sensor in this device is 1920×1080 Px @ 120 FPS [51].

Another reason for going with USB is the fact that there are already two other devices on the headset requiring USB, and adding this camera to the USB hub placed in the headset was trivial. However, as it turns out, there might be bandwidth issues with this solution, which is further discussed in Section 4.1.3.

Figure 6.1: Universal Serial Bus (USB) Camera

Figure 6.2: Basler Power Over Ethernet (POE) Camera [5]

Property       Value
Shutter        Global shutter
Resolution     2046×1086 Px
Frame Rate     50 FPS
Housing Size   42×29×29 mm
Weight         90 g

Table 6.1: Basler Ace acA2000-50gc technical specifications [5]

6.1.2 Power over Ethernet (POE) Camera

In our lab, we had some Power Over Ethernet (POE) cameras available, of type Basler Ace acA2000-50gc. Compared to the USB camera, this is technically a much better option due to its global shutter. A global shutter means that all pixels in a frame are captured simultaneously, in contrast to a rolling shutter, where pixels are captured over time. A rolling shutter often results in skewed images if the camera is in motion, or skewed objects if an object in the frame is in motion. A global shutter eliminates this issue, although motion blur is still, of course, a factor. Please refer to Table 6.1 for technical details. Basler also provides an easy-to-use API that could have been beneficial. However, due to the size and weight of the camera, placing it on the user's head turned out to be heavy and awkward. In hindsight, given the potential power or bandwidth restrictions of the USB hub used on our headset (discussed in Section 4.1.3), having the camera transfer data and receive power over a separate cable might have been beneficial.

Chapter 7

Game Engines

There are currently two game engines that dominate the world of game development and that are widely used for other commercial and non-commercial applications: Unity and Unreal. Both are actively developed commercially, but provide free licenses that make their non-commercial use highly attractive. They provide a better starting point for research than any other freely available game engine.

This chapter will discuss Unity in Section 7.1 by giving a short introduction to key elements of the engine, such as its editor and how to create custom code. Finally, Unreal is briefly introduced in Section 7.2.

7.1 Unity 3D

Unity 3D is the game engine selected for this project. Initially, this was due to the impression that Unity 3D was the better supported engine for the AR/VR frameworks looked into during the planning phase. Equally important, due to our lack of experience with game development, Unity 3D seemed to be the easiest to dive into as a new game developer. This impression was gathered from various online articles, forums, and anecdotes from fellow students. Further discussion of the similarities and differences between Unity 3D and the other game engine considered may be found in Section 7.2. Finally, and critically, the headset used (the North Star, Section 4.1) provides a ready-made package that can be imported directly into Unity 3D, drastically simplifying development. More on this can be found in Section 4.1.2.

During development, version 2019.4 Long Term Support (LTS) [103] was found to be compatible with everything we developed, and it is recommended to ensure that any code from this project works. That being said, there should not be any version-specific code in this project, so upgrading to the latest version should not be an issue.

What follows are some guides, steps and information deemed pertinent for replicating this project using Unity 3D.

Figure 7.1: Unity 3D interface overview

7.1.1 Overview

This section will attempt to give the reader a quick overview of Unity 3D. Please refer to Figure 7.1 when reading the following sections. For a complete introduction, do not rely on this section alone; instead, go online and find the latest guide on www.unity.com.

Project

The full project structure can be found in the bottom panel of the figure. This is where all Assets are listed. An Asset is any component of a Unity 3D project, be it code, 3D models or anything else. Any imported packages and Scenes (explained below) may be found here. Use this to navigate the file structure of the project.

Hierarchy

A Unity Scene can be imagined as a game level, where all its components, GameObjects, are listed in the panel to the left. Each component may inherit from another, which in effect means that if one GameObject is placed beneath another, actions applied to the upper object will also affect the object below. In Figure 7.1, the object ARCameraRig may be seen; when this object rotates, all objects below it in the hierarchy rotate in the same way, as if they were one object. Use this to structure the Scene.

This inheritance is created by, for example, dragging a GameObject or script onto another GameObject.
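The same parent–child relationship can also be created from a script; a one-line sketch (the object names are hypothetical):

// Makes childObject a child of parentObject, equivalent to dragging it in the Hierarchy
childObject.transform.SetParent(parentObject.transform);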

Inspector

The Inspector panel, found to the right in the figure, is used to manipulate the GameObject selected in the Hierarchy. What can be adjusted here depends on the object in question. As an example, if the object is a simple cube, its location, size and scale, as well as its material (and other aspects that are, for now, out of the scope of this project) may be modified. Use this to modify each object to the needs of the game. Here, any scripts currently attached to the inspected GameObject may also be found; any public parameters may be changed, and scripts may be disabled or enabled. An example of this is the WindowOffsetManager discussed in Section 4.1.2: this is where its parameters are modified, as can be seen in the figure. This is also where an object is given a Tag. The importance of tags in this project is twofold:

• MainCamera. This tag tells Unity that this is the main camera when the game is launched.
• Custom tags. New tags may be created by pressing Add. When such an object needs to be referenced in a script, the method GameObject.FindWithTag("") may be used; a short example follows below.
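As a small illustration of the custom-tag workflow mentioned above (the tag name "Box" is hypothetical and must first be created via Add in the Inspector):

using UnityEngine;

public class FindTaggedObject : MonoBehaviour
{
    private GameObject box;

    void Start()
    {
        // Looks up a GameObject carrying the custom tag "Box"
        box = GameObject.FindWithTag("Box");
    }
}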

Scene

The view in the middle, the one with a car in it, shows a graphical representation of the current Scene. Any object selected in the Hierarchy will be shown selected in the Scene view, and vice versa. The controls found at the very top either move or rotate objects, or the current viewpoint. Note that changing the viewpoint in the Scene will not change the viewpoint in-game. Use this to get a visual idea of how to build the Scene. In the figure, this view has been used to place the ARCameraRig in the front seat of the car.

Playback controls

At the very top are the playback controls. Use these to run the game inside the editor without compiling/building. Objects may be modified directly while the game is running, be it rotating or moving them or changing any parameters. It is also possible to pause the game while doing this. Please note that performance may be lower than expected when running in the editor.

7.1.2 Scripting

Unity 3D refers to all its custom code as "scripts", which in practice means code written in C#. The recommended way to edit a script is to open it from Unity (e.g. by double-clicking it), which launches the editor set in the Unity settings, Visual Studio by default.

This is recommended for the uninitiated, as the whole project is loaded into Visual Studio, giving the programmer access to all documentation. This also connects error messages in Unity and Visual Studio, for easier debugging. Finally, when returning to Unity, the editor will notice changes in any files and automatically attempt to compile the entire project, making the scripts ready to use. This update process includes all code, including external frameworks, if one wishes. This project has included files from the Tinkerforge API by merely placing them in the Assets folder. Another thing to note is that Unity also supports pre-compiled libraries, but this has not been used by this project. It could, however, be of relevance if, e.g., a proprietary driver is necessary in future work.

What follows is the basic structure of a Unity script; its components are explained below.

using UnityEngine;
using ExampleNameSpace;

public class ExampleClass : MonoBehaviour
{
    public int example;

    void ExampleMethod(int example) { ... }

    void Update() { ... }

    void Start() { ... }

    void OnApplicationQuit() { ... }

    void OnGUI() { ... }
}

MonoBehaviour

MonoBehaviour is the base class from which every Unity script derives. This tells the engine and the IDE which methods are available to use. Also take note of the using <...> statements, where namespaces are added (think of them as imports in C#). UnityEngine is required here, and an example for this project would be adding Tinkerforge to use its code. More on this later.

ExampleMethod()

An example of a normal method that can be created in the standard C# way. Another thing to note is the example variable of the class. Since it is public, this variable can be changed in the Inspector tab in the editor. If it has any other access modifier, such as private, it will not be shown in the editor.

Start()

This is a special Unity method. When a GameObject with this script attached is initiated, i.e. loaded in a Scene, this method is called. Think of it as a sort of constructor; this project uses it to establish contact with the IMU, save references to objects that are needed later, and so on. In contrast, OnApplicationQuit() is called when the game quits, which is useful for gracefully terminating connections to, say, the IMU.
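A minimal sketch of this pattern, assuming the Tinkerforge files have been placed in the Assets folder as described above (the host, port and UID are the same placeholders as in Section 5.1.4):

using UnityEngine;
using Tinkerforge;

public class ImuLifecycle : MonoBehaviour
{
    private IPConnection ipcon;
    private BrickIMUV2 imu;

    void Start()
    {
        // Runs when the GameObject is loaded in the Scene: establish the IMU connection
        ipcon = new IPConnection();
        imu = new BrickIMUV2("64tUkb", ipcon);
        ipcon.Connect("localhost", 4223);
    }

    void OnApplicationQuit()
    {
        // Runs when the game exits: gracefully terminate the connection
        ipcon.Disconnect();
    }
}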

Update()

The most crucial method here is this one. It is called once per rendered frame. Anything done here will delay the frame until every instance of this method has completed. Note that the time between calls is not fixed, so if anything needs to be done based on time, there are facilities for this, such as Time.deltaTime. In our project, this is where the rotation of the user's camera is calculated, for example.
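A generic illustration of frame-rate-independent work in Update(), using Time.deltaTime (the rotation speed is arbitrary and not taken from this project's scripts):

using UnityEngine;

public class SpinExample : MonoBehaviour
{
    public float degreesPerSecond = 90f;   // editable in the Inspector, since it is public

    void Update()
    {
        // Scale by the time elapsed since the last frame so the speed
        // does not depend on the current frame rate
        transform.Rotate(0f, degreesPerSecond * Time.deltaTime, 0f);
    }
}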

OnGUI()

A practical method to take note of, as it draws static elements directly on the screen in-game. Useful for printing metrics or other overlays. Used in this project for debugging.

7.2 Unreal Engine

Unreal Engine is another option worth considering. It offers much of the same as Unity, and both are free to use [113]. Unreal is developed in C++, and it is possible to access the engine's full source code [114]. One key difference between the engines is their respective

programming paradigm. While Unreal is used as a framework within a program, Unity is a program that may be extended by the programmer's contributions. According to anecdotal information gathered online and from fellow students, the general impression is that while both engines offer most of the same things, Unity is the easier one to get into, while mastering Unreal can produce a much more graphically impressive result. As discussed in Section 4.1, the North Star headset only has support for Unity. A lucky coincidence, as the headset was procured quite late in the project's timeline. However, if one wants to replicate this project without the North Star headset, the fact that Tinkerforge offers an API in C++, and that OpenCV is written in C++, makes Unreal worth considering, particularly if one has prior experience with it. As of now, even though the hand-tracking module by Leap Motion has not been used, its Unity plugin offers several benefits if one does use the North Star headset, such as calibration for the reflectors. Therefore, in the end, Unity is the more applicable game engine of the two.

Chapter 8

Other Software

8.1 Visual Studio

Visual Studio 2019 is the Integrated Development Environment (IDE) usually installed alongside a Unity 3D 2019.x installation, and it is used as the default editor for all scripts (see Section 7.1 for an explanation). When a script is opened from Unity, Visual Studio is launched and takes care of all linting and IntelliSense (Microsoft's code completion).

This IDE is highly recommended, as it has helped this project along by suggesting code that is core to the Unity game engine. However, the user may want to substitute another IDE, where VS Code is an option with a large user base. The default editor can be changed in the Unity settings.

8.2 Kdenlive

Kdenlive is an acronym for KDE Non-Linear Video Editor. It is primarily aimed at the GNU/Linux platform but also works on BSD and MacOS. It is currently being ported to Windows as a GSOC project. ([4])

This editor was used during the delay experiments, where the number of frames between the start and end of an action is needed to calculate timing. Please refer to Section TODO for more detailed information.

Kdenlive was selected due to its simplicity, availability and cross-platform compatibility. However, any other similar editor will, of course, be sufficient.

8.3 Blender

Blender is a 3D-modelling and rendering tool that is natively supported by Unity 3D. This project has used it to a very limited degree, for importing 3D models downloaded from the internet into Unity for testing. To import a model from Blender into Unity, place the .blend file in the Assets folder of the Unity project [111]. Please note that all colours and textures need to be placed onto the model manually afterwards.

Chapter 9

Integration

This chapter will describe the process of integrating the added components with the AR headset. As explained in Section 4.1.1, the headset comes with a Display Driver Board requiring power over USB and image over DisplayPort. When adding two more components to the headset, running two more cables could become cumbersome for the user. Furthermore, the added hardware should be placed on the headset in such a way that it serves its purpose while being securely attached and convenient for the user. The following sections will illustrate this process.

9.1 Mount

As seen in Figure 4.1, there is a space on top of the headset where the Leap Motion hand-tracking module is intended to be placed. As this module was out of scope for this project, the space was used as a location for our attachments. To avoid modifying the headset directly, a mounting bracket/adapter was manufactured. This was beneficial for two reasons: compatibility with the hand-tracking module down the line, and cost; if the headset were damaged, it would be a major setback.

Figure 9.1: North Star headset attachments

Figure 9.2: North Star headset (rear view)

As the area designed to accept the hand-tracking module was free, and holes for screws had already been incorporated into the 3D-print, it was natural to mount our equipment here as well. The result may be observed in Figure 9.1, where the adapter may be seen as a flat piece of plastic between the headset and the attached IMU and camera. The plastic may come from anything; in this case, it was the lid of a food container. The lid was cut by hand with scissors, and its size was measured by eye. Finally, the adapter was attached to the headset with the included screws that were meant to hold the hand tracker. This was at first intended as a temporary solution while a bracket was designed and 3D-printed. However, it turned out to be so robust that the time required for manufacturing a custom 3D-print was deemed non-essential, and the idea was dropped. The adapter turned out to be simple, of low cost and more than stable enough for this project.

9.2 USB hub

As previously mentioned, running the headset alone requires one USB extension for powering the Display Driver Board (DDB) as well as one DisplayPort cable for image. Adding the IMU and camera to this setup would mean four cables running from the headset. This was cumbersome in testing, both in weight and in the user's freedom to move. Therefore, a USB hub was purchased and attached with cable ties to the back of the mounting bracket mentioned in Section 9.1. It is a USB 3.0 hub with four ports and a theoretical transfer speed of 5 Gbit/s [80]. It needed to be USB 3.0 to be compatible with the camera selected, described in Section 6.1. From this, the provided USB extension cable was connected to the output connector, and the Display Driver Board, IMU, and camera were connected to the hub. The result may be observed in Figure 9.2, which also shows how the IMU, camera and DDB are connected.

Device   Power consumption       Bandwidth requirements
IMU      83 mA @ 5 V [108]       Unknown, assumed low
Camera   Unknown, assumed low    208 MB/s [72]
DDB      Unknown, assumed high   None

Table 9.1: Power and bandwidth requirements of the headset attachments

In use, the headset felt no different than without the attached equipment, and there are still only two cables running from it. In theory, this was a good idea, and with the facts known to this project and our assumptions compiled in Table 9.1, it seemed within reason. However, due to the sensitivity of the DDB, previously mentioned in Section 4.1.1, it seems that when the camera is connected, the DDB glitches out. This manifests as flickering. With only the IMU connected and no camera, the board operates fine. Whether this is because all three connected devices pull more than the 900 mA maximum power draw of USB 3.0 [112], or because the USB hub fails to deliver clean or sufficient power, is unknown and is a potential topic for future work.

9.3 IMU

With the mounting bracket in place, it proved easy to install the IMU onto the headset. The Tinkerforge kit included mounting hardware, which was used to attach the IMU. Four holes were drilled into the mounting bracket; the spacers were attached to the bracket with screws, and the IMU was attached to the spacers with similar screws. The spacers are needed to accommodate the height of the IMU. See Figure 9.1 for the result. The IMU connects over mini-USB, and a short cable was purchased to connect it to the USB hub. The resulting angle of the IMU on the headset should not be of any consequence, since the rotation tracking is based upon finding the rotation from one orientation to another. In other words, the rotations are relative, and not based upon the absolute orientation of the IMU.

9.4 Camera

The USB camera was attached to the headset in a similar way to the IMU, and may be observed in Figure 9.1. Due to the angle of the bracket in the headset intended for the hand-tracking module, and the USB header on the camera motherboard, the camera was mounted upside-down. This should not be an issue, as the image may be flipped in software later, but this has not been tested due to the time constraints of this project. Similarly, the exact camera angle that provides optimal tracking has not been

determined. Therefore, the camera was attached to the headset with cable ties rather than screws, to maintain room for adjustment. The USB header may also be removed, which would allow a custom cable that is shorter and has a smaller footprint. This might be beneficial in future work when adjusting the camera's angle. Currently, the provided cable, which is quite long, is bunched up next to the USB hub it is connected to.

9.5 Cables

The cables used are quite important and were a source of trouble for this project. The first computer used with the headset was an Intel NUC, which only had mini DisplayPort connections, the same as on the DDB. Finding a compatible cable turned out to be difficult, and in the end, a high-end DisplayPort 1.4 cable was purchased to ensure that the high-refresh-rate, high-resolution screens would run natively. This standard supports 4K (4096×2160) @ 120 FPS [2], which is enough for the screens in the headset, which run at 2880×1600 @ 120 FPS. With older cables, neither the native resolution nor the refresh rate could be set, due to bandwidth constraints of the older standards. When running the headset on a regular desktop machine, the included mini DisplayPort to DisplayPort cable works well. The second issue is related to the USB cables and USB hub. As mentioned in Section 9.2, connecting everything through a USB hub was beneficial for the headset's usability but turned out to be a source of trouble. The headset included a USB extension cable that claims to be USB 3.0, but this has not been confirmed. One should ensure that any extension cables used are USB 3.0 or newer, for two reasons:

1. Power delivery. USB 3.0 can power devices with up to 900 mA maximum power draw [112], while USB 2.0 can only deliver 500 mA [115].
2. Bandwidth. USB 3.0 can provide the necessary bandwidth of 312 MB/s for 1920×1080 @ 50 FPS [72][112], while USB 2.0 can only transfer data at 60 MB/s [37].
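As a rough sanity check of the bandwidth figure, assuming uncompressed 24-bit RGB video at 3 bytes per pixel:

1920 × 1080 Px × 3 B/Px × 50 frames/s ≈ 311 MB/s

which is far beyond what USB 2.0 can sustain, but within reach of USB 3.0.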

9.6 Room for improvement

Considering the issues elaborated upon in the previous sections, there is room for improvement. Here are some suggested avenues worth exploring should future work include the same equipment:

• Troubleshoot USB issues. The first step is to make sure that everything runs reliably on the headset. For this, one could explore changing the USB extension cable to one of higher quality. In the event this is not enough, one could change the USB hub, possibly to a powered one with its own power supply, at the cost of an extra cable. A final option is to run the camera on its own cable, as the power and bandwidth needs of the IMU and DDB seem to work well with the existing USB hub.

• Camera orientation and mounting. Due to the limited amount of research that has gone into using the camera with this headset, there are probably adjustments that need to be made to the mounting of the camera. One should consider changing the cable it uses, perhaps by shortening it, and adapting the connector so that the camera sits more flush with the device in the correct orientation, so that the image does not have to be flipped in software.

• Mounting. The current mounting is primitive but effective. It is, however, currently a bit difficult to attach or remove the attachments. A 3D-printed housing would not only be practical and more stable, but would also look better.

Part III

Development


Chapter 10

Latency

Considering the latency aspect of this project, there is significant value in finding the delay between physically moving the IMU and a program registering this movement, i.e. the delay between the IMU and the machine. This latency may be an issue for users, and as such, it is worth measuring this delay and possibly finding a limit to what a user may tolerate.

As established in previous sections, the IMU has a reporting rate of 100 Hz, and the North Star headset has screens that run at 120 Hz. If the latency were below 8.3 ms (one refresh period at 120 Hz), no delay would be observable by the user, as this is as fast as the screen can refresh. This would be the best-case scenario. However, some delay is to be expected, and these experiments attempt to quantify the amount. The following sections describe the process. First, Section 10.1 introduces the equipment needed to conduct these experiments. Second, Section 10.2 estimates the time between the IMU and a straightforward program written in C. Finally, Section 10.3 does the same within a simple Unity game. From this, the delay introduced by the Unity engine may be estimated.

10.1 Equipment

To be able to conduct and replicate this experiment, a camera capable of high-speed recording is needed, as well as a high refresh rate screen, and finally suitable equipment for filming. The following sections will describe in detail which equipment we used.

10.1.1 Huawei P30 Pro

For the delay experiments, a high-FPS camera is needed. A Huawei P30 Pro was used, simply because it was at hand, and due to it having a camera capable of shooting at 960 FPS in short bursts [49]. Any other camera capable of shooting at high FPS would be sufficient to replicate the experiments as long as one accounts for any difference in FPS.

10.1.2 Asus VG24QE

For the delay experiments, a high-refresh-rate screen is needed. An Asus VG24QE was selected, as it was at hand and is capable of refreshing at 144 Hz with a 1 ms pixel Gray-To-Gray (GTG) response time [13].

Since the screen refreshes at a faster rate than the IMU reports (100 Hz; refer to Section 5.1.1 for more details), one can mostly eliminate delay caused by waiting for the screen to refresh. Furthermore, due to its low GTG response time, any one pixel will take at most 1 ms to change from one colour to the other. Many modern screens at the time of writing have a refresh rate of 60 Hz and a GTG response time of about 4 ms, such as a screen selected at random, the Samsung S24F354 [92], and response times of up to 58 ms have been measured on some laptops [88].

10.1.3 Tripod

A tripod of type EXELAS Model LA-23B, which was in my possession, was used for filming during the delay experiments. Any tripod or similar setup that allows the scene to be filmed would be sufficient for replication.

10.1.4 Phone holder

Since a mobile phone was used to film the experiments, a device that holds the phone securely on the tripod is required. This project opted to purchase a phone holder of the brand Linocell that mounts onto the standard camera fastener of the tripod [62].

10.2 USB stack

To test the latency between the IMU and a program, we set up an experiment with a fast screen and a simple program, where newly registered data from the IMU changes an element on-screen. By recording this with a high-framerate camera, we can count the number of frames between the first frame where the IMU moves and the first frame where the on-screen element changes. First, Section 10.2.1 describes how to set up the experiment, Section 10.2.2 briefly explains the software developed, Section 10.2.3 goes through how to conduct the experiment, and finally, Section 10.2.4 presents the results.

10.2.1 Setup

Using the aforementioned tripod and phone holder, the camera was placed in front of the IMU and screen, using an old computer to bring the IMU closer to the top of the display. Please refer to Figure 10.1 to see how it was set up.

Property   Value
IMU        Tinkerforge IMU Brick 2.0
Screen     Asus VG24QE
Recorder   Huawei P30 Pro
Computer   Intel NUC, Intel Core i7, 16 GB RAM
OS         Ubuntu 18.04 LTS
Other      Tripod, phone holder

Table 10.1: Equipment overview

Figure 10.1: USB latency test, an overview

Using the Huawei P30 Pro as a camera, the default camera app allows the user either to start recording immediately, which proved difficult to time correctly, or to set an area in the viewfinder that the camera monitors for movement. When the camera detects movement in that area, it stores its rolling buffer and creates a clip. This configuration proved easy to employ by mounting a contrasting piece of tape on the IMU, resulting in clips that proved very useful. Figure 10.2 shows this green tape, and the resulting clips may be found in Appendix A.4.

10.2.2 Software

A small piece of software was written in C to keep it as small as possible. It uses the Tinkerforge API, described in Section 5.1.4, and the source code may be found in Appendix A.1 in the file test_framebuffer.c. What follows is a brief description of the structure of the code.

Program structure

The main principle of the program is to edit the framebuffer directly, bypassing the need to understand and develop any graphical interface and, most importantly, bypassing any delay originating from the X server (or Wayland, an alternative to X). This, however, creates a permissions issue. By default, only users in the group video (or root) may access the framebuffer directly by writing to /dev/fb0. Therefore, anyone wanting to run the program should add their user to video, as running the program as root could potentially break the OS. Instructions for how to do this are found in the code, or presented to the user if the required permissions are not in place.

The program's main method runs a couple of tests that ensure the correct permissions, then generates some files. Finally, it establishes a connection with the IMU and performs the experiment. The files generated are simple files representing zero and one binary values that are interpretable by the framebuffer. Say, if this command is run:

$ dd if=/dev/zero > /dev/fb0

the screen would be completely blacked out, and the command would error out once the framebuffer device is full. This program only generates a short bar on the screen, as this is sufficient for testing. If the screen fills with a random colour pattern, everything is set.

Finally, the program establishes a connection with the IMU and registers a callback method. This callback method accepts the quaternion values from the IMU and does one simple thing: if the value of the w-component of the quaternion differs from the last one, meaning the IMU has moved, it checks which colour is currently on-screen, flips it, and stores the new value. From this, the latency between the IMU being moved and the program registering it may be observed.

A note about the callback method: as described in Section 5.1.4, it is possible to select the rate of callbacks.

Figure 10.2: USB latency test, detail

As established in Section 5.1.1, the IMU has a refresh rate of 100 Hz, or in other words: every 10 ms there is a new value. As it turns out, the Tinkerforge API allows us to set the callback period to 1 ms, an effective polling rate of 1000 Hz. This is beneficial as it increases the precision of the measurements: we get new values from the IMU as fast as possible, waiting at most 1 ms for a new value to be registered. With a polling period of 10 ms, the worst case would be that the callback returns at T = 0 ms, the IMU registers a movement at T = 1 ms, but the callback does not return again until T = 10 ms, adding a 9 ms delay to that measurement.

10.2.3 Conducting the experiment

First, ensure that the program runs in a UNIX-based environment, as this is required. The program has been tested on Ubuntu and Arch Linux and should not require any special dependencies. Secondly, compile it with $ make and make sure that the IMU is connected and powered; for more information about the Tinkerforge IMU and how to connect it, refer to Section 5.1.4. Third, the program will check whether it is running in a TTY and quit otherwise. This is to ensure no interference from a display server, such as X or Wayland, as we have no control over what these will update in the framebuffer, or when. Furthermore, to the extent of our knowledge, there exists no easy way to directly manipulate the framebuffer in Windows, which is why a UNIX environment was selected. Most UNIX-based OSes run a couple of TTYs, where usually the login manager lives in the first and the desktop environment in either the second or seventh TTY, depending on the distribution. To switch to another TTY, use the keyboard command ctrl + alt + F<n>, where <n> is the corresponding number on your keyboard; TTY 1 therefore lives behind F1. Once that is done, you may run the program with $ ./test_fb. It will perform the tests mentioned above and finally start showing the white/black alternating bar at the top of the screen in response to registered IMU movement.

To conduct the experiment, first verify that the program works by moving the IMU and observing that the bar flashes. Then place the IMU on a stationary surface and make sure the bar does not change. Depending on the setup, make sure that the high-framerate camera captures the full event, from the IMU starting to move until the bar changes colour. With the setup complete, start the recording and give the IMU a "scientific" poke. This was done by pushing the cap the IMU rests on sideways, which gives a clear indication on video of when the IMU first moved. As we are only interested in when the first movement is registered, the distance and speed of the motion were not controlled. The motion was not rotational, but since we were listening for quaternions, this could have been a source of error and is a suggestion for further investigation. The video may now be used to count frames, as explained below.

10.2.4 Results

With the setup and testing program in place, the resulting clips must be interpreted. By loading them into a video editor such as Kdenlive, described in Section 8.2, one can traverse the video frame by frame. For this, ensure that the editor is running with the same profile as the output clip from the recording device; in our case, the resulting clips run at 30 FPS. If there is a mismatch between profile and clip, stepping through frames would be mismatched, resulting in incorrect calculations. Now, after loading the clips into the program, take note of the first frame where the IMU moves. The usual convention for noting frames is <second>:<frame>. 00:00 would therefore signify the first frame of the video, and 00:30 would signify the 30th and last frame of the 1st second of video in our 30 FPS example. Another example: 02:03 would be the 3rd frame of the 2nd second of film. In seconds, frame 01:15 would therefore be one and a half seconds into the video. The results from the clips can be found in Table 10.2. The first two columns show the start and end frames in the notation mentioned above, with the number of frames between them in the third column. The final column shows the calculated delay, which is computed with the following formula:

T_d = delay (ms)
f_# = number of frames (output clip)
F_s = framerate of the high-speed recording

T_d = (f_# / F_s) × 1000

As an example, where a delay of 0.5 seconds in real time is measured:

f_# = 480 frames = 16 s × 30 fps
F_s = 960 fps (high-speed recording)

T_d = (480 / 960) × 1000 = 500 ms

Run #  Start (s:f)  End (s:f)  Frames #  Delay (ms)
01     03:01        04:16      47        48.96
02     02:15        03:18      63        65.62
03     03:03        04:09      42        43.75
04     03:14        04:12      56        58.33
05     03:26        05:02      88        91.67
06     05:02        06:24      56        58.33
07     02:07        04:04      71        73.96
08     05:13        06:19      62        64.58
09     01:01        02:28      59        61.46
10     01:25        03:02      87        90.62
avg                            63.1      65.73
min                            42        43.75
max                            88        91.67

Table 10.2: Delay: USB-stack

Or in other words, every 32 seconds of output film represents 1 second in real time. A simple script is provided in Appendix A.1, calc_times_from_frames.py, which does this calculation for you. It expects a file where each line has the structure xx:xx xx:xx, representing the second:frame of the first and last frame of interest in a clip. It will print the delays and the average, or generate LaTeX code for tables or graphs. Figure 10.3 shows a visual representation of the data, with uncertainty and average plotted. There is 2 ms of uncertainty, from the 1 ms GTG pixel response time of the screen added to the 1 ms callback period (1000 Hz polling rate) of the IMU. The average measurement of 65.73 ms is shown as the horizontal bar. At first, the data might seem quite noisy, which, to some degree, it is. However, please consider that this experiment attempts to capture the real-world delay in a typical OS with a typical workflow. Therefore, a standard Ubuntu install was used, albeit a clean one without any other programs running at the time of the experiment. The reasoning behind this is that we want to see how CPU scheduling and normal operation might affect the delay of capturing data over the USB stack.

10.3 Game engine

Now that the estimated delay between the IMU and the OS has been established, the next step is to find the amount of delay the Unity 3D game engine introduces to the pipeline. This is done similarly to the USB experiment in Section 10.2: a camera records the movement of the IMU while an object on-screen, in-game, is updated.

65 90

80

70

60 Delay (ms)

50

40 1 2 3 4 5 6 7 8 9 10 Run #

Figure 10.3: Delay measurements USB stack

Figure 10.4: Unity 3D latency test, detail

What follows is a report of the setup for this experiment in Section 10.3.1, followed by a brief explanation of the software developed in Section 10.3.2; Section 10.3.3 briefly explains how to conduct the experiment, and finally, the results are discussed in Section 10.3.4.

10.3.1 Setup

This experiment uses the same setup reported in Section 10.2.1, with slight modifications. This time the whole screen is placed in the frame, both so that some statistics from Unity can be seen and because the game runs in full screen. Please refer to Figure 10.4 for a reference frame from the resulting clips. A yellow box is shown, which means the program is ready and reset. When the game registers movement from the IMU, the box turns green.

10.3.2 Software

Similarly to the USB stack experiment, some software had to be developed that binds the Tinkerforge API to Unity 3D. For a brief introduction to how Unity works, please refer to Section 7.1. In the Unity package, which may be found in Appendix A.4, there is a Scene called "box" that shows the setup of the experiment. Attached to the Camera object are a couple of scripts related to the experiment:

• Snappy Tracker.cs The script that contains the code for the experiment.
• FPS.cs A script that shows the current refresh rate of the running game.
• Quit.cs A script that allows the user to quit the experiment by hitting the esc key.

What follows is a short description of the Snappy Tracker.cs script.

Program structure

For an introduction to the basic structure of a Unity script, please refer to Section 7.1. The Start() method finds the box in the scene by its tag and stores it for later reference. The box is then coloured red to show that the script has been called. The script then establishes a connection with the IMU, and if that fails, the box remains red. A callback method is set up with a polling rate of 1000 Hz, as before, using the method QuaternionCB(), which stores the latest quaternion in a Quaternion variable independently of the game engine. When the connection has been successfully established, the box will turn green as the IMU is moved. This change is handled by the Update() method which, similarly to the USB experiment, checks whether a component of the quaternion is the same as before and, if not, changes the colour of the box and stores the value. It also monitors the spacebar key, which resets the view and the colour of the box.
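A condensed sketch of this logic is shown below. It is not the full Snappy Tracker.cs from the appendix: member names other than QuaternionCB() are illustrative, the "Box" tag is an assumption, and the Tinkerforge connection code from Section 5.1.4 is elided.

using UnityEngine;
using Tinkerforge;

public class SnappyTrackerSketch : MonoBehaviour
{
    private Renderer boxRenderer;
    private float lastW;          // last w-component acted upon in Update()
    private volatile float w;     // latest w-component written by the IMU callback

    void Start()
    {
        GameObject box = GameObject.FindWithTag("Box");
        boxRenderer = box.GetComponent<Renderer>();
        boxRenderer.material.color = Color.red;   // red until the IMU connection succeeds
        // ... connect to the IMU and register QuaternionCB with a 1 ms period here ...
    }

    // Tinkerforge quaternion callback; components arrive as scaled 16-bit integers
    void QuaternionCB(BrickIMUV2 sender, short qw, short qx, short qy, short qz)
    {
        w = qw / 16383f;
    }

    void Update()
    {
        if (!Mathf.Approximately(w, lastW))
        {
            boxRenderer.material.color = Color.green;    // movement registered
            lastW = w;
        }
        if (Input.GetKeyDown(KeyCode.Space))
        {
            boxRenderer.material.color = Color.yellow;   // reset: waiting for movement
            lastW = w;
        }
    }
}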

10.3.3 Conducting the experiment

With everything ready, set up the camera and screen in a similar manner as described before, connect the IMU and launch the game. The scene may either be imported from the Unity package or built with the File -> Build & Run menu command, or one of the pre-built executables for Windows, Mac or Linux available in Appendix A.4 may be used. If the box has turned green and movement is observed, set the IMU on a stationary surface. Once this is done, press spacebar, which resets the view and turns the box yellow, meaning the program is waiting for movement. The setup should now be complete. Initiate the recording and move the IMU. The in-game camera will move, and the box turns green. The clips needed should now be available for analysis.

Run #  Start (s:f)  End (s:f)  Frames #  Delay (ms)
01     02:29        04:28      117       121.88
02     02:24        04:20      104       108.33
03     01:09        03:10      79        82.29
04     03:08        05:06      74        77.08
05     04:11        06:11      82        85.42
06     04:00        05:22      52        54.17
07     06:00        08:12      72        75.00
08     05:02        06:28      60        62.50
09     03:26        05:26      112       116.67
10     04:19        06:19      98        102.08
11     05:03        07:06      69        71.88
12     02:22        05:02      114       118.75
13     04:00        06:04      64        66.67
14     05:27        07:29      116       120.83
15     06:00        08:03      63        65.62
16     05:00        07:02      62        64.58
17     04:00        06:00      60        62.50
18     03:26        05:26      112       116.67
19     05:02        07:05      67        69.79
20     04:20        06:25      105       109.38
avg                            84.1      87.60
min                            52        54.17
max                            117       121.88

Table 10.3: Delay: Unity 3D

10.3.4 Results

With the recorded clips, load them into a video editor and take note of the start and end frames as before.

The results from this project's experiment can be seen in Table 10.3, which has the same structure as Table 10.2.

The corresponding graph may be seen in Figure 10.5. As may be observed, this data also seems quite noisy. Again, this is intentional, as we want to see the effect of running the game in a real-world situation. This experiment was run on the same machine and OS as the USB stack experiment. Considering the average delay of 87.60 ms in Unity compared to the average of 65.73 ms for the small program, it seems that Unity 3D adds roughly 22 ms of latency to the pipeline.

Finally, considering the frame time of 8.3 ms at 120 FPS, we are 87.6/8.3 = 10.6 frames on average behind the IMU.


Figure 10.5: Delay measurements Unity 3D

10.4 Concluding thoughts

We have now estimated the delay to be 65.7 ms in a simple application and 87.6 ms in-game, but without context, these numbers do not mean much. The question that needs answering is whether these numbers affect the usability of the tracking system. However, due to the late arrival of the headset in this thesis' timeline, there was no time or opportunity to conduct a user-experience experiment. There is some anecdotal literature about how much latency users will tolerate [1], but the general impression is that this is highly personal. Some people are susceptible to latency; some are not. In my personal experience, this also depends on practice and on what a person is used to: if someone who is used to a low-latency experience is exposed to high latency, they may notice it sooner and tolerate it less than someone who is not used to low latency. When discussing latency with fellow students and friends, their perception of latency varied a lot. Some do not notice the latency of a typical 60 Hz screen in a fast-paced action game, while others require screens running at 240 Hz to be happy. At the same time, many games on consoles such as PlayStation and Xbox run at 30 FPS in my experience, and many are content with that. Of course, this example is not related to AR, but it is relevant to the general awareness of latency. One paper mentions 50 ms as being responsive but noticeable in VR and recommends a latency of less than 20 ms, but neither of these numbers is backed by data [1]. As such, if 50 ms is noticeable, our latency of 87.6 ms will most certainly be detectable by some people, and if the recommended latency is 20 ms, there remains much work to do. Alternatives to having the IMU communicate over TCP/IP-over-USB may have to be considered.

Chapter 11

Tracking

The main task of this project is to investigate how to accomplish efficient tracking of a user's head while wearing an AR headset. Considering the time between the project's start and when we had a headset in hand to test with, there has been a limited amount of development. It is possible to theorise and experiment with an IMU or camera, but it proved difficult to find anything that could be done without a headset and a visual representation of how the head's movement should be represented in a game engine. This chapter will first describe the progress made on rotating the user's view by interpreting data from the IMU in Section 11.1. Second, some of the theory of interpreting movement is discussed in Section 11.2, and finally, Section 11.3 touches upon how to use the camera and the various libraries considered. All references to Unity in this chapter are based upon the scene Assets/Scenes/headset in the provided Unity package (Appendix A.4), unless otherwise specified.

11.1 Rotation

As previously mentioned in Section 7.1, manipulating an object in Unity 3D may be done by attaching a script to the object. This project has developed such a script for interpreting the IMU's rotations, and this section will discuss its methods, benefits and problems. For a brief introduction to scripting in Unity, refer to Section 7.1.2.

This section will discuss the process of implementing this rotation in Unity. The script that handles rotation can be found in the file Assets/Rotation/RotateScript.cs in the aforementioned Unity package. First, this section gives an introduction to the issue in Section 11.1.1; second, a brief explanation of how to transform objects in Unity is presented in Section 11.1.2. Then, input from the IMU is discussed in Section 11.1.3, and finally, the results are presented in Section 11.1.5.

Figure 11.1: Scene overview

Figure 11.2: Scene interior

11.1.1 Introduction

As this project's initial motivation was to create a solution for tracking heads in a fairly stationary position in a cockpit, a model of a car was found online [73] to use for testing. The exterior of the model may be observed in Figure 11.1. This file is easy to import into Unity, as it has native support for the format; more on this in Section 8.3. At a later stage, an interior that matches whatever cockpit the user is sitting in must be created as a 3D-model and imported. As you may have noticed in Figure 11.1, the interior is quite vivid. This is because anything that is rendered in black will not reflect on the lenses of the AR headset; therefore, we opted for bright shades of blue and pink during testing. The interior may be observed in Figure 11.2, which shows an approximate angle of what one experiences when wearing the headset. Due to how the view presented to the headset is rendered, it makes little sense to show a screenshot of it: the view is split into two different angles, one per eye, and adjusted for the angle and shape of the combiner lenses discussed in Section 4.1.1, so on its own it gives little meaning.

11.1.2 Transforming objects in Unity

Scripting in Unity has been discussed previously in this thesis, where it was shown that attaching a script to a GameObject ensures that the script runs. Another relevant capability is manipulating the object through its transform [106].

Mode  Description
0     Sensor fusion off
1     Sensor fusion on
2     Sensor fusion on, without magnetometer
3     Sensor fusion on, without fast magnetometer calibration

Table 11.1: Overview of sensor fusion modes, IMU [31]

transform.rotation = Quaternion.identity;

The line of code above rotates the object that the script is attached to into the identity quaternion, i.e. what Unity has set as its base orientation. In the Inspector tab of the Editor, a Transform section may be observed, which allows the developer to manipulate the position, rotation and scale of the object along the X, Y and Z axes. The object may be manipulated directly in the Inspector using Euler angles, or with transform.Rotate() in a script. Please note that these Euler-based commands do not accept rotations over 180° at a time due to gimbal lock, whereas transform.rotation accepts quaternions, which do not suffer from this limitation.
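To make the distinction concrete, a minimal script along these lines might look as follows. This is an illustrative sketch only, not the thesis' RotateScript.cs:

using UnityEngine;

// Sketch: the two ways of rotating a GameObject mentioned above.
public class RotationExample : MonoBehaviour
{
    void Update()
    {
        // Relative rotation with Euler angles: spin 30 degrees/second around the Y axis.
        transform.Rotate(new Vector3(0f, 30f * Time.deltaTime, 0f));

        // Absolute rotation with a quaternion: reset to Unity's base
        // orientation (the identity quaternion) when space is pressed.
        if (Input.GetKeyDown(KeyCode.Space))
            transform.rotation = Quaternion.identity;
    }
}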

11.1.3 IMU input

The Tinkerforge API is used in the same way as discussed in Sections 5.1.4, 10.2.2 and 10.3.2 with regard to how it establishes a connection with the IMU.

Sensor fusion

The Tinkerforge IMU supports several levels of sensor fusion [31], as previously mentioned in Chapter 5. Sensor fusion means that the device combines the input from its sensors and gives an estimate of the device's orientation. Table 11.1 shows the modes the IMU supports, which may be set with the function:

void BrickIMUV2.SetSensorFusionMode(byte mode)

This project has set the mode to 2: without magnetometer, which signifies:

[. . . ] fusion mode without magnetometer. In this mode the calculated orientation is relative (with magnetometer it is absolute with respect to the earth). However, the calculation can't be influenced by spurious magnetic fields. ([31])

There is a trade-off here. In this project's experience, the input from the IMU seemed more precise without the magnetometer when only rotation is concerned. As the rotation script in our project only rotates the camera without changing its position, there is no need for the IMU to try to estimate its location: we are only interested in quaternions.

Furthermore, the intention of this project is to use a camera for that sort of tracking, as the IMU is not able to give precise positional data over time.

Callbacks

By using callbacks, which are independent of how Unity calls its Update() methods when rendering a frame, we ensure that the latest data from the IMU is always available to the rotation script. Since the callback stores the quaternion in a variable available to the class, there is no need to "get" a quaternion on demand. This reduces the number of method calls and, critically, removes the need to wait for the round-trip time of requesting and receiving a packet from the IMU over its TCP/IP-over-USB pipeline. Furthermore, this ensures that the framerate of the game does not affect the performance of the tracking: when a new frame is rendered, the latest information is already at hand.
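As an illustration of this pattern, a sketch of wiring up the Tinkerforge quaternion callback inside a Unity script is shown below. The host, port, UID and callback period are placeholders, axis remapping between the IMU frame and Unity's left-handed frame is glossed over, and this is not the exact code of the provided package:

using Tinkerforge;
using UnityEngine;

// Sketch: store the latest IMU quaternion from a Tinkerforge callback so
// that Update() can read it without waiting for a round trip to the IMU.
public class ImuRotationSketch : MonoBehaviour
{
    private IPConnection ipcon;
    private BrickIMUV2 imu;
    private Quaternion latest = Quaternion.identity; // written by callback, read in Update()

    void Start()
    {
        ipcon = new IPConnection();
        imu = new BrickIMUV2("XXYYZZ", ipcon);   // placeholder UID
        ipcon.Connect("localhost", 4223);        // default Brick Daemon port

        // Mode 2: sensor fusion on, without magnetometer (see Table 11.1).
        imu.SetSensorFusionMode(2);

        // Deliver quaternion callbacks every 5 ms, independently of the frame rate.
        imu.QuaternionCallback += OnQuaternion;
        imu.SetQuaternionPeriod(5);
    }

    // Tinkerforge delivers the quaternion as 16-bit fixed-point values;
    // dividing by 16383 gives the unit quaternion components. The callback
    // runs on the bindings' own thread, so a lock may be advisable here.
    void OnQuaternion(BrickIMUV2 sender, short w, short x, short y, short z)
    {
        latest = new Quaternion(x / 16383f, y / 16383f, z / 16383f, w / 16383f);
    }

    void Update()
    {
        transform.rotation = latest; // always uses the most recent sample
    }
}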

11.1.4 Calculating rotation

Now that the method of getting quaternions from the IMU is established, we need to calculate the rotation. Several methods were tested, where the goal was to find a way to correlate the axes of the IMU with the coordinate system in Unity. Applying Euler angles directly to the GameObject will not work, even if they are intuitive for the user to manipulate, as the physical orientation of the IMU's axes is not known and changes when the user is moving. In the bottom corner of Figure 5.1, an overview of what the IMU considers as X, Y and Z may be found. In Figure 4.2, it may be observed that the IMU is mounted at an angle: it is tilted almost 90° down. Finally, observe the scene in Figure 11.2 from the interior of the car. In the upper right corner, an overview of the coordinate system in Unity may be seen. In this initial pose, both Z-axes happen to point forwards. Now imagine the user turns 90° to either side. With a static mapping, the Z-axis of the IMU would still manipulate the Z-axis in Unity, even though the physical movement is now a rotation around Unity's X-axis.

Thankfully, there is a way around this. A quaternion describes a rotation as a whole, through a scalar part and a vector part that together encode the angle and the plane (equivalently, the axis) of the rotation, rather than through a fixed assignment of Euler axes. This works because the XY plane can be rotated to any plane in XYZ space through the origin by giving the rotation angles about the X and Y axes [27]. Using this, we store an initial quaternion at game launch, which is an approximation of where the user is looking, and also offer a method to reset this initial quaternion by pressing the spacebar: the current quaternion is fetched and stored in the variable qOrigin. Please note that this reset method is only for testing; in a complete solution, we would use the camera to determine where the user is looking and use that as the basis.

A second property to be aware of is that multiplying two quaternions composes their rotations [119]: the quaternion that rotates Qa to Qb is Qb ∗ Qa⁻¹. So for every frame, we do the following steps (a minimal sketch follows below):

• Reset the camera. Rotating the game object representation of the camera to the identity quaternion has, in testing, proved to help with creating a correct rotation.

• Calculate the new quaternion. We combine qOrigin with the latest quaternion from the IMU, q.

• Rotate the camera. Through extensive testing, the most resilient method of rotating the camera in Unity has been to transform the object over each axis using the Euler angles of the rotation quaternion from above. However, this should be changed to using quaternions directly, as this is what Unity uses internally, and it would potentially be faster and more precise; this remains as potential future work.

Since the frame is not rendered before the Update() method is complete, the camera does not jump around.
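A minimal sketch of these steps, assuming a field latest filled by the quaternion callback as sketched earlier, could look like the following; the field names and sign conventions in the actual RotateScript.cs may differ:

using UnityEngine;

// Sketch of the per-frame rotation steps described above.
public class RotateSketch : MonoBehaviour
{
    private Quaternion latest = Quaternion.identity;   // latest IMU quaternion (from callback)
    private Quaternion qOrigin = Quaternion.identity;  // reference orientation at launch/reset

    void Update()
    {
        // Re-anchor the reference orientation on demand (testing only).
        if (Input.GetKeyDown(KeyCode.Space))
            qOrigin = latest;

        // 1. Reset the camera to the identity quaternion. (Redundant when the
        //    delta is assigned directly, but it mirrors the text's first step
        //    and matters when rotating incrementally via Euler angles.)
        transform.rotation = Quaternion.identity;

        // 2. Rotation of the IMU relative to the stored origin.
        Quaternion delta = Quaternion.Inverse(qOrigin) * latest;

        // 3. Rotate the camera. The thesis applies the Euler angles of delta
        //    axis by axis; assigning the quaternion directly, as here, is the
        //    improvement it suggests as future work. Depending on how the IMU
        //    is mounted, Quaternion.Inverse(delta) may be the correct sign.
        transform.rotation = delta;
    }
}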

11.1.5 Result

After extensive testing, this seems to produce a satisfying rotation of the user's head. However, due to the time constraints this project has had because of the late arrival of the headset, there has been limited time to test the algorithm. It is one thing to sit with the IMU in hand trying to emulate a user's head inside Unity, and another thing entirely to have it attached to a headset. One example of this is the realisation that the tracking has to use the inverted movement of the IMU to reflect that the world is stationary and the user is moving. This was not intuitive without the headset, but was thankfully a trivial fix. Furthermore, there is currently a bug where the algorithm or IMU is not able to track correctly when the user looks almost straight up or down. This manifests as twitchy movement, but settles down again when the user has stopped staring at the ceiling. This must be tested further, but since we are currently rotating the head as a stationary ball, it may become less of an issue once we are able to track the user's positional movement as well.

Horizon

There is also the issue of matching the horizon of the real world with the horizon in-game. Since the IMU is not mounted perpendicular to the horizon, and since there are probably errors in the data from the IMU, a method has to be established to compensate for this. Some attempts at this can be found in the same file as the rotation script. Of course, when a camera is employed and tracking a tag (more on this later), that would be the basis for placing the rendered parts of the cockpit in their proper place. However, one cannot guarantee that the camera tracking is 1) working, 2) able to see a tag, or 3) fast enough. Therefore, being able to keep track of the horizon independently is essential for not breaking the suspension of disbelief. One way of doing this is by using the gravity vector from the IMU.

The gravity vector tells which axis is pointing down: the axis with the largest component is the one closest to pointing down. Furthermore, this vector, which is saved in another callback method, GravityCB(), may be used as a basis for an additional rotation of the camera, ensuring the user has the correct viewpoint. This may be done by employing the Rodrigues rotation formula, which is an efficient algorithm for rotating a vector in space given an axis and an angle of rotation [91].

The RotateAroundOrthogonal() method rotates the input 3×3 matrix around the axis k = a × b (cross product). k is a vector that is orthogonal (perpendicular) to the plane defined by the vectors a and b. Imagine a door, where the lowest hinge is at (0, 0, 0). If a points along the door and b points along the doorframe, the resulting vector k points up through the hinges. The angle V tells how much one must rotate the door to close it, which is initially unknown. For unit vectors a and b, the length of k is sin(V), in other words ||k|| = sin(V). Therefore one is able to calculate V = asin(||k||). If we then use the Rodrigues rotation formula to rotate around k with the angle V, this is the natural movement to close the door. This may be employed to ensure that the horizon is correct, by treating the door as the real horizon and the gravity vector as the doorframe.

The framework for this is in place in the code, with an efficient implementation of a 3×3 matrix class that supports addition, subtraction and multiplication of two matrices, as well as multiplication by a scalar. There is also a method to convert such a matrix to a quaternion.
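For illustration, the vector form of the Rodrigues formula, v_rot = v·cos V + (k × v)·sin V + k(k · v)(1 − cos V), could be written as follows using Unity's Vector3. This is a sketch only; the thesis' own code uses the 3×3 matrix class and RotateAroundOrthogonal() described above:

using UnityEngine;

public static class RodriguesSketch
{
    // Rotate vector v by 'angle' radians around the unit axis k,
    // using the Rodrigues rotation formula.
    public static Vector3 Rotate(Vector3 v, Vector3 k, float angle)
    {
        float cos = Mathf.Cos(angle);
        float sin = Mathf.Sin(angle);
        return v * cos
             + Vector3.Cross(k, v) * sin
             + k * Vector3.Dot(k, v) * (1f - cos);
    }

    // The "door" construction from the text: with unit vectors a (e.g. the
    // measured gravity direction) and b (the desired down direction),
    // k = a × b is the rotation axis and asin(||k||) the angle that aligns a with b.
    public static Vector3 AlignWithHorizon(Vector3 v, Vector3 a, Vector3 b)
    {
        Vector3 k = Vector3.Cross(a.normalized, b.normalized);
        float angle = Mathf.Asin(Mathf.Clamp(k.magnitude, -1f, 1f));
        return Rotate(v, k.normalized, angle);
    }
}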

11.2 Movement

Another function of the IMU is to track movement through linear acceleration. In our experience, this data quickly becomes imprecise, but it may be used between frames to maintain fluidity of motion between camera-tracking updates. This section will briefly discuss the framework that has been established for enabling this; there has, however, been little progress, again due to time.

Figure 11.3: Marker 2×2

11.2.1 Development

In the Unity package available in Appendix A.2, under Assets/Rotation, there is a script that begins the work of handling linear acceleration and translating it into position. The file establishes a connection with the IMU and registers a callback method for linear acceleration. This data is subsequently stored in a Vector3 and may be used in Update() in the same manner as explained in Section 11.1 for further development. The general principle would be to keep track of the time between frames, which is possible with the Unity property Time.deltaTime, which provides the time between the current and previous frame [104]. From this, one should be able to calculate the speed and direction of the user's movement by integrating the acceleration over time, and the position by integrating once more. This was briefly tested in Unity, and I did manage to get the object to move about, albeit in a wildly erratic manner. Beyond this, no working positional tracking has been developed in this thesis, and it remains as potential future work.
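A sketch of this double integration (acceleration to velocity, velocity to position) is shown below. The field linAccel, assumed to be filled by the linear-acceleration callback, and the other names are illustrative, and drift is left uncorrected:

using UnityEngine;

// Sketch: dead reckoning from linear acceleration between frames.
// 'linAccel' would be written by the IMU's linear-acceleration callback
// (in m/s^2, already mapped to Unity's axes); the field names are illustrative.
public class PositionSketch : MonoBehaviour
{
    private Vector3 linAccel = Vector3.zero;  // latest linear acceleration sample
    private Vector3 velocity = Vector3.zero;  // integrated velocity

    void Update()
    {
        float dt = Time.deltaTime;            // time since the previous frame

        // Integrate acceleration into velocity, then velocity into position.
        velocity += linAccel * dt;
        transform.position += velocity * dt;

        // Without an external reference (e.g. camera tracking) the error in
        // this estimate grows quickly, which matches the erratic behaviour
        // observed during testing.
    }
}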

11.3 Visual

The final part of this project was to implement visual tracking with a camera and tags. Some testing has been done using OpenCV to identify tags, but it has been troublesome to get running in Unity. Thankfully for any future work on this project, Leap Motion has included an implementation of OpenCVSharp, which can be read more about in Section 11.3.5. The basic idea is that if a computer vision library can detect at least 3 out of the 4 markers placed in a square, such as in Figure 11.3, one should be able to place the user's head in space according to the following principles:

• Distance from markers. If the physical distance between the markers is known, one should be able to determine how far the camera is from the markers by measuring the relative distances between the markers in the image. The markers must first be identified with computer vision. In essence, if the distances between the markers are smaller now than they were a moment ago, the projected square has shrunk, and thus the user has moved further away from the markers.

• Angle from markers. By calculating the skewness of the square, or of the partial square formed by three markers, one should be able to determine the angle of the camera relative to the markers. For example, suppose the distance between the two leftmost markers appears smaller than the distance between the two rightmost markers. In that case, one should assume the camera is pointing at the markers from an angle towards the right, following the same principle as above.

By looking at the relative distances between the corners of this square, one should be able to estimate the placement of the user's head in the room and use this to correct drift from the IMU. By looking at the differences in these distances between frames, one could also estimate the speed or abruptness of the user's motion, which might be useful for predicting further movements.
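As a rough sketch of these two principles, given the image-space centre points of the four markers (which would have to come from a computer vision library; the class and method names here are hypothetical), the relative distance and horizontal skew could be estimated as follows:

using UnityEngine;

// Rough sketch of the two principles above, operating on the image-space
// centre points of the four markers (ordered: top-left, top-right,
// bottom-left, bottom-right). Input would come from a marker detector.
public static class MarkerGeometrySketch
{
    // Apparent size of the square in the image. If the physical side length
    // and the camera focal length are known, distance is roughly
    // focalLength * physicalSide / apparentSize.
    public static float ApparentSize(Vector2 tl, Vector2 tr, Vector2 bl, Vector2 br)
    {
        float top = Vector2.Distance(tl, tr);
        float bottom = Vector2.Distance(bl, br);
        return 0.5f * (top + bottom);
    }

    // Skew of the square: if the left edge appears shorter than the right edge,
    // the camera is viewing the markers from an angle towards the right, and
    // vice versa. Returns a signed ratio; 0 means the camera faces the markers head-on.
    public static float HorizontalSkew(Vector2 tl, Vector2 tr, Vector2 bl, Vector2 br)
    {
        float left = Vector2.Distance(tl, bl);
        float right = Vector2.Distance(tr, br);
        return (right - left) / Mathf.Max(0.5f * (left + right), 1e-6f);
    }
}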

To implement this tracking, some sort of computer vision library that can detect markers needs to be employed. Sadly, again due to the time constraints of this project, there has been no development of the visual tracking itself, and it remains as potential future work. The following sections will discuss the various libraries and frameworks that have been considered.

11.3.1 CCTags

At first, this thesis considered using Concentric Circle Tags (CCTags) to detect markers with a library from the AliceVision project. This library was designed with markers that are robust to partial occlusions, varying distances and angles of view, and fast camera motions [21]. This is highly relevant for this project, considering that a user manipulating the controls of a cockpit may block the view to markers with their arms or through head movements. There are, however, two significant pitfalls in including this library in this project. First, it relies on CUDA cores from an nVidia Graphics Processing Unit (GPU) [29], which conflicts with this project's aim of not being limited to specific devices; it would also not work with the machine selected for use, an Intel NUC with a GPU from AMD. Second, the library is difficult to compile and include in a project, and proved very hard to get working with Unity 3D.

Therefore, other marker-based detection libraries were explored.

11.3.2 ARToolKit

ARToolKit is a well-established toolkit, originally developed in 1999 [53]. It has a robust user base and is Open Source Software [10][12]. It has changed hands a couple of times through the years and now operates under the name artoolkitX [11], who are the new maintainers. However, at the time of this project, the general impression is that it is mostly abandoned. Consider what they write about themselves:

One of the key difficulties in developing Augmented Reality applications is the problem of tracking the users viewpoint. In order to know from what viewpoint to draw the virtual imagery, the application needs to know where the user is looking in the real world. ARToolKit uses computer vision algorithms to solve this problem. The ARToolKit video tracking libraries calculate the real camera position and orientation relative to physical markers in real time. This enables the easy development of a wide range of Augmented Reality applications. Some of the features of ARToolKit include:
- Single camera position/orientation tracking.
- Tracking code that uses simple black squares.
- The ability to use any square marker patterns.
- Easy camera calibration code.
- Fast enough for real time AR applications.
- SGI IRIX, Linux, MacOS and Windows OS distributions.
- Distributed with complete source code. ([9])

This all sounds very well, but in this project's experience it is less than trivial to import this into Unity 3D and get it running. Based on research online, there are some guides available on how to import the framework, but most of them rely on obsolete packages in the Unity Asset Store which are unavailable to download at the time of writing, as well as old, incompatible versions. Furthermore, it seems that the main focus of ARToolKit is to use the markers to overlay 3D virtual objects on real markers [8]. It does support marker tracking, but if that is the only feature that is going to be used, OpenCV may just as well be used directly as an alternative. More on this later.

11.3.3 Vuforia

Vuforia claims to be the most widely used platform for AR development, with support for leading phones, tablets, and headsets, letting developers easily add advanced computer vision functionality to Android, iOS, and UWP apps to create AR experiences that realistically interact with objects and the environment [116]. Vuforia is integrated into Unity and can be imported with the Unity package manager [44]. It is straightforward to get running in Unity. However, in this project's experience, its main focus is the same as ARToolKit's: to overlay 3D models over real markers in-game, and not to use these markers as a basis for tracking a user's position. Another observation is that Vuforia is aiming more at the mobile market with ARCore and ARKit from Google and Apple, respectively, and deprecation warnings show up in the editor when headset use is selected.

11.3.4 OpenCV

As an alternative to any AR framework, one could use OpenCV directly for optical tracking. OpenCV supports marker detection, where ArUco markers seem to be well established [28]. These are based upon a paper that seems very applicable:

This paper presents a fiducial marker system specially appropriated for camera pose estimation in applications such as augmented reality and robot localization. Three main contributions are presented. First, we propose an algorithm for generating configurable marker dictionaries (in size and number of bits) following a criterion to maximize the inter-marker distance and the number of bit transitions. In the process, we derive the maximum theoretical inter-marker distance that dictionaries of square binary markers can have. Second, a method for automatically detecting the markers and correcting possible errors is proposed. Third, a solution to the occlusion problem in augmented reality applications is shown. ([43])

Getting OpenCV to run on its own is fine, but getting it to run inside Unity 3D has proven to be a challenge, especially considering the other packages that have been imported. OpenCV is implemented in C++, while the language used by Unity is C#. It is possible to use pre-compiled libraries and binary blobs in Unity, but this proved challenging to implement and not possible within the time constraints of this project. Therefore, some alternative to running OpenCV in its basic form is preferable, such as a wrapper around OpenCV or a re-implementation in C#. What follows is a brief discussion of the various OpenCV implementations this project has considered.

11.3.5 OpenCVSharp

OpenCVSharp is an OSS cross-platform wrapper of OpenCV for the .NET Framework [96]. It supports several operating systems and platforms, and aims to stay as close to the original C/C++ API style as possible.

According to its documentation, OpenCVSharp does not work in Unity. However, the Project North Star headset includes OpenCVSharp in its plugin, discussed in Section 4.1.2. The library OpenCvSharpExtern.dll can be found under /Plugins/LeapMotion/North Star/Plugins/x86_64/ in the editor. This seems to be a potential starting point for using OpenCV in future work.
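As an indication of what marker detection with this wrapper could look like, the sketch below uses the ArUco module as exposed by recent OpenCvSharp releases (CvAruco, GetPredefinedDictionary, DetectMarkers). These names and signatures are assumptions on my part and may differ in the version bundled with the North Star plugin; the sketch has not been verified against it:

using OpenCvSharp;
using OpenCvSharp.Aruco;

// Sketch: detect ArUco markers in a single frame with OpenCvSharp.
// API names follow recent OpenCvSharp releases and are assumptions here.
public static class ArucoSketch
{
    public static void Detect(Mat frame)
    {
        using (var gray = new Mat())
        {
            Cv2.CvtColor(frame, gray, ColorConversionCodes.BGR2GRAY);

            var dictionary = CvAruco.GetPredefinedDictionary(PredefinedDictionaryName.Dict4X4_50);
            var parameters = DetectorParameters.Create();

            CvAruco.DetectMarkers(gray, dictionary, out Point2f[][] corners,
                                  out int[] ids, parameters, out Point2f[][] rejected);

            // 'corners' now holds the four image-space corners of each detected
            // marker, which is exactly the input needed for the distance and
            // angle estimation described earlier in Section 11.3.
        }
    }
}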

11.3.6 EmguCV

Emgu CV is a cross-platform .Net wrapper to the OpenCV image processing library. Allowing OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio, Xamarin Studio and Unity, it can run on Windows, Linux, Mac OS, iOS and Android. ([40])

However, this project's experience is that getting this to run in Unity would require a serious amount of work and tweaking. It is, however, OSS and seems to be frequently updated [39], so it is worthy of note.

11.3.7 OpenCV plus Unity

This is a package found in the Unity Asset Store, which is an adaptation of OpenCVSharp for Unity [77]. It is unsupported, but has proven easy to import and get running; note, however, that it wraps OpenCV 3, not 4.

Part IV

Conclusion


Chapter 12

Results

This thesis set out with a high goal, and although less has been accomplished than what we initially set out to do, there has been significant progress. What follows is a summary of what this thesis has achieved.

The main takeaway from the project is the Unity package available in Appendix A.4, which contains most of one year of experimentation and research into which frameworks or libraries are compatible with what, which ones are easy to use, and which ones are applicable to the goals of this project, a project that hopefully will extend beyond what this thesis has accomplished. It is my hope that, for further work, this thesis will give valuable insight into the limitations and possibilities of the tools available at the time of writing, and hopefully save valuable time.

We have delved into various headsets and IMUs, we have explored various AR frameworks and libraries, and we have learned a lot about how to integrate these into a game engine. As such, we have managed to create the basis of a cockpit simulator with the beginnings of user head-tracking. From this, we have successfully created rotational tracking and shown that this open-source approach to low-latency head tracking is feasible, albeit with more work needed.

We have also developed code that is able to test the latency of the IMU, and established that the latency, at about 80 ms, is not as low as hoped, although not prohibitively high either. It remains to be seen whether users will notice this latency, but the code and methodology developed may be the basis of further testing regarding user experience. Finally, an open-source headset has been assembled and integrated with an IMU and camera. This headset has successfully been used in testing the developed tracking system within Unity 3D.

Chapter 13

Future work

To implement a working tracking system, several things remain as future work. These items are presented below, together with some suggestions for improving the experiments conducted.

Hardware

• USB hub. Investigate the potential power draw or bandwidth issues related to connecting the Display Driver Board, IMU and camera to the USB hub. This will ensure that the headset remains comfortable to use. Alternatives are using a powered USB hub or running the camera on a separate cable.

• Mounting bracket. Manufacture a mounting bracket to ensure that the attachments stay in place, and that also allows the camera to be placed in a suitable manner.

Experiments

• Rotational and linear motion. One should determine whether there is a difference in delay between rotational and linear motion, in essence establishing whether the delays in the callback methods for quaternions and linear acceleration are the same.

• USB hub. Does the USB hub introduce any delay, and if so, is it enough to disrupt the user experience?

• Headset screens. Does the Display Driver Board (DDB) introduce any latency to the rendered image? Are the refresh rate and response time of the screens in the headset comparable to the screen used in this thesis' experiments?

• Operating systems. The experiments in this thesis were conducted in a UNIX environment due to the ability to manipulate the framebuffer directly. However, Windows remains the operating system most people play games on, and in my experience it still has the best support for drivers and games. Is there any difference in delay between these two operating systems?

• User experience. Finally, one should try to establish how much delay a user will tolerate before they no longer want to use the headset. This could be done by creating a simple game and randomly introducing a synthetic delay to the tracking. A subsequent user survey could estimate the delay limit one should aim for.

Tracking

• Adapt the rotational algorithm to quaternions. Currently, the rotational algorithm rotates the camera in Unity through Euler angles. Considering that Unity uses quaternions directly, converting to Euler angles and back is a wasted step. Improving the algorithm here could also fix the current algorithm's bug when the user is looking straight up.

• Improve the rotational algorithm. As of now, the Unity script handles rotation as if the IMU were mounted at the very centre of the user's head, which it of course is not. The rotational tracking should be extended to account for the fact that the IMU is mounted outside the user's head.

• Anchor the scene to the horizon. Considering that the goal of this thesis was to implement a cockpit simulator, one should be able to reliably anchor the rendered parts of the cockpit to the environment. Knowing where the horizon is and rotating the scene around it could help.

• Implement positional tracking. As of now, there is no positional tracking. This could be done by interpreting the linear acceleration data from the IMU. Over time, the drift and poor precision of this linear acceleration are quite dramatic. However, since we are only looking at the differences in linear acceleration between frames, the elapsed time might be short enough for the data to remain of value. This could be the core of the tracking if implemented well.

• Implement visual tracking. Currently, there is no visual tracking. Being able to correct for the drift in both the rotational and positional information from the IMU is crucial to ensure that the rendered and the real are coherent.

• Handle occlusion. Another important aspect of this project is understanding what not to render. If the user's arm is in front of something initially rendered, such as part of the cockpit, the game should sense this and ensure that the user may see through the lenses. Some type of hand tracking, possibly with the Leap Motion module, could be implemented to help with this. Furthermore, real elements of the cockpit that the user wants to interact with must be let through the scene: e.g. how to handle the situation when the user looks at the wheel from the side.

Bibliography

[1] (PDF) Measuring Latency in Virtual Reality Systems. DOI: 10.1007/978-3-319-24589-8_40. URL: https://www.researchgate.net/publication/300253386_Measuring_Latency_in_Virtual_Reality_Systems (visited on 15/08/2020).
[2] 8K DisplayPort-Kabel (miniDP-miniDP), 2m 8K@60Hz, 4K@120Hz, DP 1.4, 32.4Gb/s - Digital Impuls Oslo AS. URL: https://www.digitalimpuls.no/mini-displayport/141230/8k-displayport-kabel-minidp-minidp-2m-8k60hz-4k120hz-dp-14-324gb-s (visited on 14/08/2020).
[3] 9DoF Razor IMU M0 Hookup Guide - Learn.Sparkfun.Com. URL: https://learn.sparkfun.com/tutorials/9dof-razor-imu-m0-hookup-guide (visited on 03/07/2020).
[4] About | Kdenlive. URL: https://kdenlive.org/en/about/ (visited on 30/06/2020).
[5] Basler AG. Basler Ace acA2000-50gc - Area Scan Camera. URL: /en/products/cameras/area-scan-cameras/ace/aca2000-50gc/ (visited on 02/07/2020).
[6] Oytun Akman et al. ‘Multi-Cue Hand Detection and Tracking for a Head-Mounted Augmented Reality System’. In: Machine Vision and Applications 24.5 (2013), pp. 931–946. ISSN: 0932-8092. DOI: 10.1007/s00138-013-0500-6.
[7] Amazon.Com : USB Camera Module Full HD 1080P Mini Webcam USB with Cameras with Sony IMX291 Image Sensor, USB3.0 Web Camera Low Illumination Camera Board with 100 Degree No Distortion Lens for Android Linux Windows : Electronics. URL: https://www.amazon.com/gp/product/B07MG4K8CT/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 (visited on 29/06/2020).
[8] ARToolKit Documentation (Feature List). URL: http://www.hitl.washington.edu/artoolkit/documentation/features.htm (visited on 10/07/2020).
[9] ARToolKit Home Page. URL: http://www.hitl.washington.edu/artoolkit/ (visited on 09/07/2020).
[10] Artoolkit/ARToolKit5. artoolkit, 6th July 2020. URL: https://github.com/artoolkit/ARToolKit5 (visited on 09/07/2020).
[11] artoolkitX. URL: http://www.artoolkitx.org/ (visited on 09/07/2020).

[12] Artoolkitx/Artoolkitx. artoolkitX, 7th July 2020. URL: https://github.com/artoolkitx/artoolkitx (visited on 09/07/2020).
[13] Asus VG248QE. URL: https://www.asus.com/Monitors/VG248QE/ (visited on 04/12/2019).
[14] Ronald Azuma et al. ‘Tracking in Unprepared Environments for Augmented Reality Systems’. In: Computers & Graphics 23.6 (1st Dec. 1999), pp. 787–793. ISSN: 0097-8493. DOI: 10.1016/S0097-8493(99)00104-1. URL: http://www.sciencedirect.com/science/article/pii/S0097849399001041 (visited on 10/08/2020).
[15] R. Behringer. ‘Registration for Outdoor Augmented Reality Applications Using Computer Vision Techniques and Hybrid Sensors’. In: Proceedings IEEE Virtual Reality (Cat. No. 99CB36316). Proceedings IEEE Virtual Reality (Cat. No. 99CB36316). Mar. 1999, pp. 244–251. DOI: 10.1109/VR.1999.756958.
[16] Ranita Biswas and Jaya Sil. ‘An Improved Canny Edge Detection Algorithm Based on Type-2 Fuzzy Sets’. In: Procedia Technology. 2nd International Conference on Computer, Communication, Control and Information Technology (C3IT-2012) on February 25 - 26, 2012 4 (1st Jan. 2012), pp. 820–824. ISSN: 2212-0173. DOI: 10.1016/j.protcy.2012.05.134. URL: http://www.sciencedirect.com/science/article/pii/S2212017312004136 (visited on 13/06/2019).
[17] BNO055. URL: https://www.bosch-sensortec.com/bst/products/all_products/bno055 (visited on 30/10/2019).
[18] Brick_imu_v2_tilted2_800.Jpg (800×674). URL: https://www.tinkerforge.com/en/doc/_images/Bricks/brick_imu_v2_tilted2_800.jpg (visited on 13/06/2019).
[19] Buy a Sense HAT – Raspberry Pi. URL: https://www.raspberrypi.org/products/sense-hat/ (visited on 03/07/2020).
[20] Calibration-Stand-No-Blur-Rgba.Png (640×360). URL: https://raw.githubusercontent.com/leapmotion/ProjectNorthStar/master/Mechanical/imgs/calibration-stand-no-blur-rgba.png (visited on 29/06/2020).
[21] L. Calvet et al. ‘Detection and Accurate Localization of Circular Fiducials under Highly Challenging Conditions’. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 562–570. DOI: 10.1109/CVPR.2016.67.
[22] J. Canny. ‘A Computational Approach to Edge Detection’. In: IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8.6 (Nov. 1986), pp. 679–698. ISSN: 0162-8828. DOI: 10.1109/TPAMI.1986.4767851.
[23] Canny Edge Detection — OpenCV-Python Tutorials 1 Documentation. URL: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_canny/py_canny.html (visited on 12/06/2019).
[24] Canny Edge Detector. URL: http://justin-liang.com/tutorials/canny/ (visited on 12/06/2019).

[25] A. I. Comport et al. ‘Real-Time Markerless Tracking for Augmented Reality: The Virtual Visual Servoing Framework’. In: IEEE Transactions on Visualization and Computer Graphics 12.4 (July 2006), pp. 615–628. ISSN: 1077-2626. DOI: 10.1109/TVCG.2006.78.
[26] CS-200-Datasheet.Pdf. URL: https://www.ctisensors.com/Documents/CS-200-Datasheet.pdf (visited on 03/07/2020).
[27] Erik B Dam, Martin Koch and Martin Lillholm. ‘Quaternions, Interpolation and Animation’. In: (), p. 103. URL: http://web.mit.edu/2.998/www/QuaternionReport1.pdf.
[28] Detection of ArUco Markers. URL: https://docs.opencv.org/trunk/d5/dae/tutorial_aruco_detection.html (visited on 15/08/2020).
[29] Detection of CCTag Markers Made up of Concentric Circles.: Alicevision/CCTag. AliceVision, 1st June 2019. URL: https://github.com/alicevision/CCTag (visited on 05/06/2019).
[30] C. Diaz et al. ‘Designing for Depth Perceptions in Augmented Reality’. In: 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Oct. 2017, pp. 111–122. DOI: 10.1109/ISMAR.2017.28.
[31] Doc | Tinkerforge | Advanced Functions (C#). URL: https://www.tinkerforge.com/en/doc/Software/Bricks/IMUV2_Brick_CSharp.html#advanced-functions (visited on 15/07/2020).
[32] Doc | Tinkerforge | Brick Daemon. URL: https://www.tinkerforge.com/en/doc/Software/Brickd.html#brickd (visited on 04/12/2019).
[33] Doc | Tinkerforge | Brick Viewer. URL: https://www.tinkerforge.com/en/doc/Software/Brickv.html (visited on 29/06/2020).
[34] Doc | Tinkerforge | C/C++ - IMU Brick 2.0. URL: https://www.tinkerforge.com/en/doc/Software/Bricks/IMUV2_Brick_C.html (visited on 30/10/2019).
[35] Doc | Tinkerforge | Callbacks (C#). URL: https://www.tinkerforge.com/en/doc/Software/Bricks/IMUV2_Brick_CSharp.html#callback (visited on 12/07/2020).
[36] Doc | Tinkerforge | Programming Interface. URL: https://www.tinkerforge.com/en/doc/Programming_Interface.html#programming-interface (visited on 04/12/2019).
[37] Document Library | USB-IF. URL: https://www.usb.org/documents (visited on 14/08/2020).
[38] C. Du et al. ‘Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality’. In: 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Sept. 2016, pp. 54–62. DOI: 10.1109/ISMAR.2016.17.
[39] Emgu. Emgucv/Emgucv. 10th July 2020. URL: https://github.com/emgucv/emgucv (visited on 10/07/2020).
[40] Emgu CV: OpenCV in .NET (C#, VB, C++ and More). URL: http://www.emgu.com/wiki/index.php/Main_Page (visited on 10/07/2020).

[41] Wei Fang et al. ‘Real-Time Motion Tracking for Mobile Augmented/Virtual Reality Using Adaptive Visual-Inertial Fusion’. In: Sensors 17.5 (5 May 2017), p. 1037. DOI: 10.3390/s17051037. URL: https://www.mdpi.com/1424-8220/17/5/1037 (visited on 15/08/2020).
[42] Borko Furht. Handbook of Augmented Reality. 1. New York, NY: Springer New York : Imprint: Springer, 2011. 752 pp. ISBN: 978-1-283-35180-5.
[43] S. Garrido-Jurado et al. ‘Automatic Generation and Detection of Highly Reliable Fiducial Markers under Occlusion’. In: Pattern Recognition 47.6 (June 2014), pp. 2280–2292. ISSN: 00313203. DOI: 10.1016/j.patcog.2014.01.005. URL: https://linkinghub.elsevier.com/retrieve/pii/S0031320314000235 (visited on 15/08/2020).
[44] Getting Started with Vuforia Engine in Unity. URL: https://library.vuforia.com/articles/Training/getting-started-with-vuforia-in-unity.html (visited on 10/07/2020).
[45] Konrad Urdahl Halnum. Konraduh/ProjectNorthStar. 30th June 2020. URL: https://github.com/konraduh/ProjectNorthStar (visited on 10/08/2020).
[46] Kari Haugsdal. ‘Edge and Line Detection of Complicated and Blurred Objects’. In: 39 (2010). URL: https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/250824 (visited on 13/06/2019).
[47] Anna Katharina Hebborn, Marius Erdt and Stefan Müller. ‘Robust Model Based Tracking Using Edge Mapping and Refinement’. In: Augmented and Virtual Reality. Ed. by Lucio Tommaso De Paolis and Antonio Mongelli. Lecture Notes in Computer Science. Springer International Publishing, 2015, pp. 109–124. ISBN: 978-3-319-22888-4.
[48] Aleksander Holynski and Johannes Kopf. ‘Fast Depth Densification for Occlusion-Aware Augmented Reality’. In: ACM Trans. Graph. 37.6 (Dec. 2018), 194:1–194:11. ISSN: 0730-0301. DOI: 10.1145/3272127.3275083. URL: http://doi.acm.org/10.1145/3272127.3275083 (visited on 12/06/2019).
[49] Huawei P30 Pro - GSMArena. URL: https://www.gsmarena.com/huawei_p30_pro-9635.php (visited on 04/12/2019).
[50] Hui Xue, Puneet Sharma and Fridolin Wild. ‘User Satisfaction in Augmented Reality-Based Training Using Microsoft HoloLens’. In: Computers 8.1 (2019), p. 9. ISSN: 2073-431X. DOI: 10.3390/computers8010009. URL: https://www.mdpi.com/2073-431X/8/1/9 (visited on 29/05/2019).
[51] IMX291LQR-C Datasheet | Sony - Datasheetspdf.Com. URL: https://datasheetspdf.com/datasheet/IMX291LQR-C.html (visited on 29/06/2020).
[52] Marco Iosa et al. ‘Wearable Inertial Sensors for Human Movement Analysis’. In: Expert Review of Medical Devices 13.7 (2016), pp. 641–659. ISSN: 1743-4440. DOI: 10.1080/17434440.2016.1198694.

[53] H. Kato and M. Billinghurst. ‘Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System’. In: Proceedings 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR’99). Second International Workshop on Augmented Reality. San Francisco, CA, USA: IEEE Comput. Soc, 1999, pp. 85–94. ISBN: 978-0-7695-0359-2. DOI: 10.1109/IWAR.1999.803809. URL: http://ieeexplore.ieee.org/document/803809/ (visited on 10/07/2020).
[54] Greg Kipper and Joseph Rampolla. Augmented Reality: An Emerging Technologies Guide to AR. Saint Louis: Elsevier Science & Technology Books, Syngress, 2012. ISBN: 978-1-59749-733-6.
[55] Leapmotion/ProjectNorthStar. Leap Motion, 24th June 2020. URL: https://github.com/leapmotion/ProjectNorthStar (visited on 29/06/2020).
[56] Leapmotion/ProjectNorthStar. URL: https://github.com/leapmotion/ProjectNorthStar (visited on 29/06/2020).
[57] Leapmotion/ProjectNorthStar | Assembly Guide. URL: https://github.com/leapmotion/ProjectNorthStar/blob/master/Mechanical/Assm%20Drawing%20North%20Star%20Release%203.pdf (visited on 13/07/2020).
[58] Leapmotion/ProjectNorthStar | Software. URL: https://github.com/leapmotion/ProjectNorthStar/tree/master/Software (visited on 30/06/2020).
[59] Hansol Lee and Sangmi Chai. ‘Flow Experience in AR Application: Perceived Reality and Perceived Naturalness’. In: Augmented Cognition. Enhancing Cognition and Behavior in Complex Human Environments. Ed. by Dylan D. Schmorrow and Cali M. Fidopiastis. Lecture Notes in Computer Science. Springer International Publishing, 2017, pp. 185–198. ISBN: 978-3-319-58625-0.
[60] Wenkai Li, A. Nee and S. Ong. ‘A State-of-the-Art Review of Augmented Reality in Engineering Analysis and Simulation’. In: Multimodal technologies and interaction 1.3 (2017), pp. 17–. ISSN: 2414-4088. DOI: 10.3390/mti1030017.
[61] João P. Lima et al. ‘Standalone Edge-Based Markerless Tracking of Fully 3-Dimensional Objects for Handheld Augmented Reality’. In: Proceedings of the 16th ACM Symposium on Virtual Reality Software and Technology (Kyoto, Japan). VRST ’09. New York, NY, USA: ACM, 2009, pp. 139–142. ISBN: 978-1-60558-869-8. DOI: 10.1145/1643928.1643960. URL: http://doi.acm.org/10.1145/1643928.1643960 (visited on 29/06/2019).
[62] Linocell Mobilholder for kamerastativ - Mobiltelefonholder. URL: https://www.kjell.com/no/produkter/mobilt/mobiltelefonholder/linocell-mobilholder-for-kamerastativ-p97405 (visited on 29/06/2020).
[63] H. Liu, G. Zhang and H. Bao. ‘Robust Keyframe-Based Monocular SLAM for Augmented Reality’. In: 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Sept. 2016, pp. 1–10. DOI: 10.1109/ISMAR.2016.24.
[64] Xiaoju Ma et al. ‘The Canny Edge Detection and Its Improvement’. In: Artificial Intelligence and Computational Intelligence. Ed. by Jingsheng Lei et al. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 50–58. ISBN: 978-3-642-33478-8.
[65] J. MacLean et al. ‘Fast Hand Gesture Recognition for Real-Time Teleconferencing Applications’. In: Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. July 2001, pp. 133–140. DOI: 10.1109/RATFG.2001.938922.
[66] Magic Leap 1 | Magic Leap. URL: https://www.magicleap.com/en-us/magic-leap-1 (visited on 30/06/2020).
[67] mattzmsft. HoloLens (1st Gen) Hardware. URL: https://docs.microsoft.com/en-us/hololens/hololens1-hardware (visited on 30/06/2020).
[68] META 2 - Exclusive Augmented Reality Development Kit. URL: https://www.schenker-tech.de//en/meta-2/ (visited on 30/06/2020).
[69] Microsoft HoloLens - Virtual Reality and Augmented Reality Wiki - VR AR & XR Wiki. URL: https://xinreality.com/wiki/Microsoft_HoloLens#Hardware (visited on 08/07/2020).
[70] Miller Welding Helmet Generation III Replacement Headgear 271325. URL: https://store.cyberweld.com/miwehegeiiir.html (visited on 09/07/2020).
[71] MIMU22BL_product-Brief.Pdf. URL: https://www.inertialelements.com/documents/mimu22bl/MIMU22BL_product-brief.pdf (visited on 03/07/2020).
[72] Minimum Bandwidth Requirements for Multi-Channel Video Capture - Magewell. URL: http://www.magewell.com/kb/000020025/detail (visited on 13/07/2020).
[73] Mitsubishi Lancer Evolution X Free 3D Model - .Max .Fbx - Free3D. URL: https://free3d.com/3d-model/mitsubishi-lancer-evolution-x-98027.html (visited on 11/06/2020).
[74] Mixed & Augmented Reality 3D Headset - Aryzon. URL: https://shop.aryzon.com/products/aryzon-headset?_ga=2.13845298.466597994.1554208624-0bbdca1d-1AB7-4F0F-DAF4-35572454F05B (visited on 30/06/2020).
[75] MPU9250REV1.0.Pdf. URL: https://cdn.sparkfun.com/assets/learn_tutorials/5/5/0/MPU9250REV1.0.pdf (visited on 03/07/2020).
[76] okreylos. On the Road for VR: Microsoft HoloLens at Build 2015, San Francisco. 1st May 2015. URL: http://doc-ok.org/?p=1223 (visited on 08/07/2020).
[77] OpenCV plus Unity | Integration | Unity Asset Store. URL: https://assetstore.unity.com/packages/tools/integration/opencv-plus-unity-85928 (visited on 10/07/2020).

[78] Ilaria Pasciuto et al. ‘How Angular Velocity Features and Different Gyroscope Noise Types Interact and Determine Orientation Estimation Accuracy’. In: Sensors (Basel, Switzerland) 15.9 (2015), pp. 23983–24001. ISSN: 1424-8220. DOI: 10.3390/s150923983.
[79] W. Pasman et al. ‘Alternatives for Optical Tracking’. In: (1st Sept. 2001). URL: https://www.researchgate.net/publication/245269902_Alternatives_for_optical_tracking.
[80] Plexgear Portable 420 USB 3.0-hub 4-veis - USB-huber. URL: https://www.kjell.com/no/produkter/data/datatilbehor/-tilbehor/usb-huber/plexgear-portable-420-usb-3.0-hub-4-veis-p69515 (visited on 29/06/2020).
[81] Project North Star Display (3.5inch, 1440x1600 Pixels, 120fps). URL: https://www.smart-prototyping.com/Display-for-Project-North-Star-3_5inch-1440x1600-pixels (visited on 09/07/2020).
[82] Project North Star Display Driver Board. URL: https://www.smart-prototyping.com/Project-North-Star-Display-Driver-Board (visited on 09/07/2020).
[83] Project North Star Is Now Open Source. 6th June 2018. URL: http://blog.leapmotion.com/north-star-open-source/ (visited on 29/06/2020).
[84] Project North Star Kit B (All Parts). URL: https://www.smart-prototyping.com/Project-North-Star-Kit-B (visited on 29/06/2020).
[85] Project North Star Single-Piece Combiner Set (v3.1SPC). URL: https://www.smart-prototyping.com/Project-North-Star-Lenses-Single-Piece-Set-v3_1-SPC (visited on 09/07/2020).
[86] Purchase Augmented/Mixed Reality Headset - Aryzon. URL: https://www.aryzon.com/get-aryzon (visited on 30/06/2020).
[87] Quickstart for Android | ARCore. URL: https://developers.google.com/ar/develop/java/quickstart (visited on 05/06/2019).
[88] Kjetil Raaen. ‘Response Time in Games: Requirements and Improvements’. Oslo: University of Oslo, Faculty of Mathematics and Natural Sciences, Department of Informatics, 2016. URL: https://bibsys-almaprimo.hosted.exlibrisgroup.com/permalink/f/13vfukn/BIBSYS_ILS71534279260002201.
[89] Adi Robertson. AR Headset Company Meta Shutting down after Assets Sold to Unknown Company. 18th Jan. 2019. URL: https://www.theverge.com/2019/1/18/18187315/meta-vision-ar-headset-company-asset-sale-unknown-buyer-insolvent (visited on 06/07/2020).
[90] Adi Robertson. Magic Leap Reportedly Laying off 1,000 Employees and Dropping Consumer Business. 22nd Apr. 2020. URL: https://www.theverge.com/2020/4/22/21231236/magic-leap-ar-headset-layoffs-coronavirus-enterprise-business-shift (visited on 08/07/2020).
[91] Rodrigues’ Rotation Formula. In: Wikipedia. 13th May 2020. URL: https://en.wikipedia.org/w/index.php?title=Rodrigues%27_rotation_formula&oldid=956436101 (visited on 15/05/2020).

95 [92] S24F350F 24" PLS Full HD Monitor | Samsung Support UK. URL: https://www.samsung.com/uk/support/model/LS24F350FHUXEN/ (visited on 29/06/2020). [93] scooley. HoloLens 2 Hardware. URL: https://docs.microsoft.com/ en-us/hololens/hololens2-hardware (visited on 30/06/2020). [94] Yoones A. Sekhavat and Jeffrey Parsons. ‘The Effect of Tracking Technique on the Quality of User Experience for Augmented Reality Mobile Navigation’. In: Multimedia Tools and Applications 77.10 (2018), pp. 11635–11668. ISSN: 1380-7501. DOI: 10 . 1007 / s11042 - 017-4810-y. [95] Lei Shi. ‘Improving AR Tracking and Registration with Marker- less Technology’. Universitat Politècnica de Catalunya, 2017. URL: https : / / recercat . cat / handle / 2072 / 273748 (visited on 15/08/2020). [96] shimat. Shimat/Opencvsharp. 10th July 2020. URL: https://github. com/shimat/opencvsharp (visited on 10/07/2020). [97] Vetle Smedbakken Sillerud. ‘Reconstruction of Indoor Environ- ments Using LiDAR and IMU’. In: (2019). URL: https://www.duo. uio.no/handle/10852/69792 (visited on 10/02/2020). [98] Skarredghost. All You Need to Know on HoloLens 2. 24th Feb. 2019. URL: https://skarredghost.com/2019/02/24/all- you- need- know-hololens-2/ (visited on 08/07/2020). [99] Renjie Song, Ziqi Zhang and Haiyang Liu. ‘Edge Connection Based Canny Edge Detection Algorithm’. In: Pattern Recognition and Image Analysis 27.4 (1st Oct. 2017), pp. 740–747. ISSN: 1555-6212. DOI: 10. 1134 / S1054661817040162. URL: https : / / doi . org / 10 . 1134 / S1054661817040162 (visited on 13/06/2019). [100] Suspend Disbelief | Definition of Suspend Disbelief in English by Lexico Dictionaries. URL: https : / / www . lexico . com / en / definition / suspend_disbelief (visited on 27/06/2019). [101] Juan Jose Tarrio and Sol Pedre. ‘Realtime Edge-Based Visual Odometry for a Monocular Camera’. In: IEEE, 2015, pp. 702–710. ISBN: 978-1-4673-8391-2. DOI: 10.1109/ICCV.2015.87. [102] Unity Technologies. Unity - Manual: Rotation and Orientation in Unity. URL: https://docs.unity3d.com/2017.4/Documentation/ Manual / QuaternionAndEulerRotationsInUnity . html (visited on 11/08/2020). [103] Unity Technologies. Unity - Manual: Unity User Manual (2019.4 LTS). URL: https://docs.unity3d.com/Manual/index.html (visited on 30/06/2020). [104] Unity Technologies. Unity - Scripting API: Time.deltaTime. URL: https://docs.unity3d.com/ScriptReference/Time-deltaTime. html (visited on 15/08/2020). [105] Unity Technologies. Unity - Scripting API: Transform.eulerAngles. URL: https://docs.unity3d.com/ScriptReference/Transform- eulerAngles.html (visited on 11/08/2020).

[106] Unity Technologies. Unity - Scripting API: Transform.Rotation. URL: https://docs.unity3d.com/ScriptReference/Transform-rotation.html (visited on 15/07/2020).
[107] ‘The HoloLens 2 Puts a Full-Fledged Computer on Your Face’. In: Wired (). ISSN: 1059-1028. URL: https://www.wired.com/story/microsoft-hololens-2-headset/ (visited on 08/07/2020).
[108] Tinkerforge. Doc | Tinkerforge | IMU Brick 2.0. URL: https://www.tinkerforge.com/en/doc/Hardware/Bricks/IMU_V2_Brick.html (visited on 31/05/2019).
[109] Tinkerforge | About Us. URL: https://www.tinkerforge.com/en/home/about-us/ (visited on 03/07/2020).
[110] Tracking | Leap Motion Controller | Ultraleap. URL: https://www.ultraleap.com/product/leap-motion-controller/ (visited on 29/06/2020).
[111] Unity - Manual: Importing Objects From Blender. URL: https://docs.unity3d.com/560/Documentation/Manual/HOWTO-ImportObjectBlender.html (visited on 09/07/2020).
[112] Universal Serial Bus Revision 3.0 Specification | Wayback Machine. 19th May 2014. URL: https://web.archive.org/web/20140519092924/http://www.usb.org/developers/docs/documents_archive/usb_30_spec_070113.zip (visited on 13/07/2020).
[113] Unreal Engine | Frequently Asked Questions. URL: https://www.unrealengine.com/en-US/faq (visited on 10/07/2020).
[114] Unreal Engine 4 on Github. URL: https://www.unrealengine.com/en-US/ue4-on-github (visited on 10/07/2020).
[115] USB Power Delivery 2.0: What Is USB PD? | Arrow.Com. URL: https://www.arrow.com/en/research-and-events/articles/usb-power-delivery-2-end-of-the-line-for-the-power-brick (visited on 14/08/2020).
[116] Vuforia | Overview. URL: https://library.vuforia.com/getting-started/overview.html (visited on 10/07/2020).
[117] Yue Wang et al. ‘A LINE-MOD-Based Markerless Tracking Approach for AR Applications’. In: The International Journal of Advanced Manufacturing Technology 89.5 (2017), pp. 1699–1707. ISSN: 0268-3768. DOI: 10.1007/s00170-016-9180-5.
[118] Eric W. Weisstein. Euler Angles. URL: https://mathworld.wolfram.com/EulerAngles.html (visited on 11/08/2020).
[119] Gordon Wetzstein. ‘EE 267 Virtual Reality Course Notes: 3-DOF Orientation Tracking with IMUs’. In: (), p. 14. URL: https://stanford.edu/class/ee267/notes/ee267_notes_imu.pdf.
[120] Z. Ziaei et al. ‘Real-Time Markerless Augmented Reality for Remote Handling System in Bad Viewing Conditions’. In: Fusion Engineering and Design. Proceedings of the 26th Symposium of Fusion Technology (SOFT-26) 86.9 (1st Oct. 2011), pp. 2033–2038. ISSN: 0920-3796. DOI: 10.1016/j.fusengdes.2010.12.082. URL: http://www.sciencedirect.com/science/article/pii/S0920379611000160 (visited on 12/06/2019).

Appendix A

Source Code

A.1 Delay experiment code

The source code for the delay experiments is available in the repository snappytracker-delay-experiments, found here: https://github.uio.no/konraduh/snappytracker-delay-experiments.

A.2 Unity assets

The source code for the Unity assets is available in the repository snappytracker-unity, found here: https://github.uio.no/konraduh/snappytracker-unity.

A.3 Project North Star repository

The OSS repository for Project North Star is available here: https://github.com/leapmotion/ProjectNorthStar. Alternatively, a fork of the repository as it was during this thesis is available here: https://github.com/konraduh/ProjectNorthStar.

A.4 Videos & Unity package

Jottacloud is a Norwegian cloud hosting service, where all resources related to this thesis are uploaded and shared through this link: https://www.jottacloud.com/s/055ed3f13f0afa44e8b9f4438ae9d38d5bb. The contents are as follows:

• /delay-videos. The clips recorded for the delay experiments may be found here.

• /unity. All files related to Unity 3D are here. snappytracker.unitypackage may be imported directly into Unity. The other zip files are pre-compiled versions of the tests used in this thesis.

• /repo. This contains zipped archives of the repositories above, if need be.
