Masaryk University Faculty of Informatics

Multimodal Control for Augmented Reality Applications

Bachelor’s Thesis

Michal Rychetník

Brno, Spring 2019


This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Michal Rychetník

Advisor: doc. Ing. RNDr. Barbora Bühnová, Ph.D.

Consultant: Ing. Michal Košík, M.Sc.


Acknowledgements

I would like to thank my advisor doc. Ing. RNDr. Barbora Bühnová, Ph.D., and my consultant Ing. Michal Košík, M.Sc., tremendously. Both had massive patience with me, always pushed me in the right direction, and helped me immensely to make this thesis see the light of day.

Abstract

As augmented reality and smart glasses appear more prominently on the market and even start to influence everyday work in many industries, questions about the use of these technologies arise. More traditional touch control methods may be insufficient, mainly when it comes to smart glasses or headsets. Touchless methods, on the other hand, could be more comfortable to use and even more efficient. Alternatively, touchless methods like eye tracking can improve the precision of other inputs. The thesis shows the possibilities of different means of control of an augmented reality application with a focus on touchless methods, which are introduced in the thesis and implemented in a prototype. This prototype was further subjected to end-user testing to discover the current possibilities and how users accept those control methods.

Keywords

Multimodal control, smart glasses, augmented reality, AR, input methods, voice control, hand gestures, eye tracking


Contents

Introduction

1 Smart glasses and augmented reality
  1.1 Sensors and input methods on smart glasses
  1.2 Smart glasses currently on the market
  1.3 Examples of smart glasses and augmented reality utilization

2 Control methods for AR applications
  2.1 Touch methods
      2.1.1 On-device interaction
      2.1.2 On-body interaction
  2.2 Touchless methods
      2.2.1 Voice input
      2.2.2 Hand gesture
      2.2.3 Head movement
      2.2.4 Eye tracking

3 Prototype design and development
  3.1 Available toolkits
      3.1.1 Available toolkits for voice control
      3.1.2 Available toolkits for hand gesture
      3.1.3 Available toolkits for eye tracking
  3.2 Prototype design
  3.3 Prototype development

4 Prototype testing
  4.1 Testing of individual control methods
      4.1.1 Voice input testing
      4.1.2 Hand gesture testing
      4.1.3 Eye tracking testing
  4.2 Comparison test

5 Results from prototype testing
  5.1 Individual test results
      5.1.1 Voice input
      5.1.2 Hand gesture
      5.1.3 Eye tracking
  5.2 Comparison test
  5.3 Possible future follow-up work

Conclusion

Bibliography

A Comparison test questionnaire

B Electronic appendix

List of Tables

1.1 Sensors and features on smart glasses
3.1 Browser support of voice input libraries
3.2 Languages supported by voice input libraries
3.3 Available functions of voice input libraries
3.4 Browser support of hand gesture and eye tracking toolkits


List of Figures

4.1 The testing grid for eye tracking
5.1 Unsuccessful recognition rate of English commands
5.2 Unsuccessful recognition rate of Czech commands
5.3 Result distribution of comfortability of control methods usage
5.4 Result distribution of willingness to use control methods in public or at work


Introduction

Augmented reality (AR) and smart glasses are not particularly new on the market, but they are neither widespread nor commonly used. Not yet, at least. Augmented reality, virtual reality (VR), and even mixed reality (MR) products are on the rise, and some predict their continuous expanding growth [1]. Utilization can range from simple applications showing a manual to a worker to more complex ones highlighting a specific object and projecting the whole work process [2]. What is more, some companies like DHL [3] or GE [4] have already started testing ways to make use of augmented reality. There is significant potential in these somewhat new technologies, and various fields from logistics and manufacturing [5, 3] to healthcare [6], entertainment [7, 8], and the military [9] can benefit from AR applications. Moreover, the influence of augmented reality is already more noticeable, as Apple introduced the AR Measure app [10], and the AR game Pokémon GO made over 2 billion dollars by the end of 2018 [11]. Nevertheless, there are still things that need sorting out, given that this field is somewhat new. Since smart glasses are a wearable technology, and a keyboard or a touchscreen on the device may not be an option for the user, there is a need for a different way of controlling the device.

Thesis aim

The main objective of the thesis is to introduce control methods for augmented reality applications, in addition to presenting a prototype that could be used for end-user testing to examine the potential of each control method.

Structure of the thesis

Chapter 1 introduces smart glasses and their presence on the market today. Chapter 2 takes a closer look at control methods of AR applications as a whole and presents some of the control gadgets in development that might become relevant very soon. The core part of the thesis follows, which presents the development of a prototype with currently available options to test some of the possible control methods – voice input, hand gestures, and eye tracking.

Firstly, Chapter 3 describes the design and development of the prototype while showing some toolkits that can be used for the different inputs, as well as describing their usage. Chapter 4 follows up with the actual prototype testing and how exactly those methods were tested. In continuation, Chapter 5 showcases the results of those tests and their comparison with other similar studies in this area.

1 Smart glasses and augmented reality

Today, there are plenty of different types of smart glasses. This chapter covers the features and sensors of some currently available smart glasses. Additionally, it presents some of the smart glasses to illustrate typical features as well as to introduce some of the different available input approaches. Furthermore, this chapter also shows some possible utilizations of smart glasses and augmented reality.

1.1 Sensors and input methods on smart glasses

Given the increasing number of smart glasses models, there is some diversity in their features and in what sensors and other technical specifications they offer. Not all of them use or can use the same input methods. Nevertheless, practically every model possesses a camera, microphone, accelerometer, magnetometer, and gyroscope. Smart glasses can use different types of cameras, which can be positioned in different ways. The primary camera (or possibly more cameras) is positioned to follow the user's field of vision, where it can capture hand gestures. The most common is a regular RGB camera, although some models have a depth camera, which provides depth perception of images and makes it easier to work in 3D space to, for instance, position some elements. Alternatively, smart glasses may have an infrared camera for thermal vision. Cameras can also be positioned for other purposes, such as to scan the eye and act as an eye-tracking device in some capacity. The microphone is a standard feature on smart glasses, as can be seen in Table 1.1. The wearer may use the microphone either for voice commands or for recordings. Speakers, on the other hand, are not necessarily a built-in feature. However, if included, they may be either regular audio speakers or speakers using bone conduction. The gyroscope, accelerometer, magnetometer, and GPS all detect the orientation of the device in some way. The gyroscope measures orientation as rotation around three primary axes. The accelerometer determines the acceleration of the device, which can let the device detect whether the wearer is in motion or still. This detection can further aid in reducing noise input or reducing wrong input recognition (for example, unwanted gesture inputs).

A magnetometer is essentially a compass, as it evaluates magnetic fields, and it can also be helpful with the calculation of the wearer's position and orientation. Some smart glasses can be equipped with GPS, so the application may tell directions, show the position on a map, or react to certain places, for example by showing information about a building or the distance from the destination. Another standard piece of equipment is the light sensor for light detection, which allows the application to react to different environments and conditions, mainly to adjust the brightness of the display. Additionally, a proximity sensor allows the device to detect obstacles or to detect whether the smart glasses are currently worn. The eye tracker is lately becoming more frequently embraced by smart glasses and headsets. The tracker reads eye movement and gaze, both of which could improve control over the device or help adjust elements on the display. Lastly, smart glasses might have built-in touch sensors like a trackpad or some mechanical interface like buttons. Moreover, smart glasses with available wireless technology may be linked with external controllers, be it another wearable technology or some handheld device. Those can be more intuitive and efficient for controlling the device. On the other hand, there is the problem of losing hands-free interaction, among other possible drawbacks.

1.2 Smart glasses currently on the market

Since the introduction of Google Glass, plenty of other smart glasses models have appeared on the market. Furthermore, as new models come out, different approaches and capabilities come with them. Some, like Glass [13], focus on the business side of usability, some, such as HoloLens [7], emphasize the entertainment side, and others, like Focals [14], try to make smart glasses a natural everyday experience. As touched upon in the previous section, not all smart glasses have the same features, as shown in Table 1.1, which covers the sensors and other features of some models.

Table 1.1: Sensors and features on smart glasses

Model | Camera | Microphone | Built-in controller | External controller | Eye tracker | Accelerometer | Gyroscope | Magnetometer | Light sensor | Speakers
Glass | X | X | X | - | - | X | X | X | X | X
HoloLens | X | - | - | - | X | X | X | X | X | X
HoloLens 2 | X | X | - | - | X | X | X | X | X | X
Focals | X | X | - | X | - | X | X | X | X | X
Vuzix M300 (a) | X | X | X | - | - | X | X | X | X | X
Vuzix Blade (a) | X | X | X | - | - | X | X | X | X | -*
Moverio BT-35E (b) | X | -* | X | - | - | X | X | X | X | -*
Magic Leap One [12] | X | X | X | X | - | X | X | X | X | X

* not built-in, possible via connector
a. https://www.vuzix.com/products/compare-vuzix-smart-glasses
b. https://tech.moverio.epson.com/en/bt-35e/

Glass X company’s Glass [13] in its current commercially unavailable version Glass Enterprise Edition focuses on companies, in contrast to its previ- ous, better-known Glass Explorer Edition [15]. Both versions possess camera facing users view, microphone for voice instructions, touchpad on the side, and small monocular crystal-like display. They can detect light along with orientation thanks to the gyroscope, magnetometer, and accelerometer, which are fundamental sensors available in most smart glasses’ models.

HoloLens

Like Glass, HoloLens [7] by Microsoft recently introduced the second version of the product – HoloLens 2. HoloLens is a more sizeable mixed reality headset rather than just smart glasses. It features a binocular lens display and a main camera facing the user's viewpoint, and it provides no on-hand input option. HoloLens 2 makes voice input available via a microphone, hand recognition via the camera, and eye tracking thanks to two infrared cameras oriented at the eyes.

HoloLens can also detect head position with the typical sensors and has a depth sensor that can further improve object projection in 3D space.

Focals

Focals [14], more significantly than HoloLens or Glass, aim to make smart glasses a casual wearable accessory. As of now, they are custom-made, with an option for customers to have prescription lenses as in regular glasses or to add clips that work as sunglasses. Focals have Alexa by Amazon built in, so the main controlling option is voice control. Nonetheless, Focals are also paired with Loop – a smart ring using a joystick with a click option, providing another alternative control method for the smart glasses.

1.3 Examples of smart glasses and augmented reality utilization

Each model of smart glasses tries to specialize in some specific field with different features, style, or marketing. The current Glass edition focuses on non-commercial utilization in manufacturing and logistics. The HoloLens headset is, besides industry or healthcare, popular in entertainment, as there are several AR and MR games available for it. Focals intend to be a smart accessory, like a modernized version of regular prescription glasses. Augmented reality is becoming an everyday experience even for casual users, thanks to applications like Instagram or Snapchat that let filters alter the appearance of photos and videos, or AR games such as Pokémon GO. Nonetheless, there are many other, more practical uses. For instance, smart glasses can improve the learning process and altogether help in education with better communication between tutor and student or by creating more active group learning [16]. Smart factories can strongly benefit from the utilization of augmented reality, not only in the manufacturing process itself but even in planning the layout of the factory, where it might optimize planning by helping with visualization [17]. Furthermore, on the assembly line, it may significantly help workers by showing the instructions and process of assembling, along with highlighting the individual objects and operations they must perform [18].

Aeronautics can find considerable value in AR as well. Smart glasses could aid pilots in receiving crucial information about the flight [19] or provide visual guidance and a nondisruptive overview with warnings (such as notice of obstacles in the path) for pilots on the runway [20]. AR may improve flight control to manage and control planes on the ground as well [21]. Additionally, AR might advance aircraft design and manufacturing [22], in addition to aeronautical maintenance, with training inspectors or reducing the error rate while speeding up the upkeep [23]. Another area that could benefit from AR technology is healthcare, with AR and VR applications ranging from education and surgical training through mental health treatment of PTSD or physical therapy, to medical data gathering and overview [24]. Thanks to AR applications, patients may release stress and improve their mental and emotional health [25]. Furthermore, many applications can support healthcare workers. For instance, AR applications could assist surgeons as a needle-guiding tool for better accuracy [26]. There are many other suggested usages of augmented reality. One example is BoatAR – an application for boat dealerships, which virtually alters boats for customers to create the personal desired boat design [27]. Not unlike guidance in aircraft, AR could visually aid navigation for car drivers [28], or it may act as a navigation element and a graphical enhancement of landmarks for tour apps [29]. Lastly, it is worth mentioning the incorporation of AR in the military for training purposes [9].


2 Control methods for AR applications

Controlling augmented reality or mixed reality is different from controlling a typical application, as augmented reality mostly builds on the mobility of the user and alterations based on the current scene. It can be beneficial to use smart glasses because the user can move with them instead of just sitting in front of some display. However, with this movement comes the problem of using the usual controlling methods like a keyboard, mouse, or touch display. This chapter shows what possible methods for controlling devices using augmented reality – mainly smart glasses – are available, in addition to presenting various approaches and devices for those specific interactions. There are different ways of dividing control methods. One of them is the separation of interactions based on the use of hands into hands-on and hands-free methods. Another possible classification is into three classes – handheld, touch, and touchless – as proposed in Interaction Methods for Smart Glasses [30]. Handheld methods depend on some handheld controller – a special device, a tablet, or a smartphone. Touch methods refer to the ones using input like a stroke, gesture, or tap on some sensor surface. This touch can be on the device itself or on additional external devices like wearable devices or touch interfaces on the user's body. Touchless methods do not depend on any tactile input such as actively pressing or touching anything. They can rely, for example, on voice recognition or hand gesture recognition. Those methods can also use some external devices or equipment such as a haptic glove. Since the handheld methods require a constant hold on the device by the user, the emphasis here is put on the touch and touchless methods, which may be more convenient in combination with smart glasses and mostly leave the hands free for performing other tasks.

2.1 Touch methods

Touch methods require physical interaction from the user. It can be with some device or controller, as well as interaction with some scanned surface or even the skin, though the latter is not that common.


2.1.1 On-device interaction

On-device touch methods are the traditional ones utilizing a touch screen, touchpad, button, keyboard, or other input devices. Those can be either peripherals or part of the primary device itself. Smart glasses may have a touch surface built onto them. For instance, Glass [13] has a touchpad on the arm of the frames. This touchpad can detect the usual gestures like swipe, tap, or scroll. External device interfaces are more diverse, as they can range from a small ring to a large sleeve. The former is represented by iRing [31], which is capable of recognizing finger rotation and gestures aside from providing stroke and push input. Focals [14] even utilize the ring controller Loop as a standard input device. From the other side of the spectrum comes GestureSleeve [32]. GestureSleeve is a textile wrapped around the arm, able to detect touch inputs the same way a touchscreen does. Besides those gadgets, there is also the possibility to use smartwatches, which are becoming a more prominent wearable technology nowadays. Alternatively, a smart wristband [33] creates a larger touch screen surface than smartwatches, along with wrist rotation detection. Another suggested approach is Belt [34] – a smart belt that creates a touch-enabled surface all along the waist, where individual sections can act as shortcuts for an application.

2.1.2 On-body interaction

Several studies explore the option of making the user's skin into a touch surface. Some propose the forearm [35, 36], as it has the benefit of being a relatively large and flat area. What is more, a study from 2014 [37] showed that, among on-body inputs, the forearm was perceived as the most comfortable to use by half of the users. Another examined body part is the hand [38], with the utilization of the palm, fingers, or a combination of both as an interaction surface to create an eyes-free input method. PalmType [39] creates a virtual keyboard on the palm and fingers that is optimized to the shape of a hand. PalmGesture [40] inspects the option to use a palm for drawing gestures such as simple shapes or letters. Alternatively, some approaches work with fingers as input surfaces. DigitSpace [41] is an eyes-free interface using finger-worn widgets to simulate buttons on the fingers, with the thumb working as a pointer.


Then there are more subtle devices like TIMMi [42], which is a textile device covering the index finger and providing buttons for input. FingerPad [43] uses a different approach, as it is attached to a nail and creates a virtual touchpad on the tip of a finger. Rather than using an arm or hand, some studies present hand-to-face interaction methods [44], examining the use of the chin or cheek as an interaction surface. Additionally, EarPut [45] is a gadget tucked behind the ear, enabling touch interaction on the ear that could potentially be used with smart glasses or other head-worn devices. Finally, some proposed devices can be more flexible about their location on the body, as iSkin [46] introduces stretchable skin-worn sensors whose position can vary from the back of an ear to a forearm.

2.2 Touchless methods

Touchless methods do not require any physical interaction of the user with any device, as they rely on another source of input. The hands-free interaction methods are the main representatives of this category. However, not all touchless techniques are hands-free, as freehand interaction is included in this category as well. While hands-free methods do not rely on hands at all, freehand interaction still needs the user to move their hands in some capacity as a kind of controller.

2.2.1 Voice input

Nowadays, voice input is becoming a more prominent feature, as technologies like virtual assistants rely heavily, if not entirely, on voice commands. In order to use voice input, the device needs a connected microphone. Then again, a problem can occur with the quality of the recording, as noise can significantly reduce the understanding of commands. Therefore, for a voice command to be correctly recognized, there is a need for a quiet area or a good-quality microphone that can cancel out the recorded noise.


2.2.2 Hand gesture

The hand gesture is a freehand interaction that utilizes either hand motion tracking or recognition of some specific gesture of one hand or both hands. Gestures can be either static or dynamic. Static gestures are, for instance, a closed fist, an OK sign, or finger counting. Dynamic gestures can vary from ones where the hand simulates inputs reminiscent of touch input, such as a stroke, a tap, or grabbing and dropping an item, to just hand movement on its own. There are different approaches to identifying hand gestures. Firstly, a standard camera can record bare hands to identify gestures. This camera can be positioned in different places, not only on the head. For instance, CyclopsRing [47] is a ring-like device recording fisheye video to recognize hand gestures. Alternatively, WeARHand [48] uses a pair of cameras for the manipulation of a 3D object with hands. Another external camera-based device, Leap Motion [49], can similarly be combined with a headset to scan hand motion. Secondly, instead of recording with a camera, some additional device on the hand can detect gestures. One of the options is a tactile glove [50]. An example is the Maestro Gesture Control Glove [51], which, in addition to the recognition of hand gestures, might also be utilized as an air-keyboard or air-mouse. Lastly, Soli [52] is a small chip using radar for gesture recognition. Using radio waves can lead to better precision in contrast to the methods mentioned above, in addition to not requiring a frontal view as a camera does, nor necessitating wearing any additional device on the hand.

2.2.3 Head movement

Since headsets and smart glasses are positioned on the head and typically come equipped with a built-in gyroscope and accelerometer as well as a magnetometer, monitoring head movement is not hard to achieve. The evaluation of head movement is essential when showing mounted objects in the current user's view. Nonetheless, using head gestures as an input may prove problematic, as it can conflict with the natural head movement of the user, and tilting the head for a longer time is unpleasant. Then again, as GlassGesture [53] demonstrates, head gestures might be useful for user authentication.


2.2.4 Eye tracking

Eye tracking might be useful for improving the precision of other input methods, as a pointer for virtual object manipulation [54], or for other adjustments of projections, such as moving text along while reading it. UbiGaze [55] goes even further and utilizes not just eye gaze for object selection but even simple eye gestures (such as a triangle) as an input method. However, gaze movement recognition requires a special eye tracker, which is not a common feature of smart glasses or headsets.


3 Prototype design and development

Developers nowadays have a decent number of choices in what they can use for AR application development. Some companies even provide a specialized toolset or even an environment with their headset, like LuminOS for MagicLeap [12], which is specifically customized for the headset device. Then there are additional devices like Leap Motion [49] that also provide software toolsets for better control over their hardware device. Given the nature of this thesis, with its focus on smart glasses technology, screen touch input will be excluded, as this area is already easily utilized, and using it on mobile devices for an AR application is fundamentally the same as using it for any other application. Then again, it is noteworthy that touch screen input can naturally be used for operations such as specific gestures for zooming in and out of the scene or hold and swipe for manipulation of the augmented objects.

3.1 Available toolkits

Since the prototype’s premise is to be potentially used on mobile de- vices, the emphasis is put on tools compatible with them – primarily than on toolkits available for web browsers as they provide compat- ibility and make it straightforward to use on both Android and iOS devices. On top of that, given currently commonly available technologies, the emphasis is put on control inputs utilizing either microphone or camera. All the following libraries also work on regular hardware, as of desktop or phone, and they do not require any additional hardware or specific device (like smart glasses with Kinect or eye tracker) to work correctly.

3.1.1 Available toolkits for voice control

Voice control has lately been one of the leading fields in input control for plenty of devices. With the rise of virtual assistants such as Siri or Google Assistant, voice input is being used more and more often. This also allows using speech-to-text in any application after only being given authorization from the user to use the device's microphone.


Table 3.1: Browser support of voice input libraries

Library | Chrome (v. 73) | Chrome (Android) (v. 73) | Firefox (v. 66) | Firefox (Android) (v. 66) | Edge (v. 42) | Safari (v. 12) | Opera (v. 58)
Annyang | X | X | - | - | - | - | -
Artyom | X | X | - | - | - | - | -
JuliusJS | X | X | X | X | X | X | X
Mumble | X | X | - | - | - | - | -
Pocketsphinx.js | X | X | X | X | X | X | X
Voice-commands.js | X | X | - | - | - | - | -
Voix | X | X | - | - | - | - | -

Web Speech API

The Web Speech API [56, 57] is the key component on which many libraries build. It has two main parts – speech recognition (speech-to-text) functionality and speech synthesis (text-to-speech) functionality. Before being able to use the speech-to-text functions, the web page needs to ask for microphone authorization. If the connection is via HTTP, the user must allow microphone usage after every page reload. Alternatively, if the connection uses HTTPS instead of HTTP, the permission is remembered, and no additional authorization is required. HTTPS is also quicker with recognition and response. If loading the page from local files, it acts similarly to the HTTP version, albeit it can require microphone authorization after every command recognition. Even though no official list is available, Web Speech API supports over 50 languages or dialects.1 Most libraries built on Web Speech API should be able to work with all of those languages, although this may not be the case, as can be seen in Table 3.2.

1. From the languages and dialects available in the official demo here: https://www.google.com/intl/en/chrome/demos/speech.html.


Table 3.2: Languages supported by voice input libraries

Library | English | Czech | Slovak | Spanish | German | Russian | Mandarin
Annyang | X | X | X | X | X | X | X
Artyom | X | - | - | X | X | X | X
JuliusJS | X | -* | -* | -* | -* | -* | -*
Mumble | X | X | X | X | X | X | X
Pocketsphinx.js | X | -* | -* | X | X | X | X
Voice-commands.js | X | X | X | X | X | X | X
Voix | X | X | X | X | X | X | X

* no default support, but can be added with a new language model and acoustic model

Additionally, Web Speech API is not supported by all browsers, as can be seen in Table 3.1, covering the browser support of the individual libraries. Since the introduction of Web Speech API in 2013, JavaScript libraries built on it have been emerging. Those libraries provide a cleaner and better-arranged interface wrapped around the original Web Speech API. They grant access to quite similar functions, although some add additional ones. The core function is adding new voice commands, each consisting of a verbatim string to be identified and the action the command triggers after its detection. Some libraries offer an option to use regular expressions to describe a command, along with some other additional options for command definition. There is an option to define callbacks after the start or end of listening, as well as one that executes if no defined command was detected. Some of those functions are covered in Table 3.3, with an overview of which libraries provide each of them.
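For reference, the raw speech recognition part of the Web Speech API that these libraries wrap can be used roughly as follows; the handler bodies are only placeholders:

    // The vendor-prefixed constructor is still needed in Chromium-based browsers.
    const SpeechRecognition =
        window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();

    recognition.lang = 'en-US';        // recognition language
    recognition.continuous = true;     // keep listening after each result
    recognition.interimResults = false;

    recognition.onresult = (event) => {
        // Take the transcript of the latest final result.
        const result = event.results[event.results.length - 1][0];
        console.log(result.transcript, result.confidence);
    };

    recognition.onerror = (event) => console.warn(event.error);

    // Listening starts only after the user grants microphone access.
    recognition.start();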

Annyang

Annyang [58] provides a straightforward and easy-to-use interface. It offers the possibility to add multiple commands at once, define general callbacks, listen continuously, and pause and resume listening without deactivating the microphone. Additionally, its smaller version has a surprisingly miniature size of under 1 kB.


Table 3.3: Available functions of voice input libraries

Library | continuous listening | regular expressions support | word capture | offline functionality | new grammar definition | language switching while running
Annyang | X | X | X | - | - | X
Artyom | X | X | X | - | - | -
JuliusJS | X | - | - | X | X | -
Mumble | X | X | - | - | - | X
Pocketsphinx.js | X | - | - | X | X | -
Voice-commands.js | X | X | - | - | - | X
Voix | - | - | - | - | - | -

Encountered issues and notes: In the command definition, Annyang can take optional words ((go) up can be recognized as 'go up' or just 'up'), named variables (go :direction executes function(direction)), or splats for capturing multiple words (these need to be at the end of the command). Using regular expressions in the command definition is a bit tricky, as a plain string description of the command needs to be defined in addition to the regular expression itself.
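A small sketch of these command forms with the Annyang API; the handlers moveUp, move, and search are illustrative and not part of the prototype:

    // Assumes the annyang script has been loaded on the page.
    const commands = {
        '(go) up':       () => moveUp(),                   // optional word
        'go :direction': (direction) => move(direction),   // one-word named variable
        'search *query': (query) => search(query)          // splat captures the rest
    };

    annyang.addCommands(commands);
    annyang.start({ continuous: true });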

Artyom

In contrast to all the other JS libraries introduced here, Artyom [59] is the only one utilizing text-to-speech in addition to the speech-to-text functionality given by Web Speech API. Text-to-speech adds a new layer to the communication and possible control over the application, as it simulates a dialog between the application and the user. It also allows adding a list of commands at once. There are two types of commands – regular and special smart commands. The smart commands provide an option to use regular expressions in the definition of a command and an option to use a word or phrase said by the user in further functions. This may be used for name recognition or for adding a word to a database. Additionally, Artyom offers some other helpful functions. An option to simulate a command via text instead of saying it may help in testing or act as an alternative if the microphone is unavailable to the user.


Adding a name for the system, so that every command must start with this given name, can turn the application into a sort of virtual assistant.
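As a hedged sketch of how this could look with the Artyom API (the helpers toggleOverlay and writeDown are assumed, and the initialization options may differ slightly between Artyom versions):

    // Assumes the Artyom library has been loaded on the page.
    const artyom = new Artyom();

    artyom.addCommands([
        {   // regular command
            indexes: ['hide', 'show'],
            action: (i) => toggleOverlay(i)            // i is the matched index
        },
        {   // smart command with a wildcard capture
            indexes: ['number *'],
            smart: true,
            action: (i, wildcard) => writeDown(wildcard)
        }
    ]);

    artyom.initialize({
        lang: 'en-GB',        // recognition and synthesis language
        continuous: true,
        listen: true
    }).then(() => {
        artyom.say('Voice control is ready.');         // text-to-speech reply
    });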

Encountered issues and notes: The problem with using Czech in Artyom is with diacritics, which it does not support for Czech (though it does support them for, for instance, Spanish). Otherwise, it has no problem initializing with the Czech language and does detect words without the diacritics.

Mumble

Mumble [60] is essentially an extension of Annyang. It retains the same functions while slightly changing the interface. It simplifies the use of regular expressions in the command definition. It adds a name property for commands, which serves as a unique identifier for other functions such as removing or changing a command.

Encountered issues and notes: Using a name identifier for commands can be useful, but it may also be tiresome when defining more commands, since the name must be unique and must always be specified.

Voice-commands.js

Voice-commands.js [61] is a small library providing just the elementary operations. It supports regular expressions and naming individual commands, and it provides callbacks for the start and end of listening. Distinctively, as the only library mentioned here, it provides an option to change the minimal confidence level, both for the whole listening class and for a chosen specific command. On the other hand, it does not allow removing or changing an existing command after its definition.

Encountered issues and notes: Even though it should support Czech, and it mostly does, it has a problem with some words (for instance, it has no problem detecting 'tři', 'medvěd', 'počkej', or 'šelest', but it cannot detect 'přepni', 'devět', 'čtyři', and 'šest').


Voix

Voix [62] is another simple library providing only the core functionality of the Web Speech API. Among those functions are initialization with just the abbreviation of the language to be used, adding and removing commands, and start and stop. Then again, its interface also provides key binding (by default the 'v' key), which activates voice recognition after pressing the designated key (Voix listens only while the key is pressed down).

Encountered issues and notes: In order to change the key bound for voice recognition, it must be rewritten in the Voix class; there is no method for simply omitting the key press to start listening. The API does not provide a function to add more commands at once, so they need to be added one by one, though the calls can be chained one after another. Even though it should support continuous listening, it has a problem with detecting more commands in a row.

CMUSphinx – Pocketsphinx

CMUSphinx2 is a toolkit that can be used on devices ranging from a desktop to simple embedded systems. It is language independent, but it needs a language-specific acoustic model and language model. The languages for which those models are provided are listed in Table 3.2. The toolkit is made up of four main tools, one of which – Pocketsphinx – is a recognizer library with a port to JavaScript, Pocketsphinx.js [63]. A developer can add new words to recognize by using a pronunciation spelling transcript. For that, the CMU pronunciation dictionary is available online, though only the main pronunciation is listed there and no alternative pronunciations considering accents. Given its nature, Pocketsphinx.js requires a more in-depth understanding of its API and how it works than the Web Speech API libraries do.

Encountered issues and notes: Pocketsphinx.js needs to be accessed via HTTPS, as it does not work at all otherwise. Since every word needs its pronunciation definition specifically added, the recognition may lag behind Web Speech API, as it puts more work on the developer.

2. https://cmusphinx.github.io/


The JavaScript version also has an extreme amount of false positive recognitions in comparison to the Web Speech API toolkits.

Julius

Julius3 works similarly to CMUSphinx in that it can be used for embedded systems, has a JavaScript port, JuliusJS [64], and, while it is not language dependent, it needs a specific language model and acoustic model to work. In the default version, Julius does not come with any of those models, though there are official Japanese models available for download and English models created by users.

Encountered issues and notes: The JuliusJS project was never actually finished. The last public update was in 2016 and left some things only partially done. Both JuliusJS and Pocketsphinx.js have problems with reducing noise and detect seemingly random words from the provided grammar, though the quality of the microphone also plays a significant role in this. However, even with a good-quality microphone, the false positive rate is significant, and sometimes the recognition even seems random.

3.1.2 Available toolkits for hand gesture

In contrast to voice control, the hand gesture is not currently widely used for any purpose. Kinect lets the user control applications with motion and gestures of the whole body. It was used for Xbox as a game controller and is still used to detect movement by some applications and devices like HoloLens. Hand detection is linked either with neural networks and object detection or with motion detection. Using neural networks is more accurate, but it requires a trained model for recognition, needs more storage, and is computationally more expensive. Motion detection is less demanding, but it does not detect only the hand itself and, more importantly, requires a static camera, as a moving camera position changes the whole scene and invalidates the motion detection.

3. https://github.com/julius-speech/julius


Table 3.4: Browser support of hand gesture and eye tracking toolkits

Library | Chrome (v. 73) | Chrome (Android) (v. 73) | Firefox (v. 66) | Firefox (Android) (v. 66) | Edge (v. 42) | Safari (v. 12) | Opera (v. 58)
Diff-cam-engine | X | X | X | X | - | X | X
Handtrack.js | X | X | X | X | X | X | X
WebGazer.js | X | X | - | - | X | X | X

Handtrack.js

Handtrack.js [65] applies a neural network trained on images of the human hand to detect hands not only in static images but also to track hands in real-time webcam input on a website. Handtrack.js is developed with TensorFlow, a platform for machine learning usable, among other things, for different kinds of object detection. It provides the option to adjust the detection confidence level or the maximum number of recognized objects, so that it can detect just one hand with the best precision.
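A hedged sketch of such usage, assuming a video element with the id 'webcam' on the page; the parameter values are illustrative, not those of the prototype:

    // Assumes the handtrack.js script has been loaded on the page.
    const modelParams = {
        flipHorizontal: true,   // mirror the webcam image
        maxNumBoxes: 1,         // track a single hand with the best precision
        scoreThreshold: 0.6     // minimum detection confidence
    };

    const video = document.getElementById('webcam');

    navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
        video.srcObject = stream;
        video.play();

        handTrack.load(modelParams).then((model) => {
            const run = () => {
                model.detect(video).then((predictions) => {
                    // Each prediction carries a bounding box [x, y, width, height].
                    if (predictions.length > 0) {
                        console.log(predictions[0].bbox);
                    }
                    requestAnimationFrame(run);
                });
            };
            run();
        });
    });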

Diff-cam-engine

Diff-cam-engine [66] utilizes the other approach, in contrast to Handtrack.js. It can detect motion from a live webcam stream thanks to a heatmap of color changes. This technique is less demanding and more easily utilizable than neural network object detection. On the other hand, the camera is required to be static, and the user is not able to move around, as every movement is detected and evaluated. From this comes a similar potential problem with other movement in the scanned scene, which may be falsely accepted as well. Supported browsers for Diff-cam-engine, along with the rest of the non-voice toolkits, are covered in Table 3.4.
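For illustration, a generic sketch of the frame-differencing idea behind this kind of motion detection (this is not Diff-cam-engine's own API, and the threshold values are arbitrary):

    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    let previousFrame = null;

    // Returns a rough motion score: the number of pixels that changed
    // noticeably between the current and the previous webcam frame.
    function motionScore(video) {
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0);
        const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);

        let changedPixels = 0;
        if (previousFrame) {
            for (let i = 0; i < frame.data.length; i += 4) {
                const diff =
                    Math.abs(frame.data[i]     - previousFrame.data[i]) +      // R
                    Math.abs(frame.data[i + 1] - previousFrame.data[i + 1]) +  // G
                    Math.abs(frame.data[i + 2] - previousFrame.data[i + 2]);   // B
                if (diff > 60) changedPixels++;
            }
        }
        previousFrame = frame;
        return changedPixels;
    }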

3.1.3 Available toolkits for eye tracking

Even though eye gaze or eye movement is, like hand gestures, not used for any everyday purpose, it is used in marketing for creating heatmaps of testers' attention to specific sections of commercials or web designs [67].

WebGazer.js

WebGazer.js [68] is a JavaScript library that provides real-time eye tracking via webcam for web pages. WebGazer.js learns its prediction based on the user following the cursor with their eyes, so naturally, the more data it gets, the better the prediction will be. It works entirely in the client's browser, so there is no load on the server side. There is also an option to save the calibration data after closing the tab or browser, so that the user does not have to let WebGazer learn from scratch again.
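A hedged sketch of the basic usage; highlightRegion is an assumed helper, not part of WebGazer.js or the prototype:

    // Assumes the webgazer.js script has been loaded on the page.
    webgazer.setGazeListener((data, elapsedTime) => {
        if (data == null) return;            // no prediction for this frame
        // data.x and data.y are the predicted gaze coordinates in pixels.
        highlightRegion(data.x, data.y);
    }).begin();

    // Optionally show the prediction point while calibrating.
    webgazer.showPredictionPoints(true);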

Encountered issues and notes: WebGazer.js does work in Chrome and Opera, but it has a problem with initialization in Firefox. Since it runs in the client's browser, it can have problems running in mobile browsers, and it lags in Edge. Predictions of the eye gaze are most of the time well-rounded in the right area, though they can sometimes wildly jump from one end of the screen to the other.

3.2 Prototype design

The primary objective of the prototype is the ability to run on different mobile platforms as well as on a desktop, for better access to different input methods with better-quality equipment – primarily the camera and microphone – besides the bigger desktop screen needed for a proper examination of the eye tracking as well as for better control of hand gestures. For those reasons, the priority was web frameworks that would be able to function on both mobile devices and desktops. For the sole voice input tests, the toolkits built on Web Speech API were chosen, as they provide a better interface and generally had a better response in demos. As Annyang had the best results in the first wave of tests (17 of 22 users that took the initial test preferred it over Diff-cam-engine), it was implemented for the second part comparing directions.


Both the Diff-cam-engine toolkit and Handtrack.js were tested for hand gesture recognition. Because Handtrack.js felt more natural to use for more of the users, it was selected for the comparison test.

3.3 Prototype development

The prototype was developed for the browser with JavaScript, HTML, and CSS, with the functional part being written in JavaScript. Most of the toolkits need to be opened via HTTPS to work accurately. HTTPS also provides a quicker response and remembers the authorization, so that the webpage can use the microphone and camera constantly while it is opened in the browser. Each library has its own page dedicated to every test. Beside specific parts, such as a field with instructions or a randomly generated maze, the implementations are left relatively unchanged in each version. The voice input pages are the ones that change the most, as every toolkit provides a slightly varying API. Moreover, since the primary goal of the individual tests for voice commands is to evaluate the recognition rate of specific phrases, the exact string description is used in the command definitions instead of regular expressions, to distinguish the difference between similar phrases (such as 'go up' and 'up'). However, for the direction commands, regular expressions were used, as the exact structure was not relevant at this point. For the number dictation, the word catcher was used if available; otherwise, the exact string for the tested digital input was defined. Annyang has a quite straightforward structure of command definition that can catch one word, where the text after the colon signals a variable that takes the word coming after the recognized text.

    'number :num': function(num) {
        writeDown(num);
    }

The API offers an alternative version that catches more words at the end of the sentence, though this form might be less precise for numbers with decimal places, as it may split the number into two parts.

    'number *num': function(num) {
        writeDown(num);
    }


The command in Artyom has a similar structure, with the exception that it needs to be flagged as a smart command and does not allow naming the variable.

    indexes: ['number *'],
    smart: true,
    action: (i, wildcard) => {
        writeDown(wildcard);
    }

Finally, Mumble uses the regular expression structure, with an option to use a wildcard in any place of the command, and the structure is more complicated than the previous two.

    name: 'numbers',
    command: /^number (\d+)(\.)?(\d+)?/,
    action: function(number, point, decimal) {
        let result = number;
        if (point) {
            result += point + decimal;
        }
        writeDown(result);
    }

As neither Voice-commands.js nor Voix offers similar functionality, they need to have a command for every number on its own. Alternatively, instead of saying the whole number, it can be dictated consecutively, digit by digit. There are two options for implementing direction recognition for hand gestures. On the one hand, the motion of the hand can be calculated and the most significant axis difference in a given interval evaluated. On the other hand, a threshold for the margin of the video can be outlined, and the recognition of a hand or motion beyond this border is evaluated as one of the directions. Eventually, the prototype uses the latter, as this method creates fewer false positive results and requires fewer operations from the program. The eye tracking uses a similar margin approach as the hand motion, with the modification of using the whole page background as a canvas for the eye gaze prediction instead of the recorded video.
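As an illustration, a minimal sketch of the margin approach, assuming a Handtrack.js-style bounding box [x, y, width, height]; the margin size is arbitrary:

    const MARGIN = 0.2;   // outer 20 % of the frame on each side

    function directionFromBbox(bbox, videoWidth, videoHeight) {
        const [x, y, width, height] = bbox;
        const centerX = x + width / 2;
        const centerY = y + height / 2;

        if (centerY < videoHeight * MARGIN) return 'up';
        if (centerY > videoHeight * (1 - MARGIN)) return 'down';
        if (centerX < videoWidth * MARGIN) return 'left';
        if (centerX > videoWidth * (1 - MARGIN)) return 'right';
        return null;   // hand inside the central area, no direction triggered
    }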


The prototype is available online at https://rychemi.github.io/mcfara/. The page offers access to all the tests and the questionnaire. The source code is also included in the electronic appendix.

4 Prototype testing

The testing of the control methods consists of two main parts. The first part focuses on the capabilities of the individual libraries presented in the previous chapter. The second part aims at comparing the control methods with each other in the recognition of four basic directions. This chapter describes the premise of those end-user tests and the plan of each test.

4.1 Testing of individual control methods

Before comparing the different means of input, an individual analysis of each control method was performed. Since every control method has distinct characteristics and does not necessarily offer the same control options, specific tests were held for voice commands, hand gestures, and eye tracking.

4.1.1 Voice input testing

Each voice input library was tested for the English language and, if supported, for Czech as well. The test itself consists of two parts. Firstly, sample commands that could be useful for controlling an augmented reality application or smart glasses were tested on each voice input library. The commands tested in English are: start, stop, wait, pause, up, down, left, right, go up, go down, go left, go right, next, back, previous, continue, hide, show, open, close, turn on, turn off, swap, switch, zoom in, zoom out. Essentially equivalent commands were tested in Czech. The complete list is: zapni, vypni, počkej, pokračuj, nahoru, dolů, doleva, doprava, posuň nahoru, posuň dolů, posuň doleva, posuň doprava, dál, zpět, zpátky, předchozí, další, skryj, ukaž, zobraz, otevři, zavři, vyměň, přepni, přibliž, oddal. Secondly, the capability of receiving digital input was tested with the sample numbers 1, 5, 8, 3.14, 76, 673.26, and 1256.5, which the participant had to say after an introductory phrase ('number' for English, 'napiš' for Czech). Both command lists were dictated twice by each participant. If a command was not detected, there was an option to repeat the phrase once after the first failed attempt, provided there was no false positive recognition.

If the phrase was then unrecognized or evaluated mistakenly, the attempt was marked as incorrect.

4.1.2 Hand gesture testing Hand gestures were tested for recognition of basic directions up, down, left, and right to test basic capabilities of the libraries. Tested directions went in the following order: up, down, left, right, down, left, up, right, down, up, right, left, up, right, down, left. Each command had a time limit of 10 seconds. After the passing of the limit, the command was evaluated as erroneous. Furthermore, to find out how and if at all the toolkits canwork in different conditions, three distinctive light scenarios were tested. The first scenario was the natural ambient light to test capabilities in normal conditions. The other two were more extreme conditions – the bright sunlight and the dim lamplight.

4.1.3 Eye tracking testing

The eye tracking was tested for its accuracy on a grid of 8 target areas, each consisting of three rings, the scheme of which can be seen in Figure 4.1. The whole target was 4.5 cm wide, with each ring being approximately 0.7 cm wide on the screen. After the calibration of the toolkit, the tester had to look at the center of a target for 3 seconds before moving to another area. The areas were looked at in the following order: middle center, bottom left, top right, middle left, middle right, bottom center, bottom right, top center, bottom left, middle right, bottom center, top right, bottom right, middle center, middle left, top center. Analogously to hand gestures, eye tracking was tested in three different light conditions – bright sunlight, natural daylight, and dim lamplight. The objective of this test was to determine how precise the eye tracking is and what is roughly the smallest area a developer can work with.

4.2 Comparison test

The testing to compare the control approaches, where every input method had to perform a direction test, was held after the initial tests and their evaluation.


Figure 4.1: The testing grid for eye tracking

The comparison test had two possible command scripts. The first script, which considered the possibility of command repetition, was: up, up, down, down, left, right, left, right, up, down, up, down, left, left, right, right. The shorter second script was: up, left, down, right, up, right, down, left, up. Besides the correctness of each direction itself, the time between each direction and the complete time of the script execution were measured. Additionally, for each input method, the tester had the option to navigate through a randomly generated maze before the test itself to learn the given control method and better understand how the controls work. Alternatively, the tester was free to use the maze application after the direction test to get a better feel for every input method. The other part of the test was a questionnaire, consisting of choosing the preferred control method, ranking the comfort of each control method, and ranking whether any of the methods would be unpleasant to use in public or at work. The complete questionnaire is available as an appendix.


5 Results from prototype testing

This chapter presents the results of the end-user testing and suggests possible follow-up work and research. Moreover, the chapter considers potential issues and problems that may arise from employing augmented reality and smart glasses in an everyday routine. Both the individual tests and the comparison test used a Zoom H1 recorder as a microphone for voice input. The video quality of the webcam for the hand gesture and eye tracking tests was 360p at 30 fps. Additionally, the tests ran on a system with a dual-core CPU with a 2.5 GHz base frequency. The most demanding was WebGazer.js, which steadily took 90% of the dual-core CPU power on average, followed by Handtrack.js with 65%, which could, however, peak at even 100%. Diff-cam-engine typically required around 30% with a 10% deviation. Lastly, the Web Speech API toolkits needed merely about 3% of the CPU power. Moreover, none of the libraries had any major network demands.

5.1 Individual test results

The individual tests were completed in advance of the comparison test to see if any need for modifications in the implementation would arise and to test the possible conditions in which the input methods can work.

5.1.1 Voice input

A total of 22 participants took the test for English voice commands, of whom 12 are Czech native speakers, 6 Spanish, 3 Slovak, and 1 English. None of them had a particularly thick accent or any speech impediment. The complete list of recognition fail rates can be seen in Figure 5.1, where Annyang is not included, as it had a nearly 100% success rate, with only four instances of detecting right instead of wait. Mumble was similarly successful, with just a couple of misheard commands. If a command is not listed in the figure, it had a total 100% success rate, though that does not mean that all those phrases were recognized on every first try. The commands regularly repeated across all tested libraries were wait, pause, up, left, right, go up, go left, hide, swap, and zoom in. The two most problematic words were hide and up, which almost always had to be repeated, and even then, for most toolkits, the recognition was usually unsuccessful.


Figure 5.1: Unsuccessful recognition rate of English commands

From testing Voice-commands.js, two main problems transpired. First, with the minimal confidence of 3, the phrases with 'go' were accepted as if the first part had not been said ('go up' accepted as just 'up', 'go left' as 'left', and so on). This is because the direction without the go part was added to the command list sooner. However, since the commands usually take both versions for the same action, it does not have to be ultimately wrong for the direction recognition. Additionally, if the command definition is switched, the recognition is not always correct and sometimes uses the other definition, be it with or without go. The second problem was more critical, as a significant portion of the commands was activated twice after detection in more than half of the cases. In the number dictation, the only problem that occurred was occasionally omitting the point and the decimal places, mainly with 1256.5. However, Voix and Voice-commands.js had a problem with the basic number dictation, as the results were mostly incorrect.


Figure 5.2: Unsuccessful recognition rate of Czech commands

The Czech voice commands were tested with 10 native speakers. In comparison to the English phrases, the Czech ones, even if they needed to be repeated, had a better recognition rate in total; the unsuccessful recognition rates can be seen in Figure 5.2. However, none of the libraries was able to recognize the word 'oddal'. Furthermore, Voix had a problem detecting the words 'skryj' and 'posuň'. Dictation of numbers in Czech proved to be more problematic than in English, as none of the libraries was able to detect 1256.5 or 673.26. However, Annyang and Mumble correctly recognized all the other numbers. Moreover, unlike in English, the small numbers from one to five were written down as digits and not as words.

5.1.2 Hand gesture

22 participants with complexions varying from pale to light brown took the hand gesture test. The initial tests performed in bright light showed that neither Diff-cam-engine nor Handtrack.js could work in bright light conditions, as they almost never detected any hand movement at all.


Both libraries proved to function better in the dim lamplight, though with difficulties. Even though Handtrack.js mostly detected the hand correctly, it sometimes falsely recognized some objects as a hand. What is more, the recognition had a slight delay. Nevertheless, the success rate of the library was 69%. Diff-cam-engine showed better results, with an 80% success rate. Finally, both libraries achieved a nearly perfect score in natural daylight. The success rate of Diff-cam-engine was 96%; for Handtrack.js it was 97%.

5.1.3 Eye tracking

The eye tracking tests turned out to be ineffective in bright light as well as in candlelight, due to the face-tracking feature having a problem recognizing a person in both environments. With that said, the face detection sometimes had a problem detecting a face even in daylight, as nearly half of the participants experienced a problem with the initial face recognition. Then again, after the second or, rarely, the third try, the face was always successfully recognized. The eye tracking was tested with 11 participants, with somewhat mixed results. In five cases, the eye tracking oscillated around the midpoint most of the time, staying within the whole designated area. However, in the rest of the cases, the prediction diverged hugely; the general area was mostly maintained, but the prediction fell outside the designated area for the most part. The predictions shifted enormously toward the right in two cases (so the left targets were recognized as the middle ones). In one case, the projection inclined toward the lower right (where the center target was predicted as the lower right target), and in another, the predictions seemed nearly random and chaotic (target 7 was predicted as 6, target 4 as 3, and the others shifted toward the upper right direction, with none being accurate at all). Skin color seemed not to affect those results, as all participants had a pale or light complexion.

5.2 Comparison test

The comparison test had 33 participants between the ages of 20 and 56, with an average age of 23. The native language distribution is 16 Czech, 4 Slovak, 12 Spanish, and 1 English.


Figure 5.3: Result distribution of comfortability of control methods usage

12 Spanish, and 1 English. Three participants have a strong accent. Finally, 13 participants need to wear prescription glasses or contact lenses. 61% chose voice input as a preferred method, while the other pre- ferred hand gestures. All the participants found the voice commands as comfortable to use – more than half found it even very comfortable. Hand gestures comfort level was lower, with 12% of participants stat- ing that the usage of it is slightly unpleasant. The most divisive was eye tracking, which more than half of the people found uncomfortable. The full results are visualized in the bar chart in Figure 5.3. Most of the participants would not mind using hand gestures in public or at work. 76% would not mind using voice commands and 70% eye tracking. The most negative response got eye tracking, as some of the participants objected to the recording of the eyes for a more extended period if the eye tracking would be for other than passive usage. The complete results can be seen in Figure 5.4. In terms of the success rate, the most precise was hand gesture control with 71% of correct direction recognition, then with 68% the

35 5. Results from prototype testing

Figure 5.4: Result distribution of willingness to use control methods in public or at work

voice input and lastly with 52% eye tracking. The longest average time for executing the whole script has a voice input with 46.02 seconds. Eye tracking was the quickest with 36.35 seconds. Hand gestures were in the middle with 41.86 seconds. The skin color had not affected eye tracking or hand gestures method. Moreover, the accent proved to be insignificant for the basic commands, even participants with strong accents had no substantial problem with command recognition. As half of the participants wear prescription lenses in some capacity, this proved to be a problem for eye tracking, which does not work with glasses on. This, however, can be fixed with an option to incorporate prescription lenses in some smart glasses. Furthermore, disability can be another problem with using touch- less control methods. The hand tremor might be an obstacle for hand gestures, the strabismus restricts the use of eye tracking and stammer, or another speech impediment creates a complication for correct voice commands recognition.

5.3 Possible future follow-up work

As the compared control methods were only voice commands, hand gestures, and eye tracking, it would be interesting to compare touchless and touch methods that employ external devices such as a smart ring. Furthermore, combining methods such as eye tracking and hand gestures for better object manipulation (an approach already available on HoloLens) could yield interesting results. Head movement, which was not tested in this thesis, could similarly to eye tracking improve the precision of object manipulation.

Incorporating smart glasses into everyday use may also have drawbacks. As 2018 research from the University of Maribor showed, visual acuity deteriorates after using smart glasses for some time, and, more significantly, there was a high occurrence of scotoma in the area where the smart glasses’ projection was shown [69]. The potential health issues, together with the comfort felt by wearers, are another topic worth investigating further.

Additionally, smart glasses may not be a significant boost for all users. Research from 2017 showed that smart glasses increased efficiency considerably for young people under 29 but almost not at all for adults over 65 [70]. This might not be that important for using AR in the workplace, though it demonstrates that not every group benefits from it equally. Therefore, it might also be interesting to compare the handling and performance of AR and smart glasses by people in different industries with varying tasks.


Conclusion

Even though it has been only a few years since smart glasses became a more visible technology on the market, their presence is felt in many fields, from manufacturing and logistics to education and healthcare. Similarly, augmented reality is more prominent than ever before. Examples of smart glasses and their possible, as well as already implemented, uses were the first main topic this thesis covered.

Given this far-reaching potential, the thesis presented several control methods that have been suggested or are already used in some capacity. Those methods range from touch methods with on-body devices to touchless methods, to which the thesis paid the most attention in terms of usability and testing.

The core part of the thesis was the introduction of a prototype demonstrating the currently available possibilities of touchless control for applications. This prototype utilized three different input methods: voice commands, hand gestures, and eye tracking. The prototype served as an instrument in end-user testing to find out how users perceive each control method. The method preferred by the majority of participants turned out to be voice input. What is more, the survey showed that both voice commands and hand gestures are found comfortable to use. Eye tracking was more divisive, as over half of the participants did not like using it actively. Nevertheless, participants were mostly not bothered by using any of the touchless methods around other people.


Bibliography

1. Virtual Reality and Augmented Reality Device Market Worth $1.8 Billion in 2018 [online]. CCS Insight, 2018 [visited on 2019-04-24]. Available from: https://www.ccsinsight.com/press/company-news/3451-virtual-reality-and-augmented-reality-device-market-worth-18-billion-in-2018/.
2. RENNER, P. Prompting Techniques for Guidance and Action Assistance Using Augmented-Reality Smart-Glasses. In: 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR). 2018, pp. 820–822. Available from DOI: 10.1109/VR.2018.8446292.
3. DHL successfully tests Augmented Reality application in warehouse [online]. DHL International GmbH, 2015 [visited on 2019-04-24]. Available from: https://www.dhl.com/en/press/releases/releases_2015/logistics/dhl_successfully_tests_augmented_reality_application_in_warehouse.html.
4. GE Augmented Reality | Healthcare | Renewable Energy | Aviation [online]. Upskill, 2017 [visited on 2019-04-24]. Available from: https://upskill.io/landing/upskill-and-ge/.
5. SYBERFELDT, A.; DANIELSSON, O.; GUSTAVSSON, P. Augmented Reality Smart Glasses in the Smart Factory: Product Evaluation Guidelines and Review of Available Products. IEEE Access. 2017, vol. 5, pp. 9118–9130. ISSN 2169-3536. Available from DOI: 10.1109/ACCESS.2017.2703952.
6. GÖKEN, M.; BAŞOĞLU, A. N.; DABIC, M. Exploring adoption of smart glasses: Applications in medical industry. In: 2016 Portland International Conference on Management of Engineering and Technology (PICMET). 2016, pp. 3175–3184. Available from DOI: 10.1109/PICMET.2016.7806835.
7. Microsoft HoloLens | Mixed Reality Technology for Business [online]. Microsoft, 2019 [visited on 2019-04-24]. Available from: https://www.microsoft.com/en-us/hololens.
8. Fragments | Asobo Studio [online]. Asobo Studio company, 2017 [visited on 2019-04-24]. Available from: http://www.asobostudio.com/games/fragments.


9. RATHNAYAKE, W. G. R. M. P. S. Usage of Mixed Reality for Military Simulations. In: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT). 2018, pp. 1–5. Available from DOI: 10.1109/ICCTCT.2018.8550993.
10. Use the Measure app on your iPhone or iPad - Apple Support [online]. Apple Inc., 2018 [visited on 2019-04-24]. Available from: https://support.apple.com/en-in/HT208924.
11. NELSON, Randy. Pokémon GO Caught Nearly $800 Million in Global Revenue Last Year, Growing 35% Over 2017 [online]. SensorTower Inc., 2019 [visited on 2019-04-24]. Available from: https://sensortower.com/blog/pokemon-go-revenue-december-2018.
12. Magic Leap One: Creator Edition | Magic Leap [online]. Magic Leap Inc., 2019 [visited on 2019-04-25]. Available from: https://www.magicleap.com/magic-leap-one.
13. Glass [online]. Google Developers, 2017 [visited on 2019-04-25]. Available from: https://developers.google.com/glass/develop/gdk/location-sensors.
14. Explore Focals - North [online]. North Inc., 2019 [visited on 2019-04-25]. Available from: https://www.bynorth.com/focals.
15. Locations and Sensors [online]. X Development LLC., 2018 [visited on 2019-04-25]. Available from: https://www.x.company/glass/.
16. KUMAR, N. M.; KRISHNA, P. R.; PAGADALA, P. K.; SARAVANA KUMAR, N. M. Use of Smart Glasses in Education-A Study. In: 2018 2nd International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). 2018, pp. 56–59. Available from DOI: 10.1109/I-SMAC.2018.8653666.
17. SHAN, W.; JIAN-FENG, L.; HAO, Z. The application of augmented reality technologies for factory layout. In: 2010 International Conference on Audio, Language and Image Processing. 2010, pp. 873–876. Available from DOI: 10.1109/ICALIP.2010.5685203.


18. PAELKE, V. Augmented reality in the smart factory: Supporting workers in an industry 4.0. environment. In: Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA). 2014, pp. 1–4. ISSN 1946-0740. Available from DOI: 10.1109/ETFA.2014.7005252.
19. GORBUNOV, A. L.; TERENZI, A.; TERENZI, G. Pocket-size augmented reality system for flight control. In: 2015 IEEE Virtual Reality (VR). 2015, pp. 369–369. ISSN 1087-8270. Available from DOI: 10.1109/VR.2015.7223449.
20. MOLINEROS, J.; BEHRINGER, R.; TAM, C. Vision-based augmented reality for pilot guidance in airport runways and taxiways. In: Third IEEE and ACM International Symposium on Mixed and Augmented Reality. 2004, pp. 302–303. Available from DOI: 10.1109/ISMAR.2004.66.
21. ZORZAL, E. R.; FERNANDES, A.; CASTRO, B. Using Augmented Reality to overlapping information in live airport cameras. In: 2017 19th Symposium on Virtual and Augmented Reality (SVR). 2017, pp. 253–256. Available from DOI: 10.1109/SVR.2017.53.
22. MIZELL, D. W. Virtual reality and augmented reality in aircraft design and manufacturing. In: Proceedings of WESCON ’94. 1994, pp. 91–. ISSN 1095-791X. Available from DOI: 10.1109/WESCON.1994.403622.
23. HINCAPIÉ, M.; CAPONIO, A.; RIOS, H.; GONZÁLEZ MENDÍVIL, E. An introduction to Augmented Reality with applications in aeronautical maintenance. In: 2011 13th International Conference on Transparent Optical Networks. 2011, pp. 1–4. ISSN 2161-2064. Available from DOI: 10.1109/ICTON.2011.5970856.
24. SIK-LANYI, C. Virtual reality healthcare system could be a potential future of health consultations. In: 2017 IEEE 30th Neumann Colloquium (NC). 2017, pp. 000015–000020. Available from DOI: 10.1109/NC.2017.8263275.
25. TIVATANSAKUL, S.; OHKURA, M. Healthcare System Focusing on Emotional Aspects Using Augmented Reality - Implementation of Breathing Control Application in Relaxation Service. In: 2013 International Conference on Biometrics and Kansei Engineering. 2013, pp. 218–222. Available from DOI: 10.1109/ICBAKE.2013.43.
26. MEWES, A.; HEINRICH, F.; HENSEN, B.; WACKER, F.; LAWONN, K.; HANSEN, C. Concepts for augmented reality visualisation to support needle guidance inside the MRI. Healthcare Technology Letters. 2018, vol. 5, no. 5, pp. 172–176. ISSN 2053-3713. Available from DOI: 10.1049/htl.2018.5076.
27. LIU, Yishuo; ZHANG, Yichuan; ZUO, Shiliang; FU, Wai-Tat. BoatAR: A Multi-user Augmented-reality Platform for Boat. In: Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology. Tokyo, Japan: ACM, 2018, 74:1–74:2. VRST ’18. ISBN 978-1-4503-6086-9. Available from DOI: 10.1145/3281505.3283392.
28. SHAHRIAR, S. Tarek; KUN, Andrew L. Camera-View Augmented Reality: Overlaying Navigation Instructions on a Real-Time View of the Road. In: Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications. Toronto, ON, Canada: ACM, 2018, pp. 146–154. AutomotiveUI ’18. ISBN 978-1-4503-5946-7. Available from DOI: 10.1145/3239060.3240447.
29. WATTS, Nick. TigerEye: Augmented Reality for Clemson University Tours. In: Proceedings of the 50th Annual Southeast Regional Conference. Tuscaloosa, Alabama: ACM, 2012, pp. 385–386. ACM-SE ’12. ISBN 978-1-4503-1203-5. Available from DOI: 10.1145/2184512.2184617.
30. LEE, L.; HUI, P. Interaction Methods for Smart Glasses: A Survey. IEEE Access. 2018, vol. 6, pp. 28712–28732. ISSN 2169-3536. Available from DOI: 10.1109/ACCESS.2018.2831081.
31. OGATA, Masa; SUGIURA, Yuta; OSAWA, Hirotaka; IMAI, Michita. iRing: Intelligent Ring Using Infrared Reflection. In: Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology. Cambridge, Massachusetts, USA: ACM, 2012, pp. 131–136. UIST ’12. ISBN 978-1-4503-1580-7. Available from DOI: 10.1145/2380116.2380135.


32. SCHNEEGASS, Stefan; VOIT, Alexandra. GestureSleeve: Using Touch Sensitive Fabrics for Gestural Input on the Forearm for Controlling Smartwatches. In: Proceedings of the 2016 ACM International Symposium on Wearable Computers. Heidelberg, Germany: ACM, 2016, pp. 108–115. ISWC ’16. ISBN 978-1-4503-4460-9. Available from DOI: 10.1145/2971763.2971797.
33. HAM, Jooyeun; HONG, Jonggi; JANG, Youngkyoon; KO, Seung Hwan; WOO, Woontack. Smart Wristband: Touch-and-Motion–Tracking Wearable 3D Input Device for Smart Glasses. In: STREITZ, Norbert; MARKOPOULOS, Panos (eds.). Distributed, Ambient, and Pervasive Interactions. Cham: Springer International Publishing, 2014, pp. 109–118. ISBN 978-3-319-07788-8.
34. DOBBELSTEIN, David; HOCK, Philipp; RUKZIO, Enrico. Belt: An Unobtrusive Touch Input Device for Head-worn Displays. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. Seoul, Republic of Korea: ACM, 2015, pp. 2135–2138. CHI ’15. ISBN 978-1-4503-3145-6. Available from DOI: 10.1145/2702123.2702450.
35. AZAI, Takumi; OGAWA, Shuhei; OTSUKI, Mai; SHIBATA, Fumihisa; KIMURA, Asako. Selection and Manipulation Methods for a Menu Widget on the Human Forearm. In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. Denver, Colorado, USA: ACM, 2017, pp. 357–360. CHI EA ’17. ISBN 978-1-4503-4656-6. Available from DOI: 10.1145/3027063.3052959.
36. LIN, Shu-Yang; SU, Chao-Huai; CHENG, Kai-Yin; LIANG, Rong-Hao; KUO, Tzu-Hao; CHEN, Bing-Yu. Pub - Point Upon Body: Exploring Eyes-free Interaction and Methods on an Arm. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. Santa Barbara, California, USA: ACM, 2011, pp. 481–488. UIST ’11. ISBN 978-1-4503-0716-1. Available from DOI: 10.1145/2047196.2047259.
37. WEIGEL, Martin; MEHTA, Vikram; STEIMLE, Jürgen. More Than Touch: Understanding How People Use Skin As an Input Surface for Mobile Computing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Toronto, Ontario, Canada: ACM, 2014, pp. 179–188. CHI ’14. ISBN 978-1-4503-2473-1. Available from DOI: 10.1145/2556288.2557239.
38. GUSTAFSON, Sean G.; RABE, Bernhard; BAUDISCH, Patrick M. Understanding Palm-based Imaginary Interfaces: The Role of Visual and Tactile Cues when Browsing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Paris, France: ACM, 2013, pp. 889–898. CHI ’13. ISBN 978-1-4503-1899-0. Available from DOI: 10.1145/2470654.2466114.
39. WANG, Cheng-Yao; CHU, Wei-Chen; CHIU, Po-Tsung; HSIU, Min-Chieh; CHIANG, Yih-Harn; CHEN, Mike Y. PalmType: Using Palms As Keyboards for Smart Glasses. In: Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services. Copenhagen, Denmark: ACM, 2015, pp. 153–160. MobileHCI ’15. ISBN 978-1-4503-3652-9. Available from DOI: 10.1145/2785830.2785886.
40. WANG, Cheng-Yao; HSIU, Min-Chieh; CHIU, Po-Tsung; CHANG, Chiao-Hui; CHAN, Liwei; CHEN, Bing-Yu; CHEN, Mike Y. PalmGesture: Using Palms As Gesture Interfaces for Eyes-free Input. In: Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services. Copenhagen, Denmark: ACM, 2015, pp. 217–226. MobileHCI ’15. ISBN 978-1-4503-3652-9. Available from DOI: 10.1145/2785830.2785885.
41. HUANG, Da-Yuan; CHAN, Liwei; YANG, Shuo; WANG, Fan; LIANG, Rong-Hao; YANG, De-Nian; HUNG, Yi-Ping; CHEN, Bing-Yu. DigitSpace: Designing Thumb-to-Fingers Touch Interfaces for One-Handed and Eyes-Free Interactions. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. San Jose, California, USA: ACM, 2016, pp. 1526–1537. CHI ’16. ISBN 978-1-4503-3362-7. Available from DOI: 10.1145/2858036.2858483.
42. YOON, Sang Ho; HUO, Ke; NGUYEN, Vinh P.; RAMANI, Karthik. TIMMi: Finger-worn Textile Input Device with Multimodal Sensing in Mobile Interaction. In: Proceedings of the Ninth International Conference on Tangible, Embedded, and Embodied Interaction. Stanford, California, USA: ACM, 2015, pp. 269–272. TEI ’15. ISBN 978-1-4503-3305-4. Available from DOI: 10.1145/2677199.2680560.
43. CHAN, Liwei; LIANG, Rong-Hao; TSAI, Ming-Chang; CHENG, Kai-Yin; SU, Chao-Huai; CHEN, Mike Y.; CHENG, Wen-Huang; CHEN, Bing-Yu. FingerPad: Private and Subtle Interaction Using Fingertips. In: Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology. St. Andrews, Scotland, United Kingdom: ACM, 2013, pp. 255–260. UIST ’13. ISBN 978-1-4503-2268-3. Available from DOI: 10.1145/2501988.2502016.
44. SERRANO, Marcos; ENS, Barrett M.; IRANI, Pourang P. Exploring the Use of Hand-to-face Input for Interacting with Head-worn Displays. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Toronto, Ontario, Canada: ACM, 2014, pp. 3181–3190. CHI ’14. ISBN 978-1-4503-2473-1. Available from DOI: 10.1145/2556288.2556984.
45. LISSERMANN, Roman; HUBER, Jochen; HADJAKOS, Aristotelis; MÜHLHÄUSER, Max. EarPut: Augmenting Behind-the-ear Devices for Ear-based Interaction. In: CHI ’13 Extended Abstracts on Human Factors in Computing Systems. Paris, France: ACM, 2013, pp. 1323–1328. CHI EA ’13. ISBN 978-1-4503-1952-2. Available from DOI: 10.1145/2468356.2468592.
46. WEIGEL, Martin; LU, Tong; BAILLY, Gilles; OULASVIRTA, Antti; MAJIDI, Carmel; STEIMLE, Jürgen. iSkin: Flexible, Stretchable and Visually Customizable On-Body Touch Sensors for Mobile Computing. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. Seoul, Republic of Korea: ACM, 2015, pp. 2991–3000. CHI ’15. ISBN 978-1-4503-3145-6. Available from DOI: 10.1145/2702123.2702391.
47. CHAN, Liwei; CHEN, Yi-Ling; HSIEH, Chi-Hao; LIANG, Rong-Hao; CHEN, Bing-Yu. CyclopsRing: Enabling Whole-Hand and Context-Aware Interactions Through a Fisheye Ring. In: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology. Charlotte, NC, USA: ACM, 2015, pp. 549–556. UIST ’15. ISBN 978-1-4503-3779-3. Available from DOI: 10.1145/2807442.2807450.


48. HA, T.; FEINER, S.; WOO, W. WeARHand: Head-worn, RGB-D camera-based, bare-hand user interface with visually enhanced depth perception. In: 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). 2014, pp. 219–228. Available from DOI: 10.1109/ISMAR.2014.6948431.
49. Leap Motion [online]. LEAP MOTION, INC., 2018 [visited on 2019-04-25]. Available from: https://www.leapmotion.com/.
50. HSIEH, Yi-Ta; JYLHÄ, Antti; JACUCCI, Giulio. Pointing and Selecting with Tactile Glove in 3D Environment. In: JACUCCI, Giulio; GAMBERINI, Luciano; FREEMAN, Jonathan; SPAGNOLLI, Anna (eds.). Symbiotic Interaction. Cham: Springer International Publishing, 2014, pp. 133–137. ISBN 978-3-319-13500-7.
51. Gesture Control Glove | Maestro | POWER IN THE PALM OF YOUR HANDS [online]. Microndigital Corp, 2018 [visited on 2019-05-05]. Available from: http://maestroglove.com/.
52. Project Soli [online]. Google ATAP, 2019 [visited on 2019-05-05]. Available from: https://atap.google.com/soli/.
53. YI, S.; QIN, Z.; NOVAK, E.; YIN, Y.; LI, Q. GlassGesture: Exploring head gesture interface of smart glasses. In: IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. 2016, pp. 1–9. Available from DOI: 10.1109/INFOCOM.2016.7524542.
54. SLAMBEKOVA, Dana; BAILEY, Reynold; GEIGEL, Joe. Gaze and Gesture Based Object Manipulation in Virtual Worlds. In: Proceedings of the 18th ACM Symposium on Virtual Reality Software and Technology. Toronto, Ontario, Canada: ACM, 2012, pp. 203–204. VRST ’12. ISBN 978-1-4503-1469-5. Available from DOI: 10.1145/2407336.2407380.
55. BÂCE, Mihai; LEPPÄNEN, Teemu; GOMEZ, David Gil de; GOMEZ, Argenis Ramirez. ubiGaze: Ubiquitous Augmented Reality Messaging Using Gaze Gestures. In: SIGGRAPH ASIA 2016 Mobile Graphics and Interactive Applications. Macau: ACM, 2016, 11:1–11:5. SA ’16. ISBN 978-1-4503-4551-4. Available from DOI: 10.1145/2999508.2999530.


56. Web Speech API - Web APIs | MDN [online]. Mozilla and individual contributors, 2019 [visited on 2019-04-14]. Available from: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API.
57. Web Speech API [online]. Speech API Community Group, 2019 [visited on 2019-04-14]. Available from: https://w3c.github.io/speech-api/.
58. TalAter/annyang: Speech recognition for your site [online]. Tal Ater, 2019 [visited on 2019-04-14]. Available from: https://github.com/TalAter/annyang.
59. sdkcarlos/artyom.js: A voice control - voice commands - speech recognition and speech synthesis javascript library. Create your own siri, google now or cortana with Google Chrome within your website. [online]. Carlos Delgado, 2017 [visited on 2019-04-14]. Available from: https://github.com/sdkcarlos/artyom.js.
60. jrunestone/mumble: A simple Javascript framework for adding voice commands to a web site using the web speech recognition API. [online]. Johan Runsten, 2014 [visited on ]. Available from: https://github.com/jrunestone/mumble.
61. jimmybyrum/voice-commands.js: Simple wrapper for Javascript Speech-to-text to add voice commands. [online]. Jimmy Byrum, 2018 [visited on 2019-04-14]. Available from: https://github.com/jimmybyrum/voice-commands.js.
62. pazguille/voix: A JavaScript library to add voice commands to your sites, apps or games. [online]. Guille Paz, 2013 [visited on 2019-04-14]. Available from: https://github.com/pazguille/voix.
63. syl22-00/pocketsphinx.js: Speech recognition in JavaScript and WebAssembly [online]. Sylvain Chevalier, 2019 [visited on 2019-04-14]. Available from: https://github.com/syl22-00/pocketsphinx.js.
64. zzmp/juliusjs: A speech recognition library for the web [online]. Zach Pomerantz, 2016 [visited on 2019-04-14]. Available from: https://github.com/zzmp/juliusjs.


65. victordibia/handtrack.js: A library for prototyping realtime hand detection (bounding box), directly in the browser. [online]. Victor Dibia, 2019 [visited on 2019-05-05]. Available from: https://github.com/victordibia/handtrack.js.
66. lonekorean/diff-cam-scratchpad: Various quick demos and experiments showing the concepts behind Diff Cam. [online]. Will Boyd, 2016 [visited on 2019-05-05]. Available from: https://github.com/lonekorean/diff-cam-scratchpad.
67. RealEye | Professional Eye-Tracking tests, same day results, starting from $59 [online]. RealEye sp. z o. o., 2019 [visited on 2019-05-05]. Available from: https://www.realeye.io/.
68. brownhci/WebGazer: WebGazer.js: Scalable Webcam EyeTracking Using User Interactions [online]. Brown HCI Group, 2018 [visited on 2019-05-05]. Available from: https://github.com/brownhci/WebGazer.
69. HERZOG, Natasa; BUCHMEISTER, B.; BEHARIC, A.; GAJŠEK, Brigita. Visual and optometric issues with smart glasses in Industry 4.0 working environment. Advances in Production Engineering & Management. 2018, vol. 13, pp. 417–428. Available from DOI: 10.14743/apem2018.4.300.
70. ISHIO, H.; KIMURA, R.; MIYAO, M. Age-dependence of work efficiency enhancement in information seeking by using see-through smart glasses. In: 2017 12th International Conference on Computer Science and Education (ICCSE). 2017, pp. 107–109. ISSN 2473-9464. Available from DOI: 10.1109/ICCSE.2017.8085472.

A Comparison test questionnaire


B Electronic appendix

WebPrototype.zip

The zip file contains the web prototype’s source code: the web page hierarchy, individual pages accompanied by the scripts dedicated to specific tests (sorted into three folders, one for each control method), and the libraries used for testing.
