
UbiPoint: Towards Non-intrusive Mid-Air Interaction for Hardware Constrained Smart Glasses

Lik Hang Lee, Center for Ubiquitous Computing, The University of Oulu, Oulu, Finland, [email protected]
Tristan Braud, The Hong Kong University of Science and Technology, Hong Kong SAR, [email protected]
Farshid Hassani Bijarbooneh, The Hong Kong University of Science and Technology, Hong Kong SAR, [email protected]
Pan Hui, The Hong Kong University of Science and Technology, Hong Kong SAR; University of Helsinki, Helsinki, Finland, [email protected]

ABSTRACT

Throughout the past decade, numerous interaction techniques have been designed for mobile and wearable devices. Among these devices, smartglasses mostly rely on hardware interfaces such as touchpads and buttons, which are often cumbersome and counter-intuitive to use. Furthermore, smartglasses feature cheap and low-power hardware that prevents the use of advanced pointing techniques. To overcome these issues, we introduce UbiPoint, a freehand mid-air interaction technique. UbiPoint uses the monocular camera embedded in smartglasses to detect the user's hand without relying on gloves, markers, or sensors, enabling intuitive and non-intrusive interaction. We introduce a computationally fast and lightweight algorithm for fingertip detection, which is especially suited for the limited hardware specifications and the short battery life of smartglasses. UbiPoint processes pictures at a rate of 20 frames per second with high detection accuracy – no more than 6 pixels of deviation. Our evaluation shows that UbiPoint, as a mid-air non-intrusive interface, delivers a better experience for smartglasses interaction, with users completing typical tasks 1.82 times faster than when using the original hardware.

CCS CONCEPTS

• Human-centered computing → Pointing; Gestural input; • Computing methodologies → Object detection; Object recognition.

ACM Reference Format:
Lik Hang Lee, Tristan Braud, Farshid Hassani Bijarbooneh, and Pan Hui. 2020. UbiPoint: Towards Non-intrusive Mid-Air Interaction for Hardware Constrained Smart Glasses. In Proceedings of MMSys'20: ACM Multimedia Systems Conference (MMSys'20). ACM, New York, NY, USA, 12 pages. https://doi.org/http://dx.doi.org/10.475/123_4

1 INTRODUCTION

In recent years, many high-grade virtual reality (VR) and augmented reality (AR) headsets such as the Microsoft Hololens appeared on the market. However, due to their high price, their popularity remains limited. Other more price-convenient options include smartphone headsets and low-end smartglasses, at the cost of severely restricted capabilities. Smartphone headsets deliver AR and VR experiences by holding a smartphone in front of the user's eyes. However, these headsets do not provide a click option and thus offer limited interactions with AR objects. Low-end smartglasses present a more affordable way of accessing AR content, at the cost of extended access times to the interface and cumbersome interactions [34]. Besides, the limited hardware of smartglasses raises challenging issues, among which reduced display size, small input interface, limited computational power, and short battery life. Running energy-consuming algorithms on smartglasses severely hurts their sustainability, as stated by Yann LeCun1. On top of these severely restricted capabilities, the displayed content is not directly touchable and often requires addendum hardware inherited from the personal computing world. These combined factors result in fatiguing and error-prone direct manipulation. Figure 1 displays several examples of the headsets currently sold on the market.

Another specificity of smartglasses compared to more powerful headsets is their usage in mobility scenarios. Smartglasses need to be compact and lightweight enough for mobile usage. As such, they embed low-power CPUs, small batteries, and rarely more sensors than a cheap monocular camera. Higher-grade headsets such as the Microsoft Hololens come with a significant increase in size and weight that limits their usage to sedentary situations. Tackling the interaction issues on low-end smartglasses thus remains an open question in the absence of multiple sensors or advanced algorithms. As of now, the low-end smartglasses market is still experimenting with interaction techniques, and many different methods are in use, none providing satisfaction to the users.

1 "Yann LeCun: AR glasses will be the killer app of energy-efficient machine learning," VentureBeat. https://venturebeat.com/2019/12/17/yann-lecun-ar-glasses-will-be-the-killer-app-of-energy-efficient-machine-learning/
2 https://moverio.epson.com/
3 http://madgaze.com/ares/
4 https://www.microsoft.com/en-us/hololens

Figure 1: Several AR headsets and their input methods. (a) Epson Moverio – external touchpad2. (b) Mad Gaze ARES – trackball3. (c) Microsoft Hololens – head-activated cursor4.

Most interaction techniques rely on hardware solutions that are either embedded on the glass frame [18], which forces the user to raise his arm for interaction, or consist of cumbersome external apparatus [5]. Even though smartglasses manufacturers have started to adapt their interfaces to simplify interactions, it often results in stripped functionalities. For instance, Google Glass relies exclusively on voice text input and does not allow manual additions to the embedded dictionary. Voice commands might be inappropriate in public areas when user privacy is an issue, or when issuing a voice command is socially inappropriate. In addition to voice commands, Google Glass features a touchpad. This touchpad only supports basic operations and does not allow any direct pointing or text input. Even though these two input sources were designed to interact with a specific card-based system interface, they present limited usability. To reach information, the user navigates through irrelevant cards throughout the timeline and has to remember the order of those cards to avoid unnecessary movements. Navigation quickly becomes a tedious task, especially when the number of applications (thus the amount of cards) increases. More recent smartglasses follow the same trend. The Vuzix Blade5 relies on a touchpad and voice control, while the Sony SmartEyeGlass6 uses an external controller. Epson's Moverio smartglasses7 require a tangible hand-held touchpad the size of a smartphone. This peripheral also includes most of the active components, such as the processor and battery. This arrangement has been called 'difficult to use' and 'clunky', both by specialized websites [4] and the user community8.

Mad Gaze applies the post-WIMP (Window, Icon, Menu, and Pointer) paradigm [21] in the Android OS environment. The mini-trackball on the device provides a small area in which tap and click combinations form the usual input options: a single-tap moves the pointer, a double-tap returns to the previous page, a single-click selects an icon or menu, and a double-click activates the scrolling mode. However, the small size of the mini-trackball, added to the similarity of the different options, may lead to unwanted operations. For instance, if the user exerts too much force when performing a single-tap, it may mistakenly activate the click operation and select an item instead of moving the pointer. Another common mistake may occur when moving the pointer. Due to the size of the touch area, the user has to repeat single-tap operations to point at the desired object. When the time gap between two single taps is too narrow, the user will accidentally invoke the double-tap operation, returning to the previous page.

Even high-end equipment such as the Microsoft Hololens [32] proves to be as inconvenient as other solutions: the cursor is fixed to the center of the screen in the Hololens interface, while the projected hologram moves according to the user's head rotation. Either a mid-air hand gesture or a single-button controller activates the target confirmation accordingly. However, due to the weight of the equipment (more than 500g), some common operations such as typing a textual password or a URL can become physically painful, especially in the neck area. We further detail these interaction issues in an introduction video9. To summarize, most AR smartglasses rely either on intrusive techniques, such as external peripheral devices and buttons, or on counter-intuitive manipulations such as head movements or gestures that result in low QoE for the user [28].

To address these concerns, we propose UbiPoint, an intuitive and non-intrusive mid-air interaction technique for hardware constrained smartglasses. UbiPoint uses lightweight computer vision algorithms to detect the fingertip of the user, providing an intuitive and efficient interface for the smartglasses. The embedded monocular camera detects the user's fingertip, which directly controls the movement of the cursor through a set of gestures. We apply simple computer vision algorithms to isolate the fingertip and integrate these algorithms within a system-wide implementation for precise and intuitive pointing on low-end hardware. UbiPoint relies on barehanded mid-air interaction and hence solves most of the issues mentioned above. This strategy significantly improves the quality of experience (QoE) by providing a 2-dimensional (2D) surface large enough compared to the pointing object, contrary to current smartglasses' trackballs and mini-touchpads. Furthermore, UbiPoint does not depend on voice or head movement, and its utilization is more concealed while respecting the user's privacy. Finally, UbiPoint's lightweight algorithms can accommodate the low computing power of most smartglasses. It is important to note that UbiPoint overlays a 2D plane in the 3D physical environment, instead of constructing and searching point clouds in 3D space in a computationally intensive way, enabling low-cost and efficient gestural interaction on highly constrained devices.

Our contribution is fourfold – 1) Design of a barehanded freehand interaction technique based on a fingertip detection algorithm. 2) Development of a lightweight interaction system for AR headsets and resource-constrained smartglasses. 3) System-wide implementation of UbiPoint on constrained hardware such as Google Glass and Mad Gaze. 4) Enhancing the interaction capabilities of AR headsets from limited interactions to two-way interactions.

2 RELATED WORK

The interaction techniques proposed in the research community and in commercial products can be divided into three categories: hands-free interaction, freehand interaction, and direct pointing techniques.

2.1 Hands-free Interaction

Hands-free input is one of the most popular categories in the domain of interaction techniques. The absence of hand control characterizes this kind of interaction between the user and the smartglasses. Voice recognition, head gestures, and eye-tracking are the common examples of hands-free input, used where hand control is either impractical or inconvenient for the user. Voice recognition technology has been deployed in smartglasses and became the major input method on Google Glass. Although it limits the need for the embedded touchpad, its efficiency drops in shared or noisy environments. Indeed, outside noise may cause disturbance and obtrusion [31], making this solution less preferable, or may accidentally activate some features [37]. Driven by the built-in accelerometer, a third-party library10 adds support for head-tilt gestures to Google Glass. This technology can be employed to control text input [22] and user authentication [37]. For instance, GlassGesture [37] utilizes both the built-in accelerometer and gyroscope to achieve high input accuracy. A specific sequence of head movements is then regarded as the authentication input. However, head movement as the primary input source does not fit ergonomic requirements due to long-term discomfort.

Eye tracking [12] and facial expression technology [6] have been proposed for head-mounted displays. For instance, Fove is an eye-tracking technique for VR. The major drawback is that these techniques are error-prone and require excessive calibration. Furthermore, the required hardware is not yet widely deployed in current AR glasses, though we see an emerging number of commercial products such as the HTC Vive Pro Eye and Tobii Pro Glasses 2. More explanation about these hands-free interactions can be found in [26]. Although various hands-free techniques have been proposed in the literature, there is no evidence showing that hands-free input with smartglasses outperforms other interaction techniques such as freehand interaction. As reported by Zheng et al. [17], human beings are good at adapting to various conditions, whether their hands are occupied or not. In other words, hands-free operation may not be a necessary condition in the design of interaction techniques, and thus interactions involving hand gestures are not inferior to hands-free input. A usability study [35] also found that gestural input is preferable to on-body gestures and hand-held interfaces, especially in an interactive environment.

2.2 Freehand Interaction

Various gestural interfaces have been developed for freehand interaction with smartglasses, with diverse goals. These goals include facilitating digital content in office environments [20], simulating mouse clicks [30], and assisting text input [28]. Other gestural interfaces require addendum sensors [16]. Myopoint [19] uses a commercial armband containing electromyography and inertial motion sensors for gesture detection based on arm muscle movements, to achieve pointing and clicking operations. ShoeSense [2] has an upward-oriented sensor installed on a shoe. Users can make various two-armed poses in triangular form, and the sensor can read the triangular gesture. TiMMi [38] is a ring-like surface mounted on the index finger providing multi-modal sensing areas between the thumb and index finger. The user can perform various gestures by finger bending and pressing. However, addendum sensors introduce setup time before usage, bringing inconvenience to smartglasses users.

While gestural interaction has compelling features such as being intuitive, it has been known to perform worse than mouse or touch interaction, on top of often being intrusive, i.e., relying on external hardware. An exploratory study [23] compares semaphoric gestural, touch, and mouse interaction in the WIMP paradigm. The results indicate that gestural interaction suffers from inaccurate recognition (the hit-to-miss ratio is 1:3), poor performance due to unfamiliarity with the hand gesture library, and muscle fatigue. Semaphoric gestures also require a long recognition time compared to mouse or touch interaction and are consequently not appropriate for intensive tasks. Gestural interaction is thus slower and harder to use than direct pointing in 2D interfaces.

2.3 Direct Pointing and Manipulation

Due to the issues mentioned above, a middle ground between touch and gestural interaction should be taken into consideration. Direct pointing is the most common interface on desktop computers and smartphones, but the mouse is not available on smartglasses [29], and the content is no longer directly touchable [27]. To avoid addendum devices, vision-based mid-air gestural pointing techniques become the only option. Very few works explore using the fingertip for direct pointing. Fingermouse [10] is a very early gesture-mouse system for wearable computers. A recent work [15] applies deep learning to improve the accuracy of hand recognition in a cluttered environment. Another work [7] focuses on increasing the accuracy of finger counting interpretation, which is not ideal for tasks involving fast repetition (e.g., text input). These three works rely on computation-heavy algorithms that cannot run in the constrained environment of smartglasses, or are not adapted to the monocular camera available in smartglasses [11]. A prototype system with a 3D depth camera mounted on the smartglasses combines both 2D color features and 3D depth information to detect the fingertip position. However, a costly 3D depth camera is usually unavailable in low-end smartglasses. As a result, a vast area is left for further vision-based fingertip interaction research.

To the best of our knowledge, most direct pointing algorithms based on body part detection are designed for powerful systems, where resources are abundant, and do not apply to the constrained hardware of smartglasses. As a continuation of [24], UbiPoint resolves most of the problems raised by other solutions by providing a vision-based direct pointing method affordable and practical enough to run smoothly on present smartglasses.

3 SYSTEM DESIGN

UbiPoint maps the fingertip of the user to the on-screen cursor's position. The user can naturally and comfortably interact in mid-air in front of the smartglasses camera (Figure 2a). We describe our design justifications in the next paragraphs.

Figure 2: Recommended area of mid-air interaction in the egocentric view. (a) Monocular camera embedded in the smartglasses. (b) Icons in the glasses display. (c) Hand movement in front of the camera, ranging from 16×24×10 cm to 21×32×15 cm.

5 https://www.vuzix.com/Products/Blade-Enterprise
6 https://developer.sony.com/develop/smarteyeglass-sed-e1/
7 https://epson.com/For-Work/Wearables/Smart-Glasses/Moverio-BT-200-Smart-Glasses-%28Developer-Version-Only%29/p/V11H560020
8 https://www.slant.co/options/5652/epson-moverio-bt-200-review
9 A video of problem description: https://youtu.be/lieO_IQFmsY
10 https://github.com/thorikawa/glass-head-gesture-detector


3.1 Design Principle

We consider that a non-intrusive interaction technique for seamless interaction with AR/VR headsets should fulfill the following five principles:

(1) The interaction technique should allow the user to perform freehand interaction in mid-air, with no additional hardware or fiducial markers required.
(2) We have seen in Section 2 that semaphoric gestures and sign language interactions present several issues, among which their overall complexity and dwelling time. Thus, an intuitive pointing technique similar to the mouse cursor is preferred.
(3) The interaction technique should require only subtle movements, because large movements are prone to fatigue.
(4) An affordable yet efficient algorithm is necessary for a fingertip-based interaction technique to be implemented offline, i.e., entirely on the device. Indeed, smartglasses present limited specifications, especially in terms of computation resources and battery life.
(5) The interaction technique should enhance the performance of various tasks, compared to hardware solutions such as touchpads or trackballs.

3.2 Interaction Approaches

UbiPoint provides two interaction approaches – the Freehand mode handles scenarios requiring multiple input options (e.g., a browser application), while the Fast-repetitive mode enables users to focus on multiple select-and-click operations in a fast and responsive manner (e.g., character input). These two interaction approaches can be switched depending on the input scenario [36]. Figure 3 depicts the states of the two modes.

Figure 3: State diagram for pointing gesture and swipe-based gestures.

Freehand mode. The cursor directly follows the user's finger. Actions can be triggered after hovering the cursor on top of a point of interest with visual clues. As shown in Figure 4, we define three gestures: (1) Mid-air click (Figure 4-(1a – 1d)) allows users to click on icons in the headset interface. After hovering the cursor for a given time on top of the object, the user slightly lowers his fingertip to perform a click at the designated location. As described in our implementation in the next section, a hovering time of 500ms is recommended, which strikes a balance between usability and performance. (2) Center-to-edge swipe (Figure 4-(2a – 2d)). After hovering the cursor for a given time (e.g., 500ms) in the center of the screen, the user can swipe by moving his fingertip towards the edges. This gesture allows scrolling on pages or flipping to the previous or next page on a web browser. (3) Corner-to-corner swipe (Figure 4-(3a – 3d)). This mode is activated when the cursor hovers in a corner of the screen for a given time (e.g., 500ms), then swiping to another corner. Corner-to-corner swipes offer 12 additional input options (4 starting corners times 3 destination corners), which can be used to handle objects with up to 6 attributes. However, due to the potential distance from corner to corner, these gestures should be kept to a minimum to prevent additional arm fatigue.

In touchscreen interfaces, the end of a gesture is defined by the user's finger leaving the touchscreen. In mid-air, the system needs to distinguish deliberate from unintended hand movements to avoid the Midas Touch Problem [8]. UbiPoint uses corners and edges as endpoints of gestures, which allows for simple unidirectional swipes, maintaining the simplicity of the small-screen interface. The assistance of gestural input can reduce the number of functional icons, and the saved space can be used for content display. The proposed interaction approach provides a total of 17 input options, ranging from selecting icons and objects, to controlling 2D interfaces such as scrolling a page up and down, to manipulating objects with multiple attributes such as 3D objects.

Fast-repetitive mode mixes pointing gestures with the hardware touch interface of the smartglasses. The hardware handles the click operation for a more reactive interface. Highly responsive hardware interfaces can support tasks requiring fast-repetitive inputs, while the pointing gesture maintains the intuitiveness of freehand interactions (Figure 4-(4a – 4d)).

3.3 Mid-air Interaction Strategy for Small-screen Displays

We strive to balance ease-of-use and functionality in the edge and corner configurations. Including more swipe gestures between corners, edges, and the center of the interface would increase complexity and erroneous inputs. Counterexamples are as follows. Swipes through multiple corners are error-prone: for instance, a user intends to perform a three-corner gesture, but the user's fingertip leaves the camera's FOV after reaching the second corner, and thus triggers a two-corner gesture. Also, multiple-corner gestures increase the computational complexity by adding another layer on top of the existing system, which would mobilize more resources from the already constrained hardware. Besides, corner-to-edge and center-to-corner swipes may trigger unwanted operations due to the proximity between edges and corners in a small-screen display.
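To make the Freehand-mode transitions of Figure 3 more concrete, the sketch below models dwell-based clicking and corner-to-corner swipes as a small state machine fed with one fingertip position per processed frame. It is only an illustration of the behaviour described above, written under assumptions: the class, method, and callback names, the jitter tolerance, the corner zone size, and the downward-motion threshold are hypothetical; only the 500 ms dwell time comes from the text.

```java
/**
 * Illustrative sketch of the Freehand-mode state machine (Figure 3, Section 3.2).
 * Names and pixel thresholds are assumptions, not the actual UbiPoint library code.
 */
public class FreehandGestureSketch {

    public interface Listener {
        void onClick(int x, int y);
        void onCornerToCorner(int fromCorner, int toCorner); // corners numbered 0..3
    }

    private enum State { POINTING, CLICK_ARMED, CORNER_ARMED }

    private static final long DWELL_MS = 500;      // recommended hovering time (Section 3.2)
    private static final int  JITTER_PX = 10;      // assumed tolerance for hand tremor
    private static final int  CORNER_PX = 40;      // assumed size of a corner hot zone
    private static final int  CLICK_DROP_PX = 20;  // assumed downward motion confirming a click

    private final int width, height;
    private final Listener listener;
    private State state = State.POINTING;
    private int anchorX, anchorY;                  // position where the current dwell started
    private long dwellStart;
    private int armedCorner = -1;

    public FreehandGestureSketch(int width, int height, Listener listener) {
        this.width = width; this.height = height; this.listener = listener;
    }

    /** Feed one detected fingertip position per camera frame. */
    public void onFingertip(int x, int y, long nowMs) {
        switch (state) {
            case POINTING: {
                if (Math.abs(x - anchorX) > JITTER_PX || Math.abs(y - anchorY) > JITTER_PX) {
                    anchorX = x; anchorY = y; dwellStart = nowMs;   // still moving: restart the dwell
                } else if (nowMs - dwellStart >= DWELL_MS) {
                    armedCorner = cornerAt(x, y);                   // hovered long enough: arm a gesture
                    state = (armedCorner >= 0) ? State.CORNER_ARMED : State.CLICK_ARMED;
                }
                break;
            }
            case CLICK_ARMED: {
                if (y - anchorY > CLICK_DROP_PX) {                  // slight downward motion = mid-air click
                    listener.onClick(anchorX, anchorY);
                    reset(x, y, nowMs);
                } else if (Math.abs(x - anchorX) > CORNER_PX) {     // moved away without clicking
                    reset(x, y, nowMs);
                }
                break;
            }
            case CORNER_ARMED: {
                int corner = cornerAt(x, y);
                if (corner >= 0 && corner != armedCorner) {         // reached a different corner
                    listener.onCornerToCorner(armedCorner, corner);
                    reset(x, y, nowMs);
                }
                break;
            }
        }
    }

    private void reset(int x, int y, long nowMs) {
        state = State.POINTING; anchorX = x; anchorY = y; dwellStart = nowMs; armedCorner = -1;
    }

    /** Returns 0..3 for top-left, top-right, bottom-left, bottom-right, or -1 if not in a corner. */
    private int cornerAt(int x, int y) {
        boolean left = x < CORNER_PX, right = x > width - CORNER_PX;
        boolean top  = y < CORNER_PX, bottom = y > height - CORNER_PX;
        if (top && left) return 0;
        if (top && right) return 1;
        if (bottom && left) return 2;
        if (bottom && right) return 3;
        return -1;
    }
}
```

A caller would construct this object with the 320×240 frame size used by UbiPoint and invoke onFingertip() from the detection loop; the Fast-repetitive mode would bypass the CLICK_ARMED state and let the hardware button fire the click instead.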

Figure 4: Pictorial description of the Freehand mode (and swipe-based gestures) and the Fast-repetitive mode.

4 IMPLEMENTATION

We integrate UbiPoint into a standalone demonstration application. The hand detection algorithm itself and the swipe-based gestures are implemented as a library written in Java, in order to be easily included in other applications.

We use a camera resolution of 320x240 pixels as it is more suitable for our image segmentation process. This resolution reduces the artifacts while providing acceptable (> 15 FPS) performance. Moreover, it corresponds precisely to the screen resolution, making the mapping of the cursor to the fingertip easier. First, the OpenCV library converts the image from YUV420sp to HSV. We then split the image into 10x10-pixel squares to create the data structure of our extended disjoint union representation. Finally, we disable Auto Exposure and Auto White Balance and set the focus to infinity to keep a consistent image under changing lighting conditions. The fingertip detection consists of three phases: image segmentation, disjoint union arborescence construction, and fingertip location estimation. These steps are represented in Figure 5.

4.1 Image Segmentation

The image segmentation phase starts with converting the image from the Android standard color space YUV420sp to HSV. We then apply a threshold to extract the skin tone color and return a binary image. We denote the output of this phase with the binary function I_{x,y} ∈ {0,1}, with 0 ≤ x < W and 0 ≤ y < H, where W and H represent the width and height of the image. The binary function I_{x,y} = 1 if and only if the pixel at location (x,y) belongs to the skin tone, and I_{x,y} = 0 otherwise. For simplicity reasons, we employ the HSV threshold values defined by Sobottka et al. [33] for Asian and Caucasian skin (0 < H < 50, 0.23 < S < 0.68, 0 < V < 1). Many works use morphological transformations to remove the artifacts from the resulting thresholded image; however, morphological operations (specifically opening and closing) are impractical given the limited computational power of smartglasses. Therefore, we propose a filter method that removes artifacts through the construction of a disjoint union arborescence.
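To illustrate the segmentation step of Section 4.1, the snippet below produces the binary skin mask I_{x,y} from an Android camera frame with the OpenCV Java bindings, using the Sobottka et al. [33] ranges. It is a sketch, not the shipped library code: the class and method names are invented, and the mapping of the H, S, V ranges onto OpenCV's 0–179 and 0–255 scales is our assumption.

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;

/** Sketch of the skin-tone segmentation of Section 4.1 (illustrative, not UbiPoint's code). */
public final class SkinSegmentationSketch {

    /**
     * @param yuv camera frame in Android's YUV420sp (NV21) layout, (height * 3/2) x width, CV_8UC1
     * @return    binary mask: 255 where the pixel falls in the skin-tone range, 0 elsewhere
     */
    public static Mat segmentSkin(Mat yuv) {
        Mat bgr = new Mat();
        Mat hsv = new Mat();
        Mat mask = new Mat();

        // YUV420sp -> BGR -> HSV, as described in Section 4.1.
        Imgproc.cvtColor(yuv, bgr, Imgproc.COLOR_YUV2BGR_NV21);
        Imgproc.cvtColor(bgr, hsv, Imgproc.COLOR_BGR2HSV);

        // Sobottka et al. [33]: 0 < H < 50 degrees, 0.23 < S < 0.68, 0 < V < 1.
        // OpenCV stores H in [0,179] (degrees / 2) and S, V in [0,255] (assumed mapping).
        Scalar lower = new Scalar(0, 0.23 * 255, 1);
        Scalar upper = new Scalar(25, 0.68 * 255, 255);
        Core.inRange(hsv, lower, upper, mask);

        bgr.release();
        hsv.release();
        return mask;   // I_{x,y} = 1 corresponds to a mask value of 255
    }
}
```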

Figure 5: The three phases used in our UbiPoint algorithm for detecting the location of the fingertip; (a) original image, (b) segmented hand region, (c) AG of the segmented hand region, (d) detected position of the pointer at the fingertip position.

4.2 Disjoint Union Arborescence Construction

In graph theory, an arborescence graph [1] (AG) is a directed acyclic graph in which there is only one path from the root node to every other node in the graph. Let A⟨V,E⟩ represent an AG with the set of vertices V and the set of edges E. We let D_A = {A_1⟨V,E⟩, A_2⟨V,E⟩, ..., A_m⟨V,E⟩} denote a set of m AGs where the sets of vertices of any two AGs in D_A are disjoint. We call the set D_A the disjoint union arborescence set. Our UbiPoint method creates an efficient data structure for the set D_A to be constructed from I_{x,y}. A node v_{x,y} belongs to an AG A_i⟨V,E⟩ according to the following criterion:

v_{x,y} ∈ V ⇔ ∀(i,j) ∈ F, I_{x+i, y+j} = 1,

where F is the set of coordinates that defines a filter of size S: F = {(i,j) | i = 0 ∧ 0 ≤ j < S} ∪ {(i,j) | j = 0 ∧ 0 ≤ i < S}.

The image I_{x,y} is scanned by a sliding window of size S. The condition above is applied to the center pixel of each window. If the vertex v_{x,y} is chosen, then a new AG is added to the set D_A and a recursive breadth-first search is initiated to move the sliding window. The image is always scanned from the top row to the bottom row. During the breadth-first search, our algorithm incrementally marks the depth of each node and updates the number of nodes belonging to each depth level. The algorithm also marks the visited windows so that they will not be revisited in future scans. The image is thus scanned in linear time.

4.3 Fingertip Location Estimation

The UbiPoint algorithm returns the data structure representing the set D_A, including the depth level of each node and the number of nodes at any given depth. The algorithm selects the AG with the maximum number of nodes from D_A. The fingertip is located at the root node of this graph. The hand orientation can be computed by choosing the nodes on the longest path from the root node and finding the vector that connects the root node to them, as shown in Figure 5. These three steps are computationally affordable enough to accommodate the low processing power of any smartglasses, from mass-market devices (e.g., Mad Gaze, Google Glass) to higher-end headsets such as the Microsoft Hololens.
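The sketch below illustrates, in a simplified form, how the scan of Section 4.2 and the selection rule of Section 4.3 can be combined: the frame is reduced to a grid of windows that passed the skin filter, each connected region is explored once with a breadth-first search, and the root of the largest region is reported as the fingertip. Everything here is an assumption for illustration – it uses plain 4-neighbour connectivity on the window grid instead of the S-sized filter F, and it omits the per-depth node counts used for hand orientation.

```java
import java.util.ArrayDeque;

/**
 * Simplified sketch of the disjoint-union-arborescence scan (Sections 4.2-4.3).
 * It operates on a grid of windows that already passed the skin filter;
 * all identifiers and the exact bookkeeping are assumptions, not UbiPoint's code.
 */
public final class FingertipEstimationSketch {

    /** Returns the grid coordinates {x, y} of the root window of the largest component, or null. */
    public static int[] locateFingertip(boolean[][] skinWindow) {
        int rows = skinWindow.length, cols = skinWindow[0].length;
        boolean[][] visited = new boolean[rows][cols];
        int bestSize = 0;
        int[] bestRoot = null;

        // Scan from the top row to the bottom row, as in Section 4.2. The first
        // unvisited skin window of a component becomes the root of its arborescence.
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (!skinWindow[y][x] || visited[y][x]) continue;
                int size = bfsMarkDepths(skinWindow, visited, x, y);
                if (size > bestSize) {               // keep the AG with the most nodes (Section 4.3)
                    bestSize = size;
                    bestRoot = new int[] { x, y };   // root window = fingertip estimate
                }
            }
        }
        return bestRoot;
    }

    /** Breadth-first search that marks windows as visited and returns the component size. */
    private static int bfsMarkDepths(boolean[][] g, boolean[][] visited, int sx, int sy) {
        int rows = g.length, cols = g[0].length, size = 0;
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] { sx, sy, 0 });          // {x, y, depth from the root}
        visited[sy][sx] = true;
        int[][] steps = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
        while (!queue.isEmpty()) {
            int[] n = queue.poll();
            size++;
            for (int[] s : steps) {
                int nx = n[0] + s[0], ny = n[1] + s[1];
                if (nx >= 0 && ny >= 0 && nx < cols && ny < rows
                        && g[ny][nx] && !visited[ny][nx]) {
                    visited[ny][nx] = true;
                    queue.add(new int[] { nx, ny, n[2] + 1 });   // child depth = parent depth + 1
                }
            }
        }
        return size;
    }
}
```

Because every window is visited at most once, the scan stays linear in the number of windows, which matches the linear-time property stated in Section 4.2.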
4.4 Limitations and Discussion

As UbiPoint detects the fingertip of the user based on human skin color, several problems may occur: (1) The system cannot recognize darker skin tones due to the threshold function. This problem can be solved by prior calibration. (2) Other individuals in the background may cause interference in the fingertip detection. We only perform our computations on the largest detected object; as the user's hand is the closest to the camera, it is the most likely to be detected. Second, low-end glasses such as Google Glass or Mad Gaze feature cameras with very low sensitivity: colors tend to fade away, and the skin color will not be detected. (3) Some large objects may be skin-colored. In our experiments, we encountered the case of a beige sofa that would be detected as a finger. Similarly to the first point, prior calibration can improve accuracy. (4) In order to construct our disjoint union arborescence, the system needs pre-configuration regarding which hand is used.

UbiPoint aims at providing an interaction method for low-end smartglasses. As such, it is limited by the hardware specifications. UbiPoint works well in simple scenarios and can withstand more complex situations thanks to our implementation choices. With the evolution of hardware, we foresee further techniques to improve UbiPoint's performance (multiple color spaces, stereoscopic vision, image recognition).

4.5 System-Wide Implementation

We extend the implementation of UbiPoint to a system-wide pointing technique for the Mad Gaze ARES. We focus on these smartglasses as their interface follows the WIMP paradigm while providing a full Android environment. The Mad Gaze trackball operates as a mouse in the operating system. We emulate this behavior using the uinput module, which requires root access. For the core of our implementation, we reuse the Java library developed for the test application. The uinput interactions are performed in a separate C file using the NDK.

We implement UbiPoint as an Android application (Figure 6). Although it requires root access to emulate the trackball (which can be solved by giving the appropriate permissions to the file during the OS building process), UbiPoint can easily be installed without requiring heavy

10 Frames Per Second (FPS) Google Glass 5 Figure 6: System wide implementation: the 1x1 pixel camera Google Glass avg view overlay and the pointer overlay on top of the Android OS. Mad Gaze Glass Mad Gaze Glass avg 0 0 10 20 30 40 50 60 modification to the kernel or the OS. As such, we cannot directly Time (seconds) access the hardware of the smartglasses, especially the raw video frames from the camera. For security reasons, the Java Camera API Figure 7: UbiPoint FPS on Mad Gaze and Google Glass. Mad requires displaying the camera view somewhere on the screen. We Gaze shows an average of 19.8 FPS, with a minimum of 13 FPS. skirt this limitation by reducing the view to an unnoticeable 1x1 Google Glass has an average of 17.7 FPS, with a minimum of 15 pixel box at the top left corner of the screen. We close the main FPS. activity at launch and run the image processing in a background process. We use an overlay view to display a red dot at the fingertip location on top of the system (see Figure 2b). (RMS) deviation in pixels. We perform the analysis for 100 different We focus on the Freehand interaction principles described Sec- positions of the finger. For each position, we move the fingertip toa tion 3.2: (1) Moving pointer: the red dot follows the fingertip in target location through a random path. Once the detected finger posi- real-time. (2) Click: when the pointer remains in the same location tion reaches the target location, we capture the frame and measure for more than 500ms, UbiPoint turns into action mode. The pointer the difference between the detected position by UbiPoint and the changes to green, and the user confirms the action by slightly tilting actual fingertip location, identified manually on the frame. Wethen its fingertip downwards. (3) Scroll: Stabilizing the pointer for 500ms evaluate the speed of UbiPoint in the above setting. We compute the in the center of the screen activates the scroll mode. The pointer average FPS throughout 10 frames for 60 seconds. turns to cyan and can be used to scroll through menus. The user can 5.1.2 Experiment Result. The RMS deviations for Mad Gaze quit the scroll mode either by pushing the pointer "outside" of the and Google Glass with a 320 × 240 pixel resolution are 6.415 and screen or by stabilizing the pointer for another 500ms. 6.626 pixels, respectively. The error in fingertip detection does not damage the applicability of UbiPoint, due to the two following 5 EVALUATION premises: (1) the target object to be selected (hyperlink, menu icon)is To demonstrate that UbiPoint applies to smartglasses with various usually larger than 10x10 pixels; (2) users are good at adapting to hardware specifications, we implement it on Google Glass v.1.0 and the minor deviations and readjust their physical position. Mad Gaze ARES Glass. Google Glass runs on Android 4.4.0 with Our system aims at a minimum of 15 FPS, which is the limit of a 1.2 GHz Dual Core CPU, 1.0 GB RAM, 16GB storage capacity, fluidity accepted by users in our preliminary experiments. We strive 5-megapixel camera, and 570 mAH battery life. Mad Gaze has for 25 FPS for a 320 × 240 pixel resolution, which is the maximum the following specifications: CPU with 1.2 GHz Quad Core, 512 achievable FPS value for Mad Gaze and Google Glass. Figure 7 MB RAM, 4GB storage capacity, 5-megapixel camera, and 370 shows the average FPS values. UbiPoint achieves, on average, 19.8 mAH battery life, Android 4.2.2. We choose Mad Gaze smartglasses and 17.7 FPS for Mad Gaze and Google Glass, respectively. 
Mad because the device has slightly more challenging hardware. Both Gaze performs on average three frames better than Google Glass be- have similar characteristics with current mass-market glasses and cause of the more advanced CPU architecture. Overall, our solution are representative of what can be achieved with present hardware. remains over 15 FPS, except on rare occasions (2 times over one We first evaluate the system performance (speed, accuracy, CPU minute). utilization, and energy consumption). We then conduct a usability study to analyze the users’ benefits from UbiPoint. 5.2 Experiment 2: CPU Utilization and Energy Consumption 5.1 Experiment 1: Accuracy and Speed 5.2.1 Experiment Setup. The CPU utilization of UbiPoint is mea- 5.1.1 Experiment Setup. We measure the speed and accuracy of sured using the Android monitor tool in Android Studio over 50 our technique with the embedded camera in Mad Gaze and Google seconds. For the baseline measurement, we terminate all the un- Glass. We compute the accuracy by measuring the root mean square necessary background applications in the smartglasses. Besides, we MMSys’20, June 8-11, 2020, Istanbul, Turkey Lik Hang Lee, Tristan Braud, Farshid Hassani Bijarbooneh, and Pan Hui

Figure 8: UbiPoint CPU Utilization on Mad Gaze (Upper) and Google Glass (Lower); CPU occupation reaches 19.5% on Mad Gaze and 35.5% on Google Glass.

Figure 9: UbiPoint energy consumption measured in mA for 300 seconds on Mad Gaze and Google Glass in comparison with measure the energy consumption of UbiPoint with the Android Bat- the device idle and camera-based activities. UbiPoint only con- teryManager tool (Specifically EXTRA_LEVEL and EXTRA_SCALE) on sumes 16% more power than regular camera usage on Mad the Mad Gaze and GoogleGlass. In order to measure the energy Gaze, and 14% on Google Glass. consumption, we run experiments in the following four scenarios: (i) Idle, with display off, (ii) Idle with display on, (iii) UbiPoint, Camera running our illustrative application, (iv) , the camera is on, only an additional 16% (Mad Gaze) and 14% (Google Glass) power- and a picture is taken every 10 seconds. consumption overhead, compared with scenario (iv) (Camera on). We record the battery discharge over 300 seconds for each sce- The computation cost of UbiPoint is minimal thanks to the low nario. Due to the construction of glasses, it is not possible to reach complexity of our algorithm. the battery pins in order to make a direct measurement. Therefore, Although the consumption of UbiPoint is high due to camera we compute the energy consumption by measuring the battery charge usage, we can consider several solutions to this problem. First of all, before and after running a scenario and compare it to its full capacity. the quick advances in mobile systems allow us to envision camera 5.2.2 Experiment Result. The results in Figure 8 show that the sensors with lower energy consumption, combined with better bat- average percentage of CPU utilization on Mad Gaze is 22%, where teries. Another hardware solution consists of embedding a second 3.5% is occupied by the system. UbiPoint uses the remaining 19.5%. low-consumption camera dedicated to UbiPoint. However, both rely On Google Glass, the CPU occupation reaches 40%, with 4.5% used on hardware solutions, where UbiPoint was designed as a software- by the system, and the remaining 35.5% occupied by UbiPoint. Con- only system. The last solution consists in activating UbiPoint only sidering that Mad Gaze has a more advanced CPU architecture, 77% when needed. For instance, the Google Glass interface allows nav- of the CPU capacity is reserved for other applications. Besides, the igating quickly, thanks to its tile interface. As such, UbiPoint may memory used by UbiPoint on Mad Gaze and Google Glass is respec- be employed for text input only, significantly reducing energy con- tively 5.76 and 5.66 MB. With recent technological advancements, sumption. we expect UbiPoint’s footprint to become negligible in the upcoming years while providing a reliable pointing technique for smartglasses. 5.3 Experiment 3: Usability Test We may improve the performance by integrating UbiPoint directly 5.3.1 Experiment Setup. We examine UbiPoint within the 2D into the OS kernel. However, for both security and stability reasons, WIMP user interface in which click operation and cursor movement it is preferable to keep the current library architecture. are the most important operations. One may argue that the WIMP Figure 9 presents the energy consumption. The proportion of interface might not be the most suitable for smartglasses. 
However, energy consumption for activating UbiPoint on Mad Gaze is the fol- after a quick survey, most commercial glasses exploit a barebone lowing: Running background application (28%), Display On (42%), version of Android – based on the WIMP paradigm – with the notable Camera (14%), and UbiPoint (14%). On Google Glass: Running exception of Google Glass with their TimeLine interface (which background application (12%), Display On (25%), Camera (50%), significantly reduces the possible usages), or higher-end solutions and UbiPoint (12%). Activating UbiPoint on Mad Gaze consumes an such as Microsoft Hololens. As UbiPoint is a direct pointing solution additional of 44 mA compared to the idle scenario with the display similar to a mouse, testing our solution in the environment for which on (ii). Running UbiPoint on Google Glass consumes an additional the mouse has been designed seemed the best way to perform the 342 mA comparing with scenario (ii). The increase in energy con- evaluation. For this first experiment, we focus on the fast-repetitive sumption is mainly due to the camera. Activating UbiPoint incurs mode. UbiPoint: Towards Non-intrusive Mid-Air Interaction for Hardware Constrained Smart Glasses MMSys’20, June 8-11, 2020, Istanbul, Turkey


(a) Min, max and average user performance for three tasks performed on Google Glass and Mad Gaze glass.


(b) Individual user performance for three tasks performed on Google Glass and Mad Gaze glass.

Figure 10: User performance for three tasks performed on Google Glass and Mad Gaze Glass with 25 participants (Participant 1-25). Participants complete the task 1.46 times faster with Mad Gaze and 1.82 times faster with Google Glass.

We design an application named Fingerpoint to measure the per- formance of users when using UbiPoint in a WIMP environment. The application contains an array of 24 boxes (1–24), as shown in Figure 11. We ask the users to select the box of a given randomly generated number shown on the top of the interface. For instance, the user is required to select boxes No.7, No.17, No.24 in a sequence. The user guides the cursor with his left hand in mid-air to the re- quested boxes and performs the click by pressing the button on the body of smartglasses with their right hand. We recruit a total of 25 participants (22 males and 3 females) from various mailing lists on our campus. Their ages range from 18 to 34. There are 24 Figure 11: UbiPoint implementation in our evaluation app, and right-handed and 1 left-handed. All the subjects had none or minimal the user posture with Fast-repetitive mode. experience in using mid-air freehand interaction for smartglasses or head-mounted displays. We screened out the participants who cannot clearly see the digital contents displayed in the smartglasses. task, participants use only the Mad Gaze glasses hardware to per- 24 participants had some experience of gestural interaction with form cursor movement by touching the mini-trackball on the glasses Microsoft Kinect or Wii for computer gaming, and the majority of body, and box selection by clicking on the mini-trackball. Google them had experience or heard about , mainly from Glass does not have a real direct pointing interface, which makes mobile gaming. it impossible for us to implement our test application. This task is The application of Fingerpoint consists of three tasks, and each the baseline for comparing the task performance with our UbiPoint tasks last no longer than 5 minutes. In the first task, participants approach. In the third task, participants use UbiPoint to move the use UbiPoint to move the cursor and select boxes by clicking on cursor and perform box selection by clicking the physical button the mini-trackball on the body of Mad Gaze glasses. In a second on the Google Glass frame. In order to show the robustness of the system, we perform all experiments in a cluttered office environment, MMSys’20, June 8-11, 2020, Istanbul, Turkey Lik Hang Lee, Tristan Braud, Farshid Hassani Bijarbooneh, and Pan Hui with movement in the background, other individuals in the area, and can use two hands effectively in performing direct manipulation, exposure variation. All three tasks are counterbalanced to avoid inter- even when little or no training is provided [3]. Although UbiPoint nal threats to the validity of the results. Completion time is recorded used the left hand for cursor movement, i.e. subtle operations, and among all tasks, which is a commonly used objective metrics to their right hand for basic clicks, participants were able to perform understand the QoE of the proposed interaction techniques [25]. straightforwardly. On the contrary, participants acted on the mini- trackball meticulously. 5.3.2 Experiment Result. Figure 10 shows the performance of the users completing the first task (green), the second task (dark 5.4 Experiment 4: Usability Test on Text Input green), and the third task (orange). One-way ANOVA under three 5.4.1 Experiment Setup. This experiment aims at understanding task shows a significant statistical effect of the conditions (F = 2,72 the performance of Freehand and fast-repetitive modes in mid-air in- 25.298, p <0.01). 
The Bonferroni and Holm methods show that the terface, where text input is regarded as a fast and repetitive task [27]. performance of the techniques of Mini-trackball on Mad Gaze (the Participants are asked to select character keys across the QWERTY second task) is significantly different with the two other techniques virtual keyboard on MAD GAZE using our system-wide implementa- (p <0.01), in which UbiPoint applies on Mad Gaze (the first task) and tion of UbiPoint. The text material comes from Project Gutenberg11, Google Glass (the third task). However, no statistical significance and we randomly select word phrases from ‘Letters to a Friend’ between Task 1 and Task 3 is computed by the Bonferroni and Holm written by John Muir. The selected script is written in an easy-to- methods (p = 0.038), which implies the same techniques show no understand style so the experiment can be regarded as a reasonable performance difference on different low-end smartglasses. In other mock-up of everyday typing tasks. We perform this experiment on words, UbiPoint has consistent user performance across two devices, 5 participants (3 males and 2 females), recruited among the students although UbiPoint on these devices achieves slightly different FPS and staff of the university. English is the second language for all par- (Experiment 1). The average completion time and standard deviation ticipants, and none of them had tried Mad Gaze before. Regarding of the first task are 32.00 and 8.88 seconds. The average completion the screening process, the participants should have good vision to time and standard deviation of the second task are 48.00 and 14.32 receive the contents inside the digital overlay of the smartglasses. seconds. The average completion time and standard deviation of the We run 6 experiment sessions for each mode, and the two modes third task are 26.00 and 8.30 seconds. Some users complete the task are counterbalanced. Before running the experiment, we provide the as fast as 13 seconds with UbiPoint on Google Glass, compared to participants with a 10-minute training for each interaction method. a minimum time of 30 seconds when using Mad Gaze’s trackball. In each session, participants are asked to type words as they appear Similarly, no user completes the task in more than 60 seconds with on the screen of the smartglasses. We only allow the users to perform UbiPoint on Mad Gaze and Google Glass, where several users take single character input (no word completion). more than 70 seconds with the trackball. The proposed interaction approach of UbiPoint on Mad Gaze is, 5.4.2 Experiment Result. Figure 12 shows the results of charac- on average, 1.46 times faster in completing the task compared to ter input and erroneous input by the Fast-repetitive and Freehand the baseline solely hardware interaction (second task). Regarding modes in every session of 5 minutes. Assuming that the average Eng- Google Glass, UbiPoint is, on average, 1.82 times fastercompared lish word length is 5.10 characters, the average Word-Per-Minute to the second task. Users perform slightly faster with UbiPoint on (WPM) of the Fast-repetitive mode is 3.99 WPM with a standard Google Glass than on Mad Gaze. This is due to the glasses form deviation of 0.66 WPM, while the average Word-per-minute (WPM) factor that enables the participants to have a more comfortable hand of the Freehand mode is 0.99 WPM with a standard deviation of 0.30 posture for the click operation. Moreover, the large surface of the WPM. 
Participants in the 1st session commonly have a slower typing Google Glass touchpad allows for a wider area for tapping, compared speed than the later sessions. During the 1st session (S1), participants to MadGaze’s trackball. However, the difference in completion times show difficulties with hand-eye coordination for mid-air character in- for Tasks 1 and 3 remains small compared to Task 2. The disparity put. Once they get familiar with the gestural combinations, the usual in hardware interface (thus in the general system fluidity) does not QWERTY layout on a virtual keyboard enables them to enhance significantly influence the performance of UbiPoint users. The mid- their performance quickly. Therefore, participants show improved air direct pointing technique successfully shortens the time spent on performance fluctuating around 4.42 WPM (Fast-repetitive) and 1.21 reaching the designated location. WPM (Freehand) in the remaining sessions. The reasons for the improvement in task completion time are During the experiment sessions, a total of 3053 characters and twofold. First, the majority of participants experience difficulties in 760 characters are typed by Fast-repetitive and Freehand modes controlling the cursor precisely with the mini-trackball and conse- with respective error rates of 11.20% and 13.42%, respectively. Both quently need to re-locate the cursor repeatedly. Additionally, as both Fast-repetitive and Freehand modes show decreasing error rates the tap and click operations are performed on the mini-trackball, as participants have improved their eye-hand coordination in later some participants mistakenly trigger the cursor movement while sessions. The error rates of first-half (S1–S3) and second-half (S4 pressing the button down. These observations expose the deficien- –S6) sessions for Fast-repetitive mode are 13.57% and 9.30%, while cies of such a small interaction surface. UbiPoint alleviates these error rates of first-half (S1–S3) and second-half (S4 –S6) sessions issues by offloading some workload from mini-trackball to mid-air for Freehand mode are 15.88% and 11.85%. interaction interface. UbiPoint provides an intuitive human-smartglasses interaction by directly moving the fingertip to the designated location. Human users 11https://www.gutenberg.org/ UbiPoint: Towards Non-intrusive Mid-Air Interaction for Hardware Constrained Smart Glasses MMSys’20, June 8-11, 2020, Istanbul, Turkey


10 10 Figure 13: Mid-air interaction on Wakkamole with UbiPoint. 5 5 Number of errors 0 0 1 2 3 4 5 6 1 2 3 4 5 6 Session number Session number After trying the prototypical application, the participants have suggested many potential uses of UbiPoint. For instance, the Fast- Figure 12: UbiPoint character input speed with Fast-repetitive repetitive mode can be extended into mid-air drawing, text typing on and Freehand modes (5 participants). a virtual keyboard and gestural (iconic) input based on hand-stroke drawing (circle, triangle, square), where the input is initiated when the user’s finger taps on the hardware interface and terminated once the finger tap is released. We also compare the character input performance of UbiPoint with other recently proposed selection-based methods on smart- glasses: 1) PalmType with external sensors(4.60 WPM) [13], 2) 7 CONCLUSION 1Line keyboard (4.20 WPM) [14], 3) 1D Handwriting (4.67 WPM) In this paper, we presented UbiPoint: a lightweight, non-intrusive, [14], 4) External touch controller (4.88 WPM) [9]. The fast-repetitive seamless, mid-air interaction technique for human-smartglasses in- mode demonstrates a comparable performance with the other latest teraction. UbiPoint does not rely on any additional sensor or periph- works in the literature. The average performance rates (WPM) of eral besides the embedded monocular camera. UbiPoint has been Fast-repetitive mode increase based on the Power Law of Learning, implemented on Google Glass and Mad Gaze, and can potentially fitting to a curve at R2 = 0.9953, where the associated power curve be applied to other headsets. Our evaluation shows that UbiPoint is a is y = 3.0987x0.2227. If the trend continues, character input with fast- real-time, accurate, affordable, and efficient mid-air interaction tech- repetitive mode could reach around 6 WPM and 7 WPM after a nique with a high level of user satisfaction. It achieves an average of total of 20 and 40 sessions, respectively. The average performance 20 fps on lower-grade hardware, which is higher than the minimum rates (WPM) of Freehand mode also show a curve at R2 = 0.9889, requirements of real-system interaction. It consumes an additional while the associated power curve is y = 0.5764x0.463. If the trend 14.18% of energy and occupies 19.30% of the CPU. UbiPoint proved continues, character input with Freehand mode could reach between easy to use. Users performed 1.82 times faster than the hardware 2–3 WPM. It is important to note that WPM can be further improved interface. UbiPoint relies on two interaction modes: Freehanded, with a predictive keyboard. using gestures for navigation in a WIMP system, and Fast Repetitive, providing hardware clicks for interactions requiring rapid repetition 6 APPLICATION SCENARIO of actions such as text input. Users double their speed when using the fast repetitive mode compared to the freehand mode for text We present the capabilities of system-wide implementation of Ubi- typing, and achieved comparable performance the existing literature. Point in a video12, including the gestures described Section 4.5. The In future works, we will further develop the gestures to com- video demonstrates the character input for a website URL and navi- pletely free UbiPoint from the hardware interface. Solutions such as gating the application main page with Freehand mode. The swipe multipoint detection would transform UbiPoint into a literal touch- gestures toward the edge and corner are applied in the scrolling of pad. 
Finally, other tracking methods can be implemented to improve application main page. We also create a game called ‘Wakkamole’, performance and battery usage. in which the user can intuitively use their finger to beat the char- acter, namely ‘Doctoral Mole’. Finally, we include an example of smartglasses with UbiPoint as a control aid for presentation in public 8 ACKNOWLEDGMENT areas. The research is supported by Academy of Finland 6Genesis Flagship project [grant number: 318927] and 5GEAR project [grant number: 12Demonstration video: https://drive.google.com/file/d/14gCMTi-5LQedzDPz- 319669], and Project 16214817 from the Research Grants Council efjsoNCQq-_U6Xi/view of Hong Kong. MMSys’20, June 8-11, 2020, Istanbul, Turkey Lik Hang Lee, Tristan Braud, Farshid Hassani Bijarbooneh, and Pan Hui

REFERENCES [22] Eleanor Jones, Jason Alexander, Andreas Andreou, Pourang Irani, and Sriram [1] M. Abramowitz and I.A. Stegun. 1972. Handbook of mathematical functions: with Subramanian. 2010. GesText: Accelerometer-based Gestural Text-entry Systems. formulas, graphs, and mathematical tables. Courier Dover Publications (55). In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems (CHI ’10). [2] Gilles Bailly, Jörg Müller, Michael Rohs, Daniel Wigdor, and Sven Kratz. 2012. ACM, 2173–2182. ShoeSense: A New Perspective on Gestural Interaction and Wearable Applications. [23] S. Lawrence and W. Brett. 2013. Comparison of Gestural, Touch, and Mouse In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems (CHI ’12). Interaction with Fitts’ Law. In Proc. of the 25th Australian Computer-Human ACM, 1239–1248. Interaction Conf.: Augmentation, Application, Innovation, Collaboration (OzCHI [3] W. Buxton and B. Myers. 1986. A Study in Two-handed Input. In Proc. of the ’13). ACM, 119–122. SIGCHI Conf. on Human Factors in Computing Systems (Boston, Massachusetts, [24] Lik Hang Lee, Tristan Braud, Farshid Hassani Bijarbooneh, and Pan Hui. 2019. USA) (CHI ’86). TiPoint: Detecting Fingertip for Mid-Air Interaction on Computational Resource [4] Hang Zhi Cheng. 2017. http://www.thetechrevolutionist.com/2016/01/review-of- Constrained Smartglasses. In Proceedings of the 23rd International Symposium epson-moverio-bt-200-vrar.html. [Online; accessed 10-February-2017]. on Wearable Computers (London, United Kingdom) (ISWC ’19). Association for [5] Epson. 2017. Epson Moverio BT-200. http://www.epson.com/cgi-bin/Store/jsp/ Computing Machinery, New York, NY, USA, 118–122. https://doi.org/10.1145/ Landing/moverio-augmented-reality-smart-glasses.do?UseCookie=yes 3341163.3347723 [6] Goel et al. 2015. Tongue-in-Cheek: Using Wireless Signals to Enable Non- [25] Lik Hang Lee, Tristan Braud, Kit Yung Lam, Yui Pan Yau, and Pan Hui. 2020. Intrusive and Flexible Facial Gestures Detection. In Proc. of the 33rd Annual ACM From seen to unseen: Designing keyboard-less interfaces for text entry on the con- Conf. on Human Factors in Computing Systems (CHI ’15). strained screen real estate of Augmented Reality headsets. Pervasive and Mobile [7] Georgiana Simion et al. 2016. Fingertip-based real time tracking and gesture Computing 64 (2020), 101148. https://doi.org/10.1016/j.pmcj.2020.101148 [26] Lik Hang Lee and Pan Hui. 2018. Interaction Methods for Smart Glasses: A recognition for natural user interfaces. 13 (01 2016), 189–204. Survey. IEEE Access 6 (2018), 28712–32. [8] Istance et al. 2008. Snap Clutch, a Moded Approach to Solving the Midas Touch [27] Lik Hang Lee, Kit Yung Lam, Tong Li, Tristan Braud, Xiang Su, and Pan Hui. Problem. 38 (2008). 2019. Quadmetric Optimized Thumb-to-Finger Interaction for Force Assisted [9] Jacek et al. 2016. Performance Analysis of Interaction between Smart Glasses One-Handed Text Entry on Mobile Headsets. Proc. ACM Interact. Mob. Wearable and Smart Objects Using Image-Based Object Identification. Int. Journal of Ubiquitous Technol. 3, 3, Article 94 (Sept. 2019), 27 pages. https://doi.org/10. Distributed Sensor Networks 12, 3 (2016), 10–27. 1145/3351252 [10] P. Hamette et al. 2002. FingerMouse: A Wearable Hand Tracking System. (01 [28] L. H. Lee, K. Yung Lam, Y. P. Yau, T. Braud, and P. Hui. 2019. HIBEY: Hide 2002). the Keyboard in Augmented Reality. In 2019 IEEE International Conference on [11] R. Gang et al. 2013. 3D selection with freehand gesture. 
Computers & Graphics Pervasive Computing and Communications (PerCom). 1–10. https://doi.org/10. 37, 3 (2013), 101–120. 1109/PERCOM.2019.8767420 [12] Schuchert et al. 2012. Sensing Visual Attention Using an Interactive Bidirectional [29] L. H. Lee, Y. Zhu, Y. P. Yau, T. Braud, X. Su, and P Hui. 2020. One-thumb Text HMD. In Proc. of the 4th Workshop on Eye Gaze in Intelligent Human Machine Acquisition on Force-assisted Miniature Interfaces for Mobile Headsets. In 2020 Interaction (Gaze-In ’12). ACM, Article 16, 3 pages. IEEE International Conference on Pervasive Computing and Communications [13] Wang et al. 2015. PalmType: Using Palms As Keyboards for Smart Glasses. In (PerCom). 1–10. Proc. of the 17th Int. Conf. on Human-Computer Interaction with Mobile Devices [30] T. Lee and T. Hollerer. 2007. Handy AR: Markerless Inspection of Augmented Re- and Services (MobileHCI ’15). ACM, 153–160. ality Objects Using Fingertip Tracking. In 2007 11th IEEE Int. Sym. on Wearable [14] Yu et al. 2016. One-Dimensional Handwriting: Inputting Letters and Words on Computers. 83–90. Smart Glasses. In Proc. of the 2016 CHI Conf. on Human Factors in Computing [31] I Scott MacKenzie. 1992. Fitts’ law as a research and design tool in human- Systems (San Jose, California, USA) (CHI ’16). ACM, 71–82. computer interaction. Human-computer interaction 7, 1 (1992), 91–139. [15] Yichao Huang et al. 2016. A Pointing Gesture Based Egocentric Interaction [32] Microsoft. 2017. Microsoft Hololens. https://www.microsoft.com/microsoft- System: Dataset, Approach and Application. (2016). hololens/en-us [16] Y.-T. Hsieh et al. 2016. Designing a Willing-to-Use-in-Public Hand Gestural [33] K. Sobottka and I. Pitas. 1998. A novel method for automatic face segmentation, Interaction Technique for Smart Glasses. In Proc. of the 2016 CHI Conf. on facial feature extraction and tracking. Signal Processing: Image Communication Human Factors in Computing Systems (CHI ’16). 12, 3 (1998), 263 – 281. [17] Zheng et al. 2015. Eye- for Machine Maintenance: Effects [34] T. E. Starner. 2002. Attention, memory, and wearable interfaces. IEEE Pervasive of Display Position and Hands-free Operation. In Proc. of the 33rd Annual ACM Computing 1, 4 (Oct 2002). Conf. on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, [35] Ying-Chao Tung, Chun-Yen Hsu, Han-Yu Wang, Silvia Chyou, Jhe-Wei Lin, Pei- USA, 2125–2134. Jung Wu, Andries Valstar, and Mike Y. Chen. 2015. User-Defined Game Input for [18] Alphabet Inc. (Google). 2017. Google Glass. https://developers.google.com/glass/ Smart Glasses in Public Space. In Proc. of the 33rd Annual ACM Conf. on Human [19] Faizan Haque, Mathieu Nancel, and Daniel Vogel. 2015. Myopoint: Pointing and Factors in Computing Systems (CHI ’15). ACM, 3327–3336. Clicking Using Forearm Mounted Electromyography and Inertial Motion Sensors. [36] F. Vernier and L. Nigay. 2001. A Framework for the Combination and Characteri- In Proc. of the 33rd Annual ACM Conf. on Human Factors in Computing Systems zation of Output Modalities. In Proc. of the 7th Int. Conf. on Design, Specification, (CHI ’15). ACM, 3653–3656. and Verification of Interactive Systems (Limerick, Ireland) (DSV-IS’00). Springer- [20] Z. Huang, W.. Li, and P. Hui. 2015. Ubii: Towards Seamless Interaction Between Verlag, 35–50. Digital and Physical Worlds. In Proc. of the 23rd ACM Int. Conf. on Multimedia [37] Shanhe Yi, Zhengrui Qin, E Novak, Y Yin, and Q Li. 2016. Glassgesture: Ex- (MM ’15). ploring head gesture interface of smart glasses. INFOCOM, 2016 Proc. 
IEEE [21] Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S. Horn, Orit (2016). Shaer, Erin Treacy Solovey, and Jamie Zigelbaum. 2008. Reality-based Interaction: [38] Sang Ho Yoon, Ke Huo, Vinh P. Nguyen, and Karthik Ramani. 2015. TIMMi: A Framework for post-WIMP Interfaces. In Proc. of the SIGCHI Conf. on Human Finger-worn Textile Input Device with Multimodal Sensing in Mobile Interaction. Factors in Computing Systems (CHI ’08). ACM, New York, NY, USA, 201–210. In Proc. of the Ninth Int. Conf. on Tangible, Embedded, and Embodied Interaction (TEI ’15). ACM, 269–272.