THE UNIVERSITY OF NEW SOUTH WALES
Towards a privacy-aware mixed reality present
Jaybie A. de Guzman (5138924) [email protected]
A thesis in fulfillment of the requirements for the degree of Doctor of Philosophy
Supervisors: Prof. Aruna Seneviratne Dr. Kanchana Thilakarathna
School of Electrical Engineering and Telecommunications
Faculty of Engineering
December 2020

Thesis/Dissertation Sheet
Surname/Family Name: De Guzman
Given Name/s: Jaybie
Abbreviation for degree as given in the University calendar: PhD
Faculty: Engineering
School: Electrical Engineering and Telecommunications
Thesis Title: Towards a privacy-aware mixed reality present
Abstract (350 words maximum):
Mixed reality (MR) technology development is now gaining momentum due to advances in computer vision, sensor fusion, and realistic display technologies. However, concerns about potential security and privacy risks are continuously being pointed out: for example, on how sensitive information can be captured by sensors and accessed by untrusted third-party applications, or how the reliability of the augmented virtual outputs can be ensured. With most of the earlier research and development focused on delivering the promise of MR, these privacy and security implications are yet to be thoroughly investigated; thus, in an extensive literature review, we present an exposition of the latest security and privacy work on MR (as well as other MR-related technology), and group them into five data-centric categories. The exposition shows that most of these concerns and their accompanying (proposed) protection approaches were primarily focused on traditional information channels or spaces, i.e. images, video, audio and so on, and not on actual MR-specific information channels, such as the 3D spatial data that the latest MR devices and platforms are now utilizing. Therefore, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, design a privacy-preserving mechanism for these types of data. Firstly, we demonstrate the privacy leakage from spatial data utilized in MR. Secondly, we present a heuristic or empirical measure that can signify the spatial privacy risk a captured space has. Thirdly, we propose to leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility. Lastly, we demonstrate a visual access control mechanism as a data-flow targeted measure which can be utilised in conjunction with other data-centric protection measures.
Declaration relating to disposition of project thesis/dissertation
I hereby grant to the University of New South Wales or its agents a non-exclusive licence to archive and to make available (including to members of the public) my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known. I acknowledge that I retain all intellectual property rights which subsist in my thesis or dissertation, such as copyright and patent rights, subject to applicable law. I also retain the right to use all or part of my thesis or dissertation in future works (such as articles or books).
Signature                         Date: 13-Jan-2021

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years can be made when submitting the final copies of your thesis to the UNSW Library. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed
Date: 13-Jan-2021

INCLUSION OF PUBLICATIONS STATEMENT
UNSW is supportive of candidates publishing their research results during their candidature as detailed in the UNSW Thesis Examination Procedure.
Publications can be used in their thesis in lieu of a Chapter if:
• The candidate contributed greater than 50% of the content in the publication and is the “primary author”, i.e. the candidate was responsible primarily for the planning, execution and preparation of the work for publication
• The candidate has approval to include the publication in their thesis in lieu of a Chapter from their supervisor and Postgraduate Coordinator.
• The publication is not subject to any obligations or contractual agreements with a third party that would constrain its inclusion in the thesis
Please indicate whether this thesis contains published material or not:
This thesis contains no publications, either published or submitted for publication ☐ (if this box is checked, you may delete all the material on page 2)
Some of the work described in this thesis has been published and it has been documented in the relevant Chapters with acknowledgement ☒ (if this box is checked, you may delete all the material on page 2)
This thesis has publications (either published or submitted for publication) ☐ incorporated into it in lieu of a chapter and the details are presented below
CANDIDATE’S DECLARATION
I declare that:
• I have complied with the UNSW Thesis Examination Procedure
• where I have used a publication in lieu of a Chapter, the listed publication(s) below meet(s) the requirements to be included in the thesis.

Candidate’s Name: Jaybie A. de Guzman    Date (dd/mm/yy): 13/01/2021
COPYRIGHT STATEMENT
‘I hereby grant the University of New South Wales or its agents a non-exclusive licence to archive and to make available (including to members of the public) my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known. I acknowledge that I retain all intellectual property rights which subsist in my thesis or dissertation, such as copyright and patent rights, subject to applicable law. I also retain the right to use all or part of my thesis or dissertation in future works (such as articles or books).’
‘For any substantial portions of copyright material used in this thesis, written permission for use has been obtained, or the copyright material is removed from the final public version of the thesis.’
Signed
Date: 13-Jan-2021
AUTHENTICITY STATEMENT
‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis.’
Signed
Date: 13-Jan-2021
Abstract
Mixed reality (MR) technology development is now gaining momentum due to advances in computer vision, sensor fusion, and realistic display technologies. However, concerns about potential security and privacy risks are continuously being pointed out: for example, on how sensitive information can be captured by sensors and accessed by untrusted third-party applications, or how the reliability of the augmented virtual outputs can be ensured. With most of the earlier research and development focused on delivering the promise of MR, these privacy and security implications are yet to be thoroughly investigated; thus, in an extensive literature review, we present an exposition of the latest security and privacy work on MR (as well as other MR-related technology), and group them into five data-centric categories. The exposition shows that most of these concerns and their accompanying (proposed) protection approaches were primarily focused on traditional information channels or spaces, i.e. images, video, audio and so on, and not on actual MR-specific information channels, such as the 3D spatial data that the latest MR devices and platforms are now utilizing. Therefore, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, design a privacy-preserving mechanism for these types of data. Firstly, we demonstrate the privacy leakage from spatial data utilized in MR. Secondly, we present a heuristic or empirical measure that can signify the spatial privacy risk a captured space has. Thirdly, we propose to leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility. Lastly, we demonstrate a visual access control mechanism as a data-flow targeted measure which can be utilised in conjunction with other data-centric protection measures.

Acknowledgements
I would like to express my sincerest gratitude to the University of New South Wales and its people for hosting and supporting my postgraduate journey. I would also like to express my gratitude to CSIRO’s Data61 for being my home for the most part of this journey, and, most especially, to the people in Data61 – my postgraduate research cohorts, the people in the Information Security and Privacy group, and the other researchers, scientists, and staff who made my stay in Data61 a delightful and unforgettable experience. I am likewise extremely grateful to the University of the Philippines Diliman, the Engineering Research and Development for Technology program of the Philippine Government’s Department of Science and Technology, and, most importantly, to the Electrical and Electronics Engineering Institute of UP Diliman for giving me the opportunity to pursue, and for generously supporting, my postgraduate studies. Most of all, I would like to express my deepest gratitude to my supervisors, Aruna Seneviratne and Kanchana Thilakarathna, for their expert guidance. I would also like to thank my dearest friends in Sydney – travel buddies, running buddies, climbing buddies, lunch buddies, dinner buddies, basking-under-the-sun buddies, coffee buddies, and beer buddies – you’ve made my whole three-and-a-half-year stay in Australia very colorful and exciting. Lastly, I would like to thank my family – my mother Amabel, and my brothers, Kiel, Kirby, and Angelo – and my partner, R.A., for being very supportive in all my endeavors in life, be they academic or outdoors.
Contents
Acknowledgements i
Table of Contents ii
List of Figures v
List of Tables viii
List of Publications ix
1 Introduction 1
   1.1 Motivation 3
   1.2 Contributions of the research 5
   1.3 List of Publications 5
   1.4 Manuscript Overview 7
2 Literature Review 8
   2.1 General Security and Privacy Requirements for MR 9
   2.2 Categorizing the Threats and Approaches 10
   2.3 Input Protection 13
      2.3.1 Passive Inputs: Targeted and Non-intended Latent Data 13
      2.3.2 Gestures and other Active User Inputs 17
   2.4 Data Protection 19
      2.4.1 Protected Data Collection and Aggregation 20
      2.4.2 Protected Data Processing 20
      2.4.3 Protecting Data Storage 23
   2.5 Output Protection 24
      2.5.1 Output Reliability and User Safety 25
      2.5.2 Protecting Output Displays 26
   2.6 Protecting User Interactions 27
      2.6.1 Protecting Collaborative Interactions 29
      2.6.2 Protecting Sharing Initialization 31
   2.7 Device Protection 32
      2.7.1 Protecting Device Access 32
      2.7.2 Protecting Physical Interfaces 33
   2.8 Summary of Security and Privacy Approaches in MR 35
3 Spatial Privacy Problem in Mixed Reality 41
   3.1 Spatial Privacy Framework 43
   3.2 3D Spatial Data 43
   3.3 Adversary Model 44
   3.4 Privacy Metrics 46
   3.5 Mixed Reality Functionality 47
4 Spatial Privacy Leakage from 3D Mixed Reality Data 49
   4.1 3D data from Mixed Reality devices 50
   4.2 Spatial Inference Attack 53
      4.2.1 Adversarial inference 53
      4.2.2 3D recognition methods 53
      4.2.3 Inference using 3D descriptors: NN-matcher 55
      4.2.4 Inference using DNN: pointnetvlad 59
   4.3 Information Reduction Methods 60
   4.4 Evaluation Setup 61
   4.5 Spatial Inference Success 63
      4.5.1 Validating inference success over partial releases 63
      4.5.2 Spatial privacy through surface-to-plane generalization 65
      4.5.3 Utility in terms of QoS 66
   4.6 Detecting Spatial Inference Risk 67
      4.6.1 Computing the geometric shape functions 67
      4.6.2 Computing Spatial Complexity 71
      4.6.3 Spatial Complexity vs Inference Success 71
      4.6.4 Local Complexity vs Inference Success 73
5 Conservative Plane Releasing for Spatial Privacy Protection 79
   5.1 Conservative Plane Releasing 80
   5.2 Extending the Attack Scenario 80
   5.3 Inference Success with Successive Releasing 81
   5.4 Spatial Privacy with Conservative Releasing 83
      5.4.1 Utility with conservative releasing 86
      5.4.2 Utility vs Privacy 88
      5.4.3 Protection Properties of Conservative Releasing 89
6 SafeMR: Object-level Abstraction 90
   6.1 Visual Processing and Threat Model 91
   6.2 SafeMR: Object-level Abstraction 92
      6.2.1 System Architecture 92
      6.2.2 System Properties and Functionalities 93
      6.2.3 Implementation 93
   6.3 Performance Evaluation 94
      6.3.1 Validating the Vision Algorithms 95
      6.3.2 System Evaluation Setup 95
      6.3.3 Evaluation Metrics 97
   6.4 SafeMR Performance 98
      6.4.1 Detection Utility & Secrecy 98
      6.4.2 Execution Time Performance 100
      6.4.3 Energy Consumption 101
   6.5 Provided Utility by SafeMR 101
      6.5.1 Resource Sharing Benefit 102
   6.6 Protection Properties of SafeMR 102
7 Conclusions 104
Bibliography 108
A Definitions of the General Security and Privacy Properties 124
B Preliminary work on 3D Description and Inference 127
   B.1 Preliminary 3D privacy problem 127
   B.2 Describing the 3D space 128
      B.2.1 Self-similarity-based 3D descriptors 129
      B.2.2 Spin Image 3D descriptors 130
   B.3 Inferring the 3D space 131
      B.3.1 Bayesian Inference model using the point cloud 131
      B.3.2 Inference using the rotation-invariant descriptors 131
      B.3.3 Validating the inference models 132
   B.4 Memory compactness of descriptors and inference models 134
C Plane Generalization 136

List of Figures
1.1 A visualised artistic imagining of a “hyper-realistic” world. Screenshot from Keiichi Matsuda’s YouTube video: https://youtu.be/YJg02ivYzSs 1
1.2 Mixed Reality pipeline: (Top) the immersive experience as viewed by the user; and (Bottom) the main processing pipeline of (1) detection, (2) transformation, and (3) rendering. 2
1.3 Overhead view (bottom) of the HoloLens-captured 3D point cloud of an example environment; the 2D-RGB view (top-left) of a sample region; the 3D surface plot (top-right) of the 3D point cloud of the sample region. 3
2.1 (a) Mixed reality environment and (b) the data flow diagram 11
2.2 A data-centric categorization of the various security and privacy work or approaches on mixed reality and related technologies. 12
2.3 A generic block diagram that inserts an intermediary protection layer between the applications and device resources. (Data flows to and from third-party applications are now limited or less privileged, as represented by the broken arrows.) 13
2.4 Example strategies for input protection: 1) information reduction or partial sanitization, e.g. from RGB facial information to facial outline only; 2) complete sanitization or blocking; or 3) skeletal information instead of raw hand video capture. (The broken arrow indicates less privileged information flow.) 14
2.5 Generic block diagrams of two example data protection approaches: 1) cryptographic technique using secure multi-party computation, where two or more parties exchange secrets (1.1 and 1.3) to extract combined knowledge (1.2 and 1.4) without the need to divulge or decrypt each other’s data share; and 2) personal data stores with “trusted” applets. 21
2.6 Shared Spaces 28
2.7 Sample interface and display protection strategies: 1) inserting a polarizer to prevent or block display leakage; and 2) visual cryptography, e.g. using secret augmentations (2.2) through decryption (2.1) of encrypted public interfaces (2.0). All elements to the left of the optical display element are considered vulnerable to external inference or capture. 34
3.1 MR pipeline (center) shows an MR function G that transforms the detected spatial map S_i to the rendered output Y; an adversarial pipeline (bottom) is shown in parallel, with an attacker having access to (1) historically collected spaces J to infer information (i.e. a hypothesis H) about the (2) current user space S; while an intermediary privacy protection mechanism M (top) is inserted that transforms the raw MR data S_i to a privacy-preserving version S̃_i. 42
3.2 An oriented point with position vector p̂ = {x, y, z} and normal vector n̂ = {n_x, n_y, n_z}. A group of these oriented points constitutes a 3D point cloud. Mesh [triangle] information can also be provided to indicate how these points are put together to form surfaces. 44
4.1 HoloLens-captured 3D point clouds of the 7 collected environments (left); a 3D surface of a sample space (bottom-right), and its 2D-RGB view (top-right). 51
4.2 Visualized example spatial data captured by HoloLens and ARCore. 52
4.3 Adversarial Process: Step 1 involves the building of the reference database from historical maps, and the training of pointnetvlad’s deep neural network; Step 2 is inference, where the reference database is queried to match an unknown point cloud S to get a hypothesis H about S’s identity or label I. 54
4.4 Example (a) partial releases with (b) generalization 60
4.5 Overall inference success in terms of F1 score 63
4.6 Heatmap of per-space inference performance in terms of F1 score, with annotated values at r = {1.0, 2.0, 3.0} 64
4.7 One-time partially released RANSAC-generalized spaces vs varying radii: (top) inter-space and (bottom) intra-space privacy 65
4.8 QoS Q vs varying radius r 67
4.9 Distribution of the per-space similarity measures – d, v, and ξ – of our gathered spaces from HoloLens (Holo-Raw and Holo-Gen) and ARCore. (For the distributions of d and v, we plot the moving average with width 3 for a smoother histogram.) 68
4.10 Heatmap of per-space spatial complexity in terms of our chosen metrics: d, v, and ξ. 70
4.11 (a-b) The a priori distribution of the complexity value d given inference success C, i.e. P(d|C), and (c-d) the a posteriori likelihood of inference success C given local d, i.e. P(C|d). 74
4.11 (Continuation) (e-f) The a priori distribution of the complexity value v given inference success C, i.e. P(v|C), and (g-h) the a posteriori likelihood of inference success C given local v, i.e. P(C|v). 75
4.11 (Continuation) (i-j) The a priori distribution of the complexity value ξ given inference success C, i.e. P(ξ|C), and (k-l) the a posteriori likelihood of inference success C given local ξ, i.e. P(C|ξ). 76
5.1 Example of conservative plane releasing. 80
5.2 Successively released generalized partial spaces: (top) inter-space and (bottom) intra-space privacy 82
5.3 Average INTER-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker) 84
5.4 Average INTRA-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker) 85
5.5 Average QoS Q of conservatively released planes over successive releasing 87
5.6 Intersection map of Q ≥ 0.2 and Π ≥ 0.5. 88
6.1 Diminishing information: a) the raw visual capture; b) the target is cropped out but still with complete visual information of the target; c) only the bounding box of the target is exposed; d) only the centroid of the target is exposed; and e) only the binary presence, whether the target is within view or not, is exposed. 91
6.2 Proposed visual processing architecture with object-level abstraction SafeMR inserted as an intermediary layer between the core APIs and the third-party applications. 92
6.3 SafeMR demo showing different privilege levels 94
6.4 Comparing the employed detection algorithms in terms of per-frame processing and number of feature matches: OpenCV-SIFT, OpenCV-ORB, and TensorFlow Object Detection API. TF-OD does not expose the number of feature matches. 95
6.5 Varying abstraction mode: without (left) and with (right) SafeMR. The privileged views show actual detected objects while the larger views show which objects (or their information) are provided to applications. 96
6.6 CDF of the detection hits and secret hits 99
6.7 Average overall frame processing time in seconds (± standard deviation). (Processed frame size is 500x500) 100
6.8 Comparing performance based on input frame size (number of tasks are indicated at the bottom of the bars) 100
7.1 Overall system diagram showing how both SafeMR and data manipulations can be integrated within an intermediary layer of protection. 104
B.1 3D coordinate systems 129
B.2 Inference performance heatmaps of the different 3D description approaches 133
B.3 Performance of the different 3D description/inference for different resolutions 134
B.4 Used memory by inference models and descriptors extracted from different point cloud resolutions 135

List of Tables
2.1 Combined security and privacy properties and their corresponding threats 9
2.2 Summary of MR approaches that have been discussed, and which security and privacy properties they provide to which data flow element: data flow, process, storage, and/or entity. The entity can be the data itself, the user as the originator of the data, or the adversary (say, identifiability of an adversary as a security provision) 36
2.2 (Continuation) 37
2.2 (Continuation) 38
2.2 (Continuation) 39
3.1 Notation Map ...... 43
4.1 Correlation coefficient of the three metrics and overall F1 score (with varying metric parameters (neighbors or pairs) and query space size r) 72
6.1 Average Detection Hit Rate (± stdev) (Processed frame size is 500x500) 98
7.1 Our two proposed approaches, and which security and privacy properties they provide to which data flow element: data flow, process, storage, and/or entity. (As presented in Table 2.2) 105
List of Publications
[de Guzman et al., 2019d] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Security and privacy approaches in mixed reality: A literature survey.” ACM Computing Surveys (CSUR) 52.6 (2019): 110.
[de Guzman et al., 2019b] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “A First Look into Privacy Leakage in 3D Mixed Reality Data.” European Symposium on Research in Computer Security. Springer, Cham, 2019.
[de Guzman et al., 2019c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “SafeMR: Privacy-aware visual information protection for mobile mixed reality.” IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.
[de Guzman et al., 2020c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Spatial Privacy Leakage in 3D Mixed Reality Data.” Cyber Defence Next Generation Technology and Science Conference 2020 (CDNG 2020). CSIRO, 2020. (Accepted)
Chapter 1
Introduction
Figure 1.1: A visualised artistic imagining of a “hyper-realistic” world. Screenshot from Keiichi Matsuda’s YouTube video: https://youtu.be/YJg02ivYzSs
The future with mixed reality (MR) is now. In recent years, there has been an uptake in the release of MR applications, such as the gaming application Pokémon Go in 2016, as well as dedicated head-mounted display (HMD) devices, such as the Microsoft HoloLens (with pre-production units also released in 2016) and the Magic Leap (with the Magic Leap One revealed in 2017). Although consensus varies on the definition of MR, and on whether it is “merged” rather than “mixed” reality, we refer to MR as the combination of aspects of augmented reality (AR) and virtual reality (VR) that delivers rich services and immersive experiences, and allows interaction of real objects with synthetic virtual objects and vice versa.
Figure 1.2: Mixed Reality pipeline: (Top) the immersive experience as viewed by the user; and (Bottom) the main processing pipeline of (1) detection, (2) transformation, and (3) rendering.

By combining the synthetic presence offered by VR and the extension of the real world offered by AR, MR enables a virtually endless suite of applications not offered by current AR and VR platforms, devices, and applications. Figure 1.1 shows a possible MR near-future “where physical and virtual realities have merged, and the [environment] is saturated in media” [Matsuda, 2016]. These MR experiences are brought to reality by recent developments, primarily in sensor fusion, computer vision (particularly in object sensing and tracking), human-computer interaction (HCI), and realistic display technologies (such as projections and holograms).

Figure 1.2 shows a generic pipeline for MR processing. These MR platforms capture the environment primarily through vision sensors, such as cameras with depth sensors. The captured visual or spatial information is processed to construct a spatial mapping, or digital representation, of the environment. This mapping captures the structural features of the environment and, of course, the objects in it. This allows the machine to understand the environment and detect the information-of-interest, which can be a structural feature, a visual target, or even a user gesture. Then, the MR application or function extracts the necessary information, such as surface orientation and location, that informs where a virtual object can potentially be augmented – effectively transforming the detected information into the form necessary for delivering the output. Finally, the intended output is rendered onto the scene, as viewed through a display, to make it seem like it inhabits the real world.
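To make these three stages concrete, here is a minimal, illustrative Python sketch that walks a (canned) sensor frame through detection, transformation, and rendering. All names, types, and values are hypothetical stand-ins for this discussion, not the API of any actual MR platform.

```python
# Minimal sketch of the generic MR pipeline: (1) detection,
# (2) transformation, (3) rendering. All names are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Surface:
    position: Vec3  # where the detected surface sits in the user space
    normal: Vec3    # its orientation

def detect(sensor_frames: list) -> List[Surface]:
    # (1) Detection: a real platform runs SLAM/depth fusion here to
    # build a spatial map; we simply return a canned horizontal surface.
    return [Surface(position=(0.0, 0.7, 1.2), normal=(0.0, 1.0, 0.0))]

def transform(surfaces: List[Surface]) -> dict:
    # (2) Transformation: keep only what the MR function needs, e.g.
    # an upward-facing surface on which to anchor a virtual object.
    anchor = next(s for s in surfaces if s.normal[1] > 0.9)
    return {"object": "virtual pet", "pose": anchor.position}

def render(output: dict) -> None:
    # (3) Rendering: draw the output so it appears to inhabit the scene.
    print(f"Rendering {output['object']} at {output['pose']}")

render(transform(detect(sensor_frames=[])))
```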
Figure 1.3: Overhead view (bottom) of the HoloLens-captured 3D point cloud of an example environment; the 2D-RGB view (top-left) of a sample region; the 3D surface plot (top-right) of the 3D point cloud of the sample region.
Moreover, these spatial maps can be utilized in conjunction with image, video, audio, and other sensor data to create a fully immersive MR experience. As a result, MR allows users to interact with machines and each other in a totally different manner: for example, using gestures in the air instead of swiping on screens or tapping on keys. Our interaction outputs, too, will no longer be confined to a screen. Instead, outputs will now be mixed with our real-world experience, and soon we may not be able to tell what is real and what is synthetic.
1.1 Motivation
Given these capabilities, MR users face even greater risks as richer information can be gathered using a wide variety of sensors. For example, the Microsoft HoloLens has a multitude of visual sensors: four (4) cameras for environment understanding, a separate depth sensor, two (2) infrared cameras for eye tracking, and another camera for view capture. Figure 1.3 shows an example spatial map captured using a HoloLens. These spatial maps are usually stored as an unordered list of 3D points and may sometimes be accompanied by triangle mesh information to represent surfaces. Furthermore, these maps are arguably more lightweight than video despite containing accurate representations of user spaces.

Moreover, despite these capabilities being seemingly necessary to deliver the promise of MR, not all MR functionalities or services require extremely rich information. Privacy concerns are further exacerbated by recent advances in machine learning, such as near real-time visual object detection, which enables inference beyond the intended functionality [Oh et al., 2016]. Furthermore, once raw visual data have been made available to applications and services, users may no longer have control over how these data are further utilized [Roesner et al., 2014a]. For example, visual sensors in the MR device can subtly capture images and video without the knowledge of those around the user.1 It has been demonstrated how easy it is to use a simple facial recognition algorithm to match live-captured photos with publicly available photos on-line (from on-line social networks such as Facebook) and extract personal information such as names and social security numbers [Acquisti, 2011].

Various endeavors have highlighted these risks over captured visual data and, likewise, various protection mechanisms have been proposed. However, it is not only visual data that pose risks, but also the spatial maps that provide the necessary environment understanding to MR platforms. This capability poses further unforeseen privacy risks for users. Once these captured 3D maps have been revealed to untrusted parties, potentially sensitive spatial information about the users’ spaces is disclosed. Adversaries can vary from a benign background service that delivers unsolicited advertisements based on the objects detected in the user’s surroundings, to malevolent burglars who are able to map the user’s house and, perhaps, the locations and dimensions of specific objects in it, based on the released 3D data. Furthermore, turning off GPS tracking for location privacy may no longer be sufficient once the user starts using MR applications that can expose their location through the 3D and visual data that are exposed. (For example, Google unveiled their Visual Positioning Service (VPS), which uses visual and 3D data to locate users – an offshoot of Project Tango – during their 2018 I/O keynote event.)

Therefore, we introduce spatial privacy, which pertains to protecting and ensuring the privacy of the user spatial information captured by MR devices and other
related platforms that utilize spatial data. Our work primarily focuses on ensuring spatial privacy by, first, quantifying and exposing the spatial privacy leakage and, then, presenting countermeasures.

1This violates bystander privacy – the unauthorized capture of information about other users or ‘bystanders’.
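For concreteness, the following short sketch (using NumPy; all values invented) shows the kind of spatial-map representation described above – an unordered list of oriented 3D points, optionally accompanied by triangle mesh indices – and gives a rough sense of why such maps are lightweight compared to video.

```python
# Illustrative HoloLens-style spatial map: an unordered list of 3D
# points with per-point normals, plus triangle indices describing how
# the points form surfaces. Values are made up; real maps can hold
# tens of thousands of such points.
import numpy as np

points = np.array([                # N x 3: {x, y, z} positions (metres)
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
normals = np.array([               # N x 3: {nx, ny, nz} surface normals
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
triangles = np.array([[0, 1, 2]])  # M x 3: indices into `points`

# Rough storage cost: position + normal as float64 is 48 bytes per
# oriented point, so even ~100k points fit in a few megabytes.
print(points.itemsize * 3 * 2, "bytes per oriented point")
```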
1.2 Contributions of the research
Given the risks we have mentioned, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, to design privacy-preserving mechanisms for MR data. To this end, we pose the following major contributions of this work:
1. As a primary contribution, we present a comprehensive review of the various works and literature on security and privacy in MR, and identify the gaps that still need to be addressed: specifically, none of the available approaches in the literature addresses the spatial privacy issues of MR.
2. Then, we demonstrate the privacy leakage from spatial data captured and uti- lized in MR:
(a) specifically, we present an empirical measure that can signify the spatial privacy risk a captured space has – this leads the way towards early assessment of spatial privacy risk for user spaces; and (b) demonstrate how the risks persist even after implementing spatial generalizations (as a rudimentary form of privacy preservation by information reduction).
3. Lastly, we present two MR-targeted protection measures:
(a) we leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility; and (b) we present a visual access control mechanism in the form of object-level abstraction as a data flow-centric measure. We have demonstrated the practical feasibility of this abstraction on real devices.
1.3 List of Publications
As a result of this work, we have listed the following publications in the preliminary pages, and we list them again below:
[de Guzman et al., 2019d] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Security and privacy approaches in mixed reality: A literature survey.” ACM Computing Surveys (CSUR) 52.6 (2019): 110.
[de Guzman et al., 2019b] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “A First Look into Privacy Leakage in 3D Mixed Reality Data.” European Symposium on Research in Computer Security. Springer, Cham, 2019.
[de Guzman et al., 2019c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “SafeMR: Privacy-aware visual information protection for mobile mixed reality.” IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.
[de Guzman et al., 2020c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Spatial Privacy Leakage in 3D Mixed Reality Data.” Cyber Defence Next Generation Technology and Science Conference 2020 (CDNG 2020). CSIRO, 2020. (Accepted)
We also have the following works under review:
Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Analysing Spatial Inference Risk Over Mixed Reality Spatial Data.” Submitted to Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/Ubicomp).
Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Conservative Plane Releasing for Spatial Privacy Protection in Mixed Reality.” To be submitted to IEEE Transactions on Mobile Computing (TMC).
We also presented the following demonstrations and/or posters:
“Demo: Privacy-aware visual information protection for mobile mixed reality.” Presented as a demonstration at the IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.
“Utilizing ‘simple’ object-level abstraction for visual information access control in mixed reality”, presented as a poster in the 2018 UNSW Postgraduate Research Symposium.
1.4 Manuscript Overview
We have organized the subsequent chapters to focus on each of the contributions specified in Section 1.2.
• In Chapter 2, we present the exposition of security and privacy related work on MR and related technologies. The majority of this chapter has been published in ACM Computing Surveys [de Guzman et al., 2019d].
• Then, we formalize the spatial privacy problem in Chapter 3 which is based on the framework initially described in [de Guzman et al., 2019b] and expanded in [de Guzman et al., 2020b].
• In Chapter 4, we proceed with the demonstration of the privacy leakage (as demonstrated in [de Guzman et al., 2019b]), and the presentation of an empirical measure of spatial privacy risk (as presented in [de Guzman et al., 2020a]).
• In Chapter 5, we investigate and evaluate the viability of conservative plane releasing as a data-centric protection measure (as proposed in [de Guzman et al., 2020b]).
• And, in Chapter 6, we present a visual information control mechanism as a data flow-centric protection measure (as demonstrated in [de Guzman et al., 2019c, de Guzman et al., 2019a]).
• Lastly, we conclude this work in Chapter 7.

Chapter 2
Literature Review
Various surveys on MR have focused on identifying the challenges and, thus, the necessary research and development to realize the technology. The various early challenges – such as matching the real and virtual displays, aligning the virtual objects with the real world, and the various errors that need to be addressed, such as optical distortion, misalignment, and tracking – were discussed in [Azuma, 1997]. This was complemented by a follow-up survey that focuses on the enabling technologies, interfacing, and visualization [Azuma et al., 2001]. A much more recent survey updated the challenges in MR systems to be performance, alignment, interaction, mobility, and visualization [Rabbi and Ullah, 2013]. A review of the various head-mounted display (HMD) technologies for consumer electronics was also presented in [Kress and Starner, 2013]. Another study looked into a specific type of AR, mobile AR, and the different technologies that enable mobility with AR [Chatzopoulos et al., 2017]. While all of these different challenges and technologies are important to enable AR, none of these studies have focused on the fundamental issues of security and privacy in AR or MR.1 A few others have pointed out non-technical issues, such as ethical considerations [Heimo et al., 2014] and value-sensitive design approaches [Friedman and Kahn Jr, 2000], that highlight the need to consider data ownership, privacy, secrecy, and integrity. Another recent study has focused on the potential perceptual and sensory threats that can arise from MR outputs, such as photosensitive epilepsy and motion-induced blindness [Baldassi et al., 2018]. An earlier work emphasized three aspects of protection in AR – input, data access, and output – over varying system complexity (from single to multiple applications and, eventually, to multiple systems) [Roesner et al., 2014b]. We expand on these three aspects and include
1This chapter uses material from our published survey on security and privacy approaches in MR [de Guzman et al., 2019d].
Table 2.1: Combined security and privacy properties and their corresponding threats

Property                              Threat                       SDL (2006)   PriS (2008)   LINDDUN (2011)
Security-oriented
  Integrity                           Tampering                        X
  Non-repudiation                     Repudiation                      X
  Availability                        Denial of Service                X
  Authorization                       Elevation of Privilege           X             X
  Authentication                      Spoofing                         X             X
  Identification                      Anonymity                                      X
Security- and privacy-oriented
  Confidentiality                     Disclosure of Information        X             X              X
Privacy-oriented
  Anonymity & Pseudonymity            Identifiability                                X              X
  Unlinkability                       Linkability                                    X              X
  Unobservability & Undetectability   Detectability                                  X              X
  Plausible Deniability               Non-repudiation                                               X
  Content Awareness                   Unawareness                                                   X
  Policy & Consent Compliance         Non-compliance                                                X

interaction and device protection as equally important aspects, and include discussions of both. Lastly, we identify the security and privacy properties relevant to these approaches, as well as which data flow elements (i.e. data flow, process, storage, and/or data entity) are targeted, and present them in a summary table to show how the protection strategies are distributed among the properties. We list the different security and privacy properties in Section 2.1 and explain the categorization used to sort the different approaches in Section 2.2. For each category, we discuss the corresponding approaches and which properties they address in Sections 2.3 to 2.7. Finally, we conclude this review in Section 2.8 with a table summarizing the approaches and highlighting the remaining gaps.
2.1 General Security and Privacy Requirements for MR
We derive general security and privacy requirements from three models and combine them into an over-arching model against which we can qualify the different approaches (both defensive and offensive strategies) discussed in this chapter. Table 2.1 lists the thirteen combined security and privacy properties from Microsoft’s Security Development Lifecycle (or SDL) [Howard and Lipner, 2006], PriS [Kalloniatis et al., 2008], and LINDDUN [Deng et al., 2011], and their corresponding threats. The SDL has been popularly used by industry to elicit security threat scenarios and use cases. LINDDUN follows the SDL but with emphasis on privacy threats, while PriS presents another privacy framework with desirable overlaps with SDL and LINDDUN. We choose these three frameworks from the literature due to the overlapping properties they define as well as the consistency of their property definitions. In the resulting combined list of properties, the first six are security-oriented while the remaining ones are considered privacy-oriented. The confidentiality property is the only one common to all three models and is considered both a security and a privacy property. Obviously, since SDL focuses primarily on security, all of its associated properties target security. PriS has a balance of privacy- and security-targeted properties. On the other hand, LINDDUN’s properties are privacy-targeted and are categorized into hard privacy (from confidentiality down to plausible deniability) and soft privacy (content awareness, and policy and consent compliance). See Appendix A for a discussion of each property.

Interestingly, some security properties are conversely considered as privacy threats: for example, non-repudiation is the “threat” to plausible deniability. This highlights the differences in priority that an organization, user, or stakeholder can place on these properties or requirements. Nonetheless, these properties are not necessarily mutually exclusive and can be desired at the same time. Specifically, the target element to be protected provides an additional dimension along which these properties can be applied together. Namely, these properties can be applied to the following elements: data entities, data flow, process, and data storage, as shown in Figure 2.1. For every approach discussed in Sections 2.3 to 2.7, we identify which properties it tries to address. Moreover, there are other ‘soft’ properties (i.e. reliability and safety) that we will use liberally in the discussions.
2.2 Categorizing the Threats and Approaches
Figure 2.1 presents an example of an MR environment and shows how data flows from the observable environment (Figure 2.1a) and through the MR device to process inputs and deliver experiences. The left half of Figure 2.1a shows the ‘view’ of the mixed reality device which, in this example, is a see-through MR head-mounted device (HMD), i.e. an MR eye-wear. Within the view are the physical objects which are “seen” by the MR device, as indicated by the solid arrows. The synthetic augmentations shown in the diagram are represented by the broken arrows. A cloud- or web-based support service is also shown, through which multiple MR users can collaborate or share MR experiences, say, through a social network which supports MR, such as Snapchat and Pokemon Go.

Figure 2.1b shows the data flow diagram which follows the MR processing pipeline of detection → transformation → rendering. Physical entities (e.g. desk, cup, or keyboard) from the environment are captured or detected. After detection, the resulting entities are transformed or processed to deliver services accordingly.
(a) A mixed reality environment (left) with the supporting data services (right) as well as example points of protection as labelled: (1) contents of the display monitor, (2) access to stored data, (3) virtual display for content, e.g. information about the contents of a smart mug, (4) collaborating with other users, and (5) device access to the mixed reality eye-wear.
(b) A data flow diagram that follows the generic mixed reality processing pipeline of detection → transformation → rendering, and shows the data flow elements – entities, processes, and storage – that are used as inputs and/or outputs for each processing step.

Figure 2.1: (a) Mixed reality environment and (b) the data flow diagram
Figure 2.2: A data-centric categorization of the various security and privacy work or approaches on mixed reality and related technologies.

Depending on the service or application, different transformations are used. Finally, the results of the transformation are delivered to the user by rendering them (such as the virtual pet bird or the cup-contents indicator) through the device’s output interfaces.

Figure 2.2 shows the five categories (and their subcategories) among which we have distributed the various related work. The first three categories map directly to the risks associated with the main steps of the processing pipeline – protecting how applications, during the transformation stage, access real-world input data gathered during detection, which may be sensitive, and generate reliable outputs during rendering. Differently, the interaction protection and device protection approaches cannot be mapped along the pipeline, unlike the other three, as the intended target elements of these two categories transcend the pipeline. Representative points of the five aspects of protection are labelled in Figure 2.1a. The presented categorization does not exclusively delineate the five aspects, and it is important to note that approaches can fall under more than one category or subcategory.
Figure 2.3: A generic block diagram that inserts an intermediary protection layer between the applications and device resources. (Data flows to and from third-party applications are now limited or less privileged, as represented by the broken arrows.)

2.3 Input Protection
This category focuses on the challenges in ensuring the security and privacy of data that is gathered and input to the MR platform, which can contain sensitive information. For example, in Figure 2.1a, the MR eye-wear can capture sensitive information on the user’s desktop screen (labelled 1), such as e-mails, chat logs, and so on. This is user-sensitive information that needs to be protected. Moreover, the device can also capture information that may not be sensitive to the user but may be sensitive to other entities, such as bystanders; this concern is called bystander privacy. Aside from readily sensitive objects, the device may capture other objects in the environment that are seemingly benign (or subtle) and were not intended to be shared, but can be used by adversaries to infer knowledge about the users/bystanders. We collectively call these inputs (i.e. objects, visual targets, and structural features) passive inputs, while those that are intentionally actuated and provided by users (e.g. gestures) are active inputs.
2.3.1 Passive Inputs: Targeted and Non-intended Latent Data
Aside from information disclosure (i.e. against confidentiality), the two other main threats to sensitive, personally identifiable information during data capture are detectability and user content unawareness. Both stem from the fact that these MR systems collect a large amount of information, among which are necessary and sensitive information alike. As more of these services become personalized, the sensitivity of this information increases. These threats are very evident with visual data. As MR requires the detection of targets, i.e. objects or contexts, in the real environment, other non-necessary, latent, but potentially sensitive information is captured as well.
Figure 2.4: Example strategies for input protection: 1) information reduction or partial sanitization, e.g. from RGB facial information to facial outline only; 2) complete sanitization or blocking; or 3) skeletal information instead of raw hand video capture. (The broken arrow indicates less privileged information flow.)
Passive Input Protection Approaches. The most common input protection approaches usually involve the removal of latent and sensitive information from the input data stream. These approaches are generally called input sanitization techniques (see the samples labelled 1 and 2 in Figure 2.4). They are usually implemented as an intermediary layer between the sensor interfaces and the applications, as shown in Figure 2.3; in general, this protection layer acts as an input access control mechanism in addition to performing sanitization (a minimal sketch of such a layer follows). These techniques can be further categorized according to policy enforcement – whether intrinsic or extrinsic protection policies are used.
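A minimal sketch of such an intermediary layer is shown below, assuming a hypothetical per-application policy registry; it illustrates the access-control idea only and is not the interface of any of the systems discussed next.

```python
# Sketch of an intermediary input protection layer: applications never
# touch the sensors directly; the layer applies a per-application
# policy (a sanitizer) to every captured frame before release.
from typing import Callable, Dict, Optional

class ProtectionLayer:
    def __init__(self) -> None:
        # Maps an application name to the sanitizer applied to the
        # frames it is allowed to receive.
        self.policies: Dict[str, Callable] = {}

    def register(self, app: str, sanitizer: Callable) -> None:
        self.policies[app] = sanitizer

    def release(self, app: str, frame) -> Optional[object]:
        # Default-deny: applications without a policy get nothing.
        sanitizer = self.policies.get(app)
        return sanitizer(frame) if sanitizer else None

layer = ProtectionLayer()
layer.register("trusted_app", lambda f: f)                # full access
layer.register("ad_service", lambda f: "object labels")   # reduced access
print(layer.release("unknown_app", frame="raw pixels"))   # -> None
```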
1. Intrinsic input sanitization policies are usually user-defined: the user, device, or system itself imposes the protection policies that dictate the input sanitization to be applied. For example, the Darkly system [Jana et al., 2013b] for perceptual applications uses OpenCV in its intermediary input protection layer to implement multi-level feature sanitization (a rough sketch of this idea appears after this list). The level or degree of sanitization is determined by user-defined policies. Users can impose different degrees of sensitivity permissions, which affect the amount of detail or features provided to the applications, i.e. stricter policies mean fewer features are provided. For example, facial information can vary from showing facial feature contours (of eyes, nose, brows, mouth, and so on) to just the head contour, depending on the user’s preferences. The user can actively control the level of information that is provided to the applications. Thus, aside from providing undetectability & unobservability, and content awareness, to users, Darkly also provides a form of authorization through information access control, specifically least privilege access control.
Context-based Sanitization. A context-based intrinsic sanitization framework [Zarepour et al., 2016] improves on the non-contextual policies of Darkly. It determines whether there are sensitive objects in the captured images, like faces or car registration plates, and automatically implements sanitization. Sensitive features are sanitized by blurring them out, while images of sensitive locations (e.g. bathrooms) are deleted entirely. Similarly, PlaceAvoider [Templeman et al., 2014] also categorizes images as sensitive or not, depending on the features extracted from the image, but deletion is not automatic and still depends on the user. Despite the context-based nature of the sanitization, the policy that governs how to interpret the extracted contexts is still user-defined; thus, we consider both sanitization techniques intrinsic. However, intrinsic policy enforcement can be considered self-policing, which can potentially have a myopic view of the privacy preferences of other users and objects. Furthermore, intrinsic policies can only protect the inputs that are explicitly identified in the policies.
Video Sanitization. The previously discussed sanitization techniques were designed for generic capture devices, mostly sanitized images, and performed the sanitization after the image was stored. For MR platforms that require real-time video feeds, there is a need for live, on-the-fly sanitization of data. A privacy-sensitive visual monitoring system [Szczuko, 2014] was implemented by removing persons from a video surveillance feed and rendering 3D animated humanoids in place of the detected and visually removed persons. Another privacy-aware live video analytics system, called OpenFace-RTFace [Wang et al., 2017], focused on performing fast video sanitization by combining it with face recognition. The OpenFace-RTFace system lies near the edge of the network, or on cloudlets. Similar approaches to edge- or cloud-assisted information sanitization can potentially be utilized for MR.
2. Extrinsic input sanitization arises from the need to protect sensitive objects external to the user that are not considered by intrinsic policies; thus, policies, e.g. privacy preferences, are received from the environment. An early implementation [Truong et al., 2005] involved outright capture interference to prevent sensitive objects from being captured by unauthorized visual capture devices. A camera-projector setup is used: the camera detects unauthorized visual capture devices, and the projector beams a directed light source to “blind” the unauthorized device. This technique can be generalized as a form of physical access control or, specifically, a deterrent to physical or visual access. However, this implementation requires a dedicated setup for every sensitive space or object, and the light beams can be disruptive to regular operation.
Other approaches involves the use of existing communication channels or infrastructure for endorsing or communicating policies to capture devices, and to ensure that enforcement is less disruptive. The goal was to implement a fine-grained permission layer to “automatically” grant or deny access to contin- uous sensing or capture of any real-world object. A simple implementation on a privacy-aware see-through system [Hayashi et al., 2010] allowed other users de- tected to be blurred out or sanitized and shown as human icons only if the viewer is not their friend. However, this requires that users have access to the shared database and explicitly identify friends. Furthermore, enabling virtually anyone or, in this case, anything to specify policies opens new risks such as tampering, and malicious policies. To address authenticity issues in this so called world-driven access control, policies can be transmitted as digital certificates [Roesner et al., 2014c] using a public key infrastructure (PKI). PKI provides cryptographic protection to me- dia access and sanitization policy transmission. However, the use of a shared database requires that all possible users’ or sensitive objects’ privacy preferences have to be pushed to this shared database. Furthermore, it excludes or, unin- tentionally, leaves out users or objects that are not part of the database which defeats the purpose of a world-driven protection. I-pic [Aditya et al., 2016] removes the involvement of shared databases. Instead users endorse privacy choices via a peer-to-peer approach using Blue- tooth Low Energy (BLE) devices. However, I-pic is only a capture-or-no system. PrivacyCamera [Li et al., 2016a] is another peer-to-peer approach but is not limited to BLE. Also, it performs face blurring, instead of just capture-or-no, us- ing endorsed GPS information to determine if sensitive users are within camera view. On the other hand, Cardea [Shu et al., 2016] allows users to use hand gestures to endorse privacy choices. In Cardea, users can show their palms to signal protection while a peace-sign to signal no need for protection. These three approaches are targeted at bystander privacy protection, i.e. facial information sanitization. MarkIt [Raval et al., 2014] can provide protection to any user or object through the use of privacy markers and gestures (similar to Cardea)toen- dorse privacy preferences to cameras. It was integrated to Android’s camera subsystem to prevent applications from leaking private information [Raval et al., 2016] by sanitizing sensitive media. This is a step closer to automatic extrinsic input sanitization, but it requires visual markers for detecting sensitive objects. Furthermore, all these extrinsic approaches have only been targeted for visual 2. Literature Review 17
capture applications and not at AR- or MR-specific ones.
3. Structural Abstraction. Other MR environments incorporate any surface or medium as a possible output display. For example, when a wall is used as a display surface in an MR environment, the applications that use it can potentially capture the objects or other latent and/or sensitive information on the wall during the detection process. This specific case intersects with the input category because what is compromised here is the sensitive information that can be captured while determining the possible surfaces for displaying. Applications that require such displays do not need to know what the contents of the wall are; they only have to know that there is a surface that can be used as a display. Protected output rendering protects the medium and, by extension, whatever is in the medium. Least privilege has been used in this context [Vilk et al., 2014]. For example, in a room-scale MR environment, only the skeletal information of the room, and the location and orientation of the detected surfaces (or display devices), is made known to the applications that wish to display content on these display surfaces [Vilk et al., 2015]. Such room-scale MR environments are usually used for multi-user collaborations.
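To make the certificate-based endorsement above concrete, the following is a minimal sketch, assuming the third-party Python cryptography package, of how a capture device might verify a signed sanitization policy before enforcing it. The policy format and names such as verify_policy are illustrative assumptions and are not taken from [Roesner et al., 2014c].

    # Hypothetical sketch of certificate-style policy endorsement for
    # world-driven access control. Assumes the third-party `cryptography`
    # package; the policy schema and function names are illustrative.
    import json

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    # A policy authority signs a sanitization policy for a sensitive object.
    authority_key = ec.generate_private_key(ec.SECP256R1())
    policy = json.dumps({"object": "whiteboard-42", "action": "blur"}).encode()
    signature = authority_key.sign(policy, ec.ECDSA(hashes.SHA256()))

    # A capture device verifies the broadcast policy before enforcing it,
    # so tampered or spoofed policies are rejected.
    def verify_policy(public_key, policy_bytes: bytes, sig: bytes) -> bool:
        try:
            public_key.verify(sig, policy_bytes, ec.ECDSA(hashes.SHA256()))
            return True
        except InvalidSignature:
            return False

    assert verify_policy(authority_key.public_key(), policy, signature)

Only policies that verify against a trusted authority’s public key would be enforced, which addresses the tampering and malicious-policy risks noted above.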
2.3.2 Gestures and other Active User Inputs
Another essential input that needs to be protected is user gestures. We put a separate emphasis on this as gesture inputs entail a ‘direct’ command to the system, while the previous latent and user inputs do not necessarily invoke commands. Currently, the most widely adopted user input interfaces are tactile, such as the keyboard, mouse, and touch interfaces. However, these tactile inputs are limited by the dimension² of space they interact with, and some MR devices no longer have such interfaces. These input interface types are also prone to more physical threats such as external inference or shoulder-surfing attacks, from which threats such as spoofing, denial of service, or tampering may arise. Furthermore, new user input interfaces are necessary to allow three-dimensional inputs. Early approaches used gloves [Dorfmuller-Ulhaas and Schmalstieg, 2001, Thomas and Piekarski, 2002] that can determine hand movements, but advances in computer vision have led to tether- and glove-free 3D interactions. Gesture inference from smart watch movement has also been explored, particularly for finger-writing inference [Xu et al., 2015]. Vision-based natural user interfaces (NUI), such as the Leap
²Keyboards and other input pads can be considered one-dimensional interfaces, while the mouse and touch interfaces provide two-dimensional space interactions with a limited third dimension using scroll, pan, and zoom capabilities.
Motion [Zhao and Seah, 2016] and Microsoft Xbox Kinect, have long been integrated with MR systems to allow users to interact with virtual objects beyond two dimensions. This allows the use of body movement or gestures as input channels, moving away from keypads and keyboards. However, the use of visual capture to detect user gestures, or of smart watch movement to detect keyboard strokes, means that applications that require gesture inputs can inadvertently capture other sensitive inputs [Maiti et al., 2016]. Similar latent privacy risks, such as detectability and content unawareness, arise. Thus, as new ways of interacting in MR are being explored, security and privacy should also be maintained.
Protection through abstraction Prepose [Figueiredo et al., 2016] provides secure gesture detection and recognition as an intermediary layer (as in Figure 2.3). The Prepose core only sends gesture events to the applications, which effectively removes the necessity for untrusted applications to have access to the raw input feed. Similar to Darkly, it provides least privilege access control to applications; that is, only the necessary gesture event information is transmitted to the third-party applications and not the raw gesture feed. Some work prior to Prepose implemented a similar idea by inserting a hierarchical recognizer [Jana et al., 2013a] as an intermediary input protection layer. They inserted Recognizers into the Xbox Kinect to address input sanitization as well as to provide input access control. The Recognizer policy is user-defined, thus an intrinsic approach. Similarly, the goal is to implement a least privilege approach to application access to inputs – applications are only given the least amount of information necessary to run. For example, a dance game on Xbox, e.g. Dance Central or Just Dance, only needs body skeletal movement information (similar to the sample labelled 3 in Figure 2.4) and does not need facial information; thus, the dance games are only provided with the moving skeletal information and not the raw video feed of the user while playing. To handle multiple levels of input policies, the Recognizer implements a hierarchy of privileges in a tree structure, with the root having the highest privilege, i.e. access to RGB and depth information, and the leaves having lesser privileges, i.e. access to skeletal information (a minimal sketch of such a hierarchy follows below). Another recent work demonstrated how the visual processing and network access of a mobile AR/MR application can be siloed and the visual information abstracted to protect it from malicious MR applications [Jensen et al., 2019]. SemaDroid [Xu and Zhu, 2015], on the other hand, is a device-level protection approach. It is a privacy-aware sensor management framework that extends the current sensor management framework of Android and allows users to specify and control fine-grained permissions for applications accessing sensors. Just like the other abstraction strategies, it is implemented as an intermediary protection layer that provides users application access control or authorization to sensors and sensor data. What differs is its application of auditing and reporting of potential leakage and applying them to a privacy bargain. This allows users to ‘trade’ their data or privacy in exchange for services from the applications. There is a significant body of work on privacy bargains and the larger area of privacy economics, and we refer the readers to Acquisti’s work [Acquisti et al., 2016].
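As an illustration of this least privilege hierarchy, below is a minimal Python sketch of a Recognizer-style privilege tree. The class, node names, and grant function are assumptions for illustration and do not reproduce the implementation of [Jana et al., 2013a].

    # A minimal sketch of a recognizer privilege hierarchy: the root grants
    # raw RGB-D access, leaves grant only derived events such as skeleton
    # joints. Names are illustrative, not the authors' API.
    from dataclasses import dataclass, field

    @dataclass
    class Recognizer:
        name: str                      # e.g. "raw_rgbd", "skeleton_joints"
        children: list = field(default_factory=list)

        def subtree(self):
            """All privileges implied by access to this node."""
            yield self.name
            for child in self.children:
                yield from child.subtree()

    # Root (raw RGB + depth) is most privileged; leaves are least privileged.
    gesture = Recognizer("gesture_events")
    skeleton = Recognizer("skeleton_joints", [gesture])
    root = Recognizer("raw_rgbd", [skeleton])

    def grant(app_request: str, granted_node: Recognizer) -> bool:
        """Least privilege: an app granted a node may only read its subtree."""
        return app_request in set(granted_node.subtree())

    # A dance game granted "skeleton_joints" can read joints and gestures,
    # but its request for the raw feed is denied.
    assert grant("gesture_events", skeleton)
    assert not grant("raw_rgbd", skeleton)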
Remaining Challenges in Input Protection
Most of the approaches discussed so far are founded on the idea of least privilege. However, this requires that the intermediary layer, for example the Recognizers, must know what type of inputs or objects the different applications will require. Prepose addresses this for future gestures but not for future objects. For example, an MR painting application may require the detection of different types of brushes, but the current recognizer does not know how to ‘see’ or detect the brushes. Extrinsic approaches like MarkIt try to address this by using markers to tell which objects can and cannot be seen. What arises now is the need for dynamic abstraction and/or sanitization of both pre-determined and future sensitive objects. Nevertheless, data-level techniques can be employed to further leverage abstractions as a protection measure not just for active inputs but also for passive and latent inputs.
2.4 Data Protection
Data from multiple sensors or sources are aggregated, processed, and stored, usually in cloud servers and databases. Applications then need to access the data in order to deliver output in the form of user-consumable information or services. However, almost all widely used computing platforms allow applications to collect and store data individually (as shown in the access of supporting data services labelled 2 in Figure 2.1a), and the users may have no control over their data once it has been collected and stored by these applications. The majority of security and privacy risks have been raised concerning the access and use of user data by third-party agents, particularly data gathered from wearables [Felt et al., 2012], mobile devices [Lee et al., 2015], and on-line activity [Ren et al., 2016]. Thus, MR systems face even greater risks as richer information can be gathered using a wide variety of sensitive sensors, e.g. visual data from which spatial mapping information can be extracted to determine spatial features such as surfaces or physical objects over which augmentations are rendered. For data protection, there is a lengthy list of properties that need to be maintained, such as integrity, availability, confidentiality, unlinkability, anonymity & pseudonymity, and plausible deniability, among others.

Generally, the aim of privacy preservation is to allow services or third-party applications to learn without leaking unnecessary and/or personally identifiable information. Usually, they use privacy definitions such as k-anonymity [Samarati and Sweeney, 1998, Samarati, 2001] and differential privacy [Dwork et al., 2014, McSherry and Talwar, 2007]. However, most of the measures proposed and discussed in the wider literature were implemented on generic systems and not necessarily on MR systems. Furthermore, most of the approaches we will discuss have been aimed at traditional visual media, i.e. images and video. While MR still relies heavily on visual data, extracted 3D spatial data is now primarily utilized to represent spatial understanding, which arguably poses more risks. Thus, we present an exposition of data protection approaches on the following data flow aspects: (1) data aggregation, (2) privacy-preserving data processing, and (3) protected data storage and access.
2.4.1 Protected Data Collection and Aggregation
Essentially, data collection also falls under the input category, but here we focus on data after sensing and how systems, applications, and services handle it afterwards. Protected data collection and aggregation approaches are also implemented as an intermediary layer as in Figure 2.3. Usually, data manipulation or similar mechanisms are run on this intermediary layer to provide a privacy guarantee, e.g. differential privacy or k-anonymity, over released data. RAPPOR, or randomized response, [Erlingsson et al., 2014] is an example of a differentially-private data collection and aggregation algorithm. It is primarily applied to privacy-preserving crowd-sourced information such as that collected by Google for their Maps services. Privacy-preserving data aggregation (PDA) has also been adopted for information collection systems [He et al., 2007, He et al., 2011] with multiple data collection or sensor points, such as wireless sensor networks or body area networks. Overall, the goal of privacy-preserving data collection and aggregation is to get aggregate statistics or information without divulging individual information, thus providing anonymity, unlinkability, and plausible deniability between the aggregate information (as well as its derivative processes and further resulting information) and the data source entity, i.e. a user.
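To illustrate the mechanism underlying RAPPOR, below is a minimal one-bit randomized response sketch in Python; the full scheme adds Bloom-filter encoding and permanent/instantaneous randomization, which are omitted here for brevity.

    # One-bit randomized response: each user reports the truth only with
    # probability p_truth, so no single report can be trusted, yet the
    # aggregate rate remains recoverable.
    import random

    def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
        """Report the true bit with probability p_truth, else a coin flip."""
        if random.random() < p_truth:
            return truth
        return random.random() < 0.5

    def estimate_rate(reports: list, p_truth: float = 0.75) -> float:
        """Unbiased estimate of the true rate from noisy reports, using
        E[reported] = p_truth * rate + (1 - p_truth) * 0.5."""
        observed = sum(reports) / len(reports)
        return (observed - (1 - p_truth) * 0.5) / p_truth

    population = [random.random() < 0.3 for _ in range(100_000)]  # 30% true
    reports = [randomized_response(bit) for bit in population]
    print(round(estimate_rate(reports), 3))   # close to 0.30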
2.4.2 Protected Data Processing
After collection, most services have to process the data immediately to deliver outputs in real time. Thus, similar to data collection, the same privacy threats of information disclosure, linkability, detectability, and identifiability hold.
Figure 2.5: Generic block diagrams of two example data protection approaches: 1) a cryptographic technique using secure multi-party computation, where two or more parties exchange secrets (1.1 and 1.3) to extract combined knowledge (1.2 and 1.4) without the need to divulge or decrypt each other’s data share; and 2) personal data stores with “trusted” applets.

During processing, third-party applications or services can directly access user data, which may contain sensitive or personal information, if no protection measures are implemented. The subsequent exposition of protection approaches presents a collection of MR-related work, particularly on privacy-preserving and secure image and video processing.
1. Encryption-based techniques. Homomorphic encryption (HE) allows queries or computations over encrypted data. In visual data processing, this has been used for image feature extraction and matching for various uses such as image search and object detection. He-Sift [Hsu et al., 2011] performs bit-reversing and local encryption on the raw image before feature description using SIFT.³ The goal is to make dominant features, which can be used for context inference, recessive. As a result, feature extraction, description, and matching are all performed in the encrypted domain. A major drawback with near full homomorphism is the very slow computation time. SecSift [Qin et al., 2014, Qin et al., 2016] improves on the computation time of He-Sift by instead using a somewhat homomorphic encryption, i.e. order-preserving encryption. They split or distribute the SIFT feature computation tasks among a set of “independent, co-operative cloud servers to keep the outsourced computation procedures as simple as possible and avoid utilizing homomorphic encryption.” Other improvements utilized big data computation techniques to expedite secure image processing, such as
³SIFT or Scale-invariant Feature Transform is a popular image feature extraction and description algorithm [Lowe, 2004].
the use of a combination of MapReduce and ciphertext-policy attribute-based encryption [Zhang et al., 2014], or the use of Google’s Encrypted BigQuery Client for Paillier HE computations [Ziad et al., 2016]. However, these methods are algorithm-specific; that is, every algorithm that we desire to be privacy-preserving using homomorphism has to be re-engineered.
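To make the homomorphic property concrete, here is a toy, deliberately insecure Paillier sketch (tiny primes, demonstration only). It shows the additive homomorphism that such encrypted-domain pipelines rely on; it does not reproduce any of the cited systems’ implementations.

    # Toy Paillier: the product of two ciphertexts decrypts to the sum of
    # the plaintexts. Insecure parameters, for illustration only.
    import math, random

    p, q = 293, 433                    # toy primes; real keys use ~2048 bits
    n, n2 = p * q, (p * q) ** 2
    g = n + 1                          # standard generator choice
    lam = math.lcm(p - 1, q - 1)
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # inverse of L(g^lam mod n^2)

    def encrypt(m: int) -> int:
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:     # randomness must be coprime with n
            r = random.randrange(2, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c: int) -> int:
        return ((pow(c, lam, n2) - 1) // n) * mu % n

    c1, c2 = encrypt(17), encrypt(25)
    assert decrypt((c1 * c2) % n2) == 42   # addition happens under encryption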
2. Secret Sharing or Secure Multi-party Computation. Data can be split among untrusted parties, assuming that information can only be inferred when the distributed parts are combined [Yao, 1986, Huang et al., 2011]. Secure multi-party computation (SMC) or secret sharing allows computation over data from two or more sources without necessarily knowing the actual data each source has. The diagram labelled 1 in Figure 2.5 shows a possible SMC setup. A privacy-preserving photo-sharing service has been designed using two-party secret sharing, “by splitting a photo into a public part, which contains most of the volume (in bytes) of the original, and a secret part which contains most of the original’s information” [Ra et al., 2013]. Meanwhile, a virtual cloth try-on service used secret sharing and secure two-party computation [Sekhavat, 2017]. The anthropometric information, i.e. body measurements, of the user is split between the user’s mobile device and the server, and both shares are encrypted. The server has a database of clothing information. The server can then compute a 3D model of the user wearing the piece of clothing by combining the anthropometric information and the clothing information to generate an encrypted output, which is sent to the user device. The user device decrypts the result and combines it with the local secret to reveal the 3D model of the user “wearing” the piece of clothing (the core splitting step is sketched below).
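The splitting step can be illustrated with a minimal additive secret-sharing sketch; the measurement, ring size, and function names are illustrative assumptions rather than the try-on system’s actual protocol, which adds encryption and richer two-party computation on the shares.

    # Additive secret sharing: each value is split into two random-looking
    # shares, so neither party alone learns the value.
    import random

    MOD = 2**32   # arithmetic is done modulo a fixed ring size

    def share(value: int) -> tuple:
        """Split value into two shares that individually look random."""
        s1 = random.randrange(MOD)
        s2 = (value - s1) % MOD
        return s1, s2

    def reconstruct(s1: int, s2: int) -> int:
        return (s1 + s2) % MOD

    # Split a body measurement between the user's device and the server.
    device_share, server_share = share(94)       # e.g. chest girth in cm
    assert reconstruct(device_share, server_share) == 94

    # Shares are additively homomorphic: a party can add a constant (or its
    # share of another secret) locally, and the sum reconstructs correctly.
    assert reconstruct((device_share + 6) % MOD, server_share) == 100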
3. Data Manipulation, Perturbations, and Transformation. Other MR-related privacy-preserving techniques have focused on facial de-identification using image manipulation to achieve k-anonymity for identity privacy [Newton et al., 2005, Gross et al., 2006, Gross et al., 2008]. Succeeding face de-identification work has focused on balancing utility and privacy [Du et al., 2014], while much recent work has leveraged generative adversarial networks to deceive a potentially adversarial data collector, de-identifying faces while ensuring high demographic utility of the resulting de-identified face [Brkic et al., 2017, Wu et al., 2018]. The same manipulation can be extended to the 3D spatial data that is utilized in MR systems. Instead of providing complete 3D spatial data, a sanitized or ‘salted’ virtual reconstruction of the physical space can be provided to third-party applications. For example, instead of showing the 3D capture of a table in the scene with all 3D data of the objects on the table, a generalized horizontal
platform or surface can be provided (a minimal sketch of such surface generalization follows below). The potentially sensitive objects on the table are thus kept confidential. A tunable parameter provides the balance between sanitization and utility. Using this tunability, privacy guarantees similar to differential privacy and k-anonymity can be provided. This approach is yet to be realized, but virtual reconstruction has been used to address delayed alignment issues in AR [Waegel, 2014]. This approach can work well with other detection and rendering strategies of sanitization and abstraction, as well as in privacy-centred collaborative interactions (Section 2.6.1). It also opens the possibility of an active defence strategy where ‘salted’ reconstructions are offered as a honeypot to adversaries. 3D Data Transformation. A recent work demonstrated how original scenes can be revealed from 3D point cloud data using machine learning [Pittaluga et al., 2019]. As a counter-measure, a concurrent work designed a privacy-preserving method of pose estimation to counter the scene revelation [Speciale et al., 2019]: 3D “line” clouds are used instead of 3D point clouds during pose estimation to obfuscate 3D structural information; however, this approach only addresses the pose estimation functionality and does not demonstrate viability for surface or object detection, which is necessary for a virtual object to be rendered or “anchored” onto.
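As a rough illustration of the surface-to-plane generalization above, the following numpy sketch fits a least-squares plane to captured surface points and releases only the plane parameters and a coarse extent; the function name and release format are assumptions for illustration.

    # Surface-to-plane generalization sketch: instead of releasing the raw
    # 3D points of a tabletop (and whatever sits on it), release only a
    # fitted plane and its rough extent.
    import numpy as np

    def generalize_to_plane(points: np.ndarray) -> dict:
        """points: (N, 3) array of captured surface points."""
        centroid = points.mean(axis=0)
        # The singular vector with the smallest singular value is the
        # normal of the least-squares plane through the centroid.
        _, _, vt = np.linalg.svd(points - centroid)
        normal = vt[-1]
        extent = (points - centroid) @ vt[:2].T   # in-plane coordinates
        return {
            "centroid": centroid,                            # anchor point
            "normal": normal,                                # orientation
            "size": extent.max(axis=0) - extent.min(axis=0), # width, depth
        }

    # A noisy horizontal "table" surface; the release keeps the supporting
    # plane but drops all object geometry above it.
    rng = np.random.default_rng(0)
    table = np.column_stack([rng.uniform(0, 1.2, 500),
                             rng.uniform(0, 0.8, 500),
                             rng.normal(0.75, 0.002, 500)])
    print(generalize_to_plane(table)["normal"])   # approximately [0, 0, ±1]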
Overall, these privacy-preserving data processing techniques aim to provide the privacy properties of unlinkability, unobservability, and plausible deniability between the process (as well as its results) and the data source. Furthermore, the encryption- and secret sharing-based techniques additionally provide the security properties of integrity and authorization, as only the authorized parties can process the data, while ensuring confidentiality through homomorphism. All these techniques complement each other and may be used simultaneously. Thus, any or all of these techniques can be applied to MR; it is only a matter of whether the technique is appropriate for the amount and sensitivity of the data handled in MR environments.
2.4.3 Protecting Data Storage
After collection and aggregation, applications store user data in separate databases over which users have minimal or no control. Privacy concerns have been raised about how these applications use user data beyond the expected utility to the user [Ren et al., 2016, Lee et al., 2015, Felt et al., 2012]. Aside from these privacy threats, there are inherent security threats such as tampering, unauthorized access, and spoofing. To provide security against such threats, the Advanced Encryption Standard (AES) has been specified as the industry standard.
When the trustworthiness of third-party applications and services is not ensured, protected data storage solutions, such as personal data stores (PDS), with managed application access permission control are necessary. PDSs allow users to have control over their data and over which applications have access to it. Figure 2.5 shows a generic block diagram (labelled 2) of how a PDS protects user data by running it in a protected sandbox machine that can monitor the data provided to the applications. Usually, applet versions of the applications run within the sandbox. Various PDS implementations have been proposed, such as personal data vaults (PDV) [Mun et al., 2010], OpenPDS [de Montjoye et al., 2014], and the Databox [Crabtree et al., 2016]. Other generic protection approaches have focused on encrypted fragmented data storage [Ciriani et al., 2010] or decentralized storage using blockchains [Zyskind et al., 2015]. As a result, a PDS provides accountability and subsequently the non-repudiation security property, as applications cannot deny that they have accessed the stored data. Privacy-preserving aggregation can also be implemented within the PDS to provide the privacy properties of anonymity, unlinkability, and plausible deniability between the released aggregate data and the user as a data source. For example, OpenPDS releases private aggregates or answers through its SafeAnswers interface.
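The SafeAnswers idea can be sketched as follows; the class, the minimum-cohort policy, and the audit log are illustrative assumptions rather than OpenPDS’s actual interface.

    # Personal data store sketch: applications submit questions and receive
    # only aggregate answers computed inside the user's store; raw records
    # never leave it, and every access is logged for accountability.
    class PersonalDataStore:
        def __init__(self, records, min_cohort=10):
            self._records = records
            self._min_cohort = min_cohort   # refuse answers on tiny samples
            self.audit_log = []             # supports non-repudiation

        def safe_answer(self, app_id: str, predicate) -> float:
            """Return only the fraction of records matching the predicate."""
            self.audit_log.append((app_id, getattr(predicate, "__name__", "?")))
            if len(self._records) < self._min_cohort:
                raise PermissionError("sample too small for a safe answer")
            matches = sum(1 for r in self._records if predicate(r))
            return matches / len(self._records)

    visits = [{"place": p} for p in ["gym", "cafe", "gym", "work"] * 5]
    pds = PersonalDataStore(visits)

    def at_gym(record):
        return record["place"] == "gym"

    print(pds.safe_answer("fitness-app", at_gym))  # 0.5, not the raw log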
Remaining Challenges in Data Protection
There are necessary modifications that applications have to undertake in order to implement these data protection strategies. Aside from implementation complexity, additional resources may be necessary, such as the inherent memory and compute requirements of encryption. There are attempts to eliminate the necessity of code modification, such as GUPT [Mohan et al., 2012], which focuses on the sampling and aggregation process to ensure distribution of the differential privacy budget while eliminating the need for costly encryption (a minimal sketch of the underlying primitive follows below). Combining these techniques with protected sensor management and data storage to provide confidentiality through sanitization and authorized access control is also promising.
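For concreteness, below is a minimal sketch of the kind of differentially private primitive whose epsilon budget systems like GUPT manage: a Laplace-noised count query. The query and parameters are illustrative; GUPT’s own sample-and-aggregate pipeline is more involved.

    # Laplace mechanism sketch: add Laplace(sensitivity/epsilon) noise to a
    # numeric query result; a count query has sensitivity 1.
    import random

    def laplace_noise(scale: float) -> float:
        # The difference of two exponentials with mean `scale` is
        # Laplace(0, scale).
        return random.expovariate(1 / scale) - random.expovariate(1 / scale)

    def dp_count(values, predicate, epsilon: float) -> float:
        """Differentially private count of values matching the predicate."""
        true_count = sum(1 for v in values if predicate(v))
        return true_count + laplace_noise(1.0 / epsilon)

    ages = [21, 34, 45, 29, 52, 38, 61, 27]
    print(dp_count(ages, lambda a: a > 30, epsilon=0.5))  # noisy ~5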
2.5 Output Protection
After processing the data, applications send outputs to the mixed reality device to be displayed or rendered. However, an untrusted application that has access to outputs other than those needed for its functionality can potentially modify those outputs, making them unreliable. For example, in the smart information hovering over the cup in Figure 2.1a, malicious applications can modify the sugar level information. Other adversarial output attacks include clickjacking, which deceives users into ‘clicking’ on sensitive elements through transparent or misleading interfaces [Roesner et al., 2014b], and physiological attacks such as inducing epileptic seizures through a visual trigger [Baldassi et al., 2018]. Furthermore, when one application’s output is another application’s input, multiple applications need access to an output object. For output protection, the integrity, non-repudiation, availability, and policy compliance as well as reliability properties have to be maintained.

In general, there are three possible types of outputs in MR systems: real-world anchored outputs, non-anchored outputs, and outputs of external displays. The first two types are both augmented outputs. The last type refers to outputs of other external displays which can be utilized by MR systems, and vice versa. Protecting these outputs is of paramount importance alongside ensuring input and data protection. As a result, there are three aspects to output protection: output control, protected rendering, and protecting external displays.
2.5.1 Output Reliability and User Safety
Current MR systems have loose output access control. As a result, adversaries can potentially tamper with or spoof outputs in ways that compromise user safety. Output control policies can be used as a guiding framework for how MR devices handle outputs from third-party applications. This includes the management of rendering priority, which could be in terms of synthetic object transparency, arrangement, occlusion, and other spatial attributes, to combat attacks such as clickjacking. An output access control framework [Lebeck et al., 2016] with object-level granularity has been proposed to make output handling enforcement easier. It can be implemented as an intermediary layer, as in Figure 2.3, and follows a set of output policies. In a follow-up work, they presented a design framework [Lebeck et al., 2017] for output policy specification and enforcement which combined output policies from Microsoft’s HoloLens developer guidelines and from the U.S. Department of Transportation’s National Highway Traffic Safety Administration (NHTSA) for user safety in automobile-installed AR.⁴ They designed a prototype platform called Arya that ensures policy compliance, integrity, non-repudiation, availability, and authorization; that is, correct outputs are always available, an output’s originator cannot be denied, and only authorized applications can produce such outputs (a minimal sketch of such object-level policy enforcement follows below). Succeeding work builds on Arya’s weakness in dynamic and complex environments, especially when various, differently-sourced policies are required [Ahn et al., 2018]; they utilised reinforcement learning to determine the optimal policy enforcement assisted by fog-based servers. This approach reinforces the properties of integrity and availability in complex and dynamic environments, and further provides confidentiality by doing processing at the edge instead of in the cloud.
⁴Here are two example descriptions of the policies: (1) “Don’t obscure pedestrians or road signs” is inspired by the NHTSA; (2) “Don’t allow AR objects to occlude other AR objects” is inspired by the HoloLens guidelines.
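A minimal sketch of object-level output policy enforcement in this spirit is given below; the object attributes and the single “do not obscure road signs” policy are illustrative simplifications, not Arya’s actual design.

    # Object-level output policy sketch: AR objects overlapping a detected
    # road sign are made transparent before rendering.
    from dataclasses import dataclass

    @dataclass
    class Box:
        x: float
        y: float
        w: float
        h: float   # 2D screen-space bounds

        def overlaps(self, other: "Box") -> bool:
            return (self.x < other.x + other.w and other.x < self.x + self.w
                    and self.y < other.y + other.h and other.y < self.y + self.h)

    @dataclass
    class ARObject:
        app: str
        bounds: Box
        opacity: float = 1.0

    def enforce_no_obscuring(ar_objects, detected_signs):
        """Policy: any AR object overlapping a road sign is made invisible."""
        for obj in ar_objects:
            if any(obj.bounds.overlaps(sign) for sign in detected_signs):
                obj.opacity = 0.0   # or demote/move, depending on the policy
        return ar_objects

    ad = ARObject("ads-app", Box(10, 10, 30, 20))
    sign = Box(20, 15, 10, 10)        # detected real-world road sign
    enforce_no_obscuring([ad], [sign])
    print(ad.opacity)                 # 0.0: the ad may not occlude the sign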
2.5.2 Protecting Output Displays
Output displays are vulnerable to physical inference threats or visual channel exploits such as shoulder-surfing attacks. These are the same threats as for user inputs (Section 2.3.2), especially when the input and output interfaces share the same medium or are integrated together, such as on touch screens. To provide secrecy and privacy in certain sensitive contexts which require output confidentiality (e.g. ATM bank transactions), MR can be leveraged. This time, MR capabilities are leveraged for output defense strategies.
1. Content hiding. EyeGuide [Eaddy et al., 2004] used a near-eye HMD to provide a navigation service that delivers secret and private navigation information augmented on a public map display. Because the EyeGuide display is practically secret, shoulder surfing is prevented. Other approaches involve the actual hiding of content. For example, VRCodes [Woo et al., 2012] take advantage of rolling shutter to hide codes from human eyes that can nonetheless be detected by cameras at a specific frame rate. Shutter glasses can also be used to similarly hide displayed content [Yerazunis and Carbone, 2002]. A similar approach has been used to hide AR tags in video [Lin et al., 2017]. This type of technique can hide content from human attackers but is still vulnerable to machine-aided inference or capture.
2. Visual cryptography. Secret display approaches have also been used in visual cryptographic techniques such as visual secret sharing (VSS) schemes. VSS allows the ‘mechanical’ decryption of secrets by overlaying the visual cipher with the visual key. However, classical VSS was aimed at printed content [Chang et al., 2010] and requires strict alignment, which is difficult in AR and MR displays, particularly handhelds and HMDs. The VSS technique can be relaxed to use code-based secret sharing, e.g. barcodes, QR codes, and 2D barcodes. The ciphers are publicly viewable while the key is kept secret. An AR device can then be used to read the cipher and augment the decrypted content over it. This type of visual cryptography has been applied to both print [Simkin et al., 2014] and electronic displays [Lantz et al., 2015, Andrabi et al., 2015]. Electronic displays are, however, prone to attacks from malicious applications which have access to the display. One of these possible attacks is cipher
rearrangement when multiple ciphers are shown. To prevent this on untrusted electronic displays, a visual ordinal cue [Fang and Chang, 2010] can be combined with the ciphers to give users an immediate signal if ciphers have been rearranged.
These techniques can also be used to protect sensitive content on displays during input sensing. Instead of providing privacy protection through post-capture sanitization, the captured ciphers remain secure as long as the secret shares or keys are kept secure. Thus, even if the ciphers are captured during input sensing, the content stays secure. In general, these visual cryptography and content-hiding methods provide visual access control, i.e. authorization, and confidentiality in shared or public resources (a minimal sketch of share-based secrecy follows below). More examples of this technique are discussed in Section 2.7.2 on device protection.
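A minimal XOR-based visual secret sharing sketch follows. Classical printed VSS instead uses OR-superposition with pixel expansion; the XOR variant here is a simplification suited to electronic displays and AR overlays, chosen only for brevity.

    # XOR-based visual secret sharing: either share alone is pure noise,
    # while overlaying (XOR-ing) the shares recovers the secret image.
    import numpy as np

    rng = np.random.default_rng(7)

    secret = np.array([[1, 0, 1, 0],
                       [0, 1, 1, 0],
                       [1, 1, 0, 0]], dtype=np.uint8)   # binary image

    share1 = rng.integers(0, 2, secret.shape, dtype=np.uint8)  # random noise
    share2 = share1 ^ secret    # also indistinguishable from noise alone

    # E.g. share2 is shown on a public display while share1 is held by the
    # AR device, which renders the overlay only for its wearer.
    assert np.array_equal(share1 ^ share2, secret)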
Remaining Challenges in Output Protection
Similar to input protection, output protection strategies can use the same abstraction approach applied as an intermediary access control layer (see Figure 2.3) between applications and output interfaces or rendering resources. To enforce these output abstractions, a reference policy framework has to exist through which the abstraction is applied. As a result, perhaps the biggest challenge is the specification and enforcement of these policies, particularly who will specify them and how they will be effectively enforced. On the output side, risks and dangers are more imminent because adversaries are about to actuate, or have already actuated, the malicious response or output. Thus, these access control strategies and policies are necessary for output protection. Malicious inference or capture of outputs presents the same threats as input inference. Section 2.7.2 will focus on device-level protection approaches for output interfaces and displays.
2.6 Protecting User Interactions
In contrast to currently widely adopted technologies like computers and smart phones, MR can enable entirely new and different ways of interacting with the world, with machines, and with other users. Figure 2.6a shows a screenshot from a demonstration video from Microsoft Research on their Holoportation project, which allows virtual teleportation in real time. Consequently, one of the key (yet latent) expectations of these kinds of services and functionalities is that users can have shared space experiences with assurances of security and privacy. Thus, we expand the coverage of protection to ensure protected sharing and collaborations in MR. Similar to data protection, there are a number of properties that are necessary in interaction protection, namely non-repudiation, authorization, authentication, identifiability, and policy & consent compliance.
Figure 2.6: Shared Spaces. (a) Holoportation by Microsoft Research: an example shared space service. The person sitting (left) is “holoported” to the room with the person standing (right) using MR technology. Screenshot from https://youtu.be/7d59O6cfaM0. (b) A simplified virtual shared space diagram. (c) A possible separation in the underlying physical space which creates boundaries between users and devices. (d) A collaborative space with a shared space and private spaces.
Threats during user interactions Concerns about the boundaries between physical and virtual spaces (Figure 2.1b) in MR, and about the directionality of these boundaries, have been raised [Benford et al., 1998]. The directionality can influence the balance of power, mutuality, and privacy between users in shared spaces. For example, the boundary (labelled 1) in Figure 2.6c allows User 2 to receive full information (solid arrow labelled 2) from User 1, while User 1 receives only partial information (broken arrow labelled 3) from User 2. The boundary enables an ‘imbalance of power’ which can have potential privacy and ethical effects on the users. For example, early observation work shows territoriality in collaborative tabletop workspaces [Scott et al., 2004]. Thus, the primary threat during user interactions is users themselves. Specifically, an adversarial user can potentially tamper, spoof, or repudiate malicious actions during these interactions. As a result, legitimate users may suffer denial of service and may be unaware that their personal data may have been captured and then leaked.
2.6.1 Protecting Collaborative Interactions
Most of the approaches in this subcategory ensure the privacy properties of content awareness and policy and consent compliance.
1. Enabling user-originated policies. Emmie (Environmental Management for Multi-user Information Environments) [Butz et al., 1999] allows users to specify the privacy of certain information or objects through privacy lamps and vampire mirrors [Butz et al., 1998]. The privacy lamps are virtual lamps that ‘emit’ a light cone; users can put objects within the light cone to mark them as private. Vampire mirrors, meanwhile, are used to determine the privacy of objects by showing full reflections of public objects, while private objects are either invisible or transparent. However, this measure only protects virtual or synthetic content and does not protect real-world objects. Similar user-enabled privacy has been demonstrated in RoomPlanner, using hand gestures to enforce privacy, through private spaces and outputs, on a digital tabletop [Wu and Balakrishnan, 2003]. Kinected Conference [DeVincenzi et al., 2011] allows participants to use gestures to impose a temporary private session during a video conference. In addition, they implemented synthetic focusing using Microsoft Kinect’s depth sensing capability, where other participants are blurred in order to direct focus on the participant who is speaking, and augmented graphics hovering above the users’ heads to show information such as name, shared documents, and speaking time. The augmented graphics serve as feed-through information to deliver signals that would have been available in a shared physical space but are not readily cross-conveyed between remote physical spaces.
2. Multi-user coordination policies. Early work on mediating conflicts in digital workspaces explored the use of multi-user coordination policies [Morris et al., 2006a]. For example, to increase group awareness, they employed cooperative gestures which require gesture contributions from more than one user to enforce a single command, such as clearing the entire screen when users perform the erase gesture together [Morris et al., 2006b].
3. Feed-through signalling. SecSpace [Reilly et al., 2014] explores a feed-through mechanism to allow a more natural approach to user management of privacy in a collaborative MR environment. Users in SecSpace are provided feed-through information that allows them to negotiate their privacy preferences. Figure 2.6c shows an example situation in which User n enters the shared space (labelled 4) in the same physical space as User 2, which triggers an alarm (labelled 5) or notification for User 1. The notification serves as feed-through signalling that crosses over the MR boundary. By informing participants of such information, an imbalance of power can be rebalanced through negotiations. Non-AR feed-through signalling has also been used in a non-shared space context, as in candid interactions [Ens et al., 2015], which use wearable bands that light up in different colors depending on the smart-phone activity of the user, or other wearable icons that change shape, again, depending on which application the icon is associated with. However, the pervasive nature of these feed-through mechanisms can still pose security and privacy risks; thus, these mechanisms should be regulated and properly managed. In addition, the necessary infrastructure, especially for SecSpace, to enable this pervasive feed-through system may be a barrier to wider adoption. A careful balance between the users’ privacy in a shared space and the utility of the space as a communication medium ought to be sought.
4. Private and public space interactions. Competitive gaming demands secrecy and privacy in order to make strategies while performing other tasks in a shared environment. Thus, it is a very apt use case for implementing user protection in a shared space. Private Interaction Panels (or PIPs) demonstrate a gaming console functionality where a region defined within the PIP panel serves as a private region [Szalavári et al., 1998]. TouchSpace, on the other hand, implements a larger room-scale MR game where users can switch between see-through AR and full VR [Cheok et al., 2002]. Moreover, Emmie’s privacy lamps and mirrors also act as private spaces. BragFish [Xu et al., 2008] implements an idea on privacy similar to that of the PIP with the use of a handheld AR device whose camera is used to “read” the markers associated with a certain game setting, freeing the user from the bulky HMDs of PIP and TouchSpace. The Gizmondo handheld device has also been used in another room-scale AR game [Mulloni et al., 2008]. Similarly, camera phones have been used as handheld AR devices in a tabletop marker-based setup for collaborative gaming [Henrysson et al., 2005].
Overall, the aim of these protected collaborative interactions in MR is to provide confidentiality for relevant information that users may deem sensitive in a shared context, while a few also provide non-repudiation so that an action that affects other users’ activity cannot be denied by the actor or subsequently used to identify them. Other approaches, such as cooperative gestures, also lead to ensuring availability of the shared task. Perhaps an important aspect that has arisen is the utilization of different portions of space with different functions during interactions, namely a public portion for shared objects or activities, and a private portion for user-sensitive objects or tasks. However, shared space platforms assume that users can freely share and exchange content or information through an existing interaction channel. In the next section, we focus on how to protect the sharing channel in an MR context.
2.6.2 Protecting Sharing Initialization
All the shared space systems previously discussed rely on a unified architecture to enable interactions and sharing on the same channel. However, there may be cases where sharing is necessary but no pre-existing channel exists to support it. Thus, a sharing channel needs to be initialized. The same threats of spoofing and unauthorized access known from Personal Area Networks, such as ZigBee or Bluetooth PAN, arise. Techniques similar to out-of-band channels can be used to achieve secure channel initialization. LooksGoodToMe is an authentication protocol for device-to-device sharing [Gaebel et al., 2016]. It leverages the camera(s) and wireless capabilities of AR HMDs. Specifically, it uses the combination of distance information through wireless localization and facial recognition information to cross-authenticate users by simply looking at each other to initiate sharing. HoloPair, on the other hand, avoids the use of wireless localization, which may be unavailable or inefficient on devices, and instead utilizes the exchange of visual cues between users to confirm the shared secret [Sluganovic et al., 2017]. Both use the visual channel as an out-of-band channel (a minimal sketch of this pairing pattern follows below).
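The pattern common to both can be sketched as follows, assuming the third-party Python cryptography package: the devices derive a shared secret over the wireless channel and then render a short code for the users to compare visually, defeating man-in-the-middle attacks. The six-digit code format is an illustrative choice and not the exact HoloPair protocol.

    # Out-of-band pairing sketch: key exchange plus a visually comparable
    # confirmation code rendered on both HMDs.
    import hashlib

    from cryptography.hazmat.primitives.asymmetric.x25519 import (
        X25519PrivateKey,
    )

    alice = X25519PrivateKey.generate()
    bob = X25519PrivateKey.generate()

    # Public keys travel over the insecure wireless channel.
    alice_secret = alice.exchange(bob.public_key())
    bob_secret = bob.exchange(alice.public_key())
    assert alice_secret == bob_secret

    def visual_code(shared_secret: bytes) -> str:
        """Short digest rendered as an AR cue for the users to compare."""
        digest = hashlib.sha256(shared_secret).digest()
        return f"{int.from_bytes(digest[:4], 'big') % 10**6:06d}"

    # Both HMDs display the same code only if no attacker sat in the middle.
    print(visual_code(alice_secret), visual_code(bob_secret))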
Remaining Challenges in Sharing and Interactions
Perhaps the most apparent challenge is the variety of use cases in which users interact or share. A recent work exposes the various user concerns, such as technological misuse and access negotiation, that can arise in a multi-user MR environment [Lebeck et al., 2018]. Depending on the context or situation, privacy and security concerns, as well as the degree of concern, can vary. For example, feed-through signalling may be necessary in classroom scenarios to inform teachers when students enter and leave the classroom; however, there will also be occasions when it could be perceived as too invasive or counter-intuitive, for example during military negotiations in the field. Thus, there is a great deal of subjectivity in determining the most effective protection mechanism during sharing or interactions. Perhaps, before everything else, we should first ask: “Who or what are we protecting?”
2.7 Device Protection
This last category focuses on the actual physical MR device and its input and output interfaces. This implicitly protects data that is used in the above four aspects by ensuring device-level protection. Authentication, authorization, and identifiability are among the most important properties for device protection.
2.7.1 Protecting Device Access
The primary threats to device access are identity spoofing and unauthorized access. All approaches described below aim to provide protection against such threats. Some approaches, particularly those using physiological or biometric information, also ensure identifiability of users in addition to authorization and authentication.
Novel Authentication Strategies Device access control ensures that authorized users are granted access while unauthorized ones are barred. Passwords still remain the most utilized method for authentication [Dickinson, 2016]. To enhance protection, multi-factor authentication (MFA) is now being adopted, which uses two or more independent methods for authentication. It usually involves the use of the traditional password method coupled with, say, a dynamic key that can be sent to the user via SMS, email, or voice call. The two-factor variant has been recommended as a security enhancement, particularly for on-line services like e-mail, cloud storage, e-commerce, banking, and social networks. Aside from passwords, PIN- and pattern-based methods are popular for mobile device authentication. A recent study [George et al., 2017] evaluated the usability and security of these established PIN- and pattern-based authentication methods in virtual interfaces and showed comparable execution times to the original non-virtual interfaces. The following sections look at other novel authentication methods that leverage existing and potential capabilities of MR devices; before that, the sketch below illustrates the conventional dynamic-key factor used in MFA.
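The following is a minimal sketch of such a time-based dynamic key (TOTP, following RFC 6238), using only the Python standard library; the base32 secret shown is a placeholder, and real deployments add rate limiting and clock-drift windows.

    # TOTP sketch: server and device share a secret; both compute the same
    # short code within a 30-second window, which serves as the second factor.
    import base64, hashlib, hmac, struct, time

    def totp(secret_b32: str, interval: int = 30, digits: int = 6) -> str:
        key = base64.b32decode(secret_b32)
        counter = struct.pack(">Q", int(time.time()) // interval)
        mac = hmac.new(key, counter, hashlib.sha1).digest()
        offset = mac[-1] & 0x0F                          # dynamic truncation
        code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10**digits).zfill(digits)

    print(totp("JBSWY3DPEHPK3PXP"))   # placeholder secret, demo only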
1. Gesture- and Active Physiological-based Authentication. Other possible gestures that have been utilised for user identification and authentication include finger and hand gestures using a 3D camera-based motion controller [Aslan et al., 2014],
a combination of head and blinking gestures triggered by visual cues [Rogers et al., 2015], head movements triggered by an auditory cue [Li et al., 2016b], and active physiological signals, such as breathing [Chauhan et al., 2017].
2. Passive Physiological-based Authentication. Passive methods include physiological or biometric signals such as the physiological-signal-based key agreement (PSKA) [Venkatasubramanian et al., 2010], which uses PPG features locked in a fuzzy vault for secure inter-sensor communications in body area networks (BANs), while SkullConduct [Schneegass et al., 2016] uses the bone conduction capability of the Google Glass for user identification and authentication. All these novel methods show promise for how latent gestures, physiological signals, and device capabilities can be leveraged for user identification and authentication.
3. Multi-modal and/or Biometric Authentication combines two or more modes in a single method. One multi-modal method combines facial, iris, and periocular information for user authentication [Raja et al., 2015]. GazeTouchPass combines gaze gestures and touch keys into a single pass-key for smart phones to counter shoulder-surfing attacks on touch-based pass keys [Khamis et al., 2016]. These types of authentication methods can readily be applied to MR devices that have gaze tracking and other near-eye sensors.
2.7.2 Protecting Physical Interfaces
As discussed in Section 2.3.2 and Section 2.5.2, MR interfaces are vulnerable to malicious inference, which leads to disclosure of input activity and/or output display information. Currently available personal AR or MR see-through HMDs project content through lenses. The displayed content on the see-through lenses can leak private information and be observed externally. Visual capture devices can be used to capture and extract information from this display leakage. External input interfaces suffer from the same inference and side-channel attacks, such as shoulder-surfing. There are optical and visual strategies that can be used to provide interface and activity confidentiality and unobservability. Figure 2.7 shows example strategies of optical blocking and visual cryptography.
1. Optical strategies have been proposed, such as the use of polarization on the outer layer (as in Figure 2.7, labelled 1), the use of narrowband illumination, or a combination of the two to maximize display transmission while minimizing leakage [Kohno et al., 2016]. Other capture protection strategies have been tested on non-MR devices, which allow objects to inherently or actively protect themselves. For example, the TaPS widgets use the optical reflective properties of a scattering foil to show content only at a certain viewing angle [Möllers and Borchers, 2011]. Active camouflaging techniques have also been used, particularly on mobile phones, which allow the screen to blend with its surroundings just like a chameleon [Pearson et al., 2017]. Both TaPS widgets and the chameleon-inspired camouflaging physically hide sensitive objects or information from visual capture. The content-hiding methods discussed in Section 2.5.2 to hide outputs are also optical strategies.

Figure 2.7: Sample interface and display protection strategies: 1) inserting a polarizer to prevent or block display leakage; and 2) visual cryptography, e.g. using secret augmentations (2.2) through decryption (2.1) of encrypted public interfaces (2.0). All elements to the left of the optical display element are considered vulnerable to external inference or capture.
2. Visual cryptography and scrambling techniques for display protection have also been discussed in Section 2.5.2. The same can also be used for protecting sensitive input interfaces. EyeDecrypt [Forte et al., 2014] uses a visual cryptography technique to protect input/output interfaces, as shown in Figure 2.7, labelled 2. The publicly viewable input interface is encrypted (Figure 2.7, step 2.0), and the secret key is kept or known by the user. Through visual decryption (step 2.1), only the user can see the actual input interface through the AR display (step 2.2). Another AR-based approach secretly scrambles keyboard keys to hide typing activity from external inference [Maiti et al., 2017]. However, these techniques greatly suffer from visual alignment issues, i.e. aligning the physical interface with the AR-rendered objects.
Remaining Challenges in Device Interface Protection
Despite the use cases of visual cryptography with AR or MR displays, the usability of this technique is still confined to specific sensitive scenarios due to the alignment requirements. Furthermore, this type of protection is only applicable to secrets that are pre-determined, specifically information or activities that are known to be sensitive, such as password or ATM PIN input. These techniques are helpful in providing security and privacy during such activities in shared or public spaces due to the secrecy provided by the near-eye displays, which can perform the decryption and visual augmentation. Evidently, this only protects the output or displayed content of external displays, not the actual content displayed through the AR or MR device. We have presented both defensive and offensive, as well as active and passive, strategies for device protection. Nonetheless, there are still numerous efforts on improving the input and output interfaces of these devices, and it is opportune to consider in parallel the security and privacy implications of these new interfaces.
2.8 Summary of Security and Privacy Approaches in MR
Table 2.2 summarizes all of the approaches that have been discussed. The approaches are compared based on which security and privacy properties they address and on which elements these properties are provided for. We use the same symbols as in Figure 2.1b for the data flow, process, storage, and entity elements. In Chapter 4, we proceed with demonstrating the privacy leakage in MR data and present a heuristic measure for spatial privacy risk, before we present and evaluate proposed protection measures in Chapter 5 and Chapter 6.
Generalizations and gaps. Unsurprisingly, the majority of the approaches target confidentiality, i.e. preventing information disclosure. Furthermore, the categorization roughly localised the targeted properties. It also revealed that some properties are rather specific to certain approaches, e.g. authentication is of course targeted by authentication approaches. Nonetheless, trends and clustering of target properties among the categories are evident. The first two major categories roughly target properties that are more privacy-leaning, while the last three categories were fairly balanced. Moreover, after confidentiality, the next most targeted properties are authorization, undetectability & unobservability, and policy & consent compliance. Consequently, it is evident that MR-targeted protection approaches, particularly
Table 2.2: Summary of MR approaches that have been discussed, and which security and privacy properties they provide to which data flow element: data flow, process, storage, and/or entity. The entity can be the data itself, the user as the originator of the data, or the adversary (say, identifiability of an adversary as a security provision). Each approach is marked as having been applied in an MR context, a proto-MR context, or a non-MR context.