
THE UNIVERSITY OF NEW SOUTH WALES

Towards a privacy-aware mixed present

Jaybie A. de Guzman (5138924) [email protected]

A thesis in fulfillment of the requirements for the degree of Doctor of Philosophy

Supervisors: Prof. Aruna Seneviratne, Dr. Kanchana Thilakarathna

School of Electrical Engineering and Telecommunications

Faculty of Engineering

December 2020

Thesis/Dissertation Sheet

Surname/Family Name: De Guzman
Given Name/s: Jaybie
Abbreviation for degree as given in the University calendar: PhD
Faculty: Engineering
School: Electrical Engineering and Telecommunications
Thesis Title: Towards a privacy-aware mixed present

Abstract (350 words maximum):

Mixed reality (MR) technology development is now gaining momentum due to advances in computer vision, sensor fusion, and realistic display technologies. However, concerns about potential security and privacy risks continue to be raised: for example, how sensitive information can be captured by sensors and accessed by untrusted third-party applications, or how the reliability of the augmented virtual outputs can be ensured. With most of the earlier research and development focused on delivering the promise of MR, these privacy and security implications are yet to be thoroughly investigated; thus, in an extensive literature review, we present an exposition of the latest security and privacy work on MR (as well as other MR-related technology) and group it into five data-centric categories. The exposition shows that most of these concerns, and their accompanying (proposed) protection approaches, were primarily focused on traditional information channels or spaces, i.e. images, video, audio, and so on, and not on actual MR-specific information channels, such as the 3D spatial data which the latest MR devices and platforms are now utilizing. Therefore, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, design a privacy-preserving mechanism for these types of data. Firstly, we demonstrate the privacy leakage from spatial data utilized in MR. Secondly, we present a heuristic or empirical measure that can signify the spatial privacy risk of a captured space. Thirdly, we propose to leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility. Lastly, we demonstrate a visual access control mechanism as a data-flow targeted measure which can be utilised in conjunction with other data-centric protection measures.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents a non-exclusive licence to archive and to make available (including to members of the public) my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known. I acknowledge that I retain all intellectual property rights which subsist in my thesis or dissertation, such as copyright and patent rights, subject to applicable law. I also retain the right to use all or part of my thesis or dissertation in future works (such as articles or books).

Signature  Date: 13-Jan-2021

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years can be made when submitting the final copies of your thesis to the UNSW Library. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed:

Date: 13-Jan-2021

INCLUSION OF PUBLICATIONS STATEMENT

UNSW is supportive of candidates publishing their research results during their candidature as detailed in the UNSW Thesis Examination Procedure.

Publications can be used in their thesis in lieu of a Chapter if:

• The candidate contributed greater than 50% of the content in the publication and is the “primary author”, i.e. the candidate was responsible primarily for the planning, execution and preparation of the work for publication

• The candidate has approval to include the publication in their thesis in lieu of a Chapter from their supervisor and Postgraduate Coordinator.

• The publication is not subject to any obligations or contractual agreements with a third party that would constrain its inclusion in the thesis

Please indicate whether this thesis contains published material or not:

This thesis contains no publications, either published or submitted for publication ☐ (if this box is checked, you may delete all the material on page 2)

Some of the work described in this thesis has been published and it has been documented in the relevant Chapters with acknowledgement ☒ (if this box is checked, you may delete all the material on page 2)

This thesis has publications (either published or submitted for publication) ☐ incorporated into it in lieu of a chapter and the details are presented below

CANDIDATE’S DECLARATION

I declare that:

• I have complied with the UNSW Thesis Examination Procedure

• where I have used a publication in lieu of a Chapter, the listed publication(s) below meet(s) the requirements to be included in the thesis.

Candidate’s Name: Jaybie A. de Guzman  Signature  Date (dd/mm/yy): 13/01/2021

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents a non-exclusive licence to archive and to make available (including to members of the public) my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known. I acknowledge that I retain all intellectual property rights which subsist in my thesis or dissertation, such as copyright and patent rights, subject to applicable law. I also retain the right to use all or part of my thesis or dissertation in future works (such as articles or books).’

‘For any substantial portions of copyright material used in this thesis, written permission for use has been obtained, or the copyright material is removed from the final public version of the thesis.’

Signed:

Date: 13-Jan-2021

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis.’

Signed:

Date: 13-Jan-2021

Abstract

Mixed reality (MR) technology development is now gaining momentum due to advances in computer vision, sensor fusion, and realistic display technologies. However, concerns about potential security and privacy risks continue to be raised: for example, how sensitive information can be captured by sensors and accessed by untrusted third-party applications, or how the reliability of the augmented virtual outputs can be ensured. With most of the earlier research and development focused on delivering the promise of MR, these privacy and security implications are yet to be thoroughly investigated; thus, in an extensive literature review, we present an exposition of the latest security and privacy work on MR (as well as other MR-related technology) and group it into five data-centric categories. The exposition shows that most of these concerns, and their accompanying (proposed) protection approaches, were primarily focused on traditional information channels or spaces, i.e. images, video, audio, and so on, and not on actual MR-specific information channels, such as the 3D spatial data which the latest MR devices and platforms are now utilizing. Therefore, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, design a privacy-preserving mechanism for these types of data. Firstly, we demonstrate the privacy leakage from spatial data utilized in MR. Secondly, we present a heuristic or empirical measure that can signify the spatial privacy risk of a captured space. Thirdly, we propose to leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility. Lastly, we demonstrate a visual access control mechanism as a data-flow targeted measure which can be utilised in conjunction with other data-centric protection measures.

Acknowledgements

I would like to express my sincerest gratitude to the University of New South Wales and its people for hosting and supporting my postgraduate journey. I would also like to express my gratitude to CSIRO’s Data61 for being my home for the most part of this journey and, most especially, to the people in Data61 – my postgraduate research cohorts, the people in the Information Security and Privacy group, and the other researchers, scientists, and staff who made my stay in Data61 a delightful and unforgettable experience. I am likewise extremely grateful to the University of the Philippines Diliman, the Engineering Research and Development for Technology program of the Philippine Government’s Department of Science and Technology, and, most importantly, the Electrical and Electronics Engineering Institute of UP Diliman for giving me the opportunity to pursue, and generously supporting, my postgraduate studies. Most of all, I would like to express my deepest gratitude to my supervisors, Aruna Seneviratne and Kanchana Thilakarathna, for their expert guidance.

I would also like to thank my dearest friends in Sydney – travel buddies, running buddies, climbing buddies, lunch buddies, dinner buddies, basking-under-the-sun buddies, coffee buddies, and beer buddies – you’ve made my whole three-and-a-half-year stay in Australia very colorful and exciting.

Lastly, I would like to thank my family – my mother Amabel, and my brothers, Kiel, Kirby, and Angelo – and my partner, R.A., for being very supportive in all my endeavors in life, may they be academic or the outdoors.

Contents

Acknowledgements i

Table of Contents ii

List of Figures v

List of Tables viii

List of Publications ix

1 Introduction 1
1.1 Motivation 3
1.2 Contributions of the research 5
1.3 List of Publications 5
1.4 Manuscript Overview 7

2 Literature Review 8
2.1 General Security and Privacy Requirements for MR 9
2.2 Categorizing the Threats and Approaches 10
2.3 Input Protection 13
2.3.1 Passive Inputs: Targeted and Non-intended Latent Data 13
2.3.2 Gestures and other Active User Inputs 17
2.4 Data Protection 19
2.4.1 Protected Data Collection and Aggregation 20
2.4.2 Protected Data Processing 20
2.4.3 Protecting Data Storage 23
2.5 Output Protection 24
2.5.1 Output Reliability and User Safety 25
2.5.2 Protecting Output Displays 26
2.6 Protecting User Interactions 27
2.6.1 Protecting Collaborative Interactions 29
2.6.2 Protecting Sharing Initialization 31
2.7 Device Protection 32
2.7.1 Protecting Device Access 32
2.7.2 Protecting Physical Interfaces 33
2.8 Summary of Security and Privacy Approaches in MR 35


3 Spatial Privacy Problem in Mixed Reality 41
3.1 Spatial Privacy Framework 43
3.2 3D Spatial Data 43
3.3 Adversary Model 44
3.4 Privacy Metrics 46
3.5 Mixed Reality Functionality 47

4 Spatial Privacy Leakage from 3D Mixed Reality Data 49
4.1 3D Data from Mixed Reality Devices 50
4.2 Spatial Inference Attack 53
4.2.1 Adversarial Inference 53
4.2.2 3D Recognition Methods 53
4.2.3 Inference using 3D Descriptors: NN-matcher 55
4.2.4 Inference using DNN: pointnetvlad 59
4.3 Information Reduction Methods 60
4.4 Evaluation Setup 61
4.5 Spatial Inference Success 63
4.5.1 Validating Inference Success over Partial Releases 63
4.5.2 Spatial Privacy through Surface-to-Plane Generalization 65
4.5.3 Utility in terms of QoS 66
4.6 Detecting Spatial Inference Risk 67
4.6.1 The Geometric Shape Functions 67
4.6.2 Computing Spatial Complexity 71
4.6.3 Spatial Complexity vs Inference Success 71
4.6.4 Local Complexity vs Inference Success 73

5 Conservative Plane Releasing for Spatial Privacy Protection 79
5.1 Conservative Plane Releasing 80
5.2 Extending the Attack Scenario 80
5.3 Inference Success with Successive Releasing 81
5.4 Spatial Privacy with Conservative Releasing 83
5.4.1 Utility with Conservative Releasing 86
5.4.2 Utility vs Privacy 88
5.4.3 Protection Properties of Conservative Releasing 89

6 SafeMR: Object-level Abstraction 90
6.1 Visual Processing and Threat Model 91
6.2 SafeMR: Object-level Abstraction 92
6.2.1 System Architecture 92
6.2.2 System Properties and Functionalities 93
6.2.3 Implementation 93
6.3 Performance Evaluation 94
6.3.1 Validating the Vision 95
6.3.2 System Evaluation Setup 95
6.3.3 Evaluation Metrics 97
6.4 SafeMR Performance 98

6.4.1 Detection Utility & Secrecy 98
6.4.2 Execution Time Performance 100
6.4.3 Energy Consumption 101
6.5 Provided Utility by SafeMR 101
6.5.1 Resource Sharing Benefit 102
6.6 Protection Properties of SafeMR 102

7 Conclusions 104

Bibliography 108

A Definitions of the General Security and Privacy Properties 124

B Preliminary Work on 3D Description and Inference 127
B.1 Preliminary 3D Privacy Problem 127
B.2 Describing the 3D Space 128
B.2.1 Self-similarity-based 3D Descriptors 129
B.2.2 Spin Image 3D Descriptors 130
B.3 Inferring the 3D Space 131
B.3.1 Bayesian Inference Model using the Point Cloud 131
B.3.2 Inference using the Rotation-invariant Descriptors 131
B.3.3 Validating the Inference Models 132
B.4 Memory Compactness of Descriptors and Inference Models 134

C Plane Generalization 136

List of Figures

1.1 A visualised artistic imagining of a “hyper-realistic” world. Screenshot from Keiichi Matsuda’s YouTube video: https://youtu.be/YJg02ivYzSs 1
1.2 Mixed Reality pipeline: (Top) the immersive experience as viewed by the user; and (Bottom) the main processing pipeline of (1) detection, (2) transformation, and (3) rendering. 2
1.3 Overhead view (bottom) of the HoloLens-captured 3D point cloud of an example environment; the 2D-RGB view (top-left) of a sample region; the 3D surface plot (top-right) of the 3D point cloud of the sample region. 3

2.1 (a) Mixed reality environment and (b) the data flow diagram 11
2.2 A data-centric categorization of the various security and privacy work or approaches on mixed reality and related technologies. 12
2.3 A generic block diagram that inserts an intermediary protection layer between the applications and device resources. (Data flows to and from third-party applications are now limited or less-privileged as represented by the broken arrows.) 13
2.4 Example strategies for input protection: 1) information reduction or partial sanitization, e.g. from RGB facial information to facial outline only; 2) complete sanitization or blocking; or 3) skeletal information instead of raw hand video capture. (The broken arrow indicates less privileged information flow.) 14
2.5 Generic block diagrams of two example data protection approaches: 1) cryptographic technique using secure multi-party computation where two or more parties exchange secrets (1.1 and 1.3) to extract combined knowledge (1.2 and 1.4) without the need for divulging or decrypting each other’s data share; and 2) personal data stores with “trusted” applets. 21
2.6 Shared Spaces 28
2.7 Sample interface and display protection strategies: 1) inserting a polarizer to prevent or block display leakage; and 2) visual cryptography, e.g. using secret augmentations (2.2) through decryption (2.1) of encrypted public interfaces (2.0). All elements to the left of the optical display element are considered vulnerable to external inference or capture. 34


3.1 MR pipeline (center) shows an MR function G that transforms the detected spatial map S_i to the rendered output Y; an adversarial pipeline (bottom) is shown in parallel, with an attacker J having access to (1) historically collected spaces to infer information I (i.e. a hypothesis H) about the (2) current user space S; while an intermediary privacy protection mechanism M (top) is inserted that transforms the raw MR data S_i to a privacy-preserving version S̃_i. 42
3.2 An oriented point with position vector p̂ = {x, y, z} and normal vector n̂ = {n_x, n_y, n_z}. A group of these oriented points constitutes a 3D point cloud. Mesh [triangle] information can also be provided to indicate how these points are put together to form surfaces. 44

4.1 HoloLens-captured 3D point clouds of the 7 collected environments (left); a 3D surface of a sample space (bottom-right), and its 2D-RGB view (top-right). 51
4.2 Visualized example spatial data captured by HoloLens and ARCore 52
4.3 Adversarial inference: Step 1 involves the building of the reference database from historical maps, and the training of pointnetvlad’s deep neural network; Step 2 is inference, where the reference database is queried to match an unknown point cloud S to get a hypothesis H about S’s identity or label I. 54
4.4 Example (a) partial releases with (b) generalization 60
4.5 Overall inference success in terms of F1 score 63
4.6 Heatmap of inference performance per-space in terms of F1 score, with annotated values at r = {1.0, 2.0, 3.0} 64
4.7 One-time partially released RANSAC-generalized spaces vs varying radii: (top) inter-space and (bottom) intra-space privacy 65
4.8 QoS Q vs varying radius r 67
4.9 Distribution of the per-space similarity measures – d, v, and ξ – of our gathered spaces from HoloLens (Holo-Raw and Holo-Gen) and ARCore. (For the distributions of d and v, we plot the moving average with width 3 for a smoother histogram.) 68
4.10 Heatmap of per-space spatial complexity in terms of our chosen metrics: d, v, and ξ 70
4.11 (a-b) The a priori distribution of the complexity value d given inference success C, i.e. P(d | C), and (c-d) the a posteriori likelihood of inference success C given local d, i.e. P(C | d) 74
4.11 (Continuation) (e-f) The a priori distribution of the complexity value v given inference success C, i.e. P(v | C), and (g-h) the a posteriori likelihood of inference success C given local v, i.e. P(C | v) 75
4.11 (Continuation) (i-j) The a priori distribution of the complexity value ξ given inference success C, i.e. P(ξ | C), and (k-l) the a posteriori likelihood of inference success C given local ξ, i.e. P(C | ξ) 76
5.1 Example of conservative plane releasing 80

5.2 Successively released generalized partial spaces: (top) inter-space and (bottom) intra-space privacy 82
5.3 Average INTER-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker) 84
5.4 Average INTRA-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker) 85
5.5 Average QoS Q of conservatively released planes over successive releasing 87
5.6 Intersection map of Q ≥ 0.2 and Π1 ≥ 0.5 88
6.1 Diminishing information: a) the raw visual capture; b) the target is cropped out but still with complete visual information of the target; c) only the bounding box of the target is exposed; d) only the centroid of the target is exposed; and e) only the binary presence, whether the target is within view or not, is exposed. 91
6.2 Proposed visual processing architecture with object-level abstraction SafeMR inserted as an intermediary layer between the core APIs and the third-party applications. 92
6.3 SafeMR demo showing different privilege levels 94
6.4 Comparing the employed detection algorithms in terms of per-frame processing and number of feature matches: OpenCV-SIFT, OpenCV-ORB, and TensorFlow Object Detection API. TF-OD does not expose the number of feature matches. 95
6.5 Varying abstraction mode: without (left) and with (right) SafeMR. The privileged views show actual detected objects while the larger views show which objects (or their information) are provided to applications. 96
6.6 CDF of the detection hits and secret hits 99
6.7 Average overall frame processing time in seconds (± standard deviation). (Processed frame size is 500x500) 100
6.8 Comparing performance based on input frame size (number of tasks are indicated at the bottom of the bars) 100

7.1 Overall system diagram showing how both SafeMR and data manipulations can be integrated within an intermediary layer of protection. 104

B.1 3D coordinate systems 129
B.2 Inference performance heatmaps of the different 3D description approaches 133
B.3 Performance of the different 3D description/inference for different resolutions 134
B.4 Used memory by inference models and descriptors extracted from different point cloud resolutions. 135

List of Tables

2.1 Combined security and privacy properties and their corresponding threats 9
2.2 Summary of MR approaches that have been discussed, and which security and privacy properties they provide to which data flow element: data flow, process, storage, and/or entity. The entity can be the data itself, the user as the originator of the data, or the adversary (say, identifiability of an adversary as a security provision) 36
2.2 (Continuation) 37
2.2 (Continuation) 38
2.2 (Continuation) 39

3.1 Notation Map ...... 43

4.1 Correlation coefficient of the three metrics and overall F1 score (with varying metric parameters (neighbors or pairs) and query space size r) 72

6.1 Average Detection Hit Rate (± stdev) (Processed frame size is 500x500) 98
7.1 Our two proposed approaches, and which security and privacy properties they provide to which data flow element: data flow, process, storage, and/or entity. (As presented in Table 2.2) 105

List of Publications

[de Guzman et al., 2019d] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Security and privacy approaches in mixed reality: A literature survey.” ACM Computing Surveys (CSUR) 52.6 (2019): 110.

[de Guzman et al., 2019b] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “A First Look into Privacy Leakage in 3D Mixed Reality Data.” European Symposium on Research in Computer Security. Springer, Cham, 2019.

[de Guzman et al., 2019c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “SafeMR: Privacy-aware visual information protection for mobile mixed reality.” IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.

[de Guzman et al., 2020c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Spatial Privacy Leakage in 3D Mixed Reality Data.” Cyber Defence Next Generation Technology and Science Conference 2020 (CDNG 2020). CSIRO, 2020. (Accepted)

Chapter 1

Introduction

Figure 1.1: A visualised artistic imagining of a “hyper-realistic” world. Screenshot from Keiichi Matsuda’s YouTube video: https://youtu.be/YJg02ivYzSs

The future with mixed reality (MR) is now. In recent years, there has been an uptake in the release of MR applications, such as the gaming application Pokemon Go in 2016, as well as dedicated head-mounted display (HMD) devices such as the Microsoft HoloLens (with pre-production units also released in 2016) and the Magic Leap (with the Magic Leap One revealed in 2017). Although consensus varies on the definition of MR, and on whether it should be “merged” instead of “mixed”, we refer to MR as the combination of aspects from augmented reality (AR) and virtual reality (VR) that deliver rich services and immersive experiences, and allow interaction of real objects with synthetic virtual objects and vice versa.



Figure 1.2: Mixed Reality pipeline: (Top) the immersive experience as viewed by the user; and (Bottom) the main processing pipeline of (1) detection, (2) transformation, and (3) rendering.

By combining the synthetic presence offered by VR and the extension of the real world offered by AR, MR enables a virtually endless suite of applications that is not offered by current AR and VR platforms, devices, and applications. Figure 1.1 shows a possible MR near-future “where physical and virtual have merged, and the [environment] is saturated in media” [Matsuda, 2016]. These MR experiences are brought to reality by recent developments, primarily, in sensor fusion, computer vision (particularly in object sensing and tracking), human-computer interaction (HCI), and realistic display technologies (such as projections and holograms).

Figure 1.2 shows a generic pipeline for MR processing. These MR platforms capture the environment primarily through vision sensors such as cameras with depth sensors. The captured visual or spatial information is processed to construct a spatial mapping or digital representation of the environment. This mapping captures the structural features of the environment and, of course, the objects in the environment. This allows the machine to understand the environment and detect the information-of-interest, which can be a structural feature, a visual target, or even a user gesture. Then, the MR application or function extracts the necessary information, such as surface orientation and location, that informs where the virtual object can potentially be augmented – effectively transforming the detected information into a form that is necessary for delivering the output. Finally, the intended output is rendered onto the scene, viewed through a display, to make it seem like it inhabits the real world.
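To make these three stages concrete, the following toy Python sketch mirrors the detection → transformation → rendering flow described above. The spatial map, the surface test, and the “renderer” are invented stand-ins for illustration, and do not correspond to any actual MR SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Surface:
    centroid: Vec3  # a representative point on the surface
    normal: Vec3    # unit normal of the fitted plane

def detect(spatial_map: List[Surface]) -> List[Surface]:
    # (1) Detection: a real device builds the spatial map from depth and
    # vision sensors; here we simply filter for near-horizontal surfaces.
    return [s for s in spatial_map if s.normal[1] > 0.9]

def transform(surfaces: List[Surface]) -> Vec3:
    # (2) Transformation: decide where the virtual object should be
    # anchored, e.g. on the first horizontal surface (a desk top).
    return surfaces[0].centroid

def render(anchor: Vec3) -> None:
    # (3) Rendering: a real renderer would draw the model at `anchor` so it
    # appears to inhabit the scene; here we just report the chosen pose.
    print(f"placing virtual object at {anchor}")

# Toy spatial map: a desk top and a wall (coordinates in metres).
spatial_map = [
    Surface(centroid=(0.0, 0.8, 1.5), normal=(0.0, 1.0, 0.0)),   # desk
    Surface(centroid=(0.0, 1.5, 3.0), normal=(0.0, 0.0, -1.0)),  # wall
]
render(transform(detect(spatial_map)))  # -> placing virtual object at (0.0, 0.8, 1.5)
```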


Figure 1.3: Overhead view (bottom) of the HoloLens-captured 3D point cloud of an example environment; the 2D-RGB view (top-left) of a sample region; the 3D surface plot (top-right) of the 3D point cloud of the sample region.

Moreover, these spatial maps can be utilized in conjunction with image, video, audio, and other sensor data to create a fully-immersive MR experience. As a result, MR allows users to interact with machines and each other in a totally different manner: for example, using gestures in the air instead of swiping on screens or tapping on keys. The outputs of our interactions will also no longer be confined within a screen. Instead, outputs will now be mixed with our real-world experience, and soon we may not be able to tell what is real and what is synthetic.

1.1 Motivation

Given these capabilities, MR users face even greater risks as richer information can be gathered using a wide variety of sensors. For example, the Microsoft HoloLens has a multitude of visual sensors: four (4) cameras for environment understanding, a separate depth sensor, two (2) infrared cameras, and another camera for view capture. Figure 1.3 shows an example spatial map captured using a HoloLens. These spatial maps are usually stored as an unordered list of 3D points and may sometimes be accompanied by triangle mesh information to represent surfaces. Furthermore, these maps are arguably more lightweight than video despite containing accurate representations of user spaces.

Moreover, despite these capabilities being seemingly necessary in delivering the promise of MR, not all MR functionalities or services require extremely rich information. Privacy concerns are further exacerbated by recent advances in machine learning, such as near real-time visual object detection, which enables inference beyond the intended functionality [Oh et al., 2016]. Furthermore, once raw visual data have been made available to applications and services, users may no longer have control over how these data are further utilized [Roesner et al., 2014a]. For example, visual sensors in the MR device can subtly capture images and video without the knowledge of those around the user.1 It has been demonstrated how easy it is to use simple facial recognition to match live-captured photos with publicly available photos on-line (from on-line social networks such as Facebook) and extract personal information such as names and social security numbers [Acquisti, 2011].

Various endeavors have highlighted these risks for captured visual data and, likewise, various protection mechanisms have been proposed. However, it is not only visual data that poses risks, but also the spatial maps that provide the necessary environment understanding to MR platforms. This capability further poses unforeseen privacy risks for users. Once these captured 3D maps have been revealed to untrusted parties, potentially sensitive spatial information about the users’ spaces is disclosed. Adversaries can vary from a benign background service that delivers unsolicited advertisements based on the objects detected from the user’s surroundings to malevolent burglars who are able to map the user’s house and, perhaps, the locations and dimensions of specific objects in it based on the released 3D data. Furthermore, turning off GPS tracking for location privacy may no longer be sufficient once the user starts using MR applications that can expose their locations through the 3D and visual data that are exposed. (For example, Google unveiled their Visual Positioning Service (or VPS) – an offshoot of Project Tango which uses visual and 3D data to locate users – during their 2018 I/O keynote event.)

Therefore, we introduce spatial privacy, which pertains to protecting and ensuring the privacy of the user spatial information captured by MR devices and other related platforms that utilize spatial data. Our work primarily focuses on ensuring spatial privacy by, first, quantifying and exposing the spatial privacy leakage and, then, presenting countermeasures.

1 This violates bystander privacy – the unauthorized capture of information about other users or ‘bystanders’.
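As a concrete illustration of the representation just described – an unordered list of 3D points with per-point normals and optional triangle-mesh indices – the short snippet below builds a toy surface patch and shows how even this minimal structure already reveals the extent of a captured surface. The coordinates are invented for illustration.

```python
import numpy as np

# Toy spatial map in the representation described above: an unordered list
# of 3D points (with per-point normals) plus optional triangle-mesh indices.
points = np.array([[0.0, 0.8, 0.0],
                   [2.0, 0.8, 0.0],
                   [2.0, 0.8, 2.0],
                   [0.0, 0.8, 2.0]])          # N x 3 positions (metres)
normals = np.tile([0.0, 1.0, 0.0], (4, 1))    # N x 3 unit normals
triangles = np.array([[0, 1, 2], [0, 2, 3]])  # M x 3 indices into `points`

# Even this minimal representation leaks geometry: anyone holding the data
# can recover surface extents, e.g. the size of a desk or of a room patch.
extent = points.max(axis=0) - points.min(axis=0)
print("bounding-box extent (m):", extent)      # -> [2. 0. 2.]
```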

1.2 Contributions of the research

Given the risks we have mentioned, there is a need to investigate the potential privacy leakage in 3D data used in MR platforms and, correspondingly, design privacy-preserving mechanisms for MR data. To this end, we pose the following major contributions of this work:

1. As the primary contribution, we present a comprehensive review of the various works and literature on security and privacy in MR, and demonstrate the gaps that still need to be addressed: specifically, none of the available approaches in the literature addresses the spatial privacy issues of MR.

2. Then, we demonstrate the privacy leakage from spatial data captured and utilized in MR:

(a) specifically, we present an empirical measure that can signify the spatial privacy risk a captured space has – this leads the way towards early assessment of spatial privacy risk for user spaces; and

(b) demonstrate how the risks persist even after implementing spatial generalizations (as a rudimentary privacy preservation by information reduction).

3. Lastly, we present two MR-targeted protection measures:

(a) we leverage surface-to-plane generalizations coupled with conservative plane releasing to provide spatial privacy – as a data-centric form of protection – while maintaining data utility; and

(b) we present a visual access control mechanism in the form of object-level abstraction as a data flow-centric measure. We have demonstrated the practical feasibility of this abstraction on real devices.

1.3 List of Publications

As a result of this work, we have listed the publications in the preliminary pages and list them again below:

[de Guzman et al., 2019d] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Security and privacy approaches in mixed reality: A literature survey.” ACM Computing Surveys (CSUR) 52.6 (2019): 110.

[de Guzman et al., 2019b] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “A First Look into Privacy Leakage in 3D Mixed Reality Data.” European Symposium on Research in Computer Security. Springer, Cham, 2019.

[de Guzman et al., 2019c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “SafeMR: Privacy-aware visual information protection for mobile mixed reality.” IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.

[de Guzman et al., 2020c] Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Spatial Privacy Leakage in 3D Mixed Reality Data.” Cyber Defence Next Generation Technology and Science Conference 2020 (CDNG 2020). CSIRO, 2020. (Accepted)

We also have the following works under review:

Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Analysing Spatial Inference Risk Over Mixed Reality Spatial Data.” Submitted to Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp).

Jaybie A. de Guzman, Kanchana Thilakarathna, and Aruna Seneviratne. “Conservative Plane Releasing for Spatial Privacy Protection in Mixed Reality.” To be submitted to IEEE Transactions on Mobile Computing (TMC).

We also presented the following demonstrations and/or posters:

“Demo: Privacy-aware visual information protection for mobile mixed reality.” Presented as a demonstration at the IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 2019.

“Utilizing ‘simple’ object-level abstraction for visual information access control in mixed reality”, presented as a poster at the 2018 UNSW Postgraduate Research Symposium.

1.4 Manuscript Overview

We have organized the subsequent chapters to focus on each of the contributions specified in Section 1.2.

• In Chapter 2, we present the exposition of security and privacy related work on MR and related technologies. The majority of this chapter has been published in ACM Computing Surveys [de Guzman et al., 2019d].

• Then, we formalize the spatial privacy problem in Chapter 3 which is based on the framework initially described in [de Guzman et al., 2019b] and expanded in [de Guzman et al., 2020b].

• In Chapter 4, we proceed with the demonstration of the privacy leakage (as demonstrated in [de Guzman et al., 2019b]), and the presentation of an empirical measure of spatial privacy risk (as presented in [de Guzman et al., 2020a]).

• In Chapter 5, we investigate and evaluate the viability of conservative plane releasing as a data-centric protection measure (as proposed in [de Guzman et al., 2020b]).

• And, in Chapter 6, we present a visual information control mechanism as a data flow-centric protection measure (as demonstrated in [de Guzman et al., 2019c, de Guzman et al., 2019a]).

• Lastly, we conclude this work in Chapter 7.

Chapter 2

Literature Review

Various surveys on MR have focused on identifying the challenges and, thus, the necessary research and development to realize the technology. The various early challenges – such as matching the real and virtual displays, aligning the virtual objects with the real world, and the various errors that need to be addressed such as optical distortion, misalignment, and tracking – have been discussed in [Azuma, 1997]. It was complemented by a following survey that focuses on the enabling technologies, interfacing, and visualization [Azuma et al., 2001]. A much more recent survey updated the challenges in MR systems to be performance, alignment, interaction, mobility, and visualization [Rabbi and Ullah, 2013]. A review of the various head-mounted display (HMD) technologies for consumer electronics was also presented in [Kress and Starner, 2013]. Another study looked into a specific type of AR, mobile AR, and the different technologies that enable mobility with AR [Chatzopoulos et al., 2017]. While all of these different challenges and technologies are important to enable AR, none of these studies has focused on the fundamental issues of security and privacy in AR or MR.1

A few others have pointed out non-technical issues such as ethical considerations [Heimo et al., 2014] and value-sensitive design approaches [Friedman and Kahn Jr, 2000] that highlight the need to consider data ownership, privacy, secrecy, and integrity. Another recent study has focused on the potential perceptual and sensory threats that can arise from MR outputs, such as photosensitive epilepsy and motion-induced blindness [Baldassi et al., 2018]. An earlier work emphasized three aspects of protection in AR – input, data access, and output – over varying system complexity (from single to multiple applications and, eventually, to multiple systems) [Roesner et al., 2014b]. We expand from these three aspects, and include interaction and device protection as equally important aspects, with discussions for both. Lastly, we will identify the security and privacy properties relevant to these approaches, as well as which data flow elements (i.e. data flow, process, storage, and/or data entity) are targeted, and present them in a summary table to show the distribution of protection strategies among the properties.

1 This chapter uses material from our published survey on security and privacy approaches in MR [de Guzman et al., 2019d].


Table 2.1: Combined security and privacy properties and their corresponding threats

Property                            Threat                     | SDL 2006 | PriS 2008 | LINDDUN 2011
Security-oriented:
Integrity                           Tampering                  |    X     |           |
Non-repudiation                     Repudiation                |    X     |           |
Availability                        Denial of Service          |    X     |           |
Authorization                       Elevation of Privilege     |    X     |     X     |
Authentication                      Spoofing                   |    X     |     X     |
Identification                      Anonymity                  |          |     X     |
Confidentiality                     Disclosure of Information  |    X     |     X     |      X
Privacy-oriented:
Anonymity & Pseudonymity            Identifiability            |          |     X     |      X
Unlinkability                       Linkability                |          |     X     |      X
Unobservability & Undetectability   Detectability              |          |     X     |      X
Plausible Deniability               Non-repudiation            |          |           |      X
Content Awareness                   Unawareness                |          |           |      X
Policy & Consent Compliance         Non-compliance             |          |           |      X

We list the different security and privacy properties in Section 2.1, and explain the categorization used to sort the different approaches in Section 2.2. For each category, we discuss the corresponding approaches and which properties they address from Sections 2.3 to 2.7. Lastly, we conclude this review in Section 2.8 with a table summarizing the approaches and highlighting the remaining gaps.

2.1 General Security and Privacy Requirements for MR

We derive general security and privacy requirements from three models, and combine them into an over-arching model against which we can qualify the different approaches (both defensive and offensive strategies) discussed in this chapter. Table 2.1 lists the thirteen combined security and privacy properties from Microsoft’s Security Development Lifecycle (or SDL) [Howard and Lipner, 2006], PriS [Kalloniatis et al., 2008], and LINDDUN [Deng et al., 2011], and their corresponding threats. The SDL has been popularly used by industry to elicit security threat scenarios and use cases. LINDDUN follows the SDL but with emphasis on privacy threats, while PriS presents another privacy framework with desirable overlaps with SDL and LINDDUN. We choose these three frameworks from the literature due to the overlapping properties they have defined, as well as the consistency of their property definitions. In the resulting combined list of properties, the first six are security-oriented while the remaining are considered privacy-oriented. The confidentiality property is the only one common to all three models and is considered both a security and a privacy property. Obviously, since SDL focuses primarily on security, all of its associated properties target security. PriS has a balance of privacy- and security-targeted properties. On the other hand, LINDDUN’s properties are privacy-targeted and are categorized into hard privacy (from confidentiality down to plausible deniability) and soft privacy (content awareness, and policy and consent compliance). See Appendix A for a discussion of each property.

Interestingly, some security properties are conversely considered privacy threats: for example, non-repudiation is the “threat” to plausible deniability. This highlights the differences in priority that an organization, user, or stakeholder can put into these properties or requirements. Nonetheless, these properties are not necessarily mutually exclusive and can be desired at the same time. Specifically, the target element to be protected provides an additional dimension on how these properties can be applied together. Namely, these properties can be applied to the following elements: data entities, data flow, process, and data storage, as shown in Figure 2.1. For every approach discussed from Sections 2.3 to 2.7, we will identify which properties they are trying to address. Moreover, there are other ‘soft’ properties (i.e. reliability and safety) that we will use liberally in the discussions.

2.2 Categorizing the Threats and Approaches

Figure 2.1 presents an example of an MR environment and shows how data flows from the observable environment (Figure 2.1a) and through the MR device to process inputs and deliver experiences. The left half of Figure 2.1a shows the ‘view’ of the mixed reality device, which, in this example, is a see-through MR head-mounted device (HMD), i.e. an MR eye-wear. Within the view are the physical objects which are “seen” by the MR device, as indicated by the solid arrows. The synthetic augmentations, represented by the broken arrows, are also shown in the diagram. A cloud- or web-based support service is also shown, through which multiple MR users can collaborate or share MR experiences, say, through a social network which supports MR such as Snapchat and Pokemon Go.

Figure 2.1b shows the data flow diagram, which follows the MR processing pipeline of detection → transformation → rendering. Physical entities (e.g. desk, cup, or keyboard) from the environment are captured or detected. After detection, the resulting entities will be transformed or processed to deliver services accordingly. Depending on the service or application, different transformations are used. Finally, the results of the transformation are delivered to the user by rendering them (such as the virtual pet bird or the cup-contents indicator) through the device’s output interfaces.

Figure 2.1: (a) A mixed reality environment (left) with the supporting data services (right), with example points of protection labelled: (1) contents of the display monitor, (2) access to stored data, (3) virtual display for content, e.g. information about the contents of a smart mug, (4) collaborating with other users, and (5) device access to the mixed reality eye-wear. (b) A data flow diagram that follows the generic mixed reality processing pipeline of detection → transformation → rendering, and shows the data flow elements (entities that are used as inputs and/or outputs for each processing step) as well as the storage.

Figure 2.2: A data-centric categorization of the various security and privacy work or approaches on mixed reality and related technologies. (Categories: input protection – intrinsic protection, environment protection, user input protection; data protection – data aggregation, data access, data processing, data storage; output protection – protected rendering, reliable outputs, protected displays; interaction protection – protected sharing, protected collaboration; device protection – authentication, physical protection.)

Figure 2.2 shows the five categories (and their subcategories) to which we have distributed the various related work. The first three categories are directly mapped to the risks associated with the main steps of the processing pipeline – protecting how applications, during the transformation stage, access real-world input data gathered during detection, which may be sensitive, and generate reliable outputs during rendering. Differently, the interaction protection and device protection approaches cannot be mapped along the pipeline unlike the other three, as the intended target elements of these two categories transcend the pipeline. Representative points of the five aspects of protection are labelled in Figure 2.1a. The presented categorization does not exclusively delineate the five aspects, and it is important to note that the approaches can fall under more than one category or subcategory.


Figure 2.3: A generic block diagram that inserts an intermediary protection layer between the applications and the device resources. (Data flows to and from third-party applications are now limited or less privileged, as represented by the broken arrows.)

2.3 Input Protection

This category focuses on the challenges in ensuring the security and privacy of data that is gathered and input to the MR platform, which can contain sensitive information. For example, in Figure 2.1a, the MR eye-wear can capture the sensitive information on the user’s desktop screen (labelled 1), such as e-mails, chat logs, and so on. This is user-sensitive information that needs to be protected. Moreover, the device can also capture information that may not be sensitive to the user but may be sensitive to other entities, such as bystanders; this concern is called bystander privacy. Aside from readily sensitive objects, the device may capture other objects in the environment that are seemingly benign (or subtle) and were not intended to be shared, but can be used by adversaries to infer knowledge about the users or bystanders. We collectively call these inputs (i.e. objects, visual targets, and structural features) passive, while those that are intentionally actuated and provided by users (e.g. gestures) are active inputs.

2.3.1 Passive Inputs: Targeted and Non-intended Latent Data

Aside from information disclosure (i.e. against confidentiality), the two other main threats to sensitive, personally-identifiable information during data capture are detectability and user content unawareness. Both stem from the fact that these MR systems collect a large amount of information, among which are necessary and sensitive information alike. As more of these services become personalized, the sensitivity of this information increases. These threats are very evident with visual data: as MR requires the detection of targets, i.e. objects or contexts, in the real environment, other non-necessary and latent but potentially sensitive information is captured as well.


Figure 2.4: Example strategies for input protection: 1) information reduction or partial sanitization, e.g. from RGB facial information to facial outline only; 2) complete sanitization or blocking; or 3) skeletal information instead of raw hand video capture. (The broken arrow indicates less privileged information flow.)

Passive Input Protection Approaches. The most common input protection approaches usually involve the removal of latent and sensitive information from the input data stream. These approaches are generally called input sanitization techniques (see samples labelled 1 and 2 in Figure 2.4). They are usually implemented as an intermediary layer between the sensor interfaces and the applications, as shown in Figure 2.3. In general, this protection layer acts as an input access control mechanism in addition to performing sanitization. These techniques can further be categorized according to policy enforcement – whether intrinsic or extrinsic policies for protection are used.

1. Intrinsic input sanitization policies are usually user-defined: the user, device, or system itself imposes the protection policies that dictate the input sanitization that is applied. For example, the Darkly system [Jana et al., 2013b] for perceptual applications uses OpenCV in its intermediary input protection layer to implement multi-level feature sanitization. The basis for the level or degree of sanitization is the set of user-defined policies. The users can impose different degrees of sensitivity permissions, which affects the amount of detail or features that can be provided to the applications, i.e. stricter policies mean fewer features are provided. For example, facial information can vary from showing facial feature contours (of eyes, nose, brows, mouth, and so on) to just the head contour, depending on the user’s preferences. The user can actively control the level of information that is provided to the applications. Thus, aside from providing undetectability & unobservability, and content awareness to users, Darkly also provides a form of authorization through information access control, specifically a least privilege access control. (A minimal sketch of such policy-driven sanitization appears after this list.)

Context-based Sanitization. A context-based intrinsic sanitization framework [Zarepour et al., 2016] improves on the non-contextual policies of Darkly. It determines if there are sensitive objects in the captured images, like faces or car registration plates, and automatically implements sanitization. Sensitive features are sanitized by blurring them out, while images of sensitive locations (e.g. bathrooms) are deleted entirely. Similarly, PlaceAvoider [Templeman et al., 2014] also categorizes images as sensitive or not, depending on the features extracted from the image, but deletion is not automatic and still depends on the user. Despite the context-based nature of the sanitization, the policy that governs how to interpret the extracted contexts is still user-defined; thus, we consider both sanitization techniques intrinsic. However, intrinsic policy enforcement can be considered self-policing, which can potentially have a myopic view of the privacy preferences of other users and objects. Furthermore, intrinsic policies can only protect the inputs that are explicitly identified in the policies.

Video Sanitization. The previously discussed sanitization techniques were for generic capturing devices, mostly sanitize images, and perform the sanitization after the image is stored. For MR platforms that require real-time video feeds, there is a need for live and on-the-fly sanitization of data. A privacy-sensitive visual monitoring system [Szczuko, 2014] was implemented by removing persons from a video surveillance feed and rendering 3D animated humanoids in place of the detected and visually-removed persons. Another privacy-aware live video analytic system, called OpenFace-RTFace [Wang et al., 2017], focused on performing fast video sanitization by combining it with face recognition. The OpenFace-RTFace system lies near the edge of the network, or on cloudlets. Similar approaches to edge or cloud-assisted information sanitization can potentially be utilized for MR.

2. Extrinsic input sanitization arises from the need to protect sensitive objects, external to the user, that are not considered by the intrinsic policies; thus, policies, e.g. privacy preferences, are received from the environment. An early implementation [Truong et al., 2005] involved outright capture interference to prevent sensitive objects from being captured by unauthorized visual capturing devices. A camera-projector set-up is used: the camera detects unauthorized visual capture devices, and the projector beams a directed light source to “blind” the unauthorized device. This technique can be generalized as a form of physical access control or, specifically, a deterrent to physical or visual access. However, this implementation requires a dedicated set-up for every sensitive space or object, and the light beams can be disruptive to regular operation.

Other approaches involve the use of existing channels or infrastructure for endorsing or communicating policies to capture devices, and ensure that enforcement is less disruptive. The goal was to implement a fine-grained permission layer to “automatically” grant or deny access to continuous sensing or capture of any real-world object. A simple implementation on a privacy-aware see-through system [Hayashi et al., 2010] allowed other detected users to be blurred out or sanitized and shown as human icons only if the viewer is not their friend. However, this requires that users have access to the shared database and explicitly identify friends. Furthermore, enabling virtually anyone or, in this case, anything to specify policies opens new risks such as tampering and malicious policies.

To address authenticity issues in this so-called world-driven access control, policies can be transmitted as digital certificates [Roesner et al., 2014c] using a public key infrastructure (PKI). PKI provides cryptographic protection to media access and sanitization policy transmission. However, the use of a shared database requires that all possible users’ or sensitive objects’ privacy preferences be pushed to this shared database. Furthermore, it excludes or, unintentionally, leaves out users or objects that are not part of the database, which defeats the purpose of a world-driven protection. (A minimal sketch of this certificate-based endorsement appears after this list.)

I-pic [Aditya et al., 2016] removes the involvement of shared databases. Instead, users endorse privacy choices via a peer-to-peer approach using Bluetooth Low Energy (BLE) devices. However, I-pic is only a capture-or-no system. PrivacyCamera [Li et al., 2016a] is another peer-to-peer approach but is not limited to BLE. Also, it performs face blurring, instead of just capture-or-no, using endorsed GPS information to determine if sensitive users are within camera view. On the other hand, Cardea [Shu et al., 2016] allows users to use hand gestures to endorse privacy choices: users can show their palms to signal protection, or a peace sign to signal no need for protection. These three approaches are targeted at bystander privacy protection, i.e. facial information sanitization. MarkIt [Raval et al., 2014] can provide protection to any user or object through the use of privacy markers and gestures (similar to Cardea) to endorse privacy preferences to cameras. It was integrated into Android’s camera subsystem to prevent applications from leaking private information [Raval et al., 2016] by sanitizing sensitive media. This is a step closer to automatic extrinsic input sanitization, but it requires visual markers for detecting sensitive objects. Furthermore, all these extrinsic approaches have only been targeted at visual capture applications and not at AR- or MR-specific ones.

3. Structural Abstraction. Other MR environments incorporate any surface or medium as a possible output display. For example, when a wall is used as a display surface in an MR environment, the applications that use it can potentially capture the objects or other latent and/or sensitive information on the wall during the detection process. This specific case intersects with the input category because what is compromised here is the sensitive information that can be captured while determining the possible surfaces for displaying. Applications that require such displays do not need to know what the contents of the wall are; they only need to know that there is a surface that can be used as a display. Protected output rendering protects the medium and, by extension, whatever is in the medium. Least privilege has been used in this context [Vilk et al., 2014]. For example, in a room-scale MR environment, only the skeletal information of the room and the location and orientation of the detected surfaces (or display devices) are made known to the applications that wish to display content on these surfaces [Vilk et al., 2015]; a minimal sketch of such surface abstraction follows below. Room-scale MR environments like this are usually used for multi-user collaborations.
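The surface abstraction above can be illustrated with a least-squares plane fit: the application receives only a plane's pose and extent, never the raw points. This is a minimal sketch under the assumption of a NumPy point cloud, not the actual pipeline of [Vilk et al., 2014, Vilk et al., 2015].

```python
# Hypothetical sketch: reduce a captured surface to "skeletal" plane parameters.
import numpy as np

def abstract_surface(points: np.ndarray):
    """Reduce an N x 3 point cluster to (centroid, unit normal, extents)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The best-fit plane normal is the right singular vector associated
    # with the smallest singular value of the centered points.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    extents = centered.max(axis=0) - centered.min(axis=0)
    return centroid, normal, extents  # contents *on* the surface are withheld
```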

2.3.2 Gestures and other Active User Inputs

Another essential input that needs to be protected is user gestures. We put a separate emphasis on this because gesture inputs entail a 'direct' command to the system, while the previous latent and user inputs do not necessarily invoke commands. Currently, the most widely adopted user input interfaces are tactile, such as the keyboard, mouse, and touch interfaces. However, these tactile inputs are limited by the dimension2 of the space they interact with, and some MR devices no longer have such interfaces. Also, these input interface types are prone to more physical threats, such as external observation or shoulder-surfing attacks, from which threats such as spoofing, denial of service, or tampering may arise. Furthermore, there is a necessity for new user input interfaces that allow three-dimensional inputs. Early approaches used gloves [Dorfmuller-Ulhaas and Schmalstieg, 2001, Thomas and Piekarski, 2002] that can determine hand movements, but advances in computer vision have led to tether- and glove-free 3D interactions. Gesture inference from smart watch movement has also been explored, particularly for finger-writing inference [Xu et al., 2015]. Vision-based natural user interfaces (NUI), such as the Leap Motion [Zhao and Seah, 2016] and the Microsoft Xbox Kinect, have long been integrated with MR systems to allow users to interact with virtual objects beyond two dimensions.

2 Keyboards and other input pads can be considered one-dimensional interfaces, while the mouse and touch interfaces provide two-dimensional interactions, with a limited third dimension through scroll, pan, and zoom capabilities.

This allows the use of body movements or gestures as input channels, moving away from keypads and keyboards. However, the use of visual capture to detect user gestures, or of smart watch movement to detect keyboard strokes, means that applications that require gesture inputs can inadvertently capture other sensitive inputs [Maiti et al., 2016]. Similar latent privacy risks, such as detectability and content unawareness, arise. Thus, as new ways of interacting in MR are being explored, security and privacy should also be maintained.

Protection through abstraction. Prepose [Figueiredo et al., 2016] provides secure gesture detection and recognition as an intermediary layer (as in Figure 2.3). The Prepose core only sends gesture events to the applications, which effectively removes the necessity for untrusted applications to have access to the raw input feed. Similar to Darkly, it provides least-privilege access control to applications; that is, only the necessary gesture event information is transmitted to the third-party applications and not the raw gesture feed. Work prior to Prepose implemented a similar idea by inserting a hierarchical recognizer [Jana et al., 2013a] as an intermediary input protection layer. They inserted Recognizers into the Xbox Kinect to address input sanitization as well as to provide input access control. The Recognizer policy is user-defined and thus an intrinsic approach. Similarly, the goal is to implement a least-privilege approach to application access to inputs: applications are only given the least amount of information necessary to run. For example, a dance game on the Xbox, e.g. Dance Central or Just Dance, only needs skeletal movement information (similar to the sample labelled 3 in Figure 2.4) and does not need facial information; thus, the dance games are only provided with the moving skeletal information and not the raw video feed of the user while playing. To handle multiple levels of input policies, the Recognizer implements a hierarchy of privileges in a tree structure, with the root having the highest privilege, i.e. access to RGB and depth information, and the leaves having lesser privileges, i.e. access to skeletal information; a minimal sketch of this idea follows below. Another recent work demonstrated how the visual processing and network access of a mobile AR/MR application can be siloed and the visual information abstracted to protect it from malicious MR applications [Jensen et al., 2019]. SemaDroid [Xu and Zhu, 2015], on the other hand, is a device-level protection approach. It is a privacy-aware sensor management framework that extends the current sensor management framework of Android and allows users to specify and control fine-grained permissions for applications accessing sensors. Just like the other abstraction strategies, it is implemented as an intermediary protection layer that provides users application access control or authorization over sensors and sensor data. What differs is its auditing and reporting of potential leakage, which it applies to a privacy bargain. This allows users to 'trade' their data or privacy in exchange for services from the applications. There is a significant body of work on privacy bargains and the larger area of privacy economics, and we refer the readers to Acquisti's work [Acquisti et al., 2016].
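A toy dispatcher illustrating the least-privilege Recognizer hierarchy described above; the recognizer names, privilege ordering, and frame fields are illustrative assumptions, not the actual interface of [Jana et al., 2013a].

```python
# Hypothetical sketch: applications subscribe to a recognizer and only ever
# receive that recognizer's output, never the raw RGB-D frames.
PRIVILEGE = {"rgbd": 0, "face": 1, "skeleton": 2}  # root (0) down to leaves

def dispatch(frame: dict, recognizer: str) -> dict:
    """Release only the least-privileged view an application subscribed to."""
    if recognizer == "skeleton":
        return {"joints": frame["joints"]}      # enough for a dance game
    if recognizer == "face":
        return {"face_box": frame["face_box"]}  # region only, no raw pixels
    raise PermissionError("raw RGB-D access is reserved for the trusted core")

frame = {"joints": [(0.1, 0.9, 2.0)], "face_box": (10, 10, 64, 64), "rgbd": ...}
assert "rgbd" not in dispatch(frame, "skeleton")
```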

Remaining Challenges in Input Protection

Most of the approaches discussed so far are founded on the idea of least privilege. However, this requires that the intermediary layer, for example the Recognizers, must know what types of inputs or objects the different applications will require. Prepose addresses this for future gestures but not for future objects. For example, an MR painting application may require the detection of different types of brushes, but the current recognizer does not know how to 'see' or detect the brushes. Extrinsic approaches like MarkIt try to address this by using markers to tell which objects can and cannot be seen. What arises now is the need for dynamic abstraction and/or sanitization of both pre-determined and future sensitive objects. Nevertheless, data-level techniques can be employed to further leverage abstractions as a protection measure not just for active inputs but also for passive and latent inputs.

2.4 Data Protection

Data from multiple sensors or sources are aggregated, processed, and stored, usually in cloud servers and databases. Applications then need to access the data in order to deliver output in the form of user-consumable information or services. However, almost all widely used computing platforms allow applications to collect and store data individually (as shown in the access of supporting data services labelled 2 in Figure 2.1a), and users may have no control over their data once it has been collected and stored by these applications. The majority of security and privacy risks have been raised concerning the access and use of user data by third-party agents, particularly data gathered from wearables [Felt et al., 2012], mobile devices [Lee et al., 2015], and online activity [Ren et al., 2016]. Thus, MR systems face even greater risks, as richer information can be gathered using a wide variety of sensitive sensors, e.g. visual data from which spatial mapping information can be extracted to determine spatial features such as surfaces or physical objects over which augmentations are rendered. For data protection, there is a lengthy list of properties that need to be maintained, such as integrity, availability, confidentiality, unlinkability, anonymity & pseudonymity, and plausible deniability, among others.

Generally, the aim of privacy preservation is to allow services or third-party applications to learn from user data without leaking unnecessary and/or personally identifiable information. Usually, they use privacy definitions such as k-anonymity [Samarati and Sweeney, 1998, Samarati, 2001] and differential privacy [Dwork et al., 2014, McSherry and Talwar, 2007], restated below. However, most of the measures proposed and discussed in the wider literature were implemented on generic systems and not necessarily on MR systems. Furthermore, most of the approaches we will discuss have been aimed at traditional visual media, i.e. images and video. While MR still relies heavily on visual data, extracted 3D spatial data is now primarily utilized to represent spatial understanding, which arguably poses more risks. Thus, we present an exposition of data protection approaches on the following data flow aspects: (1) data aggregation, (2) privacy-preserving data processing, and (3) protected data storage and access.
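For reference, the differential privacy guarantee invoked throughout this section can be stated compactly, following the standard definition in [Dwork et al., 2014]: a randomized mechanism M is ε-differentially private if, for all neighbouring datasets D and D′ differing in one record and all output sets S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S].
```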

2.4.1 Protected Data Collection and Aggregation

Essentially, data collection also falls under the input category, but here we focus on data after sensing and on how systems, applications, and services handle data afterwards. Protected data collection and aggregation approaches are also implemented as an intermediate layer, as in Figure 2.3. Usually, data manipulation or similar mechanisms are run on this intermediary layer to provide a privacy guarantee, e.g. differential privacy or k-anonymity, over released data. RAPPOR, or randomized response [Erlingsson et al., 2014], is an example of a differentially-private data collection and aggregation algorithm. It is primarily applied to privacy-preserving crowd-sourced information, such as that collected by Google for its Maps services. Privacy-preserving data aggregation (PDA) has also been adopted for information collection systems [He et al., 2007, He et al., 2011] with multiple data collection or sensor points, such as wireless sensor networks or body area networks. Overall, the goal of privacy-preserving data collection and aggregation is to get aggregate or statistical information without divulging individual information, thus providing anonymity, unlinkability, and plausible deniability between the aggregate information (as well as its derivative processes and further resulting information) and the data source entity, i.e. a user.
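The intuition behind RAPPOR can be conveyed with plain randomized response; the sketch below is a simplified, hypothetical illustration (RAPPOR itself operates on Bloom-filter-encoded strings with permanent and instantaneous randomization, which is not reproduced here).

```python
# Hypothetical sketch: each user reports the truth with probability p and a
# fair coin flip otherwise, so any single report is plausibly deniable.
import random

def randomized_response(truth: bool, p: float = 0.5) -> bool:
    if random.random() < p:
        return truth
    return random.random() < 0.5  # noise, independent of the true answer

def estimate_rate(reports: list, p: float = 0.5) -> float:
    """Unbiased aggregate estimate of the true 'yes' rate from noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) * 0.5) / p  # invert E[report] = p*q + (1-p)/2
```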

2.4.2 Protected Data Processing

After collection, most services have to process the data immediately to deliver outputs in real time. Thus, similar to data collection, the same privacy threats of information disclosure, linkability, detectability, and identifiability hold.


Figure 2.5: Generic block diagrams of two example data protection approaches: 1) a cryptographic technique using secure multi-party computation, where two or more parties exchange secrets (1.1 and 1.3) to extract combined knowledge (1.2 and 1.4) without the need to divulge or decrypt each other's data share; and 2) personal data stores with "trusted" applets.

During processing, third-party applications or services can directly access user data, which may contain sensitive or personal information if no protection measures are implemented. The subsequent exposition presents a collection of MR-related work, particularly on privacy-preserving and secure image and video processing.

1. Encryption-based techniques. Homomorphic encryption (HE) allows queries or computations over encrypted data. In visual data processing, this has been used for image feature extraction and matching for various uses such as image search and object detection. He-Sift [Hsu et al., 2011] performs bit-reversing and local encryption on the raw image before feature description using SIFT.3 The goal was to make dominant features, which can be used for context inference, recessive. As a result, feature extraction, description, and matching are all performed in the encrypted domain. A major drawback of near full homomorphism is the very slow computation time. SecSift [Qin et al., 2014, Qin et al., 2016] improves on the computation time of He-Sift by instead using a somewhat homomorphic encryption, i.e. order-preserving encryption. They split or distribute the SIFT feature computation tasks among a set of "independent, co-operative cloud servers to keep the outsourced computation procedures as simple as possible and avoid utilizing homomorphic encryption." Other improvements utilized big data computation techniques to expedite secure image processing, such as the combination of MapReduce and ciphertext-policy attribute-based encryption [Zhang et al., 2014], or the use of Google's Encrypted BigQuery Client for Paillier HE computations [Ziad et al., 2016]. However, these methods are algorithm-specific; that is, every algorithm that we desire to make privacy-preserving using homomorphism has to be re-engineered. A toy illustration of the underlying additive homomorphism is given below.

3 SIFT, or Scale-Invariant Feature Transform, is a popular image feature extraction and description algorithm [Lowe, 2004].

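The additive homomorphism these systems build on can be demonstrated in a few lines, assuming the python-paillier (`phe`) package is available; this is a toy illustration of computing over ciphertext, not the actual He-Sift or SecSift pipelines.

```python
# Hypothetical sketch: an untrusted server could add these ciphertexts
# without ever learning the operands; only the key holder sees the sum.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()
a = public_key.encrypt(17)
b = public_key.encrypt(25)
total = a + b  # addition performed entirely in the encrypted domain
assert private_key.decrypt(total) == 42
```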

2. Secret Sharing or Secure Multi-party Computation. Data can be split among untrusted parties under the assumption that information can only be inferred when the distributed parts are combined [Yao, 1986, Huang et al., 2011]. Secure multi-party computation (SMC), or secret sharing, allows computation over data from two or more sources without any source necessarily knowing the actual data the others hold (a minimal splitting sketch is given below). The diagram labelled 1 in Figure 2.5 shows a possible SMC setup. A privacy-preserving photo-sharing service has been designed using two-party secret sharing, "by splitting a photo into a public part, which contains most of the volume (in bytes) of the original, and a secret part which contains most of the original's information" [Ra et al., 2013]. Meanwhile, a virtual cloth try-on service used secret sharing and secure two-party computation [Sekhavat, 2017]. The anthropometric information, i.e. body measurements, of the user is split between the user's device and the server, and both shares are encrypted. The server has a database of clothing information. The server can then compute a 3D model of the user wearing a piece of clothing by combining the anthropometric information and the clothing information to generate an encrypted output, which is sent to the user device. The user device decrypts the result and combines it with the local secret to reveal the 3D model of the user "wearing" the piece of clothing.
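The splitting step itself can be as simple as additive secret sharing over a finite field, sketched below under illustrative parameters; each share alone is uniformly random, and only their combination reveals the secret (the actual splits in [Ra et al., 2013] and [Sekhavat, 2017] are scheme-specific).

```python
# Hypothetical sketch: additive secret sharing modulo a prime.
import secrets

P = 2**61 - 1  # a Mersenne prime, chosen here purely for illustration

def share(secret: int, n: int = 2) -> list:
    """Split `secret` into n shares that sum to it modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % P

assert reconstruct(share(12345, n=3)) == 12345  # no single share leaks anything
```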

3. Data Manipulation, Perturbations, and Transformations. Other MR-related privacy-preserving techniques have focused on facial de-identification, using image manipulation to achieve k-anonymity and provide identity privacy [Newton et al., 2005, Gross et al., 2006, Gross et al., 2008]. Succeeding face de-identification work has focused on balancing utility and privacy [Du et al., 2014], while more recent work has leveraged generative adversarial networks for deceiving a potentially adversarial data collector, de-identifying faces while ensuring high demographic utility of the resulting de-identified face [Brkic et al., 2017, Wu et al., 2018]. The same manipulation can be extended to the 3D spatial data utilized in MR systems. Instead of providing complete 3D spatial data, a sanitized or 'salted' virtual reconstruction of the physical space can be provided to third-party applications. For example, instead of showing the 3D capture of a table in the scene with the 3D data of all the objects on the table, a generalized horizontal platform or surface can be provided.

The potentially sensitive objects on the table are thus kept confidential. A tunable parameter provides the balance between sanitization and utility; using this tunability, notions of privacy guarantee similar to differential privacy and k-anonymity can be provided. This approach is yet to be realized, although virtual reconstruction has been used to address delayed alignment issues in AR [Waegel, 2014]. It can work well with other detection and rendering strategies of sanitization and abstraction, as well as in privacy-centred collaborative interactions (Section 2.6.1). It also opens the possibility of an active defence strategy where 'salted' reconstructions are offered as a honeypot to adversaries. 3D Data Transformation. A recent work demonstrated how original scenes can be revealed from 3D point cloud data [Pittaluga et al., 2019]. As a counter-measure, a concurrent work designed a privacy-preserving pose estimation method to counter the scene revelation [Speciale et al., 2019]: 3D "line" clouds are used instead of 3D point clouds during pose estimation to obfuscate 3D structural information (sketched below); however, this approach only addresses the pose estimation functionality and does not demonstrate viability for surface or object detection, which is necessary for a virtual object to be rendered or "anchored" onto the scene.
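The core of the line-cloud idea can be sketched geometrically: each 3D point is replaced by a line through it with a random direction, so the point's exact position along the line is no longer released. This is a minimal NumPy illustration of that lifting step, not the full pose estimation pipeline of [Speciale et al., 2019].

```python
# Hypothetical sketch: lift each 3D point to a random line through it.
import numpy as np

rng = np.random.default_rng()

def lift_to_lines(points: np.ndarray):
    """Map an N x 3 point cloud to N lines given as (origin, direction)."""
    directions = rng.normal(size=points.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Slide each origin a random amount along its line so the released
    # origins no longer coincide with the original points.
    offsets = rng.uniform(-1.0, 1.0, size=(len(points), 1))
    origins = points + offsets * directions
    return origins, directions
```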

Overall, these privacy-preserving data processing techniques aim to provide the privacy properties of unlinkability, unobservability, and plausible deniability between the process (as well as its results) and the data source. Furthermore, the encryption- and secret-sharing-based techniques additionally provide the security properties of integrity and authorization, as only the authorized parties can process the data, while ensuring confidentiality through homomorphism. All these techniques complement each other and may be used simultaneously. Thus, any or all of them can be applied to MR; it is only a matter of whether a technique is appropriate for the amount and sensitivity of the data handled in MR environments.

2.4.3 Protecting Data Storage

After collection and aggregation, applications store user data in separate databases over which users have minimal or no control. Privacy concerns have been raised about how these applications use user data beyond the expected utility to the user [Ren et al., 2016, Lee et al., 2015, Felt et al., 2012]. Aside from these privacy threats, there are inherent security threats such as tampering, unauthorized access, and spoofing. To provide security against such threats, the Advanced Encryption Standard (AES) has been specified as the industry standard.

When the trustworthiness of third-party applications and services is not ensured, protected data storage solutions, such as personal data stores (PDS), with managed application access permission control are necessary. PDSs allow users to have control over their data and over which applications have access to it. Figure 2.5 shows a generic block diagram (labelled 2) of how a PDS protects user data by running in a protected sand-boxed machine that can monitor the data provided to applications. Usually, applet versions of the applications run within the sand-box. Various PDS implementations have been proposed, such as personal data vaults (PDV) [Mun et al., 2010], OpenPDS [de Montjoye et al., 2014], and the Databox [Crabtree et al., 2016]. Other generic protection approaches focused on encrypted fragmented data storage [Ciriani et al., 2010] or decentralized storage using blockchain [Zyskind et al., 2015]. As a result, a PDS provides accountability and, subsequently, the non-repudiation security property, as applications cannot deny that they have accessed the stored data. Privacy-preserving aggregation can also be implemented within the PDS to provide the privacy properties of anonymity, unlinkability, and plausible deniability between the released aggregate data and the user as a data source. For example, OpenPDS releases private aggregates or answers through its SafeAnswers interface.
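A toy illustration of the SafeAnswers idea follows: the store evaluates a small set of vetted aggregate questions inside the user's sand-box and refuses everything else. The class and question names are illustrative assumptions, not OpenPDS's actual interface.

```python
# Hypothetical sketch: raw records never leave the personal data store;
# only vetted aggregate answers do.
class PersonalDataStore:
    SAFE_QUESTIONS = {"count", "mean"}

    def __init__(self, records: list):
        self._records = records  # kept private to the sand-box

    def safe_answer(self, question: str):
        if question not in self.SAFE_QUESTIONS:
            raise PermissionError("not a safe answer; raw access denied")
        if question == "count":
            return len(self._records)
        return sum(self._records) / len(self._records)

pds = PersonalDataStore([3, 5, 7])
assert pds.safe_answer("mean") == 5.0  # aggregates out, records stay in
```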

Remaining Challenges in Data Protection

Applications have to undergo necessary modifications in order to implement these data protection strategies. Aside from implementation complexity, additional resources may be needed, such as the memory and compute capacity inherently required when employing encryption. There are attempts to eliminate the necessity of code modification, such as GUPT [Mohan et al., 2012], which focuses on the sampling and aggregation process to ensure distribution of the differential privacy budget while eliminating the need for costly encryption. Combining these techniques with protected sensor management and data storage, to provide confidentiality through sanitization and authorized access control, is also promising.

2.5 Output Protection

After processing the data, applications send outputs to the mixed reality device to be displayed or rendered. However, an untrusted application that has access to outputs other than those needed for its functionality can potentially modify those outputs, making them unreliable. For example, in the smart information hovering over the cup in Figure 2.1a, malicious applications could modify the sugar-level information. Other adversarial output attacks include clickjacking, which deceives users into 'clicking' on sensitive elements through transparent or misleading interfaces [Roesner et al., 2014b], and physiological attacks such as inducing epileptic seizures through a visual trigger [Baldassi et al., 2018]. Furthermore, when one application's output is another application's input, multiple applications need access to the same output object. For output protection, the integrity, non-repudiation, availability, and policy compliance as well as reliability properties have to be maintained.

In general, there are three possible types of outputs in MR systems: real-world-anchored outputs, non-anchored outputs, and outputs of external displays. The first two types are both augmented outputs. The last type refers to outputs of other external displays which can be utilized by MR systems, and vice versa. Protecting these outputs is of paramount importance aside from ensuring input and data protection. Accordingly, there are three aspects to output protection: output control, protected rendering, and protecting external displays.

2.5.1 Output Reliability and User Safety

Current MR systems have loose output access control. As a result, adversaries can potentially tamper with or spoof outputs and thereby compromise user safety. Output control policies can be used as a guiding framework for how MR devices handle outputs from third-party applications. This includes the management of rendering priority, which could be in terms of synthetic object transparency, arrangement, occlusion, and other spatial attributes, to combat attacks such as clickjacking. An output access control framework [Lebeck et al., 2016] with object-level granularity has been proposed to make output handling enforcement easier. It can be implemented as an intermediary layer, as in Figure 2.3, and follows a set of output policies. In a follow-up work, they presented a design framework [Lebeck et al., 2017] for output policy specification and enforcement which combined output policies from Microsoft's HoloLens developer guidelines and from the U.S. Department of Transportation's National Highway Traffic Safety Administration (NHTSA) for user safety in automobile-installed AR.4 They designed a prototype platform called Arya that ensures policy compliance, integrity, non-repudiation, availability, and authorization; that is, correct outputs are always available, an output's originator cannot be denied, and only authorized applications can produce such outputs. Succeeding work builds on Arya's weakness in dynamic and complex environments, especially when various, differently-sourced policies are required [Ahn et al., 2018]; they determine the optimal policy enforcement assisted by fog-based servers. This approach reinforces the properties of integrity and availability in complex and dynamic environments, and further provides confidentiality by processing at the edge instead of in the cloud.

4 Two example policy descriptions: (1) "Don't obscure pedestrians or road signs" is inspired by the NHTSA; (2) "Don't allow AR objects to occlude other AR objects" is inspired by the HoloLens guidelines.
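Policies of this kind reduce, at render time, to per-object checks and attribute adjustments. The following is a minimal sketch of such object-level enforcement, assuming axis-aligned 2D bounding boxes; Arya's actual policy language and enforcement loop [Lebeck et al., 2017] are considerably richer.

```python
# Hypothetical sketch: an AR object overlapping a protected region (e.g. a
# detected road sign) is forced to be mostly transparent instead of opaque.
def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def enforce(ar_objects, protected_regions):
    for obj in ar_objects:
        if any(overlaps(obj["bbox"], region) for region in protected_regions):
            obj["alpha"] = min(obj.get("alpha", 1.0), 0.2)  # never fully occlude
    return ar_objects

scene = enforce([{"bbox": (0, 0, 4, 4)}], protected_regions=[(2, 2, 4, 4)])
assert scene[0]["alpha"] == 0.2
```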

2.5.2 Protecting Output Displays

Output displays are vulnerable to physical inference threats or visual channel exploits such as shoulder-surfing attacks. These are the same threats faced by user inputs (Section 2.3.2), especially when the input and output interfaces are on the same medium or are integrated together, as on touch screens. To provide secrecy and privacy in sensitive contexts that require output confidentiality (e.g. ATM bank transactions), MR can be leveraged; this time, MR capabilities serve as output defence strategies.

1. Content hiding. EyeGuide [Eaddy et al., 2004] used a near-eye HMD to provide a navigation service that delivers secret and private navigation information augmented on a public map display. Because the EyeGuide display is practically secret, shoulder surfing is prevented. Other approaches involve the actual hiding of content. For example, VRCodes [Woo et al., 2012] take advantage of rolling shutter to hide codes from human eyes that can still be detected by cameras at a specific frame rate. Shutter glasses can also be used to similarly hide displayed content [Yerazunis and Carbone, 2002]. A similar approach has been used to hide AR tags in video [Lin et al., 2017]. This type of technique can hide content from human attackers but remains vulnerable to machine-aided inference or capture.

2. Visual cryptography. Secret display approaches have also been used in visual cryptographic techniques such as visual secret sharing (VSS) schemes. VSS allows the 'mechanical' decryption of secrets by overlaying the visual cipher with the visual key. However, classical VSS was aimed at printed content [Chang et al., 2010] and requires strict alignment, which is difficult on AR and MR displays, particularly handhelds and HMDs. The VSS technique can be relaxed to use code-based secret sharing, e.g. barcodes, QR codes, and 2D barcodes. The ciphers are publicly viewable while the key is kept secret. An AR device can then be used to read the cipher and augment the decrypted content over it. This type of visual cryptography has been applied to both print [Simkin et al., 2014] and electronic displays [Lantz et al., 2015, Andrabi et al., 2015]. Electronic displays are, however, prone to attacks from malicious applications which have access to the display. One such possible attack is cipher

rearrangement when multiple ciphers are displayed. To prevent this on untrusted electronic displays, a visual ordinal cue [Fang and Chang, 2010] can be combined with the ciphers to give users an immediate signal if the ciphers have been rearranged.
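Digitally, the two-share principle behind VSS can be illustrated with an XOR construction: one share is random, the other is the secret masked by it, and either share alone is uniform noise. This is a simplified stand-in for classical subpixel-based VSS, where decryption happens by physically overlaying transparencies.

```python
# Hypothetical sketch: XOR-based two-share secret sharing of a binary image.
import numpy as np

rng = np.random.default_rng()

def make_shares(secret_bits: np.ndarray):
    key_share = rng.integers(0, 2, size=secret_bits.shape)  # uniform noise
    cipher_share = secret_bits ^ key_share                  # also uniform noise
    return key_share, cipher_share

secret = np.array([[0, 1, 1], [1, 0, 1]])
key, cipher = make_shares(secret)
assert np.array_equal(key ^ cipher, secret)  # "overlaying" recovers the secret
```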

These techniques can also be used to protect sensitive content on displays during input sensing. Instead of providing privacy protection through post-capture sanitization, the captured ciphers remain secure as long as the secret shares or keys are kept secure; even if the ciphers are captured during input sensing, the content stays protected. In general, these visual cryptography and content-hiding methods provide visual access control, i.e. authorization, and confidentiality in shared or public resources. More examples of this technique are discussed in Section 2.7.2 on device protection.

Remaining Challenges in Output Protection

Similar to input protection, output protection strategies can use the same abstraction approach applied as an intermediary access control layer (see Figure 2.3) between applications and output interfaces or rendering resources. To enforce these output abstractions, a reference policy framework has to exist through which the abstraction is applied. As a result, perhaps the biggest challenge is the specification and enforcement of these policies, particularly who will specify them and how they will be effectively enforced. On the output side, risks and dangers are more imminent, because adversaries are about to actuate, or have already actuated, the malicious response or output. Thus, these access control strategies and policies are necessary for output protection. Malicious inference or capture of outputs presents the same threats as input inference. Section 2.7.2 will focus on device-level protection approaches for output interfaces and displays.

2.6 Protecting User Interactions

In contrast to currently widely adopted technologies like computers and smart phones, MR can enable entirely new and different ways of interacting with the world, with machines, and with other users. Figure 2.6a shows a screenshot from a demonstration video from Microsoft Research on their Holoportation project, which allows virtual teleportation in real time. Consequently, one of the key (yet latent) expectations with these kinds of services and functionalities is that users can have shared space experiences with assurances of security and privacy. Thus, we expand the coverage of protection to ensure protected sharing and collaborations in MR. Similar to data protection, there are a number of properties necessary in interaction protection, namely non-repudiation, authorization, authentication, identifiability, and policy & consent compliance.

(a) Holoportation by Microsoft Research: an example shared space service. The person sitting (left) is “holoported” to the room with the person standing (right) using MR technology. Screenshot from https://youtu.be/7d59O6cfaM0.

(b) A simplified virtual shared space diagram. (c) A possible separation in the underlying physical space which creates boundaries between users and devices. (d) A collaborative space with a shared space and private spaces.

Figure 2.6: Shared spaces.

Threats during user interactions. Concerns about the boundaries between physical and virtual spaces (Figure 2.1b) in MR, and about the directionality of these boundaries, have been raised [Benford et al., 1998]. The directionality can influence the balance of power, mutuality, and privacy between users in shared spaces. For example, the boundary (labelled 1) in Figure 2.6c allows User 2 to receive full information (solid arrow labelled 2) from User 1, while User 1 receives only partial information (broken arrow labelled 3) from User 2. The boundary enables an 'imbalance of power' which can have potential privacy and ethical effects on the users. For example, early observation work shows territoriality in collaborative tabletop workspaces [Scott et al., 2004]. Thus, the primary threat during user interactions is the users themselves. Specifically, an adversarial user can potentially tamper, spoof, or repudiate malicious actions during these interactions. As a result, legitimate users may suffer denial of service and may be unaware that their personal data has been captured and then leaked.

2.6.1 Protecting Collaborative Interactions

Most of the approaches in this subcategory ensure the privacy properties of content awareness and policy and consent compliance.

1. Enabling user-originated policies. Emmie (Environmental Management for Multi-user Information Environments) [Butz et al., 1999] allows users to specify the privacy of certain information or objects through privacy lamps and vampire mirrors [Butz et al., 1998]. The privacy lamps are virtual lamps that 'emit' a light cone; users can put objects within the light cone to mark them as private. Vampire mirrors, meanwhile, are used to determine the privacy of objects by showing full reflections of public objects, while private objects are either invisible or transparent. However, this measure only protects virtual or synthetic content and does not provide protection to real-world objects. Similar user-enabled privacy has been demonstrated in RoomPlanner, using hand gestures to enforce privacy, through private spaces and outputs, on a digital tabletop [Wu and Balakrishnan, 2003]. Kinected Conference [DeVincenzi et al., 2011] allows participants to use gestures to impose a temporary private session during a video conference. In addition, it implements synthetic focusing, using Microsoft Kinect's depth sensing capability to blur other participants in order to direct focus on the participant who is speaking, and augmented graphics hovering above the users' heads to show information such as name, shared documents, and speaking time. The augmented graphics serve as feed-through information, delivering signals that would have been available in a shared physical space but are not readily cross-conveyed between remote physical spaces.

2. Multi-user coordination policies. Early work on mediating conflicts in digital workspaces explored the use of multi-user coordination policies [Morris et al., 2006a]. For example, to increase group awareness, they employed cooperative gestures, which require gesture contributions from more than one user to enforce a single command, such as clearing the entire screen when users perform the erase gesture together [Morris et al., 2006b].

3. Feed-through signalling. SecSpace [Reilly et al., 2014] explores a feed-through mechanism that allows a more natural approach to user management of privacy in a collaborative MR environment. Users in SecSpace are provided feed-through information that allows them to negotiate their privacy preferences. Figure 2.6c shows an example situation in which User n enters the shared space (labelled 4) in the same physical space as User 2, which triggers an alarm (labelled 5) or notification for User 1. The notification serves as feed-through signalling that crosses the MR boundary. By informing participants of such information, an imbalance of power can be rebalanced through negotiation. Non-AR feed-through signalling has also been used in non-shared-space contexts, such as candid interactions [Ens et al., 2015], which use wearable bands that light up in different colors depending on the smartphone activity of the user, or other wearable icons that change shape depending on the application the icon is associated with. However, the pervasive nature of these feed-through mechanisms can itself pose security and privacy risks; thus, these mechanisms should be regulated and properly managed. In addition, the necessary infrastructure, especially for SecSpace, to enable this pervasive feed-through system may be a barrier to wider adoption. A careful balance between the users' privacy in a shared space and the utility of the space as a communication medium ought to be sought.

4. Private and public space interactions. Competitive gaming demands secrecy and privacy in order to make strategies while performing other tasks in a shared environment. Thus, it is a very apt use case for implementing user protection in a shared space. Private Interaction Panels (or PIPs) demonstrate a gaming console functionality where a region defined within the PIP panel serves as a private region [Szalavári et al., 1998]. TouchSpace, on the other hand, implements a larger room-scale MR game where users can switch between see-through AR and full VR [Cheok et al., 2002]. Moreover, Emmie's privacy lamps and mirrors also act as private spaces. BragFish [Xu et al., 2008] implements an idea of privacy similar to that of the PIP, using a handheld AR device whose camera "reads" the markers associated with a certain game setting; it frees the user from the bulky HMDs used in PIP and TouchSpace. The Gizmondo handheld device has also been used in another room-scale AR game [Mulloni et al., 2008]. Similarly, camera phones have been used as handheld AR devices in a tabletop marker-based setup for collaborative gaming [Henrysson et al., 2005].

Overall, the aim of these protected collaborative interactions in MR is to provide confidentiality for relevant information that users may deem sensitive in a shared context, while a few also provide non-repudiation, so that an action that affects other users' activity cannot be denied by the actor and can subsequently be used to identify them. Other approaches, such as cooperative gestures, also help ensure the availability of the shared task. Perhaps the most important aspect that has arisen is the utilization of different portions of space with different functions during interactions, namely a public portion for shared objects or activities and a private portion for user-sensitive objects or tasks. However, shared space platforms assume that users can freely share and exchange content or information through an existing interaction channel. In the next section, we focus on how to protect the sharing channel in an MR context.

2.6.2 Protecting Sharing Initialization

All the shared space systems previously discussed rely on a unified architecture to enable interactions and sharing on the same channel. However, there may be cases where sharing is necessary but no pre-existing channel exists to support it; thus, a sharing channel needs to be initialized. The same threats of spoofing and unauthorized access arise as in Personal Area Networks such as ZigBee or Bluetooth PANs. Techniques similar to out-of-band channels can be used to achieve secure channel initialization. LooksGoodToMe is an authentication protocol for device-to-device sharing [Gaebel et al., 2016]. It leverages the cameras and wireless capabilities of AR HMDs. Specifically, it uses the combination of distance information from wireless localization and facial recognition information to cross-authenticate users who simply look at each other, and then initiates sharing. HoloPair, on the other hand, avoids the use of wireless localization, which may be unavailable or inefficient on some devices, and instead utilizes an exchange of visual cues between users to confirm the shared secret [Sluganovic et al., 2017]. Both use the visual channel as an out-of-band channel.
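The confirmation step of such pairing protocols can be illustrated by deriving a short human-comparable code from the shared secret and showing it on both displays; the derivation below is an illustrative assumption, not HoloPair's actual protocol [Sluganovic et al., 2017].

```python
# Hypothetical sketch: both devices derive a short code from the shared
# secret; the users visually compare the codes over the out-of-band channel.
import hashlib

def confirmation_code(shared_secret: bytes, digits: int = 6) -> str:
    digest = hashlib.sha256(b"pairing-confirm" + shared_secret).digest()
    code = int.from_bytes(digest[:4], "big") % (10 ** digits)
    return f"{code:0{digits}d}"  # e.g. rendered as an AR overlay on each HMD

# Matching codes indicate both devices hold the same secret.
assert confirmation_code(b"secret") == confirmation_code(b"secret")
```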

Remaining Challenges in Sharing and Interactions

Perhaps the most apparent challenge is the variety of use cases in which users interact or share. A recent work exposes the various user concerns, such as technological misuse and access negotiation, that can arise in a multi-user MR environment [Lebeck et al., 2018]. Depending on the context or situation, the privacy and security concerns, as well as the degree of concern, can vary. For example, feed-through signalling may be necessary in classroom scenarios to inform teachers when students enter and leave the classroom; however, there will also be occasions when it could be perceived as too invasive or counter-intuitive, for example during military negotiations in the field. Thus, there is a great deal of subjectivity in determining the most effective protection mechanism during sharing or interactions. Perhaps, before everything else, we should first ask: "Who or what are we protecting?"

2.7 Device Protection

This last category focuses on the actual physical MR device and its input and output interfaces. This implicitly protects data that is used in the above four aspects by ensuring device-level protection. Authentication, authorization, and identifiability are among the most important properties for device protection.

2.7.1 Protecting Device Access

The primary threats to device access are identity spoofing and unauthorized access. All approaches described below aim to provide protection against such threats. Some approaches, particularly those using physiological or biometric information, also ensure identifiability of users in addition to authorization and authentication.

Novel Authentication Strategies. Device access control ensures that authorized users are provided access while unauthorized ones are barred. Passwords still remain the most utilized method for authentication [Dickinson, 2016]. To enhance protection, multi-factor authentication (MFA), which uses two or more independent methods for authentication, is now being adopted. It usually involves the traditional password method coupled with, say, a dynamic key sent to the user via SMS, email, or voice call. The two-factor variant has been recommended as a security enhancement, particularly for online services like email, cloud storage, e-commerce, banking, and social networks. Aside from passwords, PIN- and pattern-based methods are popular for mobile device authentication. A recent study [George et al., 2017] evaluated the usability and security of these established PIN- and pattern-based authentication methods in virtual interfaces and showed comparable results in terms of execution time compared to the original non-virtual interfaces. The following sections look at other novel authentication methods that leverage existing and potential capabilities of MR devices.

1. Gesture- and Active Physiological-based Authentication. Gestures that have been utilised for user identification and authentication include finger and hand gestures using a 3D camera-based motion controller [Aslan et al., 2014], a combination of head and blinking gestures triggered by visual cues [Rogers et al., 2015], head movements triggered by an auditory cue [Li et al., 2016b], and active physiological signals such as breathing [Chauhan et al., 2017].

2. Passive Physiological-based Authentication. Passive methods include physiological or biometric signals, such as the physiological-signal-based key agreement (PSKA) [Venkatasubramanian et al., 2010], which uses PPG features locked in a fuzzy vault for secure inter-sensor communication in body area networks (BANs). SkullConduct [Schneegass et al., 2016], meanwhile, uses the bone conduction capability of the device for user identification and authentication. All these novel methods show promise in how latent gestures, physiological signals, and device capabilities can be leveraged for user identification and authentication.

3. Multi-modal and/or Biometric Authentication combines two or more modes in a single method. One multi-modal method combines facial, iris, and periocular information for user authentication [Raja et al., 2015]. GazeTouchPass combines gaze gestures and touch keys into a single pass-key for smart phones to counter shoulder-surfing attacks on touch-based pass keys [Khamis et al., 2016]. These types of authentication methods can readily be applied to MR devices that have gaze tracking and other near-eye sensors.

2.7.2 Protecting Physical Interfaces

As discussed in Section 2.3.2 and Section 2.5.2, MR interfaces are vulnerable to malicious inference, which leads to disclosure of input activity and/or output display information. Currently available personal AR or MR see-through HMDs project content through lenses. The displayed content on the see-through lenses can leak private information and be observed externally; visual capture devices can be used to capture and extract information from this display leakage. External input interfaces suffer from the same inference and side-channel attacks, such as shoulder-surfing. There are optical and visual strategies that can be used to provide interface and activity confidentiality and unobservability. Figure 2.7 shows example strategies of optical blocking and visual cryptography.

1. Optical strategies have been proposed, such as the use of polarization on the outer layer (as in Figure 2.7, labelled 1), the use of narrowband illumination, or a combination of the two to maximize display transmission while minimizing leakage [Kohno et al., 2016]. Other capture protection strategies, tested on non-MR devices, allow objects to inherently or actively protect themselves. For example, the TaPS widgets use the optical reflective properties of a scattering foil to show content only at a certain viewing angle [Möllers and Borchers, 2011].


Figure 2.7: Sample interface and display protection strategies: 1) inserting a polarizer to prevent or block display leakage; and 2) visual cryptography, e.g. using secret augmentations (2.2) through decryption (2.1) of encrypted public interfaces (2.0). All elements to the left of the optical display element are considered vulnerable to external inference or capture.

Active camouflaging techniques have also been used, particularly on mobile phones, allowing the screen to blend with its surroundings just like a chameleon [Pearson et al., 2017]. Both the TaPS widgets and the chameleon-inspired camouflaging physically hide sensitive objects or information from visual capture. The content-hiding methods discussed in Section 2.5.2 for hiding outputs are also optical strategies.

2. Visual cryptography and scrambling techniques for display protection have also been discussed in Section 2.5.2. The same can be used for protecting sensitive input interfaces. EyeDecrypt [Forte et al., 2014] uses a visual cryptography technique to protect input/output interfaces, as shown in Figure 2.7, labelled 2. The publicly viewable input interface is encrypted (Figure 2.7, step 2.0), and the secret key is kept or known by the user. Through visual decryption (step 2.1), only the user can see the actual input interface through the AR display (step 2.2). Another AR-based approach secretly scrambles keyboard keys to hide typing activity from external inference [Maiti et al., 2017]. However, these techniques greatly suffer from visual alignment issues, i.e. aligning the physical interface with the AR-rendered objects.
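The keyboard-scrambling idea can be conveyed with a toy permutation sketch: the keypad positions are randomly relabelled, the true labels are visible only through the AR display, and an observer who sees which position was touched learns nothing about the typed key. The layout and decoding below are illustrative assumptions, not the exact scheme of [Maiti et al., 2017].

```python
# Hypothetical sketch: a per-session scrambled keypad for AR-assisted entry.
import random

KEYS = list("0123456789")

def scrambled_layout() -> list:
    layout = KEYS[:]
    random.shuffle(layout)  # the permutation is shown only on the AR display
    return layout

def decode(layout: list, touched_position: int) -> str:
    # A shoulder-surfer observes only the touched position, which under a
    # fresh random layout is equally likely to carry any key.
    return layout[touched_position]

layout = scrambled_layout()
assert decode(layout, 3) == layout[3]
```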

Remaining Challenges in Device Interface Protection

Despite the use cases of visual cryptography with AR or MR displays, the usability of this technique is still confined to specific sensitive scenarios due to the alignment requirements. Furthermore, this type of protection is only applicable to secrets that are pre-determined, specifically, information or activities known to be sensitive, such as password or ATM PIN input. These techniques are helpful in providing security and privacy during such activities in shared or public spaces, owing to the secrecy provided by the near-eye displays which perform the decryption and visual augmentation. Evidently, this only protects the output or displayed content of external displays, and not the actual content displayed through the AR or MR device. We have presented both defensive and offensive, as well as active and passive, strategies for device protection. Nonetheless, there are still numerous efforts on improving the input and output interfaces of these devices, and it is opportune to consider in parallel the security and privacy implications of these new interfaces.

2.8 Summary of Security and Privacy Approaches in MR

Table 2.2 summarizes all of the approaches that have been discussed. The approaches are compared based on which security and privacy properties they address and on which elements these properties are provided for: data flow, process, storage, and entity, using the same symbols as in Figure 2.1b. In Chapter 4, we proceed with demonstrating the privacy leakage in MR data and present a heuristic measure for spatial privacy risk, before presenting and evaluating our proposed protection measures in Chapter 5 and Chapter 6.

Generalizations and gaps. Unsurprisingly, the majority of the approaches target confidentiality, i.e. preventing information disclosure. Furthermore, the categorization roughly localised the targeted properties. It also revealed that some properties are rather specific to certain approaches, e.g. authentication is, of course, targeted by authentication approaches. Nonetheless, trends and clustering of target properties among the categories are evident. The first two major categories roughly target properties that are more privacy-leaning, while the last three categories were fairly balanced. Moreover, after confidentiality, the next most targeted properties are authorization, undetectability & unobservability, and policy & consent compliance. Consequently, it is evident that MR-targeted protection approaches, particularly those under input protection, still primarily lack provisions for plausible deniability. It is of no surprise that data protection approaches (which are mostly generic and non-MR targeted, while a few are proto-MR or MR-related) are the ones that primarily target this property. Furthermore, the majority of the data and input protection approaches were applied over traditional media such as images and video, while currently widely utilized MR platforms capture 3D spatial data to represent the physical space. Thus, there is a huge necessity to design and evaluate data protection approaches aimed at current and upcoming types of data utilized by MR systems.

Table 2.2: Summary of the MR approaches that have been discussed and the security and privacy properties they provide (integrity, non-repudiation, availability, authorization, authentication, identification, confidentiality, anonymity, unlinkability, undetectability, deniability, awareness, and compliance), and for which data flow element: data flow, process, storage, and/or entity. The entity can be the data itself, the user as the originator of the data, or the adversary (say, identifiability of an adversary as a security provision). Each approach has been applied in either an MR context (‡), a proto-MR context (†), or a non-MR context. The approaches are grouped into input protection (Section 2.3, including our SafeMR [de Guzman et al., 2019c] from Chapter 6), data protection (Section 2.4, including our conservative plane releasing [de Guzman et al., 2020b] from Chapter 5), output protection (Section 2.5), interaction protection (Section 2.6), and device protection approaches (Section 2.7).

Summary

In this chapter, we have collected, categorized, and reviewed various security and privacy approaches in MR. We have raised various known and latent security and privacy risks associated with the functionalities of MR and gathered a comprehensive collection of security and privacy approaches in MR and related technologies. We have identified five aspects of protection, namely input, data access, output, interactivity, and device integrity, and categorized the approaches according to these five aspects. Furthermore, we identified which security and privacy properties are targeted by the approaches using a list of thirteen known properties. We then used these properties to present a high-level description of the approaches and to compare them. Among the properties, confidentiality, authorization, and undetectability were the most targeted, while there is a considerable lack of provision for the other properties. Furthermore, there is also a lack of data protection strategies targeted at 3D MR data, specifically those employed by now widely used MR platforms. Therefore, it is opportune to design, investigate, and implement security and privacy mechanisms that can be integrated with existing and upcoming MR systems while their utilization and adoption are still not widespread. In the subsequent chapters, we present our technical contributions. First, we lay down our theoretical framework in Chapter 3. Then, in Chapter 4, we demonstrate the privacy leakage in MR data and present a heuristic measure for spatial privacy risk. In Chapter 5, we present and evaluate a data-centric type of approach that leverages MR data manipulation for privacy preservation. And, in Chapter 6, we present a visual information access control mechanism as an input data flow-centric protection. We inserted these two protection measures into Table 2.2 to see how our work compares to the rest of the literature.

Chapter 3

Chapter 3

Spatial Privacy Problem in Mixed Reality

Most mobile MR development platforms (i.e. ARCore and ARKit) utilise a form of visual odometry combined with motion or inertial information to map the device's position relative to the real world, while dedicated HMDs (i.e. the HoloLens) leverage multiple cameras with depth sensors to understand the environment and create a virtual 3D map. Once a good mapping has been created, the virtual map is shared with applications to allow synthetic or augmented content to interact with the physical world, such as anchoring a virtual object on the user's desk. However, this environment understanding capability required by MR poses unforeseen privacy risks for users. Once these captured 3D maps have been revealed to untrusted parties, potentially sensitive spatial information about the user's space is disclosed. Figure 3.1 shows an overview of how an adversarial application can perform such attacks by providing the intended function while also collecting the captured spatial data. In this chapter, we formalize our theoretical framework. We define the spatial privacy problem and the adversary model. We specify the privacy metrics that reveal the spatial privacy leakage, and the utility metrics that quantify the utility of the resulting MR data after privacy-preserving transformations.1

Why 3D spatial data?

With images and video, what the “machine sees” is practically what the “user sees”, and a great deal of privacy work has been done on these data forms. Contrariwise, in MR, the experience is exported as visual data (e.g. objects augmented on the user's view) while its 3D nature, especially of the underlying data, is not exposed to the users: what the machine sees is different from (arguably, even more than) what the user sees. That is, the digital representation of the physical world, the 3D point cloud data, is not exposed to the user. This inherent perceptual difference creates a gap between machine and user perception and, perhaps, affects how users perceive the sensitivity (or lack thereof) of 3D information.

1This chapter has been adapted from the framework initially described in [de Guzman et al., 2019b] and expanded in [de Guzman et al., 2020b].



Figure 3.1: The MR pipeline (center) shows an MR function G that transforms the detected spatial map S_i into the rendered output Y; an adversarial pipeline (bottom) is shown in parallel, with an attacker J having access to (1) historically collected spaces to infer information I (i.e. a hypothesis H) about the (2) current user space S_i; an intermediary privacy preservation mechanism M (top) is inserted, which transforms the raw MR data S_i into a privacy-preserving version S̃_i.

Furthermore, current MR platforms (i.e. Windows MR, ARCore, and ARKit) directly operate on these 3D spatial maps or point clouds and, so far, no privacy preservation is applied before these data are provided to third-party applications. Aside from the spatial structural information, the mapping can also include 3D maps of objects within the space. Moreover, most users are oblivious to the various information that is included in the spatial maps captured and stored by MR platforms.

Table 3.1: Notation Map

Notation        Description
S_i             point cloud representation of the space labelled i; S can be composed of subspaces s, s ⊂ S, which are in turn composed of oriented points p (∀p ∈ s, ∀p ∈ S)
M               privacy-preserving mechanism
S̃_i             transformed point cloud released by M
G               intended functionality (i.e. an MR app or service)
Y               output of the intended functionality G
Q_{S̃;S}         difference of the transformed S̃ from S
J               adversarial inferrer
H               hypothesis of J about an unknown query space i*, producing an inter-space hypothesis i* = i and an intra-space hypothesis centroid c_S
Π1(S*; S)       inter-space privacy measure in terms of classification error
Π2(S*; S)       intra-space privacy measure in terms of distance error

3.1 Spatial Privacy Framework

As shown in Fig. 3.1 and following the notation map in Tab. 3.1, we define a space represented by a point cloud S identified by a label i, which can be segmented into a set of overlapping point cloud subspaces S = {s_1, s_2, ..., s_n}. An MR functionality G produces an output Y, from which we derive the utility or QoS function Q. An adversarial inferrer J produces a hypothesis H to reveal the spatial location of an unknown space. Lastly, a privacy-preserving mechanism M transforms S, or its subspaces s ⊂ S, to a privacy-preserving version S̃ or s̃, i.e. M : S ↦ S̃ or s ↦ s̃. In the succeeding sections, we formalize the adversary model in §3.3, the privacy metrics in §3.4, and the functionality and utility metrics in §3.5. Before we proceed, we first introduce 3D data representation.

3.2 3D Spatial Data

There are various ways that MR-capable devices capture and compute 3D spatial data. Depending on the platform, the underlying environmental mapping approach can be any of, or a combination of, the following: simultaneous localization and mapping (SLAM), visual odometry, and/or structure from motion (SfM). We direct the readers to [Saputra et al., 2018] for a comparison of these visual mapping algorithms.

Figure 3.2: An oriented point with position vector p̂ = {x, y, z} and normal vector n̂ = {n_x, n_y, n_z}. A group of these oriented points constitutes a 3D point cloud. Mesh (triangle) information can also be provided to indicate how these points are put together to form surfaces.

Regardless of the underlying mapping algorithm, the aim is to construct a 3D spatial map represented by a set of fundamental geometric structures such as volumes, planes, and points. The most common form is the oriented point shown in Figure 3.2.2 These are 3D points described by their {x, y, z}-position in space and usually accompanied by a normal vector {n_x, n_y, n_z}, which informs of the orientation of the surface to which the point belongs; thus, each oriented point can be represented by a 6-element array {x, y, z, n_x, n_y, n_z}. (When normal vectors are not readily available, they are estimated from the points themselves.) A collection of these oriented points, together with accompanying mesh information which informs of how the points are connected to form surfaces, constitutes a point cloud. The points may also be accompanied by photometric information such as RGB or light intensity extracted from associated images or videos. For this work, we will only be focusing on the use of geometric information and leverage it for 3D place recognition as a form of adversarial attack.
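To make this representation concrete, the following minimal Python sketch stores a point cloud as an N×6 array of oriented points and estimates normals via local principal component analysis (PCA) when they are not provided. The function name and the brute-force neighbor search are illustrative choices of ours, not how any MR platform actually computes normals.

    import numpy as np

    def estimate_normals(points, k=16):
        """Estimate per-point normals via PCA over the k-nearest neighbors.
        points: (N, 3) array of {x, y, z} positions.
        Returns an (N, 6) array of oriented points {x, y, z, nx, ny, nz}."""
        n = len(points)
        normals = np.zeros((n, 3))
        # Pairwise squared distances (brute force for clarity; O(N^2) memory).
        d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        for i in range(n):
            nbrs = points[np.argsort(d2[i])[:k]]     # k nearest neighbors
            cov = np.cov((nbrs - nbrs.mean(0)).T)    # 3x3 covariance of the patch
            eigvals, eigvecs = np.linalg.eigh(cov)
            normals[i] = eigvecs[:, 0]               # smallest eigenvector ~ surface normal
        normals /= np.linalg.norm(normals, axis=1, keepdims=True)
        return np.hstack([points, normals])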

3.3 Adversary Model

Current mechanisms in place to protect user data in MR platforms only employ access control approaches, such as pop-up permission requests with a binary option to allow applications to capture spatial data. Thus, once an application has accessed the spatial data, no further protection is provided. As shown in Fig. 3.1, any potentially adversarial application can access and store all captured 3D spatial maps. These adversaries may desire to infer the location of the users, or they may further infer user poses, movement in space, or even detect changes in the user's environment. Furthermore, in contrast to video and image capture, 3D data can provide a much more lightweight and near-truth representation of user spaces.

2 Without loss of generality, it is easy to show that oriented volumes and planes can be decomposed into a set of oriented points.

In this work, we will focus on the spatial inference attack, where the adversary aims to recognize the location of the user (as shown in Figure 3.1) on two levels, given historical 3D data of user spaces:

1. the general location of the user (among the known ensemble of spaces), which we call inter-space inference; and

2. the specific location of the user within the space, which we call intra-space inference.

We assume that the adversary's aim is to individually infer user spaces; thus, an attacker will develop a different reference model for every user, and it can only infer spaces that the user has historically been in. Moreover, we assume that the historically collected spatial data are tagged with a 'traditional' location identity such as a GPS location. However, during the attack, we assume that 'traditional' location sources are turned off; thus, the attacker has no access to location information other than the revealed spatial data. Therefore, we demonstrate that adversaries can make location inferences from spatial data even without access to traditional location information during the attack. Furthermore, using these attacks, we will investigate how well a spatial complexity heuristic can signify a space's risk of spatial inference.

Defining adversarial inference The attacker J produces a two-element hypothesis H for our two-level attack: the inter-space location, i.e. an unknown location i* = i, to determine which of the reference spaces the query space is in; and the intra-space location defined by a centroid c_{S*} = {x*, y*, z*} of an unknown point cloud S_{i*}. Hypothesis H is defined as follows:

$$J : S_{i^*} \rightarrow H : \big\{\, i^* = i, \ \text{and} \ c_{S^*} = \{x, y, z\} \,\big\}, \quad \text{where} \qquad (3.1)$$

$$\text{(a)} \quad i^* = i = \operatorname*{argmax}_{\forall i} L_{S^*}(i^* = i), \qquad \text{(b)} \quad c_{S^*} = \{x^*, y^*, z^*\} \mapsto c_{S_r} = \{x_r, y_r, z_r\},$$

where L_{S*}(i* = i) is a likelihood function about the unknown space S* having a label i* = i, and {x_r, y_r, z_r} is the centroid of the k = |{S_{r,b}}| best reference key point matches. Intra-space inference is the primary step in tracking or localization. Aside from knowing the general location of a user, which can have a large extent (e.g. r ≈ 100m), an attacker would also require the exact location of the user. For example, if an adversarial advertising service with various points of presence (PoP) can tell that one of their PoPs is within the user's view, then the service can ship an advertisement to the PoP nearest to the user. Conversely, if the attacker wrongly infers the intra-space location of the user, then the attack has failed. The methods employed for inference are further discussed in §4.2.

3.4 Privacy Metrics

We pose spatial privacy as an extension of location privacy; that is, spatial privacy preservation intends to prevent location inference from spatial data. Thus, we extend an error distance-based location privacy metric [Hoh and Gruteser, 2005] for our spatial privacy metric. A few works in the literature use a similar classification error-based metric for privacy [Wagner and Eckhoff, 2018, Shokri et al., 2011].

Defining the privacy metrics We define the inter-space privacy function Π1 in terms of the inter-space misclassification error rate of the inferrer J. We treat the correct classification event as a discrete random variable defined by a discrete delta function as follows

$$\delta(i^*, i) = \begin{cases} 1 & \text{if } i^* = i \\ 0 & \text{if } i^* \neq i \end{cases}$$

so that we can pose Π1 as the expectation of misclassification given an unknown point cloud S*, like so

$$\Pi_1(S^*; S) = \sum_{\forall i} \sum_{\forall i^*} \big(1 - \delta(i^*, i)\big) \, L_{S^*}(i^* = i) \qquad (3.2)$$

where i* is the hypothesis label of J about the true label i of an unknown space S*, and L_{S*}(i* = i) is the likelihood function. Likewise, we can also pose a secondary metric Π2 for intra-space privacy, which captures how accurately an adversary can estimate a user's position within a space; that is, the user, at any given time, is at a specific subspace within the larger known space. We define the mean intra-space distance error as follows:

$$\Pi_2(S^*; S) = \frac{1}{\big|\forall \{S^*, S : i^* = i\}\big|} \sum_{\forall \{S^*, S \,:\, i^* = i\}} d(c_{S^*}, c_S) \qquad (3.3)$$

where d(c_{S*}, c_S) represents the distance error between the correct intra-space location c_S = {x, y, z} and the hypothesized intra-space location c_{S*} = {x*, y*, z*} = {x_r, y_r, z_r} for a given unknown space S*. The distance formula is as follows:

$$d(c_{S^*}, c_S) = \big\| c_{S^*} - c_S \big\| = \sqrt{(x_r - x)^2 + (y_r - y)^2 + (z_r - z)^2}. \qquad (3.4)$$
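Empirically, over a set of attack trials, these two metrics reduce to a misclassification rate and a mean distance error. The following Python sketch (function names are ours) computes both from a batch of attack outcomes:

    import numpy as np

    def inter_space_privacy(true_labels, hyp_labels):
        """Empirical Pi_1: the inter-space misclassification rate (Eq. 3.2)."""
        true_labels, hyp_labels = np.asarray(true_labels), np.asarray(hyp_labels)
        return float(np.mean(true_labels != hyp_labels))

    def intra_space_privacy(true_centroids, hyp_centroids, correct_mask):
        """Empirical Pi_2: the mean distance error d(c_S*, c_S) over the
        queries that were correctly classified inter-space (Eqs. 3.3-3.4)."""
        diffs = (np.asarray(true_centroids)[correct_mask]
                 - np.asarray(hyp_centroids)[correct_mask])
        return float(np.linalg.norm(diffs, axis=1).mean())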

3.5 Mixed Reality Functionality

Perhaps the most common functionality in MR is the anchoring of virtual 3D objects onto real-world surfaces (e.g. the floor, walls, or tables). At the minimum level, a static anchor only requires an {x, y, z} point with an orientation {n_x, n_y, n_z}. A set of these oriented points forming a plane or surface can allow for dynamic virtual augmentations that move about the surface. Furthermore, a set of these planes or surfaces can constitute a space, which then allows for augmentations that move about the space.

Defining the function utility For a given functionality G, an effective privacy mechanism M aims to make the resulting outputs Y from the raw point cloud S and its privacy-preserving version S̃ similar, i.e. Y_{S̃} ≃ Y_S. Thus, we define utility as the Quality-of-Service (QoS) in terms of the output transformation error Q_{Y_{S̃};Y_S}, or the difference of the transformed output Y_{S̃} from the original output Y_S: Q_{Y_{S̃};Y_S} = |Y_{S̃} − Y_S|, which we aim to minimize. For consistently anchored augmentations, some functionalities require near-truth point cloud representations. This implies that we can directly compute Q_{Y_{S̃};Y_S} as the difference Q_{S̃;S} of the point clouds themselves instead of the outputs, like so: Q_{S̃;S} = |S̃ − S|.

Defining the utility metric We can define Q_{S̃;S} as a utility metric by specifying it as a mean transformation error (QoS) with the following components

$$Q_{\tilde{S};S} = \operatorname*{mean}_{\forall \tilde{p} \in \tilde{S},\ \forall p \in S} \Big[ \alpha \cdot \big\| p - \tilde{p} \big\| + \beta \cdot \big( 1 - \vec{n}_p \cdot \vec{n}_{\tilde{p}} \big) \Big] \qquad (3.5)$$

where the first component, ‖p − p̃‖, is the point-wise Euclidean difference of the true/raw point p from the transformed point p̃, while the second component is their point normal vector difference, computed as the difference from 1 of their normal vector cosine similarity (i.e. n⃗_p · n⃗_p̃). The coefficients α and β are contribution weights, where α, β ∈ [0, 1] and α + β = 1. We set α, β = 0.5. To compute these point-wise differences, we first have to find the nearest-neighbor pair of each point in the transformed point cloud S̃ from the raw point cloud S, i.e. the 1nn mapping p̃ ∈ S̃ ↦ p ∈ S; thus, the differences are computed between the 1nn {p̃, p}-pairs of points for every p̃ ∈ S̃ to get the mean difference over the entire transformed point cloud S̃. Moreover, we also specify an inequality constraint γ that defines the maximum permissible transformation error as follows:

$$\operatorname*{mean}_{\forall \tilde{p} \in \tilde{S},\ \forall p \in S} \Big[ \alpha \cdot \big\| p - \tilde{p} \big\| + \beta \cdot \big( 1 - \vec{n}_{\tilde{p}} \cdot \vec{n}_p \big) \Big] \leq \gamma \qquad (3.6)$$

The γ can be used as a tunable parameter depending on the exactness required by the MR function: a higher γ implies that the MR function does not require released spaces to be very exact with respect to the true space, while a small γ implies exactness. Therefore, a desired mechanism M produces an S̃ that maximizes the privacy functions Π1 and Π2 while minimizing Q. Moreover, we will freely use the notation Π1(Θ) or Π2(Θ), where we indicate a set of parameters Θ that specifies the transformation M.
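As a concrete illustration, the QoS error of Eq. 3.5 can be computed using a k-d tree for the 1nn pairing. The sketch below assumes both clouds are N×6 arrays of oriented points with unit-length, consistently oriented normals; the function name and defaults are ours:

    import numpy as np
    from scipy.spatial import cKDTree

    def qos_error(raw, transformed, alpha=0.5, beta=0.5):
        """Mean transformation error Q (Eq. 3.5) between a raw and a
        transformed point cloud, each an (N, 6) array {x, y, z, nx, ny, nz}."""
        tree = cKDTree(raw[:, :3])
        dist, idx = tree.query(transformed[:, :3], k=1)   # 1nn pairing p~ -> p
        n_raw, n_tf = raw[idx, 3:], transformed[:, 3:]
        cos_sim = np.abs((n_raw * n_tf).sum(axis=1))      # normals assumed unit-length
        return float(np.mean(alpha * dist + beta * (1.0 - cos_sim)))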

Summary

In this chapter, we have formalized the spatial privacy framework and defined it around the spatial data that is captured by MR devices. Then, we specified the adversary model, where an attacker attempts to identify the location of the spaces the user is in while using MR. We also defined a two-level privacy metric based on the inter-space (i.e. the general location of the user) and intra-space location (i.e. the specific location of the user within the inter-space location). These serve as the basis for our proposed solutions in the subsequent chapters. In the next chapter (Chapter 4), we proceed with demonstrating the privacy leakage in MR data and present a heuristic measure for spatial privacy risk, before we present and evaluate the proposed protection measures in Chapter 5 and Chapter 6.

Chapter 4

Spatial Privacy Leakage from 3D Mixed Reality Data

In this chapter, we focus on the nascent risks from captured and collected 3D data used for MR processing.1 To demonstrate the privacy leakage, we present two adversary approaches that recognize the space, i.e. inter-space inference, and also infer the user's location within the space, i.e. intra-space inference, and apply these over various 3D spatial data captured using mobile AR/VR/MR platforms: i.e. the Microsoft HoloLens, and Android with Google ARCore. To construct these attackers, we build upon existing 3D place recognizers that have been applied over 3D lidar data: an intrinsic descriptor matching-based 3D object recognizer, and a deep neural network-based 3D shape classifier. We modify them to operate at the scale at which 3D data is captured by MR platforms. We demonstrate how easy it is to extend these methods to be used as an attacker in the MR scenario whilst quantifying the privacy leakage. Then, we present a cosine-similarity-based spatial complexity metric in [0, 1] as a heuristic or empirical measure that can signify the inference risk a captured space has (i.e. the higher the metric, the higher the risk). Our experimental evaluation shows that the spatial complexity measure from the captured spatial map of a space can be utilised as an indicator of the space's inference risk. Before demonstrating the spatial privacy leakage, we will first focus on the actual 3D data that we will be utilizing and its properties.

1This chapter contains results from two works under review [de Guzman et al., 2020b,de Guzman et al., 2020a]


4.1 3D data from Mixed Reality devices

We gathered real 3D point cloud data from various environments using two MR platforms: a dedicated head-mounted Windows MR device, i.e. the Microsoft HoloLens, and a Google Pixel 2 phone with ARCore. Both of these platforms (including Apple's ARKit) utilize simultaneous localization and mapping (or SLAM) to create a 3D map of the environment. These maps are calculated from the device's position and orientation (using an inertial measurement unit or IMU) in space relative to multiple visual features in the environment using at least one camera. We used the first-generation HoloLens, which has 4 cameras and an additional depth camera.2 Similarly, ARCore achieves spatial mapping and environment understanding using the device camera and motion sensor but, as an MR platform for non-MR-dedicated devices, it has to support multiple devices and ensure that they have sufficiently sensitive motion tracking and adequate CPU power.3 As shown in Figure 4.1, our collected environments include the following spaces: a work space, a reception area, an office kitchen or pantry, an apartment, a driveway, a hallway, and a stairwell.

3D data from Microsoft HoloLens As a dedicated device with multiple visual sensors, the HoloLens produces a precise spatial mapping (relative to what ARCore can produce). Figure 4.2 shows example spatial data captured by the HoloLens. The produced point cloud (Figure 4.2.b) includes normal vectors (not shown) and triangle mesh information that informs how the points are connected to form surfaces (Figure 4.2.c). This makes it directly usable for delivering MR functionality.

3D data from ARCore On the other hand, ARCore produces less precise point clouds which are accompanied by per-point confidence values instead of normal vectors. Thus, the point cloud is not directly utilized; instead, ARCore detects Trackables, usually planes (or 2D image targets for ARCore’s AugmentedImage and CloudAnchor APIs) and 3D points, and provides their pose (position and orientation) to deliver MR functionality. Figure 4.2.e shows the convex polygon of trackable planes provided by ARCore while Figure 4.2.f has the raw point cloud overlaid; we also show a surface-to- plane generalization of HoloLens data in Figure 4.2.d for comparison with the planes captured by ARCore.

2 For more details on HoloLens 1st gen hardware, please check https://docs.microsoft.com/en-us/hololens/hololens1-hardware (Accessed May 4, 2020)
3 ARCore also supports Apple iPhones; check the list of supported devices at https://developers.google.com/ar/discover/supported-devices (Accessed May 2, 2020.)

Figure 4.1: HoloLens-captured 3D point clouds of the 7 collected environments (left); a 3D surface plot of a sample region (bottom-right), and its 2D-RGB view (top-right).

Gathered Dataset We employed both platforms to demonstrate the breadth of sensing capabilities of MR platforms. The spatial maps produced by the HoloLens have a combined approximate surface area (i.e. including the vertical surfaces and other objects whose 3D maps are captured) of 1434.7m2. The combined memory size of the raw point clouds is 39.7MB. On the other hand, for ARCore, we collect the produced trackable planes as our spatial maps instead of the point clouds, as they contain more meaningful representations of the spaces, as we can see in the examples in Figure 4.2 (as well as in Figure 4.1). Since normal vectors are not readily provided, we estimated them and oriented them towards the viewer or camera pose. The combined approximate surface area of the ARCore-collected spaces is 546.37m2, and the memory size of the combined raw trackable plane collection is only 254kB.

Augmenting Raw Dataset with Generalized Spaces Surface-to-plane generalization can be used for sanitizing information, e.g. the 3D spatial maps of small sensitive objects, that is below the generalization resolution. However, it has subsequently been shown that such an approach does not readily provide protection against spatial inference attacks [de Guzman et al., 2019c].


Figure 4.2: Visualized example spatial data captured by HoloLens and ARCore: (a) photo of the scene; (b) the 3D point cloud captured by HoloLens; (c) the 3D surface visualization following the mesh information provided by HoloLens; (d) an example surface-to-plane generalization from the HoloLens spatial data; and (e) the convex polygon of the planes (f) overlaid with the point cloud captured by ARCore.

Nonetheless, we augment our gathered HoloLens-captured spatial maps with generalized versions to sanitize or remove smaller, potentially sensitive objects which have been inadvertently captured. We use the same random sample consensus (or RANSAC) plane generalization algorithm [Fischler and Bolles, 1981] employed in [de Guzman et al., 2019b]. The generalization introduces variations from the raw dataset, as it forces points that satisfy a set of position and orientation criteria to belong to the same plane; thus, the generalized set is effectively between the raw HoloLens captures and the ARCore captures in terms of complexity. We do not generalize ARCore data since they are already planar. Ultimately, we have three main datasets: raw captures from HoloLens (Holo-Raw), generalized versions of the HoloLens dataset (Holo-Gen), and ARCore captures (ARCore) of the same set of spaces.

4.2 Spatial Inference Attack

As shown in Figure 3.1, any potentially adversarial application can access and store all captured 3D spatial maps. These adversaries may desire to, at the very least, infer the location of the users, or they may further infer user poses, movement in space, or even detect changes in user environment. Furthermore, in contrast to raw video and images, 3D spatial data can provide a lightweight and near-truth representation of the structure of user spaces.

4.2.1 Adversarial inference

In Section 3.3, we defined the adversarial inference model; here, we specify the actual attack method. As shown in Figure 4.3, inference is a two-step process: (1) the training of a reference model or creation of a dictionary using 3D description algorithms over the previously known spaces as reference; and (2) the inference of unknown spaces by testing the model, i.e. matching their 3D descriptors to the reference descriptors from step 1. We assume that the adversary has prior knowledge about the spaces which they can use as reference. Prior knowledge can be made available through historical or publicly available 3D spatial data of the user spaces, previously provided data by the users themselves or other users, or from a colluding application or service that has access to raw or higher-resolution 3D data. Furthermore, we assume that the adversary is aware of the surface-to-plane generalizations (Section 4.1) that can be applied on released point cloud data and is able to adjust its attack accordingly.

4.2.2 3D recognition methods

We also looked into the various methods in the literature that are utilized for 3D shape analysis and classification.

Feature-based 3D shape analysis

Most 3D shape analysis methods measure similarity between two or more 3D data forms, i.e. point cloud data, with the goal of 3D object classification, and most of them have been designed and developed with a specific classification or recognition task in mind (such as protein structure classification). These approaches employ description and inference by, first, computing 3D features and, then, utilizing these features for inference, say, by matching. For example, 3D shape histograms [Ankerst et al., 1999] decompose a complex 3D object into a predefined number of cylindrical and radial bins and compare these histograms as feature vectors for measuring the dissimilarity of 3D objects. The 3D shape histograms are then compared using quadratic distance functions instead of Euclidean ones.


Figure 4.3: Adversarial process: Step 1 involves the building of the reference database from historical maps (i.e. 3D descriptors for NN-matcher, 3D submaps for pointnetvlad) and the training of pointnetvlad's deep neural network; Step 2 is inference, where the reference database is queried to match an unknown point cloud S to get a hypothesis H about S's identity or label I.

On the other hand, parameterized 1D shape distributions – including the Euclidean distance of random 3D point pairs and the angle between three random 3D points – can be compared to measure 3D shape similarity, with the Euclidean distance performing particularly well in shape classification [Osada et al., 2002]. This has been extended to a 2D distribution to include the dot product of the normal vectors [Ohbuchi et al., 2005]. A similar 2D distribution-based approach has been employed to measure vector field similarity [Dinh and Xu, 2008]. However, most of these early works extract feature or similarity vectors globally, i.e. whole-of-object, and, consequently, assume that object classes are homogeneous – for example, a “car” object class includes 3D data of various car models. Thus, these 3D similarity measures have to be adapted to be usable with heterogeneous 3D data such as a “cluttered” 3D scene. The challenge then becomes the segmentation of the 3D data to identify and separate different objects within the “cluttered” scene. Other methods localized feature extraction, which results in a set of local feature vectors to describe an object. Spin images [Johnson and Hebert, 1998, Johnson and Hebert, 1999] were one of the early popular approaches to 3D description for object recognition; they only rely on first-order normal vector information to construct cylindrical “spin” descriptors similar to shape histograms. Gestalt descriptors [Bosse and Zlot, 2009, Bosse and Zlot, 2013] are similar to the spin image descriptors but have been optimized for place recognition, though they assume that points can only rotate about the zenith-pointing axis. Other improvements such as self-similarity descriptors [Huang and You, 2012] utilized curvature information for both key point selection and the computation of a spherical descriptor. Another approach took inspiration from surface thermodynamics and utilized heat-kernel-based features [Sun et al., 2009], which demonstrated better accuracy. However, these newer approaches using curvature and heat kernels are dependent on second-order computations, which require consistency in the shape structures. For a concise discussion and benchmarking of different 3D description algorithms, we direct the reader to [Bronstein et al., 2010].

Machine learning-based

On the other hand, the latest 3D recognition approaches have employed machine learning and have been more successful in 3D shape segmentation. PointNet was the first to use deep learning, i.e. a deep neural network (DNN), for 3D classification with point clouds as input [Qi et al., 2017a]. PointNet++ presented some improvements in the underlying DNN architecture [Qi et al., 2017b]. PointNetVLAD [Uy and Lee, 2018] combines PointNet with NetVLAD [Arandjelovic et al., 2016], an image-based place recognizer, to enable point clouds as input for large-scale place recognition.

Chosen Attack Methods We utilize two methods for our spatial inference attack: a descriptor nearest-neighbor matching approach we call NN-matcher (employing spin images as 3D descriptors), and a deep neural network (DNN)-based approach called pointnetvlad [Uy and Lee, 2018]. For the NN-matcher, we improve upon the attacker originally utilized in [de Guzman et al., 2019b]. For the preliminary work on choosing the 3D description method, please see Appendix B. Fig. 4.3 shows a diagram of the overall process involved in the two inference methods. It has previously been demonstrated that spatial generalizations, i.e. 3D surfaces generalized as a set of planes, are an inadequate measure for spatial privacy [de Guzman et al., 2019b]. Furthermore, generalizations can easily be replicated by an adversary, allowing it to adjust its attack accordingly. Thus, as shown in Fig. 4.3, for both methods, we augment the raw captured point cloud data with multiple generalized versions as a combined reference ensemble.

4.2.3 Inference using 3D Descriptors: NN-matcher

Step 1 for NN-matcher involves the construction of a reference set of descriptors from the available historical data augmented with generalized versions. A subset of oriented points for each space is selected; these are called key points. Then, for each key point, an intrinsic feature descriptor is computed. The key point selection and feature computation depend on the chosen algorithm. This process results in the accumulation of a set of key point-feature pairs for each reference space, like so: {k_i, f_i} = {{k_{i,0}, f_{i,0}}, {k_{i,1}, f_{i,1}}, ...} for any space S_i. Then, for the inference step, we determine the best match {k_i, f_i} among the reference set {{k_1, f_1}, {k_2, f_2}, ...} with the key point-descriptor set of the unknown query space {k_q, f_q}.

Inter-space inference. For inter-space inference, we utilize a deterministic matching-based approach using nearest-neighbor matching in the descriptor space, i.e. {f_q} ↦ {f_i}. First, we get the nearest neighbor distance ratios (NNDR) of all the candidate reference descriptors, {f_i} ∈ {{f_1}, {f_2}, ...}, to that of the query descriptors {f_q}. We compute the NNDR like so:

$$\frac{d(f_{q,1}, f_{i,1})}{d(f_{q,1}, f_{i,2})} < t,$$

where d(·) is some distance measure in the descriptor space, and descriptor f_{i,1} is the nearest neighbor of the query descriptor f_{q,1} while f_{i,2} is its second nearest neighbor. If the NNDR falls below a set threshold t, then the candidate key point-descriptor pair

{k_{i,1}, f_{i,1}} and the query key point-descriptor pair {k_{*,1}, f_{*,1}} are an acceptable matched pair. For our implementation, we use the Euclidean nearest neighbors, i.e. 2-nn, for the descriptor distance function d(·). Line 8 in Alg. 1 shows this 2nn distance computation in the descriptor space, which produces two arrays: E_{i,q} contains the indices of the 2nn descriptors from reference space i, while D_{i,q} contains the distance values. Afterwards, the NNDR of the two best matching reference descriptors for every query descriptor is computed (line 10) from the maximum-normalized distance values D_{i,q}. Originally, in [de Guzman et al., 2019b], an optional step (line 11) trims the NNDR matches and only keeps those below the set threshold t. Then, the set of matches is further trimmed (lines 12 and 13) by only accepting the matches with unique key point matches, k_q ↦ k_i; that is, every candidate key point-descriptor pair matches to a unique query key point. If there are query key points matched to the same candidate key points, we keep the pair with the lowest NNDR. The resulting key point matches for all reference spaces are kept (line 14). Finally, to determine the best matching candidate space S_i among the reference spaces {S_1, S_2, ...}, we employ a voting mechanism which was adapted from place recognition over 3D lidar data [Bosse and Zlot, 2009, Bosse and Zlot, 2013]. A weighted score ξ_i is computed (line 15) based on the product of the percentage of uniquely indexed lowest-ratio matches per candidate space, |E_i|/|f_q|, and the difference from 1 of the mean of the NNDRs, mean(D[E_i]), of the unique (key point-wise) matches. The reference space with the highest score (akin to the argmax operation of Eq. 3.1a) is the best candidate space for the query space (line 17). The key point pairs of the resulting matched space are provided to the intra-space matcher to determine the intra-space location of the user.

Algorithm 1: NN-Matcher: Inter-space inference
Data:
    {f1, k1}, {f2, k2}, ..., {fI, kI}: the ensemble set of information from the reference spaces, where {fi, ki} is the set of key point-descriptor pairs for reference space i
    {fq, kq}: the set of key point-descriptor pairs for a query space
Result: i*, the candidate reference space, and {kq ↦ k*}, the matched query and reference key points
1   J: a 3D feature matcher, i.e. a 2nn matcher
2   t: NNDR threshold = 0.9
3   E_{i,q}: a 2d-array of 2nn indices
4   D_{i,q}: a 2d-array of 2nn distances    // both E and D have shape Q × 2, where Q is the length of the query descriptors fq
5   S = {ξ_1, ξ_2, ...}: the global matching scores of the reference spaces
6   K = {{kq ↦ k1}, {kq ↦ k2}, ...}: the set-pairs of matched query key points kq with the key points ki of every reference space i
7   for {fi, ki} ← {f1, k1} to {fI, kI} do
8       E_{i,q}, D_{i,q} = J(fi, fq)
9       D_{i,q} ← max_normalize(D_{i,q})
10      D ← D_{i,q}[:, 0] / D_{i,q}[:, 1]    // an array of the same length as fq containing the NNDR for every query feature with its top-2 matches among the reference features
11      (optional strict step) E_{i,t} ⊂ E_{i,q}, where D < t    // the subset of indices with NNDR below t
12      {kq ⊂ kq, ki ⊂ ki}, where kq ↦ ki    // the unique key point matches
13      E_i ⊂ E_{i,t} of {kq, ki}    // the subset of indices corresponding to the unique key point matches
14      K ← K + {kq, ki}
15      ξ_i = (1 − mean(D[E_i])) · |E_i| / |fq|
16      S ← S + ξ_i
17  i* = argmax(S)
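For illustration, the core of lines 8-15 can be sketched in Python as follows; the descriptor extraction (e.g. spin images) is assumed to have been done elsewhere, and the helper name is ours:

    import numpy as np
    from scipy.spatial import cKDTree

    def inter_space_score(ref_desc, query_desc):
        """Score one reference space against a query (Alg. 1, lines 8-15)."""
        dist, idx = cKDTree(ref_desc).query(query_desc, k=2)  # 2nn in descriptor space
        dist = dist / dist.max()                               # max-normalize distances
        nndr = dist[:, 0] / dist[:, 1]                         # NNDR per query descriptor
        # Keep only unique reference key point matches, lowest NNDR first.
        order = np.argsort(nndr)
        seen, keep = set(), []
        for q in order:
            r = idx[q, 0]
            if r not in seen:
                seen.add(r)
                keep.append(q)
        keep = np.array(keep)
        return (1.0 - nndr[keep].mean()) * (len(keep) / len(query_desc))

The best candidate space is then the argmax over the per-space scores ξ_i, as in line 17.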

Intra-space inference We improve upon the mechanism used in [de Guzman et al., 2019b] by extending the inference to the intra-space location. Alg. 2 shows the pseudocode for the intra-space checking method. The resulting key point matches from inter-space inference are fed to the intra-space matcher to determine the intra-space location of the user. The collection of key point pairs is further trimmed (line 4) to only keep the pairs of reference and query descriptors whose nearest neighbor distance ratio (or NNDR) falls below the threshold t1.

Algorithm 2: NN-Matcher: Intra-space inference
Data:
    {kq ↦ k*}: the matched query and reference key points
    D*: the NNDR ratios of the matched key points
Result: c*, the centroid of the matched reference key points k*
1   G(·): a graph
2   t1: NNDR threshold = 0.9
3   t2: geometric similarity threshold = 0.95
4   {kq,1 ⊂ kq, k*,1 ⊂ k*}, where D* < t1
5   Gq,1 = G(kq,1)    // complete graph with the key point positions p ∈ kq,1 (∈ R³) as vertices, and edges defined by the point-to-point distance vectors v̂ ∈ R³
6   G*,1 = G(k*,1)    // the same for the reference key points; in matrix form, the dimensions of Gq,1 and G*,1 are the same, and their elements are consistently ordered following their matched key points
7   Vq ← ‖Edges(Gq,1)‖_L2, V* ← ‖Edges(G*,1)‖_L2    // the L2-norms of the distance vectors, which are the edges of the complete graphs
8   Aq ← internal_angles(Edges(Gq,1)), A* ← internal_angles(Edges(G*,1))    // the internal angles formed by the distance vectors with each other
9   Sd = exp(−0.5 · |Vq − V*|)    // the distance similarity using an exponential function with rate 0.5; Sd ∈ [0, 1], with 1 being the highest similarity
10  Sa = cosine_similarity(Aq, A*)    // the cosine similarity of the two sets of internal angles; Sa ∈ [0, 1], with 1 being the highest similarity
11  Sintra = Sd · Sa    // the combined product similarity measure
12  {kq,2 ⊂ kq,1, k*,2 ⊂ k*,1}, where Sintra ≥ t2    // only keep the key point pairs with Sintra ≥ t2
13  c* = centroid(k*,2)

Then, using their corresponding key points, as in {k_q} ↦ {k_*}, we perform a geometric structure consistency check, which produces the best pairs of key points with consistent structural relationships; that is, the graph (or sub-graph) of the reference key points is similar to the graph (or sub-graph) of the corresponding set of query key points. This can be generalized to the NP-complete sub-graph isomorphism problem, but we instead use a heuristic-based approach. For our implementation, we find the fully connected graphs of both the query (line 5) and the matched reference (line 6) key points; the key points are the vertices, while the edges are defined by the point-to-point vectors connecting the key points. We compute the distances using the L2-norm of the point-to-point distance vectors (line 7), and the internal angles (using cosine similarity) formed by the vectors (line 8). We compute a distance similarity measure Sd (line 9), and an angular similarity measure Sa (line 10). The product of the two measures forms the combined similarity metric Sintra (line 11). The vertices, i.e. key point pairs, with an acceptable Sintra ≥ 0.95 are the accepted structurally consistent intra-space key point pairs. Finally, we compute the intra-space distance d(c_*, c_q) of the true centroid c_q of the query space to the centroid of the matched reference space c_* within the query space, as described by Eq. 3.4.
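A simplified sketch of this consistency check is given below. For brevity, it compares corresponding edge vectors directly (assuming consistently oriented frames) rather than the full sets of internal angles used in Alg. 2, so it is an illustrative variant under our own naming, not the exact thesis procedure:

    import numpy as np
    from itertools import combinations

    def intra_space_centroid(query_kp, ref_kp, t2=0.95):
        """Geometric structure consistency check (cf. Alg. 2, lines 5-13).
        query_kp, ref_kp: (K, 3) arrays of matched key point positions,
        consistently ordered."""
        pairs = list(combinations(range(len(query_kp)), 2))
        vq = np.array([query_kp[j] - query_kp[i] for i, j in pairs])  # edge vectors
        vr = np.array([ref_kp[j] - ref_kp[i] for i, j in pairs])
        # Distance similarity: exponential in the edge-length differences.
        s_d = np.exp(-0.5 * np.abs(np.linalg.norm(vq, axis=1) - np.linalg.norm(vr, axis=1)))
        # Angular similarity: cosine similarity of corresponding edge directions.
        s_a = np.abs((vq * vr).sum(1)) / (np.linalg.norm(vq, axis=1) * np.linalg.norm(vr, axis=1))
        s_intra = s_d * s_a                                   # combined per-edge similarity
        # Keep vertices that participate only in sufficiently consistent edges.
        bad = {v for (i, j), s in zip(pairs, s_intra) if s < t2 for v in (i, j)}
        keep = [v for v in range(len(ref_kp)) if v not in bad]
        return ref_kp[keep].mean(axis=0) if keep else ref_kp.mean(axis=0)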

4.2.4 Inference using DNN: pointnetvlad

To produce a large-scale place recognizer with 3D point cloud as input, pointnetvlad combines the deep network 3D point cloud shape classifier PointNet [Qi et al., 2017a] with NetVLAD [Arandjelovic et al., 2016], which is a deep network image-based place recognizer. We adapted the operational scale of pointnetvlad to human-scale (sub- spaces at the order of 1m) as it was originally used over lidar-scale (sub-spaces at the order of at least 10m). We direct the reader to [Uy and Lee, 2018] for more details on pointnetvlad such as the DNN architecture.4 To train pointnetvlad, we split the reference point clouds to disjoint training and validation sets. The point cloud sets are further subdivided to similarly-sized overlap- ping submaps with a radius of 2m and are re-sampled to contain 4096 points. The submap intervals are 0.5m and covers all 3 axes (while the original pointnetvlad only covers the two ground axes). The PointNet layer produces local feature descriptors for each point in a submap, and, then, feed those to the NetVLAD layer to learn a global descriptor vector for the given submap. For inference, first, pointnetvlad creates a reference database of the global de-
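The submap generation described above can be sketched as follows; the grid construction and edge handling are simplified, and the function name is ours:

    import numpy as np

    def make_submaps(points, radius=2.0, interval=0.5, n_points=4096):
        """Split an (N, 3) point cloud into overlapping spherical submaps on a
        3D grid of centers (0.5 m apart, radius 2 m) and re-sample each to
        4096 points, as in our pointnetvlad adaptation."""
        lo, hi = points.min(0), points.max(0)
        centers = np.stack(np.meshgrid(*[np.arange(l, h, interval)
                                         for l, h in zip(lo, hi)]), -1).reshape(-1, 3)
        submaps = []
        for c in centers:
            pts = points[np.linalg.norm(points - c, axis=1) <= radius]
            if len(pts) == 0:
                continue
            # Oversample with replacement when the neighborhood is sparse.
            idx = np.random.choice(len(pts), n_points, replace=len(pts) < n_points)
            submaps.append(pts[idx])
        return submaps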

4 The original pointnetvlad code can be accessed at github.com/mikacuy/pointnetvlad.


Figure 4.4: Example (a) partial releases (sample partial spaces of a bigger space) with (b) generalization of the partial spaces.

For inference, pointnetvlad first creates a reference database of the global descriptors of the combined raw and generalized spaces available. The reference dataset contains the computed global descriptors, the inter-space labels, and the intra-space coordinates of the reference subspaces. Then, the point cloud of the query space is divided into submaps and directly fed to the pointnetvlad model, which likewise produces a global descriptor of the query point cloud. To find the two-level spatial inference hypothesis labels, the top-1 nearest neighbor of the query global descriptor from the reference dataset is taken as the inter- and intra-space hypothesis. We employ these two attack methods for spatial inference. We evaluate their performance over both HoloLens and ARCore data, and investigate the viability of a spatial complexity measure as an indicator of inference risk.

4.3 Information Reduction Methods

Directly releasing raw point clouds exposes all spatial information as well as the structural information of sensitive objects within the space. A mechanism can be inserted, as shown in Fig. 3.1, along the MR processing pipeline to provide privacy protection. We present two baseline protection measures: partial releasing, and surface-to-plane generalization.

Partial Spaces To limit the amount of information released with the point clouds, we utilized partial releasing during inference validation in Section 4.5 to provide MR applications with the least information necessary to deliver the desired functionality. With partial spaces, we only release segments of the space with varying radius. Fig. 4.4a shows an example space with 2 partial releases. Partial releasing can either be performed once or up to a predefined number of releases if more of the space is necessary for the MR application to provide its service. Then, succeeding revelations of the space are no longer provided to the MR application. Moreover, partial releasing can be applied over raw or generalized spaces.
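A one-time partial release then amounts to revealing a spherical neighborhood of the point cloud, e.g. (an illustrative sketch with our own naming):

    import numpy as np

    def partial_release(cloud, radius):
        """Reveal only the spherical neighborhood of a randomly chosen point.
        cloud: (N, 6) array of oriented points."""
        center = cloud[np.random.randint(len(cloud)), :3]
        mask = np.linalg.norm(cloud[:, :3] - center, axis=1) <= radius
        return cloud[mask]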

Surface-to-plane Generalization As discussed in Section 3.2, to deliver augmentations, MR platforms digitally map the physical space to gain an understanding of it. And, as discussed in Section 3.5, depending on the desired functionality, an MR application may require just a single oriented point as a static anchor, a single plane or surface, or a set of surfaces for dynamic augmentations. Thus, arguably, without a significant impact on the delivery of the desired MR functionality, any surface within a space can be generalized into a set of planes. Furthermore, surface-to-plane generalization inadvertently sanitizes information that is below the desired generalization resolution. For example, a keyboard on a desk surface may be generalized as part of the desk. However, spatial information, i.e. location, may still be inferred, as we will reveal later. To perform surface-to-plane generalization, we employed the popular Random Sample Consensus (or RANSAC) plane fitting method [Fischler and Bolles, 1981]. RANSAC is run multiple times over the given point cloud to find a set of points that can form a plane. First, a random oriented point is picked (i.e. with an {x, y, z}-position and an {n_x, n_y, n_z}-normal). Then, RANSAC checks whether other points form a consensus plane with the random point given a set plane-to-point distance threshold. For our implementation, we use a strict threshold of 0.05. This is performed over multiple trials to try to get the largest planes, i.e. the largest consensus of points that will form a plane. For every plane formed, the points associated with it are removed from the point cloud. The process is repeated until all the remaining points can no longer form a consensus plane or until a set number of planes have been formed. Fig. 4.4b shows an example set of planes that are released after RANSAC generalization of the revealed partial raw point cloud shown in Fig. 4.4a.
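A minimal sketch of this iterative RANSAC generalization is given below; the trial count and minimum inlier count are illustrative parameters of ours, and candidate planes are seeded from the points' own (unit-length) normals:

    import numpy as np

    def ransac_planes(cloud, threshold=0.05, max_planes=50, trials=200, min_inliers=100):
        """Iterative RANSAC surface-to-plane generalization (sketch).
        cloud: (N, 6) array of oriented points; each detected plane's inliers
        are removed and the loop repeats, as described above."""
        planes, remaining = [], cloud.copy()
        while len(planes) < max_planes and len(remaining) > min_inliers:
            best = None
            for _ in range(trials):
                seed = remaining[np.random.randint(len(remaining))]
                p0, n = seed[:3], seed[3:]
                # Plane-to-point distances for the candidate plane (p0, n).
                d = np.abs((remaining[:, :3] - p0) @ n)
                inliers = d < threshold
                if best is None or inliers.sum() > best.sum():
                    best = inliers
            if best.sum() < min_inliers:
                break                       # no consensus plane left
            planes.append(remaining[best])
            remaining = remaining[~best]
        return planes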

4.4 Evaluation Setup

We evaluate the performance of the two attack methods over both HoloLens and ARCore data, and investigate the viability of a spatial complexity measure as an indicator of inference risk.

Dataset We utilize the three datasets introduced in Section 4.1: Holo-Raw, Holo-Gen, and ARCore. To build the reference datasets, we augment the raw captured point cloud data from HoloLens with a generalized version. Specifically, we combine the captured Holo-Raw spatial maps with a generalized set from Holo-Gen as our reference HoloLens ensemble; for evaluation, we use disjoint sets from Holo-Gen. On the other hand, for ARCore, since the spatial maps are already planar, we use multiple ARCore captures of the spaces as a combined reference ensemble. Specifically, we use two sets of ARCore captures for the reference or training ensemble, and another three sets for evaluation.

Inference Success Validation For inference success validation, we use one-time partially released spaces. We select 1000 random submaps each from the Holo-Gen and ARCore testing datasets. To get a submap, we randomly choose a point as a centroid or vertex and get its spherical neighborhood of points within radius r. Then, we vary r to see whether the attackers (as shown in Step 2 in Figure 4.3) can successfully identify which space the submap is from. We assume that the attacker has access to historical data and has performed training and reference database construction offline (as shown in Step 1 in Figure 4.3). To measure performance, we use the F1 score, which is the harmonic mean of the precision and recall per space. To get an overall F1 score, we take the unweighted average of the per-space precision and recall.
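For reference, this overall score corresponds to the macro-averaged F1, e.g. (with hypothetical labels):

    from sklearn.metrics import f1_score

    # Illustrative only: space labels per query submap (hypothetical values).
    true_spaces = ["Kitchen", "Aprtmnt", "Hallway", "Kitchen", "Wrkst'n"]
    pred_spaces = ["Kitchen", "Aprtmnt", "Kitchen", "Kitchen", "Wrkst'n"]

    # Overall score: unweighted (macro) average over the per-space scores.
    print(f1_score(true_spaces, pred_spaces, average="macro"))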

Investigation Setup Next, we investigate the viability of generalization as a privacy measure, also over one-time partially released spaces from Holo-Raw and Holo-Gen. We compare the privacy leakage in terms of the two-level adversarial inference error of our attackers, as defined in Section 3.4: inter-space (Π1) and intra-space (Π2) inference. For the first level, privacy can be directly linked, as in Equation 3.2, to the inter-space inference error rate. For the second level, as in Equation 3.3, intra-space privacy can be related to the distance error, which is in distance units u (where 1u is approximately 1 meter). Moreover, we set the following desired subjective lower bounds for the privacy metrics. For inter-space privacy, we can set a desirable lower bound at Π1 ≥ 0.5. This means that an adversary can only make a correct guess at most half the time. Furthermore, we define 1.0 ≥ Π1 ≥ 0.75 as high privacy, 0.75 > Π1 ≥ 0.5 as medium privacy, and Π1 < 0.5 as low privacy. For intra-space privacy, we set a desirable lower bound at Π2 ≥ 4m. However, we emphasize the great subjectivity of these lower bounds, especially that of the intra-space distance error, where a desirable lower bound highly depends on the scenario or location. For example, for indoor scenarios, a distance error of at least 4m can perhaps mean that the actual user is in a different room while, for outdoor scenarios, a distance error of at least 4m is still relatively small. All experimental evaluation was performed on a Unix machine with a 12-core CPU with 2.30GHz base frequency (i.e. an Intel® Xeon® Processor E5-2670 v3, 30M cache) and 512GB of RAM.

4.5 Spatial Inference Success

Now, we present the results of the systematic evaluation. We first present the results of the validation experiments over partially released spaces in terms of the F1 score. Then, we investigate the spatial privacy provided by surface-to-plane generalizations in terms of the two-level attack of inter-space (Π1) and intra-space (Π2) inference.

4.5.1 Validating inference success over partial releases

Figure 4.5: Overall inference success in terms of F1 score

Figure 4.5 shows the overall inference performance in terms of the overall F1 scores of the two attackers over spatial maps from the two MR platforms. Both attackers perform very well and achieve an F1 score > 0.8 at r ≥ 2.0m over Holo-Gen query spaces. On the other hand, despite increasing with the revealed size, both F1 scores over ARCore query spaces stayed < 0.4. This low performance over ARCore spaces is expected, as the spatial maps revealed by ARCore have observably less structural variation or complexity compared to those of the HoloLens, which is further investigated in Section 4.6. Comparing the performance of the two attackers, at r < 2.0m the descriptor matching-based NN-matcher performs much better than the deep neural network-based pointnetvlad. But, at r > 2.0m, pointnetvlad starts to perform better than NN-matcher by a small margin. These performance differences can be observed for both Holo-Gen and ARCore data. The better performance of the NN-matcher at smaller r can be attributed to how the extracted 3D descriptors can still potentially match small portions of the 3D spaces. On the other hand, since pointnetvlad is trained at a specific size, i.e. r = 2m, it understandably performs best at sizes equal to or greater than the size it was trained on.

(a) Per-space F1 score of Holo-Gen spaces

(b) Per-space F1 score of ARCore spaces

Figure 4.6: Heatmap of per-space inference performance in terms of F1 score, with annotated values at r = {1.0, 2.0, 3.0}

Per-space inference success Figure 4.6 shows the per-space F1 scores of both attackers over Holo-Gen and ARCore spaces. Figure 4.6a-left shows that the per-space performance of NN-matcher varies, performing better over certain spaces than others. For example, the spaces Aprtmnt, Wrkst'n, and Hallway have an F1 score > 0.95, while the other spaces have an F1 score < 0.95 at r = 3.0m. Similarly, the per-space F1 scores of pointnetvlad over Holo-Gen spaces, shown in Figure 4.6a-right, show that at r ≥ 2.0m the space Kitchen had the lowest F1 score while the rest of the spaces had relatively higher F1 scores. In contrast, the per-space performance of both attackers over ARCore data, presented in Figure 4.6b, does not show consistently improving performance with increasing r; although pointnetvlad's performance in Figure 4.6b-right did show that the space Wrkst'n always had the lowest F1 score while the space Aprtmnt always had the highest F1 score.

Figure 4.7: One-time partially released RANSAC-generalized spaces vs varying radii: (top) inter-space and (bottom) intra-space privacy

Nonetheless, it is evident that spatial maps provided by ARCore pose significantly less inference risk than spatial maps revealed by the HoloLens. The low spatial inference risk with ARCore can be directly attributed to the significantly low structural complexity of its revealed maps.

4.5.2 Spatial privacy through surface-to-plane generalization

Now, we investigate the privacy provided by surface-to-plane generalizations by comparing the performance of the two attackers between Holo-Raw and Holo-Gen. Fig. 4.7 shows the average privacy provided by surface planar generalization as we vary the size of the released spaces for the one-time release case, for our two attackers. We use the subscript NN for NN-matcher and PV for pointnetvlad. We also show the privacy values of Raw spaces for comparison. As shown in Fig. 4.7-top, generalization provides improved inter-space privacy for the one-time partial release case. This can be observed in the sharp drop of Π1,NN,Raw, while Π1,NN,Gen drops slowly as we increase the radius. At r = 1.0m, there is more than a two-fold difference between the Π1,NN of

Raw and Gen; then, at r = 1.5m, Π1,NN,Raw < 0.1 while Π1,NN,Gen > 0.3. On the other hand, pointnetvlad learns to generalize during training; thus, Raw and Gen query spaces will result in similar global descriptors and, hence, the same inter-space attack performance Π1. Moreover, as shown in Figures 4.7-top and 4.5, the NN-matcher attacker performs better than pointnetvlad at r ≤ 1.75m for Gen query spaces, but pointnetvlad starts to perform better at r ≥ 2.0m. Fig. 4.7-bottom shows the average intra-space privacy for the partial query spaces that are successfully identified by the attackers during inter-space inference. Similar to the inter-space performance, pointnetvlad has the same intra-space performance for Raw and Gen spaces. On the other hand, NN-matcher performs slightly better with Raw spaces. Now, comparing the two attackers, at partial releases with r ≤ 1.0m, Π2,NN > Π2,PV and Π2,NN > 2m. This directly translates to the intra-space hypothesis being off by at least 2m on average from the true intra-space location. However, at r > 2.0m, Π2,NN < Π2,PV, where Π2,NN < 2.0m while Π2,PV > 4.0m. Evidently, as shown in Fig. 4.7-bottom, there is no intra-space privacy benefit from generalizations regardless of the radius of the revealed space, as having Π2 < 4.0m is arguably not good enough. On the other hand, as shown in Fig. 4.7-top, an inter-space privacy benefit can be observed at smaller radii, e.g. r < 1.5m. However, one-time partial releasing provides very limited data to applications. In reality, other MR applications may desire to receive new, expanded, and/or updated information about the user's physical space to deliver their MR functionality with immersiveness. Thus, we proceed with our attack evaluation over generalized spaces but with successively released spatial information.

Takeaway Surface-to-plane generalizations provide an INTER-space privacy benefit for the one-time partial release case, but only for small radii. Moreover, they do not provide any significant benefit in terms of INTRA-space privacy, as demonstrated by the performance of both attackers.

4.5.3 Utility in terms of QoS

Surface-to-plane generalizations contribute variations to the released point clouds from the true spaces. Fig. 4.8 shows the computed average QoS Q based on Eq. 3.5 (with coefficients α, β = 0.5) for generalized spaces with varying radii r. We also show the plots of the two components used to compute the overall Q: the normal QoS and the point-wise QoS.5 As shown, the overall Q of surface-to-plane generalized spaces is relatively low at Q < 0.15 and only very slowly (but not consistently) increases as we increase the radius. The increase can be attributed more to the normal QoS component, which has a medium positive correlation, i.e. ρ = 0.74, with the radius r. The overall Q likewise has a medium positive correlation, i.e. ρ = 0.71, with r and, if we average Q over all radii, we get Q̄ = 0.120 ± 0.007, which has less than 10% standard deviation.

5 A normal QoS of 0.2 means that the resulting normals after generalization are on average about 18° off from the original normals.

Figure 4.8: QoS Q vs varying radius r. The average Q over all test radii, i.e. 0.25 ≤ r ≤ 4.0m, is 0.120 ± 0.007.

This slight increase is attributed to how more information, i.e. a larger radius, introduces additional errors, albeit minimally, especially on the normals during generalization.

4.6 Detecting Spatial Inference Risk

To investigate the inherent influence of a space on its inferability, we look towards spatial complexity based on the geometric complexity of the 3D spaces represented by the captured 3D spatial maps. Specifically, we hypothesize that spatial complexity can signify spatial inference risk. In this work, we investigate three potential geometric shape functions as the basis for geometric complexity: the Euclidean distance of the positions of two random 3D points, the dot product of the normal vectors of two random 3D points, and the mean cosine similarity of the normal vectors of a neighborhood of 3D points. The distance-based metric is based on the probabilistic D2 shape function [Osada et al., 2002] (we use the label D2 and variable d), while the dot product-based metric is also inspired by the probabilistic shape distributions but was applied over vector fields [Ohbuchi et al., 2005, Dinh and Xu, 2008] (we use the label V2 and variable v). Both D2 and V2 were originally used as global dissimilarity measures for 3D shape classification. On the other hand, the neighborhood cosine similarity is similar to V2 but is a deterministic version that computes the mean cosine similarity (we use the variable ξ) using the dot product of the normal vectors of all pairs of points in the neighborhood instead of random pairs; thus, ξ is a local measure for the neighborhood.

4.6.1 Computing the geometric shape functions

From Figure 3.2, we define the 3D spatial map as a collection of oriented 3D points, i.e. $\{\hat{p}_1, \hat{p}_2, \ldots\}$, where $\hat{p}_j$ is a 3D point with an orientation defined by the normal vector $\hat{n}_j$. Then, we calculate our shape functions from this collection of points.

(a) Distribution of D2 distance d.

(b) Distribution of V2 random pair normal vector similarity v.

(c) Distribution of neighborhood normal vector similarity ξ.

Figure 4.9: Distribution of the per-space similarity measures – d, v, and ξ – of our gathered spaces from HoloLens (Holo-Raw and Holo-Gen) and ARCore. (For the distributions of d and v, we plot the moving average with width 3 for a smoother histogram.)

Computing distance D2

To compute D2, following [Osada et al., 2002], we pick a set of L random point pairs, $\hat{p}_j$ and $\hat{p}_k$, from the 3D spatial map and compute their Euclidean distance normalized by the mean distance:

$$ d = \frac{\|\hat{p}_j - \hat{p}_k\|_2}{\bar{d}}, \qquad \text{where mean } \bar{d} = \frac{1}{L} \sum_{\forall \{j,k\}} \|\hat{p}_j - \hat{p}_k\|_2 . $$

As a result, d is a dimensionless metric due to the normalization by the mean. It is also unbounded, i.e. it can reach +∞ as the size of the 3D map increases; thus, we truncate it at 3.0. We pick a finite set of L = {100, 400, 1600} point pairs as it is computationally intensive to compute it for all points. Figure 4.9a shows the d distribution of our collected spaces.

Computing pair normal vector similarity V2

Similarly, we pick a set of L = {100, 400, 1600} point pairs from the 3D spatial map and compute the dot product v of their normal vectors, i.e. $v = |\hat{n}_j \cdot \hat{n}_k|$. Since the normal vectors are already normalized, v ∈ [0, 1]. Figure 4.9b shows the v distribution of our collected spaces.

Computing neighborhood normal vector similarity

Then, we calculate the ξ of a 3D point $\hat{p}_j$ relative to its point neighborhood. Let its K-nearest neighbors be $\{\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_K\}$ with their corresponding normals $\{\hat{n}_1, \hat{n}_2, \ldots, \hat{n}_K\}$. The $\xi_j$ relative to $\hat{p}_j$ is the average of the absolute dot products of the point normal $\hat{n}_j$ with its neighborhood normals:

$$ \xi_j = \frac{1}{K} \sum_{k=1}^{K} |\hat{n}_j \cdot \hat{n}_k| $$

Similar to the L in the D2 and V2 shape functions, we also pick K = {100, 400, 1600} neighbors to compute ξ. Also, ξ ∈ [0, 1]. Figure 4.9c shows the neighborhood normal vector similarity ξ distribution of our collected spaces.
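All three shape functions reduce to a few vectorized operations over the oriented point cloud. The following sketch, assuming NumPy arrays of 3D points and unit normals (function and variable names are ours, not the thesis implementation), illustrates how samples of d, v, and ξ could be computed:

import numpy as np
from scipy.spatial import cKDTree

def d2_samples(points, L=400, seed=0):
    # D2: Euclidean distance of L random point pairs, mean-normalized
    # and truncated at 3.0 as described above.
    rng = np.random.default_rng(seed)
    j, k = rng.integers(0, len(points), (2, L))
    dist = np.linalg.norm(points[j] - points[k], axis=1)
    return np.minimum(dist / dist.mean(), 3.0)

def v2_samples(normals, L=400, seed=0):
    # V2: absolute dot product of the normals of L random point pairs.
    rng = np.random.default_rng(seed)
    j, k = rng.integers(0, len(normals), (2, L))
    return np.abs(np.sum(normals[j] * normals[k], axis=1))   # in [0, 1]

def xi_samples(points, normals, K=400):
    # xi: per-point mean |n_j . n_k| over the K nearest neighbors.
    tree = cKDTree(points)
    _, nbr = tree.query(points, k=K + 1)   # nearest neighbor is the point itself
    nbr = nbr[:, 1:]
    dots = np.abs(np.einsum('id,ikd->ik', normals, normals[nbr]))
    return dots.mean(axis=1)               # in [0, 1]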

Comparing the distributions shown in Figures 4.9a-4.9c, the distance-based D2 shape distribution has significant peaks for all spaces, whether Holo-Raw, Holo-Gen, or ARCore. On the other hand, the two normal vector-based distributions, i.e. v and ξ, are affected by the surface-to-plane generalizations in Holo-Gen and the planar surfaces of ARCore, which leads to distributions with most peaks at v, ξ → 1.

(a) Heatmap of per-space spatial complexity Φ_d in terms of the likelihood of a low distance d: Φ_{d=0.94} = P(d < 0.94)

(b) Heatmap of per-space spatial complexity Φ_v in terms of the likelihood of a low normal vector (random pair) similarity v: Φ_{v=0.48} = P(v < 0.48)

(c) Heatmap of per-space spatial complexity Φ_ξ in terms of the likelihood of a low neighborhood normal vector similarity ξ: Φ_{ξ=0.92} = P(ξ < 0.92)

Figure 4.10: Heatmaps of per-space spatial complexity in terms of our chosen metrics: Φ_d, Φ_v, and Φ_ξ.

4.6.2 Computing Spatial Complexity

So far, d, v, and ξ are only shape distributions. To signify spatial complexity, we pose a spatial complexity metric in terms of each of these three distribution metrics:

• D2 Complexity, Φ_d. We hypothesize that more complex spaces have a high likelihood of having low-valued d distances; thus, we pose a D2-based complexity metric Φ_d as the likelihood of d being less than a set threshold d_t:

Φ_d = P(d < d_t)

• V2 Complexity, Φ_v. Similarly, we hypothesize that more complex spaces have a high likelihood of having low-valued v similarities; thus, we pose a V2-based complexity metric Φ_v as the likelihood of v being less than a set threshold v_t:

Φ_v = P(v < v_t)

• Neighborhood Complexity, Φ_ξ. Lastly, we hypothesize that more complex spaces have a high likelihood of having low-valued ξ similarities; thus, we pose a neighborhood normal vector-based complexity metric Φ_ξ as the likelihood of ξ being less than a set threshold ξ_t:

Φ_ξ = P(ξ < ξ_t)

We use the median of medians from the distributions of spaces gathered from both HoloLens and ARCore, which gives the following threshold values: d_t = 0.94, v_t = 0.48, and ξ_t = 0.92. Figures 4.10a-4.10c show the heatmaps of these complexity measures computed from the distributions shown in Figures 4.9a-4.9c. For all three shape distributions, we hypothesize that a higher Φ means higher complexity.
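Given per-space samples of d, v, and ξ from the shape functions above, each complexity value is simply the empirical CDF evaluated at the chosen threshold; a minimal sketch (names are ours):

import numpy as np

# Median-of-medians thresholds reported above: d_t, v_t, xi_t.
THRESHOLDS = {'d': 0.94, 'v': 0.48, 'xi': 0.92}

def complexity(samples, metric):
    # Empirical likelihood that a shape-function value falls below the
    # threshold, e.g. complexity(d_samples, 'd') for the D2 metric.
    return float(np.mean(np.asarray(samples) < THRESHOLDS[metric]))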

4.6.3 Spatial Complexity vs Inference Success

The heatmaps in Figure 4.10 further show the effects of the generalizations on the metrics. Interestingly, comparing the distance-based Φ_d heatmaps of Holo-Raw and Holo-Gen shown in Figure 4.10a, generalization increases the Φ_d values.

The same can be said for the ARCore spaces, which have a higher Φ_d than Holo-Raw. This is counter to our hypothesis that generalization reduces complexity, i.e. that less complex spaces have fewer low-valued D2 distances d and, hence, a smaller Φ_d. On the other hand, the trends for the two normal vector-based metrics are very similar: generalization has reduced the complexity for both Φ_v and Φ_ξ, as shown in Figures 4.10b and 4.10c. Moreover, a visual comparison of Figures 4.10b-4.10c and Figures 4.6a-4.6b suggests a correlation between spatial complexity in terms of the normal vector-based metrics and inference risk, especially with an NN-matcher attacker.

Table 4.1: Correlation coefficients of the three metrics with the overall F1 score (with varying metric parameters (number of pairs L or neighbors K) and query space size r)

NN-matcher over Holo-Gen spaces

  pairs   using d distance          pairs   using v similarity        neighbors  using ξ similarity
          r=1.0    2.0     3.0              r=1.0   2.0     3.0                  r=1.0   2.0     3.0
  100     -0.292  -0.159  -0.133    100     0.856   0.675   0.612     100        0.485   0.754   0.673
  400     -0.191   0.322  -0.183    400     0.771   0.764   0.626     400        0.732   0.885   0.810
  1600    -0.052  -0.004   0.070    1600    0.689   0.593   0.425     1600       0.825   0.882   0.800

pointnetvlad over Holo-Gen spaces

  100     -0.711  -0.096   0.046    100     -0.169  0.238   0.072     100        0.344   0.505  -0.287
  400     -0.486   0.518   0.453    400     -0.020  0.362   0.142     400        0.252   0.493  -0.241
  1600    -0.639  -0.254  -0.176    1600    -0.154  0.250  -0.096     1600       0.076   0.407  -0.216

Thus, we hypothesize that spaces with low normal vector similarity – whether of the random pair-wise v or the deterministic neighborhood ξ – have a higher inference risk. We already know that ARCore spaces pose the lowest risk due to their relatively low inferrability (as shown in Figure 4.5); thus, we can derive that ARCore spaces have the lowest complexity. In the succeeding discussion, we focus on the higher-complexity query spaces, i.e. Holo-Gen, for the correlation comparison.

Correlation of spatial inference and complexity

Table 4.1 shows the correlation coefficient ρ of the three metrics (with varying parameters, i.e. L, K = {100, 400, 1600}) with the overall F1 score as we increase the size r of the revealed space. With either attacker, no consistently strong correlation can be observed between the distance-based spatial complexity Φ_{d=0.94} and inference success, apart from a dominantly negative one. This dominantly negative, weak correlation can be observed when we compare the Φ_d heatmap in Figure 4.10a with the per-space inference heatmaps in Figures 4.6a-4.6b. For example, Holo-Gen-Aprtmnt has a lower Φ_d but a higher F1 score, while Holo-Gen-Drvwy has a higher Φ_d but a lower F1 score.
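A correlation table like Table 4.1 can be reproduced with an off-the-shelf Pearson correlation over the per-space values; a sketch, assuming hypothetical dictionaries mapping each space to its complexity value and attacker F1 score:

import numpy as np
from scipy.stats import pearsonr

def complexity_f1_correlation(complexity_per_space, f1_per_space):
    # e.g. complexity_per_space = {'Aprtmnt': 0.41, 'Drvwy': 0.77, ...}
    #      f1_per_space         = {'Aprtmnt': 0.80, 'Drvwy': 0.35, ...}
    spaces = sorted(complexity_per_space)
    x = np.array([complexity_per_space[s] for s in spaces])
    y = np.array([f1_per_space[s] for s in spaces])
    rho, _ = pearsonr(x, y)   # the correlation coefficient rho in Table 4.1
    return rho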

Correlation of inference and the normal vector-based complexity Φ_v and Φ_ξ

Differently, with the NN-matcher attacker, both the normal vector-based spatial complexities Φ_{v=0.48} and Φ_{ξ=0.92} have medium to strong positive correlations with inference success, with coefficients ranging over 0.425 < ρ < 0.856 and 0.485 < ρ < 0.885, respectively. Thus, a more similar space (i.e. with ξ > 0.92) is less likely to be inferred, and the correlation is stronger for larger spaces. We also emphasize that the highest correlation coefficient values for an NN-matcher, regardless of r, are obtained with computations using a neighborhood size K ≥ 400 (with ρ values marginally decreasing at K = 1600). These trends suggest that spatial complexity strongly influences the inference success of the intrinsic descriptor-based NN-matcher.

On the other hand, with a pointnetvlad attacker, the correlations of Φ_{v=0.48} and Φ_{ξ=0.92} with inference success are at their highest – albeit only weak to medium positive – for both at r = 2.0m. This can be attributed to the F1 score of pointnetvlad being low and then high at r = 1.0m and 3.0m, respectively (see Figure 4.5). Due to the deep learning nature of pointnetvlad, the mislabels at r = 1.0m and the almost-correct labels at r = 3.0m result in weak correlations regardless of complexity.

So far, these correlations are based on the global Φ, i.e. complexity computed from the whole-of-space distributions shown in Figure 4.9. Thus, to further confirm these observations, we need to look into the correlation of local complexity, i.e. per partial space, with inference success. That is, we compute the local Φ for every query partial space irrespective of the larger space it was extracted from.

4.6.4 Local Complexity vs Inference Success

Now, we treat the spatial inference success event as a random variable C and compute the a priori, P(Φ | C), and posteriori, P(C | Φ), distributions of the complexity, as shown in Figure 4.11.⁶ For this investigation, we use a fixed query size of r = 2.0m and fixed metric parameters of L, K = 400.
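Both distributions can be estimated by histogramming the local complexity values of the query spaces, split by inference outcome; a sketch under the assumption that outcomes are recorded as a boolean per query (names are illustrative):

import numpy as np

def prior_and_posterior(local_phi, success, bins=20):
    """Estimate P(phi | C) and P(C | phi) from per-query samples.

    local_phi: local complexity value of each query partial space
    success:   boolean array, True if the attacker inferred it correctly
    """
    local_phi = np.asarray(local_phi)
    success = np.asarray(success, dtype=bool)
    edges = np.linspace(0.0, 1.0, bins + 1)
    hit, _ = np.histogram(local_phi[success], bins=edges)
    miss, _ = np.histogram(local_phi[~success], bins=edges)
    p_phi_given_c = hit / max(hit.sum(), 1)            # a priori P(phi | C)
    total = hit + miss
    # Bins with no samples stay NaN; these correspond to the empty
    # vertical bars noted in Figure 4.11.
    p_c_given_phi = np.divide(hit, total, out=np.full(bins, np.nan),
                              where=total > 0)         # posterior P(C | phi)
    return edges, p_phi_given_c, p_c_given_phi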

Local Φ_d complexity Similar to our earlier observations, there is minimal difference between the average local Φ_d values of correctly and incorrectly labelled partial spaces (a difference of 0.006 for NN-matcher, as shown in Figure 4.11a, and 0.001 for pointnetvlad, as shown in Figure 4.11b). The posterior distributions P(C | Φ_d) further visualize the earlier observation of an inverse correlation between Φ_d and success. Specifically, for the NN-matcher attacker shown in Figure 4.11c, we can see that partial spaces with low Φ_d, e.g. Holo-Gen-Aprtmnt in Figure 4.10a, are more likely to be correctly inferred than those with high Φ_d, e.g. Holo-Gen-Drvwy in Figure 4.10a.

⁶The empty vertical bars in the posterior distributions indicate that our sample partial spaces do not have those values.

(a) P(Φ_{d=0.94} | C) for NN-matcher success and failure

(b) P(Φ_{d=0.94} | C) for pointnetvlad success and failure

(c) Likelihood of NN-matcher success given local Φ_{d=0.94}

(d) Likelihood of pointnetvlad success given local Φ_{d=0.94}

Figure 4.11: (a-b) The a priori distribution of the complexity value Φ_d given inference success C, i.e. P(Φ_d | C), and (c-d) the posteriori likelihood of inference success C given the local Φ_d, i.e. P(C | Φ_d)

(e) P(Φ_{v=0.48} | C) for NN-matcher success and failure

(f) P(Φ_{v=0.48} | C) for pointnetvlad success and failure

(g) Likelihood of NN-matcher success given local Φ_{v=0.48}

(h) Likelihood of pointnetvlad success given local Φ_{v=0.48}

Figure 4.11: (Continuation) (e-f) The a priori distribution of the complexity value Φ_v given inference success C, i.e. P(Φ_v | C), and (g-h) the posteriori likelihood of inference success C given the local Φ_v, i.e. P(C | Φ_v)

(i) P(Φ_{ξ=0.92} | C) for NN-matcher success and failure

(j) P(Φ_{ξ=0.92} | C) for pointnetvlad success and failure

(k) Likelihood of NN-matcher success given local Φ_{ξ=0.92}

(l) Likelihood of pointnetvlad success given local Φ_{ξ=0.92}

Figure 4.11: (Continuation) (i-j) The a priori distribution of the complexity value Φ_ξ given inference success C, i.e. P(Φ_ξ | C), and (k-l) the posteriori likelihood of inference success C given the local Φ_ξ, i.e. P(C | Φ_ξ)

Local Φ_v complexity With the NN-matcher attacker shown in Figure 4.11e, the average Φ_v for correctly and incorrectly labelled spaces is 0.426 and 0.339, respectively. This reaffirms our earlier observations about the correlation of Φ_v with inference success. The posterior distribution P(C | Φ_v) shown in Figure 4.11g further visualizes the earlier observation of a direct correlation between Φ_v and success. Conversely, with the pointnetvlad attacker, the incorrectly labelled spaces had a minimally higher average Φ_v (a difference of 0.038, as shown in Figure 4.11f).

Local Φ_ξ complexity Similar to Φ_v, the average Φ_ξ for correctly and incorrectly labelled spaces by an NN-matcher attacker, shown in Figure 4.11i, is 0.640 and 0.471, respectively. Thus, the correlation of Φ_ξ with inference success is indeed stronger than that of Φ_v, which is also based on normal vector dot products. The posterior distribution shown in Figure 4.11k further affirms this strong correlation. Also, similar to Φ_v, a minimally higher average Φ_ξ for incorrectly labelled spaces can be observed (a difference of 0.010) with the pointnetvlad attacker.

Overall, as observed in Table 4.1, pointnetvlad's success is only weakly influenced by spatial complexity. This can be attributed to how pointnetvlad relies on the positions of the 3D points instead of their orientation, which is defined by their normal vectors; thus, it is not strongly affected by normal vector similarity in particular. Conversely, the intrinsic descriptor computation for NN-matcher is reliant on the normal vectors – specifically, the spin image description algorithm relies on the normal vector as the axis of the spin (see Section 4.2.3); thus, NN-matcher success is strongly correlated with the normal vector-based measures Φ_v and Φ_ξ, with the neighborhood-calculated Φ_ξ having a marginally stronger correlation to inference success. However, the ξ computation is more computationally expensive due to the necessary nearest-neighbour computation (see Section 4.6.1); thus, if computational efficiency is required, the random pair-wise dot products of v should suffice.

Therefore, Figure 4.11, in conjunction with the performance shown in Figure B.3, shows that both pointnetvlad and NN-matcher are effective attackers, but pointnetvlad is less affected by a space's spatial complexity in terms of normal vector similarity, which makes it a stronger attacker despite surface-to-plane generalization-based countermeasures. Nonetheless, as demonstrated in [de Guzman et al., 2019b, de Guzman et al., 2020b], surface-to-plane generalizations still provide sanitized protection for sensitive information that is inadvertently captured by MR platforms while still providing adequate 3D data utility; thus, despite generalization not being a reliable defense against spatial inference attacks, it is still arguably necessary as a sanitization approach.

Summary

In this chapter, we experimentally demonstrate the spatial privacy risks – specifically, the risk of spatial inference, where an adversary tries to infer the identity of the space – associated with two common MR platforms: Microsoft HoloLens, and Android with Google ARCore. Then, aside from exposing the risks, we investigate the viability of geometry-based spatial complexity metrics as indicators of spatial privacy risk. Our investigation shows that spaces captured from HoloLens with high normal vector similarity are less likely to be identified, while spaces with less similarity are more likely to be identified. Moreover, a deep learning-based attacker is less affected by such complexity and thus poses stronger risks. Consequently, the spatial maps revealed by ARCore, which have high similarity due to these maps being inherently planar, are also less likely to be identified, thus posing less risk than HoloLens spaces. Furthermore, our investigation reaffirms the findings of current related work that naive surface-to-plane generalizations are not sufficient defenses against spatial inference attacks. Nevertheless, the normal vector-based spatial complexity measures, i.e.

Φ_v and Φ_ξ, can be leveraged in tuning the controlled release of generalized planes. Therefore, Φ can be utilized as a transformation parameter to further reduce the inference risk even as more of the space is revealed. Future work may include an investigation of Φ's effectiveness as such. In the subsequent chapters (Chapters 5 & 6), we present and evaluate a data-centric (i.e. leveraging spatial data manipulation) and a data flow-centric (i.e. demonstrating visual information access control) measure for MR privacy protection.

Chapter 5

Conservative Plane Releasing for Spatial Privacy Protection

3D data: Attacks, Risks, and Protection. As revealed in Chapter 2, very few recent works have started to look into the actual risks brought about by indefinite access to 3D data. In the preceding chapter, we presented evidence of how an adversary can easily infer spaces from 3D point cloud data captured with the Microsoft HoloLens [de Guzman et al., 2019b], and how, even with spatial generalization (i.e. the 3D space is generalized into a set of planes), spatial inference is still possible at a significant success rate. Moreover, we presented a heuristic measure for spatial privacy risk. Another recent work employed machine learning to reveal original scenes from 3D point cloud data with additional visual information [Pittaluga et al., 2019]. Meanwhile, a concurrent work focused on a privacy-preserving method of pose estimation to counter such scene revelation [Speciale et al., 2019]: 3D "line" clouds are used during pose estimation to obfuscate 3D structural, i.e. geometrical, information; however, this approach only addresses the pose estimation functionality and does not demonstrate viability for surface or object detection, which is necessary for a virtual object to be rendered or "anchored" onto. Thus, it is still necessary for 3D point cloud data to be exposed, but with privacy-preserving transformations to hide sensitive content and prevent spatial recognition. In this work, we augment the currently inadequate spatial generalizations (as shown in the previous chapter) with conservative releasing as a stronger countermeasure against the NN-matcher attacker.¹

¹This chapter contains results from a work under review [de Guzman et al., 2020b].


(a) Reference generalized partial spaces (b) Conservatively released planes

Figure 5.1: Example of conservative plane releasing.

5.1 Conservative Plane Releasing

In the previous chapter, we already employed two baseline protection measures (Section 4.3): namely, partial releasing and surface-to-plane generalizations. The evaluation showed that generalizations are an inadequate form of protection [de Guzman et al., 2019b]; thus, we augment the protection further with conservative releasing of the plane generalizations. Specifically, we limit the maximum number – which we call max_planes – of RANSAC-generalized planes that are released. We take advantage of how RANSAC tends to release the largest planes first (see Appendix C). Thus, even with a low max_planes, a conservatively released space will most likely have released planes that capture the overall spatial structure of the space and maintain good data utility. Figure 5.1a shows a visual example of a space with completely released RANSAC-generalized planes, while Figure 5.1b shows how we can limit max_planes to, say, a maximum of 3 planes.
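A minimal sketch of this release strategy using Open3D's RANSAC plane segmentation is shown below; the thesis implementation may differ, and the function name, distance threshold, and iteration counts are illustrative assumptions:

import open3d as o3d

def conservative_planes(pcd, max_planes=3, dist_thresh=0.05, min_inliers=200):
    """Iteratively extract RANSAC planes, releasing at most max_planes.

    RANSAC tends to find the largest planar segment first, so the few
    released planes capture the dominant structure of the space.
    """
    released, rest = [], pcd
    for _ in range(max_planes):
        if len(rest.points) < min_inliers:
            break
        model, inliers = rest.segment_plane(distance_threshold=dist_thresh,
                                            ransac_n=3, num_iterations=1000)
        if len(inliers) < min_inliers:
            break
        released.append((model, rest.select_by_index(inliers)))
        rest = rest.select_by_index(inliers, invert=True)
    return released   # list of (plane coefficients [a, b, c, d], inlier cloud)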

5.2 Extending the Attack Scenario

In Chapter 4, we utilised one-time released partial spaces for evaluation. We follow the same evaluation setup described in Section 4.4, but we extend it to the realistic case of successive releasing and investigate the viability of conservative plane releasing in the successive setup. Likewise, we use the two-level privacy measures, i.e. Π1 and Π2, as metrics.

Successive release of partial spaces. To demonstrate the case where users move around and their physical space is gradually revealed, we included a validation setup that successively releases partial spaces. Following the generalization strategy described in Section 4.3, we perform successive releasing of partial spaces for the collected raw point clouds and for their RANSAC-generalized versions. We do 100 random sample partial iterations and produce 100 releases per random sample. We do this for radii r = {0.5, 1.0, 2.0}.

As the released points are accumulated, we perform generalization over the accumulated points. To achieve consistency in the generalizations, we implemented a plane subsumption handler for generalizing successively released points (sketched below). Specifically, new points are checked against the existing planes (produced by previous releases) to determine whether they can be subsumed by the existing ones, instead of performing RANSAC generalization over the entire accumulation; RANSAC is only performed over the remaining (ungeneralized) points. We feed the same set of successive spaces to both the NN-matcher and pointnetvlad. The resulting stronger attacker is used for the subsequent evaluation with conservative plane releasing.
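One way the subsumption check could look (a new point is subsumed by a previously released plane if its point-to-plane distance falls within a threshold; the names and threshold value are our illustrative assumptions):

import numpy as np

def unsubsumed(points, planes, dist_thresh=0.05):
    """Return the new points that no previously released plane subsumes.

    planes: iterable of plane coefficients [a, b, c, d] with unit normal
            (a, b, c), i.e. a*x + b*y + c*z + d = 0 on the plane.
    Points within dist_thresh of a plane are subsumed by it; the rest
    are left for a fresh round of RANSAC generalization.
    """
    remaining = np.asarray(points, dtype=float)
    for a, b, c, d in planes:
        if len(remaining) == 0:
            break
        dist = np.abs(remaining @ np.array([a, b, c]) + d)
        remaining = remaining[dist > dist_thresh]
    return remaining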

Conservative plane releasing over successive partial releases. In addition to the previous successive setup, we employ conservative releasing, which limits the number of planes a surface-to-plane generalization produces, as a form of spatial privacy countermeasure. For our investigation, we apply conservative releasing over the same set of successively released partial spaces, with subsumption applied during generalization. For every sample release, we investigate the impact of limiting the maximum number of planes, the so-called max_planes, generated by the RANSAC plane generalization process. We performed controlled plane releasing with max_planes in steps of 2 from 1 to 29.

5.3 Inference Success with Successive Releasing

Fig. 5.2 shows the inference performance in terms of inter-space (Π1) and intra-space (Π2) privacy as we successively release generalized (Holo-Gen) partial spaces. Regardless of the size of the released partial spaces and of the attacker used, as we either increase the size of a partial space or reveal more portions and/or planes of the space, the inter-space privacy (Fig. 5.2-top), unsurprisingly, decreases. For r = 0.5m, both Π1,NN and Π1,PV drop slowly, with Π1,NN dropping faster. If we double the size of the releases to r = 1.0m, both Π1 values drop quickly, with Π1,NN still dropping faster. Thus, in terms of inter-space privacy, NN-matcher still performs better than the deep neural network-based pointnetvlad even in the successive case. Despite the released spaces having radius r ≤ 2.0m, it is important to note that, as we increase the number of releases, the size of the revealed space grows well beyond 2.0m as we accumulate the released spaces.

Figure 5.2: Successively released generalized partial spaces: (top) inter-space and (bottom) intra-space privacy

Now, despite the accumulated spaces in the successive case being larger than 2.0m after a few releases, pointnetvlad's performance suffers from its submap division, as it has to divide the revealed space into 2m submaps. We did observe that Π1,PV drops to < 0.1 in Fig. 4.7-top, but that is due to how those partial spaces are generally enclosed in spheres, whereas in the successive case the accumulated spaces are irregularly shaped; Fig. 5.1 shows how this looks with two releases. Thus, for accumulated spaces larger than 2.0m, the submap generator produces more than one submap to cover the entire query space and then performs the inference on the submaps. Some of these submaps, say the edge submaps, will lead to erroneous inter-space labels. On the other hand, similar to the one-time partial case, the intra-space performance of pointnetvlad is fairly sustained – owing to the discriminative power of the NetVLAD layer – as long as the inter-space label is correct.

At a release size of r = 0.5m, the NN-matcher's intra-space distance error initially drops but slowly increases as we reveal more of the space. As shown in Fig. 5.2-bottom, at the first release, Π2,NN is slightly high with an error of > 6m, but, after 20 or more releases, Π2 drops below 3m. But, as we reveal more of the space, Π2 seems to improve: for r = 1.0m, Π2 increases to almost 4m again as we reach 60 or more releases. Differently, at release sizes r ≥ 1.0m, Π2,NN is initially low and proceeds to increase with the number of releases. And, regardless of the release size, after 20 releases, all Π2,NN values follow a similar increasing trend. This variation in the initial Π2,NN values can be attributed to how artificial, i.e. man-made, spaces have repeating similar structures of small sizes, which produce very similar descriptors. As a result, successive releases can contain structures at a different intra-space location that are similar to structures that were previously released. A good example of this are the workstations in the same company or institution, which are all designed similarly.

However, this improvement in intra-space privacy is not necessarily an improvement as the revealed space grows in size, which leads to a possible overlap of the hypothesis intra-space location with the true space. For example, if, after a number of releases, the width of the accumulated space is ≥ 4.0m, a Π2,NN < 4.0m means that the adversary was still able to produce a hypothesis space that overlaps with, or even lies within, the true space. It is not the intention of this work to identify the strongest attacker; nevertheless, we will use NN-matcher in the succeeding evaluation as it outperforms pointnetvlad, albeit slightly, in both inter- and intra-space inference in the successive case.

Takeaway The evaluation over successive releases highlights and reinforces the spatial privacy risks, as demonstrated by the spatial inference power of both an intrinsic descriptor-based and a deep neural network-based attacker in both INTER- and INTRA-space inference, even after employing surface-to-plane generalizations.

5.4 Spatial Privacy with Conservative Releasing

Now, we explore the privacy approach of combining conservative releasing with surface-to-plane generalizations to provide spatial privacy in MR as more of the space is revealed. Specifically, we control the maximum number of planes released after the surface-to-plane generalization process. Fig. 5.3 shows the inter-space privacy values as we increase the maximum number of planes, i.e. max_planes, and the number of releases. We can see from the 3D plot for r = 0.5m in Fig. 5.3a that the inter-space privacy gradually decreases as we reveal more of the space (more releases) and increase max_planes. For Π1(r = 0.5m) to go below 0.5, both the number of releases and max_planes need to be high. This is more evident in the unevenly quantized heatmap shown in Fig. 5.3b, which shows that we get Π1 ≤ 0.5 on only a few occasions in the provided range of values for the number of releases and max_planes. We also show the heatmap version for the successive case with no conservative releasing, i.e. max_planes = ∞, at the top of Fig. 5.3b for comparison. As shown, for successively released partial spaces with radius r = 0.5m, the average inter-space privacy Π1 drops below 0.5 after 31 or more releases. But, with conservative releasing, we can go up to 51 successive partial releases with max_planes ≤ 23 while keeping Π1 ≥ 0.5. Furthermore, if we average Π1(r = 0.5m) over all releases and observe it relative to max_planes, the average Π1(r = 0.5m) does not go below 0.5 if max_planes ≤ 29, as shown in Fig. 5.3e.

(a) 3D plot (r =0.5) (b) Heatmap (r =0.5)

(c) Heatmap (r =1.0) (d) Heatmap (r =1.0)

(e) Average Π1 over all releases

Figure 5.3: Average INTER-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker)

(a) 3D plot (r =0.5) (b) Heatmap (r =0.5)

(c) 3D plot (r = 1.0) (d) Heatmap (r = 1.0)

(e) Average Π2 over all releases

(f) Average Π2 over varying number of planes

Figure 5.4: Average INTRA-space privacy of conservatively released planes over successive releasing (using NN-matcher attacker)

On the other hand, for r = 1.0m, Figs. 5.3c and 5.3d show that the decrease in Π1 is primarily due to max_planes. Aside from the few occasions shown in Fig. 5.3d where Π1 ≥ 0.5 with a lower number of releases, we can observe that, generally, Π1 < 0.5 for max_planes ≥ 13. We also show the successive case with max_planes = ∞ for r = 1.0m, which shows that right after the first release we already have Π1 < 0.5. Furthermore, as shown in Fig. 5.3e, the average Π1(r = 1.0m) over all releases goes below 0.5 if max_planes ≥ 13. Thus, for r ≥ 1.0m, at most 11 planes can be released, regardless of the number of successive partial releases, so that Π1(r = 1.0m) ≥ 0.5. Arguably, a maximum limit of 11 planes is already adequate for most MR functionalities.

We also present the intra-space privacy with conservative releasing in Fig. 5.4 with quantized heatmaps. Figs. 5.4a-5.4b and 5.4c-5.4d show differences in Π2 for r = 0.5 and 1.0m but, if we average over all releases as shown in Fig. 5.4e, the plots of Π2(r = 0.5m) and Π2(r = 1.0m) overlap and plateau at roughly ~3.085m. The difference appears only at max_planes ≤ 3, where Π2(r = 1.0m) drops faster than Π2(r = 0.5m) from max_planes = 1 to 3 before eventually plateauing at similar levels.

On the other hand, if we average Π2 over max_planes, we see that Fig. 5.4f reflects the behavior of Π2,NN shown in Fig. 5.2-bottom. Both figures show increasing Π2 as we increase the number of releases. In Fig. 5.4f, this is more evident for Π2(r = 1.0), which starts to increase immediately after 1 release, while Π2(r = 0.5) increases only after 31 releases.

Takeaway Conservative releasing of generalized planes provides additional protection by delaying the success of spatial inference attackers. For example, at r = 0.5, if we release up to 17 planes, an adversary will misidentify a space at least half of the time regardless of the number of releases, and will be off by at least 3.0m in INTRA-space inference on the infrequent occasions of INTER-space success. We can get similar privacy for r = 1.0 with up to 11 plane releases.

5.4.1 Utility with conservative releasing

Plane-fitting generalizations contribute variations to the released point clouds relative to the true spaces. Fig. 5.5 shows the computed average QoS Q based on Eq. 3.5 (with coefficients α, β = 0.5) for conservatively released generalized planes over successively released partial spaces. We can see that Q decreases as we increase max_planes. Consequently, if we fix max_planes, more successive partial releases increase Q. Intuitively, more information, i.e. more releases, should decrease Q; but because we are also limiting the maximum allowable number of planes to be released, errors are introduced and Q thus increases.

(a) 3D plot (r =0.5) (b) Heatmap (r =0.5)

(c) 3D plot (r =1.0) (d) Heatmap (r =1.0)

Figure 5.5: Average QoS Q of conservatively released planes over successive releasing

Let's look at the sample shown earlier in Fig. 5.1. As shown in Fig. 5.1b, if we set max_planes = 3, the succeeding releases are forced onto these 3 previously released planes. Other structures present in the succeeding releases that cannot be subsumed by these planes will effectively not be released and, thus, contribute to the QoS error. Examples of such structures are the planes shown in Figure 5.1a but not present in Fig. 5.1b. Furthermore, the increase of Q with radius is also corroborated by Fig. 5.5, which shows that the values for r = 0.5 are less than or equal to those for r = 1.0 for the same number of releases or max_planes.

Takeaway To achieve better QoS with generalized spaces, a smaller size or radius is preferred. And, if conservative releasing is applied, a good Q, say Q < 0.2, can be achieved with a lower number of releases.

(a) r =0.5 (b) r =1.0

Figure 5.6: Intersection map of Q ≤ 0.2 and Π1 ≥ 0.5

5.4.2 Utility vs Privacy

Fig. 5.6 shows an intersection map of acceptable Q and Π1 (light shaded), i.e. Q ≤ 0.2 and Π1 ≥ 0.5. For r = 0.5m, as shown in Fig. 5.6a, the good intersection regions are primarily dictated by good Q, i.e. up to 51 releases and even up to 27 max_planes. Contrariwise, for r = 1.0m, as shown in Fig. 5.6b, the good intersection regions are very scarce and are primarily dictated by good Π1; specifically, up to 13 max_planes and up to 26 releases, but lower max_planes worsens Q. Thus, for r = 1.0, if we prioritize privacy over utility/QoS, we can stick to max_planes ≤ 11 with no limit on the number of successive releases. But, if we want better Q, we can reduce the size of the space to, say, r = 0.5m and enjoy more freedom in the number of releases.

Functional Utility So far, we have only focused on data utility. Specifically, we assume that, as long as the revealed generalized space has a low Q, the same original functionality can still be provided with the user perceiving minimal to no difference. For a great deal of MR applications requiring "anchoring" to surfaces, as long as the plane generalizations are kept aligned with their corresponding true surfaces, the user may indeed perceive minimal to no difference. For the case when a specific target, e.g. a 2D image marker or 3D object, is the desired anchor, the position (and orientation) of this target can be provided either as a small plane generalization or as a point anchor which can be used together with the other plane generalizations. However, it would also be worthwhile future work to implement the proposed protection mechanism over various MR applications and use cases, and to measure actual user satisfaction or Quality-of-Experience (QoE).

5.4.3 Protection Properties of Conservative Releasing

Similar to other data manipulation approaches (discussed in Section 2.4.2), and as a data-centric measure, conservative plane releasing provides the protection properties of confidentiality, anonymity, unlinkability, undetectability, and plausible deniability. Table 2.2 shows how SafeMR compares to other approaches discussed in the literature.

Summary

In this chapter, we presented conservative plane releasing as a data-centric measure for privacy preservation. We demonstrated that we can enhance the privacy protection of plane generalizations by conservatively releasing the generalized planes provided to applications. Our experimental investigation over accumulated data from successive releases (emulating user movement) shows that we can reveal up to 11 planes and avoid inter-space inference at least half of the time for large enough revealed spaces, i.e. r ≥ 1.0. Specifically, with such very conservative releases, the success rate of any (inter-space) inferrer is no better than a random guess. And, on the occasions that the adversary correctly recognizes the space, the intra-space location can be off by at least 3 meters. Moreover, we quantified the privacy improvement, in terms of both Π1 and Π2, from reducing the size of the partial spaces to be revealed. Consequently, in terms of data utility Q, a smaller size is preferred to provide a good (i.e. low) Q.

Overall, conservative releasing is a viable solution for protecting users with measurable privacy and utility guarantees or thresholds. Plane generalizations are already (or can easily be) implemented in most existing MR platforms. Thus, perhaps, what remains is the implementation of conservative plane releasing to provide protection in these platforms. In our future work, we aim to develop a library for mobile devices that includes the mechanisms proposed in this work and facilitates MR app designers in developing privacy-aware MR applications. In the next chapter, we present a visual access control mechanism in the form of object-level abstraction as a data flow-centric privacy measure, which can potentially be employed in conjunction with a data-centric measure such as conservative releasing.

Chapter 6

SafeMR: Object-level Abstraction

In this chapter, we present a visual access control mechanism in the form of object-level abstraction.¹ Using readily available object detection algorithms, we demonstrate a proof-of-concept object-level abstraction for fine-grained access control on a mobile device. Furthermore, aside from the inherent confidentiality and content-awareness guarantees of abstraction, a reduction in execution time from visual processing resource sharing is another consequential benefit of abstraction, without any energy consumption penalty.

Motivating Example Say we have an AR/MR gaming application, similar to Pokemon Go, that requires location and visual information for the game. Physical objects in the environment, which can be used as visual tags to represent game resources, e.g. monster food, are captured and detected using the MR device. However, during visual capture, the gaming app can also capture irrelevant and latent visual information, which can potentially include sensitive personal information such as documents, photos, or medication. Whilst there has been a large body of work highlighting the implications of the leakage of personal information [Seneviratne et al., 2014, Ren et al., 2016], only a few preliminary works attempt to address the privacy issues of MR systems, as discussed in Chapter 2. Thus, we designed and developed an object-level abstraction system that (1) provides visual information protection while (2) reducing visual processing latency by taking advantage of concurrent processing through reusable object-level abstractions. Our system follows the least privilege paradigm, where applications are only provided with the minimum amount of information necessary for their intended functionality and/or permitted by the user, as illustrated in Fig. 6.1. The figure shows a cascading and diminishing visual information representation which presents the different levels of privilege and, in turn, the amount of information an application has access to.

¹The majority of this chapter combines our work presented and demonstrated at IEEE LCN 2019 [de Guzman et al., 2019c, de Guzman et al., 2019a].



Figure 6.1: Diminishing information: a) the raw visual capture; b) the target is cropped out but still with complete visual information of the target; c) only the bounding box of the target is exposed; d) only the centroid of the target is exposed; and e) only the binary presence, i.e. whether the target is within view or not, is exposed.
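These cascading privilege levels can be expressed compactly in code; the sketch below follows the level names of Fig. 6.1, while the detection record fields are our own illustrative assumptions:

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Detection:
    label: str
    bbox: Tuple[int, int, int, int]     # (x, y, width, height)
    pixels: Optional[bytes] = None      # cropped RGB of the object

def diminish(det: Detection, level: str) -> dict:
    """Return only what a privilege level (Fig. 6.1) entitles an app to.

    A = raw view (handled upstream of per-object release), B = cropped
    object, C = bounding box, D = centroid, E = binary presence.
    """
    x, y, w, h = det.bbox
    if level == 'B':
        return {'label': det.label, 'pixels': det.pixels}
    if level == 'C':
        return {'label': det.label, 'bbox': det.bbox}
    if level == 'D':
        return {'label': det.label, 'centroid': (x + w // 2, y + h // 2)}
    if level == 'E':
        return {'label': det.label, 'present': True}
    raise ValueError(f'unknown privilege level: {level}')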

6.1 Visual Processing and Threat Model

First, we look into the general processing employed in mobile vision applications, particularly on MR applications, and then present the potential threats they pose.

Conventional Visual Processing. As discussed in Chapter 1 and shown in Figures 1.2 and 2.1b, there are three essential steps in visual processing for MR applications: (1) detection includes the collection of input data (primarily visual) and processing it to detect targets or markers in the environment, which we generally refer to as object/s-of-interest (OoI); (2) transformation includes the process of identifying the appropriate outputs (which can be in the form of visual augmentations, actuations, music, and so on) given the detected OoI; and, lastly, (3) rendering is the delivery of these outputs to the user-observable (physical or real) space or interface (e.g. visual or augmented displays, motors, sound, and so on).

In a conventional setup, applications have direct access to the raw visual information captured by the device through the camera and sensor APIs. Each application performs its own detection and transformation (see the sample exploded diagram in Figure 6.2, labelled conventional processing), which may be assisted by remote vision processing servers on the cloud, and then ships back the necessary augmentations for rendering. Once the applications have been given access, the user may no longer have control over how the raw visual information is used. These sensing channels are potential threat vectors for latent and nascent security vulnerabilities and privacy risks, if no safeguards are put in place.

[Figure 6.2 diagram: within the mobile device (i.e. a smartphone), the trusted side holds the camera/vision APIs (e.g. OpenCV, TensorFlow), the detection stage (optionally cloud-assisted), the SafeMR object-level abstractions governed by user privacy preferences, and the display/rendering APIs (e.g. ARCore, Mixed Reality API); the untrusted side holds the third-party applications (App 1 ... App n), which only receive their requested object abstractions, e.g. Obj1.name = laptop, Sensitive = True; Obj2.name = cell phone, Sensitive = True; ...; Objn.name = cup, Sensitive = False. In conventional processing, by contrast, detection and transformation are performed by the applications themselves.]

Figure 6.2: Proposed visual processing architecture, with the object-level abstraction SafeMR inserted as an intermediary layer between the core APIs and the third-party applications.

Adversary Model. In this chapter, we assume that the adversary can either be internal (i.e. an MR application or service which is allowed to directly access the visual data released by the MR platform) or external (i.e. an external adversary that receives MR data from a non-adversarial application). The application or service then ships virtual objects back to the MR platform for rendering.

6.2 SafeMR: Object-level Abstraction

6.2.1 System Architecture

SafeMR is a visual information protection system. As mentioned earlier, it follows the fundamental idea of least privilege. As shown in Fig. 6.2, it is inserted as an intermediary layer between the trusted device APIs (e.g. ARCore) and the third-party applications, providing access to visual data via object-level abstractions: it detects the objects and exposes them as abstractions to the applications. An application's access to the object abstractions, and its privilege level, are specified by the user's privacy preferences.

6.2.2 System Properties and Functionalities

In contrast to other general privacy-enhancing techniques (PETs), the proposed abstraction-based protection is a user-mediated and active form of protection. Other PETs based on k-anonymity and differential privacy, including the conservative releasing approach presented in Chapter 5, are usually application-initiated and post-collection-implemented (i.e. during transformation), which results in a more passive-reactive form of protection from the user's perspective. In contrast, SafeMR is applied immediately after detection – from raw data to OoI detection. This allows SafeMR to provide the following key functionalities to the MR system:

1. fine-grained visual information access control;

2. visual information aggregation or summarization; and

3. visual processing resource sharing.

The aggregation or summarization functionality is inherently provided by information abstraction, where we 'simplify' data, e.g. a bounding area or box instead of the raw RGB pixels of the object-of-interest. In the rest of the chapter, the efficiency and accuracy of the aggregation or summarization will not be considered, and we will rely on the accuracy of the off-the-shelf algorithms we use.

Fine-grained permissions and access control. Within the SafeMR platform, access control mechanisms governed by user-defined privacy preferences can be implemented to selectively determine which applications have access to detected objects; thus, applications will have limited access, or access to limited information, after object-level detection. A similar fine-grained access control mechanism can be implemented at the output or rendering stage, where user-specified output policies control how and what can be rendered.
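A sketch of such a preference-governed filter follows; the preference schema and names are illustrative assumptions, and diminish is the earlier privilege-level sketch:

def release_objects(detections, app_id, prefs, sensitive_labels):
    """Filter detections for one app according to user preferences.

    prefs[app_id]: {'allowed_labels': set, 'privilege_level': 'B'..'E'}
    sensitive_labels: labels the user has toggled as sensitive (blocked).
    """
    allowed = prefs[app_id]['allowed_labels']
    level = prefs[app_id]['privilege_level']
    return [diminish(det, level)
            for det in detections
            if det.label in allowed and det.label not in sensitive_labels]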

6.2.3 Implementation

We developed an Android application to demonstrate the viability of SafeMR. The application utilizes OpenCV's Scale-Invariant Feature Transform (SIFT) [Lowe, 2004] and Oriented FAST and Rotated BRIEF (ORB) [Rublee et al., 2011] implementations for specific object detection.² In addition, we also employed TensorFlow's Object Detection API (TF-OD) [Huang et al., 2017] for generalized object detection.³ For these algorithms, we only utilize their vanilla, non-optimised implementations. The implementation continuously captures images at variable resolutions, switching between running with and without SafeMR. The client device used in the experiments is a Google Pixel 2 with an up to 2.35GHz octa-core processor and 2GB of memory.

²A specific target, i.e. a person's face, a painting, and so on, is desired to be detected; the reference features describe the specific pre-determined target.

³A general set of objects, i.e. chairs, tables, humans, and so on, is desired to be detected. Machine learning is the dominant technique for this purpose: a set of training images for each object type is presented to the algorithm to produce a generalized model that can be used for object detection.

(a) Scene (b) Defining object sensitivity (c) Privilege Level B (d) Privilege Level D

Figure 6.3: SafeMR demo showing different privilege levels.
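For reference, the specific object detection performed by the demo corresponds to standard OpenCV feature matching; a Python sketch of the equivalent pipeline (the Android app uses OpenCV's Java bindings; the function name and ratio threshold are illustrative):

import cv2

def count_feature_matches(reference_img, frame, use_orb=False, ratio=0.75):
    # SIFT or ORB features of the pre-determined target vs. the frame.
    detector = cv2.ORB_create() if use_orb else cv2.SIFT_create()
    norm = cv2.NORM_HAMMING if use_orb else cv2.NORM_L2
    _, des1 = detector.detectAndCompute(reference_img, None)
    _, des2 = detector.detectAndCompute(frame, None)
    if des1 is None or des2 is None:
        return 0
    matches = cv2.BFMatcher(norm).knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only the discriminative matches.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)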

Demonstrating SafeMR. Sample screenshots from our demo application are shown in Fig. 6.3. As shown in Fig. 6.3b, object sensitivity can be specified by toggling the detected objects in a list populated by an initial detection process (triggered by the top button on the menu, labelled "MR DEMO INITIALIZE"). This demonstrates object-level permissions and access control. Likewise, we implemented a "privacy slider" to allow users to change the privilege levels defined in Fig. 6.1. Figs. 6.3c and 6.3d illustrate the use of the slider to set two different privilege levels, i.e. levels B and D, respectively.⁴

6.3 Performance Evaluation

In this section, we evaluate SafeMR to validate its functionality as an enabler of object- level visual information access control and as a resource sharing mechanism.

⁴We will be submitting a separate demo paper to LCN Demos that details the implementation and testing of the proposed SafeMR abstraction with Google ARCore integration.


Figure 6.4: Comparing the employed detection algorithms in terms of per-frame processing time and number of feature matches: OpenCV-SIFT, OpenCV-ORB, and the TensorFlow Object Detection API (TF-OD does not expose the number of feature matches).

6.3.1 Validating the Vision Algorithms

Before proceeding with the system evaluation, we first validate the detection performance of the algorithms we utilize in the current implementation, i.e. SIFT, ORB, and TF-OD. Fig. 6.4 shows the average single-frame performance of the three detection algorithms. For a single detection activity, SIFT and TF-OD have average execution latencies of 245ms and 217ms, respectively, while ORB performs almost 10x faster with an average latency of 28ms. The number of feature matches (with respect to a reference image) follows the same trend for the two specific object detection algorithms, SIFT and ORB: SIFT produces, on average, 41 matches while ORB produces 3 matches for the same targets. This means that ORB features take much less memory. However, it also means that ORB features do not discriminate well between different targets compared to the multiple matches produced by SIFT. When the same feature match thresholds are used, ORB failed to produce good detections, especially with very small or scaled-down targets.

6.3.2 System Evaluation Setup

For this evaluation, we modified the implementation described in §6.2.3 to emulate different task instances of third-party applications or services with varying target OoIs, each of which can require either specific (SIFT or ORB) or general (TF-OD) object detection. Our evaluation setup, as shown in Fig. 6.5, involves a client Android device which is set to capture a scene with at least one object for general detection (e.g. a cup or a cell phone) and one for specific object detection (i.e. the white tack packet labeled sample target). The application analyzes the captured frames and detects objects in the scene. We then vary the following parameters:

(left) Without abstraction, all 4 detected objects (as shown in the privileged view) are provided to the requesting application. (right) With abstraction, all 4 objects are detected but only the required object (the sample target) is provided to the requesting application.

Figure 6.5: Varying the abstraction mode: without (left) and with (right) SafeMR. The privileged views show the actual detected objects, while the larger views show which objects (or their information) are provided to applications.

1. number of concurrent detection tasks with task types randomly picked from the three algorithms employed,

2. size of the input frame, and

3. the operating mode of whether to proceed with conventional MR processing or with object-level abstraction.

We automate the application to take 11 frames for each setup combination of the number of tasks, input size, and operating mode, and repeat the experiment 4 times for a total of 44 frames per setup combination. We record all necessary measurements, which we discuss in the next section.

6.3.3 Evaluation Metrics

We evaluate the object-level abstraction scheme by comparing it with conventional (i.e. without abstraction) MR processing in terms of (1) detection hits, (2) task execution latency, and (3) energy consumption. Our system's detection accuracy and precision are dependent on the off-the-shelf algorithms we have employed.

Detection Hits. To determine the utilization of the underlying detection algorithms, we captured the detection hits for target, latent, or secret objects and use them as a simple functional utility measure. For every detection in a frame, we increment a corresponding detection count. For conventional MR processing, raw RGB (as in Fig. 6.1a) is exposed to the third-party applications/services. Whenever a target is detected, the Target Detection Hit count is incremented. Likewise, when a non-target object is detected, the combined Secret Detection or Latent Detection Hit count is incremented.

For processing with abstraction, only the target objects are provided to the applications. When a target is detected, the corresponding Detection Hit count is incremented, but when an object tagged as sensitive is detected, the Blocked Secret hit count is incremented. Likewise, we also increment the Blocked Latent hit count for detected objects that are not targeted by the applications and are not sensitive. For the evaluation, we ran two test cases: (i) in Case I, the targets are non-sensitive objects; (ii) in Case II, the targets include sensitive objects. For both cases, aside from the targets being present, the scene had other objects that may or may not be sensitive.

Execution Time. To evaluate execution time, we vary the number of concurrent tasks to emulate concurrent requesting applications. We then produced three sets of instances (to vary task type combinations) per number of concurrent tasks. The same sets were used with and without abstraction, repeated 4 times for each set. For every run, the execution times of 11 consecutive frames were recorded, resulting in a total of 44 measurements for every set of task instances per number of concurrent tasks.

Energy Consumption. We checked the energy consumption of the different setups by measuring the instantaneous power, as well as the current and voltage, of our client device using a Monsoon Power Monitor.

Table 6.1: Average Detection Hit Rate (± stdev) (Processed frame size is 500x500)

        Without Abstraction                       With Abstraction
Case    Overall Target    Secret & Latent        Overall Target    Secret Target Hits           Latent Target Hits
        Detection Hits    Detection Hits         Detection Hits    (Blocked Hits)               (Blocked Hits)
I       0.998 ± 0.038     0.994 ± 0.076          0.992 ± 0.091     0 (0.989 ± 0.103, combined secret & latent)
II      0.998 ± 0.038     0.994 ± 0.076          0.984 ± 0.124     0 (0.975 ± 0.156)            0 (0.971 ± 0.169)

(The without-abstraction values are shared across both cases.)

6.4 SafeMR Performance

Fig. 6.5 shows example screenshots of our proof-of-concept Android application – with (right) and without (left) SafeMR. The larger views present the detected objects seen by a third-party application or service. For demonstration purposes, we also show a full-privileged view (lower-right crop insets) showing the actual results of the detection process. Without SafeMR, all objects, targets and non-targets alike, are seen by the application; in contrast, SafeMR detects all objects, as seen in the privileged view in Fig. 6.5-right, but only the sample target's information is provided to the application.

6.4.1 Detection Utility & Secrecy

Table 6.1 shows the detection hit rates for the two cases described in §6.3.3. It shows the average detection hits per frame, i.e. Target Detection, Secret Detection, and Latent Detection, for a setup with a fixed set of task instances, target objects, and sensitive objects. This serves as a validation of the operation both with and without abstraction.

Without abstraction, all types of objects, whether targets or not, sensitive or non-sensitive, were detected and could be accessed by the third-party applications; thus, no detections were blocked. The average overall detection hit rate is 0.998 per frame. Likewise, latent and secret objects were also detected, which results in a combined hit rate of 0.994.⁵ On the other hand, with SafeMR, the detection hits were counted for every target detected, giving an average hit rate of 0.992 and 0.984 for Cases I and II, respectively. The ~1% drop in the overall hit rate when using SafeMR is due to the fact that only non-sensitive targets were provided to the requesting application. For Case I, the combined secret and latent hit rate is 0; that is, application access to secret and/or latent targets was blocked. The Case I blocked hit rate is 0.989. For Case II, the secret and latent target detection hits are both 0, while the blocked hits were counted separately, as the applications were also targeting sensitive objects; these yielded 0.975 and 0.971 secret and latent blocked hits, respectively.

⁵Case I has a combined latent and secret detection hit rate as applications only desire access to non-sensitive targets; we append the combined count for every frame that has a detected secret (sensitive) OR latently private object; thus, the resulting count is not a sum but rather a union.

[Plot series: Target Detection Hits w/o abstraction; Target Detection Hits w/ abstraction; Secret and Latent Blocked Hits w/ abstraction. Axes: CDF vs. normalized value of hits.]

Figure 6.6: CDF of the detection hits and secret hits

To see the hits with continuous detection, we varied the number of concurrent detection task requests as well as the target and sensitive objects. We captured 11 consecutive frames and counted the detection hits and the combined secrecy and latent privacy hits (as Secret Hits). We normalized the counts to get per-frame, per-task hit values and plot the resulting CDFs of the raw normalized hit values in Fig. 6.6. For this setup, since raw visual information is provided without abstraction, its detection hits count all detections – targets or not – while the detection hits with abstraction count only the detections of targets.

The CDF of the detection hits without abstraction (blue circles) in Fig. 6.6 illustrates that a higher detection hit rate is more likely without abstraction than with abstraction (yellow crosses). However, this is because both target and non-target objects are detected without abstraction. Consequently, the target hits with abstraction tend to be lower due to some detections being blocked as non-targets. It should also be noted that the sum of the detection hits with abstraction and the corresponding secret hits does not necessarily equal the detection hits without abstraction: the assigned sensitivity of objects is also randomized – some objects are tagged sensitive for one task but may not be for another.

On the other hand, the detection hits can also represent leakage. The ratio of the secret hits with abstraction over the detection hits of conventional processing represents the rate of information leakage of sensitive objects. This is an obvious consequence of the most-privilege operation of conventional MR processing.

[Figure 6.7: Average overall frame processing time in seconds (± standard deviation) against the number of concurrent tasks (1 to 20), with and without abstraction. Processed frame size is 500×500.]

[Figure 6.8: Comparing performance based on input frame size (300, 400, and 500); the number of tasks is indicated at the bottom of the bars. Axis: time in seconds (y).]

6.4.2 Execution Time Performance

As a consequence of abstraction, sharing of tasks is inherently provided within our SafeMR platform. Fig. 6.7 shows the average overall frame processing time with a varying number of concurrent task instances. A point in the graph represents the average value over multiple instances of the same number of tasks, while the error bars' height shows one standard deviation up and down. Unsurprisingly, with conventional MR processing, as the number of concurrent task requests increases, the execution time to process a frame increases as well. On the other hand, with abstraction implemented, as the number of concurrent tasks increases, the execution time increases linearly with a low gradient (slowly, from about 350 ms to a maximum of about 1400 ms), then plateaus and remains below 2 s after 11 concurrent tasks. The difference in performance from conventional MR processing is much more evident in Fig. 6.8, which shows execution time against frame size for different numbers of tasks per setup. Similar to the results in Fig. 6.7, as the number of tasks increases, execution time without abstraction increases by as much as 6× for the same input frame size, while with abstraction execution time increases only by approximately 2.5×. For the same frame size and with at least 5 concurrent tasks, execution time with abstraction is at least half, i.e. 50%, of that without.
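The slow growth with abstraction follows from running the expensive detection pass once per frame and sharing its results among all concurrent task requests. A minimal sketch of this sharing, with hypothetical names and a stand-in detector, is:

class SharedDetector:
    def __init__(self, detect_fn):
        self.detect_fn = detect_fn  # e.g. a TensorFlow-based detector
        self._cache = {}            # frame_id -> detections

    def detections_for(self, frame_id, frame):
        # Run the (expensive) detector at most once per frame.
        if frame_id not in self._cache:
            self._cache[frame_id] = self.detect_fn(frame)
        return self._cache[frame_id]

    def serve_task(self, frame_id, frame, targets):
        # Each task only receives the detections it targets; everything
        # else (including sensitive objects) is withheld.
        dets = self.detections_for(frame_id, frame)
        return [d for d in dets if d["label"] in targets]

def fake_detect(frame):
    return [{"label": "cup", "box": (10, 10, 50, 50)},
            {"label": "id_card", "box": (60, 20, 90, 40)}]

shared = SharedDetector(fake_detect)
print(shared.serve_task(0, None, {"cup"}))       # detector runs once
print(shared.serve_task(0, None, {"keyboard"}))  # cache hit, no re-run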

6.4.3 Energy Consumption

The energy consumption of a camera preview (no detection), and of operation with and without abstraction, are very similar at 15.92 mWh, 15.94 mWh, and 15.95 mWh, respectively. They differ, however, in the processing period for the same number of processed frames. For example, with 5 consecutive frames, we get an estimated per-frame energy consumption of 10 mWh and 19 mWh with and without abstraction, respectively, which is approximately a 47% reduction in energy consumption when using SafeMR. That is, per-frame energy consumption is lower with SafeMR than without, because it takes less time to process the same number of frames with SafeMR applied.
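Since power draw is nearly identical across the modes, the per-frame estimate reduces to energy = power × processing time, divided by the number of frames. A minimal sketch of this arithmetic follows; the power value and processing times are hypothetical placeholders, back-solved only so the output reproduces the ~10 mWh, ~19 mWh, and ~47% figures reported above.

def per_frame_energy_mwh(power_mw, total_time_s, n_frames):
    # energy (mWh) = power (mW) x time (h); then spread across frames
    return power_mw * (total_time_s / 3600.0) / n_frames

e_with = per_frame_energy_mwh(power_mw=15940, total_time_s=11.3, n_frames=5)
e_without = per_frame_energy_mwh(power_mw=15950, total_time_s=21.4, n_frames=5)
print(f"{e_with:.1f} mWh vs {e_without:.1f} mWh; "
      f"reduction: {1 - e_with / e_without:.0%}")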

6.5 Provided Utility by SafeMR

As shown in Fig. 6.1, the minimum visual information required is whether an object is present or not; sometimes, an application may also need to know where in the view the target is. The results in §6.4.1 (i.e. Table 6.1 and Fig. 6.6) show the detection hit rates. In conventional processing (i.e. no abstraction), there is no check on whether only the minimum information is provided; in fact, applications are given most privilege, which results in a higher detection hit rate. Assuming that conventional MR processing also runs with least privilege and that the SafeMR platform employs all detection algorithms required by the applications, object-level abstraction can provide the same functional utility as the current conventional setup. For example, if the minimal information required by applications is whether their OoIs are in the view and where, conventional visual processing and SafeMR provide the same utility to the user. This is corroborated by the similar detection hits with and without abstraction in Table 6.1.
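To make this notion of minimal information concrete, the following sketch shows the kind of query interface an object-level abstraction could expose: only presence and, optionally, location of an OoI, never the raw camera frame. All names here are hypothetical.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class OoIResult:
    present: bool
    box: Optional[Tuple[int, int, int, int]] = None  # (x, y, w, h)

def query_ooi(detections, label, want_location=False):
    # Return presence (and, if requested, location) of one OoI only.
    for det in detections:
        if det["label"] == label:
            return OoIResult(True, det["box"] if want_location else None)
    return OoIResult(False)

detections = [{"label": "desk", "box": (0, 200, 500, 300)}]
print(query_ooi(detections, "desk"))                     # presence only
print(query_ooi(detections, "desk", want_location=True)) # with location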

Utility beyond the minimal. Applications may require combinations of information, e.g. RGB plus depth, or may combine them with other data sources, such as GPS and compass data. In this case, the other necessary algorithms or functions are not readily employed by the current SafeMR platform. As a result, the information released will have lower utility and may hinder functionality. Thus, mechanisms for allowing the platform to adapt and use other algorithms are necessary. For object-level adaptations, new specific object targets can be represented by their SIFT or ORB features and inserted into the library of specific targets (see the sketch below). For generalized object detection, the employed TensorFlow object detection API allows offline retraining of its classifier to expand the set of objects that can be detected.
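A minimal sketch of such object-level adaptation using OpenCV's ORB feature extractor is shown here; the library structure and names are hypothetical, and in practice the reference image would be a photo of the new target rather than the stand-in array used below.

import cv2
import numpy as np

target_library = {}

def register_target(name, image):
    # Describe the new specific target by its ORB keypoint descriptors
    # and insert it into the library of specific targets.
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(image, None)
    target_library[name] = descriptors

# Stand-in grayscale image; a reference photo would be used in practice.
sample = (np.random.default_rng(0).random((240, 320)) * 255).astype(np.uint8)
register_target("my_coffee_mug", sample)
print({k: v.shape for k, v in target_library.items() if v is not None})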

6.5.1 Resource Sharing Benefit

Assuming that applications only require minimal information, it is desirable to implement abstraction to take advantage of the reduced execution time due to visual processing resource sharing. This is on top of the reduced leakage and the access control provided by the abstraction. The results in Section 6.4.2 show that resource sharing can only be maximized with two or more concurrent tasks; there is no pay-off for a single application requesting one task at a time. Thus, there is no loss of utility when using SafeMR with minimal functionality. Moreover, once MR devices, services, and applications are widely used, users will be running two or more services at the same time as a consequence of the freedom from screens that MR can potentially provide. Furthermore, despite the slow execution time for a single detection task (the average execution time for a size of 500×500, as shown in Figs. 6.4 and 6.7, is on the order of 100 ms), parallel tracking allows detection to continue in the background while previous detection information is used to achieve a desirable output frame rate. Another important benefit is the reduced energy consumption per frame when using SafeMR: resource sharing reduces the per-frame execution time, so for the same number of frames to be processed, the total energy consumption is lower with abstraction applied.

6.6 Protection Properties of SafeMR

To analyse the protection provided by SafeMR, we follow the combined security and privacy model in Section 2.1 and show in Table 2.2 how SafeMR compares, in terms of the properties provided, to the other approaches discussed in the literature. The primary protection offered by SafeMR is selective access control, which is captured by the confidentiality and content awareness properties. Objects are kept confidential as long as users do not grant applications access. For content awareness, SafeMR needs to make users aware of which objects applications desire access to, and to let them control which applications have access to which objects, analogous to how current mobile devices mediate application access to sensors. Furthermore, applications that accessed objects cannot deny (i.e. repudiate) their access to the objects they requested. By extension, the access control provides the security property of authorization.

For undetectability and unobservability, SafeMR provides users the ability to allow and disallow application access to specific objects within the view. Only partial protection is provided for both properties because, once third-party applications have been granted access to objects, those objects are no longer undetectable and unobservable. Nonetheless, as long as the user does not allow access to sensitive objects, the properties are maintained for those objects. For policy and consent compliance, all elements along the data flow should comply; however, only the compliance of the trusted core components can be ensured. With SafeMR, user-specified privacy preferences can easily be applied, thus ensuring policy and consent compliance. For unlinkability and plausible deniability, SafeMR cannot provide protection beyond the partial protection offered through undetectability and unobservability: once access has been granted, unlinkability and plausible deniability can no longer be ensured. Furthermore, SafeMR provides no inherent anonymity and pseudonymity protection. However, bystander anonymity can be provided if input detection policies by default treat detected persons as sensitive objects. Overall, SafeMR can provide significant confidentiality and content awareness, and partial unobservability, undetectability, and policy and consent compliance. For unlinkability, plausible deniability, and anonymity and pseudonymity, other existing PETs will need to be incorporated into SafeMR for complete privacy protection.

Summary

In this chapter, we presented an object-level abstraction called SafeMR for visual information protection. Using readily available object detection algorithms, we demonstrated that it is possible to provide a proof-of-concept object-level abstraction that can form the basis of the necessary visual protection for emerging MR applications. The approach, in combination with existing privacy enhancing technologies, including our proposed data-centric measure in Chapter 5, can provide wider privacy protection. We have also shown that the abstraction enables visual processing resource sharing, which reduces the overall per-frame processing time with multiple concurrent tasks and has no adverse energy consumption impact; in fact, per-frame energy consumption is reduced due to the reduction in execution time from resource sharing. Given these privacy and execution pay-offs as well as minimal overhead, object-level abstraction is a viable approach to privacy protection in mixed reality.

Chapter 7

Conclusions

[Figure 7.1: Overall system diagram showing how both SafeMR and data manipulations can be integrated within an intermediary layer of protection. The diagram links the user space (the immersive user experience, i.e. MR output with abstraction, e.g. a Pokémon on your desk) through the SafeMR object-level and other OoI abstractions, the user's privacy preferences, the risk measure, and the spatial data transformations (releasing a transformed point cloud) to the main MR processing pipeline on the MR device; together these components form the intermediary protection mechanisms.]

Ensuring security and privacy for future technologies can facilitate user adoption. Mixed reality presents a future of new and immersive experiences that inherently (and unsurprisingly) carry security and privacy risks. Moreover, as mixed reality devices are only just starting to ship commercially, there may still be unknown security and privacy risks. In this work, we have collected, categorized, and reviewed various security and privacy work on or related to MR, and highlighted the current gaps in this aspect. In Chapter 3, we laid down our theoretical framework, which formalizes the spatial privacy problem and properly positions it against the various work we reviewed. Following this formalization, we demonstrated the privacy leakage in MR data and presented a heuristic measure for spatial privacy risk in Chapter 4.


[Table 7.1: Our two proposed approaches, and which security and privacy properties they provide to which data-flow element (data flow, process, storage, and/or entity), as presented in Table 2.2. Properties listed: integrity, non-repudiation, availability, authorization, authentication, identification, confidentiality, anonymity, unlinkability, undetectability, deniability, awareness, and compliance. Data protection approach: Ch. 5, conservative plane releasing [de Guzman et al., 2020b]. Input protection approach: Ch. 6, SafeMR [de Guzman et al., 2019c].]

Then, we presented two approaches: (1) a data-centric approach that directly addresses the spatial privacy problem by leveraging MR data manipulation for privacy preservation, and (2) a visual information access control mechanism as an input data-flow-centric protection. We inserted these two protection measures into Table 2.2 to see how our work compares to the rest of the literature, and we focus on the two measures in Table 7.1. As Tables 2.2 and 7.1 reveal, our proposed measures are primarily input and data protection approaches. Figure 7.1 shows how our two proposed mechanisms can be integrated as intermediary protection layers that ensure spatial and object-level privacy. We have demonstrated that we can enhance the protection from plane generalizations by conservatively releasing the generalized planes provided to applications, while guaranteeing spatial data utility. Moreover, the spatial complexity measure can be leveraged as a tuning parameter for spatial data transformations to further reduce the spatial inference risk even as more of the space is revealed. Furthermore, using readily available object detection algorithms, we demonstrated SafeMR as a proof-of-concept object-level abstraction that can form the basis of the necessary visual protection for current and emerging MR applications. Both proposed measures, in combination with other privacy enhancing technologies, can potentially provide complete privacy protection.

Open Challenges and Future Work

Having presented a broad understanding of security and privacy in mixed reality, both by surveying the existing related work and by proposing protection measures, our work has revealed further challenges and opportunities for future research.

Extending the attack scenario. It was not the focus of this work to develop the best attacker; however, countermeasures are only as good as the best attack they can defend against. Thus, it is worthwhile to explore improvements to the attack methods as future work. For example, we can investigate how adversaries with strong background knowledge (e.g. access to other user information that can reveal their location) will perform. An attacker can also apply temporal analysis, treating spatial data as an additional attribute in a time series together with other user data: we hypothesize that our current countermeasure can become ineffective as the privacy leakage accumulates over time; thus, the countermeasure likewise has to be improved to be resilient to such temporal attacks.

Extensions to other 3D data sources. Furthermore, we have only focused on 3D point cloud data captured by MR platforms. However, there are other platforms and use cases in which 3D data is used: for example, 3D lidar data used in geospatial work, captured by self-driving cars and, now, by many other applications on recent smartphones. As we have mentioned, the methods we used for spatial attacks were adapted from place recognition methods originally designed for 3D lidar data. Conversely, the protection methods we have designed and developed can be applied on these other platforms. Of course, striking a balance between functional utility, say, for a self-driving car, and privacy will remain a significant challenge.

Extending SafeMR for Output Protection. Despite our proposed measures being primarily input and data protection approaches, there is potential for reinforcing output-level protection through the SafeMR system. As shown in Figure 6.2, the non-repudiation security property is enforced by ensuring that an application only accesses its intended object-of-interest (OoI). Likewise, applications that render augmentations in response to their OoI cannot deny their renders, which effectively extends the non-repudiation property to the outputs. We have not explicitly demonstrated this, but a software implementation could let each OoI abstraction keep an audit list of application accesses and the renders they produce, as sketched below.
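A minimal sketch of such an audit list, with hypothetical names, follows; a real implementation would persist and integrity-protect the log.

import time

class OoIAbstraction:
    def __init__(self, label):
        self.label = label
        self.audit_log = []  # append-only record of accesses and renders

    def grant_access(self, app_id):
        self.audit_log.append((time.time(), app_id, "access"))

    def record_render(self, app_id, render_desc):
        self.audit_log.append((time.time(), app_id, "render", render_desc))

desk = OoIAbstraction("desk")
desk.grant_access("pokemon_go")
desk.record_render("pokemon_go", "pokemon anchored on desk plane")
for entry in desk.audit_log:
    print(entry)  # the application cannot later deny these actions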

SafeMR Implementation Considerations. Implementing (or enforcing) a shared abstraction ontology requires cooperation with third-party developers. Additional considerations include the implementation complexity loaded onto developers given the limited information provided by the abstraction. Furthermore, with our object-level abstraction, most of the privacy preference specification is now performed by the user. This can place an extraneous load on the user and may become unreasonably taxing, which can defeat the intention of protection. Thus, the user's perceived utility should also be considered in quantifying the overall quality of experience (QoE).

Security and Privacy Analysis of Existing Devices and Platforms. In terms of an overall challenge, a systematic security and privacy analysis of MR applications, devices, and platforms has yet to be performed in order to identify other potential and latent risks, particularly in their input capabilities. For example, the scanning capability of current devices, such as the HoloLens, should be investigated to determine whether it can potentially be used to detect heartbeats or other physiological signals of bystanders, or, beyond what we have revealed, how spatial data can be further abused by adversaries. These MR systems can then be evaluated against the security and privacy requirements we specified earlier.

Bibliography

[Acquisti, 2011] Acquisti, A. (2011). Privacy in the age of augmented reality.

[Acquisti et al., 2016] Acquisti, A., Taylor, C. R., and Wagman, L. (2016). The economics of privacy.

[Aditya et al., 2016] Aditya, P., Sen, R., Druschel, P., Joon Oh, S., Benenson, R., Fritz, M., Schiele, B., Bhattacharjee, B., and Wu, T. T. (2016). I-pic: A platform for privacy-compliant image capture. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, pages 235–248. ACM.

[Ahn et al., 2018] Ahn, S., Gorlatova, M., Naghizadeh, P., Chiang, M., and Mittal, P. (2018). Adaptive fog-based output security for augmented reality. In Proceedings of the 2018 Morning Workshop on Virtual Reality and Augmented Reality Network, pages 1–6. ACM.

[Andrabi et al., 2015] Andrabi, S. J., Reiter, M. K., and Sturton, C. (2015). Usability of augmented reality for revealing secret messages to users but not their devices. In SOUPS, pages 89–102.

[Ankerst et al., 1999] Ankerst, M., Kastenmüller, G., Kriegel, H.-P., and Seidl, T. (1999). 3d shape histograms for similarity search and classification in spatial databases. In International symposium on spatial databases, pages 207–226. Springer.

[Arandjelovic et al., 2016] Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic, J. (2016). Netvlad: Cnn architecture for weakly supervised place recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5297–5307.

[Aslan et al., 2014] Aslan, I., Uhl, A., Meschtscherjakov, A., and Tscheligi, M. (2014). Mid-air authentication gestures: an exploration of authentication based on palm and finger motions. In Proceedings of the 16th International Conference on Multimodal Interaction, pages 311–318. ACM.


[Azuma et al., 2001] Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. (2001). Recent advances in augmented reality. IEEE computer graphics and applications, 21(6):34–47.

[Azuma, 1997] Azuma, R. T. (1997). A survey of augmented reality. Presence: Teleoperators and virtual environments, 6(4):355–385.

[Baldassi et al., 2018] Baldassi, S., Kohno, T., Roesner, F., and Tian, M. (2018). Challenges and new directions in augmented reality, computer security, and neuroscience–part 1: Risks to sensation and perception. arXiv preprint arXiv:1806.10557.

[Benford et al., 1998] Benford, S., Greenhalgh, C., Reynard, G., Brown, C., and Koleva, B. (1998). Understanding and constructing shared spaces with mixed-reality boundaries. ACM Transactions on computer-human interaction (TOCHI), 5(3):185–223.

[Bosse and Zlot, 2009] Bosse, M. and Zlot, R. (2009). Keypoint design and evaluation for place recognition in 2d lidar maps. Robotics and Autonomous Systems, 57(12):1211–1224.

[Bosse and Zlot, 2013] Bosse, M. and Zlot, R. (2013). Place recognition using keypoint voting in large 3d lidar datasets. In 2013 IEEE International Conference on Robotics and Automation, pages 2677–2684. IEEE.

[Brkic et al., 2017] Brkic, K., Sikiric, I., Hrkac, T., and Kalafatic, Z. (2017). I know that person: Generative full body and face de-identification of people in images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1319–1328.

[Bronstein et al., 2010] Bronstein, A. M., Bronstein, M. M., Bustos, B., Castellani, U., Crisani, M., Falcidieno, B., Guibas, L. J., Kokkinos, I., Murino, V., et al. (2010). Shrec 2010: robust feature detection and description benchmark. Proc. EUROGRAPHICS Workshop on 3D Object Retrieval (3DOR).

[Butz et al., 1998] Butz, A., Beshers, C., and Feiner, S. (1998). Of vampire mirrors and privacy lamps: Privacy management in multi-user augmented environments. In Proceedings of the 11th annual ACM symposium on user interface software and technology, pages 171–172. ACM.

[Butz et al., 1999] Butz, A., Höllerer, T., Feiner, S., MacIntyre, B., and Beshers, C. (1999). Enveloping users and computers in a collaborative 3d augmented reality. In Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, IWAR '99, pages 35–, Washington, DC, USA. IEEE Computer Society.

[Chang et al., 2010] Chang, J. J.-Y., Li, M.-J., Wang, Y.-C., and Juan, J. S.-T. (2010). Two-image encryption by random grids. In Communications and Information Technologies (ISCIT), 2010 International Symposium on, pages 458–463. IEEE.

[Chatzopoulos et al., 2017] Chatzopoulos, D., Bermejo, C., Huang, Z., and Hui, P. (2017). Mobile augmented reality survey: From where we are to where we go. IEEE Access.

[Chauhan et al., 2017] Chauhan, J., Hu, Y., Seneviratne, S., Misra, A., Seneviratne, A., and Lee, Y. (2017). Breathprint: Breathing acoustics-based user authentication. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 278–291. ACM.

[Cheok et al., 2002] Cheok, A. D., Yang, X., Ying, Z. Z., Billinghurst, M., and Kato, H. (2002). Touch-space: Mixed reality game space based on ubiquitous, tangible, and social computing. Personal and Ubiquitous Computing, 6(5-6):430–442.

[Ciriani et al., 2010] Ciriani, V., Vimercati, S. D. C. D., Foresti, S., Jajodia, S., Paraboschi, S., and Samarati, P. (2010). Combining fragmentation and encryption to protect privacy in data storage. ACM Transactions on Information and System Security (TISSEC), 13(3):22.

[Crabtree et al., 2016] Crabtree, A., Lodge, T., Colley, J., Greenhalgh, C., Mortier, R., and Haddadi, H. (2016). Enabling the new economic actor: data protection, the digital economy, and the databox. Personal and Ubiquitous Computing, 20(6):947–957.

[de Guzman et al., 2019a] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2019a). Demo: Privacy-aware visual information protection for mobile mixed reality. In 2019 IEEE 41st Conference on Local Computer Networks (LCN). IEEE.

[de Guzman et al., 2019b] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2019b). A first look into privacy leakage in 3d mixed reality data. In European Symposium on Research in Computer Security, pages 149–169. Springer.

[de Guzman et al., 2019c] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2019c). Safemr: Privacy-aware visual information protection for mobile mixed reality. In 2019 IEEE 41st Conference on Local Computer Networks (LCN). IEEE.

[de Guzman et al., 2019d] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2019d). Security and privacy approaches in mixed reality: A literature survey. ACM Comput. Surv., 52(6):110:1–110:37.

[de Guzman et al., 2020a] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2020a). Analysing spatial inference risk over mixed reality spatial data. (Submitted) Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp).

[de Guzman et al., 2020b] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2020b). Conservative plane releasing for spatial privacy protection in mixed reality. arXiv preprint arXiv:2004.08029.

[de Guzman et al., 2020c] de Guzman, J. A., Thilakarathna, K., and Seneviratne, A. (2020c). Spatial privacy leakage in 3d mixed reality data. In (Accepted) 2020 Cyber Defence Next Generation Technology and Science Conference 2020 (CDNG 2020). CSIRO.

[de Montjoye et al., 2014] de Montjoye, Y.-A., Shmueli, E., Wang, S. S., and Pentland, A. S. (2014). openpds: Protecting the privacy of metadata through safeanswers. PloS one, 9(7):e98790.

[Deng et al., 2011] Deng, M., Wuyts, K., Scandariato, R., Preneel, B., and Joosen, W. (2011). A privacy threat analysis framework: supporting the elicitation and fulfillment of privacy requirements. Requirements Engineering, 16(1):3–32.

[DeVincenzi et al., 2011] DeVincenzi, A., Yao, L., Ishii, H., and Raskar, R. (2011). Kinected conference: augmenting video imaging with calibrated depth and audio. In Proceedings of the ACM 2011 conference on Computer supported cooperative work, pages 621–624. ACM.

[Dickinson, 2016] Dickinson, B. (2016). 5 authentication methods putting passwords to shame.

[Dinh and Xu, 2008] Dinh, H. Q. and Xu, L. (2008). Measuring the similarity of vector fields using global distributions. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), pages 187–196. Springer.

[Dorfmuller-Ulhaas and Schmalstieg, 2001] Dorfmuller-Ulhaas, K. and Schmalstieg, D. (2001). Finger tracking for interaction in augmented environments. In Augmented Reality, 2001. Proceedings. IEEE and ACM International Symposium on, pages 55–64. IEEE.

[Du et al., 2014] Du, L., Yi, M., Blasch, E., and Ling, H. (2014). Garp-face: Balancing privacy protection and utility preservation in face de-identification. In Biometrics (IJCB), 2014 IEEE International Joint Conference on, pages 1–8. IEEE.

[Dwork et al., 2014] Dwork, C., Roth, A., et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407.

[Eaddy et al., 2004] Eaddy, M., Blasko, G., Babcock, J., and Feiner, S. (2004). My own private kiosk: Privacy-preserving public displays. In Wearable Computers, 2004. ISWC 2004. Eighth International Symposium on, volume 1, pages 132–135. IEEE.

[Ens et al., 2015] Ens, B., Grossman, T., Anderson, F., Matejka, J., and Fitzmaurice, G. (2015). Candid interaction: Revealing hidden mobile and wearable computing activities. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 467–476. ACM.

[Erlingsson et al., 2014] Erlingsson, Ú., Pihur, V., and Korolova, A. (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067. ACM.

[Fang and Chang, 2010] Fang, C. and Chang, E.-C. (2010). Securing interactive sessions using mobile device through visual channel and visual inspection. In Proceedings of the 26th Annual Computer Security Applications Conference, pages 69–78. ACM.

[Felt et al., 2012] Felt, A. P., Ha, E., Egelman, S., Haney, A., Chin, E., and Wagner, D. (2012). Android permissions: User attention, comprehension, and behavior. In Proceedings of the eighth symposium on usable privacy and security, page 3. ACM.

[Figueiredo et al., 2016] Figueiredo, L. S., Livshits, B., Molnar, D., and Veanes, M. (2016). Prepose: Privacy, security, and reliability for gesture-based programming. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 122–137. IEEE.

[Fischler and Bolles, 1981] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395.

[Forte et al., 2014] Forte, A. G., Garay, J. A., Jim, T., and Vahlis, Y. (2014). Eyedecrypt – private interactions in plain sight. In International Conference on Security and Cryptography for Networks, pages 255–276. Springer.

[Friedman and Kahn Jr, 2000] Friedman, B. and Kahn Jr, P. H. (2000). New directions: A value-sensitive design approach to augmented reality. In Proceedings of DARE 2000 on designing augmented reality environments, pages 163–164. ACM.

[Gaebel et al., 2016] Gaebel, E., Zhang, N., Lou, W., and Hou, Y. T. (2016). Looks good to me: Authentication for augmented reality. In Proceedings of the 6th International Workshop on Trustworthy Embedded Devices, pages 57–67. ACM.

[George et al., 2017] George, C., Khamis, M., von Zezschwitz, E., Burger, M., Schmidt, H., Alt, F., and Hussmann, H. (2017). Seamless and secure vr: Adapting and evaluating established authentication systems for virtual reality.

[Gross et al., 2006] Gross, R., Sweeney, L., De la Torre, F., and Baker, S. (2006). Model-based face de-identification. In null, page 161. IEEE.

[Gross et al., 2008] Gross, R., Sweeney, L., de la Torre, F., and Baker, S. (2008). Semi-supervised learning of multi-factor models for face de-identification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8.

[Hayashi et al., 2010] Hayashi, M., Yoshida, R., Kitahara, I., Kameda, Y., and Ohta, Y. (2010). An installation of privacy-safe see-through vision. Procedia-Social and Behavioral Sciences, 2(1):125–128.

[He et al., 2007] He, W., Liu, X., Nguyen, H., Nahrstedt, K., and Abdelzaher, T. (2007). Pda: Privacy-preserving data aggregation in wireless sensor networks. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 2045–2053. IEEE.

[He et al., 2011] He, W., Liu, X., Nguyen, H. V., Nahrstedt, K., and Abdelzaher, T. (2011). Pda: privacy-preserving data aggregation for information collection. ACM Transactions on Sensor Networks (TOSN), 8(1):6.

[Heimo et al., 2014] Heimo, O. I., Kimppa, K. K., Helle, S., Korkalainen, T., and Lehtonen, T. (2014). Augmented reality-towards an ethical fantasy? In Ethics in Science, Technology and Engineering, 2014 IEEE International Symposium on, pages 1–7. IEEE.

[Henrysson et al., 2005] Henrysson, A., Billinghurst, M., and Ollila, M. (2005). Face to face collaborative ar on mobile phones. In Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’05), pages 80–89.

[Hoh and Gruteser, 2005] Hoh, B. and Gruteser, M. (2005). Protecting location privacy through path confusion. In First International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM'05), pages 194–205. IEEE.

[Howard and Lipner, 2006] Howard, M. and Lipner, S. (2006). The security development lifecycle, volume 8. Microsoft Press Redmond.

[Hsu et al., 2011] Hsu, C.-Y., Lu, C.-S., and Pei, S.-C. (2011). Homomorphic encryption-based secure sift for privacy-preserving feature extraction. In Media Watermarking, Security, and Forensics III, volume 7880, page 788005. International Society for Optics and Photonics.

[Huang et al., 2017] Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In IEEE CVPR.

[Huang and You, 2012] Huang, J. and You, S. (2012). Point cloud matching based on 3d self-similarity. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 41–48. IEEE.

[Huang et al., 2011] Huang, Y., Evans, D., Katz, J., and Malka, L. (2011). Faster secure two-party computation using garbled circuits. In USENIX Security Symposium, volume 201.

[Jana et al., 2013a] Jana, S., Molnar, D., Moshchuk, A., Dunn, A. M., Livshits, B., Wang, H. J., and Ofek, E. (2013a). Enabling fine-grained permissions for augmented reality applications with recognizers. In USENIX Security.

[Jana et al., 2013b] Jana, S., Narayanan, A., and Shmatikov, V. (2013b). A scanner darkly: Protecting user privacy from perceptual applications. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 349–363. IEEE.

[Jensen et al., 2019] Jensen, J., Hu, J., Rahmati, A., and LiKamWa, R. (2019). Protecting visual information in augmented reality from malicious application developers. In The 5th ACM Workshop on Wearable Systems and Applications, pages 23–28.

[Jiang et al., 2017] Jiang, L., Xu, C., Wang, X., Luo, B., and Wang, H. (2017). Secure outsourcing sift: Efficient and privacy-preserving image feature extraction in the encrypted domain. IEEE Transactions on Dependable and Secure Computing.

[Johnson and Hebert, 1998] Johnson, A. E. and Hebert, M. (1998). Surface matching for object recognition in complex three-dimensional scenes. Image and Vision Computing, 16(9-10):635–651.

[Johnson and Hebert, 1999] Johnson, A. E. and Hebert, M. (1999). Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis & Machine Intelligence, (5):433–449.

[Kalloniatis et al., 2008] Kalloniatis, C., Kavakli, E., and Gritzalis, S. (2008). Addressing privacy requirements in system design: the pris method. Requirements Engineering, 13(3):241–255.

[Khamis et al., 2016] Khamis, M., Alt, F., Hassib, M., von Zezschwitz, E., Hasholzner, R., and Bulling, A. (2016). Gazetouchpass: Multimodal authentication using gaze and touch on mobile devices. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 2156–2164. ACM.

[Kohno et al., 2016] Kohno, T., Kollin, J., Molnar, D., and Roesner, F. (2016). Display leakage and transparent wearable displays: Investigation of risk, root causes, and defenses. Technical report.

[Kress and Starner, 2013] Kress, B. and Starner, T. (2013). A review of head-mounted displays (hmd) technologies and applications for consumer electronics. In Proc. SPIE, volume 8720, page 87200A.

[Lantz et al., 2015] Lantz, P., Johansson, B., Hell, M., and Smeets, B. (2015). Visual cryptography and obfuscation: A use-case for decrypting and deobfuscating information using augmented reality. In International Conference on Financial Cryptography and Data Security, pages 261–273. Springer.

[Lebeck et al., 2016] Lebeck, K., Kohno, T., and Roesner, F. (2016). How to safely augment reality: Challenges and directions. In Proceedings of the 17th International Workshop on Mobile Computing Systems and Applications, pages 45–50. ACM.

[Lebeck et al., 2017] Lebeck, K., Ruth, K., Kohno, T., and Roesner, F. (2017). Securing augmented reality output. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 320–337. IEEE.

[Lebeck et al., 2018] Lebeck, K., Ruth, K., Kohno, T., and Roesner, F. (2018). Towards security and privacy for multi-user augmented reality: Foundations with end users. In Towards Security and Privacy for Multi-User Augmented Reality: Foundations with End Users, page 0. IEEE.

[Lee et al., 2015] Lee, L., Egelman, S., Lee, J. H., and Wagner, D. (2015). Risk perceptions for wearable devices.

[Li et al., 2016a] Li, A., Li, Q., and Gao, W. (2016a). Privacycamera: Cooperative privacy-aware photographing with mobile phones. In Sensing, Communication, and Networking (SECON), 2016 13th Annual IEEE International Conference on, pages 1–9. IEEE.

[Li et al., 2016b] Li, S., Ashok, A., Zhang, Y., Xu, C., Lindqvist, J., and Gruteser, M. (2016b). Whose move is it anyway? authenticating smart wearable devices using unique head movement patterns. In Pervasive Computing and Communications (PerCom), 2016 IEEE International Conference on, pages 1–9. IEEE.

[Lin et al., 2017] Lin, P.-Y., You, B., and Lu, X. (2017). Video exhibition with adjustable augmented reality system based on temporal psycho-visual modulation. EURASIP Journal on Image and Video Processing, 2017(1):7.

[Lowe, 2004] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110.

[Maiti et al., 2016] Maiti, A., Armbruster, O., Jadliwala, M., and He, J. (2016). Smartwatch-based keystroke inference attacks and context-aware protection mechanisms. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pages 795–806. ACM.

[Maiti et al., 2017] Maiti, A., Jadliwala, M., and Weber, C. (2017). Preventing shoulder surfing using randomized augmented reality keyboards. In Pervasive Computing and Communications Workshops (PerCom Workshops), 2017 IEEE International Conference on, pages 630–635. IEEE.

[Matsuda, 2016] Matsuda, K. (2016). Hyper-reality.

[McSherry and Talwar, 2007] McSherry, F. and Talwar, K. (2007). Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS'07. 48th Annual IEEE Symposium on, pages 94–103. IEEE.

[Mohan et al., 2012] Mohan, P., Thakurta, A., Shi, E., Song, D., and Culler, D. (2012). Gupt: privacy preserving data analysis made easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 349–360. ACM.

[Möllers and Borchers, 2011] Möllers, M. and Borchers, J. (2011). Taps widgets: interacting with tangible private spaces. In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pages 75–78. ACM.

[Morris et al., 2006a] Morris, M. R., Cassanego, A., Paepcke, A., Winograd, T., Piper, A. M., and Huang, A. (2006a). Mediating group dynamics through tabletop interface design. IEEE computer graphics and applications, 26(5):65–73.

[Morris et al., 2006b] Morris, M. R., Huang, A., Paepcke, A., and Winograd, T. (2006b). Cooperative gestures: multi-user gestural interactions for co-located groupware. In Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 1201–1210. ACM.

[Mulloni et al., 2008] Mulloni, A., Wagner, D., and Schmalstieg, D. (2008). Mobility and social interaction as core gameplay elements in multi-player augmented reality. In Proceedings of the 3rd international conference on Digital Interactive Media in Entertainment and Arts, pages 472–478. ACM.

[Mun et al., 2010] Mun, M., Hao, S., Mishra, N., Shilton, K., Burke, J., Estrin, D., Hansen, M., and Govindan, R. (2010). Personal data vaults: a locus of control for personal data streams. In Proceedings of the 6th International Conference, page 17. ACM.

[Newton et al., 2005] Newton, E. M., Sweeney, L., and Malin, B. (2005). Preserving privacy by de-identifying face images. IEEE transactions on Knowledge and Data Engineering, 17(2):232–243.

[Oh et al., 2016] Oh, S. J., Benenson, R., Fritz, M., and Schiele, B. (2016). Faceless person recognition: Privacy implications in social media. In European Conference on Computer Vision, pages 19–35. Springer.

[Ohbuchi et al., 2005] Ohbuchi, R., Minamitani, T., and Takei, T. (2005). Shape-similarity search of 3d models by using enhanced shape functions. International Journal of Computer Applications in Technology, 23(2-4):70–85.

[Osada et al., 2002] Osada, R., Funkhouser, T., Chazelle, B., and Dobkin, D. (2002). Shape distributions. ACM Transactions on Graphics (TOG), 21(4):807–832.

[Pearson et al., 2017] Pearson, J., Robinson, S., Jones, M., Joshi, A., Ahire, S., Sahoo, D., and Subramanian, S. (2017). Chameleon devices: investigating more secure and discreet mobile interactions via active camouflaging. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 5184–5196. ACM.

[Pittaluga et al., 2019] Pittaluga, F., Koppal, S. J., Kang, S. B., and Sinha, S. N. (2019). Revealing scenes by inverting structure from motion reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 145–154.

[Qi et al., 2017a] Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017a). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660.

[Qi et al., 2017b] Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017b). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pages 5099–5108.

[Qin et al., 2014] Qin, Z., Yan, J., Ren, K., Chen, C. W., and Wang, C. (2014). Towards efficient privacy-preserving image feature extraction in cloud computing. In Proceedings of the 22nd ACM international conference on Multimedia, pages 497–506. ACM.

[Qin et al., 2016] Qin, Z., Yan, J., Ren, K., Chen, C. W., and Wang, C. (2016). Secsift: Secure image sift feature extraction in cloud computing. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 12(4s):65.

[Ra et al., 2013] Ra, M.-R., Govindan, R., and Ortega, A. (2013). P3: Toward privacy-preserving photo sharing. In Nsdi, pages 515–528.

[Rabbi and Ullah, 2013] Rabbi, I. and Ullah, S. (2013). A survey on augmented reality challenges and tracking. Acta Graphica znanstveni časopis za tiskarstvo i grafičke komunikacije, 24(1-2):29–46.

[Raja et al., 2015] Raja, K. B., Raghavendra, R., Stokkenes, M., and Busch, C. (2015). Multi-modal authentication system for smartphones using face, iris and periocular. In Biometrics (ICB), 2015 International Conference on, pages 143–150. IEEE.

[Raval et al., 2014] Raval, N., Srivastava, A., Lebeck, K., Cox, L., and Machanavajjhala, A. (2014). Markit: Privacy markers for protecting visual secrets. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 1289–1295. ACM.

[Raval et al., 2016] Raval, N., Srivastava, A., Razeen, A., Lebeck, K., Machanavajjhala, A., and Cox, L. P. (2016). What you mark is what apps see. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, pages 249–261. ACM.

[Reilly et al., 2014] Reilly, D., Salimian, M., MacKay, B., Mathiasen, N., Edwards, W. K., and Franz, J. (2014). Secspace: prototyping usable privacy and security for mixed reality collaborative environments. In Proceedings of the 2014 ACM SIGCHI symposium on Engineering interactive computing systems, pages 273–282. ACM.

[Ren et al., 2016] Ren, J., Rao, A., Lindorfer, M., Legout, A., and Choffnes, D. (2016). Recon: Revealing and controlling pii leaks in mobile network traffic. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, pages 361–374. ACM.

[Roesner et al., 2014a] Roesner, F., Denning, T., Newell, B. C., Kohno, T., and Calo, R. (2014a). Augmented reality: hard problems of law and policy. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing: adjunct publication, pages 1283–1288. ACM.

[Roesner et al., 2014b] Roesner, F., Kohno, T., and Molnar, D. (2014b). Security and privacy for augmented reality systems. Commun. ACM, 57(4):88–96.

[Roesner et al., 2014c] Roesner, F., Molnar, D., Moshchuk, A., Kohno, T., and Wang, H. J. (2014c). World-driven access control for continuous sensing. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1169–1181. ACM.

[Rogers et al., 2015] Rogers, C. E., Witt, A. W., Solomon, A. D., and Venkatasubramanian, K. K. (2015). An approach for user identification for head-mounted displays. In Proceedings of the 2015 ACM International Symposium on Wearable Computers, pages 143–146. ACM.

[Rublee et al., 2011] Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). Orb: An efficient alternative to sift or surf. In Computer Vision (ICCV), 2011 IEEE international conference on, pages 2564–2571. IEEE.

[Samarati, 2001] Samarati, P. (2001). Protecting respondents identities in microdata release. IEEE transactions on Knowledge and Data Engineering, 13(6):1010–1027.

[Samarati and Sweeney, 1998] Samarati, P. and Sweeney, L. (1998). Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report.

[Saputra et al., 2018] Saputra, M. R. U., Markham, A., and Trigoni, N. (2018). Visual slam and structure from motion in dynamic environments: A survey. ACM Computing Surveys (CSUR), 51(2):37.

[Schneegass et al., 2016] Schneegass, S., Oualil, Y., and Bulling, A. (2016). Skullconduct: Biometric user identification on eyewear computers using bone conduction through the skull. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 1379–1384. ACM.

[Scott et al., 2004] Scott, S. D., Carpendale, M. S. T., and Inkpen, K. M. (2004). Territoriality in collaborative tabletop workspaces. In Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 294–303. ACM.

[Sekhavat, 2017] Sekhavat, Y. A. (2017). Privacy preserving cloth try-on using mobile augmented reality. IEEE Transactions on Multimedia, 19(5):1041–1049.

[Seneviratne et al., 2014] Seneviratne, S., Seneviratne, A., Mohapatra, P., and Mahanti, A. (2014). Predicting user traits from a snapshot of apps installed on a smartphone. ACM SIGMOBILE Mobile Computing and Communications Review, 18(2):1–8.

[Shokri et al., 2011] Shokri, R., Theodorakopoulos, G., Le Boudec, J.-Y., and Hubaux, J.-P. (2011). Quantifying location privacy. In 2011 IEEE symposium on security and privacy, pages 247–262. IEEE.

[Shu et al., 2016] Shu, J., Zheng, R., and Hui, P. (2016). Cardea: Context-aware visual privacy protection from pervasive cameras. arXiv preprint arXiv:1610.00889.

[Simkin et al., 2014] Simkin, M., Schröder, D., Bulling, A., and Fritz, M. (2014). Ubic: Bridging the gap between digital cryptography and the physical world. In European Symposium on Research in Computer Security, pages 56–75. Springer.

[Sluganovic et al., 2017] Sluganovic, I., Serbec, M., Derek, A., and Martinovic, I. (2017). Holopair: Securing shared augmented reality using Microsoft HoloLens. In Proceedings of the 33rd Annual Computer Security Applications Conference, ACSAC 2017, pages 250–261, New York, NY, USA. ACM.

[Speciale et al., 2019] Speciale, P., Schonberger, J. L., Kang, S. B., Sinha, S. N., and Pollefeys, M. (2019). Privacy preserving image-based localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5493–5503.

[Sun et al., 2009] Sun, J., Ovsjanikov, M., and Guibas, L. (2009). A concise and provably informative multi-scale signature based on heat diffusion. In Computer graphics forum, volume 28, pages 1383–1392. Wiley Online Library.

[Szalavári et al., 1998] Szalavári, Z., Eckstein, E., and Gervautz, M. (1998). Collaborative gaming in augmented reality. In Proceedings of the ACM symposium on Virtual reality software and technology, pages 195–204. ACM.

[Szalavári and Gervautz, 1997] Szalavári, Z. and Gervautz, M. (1997). The personal interaction panel–a two-handed interface for augmented reality. In Computer graphics forum, volume 16. Wiley Online Library.

[Szczuko, 2014] Szczuko, P. (2014). Augmented reality for privacy-sensitive visual monitoring. In International Conference on Multimedia Communications, Services and Security, pages 229–241. Springer.

[Templeman et al., 2014] Templeman, R., Korayem, M., Crandall, D. J., and Kapadia, A. (2014). Placeavoider: Steering first-person cameras away from sensitive spaces. In NDSS.

[Thomas and Piekarski, 2002] Thomas, B. H. and Piekarski, W. (2002). Glove based user interaction techniques for augmented reality in an outdoor environment. Virtual Reality, 6(3):167–180.

[Truong et al., 2005] Truong, K., Patel, S., Summet, J., and Abowd, G. (2005). Preventing camera recording by designing a capture-resistant environment. UbiComp 2005: Ubiquitous Computing, pages 903–903.

[Uy and Lee, 2018] Uy, M. A. and Lee, G. H. (2018). Pointnetvlad: Deep point cloud based retrieval for large-scale place recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[Venkatasubramanian et al., 2010] Venkatasubramanian, K. K., Banerjee, A., and Gupta, S. K. S. (2010). Pska: Usable and secure key agreement scheme for body area networks. IEEE Transactions on Information Technology in Biomedicine, 14(1):60–68.

[Vilk et al., 2015] Vilk, J., Molnar, D., Livshits, B., Ofek, E., Rossbach, C., Moshchuk, A., Wang, H. J., and Gal, R. (2015). Surroundweb: Mitigating privacy concerns in a 3d web browser. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 431–446. IEEE.

[Vilk et al., 2014] Vilk, J., Molnar, D., Ofek, E., Rossbach, C., Livshits, B., Moshchuk, A., Wang, H. J., and Gal, R. (2014). Least privilege rendering in a 3d web browser. Microsoft Research Technical Report MSR-TR-2014-25.

[Waegel, 2014] Waegel, K. (2014). [poster] a reconstructive see-through display. In Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on, pages 319–320. IEEE.

[Wagner and Eckhoff, 2018] Wagner, I. and Eckhoff, D. (2018). Technical privacy metrics: A systematic survey. ACM Comput. Surv., 51(3):57:1–57:38.

[Wang et al., 2017] Wang, J., Amos, B., Das, A., Pillai, P., Sadeh, N., and Satyanarayanan, M. (2017). A scalable and privacy-aware iot service for live video analytics.

[Woo et al., 2012] Woo, G., Lippman, A., and Raskar, R. (2012). Vrcodes: Unobtrusive and active visual codes for interaction by exploiting rolling shutter. In Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, pages 59–64. IEEE.

[Wu and Balakrishnan, 2003] Wu, M. and Balakrishnan, R. (2003). Multi-finger and whole hand gestural interaction techniques for multi-user tabletop displays. In Proceedings of the 16th annual ACM symposium on User interface software and technology, pages 193–202. ACM.

[Wu et al., 2018] Wu, Y., Yang, F., and Ling, H. (2018). Privacy-protective-gan for face de-identification. arXiv preprint arXiv:1806.08906.

[Xu et al., 2015] Xu, C., Pathak, P. H., and Mohapatra, P. (2015). Finger-writing with smartwatch: A case for finger and hand gesture recognition using smartwatch. In Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications, pages 9–14. ACM.

[Xu et al., 2008] Xu, Y., Gandy, M., Deen, S., Schrank, B., Spreen, K., Gorbsky, M., White, T., Barba, E., Radu, I., Bolter, J., et al. (2008). Bragfish: exploring physical and social interaction in co-located handheld augmented reality games. In Proceedings of the 2008 international conference on advances in computer entertainment technology, pages 276–283. ACM.

[Xu and Zhu, 2015] Xu, Z. and Zhu, S. (2015). Semadroid: A privacy-aware sensor management framework for smartphones. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, pages 61–72. ACM.

[Yao, 1986] Yao, A. C.-C. (1986). How to generate and exchange secrets. In Foundations of Computer Science, 1986., 27th Annual Symposium on, pages 162–167. IEEE.

[Yerazunis and Carbone, 2002] Yerazunis, W. and Carbone, M. (2002). Privacy-enhanced displays by time-masking images.

[Zarepour et al., 2016] Zarepour, E., Hosseini, M., Kanhere, S. S., and Sowmya, A. (2016). A context-based privacy preserving framework for wearable visual lifeloggers. In Pervasive Computing and Communication Workshops (PerCom Workshops), 2016 IEEE International Conference on, pages 1–4. IEEE.

[Zhang et al., 2014] Zhang, L., Jung, T., Feng, P., Li, X.-Y., and Liu, Y. (2014). Cloud-based privacy preserving image storage, sharing and search. arXiv preprint arXiv:1410.6593.

[Zhao and Seah, 2016] Zhao, J. and Seah, H. S. (2016). Interaction in marker-less augmented reality based on hand detection using Leap Motion. In Proceedings of the 15th ACM SIGGRAPH Conference on Virtual-Reality Continuum and Its Applications in Industry-Volume 1, pages 147–150. ACM.

[Ziad et al., 2016] Ziad, M. T. I., Alanwar, A., Alzantot, M., and Srivastava, M. (2016). Cryptoimg: Privacy preserving processing over encrypted images. In Communications and Network Security (CNS), 2016 IEEE Conference on, pages 570–575. IEEE.

[Zyskind et al., 2015] Zyskind, G., Nathan, O., et al. (2015). Decentralizing privacy: Using blockchain to protect personal data. In Security and Privacy Workshops (SPW), 2015 IEEE, pages 180–184. IEEE.

Appendix A

Definitions of the General Security and Privacy Properties

We define here the security and privacy properties employed in Chapter 2.†

1. Integrity – The data storage, flow, or process in MR is not and cannot be tampered with or modified. This ensures that, for example, visual targets are detected correctly and the appropriate augmentations are displayed accordingly. No unauthorized party should be able to modify any of these elements in an MR system.

2. Non-repudiation – Any modification or generation of data, flow, or process cannot be denied, especially if the entity is essential or an adversary was able to perform such modifications or actions. When necessary, the entity behind the action should be identifiable and should not be able to deny that it was their action. In privacy, however, the converse is desired.

3. Availability – All necessary data, flow, or process for an MR system should be available in order to satisfy and accomplish the targeted or intended service. An adversary should not be able to impede the availability of these entities or resources.

4. Authorization and Access Control – All actions or processes should originate from authorized and verifiable parties, and should be actuated according to their appropriate access privileges; for example, only augmentations from applications that have been authorized to deliver augmentations should be rendered.

†These definitions were originally presented in our survey paper [de Guzman et al., 2019d].


For example, only the authorized application should be able to access the cup and its data in the sample MR environment in Figure 2.1. The integrity of the cup and its data should be ensured and, if modifications were made, the modifying agent should not be able to repudiate their action. The resulting modification should also not lead to non-availability of the service.

5. Identification – All actions should be attributed to the corresponding entity, i.e. user or party. In a security context, this is interrelated with the authorization and authentication properties: verified identities are used for authorizing access control. Unidentified parties can be treated as adversaries to prevent unidentifiable and untraceable attacks. In sensitive situations, e.g. after an attack has occurred, anonymity is not desired.

6. Authentication – Only the legitimate users of the device or service should be allowed to access the MR device or service. Their authenticity should be verified through an authentication method; identification and authorization can then follow a successful authentication.

7. Confidentiality – All actions involving sensitive or personal data, flows, or processes should follow the necessary authorization and access control policies. Parties that are not authorized should not have access to these confidential elements. All elements can be assumed confidential, especially those storing or processing personal and re-identifiable data.

8. Anonymity & Pseudonymity – Entities should be able to remove their association or relationship to the data stored, flowing, or processed. Likewise, a pseudonym can be used to link the entities but should not be linkable back to specific identities. Moreover, an adversary should not be able to identify the user from combinations of information from these elements. However, in the security context, anonymity is not desired, especially when adversaries need to be identified.

9. Unlinkability – Any link or relationship of the entity, i.e. user or party, to the data stored, flow, or process as well as with other entities (e.g. data to data, data to flow, and so on) cannot be identified or distinguished by an adversary. For example, co-located MR users, who want to keep their co-location (linking) information private, should not be identifiable by an adversary.

10. Undetectability & Unobservability – An entity's existence cannot be ascertained or distinguished by an attacker; an entity can be deemed unobservable or undetectable by an adversary; or the entity cannot be distinguished from randomly generated entities. For example, an MR game like Pokemon Go needs access to the camera view of the user device to determine the ground plane and place the Pokemon on the ground as viewed by the user, but the game does not need to know what other objects are within view.

The unlinkability, undetectability, and unobservability properties can be extended to include latent privacy protection: the protection of entities that are not necessitated by an application or service but can be in the same domain as its target entities. This includes bystander privacy.

11. Plausible Deniability – An entity should be able to deny that they are the originator of a process, data flow, or data storage. This is the converse of the non-repudiation security property; indeed, repudiation is the corresponding threat to non-repudiation. Plausible deniability is essential when the relationship of an entity (as originator) to stored or processed data should not be divulged; on the other hand, non-repudiation is essential when, for example, an action needs to be attributed. In the case of the cup and its data in the previous example, if the authorized application connects with a health insurance provider and transmits aggregated, anonymized data about the cup, the resulting aggregated data should not be linkable to the users from which it originated. Hence, the users are provided with plausible deniability over user-specific sensitive information that may have been included in the aggregated transmission or, with the help of machine learning, inferred by, say, the health insurance provider.

12. Content Awareness – The user as an entity should be aware of all data flows or processes they divulge, especially those that are sensitive, and should be aware of exactly which information they have released in order to obtain the required functionality. In the MR context, for example, the user should be made aware that the MR application captured not only the spatial information around them but also the visual information.

13. Policy and Consent Compliance – An MR system should follow the policies that aim to protect the user's privacy and security, and there should be a guarantee that third-party applications or services follow these policies.

Appendix B

Preliminary work on 3D Description and Inference

For our preliminary work, which we published in [de Guzman et al., 2019b], we used a slightly different framework from what was described in Chapter 3. We initially focused on inter-space privacy using a predecessor version of the NN-matcher attacker. In this appendix chapter, we expound on our preliminary exploration of 3D spatial description and how it was utilized to demonstrate the spatial privacy leakage.

B.1 Preliminary 3D privacy problem

We use the same notations listed in Table 3.1 in Section 3.1.

Defining the function utility. For a given functionality $G$, an effective mechanism $M$ aims to make the resulting outputs $y$ from the raw point cloud $s_i$ and its privacy-preserving version $\tilde{s}_{(i)}$ similar, i.e. $y_{s_i} \simeq y_{\tilde{s}_{(i)}}$, or their difference small, $D(S;\tilde{S}) = |y_{s_i} - y_{\tilde{s}_{(i)}}| \to 0$. In terms of a utility function $Q$ which we intend to maximize (i.e. as close to 1 as possible if we assume that $D(S;\tilde{S}) \le 1$),

$$Q(S;\tilde{S}) = 1 - D(S;\tilde{S}), \quad \text{where } \tilde{S} = M(S). \tag{B.1}$$

The most common functionality in MR is the anchoring of virtual 3D objects onto real-world surfaces (e.g. the floor, walls, or tables), which requires near-truth 3D point cloud representations to provide consistent anchored augmentations.

Defining the adversarial inferrer. An inferrer $J$ produces a hypothesis $H: i^* = i$ about the true location $i$ of a given set of point clouds, $s_{i^*}$ or $\tilde{s}_{(i^*)}$, for any query space $i^*$ (i.e. $J: s_{i^*}$ or $\tilde{s}_{(i^*)} \to H$ for any $i^*: i^* = i$) where the following inequality holds

$$P(h: i^* = i \mid s_{i^*} \text{ or } \tilde{s}_{(i^*)}) > P(h: i^* = i^o, \text{ for any } i^o \ne i \mid s_{i^*} \text{ or } \tilde{s}_{(i^*)}). \tag{B.2}$$

The privacy-utility problem. Consequently, we can now pose the following privacy function $\Pi$ in terms of the error rate of the inferrer,

$$\Pi(S;\tilde{S}) = \operatorname*{mean}_{\text{iterations}} \frac{|h: i_{\tilde{s}} \ne i_s|}{|\forall i|}, \tag{B.3}$$

which is simply the mean misclassification rate of an inferrer $J$ about the query space $\tilde{S}$ whose true identity is $i_s$. A desired $M$ produces an $\tilde{S}$ that maximizes both the privacy function $\Pi$ and the utility function $Q$.

Privacy and utility metrics. Now, we define the specific privacy and utility metrics for this work. For privacy, we use the same notion of a high error rate as high privacy; thus, the same metric defined by Eq. B.3 holds. For utility, we use the same similarity definition defined by Eq. B.1 but define the specific components of the similarity function as

$$Q(S;\tilde{S}) = \operatorname{mean}\big(\alpha \cdot (1 - \|s - \tilde{s}\|) + \beta \cdot (\vec{n}_s \cdot \vec{n}_{\tilde{s}})\big) \tag{B.4}$$

where the first component is the 3D point similarity of the true/raw points $S$ to the transformed points $\tilde{S}$, the second component is their normal vector similarity, and $\alpha$ and $\beta$ are contribution weights with $\alpha, \beta \in [0, 1]$ and $\alpha + \beta = 1$. We set $\alpha = \beta = 0.5$. We also insert a subjective acceptability metric $\epsilon \in [0, 1]$ like so

$$Q(S;\tilde{S}) = \operatorname{mean}\Big[\alpha \cdot \Big(1 - \big\lceil \|s - \tilde{s}\| \big\rceil^{\epsilon}\Big) + \beta \cdot \big\lfloor \vec{n}_s \cdot \vec{n}_{\tilde{s}} \big\rfloor_{1-\epsilon}\Big] \tag{B.5}$$

where $\lceil\cdot\rceil^{\epsilon}$ rounds a point deviation beyond $\epsilon$ up to 1, and $\lfloor\cdot\rfloor_{1-\epsilon}$ rounds a normal similarity below $1-\epsilon$ down to 0. $\epsilon$ allows us to specify the acceptable level of error or deviation of the released (i.e. generalized) spaces from the true space – any deviation beyond the set $\epsilon$ results in zero utility. The range of $Q(S;\tilde{S})$ is $[0, 1]$.
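To make the metric concrete, the following is a minimal sketch in Python with NumPy (both assumed here, as is the one-to-one pairing of raw and released points; the helper name utility_q is ours, not from our evaluation code):

import numpy as np

def utility_q(points, points_released, normals, normals_released,
              alpha=0.5, beta=0.5, eps=0.1):
    """Subjective-acceptability utility of Eq. B.5 (sketch).

    points, points_released: (N, 3) paired raw and released points.
    normals, normals_released: (N, 3) paired unit normal vectors.
    Any point deviation beyond eps, or normal similarity below
    1 - eps, contributes zero utility.
    """
    dist = np.linalg.norm(points - points_released, axis=1)
    # Point term: 1 - distance, zeroed once the deviation exceeds eps.
    point_sim = np.where(dist <= eps, 1.0 - dist, 0.0)
    # Normal term: dot product, zeroed below the 1 - eps threshold.
    normal_sim = np.einsum('ij,ij->i', normals, normals_released)
    normal_sim = np.where(normal_sim >= 1.0 - eps, normal_sim, 0.0)
    return float(np.mean(alpha * point_sim + beta * normal_sim))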

B.2 Describing the 3D space

Collected 3D point clouds can be used as a reference by an adversary to train an inference model. Features that describe and discriminate among 3D spaces are usually used for inference modelling, and a 3D point cloud carries enough such features for it to be directly used as a 3D descriptor, albeit a crude one that is not translation- and rotation-invariant by itself. Hence, invariant descriptors are necessary for adversarial inference models to be resilient to transformations.

[Figure B.1: 3D coordinate systems. (a) Local spherical coordinate system for self-similarity descriptors; (b) local cylindrical coordinate system for spin image descriptors.]

To provide invariance, we utilize existing 3D description algorithms.1 The curvature-reliant self-similarity (SS) descriptors [Huang and You, 2012] are very sensitive to point cloud variations due to the curvature estimation. To counter this, we explored the use of non-curvature-reliant spin image (SI) descriptors [Johnson and Hebert, 1998, Johnson and Hebert, 1999]. SI descriptors only use the normal vector, unlike the SS approach which uses local curvature maxima for key point selection. Thus, a vanilla SI computes the descriptor for every point in the point cloud, which produces a dense descriptor space. For our SI implementation, we extract key points and descriptors from the space subsampled by a factor of 3 (Fig. B.3 shows that significant errors only appear at resolutions < 3) to create a lighter-weight descriptor set. Also, the spinning effect reduces the impact of variations within that spin, which makes SI descriptors more robust than SS descriptors. Furthermore, as described in Section 4.3, plane generalization removes curvatures, which makes their use as geometric description information impractical. Validation of the inference performance of these descriptors is detailed in §B.3.3.

B.2.1 Self-similarity-based 3D descriptors

The description process of the self-similarity-based 3D descriptor starts with the computation of the local point $x_i$'s self-similarity with its neighbours within a local region,

$$N_{x_i} = \{x_j, j \ne i : \|x_j - x_i\| < r\},$$

$$\operatorname{localsim}(x_i) = \operatorname*{mean}_{\forall x_j \in N_{x_i}} \big[\alpha \cdot \operatorname{normalsim}_{x_i} + \beta \cdot \operatorname{curvsim}_{x_i}\big],$$

where $\alpha$ and $\beta$ are similarity coefficients (which can be bounded by $\alpha + \beta = 1$, and are set to $\alpha = \beta = 0.5$), and

$$\operatorname{normalsim}_{x_i} = \big[\pi - \cos^{-1}(\vec{n}_{x_i} \cdot \vec{n}_{x_j})\big]/\pi$$
$$\operatorname{curvsim}_{x_i} = 1 - |\vec{c}_{x_i} - \vec{c}_{x_j}|.$$

Then, a point is chosen as a key point if its curvature is a local maximum. For every chosen key point, a spherical descriptor is computed using the self-similarity. To introduce rotation-invariance, a local coordinate reference, as shown in Figure B.1a, is computed: the key point $x_i$ is set as the origin, the z-axis is the normal vector $\vec{n}_{x_i}$, the x-axis is the direction of the principal curvature vector $\vec{c}_{x_i}$, and the y-axis is the cross product $\vec{n}_{x_i} \times \vec{c}_{x_i}$. The local reference system is used to create a binned spherical descriptor $(r, \phi, \theta)$. We follow the binning from the reference paper, which uses $Bin(r) = 6$ radial, $Bin(\phi) = 8$ longitudinal, and $Bin(\theta) = 6$ latitudinal bins. Finally, these bins are filled with the average local self-similarity of the points that fall within the bins and within a specified spherical local region around the key point. For our implementation, we set the local spherical region to $r = 1$ unit. This results in a 3D self-similarity descriptor of dimension $6 \times 8 \times 6$. The resulting descriptors are maximum-normalized, which makes the highest descriptor value 1.
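As a rough illustration of the first step, the following Python/NumPy sketch (our own helper names; curvature values are treated as scalars here for simplicity) computes the local self-similarity of one point against its neighbourhood:

import numpy as np

def local_self_similarity(n_i, c_i, neighbor_normals, neighbor_curvs,
                          alpha=0.5, beta=0.5):
    """Local self-similarity of a point x_i over its neighbourhood N_{x_i}.

    n_i: (3,) unit normal at x_i; c_i: scalar curvature at x_i.
    neighbor_normals: (K, 3) unit normals of the neighbours.
    neighbor_curvs: (K,) curvatures of the neighbours.
    """
    # Normal similarity: angle between normals, mapped to [0, 1].
    cos_angles = np.clip(neighbor_normals @ n_i, -1.0, 1.0)
    normal_sim = (np.pi - np.arccos(cos_angles)) / np.pi
    # Curvature similarity: 1 minus the absolute curvature difference.
    curv_sim = 1.0 - np.abs(c_i - neighbor_curvs)
    return float(np.mean(alpha * normal_sim + beta * curv_sim))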

B.2.2 Spin Image 3D descriptors

For our spin image implementation, we implement a pseudo key point selection process by using a sub-sampled point cloud (sampling factor of 3) as our chosen key points instead of the complete point cloud. This reduces the likelihood of neighbouring points having exactly the same descriptors. Then, to compute the descriptor for every key point, we create a local cylindrical $(r, z)$ coordinate system, as shown in Figure B.1b, with the key point $x_i$ as the origin and the normal vector $\vec{n}_{x_i}$ as the z-axis. From the local cylindrical coordinate system, a binned descriptor with $Bin(r) = 10$ radial bins and $Bin(z) = 20$ latitudinal/elevation bins (10 for the +z direction, and another 10 for the -z direction) is filled with the number of points that fall within the $(r, z)$ bins. The spin comes from the fact that spinning the key point about its normal ($\phi_{\text{cylindrical}} \in \{0, 2\pi\}$) has no effect on the computed descriptor. Similarly, we also maximum-normalize the spin image descriptors to make the highest value 1.
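A minimal sketch of this descriptor (Python/NumPy assumed; the bin edges and the cylindrical support radius r_max are illustrative choices of ours) might look as follows:

import numpy as np

def spin_image(keypoint, normal, cloud, r_bins=10, z_bins=20, r_max=1.0):
    """Spin image descriptor for one key point (sketch of our B.2.2 setup).

    keypoint: (3,) origin of the local cylindrical system.
    normal: (3,) unit normal used as the local z-axis.
    cloud: (N, 3) point cloud to be binned.
    Returns a (r_bins x z_bins) maximum-normalized histogram.
    """
    rel = cloud - keypoint
    z = rel @ normal                      # signed elevation along the normal
    radial = np.linalg.norm(rel - np.outer(z, normal), axis=1)
    keep = (radial <= r_max) & (np.abs(z) <= r_max)   # cylindrical support
    hist, _, _ = np.histogram2d(
        radial[keep], z[keep],
        bins=[r_bins, z_bins],
        range=[[0.0, r_max], [-r_max, r_max]],  # 10 z-bins each for +z, -z
    )
    peak = hist.max()
    return hist / peak if peak > 0 else hist  # maximum-normalize

Note that the azimuth about the normal never enters the computation, which is exactly why the descriptor is invariant to the spin.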

B.3 Inferring the 3D space

For the adversarial inference model, we built two types of inferrers: (1) a baseline 3D Bayesian inference model that uses the 3D point cloud data directly, and (2) a matching-based inference model that uses the rotation-invariant descriptors.

B.3.1 Bayesian Inference model using the point cloud

To create the 3D Bayesian inference model, we use a 4-dimensional matrix with dimension $S_i \times X_{bins} \times Y_{bins} \times Z_{bins}$, where $S_i$ is the number of spaces and the 3D bins are all pre-set, i.e. $X_{bins} = Y_{bins} = Z_{bins} = 500$ (250 for each +/- direction). We fill these bins using the number of points that fall within the defined $(\lfloor x \rfloor, \lfloor y \rfloor, \lfloor z \rfloor)$ bins and divide by the total of every space to create a likelihood model $P(X_i \mid i)$ for every space $i$, where $X_i$ is the collection of point clouds, i.e. $\{(x, y, z)\}$, of space $i$. Now, a 3D inference model can be formulated as the maximization of the posterior defined in Eq. B.2, or as follows:

$$\max_i P(h: i^* = i \mid X_{(i^*=?)})$$

which aims to find the hypothesis $H: i^* = i$ about the query space $i^*$ that gives the maximum conditional probability

$$P(h: i^* = i \mid X_{(i^*=?)}) = \frac{P(X_{i^*} \mid h: i^* = i) \times P(i)}{P(X)}$$

given a set of points $X_{i^*}$. The a priori probabilities are set to $P(i) = \frac{1}{|S_i|}$, while the denominator $P(X)$ (called the marginal likelihood) is the total probability of the likelihood over all known spaces, i.e. $\forall i$, or specifically

$$P(X) = \sum_i P(X_{i^*} \mid h: i^* = i) \cdot P(i).$$

The resulting function to be maximized is as follows

$$\operatorname*{arg\,max}_i \frac{P(X_{i^*} \mid h: i^* = i) \cdot P(i)}{\sum_i P(X_{i^*} \mid h: i^* = i) \cdot P(i)}. \tag{B.6}$$
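A compact sketch of this baseline in Python/NumPy (assumed; the helper names and the log-domain scoring are ours, and in practice the dense histogram should be stored sparsely, as discussed in §B.4):

import numpy as np

def build_likelihoods(spaces, bins=500, half=250):
    """Per-space likelihood models P(X_i | i) over pre-set 3D bins.

    spaces: list of (N_i, 3) point clouds, one per known space i.
    Returns an (|S|, bins, bins, bins) array of normalized bin counts.
    """
    models = np.zeros((len(spaces), bins, bins, bins))
    for i, pts in enumerate(spaces):
        idx = np.clip(np.floor(pts).astype(int) + half, 0, bins - 1)
        np.add.at(models[i], (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
        models[i] /= models[i].sum()          # normalize into a likelihood
    return models

def infer_space(models, query, half=250, eps=1e-12):
    """Posterior arg-max of Eq. B.6 with uniform priors P(i) = 1/|S_i|."""
    bins = models.shape[1]
    idx = np.clip(np.floor(query).astype(int) + half, 0, bins - 1)
    # Log-likelihood of the query points under each space's model; the
    # uniform priors and the marginal likelihood cancel in the arg-max.
    log_lik = [np.log(m[idx[:, 0], idx[:, 1], idx[:, 2]] + eps).sum()
               for m in models]
    return int(np.argmax(log_lik))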

B.3.2 Inference using the rotation-invariant descriptors.

It is challenging to create a straightforward inference model over the rotation-invariant descriptors as we would have in a 3D Bayesian model.2 As a workaround, we utilize the standard matching-based approach that is used over high-dimensional descriptors. This approach is deterministic as opposed to the probabilistic Bayesian inference model: a matching-based voting mechanism over a reference set of descriptors is used to determine a match, and the nearest neighbor distance ratio (NNDR) is used to qualify a match. Thus, instead of the probabilistic maximization described in Eq. B.2, we utilize this NNDR-based approach for deterministic inference.

2For example, our spin image description implementation has 200 (i.e. $10 \times 20$) dimensions; it would require $10^{200}$ bins for every key point to be described if we approximate each dimension with 10 bins.

Feature matching using the rotation-invariant descriptors. A matching function $\Upsilon$ maps two sets of features $f_a$ and $f_b$, of spaces $a$ and $b$, like so: $\Upsilon: f_a \mapsto f_b$. To determine good matches, we use the descriptor Euclidean distance as a measure of their similarity. To accept a match for a key point $x_{a,1}$ with feature $f_{a,1}$ of an unknown query space $a = i^*$, we get the nearest neighbor distance ratio (NNDR) of the features, i.e. the nearest-neighbor distance $\|f_{a,1} - f_{b,1}\|$ relative to the distance to the second-nearest neighboring feature, and accept the match if the ratio is low enough. The accepted matches against a known reference space are then aggregated as the match rate

$$\frac{|\{f_{x_{i^*}} \mapsto f_{x_i}\}|}{|\{f_{x_{i^*}}\}|},$$

where $|\{f_{x_{i^*}} \mapsto f_{x_i}\}|$ is the number of matched descriptors of an unknown query space $x_{a=i^*}$ against one of the known reference spaces $x_{b=i}, i \in \forall i$, and $|\{f_{x_{i^*}}\}|$ is the number of key points or descriptors extracted from the query space $x_{i^*}$. This allows us to create a hypothesis, i.e. $H: i^* = i$, also via argument-maximization as follows,

$$\operatorname*{arg\,max}_i \Big(1 - \operatorname*{mean}_{\{f_{x_{i^*}} \mapsto f_{x_i}\}} \{\|f_{x_{i^*}} - f_{x_i}\|\}\Big) \cdot \frac{|\{f_{x_{i^*}} \mapsto f_{x_i}\}|}{|\{f_{x_{i^*}}\}|}, \tag{B.7}$$

where the first product term is the mean similarity (i.e. 1 - mean difference) while the second term is the Bayesian-inspired weight.
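The following Python/NumPy sketch (assumed; the 0.75 acceptance ratio is an illustrative value, not one fixed by this work) implements the NNDR matching and the scoring of Eq. B.7:

import numpy as np

def nndr_matches(query_desc, ref_desc, ratio=0.75):
    """Distances of query descriptors that pass the NNDR test.

    query_desc: (Q, D) descriptors of the unknown query space.
    ref_desc: (R, D) descriptors of one known reference space, R >= 2.
    """
    # Pairwise Euclidean distances between query and reference descriptors.
    d = np.linalg.norm(query_desc[:, None, :] - ref_desc[None, :, :], axis=2)
    part = np.sort(d, axis=1)[:, :2]          # nearest and second nearest
    accepted = part[:, 0] / np.maximum(part[:, 1], 1e-12) < ratio
    return part[accepted, 0]

def infer_by_matching(query_desc, reference_sets):
    """Arg-max of Eq. B.7 over all known reference spaces."""
    scores = []
    for ref_desc in reference_sets:
        matched = nndr_matches(query_desc, ref_desc)
        if len(matched) == 0:
            scores.append(0.0)
            continue
        similarity = 1.0 - matched.mean()         # 1 - mean difference
        weight = len(matched) / len(query_desc)   # Bayesian-inspired weight
        scores.append(similarity * weight)
    return int(np.argmax(scores))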

B.3.3 Validating the inference models.

We conducted a preliminary validation to check the effectiveness of the chosen description and inference approaches. To validate our inference models, we feed them the same data as queries.

[Figure B.2: Inference performance heatmaps of the different 3D description approaches. (a) Bayesian (res = 100); (b) Bayesian, rotated queries; (c) self-similarity; (d) spin images.]

[Figure B.3: Performance of the different 3D description/inference approaches for different resolutions.]

Using the Bayesian inference model. When the complete set of points $x_i$ for each space $i$ is given as query data, the baseline Bayesian inference model performs very well, as shown by the solid yellow diagonal in the heatmap/confusion matrix in Figure B.2a. Figure B.3 shows the results of varying the resolution over $1 \le res < 20$. For un-rotated query spaces, the Bayesian inference model only starts to have errors at resolutions $\le 10$, while its error rate for rotated query spaces is $\ge 0.8$ for all resolutions. As we indicated earlier in Section B.2, the baseline inference model is not rotation-invariant, and this is clearly observed here. For example, Figure B.2b shows a heatmap for a lower resolution of $res = 10$ with rotated query spaces; we cannot see a distinguishable diagonal to signify good inference performance.

Using the rotation-invariant descriptors. With un-rotated query spaces, the SS descriptors' maximum error rate is only 0.4, as shown in Figure B.3, while that of the SI descriptors stays 0 even at the smallest resolution of 1. With rotated query spaces, errors increased for both, but significant errors (i.e. $\ge 0.1$) only appear at $res \le 3$ for the SI descriptors, while errors for the SS descriptors already appear even at higher resolutions of $res \le 14$. The excellent performance of the spin image descriptors can be better visualized with the heatmaps shown in Figure B.2 with $res = 10$. As can be observed, the spin image descriptors discriminate well, as demonstrated by the clearer diagonal in Figure B.2d compared to B.2c. Thus, in the attack methods described in Section 4.2.2, we only used spin image descriptors.

B.4 Memory compactness of descriptors and inference models

Another interesting aspect is how a very good inferrer can be constructed at a low resolution of $res \le 10$ with discriminative performance similar to that of higher resolutions (see Fig. B.3).

[Figure B.4: Used memory by inference models and descriptors extracted from different point cloud resolutions.]

As shown in Fig. B.4, the memory size increases exponentially as we increase the resolution. A baseline Bayesian inference model with a low resolution of 15 requires a memory size of about 128 MB. This memory usage is undesirably large due to the almost complete representation of the points in 3D space. However, we can take advantage of the sparsity of the data points to make it compact. The memory usage of the compact representation is also shown in Fig. B.4: at $res = 15$, the compact memory usage is just 1.30 MB, down from the original 128 MB – almost 2 orders of magnitude smaller.

For the rotation-invariant descriptors at $res = 15$, a corresponding set of SS descriptors takes about 10.19 MB, but a corresponding set of SI descriptors – which, anyway, perform better than SS descriptors – with a fixed descriptor size is as compact as the baseline inference model (which is not rotation-invariant) at only 1.58 MB. In fact, we used $res = 3$ (as previously stated in Section B.2) for the descriptors used in the inference evaluation discussed in the previous subsections. Thus, any MR application (trusted or not) with access to 3D data produced by the user's MR device can efficiently create a lightweight inference model of the user's space.3 Specifically, a compact and efficient inferrer of 3D spaces can be created from raw point cloud data released by any MR-capable device (which, now, can be any device with a vision sensor and adequate processing power).

3For reference, the original point-cloud data is about 13 MB; thus, our inferrer is a much more compact representation of the point-cloud data at $res = 15$.
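One simple way to realize this compactness – sketched below in Python with an assumed dictionary-of-occupied-bins layout, which is our illustration rather than the exact structure we used – is to store only the non-empty bins of the likelihood model:

import numpy as np

def dense_to_sparse(model):
    """Keep only the non-empty bins of a dense 3D likelihood histogram.

    Returns {(ix, iy, iz): probability}; empty bins stay implicit,
    exploiting the sparsity of real captured point clouds.
    """
    ix, iy, iz = np.nonzero(model)
    return {(int(a), int(b), int(c)): float(model[a, b, c])
            for a, b, c in zip(ix, iy, iz)}

# Usage: a mostly-empty model keeps only its occupied bins.
dense = np.zeros((100, 100, 100))        # illustrative size
dense[50, 51, 49] = 0.7
dense[10, 20, 30] = 0.3
sparse = dense_to_sparse(dense)          # 2 entries instead of 10^6 bins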

Appendix C

Plane Generalization

Our RANSAC plane generalization, shown in Alg. 3, mainly follows the algorithm described in [Fischler and Bolles, 1981], except for the normal estimation, which we skip; instead, we use the estimated normal vectors directly provided by the spatial mesh produced by the Hololens.

Algorithm 3: RANSAC algorithm [Fischler and Bolles, 1981]

Data: X = {x1, x2, ..., xn}, a set of 3D points
Result: P = {p_m : {x_p1, x_p2, ...}}, a set of planes (a 3D point, and a normal) and their associated co-planar points

1   F ← the number of planes to find = 30
2   T ← the point-plane distance threshold = 0.05
3   R ← the number of RANSAC trials = 100
4   for f ← 1 to F do
5       bestPlane = {0, 0}
6       bestPoints = {}
7       for r ← 1 to R do
8           s1 ← a point at random from X
9           thisPlane = {s1, normal_s1}
10          thisPoints = {}
11          for xi ∈ X do
12              if distance(thisPlane, xi) ≤ T then
13                  thisPoints ← thisPoints + xi
14          if |thisPoints| > |bestPoints| then
15              bestPlane ← thisPlane
16              bestPoints ← thisPoints
17      P ← P + {bestPlane, coPlanarTransformed(bestPoints)}
18      X ← X − bestPoints
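As a concrete counterpart, here is a minimal Python/NumPy sketch of Alg. 3 under our assumptions (per-point normals are supplied by the capture device, and the coPlanarTransformed step is simplified to projecting the inliers onto the plane):

import numpy as np

def ransac_planes(points, normals, n_planes=30, thresh=0.05, trials=100,
                  rng=None):
    """RANSAC plane generalization (sketch of Alg. 3).

    points: (N, 3) 3D points; normals: (N, 3) device-provided unit normals.
    Returns a list of (plane_point, plane_normal, projected_inliers).
    """
    if rng is None:
        rng = np.random.default_rng()
    X, NX = points.copy(), normals.copy()
    planes = []
    for _ in range(n_planes):
        if len(X) == 0:
            break
        best_idx, best_plane = np.array([], dtype=int), None
        for _ in range(trials):
            s = rng.integers(len(X))              # random seed point
            p0, n0 = X[s], NX[s]                  # candidate plane
            dist = np.abs((X - p0) @ n0)          # point-plane distances
            idx = np.nonzero(dist <= thresh)[0]   # inlier test (Line 12)
            if len(idx) > len(best_idx):          # plane test (Line 14)
                best_idx, best_plane = idx, (p0, n0)
        p0, n0 = best_plane
        inliers = X[best_idx]
        # Project the inliers onto the plane (coPlanarTransformed stand-in).
        projected = inliers - np.outer((inliers - p0) @ n0, n0)
        planes.append((p0, n0, projected))
        keep = np.ones(len(X), bool); keep[best_idx] = False
        X, NX = X[keep], NX[keep]                 # remove consumed points
    return planes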


On the other hand, the algorithm for the locally-originated plane generalization, shown in Alg. 4, is a crude and simplified generalization which removes the point (Line 12) and plane (Line 14) discrimination processes from RANSAC.

Algorithm 4: Locally-originated plane generalization

Data: X = {x1, x2, ..., xn}, a set of 3D points
Result: P = {p_m : {x_p1, x_p2, ...}}, a set of planes (a 3D point, and a normal) and their associated co-planar points

1   F ← the number of planes to find = 30
2   r ← the radius of the local region (e.g. 0.5)
3   for f ← 1 to F do
4       s1 ← a point at random from X
5       thisPlane = {s1, normal_s1}
6       thisPoints = {xi ∈ X : |xi − s1| ≤ r}
7       P ← P + {thisPlane, coPlanarTransformed(thisPoints)}
8       X ← X − thisPoints
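A matching Python sketch (same assumptions as the Alg. 3 sketch above) makes the simplification explicit: the random seed point's local neighbourhood is generalized directly, with no inlier test and no best-plane comparison.

import numpy as np

def local_planes(points, normals, n_planes=30, radius=0.5, rng=None):
    """Locally-originated plane generalization (sketch of Alg. 4)."""
    if rng is None:
        rng = np.random.default_rng()
    X, NX = points.copy(), normals.copy()
    planes = []
    for _ in range(n_planes):
        if len(X) == 0:
            break
        s = rng.integers(len(X))                  # random seed point
        p0, n0 = X[s], NX[s]
        # Every point within the local radius is taken as co-planar.
        idx = np.nonzero(np.linalg.norm(X - p0, axis=1) <= radius)[0]
        local = X[idx]
        projected = local - np.outer((local - p0) @ n0, n0)
        planes.append((p0, n0, projected))
        keep = np.ones(len(X), bool); keep[idx] = False
        X, NX = X[keep], NX[keep]                 # remove consumed points
    return planes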