Security and Privacy of Augmented Reality Browsers

No Escape From Reality: Security and Privacy of Augmented Reality Browsers

Richard McPherson, Suman Jana, Vitaly Shmatikov
University of Texas at Austin

ABSTRACT

Augmented reality (AR) browsers are an emerging category of mobile applications that add interactive virtual objects to the user’s view of the physical world. This paper gives the first system-level evaluation of their security and privacy properties. We start by analyzing the functional requirements that AR browsers must support in order to present AR content. We then investigate the security architecture of Junaio, Layar, and Wikitude browsers, which are running today on over 30 million mobile devices, and identify new categories of security and privacy vulnerabilities unique to AR browsers. Finally, we provide the first engineering guidelines for securely implementing AR functionality.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author’s site if the Material is used in electronic media.
WWW 2015, May 18–22, 2015, Florence, Italy.
ACM 978-1-4503-3469-3/15/05.
http://dx.doi.org/10.1145/2736277.2741657.

1 Introduction

Augmented reality (AR) technologies enhance users’ perception of the world by blending interactive virtual objects with the visual representation of actual objects in real time [2, 3]. Traditional AR applications range from medical visualization to aircraft navigation, but only recently have consumer mobile devices become sufficiently powerful to run AR software.

AR applications have three stages: sensing input, transforming sensed objects (e.g., adding virtual objects), and rendering the transformed objects to the user. Modern AR platforms ease the burden of implementing these tasks. By far the most popular platforms are AR browsers like Junaio, Layar, and Wikitude, available as SDKs or standalone mobile apps. Junaio has more than 20 million users and over 20,000 content developers who have created more than 210,000 AR “channels” [14]. Layar has 1.5 million users and 9,000 content developers [18]. Wikitude has 13 million users [29] and over 30,000 content developers.

All existing AR browsers are based on Web browsers and are similar to them in the sense that they, too, fetch and display interactive content from websites (“channels,” in AR parlance). In addition to rendering HTML and executing JavaScript, AR browsers provide support for the three key tasks necessary for AR functionality: sensing, transforming, and displaying transformed objects. They enable AR channels to (1) access sensors on the mobile device, including the onboard camera and GPS location, (2) create and manipulate a variety of 2D and 3D interactive virtual objects, and (3) display virtual objects on top of the camera feed, realistically blending them with real objects. The resulting AR content combines image recognition, geolocation, interactive virtual objects, conventional Web content, and control code written in JavaScript (see an example in Fig. 1).

Figure 1: A Layar-based mobile app [7].

The basic architecture of AR services is shown in Fig. 2. From the security and privacy perspective, its key aspect is that the AR browsers provide augmentation mechanisms, but the actual AR content comes from channels created by independent developers. Just like a conventional Web browser is an interface between the user and Web content from independent websites, an AR browser is an interface between the user and independent AR content. An AR browser is thus responsible for ensuring that malicious AR content cannot access content from other sources, nor damage or abuse the user’s system outside the browser.

A major difference between Web browsers and AR browsers is the business model. Web browsers are typically part of the standard software distribution, and their developers are paid by the licensing fees from OEMs and OS owners and by the search engines. This model works because there is already a wealth of Web content. AR browsers, however, need a different model because there is not much AR content available today. Their sources of revenue include advertising injected into AR content, registration fees from content developers, and revenue sharing for paid content. This business model has an impact on the architecture of AR services: unlike Web content, which is accessed directly from the Web browser, requests to load third-party AR content must go through the AR service provider, as shown in Fig. 2.

Figure 2: Architecture of a typical AR service.

Our contributions. We perform the first systematic analysis of the security and privacy properties of AR browsers and how they differ from Web browsers. Untrusted AR content presents new, unique types of threats, yet, in contrast to Web-browser specifications, the latest Augmented Reality Markup Language (ARML) specification [19] barely mentions security or privacy, and they are often overlooked in the design of the existing AR browsers.

We start by analyzing the functional requirements needed to support the sensing, transforming, and rendering of AR content. These include new ways of combining AR objects and conventional HTML content from multiple origins, new APIs for accessing objects outside the browser, new mechanisms for controlling the display of AR and HTML objects, and new ways of launching content.

Then, for each functional requirement, we investigate how it is implemented by the existing AR browsers. All AR browsers are based on Web browsers, which do not support AR functionality, forcing AR browsers to resort to ad-hoc cross-origin mechanisms, APIs that open holes in the browser sandbox, custom techniques for composing visual content from different origins, and non-standard delegation schemes for authentication credentials.

Architectural flaws in these mechanisms result in security and privacy vulnerabilities. We explore the threat model of AR browsers and demonstrate several new categories of threats caused by the AR browsers’ unique combination of high-volume visual data gathering, image-triggered code execution, outsourced image processing, and merging images from the onboard camera with third-party content. For example, individual-specific items such as license plates can automatically launch malicious AR content, enabling fully automated stalking and tracking; malicious AR channels can abuse image-triggered code execution; and a conventional webpage can hijack the AR browser installed on the user’s mobile device and use it to gain unauthorized access to the device’s camera and GPS without the user’s permission. We also show AR browsers amplify existing threats such as cross-site scripting, clickjacking, cookie stealing, and leakage of private information.

For each design flaw, we present our recommendations. Some are easy to fix, others require a substantial re-design, but none are mere “bugs.” They all stem from the fact that standard system components used in today’s mobile and Web applications are insufficient to securely support AR functionality. For each functional requirement of the AR browsers, we explain which features and system abstractions are needed to implement it properly.

2 AR Services

AR services are deployed by AR service providers such as Junaio and Layar. These companies supply AR client software (we use the term AR browser) to users and maintain dedicated AR servers through which users access third-party AR content (see Fig. 2). AR content providers are independent developers who create AR content, host it on their own servers, and register this content with AR service providers. We use the term channel generically for any AR content, but the actual terminology differs from service to service (e.g., channels are called layers in Layar).

By analogy with conventional Web, AR service providers are similar to Web-browser developers, while AR channels are simi-

Support for interactive, non-HTML AR content. In addition to HTML content such as images and text, AR content may include 2D and 3D models and animations that cannot be described in HTML. AR channels thus include service-specific XML or JSON defining how to place and render these objects.

Image-triggered code execution. AR browsers access content in non-standard ways: they send images from the device’s camera to their servers, which attempt to recognize certain pictures and QR codes and automatically launch the associated AR channels.

Outsourced image processing. Image recognition is a computationally heavy task that may not be feasible on low-powered mobile devices and often involves proprietary algorithms. Furthermore, image-based code execution requires the server to extract the trigger image from the camera feed and match it against a proprietary database of registered images. Therefore, AR browsers send images from the phone’s camera to the AR provider for processing.

Visual composition of AR content. The AR browser is responsible for constructing a visual stack that combines non-HTML AR content, such as interactive 3D models, with HTML content from multiple origins (e.g., online ads) on top of the camera feed.

Indirect retrieval of AR content. Instead of directly fetching AR content from its developers, AR browsers typically submit requests via the AR provider’s server. This enables providers to charge fees for registration and usage, inject advertising, etc.

2.2 Components of AR services

AR browsers. Fig. 3 shows the generic architecture of an AR browser, including (1) one or more instances of an embedded Web browser such as WebView, (2) a “native” component with direct access to OS-managed resources such as the camera and GPS location, and (3) ad-hoc mechanisms for gluing these pieces together.

AR channels. An AR channel is roughly similar to a website.
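
The image-triggered launch and indirect retrieval described above can be made concrete with a small sketch. It is illustrative only and does not reproduce any real AR browser or provider API: the classes ARProvider, ARBrowser, and Channel, their methods, and the example origin are all hypothetical, and an exact SHA-256 hash stands in for the proprietary, approximate image recognition that real providers run on their servers.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def fingerprint(frame: bytes) -> str:
    """Stand-in for image recognition: an exact hash of the frame bytes.
    Real providers use proprietary, approximate matching (assumption)."""
    return hashlib.sha256(frame).hexdigest()


@dataclass
class Channel:
    """Hypothetical AR channel registered by an independent developer."""
    name: str
    origin: str   # developer's own server (hypothetical URL)
    content: str  # service-specific XML/JSON and HTML, simplified to a string


@dataclass
class ARProvider:
    """Hypothetical AR service provider between the browser and developers."""
    registry: Dict[str, Channel] = field(default_factory=dict)

    def register(self, trigger_image: bytes, channel: Channel) -> None:
        # Developers register a trigger image together with their channel.
        self.registry[fingerprint(trigger_image)] = channel

    def recognize(self, frame: bytes) -> Optional[Channel]:
        # Outsourced image processing: matching happens on the provider side.
        return self.registry.get(fingerprint(frame))

    def fetch(self, channel: Channel) -> str:
        # Indirect retrieval: the provider relays the content and may inject ads.
        return channel.content + "\n<!-- provider-injected ad -->"


@dataclass
class ARBrowser:
    """Hypothetical AR browser that forwards camera frames to the provider."""
    provider: ARProvider
    launched: List[str] = field(default_factory=list)

    def on_camera_frame(self, frame: bytes) -> Optional[str]:
        channel = self.provider.recognize(frame)
        if channel is None:
            return None
        # Image-triggered execution: the channel launches without a user prompt.
        self.launched.append(channel.name)
        return self.provider.fetch(channel)


if __name__ == "__main__":
    provider = ARProvider()
    demo = Channel("demo", "https://dev.example", '{"model": "cube"}')
    provider.register(b"poster-pixels", demo)
    browser = ARBrowser(provider)
    print(browser.on_camera_frame(b"poster-pixels"))
    print(browser.on_camera_frame(b"unregistered-scene"))
```

In this sketch every camera frame leaves the device and a matching frame launches third-party content with no user action, which is the combination the threat analysis focuses on.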
