
IT Security

Protected Web Components: Hiding Sensitive Information in the Shadows

Philippe De Ryck, Katholieke Universiteit Leuven, Belgium; Nick Nikiforakis, Stony Brook University; Lieven Desmet, Frank Piessens, and Wouter Joosen, Katholieke Universiteit Leuven, Belgium

Third-party code inclusion is rampant, potentially exposing sensitive data to attackers. Protected Web components can keep private data safe from opportunistic attacks by hiding static data in the Document Object Model (DOM) and isolating sensitive interactive elements within a Web component.

IT Pro January/February 2015. Published by the IEEE Computer Society. 1520-9202/15/$31.00 © 2015 IEEE

The Web has evolved from including static images and document links to comprising Web applications with individual components provided by numerous service providers. When a Web application incorporates third-party components using remote scripts, the user's browser runs the third-party code within the security context of the Web application. This not only exposes the code's functionality to the Web application but also gives the included code full access to the Web application's client-side context, including the page's content, local data, and origin-protected functionality. This lack of code isolation can have severe consequences if the included code doesn't behave correctly. Consequently, by including potentially untrusted remote scripts, a Web application developer accepts a certain risk, both for the site's integrity and for the safekeeping of user data.

Opportunistic attacks on the client-side content of a Web application can be mitigated by hiding private data and sensitive elements from potentially malicious scripts. For example, iframes support content isolation in a webpage, albeit with a large overhead and a lack of flexibility for integration in highly dynamic, visually streamlined Web applications. Alternatively, JavaScript sandboxing techniques support code isolation,1,2 but don't offer isolation of data in the Document Object Model (DOM).3 Finally, the recent Web Components specification lets developers instantiate custom HTML tags for use within the page.4 A major feature of such custom elements is the support for a hidden DOM, known as the shadow DOM.5 Unfortunately, the Web Components specification focuses on functional separation of the DOM and doesn't offer security features or code isolation.

Here, we motivate the need for a flexible mechanism that supports the isolation of the user's private data in the DOM, as well as the isolation of sensitive elements, such as the input elements of a login form. Furthermore, we investigate the properties of the Web Components specification and show that it has the potential to offer the desired level of isolation without compromising the much-needed flexibility of modern Web applications.

Use Cases and Existing Technologies

Integrating third-party components using remote scripts is common on the Web. Examples include programming APIs and development frameworks (such as jQuery and Bootstrap), advertising services (such as DoubleClick and AdSense), Web analytics tools (such as Google Analytics), and social media plug-ins (such as Facebook's "like" button). A 2012 study of remote JavaScript inclusions on the Alexa top 10,000 sites showed that 88.45 percent include at least one remote script, and one site even included scripts from 295 remote hosts.6 Furthermore, 68.37 percent of sites included the Google Analytics library, and 79.74 percent included at least one Google library. Finally, the study applied a set of metrics to show that 12 percent of sites that were deemed security conscious included scripts from sites that deployed weak security measures.

Including remote scripts not only creates a vector for attacks targeting a specific Web application but also presents an attack vector for opportunistic attackers, who aim to execute low-profile attacks on a large number of Web applications. Such attacks can yield large quantities of sensitive information, for example by scraping the webpage's user-specific content, recording user-provided input in form fields, and extracting security tokens and session identifiers. Even when developers carefully select only trusted third parties for remote script inclusion, a certain risk persists, because third-party providers can be compromised as well. The dangers of third-party script inclusions are best illustrated by real-world examples, such as on-screen keyboard scraping,7 malware spread through advertisements,8 or actual compromises of third-party providers.9,10

An opportunistic attacker can gain access to the Web application's client-side context through several attack vectors, for example by compromising a remotely included script or advertisement, or through a cross-site scripting (XSS) attack. Because of the wide variety of sites that can be compromised through a malicious script or advertisement, opportunistic attackers carry out nontargeted attacks. Examples include looking for input elements of type password or scraping any user-specific displayed content, such as email messages, health records, and bank statements.

Use Cases

In light of the opportunistic attacker model, we propose three general use cases that benefit from effectively isolating data or HTML elements within the browser.

Displaying sensitive information. Many Web applications process and display user-specific information, which is often considered private and sensitive. Common examples of such private data are email messages, chat conversations, bank statements, and security challenges. Opportunistic attackers can easily inspect and collect such sensitive information because it isn't isolated from the rest of the page, which includes third-party scripts. An effective isolation mechanism for in-application content could prevent inspection or collection by an opportunistic attacker.
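A nontargeted attack needs no knowledge of the page. The script only needs a generic rule, such as "every input of type password," and a way to walk whatever part of the page it can reach. The sketch below models a page as plain objects instead of a real DOM, so it runs outside the browser. The node shape, `scrape`, `isPasswordInput`, and the `hidden` field are all hypothetical. The sketch also shows that content kept off the walkable tree, as an isolation mechanism would arrange, is never collected.

```javascript
// Toy model of a page: plain objects, not a real DOM.
// Each node has a tag, optional attributes, and children a script can walk.
const page = {
  tag: "body",
  children: [
    { tag: "div", text: "Inbox: 3 new messages", children: [] },
    {
      tag: "form",
      children: [
        { tag: "input", attrs: { type: "text", name: "user" }, value: "alice", children: [] },
        { tag: "input", attrs: { type: "password", name: "pw" }, value: "s3cret", children: [] },
      ],
    },
  ],
};

// A generic, site-independent rule in the spirit of the selector input[type=password].
const isPasswordInput = (node) =>
  node.tag === "input" && node.attrs !== undefined && node.attrs.type === "password";

// Depth-first walk over every node reachable through `children`,
// collecting the value of each node that matches the predicate.
function scrape(root, predicate) {
  const found = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (predicate(node)) found.push(node.value);
    for (const child of node.children) stack.push(child);
  }
  return found;
}

console.log(scrape(page, isPasswordInput)); // [ 's3cret' ]

// Hypothetical isolated variant: the form is kept off the walkable `children`
// list (loosely mirroring how document-level queries do not descend into a
// shadow tree), so the same generic walk finds nothing.
const isolatedPage = {
  tag: "body",
  children: [{ tag: "div", text: "Inbox: 3 new messages", children: [] }],
  hidden: page.children[1], // not part of `children`, so never visited
};

console.log(scrape(isolatedPage, isPasswordInput)); // []
```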


Protecting security tokens. Application-related hidden security tokens, often associated with a user's session, are a variant of displayed private information. For example, the security tokens protecting against cross-site request forgery (CSRF) attacks are embedded as hidden form elements.11 Hiding such security tokens from opportunistic attackers raises the security level of the applied countermeasures, thereby eliminating alternative attack vectors.

Protecting sensitive input elements. A third use case focuses on protecting client-side input elements, in contrast to hiding server-delivered content. Most Web applications contain sensitive input elements, such as HTML password elements and on-screen keyboards. Opportunistic attackers can easily gather sensitive user-provided data by using generally applicable selectors for sensitive input elements. Isolating such sensitive input elements from opportunistic attackers ensures that user-provided input cannot easily be stolen with a nontargeted attack. Note that such an isolation mechanism must extend to the event handlers associated with isolated input elements.

Motivating Empirical Evidence

The inclusion of potentially untrusted third-party code into a Web application is a common though potentially dangerous practice.6 Two important industry-driven surveys of the most critical software errors warn of this risk. The Open Web Application Security Project (OWASP) Top Ten Project lists the 10 most dangerous risks for Web applications, and it places "using components with known vulnerabilities" in ninth place.12 A similar initiative, the CWE/SANS Top 25 Most Dangerous Software Errors, ranks "inclusion of functionality from untrusted control sphere"13 16th.

To support the high rankings in these industry surveys, and to establish the relevance of the aforementioned use cases, we conducted two relatively small-scale experiments. To support the use cases for hiding sensitive data in the DOM, we investigated popular online password managers, where the DOM holds all of the user's passwords to every website. The second experiment supports the use case for protecting sensitive input elements by measuring the exposure of login forms to third-party script providers.

Password managers. Online password managers are used to store the multitude of authentication credentials required on the modern Web. This private and highly sensitive data is often even stored in an encrypted container, which is decrypted at the client side when the client provides the correct master key. One might expect that in such a sophisticated setup, the decrypted data is handled with care, preventing any risk of stolen or leaked data.

We gathered seven online password managers from the top 20 results of a Google query for free online password managers. For each one, we investigated whether it includes scripts from a third party on the page that hosts the passwords in the DOM, giving these scripts full access to the user's credentials. As Table 1 shows, six of the seven (86 percent) include third-party scripts from at least one remote host on the page that displays the user's passwords. The Ghostery browser extension (https://www.ghostery.com/en/) classifies all of these scripts as analytics. Additionally, two password managers include scripts from additional remote hosts on their main page, which is situated within the same origin as the sensitive page.

Table 1. Six of the seven highest ranking free online password managers include at least one remote script on the user password page.

Search ranking | Name | No. of remote scripts
1 | PassPack | 1
3 | LastPass | 1
4 | Norton Identity Safe | 4
5 | Keeper | 1
8 | | 1
10 | Clipperz | 0
16 | Mitto | 1

Login forms. Login forms are common on the Web, and they are a trivial target from which an opportunistic attacker can extract user credentials. We crawled the Alexa top 1,000 sites, looking for login forms situated on a page with third-party script inclusions, which give the third party full access to the login form.
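The crawl's key measurement is how many distinct remote hosts a login page loads JavaScript from. The sketch below shows one plausible way to compute that count from a page's script URLs. It is not the crawler used in this study. The function name, the example URLs, and the rule that any hostname other than the page's own counts as remote are assumptions made for illustration.

```javascript
// Count the unique remote hosts a page loads JavaScript from.
// Simplifying assumption: any hostname other than the page's own counts as
// remote (a real crawl might group subdomains of the same site).
function uniqueRemoteScriptHosts(pageUrl, scriptSrcs) {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Set();
  for (const src of scriptSrcs) {
    // Relative and protocol-relative URLs are resolved against the page URL.
    const host = new URL(src, pageUrl).hostname;
    if (host !== pageHost) hosts.add(host);
  }
  return hosts;
}

// Hypothetical login page and the script URLs found on it.
const loginPage = "https://www.example.com/login";
const scripts = [
  "/static/app.js", // same host, not counted
  "https://www.google-analytics.com/analytics.js",
  "https://code.jquery.com/jquery.min.js",
  "https://www.google-analytics.com/plugins/ua/linkid.js", // duplicate host
  "//connect.facebook.net/en_US/sdk.js",
];

const remote = uniqueRemoteScriptHosts(loginPage, scripts);
console.log(remote.size); // 3
console.log([...remote].join(", "));
```

Using a `Set` makes repeated scripts from the same provider count once, which matches the "unique remote hosts" measure plotted in Figure 1.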


We found that 52 percent of the websites included a login form, and all of them included at least one third-party script on the login page. Of the sites with a login form, 40 percent included scripts from more than five different third-party hosts.
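Figure 1 and the 40 percent figure both summarize the same per-page host counts. An empirical cumulative distribution function (ECDF) gives, for each host count x, the fraction F(x) of login pages that load JavaScript from at most x unique remote hosts. The share of pages using more than five hosts is then 1 - F(5). The sketch below uses made-up counts purely for illustration, not the crawl's data. `makeEcdf` is a hypothetical helper name.

```javascript
// Build an empirical CDF: F(x) = fraction of observations <= x.
function makeEcdf(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  return function (x) {
    let count = 0;
    // sorted is ascending, so stop at the first value above x
    while (count < n && sorted[count] <= x) count++;
    return count / n;
  };
}

// Hypothetical unique-remote-host counts for ten login pages
// (illustrative values only, not the data behind Figure 1).
const hostsPerLoginPage = [3, 1, 7, 2, 12, 4, 2, 9, 3, 6];

const F = makeEcdf(hostsPerLoginPage);

// Points of the step function, as they would appear in an ECDF plot.
for (const x of [0, 1, 3, 5, 10, 40]) {
  console.log(`F(${x}) = ${F(x)}`);
}

// Share of pages loading scripts from more than five remote hosts.
console.log(1 - F(5)); // 0.4
```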

Figure 1 shows the right-skewed distribution of login pages including scripts from remote hosts. On average, a login page loads scripts from 3.4 hosts, and in the most extreme case, one login page includes code from 36 different remote hosts. These numbers indicate that a scenario with an opportunistic attacker targeting login forms is, unfortunately, very plausible.

Figure 1. Empirical cumulative distribution function (ECDF) of the percentage of login pages of popular Alexa sites, and the number of unique remote hosts from which they request JavaScript code. (x-axis: #Unique remote hosts providing JS files, 0 to 40; y-axis: % of Alexa websites with login forms, 0.0 to 1.0.)

Existing Technologies

Several technologies are relevant when discussing third-party script inclusion and content separation.

Document isolation. Web developers can use frames or iframes to isolate content in separate documents to varying degrees, depending on the associated origins. Placing data in a document with a different origin from the main document effectively offers both DOM-based and script-based isolation, and further restrictions are available through the HTML5 sandbox attribute.

Document-based isolation offers strong security guarantees but has a rigid, block-level structure, making it less attractive for modern Web applications. Additionally, frames with different origins require a separate roundtrip to fetch the content, causing a delay in page load times.

JavaScript sandboxing. Driven by the rise in remote script inclusions, script-based sandboxing techniques are being developed and deployed.1,2 By isolating a remote script in a sandbox, developers gain fine-grained control over its capabilities, thereby preventing the script from misbehaving. Although sandboxing techniques can effectively be used to contain remote scripts, they typically don't provide a way to isolate parts of the DOM, making it difficult to secure the described use cases.

DOM separation. The Web Components specification combines a set of technologies allowing the creation of custom HTML elements.4 One interesting technology is the shadow DOM, which allows custom elements to hide their internal DOM structure from the outside world.5 One currently deployed example is the HTML5 video element, which features a control bar with play/pause buttons. The internals of the video element are implemented using traditional HTML elements but are hidden from the webpage and the user via the shadow DOM. The shadow DOM is well suited to hiding content in the DOM but doesn't prevent later access, nor does it offer script-based isolation properties.

Protected Web Components

Web components are the most viable starting point for creating a protection mechanism for private data and sensitive elements against opportunistic attackers.4 Unlike iframes, they offer the flexibility required to cope with the highly dynamic requirements of modern Web applications. They also already possess the capability to host a separate DOM tree using the shadow DOM, a property that is hard to achieve using JavaScript sandboxing technologies.

To turn Web components into protected Web components, two requirements must be met. First, we must be able to hide static data in the DOM tree without it being accessible to opportunistic attackers. Second, protected Web components should be able to host interactive elements without being vulnerable to script-based compromises, for example through function-overriding or prototype-poisoning attacks.
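The first requirement relies on the ECMAScript 5 accessor mechanism that Figure 3 applies. An own property created with Object.defineProperty shadows a getter inherited from the prototype. Because `configurable` defaults to false, later scripts cannot redefine or delete that property. The sketch below models this outside the browser. `FakeElement` and its WeakMap-backed internal slot are hypothetical stand-ins for an HTML element and the browser's internal state, not a real DOM API.

```javascript
"use strict";

// Hypothetical stand-in for an HTML element. Like a browser element, it
// exposes its shadow root through a getter defined on the prototype.
const roots = new WeakMap(); // stands in for browser-internal state

class FakeElement {
  createShadowRoot() {
    const root = { children: [] };
    roots.set(this, root);
    return root;
  }
  get shadowRoot() {
    return roots.get(this) || null;
  }
}

// 1. Create the host, attach a shadow root, and populate it with private data.
const host = new FakeElement();
let root = host.createShadowRoot();
root.children.push({ text: "bank PIN: 1234" });

// 2. Drop the local reference and shadow the inherited getter with an own
//    accessor that always returns null. `configurable` defaults to false,
//    so the property cannot later be redefined or deleted.
root = null;
Object.defineProperty(host, "shadowRoot", { get: function () { return null; } });

console.log(host.shadowRoot); // null

// 3. A later script trying to install its own getter gets a TypeError,
//    because Object.defineProperty refuses to change a non-configurable property.
let redefineFailed = false;
try {
  Object.defineProperty(host, "shadowRoot", { get: () => "attacker" });
} catch (e) {
  redefineFailed = e instanceof TypeError;
}
console.log(redefineFailed); // true

// Caveat of this model: the original getter still lives on
// FakeElement.prototype and can be invoked with the host as `this`, so the
// trick blocks ordinary property access rather than every possible path.
```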

In this section, we explain how shadow DOM trees can be permanently hidden by taking advantage of ECMAScript 5 getters. We also elaborate on techniques that can be used to isolate script code within a hidden tree. Figure 2 illustrates the use of protected Web components in a password manager.

Figure 2. Protected Web components for data security: (a) a password manager page containing private data and sensitive elements, together with a third-party advertisement, without any isolation or protection; (b) the effect of using protected Web components.

Hiding Static Data

The goal of the first and second use cases is to embed private, user-specific data into the DOM tree without exposing it to an opportunistic attacker, who uses DOM manipulation techniques to extract potentially sensitive information. Such techniques include the use of JavaScript DOM APIs, stylesheet operations, and custom selectors.

The shadow DOM supports the creation of separate DOM trees. These trees are attached to traditional HTML elements and exposed through their shadowRoot property, and they are composed into a single DOM tree during the rendering process. The main document and any embedded shadow DOM trees are functionally separated, limiting the propagation of Cascading Style Sheets (CSS) or selectors between the main document and the subtrees, in both directions. Shadow DOM trees are already used to implement browser controls, such as the playback bar for the video element, and can also be used by a developer through a JavaScript API. Note that the browser's internal shadow DOM trees are not accessible through the shadowRoot property, whereas developer-created shadow DOM trees remain accessible from JavaScript.

Unfortunately, this accessibility of script-defined shadow DOM trees conflicts with the goal of hiding static data in the DOM. However, by redefining the getter of the shadowRoot property, developers can make their script-defined DOM trees inaccessible to JavaScript. Figure 3 shows the creation and population of a shadow DOM, and the overriding of the getter to return null instead of a reference to the shadow DOM.

var host = document.createElement('div');
var root = host.createShadowRoot();
// Append data to the root
root = null; // wipe the only other reference to the shadow tree
Object.defineProperty(host, 'shadowRoot', {
  get: function() { return null; } // shadowRoot now always returns null
});

Figure 3. Data can be hidden in the shadow DOM by clearing existing references and redefining the only access point.

After redefining the getter and wiping all existing references to the shadow DOM, it's no longer possible to directly access the data stored in the shadow DOM. Therefore, instantiating an inaccessible shadow DOM tree with sensitive data before loading untrusted code ensures that the private data will never be exposed to opportunistic attackers.

Isolating Interactive Scripts

The third use case aims to protect sensitive input elements from untrusted scripts. Sensitive input elements are usually part of a form, and they typically depend on JavaScript handlers for interactive input processing and validation.

Although the shadow DOM is ideally suited to isolating elements from the rest of the page, a problem arises when these elements use JavaScript handlers for processing input events. The shadow DOM offers functional separation but doesn't instantiate a separate JavaScript context. This leaves the JavaScript code defined in the shadow DOM vulnerable to several attacks, such as function overriding and prototype poisoning.
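The following sketch illustrates both the problem just described and the known-good-copy mitigation used in Figure 4, in plain JavaScript without a DOM. Figure 4 copies `document.getElementById`. This sketch instead protects `String.prototype.toLowerCase`, chosen so that it runs outside the browser. A hypothetical validation handler lowercases a username, and an untrusted script that loads later poisons that prototype method to copy every value passed through it. A handler that captured a known good copy inside a closure before the untrusted code ran keeps working and leaks nothing. The names `naiveValidate`, `protectedValidate`, and `leaked` are illustrative.

```javascript
"use strict";

// Trusted component code runs first. Inside a closure, it captures a known
// good copy of the built-in it needs, before any untrusted script loads.
const protectedValidate = (function () {
  // Callable as toLower(str). The bound function holds the original `call`
  // and the original toLowerCase, so later reassignment of either property
  // does not affect it.
  const toLower = Function.prototype.call.bind(String.prototype.toLowerCase);
  return function (username) {
    return toLower(username).length >= 3;
  };
})();

// A naive handler that looks up the built-in at call time.
function naiveValidate(username) {
  return username.toLowerCase().length >= 3;
}

// Untrusted script loads afterwards and poisons the shared prototype,
// copying every string the page lowercases.
const leaked = [];
const originalToLower = String.prototype.toLowerCase;
String.prototype.toLowerCase = function () {
  leaked.push(String(this));
  return originalToLower.call(this);
};

naiveValidate("alice");     // goes through the poisoned method
protectedValidate("bob42"); // uses the captured original

// Clean up the global change so the rest of this sketch is unaffected.
String.prototype.toLowerCase = originalToLower;

console.log(leaked); // [ 'alice' ]
```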

To obtain protected Web components, the shadow DOM's script code needs to be effectively isolated. The isolation must prevent JavaScript functions and variables from leaking into the global namespace. It must also prevent the use of potentially contaminated functions defined in the global namespace or Object prototypes. Obtaining this isolation in the current shadow DOM requires two separate steps. First, any code within the shadow DOM should be encapsulated in a separate namespace, which is possible in JavaScript through the correct use of closures. Second, the use of potentially contaminated functions can be prevented by storing and using known good versions of the required functions, a technique often used in JavaScript sandboxing and policy enforcement mechanisms.14,15 Figure 4 is a brief code snippet using closures and known good functions.

(function() {
  // Known good copy, bound to document so it can be called on its own
  var getElement = document.getElementById.bind(document);
  var data = getElement('shadowInput').textContent;
  //...
})();

Figure 4. By using closures and known good copies of functions, scripts can be isolated within the shadow DOM.

Isolating the shadow DOM's JavaScript code, in combination with overriding the shadowRoot getter, effectively supports HTML elements containing sensitive data while maintaining script-based interaction.

Motivating Examples Revisited

Protected Web components offer a strong mechanism to isolate data and sensitive elements within the DOM tree. Unlike iframes, they do so without sacrificing the flexibility to place this data anywhere within the page. These properties ensure that protected Web components are well suited to meet the three use cases described earlier.

Displaying sensitive information. By embedding sensitive data in a secure Web component, using the shadow DOM to hide static data, we effectively prevent an opportunistic attacker from extracting the data in an automated way.

Protecting security tokens. Because security tokens are often embedded in interactive elements such as forms, they can be protected by placing the element inside a secure Web component. Security tokens, such as CSRF tokens, are part of the DOM, and the secure component will prevent an opportunistic attacker from extracting them.

Protecting sensitive input elements. Sensitive input elements capture user input and can be a target for opportunistic attackers. These elements can be placed in a secure Web component as well, preventing direct querying by an attacker. If these input elements depend on script-based handlers for validation, autocompletion, and so on, the handler code must be part of the secure Web component as well.

Protected Web components not only fit the three proposed use cases but also protect against opportunistic attackers in the two examples presented earlier. First, online password managers can use protected Web components to prevent deliberate or inadvertent extraction of the user's credentials from the DOM, while preserving the possibility of including third-party scripts. Second, login forms and their associated handlers can be embedded in a protected Web component, preventing a curious or malicious script from stealing the user's credentials through input events.

Although protected Web components offer significant security benefits against a realistic, ubiquitous opportunistic attacker, they also have limitations. First, embedding sensitive elements in a secure Web component effectively separates them from the rest of the page and prevents any interaction, even from legitimate code within the page. Therefore, all code interacting with a sensitive element must be loaded in the secure component. Typically, this code is closely tied to the element anyway, as validation handlers and autocompletion code illustrate. Second, for these handlers, the full implementation burden unfortunately rests once again with the developer. Therefore, we envision the Web Components specification endorsing two configurable extensions to the current model:

• hiding a shadow DOM, where the shadowRoot attribute doesn't return a reference to the shadow DOM, similar to the current behavior of user-agent-created shadow DOMs, and
• instantiating a new script context within the shadow DOM, ensuring that all scripts imported by the shadow DOM are separate from the hosting page.

The latter extension is comparable to how Web workers run in a separate context, communicating through a predefined messaging interface. The possibility of instantiating new script contexts in a shadow DOM also benefits the deployment of Web components, because it prevents naming and scoping conflicts between the different imported components and the host page. The downside of instantiating a new script context is the lack of shared global variables, requiring any libraries to be loaded in each context.

Hiding private content and sensitive elements through Web components can help mitigate opportunistic, nontargeted attacks, but it doesn't offer an airtight security solution. We consider this approach to be part of the recent trend in client-side security mechanisms, which significantly improve the security of client-side aspects of Web applications, often by applying the defense-in-depth principle. Previously adopted examples are the HttpOnly flag for cookies, which prevents several common session attacks, and the Content Security Policy,16 which significantly raises the bar for typical cross-site scripting attacks.

Acknowledgments

This research is partially funded by the Agency for Innovation by Science and Technology in Flanders (IWT), the Research Fund KU Leuven, the IWT-SBO project SPION, and the EU FP7 project STREWS. The Prevention of and Fight against Crime Programme of the European Union (B-CCENTRE) also provided financial support.

References

1. P. Agten et al., "JSand: Complete Client-Side Sandboxing of Third-Party JavaScript without Browser Modifications," Proc. 28th Ann. Computer Security Applications Conf. (ACSAC 12), 2012, pp. 1–10.
2. L. Ingram and M. Walfish, "Treehouse: JavaScript Sandboxes to Help Web Developers Help Themselves," Proc. Usenix Ann. Technical Conf. (ATC 12), 2012, pp. 153–164.
3. Mozilla Developer Network, "Document Object Model (DOM)," 2014; https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model.
4. D. Cooney and D. Glazkov, "Introduction to Web Components," W3C Working Group Note, 24 July 2014; www.w3.org/TR/components-intro.
5. D. Glazkov, "Shadow DOM," W3C Working Draft, work in progress, June 2014.
6. N. Nikiforakis et al., "You Are What You Include: Large-Scale Evaluation of Remote JavaScript Inclusions," Proc. 19th ACM Conf. Computer and Comm. Security (CCS 12), 2012, pp. 736–747.
7. S. Mitchell, "IE Mouse-Tracking Flaw Allows Anyone to Steal Passwords," PC Pro, 13 Dec. 2012; www.pcpro.co.uk/news/security/378667/ie-mouse-tracking-flaw-allows-anyone-to-steal-passwords.
8. C. Smith, "Yahoo Ad Malware Attack Far Greater Than Anticipated," BGR, 13 Jan. 2014; http://bgr.com/2014/01/13/yahoo-malware-attack.
9. "qTip2 Code Compromised," GitHub, Incident Report, 8 Dec. 2011; https://github.com/Craga89/qTip2/issues/286.
10. K. Zetter, "Google Hack Attack Was Ultra Sophisticated, New Details Show," Wired, 14 Jan. 2010; www.wired.com/threatlevel/2010/01/operation-aurora.
11. N. Jovanovic, E. Kirda, and C. Kruegel, "Preventing Cross-Site Request Forgery Attacks," Proc. 2nd Int'l Conf. Security and Privacy in Comm. Networks (SecureComm 06), 2006, pp. 1–10.
12. D. Wichers, "OWASP Top 10," Open Web Application Security Project (OWASP), 2013; www.owasp.org/index.php/Category:OWASP_Top_Ten_Project.
13. B. Martin et al., "CWE/SANS Top 25 Most Dangerous Programming Errors," Common Weakness Enumeration, 2011; http://cwe.mitre.org/top25.
14. J. Magazinius, P.H. Phung, and D. Sands, "Safe Wrappers and Sane Policies for Self Protecting JavaScript," Proc. 15th Nordic Conf. Secure IT Systems (NordSec 12), 2012, pp. 239–255.
15. L. Meyerovich and B. Livshits, "ConScript: Specifying and Enforcing Fine-Grained Security Policies for JavaScript in the Browser," Proc. 31st IEEE Symp. Security and Privacy (SP 10), 2010, pp. 481–496.
16. B. Sterne and A. Barth, Content Security Policy 1.0, World Wide Web Consortium (W3C) Candidate Recommendation, 2012; www.w3.org/TR/CSP.

Philippe De Ryck is a postdoctoral researcher in the Computer Science Department at the Katholieke Universiteit Leuven, Belgium. He recently finished his PhD on Web application security, with a specific focus on cross-site request forgery (CSRF), session management, and JavaScript sandboxing techniques. He is the lead author of Primer on Client-side Web Security, which gives a broad overview of the current state of client-side security in the Web. Contact him at [email protected].

Nick Nikiforakis is an assistant professor in the Computer Science Department at Stony Brook University. His research interests include Web application security and privacy, which he usually approaches by looking at the Web as a series of interconnected ecosystems. Contact him at [email protected].

Lieven Desmet is a research manager of secure software within the iMinds-DistriNet research group at the Katholieke Universiteit Leuven, Belgium. His research interests include software security, and in particular, Web application security. Lieven received a PhD in computer science from the University of Leuven. He's a board member of the Open Web Application Security Project's Belgium chapter, and program director of the yearly SecAppDev training courses on secure application development. Contact him at [email protected].

Frank Piessens is a professor in the Department of Computer Science at the Katholieke Universiteit Leuven, Belgium. His research interests include software security, and in particular the development of high-assurance techniques to deal with implementation-level software vulnerabilities and bugs, including techniques such as software verification, runtime monitoring, type systems, and programming language design. Contact him at [email protected].

Wouter Joosen is a professor in the Computer Science Department at the Katholieke Universiteit Leuven, Belgium. His research interests are aspect-oriented software development, middleware, and software security. Joosen received a PhD in computer science from KU Leuven. Contact him at [email protected].

Selected CS articles and columns are available for free at http://ComputingNow.computer.org.
