Analysis of UI Redressing Attacks and Countermeasures

Marcus Niemietz

www.nds.rub.de


Marcus Niemietz
Place of birth: Castrop-Rauxel, Germany
Email: [email protected]

25th January 2019

Ruhr-University Bochum
Horst Görtz Institute for IT-Security
Chair for Network and Data Security

Dissertation submitted for the degree of Doktor-Ingenieur at the Faculty of Electrical Engineering and Information Technology of Ruhr-University Bochum

First Supervisor: Prof. Dr. rer. nat. Jörg Schwenk
Second Supervisor: Prof. Dr. rer. nat. Martin Johns

For the last seven years, I had the chance to speak at over 35 international IT security conferences. Moreover, I am part of a small team that has established a popular lecture about Web security (HackPra) at the Ruhr-University in Bochum, our own annual non-profit IT security conference (RuhrSec), and an IT security start-up company (Hackmanit).

Since I started IT security research in 2011, I have had the opportunity to get in contact and work with amazing people, which has resulted in great friendships. Next to my family, I would like to thank (in alphabetical order): Abraham Aranguren, Thorsten Holz, Tilman Frosch, Robert Hansen, Mario Heiderich, Brad Hill, Jeremiah Grossman, Martin Grothe, Vincent Immler, Krzysztof Kotowicz, Christian Mainka, Giorgio Maone, Andreas Mayer, Vladislav Mladenov, Dominik Noß, David Ross, Juraj Somorovsky, Paul Stone, Karsten Tellmann, and Sandra Terstegge. I also want to explicitly thank all of my colleagues from Hackmanit and the Chair of Network and Data Security. I would also like to thank my first advisor Prof. Dr. Jörg Schwenk and my second advisor Prof. Dr. Martin Johns. Thank you all for your valuable time.

Bochum, 1st April 2019

Abstract

UI Redressing (UIR) describes a set of powerful attacks that can be used to circumvent browser security mechanisms like sandboxing and the Same-Origin Policy. In essence, an attacker wants to lure a victim into performing actions out of context, commonly by combining social engineering techniques with invisible elements and hijacked trustworthy events. The set of attacks includes techniques like manipulating the mouse cursor, stealing touch gestures, and maliciously reusing keystrokes. Introduced in 2008, clickjacking was the first UIR attack; it made it possible to automatically hijack the camera and microphone of the victim by stealing a few left-clicks within a Flash-based browser game.

This thesis analyzes fundamentals, attacks, and countermeasures of UIR in depth. In addition to well-known techniques, new research results like case studies of new UIR attacks are provided.

As an important contribution to the fundamentals of UIR, the first extensive investigation of the targets of UIR attacks is provided. These targets are called trustworthy events in this thesis, which should not be confused with the concept of trusted events also known from Web security. Based on this investigation, three new UIR attack variants with a minimized visibility are introduced. Furthermore, an empirical study about the DOM-based Same-Origin Policy, perhaps the most important security mechanism for protecting Web applications, is given. Its aim of separating content from different origins can legitimately be bypassed by using trustworthy events. Therefore, an extensive evaluation of this target of UIR attacks is provided.

As UIR attack contributions, this thesis describes novel drag-and-drop attack variants, an SVG masking technique, tabnabbing to redress named windows, a scriptless attack to steal keystrokes, and, among others, browserless attacks on Android systems that are based on tapjacking. As UIR defense contributions, window spoofing protection mechanisms, JSAgents as a practical alternative to Content Security Policy, and browserless tapjacking defense mechanisms are presented.

Zusammenfassung (Summary)

UI Redressing (UIR) describes an extensive set of attacks that can be used to circumvent browser-based security mechanisms such as sandboxing and the Same-Origin Policy. Typically, an attacker wants to use social engineering techniques in combination with invisible elements and hijacked trustworthy events to make the victim perform actions that lie outside of the context. The set of attacks includes techniques such as manipulating the mouse pointer, stealing touch gestures, and maliciously reusing keyboard input. In 2008, clickjacking was presented as the first UIR attack; after a few hijacked mouse clicks within a Flash-based browser game, it allowed automatic access to the camera and the microphone of the victim.

This thesis analyzes UIR-based fundamentals, attacks, and countermeasures in detail. In addition to known attacks, it also discusses new research results, for example from case studies about new UIR attacks.

As an important contribution to the fundamentals of UIR, the first extensive investigation of the targets of UIR attacks is presented. These targets are called trustworthy events in this thesis so that they can be distinguished from the Web security concept of trusted events. Based on these investigations, the concept of trusted events could be circumvented and three new variants of UIR attacks with a minimized visibility could be introduced. Furthermore, an empirical study about the DOM-based Same-Origin Policy, presumably the most important security mechanism of Web applications, is described. Its aim of separating content from different origins can be bypassed with the help of trustworthy events. For this reason, an extensive investigation of this target of UIR attacks was carried out.

With regard to the contributions to UIR attacks, this thesis describes novel drag-and-drop attack variants, masking with the help of SVGs, tabnabbing and the redressing of named windows, scriptless attacks to steal keystrokes, and, among others, browserless attacks on Android systems that are based on tapjacking. As contributions to UIR countermeasures, prevention measures against the manipulation of browser windows, JSAgents as a practical alternative to Content Security Policy, and browserless defense mechanisms against tapjacking are presented.

Contents

I. Thesis Introduction

1. Outline, Contributions, and Publications
1.1. Thesis Outline and Contributions
1.2. Publications

II. UI Redressing Fundamentals

2. Previously Known Fundamentals
2.1. Hypertext Transfer Protocol
2.2. Transport Layer Security
2.3. Languages
2.4. Other Attack Techniques
2.5. Cursor

3. Thesis Contributions to Fundamentals
3.1. UI Redressing and Trustworthy Events
3.2. Same-Origin Policy: Evaluation in Modern Browsers

III. UI Redressing Attacks

4. Previously Known Attacks
4.1. Classic Clickjacking
4.2. Likejacking and Sharejacking
4.3. Cursorjacking
4.4. Cookiejacking
4.5. Filejacking
4.6. Double Clickjacking
4.7. Nested Clickjacking
4.8. Drag-and-Drop Operations
4.9. Strokejacking
4.10. Other Clickjacking Techniques

5. Thesis Contributions to Attacks
5.1. Drag-and-Drop Attacks
5.2. SVG Masking
5.3. Tabnabbing: Attacking Named Windows
5.4. Scriptless Attacks: SVG-based keylogger
5.5. Browserless Attacks: Tapjacking
5.6. Study: Router Web Security Evaluation Revisited

IV. UI Redressing Defense Mechanisms

6. Previously Known Defense Mechanisms
6.1. JavaScript-based Frame Buster Overview
6.2. Frame Busting
6.3. Randomization to Detect Clickjacking Campaigns
6.4. X-Frame-Options
6.5. Content Security Policy
6.6. NoScript

7. Thesis Contributions to Defenses
7.1. Spoofing Protection
7.2. JSAgents: A Practical Alternative to CSP
7.3. Browserless Tapjacking Defense Mechanisms

V. Thesis Final Part

8. Conclusions and Outlook

9. Appendix

10. Bibliography

List of Tables

List of Figures

Part I.

Thesis Introduction

1 Outline, Contributions, and Publications

For being such an underestimated attack, UI redressing produces surprising financial consequences: most notably, it prevents Paypal and other payment processors from embedding “one-click-pay” buttons in vendors’ Web pages. Current browser-built-in countermeasures, aimed to restrict cross-origin documents nesting, just can’t solve this problem.

Giorgio Maone, InformAction

The current era shows that a company like Alphabet could generate a profit of over $27 billion in 2017, primarily by using Web technologies.¹ This fact alone makes it attractive for commercial companies to offer Web applications for online banking, shopping, or just for sharing status updates with friends via Twitter or other social networks like Facebook. Particularly in recent years, Web applications have become more complex by providing new functions, offering more usability features, or simply adding eye-catchers. Due to this high complexity, software engineers have to pay more attention to developing secure Web applications. However, this is a difficult task because new Web technologies arise continuously and, at times, new attacks emerge on the basis of these technologies.

Hansen and Grossman introduced such a Web-based attack in 2008 [61]. In their attack, called clickjacking, the attacker gains access to the microphone and webcam of the victim. To achieve this, the victim had to perform certain clicks within a browser game. Instead of clicking on a button displayed within the game, the victim was actually clicking on an invisible, overlaid Flash Player Settings Manager interface. When it was introduced, clickjacking was also known under the earlier term “UI redressing” [176]. Today, the two terms are no longer synonymous because clickjacking is regarded as a subset of UI redressing (UIR) attacks. Therefore, they are not treated as equivalent in this work. According to Avira, clickjacking “is one of the most used techniques by hackers trying to gain access over your accounts or obtain private data”.²

One of the important properties of UIR attacks is that the user interface (UI), for example in a Web browser, can consist of visible and also invisible elements; such elements can have a visibility of 0–100%. This makes it possible to define completely transparent elements. If invisible or transparent elements are loaded inside the Web browser, there is a certain chance that the user clicks on them, either accidentally or because of an attacker’s code. The probability of such a consequential click depends on different conditions like the size of the element, its position in the browser’s window, the degree of visibility, and, last but not least, the social engineering skills of the attacker.

¹ Google Company Financials, http://www.nasdaq.com/symbol/goog/financials, July 2018
² The 3 most common questions about Clickjacking, http://blog.avira.com/clickjacking/, Nov. 2014

1.1. Thesis Outline and Contributions

This thesis analyzes the fundamentals, the attacks, and the countermeasures of UIR in depth. It points out that, besides a click, there are many more user events that can be hijacked. Furthermore, there is no single solution that prevents UIR attacks in general; even existing countermeasures can sometimes be attacked, and thus we have to harden them. However, best practices and new research approaches from this thesis can be used to strengthen Web applications against UIR attacks. This cat-and-mouse game illustrates the complexity of UIR and shows how difficult it is to find proper countermeasures.

Chapter 2 presents foundations like the Hypertext Transfer Protocol (HTTP) and Transport Layer Security (TLS). Apart from these protocols, languages that are important for UIR are described. First, the Hypertext Markup Language (HTML) is presented with its basic structure and a focus on framesets and frame elements. Second, Cascading Style Sheets (CSS) is introduced as a language for defining the presentation of a document. The property opacity, which specifies the transparency of an element and is important for UIR, is highlighted therein. Third, fundamentals of JavaScript and the Extensible Markup Language (XML) are given. In this context, Scalable Vector Graphics (SVG) is mentioned as an XML-based format that can include JavaScript code as well as script-free code to perform attacks. Last but not least, the cursor as a foundation for UIR and attack techniques like social engineering, Cross-Site Request Forgery (CSRF), and Cross-Site Scripting (XSS) are described.

Chapter 3 introduces the first extensive investigation of the targets of UIR attacks, called trustworthy events. These events are consciously triggered by a human user (e.g., a left mouse click) to authorize security-critical changes. The investigation revealed major differences between widely used browser families in handling trustworthy events. Furthermore, the concept of trusted events defined by the W3C can be circumvented. In this context, three new UIR attack variants with a minimized visibility are introduced.

The investigation also pointed out that cross-origin actions, which are usually prevented by the Same-Origin Policy (SOP), are sometimes allowed in the case of trustworthy events; one example is drag-and-drop actions between cross-origin windows. Because the SOP is perhaps the most important security mechanism for protecting Web applications, an empirical study about the SOP-DOM is presented, based on an open-source testbed consisting of 544 different test cases across 10 major browsers. The findings of the study are discussed in terms of read, write, and execute rights in different access control models.

Chapter 4 gives an overview of UIR attacks that were presented by other researchers. At the beginning, the first public UIR attack, called clickjacking, is described. In September 2008, it was introduced with an attack on Adobe’s Flash. This technique allowed an attacker to hijack mouse clicks within the attacker’s game to configure an invisible Iframe that loads the Flash Player Settings Manager. As a result, an attacker was able to automatically get access to the camera and microphone each time a user visited the attacker’s website.

Apart from classic clickjacking attacks, likejacking and sharejacking, which target social networks like Facebook, are discussed. Moreover, cross-browser attacks called cursorjacking, double clickjacking, drag-and-drop, nested clickjacking, and strokejacking are presented. Finally, attacks based on browser-specific bugs are described: cookiejacking as an Internet Explorer (IE) attack to steal cookies, filejacking to read the user’s files by showing dialogs out of context in Google Chrome (GC), and whole-page clickjacking as an attack on Opera (OP).

Chapter 5 depicts novel UIR attack contributions. At the beginning, drag-and-drop attacks are described for the Fritz!Box 2170 router and for Web applications of identity providers. SVG masking is introduced as a new technique to show elements out of context. By attacking an online voting application with the help of named windows as an example, a new UIR attack called tabnabbing is presented. Besides these attacks, a strokejacking-related attack based on scriptless attacks is presented. In contrast to previously published strokejacking attacks, the described attack only makes use of script-free HTML5 environments with SVG. Finally, browserless attacks on Android systems are introduced. By adapting attack techniques known from desktop-based environments, new attacks on Android circumvent the access management system, so that an attacker can lure a victim into actions like making unintended phone calls.

Chapter 6 gives an overview of countermeasures against UIR attacks provided by other researchers. At the beginning, JavaScript-based frame busters, attacks against these frame busters, and finally hardened JavaScript-based countermeasures are described. Afterwards, the idea of a randomization approach to detect clickjacking campaigns is explained. The last sections present X-Frame-Options and Content Security Policy. The HTTP header X-Frame-Options was built to prevent a website from being framed by unauthorized websites. In contrast, the initial aim of Content Security Policy (CSP) was to mitigate content injection vulnerabilities like XSS. Nowadays, an additional aim of CSP is to lock down applications, for example with framing prevention.

Chapter 7 presents novel protection mechanisms against UIR attacks. First, spoofing protections against tabnabbing and design mode attacks are presented. Second, JSAgents is provided as an approach to defeat markup injection attacks. JSAgents enforces a policy that looks similar to CSP and supports legacy applications, which do not have to be modified. Furthermore, a novel cascading enforcement allows different policies to be applied to each element in the Document Object Model (DOM). Last but not least, a browserless defense mechanism for Android systems against tapjacking attacks is proposed.

1.2. Publications

The following chapters are based on publications that I have written as a single author or in joint work. My specific contributions are stated at the beginning of each chapter.

Chapters 2, 4, and 6. Clickjacking und UI-Redressing – Vom Klick-Betrug zum Datenklau. Ein Leitfaden für Sicherheitsexperten und Webentwickler; Marcus Niemietz, dpunkt.Verlag, April 2012. This book is a guideline for developers to understand UIR attacks and countermeasures.

UI Redressing: Attacks and Countermeasures Revisited; Marcus Niemietz; CONFidence 2011, Krakow, Poland, May 2011. In this work, novel UIR attacks such as SVG masking are provided. Furthermore, best practices to harden a Web application against UIR are given.

Chapter 3.1. Out of the Dark: UI Redressing and Trustworthy Events; Marcus Niemietz, Jörg Schwenk; 16th International Conference on Cryptology And Network Security (CANS 2017), Hong Kong, China, October 2017. This paper provides the first extensive investigation of trustworthy events, which are the target of UIR attacks. I am the main author of this publication.

Chapter 3.2. Same-Origin Policy: Evaluation in Modern Browsers; Jörg Schwenk, Marcus Niemietz, Christian Mainka; 26th USENIX Security Symposium (USENIX Security ’17), Vancouver, Canada, August 2017. In an empirical study about the SOP-DOM, we ran a large set of test cases on major web browsers and showed that different browser behaviors could be detected. My focus was the methodology and the SOP-DOM evaluation.

Chapter 5.1. Guardians of the Clouds: When Identity Providers Fail; Andreas Mayer, Marcus Niemietz, Vladislav Mladenov, Jörg Schwenk; CCSW 2014: The ACM Cloud Computing Security Workshop. In this publication, we presented a comprehensive analysis of SAML identity providers and introduced a new attack technique called ACSSpoofing. I detected and evaluated the XSS and UIR attacks.

Chapters 5.1 and 5.6. Owning Your Home Network: Router Security Revisited; Marcus Niemietz, Jörg Schwenk; W2SP 2015: Web 2.0 Security & Privacy, 2015, San Jose, U.S.A., May 2015. In this paper, we investigated Web interfaces of several DSL home routers. Primarily by using XSS and UIR attacks, we were able to circumvent the security of all of them. I am the main author of this publication.

Chapter 5.3. The Bug that made me President: A Browser- and Web-Security Case Study on Helios Voting; Mario Heiderich, Tilman Frosch, Marcus Niemietz, Jörg Schwenk; International Conference on E-voting and Identity (VoteID), 2011, Tallinn, Estonia, September 2011. This publication describes security challenges for critical web applications such as the Helios Voting system. We used Web browser features to leverage information disclosure and state modification attacks. My main contribution was to evaluate Helios regarding possible XSS and UIR attacks.

Chapter 5.4. Scriptless Attacks – Stealing the Pie Without Touching the Sill; Mario Heiderich, Marcus Niemietz, Felix Schuster, Thorsten Holz, Jörg Schwenk; 19th ACM Conference on Computer and Communications Security (CCS), Raleigh, NC, October 2012. In this paper, we demonstrated with scriptless attacks that an adversary might not need to execute code to preserve its ability to extract sensitive information from well-protected websites. We showed that an attacker can use seemingly benign features to build side channel attacks that measure and exfiltrate almost arbitrary data displayed on a given website. My focus in this work was to analyze existing attack mitigation techniques to determine how website owners and developers can be protected against scriptless attacks.

Chapter 5.5. UI Redressing Attacks on Android Devices; Marcus Niemietz, Jörg Schwenk; Black Hat Abu Dhabi, December 2012. In this paper, we described novel high-impact user interface attacks on Android-based mobile devices and additionally showcased possible mitigation techniques for such attacks. I am the main author of this publication.

Chapter 7.2. Waiting for CSP – Securing Legacy Web Applications with JSAgents; Mario Heiderich, Marcus Niemietz, Jörg Schwenk; ESORICS 2015 – 20th European Symposium on Research in Computer Security, Vienna, Austria, September 2015. In this paper, we proposed JSAgents as a novel and flexible approach to defeat markup injection attacks using DOM meta-programming. I analyzed different policy approaches, built a Firefox (FF) extension to inject JSAgents automatically, and evaluated the performance and policy enforcements.

Other Publication

The following publication is not included in this thesis although it also addresses security problems caused by UIR attacks.

Not so Smart: On Smart TV Apps; Marcus Niemietz, Juraj Somorovsky, Christian Mainka, Jörg Schwenk; International Workshop on Secure Internet of Things (SIoT 2015), Vienna, Austria, September 2015. In this paper, we investigate attack models for Smart TVs and their apps, and systematically analyze the security of Smart TV devices. I am the main author of this publication.

Part II.

UI Redressing Fundamentals

2 Previously Known Fundamentals

UI redressing attacks are universally applicable to scenarios beyond typical web sites. For example, they can be used to facilitate or trigger attacks on complex protocols like OAuth and SAML. A fundamental research on UI redressing is thus of a huge importance.

Juraj Somorovsky, Ruhr University Bochum

Contents

2.1. Hypertext Transfer Protocol
2.2. Transport Layer Security
2.3. Languages
2.4. Other Attack Techniques
2.5. Cursor

UIR attacks mostly rely on browser, application, and operating system features. For a better understanding of the UIR attacks and countermeasures, this chapter describes important languages like HTML and attack techniques like XSS.

2.1. Hypertext Transfer Protocol

Websites are usually transmitted with the help of the Hypertext Transfer Protocol (HTTP). The first specification, HTTP/1.0, was published as RFC 1945 by Berners-Lee et al. in 1996 [31]. Three years later, the currently widespread version HTTP/1.1 [51] was introduced. Basically, one can see HTTP as a protocol with requests initiated by the user and responses sent by the server. Requests are used to get resources like images or HTML files.

HTTP is a stateless protocol; cookies allow the server to track the user’s state. Cookies are sent with every request to the website that set them, and they can be used to authenticate the client; it does not matter whether the requested resource is an image, an HTML file, or a CSS file. From a security perspective, an SSL/TLS-protected website can therefore leak its cookies to a man-in-the-middle attacker if just one resource is not protected with SSL/TLS.
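The cookie leak described above can be illustrated with a minimal sketch. It assumes a strongly simplified cookie model in which only the host and the Secure attribute of a cookie matter; Path, Domain, SameSite, and expiry are ignored. The function cookiesSentWith, the cookie jar, and the URLs are hypothetical names chosen for illustration and do not reflect a real browser implementation.

```javascript
// Simplified model (assumption): a cookie is attached to a request if the
// host matches and, for cookies flagged Secure, the request uses https://.
function cookiesSentWith(requestUrl, cookieJar) {
  const url = new URL(requestUrl);
  return cookieJar
    .filter((cookie) => cookie.host === url.hostname)
    .filter((cookie) => !cookie.secure || url.protocol === "https:")
    .map((cookie) => cookie.name);
}

// Hypothetical cookie jar of a user who is logged in at www.example.org.
const jar = [
  { name: "session", host: "www.example.org", secure: false },
  { name: "csrf", host: "www.example.org", secure: true },
];

// The page itself is loaded via HTTPS, but one image is requested via HTTP.
console.log(cookiesSentWith("https://www.example.org/index.html", jar));
console.log(cookiesSentWith("http://www.example.org/logo.png", jar));
```

In this model, the unprotected image request still carries the session cookie because that cookie lacks the Secure attribute, which is exactly the situation a man-in-the-middle attacker can exploit.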

    1  GET /chair/news/ HTTP/1.1
    2  Host: www.nds.rub.de
    3  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:32.0) Gecko/20100101 Firefox/32.0
    4  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    5  Accept-Language: de,en-US;q=0.7,en;q=0.3
    6  ...
    7  Pragma: no-cache
    8  Cache-Control: no-cache
    9

Listing 2.1: HTTP request to get the website of the Chair of Network and Data Security.

Listing 2.1 shows an HTTP request to receive the source code of the “Chair of Network and Data Security” website. The request consists of information regarding the HTTP method, the resource’s path (line 1), the used protocol (line 1), and the host name (line 2). Furthermore, it includes, among others, user agent (line 3) and cache-control data (lines 7–8).

Based on this request, the truncated server response is displayed in Listing 2.2. As in the case of the request, the HTTP version is given (line 1); moreover, the server’s name is shown (line 2). In contrast to the HTTP request, which carries only a header here, the response consists of a header (lines 1–10) and a body (lines 12–17). An empty line (line 11) separates both parts, just as the empty line 9 terminates the header of the request. The body of the response usually includes code, which is displayed inside the browser’s window.¹

¹ To name one exception, in CVE-2009-3013 Opera 9.52 and earlier versions allowed an attacker to manipulate Location headers of HTTP responses by using data URIs so that JavaScript code could be executed. http://www.cvedetails.com/cve/CVE-2009-3013/

    1  HTTP/1.1 200 OK
    2  Server: gunicorn/0.13.4
    3  Date: Thu, 23 Oct 2014 12:59:03 GMT
    4  Content-Language: de
    5  Expires: Thu, 23 Oct 2014 13:14:03 GMT
    6  Vary: Accept-Language, Cookie
    7  Last-Modified: Thu, 23 Oct 2014 12:59:03 GMT
    8  Cache-Control: max-age=900
    9  Content-Type: text/html; charset=utf-8
    10 Content-Length: 10033
    11
    12 [markup not preserved]
    13 [markup not preserved]
    14 [markup not preserved]
    15 <title>Nachrichten - Ruhr-Universität Bochum</title>
    16 ...
    17 [markup not preserved]

Listing 2.2: Truncated HTTP response regarding the request displayed in Listing 2.1.

In the context of UIR, both the HTTP header and the body are used to mount attacks and also to defend against them. The attacks, like classic clickjacking (cf. Chapter 4.1) and content extraction (cf. Chapter 4.8.1), reside primarily in the body, whereas the countermeasures can be found in both parts. HTTP headers can be used to activate X-Frame-Options, which disallows another website from framing the protected website (cf. Chapter 6.4). The HTTP body can include frame busting code (cf. Chapter 6.2). Just one area of this work does not use HTTP directly: browserless attacks (cf. Chapter 5.5), which allow attackers to create malicious applications that trigger actions like unintended phone calls.
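To make the header-based countermeasure more concrete, the following sketch evaluates an X-Frame-Options response header. It is a simplified model under stated assumptions: the function mayBeFramed and its parameters are hypothetical, only the values DENY and SAMEORIGIN are handled, only the direct embedder is compared, and unrecognized values are treated as if the header were absent.

```javascript
// Decide whether a response may be rendered inside a frame.
// headers: plain object of response headers (hypothetical input shape).
// pageOrigin: origin of the framed page; embedderOrigin: origin of the
// page that tries to frame it.
function mayBeFramed(headers, pageOrigin, embedderOrigin) {
  // Header names are case-insensitive, so normalize them first.
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value).trim().toUpperCase();
  }
  const policy = normalized["x-frame-options"];
  if (policy === "DENY") return false; // framing is never allowed
  if (policy === "SAMEORIGIN") return pageOrigin === embedderOrigin;
  return true; // header absent or not recognized in this sketch
}

const response = { "X-Frame-Options": "SAMEORIGIN" };
console.log(mayBeFramed(response, "https://www.nds.rub.de", "https://www.nds.rub.de"));
console.log(mayBeFramed(response, "https://www.nds.rub.de", "https://attacker.example"));
```

The decision is taken by the browser, not by the server; the server merely states its policy in the header. The sketch therefore models the browser side of the mechanism.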

2.2. Transport Layer Security

Transport Layer Security (TLS) [46] is arguably the most important security protocol on the Internet. It is located between the transport layer and the application layer of the TCP/IP reference model. Its main purpose is to protect the integrity, authenticity, and confidentiality of application protocols like HTTP or IMAP so that these protocols can securely send critical data like passwords or cookies over insecure networks.

In this work, the structure of the TLS protocol will not be analyzed. Instead, TLS plays an important role as a fundamental part of a Web origin. A Web origin is usually defined by a protocol, a domain, and a port [25]. With TLS, the protocol changes from HTTP to HTTPS, and this difference is used as a decision criterion for whether access to Web objects is granted (cf. Section 3.2.3).
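The role of the protocol within a Web origin can be sketched as follows. The example assumes the standard URL interface, whose origin property serializes protocol, domain, and port; the helper name sameOrigin and the URLs are chosen for illustration only.

```javascript
// Two URLs share a Web origin if protocol, domain, and port are identical.
function sameOrigin(urlA, urlB) {
  const a = new URL(urlA);
  const b = new URL(urlB);
  // URL.origin omits default ports, so ":443" equals no port for https.
  return a.origin === b.origin;
}

console.log(sameOrigin("https://www.nds.rub.de/chair/news/", "https://www.nds.rub.de:443/"));
console.log(sameOrigin("http://www.nds.rub.de/", "https://www.nds.rub.de/"));
```

The second comparison differs only in the protocol, which is sufficient to place both URLs in different origins.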

2.3. Languages

In the following, fundamental language requirements are described for a better understanding of the attacks and countermeasures analyzed in this work. First, HTML is explained with a special focus on forms, frames, and framesets. Second, we cover the basics required for UIR, such as CSS, JavaScript, and XML.

2.3.1. Hypertext Markup Language

The Hypertext Markup Language (HTML) is a language that goes back to a proposal by Berners-Lee from March 1989, and it is based on the Standard Generalized Markup Language (SGML).² HTML allows the insertion of hyperlinks, images, lists, forms, and frames. The corresponding elements consist of start and end tags, which are enclosed in angle brackets. Next to the insertion of elements like frames, HTML can be used to include Cascading Style Sheets and, among others, JavaScript code.

² The original proposal of the WWW – HTMLized, World Wide Web Consortium, http://www.w3.org/History/1989/proposal.html, Mar. 1989

Basic Structure. In order to understand UIR examples that use HTML code, we will walk through the document structure given in Listing 2.3. The parsed output of Safari 8.0 is shown in Figure 2.1. At the beginning, a document type definition (DTD) is given, which can be used to specify the allowed HTML elements as well as child elements and attributes. There are different DTDs like HTML 4.01 Strict, XHTML 1.0 Frameset, and HTML 5.³ Inside the html element, there are two other elements: head and body. The head element encloses a title element, which, as the name indicates, sets the title of the document. In contrast, the body element encloses elements to create a headline, text with some bold letters, and a link to www.example.org.

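    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <title>Title</title>
    </head>
    <body>
    <h1>Headline</h1>
    This is a <b>bold</b> link:
    <a href="http://www.example.org">Example</a>
    </body>
    </html>

Listing 2.3: Example for a basic HTML document. (The tags are reconstructed from the description above; the document type declaration and the extent of the bold text are assumptions.)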

Figure 2.1.: Basic HTML document of Listing 2.3 interpreted by Safari 8.0 on OS X 10.10.

³ Recommended list of Doctype declarations, World Wide Web Consortium, http://www.w3.org/QA/2002/04/valid-dtd-list.html, Dec. 2011

Framesets and Frames. Listing 2.4 is given as a second example; the parsed output of Safari 8.0 is shown in Figure 2.2. At the beginning, a set of frames is created by using the frameset element. The attribute cols specifies that the first frame has a column width of 120 pixels and that the second frame has a variable column width based on the screen resolution. Both frames are enclosed by the frameset element. The first frame loads the file navigation.html and the second frame loads the file mainpage.html. Each frame element includes a name attribute, which can be used to address a frame, for example to open a Web page in the Content frame by clicking on a link inside the navigation. The content of the next element, noframes, is only shown if the browser does not support frames. In this case, two links are displayed that point to the Web pages that would otherwise be loaded in the frames.

Please note that UIR attacks usually require frames to attack a user. More specifically, the attacker will use Iframe elements, which do not require framesets. This element is explained in Chapter 2.3.2.

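    <frameset cols="120,*">
    <frame src="navigation.html" name="Navigation">
    <frame src="mainpage.html" name="Content">
    <noframes>
    <a href="navigation.html">Navigation</a>, <a href="mainpage.html">main page</a>
    </noframes>
    </frameset>

Listing 2.4: HTML code to create a frameset with frames. (The frameset, frame, and noframes tags are reconstructed from the description above; the name of the first frame is an assumption.)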

Figure 2.2.: HTML frameset of Listing 2.4 interpreted by Safari 8.0 on OS X 10.10.
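Since UIR attacks usually rely on frames, a page may want to find out whether it is currently displayed inside one. The following sketch shows the basic comparison that JavaScript-based countermeasures build on. It assumes a window-like object with the properties self and top; the function isFramed and the mock objects are hypothetical and only serve to make the example runnable outside of a browser.

```javascript
// A document is framed if its own window is not the topmost window.
// win: any object that exposes `self` and `top` like a browser window.
function isFramed(win) {
  return win.top !== win.self;
}

// Mock of a top-level window: it is its own top window.
const topWindow = {};
topWindow.self = topWindow;
topWindow.top = topWindow;

// Mock of a window that is loaded inside a frame of topWindow.
const framedWindow = { top: topWindow };
framedWindow.self = framedWindow;

console.log(isFramed(topWindow));    // false
console.log(isFramed(framedWindow)); // true
```

In a browser, the call would be isFramed(window). Such a check alone is not a reliable defense, because script-based checks of this kind can themselves be attacked.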

2.3.2. Cascading Style Sheets With the formatting language Cascading Style Sheets (CSS) it is possible to style structured elements in documents like HTML. This language allows spec- ifying features like fonts, colors, spaces, and sounds. There are different CSS versions. CSS Level 1 (CSS 1) was published in December 1996; CSS Level 2 (CSS 2) in May 1998.4 Nowadays, CSS 2.1 from June 2011 is used in all modern Web browsers; it builds on CSS2.5 One reason is that CSS2 is, in the main, downwards compatible to CSS1. Since April 2000, the W3C is working

4Cascading Style Sheets, level 1 – W3C Recommendation, revised 11 Apr 2008, World Wide Web Consortium, http://www.w3.org/TR/CSS1/, Dec. 1996 5Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification – W3C Recommendation, World Wide Web Consortium, http://www.w3.org/TR/CSS21/, Jun. 2011

on CSS3.⁶

[Listing 2.5 content: page title "CSS"; three h1 headlines with the text "h1 tags"]
Listing 2.5: HTML document with CSS code.

To understand how UI redressing attacks with CSS work, Listing 2.5 shows the code of an HTML document with CSS code. The interpreted output is displayed in Figure 2.3. As is usual for an HTML document, there are html, head, title, and body tags. The first important element with regard to CSS is the style element. It includes a comment to hide the CSS declarations from older browsers that do not support CSS. Then there is a definition for Iframe elements; it sets all iframe elements to a height of 100 pixels. Besides, the element with the id attribute small is set to a font size of 12 points. These elements are inside the body element. Please note that CSS uses the following structure inside the style element: selector { property:value; }.

Inside the body element, there is first a headline without any CSS modifications. Second, the style attribute is used to set the transparency of a headline to 40%. More details regarding the transparency of elements, which is a very important aspect of UI redressing, are discussed in Chapter 4.1 with a special focus on Table 4.1. Moreover, there is a headline with a font size of 12 points due to the id attribute small. The first Iframe element loads the website of the University of Bochum with a width of 320 pixels and, due to the CSS definition, a height of 100 pixels. The last Iframe loads the website of the Chair of Network and Data Security and is positioned near the top left of the Web page; more precisely, 30 pixels from the top and 150 pixels from the left of the top left corner.
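The rule structure selector { property:value; } can be illustrated with a small JavaScript serializer. This is a minimal sketch under stated assumptions: the helper name cssRule is hypothetical, and the inline style for the positioned Iframe is only one plausible way to express the offsets named in the text.

```javascript
// Serializes one CSS rule in the form: selector { property:value; }
function cssRule(selector, declarations) {
  const body = Object.entries(declarations)
    .map(([property, value]) => `${property}:${value};`)
    .join(" ");
  return `${selector} { ${body} }`;
}

// Rules corresponding to the description of Listing 2.5.
const listingRules = [
  cssRule("iframe", { height: "100px" }),     // all Iframes are 100 pixels high
  cssRule("#small", { "font-size": "12pt" }), // element with id="small"
];

// Assumed inline style for the last Iframe; the text only states the offsets
// of 30 pixels from the top and 150 pixels from the left.
const positionedIframeStyle = "position:absolute; top:30px; left:150px;";
```

The first entry of listingRules is the string iframe { height:100px; }, which is the form a style element would contain.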

⁶ CSS3 introduction – W3C Working Draft, World Wide Web Consortium, http://www.w3.org/TR/2000/WD-css3-roadmap-20000414, Apr. 2000

Figure 2.3.: HTML document with CSS code of Listing 2.5 interpreted by Safari 8.0 on OS X 10.10.

2.3.3. JavaScript

JavaScript is a scripting language introduced by a cooperation of Netscape and Sun Microsystems in December 1995.⁷ In July 1997, it was standardized as ECMA-262 by the European Computer Manufacturers Association (ECMA). Edition 5.1 was published in June 2011.⁸ Listing 2.6 shows how JavaScript code can be used to create a list of numbers defined by user input. First, there is a script element with a type attribute setting the media type to JavaScript. Second, the function count performs the task of listing numbers line by line, depending on the user’s input. This function is called via the event-handler onclick, which is activated when the user clicks on the text Click here listed in line 13.

13 Click here
Listing 2.6: JavaScript code to create a list of numbers defined by user input.
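The behavior described for Listing 2.6 can be sketched as follows. The body of the original listing is not reproduced here, so this is an illustrative reconstruction under assumptions: only the function name count is taken from the text, while its signature and the handling of invalid input are hypothetical.

```javascript
// Returns the numbers 1..limit, one per line.
// Input that is not a positive integer yields an empty string.
function count(userInput) {
  const limit = Number.parseInt(userInput, 10);
  if (!Number.isInteger(limit) || limit < 1) {
    return "";
  }
  const lines = [];
  for (let i = 1; i <= limit; i++) {
    lines.push(String(i));
  }
  return lines.join("\n");
}
```

In a browser, such a function could be attached to the text Click here through an onclick event-handler and fed with a value obtained from the user, for example via prompt().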

⁷ JavaScript: How Did We Get Here?, Steve Champeon – O’Reilly Media, http://archive.oreilly.com/pub/a/javascript/2001/04/06/js_history.html, Jun. 2001
⁸ Standard ECMA-262, ECMA International, http://www.ecma-international.org/publications/standards/Ecma-262.htm, Jun. 2011

2.3.4. Extensible Markup Language

eXtensible Markup Language (XML) is a W3C data format [38]. It is used for transmission, validation, and interpretation of data in different applications ranging from Web services and office applications to configuration files used in various servers and appliances. The huge number of application scenarios adopting XML technology resulted in a large number of extension specifications that allow defining schemas for XML documents or applying cryptographic primitives directly on the XML level. In the following, we introduce a standard called Document Type Definition (DTD), which is necessary for further attack descriptions. DTD allows for the declaration of new XML building blocks in the prolog of an XML document. These building blocks are called XML entities. XML entities are inserted into the XML document and resolved during XML document parsing. There are two types of XML entities: internal and external; see Listing 2.7 with a DTD declaration.

]>
&title;
&ext;
Listing 2.7: An XML document containing a DTD declaration with an internal and external entity.

When an XML parser processes such a document, it first reads the entities in the XML prolog. Afterwards, it resolves all the entity occurrences in the document: &title; is replaced with the text Configuration file and &ext; is substituted with the content of the file file:///text.txt. Resolving external entities can become dangerous if an attacker controls the content of processed XML files. This is the case in many applications, for example Web services. If the attacked XML parser resolves external entities, the attacker can force the parser to read arbitrary system files and send them over the network. These attacks are referred to as XML eXternal Entity (XXE) attacks [116].
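The resolution step can be illustrated with a toy model in JavaScript. This is a sketch under assumptions and not a real XML parser: the entity table mirrors the two entities named in the text, while the function resolveEntities, its loader callback, and the element name used in the usage note are hypothetical.

```javascript
// Toy model of entity resolution (not a real XML parser).
// Internal entities carry their replacement text, external ones a system identifier.
const declaredEntities = {
  title: { kind: "internal", value: "Configuration file" },
  ext: { kind: "external", systemId: "file:///text.txt" },
};

// Replaces &name; references in the given text. External entities are only
// resolved when a loader callback is supplied; otherwise they are refused.
function resolveEntities(text, declared, loadExternal) {
  return text.replace(/&([A-Za-z_][\w.-]*);/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(declared, name)) {
      return match; // undeclared reference: leave untouched
    }
    const entity = declared[name];
    if (entity.kind === "internal") {
      return entity.value;
    }
    if (typeof loadExternal !== "function") {
      throw new Error(`external entity "${name}" refused`);
    }
    return loadExternal(entity.systemId);
  });
}
```

Calling resolveEntities("<name>&title;</name>", declaredEntities) yields <name>Configuration file</name>, whereas a text containing &ext; is rejected unless a loader is passed explicitly. A parser that always supplies such a loader behaves like the vulnerable parser described above.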

Scalable Vector Graphics

Scalable Vector Graphics (SVG) is “a modularized language for describing two-dimensional vector and mixed vector/raster graphics in XML”.⁹ It was first published by the W3C in 1999 and it includes features like showing text, painting (e.g., filling, stroking), patterns, and filters. Furthermore, script elements can be used to execute JavaScript code.

⁹ Scalable vector graphics (SVG) 1.1 (Second edition), http://www.w3.org/TR/SVG11/, August 2011

Heiderich et al. showed in 2011 that SVGs can be used to exploit major websites and Web browsers [63]. Furthermore, they introduced attacks on websites that just use img elements loading SVG files.

2.4. Other Attack Techniques

This chapter describes attack techniques that are used in this work. Social engineering is an important technique for actions like luring the victim into clicking on elements. Cross-Site Request Forgery is important because we will, for example, show ways to bypass existing countermeasures with UI redressing attacks (Ch. 4.10.1). Cross-Site Scripting is relevant due to its code injection requirement as an important step of this attack, and also for the explanation of our introduced scriptless attacks (Ch. 5.4) used in the area of UI redressing.

2.4.1. Social Engineering

In the case of social engineering, the attacker uses psychological tricks on a legitimate user of a computer system in order to gain information about the system or to perform unauthorized actions in general. From a non-technical view, pester power items like chocolate bars are an example of social engineering. In this case, the shop owner wants the purchaser to believe that buying a chocolate bar is necessary due to long waiting periods. For this reason, there is a higher chance that such a person will purchase a chocolate bar, even if it is overpriced.

A technical example is the Facebook clickjacking worm, which combines various human interests.¹⁰ An image of an almost naked woman and a link with the text “Want 2 C something hot? Click da button, baby!” were published in a Facebook profile message. When the user clicked on this link, a Web page was loaded with a picture of the woman and the mentioned text.¹¹ Additionally, a big red button appeared that the user was supposed to click on. Many users clicked on the button, which was backed by clickjacking code. As a result, the above-mentioned Facebook message was posted in each of these users’ profiles, too; the attacker used the woman as a motivation to get clicks from the victim.

A variety of researchers are dealing with social engineering attacks. In June 1995, Winkel et al. [173] showed in a case study that this kind of attack can be used to bypass technical countermeasures. Hadnagy illustrated that the attacker can use many psychological principles and well-known persuasion techniques [59]. In 2014, Nei et al. explained [121] in a case study that clickjacking attacks can be easily carried out on social networks like Facebook by using social engineering attacks. Also in 2014, Faghani et al. presented simulation results regarding the propagation of clickjacking in social networks [48]. Social engineering can be made even more successful by using techniques like Cross-Site Script Inclusions discussed by Lekies et al. [98].

¹⁰ Facebookers hit with steamy clickjacking exploit, http://www.theregister.co.uk/2009/11/23/facebook_clickjacking_exploit/, Nov. 2009
¹¹ Link sharing spam on Facebook, Morten Barklund, http://www.barklund.org/blog/2009/11/23/link-sharing-spam-facebook/, Dec. 2009

2.4.2. Cross-Site Request Forgery

CSRF is an attack technique that enables the attacker to send unauthorized HTTP requests. A typical example is a bulletin board on which authenticated users can write messages. The attacker first analyzes whether a logout can be triggered by an HTTP request, for example by requesting logout.php. Second, with this knowledge the attacker writes a message on the bulletin board, which consists of an image loading the logout.php page and a text to abuse the administrator. The crucial point is that every user, including the administrator, is automatically logged out when viewing the message, because the embedded image points to the logout page. For this reason, the administrator is usually only able to delete the message directly in the database in which it is stored. Such a technique can therefore be used to automatically send requests that the victim did not intend, triggered by malicious code provided by the attacker. A typical countermeasure is to use tokens that are unique and not guessable by the attacker; they are sent with each request, for example via a hidden form field, and validated by the server.

Beyond that, researchers published different tools and concepts to detect CSRF vulnerabilities. Johns et al. introduced a client-side proxy for the identification of potentially fraudulent requests called RequestRodeo [82]. Kongsli showed [90] that Selenium, a software testing framework for Web applications, can be used to test for vulnerabilities like XSS, CSRF, and information leakage. CSRF defense techniques are inter alia discussed by Barth et al. [27] and Jovanovic et al. [84]. Braun et al. identified the common root causes of widespread vulnerabilities including CSRF [36].
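The token-based countermeasure can be sketched as a server-side check in JavaScript. This is a minimal sketch under assumptions: the session and request shapes, the field name csrf_token, and both function names are hypothetical, and token generation with a cryptographically secure random source is left out.

```javascript
// Compares two strings without returning early on the first mismatching character.
function constantTimeEquals(a, b) {
  if (typeof a !== "string" || typeof b !== "string" || a.length !== b.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical shapes: session.csrfToken is the secret issued together with the
// form, request.body.csrf_token is the value of the hidden form field.
function isRequestAuthorized(session, request) {
  const expected = session && session.csrfToken;
  const submitted = request && request.body && request.body.csrf_token;
  if (!expected) {
    return false;
  }
  return constantTimeEquals(expected, submitted);
}
```

In the bulletin board example, the request caused by the embedded image carries no token, so a logout handler guarded by isRequestAuthorized would reject it.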

2.4.3. Cross-Site Scripting

For the purpose of this work, two major Cross-Site Scripting (XSS) variants are relevant: reflected and stored XSS. We do not go into detail on Self-XSS, DOM-based XSS [97] (DOMXSS), and mutation XSS [66] (mXSS), because these attack techniques are not used in this work. As an example, the attacked home router admin interfaces (Ch. 4.8), which are evaluated in this thesis, do not have the necessary rich JavaScript code to execute attacks like DOMXSS and mXSS.

Reflected XSS. The first step of a reflected XSS attack is usually to send attack vectors defined by the attacker to the server via an HTTP GET or POST request; Figure 2.4 shows a simplified scenario. After analyzing the server response message, the attacker checks whether the testing code was (partially) displayed and therefore reflected. If this is not the case, the attacker can try another testing code or abort the attack for the given resource. A successfully reflected testing code reveals which characters necessary for the attack pass through, for example whether < or " are not filtered by the server. In the case of an HTTP GET request (2), the attacker can then prepare a link pointing to the vulnerable Web application, with malicious JavaScript code embedded in the query string (1). In case of a POST request, a prepared HTML form can be

used, which is auto-submitted as soon as the attack page is loaded.
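The check whether a testing code was reflected can be sketched as a small analysis function, as it might be used in a security test of one's own application. This is a minimal sketch under assumptions: the function name, the marker convention (each special character is sent enclosed by a unique marker string), and the default character list are hypothetical.

```javascript
// Reports which special characters of a probe come back unencoded.
// Assumption: for every character ch, the probe sent was marker + ch + marker.
function reflectedCharacters(responseBody, marker, specials = ["<", ">", '"', "'"]) {
  const survived = [];
  for (const ch of specials) {
    if (responseBody.includes(marker + ch + marker)) {
      survived.push(ch);
    }
  }
  return survived;
}
```

If a response contains zq9&lt;zq9 but also zq9"zq9, only the double quote is reported, which indicates that < is encoded by the server while " is reflected unfiltered.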

Figure 2.4.: Attack scenario for reflected XSS to steal the cookie of the victim.

To name a concrete attack scenario, such malicious code can consist of an image that is loaded from the attacker’s Web server. The image’s filename is, for example, the name of a cookie provided by the victim’s page; this cookie can be read by using the object document.cookie in JavaScript (4). Because the file does not exist, the attacker’s Web server logs each request, answered with an HTTP 404 status code, together with the value of the cookie. Finally, the attacker is able to steal the cookie (5) by checking the log files after the victim has opened the link previously prepared by the attacker (1) in a spam mail or malicious instant message.

Listing 2.8: First part of the reflected XSS attack, showing how the attacker can implement a username and password stealer.

Another example to illustrate this attack technique is a script injection on a website like example.org with the code given in Listing 2.8. It consists of a form element including three input elements to type in the username and password on a login Web page. If the attacker is able to inject scripting code, for example via a vulnerable HTTP GET parameter, he could inject a username and password stealer with the code given in Listing 2.9. This code has a div element with the id attribute foo as a pointer. It is referenced by the steal function, which uses innerHTML to put HTML code into the div element. Inside the div element is an image with a src attribute referencing the attacker’s website with a log.php Web page. Finally, the function sends an HTTP GET request with data from the input elements to the attacker’s server every second.

Listing 2.9: Second part of the reflected XSS attack.

Stored XSS. If the attack vector can be stored persistently in the vulnerable Web application (e.g., malicious code inside a log file or database), we have a stored, also called persistent, XSS. Each time a victim visits the part of the Web application where the vector is saved, it is executed. A typical stored XSS scenario is given in Figure 2.5. The attacker sends malicious code to the server, for example as a guest book entry, and this code is then stored in the database. The victim, who visits the guest book, requests the Web page with the malicious code loaded from the database. After that, the code is executed and used to steal the (session) cookie. In general, the attack has a higher impact compared to reflected XSS because attack vectors are saved and executed for all users of the vulnerable Web page. An illustrative example is a stored XSS attack on the router’s login page; all administrators of the home routers are affected here (cf. Chapter 5.6).

The attacker could also include a tool like “The Browser Exploitation Framework” (BeEF)¹² to steal data of users who type in characters by using a keylogger. BeEF is able to hook into one or more browsers and use them as beachheads for executing attacks via modules; for instance, it can let the victim install BeEF as an extension on all Web pages by providing a faked Flash update. In 2012, Lundeen et al. [100] released a clickjacking plugin module for BeEF.

Figure 2.5.: Attack scenario for persistent XSS to steal the cookie of the victim.

2.5. Cursor

Nowadays, graphical user interfaces (GUI) can be used to select elements with the help of a computer mouse. The mouse was invented by Engelbart in the early 1960s at the Stanford Research Institute and is still an important device for user input [75]. The mouse cursor, as a characteristic of a computer mouse, is a position indicator and it can look like the arrow displayed in Figure 2.6. Bordered with dashed lines and at least from the perspective of a Web browser’s user, the mouse cursor is a rectangular image with a transparent background and a

¹² BeEF – The Browser Exploitation Framework Project, Orru et al., http://beefproject.com/, Oct. 2014

visible arrow. The arrow points to a coordinate inside the circled area. Thus, in the default case we are clicking with the top-left corner. In the past, the arrow was straight and not slightly tilted. Xerox introduced the slightly tilted arrow in 1981 because of low-resolution screens [106]. The slightly tilted arrow is more recognizable and easier to render (e.g., because of the straight line from the left edge of the arrow).

Figure 2.6.: A typical mouse cursor, a circled area, and a rectangle.

Cursor Properties. CSS allows specifying rules about how the mouse cursor should look within a specified area and how its actions behave on a graphic element. On the one hand, the property cursor can be used to define the image of the mouse cursor. On the other hand, the property pointer-events allows setting the circumstances under which an element can become the target of a pointer event.

The value of the CSS property cursor consists of zero or more URL definitions, which point to image files and serve as fallbacks, followed by a mandatory keyword value. A Web browser will use this keyword if none of the images can be loaded; for example, pointer is the keyword for a hand symbol that indicates a link. The Mozilla Developer Network (MDN) documentation categorizes the keyword values by their purpose.¹³ These categories include values for links and status actions (e.g., wait), selections, drag-and-drop actions, resizing, scrolling, and zooming. Moreover, general values like none can be defined to completely remove the visible mouse cursor.

Specified by a single keyword, the CSS property pointer-events defines how the pointer action is handled.¹⁴ The keyword auto selects the default behavior of the browser, and none prohibits pointer events on the specified target. Moreover, some keywords are only available for the handling of SVG objects.
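The value syntax of the cursor property can be illustrated with a short JavaScript helper. This is a minimal sketch: the function name cursorValue and the image file names in the usage note are assumptions for illustration.

```javascript
// Builds a value for the CSS cursor property: optional url() fallbacks
// followed by the mandatory keyword.
function cursorValue(imageUrls, keyword) {
  if (typeof keyword !== "string" || keyword.length === 0) {
    throw new Error("a keyword value is mandatory");
  }
  const fallbacks = imageUrls.map((url) => `url("${url}")`);
  return [...fallbacks, keyword].join(", ");
}

// Declaration that removes the visible cursor and one that makes an element
// transparent to pointer events.
const hiddenCursor = `cursor:${cursorValue([], "none")};`;
const ignoredByPointer = "pointer-events:none;";
```

For example, cursorValue(["arrow.png", "arrow.cur"], "pointer") produces url("arrow.png"), url("arrow.cur"), pointer, so the hand symbol is used only if neither image can be loaded.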

¹³ cursor - CSS: Cascading Style Sheets, MDN, https://developer.mozilla.org/en-US/docs/Web/CSS/cursor, Feb. 2019
¹⁴ pointer-events - CSS: Cascading Style Sheets, MDN, https://developer.mozilla.org/en-US/docs/Web/CSS/pointer-events, Feb. 2019

3 Thesis Contributions to Fundamentals

The Same-Origin Policy is the core security skeleton upon which the web is built. While imperfect and fragile, the SOP has stood the test of time. Understand and respect the SOP for what it is, because it will be with us for a long time to come. David Ross, Google

Contents

3.1. UI Redressing and Trustworthy Events
3.2. Same-Origin Policy: Evaluation in Modern Browsers

This chapter presents the contributions newly developed for this thesis in the fields of trustworthy events and the Same-Origin Policy.

3.1. UI Redressing and Trustworthy Events

Remark. The content of this Chapter 3.1 was created together with Jörg Schwenk and published at CANS 2017 [130]. I am the main author of this publication.

Abstract. Web applications use trustworthy events consciously triggered by a human user (e.g., a left mouse click) to authorize security-critical changes. Clickjacking and UIR attacks trick the user into triggering a trustworthy event unconsciously. A formal model of Clickjacking was described by Huang et al. and was later adopted by the W3C UI safety specification. This formalization did not cover the target of these attacks, the trustworthy events. We provide the first extensive investigation on this topic and show that the concept is not completely understood in current browser implementations. We show major differences between widely-used browser families, even to the extent that the concept of trustworthy events itself becomes unrecognizable. We also show that the concept of trusted events as defined by the W3C is somehow orthogonal to trustworthy events, and may lead to confusion in understanding the security implications of both concepts. Based on these investigations, we were able to circumvent the concept of trusted events, introduce three new UIR attack variants, and minimize their visibility.

3.1.1. Introduction

UIR attacks are powerful attacks which can be used to circumvent browser security mechanisms like sandboxing and the SOP. They are far less intrusive than, for example, mails because the user thinks he performs a legal action on an innocent-looking Web page. In 2008, Grossman et al. had to cancel their OWASP talk about a new attack technique called Clickjacking [61]: it turned out that they were able to bypass a major protection mechanism of Adobe’s Flash. Clickjacking allowed the attacker’s website to automatically get access to the victim’s camera and microphone without any explicit permission. According to Adobe, Clickjacking had the “highest level of damage potential that any exploit can have” [127]. In contrast to Clickjacking, which is usually associated with left-click mouse events only, the broader term UIR also covers events from the keyboard and even touch gestures [74, 136]. In the past years, many attacks and defense mechanisms were published by the industry as well as the academic community (e.g., [128, 140, 24, 19], and [96]).

Formal Definition of UIR. Huang et al. [74] defined Clickjacking to be an attack that violates the integrity of either the visual context or the temporal context of a trustworthy user action on a sensitive element of the Web application. Visual context integrity may either be violated by making the sensitive element invisible (e.g., by placing it in fully transparent mode above some other element), or by hiding the fact that the user is actually clicking on such an element (e.g., by modifying the image of the mouse pointer, also referred to as Cursorjacking [92]). Temporal context integrity can be violated by replacing a non-sensitive element, just before the user clicks on it, by the sensitive element. The definitions from Huang et al. [74] can easily be extended to the broader class of UIR attacks. However, the treatment of trustworthy events becomes more complex because in addition to left-click events, also right-click-and-select, keyboard, and inter alia touch events must be taken into account.

Events in Web Applications. Events can be triggered by humans (e.g., by clicking on a button or moving the mouse pointer), by network operations, or automatically with the help of scripts. From the network, events like load or the status change events in XMLHttpRequest queries can be triggered. Purely script-based events are, for example, those triggered by the setTimeout() or setInterval() method. For human interaction, a distinction must be made between events that the user consciously starts (e.g., click or keydown), and events that he may not notice (e.g., mouseover). Event-handlers are procedures with an on-prefix; they are called when the corresponding event occurs. For example, the onclick event-handler is called whenever a click event occurs. Events are managed in the event system of the browser and there exist many differences across browsers. For example, the event wheel will only be executed on the event system of IE when the method addEventListener() is used. The event system of GC will recognize this event under the same conditions when the event-handler onwheel is used. To foster interoperability, there exists a

working draft of a UI event specification designed by the World Wide Web Consortium (W3C) [85]. The specification describes event systems and subsets of different event types.

Trusted vs. Trustworthy Events. Trusted events are defined by the W3C as follows: “Events that are generated by the user agent, either as a result of user interaction, or as a direct result of changes to the DOM, are trusted by the user agent with privileges that are not afforded to events generated by script through the createEvent() method, modified using the initEvent() method, or dispatched via the dispatchEvent() method. The isTrusted attribute of trusted events has a value of true, while untrusted events have a isTrusted attribute value of false.” ([165], Section 3.4). This definition is very broad and therefore not suitable for a distinction between events that may be allowed to cause security critical changes, and those that may not. For example, the mouseover and click events are both “trusted” according to the W3C definition when caused by a human user; however, displaying a pop-up window or sending the contents of an HTML form simply because the mouse pointer crossed over a certain area of the browser window (mouseover) seems far too permissive. Our definition is more specific: a trustworthy event is an event that is triggered by a conscious user action (e.g., by left-click, right-click, or keystroke).
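The difference between the two notions can be made explicit with a small JavaScript model. This is a simplified sketch under assumptions: events are plain objects with a type and a source field ("user", "network", or "script"), the set of conscious event types is reduced to the three examples named above, and contextmenu is assumed to stand for the right-click.

```javascript
// Event types named in the text as consciously started by the user.
// "contextmenu" standing in for the right-click is an assumption.
const CONSCIOUS_EVENT_TYPES = new Set(["click", "contextmenu", "keydown"]);

// Simplified reading of the W3C definition: everything generated by the
// user agent (user interaction, network, DOM changes) is trusted; only
// script-generated events are untrusted.
function isTrustedModel(event) {
  return event.source !== "script";
}

// Trustworthy in the sense of this chapter: triggered by a human user AND
// of a consciously triggered type.
function isTrustworthy(event) {
  return event.source === "user" && CONSCIOUS_EVENT_TYPES.has(event.type);
}
```

In this model, a mouseover caused by the user is trusted but not trustworthy, while a click caused by the user is both.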

Unreliability of isTrusted. To mark trusted actions, the DOM Level 3 specification of the W3C mentions a read-only property called isTrusted, which returns a boolean value depending on the dispatched state [164]. In Section 3.1.4 we show that this property cannot be used to distinguish trustworthy events from other events, since pop-ups are blocked even if isTrusted=true, and are allowed even if isTrusted=false.

Trustworthy Event Scenarios. Trustworthy events are used in different security critical scenarios. User consent in activating potentially dangerous browser features (e.g., activating the webcam via Adobe’s Flash) was the main target in previously described UIR attacks. Pop-up windows are usually blocked when there is no former click with the mouse pointing device. One reason is that pop-up windows are used by the advertisement industry and thus they might disturb the user or they may even trick him into installing malware. The clipboard should only be accessible by user-initiated keyboard or mouse pointing events. If the clipboard were accessible by JavaScript code only, an attacker’s website could steal saved data like passwords stored in a password manager (paste action). Drag-and-drop is a scenario where a user is able to move data cross-origin. Again, if this feature were accessible by JavaScript code only, the SOP could be circumvented. Additional scenarios: in FF, the deprecated XML User Interface Language (XUL) handlers and commands can only be triggered by trustworthy events like click and touch [120]. In modern browsers such as GC, forms can be filled out automatically by using the autofill feature that could be activated by trustworthy events like keystrokes and

left-clicks [144, 171].

Investigation of Trustworthy Events. We study (1) all mouse events including (a) different left-clicks (click, dblclick, mousedown, mouseup), (b) right-click, (c) mouse movements (mouseover), (d) drag (drag, dragstart) and (e) wheel; (2) the keyboard events keydown, keyup, keypress, and (3) combinations of mouse and keyboard events. We show that many of these user-triggered actions have different interpretations as trustworthy or non-trustworthy events in the different browser families. We investigate the three lesser-researched application areas of trustworthy events: pop-up windows bypassing pop-up blockers, escaping the browser sandbox via copy-and-paste to and from the clipboard, and bypassing the SOP via drag-and-drop.

Research questions. In this work we investigate the following questions: Which events are recognized as trustworthy by a modern Web browser? How is trustworthy event handling implemented in modern Web browsers? Could the knowledge of these implementations lead to new UIR variants?

Contributions. The contributions of this paper are as follows:

• We systematically evaluate trustworthy events in Web applications originating in a mouse device, the keyboard, or a combination thereof, and describe differences in modern browser implementations.

• We thoroughly analyze three security critical trustworthy event scenarios (pop-up windows, drag-and-drop, and clipboard), both same and cross-origin.

• We introduce and discuss three new UIR attack variants by making use of particularities of trustworthy event implementations in modern browsers.
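The scope of the evaluation outlined above can be written down as a test matrix. The following JavaScript sketch only enumerates combinations and is not the test harness used in this work; the constant names are hypothetical, contextmenu is assumed as the event type of a right-click, and combinations of mouse and keyboard events are left out for brevity.

```javascript
// Event types listed in the investigation, grouped by input device.
const STUDIED_EVENTS = {
  mouse: ["click", "dblclick", "mousedown", "mouseup", "contextmenu",
          "mouseover", "drag", "dragstart", "wheel"],
  keyboard: ["keydown", "keyup", "keypress"],
};
const SCENARIOS = ["pop-up", "clipboard", "drag-and-drop"];
const ORIGIN_RELATIONS = ["same-origin", "cross-origin"];

// Enumerates every (event, scenario, origin relation) combination.
function buildTestMatrix() {
  const cases = [];
  for (const [device, types] of Object.entries(STUDIED_EVENTS)) {
    for (const type of types) {
      for (const scenario of SCENARIOS) {
        for (const origin of ORIGIN_RELATIONS) {
          cases.push({ device, type, scenario, origin });
        }
      }
    }
  }
  return cases;
}
```

Each entry would then have to be executed in every browser family under test, recording whether the event is accepted as trustworthy in the respective scenario.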

3.1.2. UI Redressing

The initial Clickjacking attack of Grossman et al. raised a lot of attention due to the hijacking possibilities of the webcam and microphone, but they also discovered a general security problem. As listed by Niemietz et al. [128], UIR is a set of attacks that includes Clickjacking as a subset. Next to Classic Clickjacking there are other attacks like Sharejacking and Likejacking (e.g., to attack Facebook [146]), and inter alia Cursorjacking [92, 34]. UIR does not only cover clicks, it also covers drag operations (drag-and-drop attacks [152]), keystrokes (Strokejacking [177]), and even maskings (SVG-based attacks [125]). In a classic Clickjacking attack illustrated in Figure 3.1, the victim has opened the attacker’s website, which consists of two Iframes. The first Iframe (“Funny Kittens”) is loading a visible HTML document to lure the victim into clicking on the More button. The second Iframe loads the target “Account Setting” website, but this frame is rendered invisibly (e.g., with the help of the property opacity=0) on top of the visible frame. Because of the invisible Iframe’s position above the

Figure 3.1.: Illustration for a Classic Clickjacking attack.

Funny Kittens Iframe, the victim will actually click on Delete instead of More.
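The layering that makes this attack work can be expressed as a simple check over frame descriptions. The following JavaScript sketch is a toy model under assumptions: frames are plain objects with a name, an opacity, a zIndex, and a rectangle, and the function names are hypothetical. It flags fully transparent frames that lie on top of a visible frame.

```javascript
// True if two rectangles {x, y, width, height} overlap.
function intersects(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Returns all pairs where a fully transparent frame covers a visible frame.
function findHiddenOverlays(frames) {
  const findings = [];
  for (const top of frames) {
    if (top.opacity > 0) continue; // only fully transparent frames
    for (const below of frames) {
      if (below === top || below.opacity === 0) continue;
      if (top.zIndex > below.zIndex && intersects(top.rect, below.rect)) {
        findings.push({ hidden: top.name, covers: below.name });
      }
    }
  }
  return findings;
}
```

Applied to the scenario of Figure 3.1, a frame list with a visible Funny Kittens frame and a transparent Account Setting frame stacked above it yields exactly one finding.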

UIR Contexts. According to Huang et al., the definition of UIR is that “an attacker application presents a sensitive UI element of a target application out of context to a user and hence the user gets tricked to act out of context” [74]. This definition describes the root cause of UIR.

Visual Context. This context defines what the user sees. It does not include actions (e.g., clicking) on sensitive elements (e.g., buttons). To ensure target display integrity, sensitive elements must be fully visible to the user. In contrast, pointer integrity requires that input mechanisms and their resulting actions are fully visible to the user.

Temporal Context. The timing of a user’s action is known as the temporal context. Temporal integrity is ensured if the action performed at a given time is actually intended by the user. To compromise temporal integrity, a visible button could be replaced by the attacker (e.g., with a Facebook Like button) right before the victim clicks on it.

These context definitions provide an important insight into how UIR attacks work in the important case that the user performs simple events such as a single left-click. However, in reality there exists a much broader set of user events (e.g., keystroke, right-click, and a chain of left-clicks). This could lead to new attack variants and therefore different events must be considered (shown in Section 3.1.6).

3.1.3. Events in Web Applications

Browser events can be divided into different event types according to the W3C working drafts for handling browser events [85, 88]. In the following, we map common event types into different event type groups. To the best of our knowledge, we completely cover all commonly used user interactions. All event types can be either triggered by user or script actions. To name one example, a user can consciously trigger a click event by explicitly clicking

on a button with the event-handler onclick. In addition, a script can also trigger this event automatically by using the DOM’s click() method (e.g., document.getElementById("button").click()).

Resource Events. These are frame or object events that are triggered by HTTP events. Examples for resource events are error (failed to load), load (finished loading), and unload (unloading of a document or depending resource).

Mouse Events. Consciously created mouse events are usually left and right clicks. In addition, mouse events can also be generated unconsciously when the pointer is moved or when drag-and-drop actions are done. The most deeply nested element is always the target of a mouse event. Except for user interactions on a virtual keyboard, touch events act similar to mouse events and are thus included in the mouse event set. Examples for mouse events are click (button has been pressed and released), mousemove (moved pointing device), and drag (dragged element or text).

Keyboard Events. This event type is, for example, triggered when a user is pressing (keydown) or releasing a key (keyup). Virtual keyboards, from input devices like touch screens, trigger keyboard events and are therefore also in this event type group.

Multiple Events. Some events cannot be assigned to only the mouse or the keyboard; they can be triggered by both. As an example, a user can select text in an input element by using the mouse cursor (click and mark) and also the keyboard (shift and arrow keys).

Based on these event types, we provide a definition for trustworthy events:

An event is called trustworthy when it was triggered by a conscious user action.

3.1.4. DOM Property isTrusted

The W3C specification ([165], Section 3.4) describes a boolean attribute isTrusted: “The isTrusted attribute of trusted events has a value of true, while untrusted events have a isTrusted attribute value of false.” We investigate this attribute in detail and show that it is not related to trustworthy events.

Different isTrusted Implementations. According to the W3C, the DOM property event.isTrusted only returns true when an event was dispatched by the user agent [164]. According to the MDN, the property is defined as true “when the event was generated by a user action, and false when the event was created or modified by a script or dispatched via dispatchEvent” [123]. IE is an exception because isTrusted is true for all events except those created with createEvent(). This JavaScript feature can be used to create an event object and simulate an event type such as a mouse event (e.g., an automatically fired click on a button for testing Web applications).
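The difference between user-generated and script-generated events can be observed with a few lines of JavaScript. The following is a minimal sketch, not taken from the thesis: it uses the standard EventTarget and Event interfaces (available in browsers and in Node.js 15 or later), and the function name observeIsTrusted as well as the plain EventTarget standing in for a button are illustrative assumptions.

```javascript
// Minimal sketch: an event dispatched by a script reports isTrusted === false.
// A plain EventTarget is used as a hypothetical stand-in for a button element.
function observeIsTrusted() {
  const target = new EventTarget();
  let observed = null;
  target.addEventListener("click", (event) => {
    // For events created and dispatched by script, isTrusted is false.
    observed = event.isTrusted;
  });
  target.dispatchEvent(new Event("click"));
  return observed;
}

console.log(observeIsTrusted());
```

In a browser, the same listener attached to a real button would report true for a click performed by the user, which is the case the W3C and MDN definitions describe.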

isTrusted=false, but Pop-Ups are Allowed. Listing 3.1 contains a button and a hyperlink. If the button is clicked by the user, the onclick event-handler calls document.getElementById("test").click(); this JavaScript call selects the hyperlink (which has id="test") and performs a script-generated click event on it. Consequently, the value of isTrusted, which is shown in the alert() window, is false, as described in the W3C specification. Nevertheless, window.open() is executed, and a pop-up window is displayed.

Listing 3.1: Pop-ups are not blocked although isTrusted is false.

isTrusted=true, but Pop-Ups are Blocked. Listing 3.2 provides an example with the

Listing 3.2: Pop-ups are blocked although isTrusted is true.

Inheritance of Trustworthiness. Our evaluation of the behavior of isTrusted and the displaying of pop-up windows shows an interesting result: events occurring within a delay of one second after an initial trustworthy event are also treated as trustworthy events, although they may be triggered purely by JavaScript. More formally: let Pt = true denote the fact that the pop-up window opened at time t was not blocked by the pop-up blocker. Let iT = t0 denote the fact that a trustworthy event was initiated by the user at time t0. Then we have:

\[
P_t :=
\begin{cases}
\text{true}, & \text{if } (iT = t_0) \land (|t - t_0| \le 1\,\text{sec}) \\
\text{false}, & \text{else}
\end{cases}
\]

The interesting discovery is that a pop-up window will not be blocked in the event that there was once a (real) user’s click in the chain of events. This behavior was observed for the tested versions of FF and Safari (SA).
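The predicate Pt can be written down as a small function. This is a sketch under stated assumptions: timestamps are given in milliseconds, the one-second bound is treated as inclusive, and the names popupAllowed and INHERITANCE_WINDOW_MS are hypothetical and do not stem from any browser implementation.

```javascript
// Sketch of the predicate Pt; all names are illustrative.
const INHERITANCE_WINDOW_MS = 1000; // the one-second delay observed for FF and SA

// t0: time (ms) of the user-initiated trustworthy event, or null if there was none.
// t:  time (ms) at which the pop-up is opened.
function popupAllowed(t0, t) {
  if (t0 === null) return false; // no trustworthy event in the chain of events
  // Assumption: the bound is inclusive.
  return Math.abs(t - t0) <= INHERITANCE_WINDOW_MS;
}

console.log(popupAllowed(0, 500));
console.log(popupAllowed(0, 1500));
```

Under these assumptions, a script-generated window.open() call 500 ms after a real click is allowed, whereas the same call 1.5 seconds later is blocked.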

3.1.5. Trustworthy Scenarios

The W3C UI Events specification [165] does not recommend which actions should be allowed after a trustworthy event. As shown by Huang et al. [74], a missing formal definition could lead to different browser implementations and thus to browser bugs and vulnerabilities.

Next to our trustworthy event definition, we address this issue by providing a description of three different trustworthy scenarios. We believe that the scientific community and browser vendors will get a valuable overview of this currently unexamined area and can thus derive new attack variants and countermeasures (cf. Section 3.1.6).

Pop-Up Scenario

Need of Trustworthy Events. In the past, JavaScript code was able to automatically open pop-up windows when the user simply opened a website. The industry used this feature to show unwanted ads to the user, and thus modern browsers distinguish between wanted and unwanted pop-up windows: a pop-up window should only be shown when a trustworthy event (e.g., click) was used to call the required JavaScript pop-up code (e.g., window.open).

Evaluation. Table 3.1 lists four different types of events, each of which includes several events. The test cases for these events were executed in the following browsers: IE 11, FF 47, GC 54, OP 41, and SA 10. Our test function for pop-ups is given in Listing 3.3. It tries to create up to five pop-up windows if the code is actually called. If this is the case, all five pop-ups are displayed in FF and SA; in contrast, IE, GC, and OP show only one pop-up together with a warning window.

Listing 3.3: Our test function for pop-ups.
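A test function of the kind described above could look roughly as follows. This is a hypothetical sketch and not the code of Listing 3.3: the names tryOpenPopups and makeSingleShotOpener are assumptions, and the opener is passed in as a parameter so that the sketch also runs outside a browser. It relies on the fact that window.open() returns null when the pop-up blocker intervenes.

```javascript
// Hypothetical sketch, not the original Listing 3.3.
// openWindow is injected; in a browser one could pass
// (url) => window.open(url, "_blank", "width=200,height=200").
function tryOpenPopups(openWindow, count = 5) {
  let shown = 0;
  for (let i = 0; i < count; i++) {
    // A blocked pop-up yields null instead of a window handle.
    const handle = openWindow("about:blank");
    if (handle !== null && handle !== undefined) shown++;
  }
  return shown;
}

// Stand-in opener that models a browser allowing only the first pop-up.
function makeSingleShotOpener() {
  let used = false;
  return () => {
    if (used) return null;
    used = true;
    return {}; // placeholder for a window handle
  };
}

console.log(tryOpenPopups(makeSingleShotOpener())); // models IE, GC, and OP
console.log(tryOpenPopups(() => ({})));             // models FF and SA
```

Counting the non-null return values gives the number of pop-ups that were actually displayed, which is the quantity compared across browsers in the evaluation.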

In the first event type group, resource events are given. These events are inter alia triggered by loading the browser’s window or by simply reloading it. The user does not use an input device like a mouse or a keyboard, and thus pop-up windows are not displayed.

Mouse events are the second type of events. Our test cases cover left-clicks, right-clicks, mouse movements, dragging actions, and the usage of the mouse wheel. In the event of a left-click, pop-ups will be shown. A right-click only leads to pop-up windows in IE. Mouse movements and dragging actions do not let the tested browsers open pop-up windows. The event wheel is triggered when the wheel rolls up or down over an HTML element; it does not lead to the displaying of pop-up windows in FF, GC, and OP. Furthermore, this event is not supported in IE.

With the third type, keyboard events, only GC and OP allow pop-ups. IE and FF behave differently: pop-ups are blocked.

The fourth type, multiple events, consists of events that can be triggered in different ways, such as keyboard actions and left-clicks. It shows that there are events which act differently across browsers; only some browsers allow access to the pop-up scenario, and IE does so only in case of a left-click in combination with the event select. In IE 11 and FF 47, a left-click in combination with focus or blur does not lead to a pop-up execution. As another example, FF grants access when an input event in combination with a right-click for copy-and-paste is used. This is not the case when this event is used in combination with a keyboard action. GC and OP act exactly in the opposite way.

Events | Type | Result (IE 11, FF 47, GC 54, OP 41; merged cells listed once)
load, error, unload | Resource | ✗
click, dblclick, mousedown, mouseup (left-click) | Mouse | ✓
contextmenu (right-click) | Mouse | ✓ ✗
mouseenter, mouseleave, mousemove, mouseout, mouseover (movement) | Mouse | ✗
drag, dragstart (dragging) | Mouse | ✗
wheel | Mouse | ✗
keydown, keyup, keypress | Keyboard | ✗ ✓
search (keyboard, left-click) | Multiple | – (✗,✓)
select (keyboard, left-click) | Multiple | (✗,✓) ✓
input (keyboard, right-click paste) | Multiple | ✗ (✗,✓) (✓,✗)
focus (keyboard, left-click) | Multiple | ✗ ✓
focusin, focusout (keyboard, left-click) | Multiple | ✗ – ✓
blur (keyboard, left-click) | Multiple | ✗ ✓ (✗,✓)
scroll (keyboard, wheel) | Multiple | ✗

Table 3.1.: Events and their triggered pop-up windows. ✓ indicates that the pop-up was shown, ✗ that it was blocked. For the category of multiple events, “keyboard” denotes all events of type “Keyboard”, and (✓,✗) means that a keyboard event did result in a pop-up, whereas the mentioned click event did not.

Clipboard Scenario

Need of Trustworthy Events. Clipboard data may contain sensitive information that should not be shared with an arbitrary website. For example, password managers usually save stored passwords into the clipboard in order to insert them into login forms (e.g., for banking or shopping). Therefore, JavaScript code that is able to automatically read clipboard data could copy the password from the clipboard and send it to the attacker. For this reason, browsers should only allow access to clipboard data after a conscious user action, i.e., after a trustworthy event. A moderate security problem arises in the event of copy and cut operations to the clipboard; a website should not overwrite clipboard data without the explicit permission of the user.

Evaluation. As shown in Table 3.2, the clipboard always allows copy, cut, and paste operations with the help of a keyboard or mouse pointing device (no script execution). In the event of automatically executed scripts, it is usually not possible to access the user’s clipboard. IE is an exception, as it allows access to copy, cut, and paste operations (see Listing 3.4) by showing the user a confirmation window which only gives access when the user explicitly clicks on Allow access.

// Read data of type "Text" from the clipboard
window.clipboardData.getData("Text");
// Write data of type "Text" to the clipboard
var input = "This text is written to the clipboard";
window.clipboardData.setData("Text", input);

Listing 3.4: JavaScript functions to access the clipboard.

By looking at the results from the pop-up scenario (cf. Table 3.1), JavaScript code can act on a higher privileged authorization level if the script was triggered by a trustworthy event. We found that the clipboard copy and cut capabilities are also enabled when a trustworthy event calls JavaScript code. To name an example, a listener on the event click can be used to copy data into the clipboard via clipboardData.setData. Except for IE, event handlers which are able to open pop-up windows are also able to access the clipboard API with copy and cut capabilities within a delay of one second (e.g., via the EventTarget.addEventListener() method) [122]. Thus, our pop-up definition with Pt (cf. Section 3.1.4) also applies to these kinds of clipboard API access.

Paste operations can only be accessed with the help of JavaScript code when the user triggers a trustworthy paste event via Ctrl+V or Edit->Paste. This clipboard API [149] paste event behavior is important from the security perspective (discussed in Section 3.1.6).

Drag-and-Drop Scenario

Need of Trustworthy Events. Drag-and-drop operations can be done same-origin or cross-origin. Thus, the usual access limitations of the SOP in the HTML context do not apply in this scenario. Modern browsers like GC even allow the user to drag content from the desktop into the browser’s website (e.g., for file uploads). Without trustworthy events, arbitrary data from another window and environment could be stolen automatically with the help of JavaScript code.

JavaScript DOM Access. An example for transferring data via drag-and-drop is given in Table 3.3. In this table, the host document (HD) shown in Listing 3.5 includes the embedded document (ED) displayed in Listing 3.6.

Action | via | Result (IE 11, FF 47, GC 54, OP 41; merged cells listed once)
Copy / Cut | Right mouse-click then copy/cut | ✓
Copy / Cut | Keyboard: Ctrl+C | ✓
Copy / Cut | Script | ✗ (✓)
Copy / Cut | Trustworthy Event and then script | cf. Table 3.1
Paste | Right mouse-click then paste | ✓
Paste | Keyboard: Ctrl+V | ✓
Paste | Script | ✗ (✓)
Paste | Trustworthy Event and then script | (✗)

Table 3.2.: Clipboard handling. ✓ denotes that the text is copied, ✗ that it is not copied. (✓) denotes that the text is copied, but a warning is displayed. The reference to Table 3.1 means that any trustworthy event that could be used to trigger a pop-up in FF 47, GC 54, or OP 41 can be used, in combination with the JavaScript code given in Listing 3.4, to write text to the clipboard.

The first part of Table 3.3 illustrates that the code of Listing 3.5 can be used, in the same-origin case, to copy the word Test into the input field of Listing 3.6. This is possible because we select this word by using the ID HDt and afterwards we copy it into the input field with the ID EDi. To do this, one must select the embedding element with the ID EDf. In the cross-origin case, the browser does not allow the copy-action. From an attacker’s perspective, it is interesting to know whether it is possible to do actions which are restricted by the SOP [138, 25]:

1. We trigger the JavaScript function of Listing 3.5 by dragging the content of

to trigger the JavaScript function copy() with the help of the ondragstart event-handler. In this case, only same-origin access from the HD to the ED is allowed.

Iframe access | IE 11 | FF 47 | GC 54 | OP 41
JavaScript: Same-Origin (SO) | ✓ | ✓ | ✓ | ✓
JavaScript: Cross-Origin (CO) | ✗ | ✗ | ✗ | ✗
Mouse Events: Click calls function (SO) | ✓ | ✓ | ✓ | ✓
Mouse Events: Click calls function (CO) | ✗ | ✗ | ✗ | ✗
Mouse Events: Drag&Drop (SO, CO) | ✓ | ✓ | ✗ | ✗

Table 3.3.: HD wants to transfer data to the Iframe’s Web page (✓ access, ✗ no access).

2. Cross-origin drag-and-drop operations are allowed in two browsers: IE 11 and FF 47. Trustworthy events, such as selecting the text test with the mouse, dragging it into the Iframe’s input field, and dropping the selected text into this field, allow actions that are restricted (cross-origin) for JavaScript code. GC and OP also allowed these actions in former versions (cf. Section 3.1.6).

Listing 3.8: HTML code of the HD (scenario: drag-and-drop attack).

The Iframe’s content is shown in Listing 3.9. It only consists of an input area and JavaScript code which shows an alert window on the condition that the attacker-defined content is dropped. Thus, the alert window only appears if the proof-of-concept functions as expected. In a real-world application, there could be a in the background which automatically looks up the dropped user input by pulling XMLHttpRequest leading to a code injection, and thus to Cross-Site Scripting.

Listing 3.9: HTML and JavaScript code of the ED (scenario: drag-and-drop attack).

Multiple Pop-Up Attack

As shown in Table 3.1, a pop-up window can be generated with a trustworthy event like a click within a delay which is shorter than one second. For FF and SA, we observed that more than one pop-up window will not be blocked once a single pop-up is generated. In contrast, GC, OP, IE, and Edge show one pop-up window and an additional warning window that informs the user about the blocking of the other pop-up windows.

Listing 3.10: HTML and JavaScript code of the ED (scenario: multiple pop-up attack).

An example is given in Listing 3.10. After a click on Spam, the trustworthy event click is triggered and thus the function makePopups() is called. The function includes a for-loop which generates 1,000 windows that could be either pop-ups (this example) or new tabs (by removing the third parameter with width and height). In FF and SA, all of these windows are shown to the user. This behavior leads to heavy memory consumption and thus considerably slows down the underlying system. It is likely that a victim will close all browser windows simultaneously and, for this reason, may also lose existing browser sessions (e.g., in other tabs). Another use case is click- by creating multiple pop-ups with advertisements; an attempt to close these unwanted windows could lead to an unintended click and thus a successfully clicked advertisement.

The behavior of FF is unexpected given the browser settings that are reachable via about:config. Firstly, the property dom.popup_maximum (maximum number of pop-up windows) has a default value of 20. We are clearly able to generate more windows with trustworthy events. Secondly, the property dom.popup_allowed_events (events that spawn pop-ups) has the value change click dblclick mouseup notificationclick reset submit touchend. As shown in Table 3.1, we could also use other events like a left-click triggered select (not listed within dom.popup_allowed_events). Therefore, pop-up windows are not handled properly. We have reported these problems to Mozilla.

Hijacking Clipboard Data

In contrast to browsers like FF, GC, and even Edge, IE allows full access to the clipboard after a confirmation on a warning window (cf. Table 3.2). Clickjacking can be used to attack an IE user and thus to get access to the saved clipboard data that may contain sensitive data like a password. We introduce two new attack sub-variants to steal clipboard data. Firstly, by stealing the second click from a double-click scenario which was described by Huang et al. [74]. Secondly, by just using a single click; this highlights the importance of looking at different trustworthy events.

The first variant is displayed in Listing 3.11. With the help of social engineering, the attacker lures a user to make a double click on the displayed button. The first click of the double click triggers the onclick event-handler, which shows the accessed clipboard data in an alert window (as a proof-of-concept). For the clickjacking attack, the second click of the double-click actually occurs on the Allow access button of the confirmation window. To ensure that a user always hits the Allow access button, the Double Click button will always be positioned in the middle of the screen (with slight adjustments).

The second variant targets an impatient user. The Double Click button is named DL in X where X is a countdown in seconds until zero. An impatient user will wait until the button’s counter reaches zero to download a file, and thus the click will be timed correctly. The attacker will therefore show the confirmation dialog 300 ms before the button’s counter reaches zero, such that the click will be hijacked successfully.

The limitation of both attack variants is that the confirmation window must be visible for at least 300 milliseconds; this is the lower bound we have measured. The Human Benchmark Project¹ recorded over 51 million clicks and measured that the average reaction time of a human is 282 milliseconds (while the user was aware of being timed). Therefore, it is very likely that a user is not able to cancel the hijacked click on the confirmation window.

Listing 3.11: HTML and JavaScript code of the ED (scenario: clipboard attack).

3.1.7. Defenses Discussion

We have shown that trustworthy events are implemented differently across browsers. Our formal definition of trustworthy events and the descriptions of three different scenarios derived from it might help browser vendors to minimize the high number of event handling differences. One approach to help browser vendors avoid bugs and features that may lead to security vulnerabilities is to compare the result of their browser with the result of the majority of other modern browsers. For example, it may be suspicious if just one out of seven tested browsers allows access (or a particular interaction) after a trustworthy event; for clarification, it should be mentioned that the set of browsers could be extended (e.g., by considering more browsers like Brave and ).
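The majority comparison described above can be sketched as a small helper. The function name findOutliers and the shape of its input are assumptions made for illustration; the example data encodes the right-click behavior reported in Section 3.1.5, where only IE shows a pop-up.

```javascript
// Sketch: flag browsers whose result deviates from the majority.
// results maps a browser label to true (allowed) or false (blocked).
function findOutliers(results) {
  const entries = Object.entries(results);
  const allowed = entries.filter(([, ok]) => ok).length;
  const blocked = entries.length - allowed;
  if (allowed === blocked) return []; // no majority, nothing to flag
  const majority = allowed > blocked;
  return entries.filter(([, ok]) => ok !== majority).map(([name]) => name);
}

// Right-click (contextmenu) pop-up behavior: only IE 11 shows the pop-up.
const contextmenu = { "IE 11": true, "FF 47": false, "GC 54": false, "OP 41": false };
console.log(findOutliers(contextmenu));
```

A flagged browser is not necessarily vulnerable; the sketch only marks behavior that deserves a closer look.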

Drag-and-Drop Attack. Drag-and-drop actions have been known since the introduction of Web browsers, which still allow restricted dragging of, for example, text elements (selected text), images (image URL), and anchor elements (anchor URL). Moreover, HTML5 has introduced a drag-and-drop API [170] that is nowadays integrated in all modern Web browsers. We constructed a drag-and-drop attack variant that can be executed in three tested browsers (IE 11, Edge 20, and FF 47). A simple but effective countermeasure is to prohibit drag-and-drop frame attacks by disallowing drag operations with data across frames from different origins. Browser vendors like Google and Opera allowed cross-frame drag-and-drop operations in the past; nowadays, this is no longer possible for security reasons (cf. Section 3.1.6).

¹ http://www.humanbenchmark.com/tests/reactiontime/statistics

Pop-Up Attack. FF is the only tested browser which allows creating hundreds of pop-ups after a trustworthy event like a left-click within the measured delay of one second. All other tested browsers disallow the execution of multiple pop-ups, and therefore the user will not be annoyed when, for example, they appear unintentionally. The majority of our tested browser behavior results can therefore be used to derive a countermeasure for FF; this browser should only show one pop-up window after a trustworthy event.
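The proposed countermeasure, one pop-up per trustworthy event, can be modeled as a simple budget. This is a sketch under stated assumptions and not browser code: the class name PopupBudget and its two methods are hypothetical.

```javascript
// Sketch of the countermeasure: at most one pop-up per trustworthy event.
class PopupBudget {
  constructor() {
    this.remaining = 0;
  }

  // Called when the user triggers a trustworthy event (e.g., a left-click).
  onTrustworthyEvent() {
    this.remaining = 1;
  }

  // Called for every attempt to open a pop-up; returns true if it may be shown.
  requestPopup() {
    if (this.remaining === 0) return false;
    this.remaining -= 1;
    return true;
  }
}

const budget = new PopupBudget();
budget.onTrustworthyEvent();
console.log(budget.requestPopup());
console.log(budget.requestPopup());
```

With such a budget, a loop that calls window.open() many times after a single click would produce one window, and every further call would be blocked until the next trustworthy event.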

Clipboard Data. Our clipboard data attack variant on IE showed that unlimited control over the whole clipboard data should not be granted by just executing JavaScript code. For this reason, there are different access types (copy, cut, paste) that are implemented in modern browsers following the W3C clipboard API [149]. However, the behavior of IE underlined that read access should only be allowed with a trustworthy event like a keystroke combination (e.g., Ctrl+V). The countermeasure of disallowing clipboard read access is very strict, and it might be more convenient to grant read access only if the user explicitly gives the permission via a clipboard permission window that is shown for a time significantly longer than the human response time; this should be longer than the short display time of the IE permission window (cf. Section 3.1.6). According to the Human Benchmark Project, only a negligible amount of the measurements (<0.1%) have a human response time longer than 500 milliseconds. As a consequence, a browser implementation should only activate the Allow access button of the permission window after a trustworthy event and a delay of at least half a second. This ensures with a high probability that the second click will not be hijacked by an attacker.
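The delayed activation of the Allow access button can be expressed as a timing check. The following sketch assumes millisecond timestamps (for example from performance.now()); the names acceptAllowClick and MIN_ACTIVATION_DELAY_MS are illustrative and do not describe an existing browser API.

```javascript
// Sketch: accept a click on "Allow access" only after a minimum display time.
const MIN_ACTIVATION_DELAY_MS = 500; // "at least half a second"

// dialogShownAt and clickAt are timestamps in milliseconds.
function acceptAllowClick(dialogShownAt, clickAt, minDelay = MIN_ACTIVATION_DELAY_MS) {
  return clickAt - dialogShownAt >= minDelay;
}

console.log(acceptAllowClick(0, 300)); // click 300 ms after the dialog appeared
console.log(acceptAllowClick(0, 800)); // click 800 ms after the dialog appeared
```

A click that arrives 300 ms after the dialog appeared, as in the attack variants above, would be ignored, while a deliberate click after the delay would be accepted.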

3.1.8. Related Work

Definitions & Specifications. Huang et al. [74] discussed UIR attacks and defenses with a definition of UIR. They developed a defense called InContext to mitigate UIR attacks. The W3C created a UI safety specification [109] that is based on the ideas of InContext. Similar UI contexts are mentioned in the W3C UI security and visibility API [88]. These foundations of describing trusted events do not consider conscious user actions, which we define as trustworthy events. Without these events, UIR attacks could not be executed. By looking at the concept of zones and scenarios, IE includes predefined zones like Internet, Local Intranet, and Trusted Sites [115]. This concept is partially adopted across browsers by explicitly white-listing trusted sites [68]. Trusted site lists can be used to manage whether certain actions should be automatically executed (e.g., generate cryptographic keys, play Flash files, and show pop-ups).

Attacks & Countermeasures. Grossman et al. [61] introduced Clickjacking as an attack, which is nowadays considered a class of attacks within the broader set of UIR attacks. Although the attack on Flash received high media attention and several bug fixes since 2008 [17], Flash was successfully attacked again years later (e.g., in 2011 [16]). Next to JavaScript-based frame busters [140], the HTTP header X-Frame-Options [95, 37] and nowadays even the Content-Security-Policy [148] can be used to defend against many types of UI redressing. In an evaluation of different JavaScript-based UIR protection mechanisms, Rydstedt et al. [140] pointed out that there exist attacks which can be used to attack protection mechanisms and thus disable them. Balduzzi et al. [24] designed and implemented an automated system to analyze Clickjacking attacks. Niemietz et al. [129] evaluated the security of home routers and found out that none of them are protected against UIR. Rydstedt et al. [139] published a paper about UIR on mobile sites and also on home routers.

Lekies et al. [14] presented bypasses for Clickjacking defense tools like NoScript’s ClearClick. Furthermore, they introduced a new attack technique called nested Clickjacking. By showing that UI time delays as defense mechanisms are not sufficient to protect the user, Akhawe et al. [19] created examples which bypass the W3C UI safety specification [109].

Mobile Devices. Lin et al. [99] published Screenmilker, which analyzes the user interface of an Android device. By using the Android Debug Bridge (ADB), they showed that Screenmilker is able to take screenshots during user interactions, and they were able to steal secrets like passwords. Bianchi et al. [32] published a study on Android-based graphical user interface confusion attacks [128]. These attacks concentrate on phishing and privacy violations. Niemietz et al. enumerated different UIR attacks [125] and their countermeasures. Furthermore, they provided a Tapjacking attack to compromise Android devices [128]. Based on this work, Fratantonio et al. [54] created malicious apps that completely control the UI feedback loop. They furthermore showed with a user study that none of the created attacks could be detected by a user.

3.1.9. Conclusions

In this paper, we provide a definition of trustworthy events, which are the target of UI Redressing attacks. We show that this concept is significantly different from the concept of trusted events as defined by the W3C. Interpretations of events as being trustworthy differ significantly between browser families, and by a non-documented inheritance mechanism trustworthiness may be transferred, within the time frame of one second, from a trustworthy event to a sequence of events triggered by JavaScript. This, for example, allowed us to circumvent the FF pop-up blocker.

We investigated three scenarios where trustworthy events play a major role in protecting the security of Web applications: pop-ups, drag-and-drop, and copy-and-paste. In all three scenarios, differences in the interpretation of trustworthy events could be shown. We refined one new example attack variant in each scenario, based on a more detailed investigation of these scenarios. Finally, we discuss defense mechanisms by analyzing the causes of our trustworthy event attacks.

With the definition and description of trustworthy events, we hope that this paper will contribute to a better understanding of UIR attacks, and thus improved Web application security.

3.2. Same-Origin Policy: Evaluation in Modern Browsers

In the previous Section 3.1, we showed that a user’s event can be used to perform highly privileged actions like creating pop-up windows. Furthermore, JavaScript-based access can be strongly limited in case of cross-origin access. However, trusted events can trigger cross-origin actions like drag-and-drop and thus legally bypass origin restrictions.

A question that is still open is, inter alia, whether JavaScript-based cross-origin access is disallowed only in the case of the tested iframe element. This chapter evaluates how the SOP works for the HTML context, including an evaluation of other elements, and it creates a formal definition to get a better understanding of this security policy.

Remark. The content of this Chapter 3.2 was created in a joint work with Christian Mainka and Jörg Schwenk. All of us created the formal model. My focus was the methodology and the SOP-DOM evaluation. This work was published at USENIX Security 2017 [142].

Abstract. The term Same-Origin Policy (SOP) is used to denote a complex set of rules which governs the interaction of different Web Origins within a Web application. A subset of these SOP rules controls the interaction between the host document and an embedded document, and this subset is the target of our research (SOP-DOM). In contrast to other important concepts like Web Origins (RFC 6454) or the Document Object Model (DOM), there is no formal specification of the SOP-DOM. In an empirical study, we ran 544 different test cases on each of the 10 major Web browsers. We show that in addition to Web Origins, access rights granted by SOP-DOM depend on at least three attributes: the type of the embedding element (EE), the sandbox, and CORS attributes. We also show that due to the lack of a formal specification, different browser behaviors could be detected in approximately 23% of our test cases. The issues discovered in Internet Explorer and Edge are also acknowledged by Microsoft (MSRC Case 32703). We discuss our findings in terms of read, write, and execute rights in different access control models.

3.2.1. Introduction

The Same-Origin Policy (SOP) is perhaps the most important security mechanism for protecting Web applications, and receives high attention from developers and browser vendors.

Complex Set of SOP Rules. Today there is no formal definition of the SOP itself. Web Origins as described in RFC 6454 are the basis for the SOP, but they do not formally define the SOP. Documentation provided by standardization bodies [160] or browser vendors [119] is still incomplete. Our evaluation of related work has shown that the SOP does not have a consistent description, both in the academic and non-academic world (e.g., [89, 131, 98]). Therefore, recurrent browser bugs enabling SOP bypasses are not surprising.

SOP Subset | Description | Related Work
DOM access (this work) | This subset describes if JavaScript code loaded into one “execution context” may access Web objects in another “execution context”. This includes modifications of the standard behavior by changing the Web Origin, for example, using document.domain. | [160], [119], [138], [159], [98], [175]
Local storage and session storage | This subset defines which locally stored Web object ([name,value] pairs) may be accessed from a JavaScript execution context. | [145], [178]
XHR | This subset imposes restrictions on cross-origin HTTP network access. It contains many ad-hoc rules and its main concepts have been standardized in CORS. | [161], [145], [178], [20]
Pseudo-protocols | Browsers may use Pseudo-protocols like about:, javascript: and data: to denote locally generated content. A complex set of rules applies for the definition of Web Origins here. | [178], [20]
Plugins | Many plugins like Java, Flash, Silverlight, PDF come with their own variants of a SOP. | [86], [178]
Window/Tab | Cross-window communication functions and properties: window.opener, open() and showModalDialog(). | [178], [20]
HTTP Cookies | This subset, with an extension of the Web Origin concept (path), defines to which URLs HTTP cookies will be sent. This defines their accessibility in the DOM for non-httpOnly cookies. | [180], [35], [110]

Table 3.4.: Different subsets of SOP rules.

SOP rules can roughly be classified according to the problem areas which they were designed to solve (cf. Table 3.4). It is impossible to cover all these subsets in a single research paper, and it may even be impossible to find a “unifying formula” which covers all subsets.² However, it is possible to cover single subsets, as previous work on HTTP cookies has shown [180]. Thus, we restricted our attention to the following research questions:

• How is SOP for DOM access (SOP-DOM) implemented in modern browsers?

• Which parts of the HTML markup influence SOP-DOM?

• How does the detected behavior match known access control policies?

More precisely, we concentrate on a subset of SOP rules according to the following criteria:

• Web Origins. We use RFC 6454 as a foundation.

• Browser Interactions. We concentrate on the interaction of Web objects once they have been loaded.

It is a difficult task to select a test set for SOP-DOM, which has constantly evolved over nearly two decades. The SOP-DOM has been adapted several times to include new features (e.g., CORS) and to prevent new attacks. 15 out of 142 HTML elements have a URL attribute and may thus have a different Web Origin [162]. Additionally, sandbox and CORS attributes also modify SOP-DOM.
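Since RFC 6454 serves as the foundation, it may help to recall how two Web Origins are compared. The following sketch treats an origin as a (scheme, host, port) triple and uses the standard URL class; the helper names originOf and isSameOrigin and the example URLs are assumptions for illustration, and special cases such as opaque origins or document.domain are deliberately left out.

```javascript
// Sketch: compare Web Origins as (scheme, host, port) triples (RFC 6454).
const DEFAULT_PORTS = { "http:": "80", "https:": "443" };

function originOf(urlString) {
  const url = new URL(urlString);
  // URL.port is the empty string when the default port of the scheme is used.
  const port = url.port || DEFAULT_PORTS[url.protocol] || "";
  return { scheme: url.protocol.replace(":", ""), host: url.hostname, port };
}

function isSameOrigin(a, b) {
  const oa = originOf(a);
  const ob = originOf(b);
  return oa.scheme === ob.scheme && oa.host === ob.host && oa.port === ob.port;
}

console.log(isSameOrigin("https://example.com/a", "https://example.com:443/b"));
console.log(isSameOrigin("http://example.com/", "https://example.com/"));
```

The comparison only decides whether two documents share an origin; the access rights that SOP-DOM grants additionally depend on the embedding element and on sandbox and CORS attributes.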

The Need for Testing. Amongst Web security researchers, SOP-DOM is partially common knowledge, but not thoroughly documented. Although this means that most researchers are familiar with many edge cases in SOP-DOM, especially those relating to attacks and countermeasures, it is likely that some of those edge cases will not be covered in this paper. Additionally, each individual researcher will be unaware of other edge cases, which may include novel vulnerabilities. For example, it is well known that JavaScript code from a different Web origin has full read and write access to the host document; nevertheless, recently Lekies et al. [98] pointed out that there is also read access from the host document to JavaScript code, which may constitute a privacy problem.

Additionally, HTML5 has brought greater diversity to seemingly well-known HTML elements. For instance, the term “authority” used in RFC 6454 [26] may not be sufficient anymore if we compare the power of SVG images [114] with the following quote from RFC 6454: “an image is passive content and, therefore, carries no authority, meaning the image has no access to the objects and resources available to its origin”. Our evaluation shows that this statement is true for all image types if they are embedded via . This statement does not hold if SVG images are embedded via

Listing 4.1: Classic clickjacking on Bing.com using social engineering techniques.

First, the victim has to visit attackers.org. This can be achieved by sending a phishing mail or a link via a social network like Facebook. Second, the victim has to click on an element defined by the attacker. As displayed in Listing 4.1, the attacker can use smileys for social engineering purposes so that the victim is motivated to click on the Click me button. The smileys are intended to give the impression that clicking on the button leads to desirable content. However, if the victim clicks on the button, he will actually click on the logout link inside the bing.com website loaded in the iframe. The iframe element carries CSS properties that, on the one hand, move the element's position over the button and, on the other hand, make it invisible. Elements can be made invisible with the CSS property opacity and, inter alia, filter:alpha. Table 4.1 shows CSS properties that can be used to provide transparent elements in older browser versions like Internet Explorer 6.⁸ Nowadays, opacity is supported by all modern browsers; the value can be a number from 0 (fully transparent) to 1 (fully opaque).
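The structure of such an attacker page can be sketched as follows. This is not a reproduction of Listing 4.1; it is a minimal illustration that assumes a hypothetical helper build_clickjacking_page, a placeholder target URL, and made-up pixel offsets and frame dimensions. The helper emits a visible decoy button and stacks a fully transparent iframe on top of it, shifted so that the targeted link would lie over the button.

```python
from html import escape


def build_clickjacking_page(target_url, top_px, left_px):
    """Return HTML with a decoy button and an invisible iframe stacked on top."""
    iframe_style = (
        "position: absolute; "
        # Offsets shift the framed page so the target link covers the button.
        f"top: {top_px}px; left: {left_px}px; "
        "width: 800px; height: 600px; "
        # Fully transparent, but the frame still receives the click.
        "opacity: 0; "
        "z-index: 2;"
    )
    button_style = "position: absolute; top: 100px; left: 100px; z-index: 1;"
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<body>\n"
        f'<button style="{button_style}">:-) Click me :-)</button>\n'
        f'<iframe src="{escape(target_url, quote=True)}" '
        f'style="{iframe_style}"></iframe>\n'
        "</body>\n</html>\n"
    )
```

The offsets have to be determined per target page. A target that forbids framing, for example via the X-Frame-Options header, would not be rendered inside the iframe at all.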

Property | Browser
-moz-opacity: 0.50; | Firefox <0.9
-khtml-opacity: 0.50; | Safari <1.2
filter:alpha(opacity=50); | Internet Explorer 5, 6, 7
filter:progid:DXImageTransform.Microsoft.Alpha(opacity=50); | Internet Explorer 8
opacity: 0.50; | Other browsers

Table 4.1.: Different CSS properties to define the transparency of HTML elements.
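The rows of Table 4.1 differ only in their property name and value scale: the opacity variants take a fraction, while the Internet Explorer filters take a percentage. The following sketch derives all five declarations from a single opacity value; the function name transparency_declarations is an assumption made for this example.

```python
def transparency_declarations(opacity):
    """Return the CSS declarations of Table 4.1 for an opacity in [0, 1]."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    # The Internet Explorer filters expect a percentage instead of a fraction.
    percent = round(opacity * 100)
    return [
        f"-moz-opacity: {opacity:.2f};",    # Firefox <0.9
        f"-khtml-opacity: {opacity:.2f};",  # Safari <1.2
        f"filter:alpha(opacity={percent});",  # Internet Explorer 5, 6, 7
        f"filter:progid:DXImageTransform.Microsoft.Alpha(opacity={percent});",  # IE 8
        f"opacity: {opacity:.2f};",         # other browsers
    ]
```

Calling transparency_declarations(0.5) reproduces the values shown in the table, and a value of 0 gives the fully transparent variant used for clickjacking.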

4.1.2. Updated Element Positions

One challenge with clickjacking is to determine the exact position of the target element on an attacked website. Furthermore, the positioning can become more complex if the page is regularly updated with dynamic content. A possible solution for this positioning challenge is to make use of HTML anchors, which can be set with the a element (e.g., Target) [7]. If an HTML anchor is available, an attacker can reference it via a hash key (fragment identifier) in the attacked website's URL.

⁸ opacity, Mozilla Developer Network, https://developer.mozilla.org/en/docs/Web/CSS/opacity, Sep. 2014
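Referencing an anchor amounts to setting the fragment part of the target URL, which the attacker would then use as the source of the iframe. A minimal sketch, assuming a hypothetical helper url_with_anchor and a made-up anchor name menu:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_with_anchor(url, anchor):
    """Return the URL with its fragment set to the given anchor name."""
    parts = urlsplit(url)
    # Percent-encode the anchor so that it stays a valid fragment.
    return urlunsplit(parts._replace(fragment=quote(anchor, safe="")))
```

When the framed document loads, the browser scrolls to the referenced anchor, so the element of interest ends up at a predictable position inside the frame even if content above it changes.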

Simple Scenario. To provide a concrete example [150], we can jump directly to the menu of the website http://www.nds.rub.de. To do so, we analyze its source code to find the anchor that has to be appended to the URL in the address bar of the browser; a truncated version of the Web page's HTML code is given in Listing 4.2.
