Crouching Tiger – Hidden Payload: Security Risks of Scalable Vectors

Mario Heiderich, Tilman Frosch, Meiko Jensen
Chair for Network and Data Security, Ruhr-University Bochum, Germany

Thorsten Holz
Chair for System Security, Ruhr-University Bochum, Germany

ABSTRACT

Scalable Vector Graphics (SVG) images have so far played a rather small role on the Internet, mainly due to the lack of proper browser support. Recently, things have changed: the W3C and WHATWG draft specifications for HTML5 require modern web browsers to support SVG images embedded in a multitude of ways. SVG images can now be embedded through the classical method via specific tags such as <embed> or <object>, or in novel ways, such as with <img> tags, CSS, or inline in any HTML5 document. SVG files are generally considered to be plain images or animations, and security-wise they are treated as such (e.g., when local or remote SVG images are embedded into websites or uploaded into rich web applications). Unfortunately, this practice poses great risks for web applications and their users, as it has been shown that SVG files must be considered fully functional, one-file web applications potentially containing HTML, JavaScript, Flash, and other interactive code structures. We found that even more severe problems result from the often improper handling of complex and maliciously prepared SVG files by the browsers.

In this paper, we introduce several novel attack techniques targeted at major websites, as well as modern browsers, email clients, and other comparable tools. In particular, we illustrate that SVG images embedded via the <img> tag and CSS can execute arbitrary JavaScript code. We examine and present how current filtering techniques can be circumvented by using SVG files and subsequently propose an approach to mitigate these risks. The paper showcases our research into the usage of SVG images as attack tools, and determines its impact on state-of-the-art web browsers such as Firefox 4, Internet Explorer 9, and Opera 11.

Categories and Subject Descriptors
K.6.5 [Security and Protection]: Unauthorized access

General Terms
Security

Keywords
Scalable Vector Graphics; Web Security; Browser Security; Cross Site Scripting; Active Image Injections

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CCS'11 October 17–21, 2011, Chicago, Illinois, USA. Copyright 2011 ACM 978-1-4503-0948-6/11/10 ...$10.00.

1. INTRODUCTION

One of the factors behind the huge success of the World Wide Web is its capacity for viewing image files within a web browser. Compared to text-only formats, an image can convey considerably more information. A typical browser supports many different image file formats, such as JPEG, PNG, and GIF, and the vast majority of websites contain at least one graphic in one form or another. Since image files are complex and need to be parsed and rendered before a browser can display them, it comes as no surprise that images have security implications. For example, there were several cases in the past where the validation routines of image libraries contained security flaws leading to vulnerabilities [1,2,4]. For this reason, we need to consider the risk of images as attack vectors.

One image format that has so far received very limited scrutiny and little attention from the web development community is Scalable Vector Graphics (SVG [5]). This family of file formats comprises several specifications and specification drafts for the composition and rendering of vector-based images and graphics. SVG is based on XML and was first published by the W3C in 1999. SVG images have not gained much traction among web developers, as the support provided by major browsers was not consistent and only a small subset of SVG features had been known to work reliably on a sufficiently large base of web browsers. Browsers like Firefox 1.5 already supported a decent subset of SVG features in November 2005, showcasing SVGs such as the famous GhostScript Tiger shown in Figure 1. This particular image is often used to illustrate the ability of vector-based graphics to display complex structures. Other browsers, namely Internet Explorer, did not support SVG unless a user installed an external plug-in.

Figure 1: A classic GhostScript/SVG example

All this has significantly changed with the recent appearance of HTML5: the W3C and WHATWG draft specifications for HTML5 require modern web browsers to support the embedding of SVG images in a multitude of ways [25]. SVG images can now, for example, be placed in a given document either in the classical way via specific tags such as <embed> or <object>, or in novel ways such as with <img> tags or inline in any HTML5 document. Internet Explorer 9 currently supports a large subset of SVG features, as tests with the tech previews and available beta versions show. Furthermore, both Firefox and Webkit-based browsers, such as Chrome and Safari, as well as Opera, provide thorough SVG support.

Hitherto, SVG has mainly been used in the context of screen and print design, as well as in cartography and medical imagery, which can be attributed to the lack of proper browser support [35]. This is expected to change, owing to SVG support being implemented in all modern web browsers, consequently lifting SVG files from being a niche format for W3C-compliant and plug-in-equipped browsers only to a widely used toolkit for enhancing images, diagrams, and rich-text documents across the board. Depending on the rendering client's capabilities, an SVG file can contain interactive and animated elements. Processing events and raster images, embedding videos, and rich text are also feasible. Contrary to popular belief, SVG files should thus not be considered plain images or animations; it is necessary to treat them as fully functional, one-file web applications potentially containing HTML, JavaScript, Flash, and other interactive code structures.

In this paper, we elaborate on the security risks of improper SVG handling. We introduce several novel attack techniques that use SVG images to target modern, real-life web applications (such as MediaWiki installations like Wikipedia, DeviantArt, and other high-profile websites), as well as their unsuspecting users. Specifically, in Section 3 we present an Active Image Injection (AII) attack, in which arbitrary JavaScript code can be delivered via SVG files. Several other attacks, such as SVG-based cross site scripting attacks or SVG chameleons (i.e., files that are interpreted differently depending on how they are opened), are also presented. AII attacks are particularly important, since they are caused by faulty SVG implementations in modern browsers and thus affect all websites allowing users to embed external images. Our research thereby significantly extends and partly falsifies the information available in the Browser Security Handbook, which so far only covers risks connected to browser-deployed SVG in plugin containers. Furthermore, we discuss how current state-of-the-art filtering techniques can be deceived by using SVG files.

The basic idea behind all of our attacks is the fact that SVG files can accommodate active content, and that browsers actually interpret this content because doing so is standard-compliant. This idea is related to similar attacks that take advantage of code embedded in document formats [8,13,32], and we show that SVG images can be turned into an attack vector. In addition, we demonstrate the damage potential of SVG files embedding arbitrary data formats and show how this property can be used to carry out attacks using Adobe Reader, Java Runtime Environment, and Flash Player vulnerabilities. We discuss the impact of our attack on modern browsers such as Firefox 4, Internet Explorer 9, and Opera 11, showing that especially inline SVG grants new possibilities for bypassing website- and browser-based XSS filters.

To mitigate the risks introduced by SVG-based attacks, we discuss and evaluate several defense strategies. We showcase a filtering solution that is capable of removing potentially malicious content from a given SVG file. Our approach does not break the functionality of the core features of fully interactive and descriptive SVG images. Instead, we extend an existing and widespread filtering software to support filtering of illegitimate and malicious content from SVG files, without damaging the benign file structure and contents. We have implemented a prototype of the system and tested it with 105,509 SVG images obtained from Wikipedia. We found that we can filter 98.5% of the files without causing any difference in the visual appearance of the image, and for the remaining 1.5% we determined the visual deviation to be negligible in more than half of the cases.

In summary, we make the following contributions:

• We are the first to demonstrate the security risks tied to SVG files in the context of the World Wide Web and comparable client-server environments. Furthermore, we argue that SVG files must not be perceived as images, but rather as full-stack applications whenever they are rendered by a web browser or similar client software. This holds for SVG images rendered via embedding tags, as well as those displayed as standalone files in modern web browsers.

• We show several innovative attack techniques illustrating the potential of maliciously crafted SVG files, which we call Active Image Injection (AII). We exhibit how SVG files can cause damage to major websites, and discuss the damage potential of SVG files embedding arbitrary data formats. The AII attacks discussed here affect a whole browser family, rendering any website that accepts user-submitted images vulnerable if users visit it with the affected web browsers. Moreover, we show how inline SVG can be used to facilitate XSS filter bypasses on current web browsers.

• We introduce a defense solution to prevent SVG-based attacks by filtering potentially malicious content from a given SVG file. The large-scale evaluation results suggest that this approach can successfully defeat AII attacks on a practical level.

2. SVG BASICS AND SPECIFICS

This section provides a brief overview of the SVG file format and discusses the attack surface enabled by this image format, i.e., we illustrate the different ways available for an attacker to send arbitrary SVG files to a victim.

2.1 Overview and Benefits

The Scalable Vector Graphics (SVG) file format was introduced in 1999, when it was published by the W3C in an attempt to combine the best of two specification drafts: the Precision Graphics Markup Language (PGML), developed and published by Adobe, and the Vector Markup Language (VML), developed and published by Microsoft, Hewlett Packard, and others, both in 1998. SVG is a vector graphics format, i.e., it uses geometrical primitives such as points, lines, and curves to describe an image, and it supports both static and dynamic content. Static content and dynamic behavior are described in an XML-based format, which implies that SVG files are in fact text files.

The impact of SVG on the Internet can be described with the following characteristics:

• Scalability: SVG files are, as the name indicates, scalable based on their nature as vector graphics. This means that graphical output devices of any size can render SVG images without significant information loss or deficiencies in display quality. In times of heterogeneous output devices such as browsers, feed and screen readers, smartphones, and Wireless Markup Language (WML) compatible cellphones, this enables web developers to publish rich online documents without having to worry about the screen dimension of the device requesting the document.

• Openness: Unlike classical, raster-based image formats, SVG files are neither stored in a binary format nor subject to a compression scheme that renders the actual content of the file unreadable for the human eye. SVG images can be enriched with meta-data and comments, so that both a human reader, including users with disabilities, and a program (e.g., a search engine or comparable parser) can effectively extract relevant information from the file in question. This is achieved by storing more descriptive information in a given file than is possible with the rather limited image comments provided by GIF and PNG files, or the embedded Exchangeable Image File Format (EXIF) data in RAW and JPEG images. Note that gzipped SVG files (also known as SVGZ) are an exception, since they use compression to reduce the file size.

• Accessibility: Related to the aforementioned openness, an SVG image can be enriched with sufficient meta-data and information to be Web Accessibility Initiative (WAI) compliant, i.e., visually impaired users can extract relevant information by having their tools parse and read the meta-data embedded in the SVG. Furthermore, screen readers can (theoretically) parse SVG data and describe the shapes and visuals used by the image, allowing a broader range of users to benefit from its contents. In contrast, raster-based images lack this kind of support.

The SVG family consists of several members, and we use the following three file types as examples in later sections:

• SVG Full 1.1: SVG Full describes the full SVG feature set, including 81 different SVG elements and tags. The specification is designed without a special focus on the devices parsing the SVG data.

• SVG Basic 1.1: SVG Basic is supposed to deliver a subset of the SVG Full specification to ease the implementation for developers of browsers for PDAs and handheld devices. SVG Basic only provides 70 of the 81 SVG elements specified in SVG Full 1.1. Contrary to SVG Tiny 1.2, the SVG Basic 1.1 features also include support for SVG fonts.

• SVG Tiny 1.2: SVG Tiny is specifically designed for mobile devices with limited computing, rendering, and display capabilities. The subset of allowed SVG elements and tags has been reduced to 47 elements. SVG Tiny also ships several exclusive possibilities for event binding and external resource loading, which we discuss in Section 3.

Additionally, the SVG specification provides interface descriptions for an SVG Document Object Model (DOM), which implies that SVG files also offer some dynamic capabilities. Users can create SVG files capable of providing event handling, effects, time-based changes and animations, as well as zoom effects and other helpful display enhancements. A large set of filters can be applied to the elements of SVG files to further increase the possibilities for image transformation.

The ability to combine SVG with XML Linking Language (XLink) features allows SVG files to link elements to other elements in the same image file, to other image files, or to arbitrary objects referenced via Uniform Resource Identifiers (URIs). Furthermore, these image files support International Color Consortium (ICC) and Standard Red-Green-Blue (sRGB) color profiles, and they allow the embedding of arbitrary content such as Flash, PDF, Java, and HTML via the <foreignObject> element.

2.2 HTML, SVG, and XML

Because SVG is historically an XML-based language, the processing of SVG documents has been quite different from the way browsers process classic HTML websites. For instance, a slight violation of the XML syntax, such as a missing closing tag or missing attribute value quotation marks, typically causes SVG processors to exit with an error. However, with the integration of SVG capabilities into modern browsers, this strict parsing approach was merged with the browsers' more tolerant way of processing HTML, CSS, JavaScript, and the like.

As a result, browsers process SVG using two different processing modes: an HTML processor engine for CSS and JavaScript contents, and an additional SVG parser supporting XML-specific features like XML transformations (e.g., XSLT), XML entity resolution, and tracking of XML namespace bindings. Depending on how a particular website uses SVG, the browser switches between the two processing modes on the fly. For instance, encountering an inline <svg> tag within an HTML5 document causes the browser to switch from HTML mode to XML/SVG mode. Vice versa, if the browser encounters an HTML-specific tag (e.g., <p>) within an SVG mode context, it automatically closes all open SVG elements, switches to HTML mode, and renders the given tag.

As we show in the following sections, this approach is error-prone and may cause many SVG-related vulnerabilities in most state-of-the-art browser engines.

2.3 Deployment Techniques

The capabilities of SVG files in terms of scripting and content inclusion strongly depend on how they are embedded in a website or loaded by the browser attempting to display them. In this section, we focus on five different ways in which SVG files can be deployed by a webserver. In addition, we outline the attack surface we have discovered in connection with the five deployment techniques. The specific attack vectors we use are discussed in detail in Section 3, followed by Section 4, which focuses on how to mitigate and defend against those attacks.

1. SVG deployed via uploaded files: A large number of tested web applications (e.g., MediaWiki and Wikipedia, OpenStreetMap, DeviantArt, OpenClipArt, and several other free image hosting services) consider SVG files to be equivalent to raster images such as PNG, JPEG, and GIF files in terms of security implications. MediaWiki and Wikipedia claim to block the upload of SVG files containing script code, but we managed to easily bypass this restriction. As we show in the next section, SVG files should be displayed and executed with a heavily limited set of features to

[...]

websites providing inline SVG. This means a developer and an attacker are equally able to inject arbitrary SVG content right into the markup tree of an HTML document. The browser will then switch its parsing mode, use an intermediary layer to parse the (possibly non-well-formed) SVG content, clean it up, pass it on to the internal XML parser and layout engine, and then commence parsing and rendering the remaining optional HTML content. The last step likely includes even more inline SVG elements [17] that are capable of interacting with the already parsed content. In Section 3.4, we illustrate how this facilitates XSS filter bypasses of existing websites, filter libraries, and, most importantly, browsers and comparable user agents.

4. SVG deployed as font file (SVG Fonts): The SVG standard specifies several possibilities to create font files consisting completely of SVG data [3]. Modern browsers allow their inclusion via CSS and the @font-face rule. When the browser supports SVG fonts, for every character with an SVG font assigned, the parser checks whether the character has a representation as SVG path/glyph data and applies this to the viewport if possible. SVG fonts provide a broad range of features for detailed and complex font formatting, Unicode support, alternative glyphs, default behavior for missing glyphs, and more. We detected attack vectors allowing the deployment of arbitrary plug-in content via SVG fonts, working on a variety of desktop and mobile user agents.

5. SVG deployed via iframe, embed, or object tags: The attack surface is comparably large to the one for the classic XSS and does not differ much from the regular