
SoK: XML Parser Vulnerabilities Christopher Späth Christian Mainka Vladislav Mladenov Ruhr-University Bochum Ruhr-University Bochum Ruhr-University Bochum Jörg Schwenk Ruhr-University Bochum Abstract the parser behavior can be influenced. Originally de- The Extensible Markup Language (XML) has become signed to define the structure (grammar) of an XML doc- a widely used data structure for web services, Single- ument, it also enables various attacks, such as Denial-of- Sign On, and various desktop applications. The core Service (DoS), Server Side Request Forgery (SSRF), and of the entire XML processing is the XML parser. At- File System Access (FSA). tacks on XML parsers, such as the Billion Laughs and In 2002, Steuck discovered the powerful XML Ex- the XML External Entity (XXE) Attack are known since ternal Entity (XXE) attack on XML parsers that allows 2002. Nevertheless even experienced companies such as FSA[60]. Leading companies like Google [15], Face- Google, and Facebook were recently affected by such book [59, 53], Apple [8] and others [63,9, 16, 17] have vulnerabilities. been recently affected by this attack. In this paper we systematically analyze known attacks The Open Web Application Security Project (OWASP) on XML parsers and deal with challenges and solutions and other resources [47, 46][71] only partially list vul- of them. Moreover, as a result of our in-depth analysis nerabilities and slightly consider countermeasures. Mor- we found three novel attacks. gan [40] provides till date the most complete compilation We conducted a large-scale analysis of 30 different of available attack vectors. A systematic sampling of 13 XML parsers of six different programming languages. parsers was conducted recently [57], however, with only We created an evaluation framework that applies differ- one prevalent kind of FSA and DoS attack within scope. ent variants of 17 XML parser attacks and executed a Attacks relying on the FTP [41] and netdoc protocol [22], total of 1459 attack vectors to provide a valuable insight as well as several bypasses [74] and novel attacks such as into a parser’s configuration. We found vulnerabilities in schemaEntity or XML Inclusion (XInclude) based SSRF 66 % of the default configuration of all tested parses. In are not addressed in any of these sources. addition, we comprehensively inspected parser features Systematic Parser Analysis. We contribute a compre- to prevent the attacks, show their unexpected side effects, hensive security analysis framework of 30 XML parsers and propose secure configurations. in six popular programming languages: Ruby, .NET, PHP, Java, Python, Perl. We identify each parser’s de- 1 Introduction fault behavior by using 17 core tests. This corresponds to all known attack vectors. Based on them, we introduce The Extensible Markup Language (XML) is a wide a metric enabling the comparison of all parsers regarding spread data structure used in many application ar- the security, by computing a Base Vulnerability Score eas ranging from desktop office tools which use it to (BVS). save their documents (*.docx), to XML-based databases Complex Attack Prevention. Finding countermeasures (MarkLogic, eXist), and web protocol standards (SAML, can be tedious, since the parser’s documentation is out- SOAP). dated and a thorough inspection of the source code is On a technical level, the parser translates an input necessary. We extended our core tests with parser- byte-stream into an XML document that can be accessed specific tests, to investigate the implication of security by APIs in different programming languages. relevant parser features and their interaction with each Security of XML parsing. By adding a Document Type other on the overall security. This results in a total of Definition (DTD) directly on top of the XML document, 1459 tests. Contribution. 1 <!DOCTYPE data [ ? We systematically discuss the so-far largest number 2 <!ENTITY a SYSTEM "file:///C:/data/a.txt"> ]> of state-of-the art XML attacks. 3 <data>&a;</data> ? We develop three novel attack vectors. Listing 3: Example of an External General Entity ? We create an evaluation framework [10] consider- ing all known attacks and apply a comprehensive The file "a.txt" is a plain text file. The parser retrieves the evaluation to 30 parsers finding 66 % vulnerable in file and replaces the reference as before. their default configuration. 1 Arachibutyrophobia ? We propose countermeasures (if possible) against Listing 4: The content of a.txt all attacks and propose a secure configuration for each parser. Internal Parameter Entities can be used to instantly ? We apply our framework to Android and reveal a modify the value of a General Entity. yet undiscovered attack. 1 <!DOCTYPE data [ 2 <!ENTITY % m "majoris"> 3 <!ENTITY a "Arachibutyrophobia %m;"> ]> 4 <data>&a;</data> 2 XML Foundations Listing 5: Example of an Internal Parameter Entity. XML is a human-readable, structured document, which The value of Entity "a" instantly changes to is subject to a set of rules. Documents adhering to these "Arachibutyrophobia majoris". rules are called well-formed. Due to space limitations, External Parameter Entities can be used to include ad- we will only discuss two components of XML here - el- ditional entity declarations, which are stored remotely. ements and DTDs. We release an extended version [11] 1 <!DOCTYPE data [ and our evaluation framework [10] to support further re- 2 <!ENTITY % extDTD SYSTEM "file:///C:/data/majoris.dtd" > search in this field. 3 %extDTD; 4 <!ENTITY a "Arachibutyrophobia %m;"> ]> 2.1 XML Elements 5 <data>&a;</data> Listing 6: Example of an External Parameter Elements structure an XML document as in Listing 1. Entity. 1 <data class="products">4</data> The corresponding DTD "majoris.dtd" is shown in Listing 1: Example of an element and an attribute. Listing 12. 1 <!ENTITY % m "majoris"> This document declares an element data with a text Listing 7: Example of an external DTD. content 4 and an attribute class with a value products. The parser first fetches the External Parameter Entity 2.2 Document Type Definition "extDTD", makes the declaration of the entity m avail- able and finally, replaces this reference. A DTD defines a grammar to reject invalid user input and is the first component declared in an XML document. In 2.3 Other XML Technologies addition, DTDs allow the declaration of storage units, so called entities. XInclude. XInclude facilitates the inclusion of an exter- nal (XML) document into the source document. The fol- Entities. There are four different types of entities: In- lowing example shows how to include a file other.xml ternal General Entities offer a neat way to define a value as a child node of the element data. and reference it arbitrarily often within the document. 1 <data> 2 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" 1 <!DOCTYPE data [ href="other.xml"/> 2 <!ENTITY a "Arachibutyrophobia"> 3 </data> 3 ]> 4 <data>&a;</data> Listing 8: XML document containing a XInclude Listing 2: Example of an Internal General Entity instruction. While processing the document, the parser replaces XSLT. Extensible Stylesheet Language Transformations the reference "&a;" with the term "Arachibutyrophobia". (XSLT) is commonly used to transform XML documents External General Entities facilitate the inclusion of ex- into other documents and formats, for example, into ternal files. JSON or PDF [27][42]. 2 XML Schema. An XML Schema defines gram- 4.2 DoS: Billion Laughs mar using an XML style syntax. The XML Internal General Entities can be abused to create an Ex- Schema standard allows for the inclusion of external ponential entity attack (Billion Laughs Attack)[28]. The Schema files by using the schemaLocation and the attack relies on a nested , but limited, level of entity re- noNamespaceSchemaLocation attributes. cursions. 1 <!DOCTYPE data [ 2 <!ENTITY a1 "dos"> 3 Attacker Capabilities 3 <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;"> 4 ... 5 <!ENTITY a13 "&a12;&a12;&a12;&a12;&a12;"> ]> For all attacks described in this paper, we assume that 6 <data>&a13;</data> the attacker is able to generate XML messages and that Listing 10: Example of the Billion Laughs Attack the XML parser can process these messages. We as- sume a DTD to contain only such grammatical restric- By defining different nesting levels of Internal General tions which do not hinder the attacker. For example, the Entities, a file of only 200 kilobytes is expanded to sev- attacker can create an XML config file and the targeted eral gigabytes (3.5GB) during the parsing process. application parses this file on startup. In case of web ap- Challenge: Thresholds. A number of parsers detect plications, the attacker can control or upload an XML file and counteract this attack by implementing a threshold to that is processed by the business logic. Hence in our eval- limit the total number of allowed entity references within uation, we directly invoke the targeted parser and thus a document. obviously control the parsed message. 4.3 DoS: Quadratic Blowup 4 Denial-of-Service Even if the parser implements such a threshold, there are other ways to execute a DoS attack by using an at- DoS attacks target system resources, such as network, tack variant known as the Quadratic Blowup Attack [64]. storage, memory or CPU processing [29]. An efficient Here, a single entity is created containing a large string way to do this is to let the application process a "prob- (e.g. 10 MB). This entity is referenced multiple times lem", thereby allocating a huge amount of resources. At within the document in order to achieve a similar result the same time the attacker can generate and send the at- as before. Since less entity references are required than tack vector using much fewer resources. As a result, an in Listing 10, the threshold limitation can be bypassed. offered service is unavailable for benign users or at least 4.4 DoS with External General Entities responds significantly slower than normal.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages14 Page
-
File Size-