Inoculating Regular Expressions for Survivability
Insecure Context Switching: Inoculating regular expressions for survivability

Will Drewry and Tavis Ormandy
Google, Inc.
{wad,taviso}@google.com

Abstract

For most computer end–users, web browsers and Internet services act as the providers and protectors of their personal information, from bank accounts to personal correspondence. These systems are critical to users’ continued lifestyles but often show no evidence of survivability [43], or robustness against present and future attacks. Software defects, considered the largest risk to survivability [43], are quite prevalent in consumer products and Web service software components [9]. Recent widespread security issues [17] [16] serve to emphasize this fact and show a lack of investment in survivability engineering practices [19] [20] [49] [52] that may have mitigated the risk.

Common software components that comprise industry software, commercial or free, were authored and deployed with functional isolation in mind. Despite original intent, many of these components are migrating into Internet–connected systems. The context switch from functional isolation to extreme connectivity changes the threat environment of these components dramatically [7] [52]. Most software that has undergone this sort of insecure context switch has received very little security attention. This paper briefly surveys recent examples of these sorts of context switches. In particular, we focus on the survivability and inoculation [29] of regular expression engine implementations in connected environments. Through the course of this research, a number of critical vulnerabilities were uncovered that traverse operating systems and applications including Adobe Flash, Apple Safari, Perl, GnuPG, and ICU.

1 Introduction

The existence of software defects is attributed to many causes, ranging from expected human error to poor developer education [40] [41] [98]. In the past decade, the evolution of software use and analysis were added as complicating factors. Survivability engineering [43] focuses on these factors and the application of risk mitigation tactics at all levels, especially technical ones, to aid in fortifying software as it ages [19] [20] [49] [52]. Even though there has been extensive research into survivability engineering, very few of the guidelines [49] [52], or even other known good security practices, appear commonplace in industry software, and the general state of software security may still be as Blakley described nearly twelve years ago [7]. A basic review of software defect occurrence rates, as a contraindicator of survivability, reinforces the bleak outlook on the current environment [9] [69] [68].

Survivability engineering, however, is largely a counterintuitive concept in normal software development [52], especially when it comes to code reuse. Code reuse is encouraged [46] and common, as evidenced by the large number of commercial and open source software libraries and their extensive use across the software industry. Software libraries, or any reusable components, are often designed with specific usage contexts in mind. These uses may be well–defined using any number of formal techniques, such as UML [71], or informal ones, but rarely are the complete intentions effectively communicated to the end–developer, especially as these components age. There is no established standard for describing the expected threat environment of a software library, and even if there were, there is no mechanism to enforce the use of such a standard. While domain–specific repositories aid in context–safe reuse [27], there are not many comprehensive repositories which match functional domains with security domains [60]. This leaves little option for developers except to rely on programming language–supplied features and software components selected by their advertised functionality. This behavior, when not carefully considered, leads to insecure context switching.

This paper contributes a thorough examination of the survivability and inoculation [29] of a prime example of insecure context switching: modern regular expression [42] use. Regular expressions are an idiomatic technique for text search and manipulation with a presence in nearly every modern programming language, from C to Ruby. Their widespread acceptance has led regular expressions to be exposed unquestioningly in potentially hostile, connected environments. This would be unsurprising, and unimportant, if regular expression engine implementation were trivial or if regular expression engines were implemented with connected environments in mind. However, the creation of robust implementations is still an open topic of research [47] [21], and regular expressions were implemented, initially, in functional isolation as a text editor feature [15].

1.1 Background

Survivability of specific software components is not the only measurement of overall system survivability. While a number of serious vulnerabilities are discussed later in the paper, the view should not be overly bleak. Many legacy, and even just poorly implemented, software components exist and are in common use, but there are inoculation, or risk–mitigating, tactics already at play at a lower level.

Most modern operating systems deploy a number of risk–mitigating mechanisms to add to the overall robustness of the system. Many tactics are used explicitly to prevent software defects from becoming fully exploitable vulnerabilities. Some of these tactics are address–space layout randomization [70], non–executable memory pages [61] [97], and capability-enforcement systems [50] [3]. In addition, compilers, like GCC [28], provide the addition of stack canaries [22] and position–independent executable support [39]. The combined efforts of base system libraries, language compilers, and the operating system layers add to the overall survivability. While none of these techniques prevents exploitation entirely, they do increase the complexity of creating reliable exploits.

In addition to these base system extensions, there is work which provides risk–mitigation at the application layer. Systrace [63], presented in 2003, provides a mechanism for intrusion prevention aiding the overall survivability of the system. Unfortunately, system call–based policy enforcement systems, like systrace, are not supplied by default with most operating system distributions. There are also other software fault isolation, or software sandboxing, systems in existence [102], but their deployment is similarly scarce.

The lack of sandbox integration and the unenforced nature of many of the base system mitigation tactics leave single components liable for full system survivability. Dowd’s recent whitepaper [17] describes such a scenario. Flash suffered from an exploitable vulnerability, but as it was not compiled in a fashion compatible with Windows Vista’s ASLR functionality, no randomization occurred, which allowed for reliable exploitation on Vista. This is not an isolated incident; many systems and software do not support or use these features [65] [10], and when they do, they are still not impervious to all known attack techniques, much less future developments in software security research.

1.1.1 Regular Expression History

Since the introduction of regular expressions to the QED text editor [15] and the Unix environment in the 1970s [45], regular expression use, as a means of succinctly expressing acceptable grammars, has increased continuously. They have even garnered an accepted place in many IETF standards. A simple review of IETF standards, or RFCs, shows a continued reliance on regular expressions. This is likely due, in large part, to the release of a public domain regular expression engine by Henry Spencer in 1986 [72] [73].

Beginning around 1990, regular expressions moved from a command–line environment and text editor feature to an expected piece of functionality in most programming languages. Starting with Tcl [75], EMACS/Lisp [48], and Perl [94], this trend [8] continues into modernity with PHP, Python, Ruby, JavaScript, ActionScript, C#, Java, and so on. In spite of the continued inclusion of regular expressions as core functionality, many of the engines derive their code, at least tangentially, from Spencer’s original release [74]. This ancestry leads to heavily patched implementations blind to newer, or even parallel, advances in this area of research [47] [11]. Even for engines that have emancipated themselves from this legacy, regular expression implementation is not a trivial task.

1.1.2 Regular Expression Security

Past research into the security of regular expression engine implementation has been limited to specific features in specific engines or to the overall execution–time robustness issues which affect most full–featured, non–deterministic finite automata–based engines. In a recent article, Cox provides a detailed discussion of the exponential execution–time issues across multiple well–known implementations [11]. Additionally, these issues are well–defined as the risks of regular expression engines [47] [21] [31] and of exponential algorithms in general [12].

Exponential search algorithms aside, the most prominent example of regular expression security research is the vulnerability [78] found in PCRE’s [33] validation of quantifier values. The vulnerability received substantial attention due to its use

or virtual machine, systrace, or other approach, there are