An Empirical Study of Real-World WebAssembly Binaries: Security, Languages, Use Cases

Aaron Hilbig, Daniel Lehmann, Michael Pradel
University of Stuttgart, Germany

ABSTRACT

WebAssembly has emerged as a low-level language for the web and beyond. Despite its popularity in different domains, little is known about WebAssembly binaries that occur in the wild. This paper presents a comprehensive empirical study of 8,461 unique WebAssembly binaries gathered from a wide range of sources, including source code repositories, package managers, and live websites. We study the security properties, source languages, and use cases of the binaries and how they influence the security of the WebAssembly ecosystem. Our findings update some previously held assumptions about real-world WebAssembly and highlight problems that call for future research. For example, we show that vulnerabilities that propagate from insecure source languages potentially affect a wide range of binaries (e.g., two thirds of the binaries are compiled from memory unsafe languages, such as C and C++) and that 21% of all binaries import potentially dangerous APIs from their host environment. We also show that cryptomining, which once accounted for the majority of all WebAssembly code, has been marginalized (less than 1% of all binaries found on the web) and gives way to a diverse set of use cases. Finally, 29% of all binaries on the web are minified, calling for techniques to decompile and reverse engineer WebAssembly. Overall, our results show that WebAssembly has left its infancy and is growing up into a language that powers a diverse ecosystem, with new challenges and opportunities for security researchers and practitioners. Besides these insights, we also share the dataset underlying our study, which is 58 times larger than the largest previously reported benchmark.

CCS CONCEPTS

• Security and privacy → Software and application security.

ACM Reference Format:
Aaron Hilbig, Daniel Lehmann, and Michael Pradel. 2021. An Empirical Study of Real-World WebAssembly Binaries: Security, Languages, Use Cases. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3442381.3450138

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450138

1 INTRODUCTION

WebAssembly is a fast, compact, low-level byte code language originally intended for client-side execution in web browsers. It is widely supported and available in 93% of all global browser installations¹ as of February 2021. Beyond client-side web applications, WebAssembly is also running on Node.js and even stand-alone runtimes.

¹ https://caniuse.com/?search=WebAssembly

Despite its growing popularity, the WebAssembly ecosystem is severely understudied. To date, little is known about how the language is used, for what purposes, and how this affects the security of WebAssembly-based applications. In particular, we are interested in the following research questions:

RQ1: Source languages and tools. WebAssembly is a compilation target, and in principle any programming language can be compiled to it. What languages are actually compiled to WebAssembly, how much do they contribute to the overall population, and what tools are used to produce the binaries? Answering these questions is relevant for understanding the impact of issues that specific source languages may have and for guiding future work toward source languages and toolchains prevalent in practice.

RQ2: Vulnerabilities propagated from insecure source languages. Recent work has shown that memory vulnerabilities in insecure source languages, such as C and C++, may be exploited in WebAssembly binaries, sometimes even more easily than in native binaries [18]. How large is the attack surface offered by real-world WebAssembly binaries compiled from insecure languages, e.g., in terms of dangerous APIs these binaries import from JavaScript or in terms of vulnerable memory allocators they ship? Answering this question will increase our understanding of the threat posed by vulnerabilities compiled to the web.

RQ3: Cryptomining. Previous results show [24], and recent work assumes [15, 25, 34, 43], that WebAssembly is frequently used for cryptojacking, i.e., cryptomining performed in the browser of an unsuspecting client. Is cryptomining still an important threat today?

RQ4: Use cases. As a general purpose language, WebAssembly can serve many purposes in web applications and beyond. What are the typical use cases of WebAssembly? Given that the language is becoming more widely adopted, it is important to understand what its use cases are and how this affects the security of the web.

RQ5: Minification and names. The ability to understand WebAssembly binaries, e.g., for auditing third-party code or for reverse engineering malware, depends on whether binaries contain meaningful names for program elements, e.g., functions. Do real-world WebAssembly binaries contain meaningful names or are they obfuscated through minification?

Answering these and other questions requires a set of WebAssembly binaries that is (i) representative for how WebAssembly is used in the wild and (ii) large enough to cover the diversity of real-world WebAssembly usage. Currently, no such set of binaries exists.

The closest existing work is by Musch et al. [24], who report on a study of WebAssembly usage in the top one million websites. While inspiring, their study falls short in two respects. First, it has been performed at a point in time when WebAssembly was still in its infancy, with usage biased to early adopters, e.g., cryptominers, and a single toolchain dominating the ecosystem. Since then, many changes have happened, including higher browser adoption, alternative compilers that have become available, the shutdown of Coinhive (a common cryptomining platform) [42], and the realization that vulnerabilities in insecure source languages can also be exploited in WebAssembly [18]. Second, the methodology proposed by Musch et al. [24] focuses only on binaries found on the web, and only on those that are executed when just visiting a website. By only looking into client-side web applications, WebAssembly on other platforms is disregarded, e.g., on Node.js, browser extensions, and stand-alone WebAssembly runtime engines.

Figure 1: Example of a function compiled to WebAssembly.

(a) C source code.

    int increment(int x) {
      return x + 1;
    }

(b) WebAssembly text format.

    (func (param i32) (result i32)
      local.get 0
      i32.const 1
      i32.add
    )

(c) WebAssembly binary format.

    [...]   // header with type info
    20 00   // local.get 0
    41 01   // i32.const 1
    6a      // i32.add
    0b      // end

[...]

• [...] of instructions that cover diverse use cases, including visualization, interactive shells for programming languages, media players, game engines, data compression, and natural language processing.
• 28.8% of all binaries on the web are minified, calling for future work on decompiling and reverse engineering WebAssembly, to ensure that security analysts can understand web applications despite the presence of low level components.

Overall, our findings show that WebAssembly is “growing up”, which leads to a larger and much more diverse ecosystem than in its early days. From a security perspective, this diversity has several implications. First, despite the fact that there are now many legitimate applications, and proportionally much [...]
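The correspondence between the text and binary formats in Figure 1 can be checked with a short sketch. The Python code below decodes only the four instruction bytes shown in the figure; the opcode table is deliberately incomplete, the names `read_leb128` and `disassemble` are hypothetical helpers introduced for illustration, and the elided header with type information is not handled. It is not part of the study's tooling.

```python
# Opcodes taken from Figure 1 only; a real disassembler needs the full table.
OPCODES = {
    0x20: ("local.get", "u32"),  # immediate: index of a local (unsigned LEB128)
    0x41: ("i32.const", "s32"),  # immediate: constant (signed LEB128)
    0x6A: ("i32.add", None),     # no immediate
    0x0B: ("end", None),         # no immediate
}


def read_leb128(data, pos, signed):
    """Decode one LEB128 integer at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            break
    # Sign-extend if the sign bit (0x40) of the last byte is set.
    if signed and (byte & 0x40):
        result -= 1 << shift
    return result, pos


def disassemble(body):
    """Hypothetical helper: map instruction bytes to text-format mnemonics."""
    lines = []
    pos = 0
    while pos < len(body):
        opcode = body[pos]
        pos += 1
        if opcode not in OPCODES:
            raise ValueError(f"opcode 0x{opcode:02x} not covered by this sketch")
        name, immediate = OPCODES[opcode]
        if immediate is None:
            lines.append(name)
        else:
            value, pos = read_leb128(body, pos, signed=(immediate == "s32"))
            lines.append(f"{name} {value}")
    return lines


# Instruction bytes from Figure 1 (c), without the elided header.
figure_1_body = bytes.fromhex("20 00 41 01 6a 0b")
print(disassemble(figure_1_body))
# → ['local.get 0', 'i32.const 1', 'i32.add', 'end']
```

The immediates are read as LEB128 integers because that is how the binary format encodes them; for the single-byte values in the figure this coincides with reading the byte directly.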
