Flywheel: Google's Data Compression Proxy for the Mobile

Total Page:16

File Type:pdf, Size:1020Kb

Flywheel: Google's Data Compression Proxy for the Mobile Flywheel: Google’s Data Compression Proxy for the Mobile Web Victor Agababov∗ Michael Buettner Victor Chudnovsky Mark Cogan Ben Greenstein Shane McDaniel Michael Piatek Colin Scott† Matt Welsh Bolian Yin Google, Inc. †UC Berkeley Abstract Although the number of sites that are tuned for mo- bile devices is growing, there is still a huge opportunity Mobile devices are increasingly the dominant Internet to save users money by compressing web content via a access technology. Nevertheless, high costs, data caps, proxy. This paper describes Flywheel, a proxy service and throttling are a source of widespread frustration, and integrated into Chrome for Android and iOS that com- a significant barrier to adoption in emerging markets. presses proxied web content by 58% on average (50% This paper presents Flywheel, an HTTP proxy service median across users). While proxy optimization is an that extends the life of mobile data plans by compress- old idea [15, 22, 29, 39, 41] and the optimizations we ap- ing responses in-flight between origin servers and client ply are known, we have gained insights by studying a browsers. Flywheel is integrated with the Chrome web modern workload for a service deployed at scale. We browser and reduces the size of proxied web pages by describe Flywheel from an operational and design per- 50% for a median user. We report measurement results spective, backed by usage data gained from several years from millions of users as well as experience gained dur- of deployment and millions of active users. ing three years of operating and evolving the production service at Google. Flywheel’s data reduction benefits rely on coopera- tion between the browser and server infrastructure at 1 Introduction Google. For example, Chrome has built-in support for the SPDY [11] protocol and the WebP [12] image com- This paper describes our experience building and running pression format. Both improve efficiency, yet are rarely a mobile web proxy for millions of users supporting bil- used by website operators because they require cumber- lions of requests per day. In the process of developing some, browser-specific configuration. Rather than wait- and deploying this system, we gained a deep understand- ing for all sites to adopt best practices, Flywheel applies ing of modern mobile web traffic, the challenges of de- optimizations automatically and universally, transcoding livering good performance, and a range of policy issues content on-the-fly at Google servers as it is served. that informed our design. We focus on mobile devices because they are fast be- This paper makes two key contributions. First, our ex- coming the dominant mode of Internet access. Trends are perience with Flywheel has given us a deep understand- clear: in many markets around the world, mobile traffic ing of the performance issues with proxying the mod- volume already exceeds desktop [23], and double-digit ern mobile web. Although proxy optimization delivers growth rates are typical [32]. clear-cut benefits for data reduction, its impact on la- tency is mixed. Measurements of Flywheel’s overall per- Despite these trends, web content is still predomi- formance demonstrate the expected result: compression nantly designed for desktop browsers, and as such is in- improves latency. In practice however we find that Fly- efficient for mobile users. This situation is made worse wheel’s impact on latency varies significantly depending by the high cost of mobile data. In developed markets, on the user population and metric of interest. For exam- data usage caps are a persistent nuisance, requiring users ple, we find that Flywheel decreases load time of large to track and manage consumption to avoid throttling or pages but increases load time for small pages. overage fees. In emerging markets, data access is of- ten priced per-byte at prohibitive cost, consuming up to Our second contribution is a detailed account of the 25% of a user’s total income [18]. In the face of these many design tradeoffs and measurement findings that we costs, supporting the continued growth of the mobile In- encountered in the process of developing and deploy- ternet and mobile web browsing in particular is our pri- ing Flywheel. While the idea of an optimizing proxy mary motivation. is conceptually simple, our design has evolved contin- uously in response to deployment experience. For ex- ∗Authors are listed in alphabetical order. ample, we find that middleboxes within mobile carri- ers are widespread, and often modify HTTP headers in ways that break na¨ıve proxied connections. While use of HTTPS and SPDY would prevent tampering, always en- crypting traffic to the proxy is at odds with features such as parental controls enforced by mobile carriers. Perhaps Google Datacenter HTTP Site unsurprisingly, addressing these tussles consumes signif- Figure 1: Flywheel sits between devices and origins, au- icant engineering effort. We report on the incidence and tomatically optimizing HTTP page loads. variety of these tussles, and map them to a clearer picture that most content providers still do not employ these ser- of mobile web operation with the hope that future sys- vices. tem designs will be informed by the practical concerns More recent optimizations such as WebP [12] and we have encountered. As far as we know, we are the first SPDY [11] have been available for years yet have very to publish a discussion of these tradeoffs. low adoption rates. We find that 0.8% of images on the 2 Background web are encoded in the WebP format, and only 0.9% of sites support SPDY [49]. We built Flywheel in response to the practical stumbling Users should not have to wait for sites to catch up to blocks of today’s mobile web. Ideally, Flywheel would best practices. Modern browsers such as Chrome are up- be unnecessary. Mobile data would be cheap, and con- dated as often as every six weeks, providing a constant tent providers would be quick to adopt new technologies. stream of new opportunities for optimization that are dif- Neither is true today. ficult for web developers to track. Moreover, as mobile Mobile Internet usage is large and growing rapidly. devices proliferate, the complexity of optimizing sites The massive growth of mobile Internet traffic has cre- to conform to the latest platforms (e.g. high-resolution ated a tremendous opportunity for automatic optimiza- tablets requiring higher image quality) is a daunting task tion. In Asia and Africa, 38% of web page views are for all but the most committed site owners. performed on mobile devices as of May 2014, a year- In sum, it is not surprising that most site owners do over-year increase exceeding 10% [36]. In North Amer- not take advantage of all browser- and device-specific ica, mobile page loads are 19% of total traffic volume optimizations. Just as we do not expect programmers to with 8% growth yearly. In February 2014, research firm manually unroll loops, we should not expect site owners comScore reported that time spent using the Internet on to remember to apply an ever-expanding set of optimiza- mobile devices exceeded desktop PCs for the first time tions to their sites. We need an optimizing compiler for in the United States [23]. These trends match our expe- the web—a service that automatically applies optimiza- rience at Google. Mobile is increasingly dominant. tions appropriate for a given platform. Growth in emerging markets is hampered by cost. Emerging markets are growing faster than developed 3 Design & Implementation markets. Year-over-year growth in mobile subscriptions This section describes Flywheel’s design and implemen- is 26% in developing countries compared to 11.5% in tation. The high-level design (depicted in Figure1) is developed countries. In Africa, growth exceeds 40% an- conceptually simple: Flywheel is an optimizing proxy nually [32]. Despite surging popularity, the high cost of service. Chrome sends HTTP requests to Flywheel mobile access encumbers usage. One survey of 17 coun- servers running in Google datacenters. These proxy tries in sub-Saharan Africa reports that mobile phone servers fetch, optimize, and serve origin responses. An spending was 10-26% of individual income in the lower- example optimization is transcoding a large, lossless 75% income bracket [18]. PNG image into a small, lossy WebP. The remainder of Site operators are slow to adopt new technologies. this section describes our goals, the flow of requests and Adapting websites for mobile involves manual and of- responses, and the optimizations applied in transit. We ten complex optimizations, and most sites are poorly conclude the section with a discussion of fault tolerance. equipped to make even simple changes. For example, 3.1 Goals & Constraints measurements of Flywheel’s workload show that 42% of HTML bytes on the web that would benefit from com- Flywheel’s primary goal is to reduce mobile data usage pression are uncompressed, despite GZip being univer- for web traffic. To achieve this goal we must address the sally supported in modern web browsers [13]. This is in practical requirements of integrating with the Chrome part because GZip is not enabled by default on most web browser, used by hundreds of millions of people. servers, yet only a single-line change to the server con- Opt-in. Recognizing that many users are sensitive to the figuration is needed. While hosting-providers and CDNs privacy issues of proxying web content through Google’s deal with such configuration issues on the behalf of con- servers, we choose to keep the Flywheel proxy off by de- tent providers, the pervasive lack of GZip usage indicates fault. Users must explicitly enable the service. Proxy HTTP URLs only. Flywheel applies only to quests are multiplexed. Flywheel in turn translates these HTTP URLs.
Recommended publications
  • HTTP Cookie - Wikipedia, the Free Encyclopedia 14/05/2014
    HTTP cookie - Wikipedia, the free encyclopedia 14/05/2014 Create account Log in Article Talk Read Edit View history Search HTTP cookie From Wikipedia, the free encyclopedia Navigation A cookie, also known as an HTTP cookie, web cookie, or browser HTTP Main page cookie, is a small piece of data sent from a website and stored in a Persistence · Compression · HTTPS · Contents user's web browser while the user is browsing that website. Every time Request methods Featured content the user loads the website, the browser sends the cookie back to the OPTIONS · GET · HEAD · POST · PUT · Current events server to notify the website of the user's previous activity.[1] Cookies DELETE · TRACE · CONNECT · PATCH · Random article Donate to Wikipedia were designed to be a reliable mechanism for websites to remember Header fields Wikimedia Shop stateful information (such as items in a shopping cart) or to record the Cookie · ETag · Location · HTTP referer · DNT user's browsing activity (including clicking particular buttons, logging in, · X-Forwarded-For · Interaction or recording which pages were visited by the user as far back as months Status codes or years ago). 301 Moved Permanently · 302 Found · Help 303 See Other · 403 Forbidden · About Wikipedia Although cookies cannot carry viruses, and cannot install malware on 404 Not Found · [2] Community portal the host computer, tracking cookies and especially third-party v · t · e · Recent changes tracking cookies are commonly used as ways to compile long-term Contact page records of individuals' browsing histories—a potential privacy concern that prompted European[3] and U.S.
    [Show full text]
  • The Art, Science, and Engineering of Fuzzing: a Survey
    1 The Art, Science, and Engineering of Fuzzing: A Survey Valentin J.M. Manes,` HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective. Index Terms—software security, automated software testing, fuzzing. ✦ 1 INTRODUCTION Figure 1 on p. 5) and an increasing number of fuzzing Ever since its introduction in the early 1990s [152], fuzzing studies appear at major security conferences (e.g. [225], has remained one of the most widely-deployed techniques [52], [37], [176], [83], [239]). In addition, the blogosphere is to discover software security vulnerabilities. At a high level, filled with many success stories of fuzzing, some of which fuzzing refers to a process of repeatedly running a program also contain what we consider to be gems that warrant a with generated inputs that may be syntactically or seman- permanent place in the literature.
    [Show full text]
  • Freedom: Engineering a State-Of-The-Art DOM Fuzzer
    FreeDom: Engineering a State-of-the-Art DOM Fuzzer Wen Xu Soyeon Park Taesoo Kim Georgia Institute of Technology Georgia Institute of Technology Georgia Institute of Technology [email protected] [email protected] [email protected] ABSTRACT ACM Reference Format: The DOM engine of a web browser is a popular attack surface and Wen Xu, Soyeon Park, and Taesoo Kim. 2020. FreeDom: Engineering a State- has been thoroughly fuzzed during its development. A common of-the-Art DOM Fuzzer. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS ’20), November 9–13, 2020, approach adopted by the latest DOM fuzzers is to generate new Virtual Event, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10. inputs based on context-free grammars. However, such a generative 1145/3372297.3423340 approach fails to capture the data dependencies in the inputs of a DOM engine, namely, HTML documents. Meanwhile, it is unclear whether or not coverage-guided mutation, which is well-known to 1 INTRODUCTION be effective in fuzzing numerous software, still remains to beeffec- A DOM (Document Object Model) engine is a core component of tive against DOM engines. Worse yet, existing DOM fuzzers cannot every modern web browser, which is responsible for displaying adopt a coverage-guided approach because they are unable to fully HTML documents in an interactive window on an end-user device. support HTML mutation and suffer from low browser throughput. Considering its giant code base and extraordinary complexity, a To scientifically understand the effectiveness and limitations of DOM engine has always been one of the largest bug sources in a web the two approaches, we propose FreeDom, a full-fledged cluster- browser.
    [Show full text]
  • RSA Adaptive Authentication
    RSA Adaptive Authentication (Hosted) Data Gathering Techniques Guide Contact Information Go to the RSA corporate web site for regional Customer Support telephone and fax numbers: www.rsa.com Trademarks RSA, the RSA Logo and EMC are either registered trademarks or trademarks of EMC Corporation in the United States and/or other countries. All other trademarks used herein are the property of their respective owners. For a list of RSA trademarks, go to www.rsa.com/legal/trademarks_list.pdf. License agreement This software and the associated documentation are proprietary and confidential to EMC, are furnished under license, and may be used and copied only in accordance with the terms of such license and with the inclusion of the copyright notice below. This software and the documentation, and any copies thereof, may not be provided or otherwise made available to any other person. No title to or ownership of the software or documentation or any intellectual property rights thereto is hereby transferred. Any unauthorized use or reproduction of this software and the documentation may be subject to civil and/or criminal liability. This software is subject to change without notice and should not be construed as a commitment by EMC. Note on encryption technologies This product may contain encryption technology. Many countries prohibit or restrict the use, import, or export of encryption technologies, and current use, import, and export regulations should be followed when using, importing or exporting this product. Distribution Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. EMC believes the information in this publication is accurate as of its publication date.
    [Show full text]
  • Appendix a the Ten Commandments for Websites
    Appendix A The Ten Commandments for Websites Welcome to the appendixes! At this stage in your learning, you should have all the basic skills you require to build a high-quality website with insightful consideration given to aspects such as accessibility, search engine optimization, usability, and all the other concepts that web designers and developers think about on a daily basis. Hopefully with all the different elements covered in this book, you now have a solid understanding as to what goes into building a website (much more than code!). The main thing you should take from this book is that you don’t need to be an expert at everything but ensuring that you take the time to notice what’s out there and deciding what will best help your site are among the most important elements of the process. As you leave this book and go on to updating your website over time and perhaps learning new skills, always remember to be brave, take risks (through trial and error), and never feel that things are getting too hard. If you choose to learn skills that were only briefly mentioned in this book, like scripting, or to get involved in using content management systems and web software, go at a pace that you feel comfortable with. With that in mind, let’s go over the 10 most important messages I would personally recommend. After that, I’ll give you some useful resources like important websites for people learning to create for the Internet and handy software. Advice is something many professional designers and developers give out in spades after learning some harsh lessons from what their own bitter experiences.
    [Show full text]
  • Skyfire: Data-Driven Seed Generation for Fuzzing
    Skyfire: Data-Driven Seed Generation for Fuzzing Junjie Wang, Bihuan Chen†, Lei Wei, and Yang Liu Nanyang Technological University, Singapore {wang1043, bhchen, l.wei, yangliu}@ntu.edu.sg †Corresponding Author Abstract—Programs that take highly-structured files as inputs Syntax Semantic normally process inputs in stages: syntax parsing, semantic check- Features Rules ing, and application execution. Deep bugs are often hidden in the <?xml version="1.0" application execution stage, and it is non-trivial to automatically encoding="utf- pass pass pass 8"?><xsl:stylesheet version="1.0" Syntax Semantic Application xmlns:xsl="http://www.w3 .org/1999/XSL/Transform" generate test inputs to trigger them. Mutation-based fuzzing gen- ><xsl:output xsl:use- √ attribute- Parsing Checking Execution erates test inputs by modifying well-formed seed inputs randomly sets=""/></xsl:stylesheet> Parsing Semantic or heuristically. Most inputs are rejected at the early syntax pars- Inputs Crashes ing stage. Differently, generation-based fuzzing generates inputs Errors Violations from a specification (e.g., grammar). They can quickly carry the ! ! X fuzzing beyond the syntax parsing stage. However, most inputs fail to pass the semantic checking (e.g., violating semantic rules), Fig. 1: Stages of Processing Highly-Structured Inputs which restricts their capability of discovering deep bugs. In this paper, we propose a novel data-driven seed generation approach, named Skyfire, which leverages the knowledge in the analysis [8, 9] that identifies those interesting bytes to mutate, vast amount of existing samples to generate well-distributed seed symbolic execution [10, 11, 12] that relies on constraint solving inputs for fuzzing programs that process highly-structured inputs.
    [Show full text]
  • 0789747189.Pdf
    Mark Bell 800 East 96th Street, Indianapolis, Indiana 46240 Build a Website for Free Associate Publisher Copyright © 2011 by Pearson Education Greg Wiegand All rights reserved. No part of this book shall be Acquisitions Editor reproduced, stored in a retrieval system, or transmit- Laura Norman ted by any means, electronic, mechanical, photo- copying, recording, or otherwise, without written Development Editor permission from the publisher. No patent liability is Lora Baughey assumed with respect to the use of the information contained herein. Although every precaution has Managing Editor been taken in the preparation of this book, the Kristy Hart publisher and author assume no responsibility for Senior Project Editor errors or omissions. Nor is any liability assumed for Betsy Harris damages resulting from the use of the information contained herein. Copy Editor ISBN-13: 978-0-7897-4718-1 Karen A. Gill ISBN-10: 0-7897-4718-9 Indexer The Library of Congress Cataloging-in-Publication Erika Millen data is on file. Proofreader Williams Woods Publishing Services Technical Editor Christian Kenyeres Publishing Coordinator Cindy Teeters Book Designer Anne Jones Compositor Nonie Ratcliff Trademarks All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Que Publishing cannot attest to the accuracy of this infor- mation. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark. Warning and Disclaimer Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an “as is” basis.
    [Show full text]
  • European Technology Media & Telecommunications Monitor
    European Technology Media & Telecommunications Monitor Market and Industry Update H1 2013 Piper Jaffray European TMT Team: Eric Sanschagrin Managing Director Head of European TMT [email protected] +44 (0) 207 796 8420 Jessica Harneyford Associate [email protected] +44 (0) 207 796 8416 Peter Shin Analyst [email protected] +44 (0) 207 796 8444 Julie Wright Executive Assistant [email protected] +44 (0) 207 796 8427 TECHNOLOGY, MEDIA & TELECOMMUNICATIONS MONITOR Market and Industry Update Selected Piper Jaffray H1 2013 TMT Transactions TMT Investment Banking Transactions Date: June 2013 $47,500,000 Client: IPtronics A/S Transaction: Mellanox Technologies, Ltd. signed a definitive agreement to acquire IPtronics A/S from Creandum AB, Sunstone Capital A/S and others for $47.5 million in cash. Pursuant to the Has Been Acquired By transaction, IPtronics’ current location in Roskilde, Denmark will serve as Mellanox’s first research and development centre in Europe and IPtronics A/S will operate as a wholly-owned indirect subsidiary of Mellanox Technologies, Ltd. Client Description: Mellanox Technologies Ltd. is a leading supplier of end-to-end InfiniBand and June 2013 Ethernet interconnect solutions and services for servers and storage. PJC Role: Piper Jaffray acted as exclusive financial advisor to IPtronics A/S. Date: May 2013 $46,000,000 Client: inContact, Inc. (NasdaqCM: SAAS) Transaction: inContact closed a $46.0 million follow-on offering of 6,396,389 shares of common stock, priced at $7.15 per share. Client Description: inContact, Inc. provides cloud contact center software solutions. PJC Role: Piper Jaffray acted as bookrunner for the offering.
    [Show full text]
  • Towards Automated Generation of Exploitation Primitives for Web
    Towards Automated Generation of Exploitation Primitives for Web Browsers Behrad Garmany Martin Stoffel Robert Gawlik Ruhr-Universität Bochum Ruhr-Universität Bochum Ruhr-Universität Bochum [email protected] [email protected] [email protected] Philipp Koppe Tim Blazytko Thorsten Holz Ruhr-Universität Bochum Ruhr-Universität Bochum Ruhr-Universität Bochum [email protected] [email protected] [email protected] ABSTRACT ACM Reference Format: The growing dependence on software and the increasing complexity Behrad Garmany, Martin Stoffel, Robert Gawlik, Philipp Koppe, Tim Blazytko, of such systems builds and feeds the attack surface for exploitable and Thorsten Holz. 2018. Towards Automated Generation of Exploitation Primitives for Web Browsers. In 2018 Annual Computer Security Applications vulnerabilities. Security researchers put up a lot of effort to de- Conference (ACSAC ’18), December 3–7, 2018, San Juan, PR, USA. ACM, New velop exploits and analyze existing exploits with the goal of staying York, NY, USA, 13 pages. https://doi.org/10.1145/3274694.3274723 ahead of the state-of-the-art in attacks and defenses. The urge for automated systems that operate at scale, speed and efficiency is 1 INTRODUCTION therefore undeniable. Given their complexity and large user base, Software vulnerabilities pose a severe threat in practice as they web browsers pose an attractive target. Due to various mitigation are the root cause behind many attacks we observe on the In- strategies, the exploitation of a browser vulnerability became a ternet on a daily basis. A few years ago, attackers shifted away time consuming, multi-step task: creating a working exploit even from server-side vulnerabilities to client-side vulnerabilities.
    [Show full text]
  • How Far Can Client-Only Solutions Go for Mobile Browser Speed?
    How Far Can Client-Only Solutions Go for Mobile Browser Speed? Technical Report TR1215-2011, Rice University and Texas Instruments 1Zhen Wang, 2Felix Xiaozhu Lin, 1,2Lin Zhong, and 3Mansoor Chishtie 1Dept. of ECE and 2Dept of CS, Rice University, Houston, TX 77005 3Texas Instruments, Dallas, TX ABSTRACT The technical goal of this work is to answer this question, with the help of an unprecedented dataset of web browsing data conti- Mobile browser is known to be slow because of the bottleneck in nuously collected from 24 iPhone users over one year, or LiveLab resource loading. Client-only solutions to improve resource load- traces [14]. In achieving our goal, we make four contributions. ing are attractive because they are immediately deployable, scala- Firstly, we study browsing behavior of smartphone users and the ble, and secure. We present the first publicly known treatment of web pages visited by them. We find that subresources needed for client-only solutions to understand how much they can improve rendering a web page can be much more predictable than which mobile browser speed without infrastructure support. Leveraging webpage a user will visit because subresources have much higher an unprecedented set of web usage data collected from 24 iPhone revisit rate and a lot of them are shared by webpages from the users continuously over one year, we examine the three funda- same site. mental, orthogonal approaches a client-only solution can take: caching, prefetching, and speculative loading, which is first pro- Secondly, we quantitatively evaluate two popular client-only ap- posed and studied in this work.
    [Show full text]
  • Web Standards Web Standards: Mastering HTML5, CSS3, and XML Gives You a Deep Understand- Ing of How Web Standards Can Be Applied to Improve Your Website
    BOOKS FOR PROFESSIONALS BY PROFESSIONALS® Sikos, Ph.D. RELATED Web Standards Web Standards: Mastering HTML5, CSS3, and XML gives you a deep understand- ing of how web standards can be applied to improve your website. You will also find solutions to some of the most common website problems. You will learn how to create fully standards-compliant websites and provide search engine-optimized Web documents with faster download times, accurate rendering, lower development costs, and easy maintenance. Web Standards: Mastering HTML5, CSS3, and XML describes how you can make the most of web standards, through technology discussions as well as practical sam- ple code. As a web developer, you’ll have seen problems with inconsistent appearance and behavior of the same site in different browsers. Web standards can and should be used to completely eliminate these problems. With Web Standards, you’ll learn how to: • Hand code valid markup, styles, and news feeds • Provide meaningful semantics and machine-readable metadata • Restrict markup to semantics and provide reliable layout • Achieve full standards compliance Web standardization is not a sacrifice! By using this book, we can create and maintain a better, well-formed Web for everyone. CSS3, and XML CSS3, Mastering HTML5, US $49.99 Shelve in Web Development/General User level: Intermediate–Advanced SOURCE CODE ONLINE www.apress.com For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them. Download from Wow! eBook <www.wowebook.com> Contents at a Glance About the Author................................................................................................
    [Show full text]
  • One Engine to Fuzz 'Em All: Generic Language Processor Testing With
    One Engine to Fuzz ’em All: Generic Language Processor Testing with Semantic Validation Yongheng Cheny?, Rui Zhong?, Hong Hu, Hangfan Zhang, Yupeng Yangz, Dinghao Wu and Wenke Leey yGeorgia Institute of Technology, Pennsylvania State University, zUESTC Abstract— Language processors, such as compilers and inter- correct test cases; advanced fuzzers [42, 44] adopt higher- preters, are indispensable in building modern software. Errors level mutation in the abstract syntax tree (AST) or the in language processors can lead to severe consequences, like intermediate representation (IR) to preserve the input structures. incorrect functionalities or even malicious attacks. However, it is not trivial to automatically test language processors to find Alternatively, generation-based fuzzers leverage a precise model bugs. Existing testing methods (or fuzzers) either fail to generate to describe the input structure [1, 2, 51], and thus can produce high-quality (i.e., semantically correct) test cases, or only support syntax-correct test cases from scratch. To further improve the limited programming languages. semantic correctness, recent fuzzers adopt highly specialized In this paper, we propose POLYGLOT, a generic fuzzing analyses for specific languages [53, 74, 78]. framework that generates high-quality test cases for exploring processors of different programming languages. To achieve the However, a fuzzer will lose its generic applicability when generic applicability, POLYGLOT neutralizes the difference in it is highly customized for one specific language. Users syntax and semantics of programming languages with a uniform cannot easily utilize the specialized fuzzer to test a different intermediate representation (IR). To improve the language va- programming language, but have to develop another one lidity, POLYGLOT performs constrained mutation and semantic from scratch.
    [Show full text]