Speeding up Web Page Loads with Shandian
Xiao Sophia Wang and Arvind Krishnamurthy, University of Washington; David Wetherall, University of Washington and Google
https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang
In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16), March 16–18, 2016, Santa Clara, CA, USA.

Abstract

Web page loads are slow due to intrinsic inefficiencies in the page load process. Our study shows that the inefficiencies are attributable not only to the contents and structure of the Web pages (e.g., three-fourths of the CSS resources are not used during the initial page load) but also to the way that pages are loaded (e.g., 15% of page load times are spent waiting for parsing-blocking resources to be loaded).

To address these inefficiencies, this paper presents Shandian (which means lightning in Chinese), which restructures the page load process to speed up page loads. Shandian exercises control over what portions of the page get communicated and in what order so that the initial page load is optimized. Unlike previous techniques, Shandian works on demand without requiring a training period, is compatible with existing latency-reducing techniques (e.g., caching and CDNs), supports security features that enforce same-origin policies, and does not impose additional privacy risks. Our evaluations show that Shandian reduces page load times by more than half for both mobile phones and desktops while incurring modest overheads to data usage.

1 Introduction

Web pages have become the de-facto standard for billions of users to get access to the Internet. The end-to-end Web page load time (PLT) has consequently become a key metric, as it affects user experience and is thus associated with business revenues [6, 4]. Reports suggest that Shopzilla increased its revenue by 12% by reducing PLT from 6 seconds to 1.2 seconds, and that Amazon found every 100ms of increase in PLT cost them 1% in sales [27].

Despite its importance and various attempts to improve PLT, the end-to-end PLT for most pages is still a few seconds on desktops and more than ten seconds on mobile devices [9, 39]. This is because modern Web pages are often complex. Previous studies show that Web pages contain more than fifty Web objects on average [9] and exhibit complex inter-dependencies that result in inefficient utilization of network and compute resources [39]. In our own experiments, we have identified three types of inefficiencies associated with Web pages and the page load process. The first inefficiency comes from the content size of Web pages. Many Web pages use JavaScript libraries such as jQuery [21] or include large customized JavaScript code in order to support a high degree of user interactivity. The result is that a large portion of the code conveyed to a browser is never used on a page or is only used when a user triggers an action. The second inefficiency stems from how the different stages of the page load process are scheduled to ensure semantic correctness in the presence of concurrent access to shared resources. This results in limited overlap between computation and network transfer, thus increasing PLT. The third and related inefficiency is that many resources included in a Web page are often loaded sequentially due to the complex dependencies in the page load process, and this results in sub-optimal use of the network and increased PLTs.
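As a concrete, hypothetical illustration of how these inefficiencies arise (the file names and URLs are our own invention, not drawn from the paper's measurement set), consider the page head below:

    <html>
    <head>
      <!-- Inefficiency 1: a large library is shipped even though the page
           may call only a few of its functions, or none until a user acts. -->
      <script src="https://example.com/js/jquery.js"></script>
      <!-- Inefficiency 1 (again): a site-wide stylesheet, most of whose
           rules match nothing on this particular page. -->
      <link rel="stylesheet" href="https://example.com/css/site.css">
      <!-- Inefficiency 2: HTML parsing must stop at each synchronous
           script until it is fetched and evaluated, limiting overlap
           between computation and network transfer. -->
      <script src="https://example.com/js/analytics.js"></script>
    </head>
    <body> ... </body>
    </html>

The same structure produces inefficiency 3: absent speculative fetching, each later resource is discovered and requested only after the preceding blocking scripts have run, serializing transfers that could in principle proceed in parallel.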
Reducing PLT is hard given these inefficiencies. Human inspection is not ideal, since there is no guarantee that Web developers adhere to the ever-changing best practices prescribed by experts [35]. Thus, it is widely believed that the inefficiencies should be transparently mitigated by automated tools and techniques. Many previously proposed techniques focus on improving network transfer times. For example, techniques such as DNS pre-resolution [22], TCP pre-connect [19], and TCP fast open [28] reduce latencies, and the SPDY protocol improves network efficiency at the application layer [32]. Other techniques lower computation costs by either exploiting parallelism [25, 12] or adding software architecture support [41, 13]. While these techniques are moderately effective at speeding up the individual activities corresponding to a page load, they have had limited impact in reducing overall PLT, because they still communicate redundant code, stall in the presence of conflicting operations, and are constrained by the limited parallelism in the page load process.

The key and yet unresolved issue with page loads is that the page load process is suboptimally prioritized as to what portions of a page get loaded and when. In this paper, we advocate an approach that precisely prioritizes resources that are needed during the initial page load (load-time state) over those that are needed only after a page is loaded (post-load state). Unlike SPDY (or HTTP/2) server push and Klotski [10], which only prioritize network transfers at the granularity of Web objects, our approach prioritizes both network transfers and computation at a fine granularity (e.g., HTML elements and CSS rules), directly tackling the three inefficiencies listed above.

A key challenge addressed by our approach is to ensure that we do not break static Web objects (e.g., external JavaScript and CSS), because caching and CDNs are commonly used to improve PLT. We make the design decision to send unmodified static contents in the post-load state, thereby incurring the cost of sending a small portion of redundant content that is already included in the load-time state.

To deploy this approach transparently to Web pages, we choose a split-browser architecture and fulfill part of the page load on a proxy server, which can be either part of the web service itself (e.g., a reverse proxy) or a third-party proxy server (e.g., hosted on Amazon EC2). A proxy server is set up to preload a Web page up to a well-defined point, e.g., when the load event is fired; the preload is expected to be fast since it exploits the greater compute power of the proxy server and since all the resources that would normally result in blocking transfers are locally available. When migrating state (the logic that determines a Web page and the stage of the page load process) to the client, the proxy server prioritizes state needed for the initial page load over state that will be used later, so as to convey critical information as fast as possible. After all the state is fully migrated, the user can interact with the page normally, as if the page had been loaded directly without using a proxy server.
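The sketch below is our own simplified illustration of this split, not Shandian's actual wire format; the class name, URLs, and the small loader script are invented for illustration. The load-time state is roughly the page as it stands at the load event on the proxy, while the post-load state carries the original, unmodified objects:

    <!-- Load-time state (sent first): the DOM as it exists when the proxy
         fires the load event, with only the CSS rules that actually match
         this page inlined, including elements already generated by scripts. -->
    <html>
    <head>
      <style>.nav { /* matched subset of site.css */ }</style>
    </head>
    <body>
      <div class="nav">...</div>
      <script>
        // Post-load state (sent afterwards): fetch the unmodified external
        // objects under their original URLs, so browser caches and CDNs
        // keep working, and restore the state needed for interactivity.
        window.addEventListener('load', function () {
          var s = document.createElement('script');
          s.src = 'https://example.com/js/jquery.js';
          document.body.appendChild(s);
          var l = document.createElement('link');
          l.rel = 'stylesheet';
          l.href = 'https://example.com/css/site.css';
          document.head.appendChild(l);
        });
      </script>
    </body>
    </html>

Re-fetching the original files duplicates a small amount of content that the load-time state already contains, which is the redundancy cost described above.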
Note that Opera Mini [26] and Amazon Silk [3] also embrace a split-browser architecture but differ in terms of how the rendering process is split between the client and the proxy server. Their client-side browsers only handle display, and thus JavaScript evaluation is handled by the proxy server. This process depends on the network, which is both slow and unreliable in mobile settings [30], and encourages the proxy server to be placed near users. We instead use a fully functioning client-side browser and encourage the proxy server to be placed near front-end Web servers (e.g., edge POPs) for the most performance gains.

Our contributions are as follows:

• We design Shandian to restructure the page load process while remaining compatible with caching, CDNs, and security features such as the enforcement of same-origin policies. The resulting system is thus both efficient and practical.

• We evaluate Shandian on the top 100 Alexa Web pages, which have already been heavily optimized by other technologies. Our evaluations still show that Shandian reduces PLT by more than half with a reasonably powerful proxy server on a variety of mobile settings with varied RTT, bandwidth, CPU power, and memory. For example, Shandian reduces PLT by 50% to 60% on a mobile phone with a 1GHz CPU and 1GB of memory by exploiting the compute power of a proxy server with a multicore 2.4GHz CPU. Unlike many techniques that improve only the network or only computation, Shandian shows consistent benefits across a variety of settings. We also find that the amount of load-time state is decreased, while the total amount of traffic is increased only moderately, by 1%.

In the rest of this paper, we first review the background of Web pages and the page load process, identifying the inefficiencies associated with page loads (§2). Next, we present the design of Shandian (§3) and its implementation and deployment (§4). We evaluate Shandian in §5, discuss in §6, review related work in §7, and conclude in §8.

2 An analysis of page load inefficiencies

This section reviews the background on the Web page load process (§2.1), identifies three inefficiencies in the load process, and quantifies them using a measurement study (§2.2).

2.1 Background: Web page loads

Web page compositions. A Web page consists of several Web objects that can be HTML, JavaScript, CSS, images, and other media such as videos and Flash. HTML is the language (also the root object) that describes a Web page; it uses markup to define a set of tree-structured elements such as headings, paragraphs, tables, and inputs.
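As a minimal illustration of these compositions (a toy page of our own; the file names are placeholders), the skeleton below shows the root HTML object referencing the other object types and defining a tree of elements:

    <html>                                          <!-- root object -->
      <head>
        <link rel="stylesheet" href="style.css">    <!-- CSS object -->
        <script src="app.js"></script>              <!-- JavaScript object -->
      </head>
      <body>
        <h1>A heading</h1>
        <p>A paragraph with an <img src="logo.png"> image object.</p>
        <table><tr><td>A table cell</td></tr></table>
      </body>
    </html>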