DNS Weighted Footprints for Web Browsing Analytics

José Luis García-Dorado, Javier Ramos, Miguel Rodríguez, and Javier Aracil
High Performance Computing and Networking Research Group, Universidad Autónoma de Madrid, Spain.

NOTE: This is a version of an unedited manuscript that was accepted for publication. Please cite as: J.L. García-Dorado, J. Ramos, M. Rodríguez, J. Aracil. DNS weighted footprints for web browsing analytics. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 111, 35-48, (2018). The final publication is available at: https://doi.org/10.1016/j.jnca.2018.03.008

Abstract

The monetization of the large amount of data that ISPs have of their users is still in its early stages. Specifically, the knowledge of the websites that specific users or aggregates of users visit opens new business opportunities, after the convenient sanitization. However, the construction of accurate DNS-based web-user profiles on large networks is a challenge, not only because of the requirements that capturing traffic entails, but also given the use of DNS caches, the proliferation of botnets and the complexity of current websites (i.e., when a user visits a website, a set of self-triggered DNS queries for banners, from both same-company and third-party services, as well as for some preloaded and prefetched contents, are in place). In this way, we propose to count the intentional visits users make to websites by means of DNS weighted footprints. This novel approach consists of considering that a website was actively visited if an empirically estimated fraction of the DNS queries of both the website itself and the set of self-triggered websites is found. This approach has been coded in a final system named DNSPRINTS. After its parameterization (i.e., balancing the importance of a website in a footprint with respect to the total set of footprints), we have measured that our proposal is able to identify visits and their durations with false and true positive rates between 2-9% and over 90%, respectively, at throughputs between 800,000 and 1.4 million DNS packets per second in diverse scenarios, thus proving both its refinement and applicability.

Index Terms: Browsing analytics; Data monetization; DNS footprint; web-user profiles; DNSPRINTS.

1 Introduction

The commercial exploitation of web browsing analytics is currently a fact, whereby the collaboration between the marketing and Internet communities has progressively strengthened after each business success. The information about users' preferences that the different actors in the Internet arena gather can be used not only as input for tailored online ads but also for improved and novel mechanisms to monetize the rich set of available data [1, 2], even more so with the advent of the Big Data era. Paying attention to web access profiles, it becomes apparent that identifying a competitor that attracts the interest of the customers of a given company is of paramount interest in order to react accordingly. Similarly, the interest, in terms of visits, of the population of a given geographical area is a useful input to plan the expansion of many retailers or to assess the impact of an advertising campaign. A real-world example of this is the Google AdWords service [3], whereby Google exploits keywords, statistics of keywords, and search results of its users [4] and customizes ads according to their profiles.

Nonetheless, the capacity to create users' web profiles is no longer the exclusive preserve of web search engines; others, such as ISPs, also have the opportunity to exploit such a business model. In fact, ISPs can do so in a richer fashion, given that search engines are only aware of the first few accesses users make to a website. After that, typically, web names are retained in the browsing history (or bookmarked), or simply memorized by the users. This way, search engines are no longer used as gateways to such sites.

As an alternative to intrusive client software and add-ons installed on clients' web browsers, such as toolbars [5] or cookies [6], the ISP approach to this issue is the apparently simple task of capturing and inspecting HTTP traffic. However, two concerns arise: the unstoppable increase of HTTPS and the implicit difficulty of capturing (needless to say, inspecting) large volumes of HTTP traffic potentially distributed over different geographical areas. While the latter remains an open research area [7], the authors in [8, 9] pointed out that DNS can play a major role in revealing the traffic behind an HTTPS flow. Further, the authors in [10, 11] successfully correlated traffic flows [12] and temporally close DNS queries, thus identifying the IP addresses of users and the names of the accessed hosts. Unfortunately, such approaches still require collecting all HTTP/HTTPS and DNS traffic to construct the flows to which the DNS traffic is related, and, importantly, other collateral issues emerge:

• DNS Cache: Most DNS resolvers and clients implement a cache for DNS traffic where the IP address associated with a given domain name is temporarily stored. As the Time To Live (TTL) of a cache entry can be long [13], chances are that a point of presence monitoring traffic cannot see clients' DNS queries although they are effectively visiting a website.

• Automated clients, botnets and worms [14, 15, 16]: Often, DNS queries are not generated by actual users from a web browser, but by scripts that poll DNS servers with single queries searching for updates, information about the service or mail server addresses, as well as to perform Distributed Denial of Service (DDoS) attacks.

• The tangled web [10]: Almost any website includes a set of external contents (e.g., resources of other websites of the same company or, simply, banners or ads) whose domain names have to be resolved in order to fully load the requested website itself. This causes additional DNS traffic to be generated without being explicitly requested by users and makes users' web profiles imprecise, especially for business exploitation.

• Websites' embedded content preload: The HTML5 recommendation (https://www.w3.org/TR/html5/) defines special website attributes, such as preload, that call for an early load of resources that are not part of either the main website loading or the rendering process. Such resources generate additional DNS queries.

• Link prefetching: Popular browsers such as Google Chrome, Mozilla Firefox or Internet Explorer perform link prefetching, which generates extra traffic that was not explicitly requested. Prefetching can be performed at different levels. Nowadays almost any browser pre-resolves the domains present in every link of a website. Moreover, some browsers may not only resolve link addresses but also prefetch cacheable contents from the websites referred to by such links, if possible. Furthermore, some browsers, such as Internet Explorer, perform image pre-rendering. Websites may also instruct browsers to prefetch certain links, as defined formally in the HTML5 recommendation by means of special attributes of the link tag. In sum, all these approaches generate extra HTTP/HTTPS and DNS traffic that impedes generating precise user browsing profiles.

In this light, we search for a mechanism that, relying on DNS traffic to detect every time a user accesses a website, is able to circumvent the problem that some queries are hidden by caches and, importantly, has the capacity to distinguish between requested website visits and unintentional ones (let us refer to the latter as self-triggered webs or self-triggered traffic). To do so, we propose to explicitly exploit the tangled characteristic of websites and the DNS preload/prefetch capabilities of current websites/browsers.

[Figure 1: Sample website layout, with content that generates self-triggered DNS queries embedded on several banners and external services]

[Figure 2: Flow graph describing the packet flow of DNS and HTTP traffic when a site is requested using a web browser. Steps shown: user requests a website with a browser; DNS query to resolve root domain name; DNS server resolves root domain name; HTTP request for the site; HTTP server sends HTML file for the site; web browser identifies dependencies on the HTML; DNS queries for other domains; DNS server resolves domains; HTTP requests for domains' content; HTTP responses for domains' content; web browser loads and displays the website.]

Figure 1 features a sample layout for a website with several embedded ads, banners and links to both external and same-company services. After the main website is requested, while users wait for loading, a set of DNS queries/responses is generated. Such behavior is illustrated in Figure 2, where multiple resolution processes are in place. The set of different resolved domains that a given website triggers defines the DNS footprint of such a website. This way, given a list of root domain websites, i.e., those websites of interest for business purposes, we construct their associated DNS footprints, which will be used to search in traffic aggregates to determine the visits per user, potentially sanitized, or per aggregation in market segments and geographical areas.

In fact, being less restrictive with the footprint (i.e., accepting a match although not all the websites of a footprint are found) circumvents the cache problem, while being more restrictive rules out self-triggered visits, in other words "false" resolutions with respect to the actual website requested by the user.

[...] instance car manufacturers. To this end, the department provides a set of websites of interest for the analysis, namely those of the relevant car manufacturers. Our purpose is detecting when a user has requested a certain website using a browser, out of such set of websites, only by inspecting the DNS [...]
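
The weighted-footprint idea described above (declare a visit when a large enough weighted fraction of a website's footprint appears among a user's DNS queries) can be illustrated with a minimal Python sketch. This is not the DNSPRINTS implementation: the IDF-like weighting, the 0.6 threshold, the function names and the example domains are all assumptions made purely for illustration, whereas the text only states that the fraction is estimated empirically and that a website's importance is balanced against the total set of footprints.

```python
import math
from collections import Counter


def domain_weights(footprints):
    """Assumed weighting: domains shared by many footprints count less (IDF-like)."""
    n = len(footprints)
    counts = Counter(d for domains in footprints.values() for d in set(domains))
    return {d: math.log(1 + n / c) for d, c in counts.items()}


def match_score(footprint, observed, weights):
    """Weighted fraction of a footprint that appears in the observed queries."""
    total = sum(weights[d] for d in footprint)
    found = sum(weights[d] for d in footprint if d in observed)
    return found / total if total else 0.0


def detect_visits(footprints, observed, threshold=0.6):
    """Return the sites whose score reaches the (hypothetical) threshold."""
    weights = domain_weights(footprints)
    return sorted(
        site
        for site, footprint in footprints.items()
        if match_score(footprint, observed, weights) >= threshold
    )


# Hypothetical footprints: each root domain plus the domains it self-triggers.
footprints = {
    "cars-a.example": {
        "cars-a.example",
        "cdn.cars-a.example",
        "ads.example.net",
        "stats.example.org",
    },
    "cars-b.example": {
        "cars-b.example",
        "img.cars-b.example",
        "ads.example.net",
    },
}

# Queries seen for one user; the root domain itself is missing, as it would be
# if its resolution had been answered from a cache.
observed = {"cdn.cars-a.example", "ads.example.net", "stats.example.org"}

print(detect_visits(footprints, observed))
```

In this toy case the first site is still detected even though its root domain was never queried, which mirrors the cache argument above, while the second site is not reported because only the shared ad domain matched. Raising the threshold makes the matching more restrictive and so rules out more self-triggered resolutions.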