Towards Understanding Modern Web Traffic

Sunghwan Ihm†, Department of Computer Science, Princeton University
Vivek S. Pai, Department of Computer Science, Princeton University
[email protected] [email protected]

ABSTRACT

As Web sites move from relatively static displays of simple pages to rich media applications with heavy client-side interaction, the nature of the resulting Web traffic changes as well. Understanding this change is necessary in order to improve response time, evaluate caching effectiveness, and design intermediary systems, such as firewalls, security analyzers, and reporting/management systems. Unfortunately, we have little understanding of the underlying nature of today's Web traffic.

In this paper, we analyze five years (2006-2010) of real Web traffic from a globally-distributed proxy system, which captures the browsing behavior of over 70,000 daily users from 187 countries. Using this data set, we examine major changes in Web traffic characteristics that occurred during this period.

popularity of social networking, file hosting, and video streaming sites [29]. These changes and growth of Web traffic are expected to continue, not only as the Web becomes a de facto front-end for many emerging cloud-based services [47], but also as applications get migrated to the Web [34].

Understanding these changes is important for overall system design. For example, analyzing end-user browsing behavior can lead to a Web traffic model, which in turn can be used to generate a synthetic workload for benchmarking or simulation. In addition, analyzing the redundancy and effectiveness of caching could shape the design of Web servers, proxies, and browsers to improve response times. In particular, since content-based caching approaches [28, 49, 50] are a promising alternative to traditional HTTP object-based