Some Points Affecting Web Performance
Total Page:16
File Type:pdf, Size:1020Kb
SOME POINTS AFFECTING WEB PERFORMANCE Christophe Deleuze Laboratoire de conception et d’intégration des systèmes (LCIS) 50, rue Barthélémy de Laffemas, BP54 26902 Valence Cedex 09 France Keywords: Web performance, HTTP, persistent connections, embedded objects, web content. Abstract: Despite continuously growing network and server capacity, web performance is still often bad. Changes in version 1.1 of the hypertext transfer protocol (HTTP) have brought solutions to some of the performance problems, but daily experience shows that problems remain. In this work, we examine some characteristics of a number of popular web pages and try to get insights about what could be made to enhance overall web performance. We find that most web pages have very large numbers of small objects, something HTTP doesn’t handle very well, and that style-sheets were expected to avoid. Moreover, half of the sites we examined do not support HTTP enhancements such as persistent connections and request pipelining. Finally, most sites embed objects from multiple (and often at lot of) different hosts. 1 INTRODUCTION server model, raising scalability issues. Much work has been done on trying to address such issues: they The Internet is now a production network, used for include various content replication and load balanc- daily business. The advent of the world wide web ing techniques such as caching proxies, content distri- and its user friendly browsing capabilities has been bution networks (CDNs), and server farms (Deleuze, the major step in making the Internet mainstream. 2004). All these systems require a large and costly Because of its success and the way the system and infrastructure and fail to solve all the problems. Most protocol (HTTP – HyperText Transfer Protocol) were notably, the problems of consistency maintenance designed, major performance issues have quickly ap- among replicated servers and replication of dynamic peared. Things were getting bad enough for many content have proven very difficult to solve. people to call it the “world wide wait”. Researchers In the end, while more and more people are getting involved in the design of Internet protocols have even high speed Internet access, web performance is often called “abysmal” the web protocols (Jacobson, 1995). not as smooth as people would expect it to be. A re- Most major protocol issues were solved with ver- cent striking example is the congestion of the french sion 1.1 of HTTP, defined in (Fielding et al., 1999). government income tax web site for several weeks, Performance was greatly enhanced by the intro- forcing the government to postpone the deadline for duction of features such as persistent connections, e-citizens to declare their income (ZDNet, 2005). buffered transmission, and request pipelining (Krish- In order to try and find out how things could be namurthy et al., 1999). It is not clear, however, that enhanced, we have examined the main page of about these features are widely supported. 500 popular web sites, taken from Alexa’s “Global Despite some attempts to propose new modifica- Top 500” list (Alexa, 2005). This paper presents pre- tions to HTTP (W3C, 2001) it is now considered that liminary results of this work. We first focus on the its performance is good enough, and that the protocol content of the pages in Section 2. In Section 3 we will not change (Mogul, 1999). examine the support of two HTTP optional features Contrary to previous Internet protocols and us- designed to improve performance: persistent connec- ages that were essentially symmetric, arguably peer tions and request pipelining. Finally, in Section 4, we to peer,1 the web is based on an asymmetric client client server protocols such as FTP, whose traffic patterns 1This is obvious for mail and net news, but also true for were very symmetric. 242 Deleuze C. (2006). SOME POINTS AFFECTING WEB PERFORMANCE. In Proceedings of WEBIST 2006 - Second International Conference on Web Information Systems and Technologies - Internet Technology / Web Interface and Applications, pages 242-245 DOI: 10.5220/0001246102420245 Copyright c SciTePress SOME POINTS AFFECTING WEB PERFORMANCE consider the number of hosts that have to be contacted Figure 2 shows the mean object and picture size to download all embedded objects for a page. Con- for all the considered pages. Clearly, the mean ob- clusions and perspectives of future work are given in ject size is below a few kilobytes for most of the Section 5. pages. If we consider only the pictures (that are the vast majority of objects as we already seen) the mean size is even smaller, the median size of pictures for the whole set of pages being 2636 bytes. We must 2 WEB CONTENT note that an HTTP transaction must be performed for downloading each embedded object, and that each re- Probably the main reason why web performance isn’t sponse carry HTTP response headers along with the improving as network and server capacity grow is that object. The typical response header size in our down- web content is becoming increasingly rich. Figure 1 loads was around 300 bytes, i.e. not much less than shows the cumulative distribution function (CDF) of the objects’ sizes themselves. the number of embedded objects per page. We can see that half of the web pages have more than 25 em- bedded objects, while almost 10 % have more than 100! 20 pictures objects 1 0.9 15 0.8 0.7 10 0.6 0.5 mean size (kB) 0.4 5 0.3 Percentage of web pages 0.2 0 0.1 0 50 100 150 200 250 300 350 400 450 web pages 0 0 50 100 150 200 250 Nb of files Figure 2: Mean object and picture size. Figure 1: CDF of number of embedded objects. Table 1 shows the number of documents found and It is our opinion that such small embedded objects the total byte size for the most popular document significantly degrade web performance. It was ex- types. Pictures clearly constitute the most important pected that the advent of style-sheets, by providing type, both when considering number of files and total proper ways for web designers to control page layout, size. An interesting point to note is how few png pic- would dramatically increase the mean object size on tures there are. Globally, 77 % of files are pictures, the web (Nielsen and al., 1997). Apparently, this is while they amount for 54 % of bytes. still something that can be of some concern. The second most significant contribution to web Table 1: Document types data. pages weight seems to be formed by presentation in- type number size (kB) formation such as JavaScript files and style sheets image/gif 10648 23256 (text/css). To evaluate their weight, we divide image/jpeg 2903 22839 their size by the size of the whole page, excluding pic- application/ tures and flash files. Figure 3 shows the numbers (in x-javascript 1281 6632 increasing order) obtained for CSS files, JavaScript text/html 868 9416 text/plain 473 413 files, and both together, for the studied sites. We find text/css 452 3309 that this weight is above 25 % for about one third of application/ the sites, and above 50 % for at least 10 % of them. x-shockwave-flash 207 5836 image/png 86 192 To summarize this section, most web pages embed application/ a large number of small pictures, which can signifi- octet-stream 60 1300 cantly affect performance. Other kinds of files such image/x-icon 55 144 as style sheets and javascripts also have a significant text/javascript 19 112 weight, although much lower. 243 WEBIST 2006 - INTERNET TECHNOLOGY 1 CSS Note that it can not reorder the requests since HTTP 0.9 javascript doesn’t provide any way to explicitly associate an an- both 0.8 swer to a request. 0.7 We have tested the web sites to try and evaluate to 0.6 what extend these features are now in use. Our results 0.5 are shown in Table 2. We found that 244 hosts out 0.4 of the 494 main hosts for the considered web pages 0.3 do accept persistent connections. Thus, 50.6 % refuse 0.2 them. Only 219, i.e. 44.3 % accept the pipelining 0.1 of requests. These results are particularly significant 0 if we consider the large number of small embedded 0 50 100 150 200 250 300 350 400 450 500 web pages objects we found in Section 2. Figure 3: Weigth of CSS and JavaScript files. Table 2: Support of persistent connections and pipelining. number of hosts 3 SUPPORT OF HTTP FEATURES feature absolute % persistent connections 244 50.6 As we noted in the introduction, a major step in mak- request pipelining 219 44.3 ing HTTP more efficient has been the introduction of persistent connections and request pipelining. In the We’re currently analyzing these data. One possible first versions of HTTP, only one transaction could be explanation why persistent connections are so unpop- performed on a given TCP2 connection. When the ular is that server farm systems, such as web switches web became popular, web pages quickly started to (Deleuze, 2004) are easier to build if persistent con- embed lots of small objects, which implied opening nections are avoided. Server farms help in building and closing many connections for a single page down- powerful web servers. Since we have selected popu- load. Apart from unnecessary traffic in the network lar web sites, many of them may use such systems. and processing at the hosts, this implied significant delays for each transaction. These are due to the time necessary to open each fresh connection, and to the TCP “slow start” congestion control mechanism, that 4 MULTIPLE HOSTS prevents sending data at the full available rate in a just created connection (Padmanabhan and Mogul, 1994).