Caching Strategies for Load Reduction on High Traffic Web Applications

DIPLOMA THESIS

carried out at the Institute of Computer Languages (Institut für Computersprachen), Vienna University of Technology (Technische Universität Wien)

under the supervision of Ao.Univ.Prof. Dipl.-Ing. Dr. Franz Puntigam

by Alexander Kirk
Stolberggasse 12/12, 1050 Wien

May 9, 2005
Date, Signature

Abstract

In this thesis we discuss the problem of web applications that have to work under the heavy load of a high number of visitors. We evaluate the application Bandnews.org as an example and tune it using various caching strategies. These include caching by a proxy server, a compiler cache, database caching using a query cache, and application-based caching using Smarty.

This work shows that a gain in speed is possible if the methods are applied carefully. We compare and combine caching strategies to reach a stage where every page is generated in reasonable time even under high load.

Kurzfassung

This diploma thesis deals with the problem of web applications that have to operate under high load and with a large number of users. The application Bandnews.org is examined as an example and accelerated by means of various caching strategies. These comprise caching with a proxy server, a compiler cache, database caching by means of a query cache, and application-based caching with Smarty.

This work shows that speed gains are possible if the methods are applied with care. The caching strategies are compared with one another and combined in order to reach a stage at which every page is loaded in acceptable time, even under high load.

Contents

1 Introduction
  1.1 Motivation
  1.2 Method
  1.3 Expected Results
  1.4 Outline of the Thesis
2 Terms
  2.1 Caching
    2.1.1 Invalidation
    2.1.2 Privacy
  2.2 Load
    2.2.1 Using the uptime command
    2.2.2 Using the top command
    2.2.3 Load Averages

I Environment

3 Application
  3.1 Bandnews.org
    3.1.1 Technology
    3.1.2 Page Structure
    3.1.3 myBandnews
    3.1.4 BandnewsCMS
4 Tools
  4.1 Apache
    4.1.1 History
    4.1.2 Features
    4.1.3 Alternatives
  4.2 PHP
    4.2.1 History
    4.2.2 Language Basics and Structure
    4.2.3 Integration with the web server
    4.2.4 Additional Libraries
    4.2.5 Alternatives
  4.3 MySQL
    4.3.1 PEAR::DB
    4.3.2 Query Cache
    4.3.3 Alternatives
  4.4 Smarty
    4.4.1 Template Basics
    4.4.2 Alternatives
  4.5 Squid
    4.5.1 Use cases
    4.5.2 HTTP Acceleration
    4.5.3 Alternatives
  4.6 Advanced PHP Cache
    4.6.1 Concept
    4.6.2 Alternatives
  4.7 Advanced PHP Debugger
    4.7.1 Debugging
    4.7.2 Profiling
    4.7.3 Alternatives
  4.8 ApacheBench ab
    4.8.1 Alternatives

II Tuning the Application

5 Evaluation
  5.1 Goal definition
  5.2 Processing a Request
  5.3 Possible Hooking Points
    5.3.1 Client Request
    5.3.2 PHP Module
    5.3.3 Database
    5.3.4 Application
  5.4 Bandnews.org
    5.4.1 Skeleton page
    5.4.2 Index page index.php
    5.4.3 Search page search.php
    5.4.4 Links page links.php
  5.5 Testing
    5.5.1 Preparations
    5.5.2 Testing environment
6 Squid
  6.1 Considerations
    6.1.1 Caching of whole pages
    6.1.2 Programmer’s view
    6.1.3 Expected Results
  6.2 Preparation
    6.2.1 Configuring Apache
    6.2.2 Configuring Squid
  6.3 Results
    6.3.1 skeleton-t.php
    6.3.2 pres-skel-t.php
    6.3.3 index.php
  6.4 Conclusions for Squid
7 APC
  7.1 Considerations
    7.1.1 Compiler Cache
    7.1.2 Code Optimization
    7.1.3 Outputting Data
    7.1.4 Programmer’s View
  7.2 Preparation
    7.2.1 Output Buffering
  7.3 Results
    7.3.1 Results for output testing
  7.4 Conclusions for APC
8 MySQL
  8.1 Considerations
    8.1.1 MySQL Query Cache
    8.1.2 Persistent Connections
    8.1.3 Query Tuning
  8.2 Preparation
  8.3 Results
    8.3.1 Query Cache
    8.3.2 Persistent Connection
  8.4 Conclusions for MySQL
9 Smarty Caching
  9.1 Considerations
    9.1.1 Caching Page Parts
    9.1.2 Database Usage
  9.2 Preparation
  9.3 Results
  9.4 Conclusions for Smarty Caching
10 Conclusions
  10.1 Further Work
A File Sources
  A.1 Benchmark Script
  A.2 Patch Files
B List of Figures
C List of Tables
D List of Listings
References

Chapter 1

Introduction

1.1 Motivation

As the Internet, and the World Wide Web (WWW) in particular, gains more and more popularity, servers have to handle correspondingly more requests. The more people (or simply clients) request resources (in this case files) from web servers, the faster the servers have to accept and process those requests. To cope with these requirements, programmers as well as system administrators must take countermeasures.

Since the very beginning of the WWW, the requirements for servers have changed not only in terms of traffic but also in the type of content they deliver to the client. Initially static pages had to be served; today, in 2005, content is usually taken from a database, and dynamically generated pages are transferred.
This development takes the main source of load away from the operating system, which is responsible for reading files from the hard disk or another type of memory, and shifts it to the program that dynamically generates the page.

Computer hardware has evolved as well, which is what makes it possible to generate web pages the way they are generated today. Generally speaking, servers are capable of serving most pages in a reasonable amount of time. This holds as long as only a small number of visitors request pages to be generated. The larger the number of clients, the more pages have to be generated simultaneously. Multi-tasking enables servers to do so, but CPU capacity is limited.

If it were up to system administrators alone, they would add more hardware power (for instance by clustering servers or load balancing). Often this can be done only to a certain extent, mainly for financial but also for logistical reasons. From a programmer’s point of view, however, speed can be gained by optimizing algorithms (consider an O(n^2) algorithm on a fast computer, which can easily be overtaken by an O(n) algorithm on a slower one) but also by applying caching techniques.

The basis for this diploma thesis is the analysis of caching strategies for this scenario. They will be used to speed up an existing application. Combinations of various methods will be tested and benchmarked to reach a stage at which the application runs at reasonable speed even under high load.

1.2 Method

We will explore the topic of this thesis using an existing web site (Bandnews.org) as an example to which the caching strategies are applied. The site consists of an underlying structure which is