Scalable Web Hosting Service
Total Page:16
File Type:pdf, Size:1020Kb
Scalable Web Hosting Service Ludmila Cherkasova Computer Systems and Technology Laboratory HP Laboratories Palo Alto HPL-1999-52(R.1) October, 1999 E-mail:{cherkasova}@hpl.hp.com Web hosting Web hosting is an infrastructure service that allows to design , service, web integrate, operate and maintain all of the infrastructure server farm, web components required to run web-based applications. It includes server cluster, Web server farms, network access, data staging tools and load balancing, security firewalls. Web server farms are used in a Web hosting scalability, infrastructure as a way to create scalable and highly available performance solutions. One of the main problems in web server farm analysis, management is content management and load balancing. SpecWeb In this paper, we analyze several hardware/software solutions on the market and demonstrate their scalability problems. We outline a new scalable solution FLEX for design and management of an efficient Web hosting service. This solution can be applied to a Web hosting service implemented on different architecture platforms such as web server farms with replicated disk content, web server clusters having access to a shared file system or multi-computer systems using a global (shared) file system. A preliminary performance analysis provides a comparison of the current solutions and FLEX using a synthetic workload generator based on SpecWeb'96 benchmark. FLEX outperforms current solutions 2-7 times. Internal Accession Date Only Ó Copyright Hewlett-Packard Company 1999 Contents 1 Intro duction 3 2 Web Hosting Market and Future Forecast 7 3 Variety of Web Sites and Their Growth 10 4 Typical Hardware and Software Solutions for Web Hosting 13 5 Performance Pitfalls in Scalability of the Current Solutions 16 6 Virtual Servers and Multiple Domains 17 7 New Scalable Web Hosting Solution: FLEX 18 8 Preliminary Performance Analysis of FLEX 19 9 Conclusion 25 10 References 26 2 1 Intro duction As the p opularityof the Web grows, an increasing numb er of businesses are wishing to seize the p otential market opp ortunities that it o ers. The number of public web sites grows exp onentially, and the business users account for the ma jority of that growth. A successful business regard the Web not as a network service only, but as a market: buyers and sellers, comp etitors and partners, pro ducts and services - the core elements of the web market. As the Web increasingly b ecomes a core element of business strategy, so the task of hosting web content has b ecome mission critical. Few companies, however, have the resources, money and exp ertise to build their web site entirely in-house. Forrester Research Inc. suggests that the cost of setting up an internal web site can b e $221,000 in the rst year. For this reason, many businesses cho ose to outsource their Web hosting to Internet service providers and some equipmentvendors, which according to Forrester can slash costs by80%. Although such cost savings are an imp ortant factor, it is not the primary reason for outsourcing Web hosting. Forrester's research shows that the needs for knowledge, increased security, b etter service and supp ort are all seen as more imp ortant in the decision to outsource. Web server farms and clusters are used in a Web hosting infrastructure as a way to create scalable and highly available solutions. One p opular solution is a farm of web servers with replicated disk content (see Figure 1). Web Server Farm Web Web Web Web Server Server Server Server ...... ......... Replicated Disk Content Figure 1: Web Server Farm with Replicated Disk Content. Another p opular solution is a clustered architecture, which consists of a group of no des con- nected by a fast interconnection network, suchasaswitch. In a at architecture, each no de in a cluster has a lo cal disk array attached to it. As shown in Figure 2, the no des in a 3 cluster are divided into two logical typ es: front end (delivery, HTTP servers) and back end (storage, disks) no des. The (logical) front-end no de gets the data from the back-end no des using a shared le system. In a at architecture, each physical no de can serve as both the logical front-end and back-end, all no des are identical, providing both delivery and storage functionality Web Server Cluster (Flat Architecture) Web Web Web Web Server Server Server Server ...... High Speed Interconnect ......... Shared File System Figure 2: Web Server Cluster (Flat Architecture). In a two-tiered architecture shown in Figure 3 the logical front-end back-end no des are mapp ed to di erentphysical no des of the cluster and are distinct. It assumes some underlying software layer (e.g., virtual shared disk) which makes the interconnection architecture transparent to the The NSCA prototyp e of the scalable HTTP server based on two-tier architecture is describ ed and studied in [NSCA94, NSCA95, NSCA96]. In the all solutions, eachweb server has the access to the whole web content. Therefore, any server can satisfy anyclient request. 1 One of the main problems in web server cluster (farm) management is content management and load balancing. In this pap er, we analyze several hardware/software solutions on the market and demonstrate their scalability problems. Load balancing (of either kind) for a cluster (farm) of web servers pursues the goal to equally distribute the load across the no des. This solution interferes with another goal of ecient 2 RAM usage for the cluster (farm). The p opular les tends to o ccupy RAM space in all the 1 We often use the terms of web server cluster and web server farm interchangeably, b ecause the problems as well as the solutions are often very similar. Only in those cases when it matters, it is clearly sp eci ed. 2 We consider only workloads with working set that do es not t in a single server no des RAM. 4 Web Server Cluster (Two-Tier Architecture) Front-End Web Web Web Web Server Server Server Server Nodes ...... Virtual Shared Disk Software Layer High Speed Interconnect Virtual Shared Disk Software Layer Back-End Nodes ......... Disks Figure 3: Web Server Cluster (TwoTierServer Architecture). no des. There is a redundant replication of \hot" content through the RAM of all the no des which leaves muchlessofavailable RAM space for the rest of the content, leading to a worse overall system p erformance. This observation have led to a design of the new \lo calityaware" balancing strategies [VLN97 , LARD98 ]. The cluster of web servers is considered in [VLN97 ]. They suggest the architecture similar to NCSA approach, but to store the data on disks attached to the servers in the cluster rather than in a centralized le server ( at architecture, see Figure 2). In order to exploit the aggregate memory capacity of the cluster, a new memory management strategy is prop osed to control the data caching and data replacementachieving a go o d balance b etween an intra- cluster network trac and disk I/O. Despite that the original load balancing on a cluster is done with Round-Robin DNS [RRDNS95 ](see more discussion in Section 4), the prop osed memory management strategy allows to avoid unnecessary data replication and improve the overall p erformance of the cluster. A new lo cality-aware request distribution strategy (LARD) is prop osed for cluster-based net- work servers in [LARD98 ]. The cluster no des are partitioned into two sets: front ends and backends. Front ends act as the smart routers or switches. They implement LARD to route the incoming requests to the appropriate no de in a cluster. LARD takes into account both a do cument lo cality and the current load. Authors show that on workloads with working sets that do not t in a single server no des RAM, the prop osed strategy allows to improve throughput byafactoroftwo to four for 16 no des cluster. 5 In this pap er, we outline a new scalable solution FLEX for design and management of an ecient Web hosting service. The solution can be applied to a Web hosting service imple- mented on di erent architecture platforms such as web server farms, web server clusters or multi-computer systems [HP-MCS]. FLEX motivation is similar to the \lo calityaware" balancing strategies discussed ab ove: we would liketoavoid the unnecessary do cument replication to improve the overall p erformance of the system. However, we achieve this goal via logical partition of the content on a di erent granularity level. Since the original goal is to design a scalable web hosting service, wehaveanumber of customers and their sites as a starting p oint. Each of these sites mighthave di erent trac patterns in terms of b oth the number and typ es of les accessed and the average access rates. By monitoring the trac to each site and analyzing the combined trac to a system in whole, FLEX prop oses a balanced partitioning of the customers (web sites) by the number of nodes in the system. The elegance and simplicityof the approach consists that the desirable routing can b e done by submiting the corresp ondent con guration les to the DNS server, since eachhostedweb site has a unique domain name. The DNS server is going to route the incoming requests to a corresp ondent no de in the system (accordingly to partition provided by FLEX). FLEX can be easily implemented on a top of the current infrastructure used by Web hosting service providers. A preliminary p erformance analysis provides a comparison of the current solutions and FLEX using synthetic workload generator based on Sp ecWeb'96 benchmark.