Implementation and Evaluation of Transparent Fault-Tolerant Web Service with Kernel-Level Support
Proceedings of the IEEE International Conference on Computer Communications and Networks, Miami, Florida, pp. 63-68, October 2002.

Navid Aghdaie and Yuval Tamir
Concurrent Systems Laboratory
UCLA Computer Science Department
Los Angeles, California 90095
{navid,tamir}@cs.ucla.edu

Abstract: Most of the techniques used for increasing the availability of web services do not provide fault tolerance for requests being processed at the time of server failure. Other schemes require deterministic servers or changes to the web client. These limitations are unacceptable for many current and future applications of the Web. We have developed an efficient implementation of a client-transparent mechanism for providing fault-tolerant web service that does not have the limitations mentioned above. The scheme is based on a hot standby backup server that maintains logs of requests and replies. The implementation includes modifications to the Linux kernel and to the Apache web server, using their respective module mechanisms. We describe the implementation and present an evaluation of the impact of the backup scheme in terms of throughput, latency, and CPU processing cycles overhead.

I. INTRODUCTION

Web servers are increasingly used for critical applications where outages or erroneous operation are unacceptable. In most cases, critical services are provided using a three-tier architecture consisting of client web browsers, one or more replicated front-end servers (e.g. Apache), and one or more back-end servers (e.g. a database). HTTP over TCP/IP is the predominant protocol used for communication between clients and the web server. The front-end web server is the mediator between the clients and the back-end server.

Fault tolerance techniques are often used to increase the reliability and availability of Internet services. Web servers are often stateless: they do not maintain state information from one client request to the next. Hence, most existing web server fault tolerance schemes simply detect failures and route future requests to backup servers. Examples of such fault tolerance techniques include the use of specialized routers and load balancers [4, 5, 12, 14] and data replication [6, 28]. These methods are unable to recover in-progress requests. Although the web server is stateless between transactions, it does maintain important state from the arrival of the first packet of a request to the transmission of the last packet of the reply. With the schemes mentioned above, the client never receives complete replies to the in-progress requests. It also has no way to determine whether or not a requested operation has been performed [1, 15, 16] (see Figure 1).

Figure 1 (diagram: Client, Web Server, and Back-end, with messages numbered 1 to 4): If the web server fails before sending the client reply (step 4), the client cannot determine whether the failure was before or after the web server communication with the back-end (steps 2, 3).

Some recent work does address the need for handling in-progress transactions. Client-aware solutions such as [16, 23, 26] require modifications to the clients to achieve their goals. Many versions of the client software (the web browser) are widely distributed and are typically developed independently of the web service. It is therefore critical that any fault tolerance scheme used be transparent to the client. Schemes for transparent server replication [3, 7, 18, 25] sometimes require deterministic servers for reply generation or do not recover requests whose processing was in progress at the time of failure. We discuss some of these solutions in more detail in Sections II and V.

We have previously developed a scheme for client-transparent fault-tolerant web service that overcomes the disadvantages of existing schemes [1]. The scheme is based on logging of HTTP requests and replies to a hot standby backup server. Our original implementation was based on user-level proxies, required non-standard features of the Solaris raw socket interface, and was never integrated with a real web server. That implementation did not require any kernel modifications but incurred high processing overhead.

The contribution of this paper is a more efficient implementation of the scheme on Linux, based on kernel modifications, and its integration with the Apache web server using Apache's module mechanism. The small modifications to the kernel are used for two purposes. They provide client-transparent multicast of requests to a primary server and a backup server. They also provide the ability to continue transmission of a reply to the client despite server failure. Our implementation is based on off-the-shelf hardware (PC, router) and software (Linux, Apache). We rely on the standard reliability features of TCP and do not make any changes to the protocol or its implementation.

In Section II we present the architecture of our scheme and key design choices. Section III discusses our implementation based on kernel and web server modules. A detailed analysis of the performance results, including throughput, latency, and consumed processing cycles, is presented in Section IV. Related work is discussed in Section V.

II. TRANSPARENT FAULT-TOLERANT WEB SERVICE

In order to provide client-transparent fault-tolerant web service, a fault-free client must receive a valid reply for every request that is viewed by the client as having been delivered. Both the request and the reply may consist of multiple TCP packets. Once a request TCP packet has been acknowledged to the client, it must not be lost. All reply TCP packets sent to the client must form consistent, correct replies to prior requests.

We assume that only a single server host at a time may fail. We further assume that hosts are fail-stop [24]. Hence, host failure is detected using standard techniques, such as periodic heartbeats. Techniques for dealing with failure modes other than fail-stop are important but are beyond the scope of this paper. We also assume that neither the local area network connecting the two servers nor the Internet connection between the client and the server LAN will suffer any permanent faults. The primary and backup hosts are connected on the same IP subnet. In practice, the reliability of the network connection to that subnet can be enhanced using multiple routers running protocols such as the Virtual Router Redundancy Protocol [19]. This can prevent the local LAN router from being a critical single point of failure.

In order to achieve the fault tolerance goals, active replication of the servers may be used, where every client request is processed by both servers. While this approach will have the best fail-over time, it suffers from several drawbacks. First, this approach has a high cost in terms of processing power, as every client request is effectively processed twice. A second drawback is that this approach only works for deterministic servers. If the servers generate replies non-deterministically, the backup may not have an identical copy of a reply. It therefore cannot always continue the transmission of a reply should the primary fail in the midst of sending it.

An alternative approach is based on logging. Specifically, request packets are acknowledged only after they are stored redundantly (logged), so that they can be obtained even after a failure of a server host [1, 3]. Since the server may be non-deterministic, none of the packets of a reply can be sent to the client unless the entire reply is safely stored (logged). Its transmission can then proceed despite a failure of a server host [1].

We have previously proposed [1] implementing transparent fault-tolerant web service using a hot standby backup server. The backup logs HTTP requests and replies but does not actually process requests unless the primary server fails. The error control mechanisms of TCP are used to provide reliable multicast of client requests to the primary and backup. All client request packets are logged at the backup before arriving at the primary. The primary reliably forwards a copy of the reply to the backup before sending it to the client. Upon failure of the primary, the backup seamlessly takes over receiving partially received requests and transmitting logged replies. The backup processes logged requests for which no reply has been logged, as well as any new requests.

Since our scheme is client-transparent, clients communicate with a single server address (the advertised address) and are unaware of server replication [1]. The backup server receives all the packets sent to the advertised address and forwards a copy to the primary server. For client transparency, the source addresses of all packets received by the client must be the advertised address. Hence, when the primary sends packets to the clients, it "spoofs" the source address, using the service's advertised address instead of its own. The primary logs replies by sending them to the backup over a reliable (TCP) connection and waiting for an acknowledgment before sending them to the client. This paper uses the same basic scheme, but the focus here is on the design and evaluation of a more efficient implementation based on kernel modifications.

III. IMPLEMENTATION

There are many different ways to implement the scheme described in Section II. As mentioned earlier, we have previously done this based on user-level proxies, without any kernel modifications [1]. A proxy-based implementation is simpler and potentially more portable than an implementation that requires kernel modification, but it incurs higher performance overhead (Section IV). It is also possible to implement the scheme entirely in the kernel in order to minimize the overhead [22]. However, it is generally desirable to minimize the complexity of the kernel [8, 17]. Furthermore, the more modular approach described in this paper makes it easier to port the implementation to other kernels or other web