A Reference Architecture for Web Servers
Total Page:16
File Type:pdf, Size:1020Kb
A Reference Architecture for Web Servers Ahmed E. Hassan and Richard C. Holt Software Architecture Group (SWAG) Dept. of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1 CANADA +1 (519) 888-4567 x 4671 {aeehassa, holt}@plg.uwaterloo.ca ABSTRACT document increases with the size and the complexity of the software system. Recently, a number of tools have A reference software architecture for a domain defines been developed to decrease this cost by helping to ex- the fundamental components of the domain and the tract the architecture of a software system [7, 16, 20, relations between them. Research has shown the bene- 21]. Using these tools, reverse engineering researchers fits of having a reference architecture for product de- have developed semi-automated processes to extract the velopment, software reuse, and maintenance. Many product’s architecture from available artifacts such as mature domains, such as compilers and operating sys- the product's source code and any available documenta- tems, have well-known reference architectures. tion. In this paper, we present a process to derive a reference The reference architecture [4] for a domain is an archi- architecture for a domain. We used this process to de- tecture template for all the software systems in the do- rive a reference architecture for web servers, which is a main. It defines the fundamental components of the relatively new domain. The paper presents the map- domain and the relations between these components. ping of this reference architecture to the architectures of The architecture for a particular product is an instance three open source web servers: Apache (80KLOC), of the reference architecture. During product develop- AOL-Server (164KLOC), and Jigsaw (106KLOC). ment, the product designer refines and extends the ref- erence architecture, based on the product’s require- ments and constraints. Recently, there has been some Keywords work on the derivation of reference architectures for Software architecture, reference architecture, domain specific domains [6]. For mature domains, the major architecture, web server. components and the relations between them have been studied extensively and are well understood [12]. For example, an operating system is understood to have 1. INTRODUCTION certain major subsystems such as a file system, a mem- ory manager, a process scheduler, a network interface, Research has shown the importance of having an archi- and an inter-process communication subsystem [19]. tecture document during the development of a software Similarly, a compiler is understood to have a scanner, system [15,17]. Such a document improves developers' parser, semantic analyzer and a code generator subsys- system understanding. It provides a building plan for tem [17]. Many of the relations between such subsys- the system and reduces its maintenance cost. It pro- tems are known and are expected to exist in the archi- vides an overview description of the system. It permits tecture of products in the same domain. For example in the developer to view the major subsystems in the soft- the architecture of a compiler, the scanner is expected ware system and the relations between them. Unfortu- to pass tokens to the parser. nately, many software systems do not have an architec- ture document. The cost of manually developing this Research has shown that a reference architecture en- server. The CNN web server locates the resource that ables software reuse and reduces the development ef- contains the daily news and sends the daily news back forts [9]. The different components of the reference to the browser, which displays it to the user. architecture provide a template for design and code reuse. Software architecture analysis methods such as Figure 1 shows the interaction between the web server SAAM [11] use a reference architecture to evaluate software and the rest of the environment. A web server alternative architectures. As new products are devel- is responsible for providing access to resources that are oped in new domains, the designers develop new sets of under control of the operating system. The most concepts and names. The comparison of architecture prominent web servers include Apache, StrongHold, alternatives in new domains is hampered by the lack of Netscape's iPlanet, and Microsoft's IIS web server [13]. consistency among concepts and terminology. A refer- The web server provides access to resources that range ence architecture provides a common nomenclature from static documents, such as HTML or text, to more across all software systems in the same domain, this dynamic resources, which are created by executing pro- allows the architectures of a set of products to be de- grams on the web server's machine. Java servlets and scribed uniformly. Such uniformity establishes a com- Common Gateway Interface (CGI) are some of the mon level of understanding and assists in comparing types of programs that are executed by web servers. the different architectures. The existence of a reference The clients accessing the web server are called brows- architecture is useful in the reverse engineering of a ers. Browsers' capabilities range from simple text ori- software system in the domain [18], by providing a set ented browser to graphically capable browsers. The of suggested subsystems and relations between them most prominent browsers include Netscape Navigator, that can be expected to exist in the investigated system. Internet Explorer, and Lynx. The web server domain is an emerging domain. The architecture of different web servers has not been stud- Browser ied extensively. Little has been published about the Network reference architecture for web servers. Luckily, three implementations of a web server are available online WEB under an open source license: Apache [3], AOLServer SERVER [2], and Jigsaw [10]. Developed by three different or- ganizations with distinct requirements, built using dif- Operating ferent development techniques, and with their source System code available online, these web servers are prime can- didates to study in the derivation of a reference archi- Resouces tecture. Using the architecture of these three servers and a modest amount of web server domain knowledge, Programs we developed a reference architecture for web server. servlet/ Files CGI We will present this reference architecture and will show how it is mapped to the architecture of each of the three web servers, to validate the derivation process. Figure 1: Web server in the network environment. 2. THE WEB BROWSER DOMAIN A web server can be thought of as the operating sys- tem's portal to the web. It provides a façade to the op- erating systems resources. It encapsulates the operating Web servers provide access to many features for users system and provides the requested resources to the such as daily news, email service, etc. A user needs a browser using the functionality of the local operating web browser to access these features. For example, system. Web servers have similar functionality, for when a user using his or her browser wants to check the example, all web servers can serve simple text files. daily news, he or she enters the Uniform Resource Lo- But each web server may have extra features based on cator (URL) for the document that contains the daily its design goals, for example, not all servers can serve news, such as http://www.cnn.com/index.html. Using Java servlets. The existence of a common set of fea- the HyperText Transfer Protocol (HTTP), the browser tures leads to the existence of a common reference ar- in turn requests the daily news from the CNN web chitecture for web servers. 3. DERIVING A REFERENCE Reference Architecure for Web Servers ARCHITECTURE Starting from the source code of three different web Conceptual Conceptual Conceptual servers and no architecture documentation, we derived Architecture Architecture Architecture a reference architecture for the web server domain. We are not web server domain experts, and we were not Concrete Concrete Concrete Architecture Architecture Architecture able to interview any of the developers of the systems. Instead, we used the defining artifact of these systems: Apache AOLServer Jigsaw their source code. Figure 2: Reference architecture derivation process. Kazman [11] describes a method to derive a reference The conceptual architecture shows the system's subsys- architecture for use in conducting the SAAM analysis. tems and the inter subsystem relations that are mean- Using a set of scenarios that represent the important ingful to the system's developers. As we were not able usages of the system, a domain expert traces the im- to interview the developers, we built the conceptual plementation of the scenarios and recovers a subset of architecture using available documentation and domain the reference architecture. Kazman [1] acknowledges knowledge that we acquired from using web serv- that the derived architecture is an incomplete reference ers/browsers and installing the Apache server. We used architecture, but it is sufficient for conducting the the Portable BookShelf (PBS) tool [13] to visualize and SAAM analysis. The derived reference architecture is validate each conceptual architecture. The PBS tool limited by the quality and the quantity of the chosen recovers the concrete architecture of the system from scenarios. the source code of the software system. The concrete architecture shows the actual relations between the dif- We will now present a process for deriving a reference ferent subsystems