
Performance evaluation of HTTP web servers in embedded systems

DANIEL LIND

Master of Science Thesis, Stockholm, Sweden 2014

Prestandautvärdering av HTTP webbservrar i inbyggda system

av

Daniel Lind

Examensarbete MMK 2014:05 MDA 442
KTH Industriell teknik och management
Maskinkonstruktion
SE-100 44 STOCKHOLM

Performance evaluation of HTTP web servers in embedded systems

Daniel Lind

Master of Science Thesis MMK 2014:05 MDA 442
KTH Industrial Engineering and Management
Machine Design
SE-100 44 STOCKHOLM

Examensarbete MMK 2014:05 MDA 442

Prestandautvärdering av HTTP webbservrar i inbyggda system

Daniel Lind

Godkänt: 2014-02-23
Examinator: Martin Edin Grimheden
Handledare: Sagar Behere
Uppdragsgivare: Syntronic AB
Kontaktperson: Mladen Nikitovic

Sammanfattning

Detta examensarbete utfördes i samarbete med Syntronic AB. Syftet var att utröna vilken prestanda som kunde uppnås med Hypertext Transfer Protocol (HTTP)-servrar på utvalda hårdvaruplattformar för inbyggda system. Resultatet skulle vara användbart för den som ska välja en hårdvaruplattform till ett inbyggt system med en HTTP-server, och utvärderingen innehöll därför beteende under belastning, belastningsgränser, samt användning av systemresurser.

Prestandamätningar användes för att generera data för analys, och en förstudie utfördes för att bestämma vilka plattformar, funktionalitet och prestandaparametrar som skulle ingå i studien. Tre hårdvaruplattformar med olika prestandanivåer - BeagleBoard-xM, STK1000 och Syntronic Midrange - valdes ut. En simulerad webbapplikation användes under testen och totalt testades fem HTTP-serverprogramvaror.

BeagleBoard-xM med BusyBox hade totalt sett den bästa prestandan vid körning av testapplikationen. Den hade en hög överbelastningspunkt, korta behandlingstider samt överlägset beteende under överbelastning. Midrange med en modifierad version av en server skapad av Stefano Oliveri presterade dock bättre när den inte var överbelastad. STK1000 presterade klart sämre än de andra plattformarna.

Beteendet under överbelastning och effektiviteten i utnyttjandet av systemresurser skilde sig kraftigt åt mellan de olika servrarna. Testresultaten visade också att det var stor skillnad mellan HTTP-serverprogramvarorna som kördes på samma hårdvaruplattform, och generellt sett presterade programvaror med ett begränsat antal funktioner bäst.

Master of Science Thesis MMK 2014:05 MDA 442

Performance evaluation of HTTP web servers in embedded systems

Daniel Lind

Approved: 2014-02-23
Examiner: Martin Edin Grimheden
Supervisor: Sagar Behere
Commissioner: Syntronic AB
Contact person: Mladen Nikitovic

Abstract

This Master's Thesis was carried out in cooperation with Syntronic AB. The purpose was to determine what was possible in terms of Hypertext Transfer Protocol (HTTP) server performance on selected hardware platforms for embedded systems. The results should be valuable for those who are about to select a hardware platform for an embedded system that will contain an HTTP server, and the evaluation therefore included load limits, performance characteristics and system resource usage.

The required data was gathered with performance measurements, and a pre-study was performed to decide on platforms, functionality and performance parameters to include in the study. Three hardware platforms with different levels of performance - BeagleBoard-xM, STK1000 and Syntronic Midrange - were selected. A simulated web application was used during the tests and a total of five HTTP server software packages were tested.

BeagleBoard-xM with BusyBox httpd had the best overall performance when running the test application. It had a high overload point, low connection durations when not overloaded, and a superior overload behavior. However, Midrange with a modified version of a server made by Stefano Oliveri performed better when not overloaded. STK1000 was far behind the other two platforms in terms of performance.

The overload behavior and efficiency of system resource usage differed greatly between the servers. The test results also showed that the performance varied significantly between HTTP server software running on the same hardware platform, and generally the software with limited feature sets performed best.

Acknowledgements

I would like to express my gratitude to the people who have supported me during the work on this thesis:

● Examiner: Mats Hanson, KTH

● Supervisors: Sagar Behere, KTH and Mladen Nikitovic, Syntronic AB

● David Näslund, Syntronic AB

Stockholm, June 14, 2013

Daniel Lind

Table of Contents

1. Introduction ...... 10
   1.1 Background ...... 10
   1.2 Problem description ...... 11
   1.3 Purpose ...... 11
   1.4 Methodology ...... 11
   1.5 Delimitations ...... 12
   1.6 Pre-study results ...... 13
2. Available HTTP server software ...... 15
   2.1 Barracuda Embedded Web Server ...... 17
   2.2 yaSSL Embedded Web Server ...... 17
   2.3 Boa Webserver ...... 17
       2.3.1 Comments ...... 18
   2.4 KLone ...... 18
   2.5 Fusion Embedded™ HTTPS ...... 18
   2.6 BusyBox httpd ...... 18
       2.6.1 Comments ...... 18
   2.7 Appweb™ ...... 19
   2.8 Cherokee ...... 20
   2.9 thttpd - tiny/turbo/throttling HTTP server ...... 20
       2.9.1 Comments ...... 20
   2.10 Lighttpd ...... 21
   2.11 HTTP servers built on top of lwIP ...... 21
3. Measuring embedded HTTP server performance ...... 22
   3.1 Performance parameters ...... 22
   3.2 Factors affecting measurement results ...... 22
       3.2.1 The server ...... 23
       3.2.2 The network ...... 24
       3.2.3 The web application ...... 25
       3.2.4 The clients ...... 25
       3.2.5 Conclusions ...... 26
   3.3 Preparation ...... 26
   3.4 Measurement methodology ...... 26
   3.5 Tools ...... 27
4. Test environment ...... 28
   4.1 ...... 28
   4.2 Servers ...... 28
       4.2.1 BeagleBoard-xM ...... 28
       4.2.2 STK1000 ...... 29
       4.2.3 Midrange ...... 30
5. Test methodology ...... 31
   5.1 Simulated web application, instrument panel ...... 31
   5.2 Simulation techniques ...... 31
   5.3 Measurement techniques ...... 32
6. Test results ...... 34
   6.1 BeagleBoard-xM ...... 34
   6.2 STK1000 ...... 41
   6.3 Midrange ...... 48
7. Analysis ...... 52
   7.1 HTTP server software comparisons per platform ...... 52
       7.1.1 BeagleBoard-xM ...... 52
       7.1.2 STK1000 ...... 54
       7.1.3 Midrange ...... 56
   7.2 Platform comparison ...... 57
8. Conclusions ...... 63
9. Discussion ...... 64
   9.1 Future work ...... 64
       9.1.1 Complementary performance aspects ...... 64
       9.1.2 Software tuning and alternatives ...... 64
10. References ...... 65

1. Introduction

This report describes a Master's Thesis project about performance evaluation of Hypertext Transfer Protocol (HTTP) [4] servers in embedded systems, which was carried out in cooperation with Syntronic AB [77] in 2012.

1.1 Background

Connecting an embedded system to a local area network (LAN) and/or the Internet enables remote monitoring, control and configuration of the embedded system. If a client-server model is used and the data communication is done with one of the protocols commonly used on the Internet, such as HTTP, which is discussed throughout this thesis, regular web browsers can be used as client software. This is advantageous as no custom client software has to be developed, and it also makes every device with a web browser and a connection to the same network as the embedded system a possible client. This kind of user interface implementation is very flexible, as several users can access it simultaneously and from remote locations. Another advantage is that the user interface can easily be updated or extended by modifying the software in the embedded system.

Devices that are connected to networks, such as routers, surveillance cameras and printers, are today commonly equipped with embedded HTTP servers, but use in other types of products, for example household appliances, is also possible.

Many options exist when it comes to HTTP server software suitable for embedded systems, but hardware and operating system choices can affect the number of alternatives. For example, Linux [75] based operating systems generally provide more HTTP server alternatives than other operating systems. When it comes to microprocessor architectures, some are well supported, e.g. some ARM [74] versions, while others might require porting or new development, as very few HTTP servers are written for them.

There are many aspects to take into consideration when selecting HTTP server software for an embedded system. As the CPU performance and the amount of memory usually are limited in an embedded system, a lightweight HTTP server, i.e. a server that uses few system resources, is often preferred. It is important to note, however, that the resource usage can differ significantly between servers that are called lightweight.

The feature sets can vary greatly between different servers. Some features that are required in many embedded systems, such as support for generation of dynamic content and TLS/SSL encryption [10], are not supported by all lightweight HTTP server software. Comet [76] and byte rate throttling are two other features that can be useful for web applications in embedded systems, but lightweight server software that implements them is quite rare.

There are several technologies available for generation of web pages with dynamic content, and these technologies have different characteristics when it comes to system resource usage. Which technology to use depends on the expected load on the system. For example, some methods use little resources under small loads, but scale badly, while others have higher initial resource usage but scale more efficiently.
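As a concrete illustration of the first category, classic CGI starts a new operating system process for every request: there is no cost at idle, but the per-request overhead grows with load. A minimal CGI program can be sketched as follows (Python is used here for brevity; an embedded implementation would typically be written in C, and the page content is of course invented for the example):

```python
#!/usr/bin/env python3
# Minimal CGI program: the web server passes request data in
# environment variables and reads the response from the program's
# standard output.
import os

def render(query):
    # Build a dynamically generated page from the query string.
    return "<html><body>Query was: %s</body></html>" % (query or "(empty)")

if __name__ == "__main__":
    body = render(os.environ.get("QUERY_STRING", ""))
    # A CGI response is headers, a blank line, then the body.
    print("Content-Type: text/html")
    print()
    print(body)
```

Because the interpreter or binary is started anew for each request, technologies such as FastCGI and SCGI instead keep the worker process alive between requests, trading higher idle memory usage for lower per-request cost.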

Performance, for example in terms of the number of requests the server can handle per second, or the time it takes for the server to respond, also differs greatly between server software. The type of load - for example the number of simultaneous clients, the number of requests sent per client and the amount of data sent in each request and response - can affect the server software performance. It is important to consider that satisfying performance for one type of load does not automatically mean that the server will perform equally well for another type of load.

1.2 Problem description

The concept of using HTTP server functionality in embedded systems is not new and it is used in existing products developed by many companies. The client had previously used basic HTTP server functionality in systems with low loads on the HTTP server. To increase competitiveness in the market they wanted to develop more demanding applications. Therefore, they wanted to investigate the performance of different hardware platforms for embedded systems, when used as dedicated HTTP servers.

1.3 Purpose

The purpose of this thesis was to determine load limits, performance characteristics and system resource usage of selected hardware platforms for embedded systems when running HTTP server software. The selected platforms should have different technical capacities, for example different processor types and amounts of memory. Comparisons should be made between them regarding their performance when used as dedicated HTTP servers. The results should be valuable for those who are about to select a hardware platform for an embedded system that will contain an HTTP server.

1.4 Methodology

There were several possible methodologies to consider for this study, for example:

● Performance measurements (benchmarking).
● Interviews with people who had experience with this kind of software and hardware.
● Comparisons of specifications.
● Compilation of published studies.

Performance measurements were chosen, because they were considered more accurate than the other alternatives. Hardly any studies were found that covered this subject, which ruled out the methodology of compiling results from previous studies. Drawing conclusions from specifications was also ruled out, as there is a vast number of parameters that affect the performance of an HTTP server (see the literature study presented in chapter 3), and making accurate predictions about performance would have been more or less impossible. Interviews were also rejected as a method, because it was considered too difficult to find a satisfying number of interviewees with recent and adequate experience of using HTTP servers on several different hardware platforms.

As a preparation for the performance measurements, a pre-study was conducted in which experienced engineers answered a questionnaire with questions about which platforms, functionality and performance parameters they thought were relevant to include in the study. Based on the answers, three hardware platforms with different levels of performance were selected, and the HTTP server functionalities to test, as well as the performance parameters to measure, were chosen.

Ethernet [1] was chosen for the communication between the clients and servers, as it is widely used and enables the embedded system to be easily connected to existing local area networks (LANs) and/or the Internet. Other advantages of Ethernet are data rates of up to 100 Gb/s [2] and the possibility to use Power over Ethernet [3], which makes it possible to transfer data and supply power to the embedded system through a single cable.

As the purpose of the thesis project was to compare the different hardware platforms, one possible method would have been to test all platforms with the same software setup. The only thing changing would then have been the hardware, which would have resulted in a strict comparison of what the hardware was capable of with that particular software setup. However, that approach would have had several drawbacks. Firstly, it would not have complied with the goal to investigate the load limits on the different platforms, as software that performs well on one platform might perform badly on other platforms. Secondly, all HTTP server software that was desirable to test might not have been available for all the selected hardware platforms. For the above-mentioned reasons, an approach with different software on the different platforms was chosen instead. It was also decided that the choices of operating systems to use would be based on what was typically used on the platforms.

Research was carried out to find available HTTP server software that was suitable for use in embedded systems, and the results are described in chapter 2. Among the available alternatives, a few servers were selected to be used in the performance measurements. The selection criteria were the following; the servers should be: a) developed for, or ported to, at least one of the chosen platforms; b) supported by the operating system chosen for the particular platform; c) lightweight and/or specifically designed for use in embedded systems.

To be able to test the performance of the HTTP servers, a simulated web application was used. As described in chapter 3, there are many factors that affect the performance, and this should be taken into consideration when performing performance measurements. If the goal for example is to find the most suitable HTTP server software for a certain application, then it is desirable to use a test setup that closely resembles the setup that will be used in the final system, as this will result in more accurate data. Because of this, a simulated web application that resembled a real-world use case for an HTTP server in an embedded system was developed. As the purpose of the tests was to study load limits and performance characteristics, it was also of importance to simulate a web application where those factors matter.

1.5 Delimitations

The following delimitations were made due to time constraints:

● The number of hardware platforms to test was limited to three.
● Ethernet was used for all tests, i.e. no performance comparisons were made between Ethernet and other communication technologies.
● All of the tested platforms had 100 Mbit/s Ethernet controllers and no tests were performed with either lower or higher performing Ethernet controllers.
● All tests were performed in a LAN environment, i.e. the possible performance impact of wide area network (WAN) characteristics was not measured.
● The number of HTTP server software that were tested on each hardware platform was, besides time constraints, also limited by availability. At most three HTTP server software were tested per hardware platform.

● Only HTTP server software aimed at hosting web applications in the form of web sites, i.e. web applications with human interaction on the client side, were considered.
● Only solutions where the entire web server application was placed in the embedded system were tested, i.e. no solutions were tested where some material and/or computations were provided by external servers.
● Default HTTP server software settings were used for all tested servers, except for a few exceptions described in the test methodology chapter.
● One operating system per platform was used during the tests.
● Default operating system settings were used during all tests.
● All tests were performed with artificially generated loads.
● All tests were performed with one simulated example application. Measurements and comparisons between different types of web applications were not conducted.

1.6 Pre-study results

The pre-study consisted of three open questions that were answered by five engineers at Syntronic AB. The first question was “Which hardware platforms for embedded systems do you think are relevant to use in this study?”. The answers to this question consisted mostly of microprocessors and microcontrollers, but also two hardware platforms. The following hardware were considered relevant for this study:

● Atmel AVR 32-bit [78]
● [79]
● ARM Cortex-M3 [80]
● ARM Cortex-M4 [81]
● ARM Cortex-A8 [82]
● ARM Cortex-A9 [83]
● Syntronic Midrange (uses ARM Cortex-M3)
● Arduino [84]

Of these options ARM Cortex-M3, Atmel AVR32 and ARM Cortex-A8 were chosen, as they have different technical capacities and therefore represent three different levels of hardware performance.

The second question was “Which web server functionality would you like to see in embedded systems?”. The following functionality and technologies were suggested:

● Audio streaming
● Video streaming
● Responding to HTTP requests with HTTP responses containing HTML
● Java applets
● Dynamically generated responses
● Firmware upgrade
● [85] together with HTML5 [86]
● Presentation of real-time data

These answers were very diverse and all functionality mentioned above could not be tested because of time constraints. The following functionalities were chosen as they could be combined into a single example application that could be simulated in the tests: responding to HTTP requests with HTTP responses containing HTML, dynamically generated responses and presentation of real-time data.

The third question was “Which performance parameters are important for the functionality you mentioned above?”. The answers contained the following suggestions:

● Speed
● Robustness
● Volatile memory usage
● Nonvolatile memory usage
● Number of possible connections per time unit
● The responsiveness of the web interface
● Data throughput

The following measurements, at different levels of load, were chosen to cover all the suggested parameters:

● HTTP reply rate
● HTTP reply rate standard deviation
● Durations for successful TCP [87] connections
● Standard deviations for durations of successful TCP connections
● CPU usage
● Volatile memory usage
● Nonvolatile memory usage
● Network throughput
● Communication errors
● Server crashes
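Parameters such as reply rates and connection durations are typically collected with an HTTP load generator. As an illustration, an httperf invocation along the following lines generates a fixed connection rate and reports reply rate, connection durations and errors (the address and numbers are made up; the tools actually used are discussed in chapter 3.5):

```shell
# Open 1000 connections at a rate of 50 connections per second,
# sending one GET request per connection.
httperf --server 192.168.0.10 --port 80 --uri /index.html \
        --rate 50 --num-conns 1000 --num-calls 1
```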

2. Available HTTP server software

This chapter gives an overview of the HTTP server software that was available for use in resource-constrained environments in March 2012. It is not a complete list of available server software, but it should give the reader an idea of what kinds of software there were to choose between. The servers described are either specifically designed for use in embedded systems, or designed for low system resource usage in general. They are all general-purpose servers that are not tied to a specific hardware platform.

Only servers with the capability to generate web pages with dynamic content are described, as that is required if the server is to be used for monitoring of an embedded system. It is also a very useful feature if the server is used for configuration of the embedded system, as it makes it possible to display the current settings and verify that the desired changes have been made. An overview of the HTTP server software described in this chapter is presented in Table 2.1, and more detailed descriptions are provided in chapters 2.1 to 2.10. Chapter 2.11 contains brief descriptions of three demonstration servers (not included in Table 2.1) that can be used as a starting point when developing HTTP servers on top of lwIP [91] (a lightweight TCP/IP implementation).

The quality and amount of documentation varies greatly between the different servers, which is reflected in the level of detail in the descriptions in this chapter. Specific “Comments” sections are used in some places in this chapter to cleanly separate the author’s reflections from the facts about the servers.

As a complement to the server descriptions below, it is worth mentioning that if only very limited system resources are available, a custom HTTP server that implements just the functionality needed by the web application it serves can be a good solution. Such a server can be made very lightweight by, for example, only implementing a small part of the HTTP protocol. An extreme example would be a web application that only consists of one page with dynamically generated content and does not rely on data provided in the HTTP request. In such applications the parsing of the HTTP request can be completely omitted by the HTTP server, which results in reduced volatile and nonvolatile memory usage, as well as reduced CPU load.
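The extreme example above can be made concrete with a small sketch. The program below (Python for brevity; an embedded implementation would use the corresponding POSIX socket calls from C) serves exactly one dynamically generated page and discards the request bytes without parsing them; sensor_value() is a hypothetical data source:

```python
# Single-page HTTP responder that never parses the request.
import socket

def sensor_value():
    return 42  # hypothetical measurement source

def build_response():
    # Generate the one dynamic page this server knows about.
    body = "<html><body>Sensor reading: %d</body></html>" % sensor_value()
    return ("HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            "Content-Length: %d\r\n\r\n%s" % (len(body), body))

def serve(port=8080):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.listen(5)
    while True:
        conn, _ = s.accept()
        conn.recv(1024)            # read and discard; no HTTP parsing
        conn.sendall(build_response().encode())
        conn.close()
```

Since no request buffer has to be retained or tokenized, both the memory footprint and the per-request CPU cost stay minimal, at the price of serving every request the same page.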

Table 2.1. Overview of the HTTP server software covered in this chapter. Abbreviations used: Common Gateway Interface (CGI), Server Side Includes (SSI), Embedded Server Pages (ESP), Fast Common Gateway Interface (FastCGI), Simple Common Gateway Interface (SCGI).

Server | Type | Dynamic generation technologies | SSL/TLS support
Barracuda Embedded Web Server [12], [13] | Framework | Lua or C/C++ scripts | Yes
yaSSL Embedded Web Server [14] | Standalone application or embedded in another application | CGI, SSI | Yes
Boa Webserver [5] | Standalone application | CGI | No
KLone [20] | Framework | CGI or scripts that are compiled into the server’s binary file | Yes
Fusion Embedded™ HTTPS [22] | Framework | CGI | Yes
BusyBox httpd [23] | Standalone application | CGI | No
Appweb™ [26] | Standalone application or framework | CGI, ESP or in-memory modules for Ejscript and PHP | Yes
Cherokee [43] | Standalone application | CGI, FastCGI, SCGI, uWSGI or SSI | Yes
thttpd [54] | Standalone application | CGI, SSI | No
Lighttpd [88] | Standalone application | CGI, FastCGI, SCGI, SSI, among others | Yes

2.1 Barracuda Embedded Web Server

The Barracuda Embedded Web Server, developed by Real Time Logic, is a library that can be used to assemble web servers, and it is designed specifically for use in embedded systems [12], [13]. It can be embedded into an application, for example firmware on a device, or assembled to run as a standalone application in an operating system.

The library is written in ANSI C and the use of abstraction layers for TCP/IP, kernel primitives and I/O access simplifies the porting process to different kinds of embedded systems [13]. It has been ported to several operating systems designed for embedded systems, such as ThreadX/NetX, INTEGRITY, VxWorks, QNX, Windows CE, embOS, SMX, MQX and Linux based operating systems. It is designed for 32- and 64-bit microprocessors and runs on, for example, PowerPC, ColdFire and ARM.

The server’s functionality can be extended by the use of plugins [13]. These plugins add support for many features such as SSL/TLS [10] and dynamic generation of web pages through scripts written in either C/C++ or the Lua programming language. Minimum requirements are 250 KB ROM and 60 KB RAM for the basic web server and 600 KB ROM and 500 KB RAM if all plugins are enabled.

Four licensing alternatives were available at the time of writing: “royalty based binary package”, “royalty free binary package”, “royalty free source code package” and “free developer license for independent consultants” [19].

2.2 yaSSL Embedded Web Server

The yaSSL Embedded Web Server [14] is based on the Mongoose [16] web server and is designed for resource-constrained embedded systems. The binary size is less than 100 KB with SSL/TLS enabled and 40 KB without SSL/TLS. SSL/TLS functionality is provided by the CyaSSL library [17], which supports SSL 3.0, TLS 1.0, TLS 1.1 and TLS 1.2. Generation of dynamic web pages can be done with Common Gateway Interface (CGI) [9] or Server Side Includes (SSI) [53].
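For readers unfamiliar with SSI: the server scans otherwise static HTML for directives embedded in special comments and replaces them before the page is sent. A typical (purely illustrative) page could look like this, using the standard #echo and #include directives:

```html
<!-- status.shtml: SSI directives are replaced server-side;
     everything else is delivered verbatim to the browser. -->
<html><body>
  <p>Generated at: <!--#echo var="DATE_LOCAL" --></p>
  <!--#include virtual="footer.html" -->
</body></html>
```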

The server is written in ANSI C, but Python and C# bindings exist [14]. The source consists of a single .c file and does not depend on any external libraries, except for CyaSSL if SSL/TLS support is enabled. Several operating systems are supported, including Windows, Mac OS, *nix, *bsd, ThreadX, VxWorks, QNX, OpenWrt, Tron, iTron, Microitron, OpenCL and MontaVista.

The software is distributed under both the GPLv2 [18] license and a commercial license [15].

2.3 Boa Webserver

Boa is a lightweight web server designed to be fast and secure [5]. This is partly achieved by having a very limited set of features. The only supported technology for dynamic generation of web pages is CGI [9] and it does not support SSL/TLS.

The server is written in C and developed for GNU/Linux, but unofficial ports for other operating systems exist [6]. If Boa is dynamically linked the binary size can be as small as 61 KB on Linux, and if it is statically linked with uClibc [11] the binary size can be as small as 92 KB.

Boa is open-source and is distributed under the GNU General Public License (GPL) [8].

2.3.1 Comments

The server does not seem to be under active development anymore, as the official website states that the latest version was released in 2005 [7]. However, the fact that the software is open-source and distributed under the GPL makes it possible to make modifications to the code, such as bug fixes, ports to other operating systems or addition of features, if desired.

2.4 KLone

KLone [20] is a framework used to develop web applications for embedded systems. It consists of two parts: a web server and a software development kit (SDK) that is used to create the web pages. Dynamic web pages are generated with CGI or scripts containing a mix of HTML and C/C++, which are compiled and linked into the web server, resulting in a single binary file. The web server supports SSL/TLS.

The size of the server’s binary file is about 130 KB with dynamic linking and SSL/TLS enabled [20]. RAM usage varies between 110 KB with static linking without SSL/TLS and 350 KB with dynamic linking and SSL/TLS enabled through OpenSSL [21].

The server is written in ANSI C99 [20] and has been ported to and tested on GNU/Linux 2.x, QNX Momentics 6.5.0, VxWorks 6.x, FreeBSD 4.x, 5.x, 6.x, 8.x, NetBSD 2.0.X, 2.1, 3.X and 5.X, OpenBSD 3.8 and 4.7, OpenSolaris 2009-06, 3.1.8, Darwin 7, 8 / MacOSX 10.3, 10.4, 10.5 and XP.

The server is open-source and can be distributed under the GPLv2 license or a royalty-free commercial license [20].

2.5 Fusion Embedded™ HTTPS

Fusion Embedded HTTPS [22] from Unicoi Systems, Inc. is a web server that supports CGI and SSL/TLS, but a version without SSL/TLS is also available. The server contains an API that makes integration with other applications possible.

It is written in ANSI C and is designed to have few dependencies in the form of operating system features, hence making it easy to port. The size of the binary file is between 7 KB and 11 KB. The server is distributed under a royalty-free license.

2.6 BusyBox httpd

BusyBox [23] is a lightweight replacement for the GNU Coreutils [25], developed for use in embedded systems that run Linux. One of its features is a small HTTP server, called httpd, that supports CGI [24]. BusyBox is distributed under GPLv2.

2.6.1 Comments

This server can be a very convenient alternative when a simple HTTP server is needed in a system where other BusyBox features are already being used. It might also be a reason to choose BusyBox instead of other lightweight GNU Coreutils replacements.
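Assuming BusyBox has been built with the httpd applet enabled, starting the server is a one-liner; the options below are from the standard applet (the web root path is of course an example):

```shell
# Serve files from /var/www on port 80, staying in the foreground (-f).
# CGI programs are looked up under the cgi-bin subdirectory by default.
busybox httpd -f -p 80 -h /var/www
```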

2.7 Appweb™

Appweb [26] is a feature-rich, but still lightweight, web server designed for efficient hosting of web applications [31]. It is event-driven and multi-threaded [33], and the design goals include low request latency, efficient memory usage and the ability to handle small loads well [31]. Minimum memory usage is 800 KB and an arena-based memory allocator is used to prevent memory leaks and increase performance [33]. Appweb has many features, including support for execute in place (XIP), HTTP/1.1 and CGI/1.1 [42].

Appweb can be used either as a stand-alone web server or embedded in another application with the use of the Appweb server library [32]. A separate program is provided that monitors the server if it is used as a stand-alone process and restarts the server if necessary. Ports are available for Linux, Windows, Mac OSX, Solaris, VxWorks and FreeBSD [33], [42]. Supported CPU architectures are: ARM, MIPS, i386/X86/X86_64, PowerPC, SH and Sparc [33], and full cross-compilation support is provided [42].

Several features are separated into modules which are loaded at run-time, and the user can choose which modules should be loaded by modifying a configuration file [34]. This makes it possible to minimize the memory usage by only loading the modules that are required by the application. The functionality to include can also be configured at compile time by modification of a header file, which makes it possible to minimize the size of the binary file.

Four modules are provided for dynamic generation of web pages [32]: a module for CGI [38], an in-memory module for a server-side JavaScript web application framework called Ejscript [35], an in-memory module for PHP [36] and a module for the Embedded Server Pages (ESP) [39] that uses the C programming language to generate dynamic web pages. Older versions of Appweb also contained an embedded gateway interface (EGI) handler [37], which was a module that added support for in-process CGI. Running Ejscript or PHP in-memory gives increased performance, but can result in decreased server reliability, as bugs in the web application can affect the web server process [32].

A module is also provided for SSL/TLS [32] and Appweb supports both OpenSSL [21] and PeerSec MatrixSSL [41], [40].

Appweb uses a technique called “sandboxing” which makes it possible for the user to limit the server’s use of system resources [27]. The user can for example configure how much heap memory the server is allowed to allocate, decide which action to take when the memory limit is reached, configure the number of threads used to service requests, and set how much stack space should be allocated for each of these threads [28]. Some of the configuration options, such as the maximum number of requests allowed for a single connection, can be used to limit the consequences of a denial-of-service attack.

Appweb can be distributed under an open-source GPL license or a commercial license available from Embedthis Software [30]. The external modules are distributed under their own licenses [29].

2.8 Cherokee

Cherokee [43] is a lightweight web server that is completely modular. It supports SSL/TLS and several methods for generation of dynamic web pages. The basic server’s only dependency is the C standard library [45] and ports are available for Unix, Linux and Windows [43]. The size of the binary file depends on the options selected during compilation, and a static build for an embedded device can create a binary file as small as 200 KB [45]. Some configuration options for control of the server’s use of system resources are available, such as the possibility to specify the number of threads used when handling incoming requests [46].

SSL/TLS support is provided by the OpenSSL backend by default, but the server’s modular design makes it possible to implement other backend libraries [45]. Five different technologies are provided for generation of dynamic web pages [44]: the Common Gateway Interface (CGI) [9], the Fast Common Gateway Interface (FastCGI) [49], the Simple Common Gateway Interface (SCGI) [50], the uWSGI protocol (uWSGI) [51] and Server Side Includes (SSI) [52]. Modules for audio and video streaming are also available [44].

Cherokee is distributed under the GNU General Public License version 2 [18], but alternative licensing schemes can be obtained from the company Octality [48], [47].

2.9 thttpd - tiny/turbo/throttling HTTP server

thttpd is a small web server [54]. It has few features and is designed to use small amounts of memory, which is accomplished partly with a scaling strategy that does not involve forking. It is written in C and can be compiled on FreeBSD, SunOS 4, Solaris 2, BSD/OS, Linux and OSF. The size of the executable file is about 50 KB [58].

CGI 1.1 is supported [55] and a few options are available for control of the CGI programs [57]. It is for example possible to set a time limit for the execution of CGI programs. This can be valuable as it frees up system resources if there are bugs in the CGI programs that for example cause infinite loops. It is also possible to set the priority level for the CGI processes, thereby controlling the amount of CPU time given to the process by the operating system. A CGI program that implements Server Side Includes (SSI) is available [56].

The server also includes a feature called URL-traffic-based throttling, which makes it possible to set both maximum and minimum byte rates on single or groups of URLs [55].

2.9.1 Comments

URL-traffic-based throttling can be useful if there are specific web pages that are more important than others and should get priority in a situation where the byte rate is a limiting factor. If the server for example serves a dynamic page that shows warnings if a safety critical failure occurs, that page can be given higher maximum and minimum byte rates than less important pages.

2.10 Lighttpd

Lighttpd stands out in this list, as it is not designed for resource constrained systems. On the contrary, it is optimized for high performance environments and large numbers of concurrent connections [88]. What makes Lighttpd interesting in the context of embedded systems is that it aims to have low system resource usage compared to other servers capable of the same levels of performance.

Lighttpd is a modularized, feature packed HTTP server that has an extensive amount of configuration options [90]. It supports SSL and has several modules for generation of web pages with dynamic content, such as modules for CGI, FastCGI, SCGI and SSI.

The server is distributed under the revised BSD license [89].

2.11 HTTP servers built on top of lwIP

lwIP is a lightweight TCP/IP implementation developed for use in embedded systems [91]. It is designed to be easy to port to different platforms, and can be used with an underlying operating system or as a standalone application [92].

Two examples of HTTP servers built on top of lwIP can be found on the project’s website [93]. The first example [94] demonstrates how to make a very basic HTTP server that uses C code to generate web pages. The second example [95] is much more advanced and implements both SSI and CGI for generation of web pages with dynamic content.

Many more demonstration servers exist, and some of them also include examples of web applications. One of these is a server made by Stefano Oliveri [96] that demonstrates how to create bidirectional communication over TCP/IP using a HTTP server and a served web page.

3. Measuring embedded HTTP server performance

This chapter discusses techniques that can be used to measure the performance of HTTP servers, as well as factors that affect the measurement result.

3.1 Performance parameters

Many different performance parameters can be used when measuring the performance of a HTTP server. Which parameters to choose depends on the purpose of the measurements. If you for example want to find out whether the network the HTTP server is connected to acts as a performance bottleneck, then the network’s throughput in kilobits per second can be useful.

Which performance parameters are relevant to measure for a HTTP server in an embedded system depends on the load characteristics and the type of web application that the server serves. In other words, a performance parameter that is crucial in one application can be irrelevant in another.

A performance parameter that is very common in results published on the Internet is the number of requests per second the server can respond to under different load conditions. This is a very important parameter if the web application for example is a monitoring system that requests data from the server at a certain frequency. On the other hand, if the web application is used for configuration of an embedded system and is used once every month by one user, then this performance parameter might be irrelevant to measure.

Two other common performance parameters are response time, measured in milliseconds, and throughput, measured in bytes per second. In some use cases the response time and throughput only affect the user experience and not the actual functionality, but they can still be important even then, as users can be annoyed if the web application is too unresponsive. However, long response times and low throughput can be acceptable in some cases, for example where the main functionality of the embedded system uses a large percentage of the available system resources and the web server functionality has a low priority. In other use cases, low response times and high throughput can be very important, for example if the web application is used for manual control of some functionality in the embedded system.

3.2 Factors affecting measurement results

There are many factors that affect HTTP server performance and performance measurement results. Some of the factors might seem trivial, while others, like certain operating system settings, can be easy to miss. The factors include, but are not limited to, those listed below, here divided into four categories:

1. The server.
2. The network.
3. The web application.
4. The clients.

3.2.1 The server

As shown by Pariag et al. [67], both the server software architecture and tuning of server software parameters can have a significant impact on performance. Many of the servers designed for use in embedded systems are optimized for low system resource usage and usually have settings that can be used to limit their system resource usage at the cost of decreased performance. Therefore, it is important to review, and potentially change, the default settings, in order to make sure that they are optimized for the intended application. Many servers are also modularized, and as a rule of thumb only modules that are required by the application should be activated; otherwise there is a risk of degraded performance from having a module active even though it is not used by the web application. Some server features can have a major impact on the server’s performance. Coarfa et al. [68] showed that the use of TLS can result in substantially increased CPU load and lowered server throughput.

A characteristic that differs between HTTP servers is their overload behavior. A server is considered to be overloaded when the response rate is lower than the request rate, or when the response times get unreasonably high. Experiments performed by Voigt [69] show that overload behavior differs between HTTP server software and that the overload behavior of some servers is affected by the type of load the server is subjected to. Voigt measured the throughput, in connections per second, that the servers could achieve for different request rates. Further increasing the request rate after overload had been reached resulted in either an almost constant throughput or a decreased throughput, depending on server software and type of load. These two kinds of behavior are illustrated in Figure 3.2.1.1 below. Experiments performed by Banga and Druschel [70] as well as Titchkosky et al. [72] showed the same kind of throughput degradation during overload. One of the servers tested by Voigt reached its maximum throughput at 400 requests per second and showed a decrease in throughput as large as 100 connections per second when the request rate was increased from 400 to about 580 requests per second. Voigt’s experiments further showed that the ability to maintain reasonable response times during overload differs between servers.

Figure 3.2.1.1. Two examples of overload behavior. The left graph shows a behavior with constant throughput after the overload point; the right graph shows a behavior with decreased throughput.

An experiment performed by Titchkosky et al. [72] showed that the response rate of an overloaded server can oscillate over time, even when the request rate is constant. The results from the experiment performed by Titchkosky et al. further showed that some servers have the ability to reach a response rate close to an overload request rate during a short period of time, a few seconds in that particular experiment, before more drastic performance degradation occurs.

Titchkosky et al. tested several techniques for generation of web pages with dynamic content. Two of these, PHP and Perl, were tested in combination with the same HTTP server, as modules for Apache 1.3.27. The results showed significant differences in overload points for PHP and Perl, which indicates that the choice of technique for generation of web pages with dynamic content can have a major influence on a server’s performance. In addition to the choice of programming language used for the scripts that generate the web pages, there are also several techniques to choose between for executing the code, such as CGI, FastCGI and different kinds of server modules.

The choice of operating system can also affect the server performance, and some operating system settings can act as bottlenecks and significantly decrease the server’s performance if they are not properly tuned. An example of such a variable in Linux is the maximum number of file descriptors that a single process is allowed to open [65]. This variable affects some server software architectures more than others, as described by Midgley [65]. Server software that uses several system processes to process requests is less likely to be affected, while single-process servers that use one file descriptor per connection can suffer severe performance degradation if the limit is too low. Other operating system features, such as logging of incoming connections, can also decrease the server’s performance if they are activated.

Another example of a potential operating system bottleneck is the somaxconn kernel variable, which is used in most UNIX-based TCP/IP implementations to limit the sum of the lengths of the queues used for storing new connections before they are completely established and passed on to the HTTP server for processing of incoming data [70]. The server’s TCP stack will ignore new connection requests as long as these queues are full, and according to Banga and Druschel [70] this behavior can limit the throughput of the server if the somaxconn variable is set too low. The reason for this is that new connection requests will be dropped even if the server software is capable of processing them. Banga and Druschel further state that the sum of the lengths of these queues depends on the round-trip delay between the server and the clients, the connection request rate and the rate at which the HTTP server processes requests. An increase in round-trip delay results in an increase in the sum of the queue lengths. A very important effect of this is that a somaxconn setting that works well in a LAN can become a bottleneck in a WAN, where the round-trip delays typically are longer.

On many platforms there are several TCP/IP implementations to choose between, and an important consideration is that some of those developed for embedded systems are optimized for low memory usage rather than for high speed.

Other software running on the same hardware as the HTTP server can of course degrade the HTTP server’s performance, by using shared resources. This can be an important factor in embedded systems where the HTTP server is not considered to be the main functionality, and runs with low priority.

Many parts of the server’s hardware may become bottlenecks for the server’s performance. Servers can for example be memory, CPU or I/O bound.

3.2.2 The network

There are a number of factors related to the network that the HTTP server is connected to that affect the performance of the server. Bandwidths, round-trip delays and packet loss rates are examples of these factors, and they differ between different kinds of networks. Networks where HTTP servers are used can be split into two major categories: local area networks (LANs) and wide area networks (WANs). The main difference between the two is that WANs generally have higher packet loss rates and round-trip delays.

Banga and Druschel [70] measured how packet loss rates and round-trip delays in WANs affect HTTP server performance. They performed their benchmarks in a LAN, but placed a router in front of the benchmarked servers and used software in the router to artificially create round-trip delays and packet loss. They benchmarked two HTTP servers, Apache 1.2.4 and Zeus 1.3.0. The results from the experiments with artificial round-trip delays showed that a 200 millisecond delay caused Apache’s throughput to decrease by about 54% compared to the throughput at 0 milliseconds delay. For Zeus the throughput decreased by about 20%. Based on their results, Banga and Druschel concluded that “... wide-area network delays have a significant impact on the performance of Web servers.”. Another conclusion that can be drawn is that the performance degradation caused by round-trip delays differs between servers.

Banga and Druschel [70] also performed benchmarks with a constant round-trip delay but varying packet loss rates. The results from this experiment showed that packet loss can also cause a significant degradation of server throughput. Banga and Druschel attributed the throughput decreases, caused by both round-trip delay and packet loss, to increased durations of HTTP transactions, which increased the number of concurrent connections that the servers had to handle for a fixed request rate.

3.2.3 The web application

The type of web application, and also the design of the web application, that the HTTP server serves can affect the server’s performance. Titchkosky et al. [72] benchmarked two versions of the Apache web server, 1.3.27 and 2.0.45, and measured response rates for delivery of web pages with both static and dynamic content. The web pages with dynamic content were generated by PHP scripts on Apache 1.3.27, while on Apache 2.0.45 benchmarks were performed with both PHP and Perl. Both PHP and Perl were running as Apache modules. After analysis of the test results, Titchkosky et al. concluded that “... dynamic page generation alone can reduce the server's peak response rate by a factor of 3 to 4.”.

As described above, in section 3.2.2, the HTTP server’s throughput can be affected by the duration of each HTTP transaction. A result of this is that the throughput can be affected by the execution time of the code that is used to generate the responses. Hence, activities such as fetching sensor values, fetching information from other processors in the embedded system, or querying a database can degrade the HTTP server’s throughput.

3.2.4 The clients

The characteristics of the load that the clients generate on a HTTP server affect the server’s performance. One example of such a characteristic is the burstiness of the generated traffic. Banga and Druschel [64] tested the effects of burstiness on the throughput of the NCSA 1.5.1 server and concluded that “... even a small amount of burstiness can degrade the throughput of a Web server.”.

The clients’ hardware and software can also affect the performance. For example, slow client machines may increase the durations of HTTP transactions and cause server throughput degradation, in the same way as round-trip delays and packet loss.

3.2.5 Conclusions

The large number of factors affecting a HTTP server’s performance, and the fact that some of these factors have a significant impact, makes it possible to draw some important conclusions. Firstly, it is difficult to compare results of benchmarks done with different test setups. Secondly, one should be careful when generalizing benchmarking results from a test environment, as the HTTP server’s actual performance in the production environment can differ greatly, for example if the network’s packet loss rate is different.

3.3 Preparation

There are many possible reasons to carry out performance measurements of an HTTP server in an embedded system, for example finding the most suitable server software for a certain application, verifying that a system can handle defined requirements, or finding bottlenecks in order to optimize a system. Whatever the reason, the recommended approach is to create a test environment that as closely as possible emulates the production environment, within the constraints of budget and time, in order to get as useful and accurate results as possible. The reason for this is the many factors that affect the server’s performance, as described in section 3.2.

In order to set up the test environment, there are a few major areas to consider:

● What should be measured - which performance parameters are relevant?
● What type of web application should be served?
● What type of load should be used? For example, how many clients should access the server simultaneously, which user behavior patterns should be assumed, how much data should be in requests and responses, and how should the load be distributed between different web pages in the application?
● What type of environment should be simulated? For example, LAN, WAN, or both? Which software and hardware on the server and client side?

3.4 Measurement methodology

The test setup that most accurately simulates real world scenarios is to use one software client per client machine. This can be a feasible setup if the goal is to simulate a use case with just a few clients, but if the goal is, for example, to simulate the load of a few hundred clients, then this approach becomes impractical and probably quite expensive. A common approach is to instead use a small number of client machines running load generation software that simulates a large number of clients. This approach has the advantage of keeping costs lower and can, in some cases, decrease the complexity level of the system.

If the server will be accessed through a WAN in the real world application, then either a WAN or some kind of WAN simulation should be used during the tests, as it will make the test results more accurate. If the test results are to be repeatable, which is crucial if the goal of the measurements is to compare different servers or tune a server, then a WAN simulation that has the same behavior for every test run is more appropriate than performing the tests over a real WAN. If the tested server is, for example, accessed through the Internet, then the results from different test runs might not be comparable, as the load in the network can differ greatly between test runs. Banga and Druschel [64, 70] propose a test setup where the clients are connected to the server through a router, as an artificial delay can be added in the router’s forwarding mechanism to simulate WAN delays. The router can also be used to simulate other WAN effects, such as packet loss and bandwidth fluctuations [70]. However, it is important to make sure that the router isn’t a bottleneck.

If the goal of the measurements is to find a server's performance limits, then, as stated by Midgley [65], the results of the measurements will only be accurate if the server is the only bottleneck in the test environment. Banga and Druschel [64, 70] underline the importance of taking the clients’ performance into consideration, as poor client performance can distort the measurement results. Some factors that can affect a HTTP server’s performance, such as operating system settings related to TCP, can also affect the performance of a client machine that is running load generation software. On the other hand, if the test is used for performance verification of an entire system, both the network and the clients can be bottlenecks. In this case it’s crucial that the test environment is as similar to the production environment as possible.

A method proposed by Midgley [65] that can be used to determine whether the clients are a bottleneck is to perform two test runs with different numbers of client machines, but with the same resulting load on the server. The clients are not a bottleneck if the results of the two test runs are equal. Furthermore, Midgley advises against using system load measurements in the client to determine whether it is a bottleneck, as some benchmarking tools report 100% CPU usage even when they haven’t reached their maximum capacity. Tools that measure the network load can be used to determine whether the network is a bottleneck. If the network load is close to the network's maximum limit, then the network might be a bottleneck.

A technique for finding load limits that is commonly described in the literature is to perform a series of test runs with different levels of load generated from the client machines. This can for example be used to find a server’s overload point for a certain type of load. A decision that can have a great impact on the measurement results from this kind of study is the duration of each test run. The reason for this is that some HTTP servers can handle load levels above their overload point for a short period of time, as described in section 3.2.1. The ideal duration for each test run is therefore the duration for which the server would experience the load in the production environment, with an added safety margin. However, this is not always possible due to time and budget constraints. The problem of selecting a duration that gives accurate results is discussed by Titchkosky et al. [72], but no general solution is proposed and the study resorts to a trial and error approach.

3.5 Tools

There are many tools available, both free and commercial, that can be used to measure the performance of HTTP servers. The feature sets vary greatly between them, but it is common that the same tool is used both for making measurements and for generating traffic. Below is a list of some of the available tools.

● httperf - HTTP performance measurement tool [59]
● ab - Apache HTTP server benchmarking tool [60]
● Apache JMeter™ [61]
● Tsung [62]
● weighttp [63]
● Load Tester [66]
● Webserver Stress Tool [71]
● curl-loader [97]
● OpenSTA [98]
● IxLoad [99]
● Spirent Avalanche [100]

4. Test environment

This chapter describes the hardware and software that was used in the tests. The test environment consisted of one client machine and three servers. The client machine was connected directly to one server at a time with an Ethernet cable. This minimal and isolated network setup was used to eliminate the risk of interference that would be present if other equipment were connected to the network. The use of an isolated LAN, instead of a LAN or WAN with competing traffic, increased the reliability of the tests.

4.1 Client

The software that was used for measuring the web servers’ performance was httperf-0.9.0 [59], compiled without debug mode support. Among the options described in chapter 3, httperf was chosen for several reasons. It could measure all the parameters that were selected in the pre-study and it could be configured to simulate the characteristic load that the tested application would have been subjected to in the real world. Furthermore, it provided measurements of disturbances from other software running on the client.

The client machine that was used for running httperf was a HP ProBook 6550b with an Intel Core i5 M450 processor, 4 GB memory and a 38.8 GB hard drive. The operating system was the 64-bit version of Ubuntu 12.04 LTS. The Ethernet controller was an Intel 82577LC with 1 Gbit/s capacity.

4.2 Servers

One hardware platform was selected for each processor that was selected in the pre-study. The requirement for the platforms was that they had to support 10/100 Mbit/s Ethernet. Each of the selected platforms is described below.

4.2.1 BeagleBoard-xM

The BeagleBoard-xM [101] was the most powerful of the three hardware platforms and its specifications can be seen in Table 4.2.1.1.

The operating system that was used was the Ångström Distribution version 2010.7 using Linux 2.6.32. The Ångström Distribution was chosen as it had support for the BeagleBoard-xM and several web servers. The operating system was installed on a micro SD card.

The program that was used for measuring CPU load and memory usage was the sar utility in sysstat version 9.0.6. Three servers were tested: BusyBox httpd v1.13.2, Lighttpd/1.4.26 and Cherokee Web Server 0.99.24. They were chosen as they were supported by Ångström and could generate HTTP responses with dynamically generated content.

Table 4.2.1.1. BeagleBoard-xM specifications.

Processor architecture: ARM Cortex-A8
Processor model: TI DM3730
Processor performance: 2000 DMIPS, 1 GHz
Board version: BeagleBoard-xM Rev A
Volatile memory: 512 MB external
Nonvolatile memory: 4 GB micro SD card
Ethernet controller: 10/100 Mbit/s, external with MAC and PHY

4.2.2 STK1000

The STK1000 [102] was the second most powerful of the three hardware platforms and its specifications can be seen in Table 4.2.2.1.

The operating system that was used was AVR32 Linux using the 2.6.35.4 kernel. AVR32 Linux was chosen as it was shipped together with the board and also supported several web servers. The operating system was installed on an SD card.

The programs that were used for measuring CPU load and memory usage were the sar utility in sysstat version 9.0.5 and the top utility in BusyBox v1.16.2. Three servers were tested: BusyBox httpd v1.13.2, Lighttpd/1.4.26 and thttpd/2.25b. They were chosen as they were supported by AVR32 Linux and could generate HTTP responses with dynamically generated content.

Table 4.2.2.1. STK1000 specifications.

Processor architecture: Atmel AVR32 AP7
Processor model: AT32AP7000
Processor performance: 210 DMIPS, 140 MHz
Board version: STK1000
Volatile memory: 8 MB external SDRAM, 32 kB internal
Nonvolatile memory: 256 MB SD card, 8 MB parallel flash
Ethernet controller: 10/100 Mbit/s, internal MAC, external PHY

4.2.3 Midrange

The Midrange platform, developed by Syntronic AB [77], was the least powerful of the three hardware platforms and its specifications can be seen in Table 4.2.3.1.

The operating system that was used was FreeRTOS V5.4.2. FreeRTOS was chosen as it is the operating system that is most frequently used by Syntronic on this platform. The operating system was installed in an internal flash memory.

The server that was tested was a modified version of a server made by Stefano Oliveri (SSO) [96]. The server was originally bundled with an example web application, but that application was removed and replaced with the test application. This server was chosen as it was the only server that had been ported to the platform and could generate HTTP responses with dynamically generated content. The server and operating system were compiled with GCC, without debug information and with the -O2 optimization flag set.

CPU load and memory usage was not measured on the Midrange platform, see chapter 5.3 for details.

Table 4.2.3.1. Midrange specifications.

Processor architecture: ARM Cortex-M3
Processor model: STM32F103VCT6
Processor performance: 90 DMIPS, 72 MHz
Board version: Syntronic Midrange
Volatile memory: 48 kB internal
Nonvolatile memory: 256 kB internal, optional SD card
Ethernet controller: 10/100 Mbit/s, external via SPI

5. Test methodology

This chapter describes the methodology that was used for the tests, as well as the simulated web application.

5.1 Simulated web application, instrument panel

As described in chapter 1.6, the functionality that the simulated web application should include was:

● Responding to HTTP requests with HTTP responses containing HTML.
● Dynamically generated responses.
● Presentation of real-time data.

The type of web application that was chosen was an instrument panel, as it is commonly used in embedded systems and incorporates all these functionalities. The simulated web application implemented the real-time aspect by sending a web page containing an AJAX script as the first response to a new client. The AJAX script then automatically generated requests for new data to display on the instrument panel at a fixed frequency. The responses to the AJAX requests were generated dynamically on the server and contained the new values for the instrument panel. All the traffic was sent over a LAN using HTTP and the responses consisted of HTML.

The simulated load consisted of one client that was requesting instrument panel updates from the server during an extended amount of time. The simulated load can also be interpreted as several concurrently connected clients. If the total load for example is 100 TCP connection initiations per second, it could be seen as one client requesting updates 100 times per second, or 100 clients requesting updates one time per second. However, the possible scenario of all clients sending requests at the same time, and causing bursts of requests to the server, was not simulated.

5.2 Simulation techniques

The AJAX requests were simulated by configuring httperf to send requests at a fixed frequency. Httperf was configured to initiate a new TCP connection for every HTTP request. The web page containing the AJAX script was not simulated, as it is requested only once for every new client, whilst the requests for new data can occur an unlimited number of times for each client. The load generated by the request for the initial web page is hence negligible compared to the load generated by the AJAX requests when the load consists of a few clients that are connected for an extended amount of time.

On the BeagleBoard-xM and STK1000 platforms the responses were generated by CGI scripts that were called by the tested HTTP server software. On the Midrange platform the responses were generated directly by the HTTP server software. In a real instrument panel application, the code that generated the responses would have been responsible for reading sensor values and including these values in the responses. However, the code used in the tests generated the same output every time it was called, but generated that output using the same technologies that would have been used to generate dynamic responses. The reason for this was that the time it takes to read real sensor values can be non-deterministic, which would make comparisons of the measurement results unreliable. Random output from the code that generated the responses was considered, but not chosen. Instead the code

generated the exact same data for every test, in order to make the results repeatable and the different test runs comparable. As described in chapter 3, the amount of time it takes to generate HTTP responses can affect the server’s performance, as the system resources tied to the request will be held during the time when the response is generated. Furthermore, the purpose of the tests was to measure the performance of the HTTP servers, not the performance of sensors.

The size of each generated response was 256 bytes, including both HTTP headers and content, in all tests. Such a small amount of data was sent in order to simulate responses containing only data, and not an entire web page. This is a common technique used to minimize server and client load, as well as network traffic. It is a suitable technique to use in embedded systems as it limits resource usage.

Not all of the tested servers supported HTTP keep-alive, and keep-alive was therefore disabled on all servers in order to improve the comparability of the test results. With the chosen httperf configuration, which initiated a new TCP connection for every HTTP request, enabling HTTP keep-alive would probably have decreased the performance of the servers using it, by sustaining connections on the server side that the client never reused. On the other hand, if the httperf configuration had been changed to send several HTTP requests per TCP connection, keep-alive might have improved the performance of the servers supporting it, but that kind of httperf configuration could not have been used to test the servers that lacked keep-alive support.

5.3 Measurement techniques

The testing of each server consisted of test runs at different load levels, starting from a very low load and ending well beyond the server's overload point. For each test run httperf was configured to create a load with a constant TCP connection initiation rate, and that load was sustained for 180 seconds. This duration was chosen because the httperf manual version 0.9 [73] recommends a test duration of at least 150 seconds to obtain meaningful standard deviations. This meant that the minimum number of measurements performed during a test run was 180, when the load was set to its minimum value of one connection initiation per second. The maximum number of measurements performed during a test run was 45 000.
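A load sweep of this kind can be scripted around httperf. The sketch below (not the thesis's actual scripts) builds one httperf command line per tested rate; sustaining a rate for 180 seconds corresponds to `--num-conns = rate * 180` with one HTTP call per connection. The server address, URI and the list of rates are placeholder assumptions.

```python
# Sketch of a load-sweep driver for httperf. Each run offers a constant TCP
# connection initiation rate for 180 s with a 5 s client timeout.
DURATION_S = 180  # per the httperf manual, >= 150 s for meaningful std devs
TIMEOUT_S = 5     # client timeout used in the tests

def httperf_cmd(rate, server="192.168.0.10", uri="/cgi-bin/data"):
    """Build the httperf command line for one test run at a given rate."""
    return [
        "httperf", "--hog",                     # use all available client ports
        "--server", server,
        "--uri", uri,
        "--rate", str(rate),                    # connection initiations per second
        "--num-conns", str(rate * DURATION_S),  # sustain the rate for 180 s
        "--num-calls", "1",                     # one HTTP request per connection
        "--timeout", str(TIMEOUT_S),
    ]

if __name__ == "__main__":
    for rate in (1, 5, 10, 25, 50, 100, 125, 150):
        print(" ".join(httperf_cmd(rate)))
```

Each command would be executed between server reboots (see below), with httperf's summary output captured to a file for later analysis.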

The fact that the TCP connection initiation rate was constant made it possible to overload the server, as the load generating software didn't wait for responses from the server before initiating new TCP connections. The server was considered to be overloaded when the average HTTP reply rate became lower than the TCP connection initiation rate. When overload was detected, additional test runs were made with loads between the load where overload was detected and the highest tested load below it. This was done in order to find the overload point and observe the behavior around it. Knowing the overload behavior of a server used in an embedded system can be very important in some applications. A server that, for example, causes the operating system to crash when it is overloaded should be avoided if high availability is important.

Httperf measures its own CPU usage, and the results can be used to verify that other processes running on the client machine are not disturbing the tests [73]. The server was rebooted between every test run to reset caches and, in general, the states of the server and the operating system. This was done in order to isolate the test runs from each other, so that a test run wasn't affected by the previous one. This is particularly important when an error has occurred during a previous run.

The sar utility was used on the BeagleBoard-xM and STK1000 to measure both CPU utilization and memory usage on the servers. Sar was configured to make a measurement once every second and write the results to a file. A separate series of tests was made on the STK1000 using the top utility in BusyBox instead of sar to measure CPU utilization and memory usage, as sar returned unexpected results on that platform.

CPU utilization and memory usage were not measured on the Midrange platform. The technique provided by FreeRTOS, the operating system used on the Midrange platform, for measuring CPU utilization consists of measuring the percentage of time spent in the operating system's idle task. As the platform was used as a dedicated HTTP server, the server tasks were running 100% of the time, with no time spent in the idle task. Memory usage was not measured as the HTTP server didn't allocate memory dynamically.

The HTTP servers’ nonvolatile memory utilization on the BeagleBoard-xM and STK1000 was measured with the BusyBox du utility. On the Midrange platform it was measured by compiling FreeRTOS both with and without the HTTP server and calculating the difference.

6. Test results

This chapter presents the test results achieved with the methodology described in the previous chapter. The results for each hardware platform are presented separately. The CPU utilization for the load generating software, httperf, on the client machine was 99.7 percent or higher in every test.

6.1 BeagleBoard-xM

Figure 6.1.1 shows how the average HTTP reply rate varied with the TCP connection initiation rate for the three servers tested on the BeagleBoard-xM. Both Lighttpd and Cherokee became overloaded at about 110 TCP connection initiations per second, while BusyBox httpd became overloaded at 125 TCP connection initiations per second. All three servers reached their maximum average HTTP reply rates at TCP connection initiation rates slightly above the rates where they became overloaded. The average HTTP reply rates then declined slightly when the TCP connection initiation rate was increased further.

Figure 6.1.1. Average HTTP reply rates. Higher is better.

Standard deviations for the HTTP reply rates can be seen in Figure 6.1.2. Both BusyBox httpd and Cherokee had stable HTTP reply rates, even under overload. Lighttpd had very stable HTTP reply rates for TCP connection initiation rates up to about 50 connection initiations per second, but for higher TCP connection initiation rates the HTTP reply rate for Lighttpd became unstable.

Figure 6.1.2. HTTP reply rates standard deviations. Lower is better.

The chart in Figure 6.1.3 shows the average durations for successful TCP connections. A connection was considered successful if both an HTTP request and an HTTP reply were successfully transmitted over the connection. Similar behavior was observed for all servers. The average durations for successful TCP connections were relatively low up until the point where the servers became overloaded. The durations increased sharply after that point, but the rate varied between the servers. BusyBox httpd increased to a level around 1000 milliseconds per connection, Cherokee to around 2300 and Lighttpd to around 3200.

Figure 6.1.3. Average durations for successful TCP connections. Lower is better.

The standard deviations for durations of successful TCP connections can be seen in Figure 6.1.4. BusyBox httpd and Cherokee had standard deviations close to zero up until the points where they became overloaded. At that point the standard deviations for both servers increased rapidly to about 800 milliseconds. The levels then increased further, but at a slower pace. Lighttpd ended up at the same levels as the two other servers, but its increase was almost linear and started at an early stage, well below the overload point.

Figure 6.1.4. Standard deviations for durations of successful TCP connections. Lower is better.

The network throughput, which displayed the same pattern as the average HTTP reply rate, can be seen in Figure 6.1.5. All servers leveled out at about 300 kilobits per second.

Figure 6.1.5. Network throughput.

All three servers had a CPU usage slightly below 4 percent when idling. Lighttpd and Cherokee had very similar CPU usage characteristics, as can be seen in Figure 6.1.6. Their increase in CPU usage was almost linear, from idle to 100 percent, which was reached at about 100 connection initiations per second. BusyBox httpd also had an almost linear increase, but reached 100 percent at about 125 connection initiations per second.

Figure 6.1.6. Average CPU usage. Lower is better.

All three servers used about 20 megabytes of memory at idle. BusyBox httpd showed the most efficient memory usage, as can be seen in Figure 6.1.7. Its memory usage increased only slightly from idle up to 100 connection initiations per second, and then increased to about 25 megabytes. Cherokee behaved similarly but flattened out at about 31 megabytes. Lighttpd used significantly more memory than the other two servers. Its memory usage increased to about 68 megabytes at 115 connection initiations per second and decreased slightly after that level had been reached.

Figure 6.1.7 Maximum memory usage. Lower is better.

Table 6.1.1 shows the nonvolatile memory usage of the three servers. Lighttpd had the lowest usage at 164 kilobytes. BusyBox httpd used 612 kilobytes and Cherokee far more than the others, at over 3000 kilobytes.

Table 6.1.1. Nonvolatile memory usage.

                                  BusyBox httpd   Lighttpd   Cherokee
  Nonvolatile memory usage [kB]   612             164        3063

The only type of error registered by httperf during the tests was the client timeout error, that is, no response within five seconds. Some client timeout errors occurred in every test of Lighttpd, even when the server was not overloaded. The other two servers only had client timeouts when they were overloaded. None of the servers crashed under load and they were always responsive after the tests were finished.

6.2 STK1000

Figure 6.2.1 shows the average HTTP reply rates for the servers that were tested on the STK1000. Lighttpd and thttpd had very similar behavior. Lighttpd became overloaded at 9 connection initiations per second and thttpd at 7. The average HTTP reply rates then declined slowly and almost linearly for both servers. BusyBox httpd reached overload at the much higher rate of 21 connection initiations per second. After that point the rate decreased steeply, but then temporarily recovered somewhat at slightly higher loads.

Figure 6.2.1. Average HTTP reply rates. Higher is better.

All three servers had low HTTP reply rate standard deviations until their overload points, as can be seen in Figure 6.2.2. The standard deviations then increased dramatically for all three servers when they reached overload. The maximum was about 5 replies per second for Lighttpd and thttpd and 12 for BusyBox httpd. The standard deviations then declined for higher loads.

Figure 6.2.2. HTTP reply rates standard deviations. Lower is better.

Figure 6.2.3 shows average durations for successful TCP connections. BusyBox httpd had the most stable average durations, starting at 31 milliseconds for one connection initiation per second and slowly increasing to 58 milliseconds at 30 connection initiations per second. Lighttpd was stable at 34 to 39 milliseconds up to 20 connection initiations per second, after which the durations started to increase sharply. Lighttpd has no measurement for 30 connection initiations per second, as there were no successful TCP connections at that rate. Thttpd was stable at 36 to 46 milliseconds up to 15 connection initiations per second, after which the durations started to increase.

Figure 6.2.3. Average durations for successful TCP connections. Lower is better.

BusyBox httpd had the lowest standard deviations for durations of successful TCP connections, as can be seen in Figure 6.2.4. It started at 5 milliseconds and slowly increased to 54 milliseconds as the load was increased. Lighttpd was close to BusyBox httpd up until 15 connection initiations per second; then its standard deviation increased almost linearly from 20 to 141 milliseconds as the connection initiation rate was increased from 15 to 25. Thttpd was the server with the highest standard deviations. It behaved similarly to the other servers up until 10 connection initiations per second, after which its standard deviations started to increase rapidly, ending at over 1200 milliseconds. At 25 connection initiations per second the standard deviation decreased temporarily.

Figure 6.2.4. Standard deviations for durations of successful TCP connections. Lower is better.

The network throughput, which displayed almost the same pattern as the average HTTP reply rates, can be seen in Figure 6.2.5. The only difference compared to the average HTTP reply rates was an increase in network throughput after 20 connection initiations per second for thttpd.

Figure 6.2.5. Network throughput.

The average CPU usage increased almost linearly for all three servers, as can be seen in Figure 6.2.6. BusyBox httpd reached its maximum of 78 percent at 21 connection initiations per second, Lighttpd reached 48 percent at 9, and thttpd reached 41 percent at 7.

Figure 6.2.6. Average CPU usage. Lower is better.

The sar measurements showed an almost constant memory usage of about 4.4 megabytes for all three servers, as can be seen in Figure 6.2.7. The measurements with top showed the same behavior.

BusyBox httpd ran out of memory after 34 seconds at 22 connection initiations per second, after 76 seconds at 25 and after 23 seconds at 30. Running out of memory triggered the so-called Out of Memory Killer (OOM Killer), which started to kill server processes. When the server load stopped, the HTTP server was unresponsive and it was not possible to open a terminal via the board's serial port.

Lighttpd ran out of memory after 102 seconds at 10 connection initiations per second, after 22 seconds at 15, after 2 seconds at 20, and immediately at 25 and 30. Running out of memory triggered the OOM Killer, which started to kill server processes. The server was responsive when the server load stopped.

Thttpd ran out of memory after 143 seconds at 8 connection initiations per second, after 9 seconds at 10, and immediately at 15, 20, 25 and 30. Running out of memory triggered the OOM Killer, which started to kill server processes. The server was responsive when the server load stopped, except after the test with 25 connection initiations per second, which left the HTTP server unresponsive and made it impossible to open a terminal via the board's serial port.

Figure 6.2.7 Maximum memory usage. Lower is better.

Table 6.2.1 shows the nonvolatile memory usage of the three servers.

Table 6.2.1. Nonvolatile memory usage.

                                  BusyBox httpd   Lighttpd   thttpd
  Nonvolatile memory usage [kB]   516             488        68

For BusyBox httpd the httperf tool reported no errors until the server ran out of memory. Both client timeouts and connection resets by the server were reported for loads that made the server run out of memory.

For Lighttpd the httperf tool reported client timeout errors in every test. Refused connections and connection resets by the server were reported for loads that made the server run out of memory.

For thttpd the httperf tool reported no errors until the server ran out of memory. Client timeouts, refused connections and connection resets by the server were reported for loads that made the server run out of memory.

6.3 Midrange

Figure 6.3.1 shows the average HTTP reply rates for SSO. The server became overloaded slightly above 155 connection initiations per second and the reply rate decreased steeply afterwards.

Figure 6.3.1. Average HTTP reply rates. Higher is better.

The standard deviation of the HTTP reply rate was zero until the overload point of 155 connection initiations per second was reached. It then peaked at 76 replies per second before declining again. See Figure 6.3.2.

Figure 6.3.2. HTTP reply rates standard deviations. Lower is better.

The average duration for successful TCP connections was stable at 5 milliseconds up until the overload point, see Figure 6.3.3. It increased rapidly after that point, ending up at approximately 670 milliseconds.

Figure 6.3.3. Average durations for successful TCP connections. Lower is better.

The standard deviations for durations of successful TCP connections can be seen in Figure 6.3.4 and showed the same pattern as the average duration for successful TCP connections.

Figure 6.3.4. Standard deviations for durations of successful TCP connections. Lower is better.

The network throughput displayed the same pattern as the average HTTP reply rate and can be seen in Figure 6.3.5.

Figure 6.3.5. Network throughput.

The nonvolatile memory utilization was 31 kilobytes. No errors were reported by httperf during measurements up until the overload point, but after that point both client timeouts and connection resets by the server occurred. The server never crashed during the tests and was always responsive after the load generation software was turned off.

7. Analysis

The HTTP server software was analysed for each hardware platform, and the analyses are presented in chapter 7.1. As a result of this analysis, one server software per platform was selected to be part of a platform performance comparison in chapter 7.2. The selection criteria used were highest overload point, lowest connection durations, throughput stability, connection duration stability and fewest server crashes. These criteria were chosen as they reflect the overall performance of the servers as well as their reliability.

The httperf manual [73] states that httperf's results are reliable if the total CPU utilization on the client machine, as reported by httperf, is close to 100 percent. This means that the results from this study were reliable, as the total CPU utilization was 99.7 percent or higher in every test. In other words, no other processes were interfering with the measurements and load generation on the client machine. The fact that the overload point was reached for all the tested servers indicates that the client machine was not a bottleneck. None of the servers was able to saturate the network, hence the network was not a bottleneck either.
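This validity check can be automated by parsing the client CPU utilization out of httperf's summary output. The sketch below assumes httperf's "CPU time [s]" summary line format; the exact wording may differ between httperf versions, so treat the pattern as an assumption to verify against the installed version.

```python
import re

# Sketch of a run-validity check: extract the "total" CPU percentage from an
# httperf summary and accept the run only if it is close to 100 %.
CPU_LINE = re.compile(r"total\s+([\d.]+)%")

def run_is_reliable(httperf_output, threshold=99.0):
    """True if httperf reports near-100 % client CPU utilization."""
    for line in httperf_output.splitlines():
        if line.startswith("CPU time"):
            m = CPU_LINE.search(line)
            if m:
                return float(m.group(1)) >= threshold
    return False  # no CPU summary found: cannot validate the run
```

In a sweep script, any run failing this check would be repeated rather than included in the results.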

The abbreviation ci/s is used in this chapter for connection initiations per second.

7.1 HTTP server software comparisons per platform

In this chapter, the test results for each platform are analysed separately.

7.1.1 BeagleBoard-xM

Table 7.1.1.1 shows a comparison between the HTTP server software tested on the BeagleBoard-xM platform. All the server software showed the same kind of performance degradation during overload. None of the servers crashed during the tests.

The results show that BusyBox httpd had the highest overload point, lowest connection durations overall and stable reply rates and connection durations. Connection errors only occurred during overload. BusyBox httpd was therefore selected to be part of the platform comparison in chapter 7.2.

Lighttpd showed unstable behavior for both reply rate and connection duration, and had client timeouts in every test. Cherokee had stable reply rates and connection durations, with connection errors only during overload.

For all server software, the overload point was reached when the average CPU usage reached 100 percent. This means that all server software were CPU bound.

Table 7.1.1.1. Comparison between the HTTP server software tested on the BeagleBoard-xM platform.

                                                  BusyBox httpd       Lighttpd                Cherokee
  Overload point [TCP ci/s]                       125                 100                     110
  Overload throughput                             Slightly declining  Slightly declining      Slightly declining
  Throughput stability without overload           Stable              Unstable over 50 ci/s   Stable
  Connection duration at overload point [ms]      154                 215                     153
  Max measured connection duration [ms]           1148                3414                    2479
  Connection duration stability without overload  Stable              Unstable                Stable
  Saturates the network (100 Mbit/s)              No                  No                      No
  CPU bound                                       Yes                 Yes                     Yes
  Memory bound                                    No                  No                      No
  Max memory usage [MB]                           25                  68                      31
  Nonvolatile memory usage [MB]                   0.61                0.16                    3.06
  Communication errors without overload           None                Client timeouts         None
  Communication errors during overload            Client timeouts     Client timeouts         Client timeouts
  Server crashes                                  None                None                    None
  Feature set                                     Limited             Extensive               Extensive

7.1.2 STK1000

Table 7.1.2.1 shows a comparison between the HTTP server software tested on the STK1000 platform. All the server software had stable throughputs and connection durations, but significant differences could be seen in overload points and connection durations. All the server software also showed the same kind of performance degradation during overload.

As described in chapter 6.2, all server software ran out of memory when overloaded. The maximum memory usage listed in Table 7.1.2.1 is therefore the platform's total amount of memory. Both thttpd and BusyBox httpd became unresponsive after some test runs in which the server ran out of memory, and it also became impossible to open a terminal via the board's serial port. This kind of crash never occurred with Lighttpd, which was always responsive when the server load was turned off.

The results show that BusyBox httpd had the highest overload point, lowest connection durations overall and stable reply rates and connection durations. BusyBox httpd had the most stable average durations for successful TCP connections, starting at 31 milliseconds for one connection initiation per second and slowly increasing to 58 milliseconds at 30 connection initiations per second. This small increase during overload was far superior to all the other server software tested, regardless of platform. Connection errors only occurred during overload. Although server crashes occurred when out of memory, BusyBox httpd was still selected to be part of the platform comparison in chapter 7.2, as its overload point and overall connection durations were superior to the other servers.

The tests performed on the STK1000 platform show the importance of selecting appropriate HTTP server software, as the performance may differ significantly. For example, BusyBox httpd's overload point was three times higher than thttpd's, 21 versus 7 TCP ci/s. The maximum measured connection duration was 58 ms for BusyBox httpd and 1646 ms for thttpd. These performance differences are in line with the results from the study performed by Pariag et al. [67], which was discussed in chapter 3.2.1.

Table 7.1.2.1. Comparison between the HTTP server software tested on the STK1000 platform.

                                                  BusyBox httpd         Lighttpd              thttpd
  Overload point [TCP ci/s]                       21                    9                     7
  Overload throughput                             Steeply decreasing    Steeply decreasing    Steeply decreasing
  Throughput stability without overload           Stable                Stable                Stable
  Connection duration at overload point [ms]      41                    38                    48
  Max measured connection duration [ms]           58                    810                   1646
  Connection duration stability without overload  Stable                Stable                Stable
  Saturates the network (100 Mbit/s)              No                    No                    No
  CPU bound                                       No                    No                    No
  Memory bound                                    Yes                   Yes                   Yes
  Max memory usage [MB]                           8                     8                     8
  Nonvolatile memory usage [MB]                   0.52                  0.49                  0.07
  Communication errors without overload           None                  Client timeouts       None
  Communication errors during overload            Client timeouts,      Client timeouts,      Client timeouts,
                                                  connection resets     connection resets,    connection resets,
                                                                        refused connections   refused connections
  Server crashes                                  Always after running  None                  During one test run when
                                                  out of memory                               running out of memory
  Feature set                                     Limited               Extensive             Medium

7.1.3 Midrange

Table 7.1.3.1 shows a summary of the test results for the Midrange platform. SSO showed stable behavior in both throughput and connection duration before the overload point, but had drastic performance degradation during overload. Communication errors only occurred during overload and no server crashes occurred. Since SSO was the only server software tested on the Midrange platform, it was used for the performance comparison in chapter 7.2.

Table 7.1.3.1. Summary of the test results for the Midrange platform.

                                                  SSO
  Overload point [TCP ci/s]                       155
  Overload throughput                             Steeply decreasing
  Throughput stability without overload           Stable
  Connection duration at overload point [ms]      5
  Max measured connection duration [ms]           668
  Connection duration stability without overload  Stable
  Saturates the network (100 Mbit/s)              No
  CPU bound                                       N/A
  Memory bound                                    N/A
  Max memory usage [MB]                           N/A
  Nonvolatile memory usage [MB]                   0.03
  Communication errors without overload           None
  Communication errors during overload            Client timeouts, connection resets
  Server crashes                                  None
  Feature set                                     Very limited

7.2 Platform comparison

Table 7.2.1 shows a comparison of the HTTP server software selected from each hardware platform. A few similarities can be seen between the selected server software. All of the best performing servers, according to the chosen criteria, had limited feature sets. BusyBox httpd was the best performing server software on both platforms where it was tested.

The only platform that was memory bound was the STK1000, and the STK1000 was also the only platform where HTTP server software crashed during testing. The crashes always occurred when the server ran out of memory, which triggered the OOM Killer, which in turn started to kill server processes. As mentioned in chapter 2, some servers, such as Appweb, have settings that can be used to specify an upper limit for the server's memory usage. Using such a server would make it possible to prevent the OOM Killer from being triggered by server load, which in turn could prevent the server from crashing.

Table 7.2.1. Comparison of the selected HTTP server software from each platform.

                                                  BeagleBoard-xM      STK1000               Midrange
                                                  BusyBox httpd       BusyBox httpd         SSO
  Overload point [TCP ci/s]                       125                 21                    155
  Overload throughput                             Slightly declining  Steeply decreasing    Steeply decreasing
  Throughput stability without overload           Stable              Stable                Stable
  Connection duration at overload point [ms]      154                 41                    5
  Max measured connection duration [ms]           1148                58                    668
  Connection duration stability without overload  Stable              Stable                Stable
  Saturates the network (100 Mbit/s)              No                  No                    No
  CPU bound                                       Yes                 No                    N/A
  Memory bound                                    No                  Yes                   N/A
  Max memory usage [MB]                           25                  8                     N/A
  Nonvolatile memory usage [MB]                   0.61                0.52                  0.03
  Communication errors without overload           None                None                  None
  Communication errors during overload            Client timeouts     Client timeouts,      Client timeouts,
                                                                      connection resets     connection resets
  Server crashes                                  None                Always after running  None
                                                                      out of memory
  Feature set                                     Limited             Limited               Very limited

Figure 7.2.1 shows a comparison of the average HTTP reply rates for the server software selected for each platform. The highest overload point was measured on the Midrange platform, at 155 TCP ci/s. However, its throughput decreased steeply during overload. The BeagleBoard-xM had a somewhat lower overload point, 125 TCP ci/s, but showed a much smaller decrease in throughput after the overload point. This is important to consider if an application risks being pushed beyond the overload point. Figure 7.2.1 also clearly shows that the best performing server software on the STK1000 platform was far behind the other platforms in this aspect of performance.

All server software on all platforms showed performance degradation during overload that was in line with expectations, based on findings from experiments performed by Voigt [69], Banga and Druschel [70], and Titchkosky et al. [72], which were discussed in chapter 3.2.1.

Figure 7.2.1. Average HTTP reply rates for the selected server software for each platform. Higher is better.

Figure 7.2.2 shows a comparison of the HTTP reply rates standard deviations for the server software selected for each platform. All three servers had stable throughput up until their overload points, but the BeagleBoard-xM had significantly more stable throughput than the other two servers during overload.

Figure 7.2.2. HTTP reply rates standard deviations for the selected server software for each platform. Lower is better.

Figure 7.2.3 shows a comparison of the average durations for successful TCP connections for the server software selected for each platform. Before the overload point, the STK1000 had longer connection durations than the other two servers, but it was less affected during overload. Where the STK1000 had a slight increase in connection durations after the overload point, the others had steep increases. A possible explanation for why the STK1000 was less affected during overload is that it was memory bound and not CPU bound: the requests that the server had enough memory to process could still be processed efficiently.

Figure 7.2.3. Average durations for successful TCP connections for the selected server software for each platform. Lower is better.

Figure 7.2.4 shows a comparison of the standard deviations for durations of successful TCP connections for the server software selected for each platform. Before the overload point, the STK1000 had slightly less stable connection durations than the other two servers, but it was less affected during overload. Where the STK1000 had a slight increase in the standard deviation after the overload point, the others had steep increases.

Figure 7.2.4. Standard deviations for durations of successful TCP connections for the selected server software for each platform. Lower is better.

8. Conclusions

BeagleBoard-xM with BusyBox httpd had the best overall performance when running the test application. It had a high overload point, low connection durations when not overloaded, and superior overload behavior. However, Midrange with SSO performed better when not overloaded. STK1000 was far behind the other two platforms in terms of performance.

On the BeagleBoard-xM platform with BusyBox httpd it was possible to achieve an overload point of 125 TCP connection initiations per second and average TCP connection durations below 10 milliseconds at lower loads. The corresponding figures for Midrange with SSO were 155 TCP connection initiations per second and 6 milliseconds. STK1000 with BusyBox httpd managed 21 TCP connection initiations per second and 35 milliseconds. See Figure 7.2.1 and Figure 7.2.3 for more information.

The test results showed that the performance differed greatly between HTTP server software. This was particularly apparent on the STK1000 platform. The fact that the least powerful hardware platform, Midrange, had the highest overload point and shortest connection durations further emphasises the importance of the software.

Generally, HTTP server software with limited feature sets performed best. It is therefore important to consider that HTTP server software with a larger feature set may be less efficient. Be thorough when deciding which features are actually needed, in order to avoid paying for excessive ones.

The overload behavior differed greatly between the servers. Hence, the overload behavior is important to consider for applications that might experience loads above the server’s overload point. Significant differences could be seen in throughput, throughput stability, connection durations and connection duration stability during overload. BeagleBoard-xM with BusyBox httpd was for example able to sustain an average HTTP response rate during overload that was close to its maximum average rate, while both STK1000 with BusyBox httpd and Midrange with SSO showed steep decreases.

The efficiency of system resource usage - CPU, volatile memory and nonvolatile memory - differed considerably between the tested HTTP server software. The largest differences were measured on the BeagleBoard-xM. As an example, at 50 TCP connection initiations per second BusyBox httpd used about 36 percent less CPU than the other two server software and about 49 percent less volatile memory than Lighttpd. BusyBox httpd used 612 kilobytes of nonvolatile memory while Lighttpd used 164 and Cherokee 3063.

The experience from the STK1000, where HTTP server software crashed due to insufficient memory, shows the importance of being able to limit the HTTP server software's resource usage in order to ensure availability during high loads.

9. Discussion

The purpose of this thesis was to determine what was possible at the time in terms of HTTP server performance on selected hardware platforms for embedded systems, regarding load limits, performance characteristics and system resource usage. It can be concluded that the chosen methodology and test implementation succeeded in producing interesting and useful results, worth considering when selecting a hardware platform for an embedded system that is to contain an HTTP server. Furthermore, this thesis adds knowledge to a scarcely researched area, as no similar studies were found during the literature study.

Those who are about to select a platform can draw valuable general conclusions from this thesis, for example regarding the importance of software and the differences in performance characteristics. However, the specific load limits may be different for other web applications and/or load characteristics.

9.1 Future work

There are several possibilities for future work related to the subject of this thesis. Below are some suggestions.

9.1.1 Complementary performance aspects

Complementary tests, using the same methodology as in this thesis, could expand the understanding of performance on the different platforms. One could for example perform test runs with longer durations, several days or months, to determine the stability of the different platforms under sustained high load. Another interesting area to explore would be different load characteristics, such as burstiness (discussed in chapter 3.2.4). Finally, the performance impact of different kinds of web applications could be studied, as well as the effects of using encrypted traffic in the form of TLS.
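Burstiness could be introduced in a reproducible way by generating the request schedule up front: a fixed inter-arrival gap gives the smooth load used in steady-rate tests, while exponentially distributed gaps give Poisson-like bursts at the same average rate. A small sketch; the function names are mine, not from any of the load tools discussed:

```python
import random

def uniform_schedule(rate, duration):
    # Evenly spaced request times: one request every 1/rate seconds.
    gap = 1.0 / rate
    return [i * gap for i in range(int(duration * rate))]

def bursty_schedule(rate, duration, seed=0):
    # Exponentially distributed gaps: same average rate, but with
    # clusters of closely spaced requests and occasional long pauses.
    rng = random.Random(seed)
    t, times = 0.0, []
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

print(len(uniform_schedule(50, 10)), "uniform requests over 10 s")
print(len(bursty_schedule(50, 10)), "bursty requests over 10 s")
```

Feeding both schedules at the same average rate to the same server would isolate the effect of burstiness on overload behavior.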

9.1.2 Software tuning and alternatives

One of the conclusions of this thesis was the importance of software. It would therefore be interesting to study how large performance gains can be achieved by tuning operating system and HTTP server software settings. On top of this, one could also test other methods for generating web pages with dynamic content, such as FastCGI. The significant performance differences between the tested server software indicate potential benefits of running tests with other server software and operating systems for each platform. Finally, this thesis focused solely on servers using the HTTP protocol; however, the relatively new WebSocket protocol can potentially be more efficient in some use cases, and is therefore an interesting technology to research.
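Much of the potential WebSocket efficiency gain comes from per-message framing: after the initial handshake, a client-to-server frame carries only 6 to 14 bytes of header (2-byte base header, an optional 2- or 8-byte extended length field, and a 4-byte masking key, per RFC 6455), whereas the full set of HTTP request headers is repeated on every request. A back-of-the-envelope sketch; the polling request shown is a hypothetical example, not one from the test application:

```python
def ws_client_frame_overhead(payload_len):
    # Per-frame header bytes for a masked client-to-server frame,
    # following RFC 6455 section 5.2.
    base = 2                      # FIN/opcode + mask bit/length byte
    if payload_len < 126:
        ext = 0                   # length fits in the base header
    elif payload_len < 1 << 16:
        ext = 2                   # 16-bit extended payload length
    else:
        ext = 8                   # 64-bit extended payload length
    return base + ext + 4         # plus the 4-byte masking key

# A plausible (hypothetical) polling request for comparison.
http_request = (
    "GET /status HTTP/1.1\r\n"
    "Host: device.local\r\n"
    "User-Agent: monitor/1.0\r\n"
    "Accept: application/json\r\n"
    "Connection: keep-alive\r\n\r\n"
)
print("HTTP request overhead:", len(http_request), "bytes")
print("WebSocket frame overhead:", ws_client_frame_overhead(100), "bytes")
```

For frequently updated data on bandwidth-constrained embedded links, this per-message difference is one reason the protocol would be worth benchmarking alongside HTTP.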
