Comparison of Dynamic Web Content Processing Language Performance Under a LAMP Architecture
Musa Jafar [email protected]
Russell Anderson [email protected]
Amjad Abdullat [email protected]
CIS Department, West Texas A&M University, Canyon, TX 79018
Abstract
LAMP is an open source, web-based application solution stack. It is comprised of (1) an operating system platform running Linux, (2) an Apache web server, (3) a MySQL database management system, and (4) a Dynamic Web Content Processor (tightly coupled with Apache) that is a combination of one or more of the Perl, Python and PHP scripting languages paired with their corresponding MySQL Database Interface Module. In this paper, we compare various performance measures of Perl, Python and PHP separately without a database interface and in conjunction with their MySQL Database Interface Modules within a LAMP framework. We performed our tests under two separate Linux, Apache and Dynamic Web Content environments: an SE Linux environment and a Redhat Enterprise Linux environment. A single MySQL database management system that resided on a separate Redhat Linux box served both environments. We used a hardware appliance framework for our test configuration, generation and data gathering. An appliance framework is repeatable and easily configurable. It allows a performance engineer to focus effort on the design, configuration and monitoring of tests, and the analysis of test results. In all cases, whether database connectivity was involved or not, PHP outperformed Perl and Python. We also present the implementation of a mechanism to handle the propagation of database engine status-codes to the web client. This is important when automatic client-based testing is performed, because the HTTP server is incapable of automatically propagating third-tier application status-codes to the HTTP client.
Keywords: LAMP solution stack, Web Applications Development, Perl, Python, PHP, MySQL, Apache, Linux, DBI, Dynamic Web Content Processor, HTTP 1.1 status-code.
1. INTRODUCTION

The term LAMP, originally formalized by Dougherty (2001) of O'Reilly Media, Inc., refers to the non-proprietary, open source web development, deployment, and production platform that is comprised of individual open source components. LAMP uses Linux for the operating system, Apache for the web server, MySQL for database management, and a combination of Perl, Python or PHP as language(s) to generate dynamic content on the server. More recently, Ruby was added to the platform. Lately, some deployments replaced Apache with lighttpd from Open Source or IIS from Microsoft. Depending on the components replaced, the platform is also known as WAMP when Microsoft Windows replaces Linux, or WIMP when Microsoft Windows replaces Linux and the IIS server replaces Apache. Doyle (2008) provided more information on the evolution of the various dynamic web content processor frameworks and the various technologies for web applications development.

Pedersen (2004), Walberg (2007) and Menascé (2002) indicated that to measure a web application's performance, there are many factors to consider that are not necessarily independent. It requires the fine tuning and optimization of bandwidth, processes, memory management, CPU usage, disk usage, session management, granular configurations across the board, kernel reconfiguration, accelerators, load balancers, proxies, routing, TCP/IP parameter calibrations, etc. Usually, performance tests are conducted in a very controlled environment. The research presented in this paper is no exception. Titchkosky (2003) provided a good survey of studies on web-server performance under different network settings such as wide area networks, parallel wide area networks, ATM networks, and content caching.

Other researchers performed application-solution web server performance test comparisons under a standard World Wide Web environment for a chosen set of dynamic web content processors and web servers. These are closer to the research presented in this paper. Gousios (2002) compared the processing performance of servlets, FastCGI, PHP and Mod-Perl under Apache and PostgresSQL. Cecchet (2003) compared the performance of PHP, servlets and EJB under Apache and MySQL by completely implementing a client-browser emulator. The work of Titchkosky (2003) is complementary to that of Cecchet (2003). They used Apache, PHP, Perl, Server-Side Java (Tomcat, Jetty, Resin) and MySQL. They were very elaborate in their attempt to control the test environment. Ramana (2005) compared the performance of PHP and C under LAMP, WAMP and WIMP architectures, replacing Linux with Windows and Apache with IIS. Although the LAMP architecture is prevailing as a mainstream web-based architecture, unfortunately there does not appear to be a complete study that addresses the issues related to a pure LAMP architecture. The closest are the studies by Gousios (2002), Cecchet (2003) and Titchkosky (2003).

In this paper, we perform a complete study comparing the performance of Perl, Python and PHP separately and in conjunction with their database connectors under a LAMP architecture using two separate Linux environments (SE Linux and Redhat Enterprise Linux). The two environments were served by a third Linux environment hosting the back-end MySQL database. The infrastructure that we used to emulate a web client and to configure and generate tests was a hardware appliance-based framework. It is fundamentally different from the infrastructure used by the authors just mentioned. The framework allowed us to focus on the task where a performance engineer designs, configures, runs, monitors, and analyzes tests without having to write code, distribute code across multiple machines, and synchronize running tests. An appliance framework is also replicable. Tests are repeatable and easily reconfigured.

In the following sections, we present our benchmark framework and what makes it different. In section three we present our test results. Section four is an elaboration on our methodology, and section five is the summary and conclusions of the paper. All figures are grouped in an appendix at the end of the paper.

2. THE BENCHMARK TESTING FRAMEWORK

The objective of this research was to compare the performance of the three dynamic web content processors, Perl, PHP and Python, within a LAMP solution stack under six test scenarios.

The first three scenarios were concurrent-user scenarios. We measured and compared the average-page-response-time at nine different levels: 1, 5, 10, 25, 50, 75, 100, 125 and 150 concurrent users respectively. Scenario one is a pure CGI scenario; no database connectivity is required.
Scenario two is a simple database query scenario. Scenario three is a database insert-update-delete transaction scenario. For example, for the 25 concurrent users test of any of the above three scenarios, 25 concurrent users were established at the beginning of the test. Whenever a user terminated, a new user was established to maintain the 25 concurrent user level for the duration of the test. Under each test scenario, we performed 54 tests as follows:

For each platform (SE Linux, Redhat)
    For each language (Perl, Python, PHP)
        For each number of concurrent users (1, 5, 10, 25, 50, 75, 100, 125, 150)
            1- configure a test,
            2- perform and monitor a test,
            3- gather test results
        End For
    End For
    4- Tabulate, analyze and plot the average-page-response time for Perl, Python and PHP under a platform
End For

The next three scenarios were transactions-per-second scenarios. The objective of these tests was to stress the LAMP solution stack to the level where transactions start to fail and the transaction success rate for the duration of a test drops below 80%. For example, at the 25 transactions per second level of Perl under SE Linux, we fed 25 new transactions per second regardless of the status of the previously generated transactions. If a failure rate of 20% or more was exhibited, we considered 25 transactions per second the cutoff point and did not need to run a test at a higher transactions-per-second rate. Failures are attributed to time-outs, database deadlocks, or the lack of resources on the web server, the database server or the database management system itself. The three test scenarios were the same pure CGI scenario, simple database query scenario, and insert-update-delete transaction scenario.

Concurrent Connections Test Scenarios

Scenario One (Pure CGI, No Database Connectivity): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested a simple CGI-script execution that dynamically generated a web page, returning "Hello World". No database connectivity was involved. Figure 1 is a sequence diagram representing the skeleton of the scenario. Figures 4 and 5 are the comparison plots of the test scenario under SE Linux and Redhat Linux.

Scenario Two (A Simple Database Query): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested execution of a simple SQL query against the database; the script formatted the result and returned the content page to the web client. The SQL query was: "Select SQL_NO_CACHE count(*) from Person". Figure 2 is a sequence diagram representing the skeleton of test scenarios two and three. Figures 6 and 7 are the comparison plots of the test scenario under SE Linux and Redhat Linux.

Scenario Three (A Database Insert-Update-Delete Transaction): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested the execution of a transaction against the database (which included an insert, an update, and a delete); the script formatted the result and returned the content page to the web client. Figures 8 and 9 are the comparison plots of the test scenario under SE Linux and Redhat Linux.

Transactions per Second Tests: Scenarios Four through Six

These were stress tests. Scenarios four, five, and six were performed using a constant transactions-per-second setting rather than the previous concurrent-user setting. As stated earlier, the appliance generates the configured number of transactions every second, independent of the status of the previously submitted transactions. These tests were performed progressively, increasing the transaction rate, until the success rate dropped below 80%. Table 2 is a tabulation of the maximum number of tolerated transactions per second until we reached the threshold degradation rate.
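The stress procedure above amounts to a progressive search over increasing transaction rates. The following Python sketch shows that logic; run_tps_test is a hypothetical stand-in for configuring, running, and scoring a single appliance test at a fixed rate, and is not part of the appliance's actual interface.

# Minimal sketch of the progressive stress search described above.
# run_tps_test(rate) is a hypothetical helper that runs one test at a
# fixed transactions-per-second rate and returns its success ratio (0.0-1.0).

SUCCESS_THRESHOLD = 0.80

def find_tps_cutoff(run_tps_test, rates):
    """Return the first rate whose success ratio drops below 80%,
    or None if every tested rate stays above the threshold."""
    for rate in rates:  # increasing, e.g. [1, 5, 10, 25, 50, 75, 100]
        if run_tps_test(rate) < SUCCESS_THRESHOLD:
            return rate  # cutoff found; higher rates need not be tested
    return None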
Test Metrics Gathered

To generate realistic and high volume traffic within a replicable environment, a hardware appliance from Spirent Communications (Spirent 2003) was used, the Avalanche 220EE. This device is designed specifically to assess network and server capacity by generating large quantities of realistic and user-configurable network traffic. It implements a client-browser emulator with all the fundamental capabilities of the HTTP 1.0 and 1.1 protocols (session management, cookies, SSL, certificates, etc.). Through user-defined settings, thousands of web clients from multiple sub-networks can be simulated to request services from HTTP servers. For testing purposes, two separate front-end LAMP (Linux, Apache, Perl, Python, PHP, and database connectors) environments were deployed: an SE Linux installation and a Redhat Enterprise server installation. The two hardware platforms were identical; the configurations of Apache, Perl, Python, PHP and their connectors on the two platforms were as identical as possible. The backend MySQL database server resided on a separate Redhat Enterprise server. Two identical switches and a router were used for the test environment to manage the appliance and to provide connectivity to the HTTP servers and to the database server. Specification of all test configurations and their management was done using the appliance interface. Investigators did not have to write elaborate shell scripts, distribute the test environment over multiple clients, or manually or programmatically gather results. All of this was accomplished by the appliance and its accompanying analyzer. Figure 3 is a screenshot of the appliance's user interface for configuring a test. Table 1 is a sample test output. See Kenney (2005) for a more elaborate description of the appliance's capabilities.

Whether it was concurrent users or transactions per second, each test was composed of four phases: (1) a warm-up phase, (2) a ramp-up phase, (3) a 4-minute steady phase and (4) a cool-down phase. The total duration of each test was 6 minutes. For each test performed under the six scenarios, the following data was collected: a pcap log file of all network traffic of the test (a pcap file is a network traffic dump of packets in and out of a network interface of a machine); desired and current load; cumulative attempted, successful and unsuccessful transactions; incoming and outgoing traffic in KBPS; min, max and current time to TCP SYN/ACK in milliseconds; min, max and current round trip time in milliseconds; min, max and current time to TCP first byte in milliseconds; TCP handshake parameters; min, max and current response time per URL in milliseconds; and all HTTP status-codes. The appliance gathered these metrics and provided us with the summaries listed in 4-second intervals. In other words, each data point gathered was a summary of a 4-second interval of traffic activities. For example, the appliance provided us with summaries of the minimum page response time, average page response time, and maximum page response time for each of the 4-second intervals for all users that were served within the interval. Accordingly, for a 6-minute duration test, we gathered cumulative and real-time summary statistics for 90 intervals. This paper presents and compares the average-page-response time of each test cumulatively. Other network traffic statistics (TCP statistics, connection management statistics, etc.) are outside the scope of this paper.
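Because each exported data point summarizes a 4-second interval, whole-test figures such as the cumulative average-page-response time must be recombined from the per-interval summaries. The sketch below shows one way to do that; the (pages_served, avg_response_ms) record layout is illustrative, not the appliance's actual export format.

# Illustrative recombination of 4-second interval summaries into a single
# cumulative average. The (pages_served, avg_response_ms) tuple layout is
# an assumption, not the appliance's actual export format.

def cumulative_avg_response(intervals):
    """Page-weighted average response time over a whole test; a 6-minute
    test yields 360 s / 4 s = 90 interval records."""
    total_pages = sum(pages for pages, _ in intervals)
    if total_pages == 0:
        return 0.0
    return sum(pages * avg_ms for pages, avg_ms in intervals) / float(total_pages)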
A comparison of our framework with previous studies exemplifies the benefits of conducting network tests using an appliance framework. For example, Gousios (2002) performed four benchmark tests comparing FastCGI, mod-Perl, PHP and servlets. They used a combination of Apache JMeter ("a desktop application designed to load test functional behavior and measure performance") and Perl scripts that forked processes to generate load. All the server-side components were run on the same machine, using PostgresSQL instead of MySQL. They did not use a distributed environment. Gousios did not benchmark a complete LAMP framework either. Comparing that framework and the effort required to generate benchmark tests, then gather and analyze the results, lends strong credence to the inclusion of special-purpose network appliances to conduct performance analyses. Another reason to use dedicated appliances is that Gousios was "not able to perform the tests for more than 97 to 98 clients because the benchmark program exhausted the physical memory of the client machine" (Gousios, 2002). Gousios used shell-based scripting to gather data and perform calculations and did not use the pcap log contents from a network analyzer to gather data, which would have made their results more reliable.
Cecchet (2003) performed an elaborate "performance comparison of middleware architectures for generating dynamic web content" implementing the TPC-W transactional web e-commerce benchmark specification from tpc.org. However, Cecchet's platform was Apache, Tomcat, PHP, EJB and servlets, which is not a complete LAMP architecture. Cecchet implemented an elaborate HTTP client emulator that required laborious scripting. In our case, the appliance provided this functionality. Titchkosky (2003) extended the work of Cecchet (2003), relying heavily on shell scripting for distribution of tests. Also, monitoring and analysis relied heavily on raw network commands (netstat, httperf, sar, etc.) and web server log analyses. Titchkosky's approach was labor intensive, hard to code, hard to reconfigure, and had to be programmed to get the full spectrum of network traffic. In all of these cases, it was unclear how database server timeouts, deadlocks, connection errors, etc. were accounted for, managed and propagated to the HTTP client. In a multi-tier web environment, the reason for a failure is not necessarily an HTTP error.

3. TEST RESULTS

Scenario One (Pure CGI, No Database Connectivity)

As discussed earlier, for each of the concurrent connections benchmarks, a total of 54 tests were performed. These are 27 tests for each of the two Linux platforms (9 tests for each of the three languages within a platform). The following is the Perl script code for the scenario. The Python and PHP scripts are similar.

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<html><head><title>Perl</title></head>";
print "<body><h1>Perl</h1>";
print "Hello from Perl";
print "</body></html>\n";
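The paper states that the Python and PHP versions of this script are similar. For illustration, here is a sketch of what the Python CGI counterpart might look like; it is our reconstruction in Python 2 era syntax, not the authors' listing.

#!/usr/bin/python
# Sketch of a Python CGI counterpart to the Perl script above;
# a reconstruction, not the authors' original listing.
print "Content-type: text/html\n"
print "<html><head><title>Python</title></head>"
print "<body><h1>Python</h1>"
print "Hello from Python"
print "</body></html>"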
\n"; "$resultSet + $result+"; Figures 4 shows the average page response if(my $x = DBI::err) { time for a web-page request in milliseconds responseBack(501, averaged for the duration of the six minutes "Unknown $x DBI Error"); test under SE Linux. The horizontal axis } records the “number of concurrent users” of } the test; the vertical axis measures the “av- $sth->finish(); erage-page-response time” for that test. if(my $x = DBI::err) { responseBack(501,
"Unknown $x DBI Error"); print("HTTP/1.1 501 $errNo $err\n"); } exit; $dbh->disconnect(); } responseBack(200, "OK", $resultSet); if(!mysql_select_db("xxx",$link)){ sub responseBack { $errNo = mysql_errno($link); my $err = 501; $err = mysql_error($link); my $message = header("HTTP/1.1 501 $errNo $err"); "Connection to Database Failed"; header("Content-Length: 0"); my $results = "$err $message"; print("HTTP/1.1 501 $errNo $err\n"); if(@_){ exit; ($err, $message, $results) = @_; } } function microtime_float() { print("status: $err $message\n"); list($usec, $sec) = print("content-type: text/html\n\n"); explode(" ", microtime()); print("\n"); return ((float)$usec + (float)$sec); print("
\n"); } print( function sendError($errNo,$err){ "$results
\n"); exit; print("\n"); } print("\n"); $currTime = microtime_float(); exit; $sql ="insert into PHPTestInsert(ts) } values($currTime)"; $result = mysql_query($sql, $link); Figures 6 and 7 show the results of the tests if(!$result) sendEr- under SE Linux and Redhat Linux. Under ror(mysql_errno($link),mysql_error($link)); both environments, PHP coupled with its DBI $sql = "select 1 from PHPTestInsert significantly outperformed Perl and Python. where ts = $currTime for UPDATE"; For the SE Linux environment, however, $result = mysql_query($sql, $link); both Perl and Python exhibited no difference if(!$result) in performance. Python DBI performed a lot sendError(mysql_errno($link), better under the Redhat environment than mysql_error($link)); the SE Linux environment. $sql = "update PHPTestInsert set volume = volume + 1 where ts = $currTime"; Scenario Three (Concurrent Insert- $result = mysql_query($sql, $link); Update-Delete) if(!$result) Scenario three adds the transaction over- sendError(mysql_errno($link), head of updates and delete to the database. mysql_error($link)); The following is the PHP script code for the $sql = "delete from PHPTestInsert scenario. The Python and PHP scripts are where ts = $currTime"; similar. $result = mysql_query($sql, $link); if(!$result) sendError(mysql_errno($link),”); if(mysql_errno($link)){ print(“Hello PHP”); $errNo = mysql_errno($link); print(“
"); } print("