Volume 1, Number 1 http://jisar.org/1/1/ December 1, 2008
In this issue:
Comparison of Dynamic Web Content Processing Language Performance Under a LAMP Architecture
Musa J. Jafar, West Texas A&M University, Canyon, TX 79018 USA
Russell Anderson, West Texas A&M University, Canyon, TX 79018 USA
Amjad A. Abdullat, West Texas A&M University, Canyon, TX 79018 USA
Abstract: LAMP is an open source, web-based application solution stack. It is comprised of (1) an operating system platform running Linux, (2) an Apache web server, (3) a MySQL database management system, and (4) a Dynamic Web Content Processor (tightly coupled with Apache) that is a combination of one or more of the Perl, Python and PHP scripting languages paired with their corresponding MySQL Database Interface Module. In this paper, we compare various performance measures of Perl, Python and PHP separately without a database interface and in conjunction with their MySQL Database Interface Modules within a LAMP framework. We performed our tests under two separate Linux, Apache and Dynamic Web Content environments: an SE Linux environment and a Redhat Enterprise Linux environment. A single MySQL database management system that resided on a separate Redhat Linux box served both environments. We used a hardware appliance framework for our test configuration, generation and data gathering. An appliance framework is repeatable and easily configurable. It allows a performance engineer to focus effort on the design, configuration and monitoring of tests, and the analysis of test results. In all cases, whether database connectivity was involved or not, PHP outperformed Perl and Python. We also present the implementation of a mechanism to handle the propagation of database engine status-codes to the web client; this is important when automatic client-based testing is performed, because the HTTP server is incapable of automatically propagating third-tier application status-codes to the HTTP client.
Keywords: LAMP solution stack, Web Applications Development, Perl, Python, PHP, MySQL, APACHE, Linux, DBI, Dynamic Web Content Processor, HTTP 1.1 status-code
Recommended Citation: Jafar, Anderson, and Abdullat (2008). Comparison of Dynamic Web Content Processing Language Performance Under a LAMP Architecture. Journal of Information Systems Applied Research, 1 (1). http://jisar.org/1/1/. ISSN: 1946-1836. (Preliminary version appears in The Proceedings of CONISAR 2008: §2732. ISSN: 0000-0000.)
This issue is on the Internet at http://jisar.org/1/1/
The Journal of Information Systems Applied Research (JISAR) is a peer-reviewed academic journal published by the Education Special Interest Group (EDSIG) of the Association of Information Technology Professionals (AITP, Chicago, Illinois). • ISSN: 1946-1836. • First issue: 1 Dec 2008. • Title: Journal of Information Systems Applied Research. Variants: JISAR. • Physical format: online. • Publishing frequency: irregular; as each article is approved, it is published immediately and constitutes a complete separate issue of the current volume. • Single issue price: free. • Subscription address: [email protected]. • Subscription price: free. • Electronic access: http://jisar.org/ • Contact person: Don Colton ([email protected])
2008 AITP Education Special Interest Group Board of Directors:
Paul M. Leidig, Grand Valley State University, EDSIG President 2005-2006
Don Colton, Brigham Young Univ Hawaii, EDSIG President 2007-2008
Robert B. Sweeney, U South Alabama, Vice President 2007-2008
Wendy Ceccucci, Quinnipiac Univ, Member Svcs 2007-2008
Ronald I. Frank, Pace University, Director 2007-2008
Kenneth A. Grant, Ryerson University, Treasurer 2007-2008
Albert L. Harris, Appalachian St Univ, JISE Editor
Thomas N. Janicki, NC Wilmington, Director 2006-2009
Kevin Jetton, Texas St U San Marcos, Chair ISECON 2008
Kathleen M. Kelm, Edgewood College, Director 2007-2008
Alan R. Peslak, Penn State, Director 2007-2008
Steve Reames, Angelo State Univ, Director 2008-2009
Patricia Sendall, Merrimack College, Secretary 2007-2008
Journal of Information Systems Applied Research Editors:
Don Colton, Professor, BYU Hawaii, Editor
Alan R. Peslak, Associate Professor, Penn State, Associate Editor
Thomas N. Janicki, Associate Professor, UNC Wilmington, Associate Editor
This paper was selected for inclusion in the journal as the CONISAR 2008 best paper. The acceptance rate is typically 1% for this category of paper, based on blind reviews from six or more peers, including three or more former best-paper authors who did not submit a paper in 2008.
EDSIG activities include the publication of JISAR and ISEDJ, the organization and execution of the annual CONISAR and ISECON conferences held each fall, the publication of the Journal of Information Systems Education (JISE), and the designation and honoring of an IS Educator of the Year. • The Foundation for Information Technology Education has been the key sponsor of ISECON over the years. • The Association for Information Technology Professionals (AITP) provides the corporate umbrella under which EDSIG operates.
© Copyright 2008 EDSIG. In the spirit of academic freedom, permission is granted to make and distribute unlimited copies of this issue in its PDF or printed form, so long as the entire document is presented, and it is not modified in any substantial way.
Comparison of Dynamic Web Content Processing Language Performance Under a LAMP Architecture
Musa Jafar [email protected]
Russell Anderson [email protected]
Amjad Abdullat [email protected]
CIS Department, West Texas A&M University Canyon, TX 79018 USA
Abstract

LAMP is an open source, web-based application solution stack. It is comprised of (1) an operating system platform running Linux, (2) an Apache web server, (3) a MySQL database management system, and (4) a Dynamic Web Content Processor (tightly coupled with Apache) that is a combination of one or more of the Perl, Python and PHP scripting languages paired with their corresponding MySQL Database Interface Module. In this paper, we compare various performance measures of Perl, Python and PHP separately without a database interface and in conjunction with their MySQL Database Interface Modules within a LAMP framework. We performed our tests under two separate Linux, Apache and Dynamic Web Content environments: an SE Linux environment and a Redhat Enterprise Linux environment. A single MySQL database management system that resided on a separate Redhat Linux box served both environments. We used a hardware appliance framework for our test configuration, generation and data gathering. An appliance framework is repeatable and easily configurable. It allows a performance engineer to focus effort on the design, configuration and monitoring of tests, and the analysis of test results. In all cases, whether database connectivity was involved or not, PHP outperformed Perl and Python. We also present the implementation of a mechanism to handle the propagation of database engine status-codes to the web client; this is important when automatic client-based testing is performed, because the HTTP server is incapable of automatically propagating third-tier application status-codes to the HTTP client.

Keywords: LAMP solution stack, Web Applications Development, Perl, Python, PHP, MySQL, APACHE, Linux, DBI, Dynamic Web Content Processor, HTTP 1.1 status-code
1. INTRODUCTION

The term LAMP, originally formalized by Dougherty (2001) of O'Reilly Media, Inc., refers to the non-proprietary, open source, web development, deployment, and production platform that is comprised of individual open source components. LAMP uses Linux for the operating system, Apache for the web server, MySQL for database management, and a combination of Perl, Python or PHP as language(s) to generate dynamic content on the server. More recently, Ruby was added to the platform. Lately, some deployments replaced Apache with lighttpd from Open Source or IIS from Microsoft. Depending on the components replaced, the platform is also known as WAMP when Microsoft Windows replaces Linux, or WIMP when Microsoft Windows replaces Linux and the IIS server replaces Apache. Doyle (2008) provided more information on the evolution of the various dynamic web content processor frameworks and the various technologies for web applications development.

Pedersen (2004), Walberg (2007) and Menascé (2002) indicated that to measure a web application's performance, there are many factors to consider that are not necessarily independent. It requires the fine tuning and optimization of bandwidth, processes, memory management, CPU usage, disk usage, session management, granular configurations across the board, kernel reconfiguration, accelerators, load balancers, proxies, routing, TCP/IP parameter calibrations, etc. Usually, performance tests are conducted in a very controlled environment. The research presented in this paper is no exception. Titchkosky (2003) provided a good survey of network performance studies on web-server performance under different network settings such as wide area networks, parallel wide area networks, ATM networks, and content caching.

Other researchers performed application-solution web server performance test comparisons under a standard World Wide Web environment for a chosen set of dynamic web content processors and web servers. These are closer to the research presented in this paper. Gousios (2002) compared the processing performance of servlets, FastCGI, PHP and mod_perl under Apache and PostgreSQL. Cecchet (2003) compared the performance of PHP, servlets and EJB under Apache and MySQL by completely implementing a client-browser emulator. The work of Titchkosky (2003) is complementary to that of Cecchet (2003). They used Apache, PHP, Perl, server-side Java (Tomcat, Jetty, Resin) and MySQL. They were very elaborate in their attempt to control the test environment. Ramana (2005) compared the performance of PHP and C under LAMP, WAMP and WIMP architectures, replacing Linux with Windows and Apache with IIS.

Although the LAMP architecture is prevailing as a mainstream web-based architecture, unfortunately, there does not appear to be a complete study that addresses the issues related to a pure LAMP architecture. The closest are the studies by Gousios (2002), Cecchet (2003) and Titchkosky (2003).

In this paper, we perform a complete study comparing the performance of Perl, Python and PHP separately and in conjunction with their database connectors under a LAMP architecture using two separate Linux environments (SE Linux and Redhat Enterprise Linux). The two environments were served by a third Linux environment hosting the backend MySQL database. The infrastructure that we used to emulate a web client and to configure and generate tests was a hardware appliance-based framework. It is fundamentally different from the infrastructure used by the authors just mentioned. The framework allowed us to focus on the task where a performance engineer designs, configures, runs, monitors, and analyzes tests without having to write code, distribute code across multiple machines, and synchronize running tests. An appliance framework is also replicable. Tests are repeatable and easily reconfigured.

In the following sections, we present our benchmark framework and what makes it different. In section three we present our test results. Section four is an elaboration on our methodology and section five is the summary and conclusions of the paper. All figures are grouped in an appendix at the end of the paper.

2. THE BENCHMARK TESTING FRAMEWORK

The objective of this research was to compare the performance of the three dynamic web content processors (Perl, PHP and Python) within a LAMP solution stack under six test scenarios.

The first three scenarios were concurrent-user scenarios. We measured and compared the average-page-response-time at nine different levels: 1, 5, 10, 25, 50, 75, 100, 125 and 150 concurrent users respectively. Scenario one is a pure CGI scenario; no database connectivity is required. Scenario two is a simple database query scenario. Scenario three is a database insert-update-delete transaction scenario. For example, for the 25 concurrent users test of any of the above three scenarios, 25 concurrent users were established at the beginning of the test. Whenever a user was terminated, a new user was established to maintain the 25 concurrent user level for the duration of the test. Under each test scenario, we performed 54 tests as follows:

    For each platform (SE Linux, Redhat)
        For each language (Perl, Python, PHP)
            For each number of concurrent users (1, 5, 10, 25, 50, 75, 100, 125, 150)
                1- configure a test,
                2- perform and monitor a test,
                3- gather test results.
            End For
        End For
        4- Tabulate, analyze and plot the average-page-response time for Perl, Python and PHP under the platform.
    End For

The next three scenarios were transactions-per-second scenarios. The objective of these tests was to stress the LAMP solution stack to the level where transactions start to fail and the transaction success rate for the duration of a test is below 80%. For example, at the 25 transactions per second level of Perl under SE Linux, we fed 25 new transactions per second regardless of the status of the previously generated transactions. If a failure rate of 20% or more was exhibited, we considered the 25 transactions per second as the cutoff point and did not need to go to a test with higher transactions per second. Failures are attributed to time-outs, database deadlocks, or the lack of resources on the web server, the database server or the database management system itself. The three test scenarios were the same pure CGI scenario, simple database query scenario, and an insert-update-delete transaction scenario.

Concurrent Connections Test Scenarios

Scenario One (Pure CGI, No Database Connectivity): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested a simple CGI-script execution that dynamically generated a web page, returning "Hello World". No database connectivity was involved. Figure 1 is a sequence diagram representing the skeleton of the scenario. Figures 4 and 5 are the comparison plots under SE Linux and Redhat Linux of the test scenario.

Scenario Two (a Simple Database Query): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested execution of a simple SQL query against the database, formatted the result, and returned the content page to the web client. The SQL query was: "Select SQL_NO_CACHE count(*) from Person". Figure 2 is a sequence diagram representing the skeleton of test scenarios two and three. Figures 6 and 7 are the comparison plots under SE Linux and Redhat Linux of the test scenario.

Scenario Three (a Database Insert-Update-Delete Transaction): For each platform, using each of the content generator languages and each of the configured numbers of concurrent users, the web clients requested the execution of a transaction against the database (which included an insert, an update, and a delete), formatted the result and returned the content page to the web client. Figures 8 and 9 are the comparison plots under SE Linux and Redhat Linux of the test scenario.

Transactions per Second Test: Scenarios Four through Six

These were stress tests. Scenarios four, five, and six were performed using a constant number of transactions per second rather than the previous concurrent-user setting. As stated earlier, the appliance will generate the configured transactions every second, independent of the status of the previously submitted transactions. These tests were performed progressively, increasing the transaction rate, until the success rate dropped below 80%. Table 2 is a tabulation of the maximum number of tolerated transactions per second until we reached the threshold degradation rate.

Test Metrics Gathered

To generate realistic and high volume traffic within a replicable environment, a hardware appliance from Spirent Communications (Spirent 2003) was used: the Avalanche 220EE. This device is designed specifically to assess network and server capacity by generating large quantities of realistic and user-configurable network traffic. It implements a client-browser emulator with all the fundamental capabilities of the HTTP 1.0 and 1.1 protocols (session management, cookies, SSL, certificates, etc.). Through user-defined settings, thousands of web clients from multiple subnetworks can be simulated to request services from HTTP servers. For testing purposes, two separate front-end LAMP (Linux, Apache, Perl, Python, PHP, and database connectors) environments were deployed: an SE Linux installation and a Redhat Enterprise server installation. The two hardware platforms were identical; the configurations of Apache, Perl, Python, PHP and their connectors on the two platforms were as identical as possible. The backend MySQL database server resided on a separate Redhat Enterprise server. Two identical switches and a router were used for the test environment to manage the appliance and to provide connectivity to the HTTP servers and to the database server.

Specification of all test configurations and their management was done using the appliance interface. Investigators did not have to write elaborate shell scripts, distribute the test environment over multiple clients, or manually or programmatically gather results. All of this was accomplished by the appliance and its accompanying analyzer. Figure 3 is a screenshot of the appliance's user interface for configuring a test. Table 1 is a sample test output; see (Kenney 2005) for a more elaborate description of the appliance capabilities.

Whether it was concurrent users or transactions per second, each test was composed of four phases: (1) a warm-up phase, (2) a ramp-up phase, (3) a 4-minute steady phase and (4) a cool-down phase. The total duration of each test was 6 minutes. For each test performed under the six scenarios, the following data was collected: a pcap log file of all network traffic of the test (a pcap file is a network traffic dump of packets in and out of a network interface of a machine); desired and current load; cumulative attempted, successful and unsuccessful transactions; incoming and outgoing traffic in KBPS; min, max and current time to TCP SYN/ACK in milliseconds; min, max and current round trip time in milliseconds; min, max and current time to TCP first byte in milliseconds; TCP handshake parameters; min, max and current response time per URL in milliseconds; and all HTTP status-codes. The appliance gathered these metrics and provided us with the summaries listed in 4-second intervals. In other words, each data point gathered was a summary of a 4-second interval of traffic activities. For example, the appliance provided us with summaries of the minimum page response time, average-page-response time, and maximum page response time for each of the 4-second intervals for all users that were served within the interval. Accordingly, for a 6-minute test, we gathered cumulative and real-time summary statistics for 90 intervals. This paper presents and compares the average-page-response time of each test cumulatively. Other network traffic statistics (TCP statistics, connection management statistics, etc.) are outside the scope of this paper.

A comparison of our framework with previous studies exemplifies the benefits of conducting network tests using an appliance framework. For example, Gousios (2002) performed four benchmark tests comparing FastCGI, mod_perl, PHP and servlets. They used a combination of Apache JMeter ("a desktop application designed to load test functional behavior and measure performance") and Perl scripts that fork processes to generate load. All the server-side components were run on the same machine, using PostgreSQL instead of MySQL. They did not use a distributed environment. Gousios did not benchmark a complete LAMP framework either. Comparing that framework and the effort required to generate benchmark tests, then gather and analyze the results, lends strong credence to the inclusion of special-purpose network appliances to conduct performance analyses. Another reason to use dedicated appliances is that Gousios was "not able to perform the tests for more than 97 to 98 clients because the benchmark program exhausted the physical memory of the client machine" (Gousios, 2002). Gousios used shell-based scripting to gather data and perform calculations and did not use the pcap log contents from a network analyzer to gather data, which would have made the results more reliable. Cecchet (2003) performed an elaborate "performance comparison of middleware architectures for generating dynamic web content", implementing the TPC-W transactional web e-commerce benchmark specification from tpc.org. However, Cecchet's platform was Apache, Tomcat, PHP, EJB and servlets, which is not a complete LAMP architecture.
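Returning to the metrics described under Test Metrics Gathered: the cumulative average-page-response time reported in the results is a roll-up of the appliance's 4-second interval summaries. The sketch below shows one way such a roll-up can be computed in Python (one of the paper's three languages); the dictionary layout is illustrative and is not the appliance's actual export format.

```python
# Sketch: combining 4-second interval summaries into a cumulative
# average-page-response time. Field names ("pages", "avg_ms") are
# illustrative placeholders, not the appliance's export schema.

def cumulative_avg_response(intervals):
    """Weight each 4-second window's average response time (ms) by
    the number of pages served in that window."""
    total_pages = sum(i["pages"] for i in intervals)
    if total_pages == 0:
        return 0.0
    weighted = sum(i["pages"] * i["avg_ms"] for i in intervals)
    return weighted / total_pages

# A 6-minute test yields 90 such intervals; three are shown here.
samples = [
    {"pages": 120, "avg_ms": 14.0},
    {"pages": 100, "avg_ms": 22.0},
    {"pages": 80,  "avg_ms": 31.0},
]
print(cumulative_avg_response(samples))  # 21.2
```

Weighting each window by its page count matters: a plain mean of the 90 per-window averages would over-weight lightly loaded windows.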
Cecchet implemented an elaborate HTTP client emulator that required laborious scripting. In our case, the appliance provided this functionality. Titchkosky (2003) extended the work of Cecchet (2003), relying heavily on shell scripting for the distribution of tests. Also, monitoring and analysis relied heavily on raw network commands (netstat, httperf, sar, etc.) and web server log analyses. Titchkosky's approach was labor intensive, hard to code, hard to reconfigure, and had to be programmed to get the full spectrum of network traffic. In all of these cases, it was unclear how database server timeouts, deadlocks, connection errors, etc. were accounted for, managed and propagated to the HTTP client. In a multi-tier web environment, the reason for failure is not necessarily an HTTP error.

3. TEST RESULTS

Scenario One (Pure CGI, No Database Connectivity)

As discussed earlier, for each of the concurrent connections benchmarks a total of 54 tests were performed. These are 27 tests for each of the two Linux platforms (9 tests for each of the three languages within a platform). The following is the Perl script code for the scenario. The Python and PHP scripts are similar.

    #!/usr/bin/perl
    print "Content-type: text/html\n\n";
    print "<html>";
    print "<h2>Welcome to Perl-MySQL</h2>";
    print "Hello from Perl";
    print "</html>\n";

Figure 4 shows the average page response time for a web-page request in milliseconds, averaged over the duration of the six-minute test under SE Linux. The horizontal axis records the number of concurrent users of the test; the vertical axis measures the average-page-response time for that test. Figure 5 shows the same results under Redhat Linux. Under both environments, PHP outperformed Perl and Python. The Linux flavor was not a factor in these tests. In looking at the graphs, one can see that the test results for both SE and Redhat Linux were almost identical.

Scenario Two (Simple Database Query)

As previously described, test scenario two adds a simple database query to scenario one. The following is the Perl script code for the scenario. The Python and PHP scripts are similar. Uniquely identifying names have been replaced by xxx. Line breaks were added to fit the two-column format.

    #!/usr/bin/perl
    use strict; use DBI(); use warnings;
    my $sqlStatement =
        "Select SQL_NO_CACHE count(*)
         from perlperson";
    my $dsn =
        "DBI:mysql:database=xxx;host='xxx';port=xxx";
    my $dbh = DBI->connect($dsn,
        "xxx", "xxx", {PrintError=>0});
    if(my $x = DBI::err) {
        responseBack(501, "Unknown $x DBI Error");
    }
    my $sth = $dbh->prepare($sqlStatement);
    if(my $x = DBI::err) {
        responseBack(501, "Unknown $x DBI Error");
    }
    my $mysh = $sth->execute();
    if(my $x = DBI::err) {
        responseBack(501, "Unknown $x DBI Error");
    }
    my $result;
    my $resultSet = "";
    while($result = $sth->fetchrow_hashref()) {
        $resultSet = "$resultSet + $result<br>";
        if(my $x = DBI::err) {
            responseBack(501, "Unknown $x DBI Error");
        }
    }
    $sth->finish();
    if(my $x = DBI::err) {
        responseBack(501, "Unknown $x DBI Error");
    }
    $dbh->disconnect();
    responseBack(200, "OK", $resultSet);

    sub responseBack {
        my $err     = 501;
        my $message = "Connection to Database Failed";
        my $results = "$err $message";
        if(@_) {
            ($err, $message, $results) = @_;
        }
        print("status: $err $message\n");
        print("content-type: text/html\n\n");
        print("<html>\n");
        print("<body>\n");
        print("$results<br>\n");
        print("</body>\n");
        print("</html>\n");
        exit;
    }

Figures 6 and 7 show the results of the tests under SE Linux and Redhat Linux. Under both environments, PHP coupled with its DBI significantly outperformed Perl and Python. For the SE Linux environment, however, Perl and Python exhibited no difference in performance. The Python DBI performed much better under the Redhat environment than under the SE Linux environment.

Scenario Three (Concurrent Insert-Update-Delete)

Scenario three adds the transaction overhead of updates and deletes to the database. The following is the PHP script code for the scenario. The Perl and Python scripts are similar.

    <?php
    // connection setup ($link) appears here in the original,
    // with identifying names replaced by xxx
    function microtime_float() {
        list($usec, $sec) = explode(" ", microtime());
        return ((float)$usec + (float)$sec);
    }
    function sendError($errNo, $err) {
        header("HTTP/1.1 501 $errNo $err");
        exit;
    }
    $currTime = microtime_float();
    $sql = "insert into PHPTestInsert(ts)
            values($currTime)";
    $result = mysql_query($sql, $link);
    if(!$result)
        sendError(mysql_errno($link), mysql_error($link));
    $sql = "select 1 from PHPTestInsert
            where ts = $currTime for UPDATE";
    $result = mysql_query($sql, $link);
    if(!$result)
        sendError(mysql_errno($link), mysql_error($link));
    $sql = "update PHPTestInsert set volume
            = volume + 1 where ts = $currTime";
    $result = mysql_query($sql, $link);
    if(!$result)
        sendError(mysql_errno($link), mysql_error($link));
    $sql = "delete from PHPTestInsert
            where ts = $currTime";
    $result = mysql_query($sql, $link);
    if(!$result)
        sendError(mysql_errno($link), mysql_error($link));
    if(mysql_errno($link)) {
        $errNo = mysql_errno($link);
        sendError($errNo, mysql_error($link));
    }
    print("Hello PHP");
    print("<br>");
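Since the Python scripts parallel the Perl and PHP ones, the status-code propagation mechanism can also be illustrated in Python. The following is a sketch only, mirroring the Perl responseBack() above rather than reproducing the paper's actual Python source: under CGI, an explicit "Status:" header is what asks Apache to place a third-tier (database) failure code on the HTTP status line, so the emulated web client counts it as a failed transaction.

```python
import sys

def response_back(err=501, message="Connection to Database Failed",
                  results=None):
    # Build a CGI response whose "Status:" header carries the
    # database-layer code through Apache to the HTTP client.
    if results is None:
        results = "%d %s" % (err, message)
    return ("Status: %d %s\n" % (err, message) +
            "Content-type: text/html\n\n" +
            "<html>\n<body>\n%s<br>\n</body>\n</html>\n" % results)

# Success path; a DBI-style error path would instead emit, e.g.,
# response_back(501, "Unknown DBI Error") and exit.
sys.stdout.write(response_back(200, "OK", "Hello from Python"))
```

Calling response_back with no arguments yields the default 501 "Connection to Database Failed" response, matching the Perl subroutine's fallback behavior.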