PerformanceComparisonofDynamicWebPlatforms

TonyHansen,VarshaMainkarandPaulReeser AT&TLabs 200LaurelAve,RoomD5-3C12,Middletown,NJ07748USA Phone1732-420-3714,Fax1732-368-1919 Email{tony,mainkar,preeser}@att.com Keywords:Dynamic,Web,CGI,FastCGI,++,Java,Servlets,JSP,Performance,Comparison

Over the last few years, the World Wide Web has transformed itself from a static content-distribution mediumtoaninteractive,dynamicmedium.TheWebisnowwidelyusedasthepresentationlayerfora hostofon-lineservicessuchase-mailandaddressbooks,e-cards,e-calendar,shopping,banking,and stock trading. As a consequence, HTML files are now typically generated dynamically after the server receives the request. From the Web-site providers’ point of view, dynamic generation of HTML pages impliesalesserunderstandingoftherealcapacityandperformanceoftheirWebservers.FromtheWeb developers’ point of view, dynamic content implies an additional technology decision: the Web programmingtechnologytobeemployedincreatingaWeb-basedservice.SincetheWebisinherently interactive,performanceisakeyrequirement,andoftendemandscarefulanalysisofthesystems.Inthis paper,wecomparefourdynamicWebprogrammingtechnologiesfromthepointofviewofperformance. Thecomparisonisbasedtestingandmeasurementoftwocases:oneisacasestudyofarealapplication thatwasdeployedinanactualWeb-basedservice;theotherisatrivialapplication.Thetwocasesprovide uswithanopportunitytocomparetheperformanceofthesetechnologiesattwoendsofthespectrumin terms of complexity. Our focus in this paper is on how complex vs. simple applications perform when implementedusingdifferentWebprogrammingtechnologies.Thepaperdrawscomparisonsandinsights basedonthisdevelopmentandperformancemeasurementeffort.

1. INTRODUCTION

The World Wide Web (WWW) first emerged a decade ago as a medium to render documentsthatwerestoredonthe,onauser’scomputer,usingspecialsoftware(thebrowser) andanewprotocol(HTTP:HyperTextTransferProtocol).Forthefirstfewyears,theWWWgrewprimarily asanewmediuminwhichstaticcontentcouldbepublished,andinformationshared.Thecontentwas publishedintheofHTML(HyperTextMarkupLanguage)files,whichwereservedbyWebservers, on requests from browsers. However, over the last few years the WWW has transformed itself from a staticcontent-distributionmediumtoaninteractive,dynamicmedium.ContentontheWebisnowoften personalized,andthereforedynamicallygenerated.TheWebisnowwidelyusedasthepresentationlayer forahostofon-lineservicessuchase-mail,e-cards,e-calendar,andaddressbooks,shopping,banking, andstocktrading.Asaconsequence,theHTMLfilesthatarerenderedbytheclient’sbrowserarenow typicallygenerateddynamicallyaftertheWebserverhasprocessedtheuser’srequest.

This dynamic generation of HTML files has nothappenedwithoutanassociatedperformancecost.

JustwhenInternetusersweregettingaccustomedto“click-and-wait”ondial-uplinesduetographics-rich

1 Web-sites,dynamicallygeneratedcontentstartedproliferatingontheWeb.Nowusersmustwaitnotonly for the network delay but also for the server-side processing delay associated with serving a request dynamically.Inmanycases,thisisturningouttobethelargestcomponentofthedelay.FromtheWeb- siteproviders’pointofview,dynamicgenerationofHTMLpagesimpliesalesserunderstandingofthereal capacity of their Web servers. The vendor-provided “hits-per-second” capacity of the Web server is no longerenough,asthisonlypertainstostaticHTMLfiles.

From the Web developers’ point of view, dynamic Web content implies an additional technology decision:theWebprogrammingtechnologytobeemployedincreatingaWeb-basedserviceorproduct.

This decision is based on several factors. Among the factors considered are ease of programming, richnessoffeatures,maintainability,reliability,andperformance.SincetheWebisinherentlyinteractive, performanceisakeyrequirement,andoftendemandscarefulanalysisofthesystems.

In this paper, we compare the performance of four Web programming technologies, namely Java

Servlets,JavaServerPages,CGI/C++andFastCGI/C++.Thecomparisonisbasedontwocases:oneis acasestudyofacomplexapplicationthatwasdeployedinanactualWeb-basedservice;theotherisa

“trivial” application. The methodology of performance analysis was stress testing and measurement.

Performance measurement (rather than modeling) made most sense in this effort, since a quick turnaroundofresultswasnecessaryandtheaccuracyofresultswasrequiredtobehigh.

The two cases (i.e., the complex and the trivial) provided us with an opportunity to compare performance of these technologies attwoendsofthespectrumofapplications,intermsofcomplexity.

Theperformance“order”ofdifferenttechnologiesisseldomabsolute–itdependsgreatlyonthenatureof the application. Our focus in this paper is on how complex vs. simple applications perform when implementedusingdifferentWebprogrammingtechnologies.Thepaperdrawscomparisonsandinsights basedonthisdevelopmentandperformancemeasurementeffort.

Themainobservationsfromthisworkwereasfollows:Ingeneral,FastCGIoutperformedCGI,while

JSP outperformed Java servlets. In the case of a complex application, the CGI-based technologies outperformed the Java-based technologies by 3-4x, with Java performance limited by software bottlenecksintheJVM.Inthecaseofatrivialapplication,therelativeperformancewasreversed,asJava outperformedCGIby2-3x.

2 Therestofthepaperisorganizedasfollows:inSection2weprovidethemotivationforconducting such a comparative study, and in Section 3 we describe briefly the technologies that we compared.

Section 4 describes the performance measurement and analysismethodology,Section5describesthe case-study testing results in detail, and Section 6 describes the results of testing a trivial application.

Finally,Section7summarizesourresults,andSection8providessomeconcludingremarks.

2. MOTIVATION

TheapplicationcontextforthisstudywasanewWeb-basedmessagingservice.Theeffortcentered on the development of a “page generation engine” to perform dynamic Web page construction in a distributedenvironment(seeFigure2-1).Givenacceleratedtime-to-marketgoalsandlimiteddevelopment time,thenaturalchoiceoftechnologywastheoneperceivedtobepowerful,feature-rich,andyeteasyto useanddeploy–Javaservlets.

The distributed execution environment consists of a front-end sub-system running a standard Web server and a Java virtual machine (JVM) servlet engine, plus a back-end sub-system of servers implementing the businesslogicanddata(e.g.,POP/IMAPmailservers,SMTPgateways,LDAP/X.500 directoryservers,etc.).ThepageengineconsistsofanumberofWebpagetemplatesandJavaservlets.

TheWebpagetemplatesareHTMLfiles,butwithtagsfromaproprietaryscriptinglanguagethatspecify howtocollectdynamicdatathatcustomizestheseHTMLfilestotheparticularrequest.Collectively,these servlets:

3 ••• ••• LOGIC DATA HTTPTHREADS ServletTHREADS

APPLICATION1 ••• WEBPAGE POP TEMPLATES SERVLETS IMAP SMTP LOGIC LDAP DATA X.500 WEBSERVER JAVAVM APPLICATION2

FRONT-ENDSERVER BACK-ENDSERVER Figure2-1:SystemArchitecture

• parsetherequestedtemplateforscriptinglanguage(dynamic)content,

• issuetheappropriateback-endrequeststoprocessthebusinesslogicorretrievetherequireddata,

• processthereturneddataintothenecessarydatastructures,

• performprotocolconversion(asnecessary),and

• constructtheWebpagetoreturntotheuser.

Theprotocol/languagebetweentheend-userandtheWebserverisHTTP/HTML,andthatbetween theJavaservletsandtheback-endisdeterminedbytheparticularapplication(e.g.,IMAP,LDAP,etc.).

Within the JVM environment, data is passed between the various servlets as eXtensible Markup

Language (XML) data structures. Hence, the scripts perform a variety of protocol and language conversionsduringthecourseofrequestexecution.

2.1. InitialMeasurements

Weconductedaseriesofstresstestsusingacommercialloaddrivertogeneraterepeatedrequests totheservletengineatvariouslevelsofconcurrency(simulatedusers).Thetestconfigurationconsisted ofaWindowsNTserverrunningtheloadgenerationscripts(driver),aSolarisserverrunningthefront-end software,andaSolarisserverrunningtheback-endapplication(forthistest,aPOP3/IMAP4mailserver andanLDAPdirectoryserver).Hardware(CPU,memory,disk,I/O)andsoftwareresourceconsumptions weremeasuredonallmachines.Inaddition,end-to-enduser-perceivedresponsetimesweremeasured.

4 Thedriverscriptsemulatedaprescribednumberofconcurrentusersrepeatedlygeneratingthesame request(e.g.,readmessage,sendmessage,etc.).Thenumberofconcurrentsimulateduserswasvaried from1to20.Thenumberofrepeatedrequestsperuserateachconcurrencylevel(typically2000)was sufficient to achieve statistical stability. The tests wererunin“stress”mode;thatis,assoonasauser receivesaresponse,itimmediatelysubmitstherequest(i.e.,withnegligibleclientdelay).Eachofthe

N simulated users does so independently and in parallel. As a result, the concurrency level (i.e., the numberofrequestsinthesystem)equalsNatalltimes.

The results of the stress tests for a particular request type (read a 20KB message) are shown in

Figures2-2and2-3.Inparticular,Figure2-2plotstheend-to-endresponsetimeontheleft-handaxis, andthefront-endCPUutilizationontheright-handaxis,asafunctionoftheconcurrencylevel.Ascanbe seen,theresponsetimecurvebeginstoridealongalinearasymptote(shownbythedottedline)afteronly

7concurrentusers.Thatis,responsetimeincreasesproportionallywiththenumberofusers,indicating saturation in a closed system [1]. Additionally, CPU utilization levels off after 11 users at 65-70%

(indicatinganon-CPUsystembottleneck).

1 0 0 R e s p o n s e t i m e 9 0 C P U u t i l i z a t i o n 8 0 E M I 7 0 ) T % ( E S N

N 6 0 O I O T P A S

5 0 Z E I L R I T e 4 0 U z i U l a P C m 3 0 r o N 2 0

1 0

0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 2 0 # C O N C U R R E N T U S E R S Figure2-2:StressTestResults

5 O r ig i n a l I m p l e m e n t a t i o n P e r f o r m a n c e - O p t i m iz e d E M I T E S N O P S E R d e z i l a m r o N

00 1 0 . 5 2 1 3 14. 5 5 2 6 2 . 5 7 38 T H R O U G H P U T ( r e q u e s t s / s e c ) FigurFeig2u-3re:E2q-4u:ivOapletnimtLizoeaddRTeessutltRsesults

Equivalently,Figure2-3plotsend-to-endresponsetimeasafunctionofthroughput(requests/sec).As canbeseen,themaximumsystemthroughputpeaksatabout2requests/sec,andthendegradesunder overload to about 1½ requests/sec. In other words, there was actually a drop in capacity of 25% (a

“concurrencypenalty”),likelyduetocontextswitching,object/threadmanagement,garbagecollection,etc.

Asizinganalysisbasedontheexpectedcustomergrowthandusagebehavior,togetherwiththese initialcapacityresults,suggestedthattheresultinghardwarecostswouldbeprohibitivelylarge.Itwasalso clearthatthescalabilityofthisapplicationwaspoor.Theresourceconsumptionresultsdemonstratedthat theapplicationcouldnotmakefulluseoftheresourcesavailabletoit(especiallyCPU).Thus,scalability wouldhavetobeachievedbyaddingmoreservers,ratherthanmoreCPUs.

Thefirstquestiontobeansweredwaswhetherornottheapplicationwasimplementedinthemost optimal manner. Theinitialphaseoftheperformanceenhancementeffortwasinthisdirection.Several key optimizations were performed on the Java servlet code, and several critical bottlenecks were discoveredandresolvedintheend-to-endarchitecture(describedinaseparatepaper[1]).Theresulting improvementinthereadrequestthroughputisshowninFigure2-4.

Sizinganalysisontheimprovedandoptimizedcodeshowedthattheresourcerequirementswerestill quitesubstantial,andtheapplicationstillhitaprematurewall(intermsofscalability).TheCPUcontinued toleveloff,nowatabout90%.Asaconsequencetothisanalysis,anadditionaleffortwaslaunchedtore- assess the choice of the dynamic technology itself. The plan was to analyze applications that had the

6 same functionality, but were implemented in different languages or technologies, within identical environments. The technologies chosen, including the initial version, were Java servlets, Java Server

Pages(JSP),CommonGatewayInterface(CGI)withC++programs,andFastCGIwithC++programs.

ThefollowingsectiongivesabriefintroductiontotheseWeb-programmingtechnologies.

3. DYNAMICWEBPLATFORMS

TherearemyriaddifferenttechnologiesemployedtoproduceHTMLfilesdynamically,afterthearrivalof an HTTP request. CGI, FastCGI, Java Servlets, JSP, Active Server Pages (Windows NT only), PHP

Hypertext Preprocessor, JavaBeans, and Enterprise JavaBeans are among the many technologies currently used for dynamic processing of Web requests. In this effort we choose to focus on four technologies: CGI,FastCGI,JavaServletsandJSP.Thefollowingsubsectionsgiveabriefoverviewof eachofthesetechnologies.

3.1. CGI

CommonGatewayInterface(CGI)[2]wastheearliesttechnologyusedtoserviceanHTTPrequestby aprogram,ratherthanbysendingastaticfile.Inthistechnology,aprogramiswritteninanylanguage(C,

C++,,shell,etc.)forprocessingauserrequestandgeneratingoutputinabrowser-viewableformat

(HTML, GIF, etc.). When a request arrives with a URL pointing to the CGI program, the Web server creates a separate process running that particular CGI program. The Web server sets various environmentvariableswithrequestinformation(remotehost,HTTPheaders,etc.),andsendsuserinput

(form variables)totheCGIprocessoverstandardinput.TheCGIprocesswritesitsoutputtostandard output.Whentheprocessingiscomplete,theCGIprocessexits,andtheWebserversendstheoutputto thebrowser.Whenthenextrequestcomesin,allofthesestepsarerepeated.

CGI performance is affected greatly by the overhead of process creation every time a dynamic requestisserved.InUnix,theoperatingsystemcreatestheCGIprocessbyfirstcloningtheWebserver process(whichisalargeprocess)andthenstartingtheCGIprogramintheaddressspaceofthatclone.

This requires a lot of system resources and is the main reason for performance problems in this technology.

7 3.2. FastCGI

FastCGI[3]addressestheprocesscreationprobleminCGIbyhavingprocessesthatarepersistent.In thistechnology,theWebservercreatesaprocessforservingdynamicrequestseitheratstartuporafter the first dynamic request arrives. After creation, this process waits for a request from the Web server.

When the Web server gets a request that requires dynamic processing, it creates a connection to the

FastCGI process (over a pipe, if local, or TCP/IP if on a remote machine), and sends all the request informationonthisconnectiontotheprocess.Afterprocessingisdone,theFastCGIprocesssendsoutput backonthesameconnectiontotheWebserver(usingaWebserverAPI),whichinturnforwardsittothe client browser. The Web server then closes the connection, and the FastCGI process goes back to waitingforanewconnectionfromtheWebserver.Thus,FastCGIessentiallyeliminatestheoverheadof processcreation,potentiallyimprovingCGIperformancegreatly.

3.3. JavaServlets

Javawasfirstintroducedasatechnologyforsendingplatform-independentcodefromaWebserver toaWebbrowser,wherethese“Javaapplets”wouldruninsideanenvironmentcalledtheJavaVirtual

Machine(JVM).AsJavagrewinpopularity,thescopeofJavagrewtofitthedemand.“Server-sideJava applets”–Javaservlets[4]–wereintroduced,allowingdeveloperstouseJavatowriteprogramsonthe server that process dynamic requests. The main advantage of these servlets was the use of Java’s naturalmulti-threadingarchitecture.Whenarequestcomesin,anewJavathreadiscreatedtoexecute theservletprogram.TheservletaccessesuserinformationusingtheservletAPI,processesitandsends theoutputbacktotheWebserver,usingaspecialAPItointerfacewiththeWebserver.TheWebserver returnsthisoutputtothebrowser.ServletsarequitepopularbecausetheyarebasedinJavatechnology, andofferallofthefeaturesthatmakeJavaprogrammingsopopular.Initially,theywerealsoclaimedto have solved the performance problems of CGI, since thread creation is much more lightweight than processcreation.

3.4. JavaServerPages

Java Server Pages (JSP) [5] is a technology that allows programmers to cleanly separate the presentation part of an HTML file from the information part, which is created dynamically. A JSP file contains standard HTML code, interspersed with the JSP code that specifies how to “fill in” the

8 “placeholders”forthedynamiccode.Thisseparationoffunctionalityaddressesadifficultyfacedinservlet orCGIdevelopment,wheretheprogramshadtoreturnthedynamicallygeneratedcontentinHTMLformat soastobedisplayedbytheWebbrowser.Thatis,theHTMLformattingwaspartoftheservletcode,and anychangestotheHTMLdesignmeantchangingtheservletcodeitself.Developerssolvethisproblemby writingHTMLtemplatefileswith“tags”(thetechniqueusedinthisapplication),thatareprocessedbythe servlet, and replaced with dynamically generated information. However, this implied that each servlet developmentteamcameupwithitsowncustomtags.Now,JSPoffersade-factostandardfordoingjust that.AJSPfileiscompiledintoaservlet,sointheenditusesJavaservlettechnology,butwithadifferent programminginterface.

4. TESTINGENVIRONMENT&METHODOLOGY

As described in Section 2, the messaging application was originally built using a page generation engineprogrammedinJavaservlets.Thepagegenerationenginewasaspecificcaseofsoftwarethat canbetermedastemplateserverprograms.TemplateserverprogramsuseWebtemplatestospecifythe lookandfeelofaWebpage,andemploytagsthatarereadandinterpretedbytheserverprogram.For ourperformancecomparisoneffort,wechoseothersuchtemplateserverprograms.

For the CGI case, an existing C++ CGI template server program was that implemented the same functionality. The relevant changes (optimizations) were made to the original servlet implementation to bring the differences down to the essential technology differences. For example, in these tests, the servletsdonotusetheXMLinterfacefordataexchange,butcalleachotherusingmethodcallsandpass objects internally. The FastCGI implementation was done specifically for the testing, by minimally changingtheCGIC++code.

JSPisessentiallyastandardizedJavatemplateservertechnology.AnewJSPimplementationwas doneforthepurposeoftesting.

Theenvironmentusedtodotheperformancetestswasasfollows:

• Serverhardware:2x360MhzSunUltra-60,with0.5GBmemory.

• Network:GigabitEthernet.

• Serversoftware:Solaris2.6,JDK1.2.1,Jrun2.3.3servletengine.

9 • Webservers:4.1andApache1.3.121.

• Loadgenerationclient:PCwithWindowsNT.

• Loadgenerationsoftware:SilkPerformer™3.6.

Thesetupofeachofthetestswasasfollows:

• Allserverswerestoppedandre-startedbeforeeachtest.

• Eachvirtualuser(programmedinSilkPerformer)doesatypicalsessionwheretheuserlogsin,lists

messages, reads some messages of varying sizes and then logs out. There are no think times

betweenthetransactionsissuedbytheuser.Assoonastheuserprocessgetsaresponseback,it

issuesthenexttransaction.

Theteststartedwithoneuserandaddedauserevery10minutes,increasingupto30users(enoughto stressthefastestofourimplementations).

4.1. PerformanceMeasurement

Measurements were taken at both the client and the server. Client measurements were recorded usingSilkPerformer,whileservermeasurementswererecordedusingstandardUnixmeasurementtools

(sar,netstat,mpstat,ps).

Thethroughput(insessions/second)attheclientwascalculatedintheSilkPerformeruserscriptby countingthetotalnumberofsessionscompletedbyallusersinthe10-minutewindowduringwhichthe numberofusersremainedthesame,anddividingthisnumberby600seconds.Theresponsetimes(to completeonesession)wererecordedandanaverageinthe10-minutewindowwascalculated.

Ontheserverside,10-secondsnapshotsofservermeasurementsweretaken(CPU,memory),andwere used to produce the 10-minute averages. (Taking snapshots on a smaller scale allowed us to see transientstatesoftheserverresources.)

1Forimplementationreasons,wehadtousetwodifferentWebserversintheperformancetests(FastCGIcouldworkonly withtheApacheserver).TheCGIandFastCGItestsarewiththeApacheWebserver,andtheJavatestsarewiththeNetscape

Webserver.Sanitychecksweredonetoconfirmthatthisvariationinthetestenvironmentwouldnotaffectthemajorconclusionsof thisstudy.Specifically,theCGIimplementationwasmeasuredinboththeNetscapeaswellastheApacheenvironment,anddid notshowmajordifferencesthatwouldaffectanyconclusionsderivedinthispaper.

10 4.2. Converting“StressTests”to“LoadTests”

Using the averaged measurements in the 10-minute windows gives us the measurements correspondingtoincreasingnumbersofusers:i.e.responsetimewith1,2,…,30users,orCPUutilization with1,2,…,30users.However,justknowinghowmanyvirtualusersweregeneratingloadontheserver doesnotnecessarilytellustheactualloadontheserverinitscorrectunit:sessionrequests/second(e.g. seeFigures2-2and2-3).Forexample,ifasystemhasreacheditssaturationpoint,addingusersmaynot proportionallyincreasethesessionsetuprateattheserver(orevenatall).Thisisbecausetheusersare essentiallysloweddownbythesystemresponsetime,andcanonlygenerateloadasfastasthesystem capacitypermits[1].Itisimportant,therefore,toderivetherealmetricofload:thesessionrequestssent totheserverperunittime,andplotperformancemeasuresagainstthismetric.

Inourcase,theSilkPerformerscriptwasprogrammedtorecordthenumberofsessionscompleted during a 10-minute window corresponding to a certain number of users, therefore we had direct measurementsofthesessions/second(i.e.thethroughput).

In the plots depicted in the following section, each sample point (average response time, CPU utilization, page faults, etc.) corresponds to the average measurement from the 10-minute window corresponding to a certain number of users. However, the unit of the X-axis is the corresponding throughput (in sessions/second), not the number of users. Readers should keep this in mind while studyingthegraphs,assomeofthesamplepointsareclusteredtogether.Thisclusteringoccursbecause aftersystemsaturation,althoughthenumberofuserscanbeincreased,thethroughputdoesnotincrease

(itmay,infact,decrease).

4.3. Analysisapproach

Thepurposeofthisstudywasnotonlytounderstandwhichimplementationperformedbest,butalso topinpointthebottleneckresourcesforeachtechnology,andtogaininsightintowhyonetechnologyout- performedtheother.Foreachimplementation,wecarriedoutthefollowingsteps:

• Determinedthemaximumachievablethroughputusingstresstests

• Ifpossible,foundthebottleneckresource,andverifiedthattheresourcecapacitywasequaltothe

maximumthroughputachievedbythatimplementation.

11 • Wheretheexactbottleneckresourcewasnotidentifiable,weeliminatedresourcesthatcouldnot

bebottlenecks.

Wealsousedthetwoapplicationstounderstandtheeffectofthecomplexityoftheapplications.Forboth thecases,theabovestepsweredone.

ThemeasurementswetookwereCPUtime,andpagefaultrate,tomonitortheusageoftheprocessor andmemoryresource,respectively.Togainfurtherinsight,wealsostudiedtheusertimeandsystemtime usedbyeachimplementation,separately.

Theresultsandinsightsaredescribedinthenextsection.

5. RESULTSANDCOMPARISON–MESSAGINGAPPLICATION

Themainmetricusedtocomparethetechnologieswasthemaximumachievablethroughput,gated byaresponsetimecriterion.Figure5-1showstheaveragesessionresponsetimevs.sessionthroughput.

Therearetwoobservationsonemaymakefromthischart.First,thethroughputthresholdsuptowhichthe systemisessentiallystable,withresponsetimeswithin“reasonablebounds”andincreasinggracefullyare asfollows:

Technology Throughput Javaservlet 0.6sessions/second JSP 0.8sessions/second CGI 2.0sessions/second FastCGI 2.5sessions/second Table5-1:Maximumthroughputunderstablecondition

Above these thresholds, response times increase sharply, indicating that the system has become unstable.

12 ResponseTimevs.Throughput

60.0 Javaservlet 50.0 JavaServerPages CGI 40.0

s FastCGI d

n 30.0 o c e

s 20.0

10.0

0.0 0.00 0.50 1.00 1.50 2.00 2.50 sessions/second

Figure5-1:ResponseTimevs.Throughput

Second,givenaperformancerequirementof10secondsaveragesessionresponsetime,thethroughput atwhichthisrequirementcanbemet,foreachcaseis:

Technology Throughput Javaservlet 0.66sessions/second JSP 0.86sessions/second CGI 2.1sessions/second FastCGI 2.6sessions/second Table5-2:Maximumthroughputgatedbyresponsetimerequirement

Inthiscaseitturnsoutthatthethresholdsatwhichthesystemsbecomeunstablearereachedbefore theresponsetimecriterionisviolated.Thecapacityofthesystemshouldbedeterminedbytheminimum of these two, so Table 5-1 shows the capacity of each of the technologies, in terms of maximum achievablethroughput.FastCGI/C++istheclearwinnerinthismetric,followedbyCGI/C++,JSPandJava servlets, in that order. The following analysis tries to understand why this happens and what the bottlenecksencounteredbyeachtechnologyare.

13 5.1. BottleneckAnalysis

Severalsystemresourcesarepotentialbottlenecks:CPU,memory,network,back-endsystems,the loadgenerator,softwareresourcessuchasthreadlimits,buffersizes,andJava-specificresources,etc.In thefollowingsectionsweexaminethepossibilityofeachoftheseresourcesbeingthebottleneck.

CPU

WestartwithalookattheCPU.Figure5-2showsCPUutilizationplottedagainstthroughput,foreach case.WeseefromthechartthatboththeCGIandFastCGIimplementationswereabletomakefulluse oftheCPU.However,boththeJSPandJavaservletimplementationsuseuptoabout90%oftheCPU, afterwhichthethroughputstartsdroppingandtheCPUutilizationstartsdropping,althoughmoreusers arebeingadded.Thisdropaloneisnotasymptomofaproblem(CPUutilizationshouldalwaysincrease anddecreasewiththroughput).HoweverifyouobserveFigure5-1again,youcanseethatatthepoints where CPU utilization stops increasing, the response times continue increasing sharply,asmoreusers areadded.ThisleadstotheconclusionthattheJavatechnologiesreachanon-CPUbottleneck.

CPUUtilizationvs.Throughput

1.20

1.00 s y s 0.80 + r s u 0.60 : l i t

u Javaservlet 0.40 u JavaServerPages p c CGI 0.20 FastCGI

0.00 0.00 0.50 1.00 1.50 2.00 2.50 sessions/second

Figure5-2:CPUutilizationvs.throughput

14 CPUtimepersessionvs.Throughput

3000 Javaservlet n o i 2500 JavaServerPages s

s CGI e s

2000

r FastCGI e p

s 1500 c e s i l

l 1000 i m

U 500 P C

0 0.00 0.50 1.00 1.50 2.00 2.50 sessions/second

Figure5-3:CPUmspersession SomeinsightintouseoftheCPUresourcecanbegainedbycalculatingtheactualCPUtimeusedby eachsession.Ifρ istheutilizationandλ isthethroughput,thenthistimeisgivenbyτ = ρ/λ.Figure5-3 shows τ (in ms) used by each session, plotted against the throughput. This is an interesting chart; it shows quite a steady plot for the CGI/FastCGI implementations, but a highly variable one for the Java technologies. Also, asymptotically, the Java implementations show a trend of increasing CPU time per session,asthenumberofusersincreases,suggestingthatexecutiontimeintheJavaimplementationsis greatlyinfluencedbyfactorssuchasthesizeoftheJavaheap.

Technology AverageCPUmspersession(τ) XFastCGIms Javaservlet 2468ms 3.2 JSP 2131ms 2.8 CGI 981ms 1.3 FastCGI 765ms 1.0 Table5-3:AverageCPUtimepersession

TheaverageCPUtimepersessionforallimplementationsisshowninTable5-3(alongwiththefactor bywhichthisisworsethanthefastestimplementation).InthecaseoftheCGI/C++implementation,the

15 averagevalueofτ is981ms.Sincethereare2CPUsintheservermachine,theoverallCPUthroughput is2×1000/981=2.04sessions/second,whichcorrespondsalmostexactlytothemaximumthroughputas determinedinTable5-1.Thus,theCPUisthebottleneckresourceintheCGIimplementation.

A similaranalysisforFastCGIshowsCPUthroughputtobe2×1000/765=2.61sessions/second, whichagaincorrespondstothemaximumthroughputforFastCGIasdeterminedinTable5-1.Thus,the

CPUisthebottleneckresourcefortheFastCGIimplementation.

TheaverageCPUtimepersessionfortheservletimplementationis2468ms,whichimpliesthatthe

CPU throughput should be 0.8 sessions/second. This is higher than the achieved throughput of 0.6 sessions/second, and further confirms that the servlet implementation runs into a non-CPU bottleneck.

Similarly,theCPUthroughputoftheJSPimplementationis0.93,whichagainishigherthantheachieved throughputof0.8sessions/second.

AfurtheranalysisofhowtheCPUtimesareactuallyusedisshowninTable5-4.

Timespentinsystemmode Actualmsspent Actualmsspent Technology as%ofoverallCPUutilization insystemmode inusermode Javaservlet 7% 173 2295 JSP 9% 192 1939 CGI 51% 500 481 FastCGI 44% 337 428 Table5-4:BreakdownofCPUmillisecondsbyuservs.systemmode

This reaffirms the effects of the characteristics of each technology: most of the time in the CGI implementation is spent in creating the CGI process, which is done in system mode. Almost all of the improvement that FastCGI does is on this time (reducing it to 337 from 500). However, it still remains substantial, possibly due to the setting up and tearing down of connections, and due to some initial creationofApacheWebserverprocesses(untiltheyreachacertainmaximum).

Memory

SinceCPUwasnotthebottleneckintheJavaimplementations,weturnedourattentiontomemory bottlenecks.Ameasureofmemoryproblemsispagingactivity;specifically,pagefaults.Pagefaultsoccur eitherwhenmultipleprocessessharingapagetrytowritetoit,inwhichcaseacopyofthepagemustbe made,orwhenneededpageshavebeenreclaimedbytheoperatingsystem.Pagefaultactivityincreases whenaprocesssizebecomestoolarge,whenafileisreadintomemory,orwhenaprocessisstarted,

16 sincethisinvolvesbringingcodeintomemory.Figure5-4showspagefaults/secondvs.sessions/second.

ThechartshowsthatinfactitistheCGIimplementationthathasthehighestindicatorsofpagingactivity.

(Note that the Y-axis has a logarithmic scale). The paging activity for the FastCGI and the Java technologies is an order of magnitude lower than that of CGI. The high paging activity in the CGI implementationisclearlyduetotheprocesscreationthatoccurswheneveradynamicpagegeneration requestarrives.TheinitialincreaseinpagingactivityintheFastCGIcaseisduetothecreationofApache

Webserverprocesses.Theactivitylevelsoffoncethespecifiedmaximumnumberofprocesseshasbeen created.SincetheCGIandFastCGIimplementationsshowedhigherthroughputs,weruledoutpagingas thebottleneckintheJavaimplementations.

PageFaults/secvs.Throughput

10000.00 n o i 1000.00 s s

e Javaservlet s r JavaServerPages e

p 100.00 CGI s t l FastCGI u a f

e 10.00 g a P

1.00 0.00 0.50 1.00 1.50 2.00 2.50 sessions/second

Figure5-4:PagingComparison

Otherresources

FortheCGI/FastCGIC++implementations,theCPUistheclearbottleneck.InthecaseoftheJava implementations, we can rule out the capacity of the back-end mail and directory servers, the load generationsystem,andtheLANasthebottleneck,sinceahigherthroughputwasachievedbytheCGI technologies.Thisanalysispointstoonlyonepossibility,i.e.thatofasoftwarebottleneck.SincetheJava

17 implementationswerehighlyoptimized,weconcludedthatthesewerenotduetopoorimplementation,but rathertobottlenecksinherentintheJavatechnology.

6. RESULTSANDCOMPARISON–TRIVIALAPPLICATION

TheanalysisinSection5showedthattimespentintheJavaservletswasmainlyinusermode,and althoughthistechnologyimprovessubstantiallyontheprocesscreationoverhead,itlostthatbenefitdue tothepoorperformanceoftheactualapplicationcode.Thiswasmostlikelybecausetheapplicationbeing studiedwasacomplexone,involvingparsingtext,connectingtoremotemachines,encrypting/decrypting

(ofsessioncookies),etc.Itisfair,therefore,tocomparethesetechnologiesinanotherscenario,where theactualprocessingdonewassimple.Asanextremecase,wecarriedoutasetofperformancetestsin thecasewhereeachofthedynamictechnologieswasusedtrivially,tosimplyreturnastaticfile.

Figure6-1showsachartofresponsetimevs.throughputfortwotechnologiesonwhichtestswere done:Javaservlet,andC++CGI.Asessioninthistestincludesthesametransactionsasintheearlier test,exceptthatthefilesreturnedasaresultofthetransactionarestaticHTMLfiles.

ResponseTimevs.Throughput

18.0

16.0

14.0 Javaservlet CGI 12.0 s

d 10.0 n o

c 8.0 e

s 6.0

4.0

2.0

0.0 0 1 2 3 4 5 6 7 8 sessions/second

Figure6-1:ResponseTimevs.ThroughputComparison–trivialcomputation

18 Table 6-1 shows the maximum achieved throughput under stable conditions by each of the technologies.ThistableshowsaveryinterestingreversaloforderofperformancecomparedtoSection52.

Infact,itconfirmstheexpectationthatifprocessingisnotdominatedbyuser-modeprocessing,andis simpleenoughthatitdoesnotexposeunderlyingJavabottlenecks,thenJavaservletscanbeveryfast.

Technology MaximumThroughput(sessions/sec) JavaServlet 7.2 CGI 2.8 Table6-1:MaximumThroughput-trivialcomputation

CPUUtilizationvs.Throughput

1 0.9

0.8 s y

s 0.7 +

r 0.6 s u

0.5 : l i

t 0.4 u

u 0.3 p

c 0.2 Javaservlet CGI 0.1

0 0 1 2 3 4 5 6 7 8 sessions/second

Figure6-2:CPUutilizationvs.throughput–trivialcomputation

Figure 6-2 shows that in this case, it is the C++ CGI implementation that runs intoanon-CPU bottleneck (CPU utilization peaks at 93%, then flattens out, even as throughput degrades). The Java servlet in this case drives the CPU to its maximum capacity. Figure 6-3 also shows the CPU time per

2Notethatthetestsinthissectionwerecarriedoutunderasomewhatdifferentconfigurationthanthemessagingapplication tests (Netscape 3.6, instead of Netscape 4.1 and Apache Web servers). Therefore, absolute results from these tests (such as throughput, CPU milliseconds, page faults, etc.) should not be compared with the corresponding tests in Section 5. We only comparetestswithinthesesectionsandtherelativeorderofperformanceofthetechnologiesacrossthetwotypesoftests.

19 session vs. throughput. As the throughput rate increases, the CPU time per session used by the CGI implementationincreases,whereastheJavaservletCPUtimepersessionlevelsoff.

Table6-2showsthebreakdownoftheCPUtimeintousermodeandsystemmodecomputation.The

CGI-C++ implementation spends an average of 80% of its processing time in the system mode. The expectationisthatthisisduetotheexcessiveamountofprocesscreationthatthisimplementationneeds todo.Figure6-4confirmsthis:thepagefaultrateoftheCGIimplementationistwoordersofmagnitude largerthanthatoftheservletimplementation.Clearly,thesystemisthrashing–itisspendingallitstimein pagingactivitywhilespendinglessertimedoingusefulwork,resultinginthethroughputdroppingsharply.

CPUtimeper %oftotalCPUtime Actualmsin Actualmsin Technology transactioninms spentinsystemmode systemmode usermode Servlet 237 38% 90 147 CGI 715 81% 579 136 Table6-2:CPUmillisecondsspentinuservs.systemmode:trivialcomputation

CPUtimepersessionvs.Throughput

1000 900 n o i Javaservlet

s 800

s CGI e 700 s r

e 600 p

s 500 c e

s 400 i l l i 300 m 200 U P

C 100 0 0 1 2 3 4 5 6 7 8 sessions/second

Figure6-3:CPUmspersession–trivialcomputation

20 PageFaults/secvs.Throughput

100000.00

n 10000.00 o i s s e s

1000.00 r e p s t l 100.00 u a f e

g 10.00 Javaservlet a

P CGI

1.00 0 1 2 3 4 5 6 7 8 sessions/second

Figure6-4:PagingActivityComparison

7. SUMMARYANDINSIGHTS

Thetwosuitesofteststhatwerecarriedoutprovidedimportantinsightsintotheperformancebehaviorof different Web programming technologies. In summary, the following explains why the order of performanceseenwasthewayitwasinthetestingofthemessagingapplication:

• Java servlets were the worst performing because the implementation and the technology involved

essentiallythreestepsofinterpretationofaprogramatdifferentlevels:

1. Theproprietaryscriptinglanguagethatwas“interpreted”bytheJavaservlet.

2. TheJavaservletthatwasinterpretedby(oratleastraninside)aJavaVirtualMachine.

3. TheJavaVirtualMachinethatitselfranonthehostCPU.

The existence of these three layers contributes to the large amount of CPU used. Additionally, the

JavaVirtualMachinelayercontributestothenon-CPUbottlenecksobservedintheanalysis.

• JavaServerPagesimprovedonJavaservletsbecauseittookoutthefirstofthethreestepslisted

above.IntheJSPimplementation,afilewiththeHTMLscript,andtheJavacodethatconstitutesthe

dynamicpartofthefile,arecompiledasawholeintooneJavaservlet.Thus,whenarequestarrives,

21 theservletruns,fetchingthenecessarydatafromtheback-endsystems,butitnolongerinterpretsa

templatefile.ThisreducestheCPUoverhead,whichresultsinJSPperformancebeingbetterthanthe

servlet/template combination. However, since JSP is a Java technology, it eventually runs into the

samesoftwarebottlenecksatthevirtualmachinelevelthattheservletimplementationencounters.

• TheCGI/C++implementationbasicallydoessteps1and3ofthestepsoutlinedabove(inadifferent

way).Itisnativecodeinterpretingatag-basedtemplatefile,whichproducesanHTMLfileasaresult.

Nothavingtogothroughthe2ndlayerreducedtheCPUoverheadgreatly.Additionally,thereareno

implicitsoftwareresourcesormechanismsthatbecomebottlenecks.Thus,thisimplementationwas

abletousetheCPUmoreefficientlyandmorefully(drivingittoitsmaximumcapacity).However,the

CPUspendsalotoftimeinsystemmodedoingworkrelatedtocreationoftheCGIprocesses.

• FastCGI eliminates the CGI overhead of recurrent process creation, and consequently reduces the

pagingoverhead.TheCPUoverheadisreduced,andFastCGIimprovesperformanceoverCGI.Asin

CGI,FastCGIisunencumberedbyanyofthesoftwarebottlenecksthatarecharacteristicoftheJava

technology.Therefore,FastCGIisthebestperformingofthetechnologiescompared.

Insummary,thereasonwhytheperformanceorderwasreversedinthetrivialapplicationtestwasthat the actual processing was much smaller compared to the overhead associated with each technology.

SincetheJavaservlettechnologyhastheleastpre-computationoverhead(creatingathreadinsteadofa process),itwasthebestperforminginthistest.

8. CONCLUSIONS

Weusedstresstestingandmeasurementmethodologytodeterminetheperformanceprosandcons ofusingaparticulartechnologytoimplementatemplateserver.Inthisstudy,FastCGI/C++turnedoutto bethebestchoiceofdynamicWebplatformfromtheperformancepointofview.Thisisbecausemost applications supporting a major service will likely be complex, thus exposing the software bottlenecks inherentinJavatechnology.OntheotherhandthisstudyshowedthatiftheWebprogramsaresimple, thentheperformancepenaltiesthatJavaimposesaresmall,andJSPmaybetherightchoice.

In reality, in most of the Web-based services, all priorities are subject to an overall cost-benefit analysis, not focused just on performance. Product management teams may set performance requirementsinitially;however,ifaJava-basedtechnologyofferslessdevelopmenttime,fewerproblems

22 tosystemintegrationanddeploymentteams,andwidersupport,thentherequirementsmayberelaxed.

Further, Java technology is constantly improving, and any measurement study should be updated with newer versions of the technology. In any scenario, however, comparative performance measurement studies are always called for - this work was illustrativeofhowsuchastudycanbedoneusingstress testing,measurement,andbottleneckanalysis.

ACKNOWLEDGEMENTS

TheauthorsthankMehdiHosseini-Nasab,Wai-SumLai,andPatriciaWirthfortheirhelpfulcomments.

REFERENCES 1. Reeser, “Using Stress Test Results to Drive Performance Modeling: A Case Study in “Gray-Box” VendorAnalysis”,submittedtothe17thInternationalTeletrafficCongress(ITC),September2001,Brazil. 2. “The Common Gateway Interface”, Internet document by the NCSA Software Development Group, http://www.w3.org/CGI. 3.“FastCGI:AHighPerformanceWebServerInterface”,TechnicalWhitePaperbyOpenMarket,Inc., http://www.fastcgi.com/devkit/doc/fastcgi-whitepaper/fastcgi.htm,April1996. 4. “An Introduction to Java Servlets”, by Hans Bergsten, The Web Developers Journal, http://www.webdevelopersjournal.com/articles/intro_to_servlets.html,March1999. 5.“JavaServerPages”,http://java.sun.com/products/jsp.

23