Increasing the Throughput of a Node.js Application Running on the Heroku Cloud App Platform
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2016

NIKLAS ANDERSSON
ALEKSANDR CHERNOV

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

Abstract

The purpose of this thesis was to investigate whether using the Node.js Cluster module in a web application running in an environment with limited resources (the Heroku Cloud App Platform) could increase the application's throughput and, if so, how substantial the increase was. This was done by load testing an example application both with and without the module. In both scenarios, the traffic sent to the application varied from 10 requests/second to 100 requests/second. For the tests conducted on the application using the module, the number of worker processes varied between 1 and 16. The tests were first conducted in a local environment in order to establish any increase in throughput under stable conditions; where notable differences in throughput were found, the same tests were then conducted on the Heroku Cloud App Platform. Each test targeted one of two types of tasks performed by the application: I/O bound or CPU bound.

The test results showed that using the Cluster module did not increase throughput in either environment when the application was performing I/O bound tasks. However, when the application was performing CPU bound tasks, it led to an increase of ≥20% when the traffic sent to the application in the local environment was 10 requests/second or higher. The same increase was seen in the Heroku environment when the traffic sent to the application was 50 requests/second or higher.
The conclusion was therefore that using the module would be useful for the company at which this thesis work took place if an application deployed on Heroku were exposed to higher traffic.

Keywords
Throughput, Node.js, Heroku, Performance, Increasing

Abstract (translated from Swedish)

The purpose of this degree project was to investigate whether using the Node.js Cluster module in a web application in an environment with limited resources (the Heroku Cloud App Platform) could lead to an increase in the application's throughput and, if an increase occurred, how large it was. This was done by load testing an example application with and without the module. In both scenarios, the traffic sent to the application varied between 10 and 100 requests/second. For the tests performed on the application using the module, the number of worker processes varied between 1 and 16. The tests were first performed in the local environment, with the aim of establishing any throughput increase in a stable environment; if there were notable differences in the application's throughput, the same tests were also to be performed on the Heroku Cloud App Platform. Each test also aimed to test one of two types of tasks performed by the application: I/O bound or CPU bound.

From the test results it could be established that the Cluster module did not lead to any increase in throughput in either environment when the application performed I/O bound tasks. When the application performed CPU bound tasks, however, it led to an increase of ≥20% when the traffic was 10 requests/second or higher. In the Heroku environment, the same increase could be seen only once the traffic reached 50 requests/second or higher. The conclusion was thus that using the module would be useful for the company where the work was carried out if an application deployed on Heroku were exposed to what was considered higher traffic.
Keywords (translated from Swedish)
Throughput, Node.js, Heroku, Performance, Increase

Table of Contents

Abstract (in English)
Abstract (in Swedish)
Table of Contents
1 Introduction
    1.1 Background
        1.1.1 Increasing Throughput
        1.1.2 Node.js
        1.1.3 The Heroku Cloud App Platform
        1.1.4 Web Applications
    1.2 Problem
    1.3 Research Questions
    1.4 Purpose
    1.5 Delimitations
    1.6 Disposition
2 Theoretical Background
    2.1 The Company Platform
    2.2 Heroku Dyno
    2.3 I/O vs. CPU bound
    2.4 The Inner Workings of Node.js
    2.5 Increasing Throughput in Node.js Using the Cluster Module
    2.6 Related Work
3 Research Process
    3.1 Research Methodology
    3.2 Process Overview
        3.2.1 Problem Definition
        3.2.2 Data Collection
        3.2.3 Design & Implementation
        3.2.4 Defining the Testing Environments
        3.2.5 Creating Test Plan
        3.2.6 Results and Analysis
        3.2.7 Evaluation
    3.3 Hypotheses
4 Analysis: How to Increase Throughput
    4.1 Our approach
        4.1.1 Different Implementations of the Cluster Module
        4.1.2 Clustering Method Chosen When Creating the Application Template
    4.2 The Application Template
        4.2.1 CPU Usage
        4.2.2 Workload
        4.2.3 Memory Usage
    4.3 Test Application
5 Analysis: Benchmarking the Test Application
    5.1 Testing Environment
        5.1.1 Local Environment
        5.1.2 Heroku Environment
        5.1.3 The Test Application's Memory Usage
    5.2 Testing Tools
        5.2.1 Apache JMeter
        5.2.2 Heroku Metrics
    5.3 Creating the Test Plan
    5.4 Local Tests
        5.4.1 I/O Bound
        5.4.2 CPU Bound
    5.5 Heroku Tests
        5.5.1 Throughput Rates
        5.5.2 Memory Usage
        5.5.3 Median Response Times
        5.5.4 Analysis of Heroku Test Results
6 Discussion
    6.1 Our Methodology and Consequences of the Study
    6.2 Discussion and Conclusions
        6.2.1 Recommendations Concerning the Application Template
    6.3 Ethics
    6.4 Sustainability
    6.5 Future Work
References
Appendix 1 Heroku Dyno CPU Information
Appendix 2 The Test Application
Appendix 3 The Application Template
Appendix 4 The Local Server CPU Specifications
Appendix 5 Results from I/O Bound Tests in Local Environment
Appendix 6 Results from CPU Bound Tests in Local Environment
Appendix 7 Results from CPU Bound Tests on Heroku

1 Introduction

Today, virtually every company with a presence on the Internet collects data concerning its customers in some form [1]. With a large collection of customer profiles, it is possible to gather information about each customer's geographical area, the products the customer has viewed, the devices the customer is using, and so on. With this data, customer communication can be improved, marketing can be optimized (through a more well-targeted flow of information), and all customer information can be stored in a single virtual space.

Data can come from different sources: web analytics tools, login processes, email, etc. It may also be necessary to collect data from different physical nodes: the data might be located in different data warehouses and can even be administered by different third-party companies. For a large company, the collected data may grow very large and there may be many daily transactions. It is therefore important that these transactions are consistent, that data is preserved, and that the application can handle as much traffic as possible. One way of making sure that the application can do this is to ensure that it can handle as many requests per unit of time as possible. The application is then able to serve more clients, which lowers the risk of a client not receiving the requested data.

1.1 Background

Innometrics, the company at which the project took place, is active within the area described above. Its product helps other companies personalize their marketing strategies by collecting data from a customer's different data warehouses,