Resilience Testing
Workshop: Resilience Testing
Geoffrey Arij van der Tas
Mark Abrahams

About us!

About Today
• Set up our lab environment
• Theory:
  • What is Resilience?
  • Resilience
• How to test Resilience?
  • Load Testing
  • Resilience Testing
  • Automated Resilience Testing

What is Resilience?
“Resilience is the ability of a system to withstand a major disruption within acceptable degradation parameters and to recover within an acceptable time and composite costs and risks.”

Becoming Resilient
• Infra. Examples: Load Balancing, Stand-by servers
• Software. Examples: Re-try Pattern, Circuit Breaker Pattern
• People & Processes. Examples: Yearly Datacenter Switch-over, Stand-by shifts

To become Resilient
• Prevent: Measures you take to make sure problems do not occur
• Resist: To stand firm, to withstand problems that occur
• Remediate: To correct the problem and make the service right again
• Recover: To restore the service as quickly as possible
(Diagram: Prevent, Resist, Remediate, Recover)

Now it is up to you..

Our App
(Diagram: VM containing Front End, API, DB)

Installing Lab environment from HDD
• Install VirtualBox
• Install JDK
• Install PuTTY or another ssh client
• Copy Gatling
• Install Notepad++ or an IDE/other Scala editor
• Make sure you set your JAVA_HOME to the JDK

Setup carshare VM server
• Get the .ova file from the USB drive
• Import the .ova file by clicking it or via VirtualBox -> file -> import appliance
• Make sure the network connection is set to bridged mode:
  • Settings -> network -> network -> from dropdown choose “Bridged Adapter”
• Start the VM and log in with user “root” and password “root”
• Run command “ifconfig” and get the IP from the “eth0” interface
• Run command “start-carshare”
• Navigate in a browser to url “http://{{OBTAINED_IP}}:8080”
• Connect to the VM via PuTTY at {{OBTAINED_IP}}:22

Load Testing
“Load testing is the process of putting demand on a system and measuring its response”
Most popular tools: JMeter, Gatling, NeoLoad & LoadRunner.
We are going to use: Gatling.
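The load testing definition above has two halves: putting demand on a system, and measuring its response. The sketch below illustrates both in a few lines of Python. It is not Gatling and not one of the workshop scripts. `fake_request` is a hypothetical stand-in for an HTTP call to the application, and the function names, user count and the nearest-rank 95th percentile are all assumptions made for illustration.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request():
    """Hypothetical stand-in for one HTTP request to the system under test."""
    time.sleep(0.01)


def timed_call(request):
    """Run one request and return its response time in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def run_load(request, users, requests_per_user):
    """Put demand on the system: `users` concurrent workers, each sending
    `requests_per_user` requests. Returns every measured response time."""
    def user_session(_):
        return [timed_call(request) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        sessions = pool.map(user_session, range(users))
        return [t for session in sessions for t in session]


def summarize(timings):
    """Measure the response: request count, mean, and 95th percentile
    (nearest-rank method, computed with integer arithmetic)."""
    ordered = sorted(timings)
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "requests": len(ordered),
        "mean_s": statistics.mean(ordered),
        "p95_s": ordered[p95_index],
    }


if __name__ == "__main__":
    print(summarize(run_load(fake_request, users=10, requests_per_user=5)))
```

Swapping `fake_request` for a real HTTP call would turn this into a very small load generator. A dedicated tool such as Gatling adds what the sketch leaves out: ramp-up profiles, scenarios, and reporting.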
Forms of Performance Testing
• Load Testing
• Spike Testing
• Stress Testing
• Endurance Testing

Load testing
Baseline scripts: Load test, Stress test
(Diagram: VM containing Front End, API, DB)

Load Test
Script: carshareBrowse.Load
Users: 20
Runtime: 5 minutes
• Go to Gatling\user-files\simulations\carshare
• Open the file called load (with Notepad++ or an IDE with a Scala plugin)
• Change .baseUrl("http://OBTAINED_IP:8080") so that it uses the IP you obtained

Stress Test
Script: carshareBrowse.Stress
Runtime: 5 minutes
Users: 1000+
• Go to Gatling\user-files\simulations\carshare
• Open the file called stress (with Notepad++ or an IDE with a Scala plugin)
• Change .baseUrl("http://OBTAINED_IP:8080") so that it uses the IP you obtained

Resilience Testing
“Resilience testing measures the ability to absorb the impact of a problem in one or more parts of a system.”
Tools: Stress, Nstress

Resilience testing
Scripts: Load test
(Diagram: VM containing Front End, API, DB)

CPU Spike on VM
# Stress CPU with one worker process for 20 seconds
stress -c 1 -t 20

Memory Spike on VM
# Stress memory with five worker processes for 20 seconds
stress -m 5 -t 20

IO Spike on VM
# Stress IO with one worker process for 60 seconds
stress -i 1 -t 60

Network Failure on VM
# Short network outage
/etc/init.d/networking restart

Automated Resilience Testing
“Resilience testing measures the ability to absorb the impact of a problem in one or more parts of a system.”
Most popular tools: Chaos Monkey, Gremlin.
We are going to use: Chaos Monkey for Kubernetes.

Test results – normal load, normal circumstances
Test app as monolith application
Test app as microservice (non HA)
Test results – normal load, pod failure chaos
Test app as microservice (HA)
Test results – normal load, pod failures chaos HA

Resilience testing
Why: To help you build more stable applications that perform well in real-life situations; To avoid: Service loss, Data loss, Customer loss!
How: By creating chaos, testing “what if” scenarios and investigating what happens;
What: Be creative and use, for example, the following tools:

Tools you can use
Load/Stress tool
- Nstress
- Stress
Resilience Stress tool
- Chaos Monkey
- Gremlin
Application Load
- Gatling
- LoadRunner
- JMeter
- NeoLoad
Monitoring
- ELK stack
- Dynatrace
- Prometheus

Contact:
Geoffrey van der Tas
https://www.linkedin.com/in/geoffreyvdtas/
@gavdtas
http://geoffreyvdtas.com
Mark Abrahams
https://www.linkedin.com/in/mark-abrahams-b8218129/

Want to read more
• https://gatling.io/
• https://usersnap.com/blog/resilience-testing/
• https://www.ibm.com/cloud/blog/resilience-testing-insights-from-the-pros
• https://medium.com/netflix-techblog/fit-failure-injection-testing-35d8e2a9bb2
• https://medium.com/netflix-techblog
• https://www.gremlin.com/community/tutorials/chaos-engineering-the-history-principles-and-practice/
• http://geoffreyvdtas.com/blog (a blog with more information will follow soon) or https://drive.google.com/open?id=19UC8m9eLyrViGLdkg7hn70IngIoqIWO1 to download the workshop materials.