Workshop: Resilience Testing
Geoffrey Arij van der Tas
Mark Abrahams

About us!

About Today
• Set up our lab environment
• Theory:
  • What is Resilience?
  • Resilience
  • How to test Resilience?
    • Load Testing
    • Resilience Testing
    • Automated Resilience Testing

What is Resilience?
“Resilience is the ability of a system to withstand a major disruption within acceptable degradation parameters and to recover within an acceptable time and composite costs and risks.”

Becoming Resilient
Infra examples:
• Load balancing
• Stand-by servers
Software examples:
• Re-try pattern
• Circuit breaker pattern
People & Processes examples:
• Yearly datacenter switch-over
• Stand-by shifts

To become Resilient
• Prevent: Measures you take to make sure problems do not occur
• Resist: Stand firm and withstand the problems that do occur
• Remediate: Correct the fault so that the service works properly again
• Recover: Restore the service as quickly as possible
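The Software examples listed earlier, the re-try and circuit breaker patterns, are Resist measures. In a real service they live in application code or a resilience library; the minimal bash sketch below only illustrates the logic. The function names `retry` and `breaker_call`, the threshold of 3 failures and the 30-second cool-down are illustrative assumptions, not part of the workshop material.

```bash
#!/usr/bin/env bash
# Minimal sketches of two "Resist" patterns. Illustrative only.

# Re-try pattern: run CMD up to MAX_ATTEMPTS times, doubling the wait
# between attempts (exponential back-off). Delays are whole seconds.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS CMD [ARGS...]
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Circuit breaker: after CB_FAILURE_THRESHOLD consecutive failures the
# breaker opens and calls are rejected immediately (exit status 2) for
# CB_OPEN_SECONDS, so a struggling dependency is not hammered further.
# State lives in shell variables, so call breaker_call in the current
# shell, not inside $(...).
CB_FAILURE_THRESHOLD=3
CB_OPEN_SECONDS=30
cb_failures=0
cb_opened_at=0   # epoch seconds when the breaker opened; 0 means closed

breaker_call() {
  local now
  now=$(date +%s)
  if (( cb_opened_at > 0 )); then
    if (( now - cb_opened_at < CB_OPEN_SECONDS )); then
      echo "circuit open: call rejected" >&2
      return 2
    fi
    # Cool-down is over ("half-open"): let this one call through as a trial.
    cb_opened_at=0
  fi
  if "$@"; then
    cb_failures=0          # a success closes the breaker again
    return 0
  fi
  cb_failures=$((cb_failures + 1))
  if (( cb_failures >= CB_FAILURE_THRESHOLD )); then
    cb_opened_at=$now      # trip (or re-trip after a failed trial) the breaker
  fi
  return 1
}
```

For example, `retry 5 1 curl -fsS http://OBTAINED_IP:8080/` tries up to five times, waiting 1, 2, 4 and 8 seconds between attempts. The back-off matters: retrying instantly against a service that is already overloaded makes the overload worse, which is exactly the situation the circuit breaker is meant to cut off.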
Now it is up to you...

Our App
[Diagram: one VM running the Front End, API and DB]

Installing the lab environment from the HDD
• Install VirtualBox
• Install the JDK
• Install PuTTY or another SSH client
• Copy Gatling
• Install Notepad++ or an IDE/other Scala editor
• Make sure JAVA_HOME is set to the JDK installation directory

Set up the carshare VM server
• Get the .ova file from the USB drive
• Import the .ova file by clicking it, or via VirtualBox -> File -> Import Appliance
• Make sure the network adapter is set to bridged mode:
  • Settings -> Network -> from the “Attached to” dropdown choose “Bridged Adapter”
• Start the VM and log in with user “root” and password “root”
• Run the command “ifconfig” and note the IP address of the “eth0” interface
• Run the command “start-carshare”
• In a browser, navigate to “http://{{OBTAINED_IP}}:8080”
• Connect to the VM over SSH with PuTTY: host {{OBTAINED_IP}}, port 22

Load Testing
“Load testing is the process of putting demand on a system and measuring its response”
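Gatling does this at scale, but the core idea (a number of concurrent virtual users repeatedly calling the system while you count what succeeds) fits in a few lines of bash. This is a toy sketch, not a replacement for Gatling: `run_load` is a hypothetical helper, and it only counts successes and failures rather than recording per-request response times.

```bash
#!/usr/bin/env bash
# Toy load generator (NOT a replacement for Gatling).
# Runs CMD from USERS parallel "virtual users", each ITERATIONS times,
# prints "<ok> <failed>" totals on stdout and the elapsed time on stderr.
# Usage: run_load USERS ITERATIONS CMD [ARGS...]
run_load() {
  local users=$1 iterations=$2
  shift 2
  local start=$SECONDS tmp u
  tmp=$(mktemp -d)
  for ((u = 1; u <= users; u++)); do
    # Each background subshell plays one virtual user.
    (
      ok=0 failed=0
      for ((i = 1; i <= iterations; i++)); do
        if "$@" >/dev/null 2>&1; then
          ok=$((ok + 1))
        else
          failed=$((failed + 1))
        fi
      done
      echo "$ok $failed" >"$tmp/user$u"
    ) &
  done
  wait   # wait until every virtual user has finished
  echo "elapsed: $((SECONDS - start))s" >&2
  # Sum the per-user counts into one "<ok> <failed>" line.
  awk '{ ok += $1; failed += $2 } END { print ok + 0, failed + 0 }' "$tmp"/user*
  rm -rf "$tmp"
}
```

With the lab VM running, `run_load 20 50 curl -fsS -o /dev/null http://OBTAINED_IP:8080/` would put 20 concurrent users on the front end (the URL assumes the front end answers a plain GET on port 8080). Gatling adds what this sketch lacks: realistic user journeys, ramp-up profiles and response-time percentiles in its report.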
Most Popular Tools: JMeter, Gatling, NeoLoad & LoadRunner.
We are going to use: Gatling.

Forms of Performance Testing
• Load Testing
• Spike Testing
• Stress Testing
• Endurance Testing

Load testing
Baseline scripts:
• Load test
• Stress test
Load Test
Script: carshareBrowse.Load
Users: 20
Runtime: 5 minutes
Go to Gatling\user-files\simulations\carshare and open the file called load (with Notepad++ or an IDE with a Scala plugin)
Change .baseUrl("http://OBTAINED_IP:8080"), replacing OBTAINED_IP with the IP address of your VM

Stress Test
Script: carshareBrowse.Stress
Runtime: 5 minutes
Users: 1000+
Go to Gatling\user-files\simulations\carshare and open the file called stress (with Notepad++ or an IDE with a Scala plugin)
Change .baseUrl("http://OBTAINED_IP:8080"), replacing OBTAINED_IP with the IP address of your VM

Resilience Testing
“Resilience testing measures the ability to absorb the impact of a problem in one or more parts of a system.”
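To measure that absorption, inject a fault while the system is being exercised and count how many requests still succeed. A minimal bash sketch, assuming the fault command runs on the VM itself and the carshare front end answers a plain GET on port 8080; `probe_during_fault` is a hypothetical helper, not part of stress or Gatling.

```bash
#!/usr/bin/env bash
# Run a FAULT command (given as one string) in the background and, for as
# long as it runs, execute PROBE every INTERVAL seconds.
# Prints "<ok> <failed>" probe counts.
# Usage: probe_during_fault INTERVAL "FAULT COMMAND" PROBE [ARGS...]
probe_during_fault() {
  local interval=$1 fault=$2
  shift 2
  local ok=0 failed=0 fault_pid
  # Start the fault injection in the background.
  bash -c "$fault" >/dev/null 2>&1 &
  fault_pid=$!
  # Keep probing while the fault process is still alive.
  while kill -0 "$fault_pid" 2>/dev/null; do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    sleep "$interval"
  done
  # Reap the fault process; its exit status is not used here.
  wait "$fault_pid" 2>/dev/null
  echo "$ok $failed"
}
```

Run from the SSH session on the VM, `probe_during_fault 1 "stress -c 1 -t 20" curl -fsS -o /dev/null http://localhost:8080/` probes the app once per second during a 20-second CPU spike. Repeating the run with `"sleep 20"` as the fault gives a baseline to compare against; running the Gatling load script at the same time adds realistic background load.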
Tools: Stress, Nstress

Resilience testing
Scripts: Load test
CPU Spike on VM
# Stress CPU with one worker process for 20 seconds
stress -c 1 -t 20

Memory Spike on VM
# Stress memory with five workers (malloc/free loops) for 20 seconds
stress -m 5 -t 20

IO Spike on VM
# Stress IO with one worker (sync loop) for 60 seconds
stress -i 1 -t 60

Network Failure on VM
# Short network outage
/etc/init.d/networking restart

Automated Resilience Testing
Most Popular Tools: Chaos Monkey, Gremlin.
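Chaos Monkey-style tools automate one specific fault: terminating instances (on Kubernetes, pods) at random while the system keeps running. The hand-rolled bash sketch below is not Chaos Monkey itself; it only shows the mechanism with plain kubectl. The namespace and label selector in the usage example are assumptions about how the carshare app might be deployed.

```bash
#!/usr/bin/env bash
# Hand-rolled pod-failure chaos with plain kubectl (NOT Chaos Monkey itself).
# Every INTERVAL seconds, delete one random pod that matches SELECTOR in
# NAMESPACE, ROUNDS times; the Deployment is expected to recreate it.
# Usage: kill_random_pods NAMESPACE SELECTOR INTERVAL ROUNDS

# Print one randomly chosen line from stdin (nothing if stdin is empty).
pick_random() {
  local lines=()
  mapfile -t lines
  if (( ${#lines[@]} == 0 )); then
    return 0
  fi
  printf '%s\n' "${lines[RANDOM % ${#lines[@]}]}"
}

kill_random_pods() {
  local namespace=$1 selector=$2 interval=$3 rounds=$4 round victim
  for ((round = 1; round <= rounds; round++)); do
    # "-o name" prints one entry per pod, in the form "pod/<name>".
    victim=$(kubectl get pods -n "$namespace" -l "$selector" -o name | pick_random)
    if [[ -z $victim ]]; then
      echo "no pods match '$selector' in namespace '$namespace'" >&2
      return 1
    fi
    echo "round $round: deleting $victim"
    kubectl delete -n "$namespace" "$victim" --wait=false
    sleep "$interval"
  done
}
```

A call such as `kill_random_pods carshare app=carshare-api 60 5` (hypothetical namespace and label) deletes one pod per minute for five minutes. Running it alongside the Gatling load script, once with a single replica of the service and once with several, shows why replicas matter: with one replica every deletion is an outage, with several the remaining pods keep serving. Dedicated tools add what this sketch leaves out, such as scheduling, opt-in targeting and safety limits.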
We are going to use: Chaos Monkey for Kubernetes.

Test app as monolith
Test results: normal load, normal circumstances
Test app as microservice (non HA)
Test results: normal load, pod failure chaos
Test app as microservice (HA)
Test results: normal load, pod failure chaos (HA)
[Figures: deployment diagrams and test result charts]

Resilience testing
Why: To help you build more stable applications that perform well in real-life situations;
To avoid: Service loss, Data loss, Customer loss!
How: By creating chaos, testing ‘what if’ scenarios and investigating what happens;
What: Be creative and use, for example, the following tools.

Tools you can use
• Stress tools: Nstress, Stress
• Resilience tools: Chaos Monkey, Gremlin
• Application load tools: Gatling, LoadRunner, JMeter, NeoLoad
• Monitoring: ELK stack, Dynatrace, Prometheus

Credentials:
Geoffrey van der Tas
https://www.linkedin.com/in/geoffreyvdtas/
@gavdtas
http://geoffreyvdtas.com

Mark Abrahams
https://www.linkedin.com/in/mark-abrahams-b8218129/
Want to read more?

• https://gatling.io/
• https://usersnap.com/blog/resilience-testing/
• https://www.ibm.com/cloud/blog/resilience-testing-insights-from-the-pros
• https://medium.com/netflix-techblog/fit-failure-injection-testing-35d8e2a9bb2
• https://medium.com/netflix-techblog
• https://www.gremlin.com/community/tutorials/chaos-engineering-the-history-principles-and-practice/
• http://geoffreyvdtas.com/blog (a blog with more information will follow soon), or https://drive.google.com/open?id=19UC8m9eLyrViGLdkg7hn70IngIoqIWO1 to download the material from the workshop