Workshop: Resilience Testing

Geoffrey Arij van der Tas
Mark Abrahams

About us!

About Today

• Set up our lab environment
• Theory:
  • What is Resilience?
  • How to test Resilience?
• Resilience Testing
• Automated Resilience Testing

What is Resilience?

“Resilience is the ability of a system to withstand a major disruption within acceptable degradation parameters and to recover within an acceptable time and composite costs and risks.”

Becoming Resilient

Examples: Infra
• Load Balancing
• Stand-by servers

Examples:
• Re-try Pattern
• Circuit Breaker Pattern

Examples: People & Processes
• Yearly Datacenter Switch-over
• Stand-by shifts

To become Resilient

• Prevent: Measures you take to make sure problems do not occur
• Resist: To stand firm, to withstand problems that occur
• Remediate: To correct the problem, or to make the service right again
• Recover: To restore the service as quickly as possible
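
The Re-try Pattern listed among the examples is one simple way to resist a transient problem: instead of failing on the first error, the caller tries again after a short pause. The following is a minimal sketch in POSIX shell, not taken from the workshop material; the function name `retry` and its arguments are assumptions made for illustration.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The delay doubles after every failed attempt (exponential backoff).
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$retry_attempt" -ge "$retry_max" ]; then
            echo "giving up after $retry_attempt attempts" >&2
            return 1
        fi
        echo "attempt $retry_attempt failed, retrying in ${retry_delay}s" >&2
        sleep "$retry_delay"
        retry_delay=$((retry_delay * 2))
        retry_attempt=$((retry_attempt + 1))
    done
}

# Example (hypothetical target): try up to 5 times, starting with a 1 second pause.
# retry 5 1 curl -fsS "http://OBTAINED_IP:8080"
```

Capping the number of attempts matters: unlimited retries against a service that is already struggling add load and can make the disruption worse, which is the problem the Circuit Breaker Pattern addresses.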

[Diagram: Prevent, Resist, Remediate, Recover]

Now it is up to you..

Our App

[Diagram: one VM hosting the Front End, API and DB]

Installing Lab environment from HDD

• Install VirtualBox
• Install JDK
• Install PuTTY or another SSH client
• Copy Gatling
• Install Notepad++ or an IDE/other Scala editor
• Make sure you set your JAVA_HOME to the JDK

Set up the carshare VM server

• Get the .ova file from the USB drive
• Import the .ova file by clicking it or via VirtualBox -> File -> Import Appliance
• Make sure the network connection is set to bridged mode:
  • Settings -> Network -> from the “Attached to” dropdown choose “Bridged Adapter”
• Start the VM and log in with user “root” and password “root”
• Run command “ifconfig” and get the IP from the “eth0” interface
• Run command “start-carshare”
• Navigate in a browser to the URL “http://{{OBTAINED_IP}}:8080”
• Connect to the VM via PuTTY at {{OBTAINED_IP}}:22

Load Testing

“Load testing is the process of putting demand on a system and measuring its response”
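
To make "putting demand on a system and measuring its response" concrete, here is a minimal sketch in POSIX shell. It is not how Gatling works internally: it sends requests one at a time with curl, whereas a load tool simulates many concurrent users. The function names `time_requests` and `summarize_times` are assumptions made for illustration.

```shell
#!/bin/sh
# summarize_times: reads one response time (in seconds) per line on stdin
# and prints the number of samples plus the min, mean and max.
summarize_times() {
    awk '
        NR == 1 { min = $1; max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            printf "samples=%d min=%.3f mean=%.3f max=%.3f\n", NR, min, sum / NR, max
        }'
}

# time_requests URL COUNT: sends COUNT sequential GET requests with curl
# and prints the total time of each request in seconds, one per line.
time_requests() {
    time_url=$1
    time_count=$2
    time_i=0
    while [ "$time_i" -lt "$time_count" ]; do
        curl -s -o /dev/null -w '%{time_total}\n' "$time_url"
        time_i=$((time_i + 1))
    done
}

# Example: 20 requests against the lab application.
# time_requests "http://OBTAINED_IP:8080" 20 | summarize_times
```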

Most popular tools: JMeter, Gatling, NeoLoad & LoadRunner.

We are going to use: Gatling.

Forms of Performance Testing

• Load Testing
• Spike Testing
• Stress Testing
• Endurance Testing

Load testing

Baseline scripts:
• Load test
• Stress test

[Diagram: one VM hosting the Front End, API and DB]

Load Test

Script: carshareBrowse.Load
Users: 20
Runtime: 5 minutes

Go to Gatling\user-files\simulations\carshare
Open the file called load (with Notepad++ or an IDE with a Scala plugin)

Change .baseUrl("http://OBTAINED_IP:8080") so that OBTAINED_IP is the IP address of your VM

Stress Test

Script: carshareBrowse.Stress
Runtime: 5 minutes
Users: 1000+

Go to Gatling\user-files\simulations\carshare
Open the file called stress (with Notepad++ or an IDE with a Scala plugin)
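
Both simulation files need the same edit to .baseUrl, so it can also be scripted instead of done by hand. The sketch below assumes a Unix-like shell with sed is available (for example on the VM or in Git Bash) and that the file contains the literal placeholder OBTAINED_IP; the function name `set_base_url` and the example path are hypothetical.

```shell
#!/bin/sh
# set_base_url FILE IP: replaces every OBTAINED_IP placeholder in FILE with IP.
# Goes through a temporary file so it works with both GNU and BSD sed.
set_base_url() {
    sim_file=$1
    sim_ip=$2
    sim_tmp=$(mktemp) || return 1
    if sed "s/OBTAINED_IP/$sim_ip/g" "$sim_file" > "$sim_tmp"; then
        # Overwrite in place to keep the original file permissions.
        cat "$sim_tmp" > "$sim_file"
    fi
    rm -f "$sim_tmp"
}

# Example (hypothetical path and address):
# set_base_url path/to/simulation-file 192.168.1.50
```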

Change .baseUrl("http://OBTAINED_IP:8080") so that OBTAINED_IP is the IP address of your VM

Resilience Testing

“Resilience testing measures the ability to absorb the impact of a problem in one or more parts of a system.”

Tools: Stress, Nstress

Resilience testing

Scripts: Load test

[Diagram: one VM hosting the Front End, API and DB]

CPU Spike on VM

# Stress CPU with one worker process for 20 seconds
stress -c 1 -t 20

Memory Spike on VM

# Stress memory with five worker processes for 20 seconds
stress -m 5 -t 20

IO Spike on VM

# Stress IO with one worker process for 60 seconds
stress -i 1 -t 60

Network Failure on VM

# Short network outage
/etc/init.d/networking restart

Automated Resilience Testing
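
The manual spikes shown so far can be chained into one script, which is the simplest form of automation: start the load test, then let the script inject one fault after another. This is a minimal sketch, assuming it runs as root on the VM with stress installed; the function names `run_step` and `inject_spikes` and the DRY_RUN switch are assumptions made for illustration.

```shell
#!/bin/sh
# run_step DESCRIPTION COMMAND [ARGS...]
# Logs a timestamped description, then runs the command.
# With DRY_RUN=1 the command is only printed, which is useful for a rehearsal.
run_step() {
    step_desc=$1
    shift
    echo "$(date '+%H:%M:%S') $step_desc: $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

# inject_spikes PAUSE_SECONDS: runs the four faults one after another, with a
# pause in between so each effect can be told apart in the load test report.
inject_spikes() {
    spike_pause=$1
    run_step "CPU spike" stress -c 1 -t 20
    sleep "$spike_pause"
    run_step "Memory spike" stress -m 5 -t 20
    sleep "$spike_pause"
    run_step "IO spike" stress -i 1 -t 60
    sleep "$spike_pause"
    run_step "Network outage" /etc/init.d/networking restart
}

# Example: rehearse first, then run for real with 30 seconds between faults.
# DRY_RUN=1 inject_spikes 0
# inject_spikes 30
```

The logged timestamps make it possible to line up each fault with the response times in the Gatling report.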

Most popular tools: Chaos Monkey, Gremlin.

We are going to use: Chaos Monkey for Kubernetes.

Test results – normal load, normal circumstances

Test app as monolith
[Diagram: Application]

Test app as microservice (non HA)

Test results – normal load, pod failure chaos

Test app as microservice (HA)

Test results – normal load, pod failure chaos, HA

Resilience testing

Why: To help you build more stable applications that perform well in real-life situations.

To avoid: Service loss, Data loss, Customer loss!

How: By creating chaos, testing ‘what if’ scenarios and investigating what happens.
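
One ‘what if’ scenario from the Kubernetes tests is: what if a pod suddenly disappears? The sketch below creates that kind of chaos by hand with kubectl. It is not how Chaos Monkey for Kubernetes is implemented; the function names and the namespace `carshare` are assumptions made for illustration.

```shell
#!/bin/sh
# pick_random_line: reads lines on stdin and prints one of them at random.
pick_random_line() {
    awk 'BEGIN { srand() }
         { lines[NR] = $0 }
         END { if (NR > 0) print lines[int(rand() * NR) + 1] }'
}

# kill_random_pod NAMESPACE: deletes one randomly chosen pod in NAMESPACE.
# If the pod belongs to a Deployment, Kubernetes should start a replacement.
kill_random_pod() {
    chaos_ns=$1
    chaos_pod=$(kubectl get pods -n "$chaos_ns" -o name | pick_random_line)
    if [ -z "$chaos_pod" ]; then
        echo "no pods found in namespace $chaos_ns" >&2
        return 1
    fi
    echo "deleting $chaos_pod"
    kubectl delete "$chaos_pod" -n "$chaos_ns"
}

# Example (hypothetical namespace): kill one pod per minute during a load test.
# while true; do kill_random_pod carshare; sleep 60; done
```

Running this against the non-HA and the HA deployment under the same load shows whether the extra replicas actually absorb the failure.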

What: Be creative and use, for example, the following tools.

Tools you can use

Load/Stress tool
- Nstress
- Stress

Resilience Stress tool
- Chaos Monkey
- Gremlin

Application Load
- Gatling
- LoadRunner
- JMeter
- NeoLoad

Monitoring
- ELK stack
- Dynatrace
- Prometheus

Credentials:

Geoffrey van der Tas
https://www.linkedin.com/in/geoffreyvdtas/
[email protected]
@gavdtas
http://geoffreyvdtas.com

Mark Abrahams
https://www.linkedin.com/in/mark-abrahams-b8218129/
[email protected]

Want to read more?

• https://gatling.io/
• https://usersnap.com/blog/resilience-testing/
• https://www.ibm.com/cloud/blog/resilience-testing-insights-from-the-pros
• https://medium.com/netflix-techblog/fit-failure-injection-testing-35d8e2a9bb2
• https://medium.com/netflix-techblog
• https://www.gremlin.com/community/tutorials/chaos-engineering-the-history-principles-and-practice/
• http://geoffreyvdtas.com/blog (a blog with more information will follow soon)
• https://drive.google.com/open?id=19UC8m9eLyrViGLdkg7hn70IngIoqIWO1 (to download the material from the workshop)