Performance Tuning NGINX
Amir Rawdat

Currently: Technical Marketing Engineer at NGINX, Inc.
Previously: Customer Applications Engineer at Nokia

Multi-Process Architecture with QPI Bus Topology

Topology

[Diagram: wrk client driving a single NGINX instance]
[Diagram: wrk client driving an NGINX reverse proxy in front of an NGINX web server]

Technical Specifications

Name                        # Sockets  # Cores per Socket  # Threads per Core  Model Name                                    RAM     OS             NIC
Client                      2          22                  2                   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz     128 GB  Ubuntu Xenial  40GbE QSFP+
Web Server & Reverse Proxy  2          24                  2                   Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz  192 GB  Ubuntu Xenial  40GbE QSFP+

Multi-Processor Architecture

#1 Duplicate NGINX Configurations

NGINX Configuration (Instance 1)

user root;
worker_processes 48;
worker_cpu_affinity auto 000000000000000000000000111111111111111111111111000000000000000000000000111111111111111111111111;
worker_rlimit_nofile 1024000;
error_log /home/ubuntu/access.error error;

...

NGINX Configuration (Instance 2)

user root;
worker_processes 48;
worker_cpu_affinity auto 111111111111111111111111000000000000000000000000111111111111111111111111000000000000000000000000;
worker_rlimit_nofile 1024000;
error_log /home/ubuntu/access.error error;

...
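Read each mask right to left: every bit is one logical CPU, so instance 1 is bound to CPUs 0-23 and 48-71 and instance 2 to the complementary set, giving each instance its own NUMA node; the 48 workers per instance match the 48 logical CPUs set in each mask. Before writing such a mask, check which CPUs belong to which node; a minimal check (the CPU ranges shown are an assumed layout for this two-socket box, not guaranteed):

$ lscpu | grep 'NUMA node'
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95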

Deploying NGINX Instances

$ nginx -c /path/to/configuration/instance-1
$ nginx -c /path/to/configuration/instance-2
$ ps aux | grep nginx

nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx_0.conf
nginx: worker process
nginx: worker process

nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx_1.conf
nginx: worker process
nginx: worker process

$ pkill nginx
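To confirm that each instance's workers really landed on the intended NUMA node, the pinning can be inspected per worker; a minimal sketch (the process name matches the ps output above):

# Print the CPU affinity list of every NGINX worker process.
for pid in $(pgrep -f 'nginx: worker process'); do
    taskset -pc "$pid"
done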

#2 Additional NGINX Configuration Directives

Web Server (Instance 1)

events {
    worker_connections 1000000;
}

http {
    access_log off;
    keepalive_timeout 315;
    keepalive_requests 10000000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    server {
        listen 10.10.16.10:443 backlog=250000 reuseport;
        root /usr/share/nginx/bin;
    }
}

Web Server (Instance 2)

events {
    worker_connections 1000000;
}

http {
    access_log off;
    keepalive_timeout 315;
    keepalive_requests 10000000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    server {
        listen 10.10.11.23:443 backlog=250000 reuseport;
        root /usr/share/nginx/bin;
    }
}
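With reuseport, each worker process gets its own listening socket, and the backlog configured above is visible per socket; a quick sanity check with ss (on a listening TCP socket, the Send-Q column reports the configured accept backlog):

$ ss -ltn 'sport = :443'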

Reverse Proxy (Instance 1)

server {
    listen 10.10.10.18:443 ssl backlog=102400 reuseport;
    ssl_certificate /etc/ssl/certs/nginx.pem;
    ssl_certificate_key /etc/ssl/private/nginx.key;
    ssl_session_cache off;
    ssl_session_tickets off;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://webserver_0;
    }
}

upstream webserver_0 {
    server 10.10.10.11:80;
    keepalive 200;
}

Reverse Proxy (Instance 2)

server {
    listen 10.10.15.9:443 ssl backlog=102400 reuseport;
    ssl_certificate /etc/ssl/certs/nginx.pem;
    ssl_certificate_key /etc/ssl/private/nginx.key;
    ssl_session_cache off;
    ssl_session_tickets off;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://webserver_1;
    }
}

upstream webserver_1 {
    server 10.10.15.12:80;
    keepalive 200;
}
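Before driving load, a one-request smoke test against each proxy instance confirms TLS termination and the upstream path end to end; a minimal sketch with curl (-k because nginx.pem is presumably self-signed; 1kb.bin is a hypothetical test object under the web servers' root):

# Fetch one object through each proxy; print status code and latency.
for ip in 10.10.10.18 10.10.15.9; do
    curl -ks "https://$ip/1kb.bin" -o /dev/null \
         -w "$ip: %{http_code} in %{time_total}s\n"
done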

Performance Test Results


#3 Performance Tip: Sysctl Settings

Linux Sysctl Settings

• Increase memory thresholds to prevent packet dropping
  ◦ sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
  ◦ sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
• Increase the size of the processor input queue
  ◦ sysctl -w net.core.netdev_max_backlog=250000
• Set the maximum TCP buffer sizes
  ◦ sysctl -w net.core.rmem_max=4194304
  ◦ sysctl -w net.core.wmem_max=4194304

Linux Sysctl Settings

• Disable TCP timestamps
  ◦ sysctl -w net.ipv4.tcp_timestamps=0
• Widen the local port range that TCP and UDP use to choose source ports
  ◦ sysctl -w net.ipv4.ip_local_port_range="32768 60999"
• Enable reuse of TIME-WAIT sockets for new connections when it is safe from the protocol viewpoint
  ◦ sysctl -w net.ipv4.tcp_tw_reuse=1
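Settings written with sysctl -w are lost at reboot; to make the tuning persistent, the same keys can go into a sysctl.d drop-in (the file name below is an arbitrary choice):

# /etc/sysctl.d/90-nginx-tuning.conf -- loaded at boot, or on demand
# with `sudo sysctl --system`.
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_timestamps = 0
net.ipv4.ip_local_port_range = 32768 60999
net.ipv4.tcp_tw_reuse = 1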


#4 Performance Tip: Enabling RSS and TPS

Enabling RSS and TPS

• Stop the IRQ balancer daemon
  ◦ /etc/init.d/irqbalance stop
• Fetch the Mellanox driver scripts
  ◦ git clone https://github.com/ANLAB-KAIST/mlnx-en.git
  ◦ cd mlnx-en/ofed-scripts
• Set the IRQ affinity
  ◦ ./set_irq_affinity_bynode.sh
  ◦ set_irq_affinity -x local
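Whichever script sets the affinity, the result can be read back from /proc to confirm that each NIC queue's IRQ is pinned to a local core; a minimal sketch (eth2 is a placeholder for the actual 40GbE interface name):

# For each IRQ belonging to the NIC, print the CPUs it may fire on.
iface=eth2
grep "$iface" /proc/interrupts | awk -F: '{print $1}' | tr -d ' ' |
while read -r irq; do
    printf 'IRQ %s -> CPUs %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity_list)"
done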


Performance Test Results

Performance Numbers with NGINX and Intel QuickAssist

Performance with QuickAssist

Summary

• Deploy two NGINX instances
• Use additional NGINX configuration directives
• Tune Linux sysctl parameters
• Set IRQ affinity using RSS and TPS
• Consult the appendix for additional information and performance tips

Thank you

Contact: [email protected]

Appendix

Client Traffic Script

taskset -c 0-21,44-65 wrk -t 44 -c 1000 -d 180s -H 'Connection: Close' https://10.10.16.10:443/$1 >> output.txt &
taskset -c 22-43,66-87 wrk -t 44 -c 1000 -d 180s -H 'Connection: Close' https://10.10.11.23:443/$1 >> output1.txt &

## $1 is the requested static file size
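Because the script takes the object size as $1, sweeping several file sizes is just a loop around it; a sketch under the assumption that the script above is saved as client.sh and that matching static files exist under each web server's root:

# Run the traffic script once per object size; `wait` lets both
# backgrounded wrk processes finish before the next run starts.
for size in 1kb.bin 10kb.bin 100kb.bin; do
    ./client.sh "$size"
    wait
done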

Architectural Specifications

Technical Specifications

Name        # Sockets  # Cores per Socket  # Threads per Core  Model Name                                    RAM     OS             NIC
nbdw32      2          22                  2                   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz     128 GB  Ubuntu Xenial  40GbE QSFP+
Web Server  2          24                  2                   Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz  192 GB  Ubuntu Xenial  40GbE QSFP+

Technical Specifications

Name           # Sockets  # Cores per Socket  # Threads per Core  Model Name                                    RAM     OS             NIC
Client         2          22                  2                   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz     128 GB  Ubuntu Xenial  40GbE QSFP+
Reverse Proxy  2          24                  2                   Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz  192 GB  Ubuntu Xenial  40GbE QSFP+
Web Server     2          22                  2                   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz     128 GB  Ubuntu Xenial  40GbE QSFP+

Reverse Proxy (Instance 1)

user root;
worker_processes 48;
worker_cpu_affinity auto 000000000000000000000000111111111111111111111111000000000000000000000000111111111111111111111111;
worker_rlimit_nofile 1024000;
error_log /home/ubuntu/access.error crit;

events {
    worker_connections 1000000;
}

http {
    access_log off;
    keepalive_timeout 315;
    keepalive_requests 10000000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;


    server {
        listen 10.10.10.18:443 ssl backlog=102400 reuseport;
        ssl_certificate /etc/ssl/certs/nginx.pem;
        ssl_certificate_key /etc/ssl/private/nginx.key;
        ssl_session_cache off;
        ssl_session_tickets off;

        location / {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://webserver_0;
        }
    }

    upstream webserver_0 {
        server 10.10.10.11:80;
        keepalive 200;
    }
}

Reverse Proxy (Instance 2)

user root;
worker_processes 48;
worker_cpu_affinity auto 111111111111111111111111000000000000000000000000111111111111111111111111000000000000000000000000;
worker_rlimit_nofile 1024000;
error_log /home/ubuntu/access.error crit;

events {
    worker_connections 1000000;
}

http {
    access_log off;
    keepalive_timeout 315;
    keepalive_requests 10000000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;


    server {
        listen 10.10.15.9:443 ssl backlog=102400 reuseport;
        ssl_certificate /etc/ssl/certs/nginx.pem;
        ssl_certificate_key /etc/ssl/private/nginx.key;
        ssl_session_cache off;
        ssl_session_tickets off;

        location / {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://webserver_1;
        }
    }

    upstream webserver_1 {
        server 10.10.15.12:80;
        keepalive 200;
    }
}
