Amazon CloudFront Uses a Global Network of 216 Points Of

Tuning your cloud: Improving global network performance for applications

Richard Wade
Principal Cloud Architect
AWS Professional Services, Singapore

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Topics
- Understanding application performance: Why TCP matters
- Choosing the right cloud architecture
- Tuning your cloud

Mice: Short connections
Majority of connections on the Internet are mice
- Small number of bytes transferred
- Short lifetime
Expect fast service
- Need fast, efficient startup
- Loss has a high impact as there is no time to recover

Elephants: Long connections
Most of the traffic on the Internet is carried by elephants
- Large number of bytes transferred
- Long-lived single flows
Expect stable, reliable service
- Need efficient, fair steady-state
- Time to recover from loss has a notable impact over the connection’s lifetime

Transmission Control Protocol (TCP): Startup
Round trip time (RTT) = two-way delay (this is what you measure with a ping)
In this example, RTT is 100 ms
Roughly 2 * RTT (200 ms) until the first application request is received
The lower the RTT, the faster your application responds and the higher the possible throughput
[Figure: connection timeline between a client and the AWS Cloud, RTT 100 ms; labels: 1.5 * RTT, Connection setup, Connection established, Data transfer]

Transmission Control Protocol (TCP): Growth
A high RTT negatively affects the potential throughput of your application
For new connections, TCP tries to double its transmission rate with every RTT
This algorithm works for large-object (elephant) transfers (MB or GB) but not so well for small-object (mice) transfers
[Figure: segments (0 to 32) against time (0 to 7); labels: 1 packet, Increase cwnd, 2 packets, Increase cwnd, 4 packets]

Transmission Control Protocol (TCP): Loss
Most TCP algorithms use packet loss as a signal that the transmission rate has exceeded the bottleneck bandwidth on a given path
Most will reduce (by up to 50%) the transmission rate when loss or timeout is experienced, which can
have a dramatic effect on overall performance
[Figure: segments (0 to 32) against time (0 to 8); labels: Packet loss, Partial acknowledgement, Decrease cwnd, Retransmit lost packets, Acknowledgement]

Transmission Control Protocol (TCP): Recovery
After a loss or timeout event, most TCP algorithms enter a congestion avoidance phase
The transmission rate is increased linearly until further events are experienced
This means that recovery from loss is slow, again affecting overall flow performance
The rate of recovery is highly dependent on a connection’s RTT
[Figure: segments (0 to 32) against time (0 to 12); labels: Packet loss, Halve cwnd, Increment cwnd]

TCP: Impact of loss on throughput
[Figure: values from 0 to 100 plotted against loss rate from 0% to 10%]
Source: NET401: Network Performance: Making Every Packet Count, re:Invent 2017

TCP summary
1) Latency from user to cloud is important
   • Time to establish connection
   • Rate at which throughput accelerates
   • Limits the maximum potential throughput for a TCP connection*
2) Understand your application
   • Small objects and large objects have different requirements
   • Define your architecture and tune your infrastructure accordingly
3) TCP has many tunable parameters and variants
   More on this later
* https://en.wikipedia.org/wiki/Bandwidth-delay_product

Solutions: Three things you can influence
1) Latency from your applications to your users: Service Architecture
   Region selection, use of edge services
2) Throughput from your infrastructure: Infrastructure Design
   Using optimized instance types, edge services
3) Configuration of your infrastructure: Tuning
   Tuning infrastructure parameters to suit your application and deployed architecture

Latency: Move closer to your users

Amazon CloudFront: Improving latency to users
Amazon CloudFront uses a global network of 216 points of presence (205 Edge Locations and 11 Regional Edge Caches) in 84 cities across 42 countries

Amazon CloudFront: Improving latency to users
[Figure: viewer request and viewer response travel directly between users and the origin; RTT 150 ms]

Amazon CloudFront: Improving latency to users
[Figure: viewer request and response between users and the CloudFront cache, RTT 30 ms; origin request and response between the CloudFront cache and the origin, RTT 120 ms]

Amazon CloudFront: Improving latency to users
[Figure: the same path with User #1 and User #2 both connecting to the CloudFront cache; RTT 30 ms to the cache, RTT 120 ms from the cache to the origin]

Amazon CloudFront: Improving throughput
Consider the impact of reduced RTT on transfer time for small objects
Smaller RTT, faster increase, more throughput, faster transfer
[Figure: segments (0 to 32) against time (0 to 7); labels: Increase cwnd, Increase cwnd]

Throughput: Get data to your users faster

Why packet throughput matters
Packets per second (PPS) and maximum transmission unit (MTU)
- Each packet has processing overhead
- Small packets are typical of real-time systems or transactions
- Large packets increase the overall performance
- Jumbo MTU of 9001 available within VPC or VPC peers
[Figure: 1448 B payload compared with 8949 B payload]
Jumbo MTUs increase the usable data per packet
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

The AWS Nitro System
Nearly 100% of compute resources are available to customers’ workloads
[Figure: server with Nitro; labels: Improved throughput, Improved latency, Improved PPS]

The Nitro System enables performance improvements
Up to 4x improvement in instance network throughput
[Figure: bar charts of throughput (Gbps), latency (microseconds), and packets per second (millions); values as extracted: 90, 43, 5.05, 34, 27, 30, 28, 1.02, 0.52]
Source: Enterprise Strategy Group, 2019

The AWS Nitro System

T3 instances: up to 5 Gbps network performance

Model         vCPU   CPU credits/hour   Network performance (Gbps)
t3.nano       2      6                  Up to 5
t3.micro      2      12                 Up to 5
t3.small      2      24                 Up to 5
t3.medium     2      24                 Up to 5
t3.large      2      36                 Up to 5
t3.xlarge     4      96                 Up to 5
t3.2xlarge    8      192                Up to 5

Smaller sizes of C5, M5, R5: up to 10 Gbps network performance
Larger instance sizes have sustained 10 or 25 Gbps

Model         vCPU   Mem (GiB)   Network performance (Gbps)
c5.large      2      4           Up to 10
c5.xlarge     4      8           Up to 10
c5.2xlarge    8      16          Up to 10
c5.4xlarge    16     32          Up to 10
c5.9xlarge    36     72          10
c5.18xlarge   72     144         25

Smaller sizes of C5n: up to 25 Gbps network performance
C5n instances have sustained 50 or 100 Gbps

Model          vCPU   Mem (GiB)   Network performance (Gbps)
c5n.large      2      4           Up to 25
c5n.xlarge     4      8           Up to 25
c5n.2xlarge    8      16          Up to 25
c5n.4xlarge    16     32          Up to 25
c5n.9xlarge    36     72          50
c5n.18xlarge   72     144         100

AWS Outposts
• Industry standard 42U rack
• Fully assembled, ready to be rolled into final position
• Installed by AWS, simply plugged into power and network
• Centralized redundant power conversion unit and DC distribution system for higher reliability, energy efficiency, and easier serviceability
• Redundant active components, including top-of-rack switches and hot spare hosts

AWS Outposts
Nitro hardware and software in your data center
Access via standard AWS API and console

AWS Outposts
Deploy apps to AWS Outposts using AWS services

Improving both latency and throughput
Consider the impact of reduced RTT, lower risk of packet loss
Optimized compute and network performance
Smaller RTT, faster increase, more throughput, faster transfer
[Figure: four panels repeating the earlier plots: three of segments (0 to 32) against time, and one of values from 0 to 100 against loss rate from 0% to 10%]

Tune and optimize your cloud

Amazon Linux Kernel TCP tuning
[Figure: a VPC public subnet in US-EAST-1 connected to a VPC public subnet in AP-SOUTHEAST-1; RTT 220 ms]

Amazon Linux Kernel TCP tuning

Kernel setting                    Default                   Tuned value               Function
net.core.rmem_max                 212,992                   134,217,728               The maximum receive socket buffer size in bytes
net.core.wmem_max                 212,992                   134,217,728               The maximum send socket buffer size in bytes
net.ipv4.tcp_rmem                 4,096 87,380 6,291,456    4,096 87,380 67,108,864   Min, default, max TCP receive buffer size in bytes
net.ipv4.tcp_wmem                 4,096 20,480 4,194,304    4,096 65,536 67,108,864   Min, default, max TCP send buffer size in bytes
net.ipv4.tcp_congestion_control   Cubic                     BBR                       TCP congestion control algorithm name
net.core.default_qdisc            pfifo_fast                fq                        Queueing discipline algorithm name

Cubic: https://tools.ietf.org/html/rfc8312
BBR: https://research.google/pubs/pub45646/
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Example: Amazon Linux Kernel TCP tuning

    #!/bin/bash
    # Raise the maximum socket buffer sizes (bytes)
    sudo sysctl -w net.core.rmem_max=134217728
    sudo sysctl -w net.core.wmem_max=134217728
    # Min, default, max TCP receive and send buffer sizes (bytes)
    sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
    sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"
    # Enable TCP MTU probing when an ICMP black hole is detected
    sudo sysctl -w net.ipv4.tcp_mtu_probing=1
    # Use the fq queueing discipline
    sudo sysctl -w net.core.default_qdisc=fq

One size does NOT fit all. Experiment safely and methodically in a test environment.

Amazon Linux Kernel TCP tuning: 1 GB transfer

Kernel settings   Time to transfer 1 GB                                 Mbps   Retransmits
Default           ~150 seconds                                          ~55    0
Tuned             ~90 seconds (average of 10 tests was 108 seconds)     ~90    ~1000

Default run: 220 ms latency, no tuning, TCP Cubic
Tuned run: 220 ms latency, tuned kernel settings, TCP Cubic

Summary
In this session, we have shown:
1) How to improve application performance by positioning services closer to your users
2) Some examples of AWS services you can use to reduce latency and increase throughput
3) How application performance can be improved using kernel TCP tuning methods

AWS Training and Certification
- Explore tailored learning paths for customers and partners
- Build cloud skills with 550+ free digital training courses, or dive deep with classroom training
- Demonstrate expertise with an industry-recognized credential
- Find entry-level cloud talent with AWS Academy and AWS re/Start
aws.amazon.com/training

Thank you!
Richard Wade
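
Supplementary sketch: the TCP summary cites the bandwidth-delay product as the limit on a connection's throughput, and the kernel tuning slides raise the buffer maximums for a 220 ms path. The script below is a rough illustration of that arithmetic, not part of the original session. The function names `bdp_bytes` and `max_mbps` are hypothetical, the 1000 Mbit/s link rate is an assumed example, and treating the whole buffer as usable window is a simplification (the kernel reserves part of each buffer for overhead), so the results are upper bounds rather than predictions.

```shell
#!/bin/bash
# Hypothetical helpers for bandwidth-delay product (BDP) arithmetic.
# Integer math only; results are rounded down.

# bdp_bytes RATE_MBPS RTT_MS
# Bytes that must be in flight to keep a path of the given rate full.
bdp_bytes() {
  local rate_mbps=$1 rtt_ms=$2
  # Mbit/s -> bytes/s, then multiply by the RTT in seconds
  echo $(( rate_mbps * 1000000 / 8 * rtt_ms / 1000 ))
}

# max_mbps WINDOW_BYTES RTT_MS
# Throughput ceiling (Mbit/s) when at most WINDOW_BYTES are sent per RTT.
max_mbps() {
  local window_bytes=$1 rtt_ms=$2
  # bytes per RTT -> bits per second -> Mbit/s
  echo $(( window_bytes * 8 * 1000 / rtt_ms / 1000000 ))
}

RTT_MS=220            # RTT between US-EAST-1 and AP-SOUTHEAST-1 from the slides
LINK_MBPS=1000        # assumed example link rate, not from the slides

printf 'BDP at %s Mbit/s, %s ms RTT: %s bytes\n' \
  "$LINK_MBPS" "$RTT_MS" "$(bdp_bytes "$LINK_MBPS" "$RTT_MS")"
printf 'Ceiling with default tcp_rmem max (6291456 bytes): %s Mbit/s\n' \
  "$(max_mbps 6291456 "$RTT_MS")"
printf 'Ceiling with tuned tcp_rmem max (67108864 bytes): %s Mbit/s\n' \
  "$(max_mbps 67108864 "$RTT_MS")"
```

Under these assumptions, a 1000 Mbit/s path at 220 ms needs about 27.5 MB in flight, which exceeds the 6,291,456-byte default maximum but fits within the tuned 67,108,864 bytes. The measured results in the session depend on more than buffer size, which is why the slides recommend methodical testing.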
