Network Congestion Management: Considerations and Techniques
An Industry Whitepaper
Version 2.0

Contents

  Executive Summary
  Introduction to Access Network Congestion
    Throughput and Latency
    Congestive Collapse
    Congestion Management
  Considerations and Techniques
    Considerations
      Defining the Goal
      Network Neutrality
      Topology Awareness
    Congestion Detection
      Time of Day
      Concurrent User Thresholds
      Bandwidth Thresholds
      Subscriber Quality of Experience
    Congestion Management
      Defining Application Priorities
      Minimizing Negative Subscriber Impact
      Maximizing Precision
      Policy Enforcement
  Conclusions
    Summary of Detection Techniques
    Summary of Management Considerations
    Related Resources
    Invitation to Provide Feedback

Executive Summary

Network congestion occurs when demand for a resource exceeds that resource's capacity. Congestion management equates to achieving cost-savings while preserving subscriber QoE. These dual objectives are often contradictory, and a balance must be struck. Many factors must be considered when evaluating solutions, including the regulatory environment.

Generally, to be compatible with the spirit of network neutrality, a congestion management solution:
- must be narrowly tailored
- must have a proportional and reasonable effect
- must serve a legitimate technical need

A complete congestion management solution has two functional components:
- A mechanism to manage the impact of congestion
- A mechanism to trigger the management policies

Only by linking an accurate trigger with precise management can a CSP ensure that the goals of congestion management are achieved.

Real-time measurements of access round trip time provide the best possible congestion detection mechanism by informing operators precisely where, when, and for whom congestion is manifesting. Other approaches result in misapplication of management policies, in violation of best practices.

Since congestion is a result of overwhelming demand, it is up to the CSP to define:
- What applications or application types will be prioritized during congestion?
- Will congestion management policies take into account subscriber-related attributes?

Introduction to Access Network Congestion

Network congestion is defined as the situation in which an increase in data transmissions results in a proportionately smaller (or even a reduction in) throughput. In other words, when a network is congested, the more data one tries to send, the less data is actually successfully sent. Network congestion has been difficult to define quantitatively across the industry, but it's something everyone recognizes when they see it. For subscribers, it means reduced quality of experience (QoE), with video stalls, choppy VoIP communications, a poor web browsing experience, and frustrating online gaming performance.
For communications service providers (CSPs), it means angry subscribers who are more likely to churn, and infrastructure extensions that represent huge capital investments. The underlying problem is that all access network resources have a finite capacity, and demand can exceed that capacity.

The inconvenience of congestion poses real business threats to CSPs. When a network becomes congested, the operator can expect numerous support calls or high subscriber attrition. By the time the operator is able to respond (often via increased capacity), the damage has already been done, with word-of-mouth compounding the problem.

Throughput and Latency

When subscribers are using the internet, they are readily (if not consciously) aware of two things: throughput (the capacity of the connection, measured in bandwidth terms like megabits per second) and latency (the time it takes traffic to traverse the connection, typically measured in milliseconds).

Throughput is about how much data subscribers can draw from the network over time, and is most apparent when transferring large files (e.g., software updates, email attachments, peer-to-peer exchanges, etc.). Latency refers to how long it takes for a packet to travel from a server to a client, and is most apparent when interactive applications are in use (e.g., web browsing, streaming, gaming, voice, video, etc.).

In the example of a VoIP call, high latency introduces unacceptable delay in the conversation. Anyone who has played a latency-sensitive online game doesn't need a study to tell them that a 200 ms ping will render the game unplayable and most servers un-joinable. Because online gaming can include a paid subscription to a service beyond paying for data access, in such cases subscribers cannot help but feel doubly cheated.

Google and Amazon have both independently released research results showing that latency on their websites has a direct impact on sales. Google, upon introducing an added 0.5 seconds on page loads to deliver 30 results instead of 10, found revenue dropped 20% as a result [1]. Similarly, Amazon conducted an experiment showing that with every 100 ms increase in latency, sales dropped by 1%. It's clear that subscribers are affected by latency, and that it doesn't have to be particularly high to have a heavy impact on subscriber QoE.

[1] Interested readers can find the original research in User Preference and Search Engine Latency.

Congestive Collapse

As throughput increases on a node or router, latency increases due to the growing queue delay [2] and the 'bursty' nature of TCP [3]. At first, the increase in latency is rather marginal, but nevertheless is proportional to the increase in bandwidth. However, as the throughput approaches capacity, latency begins to increase exponentially, reaching a final tipping point at which the element experiences congestive collapse.

Figure 1 - Relationship between throughput and latency; the exponential increase in latency represents congestive collapse

The tipping point often occurs somewhat before the resource's theoretical maximum bandwidth threshold is actually reached. Furthermore, since access network resources of all types experience dynamic capacity (a phenomenon explained in greater detail in a later section), there is no fixed point at which the congestive collapse acceleration begins; the point varies from moment to moment. For subscribers, when an access node is near capacity, the rapid increase in latency causes a substantial deterioration of QoE for latency-sensitive applications.

[2] In this case, the queue delay is the time a packet waits in a queue until it can be processed. As the queue begins to fill up because packets arrive faster than they can be processed, the amount of delay a packet experiences increases. The problem is particularly pronounced when TCP is in its 'ramp-up' stage. More information is available here: http://en.wikipedia.org/wiki/Queuing_delay

[3] This study looks at the phenomenon of TCP burstiness: http://www.caida.org/publications/papers/2005/pam-tcpdynamics/pam-tcpdynamics.pdf
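The shape of this curve can be made concrete with a toy calculation. The short Python sketch below is an illustration added alongside this discussion, not part of the original analysis: it uses the textbook M/M/1 queueing approximation, in which average delay grows in proportion to 1/(1 - utilization), and the link capacity and packet size are arbitrary assumed values.

    # Illustration only: approximate per-packet delay on a single link using the
    # M/M/1 formula. The capacity and packet size below are assumed values.
    CAPACITY_BPS = 10_000_000      # assumed 10 Mbps access link
    AVG_PACKET_BITS = 12_000       # assumed 1500-byte average packet

    def avg_delay_ms(offered_load_bps):
        """Approximate average per-packet delay (ms) for a given offered load."""
        service_time_s = AVG_PACKET_BITS / CAPACITY_BPS   # time to transmit one packet
        utilization = offered_load_bps / CAPACITY_BPS
        if utilization >= 1.0:
            return float("inf")    # past capacity, the queue (and delay) grows without bound
        return 1000 * service_time_s / (1 - utilization)

    # Delay is nearly flat at low utilization, then rises steeply near capacity.
    for pct in (10, 50, 80, 90, 95, 99):
        load_bps = CAPACITY_BPS * pct / 100
        print(f"{pct:>3}% utilization -> {avg_delay_ms(load_bps):7.2f} ms average delay")

The specific numbers are not meaningful; the point is the hockey-stick shape, which matches the qualitative relationship shown in Figure 1.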
Congestion Management

At a high level, a congestion management solution has two functional components:
- A mechanism to alleviate the impact of congestion; the actual 'management'
- A mechanism to switch on the alleviation mechanism; the detection 'trigger'

A complete congestion management solution requires both of these components, but this simple characterization conceals enormous complexity; available solutions vary widely in effectiveness and sophistication. A brief sketch of how a trigger and a management policy can be linked appears at the end of this section.

Considerations and Techniques

Considerations

There are many factors that must be considered and understood to critically evaluate congestion management systems, including:
- Defining the goal: what is a CSP trying to achieve?
- Network neutrality: does the solution fit within a particular regulatory environment?
- Topology awareness: does the solution rely upon a complete understanding of the network's structure?

Defining the Goal

Congestion management is a general solution that actually addresses several potential objectives, from the perspective of the CSP:
- Defer capital investments by extending infrastructure lifetime: congestion management can provide short-term relief that buys time before additional capacity can be physically added to the network
- Extend the lifetime and utility of legacy network equipment that will not receive expansion: some earlier-generation access technologies will not receive additional investment, but are still serving many subscribers
- Reduce or contain transit costs: a congestion management solution can be applied to cap peaks on expensive links or get the most utility out of a saturated link
- Enhance or protect the subscriber quality of experience: make sure that even during times of congestion, the subscriber base is happy; put another way, this goal could be stated as "to maximize the quality of experience, for the maximum number of subscribers, for the maximum amount of time"
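Whatever the goal, the trigger-plus-management structure described above stays the same. As a purely illustrative sketch, and not a prescription from this whitepaper, the Python fragment below shows one way the two functional components could be linked: a detection trigger based on measured access round trip time, and a management policy that acts only on the congested node and only on traffic classes the CSP has defined as low priority. The RTT threshold, class names, and enforcement hooks are all assumed placeholders.

    # Illustrative sketch only: a detection trigger wired to a management action.
    # Threshold, traffic classes, and the enforcement callbacks are assumptions.
    RTT_THRESHOLD_MS = 100
    LOW_PRIORITY_CLASSES = {"bulk-download", "software-update", "p2p"}

    def congested(rtt_samples_ms):
        """Trigger: treat a node as congested when its median access RTT is high."""
        ordered = sorted(rtt_samples_ms)
        median = ordered[len(ordered) // 2]
        return median > RTT_THRESHOLD_MS

    def manage(node, rtt_samples_ms, enforce_shaping, clear_shaping):
        """Management: shape only low-priority traffic, and only while congestion persists."""
        if congested(rtt_samples_ms):
            # Narrowly tailored: act only on the congested node and on low-priority classes.
            enforce_shaping(node, LOW_PRIORITY_CLASSES)
        else:
            # Proportional: lift the policy as soon as congestion subsides.
            clear_shaping(node)

Real deployments differ in how they measure congestion and enforce policy, but the division of labour is the same: an accurate trigger decides where and when to act, and a precise management mechanism decides what to do.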