Applications and Network Performance
Introduction

Considerable effort improving networks and operating systems has vastly improved the performance of large file transfers. Increasingly the causes of poor network performance are found in application design and programming. People with long memories will recall the same experience with the performance of relational databases. Just as programmers learnt how to use SQL effectively, programmers now need to learn to use fast long-distance networks effectively. This paper can be a beginning for that learning.

How fast, how long?

The typical host is connected at one gigabit per second to a workgroup ethernet switch. This in turn connects to a building ethernet switch, a routing core, a firewall, and finally to a border router with a 1Gbps link to AARNet. AARNet has sufficient capacity to take this potential 1Gbps to many of the world's universities.

Figure 1. Typical campus network.

The longest production 1Gbps path through the network is from Perth to Sydney, Sydney to Seattle (USA), Seattle to Amsterdam (Netherlands), and Amsterdam to Tomsk (Russia). It takes a packet roughly 600ms to travel this path and back. The average 1Gbps path is much shorter than this. However, Australia's position makes most sites of interest at least a Pacific Ocean, or 100ms, away. Application developers in other countries do not face such long paths, with their large round-trip times: 100ms is enough to comfortably cross Europe or North America, but is the minimum value for applications run from Australia.

Figure 2. Longest 1Gbps path is 600ms.

TCP

Understanding some behaviours of the Transmission Control Protocol is needed to write applications which use the network well. TCP has two major states: slow start and congestion avoidance (Jacobson's congestion control algorithm, most recently described in Allman, Paxson & Stevens, RFC 2581, TCP Congestion Control, 1999). These two algorithms determine the scheduling and quantity of data transmitted by the sender.

Slow start

When starting, TCP does not know how much bandwidth is available. It starts by sending two to four packets. If all of these packets are acknowledged then the number of packets to be sent is increased and those packets are sent. This is repeated until an Ack is lost or until the receiver's TCP buffer size is reached. At the end of the slow-start state we have a good estimate of the round-trip time and of the available bandwidth of the link. This is summarised in a single figure, the congestion window: the number of bytes which can be sent without congesting the path or over-running the receiver.

Congestion avoidance

The goal of this mode is to schedule packets to arrive just as they are needed by the receiver and not to congest the path. During this mode we send a full congestion window of packets per round-trip time. If all the packets are Acked, then the congestion window is slowly advanced, by one packet per round-trip time, to probe for any unused bandwidth on the path which may have become available. An estimate and variance of the round-trip time are maintained by timing the transmitted packets and their Acks. This is used to determine whether a transmitted packet has a late or missing Ack. If a transmitted packet appears to have been lost due to congestion, then the congestion window and the slow-start threshold are halved and slow-start mode is invoked.
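As an illustration of these two states, the following sketch (not the kernel's code; the loss pattern and the initial threshold are invented purely for illustration) steps a Reno-style congestion window through slow start, congestion avoidance, and the fall-back that follows a loss, printing its size once per round trip.

/*
 * A minimal sketch of how a Reno-style congestion window evolves,
 * in whole segments, round trip by round trip, following the
 * simplified description above.
 */
#include <stdio.h>

int main(void)
{
    int cwnd = 2;        /* initial window: two segments            */
    int ssthresh = 64;   /* slow-start threshold, in segments       */
    int rtt;

    for (rtt = 1; rtt <= 20; rtt++) {
        int loss = (rtt == 12);      /* pretend a packet is lost here */

        if (loss) {
            /* Congestion: the threshold drops to half the current
             * window and the sender slow-starts again. */
            ssthresh = cwnd / 2;
            cwnd = 2;
        } else if (cwnd < ssthresh) {
            cwnd *= 2;               /* slow start: double per round trip */
            if (cwnd > ssthresh)
                cwnd = ssthresh;     /* do not overshoot the threshold */
        } else {
            cwnd += 1;               /* congestion avoidance: one segment
                                        per round trip                  */
        }
        printf("rtt %2d  cwnd %3d segments%s\n",
               rtt, cwnd, loss ? "  (loss)" : "");
    }
    return 0;
}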
New TCP algorithms

A large number of modified TCP algorithms exist. These seek to improve the performance of the TCP algorithm. They variously feature: faster discovery of available bandwidth, better estimation of the delay-bandwidth product, faster recovery from congestion, and better resilience to media packet loss. All recent operating system versions have modified TCP algorithms available.

Inter-flow fairness

Differing TCP algorithms have radically different behaviours when faced with multiple flows. Ideally the available bandwidth will be shared equally between running connections. This ideal is rarely met. Initiating multiple connections to improve throughput is therefore limited by the inter-flow fairness of the TCP algorithm; Hamilton TCP has particularly good inter-flow fairness.

Round-trip time fairness

Almost all algorithms treat connections with a long round-trip time much less fairly than connections with a small round-trip time. All connections from Australia have a long round-trip time, so congestion near an offshore server is particularly felt by Australian clients.

Tuning the operating system for TCP

There are three major changes in operating systems' treatment of TCP. Firstly, buffer tuning is becoming more automatic. Secondly, more and better TCP algorithms are available for use. Thirdly, there is a concentration on removing overheads from the operating system itself, such as zero-copy transmission and Van Jacobson's proposed "network channels" API. This effort is wasted if applications are hosted on an operating system that does not include these improvements: Microsoft Windows XP, Microsoft Windows Server 2003, and Linux kernels 2.4 and earlier are not suitable if high network performance is desired. There is often conflict between networking performance's requirement for a recent operating system and other systems administration objectives. Most systems administrators do not realise that this trade-off exists.

Buffer sizes

The sender must be able to re-send all of the data in flight across the link in case none of the packets is acknowledged. The in-flight data includes the data being transmitted across links plus the data held in the router buffers on the path. If the first re-transmission is lost then there will be a second round-trip time of data to buffer. Set the sender's buffer to twice the bandwidth-delay product. Modern TCP algorithms need less buffer, say 1.2 times the BDP, since they back off less radically than the traditional Reno TCP algorithm.

The receiver must be able to accept a full flight's worth of data. Set the receiver's buffer to a tad more than the bandwidth-delay product. There is no need to allow for all of the routers in the path to have full buffers, since that path congestion will lower TCP performance in any case.

Linux buffer auto-tuning

Linux 2.6 has steadily improving automated tuning of both receive and send buffers. To take advantage of this it is important to use a recent kernel; this can conflict with other system administration goals. Manually setting the buffer size disables auto-tuning.
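As a sketch of manual buffer sizing, the fragment below sizes the send and receive buffers for an assumed 1Gbps, 100ms path: a bandwidth-delay product of about 12.5 megabytes, so roughly twice that for the sender and a little over the BDP for the receiver. The socket calls are standard; the sizes actually granted are capped by the net.core.wmem_max and net.core.rmem_max sysctls, and, as noted above, setting them by hand turns off Linux's auto-tuning, so do this only where auto-tuning is unavailable or insufficient.

/*
 * Sketch: manually sized TCP buffers for an assumed 1Gbps, 100ms path.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    const double bandwidth = 1e9;     /* bits per second                 */
    const double rtt = 0.100;         /* seconds                         */
    const int bdp = (int)(bandwidth * rtt / 8);   /* bytes: ~12.5 MB     */

    int snd_buf = 2 * bdp;            /* sender: twice the BDP           */
    int rcv_buf = bdp + bdp / 10;     /* receiver: a little over the BDP */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        return 1;
    }

    /* Set the buffers before connect()/listen() so the window scale
     * option is negotiated with the larger window in mind. */
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &snd_buf, sizeof(snd_buf)) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcv_buf, sizeof(rcv_buf)) < 0)
        perror("SO_RCVBUF");

    printf("BDP %d bytes, send buffer %d, receive buffer %d\n",
           bdp, snd_buf, rcv_buf);
    return 0;
}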
Linux destination route cache

Linux caches TCP parameters in a destination cache, which records the IP MTU and TCP window. This cache reduces the slow-start period of new connections. Look in /proc/net/rt_cache for the contents of the cache. When benchmarking TCP performance, clear the route cache and the neighbour cache (the general form of an ARP cache) between each test.

Path loss

Packet loss on a path can cause TCP to falsely believe that congestion is occurring. The sender enters "congestion avoidance" mode, immediately halving the packet transmission rate.

Loss on underground optical links at gigabit speeds is essentially zero: perhaps one error second per year. Loss is so low that there is no visible difference in the reported error rates between gigabit ethernet LAN PHY (which does not have forward error correction) and SDH (which does have FEC).

Loss on undersea fibre segments is much higher than loss on underground fibre segments. AARNet's routers record roughly 400 error seconds per year on cross-Pacific links. Between these error seconds loss is well under 10⁻¹³, mainly because the random spread of errors then allows SDH forward error correction to work well.

Distressingly, megabit-speed SDH links purchased from other carriers seem to have high loss. The 155Mbps STM-1 Darwin-Adelaide and Melbourne-Hobart links show 10 to 100 error seconds per year, mainly as path loss errors. It is assumed that these links are more environmentally vulnerable, although why an Adelaide-Darwin link should be more vulnerable than an Adelaide-Perth link is not clear.

Packet loss on wireless networks is greater and more random. A busy wireless LAN may have no error-free seconds whilst carrying traffic. Wireless LAN links should not be used where good network performance is desired. If there is no choice but to use a wireless LAN then the sender should select a TCP algorithm designed for paths with high media loss, such as Westwood TCP (a sketch of selecting such an algorithm appears at the end of this paper).

Configuration error and loss

Two configuration errors commonly lead to high-loss links.

Firstly, network engineers may fail to calculate the loss budget for optical and microwave links. This commonly drives the receiver with too little or too much power, causing loss. Optical budgets should be calculated for all links greater than 220m using 1000Base-SX, all links greater than 2,000m using 1000Base-LX, and all other optical links. Electrical power budgets should be calculated for all G.703 links: the receiver expects 1.0V ±10% peak-to-peak as the input power, the output power varies by device and DIP switch setting, and the path loss varies by coaxial cable type and length. Attenuation is typically installed on the receiving interface, as this minimises the opportunity to destroy an interface by connecting unattenuated power.

Secondly, system administrators and network engineers often misunderstand the function of ethernet auto-negotiation. Auto-negotiation was designed to allow the connection of 100Base-TX interfaces to 100Base-TX interfaces or to 10Base-T interfaces.
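Returning to the advice above on lossy wireless paths: on Linux (kernel 2.6.13 and later) the congestion control algorithm can be chosen per connection with the TCP_CONGESTION socket option. The sketch below requests TCP Westwood; the algorithm's module (tcp_westwood) must be available, and unprivileged processes may only select algorithms the system permits (the net.ipv4.tcp_allowed_congestion_control sysctl on recent kernels).

/*
 * Sketch: request a loss-tolerant congestion control algorithm
 * (TCP Westwood) for a single connection on Linux.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13   /* older C libraries omit this definition */
#endif

int main(void)
{
    const char *algo = "westwood";
    char chosen[16];
    socklen_t len = sizeof(chosen);

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        return 1;
    }

    /* Ask for Westwood before connecting; the kernel refuses names it
     * does not know or that this process may not use. */
    if (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
        perror("TCP_CONGESTION");

    /* Read back what the connection will actually use. */
    if (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, chosen, &len) == 0)
        printf("congestion control in use: %s\n", chosen);

    return 0;
}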