SLAC-PUB-9733, April 2003. Presented at the Passive and Active Monitoring Workshop (PAM 2003), 4/6/2003-4/8/2003, San Diego, CA, USA.

Measuring end-to-end bandwidth with Iperf using Web100∗

Ajay Tirumala, CS Dept., University of Illinois, Urbana, IL 61801
Les Cottrell, SLAC, Stanford University, Menlo Park, CA 94025
Tom Dunigan, CSMD, ORNL, Oak Ridge, TN 37831

[email protected] [email protected] [email protected]

Abstract

End-to-end bandwidth estimation tools like Iperf, though fairly accurate, are intrusive. In this paper, we describe how, with an instrumented TCP stack (Web100), we can estimate the end-to-end bandwidth accurately while consuming significantly less network bandwidth and time. We modified Iperf to use Web100 to detect the end of slow-start and to estimate the end-to-end bandwidth by measuring the amount of data sent for a short period (1 second) after the slow-start, when TCP is relatively stable. We obtained bandwidth estimates differing by less than 10% when compared to running Iperf for 20 seconds, with savings in bandwidth estimation time of up to 94% and savings in network traffic of up to 92%.

1 Introduction

Iperf [16] is a bandwidth measurement tool which is used to measure the end-to-end achievable bandwidth, using TCP streams, allowing variations in parameters like TCP window size and number of parallel streams. End-to-end achievable bandwidth is the bandwidth at which an application in one end-host can send data to an application in the other end-host. Iperf approximates the cumulative bandwidth (the total data transferred between the end-hosts over the total transfer period) to the end-to-end achievable bandwidth. We need to run Iperf for fairly long periods of time to counter the effects of TCP slow-start. For example, while running Iperf from SLAC to Rice University using a single TCP stream, with a TCP window size of 1 MB set at both ends, only 48.1 Mbps is achieved during slow-start (the slow-start duration was about 0.9 seconds and the Round Trip Time (RTT) for this path was about 45 ms), whereas the actual bandwidth achievable is about 200 Mbps. For the cumulative bandwidth to get up to 90% of the end-to-end achievable bandwidth, we need to run Iperf for about 7 seconds.

The other end-to-end bandwidth metric is the bottleneck bandwidth, which is the ideal bandwidth of the lowest bandwidth link on the route between the two end-hosts [10] [14]. Generally, packet-pair algorithms are used to measure the bottleneck bandwidth. Packet-pair algorithms generate negligible network traffic. Bottleneck bandwidth does not vary rapidly on the timescales over which people make Iperf measurements, unless there are route changes and/or link capacity changes in the intermediate links of the route. In this paper, we are interested only in the achievable end-to-end TCP bandwidth. Achievable end-to-end TCP bandwidth (bandwidth from here on) depends not only on the network, but also on the TCP/IP stack, processing power, NIC speed, the number of parallel streams used and the buffer sizes on the end-hosts. We assume sufficient processing power, NIC speed and large enough buffer sizes at the end-hosts to be available, and shall not discuss this further.

Tools like Iperf [16] and ttcp [12] measure bandwidth by measuring the amount of data sent for a fixed period of time. They use TCP streams and can make use of parallel TCP connections (see footnote 1). Bulk data tools like Iperf work fairly well and are widely used [7] [8] [2]. Our endeavor is to reduce the measurement time and network traffic generated by these tools while retaining or improving the accuracy of measurement. Web100 [19], which is currently available for Linux kernels, exposes the kernel variables for a particular TCP connection. We can use these variables in Iperf to estimate the end of slow-start. We can then measure the amount of data transferred for a short period of time after slow-start, and often dramatically reduce the total measurement time (and network traffic) needed to make a bandwidth estimate.
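As a back-of-the-envelope illustration of why such long runs are needed, the short Python sketch below (our own illustration, not part of Iperf) models the SLAC-to-Rice example above as roughly 48.1 Mbps for the 0.9 s of slow-start followed by the achievable 200 Mbps, and finds how long a run must be before the cumulative bandwidth reaches 90% of the achievable value.

    # Toy model of the cumulative bandwidth reported by a fixed-length Iperf run,
    # using the SLAC -> Rice numbers quoted above (assumed piecewise-constant rates).
    SLOW_START_RATE = 48.1   # Mbps, average rate observed during slow-start
    SLOW_START_TIME = 0.9    # seconds spent in slow-start
    ACHIEVABLE = 200.0       # Mbps, achievable bandwidth after slow-start

    def cumulative_bw(t):
        """Cumulative bandwidth (Mbps) a t-second Iperf run would report."""
        if t <= SLOW_START_TIME:
            return SLOW_START_RATE
        data = SLOW_START_RATE * SLOW_START_TIME + ACHIEVABLE * (t - SLOW_START_TIME)
        return data / t

    t = 1.0
    while cumulative_bw(t) < 0.9 * ACHIEVABLE:
        t += 0.1
    print("run length needed for 90%% of achievable: ~%.1f s" % t)   # about 7 s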
The rest of the paper is organized as follows. Section 2 mentions the motivation for attempting this work. Section 3 explains the algorithm in detail. Section 4 has the implementation details, the testing environment and the results obtained. We finally conclude in Section 5 with our overall observations and conclusions and a note on future work.

Footnote *: This work was supported in part by the Director, Office of Science, Office of Advanced Scientific Computing Research, Mathematical, Information, and Computational Sciences Division under the U.S. Department of Energy. The SLAC work is under Contract No. DE-AC03-76SF00515.

Footnote 1: A detailed analysis of the end-to-end performance effects of parallel TCP streams can be found in [6].

[Figure 1: Slow-start times for various bandwidth-delay products (in theory). Slow-start duration (seconds) versus bandwidth-delay product (MB), for 100 Mbps, 1 Gbps and 10 Gbps links; the 100 Mbps x 200 ms, 1 Gbps x 200 ms and 10 Gbps x 200 ms operating points are marked.]

2 Motivation

The popularity and widespread use of Iperf can also be partially attributed to its ease of installation and the absence of kernel and/or device driver modifications. The Web100 application library provides functions to query and set values for TCP variables for a particular session. These functions allow an application to tune and monitor its TCP connections, which was previously possible only for kernel and/or device drivers. Importantly, Web100 exposes the current congestion window, the maximum, minimum and smoothed RTT, the receiver advertised window, the maximum segment size and the data bytes out (and many other parameters) through the life of a particular TCP connection. In the next section, we show how we can determine the sampling rate and track these variables to determine quickly when the connection is out of slow-start.

As mentioned in the Web100 study undertaken at ORNL [3], the duration of slow-start is roughly ⌈log2(ideal window size in MSS)⌉ * RTT. In practice, it is a little less than twice this value because of delayed ACKs. For example, the bandwidth-delay product for a 1 Gbps network with an RTT of 200 ms is about 16667 1500-byte segments. The slow-start duration, when a single TCP stream is used, will be approximately ⌈log2(16667)⌉ * 2 * 0.2, which is about 5.6 seconds. Assuming a stable congestion window after slow-start, the time for the cumulative bandwidth to reach 90% of the achievable bandwidth will be about 10 * slow_start_duration - 10 * RTT, which can be approximated to 10 * slow_start_duration for high bandwidth networks [3]. This means that for the 1 Gbps-200 ms network path, it will take over 50 seconds for the cumulative bandwidth to reach 90% of the achievable bandwidth. Using Iperf in quick mode can cut this measurement to under 7 seconds while still retaining the accuracy of measurement. Using parallel TCP streams can reduce the slow-start duration to a certain extent, since the bandwidth is shared between the streams (see footnote 2).

Figure 1 gives the slow-start times (in theory) for various bandwidth-delay products. This also illustrates the importance of detecting the end of slow-start before starting the bandwidth measurement. For example, a 10 second measurement (the default Iperf measurement time) is not sufficient for a 10 Gbps-200 ms bandwidth-delay network path, since the slow-start duration is itself over 5 seconds. But a 10 second measurement is more than sufficient for a 1 Mbps-20 ms bandwidth-delay network path.

Varying network conditions like RTT and instantaneous bandwidth will alter the slow-start duration (see [7] for variations in RTT and bandwidth from SLAC to various nodes across the world with time). If we can accurately and quickly determine the end of slow-start for a connection, then by measuring bandwidth for a very short duration after the end of slow-start, we can significantly reduce the amount of network traffic generated and still retain or improve the accuracy.

Footnote 2: Using parallel TCP connections has other advantages, since the dropping of a random packet affects only the particular stream which experienced the loss [13]. For example, if only one stream is used and there is a packet loss, the congestion window and thus the instantaneous bandwidth reduces to half the previous value. But if, say, 8 parallel streams are used and there is a random packet loss, only the stream which experienced the loss will reduce its congestion window to half its previous value, and the instantaneous bandwidth will be about 94% of its previous value.
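The slow-start timing estimate above is easy to reproduce numerically. The sketch below is our own illustration (not code from the paper); it uses the unrounded log2, which matches the roughly 5.6 s figure quoted in the text for the 1 Gbps, 200 ms example.

    import math

    link_bw = 1e9        # bits/s  (1 Gbps link)
    rtt = 0.2            # seconds (200 ms round trip time)
    mss = 1500 * 8       # bits per 1500-byte segment

    # Ideal window in segments (bandwidth-delay product), ~16667 here.
    ideal_window = link_bw * rtt / mss

    # Slow-start duration: roughly log2(ideal window) * RTT, and close to twice
    # that in practice because of delayed ACKs (Section 2, citing [3]).
    slow_start = math.log2(ideal_window) * 2 * rtt

    # Time for the cumulative bandwidth to reach 90% of the achievable bandwidth,
    # approximated as 10 * slow_start_duration - 10 * RTT [3].
    time_to_90 = 10 * slow_start - 10 * rtt

    print("ideal window ~ %.0f segments" % ideal_window)       # ~16667
    print("slow-start   ~ %.1f s" % slow_start)                # ~5.6 s
    print("90%% of achievable after ~ %.0f s" % time_to_90)    # over 50 s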
3 Iperf QUICK mode

Our algorithm works for TCP Reno, which is the default implementation in most operating systems, and we present our analysis for TCP Reno. Section 3.1 briefly mentions the behavior of congestion windows in TCP stacks, and section 3.2 gives the details of the algorithm to detect the end of slow-start in Iperf Quick mode.

3.1 TCP Congestion Windows

TCP Reno doubles the congestion window every RTT until it reaches the threshold congestion window size or it experiences a retransmission timeout. After the slow-start period, Reno sets the threshold window to half the congestion window where (if) it experienced a loss. It then increases its congestion window at a rate of one segment per RTT.

Also, the proposal of limited slow-start by Floyd [5] should result in a congestion-window value nearer to the ideal congestion window at the end of slow-start than in TCP Reno. This is due to the slower acceleration of the congestion window in the slow-start phase than in TCP Reno. Recently, Floyd (HighSpeed TCP) [4] and the Net100 community (TCP tuning daemon) [13] have suggested changes to TCP to improve high performance transfers, wherein the congestion window will increase by more than one MSS every RTT during the congestion avoidance period. With minor modifications, the algorithm mentioned in section 3.2 should be able to work for these variants of TCP. We expect our algorithm to perform better with these newer stacks, since the ideal congestion window is reached faster (in the congestion avoidance phase) than in Reno (see footnote 3).

Footnote 3: The algorithm also works for TCP Vegas [1] without any modifications. Though TCP Vegas is fairer than TCP Reno, TCP Vegas clients may not receive a fair share of bandwidth while competing with TCP Reno clients [11], and hence Vegas clients are not implemented in many practical systems. TCP Vegas doubles the congestion window only every other RTT, so that a valid comparison of the actual and expected rates can be made [1]. It calculates the actual sending rate (ASR) and the expected sending rate (ESR) and tries to keep Diff = ASR - ESR in the range α < Diff < β: it increases the congestion window linearly if Diff < α and decreases it linearly if Diff > β. The most common values of α and β are 1 and 3 MSS respectively.

3.2 Algorithm to detect the end of slow-start

A knowledge of the behavior of TCP congestion windows in different flavors of TCP (section 3.1) is helpful in determining the sampling rate for the kernel variables (explained below), to determine the end of slow-start as soon as possible. Our aim is to quickly and accurately determine the end of slow-start and to minimize the amount of network traffic generated. We cannot use the Web100 slow-start indicator, which will say whether a connection is in or out of slow-start, since this will not work when the congestion window is restricted by the receiver window size.

Iperf initially gets the value of the smoothed RTT (RTT) for the connection and also the MSS. We poll the value of RTT every 20 ms until we get a valid value (a value > 0 ms reported by Web100) for the RTT, and also note the value of the congestion window at that instant. Once we get the value of the RTT, we poll the value of the congestion window at an interval of 2 * RTT. We store the values of the old and new congestion window sizes. If the connection is still in slow-start during this period, TCP Reno would most likely have doubled its congestion window size twice (but at least once). If the connection is out of slow-start, the increase in the congestion window would have been at most 2 * MSS. So it is safe to say that if the old value of the congestion window was at least 4 * MSS and the difference between the congestion window sizes is less than 3 * MSS over a time interval of 2 * RTT, the connection is out of slow-start. For very slow connections (or connections with an extremely short RTT, like a transfer to localhost), the congestion window size never reaches 4 * MSS. For these kinds of connections, an additional check that the congestion window size does not change while it is less than 4 * MSS will ensure that the connection is out of slow-start (see footnote 4).

The pseudo-code for the above algorithm is given in Figure 2.

Footnote 4: We have also ensured that the RTT does not change drastically between polling times, which would otherwise make our comparisons invalid.

    set slow_start_in_progress <- true
    set mss <- web100_get_mss()
    set new_cwnd <- 0

    while (slow_start_in_progress)
    begin
        set rtt <- web100_get_smoothed_rtt()
        if (rtt is valid)
            sleep for twice the rtt
        else
            sleep for 20 ms

        set old_cwnd <- new_cwnd
        set new_cwnd <- web100_get_cwnd()
        set diff_cwnd <- abs(new_cwnd - old_cwnd)

        if (rtt is valid)
            if (((old_cwnd >= 4*mss) and (diff_cwnd < 3*mss))
                or ((old_cwnd < 4*mss) and (diff_cwnd = 0)))
                set slow_start_in_progress <- false
    end

Figure 2: Pseudo-code to determine the end of slow-start
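For readers who prefer something executable, the following Python sketch restates the loop of Figure 2. It is our own illustration: the three reader functions are hypothetical stand-ins for whatever mechanism (for example the Web100 userland library) supplies the smoothed RTT, congestion window and MSS of the measured connection.

    import time

    def wait_for_end_of_slow_start(read_rtt_ms, read_cwnd, read_mss):
        """Block until the connection appears to be out of slow-start.

        read_rtt_ms, read_cwnd and read_mss are caller-supplied (hypothetical)
        functions returning the smoothed RTT in ms, the congestion window in
        bytes and the MSS in bytes for the connection being measured.
        """
        mss = read_mss()
        new_cwnd = 0
        while True:
            rtt_ms = read_rtt_ms()
            rtt_valid = rtt_ms is not None and rtt_ms > 0
            # Sample every 2*RTT once the RTT is known, every 20 ms until then.
            time.sleep(2 * rtt_ms / 1000.0 if rtt_valid else 0.020)

            old_cwnd, new_cwnd = new_cwnd, read_cwnd()
            diff_cwnd = abs(new_cwnd - old_cwnd)

            if not rtt_valid:
                continue
            # In slow-start the window would roughly double twice over 2*RTT;
            # out of slow-start it grows by at most 2*MSS.  The second clause
            # handles very small windows that never reach 4*MSS (the
            # old_cwnd > 0 guard only skips the first, uninitialised sample).
            if (old_cwnd >= 4 * mss and diff_cwnd < 3 * mss) or \
               (0 < old_cwnd < 4 * mss and diff_cwnd == 0):
                return

In the actual modification the polling runs in a separate thread, since the Web100 library calls block (see section 4.1).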

3.3 Analysis of the Quick mode Iperf for TCP Reno

Figures 3 and 4 show traces of the congestion window when the RTT is 10 ms and 50 ms respectively, where the bandwidth-delay product is about 300 KB. These figures can be used to relate to the terms (for TCP Reno) defined below. Let us denote the congestion window reached during slow-start by ss_cwnd, the value it drops to immediately after slow-start by cur_cwnd, and the maximum value the congestion window reaches during the congestion avoidance period by m_cwnd. If a large enough receive window is set on the sender side, then the sender will get out of slow-start either due to a congestion signal or due to a slow receiver throttling the sender by advertising a smaller window size. In cases where the receiver throttles the sender by advertising a smaller window size than what the sender assumes to be the ideal congestion window, the congestion window is constant during the congestion avoidance period (it does not increase by one MSS every RTT). In cases where the congestion window is very stable in the congestion avoidance period, Iperf Quick mode will report near perfect bandwidth values.

[Figure 3: CW for link with BW * Delay ~ 300 KB for RTT = 10 ms. Congestion window (KB) versus time (ms).]

[Figure 4: CW for link with BW * Delay ~ 300 KB for RTT = 50 ms. Congestion window (KB) versus time (ms).]

If the sender gets out of slow-start due to a congestion signal, then

    cur_cwnd = ss_cwnd / 2                                        (1)

The congestion window will rise from cur_cwnd to m_cwnd by one MSS every RTT. This means that during the first RTT after slow-start, cur_cwnd amount of data is sent. During the next RTT, cur_cwnd + MSS amount of data is sent, and so on. This will continue until it reaches m_cwnd, after which there will be a congestion signal and the congestion window will again drop to m_cwnd/2. The average congestion window during this period will be

    avg_cwnd = (m_cwnd - cur_cwnd) / 2 + cur_cwnd                 (2)

The time taken to reach m_cwnd is

    id_time_after_ss = ((m_cwnd - cur_cwnd) / MSS) * RTT          (3)

Thus the amount of data sent during this period is

    data_sent = (avg_cwnd * id_time_after_ss) / RTT               (4)

The bandwidth achieved during this period is

    avg_bw = data_sent / id_time_after_ss                         (5)

Assuming constant available bandwidth and a fairly steady RTT for the period of measurement, we measure the bandwidth for a second after the end of slow-start. If the links are lossy and the TCP congestion window is never stable, Iperf does not enter the quick mode, but only reports the bandwidth after 10 seconds or a user specified time (see footnote 5).

Even if id_time_after_ss is much larger than 1 second, we will be measuring a bandwidth which is at least 67% of the bandwidth achievable had we allowed the test to run for id_time_after_ss. Using limited slow-start [5] will definitely alleviate this problem to a large extent.

Footnote 5: This never occurred while measuring bandwidth to other high performance sites, but occurred once while measuring bandwidth across a dial-up link.
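As a concrete, made-up example of equations (1)-(5), the small sketch below plugs in illustrative numbers in the range of Figure 4 (a roughly 300 KB bandwidth-delay path with a 50 ms RTT). It also estimates, under the same linear window growth, what a 1-second measurement window would see; that last step is our own extrapolation, not an equation from the paper.

    def quick_mode_estimate(ss_cwnd, m_cwnd, mss, rtt, measure_time=1.0):
        """Apply equations (1)-(5) for a Reno-like sender; sizes in bytes, times in s."""
        cur_cwnd = ss_cwnd / 2.0                              # eq. (1)
        avg_cwnd = (m_cwnd - cur_cwnd) / 2.0 + cur_cwnd       # eq. (2)
        id_time_after_ss = (m_cwnd - cur_cwnd) / mss * rtt    # eq. (3)
        if id_time_after_ss == 0:
            # Stable (e.g. receiver-limited) window: quick mode is trivially exact.
            return cur_cwnd / rtt, cur_cwnd / rtt
        data_sent = avg_cwnd * id_time_after_ss / rtt         # eq. (4)
        avg_bw = data_sent / id_time_after_ss                 # eq. (5), = avg_cwnd / rtt

        # What a short measurement sees: the window ramps from cur_cwnd by one
        # MSS per RTT, so average it over the first `measure_time` seconds.
        t = min(measure_time, id_time_after_ss)
        measured_bw = (cur_cwnd + (t / rtt) * mss / 2.0) / rtt
        return avg_bw, measured_bw

    # Illustrative values only (assumed): 400 KB window at the end of slow-start,
    # 350 KB peak in congestion avoidance, 1460-byte MSS, 50 ms RTT.
    avg_bw, one_sec_bw = quick_mode_estimate(400e3, 350e3, 1460, 0.05)
    print("avg: %.0f Mbps, 1 s estimate: %.0f Mbps (%.0f%% of avg)"
          % (avg_bw * 8e-6, one_sec_bw * 8e-6, 100 * one_sec_bw / avg_bw))

With these numbers the 1-second estimate comes out around three quarters of the post-slow-start average, comfortably above the 67% bound discussed above.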

To see that measuring bandwidth for a second after the end of slow-start is sufficient in most cases, we plot the values of the bandwidth measurable as a percentage of the bandwidth achievable for various RTTs and link bandwidths. Figure 5 shows the bandwidths measurable by running the tests for 1 second after the end of slow-start and Figure 6 shows the bandwidths measurable by running the tests for 10 seconds after the end of slow-start. We assume that we can measure the bandwidth very accurately if the time of measurement after slow-start is greater than or equal to id_time_after_ss. We see that the difference between the 1 second and 10 second measurements is not very high. For a bandwidth of about 1 Mbps it makes no difference at all. Even at a bandwidth of 491 Mbps, it makes a difference of less than 10%.

[Figure 5: % bandwidth measured 1 second after slow-start (in theory). Measured BW / Achievable (%) versus RTT (ms), for link bandwidths of 15 Kbps, 120 Kbps, 960 Kbps, 7.68 Mbps, 61.44 Mbps and 491.52 Mbps.]

[Figure 6: % bandwidth measured 10 seconds after slow-start (in theory). Measured BW / Achievable (%) versus RTT (ms), for the same set of link bandwidths.]

4 Implementation and Results

4.1 Implementation

Iperf runs in client-server mode. All the code changes in Iperf are confined to the client, and only the host running the client needs a Web100-enabled kernel. The remote hosts, running Iperf servers, can run the standard Iperf available for various operating systems. Using the Web100 user-library (userland), we can read the variables out of the Linux proc file system. The total change in code in Iperf is less than 150 lines (see footnote 6). We maintain compatibility with all the modes in which Iperf previously ran. The Web100 polling was done in a separate thread since Web100 calls were blocking.

Footnote 6: The 150 lines only include the Iperf modifications. Scripts written for data collection are not included here.

4.2 Testing environment

For the testing environment, Iperf servers were present, as a part of the IEPM bandwidth measurement framework [7], on various nodes across the world. Iperf runs in client-server mode, and a version of Iperf which runs in a secure mode is available as a part of Nettest [15]. We measured bandwidth from SLAC to twenty high performance sites across the world. These sites included various nodes spread across the US, Asia (Japan) and Europe (UK, Switzerland, Italy, France). The operating systems on the remote hosts were either Solaris or Linux. The bandwidths to these nodes varied from 1 Mbps to greater than 400 Mbps. A local host at SLAC running Linux 2.4.16 (a Web100-enabled kernel) with a dual 1130 MHz Intel Pentium 3 processor was used for the client.

4.3 Results

We measured bandwidth by running Iperf for 20 seconds, since running for 10 seconds (the default) would not be adequate for a few long bandwidth-delay networks from SLAC (for example from SLAC to Japan). We have compared the 20 second measurements with the bandwidths obtained by running Iperf in Quick mode (1 second after slow-start). We obtained bandwidth estimates in quick mode differing by less than 10% when compared to the 20 second Iperf test for most cases, as shown in Figure 7 (see footnote 7).

Footnote 7: These tests are performed on a regular basis and the results are updated in [17]. We initially obtained a mean difference of 19% and a standard deviation of 20%. This was later found to be a window sizing problem on Solaris, where if the requested window size is not available, Solaris sets it to the default size, which is generally 8 or 16 KB. Linux is better in the sense that it defaults to the maximum allowable window size if the requested size cannot be satisfied.

[Figure 7: Quick mode measurements v/s 20 second measurements. Bandwidth (Mbps) versus node number, comparing the 20 s and Quick mode measurements for each monitored path.]

[Figure 8: Throughput and cwnd (SLAC->Caltech). Congestion window (KB), instantaneous bandwidth and cumulative bandwidth (Mbps) versus time (ms), with a maximum TCP window of 16 MB.]
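The headline savings quoted in the abstract follow directly from the run lengths involved. A tiny accounting sketch (our own, assuming a path whose slow-start lasts about 0.2 s):

    full_run = 20.0              # seconds: baseline Iperf measurement used here
    quick_run = 0.2 + 1.0        # seconds: assumed slow-start plus the 1 s window

    saving = 1.0 - quick_run / full_run
    # Traffic scales roughly with the time spent sending at the achievable rate,
    # so the traffic saving is of the same order as the time saving.
    print("time (and, roughly, traffic) saved: ~%.0f%%" % (100 * saving))   # ~94%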

[Figure 9: Throughput and cwnd (SLAC->Japan).]

We plotted the instantaneous bandwidth (InstBW in Figures 8, 9 and 10), every 20 ms, averaged over the last 500 ms, and found that it increases and decreases with the congestion window as expected. We also plotted the cumulative bandwidth, which is the value reported by standard Iperf at that point of time (CumuBW in Figures 8, 9 and 10). The transfer from SLAC to Caltech shown in Figure 8 is the behavior which is expected in most transfers. The maximum TCP window size at both ends was 16 MB, which was greater than the bandwidth-delay product of about 800 KB. The RTT is about 24 ms (but the Linux TCP stack rounds off the RTTs to 10s of milli-seconds).

We have also plotted the transfer across a high latency network, from SLAC to Japan, where the RTT was 140 ms, the bandwidth-delay product was about 6 MB and the maximum TCP window size was 16 MB. In this case, the quick mode measurement gives us almost the exact value of the bandwidth. Also, we have plotted the transfer from SLAC to Rice University, where the constraining factor is the receive window size (256 KB). The bandwidth-delay product was 1.6 MB. These graphs are available for all the nodes being monitored at [18].

[Figure 10: Throughput and cwnd (SLAC->Rice).]

5 Conclusions and future work

The results obtained by running Iperf in quick mode for high bandwidth networks are encouraging. Iperf-generated network traffic has been reduced by up to 92% and the measurement time has been reduced by up to 94%. Iperf quick mode was deployed to measure bandwidth from SLAC to various high performance sites across the world, and the difference with standard 20 second Iperf measurements was less than 10% in almost all the cases. These results also demonstrate that the Web100 TCP instrumentation is very useful.

We plan to modify the tool for the TCP stacks suggested by the Net100 community and Floyd, and analyze the same. We expect better results than in TCP Reno since the congestion window increases to the ideal value much faster in these newer stacks. Also, we plan to do some experiments and performance analysis with gigabit networks with latency in the order of hundreds of milli-seconds. We expect significant savings in measurement time in these cases. We also plan to correlate the congestion signals, loss and retransmission information obtained using Web100 for an Iperf session with the bandwidth achieved.

Acknowledgments

The authors would like to thank Brian Tierney, from LBNL, for his helpful comments.

References

[1] L.S. Brakmo and L.L. Peterson. "TCP Vegas: end to end congestion avoidance on a global internet". IEEE Journal on Selected Areas in Communications, 13(8):1465-80, October 1995.

[2] "Cooperative Association for Internet Data Analysis". [Online]. Available: http://www.caida.org/index.html

[3] T. Dunigan and F. Fowler. "Web100 at ORNL". [Online]. Available: http://www.csm.ornl.gov/~dunigan/netperf/web100.html

[4] S. Floyd. "HighSpeed TCP for Large Congestion Windows". Internet Draft. [Online]. Available: http://www.ietf.org/internet-drafts/draft-floyd-tcp-highspeed-00.txt

[5] S. Floyd. "Limited Slow-Start for TCP with Large Congestion Windows". Internet Draft. [Online]. Available: http://www.ietf.org/internet-drafts/draft-floyd-tcp-slowstart-01.txt

[6] T. Hacker, B. Athey, and B. Noble. "The End-to-End Performance Effects of Parallel TCP Sockets on a Lossy Wide-Area Network". In Proceedings of the 16th IEEE-CS/ACM International Parallel and Distributed Processing Symposium (IPDPS), 2002.

[7] "Internet End-to-end Performance Monitoring - Bandwidth to the World (IEPM-BW) project". [Online]. Available: http://www-iepm.slac.stanford.edu/bw/index.html

[8] "Internet2 End to End Performance Initiative". [Online]. Available: http://e2epi.internet2.edu/index.shtml

[9] V. Jacobson. "Congestion avoidance and control". Computer Communication Review, 18(4):314-29, August 1988.

[10] K. Lai and M. Baker. "Measuring Bandwidth". In Proceedings of INFOCOM '99, pp 235-245, March 1999.

[11] J. Mo, R.J. La, V. Anantharam and J. Walrand. "Analysis and comparison of TCP Reno and Vegas". In Proceedings of IEEE INFOCOM '99, pp 1556-63, March 1999.

[12] M. Muuss. "The TTCP Program". [Online]. Available: http://ftp.arl.mil/ftp/pub/ttcp

[13] "The Net100 Project". [Online]. Available: http://www.net100.org

[14] V. Paxson. "Measurement and Analysis of End-to-End Internet Dynamics". Ph.D. thesis, Computer Science Division, University of California, Berkeley, April 1997.

[15] "Secure Network Testing and Monitoring". [Online]. Available: http://www-itg.lbl.gov/nettest/

[16] A. Tirumala, M. Gates, F. Qin, J. Dugan and J. Ferguson. "Iperf - The TCP/UDP bandwidth measurement tool". [Online]. Available: http://dast.nlanr.net/Projects/Iperf

[17] A. Tirumala and L. Cottrell. "Iperf Quick Mode". [Online]. Available: http://www-iepm.slac.stanford.edu/bw/iperf_res.html

[18] A. Tirumala and L. Cottrell. "Plots of Congestion Window and Throughput from SLAC to different nodes". [Online]. Available: http://www-iepm.slac.stanford.edu/bw/cwnd/cwnd.html

[19] "The Web100 project". [Online]. Available: http://www.web100.org