Experimental analysis of the impact of RTT differences on the throughput of two competing TCP flows

The network phase effect explored

Maarten Burghout

Individual Research Assignment, June 2008

Committee

Dr. ir. Pieter-Tjerk de Boer

Dr. ir. Geert Heijenk

Abstract

Two TCP flows sharing a common bottleneck link each obtain a fraction of the available throughput based on the inverse ratio of their round-trip times (RTT): the higher the RTT, the less throughput. This report investigates how a higher RTT can nevertheless deliver a higher throughput, contrary to the usual TCP throughput rules.

The underlying cause for this deviation from regular throughput predictions is that modern TCP traffic shows a strong periodicity at bottleneck links, because most packets carry the maximum payload. Traffic sources that tune their RTT to be in phase with the transmissions at that bottleneck are known to obtain a higher throughput on that link. This phenomenon is called the network phase effect.

This report discusses the results of a series of experiments that address this phenomenon. A physical implementation of a simple network was constructed to further investigate preliminary simulation results. A theoretical analysis is also given, in which conditions for the phase effect to occur are derived and tested against the physical experiments. These experiments confirm that a flow can obtain a higher throughput on a bottleneck link by tuning its RTT according to the phase effect conditions.


Table of contents

1 Introduction
  1.1 Contents
  1.2 Scope
2 Problem definition
  2.1 Simulation setup
  2.2 Ns2 simulation results
3 Analysis
  3.1 Neutral situation
  3.2 Congestion window increase
  3.3 Delay
  3.4 The network phase effect
  3.5 Ns2 setup results explained
4 Experiments
  4.1 The assignment: tests to be conducted
  4.2 Test environment
  4.3 Measurement method
  4.4 Comparison to Ns2 setup and predictions
  4.5 Different TCP flavors
5 Results
  5.1 Data processing
  5.2 Phase effect visible
  5.3 Measurement abnormalities
    5.3.1 Small payload packets
    5.3.2 Terminated measurements
  5.4 Phase effect for different TCP flavors
    5.4.1 Equal tripping points
    5.4.2 Bandwidth sharing ratio
    5.4.3 Low delay and Scalable
  5.5 Comparison of different flavors
  5.6 Equal delay settings
6 Conclusions
  6.1 Phase effect possible in lab environment
  6.2 Different TCP flavors against each other
  6.3 Predictions on (semi-)public networks
Appendix A - Defining goodput
Appendix B – NS2 gives faulty/unexpected behavior in a specific case
  Why
  Network setup
  Network properties
  Description of the problem
  Test results
    Both flows start at 0
    Starting both flows separately
  Conclusion
Literature


1 Introduction

In today's networks it is not uncommon for two or more TCP flows to share a common bottleneck link, e.g. multiple users on a LAN sharing a DSL link. When competing for bandwidth on that bottleneck, the round-trip time (RTT) is a determining factor for the throughput obtained by those flows.

In general, a higher RTT means a lower throughput: under general conditions, flows share the available bandwidth in inverse proportion to their RTTs.
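As a minimal illustration of this rule of thumb (a sketch under ideal conditions, ignoring the phase effect discussed later; the RTT values are hypothetical), the expected shares follow directly from the RTTs:

    # Sketch: expected bandwidth shares of TCP flows on a shared bottleneck,
    # assuming throughput proportional to 1/RTT (classic rule of thumb).
    def expected_shares(rtts_ms):
        weights = [1.0 / rtt for rtt in rtts_ms]
        total = sum(weights)
        return [w / total for w in weights]

    # Hypothetical example: one flow with a 20 ms RTT, one with a 40 ms RTT.
    print(expected_shares([20.0, 40.0]))   # -> roughly [0.67, 0.33]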

In this project, we investigate how tuning the RTT of a flow can help it obtain a higher throughput. Simulations in the Ns2 network simulator suggest that by opting for a higher RTT, a flow can actually get more throughput than its competitors (which do not tune their RTT). The main focus of this project is a physical implementation of such a setup and parameter strategy. Measurements will show whether a higher RTT gives a flow a greater share of the available bandwidth, or even makes it the sole user of the bottleneck, as some simulations suggest.

1.1 Contents

The motivation for this assignment was a set of simulation results from the Ns2 network simulator, which was believed to be erroneous. These simulation results are discussed in chapter 2, and a theoretical analysis of this (correct!) behavior is given in chapter 3. Based on this analysis, a series of experiments was drawn up, as discussed in chapter 4. The results of these experiments are shown in chapter 5. Finally, chapter 6 gives conclusions based on the analytic investigation and the practical results.

1.2 Scope

This project was conducted as an Individual Research Assignment (Individuele Onderzoeks Opdracht, IOO), part of the undergraduate program for Electrical Engineering at the University of Twente. The main activity was performed in 2005.


2 Problem definition

While working on his BSc assignment entitled “TCP synchronisation effect in TCP New Reno and TCP Hybla”, Bert Baesjou stumbled upon an apparent flaw in simulation results of the Ns2 network simulator. Since this flaw was not of direct interest for his assignment, he summarized his findings in a memo, which can be found in appendix B. His findings were worth investigating further, which led to the formulation of my assignment.

2.1 Simulation setup

The figure below is a graphical representation of the network as simulated in Ns2 when the problem occurred.

Figure - Lay-out of network in the Ns2 simulation

The network consists of:

• 2 source nodes (1 and 2) connected to the first router via 100 Mbit/s links
• 2 end nodes (5 and 6) connected to the second router via 100 Mbit/s links
• 2 routers, interconnected via a 100 kbit/s link

Only two flows exist: node 1 sends to node 5, and node 2 to node 6, accompanied by their respective acknowledgment return flows. All links are lossless, but packets may be dropped at the routers when their queues fill up to their maximum of 10 packets.

All traffic is TCP (New Reno), packet size is set to 1,000 bytes; network protocol is IPv4.

The 100 Mbit/s links for the 1 → 5 flow are set to a fixed delay of 5 ms, while delays for the links for 2 → 6 are variable (and set between 5 and 25 ms). The first flow experiences a smaller RTT and is therefore expected to be the ‘faster’ connection.


Given that the clear bottleneck for both flows is the 100 kbit/s link between the routers, one may expect the queue at the first router to fill quickly, causing packets to be dropped and the congestion windows of both flows to be adjusted in the classic TCP fashion (the ‘saw tooth’).

2.2 Ns2 simulation results

For delays up to 19 ms per link on the variable-delay links, the goodput is shared equally between the two flows, giving both around 50% of the capacity of the 100 kbit/s link. This is shown in the figure below. The TCP windows of both flows increase to 6 (sending 12 packets to the first router), which causes packets to be dropped since no more than 10 can be accommodated in the queue; the window then decreases to half its size and thereafter increases to 6 again.

[Plot: goodput on the bottleneck link for both flows versus the total delay over 4 links for the flow from 2 to 6 in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Goodput over the bottleneck link for both flows, for different delay settings on the links from node 2 to node 6

However, when the variable delay is set to 20 ms, the flow from 1 → 5 comes to a complete halt. The window of node 2 still shows the normal saw-tooth form, but now increases to 12 and decreases to 6. The goodput on the bottleneck is given entirely to the 2 → 6 flow.

This behavior continues to show in the simulation results up to a delay of 25 ms. When the delay is increased further (i.e. 26, 27, … ms), the goodput is divided in a 63-37 ratio between 1 → 5 and 2 → 6 respectively. The window sizes reach 7 (node 1) and 5 (node 2), still adding up to 12, at which point both nodes notice packet loss due to the 10-packet queue at the first router.

While it seems odd that the bottleneck's goodput isn't shared equally between the competing flows, there is actually a sound explanation for this behavior, as we will see in the analysis.


3 Analysis

The behavior of the setup during simulation in Ns2 seems odd: a flow deliberately made slower by means of a larger RTT still manages to capture the entire bandwidth available on a bottleneck link. In theory, however, there is a good explanation for this phenomenon, which is known as the network phase effect.

The analysis is based on the network setup of section 2.1 and uses the following assumptions and parameters:

• The traffic consists of 1,000-byte packets (including headers).
• Traffic is already flowing and the 10-packet queue at the first router is filled entirely.
• Transmission times over the 100 Mbit/s links are neglected, since these are an order of magnitude smaller than the other relevant transmission times (i.e. over the bottleneck link).
• The individual delays on all the 100 Mbit/s links are concatenated [1] and set at the source nodes.
• We denote the transmission time of a packet over the bottleneck as TP, that of the accompanying ACK packet as TA, the delay set on the bottleneck link as dc, and the total delays for the two flows (set at sources 1 and 2 respectively) as d1 and d2.

[1] This is allowed, since packet loss can only occur at the back of the queue; therefore, only the amount of delay is of importance, not the location where it is added.


[Time-sequence diagram between the source nodes, router 1, router 2 and the destination nodes, with marked instants t = -TP, t = 0, t = dc, t = TA + dc, t = TA + 2dc (ACK towards source 1), t = TP (second place available in queue), t = TP + TA + 2dc (ACK towards source 2) and t = TP + TA + 2dc + d2; the undelayed case is marked (1), the delayed case (2).]

Figure - Transmission of two packets over the bottleneck link, without delay (1) and with delay (2)


To illustrate the chronological sequence of events, the figure above shows two packets being transmitted over the bottleneck link.

3.1 Neutral situation

We start our analysis at the first router, where transmission of the first packet (originating at source 1) starts at t = -TP. The packet takes TP to be transmitted over the bottleneck link, so at t = 0 a place in the queue becomes available.

The first packet continues to its destination node, where it generates an ACK reply. This ACK travels back to source 1, taking TA over the bottleneck. Since an additional delay of dc was set on the bottleneck (in both directions!), source 1 receives the ACK packet at t = TA + 2dc.

If receiving this ACK allows source 1 to send a new packet to the first router, this packet will fill the free spot in the queue (indicated at (1) in the figure). Source 1 thereby has the same number of packets in the queue as it had at t = -TP. In this way, each packet sent over the bottleneck allows its source to queue a new packet, thereby maintaining the amount of bandwidth each flow gets. Sources cannot place a packet in an empty spot created by the transmission of an opponent's packet, because TCP only allows packets to be sent upon receiving an ACK. As TA is only a fraction of TP, a source can reclaim its own spot long before the other source can.

3.2 Congestion window increase

When receiving an ACK, TCP usually allows a new packet to be sent. Since this is a straightforward TCP connection, where only packet loss can occur (packet reordering is not possible), some ACKs will also allow an increase of TCP's congestion window, thus sending (at least) a second packet to the queue. These two packets arrive at the back end of the queue where only one spot is available, causing the second packet to be dropped. This packet loss results in a decrease of the congestion window, forcing source 1 to wait for one or more ACKs from its packets still in the queue before sending new packets to the queue. By then, the other source may have already claimed the empty spots, temporarily taking a larger share of the bandwidth on the bottleneck.

However, as soon as the queue has been completely filled again, source 2 will also have one of its packets dropped, allowing source 1 to reclaim some queue spots.

Over a longer period of time and with other criteria (such as the TCP congestion avoidance mechanism) equal, this sequence of increases and decreases of bandwidth share should level out, with both flows obtaining an equal goodput.

3.3 Delay

In the basic setup (i.e. without extra delays), a spot in the queue becomes available at t = 0 and source 1 can have a packet delivered to fill it at t = TA + 2dc. If the ACK that allowed the new packet of source 1 also allowed an increase of the send window, source 1 may send two packets. However, only one spot is available in the queue, so for both packets to be accommodated, source 1 should wait until the second spot becomes available at t = TP. If source 1 had not waited but instead increased its window and sent two packets to the queue, the second packet would have been dropped.


Clearly, source 1 can have its two packets accepted in the queue if it waits until after t = TP. It should however not wait too long, as source 2 has the next packet in line and will receive an ACK at t = TP + TA + 2dc. Therefore, source 1 has to set its additional delay d1 within the window

TP − TA − 2dc ≤ d1 ≤ TP

Source 2 had also been given an extra delay d2, which extends the above window to

TP − TA − 2dc ≤ d1 ≤ TP + d2

This scenario is also illustrated in the figure above, indicated at (2).

Of course, this short example will not hold indefinitely. Eventually, source 1 will enlarge its congestion window even further, which will allow it to send more packets to the queue than there are empty spots. Source 1 will then experience packet loss, reduce its congestion window and have to wait a while before it can send its next packet. By then, source 2 may have stepped in and obtained some extra spots in the queue, thereby taking a larger share of the bandwidth on the bottleneck. However, as soon as the packet order described in the example above occurs again, source 1 will have a chance to regain goodput. Source 2, on the other hand, always has a disadvantage when recovering from packet loss. Thus, over a longer period of time, source 1 can obtain more goodput.

3.4 The network phase effect

The phenomenon described above can be seen as a phase effect in a traffic flow through a network. Here, phase denotes the time between the end of transmission of a packet out of a stream and the arrival of another packet belonging to that stream at the transmission queue.

Because all traffic in this setup is of the same size (i.e. 1000 bytes), all packet transmissions over the bottleneck link take the same time, making the RTT a multiple of the transmission time.

Adding delay in the amount of this transmission time to another part of the total connection can make a flow dominant when competing for goodput over a bottleneck link. Extra delay that sets the RTT of a flow to a non-multiple of the transmission time makes that flow lose out over a longer period of time. End hosts can use this method to their advantage, especially if they ‘know’ that the major part of the RTT is caused by one particular part of the link, and provided almost all traffic is of the same size.

Earlier investigations of this effect were conducted by Sally Floyd and Van Jacobson (Floyd & Jacobson, 1992). They established a strong relation between the round-trip time and the arrival of a packet in a queue. Different drop strategies for TCP were found to be of influence, as was the addition of random (i.e. out-of-phase) traffic.

3.5 Ns2 setup results explained

Adding a delay of one transmission time for source 1 does not by itself explain why this allows source 1 to take more spots in the queue and thereby more goodput over the bottleneck link. If both packets that left a vacant spot in the queue belonged to source 1, then no gain is obtained: source 1 still has as many packets in the queue as it had when transmission of its packet began. But if the second packet originated from source 2, source 1 has gained an additional packet over source 2, and thereby also gained some goodput.


As we saw, source 1 only gains a larger share if it steals a spot from the other source, which can only occur if that source does not reclaim its vacant queue position in time.

Calculating this for the Ns2 simulation setup, the total delay for the flow from node 2 to node 6 should be between 74.8 and 100 ms. Since the delay was distributed over the various links (and set in multiples of 1 ms), individual delay settings of 20 ms to 25 ms should result in a strong phase effect condition, which is exactly what the simulation results of section 2.2 show.
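The calculation can be sketched as follows; the 40-byte ACK size and the 1 ms one-way delay of the bottleneck link are assumptions chosen to be consistent with the 74.8-100 ms range quoted above, not values taken from the simulation scripts:

    # Sketch: phase-effect delay window  TP - TA - 2*dc <= d <= TP + d_other
    # for the Ns2 setup (100 kbit/s bottleneck, 1,000-byte packets).
    BOTTLENECK_BPS = 100_000      # bottleneck link speed
    PACKET_BYTES = 1000           # data packet size
    ACK_BYTES = 40                # assumed ACK size
    DC_MS = 1.0                   # assumed one-way delay on the bottleneck link
    D_OTHER_MS = 4 * 5.0          # total delay of the competing 1 -> 5 flow (20 ms)

    tp_ms = PACKET_BYTES * 8 / BOTTLENECK_BPS * 1000    # 80.0 ms
    ta_ms = ACK_BYTES * 8 / BOTTLENECK_BPS * 1000       # 3.2 ms

    lower = tp_ms - ta_ms - 2 * DC_MS                   # 74.8 ms
    upper = tp_ms + D_OTHER_MS                          # 100.0 ms
    print(f"phase effect window: {lower:.1f} .. {upper:.1f} ms")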

As the term ‘phase’ implies, adding an extra transmission time TP to the total delay setting should yield similar results. The theory described above supports this, but the Ns2 experiments did not investigate higher delay settings. For the physical-world experiments in this assignment, we will explore those delay settings.


4 Experiments

4.1 The assignment: tests to be conducted

The first aim of this assignment was to create a close physical copy of the setup used in the Ns2 simulations. Using this setup, initial tests were to be conducted with the parameters of the simulation, to determine how closely the measured behavior aligns with the Ns2 results.

Next, and most importantly, the setup would be used to (try to) show the network phase effect. Once established, it was to be determined within what range of delays the effect shows. Finally, possibilities for application in (semi-)public networks would be explored and possibly tested.

4.2 Test environment

A physical implementation of the Ns2 setup, as shown in the figure of section 2.1, was created using six identical PC-compatibles running Debian Linux. An IPv4 subnet was created, with IP range 10.0.0.0/16.

The network infrastructure consisted of 3Com 905TX-series network interfaces (either onboard the mainboards or as PCI cards). Standard CAT5 UTP cables were used.

The bottleneck link between the two routers was established by means of a SLIP (RFC 1055) link, operating at 115,200 bps over a 3-wire null-modem cable. The serial line adds a start and a stop bit to each byte, putting 10 bits on the wire for every byte of user data. Furthermore, each datagram is preceded and concluded by an additional framing character.

SLIP also uses two special characters (ESC and END), which must be ‘escaped’ when the same byte value is present in the user data. The payload for all traffic is, however, strictly chosen to avoid these characters. Occasionally, a byte in the TCP or IP header has to be ‘escaped’, but it is highly unlikely that the need for multiple escapes causes a significant increase in transmission time relative to other packets, since each escaped character adds only about 87 µs, or 1 per mille.

Each datagram transmitted over SLIP is preceded and concluded by an END character, thus for each IP datagram an additional 2 bytes are sent.
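For reference, a minimal sketch of SLIP framing as defined in RFC 1055 is given below (the constants END, ESC, ESC_END and ESC_ESC are the standard RFC 1055 values; the helper name slip_encode is ours). It illustrates both the two END framing bytes and the escaping that occasionally lengthens a packet:

    # Minimal SLIP encoder sketch (RFC 1055): frame a datagram in END bytes and
    # escape any END/ESC bytes that occur in the payload.
    END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

    def slip_encode(datagram: bytes) -> bytes:
        out = bytearray([END])                 # leading END opens the frame
        for b in datagram:
            if b == END:
                out += bytes([ESC, ESC_END])   # escaped END byte
            elif b == ESC:
                out += bytes([ESC, ESC_ESC])   # escaped ESC byte
            else:
                out.append(b)
        out.append(END)                        # trailing END closes the frame
        return bytes(out)

    frame = slip_encode(bytes(1000))           # 1,000 zero bytes: nothing to escape
    print(len(frame), len(frame) * 10)         # 1002 bytes -> 10,020 bits on the line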

Traffic Control Queueing Discipline (“tc qdisc”) is used to limit (IP-)packet size to 1000 bytes and to set the delays for the links.

Netcat is used to create the desired traffic (in most tests an endless stream of zeroes, to avoid escape characters).

4.3 Measurement method

The main interest of these tests is the effect of the additional delay added to the links on the phase effect. Therefore sequential runs, each with an increased delay, were conducted in an automated manner. Cron (a daemon to execute scheduled commands) was used on all nodes to start and stop traffic, and for logging. Since cron has a one-minute resolution for jobs, the measurements had to be adapted to function accordingly.

For the main runs, the flow from node 1 → 5 (experiencing a constant delay of 5 ms per link, totaling 20 ms for the whole flow) was started first, with the other flow, set to the appropriate delay for that run, following a minute later. This method was selected because a simultaneous start (i.e. within the same ‘cron minute’) might result in a winner should one node start slightly early. Starting the adjustable flow a minute late ensured that an established flow of traffic was already present, and measurements could be made of the amount of goodput the second flow was able to conquer from the first.

All flows were logged at all nodes using the tcpdump tool. The main source for analysis was the logs of the first router, at the egress point to the SLIP link.

Since the main focus of this project was on goodput over phase-effect-affected links, the total goodput of both flows was calculated during the ‘stable’ period of each measurement, i.e. while both source nodes were sending traffic. Goodput was therefore calculated over the period between the 60th and 400th second of each run, thereby ignoring the first minute, when only the first flow was active, and the latter part of the transmissions, when the flows were terminated by the cron scripts and the traffic was no longer homogeneous.

4.4 Comparison to Ns2 setup and predictions

The physical layout was chosen to match the Ns2 setup as closely as possible, but slight differences could not be avoided. Most obvious is the use of a SLIP link as the bottleneck link, which not only has a somewhat higher transmission speed but also adds more overhead per datagram.

The main parameter influenced by this setup is the transmission time over the bottleneck link. IP packets from the source nodes (1,000 bytes each) take 10,020 bits over the SLIP link, which results in a transmission time of 86.98 ms. This number increases slightly when header bytes need to be escaped. ACK packets on their way back amount to between 52 and 80 bytes (depending on SACK usage), taking 4.69 to 7.12 ms over the bottleneck.

Based on the theory of the network phase effect, the ‘slow’ flow from source 2 to destination 6 should be dominant in obtaining goodput over the bottleneck from 80.3 ms total delay up to 104.3 ms. Those delay settings ensure that, after receiving an ACK, source 2 sends its next packets no sooner than when a second spot in the queue becomes available, but before source 1 has a chance to claim one of those spots.
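As a sketch of how these numbers follow from the link parameters and the window derived in chapter 3 (TP − TA − 2dc ≤ d ≤ TP + d_other, with dc ≈ 0 here), the calculation below uses the nominal packet sizes from the text; the exact ACK size depends on the TCP options in use, so the result only approximates the 80.3-104.3 ms range quoted above:

    # Sketch: SLIP transmission times and the approximate phase-effect window.
    LINE_BPS = 115_200
    BITS_PER_BYTE = 10                         # start bit + 8 data bits + stop bit

    def tx_ms(ip_bytes: int) -> float:
        return (ip_bytes + 2) * BITS_PER_BYTE / LINE_BPS * 1000   # +2 END framing bytes

    tp = tx_ms(1000)                           # data packet: 86.98 ms
    ta_lo, ta_hi = tx_ms(52), tx_ms(80)        # ACK packets: 4.69 .. 7.12 ms
    d_fast = 4 * 5.0                           # total delay of the fixed 1 -> 5 flow

    print(f"TP = {tp:.2f} ms, TA = {ta_lo:.2f} .. {ta_hi:.2f} ms")
    print(f"dominance window roughly {tp - ta_hi:.1f} .. {tp + d_fast:.1f} ms")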

4.5 Different TCP flavors

During analysis of some early results, it became apparent that one of the sending nodes used a different TCP congestion control mechanism, due to an erroneous setup. This was of course an undesirable situation, so all machines were reconfigured with the same TCP implementation (BIC, the Linux kernel default at the time).

Analysis of the results of the erroneous setup, however, showed a small influence on the outcome of the tests. This prompted a comparison of three widely used TCP congestion control/avoidance mechanisms:

• TCP BIC
• TCP Reno
• TCP Scalable

Two scenarios were investigated:


• The above setup, with both flows using the same TCP flavor
• Different TCP flavors competing against each other, with equal delay settings for both flows and a common starting time (instead of the one-minute head start for the flow from node 1 to 5)


5 Results

5.1 Data processing

During the experiments, detailed data-flow information was logged at both routers, on all ingress and egress points. To obtain goodput information per flow, the tcpdump log files of the egress point at the first router were used. These logs were processed by scripts, mostly in AWK, to determine how much useful data a source was able to transmit over the bottleneck link in the interval between t = 65 s (the start of the second flow) and t = 400 s. The goodput figure per source was obtained by summing all payloads in this interval; for a justification and definition of this goodput measurement method, see appendix A. Averaging the goodput over more than 5 minutes filters out any startup effects and indicates whether or not a flow was dominant over a longer period of time.
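The original processing was done with AWK over tcpdump output; the sketch below expresses the same payload-summing idea in Python, assuming the log has already been reduced to (timestamp, source address, TCP payload length) records — the record format and names are ours, not those of the original scripts:

    # Sketch: per-source goodput over the stable interval, from parsed packet records
    # taken at the egress of the first router towards the SLIP link.
    from collections import defaultdict

    def goodput_per_source(records, t_start=65.0, t_end=400.0):
        totals = defaultdict(int)
        for ts, src, payload in records:
            if t_start <= ts <= t_end:
                totals[src] += payload
        duration = t_end - t_start
        return {src: total / duration for src, total in totals.items()}   # bytes per second

    # Hypothetical records: (timestamp in s, source IP, payload bytes)
    sample = [(70.0, "10.0.0.1", 960), (70.1, "10.0.0.2", 960), (500.0, "10.0.0.1", 960)]
    print(goodput_per_source(sample))          # the 500 s record falls outside the interval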

5.2 Phase effect visible

The data processing described above yields a goodput figure for every delay setting of the second flow. The figure below gives a graphical representation of the network phase effect results.

[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for the flow from 2 to 6 in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Goodput for two flows experiencing the phase effect [2]

For delay settings from 0 ms to 56 ms, the flow from node 1 to 5 gets around 63% of the total goodput that both flows obtain on the bottleneck. This flow even obtains the larger share for delay settings of the other flow below 20 ms (the amount of delay it had itself), indicating that this might be a result of the one-minute head start it got. Normal TCP goodput calculations suggest that flows with an (almost) equal RTT and equal conditions should get an equal amount of goodput, but apparently being the first flow to fill the queue at the first router gives an advantage over the second flow. Defending spots in the queue is easier than stealing them, because the low delay settings almost always allow a source to reclaim the vacant spot in the queue that was created by the transmission of one of its own packets.

[2] The measured values for 76 ms are omitted as a result of a prematurely terminated measurement; an estimated figure based on partial results is in line with neighbouring results.

Flow 2 (from node 2 to 6) is clearly dominant from 60 up to and including 104 ms delay, taking 80% of the total goodput: a clear example of the network phase effect. This window has the same end as predicted in our analysis of the phase effect, but starts at a considerably smaller delay setting. This is probably due to buffering at the front of the queue: a packet being sent frees its spot in the queue before it has been transmitted entirely. Based on the difference between the theoretical delay at which the phase effect starts (80 ms) and our measurement (60 ms), this buffer is probably 256 bytes.
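A quick back-of-the-envelope check of that interpretation (a sketch; the 256-byte figure is an estimate derived from the measurements, not a documented device parameter): a 256-byte transmit buffer corresponds to roughly 22 ms of line time at 115,200 bps, which accounts for most of the roughly 20 ms difference between the predicted and the measured start of the window.

    # Sketch: line time represented by a 256-byte transmit buffer on the SLIP link.
    BUFFER_BYTES = 256
    LINE_BPS = 115_200
    BITS_PER_BYTE = 10    # start bit + 8 data bits + stop bit

    buffer_ms = BUFFER_BYTES * BITS_PER_BYTE / LINE_BPS * 1000
    print(f"{buffer_ms:.1f} ms")   # ~22.2 ms head start for the packet being transmitted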

Some further conclusions can be drawn from these results:

• The phase effect does not only show up at the predicted interval around a single transmission time, but also around twice the transmission time (148 ms up to 192 ms).
• When either of the flows takes a larger part of the bandwidth, it has a clear advantage over the other flow, but the ratio decreases at higher delay settings (dropping from 80/20 to 65/35 in these tests).
• The ratio between the goodputs remains constant over the delay range for which the phase effect is present (goodput is constant for 0 to 56 ms, 60 to 104 ms, 108 to 144 ms and 148 to 192 ms).

5.3 Measurement abnormalities

During the (highly automated) measurements, a few small glitches occurred.

5.3.1 Small payload packets

Close analysis of the captured data showed unintended and unpredicted behavior of the sending nodes: some packets did not carry a full payload. Netcat should have filled all packets up to the MTU with input taken from /dev/zero, but apparently something interfered. Due to the late discovery of this issue and the dismantling of the network setup, this matter could not be investigated further.

These smaller packets did, however, allow for a look at the stability of the network phase effect. Since these packets have a shorter transmission time over the bottleneck, they may disrupt the phase effect, which heavily depends on a constant packet transmission time. Detailed observation, however, did not show an interruption of the dominance (where present) of one flow over the other.

On rare occasions, these smaller packets occurred in a strictly periodic pattern, being transmitted every 6.5 or 16.5 seconds. No logical explanation for this strange behavior has been found.

5.3.2 Terminated measurements

A few measurements were cut short due to an inexplicable error. Unfortunately, these flaws were not discovered until after the network setup had been dismantled, so there was no opportunity to redo these measurements. Missing data is left out of the associated graphs, and these occurrences are indicated in footnotes at those graphs.


5.4 Phase effect for different TCP flavors

The results in the previous sections were obtained with two TCP BIC flows. To investigate the effect of the different congestion control mechanisms employed by various TCP flavors, the above tests were repeated with both flows set to TCP Reno (a predecessor of BIC) and to TCP Scalable (intended for high-bandwidth networks). The two figures below show their respective results.

[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for the flow from 2 to 6 in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Phase effect in TCP Reno


[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for the flow from 2 to 6 in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Phase effect in TCP Scalable [3]

These TCP mechanisms too show the phase effect occurring, but some differences can be observed.

5.4.1 Equal tripping points

All mechanisms have the same tripping points (i.e. after 56, 104, 144 and 192 ms). As these points are entirely determined by the transmission time of a (fully filled) packet, different TCP flavors should not influence them, which is confirmed by the measurements.

5.4.2 Bandwidth sharing ratio

Reno and BIC share the bandwidth (both during and outside of phase effect periods) in the same ratio, which becomes more equal at higher delay settings. Scalable, on the other hand, always divides the bandwidth 85 to 15 percent. This is caused by Scalable's low penalty for loss (the congestion window is reduced to 7/8 instead of 1/2!) and its quicker recovery from loss. Those conditions make packet loss a minor issue for the dominant flow, whereas the dominated flow has almost no opportunity to steal bandwidth from its competitor.
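To illustrate why this matters, below is a minimal sketch contrasting the two window-update rules in their standard textbook form (Reno halves the window on loss and adds one packet per RTT; Scalable cuts the window to 7/8 and adds a fixed 0.01 packet per ACK). The starting window of 10 packets is illustrative, not taken from the measurements:

    # Sketch: congestion window reaction to a single loss, Reno vs Scalable.
    def reno_on_loss(cwnd):      return cwnd / 2        # multiplicative decrease by 1/2
    def scalable_on_loss(cwnd):  return cwnd * 7 / 8    # multiplicative decrease by 1/8

    def reno_per_rtt(cwnd):      return cwnd + 1        # additive increase: +1 packet per RTT
    def scalable_per_ack(cwnd):  return cwnd + 0.01     # +0.01 packet per received ACK

    cwnd = 10.0
    print(reno_on_loss(cwnd), scalable_on_loss(cwnd))   # 5.0 vs 8.75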

The figure below shows goodput figures for the flow from source 1 to destination 5, for all three investigated flavors. This plot clearly illustrates both of the above observations.

[3] Measurement for 28 ms failed; results omitted.


[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for the flow from 2 to 6 in ms; curves: BIC, Reno and Scalable.]

Figure - Goodput comparison of BIC, Reno and Scalable (only one flow plotted)

5.4.3 Low delay and Scalable

Before the first phase effect period (i.e. for small delay settings, up to 54 ms), Scalable has no constant dominant flow. This seems to be a matter of luck: a source that manages to get an incidental burst of packets transmitted can maintain that burst easily.

5.5 Comparison of different flavors

Since slightly different behavior of the three TCP flavors was observed, it would be interesting to see how these flavors compete against each other. Since the main focus of these tests would now be on throughput and not on the ability to ‘break in’ on the other flow, both flows were started at the same time (t = 0). See the figure below.

[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for both flows in ms; six curves: the goodput of each flavor in the pairings BIC v. Reno, BIC v. Scalable and Reno v. Scalable.]

Figure - Different TCP flavors competing against each other, when started at the same time

As with the previous tests with asymmetrical starting times, Scalable proves to be the most aggressive: all other flavors lose out to it. Between BIC and Reno there is no clear winner.

5.6 Equal delay settings

For comparison, a series of ‘equal opportunity’ tests was done. In these tests, both flows were given the same delay settings on each link, the same TCP flavor was used at all nodes, and both flows started at the same time.

These tests would show what influence the delay has on the distribution of goodput without the phase effect, giving insight into what part of the phase effect is caused by synchronizing transmission time and RTT.

The three figures below show the results of these tests with two flows of BIC, Reno and Scalable respectively.


[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for both flows in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Goodput for two TCP BIC streams, started simultaneously and with equal delay settings

[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for both flows in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Goodput for two TCP Reno streams, started simultaneously and with equal delay settings


[Plot: goodput on the SLIP link in B/s versus the total roundtrip delay for both flows in ms; curves: goodput from 1 to 5 and goodput from 2 to 6.]

Figure - Goodput for two TCP Scalable streams, started simultaneously and with equal delay settings

These results clearly illustrate that the phase effect is not invoked by the TCP flavor used, but only by the unequal delay settings: all tests using equal delay settings yield a random winner, if any, independent of the amount of delay set. Reno and BIC share the available bandwidth almost entirely fairly, especially once the delay setting is greater than one RTT, minus the (suspected) 256-byte buffer of the UART interface.

Scalable shows the same behavior as during the phase effect experiments: one flow almost always outperforms the other, probably due to the ‘forgiving’ nature of its congestion avoidance window scaling.


6 Conclusions

6.1 Phase effect possible in lab environment

The results of the tests on the phase effect confirm the predictions based on the Ns2 simulations that motivated this project.

In addition to the Ns2 predictions, the phase effect was shown to hold for (total) delay times up to several times the bottleneck link transmission time. TCP Scalable maintained a clear ‘winner/loser’ bandwidth sharing ratio, whereas for Reno and BIC the results of the phase effect shrink for delays of more than one transmission time.

The experiments in which both flows were given the same extra delay show no phase effect occurring. From these experiments, we can conclude that the phase effect only works when some flows do, and some do not, tune their transmissions to be in phase with the transmission periodicity at a bottleneck link.

TCP Scalable shows similar behavior (i.e. one flow obtaining much more goodput over the bottleneck link) even without fulfilling phase effect conditions, but this is a result of its congestion window scaling mechanism and yields a random dominant flow, rather than a predetermined winner as a result of phase effect timing of the RTT.

6.2 Different TCP flavors against each other

When different TCP implementations were put up against each other, competing for the bottleneck link capacity under equal circumstances (delay settings and starting time), Scalable proved to be the most aggressive, gaining the most goodput over Reno and BIC. This indicates that Scalable is not suitable for a mixed-traffic environment, because its window scaling mechanism violates TCP-friendliness rules.

6.3 Predictions on (semi-)public networks

No experiments were conducted on semi-public or public networks. However, the unintentional errors during the experiments, which caused some IP packets not to be filled up to the MTU, gave valuable insight into how the phase effect would hold up in uncontrolled traffic environments. As the phase effect was shown to hold even when slightly disturbed by smaller packets, we expect the phase effect to also hold on public networks (where non-MTU traffic is far more likely to occur), provided that MTU-sized traffic makes up the bulk of the competing traffic. Thus, the results of the lab experiments suggest that in an environment where traffic is mostly full-size TCP over IP, the phase effect can occur and could be invoked by adding delay at a source node.

Future work on this subject might focus on the possibility of a phase effect in less controlled environments, with more than one competing flow.


Appendix A - Defining goodput

In computer network analysis, goodput is the figure that represents the amount of useful bits (or bytes) carried from source to destination per unit of time. This figure excludes overhead such as transport protocol headers and retransmissions of lost or corrupt packets, which are needed for communication to be possible (and are included in throughput figures) but carry no useful data as far as the application or user is concerned.

In this project, goodput figures are calculated based on the packet flow on the bottleneck link, as present in the test network. Due to the setup of this network, packets that make it to the bottleneck will always reach their destination, and so will their acknowledgments. Therefore, it is fair to determine the goodput at that point.

To calculate goodput in a simple, automated way, there are a couple of options, which give slightly different results:

• Difference between first and last sequence number. Packet loss cannot occur beyond the measurement point, but can occur before it. Therefore, the first and last packet may not necessarily carry the first and last bytes of data transmitted within that period of time. Since we only observe part of a whole communication between end hosts, out-of-sequence packets at the start or end of the measurement period are omitted in the calculation of goodput, even though they may be useful beyond that period.
• Packet count. This method assumes all packets carry useful data, which is true when assuming that out-of-sequence packets are still useful beyond our observed period. The network topology ensures no duplicate data will ever be transmitted, as all observed packets make it to their destination (note that packets are observed at the egress of the 10-packet queue, i.e. when they are put on the bottleneck link). Since the sources had an indefinite amount of data to transmit, all packets should be filled to carry the maximum payload. Goodput is therefore calculated by multiplying the packet count by the MTU size, minus header sizes.
• Payload count. Close observation of the logged traffic showed that some packets were not filled to the maximum payload allowed by the MTU. No logical explanation could be found for this behavior, but it does affect the goodput figure when calculated according to the packet count method. Calculating goodput as the sum of all payloads is a more correct method, although the differences with the previous method are small, as the payload of a single packet is much less than the total amount of data transmitted, and this behavior occurs only seldom.

The last method was used for all goodput figures in this report, since it includes all data, even out-of- sequence, but does not count non-existing payload.
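For concreteness, a small sketch of the three options on a toy packet list (the record format and the MAX_PAYLOAD value are ours; ‘seq’ is the TCP sequence number of the first payload byte). Dividing any of these byte totals by the length of the measurement interval gives the goodput:

    # Sketch: three ways to count useful bytes from packets observed at the bottleneck egress.
    MAX_PAYLOAD = 960                                    # hypothetical useful bytes per full packet

    packets = [(0, 960), (960, 960), (1920, 500), (2420, 960)]   # toy (seq, payload) records

    def by_sequence_numbers(pkts):
        first, last = pkts[0], pkts[-1]
        return (last[0] + last[1]) - first[0]            # bytes spanned by sequence numbers

    def by_packet_count(pkts):
        return len(pkts) * MAX_PAYLOAD                   # assumes every packet is full

    def by_payload_sum(pkts):
        return sum(p for _, p in pkts)                   # method used in this report

    print(by_sequence_numbers(packets), by_packet_count(packets), by_payload_sum(packets))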


Appendix B – NS2 gives faulty/unexpected behavior in a specific case

Bert Baesjou, University of Twente, 24-12-2004

Why

While running tests with the tcp-hybla module, some unexpected results were generated. After some testing it did not seem to be the tcp-hybla module but a part of the TCP suite (at least TCP Hybla, TCP Reno and TCP New Reno suffer from it). The tests were performed on two separate installations of ns-2.27: one with changes related to the inserted hybla module, and another without any alterations (except for some code printing the congestion window size along with a timestamp to a file, and some standard bug fixes ns2 needs when it comes fresh out of the box).

Network setup

Network properties

• There is a flow from node 0 to 4 and a flow from node 1 to 5
• No changes are made to the link capacities
• A variable delay is set for the flow between 1 and 5 (this is made the 'slow' link)
• No loss in the network, only at the routers when the queue fills up
• Queue size at the router is 10
• Both flows are New Reno
• TTL = 64

Description of the problem

When the delay between node 1 and the first router is set between 20 and 25 ms, and the delay between the second router and node 5 is set to the same value (both 20, both 21, etc.), the link between 1 and 5 completely starves. For values under 20 or above 25, both links show the more common and expected TCP New Reno behavior.


Test results

Both flows start at 0

Up until 19 ms between node 1 and the first router, and between the second router and node 5, we see the following type of expected behavior (red is the 0->4 link, green is the 1->5 link [4]):

Between 20 and 25 we see the following unexpected behavior:

[4] I believe this is the wrong way round, since it contradicts the figures in the table below.


After 25 we see the following behavior:

The following table shows the goodput of both connections for the different delay settings (in kb/s); all figures are rounded.

Delay (ms)   5  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
0->4        50  50  49  49  49  49   0   1   1   1   1   1  63  63  63  63  63
1->5        50  50  50  50  50  50  99  99  99  99  99  99  37  37  37  37  37

Starting both flows separately

Starting both flows with a gap of 20 seconds gives the following result (0->4 starts first, 1->5 follows 20 seconds later):


As one can see, the flow 1->5 starves out almost immediately.

Conclusion

NS2 shows unexpected and unwanted behavior. Strangely, it is the 'fast' link that starves (instead of the one with the higher delays). What the exact cause of this behavior is, is not clear. The only workaround found is increasing the queue size at the router, but this is not always desirable. A real solution would only be possible with more insight into the exact code running the simulator, which means that every move of the simulator would have to be traced. This unwanted behavior does not strengthen the trust in the accuracy of the simulations.


Literature

Floyd, S., & Jacobson, V. (1992). Traffic Phase Effects in Packet-Switched Gateways.

Scalable TCP, http://www.deneholme.net/tom/scalable/ (retrieved June 2008)
