Forward Acknowledgment: Refining TCP Congestion Control

Matthew Mathis and Jamshid Mahdavi

Pittsburgh Supercomputing Center

Abstract

We have developed a Forward Acknowledgment (FACK) congestion control algorithm which addresses many of the performance problems recently observed in the Internet. The FACK algorithm is based on first principles of congestion control and is designed to be used with the proposed TCP SACK option. By decoupling congestion control from other algorithms, such as data recovery, it attains more precise control over the data flow in the network. We introduce two additional algorithms to improve the behavior in specific situations. Through simulations we compare FACK to both Reno and Reno with SACK. Finally, we consider the potential performance and impact of FACK in the Internet.

1 Introduction

The evolution of the Internet has pushed TCP to new limits over a wide variety of IP infrastructures. Anecdotal evidence suggests that TCP experiences lower than expected performance in a number of situations in the Internet [tcp95]. The common perception is that these weaknesses are a consequence of the failure to deploy a standard Selective Acknowledgment (SACK) [JB88] in any of today's TCP implementations. However, SACK is generally viewed as a method to address data recovery; it has not been widely investigated to address congestion control issues.

Floyd pointed out that multiple segment losses can cause Reno TCP to lose its Self-clock, resulting in a retransmission timeout [Flo95, Flo92]. These timeouts can cause a substantial performance degradation. During the timeout interval, no data is sent. In addition, the timeout is followed by a period of Slow-start. This sequence of events underutilizes the network over several round-trip times, which results in a significant performance reduction on long-delay links. At the heart of this problem is the inability of Reno TCP to accurately control congestion while recovering from dropped segments.

We have developed a new algorithm to improve TCP congestion control during recovery. This algorithm, called Forward Acknowledgment or FACK, works in conjunction with the proposed TCP SACK option [MMFR96]. The existence of the SACK option alone greatly improves the robustness of TCP following congestion. SACK will help TCP to survive multiple segment losses within a single window without incurring a retransmission timeout. SACK can also glean additional information about congestion state, leading to improved TCP behavior during recovery. The FACK algorithm uses this information to add more precise control to the injection of data into the network during recovery. Because FACK decouples the congestion control algorithms (which determine when and how much data to send) from the data recovery algorithms (which determine what data to send), we believe that it is the simplest and most direct way to use SACK to improve congestion control.[1]

Other researchers are currently studying congestion control issues in TCP. The research community is very interested in the potential of TCP Vegas [BOP94, DLY95]. Through the use of delay measurements, TCP Vegas attempts to eliminate the periodic self-induced segment losses caused in Reno TCP. The Vegas Congestion Avoidance Mechanism (CAM) algorithm modifies the "linear increase" phase of congestion avoidance. In another recent study, Hoe investigates congestion control issues during Slow-start [Hoe95, Hoe96]. Because our work is focused primarily on improving congestion control during recovery (the "exponential decrease" phase of congestion avoidance), it is compatible with these efforts. It is our expectation that each of these efforts can eventually be incorporated into TCP in order to incrementally improve performance.

In section 2 of this paper, we describe the principles of congestion control on which FACK is built. Section 3 presents a detailed description of the FACK algorithm. Section 4 examines the basic behavior of the FACK algorithm and several optional algorithms. Sections 5 and 6 explore the performance of the various algorithms presented in the paper. In section 7 we discuss future research directions for this work. Finally, we summarize our findings.

* This work is supported in part by National Science Foundation Grant No. NCR-9415552.
[1] This idea has been proposed before [CLZ87], but it has not been implemented for TCP.

2 Congestion Control

2.1 Ideal Principles

In 1988, Van Jacobson published the paper that has become the standard for TCP congestion control algorithms [Jac88, Bra89]. We do not modify any of the algorithms described in that paper. Rather, FACK extends these congestion control algorithms to TCP's recovery interval. The key concepts of "conservation of packets," "Self-clock," "Congestion Avoidance" and "Slow-start" are reviewed below.

"Conservation of packets" requires that a new segment not be injected into the network until an old segment has left. This principle leads to an inherent stability by ensuring that the number of segments in the network remains constant. Other schemes, especially rate-based transmission, can cause the number of segments in the network to grow without bound during periods of congestion, because during congestion the transmission time for segments increases. TCP implements conservation of packets by relying on "Self-clocking": segment transmissions are generally triggered by returning acknowledgements. TCP's Self-clock contributes substantially to protecting the network from congestion.

"Congestion Avoidance" is the equilibrium state algorithm for TCP. TCP maintains a congestion window, cwnd, which represents the maximum amount of outstanding data on the connection. When the TCP sender detects congestion in the network, identified by the loss of one or more segments, the congestion window is halved. Under other conditions, the congestion window is increased linearly by one maximum segment size (MSS) per round trip on the network. The stability of this linear increase and multiplicative decrease algorithm has been demonstrated in many investigations since its publication in 1988 [ZSC91, FJ91, Mog92, FJ92, FJ93].

"Slow-start" is the algorithm which TCP uses to reach the equilibrium state when cwnd is less than a threshold, ssthresh. Ssthresh attempts to dynamically estimate the correct window size for the connection. At connection establishment and after retransmission timeouts, TCP sets cwnd to 1 MSS and increases cwnd by 1 MSS for each received ACK.[2] This exponential increase continues until cwnd reaches the Slow-start threshold, ssthresh. Once ssthresh is reached, TCP passes into the Congestion Avoidance regime. Ssthresh is set to half of the current value of cwnd when the sender detects congestion or undergoes a retransmission timeout.

2.2 Reno TCP Behavior

Reno TCP is currently the de facto standard implementation of TCP [Ste94]. Reno implements Slow-start and Congestion Avoidance in the manner described above. It includes the Fast Retransmit algorithm from Tahoe TCP and adds one new algorithm: Fast Recovery.

Both Fast Retransmit and Fast Recovery [Ste96] rely on counting "duplicate ACKs": TCP acknowledgments sent by the data receiver in response to each additional received segment following some missing data.

Fast Retransmit and Fast Recovery [Jac90, Ste94] are algorithms intended to preserve Self-clock during recovery from a lost segment. Fast Retransmit uses duplicate ACKs to detect the loss of a segment. When three duplicate ACKs are detected, TCP assumes that a segment has been lost and retransmits it. The number three was chosen to minimize the likelihood of out-of-order segments triggering spurious retransmissions.

The Fast Recovery algorithm attempts to estimate how much data remains outstanding in the network by counting duplicate ACKs. It artificially inflates cwnd on each duplicate ACK that is received, causing new data to be transmitted as cwnd becomes large enough. Fast Recovery allows one (halved) window of new data to be transmitted following a Fast Retransmit.

Under single segment losses, Fast Retransmit and Fast Recovery preserve TCP's Self-clock and enable it to keep the network full while recovering from one lost segment. If there are multiple lost segments, Reno is unlikely to fully recover, resulting in a timeout and subsequent Slow-start [Flo95].

2.3 SACK TCP Behavior

The new TCP SACK option [MMFR96] is progressing through the IETF standards track. It is a slight modification to the original SACK option described in RFC1072 [JB88]. When the receiver holds non-contiguous data, it sends duplicate ACKs bearing SACK options to inform the sender which segments have been correctly received. Each block of contiguous data is expressed in the SACK option using the sequence number of the first octet of data in the block, and the sequence number of the octet just beyond the end of the block. In the new SACK option the first block is required to include the most recently received segment. Additional SACK blocks repeat previously sent SACK blocks, to increase robustness in the presence of lost ACKs.

To illustrate FACK, we compare its behavior to a SACK implementation using Reno congestion control. Since there is not yet a standard implementation of SACK, we make the following assumptions about a SACK implementation using Reno congestion control:

- Fast Retransmit and Fast Recovery are modified to not resend already SACKed segments (as one would expect).

- Fast Recovery continues to estimate the amount of outstanding data by counting returning ACKs. This assumption is made in order to retain the congestion properties of Reno TCP, and is the main distinction between a SACK implementation using Reno congestion control and a SACK implementation using FACK congestion control.

- The algorithm for detecting the end of recovery uses the presence of SACK blocks to prevent partial advances of snd.una[3] from causing TCP to leave the recovery state prematurely.[4]

In the remainder of this paper, "Reno+SACK" will refer to an implementation as outlined above. A SACK implementation which uses the FACK congestion control algorithm will be referred to simply as "FACK".

[2] Jacobson describes the algorithm as we do; however, he goes on to note that the time it takes to open to a given window is "R log_2 W, where R is the round-trip-time and W is the window size in packets." When the receiver's Delayed ACK sends one ACK per two segments, this estimate should actually be R log_1.5 W. It is generally agreed that, during Slow-start, it is correct to increase the window size by one MSS per ACK, even when the ACK acknowledges more than one MSS of data.
[3] The TCP sender state variable snd.una holds the sequence number of the first byte of unacknowledged data; snd.nxt holds the sequence number of the first byte of unsent data. These variables are defined in the TCP standard [Pos81].
[4] This fixes a problem in Reno which has been pointed out by Hoe [Hoe95] and Floyd [Flo95]. In some cases, Reno may incorrectly reinvoke Fast Retransmit and Fast Recovery. Floyd and Hoe have observed that strengthening Reno's test for the end of recovery improves its behavior in a number of situations [FF96, Hoe95].
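The Slow-start and Congestion Avoidance dynamics reviewed in section 2.1 can be sketched numerically. This is a toy model, not from the paper: windows are counted in whole segments and details such as Delayed ACK and recovery are ignored.

```python
MSS = 1  # measure windows in segments, for clarity

def slow_start_rtts(initial_cwnd, ssthresh):
    """Count round trips of Slow-start until cwnd reaches ssthresh.
    One MSS is added per ACK, so cwnd roughly doubles each RTT."""
    cwnd, rtts = initial_cwnd, 0
    while cwnd < ssthresh:
        cwnd += cwnd * MSS  # cwnd ACKs arrive this RTT, each adds one MSS
        rtts += 1
    return cwnd, rtts

def on_congestion(cwnd):
    """Multiplicative decrease: ssthresh and cwnd drop to half of cwnd."""
    half = max(cwnd // 2, 2 * MSS)
    return half, half  # new (cwnd, ssthresh)

print(slow_start_rtts(1, 16))  # (16, 4): doubling 1 -> 2 -> 4 -> 8 -> 16
print(on_congestion(16))       # (8, 8)
```

The doubling per round trip is why footnote 2's estimate of the time to open a window is logarithmic in the window size.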

2.4 FACK Design Goals

Under single segment losses, Reno implements the ideal congestion control principles set forth above. However, in the case of multiple losses, Reno fails to meet the ideal principles because it lacks a sufficiently accurate estimate of the data outstanding in the network, at precisely the time when it is needed most.[5]

The requisite network state information can be obtained with accurate knowledge about the forward-most data held by the receiver. By forward-most, we mean the correctly-received data with the highest sequence number. This is the origin of the name "forward acknowledgment." The goal of the FACK algorithm is to perform precise congestion control during recovery by keeping an accurate estimate of the amount of data outstanding in the network. In doing so, FACK attempts to preserve TCP's Self-clock and reduce the overall burstiness of TCP.

Note that all TCP implementations discussed in this paper have nearly identical behavior under single segment losses. This reduces the need for rigorous testing under "ordinary" conditions because all implementations have the same expected performance.

3 The FACK Algorithm

The FACK algorithm uses the additional information provided by the SACK option to keep an explicit measure of the total number of bytes of data outstanding in the network. In contrast, Reno and Reno+SACK both attempt to estimate this by assuming that each duplicate ACK received represents one segment which has left the network. The FACK algorithm is able to do this in a straightforward way by introducing two new state variables, snd.fack and retran_data. Also, the sender must retain information on data blocks held by the receiver, which is required in order to use SACK information to correctly retransmit data. In addition to what is needed to control data retransmission, information on retransmitted segments must be kept in order to accurately determine when they have left the network.

At the core of the FACK congestion control algorithm is a new TCP state variable in the data sender. This new variable, snd.fack, is updated to reflect the forward-most data held by the receiver. In non-recovery states, the snd.fack variable is updated from the acknowledgment number in the TCP header and is the same as snd.una. During recovery (while the receiver holds non-contiguous data) the sender continues to update snd.una from the acknowledgment number in the TCP header, but utilizes information contained in TCP SACK options to update snd.fack.[6] When a SACK block is received which acknowledges data with a higher sequence number than the current value of snd.fack, snd.fack is updated to reflect the highest sequence number known to have been received plus one.

Sender algorithms that address reliable transport continue to use the existing state variable snd.una. Sender algorithms that address congestion management are altered to use snd.fack, which provides a more accurate view of the state of the network.

We define awnd to be the data sender's estimate of the actual quantity of data outstanding in the network. Assuming that all unacknowledged segments have left the network:[7]

    awnd = snd.nxt - snd.fack    (1)

During recovery, data which is retransmitted must also be included in the computation of awnd. The sender computes a new variable, retran_data, reflecting the quantity of outstanding retransmitted data in the network. Each time a segment is retransmitted, retran_data is increased by the segment's size; when a retransmitted segment is determined to have left the network, retran_data is decreased by the segment's size. Therefore TCP's estimate of the amount of data outstanding in the network during recovery is given by:

    awnd = snd.nxt - snd.fack + retran_data    (2)

Using this measure of outstanding data, the FACK congestion control algorithm can regulate the amount of data outstanding in the network to be within one MSS of the current value of cwnd:[8]

    while (awnd < cwnd)
        sendsomething();

The FACK congestion control algorithm does not place special requirements on sendsomething(); the algorithm implied by the SACK Internet-Draft is sufficient. Generally sendsomething() should choose to send the oldest data first.[9]

FACK derives its robustness from the simplicity of updating its state variables: if sendsomething() retransmits old data, it will increase retran_data; if it sends new data, it advances snd.nxt. Correspondingly, ACKs which report new data at the receiver either decrease retran_data or advance snd.fack. Furthermore, if the sender receives an ACK which advances snd.fack beyond the value of snd.nxt at the time a segment was retransmitted (and that retransmitted segment is otherwise unaccounted for), the sender knows that the segment which was retransmitted has been lost.
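The bookkeeping above can be sketched compactly. This is an illustrative sketch only: variable and method names are hypothetical, sequence numbers are bytes, and scoreboard details needed for real retransmission are omitted.

```python
class FackState:
    """Toy FACK sender state: awnd = snd.nxt - snd.fack + retran_data."""
    def __init__(self):
        self.snd_una = 0      # first unacknowledged byte
        self.snd_nxt = 0      # first unsent byte
        self.snd_fack = 0     # one past the forward-most byte known received
        self.retran_data = 0  # retransmitted bytes still in the network

    def awnd(self):
        # Equations (1)/(2): estimated data outstanding in the network.
        return self.snd_nxt - self.snd_fack + self.retran_data

    def on_sack(self, ack, sack_blocks):
        """Advance snd.una from the ACK field, snd.fack from SACK blocks."""
        self.snd_una = max(self.snd_una, ack)
        self.snd_fack = max(self.snd_fack, ack)
        for start, end in sack_blocks:  # end = one past the block's last byte
            self.snd_fack = max(self.snd_fack, end)

s = FackState()
s.snd_nxt = 10000                                 # 10 kB sent so far
s.on_sack(ack=2000, sack_blocks=[(4000, 7000)])   # a hole from 2000 to 4000
print(s.awnd())                                   # 10000 - 7000 + 0 = 3000

# The regulation loop: while (awnd < cwnd) sendsomething();
cwnd = 5000
while s.awnd() < cwnd:
    s.snd_nxt += 1000  # "send" one 1 kB segment of new data
print(s.snd_nxt)       # 12000: two segments were injected
```

Note how equation (1) treats the unSACKed hole (bytes 2000 to 4000) as having left the network; footnote 7 states the conditions under which this holds.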

3.1 Triggering Recovery

Reno invokes Fast Recovery by counting duplicate acknowledgments:

    if (dupacks == 3) {
        ...
    }

This algorithm causes an unnecessary delay if several segments are lost prior to receiving three duplicate acknowledgments. In the FACK version, the cwnd adjustment and retransmission are also triggered when the receiver reports that the reassembly queue is longer than 3 segments:

    if ((snd.fack - snd.una) > (3 * MSS) ||
        (dupacks == 3)) {
        ...
    }

If exactly one segment is lost, the two algorithms trigger recovery on exactly the same duplicate acknowledgment.

[Figure 1: The test topology. s1 -- 10 Mb/s, 2 ms -- r1 -- 1.5 Mb/s, 5 ms -- r2 -- 10 Mb/s, 33 ms -- s2. The round trip time between s1 and s2 is 80 ms, plus another 7 ms of store and forward delay, yielding a total pipe size of 16.3 kBytes.]

[5] The observation that Reno inaccurately assesses the network state arose as a part of ongoing research aimed at developing tools for benchmarking the production Internet [ipp96]. Our efforts focused on a tool called "TReno" for Traceroute-Reno [Mat95, Mat96], which is an evolution of an earlier tool, "Windowed Ping" [Mat94]. TReno attempts to measure the available network headroom by emulating Reno TCP over a traceroute-like UDP stream. Although based on Reno congestion control, TReno was observed to exhibit significantly different behavior, largely due to its precise picture of the congestion state of the network. Our investigation of the differences between TReno and Reno behaviors led us to discover FACK's underlying principles.
[6] In principle, the FACK algorithm could also be implemented by utilizing the information provided by the receiver through other mechanisms, such as the TCP Timestamp option, to determine the rightmost segment received [Kar95]. This would allow the benefits of improved congestion control during recovery to be immediately realized in existing TCP implementations. However, because of the complementary nature of FACK and SACK, and the expected imminent deployment of SACK, in our research we are assuming that FACK is implemented in conjunction with SACK.
[7] This is true when the network is not reordering segments and there have been no retransmissions.
[8] In the case when cwnd has been halved immediately following a lost segment, awnd will be significantly larger than cwnd. This issue is addressed in section 4.5.
[9] If sendsomething() chooses to send new data, it is also constrained by the receiver's window (snd.wnd) and must make an additional check to ensure that the new data does not lie beyond the limit imposed by snd.wnd. If sendsomething() chooses to retransmit old data, it is not constrained by the receiver's window.
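The two trigger conditions can be compared directly. A sketch under illustrative assumptions (1 kB MSS, sequence numbers in bytes, dupack counting simplified):

```python
MSS = 1024

def reno_trigger(dupacks):
    # Reno: enter recovery on the third duplicate ACK.
    return dupacks == 3

def fack_trigger(snd_fack, snd_una, dupacks):
    # FACK: also fire when the receiver's reassembly queue exceeds
    # three segments, i.e. snd.fack has run well ahead of snd.una.
    return (snd_fack - snd_una) > 3 * MSS or dupacks == 3

# Single loss: after three dupacks the hole plus three SACKed segments
# put snd.fack four segments ahead, so both fire on the same ACK.
print(fack_trigger(snd_fack=4 * MSS, snd_una=0, dupacks=3))  # True
print(reno_trigger(3))                                       # True

# Five leading losses, only one dupack so far: FACK fires immediately,
# while Reno must wait for two more duplicate ACKs.
print(fack_trigger(snd_fack=6 * MSS, snd_una=0, dupacks=1))  # True
print(reno_trigger(1))                                       # False
```

The second case is exactly the "unnecessary delay" the text describes: with several segments lost, few duplicate ACKs return, but the SACK-reported gap already proves that data is missing.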

3.2 Ending Recovery

The recovery period ends when snd.una advances to or beyond the value of snd.nxt at the time the first loss was detected. During the recovery period, cwnd is held constant; when recovery ends, TCP returns to Congestion Avoidance and performs linear increase on cwnd. In the implementation tested in this paper, a timeout is forced if it is detected that a retransmitted segment has been lost (again). This condition is included to prevent FACK from being too aggressive in the presence of persistent congestion.
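The end-of-recovery test above amounts to saving a "recovery point" and holding cwnd until snd.una reaches it. A minimal sketch (names are illustrative, not from the paper):

```python
def start_recovery(snd_nxt):
    """Record snd.nxt at the moment the first loss is detected;
    recovery lasts until snd.una reaches this point."""
    return snd_nxt  # the recovery point

def in_recovery(snd_una, recovery_point):
    # cwnd is held constant while this returns True; on False,
    # the sender returns to Congestion Avoidance (linear increase).
    return snd_una < recovery_point

rp = start_recovery(snd_nxt=24000)
print(in_recovery(snd_una=16000, recovery_point=rp))  # True: still recovering
print(in_recovery(snd_una=24000, recovery_point=rp))  # False: recovery ends
```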

4 FACK Behavior

In this section we explore the behavior of the FACK algorithm in a simulator environment. We introduce another algorithm, Overdamping, which estimates the correct window more conservatively following losses as a result of Slow-start. Finally, we introduce a Rampdown algorithm to smooth data transmission during the recovery period.

4.1 Simulation Environment

We tested these new algorithms by implementing them under the LBNL simulator "ns" [MF], where we added the necessary new congestion control algorithms.[10] The simulator includes models of Tahoe, Reno, and Reno+SACK. We added a FACK sender to the simulator, but were able to use the existing SACK TCP receiver without modification. Our first set of tests uses a simple network containing four nodes (figure 1).

Two of these nodes represent routers connected by a 5 ms T1 link; one is a host in close proximity to these routers, and the other is a host 33 ms away. The delay-bandwidth product for this network is 16.3 kBytes, including store and forward delays. In all of our tests we utilize an MSS of 1 kB. Thus, properly provisioned routers in this network should have queues at least 17 packets long.

We varied queue lengths in order to examine both adequately provisioned and underprovisioned cases. In all of our investigations we utilize drop-tail routers.[11] The details of the FACK algorithm and implementation do not require any changes to operate in networks with more intelligent queuing disciplines. However, the relative benefit of the FACK algorithm in these networks will be slightly lower because episodes of congestion in such networks are expected to be less extreme.

In this paper, most of our examples plot segment numbers vs. time in seconds.[12] Each segment is shown twice, once when it enters the bottleneck queue and once when it leaves. Dropped segments are indicated by an "×". Retransmissions always stand out because both the enqueue and dequeue events are visibly out of order. In some cases, plots of window size and router queue occupancy are shown as well.

4.2 Behavior During Slow-start

During Slow-start, TCP opens its window exponentially, forcing the network into congestion and often dropping many segments. Figure 2 shows the behavior of Reno during Slow-start. Reno is unable to handle the multiple segment losses; it times out and then proceeds with a Slow-start after the timeout interval.

[Figure 2: Reno behavior during Slow-start. The network is provisioned with queues of length 17 packets. 30 segments are unnecessarily retransmitted.]

Figure 3 shows the behavior of Reno+SACK under the same circumstances. Reno+SACK does not incur the timeout. However, due to the large number of lost segments, Reno+SACK underestimates the window during recovery, and requires several round trip times to complete recovery.

[Figure 3: Reno+SACK behavior during Slow-start. The network is provisioned with queues of length 17 packets. No data is unnecessarily retransmitted.]

Figure 4 shows the behavior of FACK in this situation. FACK divides its window size by two, waits half of an RTT for data to exit the network, and then proceeds to retransmit lost segments.

[Figure 4: FACK behavior during Slow-start. The network is provisioned with queues of length 17 packets. No data is unnecessarily retransmitted.]

In these examples, both Reno+SACK and FACK make no unnecessary retransmissions. Reno, on the other hand, unnecessarily retransmits 30 segments.

4.3 FACK vs. Reno+SACK

Figure 5 compares the detailed behaviors of FACK and Reno+SACK in a slightly different case. Here, the variable ssthresh is preset to 35 and the bottleneck queue has only 10 packet buffers. In this case, the behaviors of FACK and Reno+SACK are very similar. The primary difference is visible in the queue length at the bottleneck link. At the end of recovery (about .8 sec), Reno+SACK makes a burst transmission which causes a spike in the queue length.[13] Since the window size after the end of recovery is identical for both algorithms, FACK and Reno+SACK will have roughly the same overall performance for environments where TCP never loses more than half a window of data.

[Figure 5: SACK and FACK loss recovery details. The network is provisioned with queues of length 10 packets, and ssthresh is preset to 35 segments. No data is unnecessarily retransmitted.]

If more than half a window of data is lost, the window estimate of Reno+SACK will not be sufficiently accurate. Figure 6 shows such a case. Here, in addition to the segments lost during Slow-start, four additional segments were dropped in transit on the bottleneck link. In this case TCP runs out of ACKs before invoking Fast Recovery. In the worst case, this would result in a retransmit timeout followed by a Slow-start. One of the requirements of a SACK implementation is that if the TCP sender takes a retransmit timeout, it must clear all information about SACK blocks held by the receiver. Thus, the sender would timeout and then Slow-start with the possibility of retransmitting data which has already been received. The SACK implementation in the simulator includes an additional test specifically for the case where more than half a window of data is lost, and proceeds directly into Slow-start. This avoids the retransmit timeout, but still incurs the penalties of Slow-start and duplicated data. The final result, in this case, is that 6 round trip times are lost to the Slow-start, and 25 segments are unnecessarily retransmitted. Note that it would be possible to further optimize Reno+SACK for this case by keeping the information stored in the SACK blocks. The resulting TCP would only take the penalty of the Slow-start for this case.

[Figure 6: SACK recovery detail under greater than 1/2 window of loss. The network is provisioned with queues of length 17 packets, and four non-congestion related losses have been injected. 25 segments are unnecessarily retransmitted.]

[10] An implementation of FACK will be available in a future release of ns.
[11] Note that many historical papers investigating TCP dynamics use underbuffered networks in their simulations. We believe that any protocol development work must adequately address both properly provisioned and underbuffered networks, and protocols must be shown to be stable (if not optimal) in both environments.
[12] See http://www.psc.edu/networking/papers/ for enlarged figures.
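The "running out of ACKs" failure mode described in section 4.3 can be seen with a toy calculation. This is our own simplification, not the paper's model: it only captures that dupack-clocked recovery stalls when fewer than half a window of duplicate ACKs return, while a snd.fack-based estimate does not depend on the dupack count.

```python
def reno_sack_can_recover(window, losses):
    """Reno-style Fast Recovery is clocked by duplicate ACKs: each
    dupack releases roughly one segment once cwnd (halved) is reached.
    With `losses` drops in a window, only window - losses dupacks
    return; if that is not more than half the window, the sender
    stalls and must fall back to a timeout or Slow-start."""
    dupacks = window - losses
    return dupacks > window // 2

def fack_can_recover(window, losses):
    # FACK computes awnd from snd.fack directly, so recovery is not
    # limited by the number of returning dupacks (short of losing
    # every ACK-bearing segment).
    return losses < window

print(reno_sack_can_recover(window=20, losses=4))   # True: under half lost
print(reno_sack_can_recover(window=20, losses=12))  # False: ACK starvation
print(fack_can_recover(window=20, losses=12))       # True
```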

4.4 Slow-start Overshoots and the Overdamping Algorithm

In both the Reno and FACK examples, the congestion window is almost immediately cut in half a second time. The reason for this behavior is that when dividing cwnd by two, TCP should utilize the value of cwnd when the first lost segment was sent. At this point, the session fills the available buffer space exactly, whereas when the loss is detected one RTT later, cwnd has doubled.[14] We can improve this behavior by implementing the following additional window adjustment:

    if (cwnd <= ssthresh + .5*mss)
        cwnd /= 2;

If TCP has recently[15] been in Slow-start, it reduces cwnd by an extra factor of two prior to reducing the window and setting ssthresh. This takes into account the fact that, at the time the segment was sent, cwnd was smaller than it was at the time the loss was detected, and therefore is more conservative about setting cwnd and ssthresh. With this additional algorithm in place, the results of our test simulation are shown in figure 7. Note that the first segment loss following Slow-start does not occur until time 3.4 sec, compared with figure 4 where it occurs at time 1.7 sec.
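The Overdamping adjustment above can be sketched as follows. The "recently in Slow-start" test follows the cwnd <= ssthresh + .5*mss condition from the text; the packaging into a single function and the sample numbers are illustrative.

```python
def overdamped_halving(cwnd, ssthresh, mss=1024):
    """Overdamping sketch: if the loss is detected just after (or during)
    Slow-start, cwnd roughly doubled in the RTT since the lost segment
    was sent, so halve an extra time before the ordinary multiplicative
    decrease."""
    if cwnd <= ssthresh + 0.5 * mss:  # "recently" in Slow-start
        cwnd /= 2                     # extra factor of two
    cwnd /= 2                         # ordinary halving on congestion
    return cwnd, cwnd                 # new (cwnd, ssthresh)

# Loss detected right at the end of Slow-start: quarter the window.
print(overdamped_halving(cwnd=32 * 1024, ssthresh=32 * 1024))  # (8192.0, 8192.0)
# Loss deep in Congestion Avoidance: ordinary halving only.
print(overdamped_halving(cwnd=32 * 1024, ssthresh=8 * 1024))   # (16384.0, 16384.0)
```

The net effect, as in figure 7, is a more conservative post-Slow-start window, which postpones the next self-induced loss.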

The network is provisioned with queues of length 17 packets. No data 4.5 Data Smo othing

is unnecessarily retransmitted.

During a congestion ep o ch, when one or more segments are

lost, TCP p erforms an exp onential backo by cutting cw nd

Figure 7: Behavior of FACK with Overdamping.

13

The size of the burst will b e equal to the numb er of dropp ed

segments plus the numb er of dropp ed ACKs minus one.

14

In this section, wehave not utilized Delayed ACKs, whichwould

cause cw nd to increase by a factor of 1.5. The e ects of Overdamping

in this case are shown in section 5.

15

We de ne \recently" as \within one half of a round-trip" of b eing

in Slow-start. The choice of one half is somewhat sub jective, but

preserves continuity at the b oundary conditions.

round trip of recovery, w intr im is reduced to zero. While

w intr im is non-zero, it acts to smo oth the data evenly over

one round trip, so that exactly cw nd bytes of data are out-

standing at the end of this round trip. The variable w inmul t

is the scale factor controlling how quickly w intr im is pulled

to zero. Normally w inmul t is set to 0.5; if Overdamping is

invoked, w inmul t is set to 0.25 instead.

In gure 8 we set the queue length in the routers to 6

packets, causing the network to b e underutilized following

Slow-start. In eachRTT following Slow-start, FACK with

Overdamping (top of gure 8) clusters its transmissions to-

gether. On the other hand, FACK with Overdamping and

Ramp down (b ottom of gure 8) evenly distributes the data

across a full round trip time, minimizing the e ects of bursts

on the network.

5 Comparison of Algorithm Performance During Slow-

start

In order to compare the p erformance of the various algo-

rithms presented in section 4, we ran simulations of six algo-

rithms over an exhaustive range of queue-lengths in the b ot-

tleneck router. The six algorithms are Reno, Reno+SACK,

FACK, FACK with Overdamping, FACK with Ramp down,

and FACK with b oth Overdamping and Ramp down. In or-

der to compare the p erformance of the various algorithms in

a meaningful way,we computed the \lost opp ortunity" for

each run | the amount of additional data which could have

b een sent if the connection had run entirely in Congestion

Avoidance. Events which cause idle time on the link during

When congestion is detected, cwnd is cut in half. In current TCP implementations, the sender stops transmitting data until enough data has left the network to reduce awnd below the new value of cwnd. The sender then resumes transmission of data. This typically results in a full window of data being transmitted in one half of a round trip time, resulting in uneven transmission of data for this and subsequent round trips. Solutions to this problem have been suggested [Hoe95, Jac95], but have not yet been deployed.[16]

The recommended solution for this problem is to smooth the transmission of data over one RTT by slowly reducing cwnd, rather than instantly halving it. We implemented this solution as follows:

At the time congestion is detected:

    wintrim = (snd.nxt - snd.fack) * (1 - winmult)    (3)

Each time snd.fack advances by ∆fack:

    wintrim = wintrim - ∆fack * (1 - winmult)    (4)

Here, wintrim is added to cwnd during the "Ramp down" phase of congestion control. At the time recovery begins, cwnd + wintrim is slightly less than awnd. After one round trip, wintrim reaches zero, so the effective window has been smoothly reduced to the new value of cwnd.

Figure 8: FACK behavior with (bottom) and without (top) Ramp down. Overdamping is utilized in both cases. The network is provisioned with queues of length 6 packets. No data is unnecessarily retransmitted.

Events which reduce throughput during Slow-start, such as retransmit timeouts or deep reductions in cwnd, result in higher "lost opportunity".

The results of this comparison are shown in figure 9. The upper graph shows the "lost opportunity" for each algorithm with a receiver which acknowledges every segment (as used in all of the examples in Section 4). The lower graph uses a receiver with Delayed ACK.[17]

In both graphs, the effects of retransmit timeouts in Reno are clearly visible at all queue sizes. Without Delayed ACK, Reno loses between 300 kB and 500 kB of potential data transfer capability during Slow-start. With Delayed ACK, this value increases to between 650 kB and 900 kB. All of the options presented for SACK congestion control perform significantly better than Reno in the cases presented here.

Without Delayed ACK, the FACK algorithm alone shows poor performance for a subset of the queue sizes examined. In these cases, FACK is too aggressive following Slow-start, and takes additional losses, resulting in a retransmission timeout. Reno+SACK also shows lower performance across all queue sizes than the remaining three variations of FACK. This is the result of additional round trips caused by ACK starvation immediately following Slow-start (see figure 3). The two versions of FACK which include the Overdamping algorithm show poorer performance at low queue lengths. The best and most consistent performer is the FACK algorithm with Ramp down alone.

With Delayed ACK, the FACK and Reno+SACK cases no longer exhibit the behaviors mentioned above, because Slow-start does not push the network as far into congestion. The effects of Overdamping are even more pronounced, and even at the largest queue sizes we tested, Overdamping is too conservative compared with the other algorithms.
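The Ramp down adjustment in equations (3) and (4) can be sketched in a few lines. This is an illustrative model, not the authors' implementation; it assumes winmult = 0.5 (the usual multiplicative decrease factor) and byte-counted sequence variables named after the paper's snd.nxt and snd.fack.

```python
# Illustrative sketch of the Ramp down window trimming (equations 3 and 4).
# winmult = 0.5 and the variable names are assumptions following the paper.

def rampdown_start(snd_nxt, snd_fack, winmult=0.5):
    """Equation (3): compute wintrim when congestion is detected."""
    return (snd_nxt - snd_fack) * (1 - winmult)

def rampdown_advance(wintrim, delta_fack, winmult=0.5):
    """Equation (4): shrink wintrim as snd.fack advances by delta_fack."""
    return max(0.0, wintrim - delta_fack * (1 - winmult))

# Example: one window of 8 segments (1448 B each) outstanding at detection.
mss = 1448
wintrim = rampdown_start(snd_nxt=8 * mss, snd_fack=0, winmult=0.5)
assert wintrim == 4 * mss        # half the outstanding data

# As ACKs for the old window arrive, wintrim decays toward zero, so
# cwnd + wintrim ramps down smoothly over roughly one round trip.
for _ in range(8):
    wintrim = rampdown_advance(wintrim, delta_fack=mss, winmult=0.5)
assert wintrim == 0.0
```

Because snd.fack advances by a full window over one round trip, wintrim reaches zero exactly when the old window has been acknowledged, which is what makes the reduction gradual rather than instantaneous.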

[16] We are aware of one research group working with a TCP implementation which includes a solution to this problem similar to ours [Bal96].

[17] A Delayed ACK receiver sends ACKs less frequently, and at minimum, sends one ACK for every two MSS of data received. Delayed ACK is used by almost all TCP implementations in the Internet.

Figure 10: The jitter test topology. [Diagram: senders s1-s4 and routers r1 and r2; 10 Mb/s access links (2 ms, 33 ms, 2 ms, 3 ms) feeding a 1.5 Mb/s, 5 ms bottleneck; a bulk stream in the forward direction and a tiny stream in reverse.]
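As a minimal sketch (an assumption for illustration, not the paper's code), the Delayed ACK policy of footnote [17] — at least one ACK per two MSS of received data — behaves like the following; timer-driven ACKs for a lone trailing segment are omitted for brevity.

```python
# Sketch of a Delayed ACK receiver: acknowledge once per two full segments.
# Function name and structure are illustrative assumptions.

def delayed_acks(segments_bytes, mss=1448):
    """Return the cumulative byte counts at which ACKs are generated."""
    acks = []
    received = 0
    unacked_segments = 0
    for seg in segments_bytes:
        received += seg
        unacked_segments += 1
        if unacked_segments >= 2:    # ACK every second segment
            acks.append(received)
            unacked_segments = 0
    return acks

# Eight full-sized segments produce only four ACKs.
acks = delayed_acks([1448] * 8)
assert len(acks) == 4
assert acks == [2896, 5792, 8688, 11584]
```

Halving the ACK stream in this way also halves the rate at which an ACK-clocked sender grows its window, which is why the Delayed ACK graphs in figure 9 differ from the per-segment ones.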

Figure 9: Comparison of the behavior of various congestion algorithms during Slow-start. (Upper graph: the receiver is not using Delayed ACK. Lower graph: the receiver is using Delayed ACK.)

Figure 11: Comparison of FACK, Reno, and Reno+SACK. TCP forward path utilization as a function of the reverse path utilization. Note that 7% load on the reverse path causes nearly 45% idle capacity on the forward path. This example uses a 20 packet queue length, which is more than sufficient buffering for the network.

6 Performance Comparisons

We have investigated the behavior and performance of the various congestion control algorithms under several scenarios. One scenario, in which TCP is subjected to delay jitter and bursty losses, demonstrates some interesting differences between Reno, Reno+SACK, and FACK.

In the simulator, we have been able to investigate TCP's behavior in this situation with a single, very low bandwidth data stream in the reverse direction (figure 10). The reverse data stream is one connection with small, randomly distributed bursts of data at an average rate of two bursts per second. The bursts are of small constant size for each run, ranging from 1 to 6 kB. This traffic could be, for example, characteristic of a small NetNews stream or sporadic e-mail. In this environment, we ran each of the algorithms (Reno, Reno+SACK, and FACK) and compared their performance.

Figure 11 shows the forward path performance versus the reverse path load for each algorithm. Note that with only 7% load on the reverse path, Reno leaves almost 50% idle capacity on the forward path. This reflects the combined effects of ACK compression [ZSC91], drop-tail routers and the high penalty of retransmit timeouts. Note that this example uses a 20 packet queue length, which is more than sufficient buffering for this network.

Figure 12: Reno and FACK with jitter. In this trace we slightly reduced the buffering from figure 11, to accent interesting detail. All of the behaviors shown in this figure are present in one or more of the simulations used to generate figure 11.
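The reverse-path cross traffic described above can be sketched as a simple burst scheduler. The exponential spacing of bursts is an assumption (the paper says only "randomly distributed" at an average of two bursts per second), and the function name is illustrative.

```python
# Sketch of the reverse-path cross traffic in the jitter scenario: one
# connection sending small fixed-size bursts at random times averaging
# two bursts per second. Exponential inter-burst gaps are an assumption.

import random

def burst_schedule(duration_s, burst_bytes, rate_per_s=2.0, seed=1):
    """Return (time, bytes) pairs for Poisson-like burst arrivals."""
    rng = random.Random(seed)
    t, bursts = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)   # mean gap = 1/rate = 0.5 s
        if t >= duration_s:
            return bursts
        bursts.append((t, burst_bytes))

# A 60 s run with 2 kB bursts yields roughly two bursts per second.
sched = burst_schedule(duration_s=60.0, burst_bytes=2048)
avg_rate = len(sched) / 60.0
assert 1.0 < avg_rate < 3.0
```

Even at this tiny average rate, the bursts perturb ACK spacing on the reverse path and compete for bottleneck buffer space, which is what produces the clustered forward-path losses examined next.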

6.1 Reno vs. FACK

Figure 12 shows detailed behavior of Reno and FACK in a situation only slightly different than in figure 11. The tiny reverse traffic causes ACK compression and competes for router buffer space, which, in turn, causes clusters of packet loss in the bulk stream.

In response to these clusters of loss, Reno behavior appears chaotic, showing multiple window adjustments in a single congestion episode and timeouts due to loss of its Self-clock.

The bottom of figure 12 shows FACK (with Overdamping and Ramp down) in exactly the same situation. Even though many congestion epochs experience clusters of loss, FACK correctly performs exactly one multiplicative decrease of cwnd per congestion epoch, preserves the TCP Self-clock, and avoids all timeouts.[18] In this regime FACK appears to be a stable, well-behaved control system, consistent with the principles of ideal congestion control.

6.2 Impact to the Internet

In the Internet, anecdotal evidence suggests that episodes of multiple packet loss in one round trip are common. Paxson observes the following behavior in roughly 13% of the traces he collected at major Internet exchange points:

    ...a fast retransmit followed by a retransmit timeout, with the additional condition that the packet retransmitted after the retransmit timeout had not been previously retransmitted... [FF96]

It is most likely that this behavior is the result of minor congestion episodes which cause multiple packet loss in one round trip. Note that because only Reno TCP implementations exhibit this particular behavior, multiple packet loss within one round trip may be significantly more common than suggested by this data.

On our networks at PSC (a national supercomputing center with high bandwidth connectivity to the global Internet), the behavior shown in figure 12 appears regularly for bulk data transfers over moderately loaded wide area links. The deployment of any version of SACK should nearly double the throughput of bulk transfers using TCP for these cases.[19] In addition, we believe SACK TCP will be less biased against ATM than Reno TCP. For more typical Internet transfers, the benefits of SACK will likely be more moderate, but still result in overall improvements to goodput.

[18] Reno+SACK performs as well as FACK in this situation.

[19] Over a fixed path, Reno's performance can be improved by defeating TCP's cwnd calculation by setting the maximum window size to just slightly smaller than needed to fill the network.
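The "exactly one multiplicative decrease of cwnd per congestion epoch" property in section 6.1 amounts to gating window reductions on a recovery point. The sketch below is an illustrative model of that gating, not the authors' code: losses detected before the epoch's recovery point has been acknowledged do not reduce cwnd again.

```python
# Illustrative epoch gating: halve cwnd only once per congestion epoch.
# Class and field names are assumptions; cwnd is counted in segments.

class CongestionState:
    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.recovery_point = None   # snd.nxt when the current epoch began

    def on_loss(self, snd_nxt, snd_fack):
        """Multiplicative decrease, applied once per congestion epoch."""
        if self.recovery_point is not None and snd_fack < self.recovery_point:
            return                   # still recovering from the same epoch
        self.recovery_point = snd_nxt
        self.cwnd = max(1, self.cwnd // 2)

state = CongestionState(cwnd=16)
state.on_loss(snd_nxt=100, snd_fack=40)   # first loss of an epoch: 16 -> 8
state.on_loss(snd_nxt=110, snd_fack=60)   # same epoch: no further decrease
assert state.cwnd == 8
state.on_loss(snd_nxt=140, snd_fack=120)  # new epoch: 8 -> 4
assert state.cwnd == 4
```

Under clustered loss, this is the difference between the chaotic multiple window adjustments observed for Reno and the single, well-damped reduction observed for FACK.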

7 Future Work

We are currently working on an implementation of SACK TCP which will include FACK.[20] Once implemented, FACK should be evaluated in both a testbed environment and in the Internet, to verify the performance of the algorithms and to look for any adverse side effects. These investigations should also explore the data recovery aspects of SACK.

There are several unresolved issues surrounding the algorithms presented in this paper. We are investigating a single, simple algorithm to replace the Overdamping and Ramp down, as well as several methods for addressing persistent congestion (when halving is not a sufficient window reduction). We have been moderately successful at deriving closed-form mathematical models for FACK TCP performance in some topologies and believe that this technique deserves further exploration.

The new state variable snd.fack might also be used to strengthen the Round Trip Time Measurements (RTTM) and Protection Against Wrapped Sequence (PAWS) algorithms [JBB92] during recovery.

The FACK algorithm was first implemented in TReno, an Internet performance metric [Mat96]. Tools to measure Internet performance should track the evolution of TCP [Mat].

The production Internet still lacks adequate attention to issues of congestion and congestion detection. Many routers are incapable of providing full bandwidth-delay buffering and do not signal the onset of congestion through mechanisms such as Random Early Detection (RED) [FJ93]. Although the FACK algorithm is designed to help in times of congestion, it is not a substitute for these signals at the Internet layer. The transport and internet layers must work together to improve the behavior of the Internet under high load.

Other current research into TCP congestion is largely independent of FACK. The Congestion Avoidance Mechanism (CAM) of TCP Vegas [BOP94, DLY95] attempts to avoid unnecessary inflation of the congestion window through delay sensing techniques. Hoe has done extensive work in analyzing the effects of congestion during Slow-start [Hoe95, Hoe96], where there can be significant performance problems. The implementation of SACK and/or FACK may reduce the gravity of these problems, but will not eliminate them. Both of these efforts address different aspects of the TCP congestion control problem. Hoe also discusses a form of Ramp down, which was the inspiration for this part of our work. It should be possible to incorporate all of these concepts in a single TCP implementation, allowing for study of their combined benefits.

Finally, applications which do not use TCP are becoming more prevalent in the Internet, and many of these applications pay little or no attention to congestion control issues. The more predictable behavior and better understanding of TCP congestion control may be a step toward a standardized transport layer congestion behavior for use by all Internet applications.

8 Conclusion

In this paper, we have presented the FACK algorithm for congestion control, the Overdamping algorithm to offset Slow-start overshoot, and the Ramp down algorithm for transmission smoothing. In our investigations, we have discovered that both FACK and Reno+SACK provide major performance improvements over existing Reno implementations, due primarily to the avoidance of retransmission timeouts. Eventually, Reno users will perceive SACK implementations as having a significant advantage; this will provide incentive for the rapid widespread deployment of SACK in the Internet.

The FACK algorithm has several benefits over Reno+SACK. Since FACK more accurately controls the outstanding data in the network, it is less bursty than Reno+SACK, and can recover from episodes of heavy loss better than Reno+SACK. Because FACK uniformly adheres to basic principles of congestion control, it may be possible to produce formal mathematical models of its behavior and to support further advances in congestion control theory. Furthermore, based on our experience in implementing FACK in the simulator, it is more straightforward to code and less prone to subtle bugs than Reno+SACK.

For the additional algorithms presented, Overdamping and Ramp down, we obtained mixed success. The Overdamping algorithm is too conservative in the general case. The Ramp down algorithm, however, appears to work quite well. Based on the results in this paper, future work should explore variations on the Ramp down algorithm which incorporate the ideas included in the Overdamping algorithm.

Finally, we had difficulties developing realistic simulations of the Internet's observed clustered packet loss. Current simulation technologies do not accurately model the Internet with its vast complexity and huge populations of users, hosts, connections and packets.[21] This limitation makes it difficult to predict the operational impact of deploying new protocols in the Internet. Limited simulations and traffic playback approaches are not likely to reveal phenomena resembling turbulent coupling between protocols. We hope to investigate new simulation paradigms in the future.

9 Acknowledgements

We would like to thank Sally Floyd and Steve McCanne for making the LBNL simulator publicly available, without which we would have been unable to complete this work. We are especially grateful to the five anonymous reviewers for their insightful comments on our initial draft of this work, as well as to Sally Floyd and Craig Partridge for their invaluable assistance in moving it to final form. We would like to thank Susan Blackman and Karen Fabrizius for repeated readings and markups on our grammar and spelling. Finally, we would like to acknowledge our management at PSC for encouraging our research activities on TCP performance.

[20] This implementation will be made publicly available when completed.

[21] In our experiments, we did not take advantage of the capabilities of tcplib [DJ91], which models some of these complexities.

References

[Bal96] Hari Balakrishnan, March 1996. Presentation to the IETF TCP-LW working group.

[BOP94] Lawrence S. Brakmo, Sean W. O'Malley, and Larry L. Peterson. TCP Vegas: New Techniques for Congestion Detection and Avoidance. Proceedings of ACM SIGCOMM '88, August 1994.

[Bra89] R. Braden. Requirements for Internet Hosts - Communication Layers, October 1989. Request for Comments 1122.

[CLZ87] D. D. Clark, M. L. Lambert, and L. Zhang. NETBLT: a high throughput transport protocol. Computer Communications Review, 17(5):353-359, 1987.

[DJ91] Peter B. Danzig and Sugih Jamin. tcplib: A library of TCP/IP traffic characteristics. Technical Report TR-SYS-91-01, USC Networking and Distributed Systems Laboratory, October 1991. Obtain via: ftp://catarina.usc.edu/pub/jamin/tcplib.

[DLY95] Peter B. Danzig, Zhen Liu, and Limim Yan. An Evaluation of TCP Vegas by Live Emulation. ACM SIGMetrics '95, 1995.

[FF96] Kevin Fall and Sally Floyd. Comparisons of Tahoe, Reno and Sack TCP, May 1996. Submitted to CCR. Obtain via: ftp://ftp.ee.lbl.gov/papers/sacksv2.ps.Z.

[FJ91] Sally Floyd and Van Jacobson. Traffic Phase Effects in Packet-Switched Gateways. Computer Communications Review, 21(2), April 1991.

[FJ92] Sally Floyd and Van Jacobson. On Traffic Phase Effects in Packet-Switched Gateways. Internetworking: Research and Experience, 3(3):115-156, September 1992.

[FJ93] Sally Floyd and Van Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, August 1993.

[Flo92] Sally Floyd, February 1992. Private Communication.

[Flo95] Sally Floyd. TCP and Successive Fast Retransmits, February 1995. Obtain via: ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.

[Hoe95] Janey C. Hoe. Startup Dynamics of TCP's Congestion Control and Avoidance Schemes. Master's thesis, Massachusetts Institute of Technology, June 1995.

[Hoe96] Janey C. Hoe. Improving the Start-up Behavior of a Congestion Control Scheme for TCP. Proceedings of ACM SIGCOMM '96, August 1996.

[ipp96] Charter of the Benchmarking Working Group (BMWG) of the IETF, 1996. Obtain via: http://www.ietf.cnri.reston.va.us/html.charters/bmwg-charter.html.

[Jac88] Van Jacobson. Congestion Avoidance and Control. Proceedings of ACM SIGCOMM '88, August 1988.

[Jac90] Van Jacobson. Fast Retransmit. Message to the end2end-interest mailing list, April 1990.

[Jac95] Van Jacobson, July 1995. Private Communication.

[JB88] V. Jacobson and R. Braden. TCP extensions for long-delay paths, October 1988. Request for Comments 1072.

[JBB92] V. Jacobson, R. Braden, and D. Borman. TCP Extensions for High Performance, May 1992. Request for Comments 1323.

[Kar95] Phil Karn, December 1995. Private Communication.

[Mat] Matthew Mathis. Internet Performance and IP Provider Metrics information page. http://www.psc.edu/~mathis/ippm/.

[Mat94] Matthew B. Mathis. Windowed Ping: An IP Layer Performance Diagnostic. In Proceedings of INET'94/JENC5, volume 2, Prague, Czech Republic, June 1994.

[Mat95] Matthew Mathis. Source code for the TReno package, 1995. Obtain via: ftp://ftp.psc.edu/pub/nettools/treno.shar.

[Mat96] Matthew Mathis. Diagnosing Internet Congestion with a Transport Layer Performance Tool. In Proceedings of INET'96, Montreal, Quebec, June 1996.

[MF] S. McCanne and S. Floyd. ns - LBNL Network Simulator. Obtain via: http://www-nrg.ee.lbl.gov/ns/.

[MMFR96] Matthew Mathis, Jamshid Mahdavi, Sally Floyd, and Allyn Romanow. TCP Selective Acknowledgement Options, May 1996. Internet Draft ("work in progress") draft-ietf-tcplw-sack-02.txt, Expires: 29/7/96.

[Mog92] Jeff C. Mogul. Observing TCP Dynamics in Real Networks. Proceedings of ACM SIGCOMM '92, pages 305-317, October 1992.

[Pos81] J. Postel. Transmission Control Protocol, September 1981. Request for Comments 793.

[Ste94] W. Stevens. TCP/IP Illustrated, volume 1. Addison-Wesley, Reading MA, 1994.

[Ste96] W. Richard Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, March 1996. Currently an Internet Draft: draft-stevens-tcpca-spec-01.txt.

[tcp95] Minutes of the tcplw meeting at the 34th IETF, in Dallas TX, December 1995. Obtain via: http://www.ietf.cnri.reston.va.us/proceedings/95dec/tsv/tcplw.html.

[ZSC91] Lixia Zhang, Scott Shenker, and David D. Clark. Observations on the Dynamics of a Congestion Control Algorithm: The Effects of Two-Way Traffic. Proceedings of ACM SIGCOMM '91, pages 133-148, 1991.