Forward Acknowledgment: Refining TCP Congestion Control

Matthew Mathis and Jamshid Mahdavi
Pittsburgh Supercomputing Center

Abstract

We have developed a Forward Acknowledgment (FACK) congestion control algorithm which addresses many of the performance problems recently observed in the Internet. The FACK algorithm is based on first principles of congestion control and is designed to be used with the proposed TCP SACK option. By decoupling congestion control from other algorithms such as data recovery, it attains more precise control over the data flow in the network. We introduce two additional algorithms to improve the behavior in specific situations. Through simulations we compare FACK to both Reno and Reno with SACK. Finally, we consider the potential performance and impact of FACK in the Internet.

1 Introduction

The evolution of the Internet has pushed TCP to new limits over a wide variety of IP infrastructures. Anecdotal evidence suggests that TCP experiences lower than expected performance in a number of situations in the Internet [tcp95]. The common perception is that these weaknesses are a consequence of the failure to deploy a standard Selective Acknowledgment (SACK) [JB88] in any of today's TCP implementations. However, SACK is generally viewed as a method to address data recovery; it has not been widely investigated to address congestion control issues.

Floyd pointed out that multiple segment losses can cause Reno TCP to lose its Self-clock, resulting in a retransmission timeout [Flo95, Flo92]. These timeouts can cause a substantial performance degradation. During the timeout interval, no data is sent. In addition, the timeout is followed by a period of Slow-start. This sequence of events underutilizes the network over several round-trip times, which results in a significant performance reduction on long-delay links. At the heart of this problem is the inability of Reno TCP to accurately control congestion while recovering from dropped segments.

We have developed a new algorithm to improve TCP congestion control during recovery. This algorithm, called Forward Acknowledgment or FACK, works in conjunction with the proposed TCP SACK option [MMFR96]. The existence of the SACK option alone greatly improves the robustness of TCP following congestion. SACK will help TCP to survive multiple segment losses within a single window without incurring a retransmission timeout. SACK can also glean additional information about congestion state, leading to improved TCP behavior during recovery. The FACK algorithm uses this information to add more precise control to the injection of data into the network during recovery. Because FACK decouples the congestion control algorithms (which determine when and how much data to send) from the data recovery algorithms (which determine what data to send), we believe that it is the simplest and most direct way to use SACK to improve congestion control. [1]

Other researchers are currently studying congestion control issues in TCP. The research community is very interested in the potential of TCP Vegas [BOP94, DLY95]. Through the use of delay measurements, TCP Vegas attempts to eliminate the periodic self-induced segment losses caused in Reno TCP. The Vegas Congestion Avoidance Mechanism (CAM) algorithm modifies the "linear increase" phase of congestion avoidance. In another recent study, Hoe investigates congestion control issues during Slow-start [Hoe95, Hoe96]. Because our work is focused primarily on improving congestion control during recovery (the "exponential decrease" phase of congestion avoidance), it is compatible with these efforts. It is our expectation that each of these efforts can eventually be incorporated into TCP in order to incrementally improve performance.

In section 2 of this paper, we describe the principles of congestion control on which FACK is built. Section 3 presents a detailed description of the FACK algorithm. Section 4 examines the basic behavior of the FACK algorithm and several optional algorithms. Sections 5 and 6 explore the performance of the various algorithms presented in the paper. In section 7 we discuss future research directions for this work. Finally, we summarize our findings.

This work is supported in part by National Science Foundation Grant No. NCR-9415552.
2 Congestion Control

2.1 Ideal Principles

In 1988, Van Jacobson published the paper that has become the standard for TCP congestion control algorithms [Jac88, Bra89]. We do not modify any of the algorithms described in that paper. Rather, FACK extends these congestion control algorithms to TCP's recovery interval. The key concepts of "conservation of packets," "Self-clock," "Congestion Avoidance" and "Slow-start" are reviewed below.

"Conservation of packets" requires that a new segment not be injected into the network until an old segment has left. This principle leads to an inherent stability by ensuring that the number of segments in the network remains constant. Other schemes, especially rate-based transmission, can cause the number of segments in the network to grow without bound during periods of congestion, because during congestion the transmission time for segments increases. TCP implements conservation of packets by relying on "Self-clocking": segment transmissions are generally triggered by returning acknowledgements. TCP's Self-clock contributes substantially to protecting the network from congestion.

"Congestion Avoidance" is the equilibrium state algorithm for TCP. TCP maintains a congestion window, cwnd, which represents the maximum amount of outstanding data on the connection. When the TCP sender detects congestion in the network (identified by the loss of one or more segments) the congestion window is halved. Under other conditions, the congestion window is increased linearly by one maximum segment size (MSS) per round trip on the network. The stability of this linear increase and multiplicative decrease algorithm has been demonstrated in many investigations since its publication in 1988 [ZSC91, FJ91, Mog92, FJ92, FJ93].

"Slow-start" is the algorithm which TCP uses to reach the equilibrium state when cwnd is less than a threshold, ssthresh. Ssthresh attempts to dynamically estimate the correct window size for the connection. At connection establishment and after retransmission timeouts, TCP sets cwnd to 1 MSS and increases cwnd by 1 MSS for each received ACK. [2] This exponential increase continues until cwnd reaches the Slow-start threshold, ssthresh. Once ssthresh is reached, TCP passes into the Congestion Avoidance regime. Ssthresh is set to half of the current value of cwnd when the sender detects congestion or undergoes a retransmission timeout.

2.2 Reno TCP Behavior

Reno TCP is currently the de facto standard implementation of TCP [Ste94]. Reno implements Slow-start and Congestion Avoidance in the manner described above. It includes the Fast Retransmit algorithm from Tahoe TCP and adds one new algorithm: Fast Recovery.

Both Fast Retransmit and Fast Recovery [Ste96] rely on counting "duplicate ACKs": TCP acknowledgments sent by the data receiver in response to each additional received segment following some missing data.

Fast Retransmit and Fast Recovery [Jac90, Ste94] are algorithms intended to preserve Self-clock during recovery from a lost segment. Fast Retransmit uses duplicate ACKs to detect the loss of a segment. When three duplicate ACKs are detected, TCP assumes that a segment has been lost and retransmits it. The number three was chosen to minimize the likelihood of out-of-order segments triggering spurious retransmissions.

The Fast Recovery algorithm attempts to estimate how much data remains outstanding in the network by counting duplicate ACKs. It artificially inflates cwnd on each duplicate ACK that is received, causing new data to be transmitted as cwnd becomes large enough. Fast Recovery allows one (halved) window of new data to be transmitted following a Fast Retransmit.

Under single segment losses, Fast Retransmit and Fast Recovery preserve TCP's Self-clock and enable it to keep the network full while recovering from one lost segment. If there are multiple lost segments, Reno is unlikely to fully recover, resulting in a timeout and subsequent Slow-start [Flo95].

2.3 SACK TCP Behavior

The new TCP SACK option [MMFR96] is progressing through the IETF standards track. It is a slight modification to the original SACK option described in RFC1072 [JB88]. When the receiver holds non-contiguous data, it sends duplicate ACKs bearing SACK options to inform the sender which segments have been correctly received. Each block of contiguous data is expressed in the SACK option using the sequence number of the first octet of data in the block, and the sequence number of the octet just beyond the end of the block. In the new SACK option the first block is required to include the most recently received segment. Additional SACK blocks repeat previously sent SACK blocks, to increase robustness in the presence of lost ACKs.

To illustrate FACK, we compare its behavior to a SACK implementation using Reno congestion control. Since there is not yet a standard implementation of SACK, we make the following assumptions about a SACK implementation using Reno congestion control:

- Fast Retransmit and Fast Recovery are modified to not resend already SACKed segments (as one would expect).

- Fast Recovery continues to estimate the amount of outstanding data by counting returning ACKs. This assumption is made in order to retain the congestion properties of Reno TCP, and is the main distinction between a SACK implementation using Reno congestion control and a SACK implementation using FACK congestion control.

- The algorithm for detecting the end of recovery uses the presence of SACK blocks to prevent partial advances of snd.una [3] from causing TCP to leave the recovery state prematurely. [4]

In the remainder of this paper, "Reno+SACK" will refer to an implementation as outlined above. A SACK implementation which uses the FACK congestion control algorithm will be referred to simply as "FACK".

[1] This idea has been proposed before [CLZ87], but it has not been implemented for TCP.

[2] Jacobson describes the algorithm as we do; however, he goes on to note that the time it takes to open to a given window is "R log_2 W, where R is the round-trip-time and W is the window size in packets." When the receiver's Delayed ACK sends one ACK per two segments, this estimate should actually be R log_1.5 W. It is generally agreed that, during Slow-start, it is correct to increase the window size by one MSS per ACK, even when the ACK acknowledges more than one MSS of data.

[3] The TCP sender state variable snd.una holds the sequence number of the first byte of unacknowledged data; snd.nxt holds the sequence number of the first byte of unsent data. These variables are defined in the TCP standard [Pos81].

[4] This fixes a problem in Reno which has been pointed out by Hoe [Hoe95] and Floyd [Flo95]. In some cases, Reno may incorrectly reinvoke Fast Retransmit and Fast Recovery. Floyd and Hoe have observed that strengthening Reno's test for the end of recovery improves its behavior in a number of situations [FF96, Hoe95].
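The SACK encoding described in section 2.3 (each block carries the sequence number of its first octet and of the octet just beyond its end) directly yields the forward-most received sequence number that section 3 will track as snd.fack. The fragment below is a minimal sketch of that computation, not code from the paper: the struct layout and function name are our own, and sequence-number wraparound is ignored.

```c
#include <assert.h>

/* One SACK block as described in section 2.3: the half-open range
 * [start, end) of a contiguous run of correctly received octets. */
struct sack_block {
    unsigned long start; /* first octet of the block                */
    unsigned long end;   /* octet just beyond the end of the block  */
};

/* Return the forward-most correctly received sequence number known
 * from one ACK: the cumulative ack, advanced past any SACK block
 * that ends beyond it. */
unsigned long forward_most(unsigned long cum_ack,
                           const struct sack_block *blocks, int nblocks)
{
    unsigned long fwd = cum_ack;
    for (int i = 0; i < nblocks; i++)
        if (blocks[i].end > fwd)
            fwd = blocks[i].end;
    return fwd;
}
```

With no SACK blocks the result degenerates to the cumulative acknowledgment, which matches the statement in section 3 that snd.fack equals snd.una outside of recovery.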
2.4 FACK Design Goals

Under single segment losses, Reno implements the ideal congestion control principles set forth above. However, in the case of multiple losses, Reno fails to meet the ideal principles because it lacks a sufficiently accurate estimate of the data outstanding in the network, at precisely the time when it is needed most. [5]

The requisite network state information can be obtained with accurate knowledge about the forward-most data held by the receiver. By forward-most, we mean the correctly-received data with the highest sequence number. This is the origin of the name "forward acknowledgment." The goal of the FACK algorithm is to perform precise congestion control during recovery by keeping an accurate estimate of the amount of data outstanding in the network. In doing so, FACK attempts to preserve TCP's Self-clock and reduce the overall burstiness of TCP.

Note that all TCP implementations discussed in this paper have nearly identical behavior under single segment losses. This reduces the need for rigorous testing under "ordinary" conditions because all implementations have the same expected performance.

3 The FACK Algorithm

The FACK algorithm uses the additional information provided by the SACK option to keep an explicit measure of the total number of bytes of data outstanding in the network. In contrast, Reno and Reno+SACK both attempt to estimate this by assuming that each duplicate ACK received represents one segment which has left the network. The FACK algorithm is able to do this in a straightforward way by introducing two new state variables, snd.fack and retran_data. Also, the sender must retain information on data blocks held by the receiver, which is required in order to use SACK information to correctly retransmit data. In addition to what is needed to control data retransmission, information on retransmitted segments must be kept in order to accurately determine when they have left the network.

At the core of the FACK congestion control algorithm is a new TCP state variable in the data sender. This new variable, snd.fack, is updated to reflect the forward-most data held by the receiver. In non-recovery states, the snd.fack variable is updated from the acknowledgment number in the TCP header and is the same as snd.una. During recovery (while the receiver holds non-contiguous data) the sender continues to update snd.una from the acknowledgment number in the TCP header, but utilizes information contained in TCP SACK options to update snd.fack. [6] When a SACK block is received which acknowledges data with a higher sequence number than the current value of snd.fack, snd.fack is updated to reflect the highest sequence number known to have been received plus one.

Sender algorithms that address reliable transport continue to use the existing state variable snd.una. Sender algorithms that address congestion management are altered to use snd.fack, which provides a more accurate view of the state of the network.

We define awnd to be the data sender's estimate of the actual quantity of data outstanding in the network. Assuming that all unacknowledged segments have left the network: [7]

    awnd = snd.nxt - snd.fack                    (1)

During recovery, data which is retransmitted must also be included in the computation of awnd. The sender computes a new variable, retran_data, reflecting the quantity of outstanding retransmitted data in the network. Each time a segment is retransmitted, retran_data is increased by the segment's size; when a retransmitted segment is determined to have left the network, retran_data is decreased by the segment's size. Therefore TCP's estimate of the amount of data outstanding in the network during recovery is given by:

    awnd = snd.nxt - snd.fack + retran_data      (2)

Using this measure of outstanding data, the FACK congestion control algorithm can regulate the amount of data outstanding in the network to be within one MSS of the current value of cwnd: [8]

    while (awnd < cwnd)
        sendsomething();

The FACK congestion control algorithm does not place special requirements on sendsomething(); the algorithm implied by the SACK Internet-Draft is sufficient. Generally sendsomething() should choose to send the oldest data first. [9]

FACK derives its robustness from the simplicity of updating its state variables: if sendsomething() retransmits old data, it will increase retran_data; if it sends new data, it advances snd.nxt. Correspondingly, ACKs which report new data at the receiver either decrease retran_data or advance snd.fack. Furthermore, if the sender receives an ACK which advances snd.fack beyond the value of snd.nxt at the time a segment was retransmitted (and that retransmitted segment is otherwise unaccounted for), the sender knows that the segment which was retransmitted has been lost.

3.1 Triggering Recovery

Reno invokes Fast Recovery by counting duplicate acknowledgments:

    if (dupacks == 3) {
        ...
    }

This algorithm causes an unnecessary delay if several segments are lost prior to receiving three duplicate acknowledgments. In the FACK version, the cwnd adjustment and retransmission are also triggered when the receiver reports that the reassembly queue is longer than 3 segments:

    if ((snd.fack - snd.una) > (3 * MSS) ||
        (dupacks == 3)) {
        ...
    }

If exactly one segment is lost, the two algorithms trigger recovery on exactly the same duplicate acknowledgment.

[Figure 1: The test topology. Four nodes: s1 -- r1 (10 Mb/s, 2 ms), r1 -- r2 (1.5 Mb/s, 5 ms), r2 -- s2 (10 Mb/s, 33 ms). The round trip time between S1 and S2 is 80 ms, plus another 7 ms of store and forward delay, yielding a total pipe size of 16.3 kBytes.]

[5] The observation that Reno inaccurately assesses the network state arose as a part of ongoing research aimed at developing tools for benchmarking the production Internet [ipp96]. Our efforts focused on a tool called "TReno" for Traceroute-Reno [Mat95, Mat96], which is an evolution of an earlier tool "Windowed Ping" [Mat94]. TReno attempts to measure the available network headroom by emulating Reno TCP over a traceroute-like UDP stream. Although based on Reno congestion control, TReno was observed to exhibit significantly different behavior largely due to its precise picture of the congestion state of the network. Our investigation of the differences between TReno and Reno behaviors led us to discover FACK's underlying principles.

[6] In principle, the FACK algorithm could also be implemented by utilizing the information provided by the receiver through other mechanisms, such as the TCP Timestamp option, to determine the rightmost segment received [Kar95]. This would allow the benefits of improved congestion control during recovery to be immediately realized in existing TCP implementations. However, because of the complementary nature of FACK and SACK, and the expected imminent deployment of SACK, in our research we are assuming that FACK is implemented in conjunction with SACK.

[7] This is true when the network is not reordering segments and there have been no retransmissions.

[8] In the case when cwnd has been halved immediately following a lost segment, awnd will be significantly larger than cwnd. This issue is addressed in section 4.5.

[9] If sendsomething() chooses to send new data, it is also constrained by the receiver's window (snd.wnd) and must make an additional check to ensure that the new data does not lie beyond the limit imposed by snd.wnd. If sendsomething() chooses to retransmit old data, it is not constrained by the receiver's window.
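Equations (1) and (2) and the FACK trigger test above can be combined into a small sketch. This is our own illustrative fragment, not code from the paper or the simulator: the struct layout, field names, and helper names are assumptions, sequence-number wraparound is ignored, and MSS is fixed at 1 kB as in the simulations.

```c
#include <assert.h>

#define MSS 1024

struct fack_state {
    unsigned long snd_una;     /* first unacknowledged byte            */
    unsigned long snd_nxt;     /* first unsent byte                    */
    unsigned long snd_fack;    /* forward-most byte held by receiver   */
    unsigned long retran_data; /* retransmitted bytes still in network */
    int dupacks;               /* duplicate ACK count                  */
};

/* Equation (2): the sender's estimate of data outstanding in the
 * network.  With retran_data == 0 this reduces to equation (1). */
unsigned long awnd(const struct fack_state *s)
{
    return s->snd_nxt - s->snd_fack + s->retran_data;
}

/* FACK's recovery trigger (section 3.1): enter recovery when the
 * receiver's reassembly queue exceeds three segments, or on the
 * Reno-style third duplicate ACK. */
int should_trigger_recovery(const struct fack_state *s)
{
    return (s->snd_fack - s->snd_una) > (3 * MSS) || s->dupacks == 3;
}
```

Given this state, the send loop of section 3 simply calls sendsomething() while awnd(&s) is below cwnd.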
3.2 Ending Recovery

The recovery period ends when snd.una advances to or beyond the value of snd.nxt at the time the first loss was detected. During the recovery period, cwnd is held constant; when recovery ends TCP returns to Congestion Avoidance and performs linear increase on cwnd. In the implementation tested in this paper, a timeout is forced if it is detected that a retransmitted segment has been lost (again). This condition is included to prevent FACK from being too aggressive in the presence of persistent congestion.
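The end-of-recovery test above can be sketched as follows; the variable and function names are ours, and sequence-number wraparound is again ignored.

```c
#include <assert.h>

/* Snapshot of snd.nxt taken when the first loss is detected. */
static unsigned long recovery_point;

/* Record the snapshot on entry to recovery. */
void enter_recovery(unsigned long snd_nxt)
{
    recovery_point = snd_nxt;
}

/* Recovery ends once snd.una advances to or beyond the snapshot,
 * so partial advances of snd.una do not end it prematurely. */
int recovery_over(unsigned long snd_una)
{
    return snd_una >= recovery_point;
}
```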
4 FACK Behavior

In this section we explore the behavior of the FACK algorithm in a simulator environment. We introduce another algorithm, Overdamping, which estimates the correct window more conservatively following losses as a result of Slow-start. Finally, we introduce a Rampdown algorithm to smooth data transmission during the recovery period.
4.1 Simulation Environment

We tested these new algorithms by implementing them under the LBNL simulator "ns" [MF], where we added the necessary new congestion control algorithms. [10] The simulator includes models of Tahoe, Reno, and Reno+SACK. We added a FACK sender to the simulator, but were able to use the existing SACK TCP receiver without modification. Our first set of tests uses a simple network containing four nodes (figure 1).

Two of these nodes represent routers connected by a 5 ms T1 link; one is a host in close proximity to these routers, and the other is a host 33 ms away. The bandwidth-delay product for this network is 16.3 kBytes, including store and forward delays. In all of our tests we utilize an MSS of 1 kB. Thus, properly provisioned routers in this network should have queues at least 17 packets long.

We varied queue lengths in order to examine both adequately provisioned and underprovisioned cases. [11] In all of our investigations we utilize drop-tail routers. The details of the FACK algorithm and implementation do not require any changes to operate in networks with more intelligent queueing disciplines. However, the relative benefit of the FACK algorithm in these networks will be slightly lower because episodes of congestion in such networks are expected to be less extreme.

In this paper, most of our examples plot segment numbers vs. time in seconds. [12] Each segment is shown twice, once when it enters the bottleneck queue and once when it leaves. Dropped segments are indicated by an "x". Retransmissions always stand out because both the enqueue and dequeue events are visibly out of order. In some cases, plots of window size and router queue occupancy are shown as well.

[Figure 2: Reno behavior during Slow-start. The network is provisioned with queues of length 17 packets. 30 segments are unnecessarily retransmitted.]

[10] An implementation of FACK will be available in a future release of ns.

[11] Note that many historical papers investigating TCP dynamics use underbuffered networks in their simulations. We believe that any protocol development work must adequately address both properly provisioned and underbuffered networks, and protocols must be shown to be stable (if not optimal) in both environments.

[12] See http://www.psc.edu/networking/papers/ for enlarged figures.

4.2 Behavior During Slow-start

During Slow-start, TCP opens its window exponentially, forcing the network into congestion and often dropping many segments. Figure 2 shows the behavior of Reno during
Slow-start. Reno is unable to handle the multiple segment losses; it times out and then proceeds with a Slow-start after the timeout interval.

Figure 3 shows the behavior of Reno+SACK under the same circumstances. Reno+SACK does not incur the timeout. However, due to the large number of lost segments, Reno+SACK underestimates the window during recovery, and requires several round trip times to complete recovery.

Figure 4 shows the behavior of FACK in this situation. FACK divides its window size by two, waits half of an RTT for data to exit the network, and then proceeds to retransmit lost segments.

In these examples, both Reno+SACK and FACK make no unnecessary retransmissions. Reno, on the other hand, unnecessarily retransmits 30 segments.

[Figure 3: Reno+SACK behavior during Slow-start. The network is provisioned with queues of length 17 packets. No data is unnecessarily retransmitted.]

[Figure 4: FACK behavior during Slow-start. The network is provisioned with queues of length 17 packets. No data is unnecessarily retransmitted.]

4.3 FACK vs. Reno+SACK

Figure 5 compares the detailed behaviors of FACK and Reno+SACK in a slightly different case. Here, the variable ssthresh is preset to 35 and the bottleneck queue has only 10 packet buffers. In this case, the behaviors of FACK and Reno+SACK are very similar. The primary difference is visible in the queue length at the bottleneck link. At the end of recovery (about .8 sec), Reno+SACK makes a burst transmission which causes a spike in the queue length. [13] Since the window size after the end of recovery is identical for both algorithms, FACK and Reno+SACK will have roughly the same overall performance for environments where TCP never loses more than half a window of data.

[Figure 5: SACK and FACK loss recovery details. The network is provisioned with queues of length 10 packets, and ssthresh is preset to 35 segments. No data is unnecessarily retransmitted.]

If more than half a window of data is lost, the window estimate of Reno+SACK will not be sufficiently accurate. Figure 6 shows such a case. Here, in addition to the segments lost during Slow-start, four additional segments were dropped in transit on the bottleneck link. In this case TCP runs out of ACKs before invoking Fast Recovery. In the worst case, this would result in a retransmit timeout followed by a Slow-start. One of the requirements of a SACK implementation is that if the TCP sender takes a retransmit timeout, it must clear all information about SACK blocks held by the receiver. Thus, the sender would timeout and then Slow-start with the possibility of retransmitting data which has already been received. The SACK implementation in the simulator includes an additional test specifically for the case where more than half a window of data is lost, and proceeds directly into Slow-start. This avoids the retransmit timeout, but still incurs the penalties of Slow-start and duplicated data. The final result, in this case, is that 6 round trip times are lost to the Slow-start, and 25 segments are unnecessarily retransmitted. Note that it would be possible to further optimize Reno+SACK for this case by keeping the information stored in the SACK blocks. The resulting TCP would only take the penalty of the Slow-start for this case.

[Figure 6: SACK recovery detail under greater than 1/2 window of loss. The network is provisioned with queues of length 17 packets, and four non-congestion related losses have been injected. 25 segments are unnecessarily retransmitted.]
4.4 Slow-start Overshoots and the Overdamping Algorithm

In both the Reno and FACK examples, the congestion window is almost immediately cut in half a second time. The reason for this behavior is that when dividing cwnd by two, TCP should utilize the value of cwnd when the first lost segment was sent. At this point, the session fills the available buffer space exactly, whereas when the loss is detected one RTT later, cwnd has doubled. [14] We can improve this behavior by implementing the following additional window adjustment:

    if (cwnd <= ssthresh + .5*mss)
        cwnd /= 2;

If TCP has recently been in Slow-start, [15] it reduces cwnd by an extra factor of two prior to reducing the window and setting ssthresh. This takes into account the fact that, at the time the segment was sent, cwnd was smaller than it was at the time the loss was detected, and therefore is more conservative about setting cwnd and ssthresh. With this additional algorithm in place, the results of our test simulation are shown in figure 7. Note that the first segment loss following Slow-start does not occur until time 3.4 sec, compared with figure 4 where it occurs at time 1.7 sec.

[Figure 7: Behavior of FACK with Overdamping. The network is provisioned with queues of length 17 packets. No data is unnecessarily retransmitted.]

[13] The size of the burst will be equal to the number of dropped segments plus the number of dropped ACKs minus one.

[14] In this section, we have not utilized Delayed ACKs, which would cause cwnd to increase by a factor of 1.5. The effects of Overdamping in this case are shown in section 5.

[15] We define "recently" as "within one half of a round-trip" of being in Slow-start. The choice of one half is somewhat subjective, but preserves continuity at the boundary conditions.

4.5 Data Smoothing

During a congestion epoch, when one or more segments are lost, TCP performs an exponential backoff by cutting cwnd
round trip of recovery, w intr im is reduced to zero. While
w intr im is non-zero, it acts to smo oth the data evenly over
one round trip, so that exactly cw nd bytes of data are out-
standing at the end of this round trip. The variable w inmul t
is the scale factor controlling how quickly w intr im is pulled
to zero. Normally w inmul t is set to 0.5; if Overdamping is
invoked, w inmul t is set to 0.25 instead.
In gure 8 we set the queue length in the routers to 6
packets, causing the network to b e underutilized following
Slow-start. In eachRTT following Slow-start, FACK with
Overdamping (top of gure 8) clusters its transmissions to-
gether. On the other hand, FACK with Overdamping and
Ramp down (b ottom of gure 8) evenly distributes the data
across a full round trip time, minimizing the e ects of bursts
on the network.
5 Comparison of Algorithm Performance During Slow-
start
In order to compare the p erformance of the various algo-
rithms presented in section 4, we ran simulations of six algo-
rithms over an exhaustive range of queue-lengths in the b ot-
tleneck router. The six algorithms are Reno, Reno+SACK,
FACK, FACK with Overdamping, FACK with Ramp down,
and FACK with b oth Overdamping and Ramp down. In or-
der to compare the p erformance of the various algorithms in
a meaningful way,we computed the \lost opp ortunity" for
each run | the amount of additional data which could have
b een sent if the connection had run entirely in Congestion
Avoidance. Events which cause idle time on the link during
Slow-start, such as retransmit timeouts or deep reductions
in cw nd, result in higher \lost opp ortunity".
The results of this comparison are shown in gure 9. The
The network is provisioned with queues of length 6 packets. No data
upp er graph shows the \lost opp ortunity" for each algorithm
is unnecessarily retransmitted.
with a receiver whichacknowledges every segment (as used
in all of the examples in Section 4). The lower graph uses a
Figure 8: FACK b ehavior with (b ottom) and without (top)
17
receiver with Delayed ACK.
Ramp down. Overdamping is utilized in b oth cases.
in half. In current TCP implementations, the sender stops transmitting data until enough data has left the network to reduce awnd below the new value of cwnd. The sender then resumes transmission of data. This typically results in a full window of data being transmitted in one half of a round trip time, resulting in uneven transmission of data for this and subsequent round trips. Solutions to this problem have been suggested [Hoe95, Jac95], but have not yet been deployed.^16 The recommended solution for this problem is to smooth the transmission of data over one RTT by slowly reducing cwnd, rather than instantly halving it. We implemented this solution as follows.

At the time congestion is detected:

    wintrim = (snd.nxt - snd.fack) * (1 - winmult)        (3)

Each time snd.fack advances by delta_fack:

    wintrim = wintrim - delta_fack * (1 - winmult)        (4)

Here, wintrim is added to cwnd during the "Rampdown" phase of congestion control. At the time recovery begins, cwnd + wintrim is slightly less than awnd. After one ...

In both graphs, the effects of retransmit timeouts in Reno are clearly visible at all queue sizes. Without Delayed ACK,^17 Reno loses between 300 kB and 500 kB of potential data transfer capability during Slow-start. With Delayed ACK, this value increases to between 650 kB and 900 kB. All of the options presented for SACK congestion control perform significantly better than Reno in the cases presented here. Without Delayed ACK, the FACK algorithm alone shows poor performance for a subset of the queue sizes examined. In these cases, FACK is too aggressive following Slow-start, and takes additional packet loss resulting in a retransmission timeout. Reno+SACK also shows lower performance across all queue sizes than the remaining three variations of FACK. This is the result of additional round trips caused by ACK starvation immediately following Slow-start (see figure 3). The two versions of FACK which include the Overdamping algorithm show poorer performance at low queue lengths. The best and most consistent performer is the FACK algorithm with Rampdown alone.

With Delayed ACK, the FACK and Reno+SACK cases no longer exhibit the behaviors mentioned above, because Slow-start does not push the network as far into congestion. The effects of Overdamping are even more pronounced, and even at the largest queue sizes we tested, Overdamping is too conservative compared with the other algorithms.
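A minimal sketch of the Rampdown bookkeeping in equations (3) and (4) follows. The class wrapper, method names, and the clamp of wintrim at zero are our own illustrative framing; only the two update rules come from the text.

```python
class Rampdown:
    """Sketch of the wintrim bookkeeping from equations (3) and (4)."""

    def __init__(self, winmult=0.5):
        # winmult is the multiplicative-decrease factor (0.5 halves cwnd).
        self.winmult = winmult
        self.wintrim = 0.0

    def on_congestion(self, snd_nxt, snd_fack):
        # Equation (3): at the time congestion is detected, wintrim covers
        # the slice of outstanding data that the reduced cwnd no longer admits.
        self.wintrim = (snd_nxt - snd_fack) * (1.0 - self.winmult)

    def on_fack_advance(self, delta_fack):
        # Equation (4): each time snd.fack advances by delta_fack, shrink
        # wintrim proportionally so it decays to zero over roughly one RTT.
        # (Clamped at zero here as a defensive choice; the paper's equation
        # is the unclamped update.)
        self.wintrim = max(self.wintrim - delta_fack * (1.0 - self.winmult), 0.0)

    def effective_window(self, cwnd):
        # During Rampdown the sender may keep cwnd + wintrim in flight,
        # smoothing transmission instead of stalling for half an RTT.
        return cwnd + self.wintrim
```

Because wintrim starts at the trimmed-away portion of the outstanding data and decays as acknowledgments arrive, cwnd + wintrim slides gradually from the old window down to the new one rather than dropping in a single step.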
16 We are aware of one research group working with a TCP implementation which includes a solution to this problem similar to ours [Bal96].

17 A Delayed ACK receiver sends ACKs less frequently, and at minimum, sends one ACK for every two MSS of data received. Delayed ACK is used by almost all TCP implementations in the Internet.

[Figure: four hosts (s1 through s4) attached to routers r1 and r2 by 10 Mb/s links with delays between 2 ms and 33 ms; the link between r1 and r2 is a 1.5 Mb/s, 5 ms bottleneck. A bulk stream and a tiny stream cross the bottleneck in opposite directions.]

Figure 10: The jitter test topology
Figure 11: Comparison of FACK, Reno, and Reno+SACK. TCP forward path utilization as a function of the reverse path utilization. Note that 7% load on the reverse path causes nearly 45% idle capacity on the forward path. This example uses a 20 packet queue length, which is more than sufficient buffering for the network. The receiver is not using Delayed ACK.
6 Performance Comparisons

We have investigated the behavior and performance of the various congestion control algorithms under several scenarios. One scenario, in which TCP is subjected to delay jitter and bursty losses, demonstrates some interesting differences between Reno, Reno+SACK, and FACK.

In the simulator, we have been able to investigate TCP's behavior in this situation with a single, very low bandwidth data stream in the reverse direction (figure 10). The reverse data stream is one connection with small, randomly distributed bursts of data at an average rate of two bursts per second. The bursts are of small constant size for each run, ranging from 1 to 6 kB. This traffic could be, for example, characteristic of a small NetNews stream or sporadic e-mail. In this environment, we ran each of the algorithms (Reno, Reno+SACK and FACK) and compared their performance.
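As an illustration of this traffic model, the reverse-path bursts can be generated with exponentially distributed gaps. This is our own sketch under stated assumptions, not the authors' simulator configuration: the function name, the seed, and the choice of an exponential inter-arrival distribution (one common reading of "randomly distributed bursts") are ours.

```python
import random

def burst_schedule(duration_s, burst_size_bytes, rate_per_s=2.0, seed=1):
    """Return a list of (time, size) burst events over duration_s seconds.

    Gaps between bursts are exponentially distributed, giving the stated
    average rate (two bursts per second by default); each run uses one
    constant burst size, as in the text (1 kB to 6 kB).
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # mean gap = 1/rate (0.5 s at 2/s)
        if t >= duration_s:
            return events
        events.append((t, burst_size_bytes))

# Example run: 60 seconds of 4 kB bursts at ~2 bursts per second.
sched = burst_schedule(duration_s=60.0, burst_size_bytes=4096)
```

A 60-second run at two bursts per second yields on the order of 120 events, enough to induce the ACK compression and buffer competition discussed below.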
Figure 9: Comparison of the behavior of various congestion algorithms during Slow-start. The receiver is using Delayed ACK.

Figure 11 shows the forward path performance versus the reverse path load for each algorithm. Note that with only 7% load on the reverse path, Reno leaves almost 50% idle capacity on the forward path. This reflects the combined effects of ACK compression [ZSC91], drop-tail routers and the high penalty of retransmit timeouts. Note that this example uses a 20 packet queue length, which is more than sufficient buffering for this network.

Figure 12: Reno and FACK with jitter. In this trace we slightly reduced the buffering from figure 11, to accent interesting detail. All of the behaviors shown in this figure are present in one or more of the simulations used to generate figure 11.
6.1 Reno vs. FACK

Figure 12 shows detailed behavior of Reno and FACK in a situation only slightly different than in figure 11. The tiny reverse traffic causes ACK compression and competes for router buffer space, which, in turn, causes clusters of packet loss in the bulk stream.

In response to these clusters of loss, Reno behavior appears chaotic, showing multiple window adjustments in a single congestion episode and timeouts due to loss of its Self-clock.

The bottom of figure 12 shows FACK (with Overdamping and Rampdown) in exactly the same situation. Even though many congestion epochs experience clusters of loss, FACK correctly performs exactly one multiplicative decrease of cwnd per congestion epoch, preserves the TCP Self-clock, and avoids all timeouts.^18 In this regime FACK appears to be a stable, well-behaved control system, consistent with the principles of ideal congestion control.

6.2 Impact to the Internet

In the Internet, anecdotal evidence suggests that episodes of multiple packet loss in one round trip are common. Paxson observes the following behavior in roughly 13% of the traces he collected at major Internet exchange points:

    ...a fast retransmit followed by a retransmit timeout, with the
    additional condition that the packet retransmitted after the
    retransmit timeout had not been previously retransmitted... [FF96]

It is most likely that this behavior is the result of minor congestion episodes which cause multiple packet loss in one round trip. Note that because only Reno TCP implementations exhibit this particular behavior, the prevalence of multiple packet loss within one round trip may be significantly more common than suggested by this data.

On our networks at PSC (a national supercomputing center with high bandwidth connectivity to the global Internet), the behavior shown in figure 12 appears regularly for bulk data transfers over moderately loaded wide area links.^19 The deployment of any version of SACK should nearly double the throughput of bulk transfers using TCP for these cases. In addition, we believe SACK TCP will be less biased against ATM than Reno TCP. For more typical Internet transfers, the benefits of SACK will likely be more moderate, but still result in overall improvements to both latency and goodput.

18 Reno+SACK performs as well as FACK in this situation.

19 Over a fixed path, Reno's performance can be improved by defeating TCP's cwnd calculation by setting the maximum window size to just slightly smaller than needed to fill the network.
7 Future Work

We are currently working on an implementation of SACK TCP which will include FACK.^20 Once implemented, FACK should be evaluated in both a testbed environment and in the Internet, to verify the performance of the algorithms and to look for any adverse side effects. These investigations should also explore the data recovery aspects of SACK.

There are several unresolved issues surrounding the algorithms presented in this paper. We are investigating a single, simple algorithm to replace the Overdamping and Rampdown algorithms, as well as several methods for addressing persistent congestion (when halving is not a sufficient window reduction). We have been moderately successful at deriving closed-form mathematical models for FACK TCP performance in some topologies and believe that this technique deserves further exploration.

The new state variable snd.fack might also be used to strengthen Round Trip Time Measurements (RTTM) and Protection Against Wrapped Sequence (PAWS) algorithms [JBB92] during recovery.

The FACK algorithm was first implemented in TReno, an Internet performance metric [Mat96]. Tools to measure Internet performance should track the evolution of TCP [Mat].

The production Internet still lacks adequate attention to issues of congestion and congestion detection. Many routers are incapable of providing full bandwidth-delay buffering and do not signal the onset of congestion through mechanisms such as Random Early Detection (RED) [FJ93]. Although the FACK algorithm is designed to help in times of congestion, it is not a substitute for these signals at the Internet layer. The transport and internet layers must work together to improve the behavior of the Internet under high load.

Other current research into TCP congestion is largely independent of FACK. The Congestion Avoidance Mechanism (CAM) of TCP Vegas [BOP94, DLY95] attempts to avoid unnecessary inflation of the congestion window through delay sensing techniques. Hoe has done extensive work in analyzing the effects of congestion during Slow-start [Hoe95, Hoe96], where there can be significant performance problems. The implementation of SACK and/or FACK may reduce the gravity of these problems, but will not eliminate them. Both of these efforts address different aspects of the TCP congestion control problem. Hoe also discusses a form of Rampdown, which was the inspiration for this part of our work. It should be possible to incorporate all of these concepts in a single TCP implementation, allowing for study of their combined benefits.

Finally, applications which do not use TCP are becoming more prevalent in the Internet, and many of these applications pay little or no attention to congestion control issues. The more predictable behavior and better understanding of TCP congestion control may be a step toward a standardized transport layer congestion behavior for use by all Internet applications.

8 Conclusion

In this paper, we have presented the FACK algorithm for congestion control, the Overdamping algorithm to offset Slow-start overshoot, and the Rampdown algorithm for transmission smoothing. In our investigations, we have discovered that both FACK and Reno+SACK provide major performance improvements over existing Reno implementations, due primarily to the avoidance of retransmission timeouts. Eventually, Reno users will perceive SACK implementations as having a significant advantage; this will provide incentive for the rapid widespread deployment of SACK in the Internet.

The FACK algorithm has several benefits over Reno+SACK. Since FACK more accurately controls the outstanding data in the network, it is less bursty than Reno+SACK, and can recover from episodes of heavy loss better than Reno+SACK. Because FACK uniformly adheres to basic principles of congestion control, it may be possible to produce formal mathematical models of its behavior and to support further advances in congestion control theory. Furthermore, based on our experience in implementing FACK in the simulator, it is more straightforward to code and less prone to subtle bugs than Reno+SACK.

For the additional algorithms presented, Overdamping and Rampdown, we obtained mixed success. The Overdamping algorithm is too conservative in the general case. The Rampdown algorithm, however, appears to work quite well. Based on the results in this paper, future work should explore variations on the Rampdown algorithm which incorporate the ideas included in the Overdamping algorithm.

Finally, we had difficulties developing realistic simulations of the Internet's observed clustered packet loss. Current simulation technologies do not accurately model the Internet with its vast complexity and huge populations of users, hosts, connections and packets.^21 This limitation makes it difficult to predict the operational impact of deploying new protocols in the Internet. Limited simulations and traffic playback approaches are not likely to reveal phenomena resembling turbulent coupling between protocols. We hope to investigate new simulation paradigms in the future.

9 Acknowledgements

We would like to thank Sally Floyd and Steve McCanne for making the LBNL simulator publicly available, without which we would have been unable to complete this work. We are especially grateful to the five anonymous reviewers for their insightful comments on our initial draft of this work, as well as to Sally Floyd and Craig Partridge for their invaluable assistance in moving it to final form. We would like to thank Susan Blackman and Karen Fabrizius for repeated readings and markups on our grammar and spelling. Finally, we would like to acknowledge our management at PSC for encouraging our research activities on TCP performance.

20 This implementation will be made publicly available when completed.

21 In our experiments, we did not take advantage of the capabilities of tcplib [DJ91], which models some of these complexities.
References

[Bal96] Hari Balakrishnan, March 1996. Presentation to the IETF TCP-LW working group.

[BOP94] Lawrence S. Brakmo, Sean W. O'Malley, and Larry L. Peterson. TCP Vegas: New Techniques for Congestion Detection and Avoidance. Proceedings of ACM SIGCOMM '94, August 1994.

[Bra89] R. Braden. Requirements for Internet Hosts - Communication Layers, October 1989. Request for Comments 1122.

[CLZ87] D. D. Clark, M. L. Lambert, and L. Zhang. NETBLT: a high throughput transport protocol. Computer Communications Review, 17(5):353-359, 1987.

[DJ91] Peter B. Danzig and Sugih Jamin. tcplib: A library of TCP/IP traffic characteristics. Technical Report TR-SYS-91-01, USC Networking and Distributed Systems Laboratory, October 1991. Obtain via: ftp://catarina.usc.edu/pub/jamin/tcplib.

[DLY95] Peter B. Danzig, Zhen Liu, and Limin Yan. An Evaluation of TCP Vegas by Live Emulation. ACM SIGMetrics '95, 1995.

[FF96] Kevin Fall and Sally Floyd. Comparisons of Tahoe, Reno and Sack TCP, May 1996. Submitted to CCR. Obtain via: ftp://ftp.ee.lbl.gov/papers/sacksv2.ps.Z.

[FJ91] Sally Floyd and Van Jacobson. Traffic Phase Effects in Packet-Switched Gateways. Computer Communications Review, 21(2), April 1991.

[FJ92] Sally Floyd and Van Jacobson. On Traffic Phase Effects in Packet-Switched Gateways. Internetworking: Research and Experience, 3(3):115-156, September 1992.

[FJ93] Sally Floyd and Van Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, August 1993.

[Flo92] Sally Floyd, February 1992. Private Communication.

[Flo95] Sally Floyd. TCP and Successive Fast Retransmits, February 1995. Obtain via: ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.

[Hoe95] Janey C. Hoe. Startup Dynamics of TCP's Congestion Control and Avoidance Schemes. Master's thesis, Massachusetts Institute of Technology, June 1995.

[Hoe96] Janey C. Hoe. Improving the Start-up Behavior of a Congestion Control Scheme for TCP. Proceedings of ACM SIGCOMM '96, August 1996.

[ipp96] Charter of the Benchmarking Working Group (BMWG) of the IETF, 1996. Obtain via: http://www.ietf.cnri.reston.va.us/html.charters/bmwg-charter.html.

[Jac88] Van Jacobson. Congestion Avoidance and Control. Proceedings of ACM SIGCOMM '88, August 1988.

[Jac90] Van L. Jacobson. Fast Retransmit. Message to the end2end-interest mailing list, April 1990.

[Jac95] Van Jacobson, July 1995. Private Communication.

[JB88] V. Jacobson and R. Braden. TCP extensions for long-delay paths, October 1988. Request for Comments 1072.

[JBB92] V. Jacobson, R. Braden, and D. Borman. TCP Extensions for High Performance, May 1992. Request for Comments 1323.

[Kar95] Phil Karn, December 1995. Private Communication.

[Mat] Matthew Mathis. Internet Performance and IP Provider Metrics information page. http://www.psc.edu/~mathis/ippm/.

[Mat94] Matthew B. Mathis. Windowed Ping: An IP Layer Performance Diagnostic. In Proceedings of INET'94/JENC5, volume 2, Prague, Czech Republic, June 1994.

[Mat95] Matthew Mathis. Source code for the TReno package, 1995. Obtain via: ftp://ftp.psc.edu/pub/nettools/treno.shar.

[Mat96] Matthew Mathis. Diagnosing Internet Congestion with a Transport Layer Performance Tool. In Proceedings of INET'96, Montreal, Quebec, June 1996.

[MF] S. McCanne and S. Floyd. ns - LBNL Network Simulator. Obtain via: http://www-nrg.ee.lbl.gov/ns/.

[MMFR96] Matthew Mathis, Jamshid Mahdavi, Sally Floyd, and Allyn Romanow. TCP Selective Acknowledgement Options, May 1996. Internet Draft ("work in progress") draft-ietf-tcplw-sack-02.txt, Expires: 29/7/96.

[Mog92] Jeff C. Mogul. Observing TCP Dynamics in Real Networks. Proceedings of ACM SIGCOMM '92, pages 305-317, October 1992.

[Pos81] J. Postel. Transmission Control Protocol, September 1981. Request for Comments 793.

[Ste94] W. Stevens. TCP/IP Illustrated, volume 1. Addison-Wesley, Reading MA, 1994.

[Ste96] W. Richard Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, March 1996. Currently an Internet Draft: draft-stevens-tcpca-spec-01.txt.

[tcp95] Minutes of the tcplw meeting at the 34th IETF, in Dallas TX, December 1995. Obtain via: http://www.ietf.cnri.reston.va.us/proceedings/95dec/tsv/tcplw.html.

[ZSC91] Lixia Zhang, Scott Shenker, and David D. Clark. Observations on the Dynamics of a Congestion Control Algorithm: The Effects of Two-Way Traffic. Proceedings of ACM SIGCOMM '91, pages 133-148, 1991.