Internet Routing Instability
Craig Labovitz, G. Robert Malan, and Farnam Jahanian
University of Michigan
Department of Electrical Engineering and Computer Science
1301 Beal Ave.
Ann Arbor, Michigan 48109-2122
{labovit, rmalan, [email protected]

Supported by National Science Foundation Grant NCR-9321060 and a
generous gift from the Intel Corporation.

This paper appears in the Proceedings of the ACM SIGCOMM '97.

© Copyright 1997 by the Association for Computing Machinery, Inc.
Permission to make digital or hard copies of part or all of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or direct commercial
advantage and that copies bear this notice and the full citation on the
first page. Copyrights for components of this work owned by others
than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, or to redistribute
to lists, requires prior specific permission and/or a fee. Request
permissions from Publications Dept., ACM Inc., fax +1 (212) 869-0481,
or (p [email protected]).

Abstract

This paper examines the network inter-domain routing information
exchanged between backbone service providers at the major U.S. public
Internet exchange points. Internet routing instability, or the rapid
fluctuation of network reachability information, is an important
problem currently facing the Internet engineering community. High
levels of network instability can lead to packet loss, increased
network latency and time to convergence. At the extreme, high levels
of routing instability have led to the loss of internal connectivity
in wide-area, national networks. In this paper, we describe several
unexpected trends in routing instability, and examine a number of
anomalies and pathologies observed in the exchange of inter-domain
routing information. The analysis in this paper is based on data
collected from BGP routing messages generated by border routers at
five of the Internet core's public exchange points during a nine
month period. We show that the volume of these routing updates is
several orders of magnitude more than expected and that the majority
of this routing information is redundant, or pathological.
Furthermore, our analysis reveals several unexpected trends and
ill-behaved systematic properties in Internet routing. We finally
posit a number of explanations for these anomalies and evaluate their
potential impact on the Internet infrastructure.

1 Introduction

Since the end of the NSFNet backbone in April of 1995, the Internet
has seen explosive growth in both size and topological complexity.
This growth has placed severe strain on the commercial Internet
infrastructure. Regular network performance degradations stemming
from bandwidth shortages and a lack of router switching capacity have
led the popular press to decry the imminent death of the Internet
[13]. Routing instability, informally defined as the rapid change of
network reachability and topology information, has a number of
origins including router configuration errors, transient physical and
data link problems, and software bugs. Instability, also referred to
as "route flaps", significantly contributes to poor end-to-end
network performance and degrades the overall efficiency of the
Internet infrastructure. All of these sources of network instability
result in a large number of routing updates that are passed to the
core Internet exchange point routers. Network instability can spread
from router to router and propagate throughout the network. At the
extreme, route flaps have led to the transient loss of connectivity
for large portions of the Internet. Overall, instability has three
primary effects: increased packet loss, delays in the time for
network convergence, and additional resource overhead (memory, CPU,
etc.) within the Internet infrastructure.

The Internet is composed of a large number of interconnected regional
and national backbones. The large public exchange points are often
considered the "core" of the Internet, where backbone service
providers peer, or exchange traffic and routing information with one
another. Backbone service providers participating in the Internet
core must maintain a complete map, or default-free routing table, of
all globally visible network-layer addresses reachable throughout the
Internet.

The Internet is divided into a large number of different regions of
administrative control commonly called autonomous systems. These
autonomous systems (AS) usually have distinct routing policies and
connect to one or more remote autonomous systems at private or public
exchange points. Autonomous systems are traditionally composed of
network service providers or large organizational units like college
campuses and corporate networks. At the boundary of each autonomous
system, peer border routers exchange reachability information to
destination IP address blocks [2], or prefixes, for both transit
networks, and networks originating in that routing domain. Most
autonomous systems exchange routing information through the Border
Gateway Protocol (BGP) [12].

Unlike interior gateway protocols, such as IGRP and OSPF, that
periodically flood an intra-domain network with all known routing
table entries, BGP is an incremental protocol that sends update
information only upon changes in network topology or routing policy.
Moreover, BGP uses TCP as its underlying transport mechanism in
contrast to many interior protocols that build their own reliability
on top of a datagram service. As a path vector routing protocol, BGP
limits the distribution of a router's reachability information to its
peer, or neighbor routers. A path is a sequence of intermediate
autonomous systems between source and destination routers that form a
directed route for packets to travel. Router configuration files
allow the stipulation of routing policies that may specify the
filtering of specific routes, or the modification of path attributes
sent to neighbor routers. Routers may be configured to make policy
decisions based on both the announcement of routes from peers and
their accompanying attributes. These attributes, such as the
Multi-Exit Discriminator (MED), may serve as hints to help routers
choose between alternate paths to a given destination.

Backbone border routers at public exchange points commonly have
thirty or more external, or inter-domain, peers, as well as a large
number of intra-domain peering sessions with internal backbone
routers. After each router makes a new local decision on the best
route to a destination, it will send that route, or path information
along with accompanying distance metrics and path attributes, to each
of its peers. As this reachability information travels through the
network, each router along the path appends its unique AS number to a
list in the BGP message. This list is the route's ASPATH. An ASPATH
in conjunction with a prefix provides a specific handle for a one-way
transit route through the network.

Routing information shared between peers in BGP has two forms:
announcements and withdrawals. A route announcement indicates a
router has either learned of a new network attachment or has made a
policy decision to prefer another route to a network destination.
Route withdrawals are sent when a router makes a new local decision
that a network is no longer reachable. We distinguish between
explicit and implicit withdrawals. Explicit withdrawals are those
associated with a withdrawal message, whereas an implicit withdrawal
occurs when an existing route is replaced by the announcement of a
new route to the destination prefix without an intervening withdrawal
message. A BGP update may contain multiple route announcements and
withdrawals. In an optimal, stable wide-area network, routers should
only generate routing updates for relatively infrequent policy
changes and the addition of new physical networks.

In this paper, we measured the BGP updates generated by service
provider backbone routers at the major U.S. public exchange points.
Our experimental instrumentation of these exchange points has
provided significant data about the internal routing behavior of the
core Internet. This data reflects the stability of inter-domain
Internet routing, or changes in topology or policy between autonomous
systems. Intra-domain routing instability is not explicitly measured,
and is only indirectly observed through BGP information exchanged
with a domain's peer. We distinguish between three types of
inter-domain routing updates: forwarding instability may reflect
legitimate topological changes and affects the paths on which data
will be forwarded between autonomous systems; routing policy
fluctuation reflects changes in routing policy information that may
not affect forwarding paths between autonomous systems; and
pathological updates are redundant BGP information that reflect
neither routing nor forwarding instability. We define instability as
an instance of either forwarding instability or policy fluctuation.
Although some of the preliminary results of our study have been
reported at recent NANOG, IETF, and IEPG meetings, this paper is the
first detailed written report of our findings. The major results of
our work include:

• The number of BGP updates exchanged per day in the Internet core is
  one or more orders of magnitude larger than expected.

• Routing information is dominated by pathological, or redundant
  updates, which may not reflect changes in routing policy or
  topology.

• Instability and redundant updates exhibit a specific periodicity of
  30 and 60 seconds.

• Instability and redundant updates show a surprising correlation to
  network usage and exhibit corresponding daily and weekly cyclic
  trends.

• Instability is not dominated by a small set of autonomous systems
  or routes.

• Instability and redundant updates exhibit both strong high and low
  frequency components. Much of the high frequency instability is
  pathological.

• Discounting policy fluctuation and pathological behavior, there
  remains a significant level of Internet forwarding instability.

• This work has led to specific architectural and protocol
  implementation changes in commercial Internet routers through our
  collaboration with vendors.

The remainder of this paper is organized as follows: Section 2
describes the infrastructure used to collect the routing stability
data analyzed in this paper. Section 3 provides further background on
Internet routing and related work. Section 4 describes a number of
anomalies and pathologies observed in BGP routing information. It
defines a taxonomy for discussing the different categories of BGP
update information, and posits a number of plausible explanations for
the anomalous routing behavior. Section 5 describes key trends and
characteristics of forwarding instability. Finally, the paper
concludes with a discussion on the possible impact of different
categories of instability on the performance of the Internet
infrastructure.

2 Methodology

Our analysis in this paper is based on data collected from the
experimental instrumentation of key portions of the Internet
infrastructure. Over the course of nine months, we logged BGP routing
messages exchanged with the Routing Arbiter project's route servers
at five of the major U.S. network exchange points: AADS, Mae-East,
Mae-West, PacBell, and Sprint. At these geographically diverse
exchange points, network service providers peer by exchanging both
traffic and routing information. The largest public exchange,
Mae-East located near Washington D.C., currently hosts over 60
service providers, including ANS, BBN, MCI, Sprint, and UUNet.
Figure 1 shows the location of each exchange point, and the number of
service providers peering with the route servers at each exchange.

Although the route servers do not forward network traffic, they do
peer with the majority (over 90 percent) of the service providers at
each exchange point. The route servers provide aggregate route server
BGP information to a number of client peers. Unlike the specialized
routing hardware used by most service providers, the route servers
are Unix-based systems which provide a unique platform for exchange
point statistics collection and monitoring.

The Routing Arbiter project has amassed 12 gigabytes of compressed
data since January of 1996. In January 1997, the operational phase of
the Routing Arbiter project ended.
Data collection and analysis has continued under the auspices of the
Internet Performance Measurement and Analysis (IPMA) project [8]. We
use several tools from the Multithreaded Routing Toolkit (MRT) [9] to
decode and analyze the BGP packet logs from the route server peering
sessions. Although we analyze data from all of the major exchange
points, we simplify the discussion in much of this paper by
concentrating on the logs of the largest exchange, Mae-East. We
analyze the BGP data in an attempt to characterize and understand
both the origins and operational impact of routing instability. For
the purposes of data verification, we have also analyzed sample BGP
backbone logs from a number of large service providers. (Footnote 1:
Additional data was supplied by Verio, Inc., ANS CO+RE Systems, and
the statewide networking division of Merit Network, Inc.)

Figure 1: Map of major U.S. Internet exchange points.

Increasingly, major Internet service providers (ISPs) are utilizing
private peering points for the exchange of inter-domain traffic.
However, this role was not significant during the data collection
period represented by the analysis in this work. A greater level of
cooperation with the major ISPs will be needed in the future for
continued measurement of Internet routing instability.

3 Background

The fluctuation of network topology can have a direct impact on
end-to-end performance. A network that has not yet reached
convergence may drop packets, or deliver packets out of order. In
addition, through analysis of our data and ongoing discussions with
router vendors, we have found that a significant number of the core
Internet routers today are based on a route caching architecture
[11]. In this architecture, routers maintain a routing table cache of
destination and next-hop lookups. As long as the router's interface
card finds a cache entry for an incoming packet's destination
address, the packet is switched on a "fast-path" independently of the
router's CPU. Under sustained levels of routing instability, the
cache undergoes frequent updates and the probability of a packet
encountering a cache miss increases. A large number of cache misses
results in increased load on the CPU, increased switching latency and
the loss of packets. A number of researchers are currently studying
the effects of loss and out-of-order delivery on TCP and UDP-based
applications [23]. A number of vendors have developed a new
generation of routers that do not require caching and are able to
maintain the full routing table in memory on the forwarding hardware.
Initial empirical observations suggest these routers do not exhibit
the same pathological loss under heavy routing update load [11].

Internet routers may experience severe CPU load and memory problems
at heavy levels of routing instability. Many of the commonly deployed
Internet routers are based on the older Motorola 68000 series
processor. Under stable network conditions, these low-end processors
are sufficient for most of the router's computational needs since the
bulk of the router's activity happens directly on the forwarding
hardware, leaving the processor to handle the processing of BGP and
interior gateway protocol (IGP) messages. But heavy instability
places larger demands on a router's CPU and may frequently lead to
problems in memory consumption and queuing delay of packet
processing. Frequently, the delays in processing are so severe that
routers delay routing Keep-Alive packets and are subsequently flagged
as down, or unreachable by other routers. We have deterministically
reproduced this effect under laboratory conditions with only moderate
levels of route fluctuation. These experiments are corroborated by
the experience of router vendors and ISP backbone engineers.

Experience with the NSFNet and wide-area backbones has demonstrated
that a router which fails under heavy routing instability can
instigate a "route flap storm." In this mode of pathological
oscillation, overloaded routers are marked as unreachable by BGP
peers as they fail to maintain the required interval of Keep-Alive
transmissions. As routers are marked as unreachable, peer routers
choose alternative paths for destinations previously reachable
through the "down" router and will transmit updates reflecting the
change in topology to each of their peers. In turn, after recovering
from transient CPU problems, the "down" router will attempt to
re-initiate a BGP peering session with each of its peer routers,
generating large state dump transmissions. This increased load will
cause yet more routers to fail and initiate a storm that begins
affecting ever larger sections of the Internet. Several route flap
storms in the past year have caused extended outages for several
million network customers. The latest generation of routers from
several vendors (including Cisco Systems and Ascend Communications)
provide a mechanism in which BGP traffic is given a higher priority
and Keep-Alive messages persist even under heavy instability.

Instability is not unique to the Internet. Rather, instability is
characteristic of any dynamically adaptive routing system. Routing
instability has a number of possible origins, including problems with
leased lines, router failures, high levels of congestion and software
configuration errors. After one or more of these problems affects the
availability of a path to a set of prefix destinations, the routers
topologically closest to the failure will detect the fault, withdraw
the route and make a new local decision on the preferred alternative
route, if any, to the set of destinations. These routers will then
propagate the new topological information to each router within the
autonomous system. The network's border routers will in turn
propagate the updated information to each external peer router,
pending local policy decisions. Routing policies on an autonomous
system's border routers may result in different update information
being transmitted to each external peer.

The ASPATH attribute present in each BGP announcement allows routers
to detect, and prevent, forwarding loops. We define a forwarding loop
as a steady-state cyclic transmission of user data between a set of
peers. As described earlier, upon receipt of an update every BGP
router performs loop verification by testing if its own autonomous
system number already exists in the ASPATH of an incoming update.
Until recently, many backbone engineers believed that the ASPATH
mechanism in BGP was sufficient to ensure
network convergence. A recent study, however, has shown that under
certain unconstrained routing policies, BGP may not converge and will
sustain persistent route oscillations [21].

A number of solutions have been proposed to address the problem of
routing instability, including the deployment of route dampening
algorithms and the increased use of route aggregation [18, 19, 2].
Aggregation, or supernetting, combines a number of smaller IP
prefixes into a single, less specific route announcement. Aggregation
is a powerful tool to combat instability because it can reduce the
overall number of networks visible in the core Internet. Aggregation
also hides, or abstracts, information about individual components of
a service provider's networks at the edges of the backbone.
Aggregation is successful when there is cooperation between service
providers and well-planned network addressing. Unfortunately, the
increasingly competitive Internet is sometimes lacking both.

Compounding the problem, a rapidly increasing number of end-sites are
choosing to obtain redundant connectivity to the Internet via
multiple service providers. This redundant connectivity, or
multi-homing, requires that each core Internet router maintain a more
specific, or longer, prefix in addition to any less specific
aggregate address block prefixes covering the multi-homed site.

Our study shows that more than 25 percent of prefixes are currently
multi-homed and non-aggregatable. Further, we find that the
prevalence of multi-homing exhibits a relatively steep linear rate of
growth. This result is consistent with some of the recent findings of
Govindan and Reddy [6].

Route servers provide an additional tool to help backbone operators
cope with the high levels of Internet routing instability. Each
router at an exchange point normally must exchange routing
information with every other peer router. This requires O(N^2)
bilateral peering sessions, where N is the number of peers. Although
route servers do not help limit the flood of instability information,
they do help offload computationally complex peering from individual
routers onto a centralized route server. This server maintains
peering sessions with each exchange point router and performs routing
table policy computations on behalf of each client peer. The route
server transmits a summary of post-policy routing table changes to
each client peer. Each peer router then needs only to maintain a
single peering session with the route server, reducing the number of
peering sessions to O(N).

A number of vendors have also implemented route dampening [22]
algorithms in their routers. These algorithms "hold-down", or refuse
to believe, updates about routes that exceed certain parameters of
instability, such as exceeding a certain number of updates in an
hour. A router will not process additional updates for a dampened
route until a preset period of time has elapsed. Route dampening
algorithms, however, are not a panacea. Dampening algorithms can
introduce artificial connectivity problems, as "legitimate"
announcements about a new network may be delayed due to earlier
dampened instability. A number of ISPs have implemented a more
draconian version of enforcing stability by either filtering all
route announcements longer than a given prefix length or refusing to
peer with small service providers.

Overall, our research has shown that the Internet continues to
exhibit high levels of routing instability despite the increased
emphasis on aggregation and the aggressive deployment of route
dampening technology. Further, recent studies have shown that the
Internet topology is growing increasingly less hierarchical with the
rapid addition of new exchange points and peering relationships [6].
As the topological complexity grows, the quality of Internet address
aggregation will likely decrease, and the potential for instability
will increase as the number of globally visible routes expands. Since
commercial and mission critical applications are increasingly
migrating towards using the Internet as a communication medium, it is
important to understand and characterize routing instability for
protocol design and system architecture evolution.

The behavior and dynamics of Internet routing stability have gone
virtually without formal study, with the exception of Govindan and
Reddy [6], Paxson [17] and Chinoy [3]. Chinoy measured the
instability of the NSFNet backbone in 1993. Unlike the current
commercial Internet, the now decommissioned NSFNet had a relatively
simple topology and homogeneous routing technology. Chinoy's analysis
did not focus on any of the pathological behaviors or trends we
describe in this paper [3].

Paxson studied routing stability from the standpoint of end-to-end
performance [17]. We approach the analysis from a complementary
direction, by analyzing the internal routing information that will
give rise to end-to-end paths. The analysis of this paper is based on
data collected at Internet routing exchange points. Govindan examined
similar data, but focused primarily on gross topological
characterizations, such as the growth and topological rate of change
of the Internet [6].

4 Analysis of Pathological Routing Information

In this section, we first discuss the expected behavior of a
well-behaved inter-domain routing system. We then describe the
observed behavior of Internet routing, and define a taxonomy for
discussing the different classifications of routing information. We
will demonstrate that much of the behavior of inter-domain routing is
pathological and suggests widespread, systematic problems in portions
of the Internet infrastructure. We distinguish between three classes
of routing information: forwarding instability, policy fluctuation,
and pathological (or redundant) updates. In this section we focus on
the characterization of pathological routing information. In
Section 5, we will discuss long-term trends and temporal behavior of
both forwarding instability and policy fluctuation.

Although the default-free Internet routing tables currently contain
approximately 45,000 prefixes [8], our study has shown that routers
in the Internet core currently exchange between three and six million
routing prefix updates each day. On average, this accounts for 125
updates per network on the Internet every day. More significantly, we
have found that the flow of routing update information tends to be
extremely bursty. At times, core Internet routers receive bursts of
updates at rates exceeding several hundred prefix announcements a
second. Our data shows that on at least one occasion, the total
number of updates exchanged at the Internet core has exceeded 30
million per day. (Footnote 2: Our data collection infrastructure
failed for the day after recording 30 million updates in a six hour
period. The number of updates that day may actually have been much
higher.) This aggregate rate of instability can place a substantial
load on recipient routers as each route may be matched against a
potentially extensive list of policy filters and operators. The
current high level of Internet instability is a significant problem
for all but the most high-end of commercial routers. And even
high-end routers may experience increasing levels
of packet loss, delay, and time to reach convergence as the rate of
instability increases.

In this paper, we analyze sequences of BGP updates for each (prefix,
peer) tuple over the duration of our nine month study. As we describe
later, the majority of BGP updates from a peer for a given prefix
exhibit a high locality of reference, usually occurring within
several minutes of each other. In these sequences of updates for a
given (prefix, peer) tuple, we identify five types of successive
events:

WADiff: A route is explicitly withdrawn as it becomes unreachable and
  it is later replaced with an alternative route to the same
  destination. The alternative route differs in its ASPATH or nexthop
  attribute information. This is a type of forwarding instability.

AADiff: A route is implicitly withdrawn and replaced by an
  alternative route as the original route becomes unreachable, or a
  preferred alternative path becomes available. This is a type of
  forwarding instability.

WADup: A route is explicitly withdrawn and then reannounced as
  reachable. This may reflect transient topological (link or router)
  failure, or it may represent a pathological oscillation. This is
  generated by either forwarding instability or pathological
  behavior.

AADup: A route is implicitly withdrawn and replaced with a duplicate
  of the original route. We define a duplicate route as a subsequent
  route announcement that does not differ in the nexthop or ASPATH
  attribute information. This may reflect pathological behavior as a
  router should only send a BGP update for a change in topology or
  policy. Since our initial study only examined the attributes
  reflective of the inter-domain forwarding path (ASPATH and
  nexthop), this may also reflect policy fluctuation.

WWDup: The repeated transmission of BGP withdrawals for a prefix that
  is currently unreachable. This is pathological behavior.

4.1 Gross Observations

In the remainder of the paper, we will refer to AADiff, WADiff and
WADup as instability. We will refer to WWDup as pathological
instability. AADup may represent either pathological instability or
policy fluctuation. A BGP update may contain additional attributes
(MED, communities, localpref, etc.), but only changes in the (Prefix,
NextHop, ASPATH) tuple will reflect inter-domain topological changes,
or forwarding instability. Successive prefix advertisements with
differences in other attributes may reflect routing policy changes.
For example, a network may announce a route with a new BGP community.
The new community represents a policy change, but may not directly
reflect a change in the inter-domain forwarding path of user data.

In principle, the introduction of classless inter-domain routing
(CIDR) [19] has allowed backbone operators to group a large number of
customer network IP addresses into one or more large "supernet" route
advertisements at their autonomous system's boundaries. A high level
of aggregation will result in a small number of globally visible
prefixes, and a greater stability in prefixes that are announced. In
general, an autonomous system will maintain a path to an aggregate
supernet prefix as long as a path to one or more of the component
prefixes is available. This effectively limits the visibility of
instability stemming from unstable customer circuits or routers to
the scope of a single autonomous system.

Unfortunately, portions of the Internet address space are not
well-aggregated and contain considerably more routes than
theoretically necessary. Although aggregation of a single site, or
campus-level network is relatively straightforward, aggregation at a
larger scale, including across multiple backbone providers, is
considerably more difficult and requires close cooperation between
service providers.

Perhaps the largest factor contributing to poor aggregation is the
increasing trend towards multi-homing of customer end-sites [6].
Since the multi-homed customer prefixes require global visibility, it
is problematic for these addresses to be aggregated into larger
supernets. In addition, the lack of hierarchical allocation of the
early, pre-CIDR IP address space exacerbates the current poor level
of aggregation. Prior to the introduction of RFC-1338, most customer
sites obtained address space directly from the InterNIC instead of
from their provider's CIDR block. Similarly, the technical
difficulties and associated reluctance of customer networks to
renumber IP addresses when selecting a new service provider
contribute to the number of unaggregated addresses.

The suboptimal aggregation of Internet address space has resulted in
a large number of globally visible addresses. More significantly,
many of these globally visible prefixes are reachable via one or more
paths. We would expect Internet instability to be proportional to the
total number of available paths to all of the globally visible
network addresses or aggregates. Analysis of our experimentally
collected BGP data has revealed significantly more BGP updates than
we originally anticipated. The Internet "default-free" routing tables
currently contain approximately 45,000 prefixes with 1,500 unique
ASPATHs interconnecting 1,300 different autonomous systems [8]. As
shown later in this paper, instability is well-distributed over
destination prefixes, peer routers, and origin autonomous system
space. In other words, no single prefix or path dominates the routing
statistics or contributes a disproportionate amount of BGP updates.
Thus, we would expect that instability should be proportional to the
1,500 paths and 45,000 prefixes, or substantially less than the three
to six million updates per day we currently observe.

The majority of these millions of unexpected updates, however, may
not reflect legitimate changes in network topology. Instead, our
study has shown that the majority of inter-domain routing information
consists of pathological updates. Specific examples of these
pathologies include: repeated, duplicate withdrawal announcements
(WWDup), oscillating reachability announcements (WADup), and
duplicate path announcements (AADup). Figure 2 shows the relative
distribution of each class of instability over a seven month period.
For the clarity and simplification of the following discussions, we
have excluded WWDup from Figure 2 so as not to obscure the salient
features of the other data. The breakdown of instability categories
shows that both the AADup and WADup classifications consistently
dominate other categories of routing instability. The relative
magnitude of AADup updates was unexpected. Closer analysis has shown
that the AADup category is dominated by policy changes that do not
directly affect forwarding instability and will be the topic of
future work. Only a small portion of the BGP updates (AADiff, WADiff)
each day may directly reflect possible exogenous network events, such
as router failures and leased line disconnectivity. In Section 6, we
discuss the impact of the pathological updates on Internet
infrastructure. In general, the repeated transmission of these
pathological updates is a suboptimal use of critical Internet
infrastructure resources.

routing protocols operating within an autonomous system.
4.2 Possible Origins of Routing Pathologies
Our analysis indicates that a small p ortion of the extrane-
Network Announce Withdraw Unique
ous, pathological withdrawals may b e attributable to a sp e-
Provider A 1127 23276 4344
Provider B 0 36776 8424
ci c router vendor's implementation decisions. In particular,
Provider C 32 10 12
one Internet router vendor has made a time-space trade-
Provider D 63 171 28
Provider E 1350 1351 8
o implementation decision in their routers not to main-
Provider F 11 86417 12435
Provider G 2 61780 10659
tain state on the information advertised to the router's BGP
Provider H 21197 77931 14030
p eers. Up on receipt of any top ology change, these routers
Provider I 259 2479023 14112
Provider J 2335 1363 853
will transmit announcements or withdrawals to all BGP
p eers regardless of whether they had previously sent the
p eer an announcement for the route. Withdrawals are sent
Table 1: Partial list of up date totals p er ISP on February 1, 1997
for every explicitly and implicitl y withdrawn pre x. We will
at AADS. This data is representative of daily routing up date to-
tals. These totals should not b e interpreted as p erformance of
subsequently refer to this implementation as stateless BGP.
particular backb one provider. Data may b e more re ectiveofa
At each public exchange p oint, this stateless BGP imple-
provider's customers and the relative quality of address aggrega-
mentation may contribute an additional O (N U ) up dates
tion.
for each legitimate change in top ology, where N is the num-
b er of p eer routers and U is the numb er of up dates. It is
imp ortant to note that the stateless BGP implementation is
Analysis of nine months of BGP trac indicates that the
compliant with the current IETF BGP standard [12]. Sev-
ma jority of BGP up dates consist entirely of pathological ,
eral pro ducts from other router vendors do maintain knowl-
duplicate withdrawals (WWDup). Most of these WWDup
edge of the information transmitted to BGP p eers and will
withdrawals are transmitted by routers b elonging to au-
only transmit up dates when top ology changes a ect a route
tonomous systems that never previously announced reach-
between the lo cal and p eer routers. After the initial pre-
ability for the withdrawn pre xes. On average, we observe
sentation of our results [10], the vendor resp onsible for the
between 500,000 to 6 million pathological withdrawals p er
stateless BGP implementation up dated their router op er-
day b eing exchanged at the Mae-East exchange p oint. As
ating software to maintain partial state on BGP advertise-
Table 1 illustrates, many of the exchange p oint routers with-
ments. Several ISPs havenow b egun deploying the up dated
draw an order of magnitude more routes then they announce
software on their backb one routers. Preliminary results af-
during a given day.For example, Table 1 shows that ISP-
ter deployment of this new software indicate that it limits
I announced 259 pre xes, but transmitted over 2.4 mil lion
distribution of WWDup up dates. As we describ e b elow,
withdrawals for just 14,112 di erent pre xes.
although the software up date may b e e ective in masking
The 2.4 million up dates illustrates an imp ortant prop erty
WWDup b ehavior, it do es not explain the origins of the
of inter-domain routing { the disprop ortio nate e ect that a
oscillatin g WWDup b ehavior.
single service provider can have on the global routing mesh.
Overall, our study indicates that the stateless BGP im-
Our analysis of the data shows that all pathologicalrouting
plementation by itself contributes an insigni cant number of
incidents were caused by small service providers. We de ne
additional up dates to the global routing mesh. Sp eci call y,
a pathological routing incident as a time when the aggre-
the stateless BGP implementation do es not account for the
gate level of routing instability seen at an exchange p oint
oscillatin g b ehavior of WWDup, and AADup up dates. In
exceeds the normal level of instabili tyby one or more orders
the case of a single-homed customer and a numb er of state-
of magnitude. Further interaction with these providers has
less p eer routers, every legitimate announce-withdrawal se-
revealed several typ es of problems including miscon gured
quence should result in at most O (N ) up dates at the ex-
routers, and faulty new hardware/software in their infras-
change p oint, where N is the numb er of p eers. Instead,
tructure.
empirical evidence suggests that each legitimate withdrawal
Our data also indicates that not all service providers
may induce some typ e of short-lived pathological network
exhibit this pathological b ehavior. Empirical observations
oscillation . Wehave observed that the p ersistence of these
show that there is a strong causal relationshi p b etween the
up dates is b etween one and ve minutes.
manufacturer of a router used by an ISP and the level of
In general, Internet routing instability remains p o orly
pathological BGP b ehavior exhibited by the ISP.For exam-
understo o d and there is no consensus among the research
ple, in a particular case, we observed that b efore a large
and engineering communities on the characterization or sig-
service provider's transition to a backb one infrastructure
ni cance of many of the b ehaviors we observed. Researchers
based on particular router, the service provider exhibited
and the memb ers of the North American Network Op era-
well-b ehaved routing. Immediately following the transition,
tors Group (NANOG) have suggested a numb er of plausi-
the service provider b egan demonstrating pathological b e-
ble explanation s for the p erio dic b ehavior, includin g: CSU
havior similar to b ehaviors describ ed previously.
timer problems, miscon gured interaction of IGP/BGP pro-
Our analysis of the data also indicates that routing up-
to cols, router vendor software bugs, timer problems, and
dates have a regular, sp eci c p erio dicity. Wehave found
self-synchronizati on.
that most of these up dates demonstrate a p erio dicity of ei-
Most Internet leased lines (T1, T3) use a typ e of broad-
ther 30 or 60 seconds, as discussed b elow. We de ne the
band mo dem referred to as a Channel Service Units (CSU).
persistence of instabili ty and pathologies as the duration of
Miscon gured CSUs mayhave clo cks which derive from dif-
time routing information uctuates b efore it stabilizes. Our
ferent sources. The drift b etween two clo ck sources can
data indicate that the p ersistence of most pathological BGP
cause the line to oscillate b etween p erio ds of normal service
b ehaviors are under ve minutes. This short-lived patho-
and corrupted data. Unlike telephone customers, router in-
logical b ehavior suggests some typ e of delayinconvergence
terface cards are sensitive to milliseco nd loss of line carrier
between inter-domain BGP routers, or multiple IGP/EGP 6 800000 AA Different WA Different WA Duplicate 600000 AA Duplicate Uncatogorized
400000
BGP Announcments 200000
0 April May June July August September
Days (March through September 1996)
Figure 2: Breakdown of Mae-East routing up dates from April through Septemb er 1996.
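The categories plotted in Figure 2 come from comparing each BGP update for a prefix with the previous update from the same peer. The sketch below is one possible way to perform that comparison, not the authors' tool. It assumes that an update is a `(peer, prefix, route)` triple, that `route` is `None` for a withdrawal and a `(next_hop, as_path)` tuple otherwise, and that WADiff, AADiff and WADup carry the meanings their names suggest (withdrawal followed by a different or identical announcement, or an announcement replaced by a different one). The labels `New` and `Withdraw` are hypothetical additions for updates that fall outside the five named categories.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed shapes (not from the paper): a route is (next_hop, as_path);
# an update is (peer, prefix, route), with route=None meaning a withdrawal.
Route = Tuple[str, Tuple[int, ...]]
Update = Tuple[str, str, Optional[Route]]


def classify(updates: Iterable[Update]) -> List[str]:
    """Label each update by comparing it with the prior state of (peer, prefix)."""
    last_route: Dict[Tuple[str, str], Route] = {}  # last announced route per pair
    withdrawn: Dict[Tuple[str, str], bool] = {}    # True while the pair is withdrawn
    labels: List[str] = []
    for peer, prefix, route in updates:
        key = (peer, prefix)
        if route is None:
            # A withdrawal for a prefix that is already unreachable (or was
            # never announced by this peer) is a duplicate withdrawal.
            if withdrawn.get(key, True):
                labels.append("WWDup")
            else:
                labels.append("Withdraw")  # hypothetical label: first, legitimate withdrawal
            withdrawn[key] = True
            continue
        previous = last_route.get(key)
        if previous is None:
            labels.append("New")           # hypothetical label: first announcement seen
        elif withdrawn[key]:
            # Announcement after an explicit withdrawal.
            labels.append("WADup" if route == previous else "WADiff")
        else:
            # Announcement replacing a still-active announcement (implicit withdrawal).
            labels.append("AADup" if route == previous else "AADiff")
        last_route[key] = route
        withdrawn[key] = False
    return labels
```

Only the (Prefix, NextHop, ASPATH) tuple is compared here, so an announcement that differs only in MED or community attributes would be labelled AADup, which matches the text's remark that AADup may reflect policy fluctuation.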
and will flag the link as down. If these CSU problems are widespread, the resulting link oscillation may contribute a significant number of the periodic BGP route withdrawals and announcements we describe.

Another possible explanation involves a popular router vendor's inclusion of an unjittered 30 second interval timer on BGP's update processing. Most BGP implementations use a small, jittered timer to coalesce multiple outbound routing updates into a single BGP update message in order to reduce protocol processing overhead on the receiving peer [11]. The combination of this timer and a stateless BGP implementation may introduce some unintended side-effects. Specifically, we examine the sequence of an announcement for a prefix with ASPATH A1, followed by an announcement (and subsequent implicit withdrawal for A1) for the prefix with ASPATH A2, followed by a re-announcement of the prefix with ASPATH A1. If the sequence A1,A2,A1 occurs within the expiration of the timer interval, the routing software may flag the route as changed and transmit a duplicate route announcement at the end of the interval. A similar sequence of events for the availability of a route, W,A,W, could account for WWDup behavior of some routers. Overall, the 30 second interval timer may be acting as an artificial route dampening mechanism, and as such, the WWDup and AADup behavior may be masking real instability. We will discuss the implication and effects of redundant BGP updates and pathological behavior more in Section 5.

Unjittered timers in a router may also lead to self-synchronization. In [5], Floyd and Jacobson describe a means by which an initially unsynchronized system of apparently independent routers may inadvertently become synchronized. In the Internet, the unjittered BGP interval timer used on a large number of inter-domain border routers may introduce a weak coupling between those routers through the periodic transmission of the BGP updates. Our analysis suggests that these Internet routers will fulfill the requirements of the Periodic Message model [5] and may undergo abrupt synchronization. This synchronization would result in a large number of BGP routers transmitting updates simultaneously. Floyd and Jacobson describe self-synchronization behavior with the Decnet DNA protocol, the Cisco IGRP protocol, and the RIP1 protocol on the NSFNet backbone. The simultaneous transmission of updates has the potential to overwhelm the processing capacity of recipient routers and lead to periodic link or router failures. We have discussed the possibility of self-synchronization with router vendors and are exploring the validity of this conjecture.

Another plausible explanation for the source of the periodic routing instability may be the improper configuration of the interaction between interior gateway protocols and BGP. The injection of routes from IGP protocols, such as OSPF, into BGP, and vice versa, requires a complex, and often mishandled, filtering of prefixes. Since the conversion between protocols is lossy, path information (e.g., ASPATH) is not preserved across protocols and routers will not be able to detect an inter-protocol routing update oscillation. This type of interaction is highly suspect as most IGP protocols utilize internal timers based on some multiple of 30 seconds. We are working closely with router vendors and backbone providers on an ongoing analysis of these interactions.

As described earlier, Varadhan et al. [21] show that unconstrained routing policies can lead to persistent route oscillations. Only the severely restrictive shortest-path route selection algorithm is provably safe. Since the end of the NSFNet, routing policies have been growing in size and complexity. As the number of peering arrangements and the topological complexity of the Internet continue to grow, the potential for developing persistent route oscillation increases. We note, however, that there have been no known reports to date of persistent route oscillation occurring in operational networks. The evaluation and characterization of potentially dangerous unconstrained policies remains an open issue currently being investigated by several research groups.

5 Analysis of Instability

In the previous section we explored characteristics of pathological routing behavior. In this section, we focus on the trends and characteristics of both forwarding instability and route policy fluctuation. The remainder of this discussion presents routing statistics collected at the Mae-East exchange point. It is important to note that these results are representative of other exchange points, including PacBell and Sprint.

5.1 Instability Density

Ignoring attribute changes and pathological traffic (AADup and WWDup), we examined the remaining BGP updates for any overall patterns and trends. Figure 3 represents Internet routing instability for a seven month period. This instability is measured as the sum of AADiff, WADiff, and WADup updates seen during the day for seven months. Each day is represented by a vertical slice of small squares, each of which represents a ten minute aggregate of instability updates. The black squares represent a level of instability above a certain threshold; the light-gray squares a level below; and the white squares represent times for which data is not available. Additionally, the horizontal axis has a raised indentation that represents weekends. The raw data were detrended using a least-squares regression; routing instability increased linearly during the seven month period. Moreover, because we were looking for gross trends, the magnitude of the difference between minimal and maximal instability was reduced by examining the logarithm of this detrended data. Figure 3 represents the modified data. The threshold was chosen as a point above the mean of the modified data, and as such represents a significant level of raw updates that varies depending on the date. The values for the threshold correspond to a raw update rate from 345 updates per 10 minute aggregate in April to 770 updates in October.

[Figure 4 plot: number of instability events (0 to 4000) for each day, Saturday through Friday.]
Figure 4: Representative week of raw forwarding instability updates (August 3 through 9, 1996) aggregated at ten minute intervals.
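The preprocessing described for Figure 3 (a least-squares linear detrend, a logarithm to compress the range, and a threshold above the mean of the modified data) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the detrending is multiplicative, that is, each ten-minute count is divided by the fitted trend before the logarithm is taken; the function names and the `offset` parameter are hypothetical. The input must contain at least two strictly positive counts, and the fitted trend must stay positive.

```python
import math
from typing import List, Sequence


def linear_trend(values: Sequence[float]) -> List[float]:
    """Ordinary least-squares line fitted to values indexed 0..n-1."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_x = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(values))
    slope = sxy / sxx
    return [mean_x + slope * (t - mean_t) for t in range(n)]


def log_detrend(values: Sequence[float]) -> List[float]:
    """Divide each count by the fitted trend, then take the logarithm."""
    trend = linear_trend(values)
    return [math.log(x / t) for x, t in zip(values, trend)]


def above_threshold(modified: Sequence[float], offset: float = 0.0) -> List[bool]:
    """Mark aggregates whose modified value exceeds mean + offset.

    `offset` is a hypothetical knob: the text says only that the threshold
    is "a point above the mean of the modified data".
    """
    mean = sum(modified) / len(modified)
    return [m > mean + offset for m in modified]
```

Because the threshold is applied to the detrended logarithm, a fixed threshold corresponds to a raw update rate that rises with the trend, which is consistent with the 345 to 770 range quoted in the text.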
[Figure 3 plot: time of day (EST, 00:00 to 24:00) versus date, April through October.]
Figure 3: Internet forwarding instability density measured at the Mae-East exchange point during 1996.

Figure 3 shows several interesting phenomena. The bottom of the graph represents midnight EST for each given day. Notice that during the hours of midnight EST (9:00pm PST) to 6:00am EST there are significantly fewer updates than during the rest of the day; the updates appear to be heaviest during North American network usage hours. In particular, from noon to midnight are the densest hours. The second major trend is represented by vertical stripes of less instability (light gray) that correspond to weekends. Perhaps the most striking visual pattern that emerges from the graph is the bold vertical lines at the end of May and beginning of June. These represent the state of the Internet during a major ISP's infrastructure upgrade. Some networks experienced especially high levels of congestion, disconnectivity, and latency during this period. Another interesting pattern is the horizontal line of dense updates at approximately 10:00am (7:00am PST). This line represents large spikes of raw updates that are consistently measured. A plausible explanation for this localized density is that this time may correspond to backbone maintenance windows. Finally, notice that the updates measured during June, July and early August from about 5:00pm to midnight are sparser than those times in May and late August and September. This may represent summer vacation at most of the educational hosts in the Internet, and reflects a pattern closer to the usage of business.

The week of routing updates represented in figure 4 provides a representative display of the general trends over a week. From the data there appears to be a bell-shaped curve of raw updates that peaks during the afternoon. Similarly, there is relatively little instability during the weekend. The exception is Saturday's spike. Saturdays often have high amounts of temporally localized instability. We have no immediate explanation for this occurrence.

A more rigorous approach to identifying temporal trends in the routing updates was undertaken using time series analysis. Specifically, the modified data represented in figure 3 were analyzed using spectrum analysis. The data from August through September were used due to their completeness. Again, these detrended data were ideal for harmonic analysis having been filtered in a manner similar to the treatment of Beveridge's wheat prices by Bloomfield in [1]. The rate of routing updates is modeled as $x_t = T_t I_t$, where $T_t$ is the trend at time $t$ and $I_t$ is an irregular or oscillating term. Since all three terms are strictly positive, we conclude that $\log x_t = \log T_t + \log I_t$. $T_t$ can be assumed to be some value of $x$ near time $t$, and $I_t$ some dimensionless quantity close to 1; hence $\log I_t$ oscillates about 0. This avoids adding frequency biases that can be introduced due to linear filtering.

[Figure 5 plot: power spectrum density (log scale) versus frequency (1/hour units, 0.00 to 0.50); curves: FFT and MEM; marked peaks: 7 days and 24 hours.]
Figure 5: Results from time series analysis of the Internet forwarding instability updates measured at the Mae-East exchange point during August and September 1996 using hourly aggregates.

Figure 5 shows a correlogram of the data generated by two techniques: a traditional fast Fourier transform (FFT) of the autocorrelation function of the data; and maximum-
entropy (MEM) spectral estimation. These two approaches differ in their estimation methods, and provide a mechanism for validation of results. They both find significant frequencies at seven days, and 24 hours. These confirm the visual trends identified in figures 3 and 4.

It is somewhat surprising that the measured routing instability corresponds so closely to the trends seen in Internet bandwidth usage [15] and packet loss. As to the causality of these phenomena, we can only offer suppositions. With a high level of packet loss and a significant rate of BGP updates, keep-alive messages can become delayed long enough to drop BGP connections between peering routers. The specific levels of update load and congestion necessary to sever these connections vary depending on the routing technology in place. Once a BGP connection is severed, all of the peer's routes are withdrawn. An alternate explanation is that this cycle is due to Internet engineering activity that occurs within a business day. However, the data seem to indicate that a significant level of instability remains until late evening, correlating more with Internet usage than engineering maintenance hours. While the relationship between network usage and routing instability may seem intuitively obvious to some, a more rigorous justification is problematic due to the size and heterogeneity of the Internet. We are continuing to investigate this relationship in our current work [8].

5.2 Fine-grained Instability Statistics

Having examined aggregate instability statistics, we now analyze the data at a finer granularity: autonomous system and route contributions. To simplify the following presentation, we focus on a single month of instability, August 1996, measured at the Mae-East exchange point. This month was chosen since it typifies the results seen at the other exchange points across our measurements. Specifically, we show that:

- No single autonomous system consistently dominates the instability statistics.

- There is not a correlation between the size of an AS (measured at the public exchange point as the number of routes which it announces to non-customer and non-transit peers) and its proportion of the instability statistics.

- A small set of paths or prefixes does not dominate the instability statistics; instability is evenly distributed across routes.

The graphs in figure 6 break down the routing updates seen during August measured in each of the route server's peers. Three update categories (AADiff, WADiff, and WADup) are shown where points represent the normalized number of updates announced by a peer on a specific day. That is, there is a point for every peer for every day in August. The horizontal axes show the proportion of the Internet's default-free routing table for which the peer is responsible on a specific day; the vertical axes signify the proportion of that day's route updates that the peer generated. The diagonal represents the break-even points: where a peer generates a proportion of announcements equal to its responsibility for routes in the routing table. If routing updates were equally distributed across all routes, we would expect to see autonomous systems generating them at a rate equal to their share of the routing table. Generally, we do not see that: few days cluster about the line, which indicates that there is not a correlation between the size of an AS, and its share of the update statistics.

[Figure 6 plots: three panels, (a) AS AADiff Contribution, (b) AS WADiff Contribution, (c) AS WADup Contribution; horizontal axes: proportion of routing table (0.0 to 0.5); vertical axes: proportion of announcements (0.0 to 1.0); labelled clusters: A, B, C, D, E, F and Others.]
Figure 6: AS contribution to routing updates measured at the Mae-East exchange point during August 1996. These graphs measure the relative level of routing updates generated by backbone providers. This data does not represent relative performance of ISPs, and may be more reflective of customer instability and address allocation policies.

The Internet routing tables are dominated by six to eight ISPs. These ISPs represent the clusters of points highlighted in figure 6a. Over the course of the month, their share of the default-free routing tables did not change significantly. Over the course of our analysis no single ISP consistently contributes disproportionately to the measured instability in all three categories. The exception, shown in the figures, is ISP-E, which during August was going through an infrastructure transition. While it is not characteristic of ISP-E's behavior for every month, it was characteristic of our analysis that at least one of the major ISPs was going through an infrastructure change at any given point in time. Some autonomous systems always represent a somewhat larger share of instability, but this may be explained by a large number of factors. For example, ISP-A provides connectivity to a large number of international networks; ISP-B is a relatively new ISP that has a much younger customer base and has been able to provide address space from under its own set of aggregated CIDR blocks, perhaps hiding internal instability through better aggregation. Additional factors that can skew ISP behavior include: customer behavior, routing policies, and quality of aggregation.

We now focus on the instability on a per-route basis. Specifically, we look at the instability measured at the Mae-East exchange point during August for (prefix, AS-peer) pairs, or Prefix+AS. A Prefix+AS represents a set of routes that an AS announces for a given destination. It is more specific than a prefix since the same prefix could be reached through several ASes; and more general than a route which uniquely specifies the ASPATH. By aggregating routing updates on Prefix+AS pairs, we can pinpoint several routing update phenomena: updates that oscillate over several routes for a given prefix; AS contribution for a given prefix; and prefix behavior.

Figure 7 shows the cumulative distribution of Prefix+AS instability for the four BGP announcement categories. In all four graphs, the horizontal axes represent the number of Prefix+AS pairs that exhibited a specific number of BGP instability events; the vertical axes show the cumulative proportion of all such events. The graphs contain lines that represent daily cumulative distributions for August 1996. Examining these graphs, one can see that from 80 to 100 percent of the daily instability is contributed by Prefix+AS pairs announced fewer than fifty times. For example, figure 7a shows that depending on the day, from 20 to 90 percent (median of approximately 75%) of the AADiff events are contributed by routes that changed ten times or fewer. Together, these graphs show that no single route consistently dominates the instability measured at the exchange point. However, there are days where a single Prefix+AS pair contributes substantially, such as August 11, a day where several prefix+AS pairs contributed about 40% of the daily aggregate AADiffs, graphically displayed as the lowest curve in figure 7a. Specifically, in this example, ISP-A announced seven routes each between 630 and 650 times. These same seven routes had an equal number of AADups that day and also account for the low curve in figure 7c. Moreover, there are zero withdrawals on these seven prefixes.

When comparing the four types of routing updates in figure 7, one can see that WADiff climbs to a plateau of about 95% faster than the other three categories. WADiff also has the smallest number of Prefix+AS pairs that dominate their days. In fact, there are very few days where a Prefix+AS has more than 100 WADiff events. Similarly, there are very few

[Figure 7 plots: four panels, (a) AADiff, (b) WADiff, (c) AADup, (d) WADup; horizontal axes: Prefix+AS announcements (log scale, 1 to 1000); vertical axes: cumulative proportion (0.0 to 1.0).]
Figure 7: Cumulative distribution of Prefix+AS routing updates measured at the Mae-East exchange point during August 1996. Each line in a graph represents the update distribution for a single day.
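Each line in Figure 7 can be read as the share of one day's events contributed by Prefix+AS pairs that appeared at most k times that day. The sketch below computes that share for a handful of k values. It is an illustration only: the input shape (one `(prefix, peer_as)` tuple per instability event of a single category on a single day), the function name, and the default thresholds are assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def cumulative_event_share(
    events: Iterable[Tuple[Hashable, Hashable]],
    thresholds: Sequence[int] = (1, 10, 50, 100, 1000),
) -> Dict[int, float]:
    """For each k, return the fraction of events from pairs seen at most k times."""
    per_pair = Counter(events)          # events per (prefix, peer_as) pair
    total = sum(per_pair.values())
    if total == 0:
        return {k: 0.0 for k in thresholds}
    return {
        k: sum(count for count in per_pair.values() if count <= k) / total
        for k in thresholds
    }
```

A day on which a few pairs flap hundreds of times, such as the August 11 example in the text, would show up as a low value at small k followed by a late jump.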
days where a Prefix+AS sees more than 200 AADiff events. Taken together, this information is comforting since these categories perhaps best represent actual topological instability. In contrast, the categories that may represent redundant instability information, AADup and WADup, both have a significant number of days where from 5% to 10% of their events come from Prefix+AS pairs that occur 200 times or more. An investigation of instability aggregated on prefix alone generated results similar to those shown in this section, which have been omitted.

5.3 Temporal Properties of Instability Statistics

We next turn our attention to the temporal properties of Internet routing instability. Section 5.1 described the aggregate temporal behavior and identified the weekly and daily frequencies. Here we investigate the frequency distributions for instability events at the Prefix+AS level. Again our analysis looks at the statistics from August 1996 measured at the Mae-East exchange point. For this analysis, we define a routing update's frequency as the inverse of the inter-arrival time between routing updates; a high frequency corresponds to a short inter-arrival time.

We were particularly interested in the high frequency component of routing instability in our analysis. Other work has been able to capture the lower frequencies through both routing table snapshots [6] and end-to-end techniques [17]. Our measurement apparatus allowed a unique opportunity to examine the high frequency components. Our results are shown in figure 8. The graphs in figure 8 represent a histogram distribution for each of the four instability categories. The graphs' horizontal axes mark the histogram bins in a log-time scale that ranges from one second (1s) to one day (24h); the vertical axes show the proportion of updates contained in the histogram bins. The data shown in these graphs take the form of a modified box plot: the black dot represents the median proportion for all the days for each event bin; the vertical line below the dot contains the first quartile of daily proportions for the bin; and the line above the dot represents the fourth quartile.

As illustrated in figure 8, the predominant frequencies in each of the graphs are captured by the thirty second and one minute bins. The fact that these frequencies account for half of the measured statistics was surprising. Normally one would expect an exponential distribution for the inter-arrival time of routing updates as they might reflect exogenous events, such as power outages, fiber cuts and other natural and human events. The thirty second periodicity suggests some wide-spread, systematic influence in the origin, or on the flow of instability information. There are several possible causes for this periodicity including routing software timers, self synchronization, and routing loops. The presence of these frequencies in the more legitimate instability categories, such as WADiff and AADiff, almost certainly represents some pathology which may be caused by CSU handshaking timeouts on leased lines or a flaw in the

[Figure 8 plots: three panels, (a) AADiff, (b) WADiff, (c) WADup; horizontal axes: histogram bins (1s, 5s, 30s, 1m, 5m, 10m, 30m, 1h, 2h, 4h, 8h, 24h); vertical axes: proportion of events.]
Figure 8: Histogram distribution of update inter-arrival time distances for Prefix+AS instability measured at the Mae-East exchange point during August 1996.
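The histograms in Figure 8 bin the time between successive updates for the same Prefix+AS pair. A minimal sketch of that binning follows. The bin labels are the ones shown on the figure's axis; everything else is an assumption made for illustration: timestamps are in seconds, each label is treated as the inclusive upper edge of its bin, and gaps longer than 24 hours are discarded.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Upper bin edges in seconds, matching the labels on the Figure 8 axis.
BIN_EDGES = [1, 5, 30, 60, 300, 600, 1800, 3600, 7200, 14400, 28800, 86400]
BIN_LABELS = ["1s", "5s", "30s", "1m", "5m", "10m", "30m", "1h", "2h", "4h", "8h", "24h"]


def interarrival_histogram(
    updates: Iterable[Tuple[float, Hashable, Hashable]],
) -> Dict[str, float]:
    """updates: (timestamp_seconds, prefix, peer_as) for one category and one day."""
    times: Dict[Tuple[Hashable, Hashable], List[float]] = defaultdict(list)
    for timestamp, prefix, peer_as in updates:
        times[(prefix, peer_as)].append(timestamp)

    counts = [0] * len(BIN_EDGES)
    for stamps in times.values():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            index = bisect_left(BIN_EDGES, later - earlier)  # first edge >= gap
            if index < len(BIN_EDGES):
                counts[index] += 1  # gaps above 24h fall outside the axis and are dropped

    total = sum(counts)
    return {
        label: (count / total if total else 0.0)
        for label, count in zip(BIN_LABELS, counts)
    }
```

Under this binning, updates driven by an unjittered 30 second timer would pile up in the `30s` and `1m` bins, which is the pattern the text reports.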
routing protocols.

6 Impact of Routing Instability and Conclusion

As we described earlier, forwarding instability can have a significant deleterious impact on the Internet infrastructure. Instability that reflects real topological changes can lead to increased packet loss, delay in time for network convergence, and additional memory/CPU overhead on routers. In the current Internet, network operators routinely report backbone outages and other significant network problems directly related to the occurrence of route flaps [14].

Our analysis in this paper demonstrated that the majority (99 percent) of routing information is pathological and may not reflect real network topological changes. We defined a taxonomy for discussing routing information and suggested a number of plausible explanations that may account for some of the anomalous behaviors. Router vendors and ISPs are currently proceeding with the deployment of updated routing software to correct some of the potential problems we described.

Since pathological, or redundant, routing information does not affect a router's forwarding tables or cache, the overall impact of this phenomenon may be relatively benign and may not substantially impact a router's performance.

resources.

Our analysis of the data showed that instability is well distributed across both autonomous systems and prefix space. More succinctly, no single service provider or set of network destinations appears to be at fault. We described a strong correlation between the version and manufacturer of a router used by an ISP and the level of pathological behavior exhibited by that ISP. As noted earlier, router vendors responded to our finding, and developed software updates to limit several pathologies. Updated software is now actively being deployed by backbone operators. Preliminary results indicate that it will be successful in limiting the flow of some pathologies, particularly those involving WWDup updates.

We also showed that instability and redundant information exhibit strong temporal properties. We describe a strong correlation between the level of routing activity and network usage. The magnitude of routing information exhibits the same significant weekly, daily and holiday cycles as network usage and congestion. Although the relation between instability and congestion may seem intuitive, a formal explanation for this relationship is more difficult.

Instability and redundant routing information also exhibit a strong periodicity. Specifically, we described 30 and 60 second periodicity in both instability and redundant BGP information. We offered a number of plausible explanations
for this phenomena, including: self-synchronizati on, miscon-
Most of the pathological up dates will b e quickly discarded
guration of IGP/BGP interactions, router software prob-
by routers and will not undergo p olicy evaluation. More im-
lems, and CSU link oscillation . The origins of this p erio dic
p ortantly, these pathological up dates will not trigger router
phenomena, however, remain an op en question.
cache churn and the resultant cache misses and subsequent
If we ignore the impact of redundant up dates and other
packet loss.
pathological b ehaviors, Figure 9 shows that most (80 p er-
Anumb er of network op erators, however, b elieve that
cent) of Internet routes exhibit a relatively high level of sta-
the the sheer volume of pathological up dates may still b e
bility. Only b etween 3 and 10 p ercent of routes exhibit one
problematic [16]. Even pathological up dates require some
or more WADi p er day, and b etween 5 and 20 p ercent ex-
minimal router resources, including CPU, bu ers and the
hibit one or more AADi eachday. This conforms with
exp ense of marshaling pathological pre x data into b oth in-
empirical observations by most end-users that the Internet
b ound and outb ound packets. Informal exp eriments with
usually seems to work. Our data also agrees with Paxson's
several p opular routers suggest that suciently high rates
ndings that only a very small fraction of routes exhibit
of pathological up dates (e.g. 300 up dates p er second) are
some typ e of top ological instability eachday[17].
enough to crash a widely deployed, high-end mo del of com-
One of our diculties in evaluating the impact of insta-
mercial Internet router. We de ne crash as a state in which
bilityonInternet p erformance is that wehave not yet fully
the router is completely unresp onsive and do es not resp ond
b een able to characterize and understand the signi cance of
to future routing proto col messages, or console interrupts.
the di erent classes of routing information. Figure 9 shows
Other studies have rep orted high CPU consumption and
that b etween 35 and 100 p ercent (50 p ercent median) of
loss of p eering sessions at mo derate rates of routing insta-
pre x+AS tuples are involved in at least one category of
bility. Although our analysis of the impact of redundant
routing up date (p olicy uctuation, forwarding instabili ty,
information on Internet p erformance is still ongoing, webe-
pathological information) eachday. Sp eci call y,we do not
lieve pathological up dates are a sub optimal use of Internet 11 1.0
know what percentage of redundant updates may actually be reflective of "legitimate" changes in forwarding information. As we described earlier, some of our analysis suggests that a portion of the AADup and WWDup behaviors may originate in the interaction between forwarding instability and the 30 second interval timer on some routers. If this is the case, then some portion of pathological behavior may reflect legitimate topological changes.

Figure 9: Proportion of Internet Routes affected by routing updates (1996). Days shown have at least 80 percent of the date's data collected. (Series: Any, AADUP, WADUP, AADIFF, WADIFF. x-axis: April through September. y-axis: Proportion of Routes Experiencing Event.)

By directly measuring the BGP information shared by Internet Service Providers at several major exchange points, this paper identified several important trends and anomalies in inter-domain routing behavior. This work in conjunction with several other research efforts has begun to examine inter-domain routing through experimental measurements. These research efforts help characterize the effect of added topological complexity in the Internet since the end of the NSFNet backbone. Further studies are crucial for gaining insight into routing behavior and network performance so that a rational growth of the Internet can be sustained.

Acknowledgments

We wish to thank Vadim Antonov, Hans-Werner Braun, Randy Bush, Kim Claffy, Paul Ferguson, Ramesh Govindan, Sue Hares, John Hawkinson, Tony Li, Dave O'Leary, Dave Meyer, Yakov Rekhter, Brian Renaud, Dave Thaler, Curtis Villamizar, and David Ward for their comments and helpful insights. We also thank the anonymous referees for their feedback and constructive criticism.

References

[1] P. Bloomfield, "Fourier Analysis of Time Series: An Introduction," John Wiley & Sons, New York, 1976.
[2] H.-W. Braun, P. S. Ford and Y. Rekhter, "CIDR and the Evolution of the Internet," SDSC Report GA-A21364, in Proceedings of INET'93, Republished in ConneXions Sep 1993 (InterOp93 version).
[3] B. Chinoy, "Dynamics of Internet Routing Information," in Proceedings of ACM SIGCOMM '93, pp. 45-52, September 1993.
[4] D. Estrin, Y. Rekhter, and S. Hotz, "A Scalable Inter-domain Routing Architecture," in Proceedings of the ACM SIGCOMM '92, pp. 40-52, Baltimore, MD, August 1992.
[5] S. Floyd and V. Jacobson, "The Synchronization of Periodic Routing Messages," IEEE/ACM Transactions on Networking, V.2 N.2, pp. 122-136, April 1994.
[6] R. Govindan and A. Reddy, "An Analysis of Inter-Domain Topology and Route Stability," in Proceedings of the IEEE INFOCOM '97, Kobe, Japan, April 1997.
[7] J. Honig, D. Katz, M. Mathis, Y. Rekhter, and J. Yu, "Application of the Border Gateway Protocol in the Internet," RFC-1164, June 1990.
[8] Internet Performance Measurement and Analysis project (IPMA), http://www.merit.edu/ipma.
[9] C. Labovitz, "Multithreaded Routing Toolkit," Merit Technical Report, 1996.
[10] C. Labovitz, NANOG presentation, Washington, D.C., May 1996.
[11] Dave O'Leary, Cisco Systems, Inc. Private communication, January 1997.
[12] K. Lougheed and Y. Rekhter, "A Border Gateway Protocol (BGP)," RFC-1163, June 1990.
[13] B. Metcalf, "Predicting the Internet's Catastrophic Collapse and Ghost Sites Galore in 1996," InfoWorld, December 4, 1995.
[14] Merit Joint Technical Staff mail archives, http://www.merit.edu/mjts/msg00078.html.
[15] MFS Communications Mae-East Statistics Page, http://www.mfst.com/MAE/east.stats.html.
[16] North American Network Operators Group, http://www.nanog.org.
[17] V. Paxson, "End-to-End Routing Behavior in the Internet," in Proceedings of the ACM SIGCOMM '96, Stanford, C.A., August 1996.
[18] Y. Rekhter, "Scalable Support for Multi-homed Multi-Provider Connectivity," NANOG, Ann Arbor, MI, October 1996.
[19] Y. Rekhter and C. Topolcic, "Exchanging Routing Information Across Provider Boundaries in the CIDR Environment," RFC-1520, September 1993.
[20] Routing Arbiter web pages, http://www.ra.net.
[21] K. Varadhan, R. Govindan, and D. Estrin, "Persistent Routing Oscillations in Inter-Domain Routing," USC/ISI, Available at the Routing Arbiter project's homepage at USC/ISI.
[22] C. Villamizer, R. Chandra, and R. Govindan, "draft-ietf-idr-route-dampen-00-preview," Internet Engineering Task Force Draft, July 21, 1995.
[23] C. Villamizer, "TCP Response Under Loss Conditions," NANOG Presentation, San Francisco, February 1997.