


Internet Routing Instability

Craig Labovitz, G. Robert Malan, and Farnam Jahanian

University of Michigan
Department of Electrical Engineering and Computer Science
1301 Beal Ave.
Ann Arbor, Michigan 48109-2122
labovit, rmalan, [email protected]

Supported by National Science Foundation Grant NCR-9321060 and a generous gift from the Intel Corporation.

This paper appears in the Proceedings of the ACM SIGCOMM '97.

Copyright © 1997 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM Inc., fax +1 (212) 869-0481, or ([email protected]).

Abstract

This paper examines the network inter-domain routing information exchanged between backbone service providers at the major U.S. public exchange points. Internet routing instability, or the rapid fluctuation of network reachability information, is an important problem currently facing the Internet engineering community. High levels of network instability can lead to packet loss, increased network latency and time to convergence. At the extreme, high levels of routing instability have led to the loss of internal connectivity in wide-area, national networks. In this paper, we describe several unexpected trends in routing instability, and examine a number of anomalies and pathologies observed in the exchange of inter-domain routing information. The analysis in this paper is based on data collected from BGP routing messages generated by border routers at five of the Internet core's public exchange points during a nine month period. We show that the volume of these routing updates is several orders of magnitude more than expected and that the majority of this routing information is redundant, or pathological. Furthermore, our analysis reveals several unexpected trends and ill-behaved systematic properties in Internet routing. We finally posit a number of explanations for these anomalies and evaluate their potential impact on the Internet infrastructure.

1 Introduction

Since the end of the NSFNet backbone in April of 1995, the Internet has seen explosive growth in both size and topological complexity. This growth has placed severe strain on the commercial Internet infrastructure. Regular network performance degradations stemming from bandwidth shortages and a lack of router switching capacity have led the popular press to decry the imminent death of the Internet [13]. Routing instability, informally defined as the rapid change of network reachability and topology information, has a number of origins including configuration errors, transient physical and data link problems, and software bugs. Instability, also referred to as "route flaps", significantly contributes to poor end-to-end network performance and degrades the overall efficiency of the Internet infrastructure. All of these sources of network instability result in a large number of routing updates that are passed to the core Internet exchange point routers. Network instability can spread from router to router and propagate throughout the network. At the extreme, route flaps have led to the transient loss of connectivity for large portions of the Internet. Overall, instability has three primary effects: increased packet loss, delays in the time for network convergence, and additional resource overhead (memory, CPU, etc.) within the Internet infrastructure.

The Internet is composed of a large number of interconnected regional and national backbones. The large public exchange points are often considered the "core" of the Internet, where backbone service providers peer, or exchange traffic and routing information with one another. Backbone service providers participating in the Internet core must maintain a complete map, or default-free routing table, of all globally visible network-layer addresses reachable throughout the Internet.

The Internet is divided into a large number of different regions of administrative control commonly called autonomous systems. These autonomous systems (AS) usually have distinct routing policies and connect to one or more remote autonomous systems at private or public exchange points. Autonomous systems are traditionally composed of network service providers or large organizational units like college campuses and corporate networks. At the boundary of each autonomous system, peer border routers exchange reachability information to destination IP address blocks [2], or prefixes, for both transit networks and networks originating in that routing domain. Most autonomous systems exchange routing information through the Border Gateway Protocol (BGP) [12].

Unlike interior gateway protocols, such as IGRP and OSPF, that periodically flood an intra-domain network with all known routing table entries, BGP is an incremental protocol that sends update information only upon changes in network topology or routing policy. Moreover, BGP uses TCP as its underlying transport mechanism, in contrast to many interior protocols that build their own reliability on top of a datagram service. As a path vector routing protocol, BGP limits the distribution of a router's reachability information to its peer, or neighbor, routers. A path is a sequence of intermediate autonomous systems between source and destination routers that form a directed route for packets to travel. Router configuration files allow the stipulation of routing policies that may specify the filtering of specific routes, or the modification of path attributes sent to neighbor routers. Routers may be configured to make policy decisions based on both the announcement of routes from peers and their accompanying attributes. These attributes, such as the Multi-Exit Discriminator (MED), may serve as hints to help routers choose between alternate paths to a given destination.

Backbone border routers at public exchange points commonly have thirty or more external, or inter-domain, peers, as well as a large number of intra-domain peering sessions with internal backbone routers. After each router makes a new local decision on the best route to a destination, it will send that route, or path information along with accompanying distance metrics and path attributes, to each of its peers. As this reachability information travels through the network, each router along the path appends its unique AS number to a list in the BGP message. This list is the route's ASPATH. An ASPATH in conjunction with a prefix provides a specific handle for a one-way transit route through the network.

Routing information shared between peers in BGP has two forms: announcements and withdrawals. A route announcement indicates a router has either learned of a new network attachment or has made a policy decision to prefer another route to a network destination. Route withdrawals are sent when a router makes a new local decision that a network is no longer reachable. We distinguish between explicit and implicit withdrawals. Explicit withdrawals are those associated with a withdrawal message, whereas an implicit withdrawal occurs when an existing route is replaced by the announcement of a new route to the destination prefix without an intervening withdrawal message. A BGP update may contain multiple route announcements and withdrawals. In an optimal, stable wide-area network, routers should generate routing updates only for relatively infrequent policy changes and the addition of new physical networks.

In this paper, we measured the BGP updates generated by service provider backbone routers at the major U.S. public exchange points. Our experimental instrumentation of these exchange points has provided significant data about the internal routing behavior of the core Internet. This data reflects the stability of inter-domain Internet routing, or changes in topology or policy between autonomous systems. Intra-domain routing instability is not explicitly measured, and is only indirectly observed through BGP information exchanged with a domain's peer. We distinguish between three types of inter-domain routing updates: forwarding instability may reflect legitimate topological changes and affects the paths on which data will be forwarded between autonomous systems; routing policy fluctuation reflects changes in routing policy information that may not affect forwarding paths between autonomous systems; and pathological updates are redundant BGP information that reflects neither routing nor forwarding instability. We define instability as an instance of either forwarding instability or policy fluctuation. Although some of the preliminary results of our study have been reported at recent NANOG, IETF, and IEPG meetings, this paper is the first detailed written report of our findings. The major results of our work include:

• The number of BGP updates exchanged per day in the Internet core is one or more orders of magnitude larger than expected.

• Routing information is dominated by pathological, or redundant, updates, which may not reflect changes in routing policy or topology.

• Instability and redundant updates exhibit a specific periodicity of 30 and 60 seconds.

• Instability and redundant updates show a surprising correlation to network usage and exhibit corresponding daily and weekly cyclic trends.

• Instability is not dominated by a small set of autonomous systems or routes.

• Instability and redundant updates exhibit both strong high and low frequency components. Much of the high frequency instability is pathological.

• Discounting policy fluctuation and pathological behavior, there remains a significant level of Internet forwarding instability.

• This work has led to specific architectural and protocol implementation changes in commercial Internet routers through our collaboration with vendors.

The remainder of this paper is organized as follows: Section 2 describes the infrastructure used to collect the routing stability data analyzed in this paper. Section 3 provides further background on Internet routing and related work. Section 4 describes a number of anomalies and pathologies observed in BGP routing information. It defines a taxonomy for discussing the different categories of BGP update information, and posits a number of plausible explanations for the anomalous routing behavior. Section 5 describes key trends and characteristics of forwarding instability. Finally, the paper concludes with a discussion on the possible impact of different categories of instability on the performance of the Internet infrastructure.

2 Methodology

Our analysis in this paper is based on data collected from the experimental instrumentation of key portions of the Internet infrastructure. Over the course of nine months, we logged BGP routing messages exchanged with the Routing Arbiter project's route servers at five of the major U.S. network exchange points: AADS, Mae-East, Mae-West, PacBell, and Sprint. At these geographically diverse exchange points, network service providers peer by exchanging both traffic and routing information. The largest public exchange, Mae-East, located near Washington D.C., currently hosts over 60 service providers, including ANS, BBN, MCI, Sprint, and UUNet. Figure 1 shows the location of each exchange point, and the number of service providers peering with the route servers at each exchange.

Although the route servers do not forward network traffic, they do peer with the majority (over 90 percent) of the service providers at each exchange point. The route servers provide aggregate route server BGP information to a number of client peers. Unlike the specialized routing hardware used by most service providers, the route servers are Unix-based systems which provide a unique platform for exchange point statistics collection and monitoring.

The Routing Arbiter project has amassed 12 gigabytes of compressed data since January of 1996. In January 1997, the operational phase of the Routing Arbiter project ended.
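To make the withdrawal terminology from the introduction concrete for the kind of update logs described in this section, the following minimal sketch tallies announcements, explicit withdrawals, and implicit withdrawals per day for each (prefix, peer) pair. The `Update` record, its field names, and the sample entries are assumptions made purely for illustration; they are not the Routing Arbiter log format, nor any interface of the MRT toolkit.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Update:
    """Hypothetical log record; the real log format is not given in the text."""
    day: str                           # calendar day of the log entry
    peer: str                          # peer router that sent the update
    prefix: str                        # destination prefix
    aspath: Optional[Tuple[int, ...]]  # None marks an explicit withdrawal


def tally(updates):
    """Count announcements and explicit/implicit withdrawals per day."""
    current = {}                   # (prefix, peer) -> currently announced ASPATH
    counts = defaultdict(Counter)  # day -> event counts
    for u in updates:
        key = (u.prefix, u.peer)
        if u.aspath is None:
            # Explicit withdrawal: a withdrawal message for the prefix.
            counts[u.day]["explicit_withdrawal"] += 1
            current.pop(key, None)
        else:
            counts[u.day]["announcement"] += 1
            if key in current:
                # Implicit withdrawal: an existing route is replaced by a new
                # announcement with no intervening withdrawal message.
                counts[u.day]["implicit_withdrawal"] += 1
            current[key] = u.aspath
    return counts


# Illustrative data only (documentation prefix and AS numbers).
log = [
    Update("1996-06-01", "peerA", "192.0.2.0/24", (64500, 64501)),
    Update("1996-06-01", "peerA", "192.0.2.0/24", (64500, 64502)),
    Update("1996-06-01", "peerA", "192.0.2.0/24", None),
    Update("1996-06-02", "peerA", "192.0.2.0/24", (64500, 64501)),
    Update("1996-06-02", "peerB", "192.0.2.0/24", (64510,)),
]
counts = tally(log)
```

State is keyed on the (prefix, peer) pair, so a route announced by one peer is never treated as replacing a route learned from another.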

memory problems at heavy levels of routing instability. Many

of the commonly deployed Internet routers are based on the

older Motorola 68000 series pro cessor. Under stable network

conditions, these low-end pro cessors are sucient for most

of the router's computational needs since the bulk of the

router's activity happ ens directly on the forwarding hard-

ware, leaving the pro cessor to handle the pro cessing of BGP

and interior gateway proto col (IGP ) messages. But heavy

instabili ty places larger demands on a router's CPU and

may frequently lead to problems in memory consumption

and queuing delay of packet pro cessing. Frequently, the de-

lays in pro cessing are so severe that routers delay routing

Keep-Alive packets and are subsequently agged as down,

Figure 1: Map of ma jor U.S. Internet exchange p oints.

or unreachable by other routers. Wehave deterministicall y

repro duced this e ect under lab oratory conditions with only

mo derate levels of route uctuation. These exp eriments are

Data collection and analysis has continued under the aus-

corrob orated by the exp erience of router vendors and ISP

pices of the Internet Performance Measurement and Analy-

backb one engineers.

sis (IPMA) pro ject [8]. We use several to ols from the Mul-

Exp erience with the NSFNet and wide-area backb ones

tithreaded Routing To olkit (MRT) to olkit [9] to deco de and

has demonstrated that a router which fails under heavy

analyze the BGP packet logs from the route server p eering

routing instabili ty can instigate a \route ap storm." In

sessions. Although we analyze data from all of the ma jor

this mo de of pathological oscillation , overloaded routers are

exchange p oints, we simplify the discussion in much of this

marked as unreachable by BGP p eers as they fail to main-

pap er by concentrating on the logs of the largest exchange,

tain the required interval of Keep-Alive transmissions. As

Mae-East. We analyze the BGP data in an attempt to char-

routers are marked as unreachable, p eer routers cho ose al-

acterize and understand b oth the origins and op erational

ternative paths for destinations previously reachable through

impact of routing instabili ty.For the purp oses of data ver-

the \down" router and will transmit up dates re ecting the

i cation, wehave also analyzed sample BGP backb one logs

1

change in top ology to each of their p eers. In turn, after re-

from a numb er of large service providers .

covering from transient CPU problems, the \down" router

Increasingly, ma jor Internet service providers (ISP) are

will attempt to re-initiate a BGP p eering session with each

utilizing private p eering p oints for the exchange of inter-

of its p eer routers, generating large state dump transmis-

domain trac. However, this role was not signi cant during

sions. This increased load will cause yet more routers to

the data collection p erio d represented by the analysis in this

fail and initiate a storm that b egins a ecting ever larger

work. A greater level of co op eration with the ma jor ISPs

sections of the Internet. Several route ap storms in the

will b e needed in the future for continued measurementof

past year have caused extended outages for several million

Internet routing instabili ty.

network customers. The latest generation of routers from

several vendors (includin g and Ascend Com-

3 Background

munications) provide a mechanism in which BGP trac is

given a higher priority and Keep-Alive messages p ersist even

The uctuation of network top ology can have a direct im-

under heavy instability.

pact on end-to-end p erformance. A network that has not

Instability is not unique to the Internet. Rather, insta-

yet reached convergence may drop packets, or deliver pack-

bilityischaracteristic of any dynamically adaptive routing

ets out of order. In addition, through analysis of our data

system. Routing instabili ty hasanumb er of p ossible ori-

and ongoing discussions with router vendors, wehave found

gins, including problems with leased lines, router failures,

that a signi cantnumb er of the core Internet routers to day

high levels of congestion and software con guration errors.

are based on a route caching architecture [11]. In this archi-

After one or more of these problems a ects the availabil ity

tecture, routers maintain a routing table cache of destina-

of a path to a set of pre x destinations, the routers top ologi-

tion and next- lo okups. As long as the router's interface

cally closest to the failure will detect the fault, withdraw the

card nds a cache entry for an incoming packet's destination

route and make a new lo cal decision on the preferred alter-

addresses, the packet is switched on a \fast-path" indep en-

native route, if any, to the set of destinations. These routers

dently of the router's CPU. Under sustained levels of routing

will then propagate the new top ological information to each

instabili ty, the cache undergo es frequent up dates and the

router within the autonomous system. The network's b or-

probabili ty of a packet encountering a cache miss increases.

der routers will in turn propagate the up dated information

A large numb er of cache misses results in increased load on

to each external p eer router, p ending lo cal p olicy decisions.

the CPU, increased switching latency and the loss of packets.

Routing p olicies on an autonomous system's b order routers

Anumb er of researchers are currently studying the e ects

may result in di erent up date information b eing transmit-

of loss and out-of-order delivery on TCP and UDP-based

ted to each external p eer.

application s [23]. A number of vendors have develop ed a

The ASPATH attribute present in each BGP announce-

new generation of routers that do not require caching and

ment allows routers to detect, and prevent forwarding loops.

are able to maintain the full routing table in memory on the

We de ne a forwarding lo op as a steady-state cyclic trans-

forwarding hardware. Initial empirical observations suggest

mission of user data b etween a set of p eers. As describ ed ear-

these routers do not exhibit the same pathological loss under

lier, up on receipt of an up date every BGP router p erforms

heavy routing up date load [11].

lo op veri cation by testing if its own autonomous system

Internet routers may exp erience severe CPU load and

numb er already exists in the ASPATH of an incoming up-

1

date. Until recently, many backb one engineers b elieved that

Additional data was supplied byVerio, Inc., ANS CO+RE Sys-

tems, and the statewide networking division of Merit Network, Inc.

the ASPATH mechanism in BGP was sucient to ensure 3

network convergence. A recent study,however, has shown ing increasingly less hierarchical with the rapid addition of

that under certain unconstrained routing p olicies, BGP may new exchange p oints and p eering relationship s [6]. As the

not converge and will sustain p ersistent route oscillations top ological complexity grows, the qualityofInternet address

[21]. aggregation will likely decrease, and the p otential for insta-

Anumb er of solutions have b een prop osed to address the bility will increase as the numb er of globally visible routes

problem of routing instabili ty, including the deploymentof expands. Since commercial and mission critical applications

route damp ening algorithms and the increased use of route are increasingly migrating towards using the Internet as a

aggregation [18, 19, 2]. Aggregation,or supernetting, com- communication medium, it is imp ortant to understand and

bines a numb er of smaller IP pre xes into a single, less sp e- characterize routing instability for proto col design and sys-

ci c route announcement. Aggregation is a p owerful to ol to tem architecture evolution.

combat instability b ecause it can reduce the overall num- The b ehavior and dynamics of Internet routing stability

berofnetworks visible in the core Internet. Aggregation have gone virtually without formal study, with the exception

also hides, or abstracts, information ab out individu al com- of Govindan and Reddy [6], Paxson [17] and Chinoy [3].

p onents of a service provider's networks at the edges of the Chinoy measured the instabili ty of the NSFNet backb one

backb one. Aggregation is successful when there is co op er- in 1993. Unlike the current commercial Internet, the now

ation b etween service providers and well-planned network decommissioned NSFNet had a relatively simple top ology

addressing. Unfortunately, the increasingly comp etitive In- and homogeneous routing technology. Chinoy's analysis did

ternet is sometimes lacking b oth. not fo cus on any of the pathological b ehaviors or trends we

Comp ounding the problem, a rapidly increasing number describ e in this pap er [3].

of end-sites are cho osing to obtain redundant connectivity Paxson studied routing stability from the standp ointof

to the Internet via multiple service providers. This redun- end-to-end p erformance [17]. We approach the analysis from

dant connectivity,ormulti-homing, requires that each core a complimentary direction { by analyzing the internal rout-

Internet router maintain a more sp eci c, or longer, pre x in ing information that will give rise to end-to-end paths. The

addition to any less sp eci c aggregate address blo ck pre xes analysis of this pap er is based on data collected at Internet

covering the multi-homed site. routing exchange p oints. Govidian examined similar data,

Our study shows that more than 25 p ercent of pre xes but fo cused primarily on gross top ological characterizations,

are currently multi-homed and non-aggregatable. Further, such as the growth and top ological rate of change of the In-

we nd that the prevalence of multi-homing exhibits a rel- ternet [6].

atively steep linear rate of growth. This result is consistent

with some of the recent ndings of Govindan and Reddy [6].

4 Analysis of Pathological Routing Information

Route servers provide an additional to ol to help back-

b one op erators cop e with the high levels of Internet routing

In this section, we rst discuss the exp ected b ehavior of

instabili ty. Each router at an exchange p oint normally must

awell-b ehaved inter-domain routing system. We then de-

exchange routing information with every other p eer router.

scrib e the observed b ehavior of Internet routing, and de ne a

2

This requires O (N ) bilateral p eering sessions, where N is

taxonomy for discussing the di erent classi cation s of rout-

the numb er of p eers. Although route servers do not help

ing information. We will demonstrate that much of the b e-

limit the o o d of instability information, they do help ooad

havior of inter-domain routing is pathological and suggests

computationall y complex p eering from individu al routers

widespread, systematic problems in p ortions of the Inter-

onto a centralized route server. This server maintains p eer-

net infrastructure. We distinguish b etween three classes of

ing sessions with each exchange p oint router and p erforms

routing information: forwarding instabili ty, p olicy uctua-

routing table p olicy computations on b ehalf of each client

tion, and pathologic (or redundant) up dates. In this section

p eer. The route server transmits a summary of p ost-p olicy

we fo cus on the characterization of pathological routing in-

routing table changes to each client p eer. Each p eer router

formation. In Section 5, we will discuss long-term trends

then needs only to maintain a single p eering session with

and temp oral b ehavior of b oth forwarding instabili ty and

the route server, reducing the numb er of p eering sessions to

p olicy uctuation.

O (N ).

Although the default-free Internet routing tables cur-

Anumberofvendors have also implemented route damp-

rently contain approximately 45,000 pre xes [8], our study

ening [22] algorithms in their routers. These algorithms

has shown that routers in the Internet core currently ex-

\hold-down", or refuse to b elieve, up dates ab out routes that

change b etween three and six million routing pre x up dates

exceed certain parameters of instability, such as exceeding a

eachday.Onaverage, this accounts for 125 up dates per net-

certain numb er of up dates in an hour. A router will not

work on the Internet every day. More signi cantl y,wehave

pro cess additional up dates for a damp ened route until a

found that the ow of routing up date information tends to

preset p erio d of time has exp erienced. Route damp ening

b e extremely bursty.At times, core Internet routers receive

algorithms, however, are not a panacea. Damp ening algo-

bursts of up dates at a rates exceeding several hundred pre-

rithms can intro duce arti cial connectivity problems, as \le-

x announcements a second. Our data shows that on at

gitimate" announcements ab out a new network may b e de-

least one o ccasion, the total numb er of up dates exchanged

2

layed due to earlier damp ened instabili ty.Anumb er of ISPs

at the Internet core has exceeded 30 million p er day . This

have implemented a more draconian version of enforcing sta-

aggregate rate of instabili ty can place a substantial load on

bilityby either ltering all route announcements longer than

recipient routers as each route may b e matched against a p o-

a given pre x length or refusing to p eer with small service

tentially extensive list of p olicy lters and op erators. The

providers.

current high level of Internet instabili ty is a signi cant prob-

Overall, our research has shown that the Internet con-

lem for all but the most high-end of commercial routers.

tinues to exhibit high levels of routing instabili ty despite

And even high-end routers may exp erience increasing levels

the increased emphasis on aggregation and the aggressive

2

Our data collection infrastructure failed for the day after record-

deployment of route damp ening technology. Further, re-

ing 30 million up dates in a six hour p erio d. The numb er of up dates

cent studies have shown that the Internet top ology is grow-

that daymay actually have b een much higher. 4

of packet loss, delay, and time to reach convergence as the circuits or routers to the scop e of a single autonomous sys-

rate of instabili ty increases. tem.

In this pap er, we analyze sequences of BGP up dates for Unfortunately, p ortions of the Internet address space are

each (pre x, p eer) tuple over the duration of our nine month not well-aggregated and contain considerably more routes

study.Aswe describ e later, the ma jority of BGP up dates than theoretically necessary. Although aggregation of a sin-

from a p eer for a given pre x exhibit a high lo cality of refer- gle site, or campus-level network is relatively straightfor-

ence, usually o ccurring within several minutes of each other. ward, aggregation at a larger scale, includin g across multi-

In these sequences of up dates for a given (pre x, p eer) tuple, ple backb one providers, is considerably more dicult and

we identify vetyp es of successiveevents: requires close co op eration b etween service providers.

Perhaps the largest factor contributing to p o or aggrega-

WADi : A route is explicitly withdrawn as it b ecomes un-

tion is the increasing trend towards multi-homing of cus-

reachable and it is later replaced with an alternative

tomer end-sites [6]. Since the multi-homed customer pre-

route to the same destination. The alternative route

xes require global visibil ity, it is problematic for these ad-

di ers in its ASPATH or nexthop attribute informa-

dresses to b e aggregated into larger sup ernets. In addition,

tion. This is a typ e of forwarding instabili ty.

the lack of hierarchical allo cation of the early, pre-CIDR IP

address space exacerbates the current poor level of aggrega-

AADi : A route is implici tly withdrawn and replaced by

tion. Prior to the intro duction of RFC-1338, most customer

an alternative route as the original route b ecomes un-

sites obtained address space directly from the Internic in-

reachable, or a preferred alternative path b ecomes avail-

stead of from their provider's CIDR blo ck. Similarl y, the

able. This is a typ e of forwarding instabili ty.

technical diculties and asso ciated reluctance of customer

networks to renumb er IP addresses when selecting a new

WADup: A route is explicitl y withdrawn and then rean-

service provider contribute to the numb er of unaggregated

nounced as reachable. This may re ect transient top o-

addresses.

logical (link or router) failure, or it may representa

The sub optimal aggregation of Internet address space

pathological oscillation . This is generated by either

has resulted in large numb er of globally visible addresses.

forwarding instability or pathological b ehavior.

More signi cantl y, many of these globally visible pre xes are

reachable via one or more paths. Wewould exp ect Internet

AADup: A route is implicitl y withdrawn and replaced with

instabili ty to b e prop ortional to the total number of avail-

a duplicate of the original route. We define a duplicate route as a subsequent route announcement that does not differ in the nexthop or ASPATH attribute information. This may reflect pathological behavior as a router should only send a BGP update for a change in topology or policy. Since our initial study only examined the attributes reflective of inter-domain forwarding path (ASPATH and nexthop), this may also reflect policy fluctuation.

WWDup: The repeated transmission of BGP withdrawals for a prefix that is currently unreachable. This is pathological behavior.

4.1 Gross Observations

In the remainder of the paper, we will refer to AADiff, WADiff and WADup as instability. We will refer to WWDup as pathological instability. AADup may represent either pathological instability or policy fluctuation. A BGP update may contain additional attributes (MED, communities, localpref, etc.), but only changes in the (Prefix, NextHop, ASPATH) tuple will reflect inter-domain topological changes, or forwarding instability. Successive prefix advertisements with differences in other attributes may reflect routing policy changes. For example, a network may announce a route with a new BGP community. The new community represents a policy change, but may not directly reflect a change in the inter-domain forwarding path of user data.

able paths to all of the globally visible network addresses or aggregates. Analysis of our experimentally collected BGP data has revealed significantly more BGP updates than we originally anticipated. The Internet "default-free" routing tables currently contain approximately 45,000 prefixes with 1,500 unique ASPATHs interconnecting 1,300 different autonomous systems [8]. As shown later in this paper, instability is well-distributed over destination prefixes, peer routers, and origin autonomous system space. In other words, no single prefix or path dominates the routing statistics or contributes a disproportionate amount of BGP updates. Thus, we would expect that instability should be proportional to the 1,500 paths and 45,000 prefixes, or substantially less than the three to six million updates per day we currently observe.

The majority of these millions of unexpected updates, however, may not reflect legitimate changes in network topology. Instead, our study has shown that the majority of inter-domain routing information consists of pathological updates. Specific examples of these pathologies include: repeated, duplicate withdrawal announcements (WWDup), oscillating reachability announcements (WADup), and duplicate path announcements (AADup). Figure 2 shows the relative distribution of each class of instability over a seven month period. For the clarity and simplification of the following discussions, we have excluded WWDup from Figure 2 so as not to obscure the salient features of the other data.
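The taxonomy above can be illustrated with a small classifier keyed on (peer, prefix) that compares only the (NextHop, ASPATH) pair of successive updates. This is a minimal sketch, not the measurement code used in the study. The class and field names are hypothetical, the AADiff, WADiff and WADup rules are inferred from the category names and the definitions given earlier in the paper, and the "Withdraw" and "Uncategorized" labels are assumptions for updates that fall outside the five categories. A withdrawal from a peer with no recorded announcement is treated as WWDup, on the assumption that the prefix is then unreachable through that peer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Forwarding-path attributes compared by the taxonomy: (NextHop, ASPATH).
Path = Tuple[str, Tuple[int, ...]]


@dataclass(frozen=True)
class Update:
    """One BGP update for one prefix; path=None models a withdrawal."""
    peer: str
    prefix: str
    path: Optional[Path] = None


class UpdateClassifier:
    """Labels successive updates per (peer, prefix) pair (hypothetical helper)."""

    def __init__(self) -> None:
        self._reachable: Dict[Tuple[str, str], bool] = {}
        self._last_path: Dict[Tuple[str, str], Path] = {}

    def classify(self, u: Update) -> str:
        key = (u.peer, u.prefix)
        reachable = self._reachable.get(key, False)
        last = self._last_path.get(key)  # survives an explicit withdrawal

        if u.path is None:
            self._reachable[key] = False
            # Withdrawing an already unreachable prefix is the WWDup case.
            return "Withdraw" if reachable else "WWDup"

        self._reachable[key] = True
        self._last_path[key] = u.path
        if last is None:
            return "Uncategorized"  # first announcement seen for this pair
        same = u.path == last
        if reachable:
            # Implicit withdrawal: announcement replaces an active route.
            return "AADup" if same else "AADiff"
        # Announcement following an explicit withdrawal.
        return "WADup" if same else "WADiff"


if __name__ == "__main__":
    p1: Path = ("192.0.2.1", (64500, 64501))
    p2: Path = ("192.0.2.2", (64500, 64502))
    clf = UpdateClassifier()
    for path in (p1, p1, p2, None, None, p2, None, p1):
        print(clf.classify(Update("peer-1", "198.51.100.0/24", path)))
```

Because only the forwarding-path pair is compared, an announcement that changes just a community or MED value is labeled AADup here, which matches the observation that AADup may represent policy fluctuation rather than pathology.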

In principle, the introduction of classless inter-domain routing (CIDR) [19] has allowed backbone operators to group a large number of customer network IP addresses into one or more large "supernet" route advertisements at their autonomous system's boundaries. A high level of aggregation will result in a small number of globally visible prefixes, and a greater stability in prefixes that are announced. In general, an autonomous system will maintain a path to an aggregate supernet prefix as long as a path to one or more of the component prefixes is available. This effectively limits the visibility of instability stemming from unstable customer

The breakdown of instability categories shows that both the AADup and WADup classifications consistently dominate other categories of routing instability. The relative magnitude of AADup updates was unexpected. Closer analysis has shown that the AADup category is dominated by policy changes that do not directly affect forwarding instability and will be the topic of future work. Only a small portion of the BGP updates (AADiff, WADiff) each day may directly reflect possible exogenous network events, such as router failures and leased line disconnectivity. In Section 6, we discuss the impact of the pathological updates on Internet infrastructure. In general, the repeated transmission of these pathological updates is a suboptimal use of critical Internet infrastructure resources.

Network       Announce    Withdraw    Unique
Provider A        1127       23276      4344
Provider B           0       36776      8424
Provider C          32          10        12
Provider D          63         171        28
Provider E        1350        1351         8
Provider F          11       86417     12435
Provider G           2       61780     10659
Provider H       21197       77931     14030
Provider I         259     2479023     14112
Provider J        2335        1363       853

Table 1: Partial list of update totals per ISP on February 1, 1997 at AADS. This data is representative of daily routing update totals. These totals should not be interpreted as performance of a particular backbone provider. Data may be more reflective of a provider's customers and the relative quality of address aggregation.

Analysis of nine months of BGP traffic indicates that the majority of BGP updates consist entirely of pathological, duplicate withdrawals (WWDup). Most of these WWDup withdrawals are transmitted by routers belonging to autonomous systems that never previously announced reachability for the withdrawn prefixes. On average, we observe between 500,000 and 6 million pathological withdrawals per day being exchanged at the Mae-East exchange point. As Table 1 illustrates, many of the exchange point routers withdraw an order of magnitude more routes than they announce during a given day. For example, Table 1 shows that ISP-I announced 259 prefixes, but transmitted over 2.4 million withdrawals for just 14,112 different prefixes.

The 2.4 million updates illustrate an important property of inter-domain routing: the disproportionate effect that a single service provider can have on the global routing mesh. Our analysis of the data shows that all pathological routing incidents were caused by small service providers. We define a pathological routing incident as a time when the aggregate level of routing instability seen at an exchange point exceeds the normal level of instability by one or more orders of magnitude. Further interaction with these providers has revealed several types of problems including misconfigured routers, and faulty new hardware/software in their infrastructure.

Our data also indicates that not all service providers exhibit this pathological behavior. Empirical observations show that there is a strong causal relationship between the manufacturer of a router used by an ISP and the level of pathological BGP behavior exhibited by the ISP. For example, in a particular case, we observed that before a large service provider's transition to a backbone infrastructure based on a particular router, the service provider exhibited well-behaved routing. Immediately following the transition, the service provider began demonstrating pathological behavior similar to behaviors described previously.

Our analysis of the data also indicates that routing updates have a regular, specific periodicity. We have found that most of these updates demonstrate a periodicity of either 30 or 60 seconds, as discussed below. We define the persistence of instability and pathologies as the duration of time routing information fluctuates before it stabilizes. Our data indicate that the persistence of most pathological BGP behaviors is under five minutes. This short-lived pathological behavior suggests some type of delay in convergence between inter-domain BGP routers, or multiple IGP/EGP routing protocols operating within an autonomous system.

[Figure 2 plot. Vertical axis: BGP Announcements (0 to 800,000). Horizontal axis: Days (March through September 1996), labeled April through September. Series: AA Different, WA Different, WA Duplicate, AA Duplicate, Uncategorized.]

Figure 2: Breakdown of Mae-East routing updates from April through September 1996.

4.2 Possible Origins of Routing Pathologies

Our analysis indicates that a small portion of the extraneous, pathological withdrawals may be attributable to a specific router vendor's implementation decisions. In particular, one Internet router vendor has made a time-space trade-off implementation decision in their routers not to maintain state on the information advertised to the router's BGP peers. Upon receipt of any topology change, these routers will transmit announcements or withdrawals to all BGP peers regardless of whether they had previously sent the peer an announcement for the route. Withdrawals are sent for every explicitly and implicitly withdrawn prefix. We will subsequently refer to this implementation as stateless BGP.

At each public exchange point, this stateless BGP implementation may contribute an additional O(N × U) updates for each legitimate change in topology, where N is the number of peer routers and U is the number of updates. It is important to note that the stateless BGP implementation is compliant with the current IETF BGP standard [12]. Several products from other router vendors do maintain knowledge of the information transmitted to BGP peers and will only transmit updates when topology changes affect a route between the local and peer routers. After the initial presentation of our results [10], the vendor responsible for the stateless BGP implementation updated their router operating software to maintain partial state on BGP advertisements. Several ISPs have now begun deploying the updated software on their backbone routers. Preliminary results after deployment of this new software indicate that it limits distribution of WWDup updates. As we describe below, although the software update may be effective in masking WWDup behavior, it does not explain the origins of the oscillating WWDup behavior.

Overall, our study indicates that the stateless BGP implementation by itself contributes an insignificant number of additional updates to the global routing mesh. Specifically, the stateless BGP implementation does not account for the oscillating behavior of WWDup and AADup updates. In the case of a single-homed customer and a number of stateless peer routers, every legitimate announce-withdrawal sequence should result in at most O(N) updates at the exchange point, where N is the number of peers. Instead, empirical evidence suggests that each legitimate withdrawal may induce some type of short-lived pathological network oscillation. We have observed that the persistence of these updates is between one and five minutes.

In general, Internet routing instability remains poorly understood and there is no consensus among the research and engineering communities on the characterization or significance of many of the behaviors we observed. Researchers and the members of the North American Network Operators Group (NANOG) have suggested a number of plausible explanations for the periodic behavior, including: CSU timer problems, misconfigured interaction of IGP/BGP protocols, router vendor software bugs, timer problems, and self-synchronization.

Most Internet leased lines (T1, T3) use a type of broadband modem referred to as a Channel Service Unit (CSU). Misconfigured CSUs may have clocks which derive from different sources. The drift between two clock sources can cause the line to oscillate between periods of normal service and corrupted data. Unlike telephone customers, router interface cards are sensitive to millisecond loss of line carrier and will flag the link as down. If these CSU problems are widespread, the resulting link oscillation may contribute a significant number of the periodic BGP route withdrawals and announcements we describe.

Another plausible explanation for the source of the periodic routing instability may be the improper configuration of the interaction between interior gateway protocols and BGP. The injection of routes from IGP protocols, such as OSPF, into BGP, and vice versa, requires a complex, and often mishandled, filtering of prefixes. Since the conversion between protocols is lossy, path information (e.g., ASPATH) is not preserved across protocols and routers will not be able to detect an inter-protocol routing update oscillation. This type of interaction is highly suspect as most IGP protocols utilize internal timers based on some multiple of 30 seconds. We are working closely with router vendors and backbone providers on an ongoing analysis of these interactions.

As described earlier, Varadhan et al. [21] show that unconstrained routing policies can lead to persistent route oscillations. Only the severely restrictive shortest-path route selection algorithm is provably safe. Since the end of the NSFNet, routing policies have been growing in size and complexity. As the number of peering arrangements and the topological complexity of the Internet continue to grow, the potential for developing persistent route oscillation increases. We note, however, that there have been no known reports to date of persistent route oscillation occurring in operational networks. The evaluation and characterization of potentially dangerous unconstrained policies remains an open issue currently being investigated by several research groups.

Another possible explanation involves a popular router vendor's inclusion of an unjittered 30 second interval timer on BGP's update processing. Most BGP implementations use a small, jittered timer to coalesce multiple outbound routing updates into a single BGP update message in order to reduce protocol processing overhead on the receiving peer [11]. The combination of this timer and a stateless BGP implementation may introduce some unintended side-effects. Specifically, we examine the sequence of an announcement for a prefix with ASPATH A1, followed by an announcement (and subsequent implicit withdrawal for A1) for the prefix with ASPATH A2, followed by a re-announcement of the prefix with ASPATH A1. If the sequence A1, A2, A1 occurs within the expiration of the timer interval, the routing software may flag the route as changed and transmit a duplicate route announcement at the end of the interval. A similar sequence of events for the availability of a route, W, A, W, could account for WWDup behavior of some routers. Overall, the 30 second interval timer may be acting as an artificial route dampening mechanism, and as such, the WWDup and AADup behavior may be masking real instability. We will discuss the implication and effects of redundant BGP updates and pathological behavior more in Section 5.
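The interval-timer scenario (A1, A2, A1 arriving inside one timer interval, and likewise W, A, W) can be sketched as a small simulation. This is an illustrative model under stated assumptions, not any vendor's implementation: the function name `coalesce`, its parameters, and the event tuple layout are hypothetical, the timer is a fixed unjittered interval, and the "stateless" mode is assumed to send the current route for every prefix flagged as changed without comparing it against what the peer was last told.

```python
from typing import Dict, List, Optional, Tuple

# (arrival time in seconds, prefix, ASPATH label or None for a withdrawal)
Event = Tuple[float, str, Optional[str]]
Message = Tuple[float, str, Optional[str]]


def coalesce(events: List[Event],
             interval: float = 30.0,
             keep_state: bool = False,
             initial: Optional[Dict[str, Optional[str]]] = None) -> List[Message]:
    """Return the messages sent to one peer when the interval timer fires.

    initial    -- what the peer was told before the first interval (assumed).
    keep_state -- if True, suppress messages that repeat the last advertisement.
    """
    advertised: Dict[str, Optional[str]] = dict(initial or {})
    current: Dict[str, Optional[str]] = dict(advertised)
    dirty = set()            # prefixes flagged as changed in this interval
    sent: List[Message] = []
    deadline = interval

    def flush(now: float) -> None:
        for prefix in sorted(dirty):
            path = current.get(prefix)
            if keep_state and advertised.get(prefix) == path:
                continue     # peer already holds this information
            sent.append((now, prefix, path))
            advertised[prefix] = path
        dirty.clear()

    for t, prefix, path in sorted(events, key=lambda e: e[0]):
        while t >= deadline:  # timer expired before this event arrived
            flush(deadline)
            deadline += interval
        current[prefix] = path
        dirty.add(prefix)
    flush(deadline)
    return sent


if __name__ == "__main__":
    prefix = "198.51.100.0/24"
    a1_a2_a1 = [(1.0, prefix, "A1"), (5.0, prefix, "A2"), (12.0, prefix, "A1")]
    w_a_w = [(1.0, prefix, None), (5.0, prefix, "A1"), (12.0, prefix, None)]
    print(coalesce(a1_a2_a1, initial={prefix: "A1"}))                   # duplicate announcement
    print(coalesce(a1_a2_a1, keep_state=True, initial={prefix: "A1"}))  # nothing sent
    print(coalesce(w_a_w, initial={prefix: None}))                      # duplicate withdrawal
```

In this model the stateless mode emits exactly one message per flagged prefix per interval, which is the dampening effect described above: the intermediate A2 announcement never reaches the peer, and what does reach it is a duplicate.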

Unjittered timers in a router may also lead to self-synchronization. In [5], Floyd and Jacobson describe a means by which an initially unsynchronized system of apparently independent routers may inadvertently become synchronized. In the Internet, the unjittered BGP interval timer used on a large number of inter-domain border routers may introduce a weak coupling between those routers through the periodic transmission of the BGP updates. Our analysis suggests that these Internet routers will fulfill the requirements of the Periodic Message model [5] and may undergo abrupt synchronization. This synchronization would result in a large number of BGP routers transmitting updates simultaneously. Floyd and Jacobson describe self-synchronization behavior with the Decnet DNA protocol, the Cisco IGRP protocol, and the RIP1 protocol on the NSFNet backbone. The simultaneous transmission of updates has the potential to overwhelm the processing capacity of recipient routers and lead to periodic link or router failures. We have discussed the possibility of self-synchronization with router vendors and are exploring the validity of this conjecture.

5 Analysis of Instability

In the previous section we explored characteristics of pathological routing behavior. In this section, we focus on the trends and characteristics of both forwarding instability and route policy fluctuation. The remainder of this discussion presents routing statistics collected at the Mae-East exchange point. It is important to note that these results are representative of other exchange points, including PacBell and Sprint.

5.1 Instability Density

Ignoring attribute changes and pathological traffic (AADup and WWDup) we examined the remaining BGP updates for any overall patterns and trends. Figure 3 represents Internet routing instability for a seven month period. This instability is measured as the sum of AADiff, WADiff, and WADup updates seen during the day for seven months. Each day is

represented by a vertical slice of small squares, each of which represents a ten minute aggregate of instability updates. The black squares represent a level of instability above a certain threshold; the light-gray squares a level below; and the white squares represent times for which data is not available. Additionally, the horizontal axis has a raised indentation that represents weekends. The raw data were detrended using a least-square regression: routing instability increased linearly during the seven month period. Moreover, because we were looking for gross trends, the magnitude of the difference between minimal and maximal instability was reduced by examining the logarithm of this detrended data. Figure 3 represents the modified data. The threshold was chosen as a point above the mean of the modified data, and as such represents a significant level of raw updates that varies depending on the date. The values for the threshold correspond to a raw update rate from 345 updates per 10 minute aggregate in April to 770 updates in October.

[Figure 3 plot. Vertical axis: Time of Day (EST), 00:00 to 24:00. Horizontal axis: April through October.]

Figure 3: Internet forwarding instability density measured at the Mae-East exchange point during 1996.

Figure 3 shows several interesting phenomena. The bottom of the graph represents midnight EST for each given day. Notice that during the hours of midnight EST (9:00pm PST) to 6:00am EST there are significantly fewer updates than during the rest of the day; the updates appear to be heaviest during North American network usage hours. In particular, from noon to midnight are the densest hours.

The second major trend is represented by vertical stripes of less instability (light gray) that correspond to weekends. Perhaps the most striking visual pattern that emerges from the graph are the bold vertical lines at the end of May and beginning of June. These represent the state of the Internet during a major ISP's infrastructure upgrade. Some networks experienced especially high levels of congestion, disconnectivity, and latency during this period. Another interesting pattern is the horizontal line of dense updates at approximately 10:00am (7:00am PST). This line represents large spikes of raw updates that are consistently measured. A plausible explanation for this localized density is that this time may correspond to backbone maintenance windows.

Finally, notice that the updates measured during June, July and early August from about 5:00pm to midnight are sparser than those times in May and late August and September. This may represent summer vacation at most of the educational hosts in the Internet, and reflects a pattern closer to the usage of business.

The week of routing updates represented in figure 4 provides a representative display of the general trends over a week. From the data there appears to be a bell-shaped curve of raw updates that peaks during the afternoon. Similarly, there is relatively little instability during the weekend. The exception is Saturday's spike. Saturdays often have high amounts of temporally localized instability. We have no immediate explanation for this occurrence.

[Figure 4 plot. Vertical axis: Number of Instability Events (0 to 4000). Horizontal axis: Saturday through Friday.]

Figure 4: Representative week of raw forwarding instability updates (August 3 through 9, 1996) aggregated at ten minute intervals.

A more rigorous approach to identifying temporal trends in the routing updates was undertaken using time series analysis. Specifically, the modified data represented in figure 3 were analyzed using spectrum analysis. The data from August through September were used due to their completeness. Again, these detrended data were ideal for harmonic analysis having been filtered in a manner similar to the treatment of Beveridge's wheat prices by Bloomfield in [1]. The rate of routing updates is modeled as $x_t = T_t I_t$, where $T_t$ is the trend at time $t$ and $I_t$ is an irregular or oscillating term. Since all three terms are strictly positive, we conclude that $\log x_t = \log T_t + \log I_t$. $T_t$ can be assumed as some value of $x$ near time $t$, and $I_t$ some dimensionless quantity close to 1; hence $\log I_t$ oscillates about 0. This avoids adding frequency biases that can be introduced due to linear filtering.

[Figure 5 plot. Vertical axis: Power Spectrum Density (0, 1, 10). Horizontal axis: Frequency (1/hour units), 0.00 to 0.50. Series: FFT, MEM. Marked peaks: 7 Days, 24 Hours.]

Figure 5: Results from time series analysis of the Internet forwarding instability updates measured at the Mae-East exchange point during August and September 1996 using hourly aggregates.

Figure 5 shows a correlogram of the data generated by two techniques: a traditional fast Fourier transform (FFT) of the autocorrelation function of the data; and maximum-

entropy (MEM) spectral estimation. These two approaches differ in their estimation methods, and provide a mechanism for validation of results. They both find significant frequencies at seven days, and 24 hours. These confirm the visual trends identified in figures 3 and 4.

It is somewhat surprising that the measured routing instability corresponds so closely to the trends seen in Internet bandwidth usage [15] and packet loss. As to the causality of these phenomena, we can only offer suppositions. With a high level of packet loss and a significant rate of BGP updates, keep-alive messages can become delayed long enough to drop BGP connections between peering routers. The specific levels of update load and congestion necessary to sever these connections vary depending on the routing technology in place. Once a BGP connection is severed, all of the peer's routes are withdrawn. An alternate explanation is that this cycle is due to Internet engineering activity that occurs within a business day. However, the data seem to indicate that a significant level of instability remains until late evening, correlating more with Internet usage than engineering maintenance hours. While the relationship between network usage and routing instability may seem intuitively obvious to some, a more rigorous justification is problematic due to the size and heterogeneity of the Internet. We are continuing to investigate this relationship in our current work [8].

5.2 Fine-grained Instability Statistics

Having examined aggregate instability statistics, we now analyze the data at a finer granularity: autonomous system and route contributions. To simplify the following presentation, we focus on a single month of instability, August 1996, measured at the Mae-East exchange point. This month was chosen since it typifies the results seen at the other exchange points across our measurements. Specifically, we show that:

• No single autonomous system consistently dominates the instability statistics.

• There is not a correlation between the size of an AS (measured at the public exchange point as the number of routes which it announces to non-customer and non-transit peers) and its proportion of the instability statistics.

• A small set of paths or prefixes do not dominate the instability statistics; instability is evenly distributed across routes.

The graphs in figure 6 break down the routing updates seen during August measured in each of the route server's peers. Three update categories (AADiff, WADiff, and WADup) are shown where points represent the normalized number of updates announced by a peer on a specific day. That is, there is a point for every peer for every day in August. The horizontal axes show the proportion of the Internet's default-free routing table for which the peer is responsible on a specific day; the vertical axes signify the proportion of that day's route updates that the peer generated. The diagonal represents the break-even points: where a peer generates a proportion of announcements equal to its responsibility for routes in the routing table. If routing updates were equally distributed across all routes, we would expect to see autonomous systems generating them at a rate equal to their share of the routing table. Generally, we do not see that: few days cluster about the line which indicates that there is not a correlation between the size of an AS, and its share of the update statistics.

The Internet routing tables are dominated by six to eight ISPs. These ISPs represent the clusters of points highlighted in figure 6a. Over the course of the month, their share of the default-free routing tables did not change significantly. Over the course of our analysis no single ISP consistently contributes disproportionately to the measured instability in all three categories. The exception, shown in the figures, is ISP-E which during August was going through an infrastructure transition. While it is not characteristic of ISP-E's behavior for every month, it was characteristic of our analysis that at least one of the major ISPs was going through an infrastructure change at any given point in time. Some autonomous systems always represent a somewhat larger share of instability, but this may be explained by a large number of factors. For example, ISP-A provides connectivity to a large number of international networks; ISP-B is a relatively new ISP that has a much younger customer base and has been able to provide address space from under its own set of aggregated CIDR blocks, perhaps hiding internal instability through better aggregation. Additional factors that can skew ISP behavior include: customer behavior, routing policies, and quality of aggregation.

[Figure 6 plots, three panels: (a) AS AADiff Contribution, (b) AS WADiff Contribution, (c) AS WADup Contribution. Horizontal axes: Proportion of Routing Table (0.0 to 0.5). Vertical axes: Proportion of Announcements, Proportion of WADIFF Announcements, Proportion of WADUP Announcements (0.0 to 1.0). Labeled clusters: A, B, C, D, E, F, Others.]

Figure 6: AS contribution to routing updates measured at the Mae-East exchange point during August 1996. These graphs measure the relative level of routing updates generated by backbone providers. This data does not represent relative performance of ISPs, and may be more reflective of customer instability and address allocation policies.

We now focus on the instability on a per-route basis. Specifically, we look at the instability measured at the Mae-East exchange point during August for (prefix, AS-peer) pairs, or Prefix+AS. A Prefix+AS represents a set of routes that an AS announces for a given destination. It is more specific than a prefix since the same prefix could be reached through several ASes; and more general than a route which uniquely specifies the ASPATH. By aggregating routing updates on Prefix+AS pairs, we can pinpoint several routing update phenomena: updates that oscillate over several routes for a given prefix; AS contribution for given prefix; and prefix behavior.

Figure 7 shows the cumulative distribution of Prefix+AS instability for the four BGP announcement categories. In all four graphs, the horizontal axes represent the number of Prefix+AS pairs that exhibited a specific number of BGP instability events; the vertical axes show the cumulative proportion of all such events. The graphs contain lines that represent daily cumulative distributions for August 1996. Examining these graphs, one can see that from 80 to 100 percent of the daily instability is contributed by Prefix+AS pairs announced less than fifty times. For example, figure 7a shows that depending on the day, from 20 to 90 percent (median of approximately 75%) of the AADiff events are contributed by routes that changed ten times or less. Together, these graphs show that no single route consistently dominates the instability measured at the exchange point. However, there are days where a single Prefix+AS pair contributes substantially, such as August 11, a day where several Prefix+AS pairs contributed about 40% of the daily aggregate AADiffs, graphically displayed as the lowest curve in figure 7a. Specifically, in this example, ISP-A announced seven routes each between 630 and 650 times. These same seven routes had an equal amount of AADups that day and also account for the low curve in figure 7c. Moreover, there are zero withdrawals on these seven prefixes.

[Figure 7 plots, four panels: (a) AADiff, (b) WADiff, (c) AADup, (d) WADup. Horizontal axes: Prefix+AS Announcements (log scale, 1 to 1000). Vertical axes: Cumulative Proportion (0.0 to 1.0).]

Figure 7: Cumulative distribution of Prefix+AS routing updates measured at the Mae-East exchange point during August 1996. Each line in a graph represents the update distribution for a single day.

When comparing the four types of routing updates in figure 7, one can see that WADiff climbs to a plateau of about 95% faster than the other three categories. WADiff also has the fewest number of Prefix+AS pairs that dominate their days. In fact, there are very few days where a Prefix+AS has more than 100 WADiff events. Similarly, there are very few

to examine the high frequency comp onents. Our results days where a Pre x+AS sees more than 200 AADi events.

are shown in gure 8. The graphs in gure 8 represents Taken together, this information is comforting since these

a histogram distributio n for each of the four instabili ty cat- categories p erhaps b est represent actual top ological insta-

egories. The graphs' horizontal axes mark the histogram bility. In contrast, the categories than may represent re-

bins in a log-time scale that ranges from one second (1s) dundant instability information, AADup and WADup, b oth

to one day(24h); the vertical axes show the prop ortion of have a signi cantnumber of days where from 5% to 10%

up dates contained in the histogram bins. The data shown of their events come from Pre x+AS pairs that o ccur 200

in these graphs take the form of a mo di ed b ox plot: the times or more. An investigation of instability aggregated on

black dot represents the median prop ortion for all the days pre x alone generated results similar to those shown in this

for eachevent bin; the vertical line b elow the dot contains section and have b een omitted.

the rst quartile of daily prop ortions for the bin; and the

line ab ove the dot represents the fourth quartile.

5.3 Temp oral Prop erties of Instability Statistics

As illustrated gure 8, the predominant frequencies in

We next turn our attention to the temp oral prop erties of

each of the graphs are captured by the thirty second and

Internet routing instability. Section 5.1 describ ed the aggre-

one minute bins. The fact that these frequencies account

gate temp oral b ehavior and identi ed the weekly and daily

for half of the measured statistics was surprising. Normally

frequencies. Here weinvestigate the frequency distributions

one would exp ect a exp onential distributio n for the inter-

for instabili tyevents at the Pre x+AS level. Again our anal-

arrival time of routing up dates as they might re ect exoge-

ysis lo oks at the statistics from August 1996 measured at

nous events, suchaspower outages, b er cuts and other

the Mae-East exchange p oint. For this analysis, we de ne a

natural and human events. The thirty second p erio dicity

routing up date's frequency as the inverse of the inter-arrival

suggests some wide-spread, systematic in uence in the ori-

time b etween routing up dates; a high frequency corresp onds

gin, or on the ow of instability information. There are

to a short inter-arrival time.

several p ossible causes for this p erio dicity including rout-

Wewere particularl y interested in the high frequency

ing software timers, self synchronization, and routing lo ops.

comp onent of routing instability in our analysis. Other work

The presence of these frequencies in the more legitimate in-

has b een able to capture the lower frequencies through b oth

stability categories, suchasWADi and AADi almost cer-

routing table snapshots [6] and end-to-end techniques [17].

tainly represents some pathology whichmay b e caused by

Our measurement apparatus allowed a unique opp ortunity

CSU handshaking timeouts on leased lines or a aw in the 10 0.4

[Figure 8, three histogram panels: (a) AADiff, (b) WADiff, (c) WADup. x-axis: Histogram Bins (1s, 5s, 30s, 1m, 5m, 10m, 30m, 1h, 2h, 4h, 8h, 24h); y-axis: Proportion of Events.]

Figure 8: Histogram distribution of update inter-arrival time distances for Prefix+AS instability measured at the Mae-East exchange point during August 1996.
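As a rough illustration of the binning behind Figure 8, the sketch below groups updates by Prefix+AS pair, takes the inter-arrival time between consecutive updates of each pair (the inverse of the update frequency defined above), and reports the proportion of gaps per histogram bin. The tuple layout, the function names, and the bin-edge convention (each label treated as an upper bound, with gaps longer than 24h folded into the last bin) are assumptions made for illustration; this is not the measurement code used in this study. The second function gives the bin proportions one would see if gaps followed an exponential distribution with a given mean, which is the baseline one would normally expect.

```python
import math
from bisect import bisect_left
from collections import defaultdict

# Bin labels as they appear on the x-axis of Figure 8, with assumed
# upper edges in seconds.
BIN_LABELS = ["1s", "5s", "30s", "1m", "5m", "10m",
              "30m", "1h", "2h", "4h", "8h", "24h"]
BIN_EDGES = [1, 5, 30, 60, 300, 600,
             1800, 3600, 7200, 14400, 28800, 86400]


def interarrival_histogram(updates):
    """Return the proportion of inter-arrival gaps falling in each bin.

    `updates` is an iterable of (timestamp_seconds, prefix, asn) tuples
    (a hypothetical record layout). Gaps are measured only between
    consecutive updates that share the same (prefix, asn) pair.
    """
    times_by_pair = defaultdict(list)
    for ts, prefix, asn in updates:
        times_by_pair[(prefix, asn)].append(ts)

    counts = [0] * len(BIN_EDGES)
    total = 0
    for times in times_by_pair.values():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            gap = later - earlier
            # Assumption: a gap belongs to the first bin whose edge is
            # >= gap; anything beyond 24h goes into the last bin.
            idx = min(bisect_left(BIN_EDGES, gap), len(BIN_EDGES) - 1)
            counts[idx] += 1
            total += 1

    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: c / total for label, c in zip(BIN_LABELS, counts)}


def exponential_bin_proportions(mean_gap):
    """Expected per-bin proportions for exponential gaps with this mean."""
    proportions = []
    previous_cdf = 0.0
    for edge in BIN_EDGES:
        cdf = 1.0 - math.exp(-edge / mean_gap)
        proportions.append(cdf - previous_cdf)
        previous_cdf = cdf
    # Fold the tail beyond the last edge into the last bin, as above.
    proportions[-1] += 1.0 - previous_cdf
    return dict(zip(BIN_LABELS, proportions))
```

Comparing the output of `interarrival_histogram` with `exponential_bin_proportions` for the same mean gap would make a concentration of events in the 30s and 1m bins stand out against the smooth exponential baseline.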

routing protocols.

6 Impact of Routing Instability and Conclusion

As we described earlier, forwarding instability can have a significant deleterious impact on the Internet infrastructure. Instability that reflects real topological changes can lead to increased packet loss, delay in time for network convergence, and additional memory/CPU overhead on routers. In the current Internet, network operators routinely report backbone outages and other significant network problems directly related to the occurrence of route flaps [14].

Our analysis in this paper demonstrated that the majority (99 percent) of routing information is pathological and may not reflect real network topological changes. We defined a taxonomy for discussing routing information and suggested a number of plausible explanations that may account for some of the anomalous behaviors. Router vendors and ISPs are currently proceeding with the deployment of updated routing software to correct some of the potential problems we described.

Since pathological, or redundant, routing information does not affect a router's forwarding tables or cache, the overall impact of this phenomenon may be relatively benign and may not substantially impact a router's performance. Most of the pathological updates will be quickly discarded by routers and will not undergo policy evaluation. More importantly, these pathological updates will not trigger router cache churn and the resultant cache misses and subsequent packet loss.

A number of network operators, however, believe that the sheer volume of pathological updates may still be problematic [16]. Even pathological updates require some minimal router resources, including CPU, buffers and the expense of marshaling pathological prefix data into both inbound and outbound packets. Informal experiments with several popular routers suggest that sufficiently high rates of pathological updates (e.g. 300 updates per second) are enough to crash a widely deployed, high-end model of commercial Internet router. We define crash as a state in which the router is completely unresponsive and does not respond to future routing protocol messages or console interrupts. Other studies have reported high CPU consumption and loss of peering sessions at moderate rates of routing instability. Although our analysis of the impact of redundant information on Internet performance is still ongoing, we believe pathological updates are a suboptimal use of Internet resources.

Our analysis of the data showed that instability is well distributed across both autonomous systems and prefix space. More succinctly, no single service provider or set of network destinations appears to be at fault. We described a strong correlation between the version and manufacturer of a router used by an ISP and the level of pathological behavior exhibited by that ISP. As noted earlier, router vendors responded to our finding and developed software updates to limit several pathologies. Updated software is now actively being deployed by backbone operators. Preliminary results indicate that it will be successful in limiting the flow of some pathologies, particularly those involving WWDup updates.

We also showed that instability and redundant information exhibit strong temporal properties. We described a strong correlation between the level of routing activity and network usage. The magnitude of routing information exhibits the same significant weekly, daily and holiday cycles as network usage and congestion. Although the relation between instability and congestion may seem intuitive, a formal explanation for this relationship is more difficult.

Instability and redundant routing information also exhibit a strong periodicity. Specifically, we described 30 and 60 second periodicity in both instability and redundant BGP information. We offered a number of plausible explanations for this phenomenon, including self-synchronization, misconfiguration of IGP/BGP interactions, router software problems, and CSU link oscillation. The origins of this periodic phenomenon, however, remain an open question.

If we ignore the impact of redundant updates and other pathological behaviors, Figure 9 shows that most (80 percent) of Internet routes exhibit a relatively high level of stability. Only between 3 and 10 percent of routes exhibit one or more WADiff per day, and between 5 and 20 percent exhibit one or more AADiff each day. This conforms with empirical observations by most end-users that the Internet usually seems to work. Our data also agrees with Paxson's findings that only a very small fraction of routes exhibit some type of topological instability each day [17].

One of our difficulties in evaluating the impact of instability on Internet performance is that we have not yet fully been able to characterize and understand the significance of the different classes of routing information. Figure 9 shows that between 35 and 100 percent (50 percent median) of prefix+AS tuples are involved in at least one category of routing update (policy fluctuation, forwarding instability, pathological information) each day. Specifically, we do not

know what percentage of redundant updates may actually be reflective of "legitimate" changes in forwarding information. As we described earlier, some of our analysis suggests that a portion of the AADup and WWDup behaviors may originate in the interaction between forwarding instability and the 30 second interval timer on some routers. If this is the case, then some portion of pathological behavior may reflect legitimate topological changes.

[Figure 9, time-series plot. x-axis: April through September; y-axis: Proportion of Routes Experiencing Event (0.0 to 1.0); series: Any, AADUP, WADUP, AADIFF, WADIFF.]

Figure 9: Proportion of Internet Routes affected by routing updates (1996). Days shown have at least 80 percent of the date's data collected.

By directly measuring the BGP information shared by Internet Service Providers at several major exchange points, this paper identified several important trends and anomalies in inter-domain routing behavior. This work, in conjunction with several other research efforts, has begun to examine inter-domain routing through experimental measurements. These research efforts help characterize the effect of added topological complexity in the Internet since the end of the NSFNet backbone. Further studies are crucial for gaining insight into routing behavior and network performance so that a rational growth of the Internet can be sustained.

Acknowledgments

We wish to thank Vadim Antonov, Hans-Werner Braun, Randy Bush, Kim Claffy, Paul Ferguson, Ramesh Govindan, Sue Hares, John Hawkinson, Tony Li, Dave O'Leary, Dave Meyer, Yakov Rekhter, Brian Renaud, Dave Thaler, Curtis Villamizar, and David Ward for their comments and helpful insights. We also thank the anonymous referees for their feedback and constructive criticism.

References

[1] P. Bloomfield, "Fourier Analysis of Time Series: An Introduction," John Wiley & Sons, New York. 1976.

[2] H.-W. Braun, P. S. Ford and Y. Rekhter, "CIDR and the Evolution of the Internet," SDSC Report GA-A21364, in Proceedings of INET'93, Republished in ConneXions Sep 1993 (InterOp93 version).

[3] B. Chinoy, "Dynamics of Internet Routing Information," in Proceedings of ACM SIGCOMM '93, pp. 45-52, September 1993.

[4] D. Estrin, Y. Rekhter, and S. Hotz, "A Scalable Inter-domain Routing Architecture," in Proceedings of the ACM SIGCOMM '92, pp. 40-52, Baltimore, MD, August, 1992.

[5] S. Floyd, and V. Jacobson, "The Synchronization of Periodic Routing Messages," IEEE/ACM Transactions on Networking, V.2 N.2, p. 122-136, April 1994.

[6] R. Govindan and A. Reddy, "An Analysis of Inter-Domain Topology and Route Stability," in Proceedings of the IEEE INFOCOM '97, Kobe, Japan. April 1997.

[7] J. Honig, D. Katz, M. Mathis, Y. Rekhter, and J. Yu, "Application of the Border Gateway Protocol in the Internet," RFC-1164, June 1990.

[8] Internet Performance Measurement and Analysis project (IPMA), http://www.merit.edu/ipma.

[9] C. Labovitz, "Multithreaded Routing Toolkit," Merit Technical Report, 1996.

[10] C. Labovitz, NANOG presentation, Washington, D.C., May 1996.

[11] Dave O'Leary, Cisco Systems, Inc. Private communication, January 1997.

[12] K. Lougheed and Y. Rekhter, "A Border Gateway Protocol (BGP)," RFC-1163, June 1990.

[13] B. Metcalf, "Predicting the Internet's Catastrophic Collapse and Ghost Sites Galore in 1996," InfoWorld, December 4, 1995.

[14] Merit Joint Technical Staff mail archives, http://www.merit.edu/mjts/msg00078.html.

[15] MFS Communications Mae-East Statistics Page, http://www.mfst.com/MAE/east.stats.html.

[16] North American Network Operators Group, http://www.nanog.org.

[17] V. Paxson, "End-to-End Routing Behavior in the Internet," in Proceedings of the ACM SIGCOMM '96, Stanford, C.A., August 1996.

[18] Y. Rekhter, "Scalable Support for Multi-homed Multi-Provider Connectivity," NANOG, Ann Arbor, MI. October 1996.

[19] Y. Rekhter and C. Topolcic, "Exchanging Routing Information Across Provider Boundaries in the CIDR Environment," RFC-1520. September 1993.

[20] Routing Arbiter web pages, http://www.ra.net.

[21] K. Varadhan, R. Govindan, and D. Estrin, "Persistent Routing Oscillations in Inter-Domain Routing," USC/ISI, Available at the Routing Arbiter project's homepage at USC/ISI.

[22] C. Villamizer, R. Chandra, and R. Govindan, "draft-ietf-idr-route-dampen-00-preview", Internet Engineering Task Force Draft, July 21, 1995.

[23] C. Villamizer, "TCP Response Under Loss Conditions", NANOG Presentation, San Francisco, February 1997.