
Characterizing the Internet Hierarchy from Multiple Vantage Points

Lakshminarayanan Subramanian, Sharad Agarwal, Jennifer Rexford, Randy H. Katz

Report No. UCB/CSD-1-1151 August 2001

Computer Science Division (EECS)
University of California
Berkeley, California 94720

This work was supported by DARPA Contract No. N00014-99-C-0322.

Abstract— The delivery of IP traffic through the Internet depends on the complex interactions between thousands of Autonomous Systems (ASes) that exchange routing information using the Border Gateway Protocol (BGP). This paper investigates the topological structure of the Internet in terms of customer-provider and peer-peer relationships between ASes, as manifested in BGP routing policies. We describe a technique for inferring AS relationships by exploiting partial views of the AS graph available from different vantage points. Next we apply the technique to a collection of ten BGP routing tables to infer the relationships between neighboring ASes. Based on these results, we analyze the hierarchical structure of the Internet and propose a five-level classification of ASes. Our analysis differs from previous characterization studies by focusing on the commercial relationships between ASes rather than simply the connectivity between the nodes.

I. INTRODUCTION

Today's Internet is divided into more than 10,000 Autonomous Systems (ASes) that interact to coordinate the delivery of IP traffic. An AS typically falls under the administrative control of a single institution, such as a university, company, or Internet Service Provider (ISP). Neighboring ASes use the Border Gateway Protocol (BGP) [1], [2] to exchange information about how to reach individual blocks of destination IP addresses (or, prefixes). An AS applies local policies to select the best route for each prefix and to decide whether to propagate this route to neighboring ASes, without divulging these policies or the AS's internal topology to others. In practice, BGP policies reflect the commercial relationships between neighboring ASes. AS pairs typically have a customer-provider or peer-peer relationship [3], [4]. A provider provides connectivity to the Internet as a service to its customers, whereas peers provide connectivity between their respective customers.

AS relationships, and the associated routing policies, have a significant impact on how traffic flows through the Internet. An understanding of the structure of the Internet in terms of these relationships facilitates a wide range of important applications. For example, consider a content distribution company that has a choice of placing replicas of a Web site in data centers hosted by different ASes. The company can identify the IP prefixes and ASes responsible for a large portion of the traffic from the site [5]. With an accurate view of the connectivity and relationship between ASes, the company can identify the best locations for its replicas. As another example, consider a new regional ISP that wants to connect to a small number of upstream providers. An accurate view of the AS topology and relationships between ASes can help the ISP determine which ASes would provide the best connectivity to and from the rest of the Internet. The Internet topology alone does not provide enough information to answer these questions. For example, suppose that AS B connects to two providers, AS A and AS C. An AS graph would show connectivity from A to B and from B to C; however, AS B's routing policies would not permit transit traffic between A and C.

In the absence of a global registry, the AS-level structure of the Internet is typically inferred from analysis of routing data. Previous work has focused on constructing a view of the AS graph from traceroute experiments or individual BGP table dumps. Traceroute provides a view of the path from a source to a destination host at the IP level. The traceroute data must be analyzed to infer which interfaces belong to the same router and which routers belong to the same AS [6]. Running experiments between multiple source-destination pairs provides a larger collection of paths over time [6], [7], [8]. Other studies have extracted AS paths directly from BGP routing tables or BGP update messages [9], [10]. The dump from the University of Oregon RouteViews server [11], [12] has been the basis of several studies of basic topological properties, such as the distribution of node degrees [13], [14]. With the exception of the work in [10], these studies have focused on the topological structure without considering the relationship between neighboring ASes. Gao [10] presents a heuristic for inferring the relationships from a collection of AS paths and evaluates the technique on the RouteViews data.

In this paper, we propose a technique for combining data from multiple vantage points in the Internet to construct a more complete view of the topology and the AS relationships. Each vantage point offers a partial view of the Internet topology as viewed from the source node. Due to the presence of complex routing policies, these partial views are not necessarily shortest-path trees and may, in fact, include cycles. We generate a directed AS-level graph from each vantage point and assign a rank to each AS based on its position. Then, each AS is represented by the vector that contains its rank from each of the routing table dumps. Finally, we infer the relationship between two ASes by comparing their vectors. The work we describe in this paper is novel in two ways. First, we analyze AS paths seen from multiple locations to form a more complete view of the graph. Second, rather than simply combining the data from the various vantage points, we propose a methodology for exploiting the uniqueness of each view to infer the relationships between AS pairs.

We evaluate our technique on a collection of ten BGP routing tables and summarize the characteristics of the AS relationships. To validate the inferences, we check for paths that are not consistent with the routing policy assumptions underlying customer-provider and peer-peer relationships. We show that these cases account for a small proportion of the paths and that the most common inconsistencies may stem from misconfiguration or more complex AS relationships. Then, we analyze the resulting AS graph to characterize the hierarchical structure of the Internet. We present a five-level classification of ASes with a top-most layer that consists of a rich set of peer-peer relationships between 20 so-called tier-1 providers. This classification can aid an institution in selecting or reevaluating its connections to other ASes in the Internet.

II. PROBLEM FORMULATION

In this section, we formulate the problem we are trying to solve. We first present a brief overview of AS relationships and their implications on BGP export policies. Then we formally define the Type of Relationship (ToR) problem for finding an assignment of AS relationships that maximizes the number of paths that adhere to the export restrictions.

A. AS Relationships and BGP Export Policies

The relationships between ASes arise from contracts that define the pricing model and the exchange of traffic between domains. ASes typically have a provider-customer or peer-peer relationship [3], [4]. In a provider-customer relationship, the customer is typically a smaller AS that pays a larger AS for access to the rest of the Internet. The provider may, in turn, be a customer of an even larger AS. In a peer-to-peer relationship, the two peers are typically of comparable size and find it mutually advantageous to exchange traffic between their respective customers. These relationships translate directly into policies for exporting route advertisements via BGP sessions with neighboring ASes:

• Exporting to a provider: In exchanging routing information with a provider, an AS can export its routes and routes of its customers, but cannot export routes learned from other providers or peers.

• Exporting to a peer: In exchanging routing information with a peer, an AS can export its routes and the routes of its customers, but cannot export routes learned from other providers or peers.

• Exporting to a customer: An AS can export its routes, routes of its customers, and routes learned from other providers and peers to its customer.

Each BGP session defines a relationship between the two ASes it connects. Although two ASes may have multiple BGP sessions, the relationship between the two ASes should be uniquely defined.

Although ASes typically follow these guidelines, some ASes have more complicated relationships in practice. For example, two ASes operated by the same institution may have a sibling relationship where each AS provides transit service for the other [10]. Other AS pairs may have backup relationships to provide connectivity in the event of a failure [15]. Alternatively, two ASes may peer indirectly through a transit AS [16]. Also, an AS pair may have different relationships for certain blocks of IP addresses; for example, an AS in Europe may be a customer of an AS in the United States for some destinations and a peer for others. Router misconfiguration may also cause violations in the export rules. For example, a customer may mistakenly export advertisements learned from one provider to another. We initially assume that only a small fraction of the AS pairs represent exceptions to the traditional provider-customer and peer-peer relationships. Our inference technique is designed to tolerate occasional exceptions and, in fact, our algorithm can be used to identify AS pairs that have unusual relationships.

B. Type-of-Relationship (ToR) Problem

BGP export policies have a direct influence on the AS paths seen from a particular vantage point in the Internet. If every AS adheres to the customer and provider export rules, then no path would ever traverse a customer-provider edge after traversing a provider-customer or peer-peer edge, and no path would ever traverse more than one peer-peer edge [10], [15]. To formulate these properties in mathematical terms, we denote an edge from a customer to a provider with a $-1$, an edge from one peer to another with a $0$, and an edge from a provider to a customer with a $+1$. Restating a result from [10] in these terms, we have:

Theorem 1: If every AS obeys the customer, peer, and provider export policies, then every advertised path belongs to one of these two types for some $M, N \ge 0$:

1. Type 1: $\underbrace{-1, \ldots, -1}_{N \text{ times}}\ \underbrace{+1, \ldots, +1}_{M \text{ times}}$.

2. Type 2: $\underbrace{-1, \ldots, -1}_{N \text{ times}}\ 0\ \underbrace{+1, \ldots, +1}_{M \text{ times}}$.

The first stage of a Type 1 path contains only customer-provider links (uphill portion) and the second stage contains only provider-customer links (downhill portion). The second type captures all paths which traverse exactly one peering link. The single peering link must appear in between the uphill and the downhill portions of any path.

The type-of-relationship (ToR) problem can be formulated as a graph theory optimization problem for labeling the edges of the graph with a $-1$, $0$, or $+1$ such that the observed paths obey the export policies implied by the relationships. Given a graph $G$ with each edge labeled as $-1$, $0$, or $+1$, a path $p$ is said to be valid if it is either of

Type 1 or Type 2.

ToR Problem: Given an undirected graph $G$ with vertex set $V$ and edge set $E$ and a set of paths $P$, label the edges in $E$ as either $-1$, $0$ or $+1$ to maximize the number of valid paths in $P$.

The graph $G$ represents the entire Internet topology where each node is an AS and each undirected edge represents a relationship between the incident pair of ASes. The set of paths $P$ consists of all paths seen from the various vantage points. We believe that this problem is NP-complete. However, we have not been able to prove that this problem is NP-complete nor are we aware of any theoretical work that provides a polynomial-time solution.

In previous work, Lixin Gao [10] proposes and evaluates a heuristic for inferring AS relationships from a collection of AS paths. For each AS path, the heuristic uses the degree of the nodes to identify the point that marks the boundary between the uphill and downhill portions of the path. The inferences from multiple paths are later combined to infer the relationship between the ASes. In the next section, we propose a heuristic for solving the ToR problem based on the paths seen from each vantage point.

III. INFERRING AS RELATIONSHIPS

This section presents our algorithm for inferring AS relationships. We describe the properties of the partial AS graph observed from a single vantage point. This motivates our algorithm for assigning a rank to each AS for each of the partial views. Finally, we describe how to infer the relationships between ASes based on their ranks in the different views.

A. Partial View of the AS Graph

Routing data from a single vantage point provides a set of paths from a particular source node. These paths can be used to construct a directed graph that includes the edge $(u, v)$ if one or more paths travel directly from AS $u$ to AS $v$. If every AS employed a simple shortest-path routing policy, then this graph would be a shortest-path tree rooted at the source. However, complex routing policies result in more complicated graphs that may include non-minimal paths and even cycles. The graph from a single vantage point reveals a great deal of information about the relationship between ASes. Identifying the boundary point between the uphill and downhill portions of the paths is the key to inferring the AS relationships. The uphill portion of a route appears at the beginning of a path, near the source node, whereas the downhill portion appears in the later portion of the path. As such, a leaf node in the graph is likely to be a customer of its parent node(s). We exploit this property by successively pruning the leaf nodes and assigning ranks to ASes as we prune.

Still, identifying the boundary point between the uphill and downhill portions of a path is tricky. The structure of the partial view of the AS graph depends on the position of the AS in the Internet hierarchy. When viewed from a tier-1 AS that does not have any upstream providers, every path consists of zero or one peer-peer edges followed by a downhill portion. In practice, we expect the provider-customer relationship to be acyclic [16]. That is, if $u$ is a customer of $v$ and $v$ is a customer of $w$, then $w$ is not a customer of $u$. Hence, the partial view from a tier-1 AS would tend to be acyclic. In this case, successive pruning would identify provider-customer relationships. However, in other scenarios, the graph may have cycles. For example, suppose source $X$ has two paths $p_1 = (X, A, B, C)$ and $p_2 = (X, B, A, D)$. The resulting graph has a cycle between nodes $A$ and $B$. As such, it is difficult to infer the relationships between $X$, $A$, and $B$. We exploit this observation in our algorithm by assigning the same rank to all of the ASes in the connected component of the graph. Information from other vantage points is necessary to construct an inference for these ASes.

In practice, the Internet consists of a relatively small number of large Internet Service Providers (ISPs) and a large number of smaller ASes. A small AS must traverse one or more upstream providers to reach most of the many other small ASes. As such, a large portion of the paths in a graph should consist mostly of roughly equal downhill and uphill portions of non-zero lengths. Thus, we expect a large portion of the edges in the graph to fall in a large, acyclic portion consisting of provider-customer edges. The remaining edges should fall into a connected component near the source node. Our heuristic exploits this property by making a loose association of AS rank with the provider-customer relationship and using probabilistic comparisons to resolve incorrect inferences.

B. AS Ranking

Our algorithm assigns a rank to each AS for each vantage point. Let $X$ denote the source AS of a particular view of the AS graph and let $P(X)$ denote the set of AS paths seen from $X$. Each path $p \in P(X)$ consists of a sequence of nodes, starting with $X$. We construct a directed graph $G_X$ that consists of each edge that appears in one or more of the paths in $P(X)$. We let $v(G_X)$ denote the set of all vertices in $G_X$ and let $leaves(G_X) \subseteq v(G_X)$ denote the leaves of the graph. For a given $v' \subseteq v(G)$, $G_{v'}$ is the subgraph of $G$ induced by the vertices in $v'$. Drawing on this notation, we assign a ranking $rank(u)$ to each vertex $u \in v(G_X)$ by applying the reverse pruning algorithm in

Figure 1. At each stage, the algorithm identifies the leaf nodes, assigns them a rank, and removes these nodes (and their incident edges) from the graph. In the end, the remaining nodes (if any) form the connected component of the original graph $G_X$; these nodes are all assigned the same (highest) rank.

    G = G_X;  r = 1;
    while (leaves(G) ≠ ∅) {
        for all u ∈ leaves(G)
            rank(u) = r;
        v' = v(G) − leaves(G);
        r = r + 1;
        G = G_{v'};
    }
    for all u ∈ v(G)
        set rank(u) = r;

Fig. 1. Reverse pruning algorithm on graph $G_X$

The algorithm assigns a rank to each AS. Comparisons between AS rankings play a major role in our inference algorithm in the next subsection. However, many AS pairs do not share an edge in the partial view. In many cases, the two ASes may not have an edge in any of the partial views because they are not connected to each other in the real graph. In this scenario, the rank does not have any particular meaning. In other cases, the two ASes may share a link in one of the other partial views. In this scenario, our algorithm imposes a relative rank for these two ASes even though they may not share an edge from source $X$'s perspective. For example, consider a source $X$ with paths $(X, A, C, D)$ and $(X, B, E, F)$ that do not use the edge $(C, E)$. Our algorithm assigns a rank of 1 to nodes $D$ and $F$, a rank of 2 to nodes $C$ and $E$, and a rank of 3 to nodes $A$ and $B$. Despite the fact that the edge $(C, E)$ does not appear in $G_X$, we may be able to exploit the presence of both nodes in $v(G_X)$ in conjunction with the ranking from other vantage points that do include the edge to draw inferences about the relationship between $C$ and $E$.

C. Inference Rules for the ToR problem

The routing data from each vantage point provides a partial view of the Internet. Given data from $N$ vantage points, we map each AS into an $N$-dimensional vector $c(i) = (r_{i1}, \ldots, r_{iN})$, where $r_{ij}$ is the rank of AS $i$ from vantage point $j$. Let $l(i, j)$ refer to the number of coordinates where $r_{ik} > r_{jk}$ and $e(i, j)$ be the number of coordinates where $r_{ik} = r_{jk}$, for all $k = 1, 2, \ldots, N$.

C.1 Complete Dominance

In a view from source $X$, if AS $i$ has a higher rank than AS $j$, then $i$ appears to be a provider of $j$. In complete dominance, we assert that if the rank of $i$ is more than that of $j$ irrespective of the vantage point, then $i$ is definitely a provider of $j$. A vector $c(i)$ is said to dominate $c(j)$ if $l(i, j) > 0$ and $l(j, i) = 0$. So, in vector terminology, if $c(i)$ dominates $c(j)$, then we can infer that $i$ is the provider of $j$, assuming that the two ASes share an edge.

C.2 Equivalence

Two ASes are said to be equivalent if $e(x, y) > N/2$. This rule states that from more than 50% of the vantage points, two ASes $x$ and $y$ appear in the same level of the hierarchy. Two ASes that appear in the same level in the hierarchy from different vantage points are likely to be peers. This rule is useful in finding peers among tier-1 and tier-2 providers.

C.3 Clustering

Most of the partial views generated from our routing table dumps are directed acyclic graphs. As a result, all ASes in these graphs are removed at some stage of the reverse pruning process. Hence if $(i, j)$ is an edge in the partial view from a source, then $j$ is removed before $i$. This implies that $rank(i) > rank(j)$. Therefore we can infer that if $i$ is a provider of $j$, then with high probability $|rank(i) - rank(j)| \ge 1$ for every AS. Note that if the edge is viewed from $j$ or its customers, then $rank(j) > rank(i)$. Given this constraint, the Euclidean distance between $c(i)$ and $c(j)$ is at least $\sqrt{N}$. However, it is hard to infer from this that if the distance between $c(i)$ and $c(j)$ is more than $\sqrt{N}$ then they have a provider-customer relationship. This is because $i$ and $j$ can be peers and still have a few coordinates where their ranks have a large difference. We observe this for some European ISPs that peer with American ISPs. However, one can infer the opposite of this rule. If the distance between $i$ and $j$ is strictly less than $\sqrt{N}$, then they are more likely to be peers. This rule clusters these $N$-dimension vectors into spheres of $\sqrt{N}$ to identify possible peers.

C.4 Probabilistic Rules

We introduce two probabilistic rules to tolerate uncertainty in export policies and our ranking mechanism. Probabilistic Dominance states that if $l(i, j)/l(j, i) > N_0$ for a high value of $N_0$ then $i$ is a provider of $j$. Typically, in graphs from the vantage point of $j$ or its customers, it is probable that $rank(j) > rank(i)$ even if $i$ is a provider of

$j$. To avoid an incorrect inference, we introduce the rule of probabilistic dominance. Probabilistic Equivalence occurs when $l(i, j)/l(j, i) \le N_1$ for $N_1$ close to 1. We use this rule to infer peering relationships between ASes which are not in the same level in the hierarchy and those cases where the relationship between two ASes is not visible from many partial views. An AS relationship may not be visible from a partial view because ASes may assign a low preference to paths that traverse this edge. Using probabilistic equivalence, we test whether two ASes are peers. We use values of 3 and 2 for $N_0$ and $N_1$ respectively.

C.5 Order of Application

The relationship inferences depend on the order in which we apply these rules. We treat equivalence and dominance as the basic rules for inferring peer-peer and provider-customer relationships. We apply equivalence before checking for dominance. Since we apply dominance after the equivalence rule, if $(i, j)$ is inferred as a provider-customer relationship using the dominance rule, then the rank of $i$ should be more than the rank of $j$ in at least $N/2$ of the dumps. If this is not the case, the rank of $i$ should be equal to that of $j$ in at least $N/2$ dumps thereby classifying the link as a peer-peer using the equivalence rule. Therefore those provider-customer relationships inferred using the dominance and equivalence rules can be treated with a high level of confidence. We apply the clustering condition before applying the probabilistic rules. The dominance, equivalence, and clustering conditions are powerful constraints for determining the type of a relationship. The probabilistic rules are applied to eliminate the AS relationships that cannot be inferred from the more basic conditions.

IV. EXPERIMENTAL RESULTS

This section evaluates our inference techniques on a collection of ten publicly-available BGP routing tables. We classify the relationships between ASes and identify a small number of AS paths that are inconsistent with the relationship assignment. The most common anomalies seem to stem from recent acquisitions and mergers, suggesting that some AS pairs may have a sibling relationship.

A. BGP Routing Table Data

Our inference techniques have been applied to a collection of ten BGP routing tables available from Telnet Looking Glass servers. We automated the process of contacting each server, sending "show ip bgp" to the command-line interface, and storing the resulting table. For each destination prefix the table has one or more routes with a variety of BGP attributes, including the AS path. We extract the best and alternate paths for each prefix and construct a list of all AS paths that appear in the table. For each path, we add the AS number of the router to the beginning of each path and remove duplicate AS numbers that arise from AS prepending. Then we process the paths to construct a partial view of the AS graph. After constructing the partial views, we apply the ranking algorithm and inference rules from Section III to assign a relationship to each AS pair that shares an edge in one or more of the routing tables.

Table I provides a summary of the ten tables we downloaded on April 18, 2001. The "# Edges" column shows the number of unidirectional edges in the AS paths. The "Change" column indicates the change in the number of edges from April 18 to May 1, when we downloaded a new copy of the tables. The entry for AS 3582 corresponds to the University of Oregon RouteViews server, which has 52 peering sessions with 39 different ASes [11]. The RouteViews server has an especially rich view of the AS graph, with over 23,000 edges compared to 11,000–15,000 edges for most of the other routing tables. In total, the AS paths in the ten routing tables have 24,752 unidirectional edges between 24,059 pairs of ASes. More than 25% of the edges appear in all ten routing tables, as shown in Figure 2, which plots a histogram of the percent of edges that appear in $x$ of the 10 routing tables. More than 80% of the edges appear in at least two dumps.

TABLE I
LOOKING GLASS SERVERS

AS #   Name                   # Edges   Change
1      Genuity                13419     +1.8%
1740   CERFnet                14287     n/a
3549   Globalcrossing         13542     +1.0%
3582   University of Oregon   23136     -0.4%
3967   Exodus Comm.           19005     +0.4%
4197   Global Online Japan    13474     +1.0%
5388   Energis Squared        13534     +2.1%
7018   AT&T                   14160     +3.0%
8220   COLT Internet          11282     n/a
8709   Exodus, Europe         15519     +0.7%

We use the partial views from these ten routing tables to generate our inferences of the AS relationships. In Section IV-C, we validate our inferences using the AS paths from another collection of routing tables. We manually downloaded routing data from Ebone (AS #1755), MAE-West (AS #2548), KDDI Japan (AS #2516), and Cable and Wireless (AS #6893) on April 9, 2001. These four tables are available from Web Looking Glass servers that have a slightly different interface than the Telnet servers.

[Figure 2: histogram; x-axis: # of Dumps (1 to 10); y-axis: % of Edges (0 to 30)]

Fig. 2. Percentage of edges that appear in $x$ of the ten tables

The Web interface typically does not permit users to invoke the "show ip bgp" command. Instead, we rely on the "bgp paths" command that produces a list of AS paths, without the destination prefix or an indication of the best path. As with the "show ip bgp" data, we extract the AS paths, add the AS number of the source AS, and remove duplicate ASes. Then, we use the results of our inference algorithm to assign a relationship to each unidirectional edge in each path and look for paths that violate the two patterns identified in Section II.

B. Relationship Inferences

Table II summarizes the results from applying our inference algorithm to the ten BGP tables from Table I. Our algorithm produces an inference for over 99.5% of the edges in our AS graph (23,953 of the 24,059 AS pairs). The vast majority of AS pairs appear to have a provider-customer relationship. Approximately 5% of the AS pairs have a peer-peer relationship. Table III highlights the role of the various inference rules in drawing conclusions about AS relationships. A large percentage of the provider-customer relationships are inferred from the complete dominance rule. Complete dominance in $N$ dimensions is a good indication of a provider-customer relationship and we can be reasonably certain of 97.93% of our provider-customer inferences. Similarly, close to 80% of the peering links are inferred from the equivalence and clustering conditions. The probabilistic rules account for 2.1% of the provider-customer inferences and 22.4% of the peer-peer inferences.

TABLE II
INFERRED RELATIONSHIPS FOR 24,059 AS PAIRS

Relationship        # AS pairs   Percentage
Provider-customer   22,712       94.40%
Peer-peer           1,241        5.16%
Unknown             106          0.44%

TABLE III
DISTRIBUTION OF THE 23,953 INFERENCES

Rule                        Number   Percentage
Complete dominance          22,241   97.93%
Probabilistic dominance     471      2.07%
Equivalence                 836      67.37%
Probabilistic equivalence   278      22.40%
Clustering                  127      10.23%

The percentages of provider-customer and peer-peer relationships in Table II are consistent with the conclusions of Lixin Gao in [10]. Our inference that 5.2% of the AS pairs (1,241 pairs) are peers is close to Gao's values between 5.3% and 7.8%. The percentage of provider-customer relationships we infer is within 1–1.5% of the figure reported in [10]. Her study drew on RouteViews data from September 1999, January 2000, and March 2000. The number of edges in the RouteViews dump has grown by over 70% over the last 13 months. With the larger RouteViews table and the nine other tables, our collection of edges is twice as large as the graph used in her earlier study. Using traceroutes from 16 sources to 400,000 destinations [8] in October 2000, CAIDA constructed an AS graph that is slightly larger than ours. Their final graph consists of 7,563 ASes and 25,005 edges. Ours contains 10,698 ASes and 24,752 edges. However, they do not explore this graph in terms of AS relationships.

C. Validation of Inferences

Since the peering and customer information of an ISP are proprietary information, we cannot validate our inferences against an official list of AS relationships. Instead, we determine what percentage of the AS paths actually adhere to the export rules suggested by our inferences. There are two scenarios where we may label an AS path as an anomaly: one in which some AS in the path actually violates the export rules, and the other in which our relationship inference of one of the edges in the path is wrong. The percentage error that is reported in this section is the sum total of these two scenarios. For our validation, we draw on the list of AS paths from two of the ten Telnet Looking Glass Servers (AS numbers 1 and 7018) used to construct our original inferences, as well as the four Web Looking Glass Servers (AS numbers 1755, 2516, 2548, and 6893).

If every AS pair has a customer-provider or peer-peer relationship, then every AS path should have one of the two patterns identified in Theorem 1. A path is an anomaly if it has any two adjacent edges having one of the following patterns:

1. $(+1\ -1)$: An AS permits transit traffic between two of its providers.
2. $(+1\ 0)$: An AS permits transit traffic from one of its

peers to one of its providers.
3. $(0\ -1)$: An AS permits transit traffic from one of its providers to one of its peers.
4. $(0\ 0)$: An AS permits transit traffic between two of its peers.

Case 1 represents a serious violation of the export rules. This anomaly may arise from a misconfigured customer or due to a misclassified relationship where the customer AS is actually a sibling of one or both of the providers. Backup and sibling relationships can cause case 2 and case 3 anomalies. Case 4 suggests that the path traverses two consecutive peering links, which may be permissible if the two peers have a sibling relationship for some destination prefixes. Detecting these anomaly paths provides a way to identify AS pairs that may have more complicated relationships.

As shown in Table IV, the vast majority of paths are consistent with the relationship assignments and the associated export policies. The percentage of anomalies varies between 2–3.4% for five of the six routing tables. These results validate our base assumption that the export rules are observed by a large percentage of the ASes. However, KDDI (AS #2516) has a relatively high percentage of anomalous paths (8.5%). For every anomalous path, we can identify an anomaly pattern consisting of three adjacent ASes $(A, B, C)$ where the pair of edges $(A, B)$ and $(B, C)$ falls into one of four cases. The results in Table IV show that case 3 anomalies are very uncommon and case 1 arises less frequently than case 2 and case 4. KDDnet exhibits an unusually high number of all four cases (especially case 4); further investigation is necessary to explain

patterns (1239 1740 7018) and (3561 5400 5727) seem to have similar explanations; CERFnet (1740) was acquired by AT&T (7018), and AS 5400 and AS 5727 are both part of the Concert IP backbone. For the anomaly pattern (1239 8043 6395), further investigation showed that IXC Communications has acquired SmartNAP (8043) and IXC was later renamed as Broadwing (6395) [20]. Similar anecdotes apply to many of the other popular anomaly patterns.

Identifying anomaly patterns may be a useful way to detect sibling relationships. In the absence of misconfigurations, we can label all case 1 anomalies as caused by sibling relationships. That is, if the AS path $(A, B, C)$ is a case 1 anomaly, either $A$ and $B$ or $B$ and $C$ are siblings. We do not extend this to case 2 or case 3 anomaly patterns since these anomaly patterns may represent backup relationships or other complex transit agreements. Ignoring the KDDnet dump, we observed 109 unique case 1 anomaly patterns. In these 109 patterns, we found 190 unique AS pairs with possible sibling relationships; 22 of these possible sibling relationships appeared in multiple paths. As an example, AS 2685 (AT&T Global Network Services) appears in the middle as a customer in many case 1 anomalies. This AS may have a sibling relationship with AS 7018 (AT&T). Our sibling inferences account for roughly 0.8% of the edges in the AS graph. The work by Gao [10] identifies 1.5% of the edges to be siblings; her validation of a subset of these inferences on a private data set found that 20% of these inferences were valid. We plan to explore our approach for detecting sibling relationships in more detail as part of future work.

V. I NTERNET HIERARCHY

A; B ; C µ this fact. A small number of AS triples ´ are re- sponsible for the vast majority of the anomalies. For most The term tiers has been used informally in discussions of the routing tables, ten different AS triples were respon- about the hierarchy of ASes in the Internet topology. How- sible for more than 9¼± of the anomalous paths. ever, precise rules for classifying ASes into tiers have not been resolved. The work in [9] uses node degree to group D. Common Anomaly Patterns ASes into different classes. ASes with a large number of The last column in Table IV lists one popular triple for neighbors are placed above ASes with a small node de- each routing table dump. For example, the anomaly (1 gree. However, a simple degree-based classification may 65112 6461) includes a private AS number (65112) [17] not capture the essence of tiers in the hierarchy. In this

that should not appear in an AS path between two public section, we infer a hierarchy that symbolizes the business 646½ ASes (½ and ). This anomaly pattern alone accounted relationships between ASes. Typically, a customer should for more than half of the anomalous paths in this dump. be at a lower level in the hierarchy than its providers. An We analyzed the other anomaly patterns using the RADB essential component to such a characterization is a knowl- whois data [18] which identifies ASes by name and some- edge of the relationships between ASes. We now briefly times includes a list of import and export policies. Con- describe the rationale behind our approach for inferring the sider the anomaly pattern (7018 6841 3300) with AT&T different levels in the AS hierarchy. (7018), Infonet Europe (6841), and AUCS (3300). The In order to capture these hierarchical properties, we rep- RADB data states that AS 6841 exports and import all ad- resent the AS topology as a directed graph where the di- vertisements from AS 3300. We confirmed that Infonet rection of an edge indicates the type of relationship be- and AUCS have recently merged [19]. The anomaly pat- tween the two ASes. To the best of our knowledge, previ- 8

TABLE IV
QUANTIFICATION & DISTRIBUTION OF PATH ANOMALIES

AS #   AS Name   # of Paths   Anomaly Paths   Anomaly %   Unique Anomalies   Case 1   Case 2   Case 3   Case 4   Popular Anomaly
1      Genuity   65,383       1,679           2.57%       115                23       89       3        159      (1 65112 6461)
7018   AT&T      141,283      3,357           2.37%       101                16       85       0        181      (7018 6841 3300)
6893   CW        70,253       2,384           3.39%       148                24       115      9        210      (3561 5400 5727)
2548   MaeWest   115,199      2,298           2.00%       233                57       171      5        256      (1239 8043 6395)
1755   Ebone     23,469       703             3.00%       131                21       103      7        212      (3300 8933 2200)
2516   KDDI      126,414      10,709          8.47%       594                248      306      40       1101     (209 1800 1239)
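The four anomaly cases tallied in Table IV can be checked mechanically once every AS pair carries a relationship label. The following is a minimal sketch in Python, not the tooling used for the measurements. It assumes that edge labels are read along the direction of the AS path (+1 for provider to customer, -1 for customer to provider, 0 for peer to peer), which is the encoding implied by the four patterns of Section IV-C. The names `edge_label`, `find_anomalies`, and `sibling_candidates`, and the toy AS numbers, are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumed edge encoding, read along the direction of the AS path:
#   +1 = provider -> customer, -1 = customer -> provider, 0 = peer -> peer.
Relationships = Dict[Tuple[int, int], int]
Triple = Tuple[int, int, int]

# The four anomalous pairs of adjacent edge labels and their case numbers.
ANOMALY_CASES = {(+1, -1): 1, (+1, 0): 2, (0, -1): 3, (0, 0): 4}


def edge_label(rel: Relationships, a: int, b: int) -> Optional[int]:
    """Label of edge (a, b); the table stores each AS pair in one direction."""
    if (a, b) in rel:
        return rel[(a, b)]
    if (b, a) in rel:
        return -rel[(b, a)]
    return None  # relationship unknown


def find_anomalies(path: List[int], rel: Relationships) -> List[Tuple[int, Triple]]:
    """Return (case, (A, B, C)) for every anomalous pair of adjacent edges."""
    found = []
    for a, b, c in zip(path, path[1:], path[2:]):
        pair = (edge_label(rel, a, b), edge_label(rel, b, c))
        if pair in ANOMALY_CASES:
            found.append((ANOMALY_CASES[pair], (a, b, c)))
    return found


def sibling_candidates(triples: List[Triple]) -> Set[FrozenSet[int]]:
    """AS pairs {A, B} and {C, B} that case 1 triples mark as possible siblings."""
    pairs: Set[FrozenSet[int]] = set()
    for a, b, c in triples:
        pairs.add(frozenset((a, b)))
        pairs.add(frozenset((c, b)))
    return pairs


# Hypothetical toy data: ASes 10 and 30 are providers of AS 20, ASes 20 and 40
# peer, and AS 10 is a provider of AS 50.
rel: Relationships = {(10, 20): +1, (30, 20): +1, (20, 40): 0, (10, 50): +1}
paths = [[10, 20, 30], [10, 20, 40], [40, 20, 30], [20, 10, 50], [30, 20, 10]]

all_found = [hit for p in paths for hit in find_anomalies(p, rel)]
per_case = Counter(case for case, _ in all_found)
case1_triples = [t for case, t in all_found if case == 1]
print(per_case, sibling_candidates(case1_triples))
```

Counting the triples returned by `find_anomalies` per routing table would give the per-case columns, and the most frequent triple would give the "popular anomaly" column. Only case 1 triples feed `sibling_candidates`, in line with the decision not to extend the sibling label to cases 2 and 3.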

ous works have considered the topology as an undirected graph that simply captures the connectivity between the ASes. In our graph, a provider-customer relationship between $A$ and $B$ is represented by a directed edge from $A$ to $B$, and a peering relationship between $A$ and $B$ is represented by two directed edges, one from $A$ to $B$ and the other from $B$ to $A$. We analyze the graph constructed by applying our inference techniques to the ten Telnet Looking Glass servers discussed in Section IV.

A. Customers and Small Regional ISPs

Customers are the easiest class of ASes to classify from this directed graph structure of the AS topology. Customers are those stub networks which are origins and sinks of traffic and which do not carry any transit traffic. From the very definition of the direction of edges in our graph, we can infer the customer ASes to be the leaves of this directed graph. In a directed graph, a leaf is a node with out-degree 0. Since an undirected graph makes no distinction between out-degree and in-degree, customers with multiple providers would have a degree more than 1 and hence would not appear as leaves of the graph. Modeling the topology as a directed graph provides a more precise characterization of the bottom-most layer in the AS hierarchy. In the directed graph constructed from the ten BGP dumps, 8,852 of the 10,698 ASes are leaf nodes. The rest of the graph contains just 17.5% of the ASes.

Once we identify the customers and remove these nodes, the resulting graph has a new set of leaves. These leaves represent small regional ISPs that have one or more customers. We can continue the process of pruning the leaves of the graph until we reach a point where the graph has no leaves. This involves applying a reverse pruning algorithm similar to Figure 1 in Section III-B. We define the set of nodes removed by this process as small regional ISPs. Since every peering relationship is represented as a loop of two edges in the graph, no ASes with peering relationships are included in this layer. Applying the reverse pruning algorithm to our graph reveals 950 small regional ISPs. We define the remainder of the graph as the core, consisting of a connected component with just 857 ASes and 6,578 unidirectional edges. This represents more than 25% of the total number of edges in the graph. The nodes in the core have an average degree of 6.

B. Dense Core

The set of ASes that remain after the pruning process represents the core of the Internet. Given the nature of the reverse pruning process, we can infer that for every AS present in the core, all of its peers and its providers should also be present in the core. The core of the graph should include the small number of so-called tier-1 providers. In practice, the term “tier-1 provider” is loosely defined as a “large” AS or as an AS that does not have any upstream provider. We could identify these ASes by looking for all provider-free nodes. However, this approach would be sensitive to a small number of missing edges or misclassified relationships in our AS graph. Instead, we could exploit the observation that every provider-free AS would peer with every other provider-free AS to ensure reachability to all destinations. That is, the set of tier-1 ASes should form a clique where every AS has an edge to and from each of the other ASes. Other provider-free ASes, if they exist in our graph, would be excluded from the set of tier-1 providers.

In practice, some ASes may have complex transit or backup relationships to provide connectivity. We define a weaker notion of the dense core as the largest subset of ASes whose induced subgraph is “almost a clique.” We define a directed graph of $N$ nodes to be dense if every node in the graph has an in-degree and out-degree of at least $N/2$. We have set $N/2$ as an artificial cut-off for determining the dense core in the AS topology. The problem of determining the largest clique in a graph is NP-hard. Given that a clique is just one example of a dense graph, the problem of finding the largest dense subgraph of a graph

becomes much harder. We have developed a greedy algorithm for identifying the ASes in the dense core.

B.1 Identifying the Dense Core

First, we order the vertices based on a “greedy” notion of connectivity, following the algorithm in Figure 3. Let $G$ represent the directed graph representation of the core. Let $v(G)$ and $E(G)$ represent the vertices and edges of the graph $G$. Let $d(x, Y)$ for $x \in v(G)$ and $Y \subseteq v(G)$ denote the number of edges of the form $(x, z)$ where $z \in Y$. Connectivity from a node to a given set of nodes refers to the number of directed edges from that node to any of the nodes in the set. Assume that $k$ of the $N$ nodes are already ordered. For each of the remaining $N - k$ nodes, we determine the connectivity to the $k$ nodes and pick the node with the maximum connectivity as the $(k+1)$-th node. When multiple nodes have the same connectivity, we choose the node with a higher out-degree. In Figure 3, $pos(x)$ denotes the position of a node $x$ in the final ordering.

    compute $z \in v(G)$ with maximum out-degree;
    $X = \{z\}$;
    $pos(z) = 1$; $r = 1$;
    while ($X \neq v(G)$) {
        compute $y \in v(G) - X$ with max $d(y, X)$
        (selecting the $y$ with the max out-degree)
        $X = X \cup \{y\}$;
        $maxindegree(r) = d(y, X)$;
        $r = r + 1$;
        $pos(y) = r$;
    }

Fig. 3. Greedy algorithm to order the nodes

Let $x_i$ denote the $i$-th AS in the ordering and $X_i$ be the set of the top $i$ ASes. Let $conn(i)$ represent the connectivity of $x_i$, which is equal to $d(x_i, X_{i-1})$. We define the dense core as the set $X_k$ for the smallest value of $k$ such that $conn(k+1) < (k+1)/2$ and $X_k$ is dense. Once the value of $conn(k+1)$ falls below the value $(k+1)/2$, the $(k+1)$-th node will violate the dense property. Therefore if $conn(k+1) < (k+1)/2$, the induced subgraph of $X_{k+1}$ will not be dense since the out-degree of $x_{k+1}$ will be less than $(k+1)/2$. However, this does not mean that if $conn(k+1) > (k+1)/2$, then $X_{k+1}$ is dense. Consider the scenario where a node $x_j$ for some $j$ is linked to more than $j/2$ elements in $X_j$ and not linked to any node in $X_k - X_j$. This is an example where $conn(k) > k/2$ but $X_k$ is not dense. In this regard, our algorithm is greedy.

For the AS topology that we obtained, the point where $conn(k)$ dropped below $k/2$ was the first value of $k$ for which $X_k$ was not dense. This indicates that the ordering output by the algorithm was indeed a good ordering for choosing the vertices of the dense core. In other words, it validates the rationale behind our greedy approach that if $y$ appeared before $z$ in the ordering then $y$ had a better chance of being present in the dense core than $z$.

B.2 Properties of the Dense Core

Applying this algorithm to the core of our graph, we identify a dense core consisting of 20 ASes. These ASes include the large ISPs such as Genuity, Sprint, AT&T, Exodus.net, and Alternet. The top 20 ASes have a very dense connectivity of 329 peering links. The top 15 of the 20 ASes almost form a clique with only three edges missing from the clique. The largest clique we observed in this innermost core consisted of 13 ASes. The 20 ASes have 6,852 provider-customer edges to customer ASes and 964 provider-customer edges to the small regional providers. After removing the dense core, the remainder of the core consists of 837 ASes.

C. Transit Core

After removing the dense core, we noticed the presence of other large national providers and hosting companies that have peering relationships with many of the ASes in the dense core. To identify these ASes, we define the notion of a transit core. Nodes in the transit core peer with each other and with ASes in the dense core, but they do not tend to peer with many other ASes. In our directed graph representation, these peering links are essentially the incoming directed edges from vertices outside this set to vertices within the set. We define such a set of edges to be the in-way cut of the graph induced by the given set. Using this property, we define the transit core as the smallest set of ASes containing the dense core which induces a weak in-way cut. We can presently visualize a weak in-way cut to have a small number of edges compared to the total number of ASes in the transit core.

C.1 Identifying the Transit Core

Given $X \subseteq v(G)$, let $cut_{in}(X)$ denote the set of all edges of the form $(y, z)$ where $y \in v(G) - X$ and $z \in X$. We define a cut of the vertex set induced by $X \subseteq v(G)$ to be a weak cut if $|cut_{in}(X)| < |X|/2$. The problem of finding weak cuts in a graph is NP-complete and there are no good approximation algorithms for that problem. Given that the transit core is a super-set of the dense core and that the dense core is derived by the greedy ordering, we apply the same ordering to find the transit core as was used to find the dense core. A natural way of using this ordering to find the transit core is to find the smallest value of $k$ such that

TABLE V
DISTRIBUTION OF ASES IN THE HIERARCHY

Level                     # of ASes
Dense core (0)            20
Transit core (1)          162
Outer core (2)            675
Small regional ISPs (3)   950
Customers (4)             8852

TABLE VI
INTER-CONNECTIVITY ACROSS LAYERS

Layer   0     1      2      3     4
0       329   776    931    964   6852
1       213   1052   1344   728   3660
2       8     74     1070   390   3196
3       0     0      0      202   2376
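The layering procedure of Section V, whose outcome Table V summarizes, can be sketched end to end on a small graph. The Python below is an illustrative sketch under stated assumptions, not the implementation used for the measurements. It assumes edges run from provider to customer with two opposite edges per peering, applies the leaf pruning of Section V-A, the greedy ordering of Figure 3, the dense-core cut-off $conn(k+1) < (k+1)/2$ with $X_k$ dense, and the weak in-way cut test $|cut_{in}(X_k)| < k/2$. Breaking remaining ties by node name, falling back to the whole ordering when no cut-off is found, and all function and node names are assumptions made for the example.

```python
from typing import Dict, List, Set

# Directed graph: node -> successors. Assumed orientation: provider -> customer,
# and one edge in each direction for every peering relationship.
Graph = Dict[str, Set[str]]


def induced(g: Graph, keep: Set[str]) -> Graph:
    """Subgraph of g induced by the node set keep."""
    return {u: g[u] & keep for u in keep}


def leaves(g: Graph) -> Set[str]:
    """Nodes with out-degree 0."""
    return {u for u, succ in g.items() if not succ}


def greedy_order(g: Graph) -> List[str]:
    """Seed with the max out-degree node, then keep appending the node with the
    most edges into the chosen set (ties: higher out-degree, then name)."""
    out = {u: len(succ) for u, succ in g.items()}
    order = [max(sorted(g), key=lambda u: out[u])]
    while len(order) < len(g):
        chosen = set(order)
        rest = sorted(set(g) - chosen)
        order.append(max(rest, key=lambda y: (len(g[y] & chosen), out[y])))
    return order


def is_dense(g: Graph, nodes: Set[str]) -> bool:
    """True if every node has in- and out-degree >= N/2 in the induced subgraph."""
    sub = induced(g, nodes)
    half = len(nodes) / 2
    indeg = {u: sum(u in succ for succ in sub.values()) for u in nodes}
    return all(len(sub[u]) >= half and indeg[u] >= half for u in nodes)


def in_cut(g: Graph, inside: Set[str]) -> int:
    """Number of edges (y, z) with y outside and z inside the given set."""
    return sum(len(succ & inside) for y, succ in g.items() if y not in inside)


def classify(g: Graph) -> Dict[str, List[str]]:
    """Split g into five layers; every node must appear as a key of g."""
    customers = leaves(g)
    core = induced(g, set(g) - customers)
    small_isps: Set[str] = set()
    while leaves(core):  # reverse pruning until no leaf is left
        small_isps |= leaves(core)
        core = induced(core, set(core) - leaves(core))
    order = greedy_order(core) if core else []
    # Dense core: smallest k with conn(k+1) < (k+1)/2 and X_k dense.
    k_dense = len(order)
    for k in range(1, len(order)):
        conn_next = len(core[order[k]] & set(order[:k]))
        if conn_next < (k + 1) / 2 and is_dense(core, set(order[:k])):
            k_dense = k
            break
    # Transit core: smallest k >= k_dense with |cut_in(X_k)| < k/2.
    k_transit = len(order)
    for k in range(k_dense, len(order) + 1):
        if in_cut(core, set(order[:k])) < k / 2:
            k_transit = k
            break
    return {
        "dense": order[:k_dense],
        "transit": order[k_dense:k_transit],
        "outer": order[k_transit:],
        "small_isps": sorted(small_isps),
        "customers": sorted(customers),
    }


# Hypothetical toy topology (names are illustrative, not real ASes).
demo: Graph = {
    "T1": {"T2", "T3", "T4", "N1", "C1"},
    "T2": {"T1", "T3", "T4", "N2"},
    "T3": {"T1", "T2", "T4", "N2"},
    "T4": {"T1", "T2", "T3"},
    "N1": {"T1", "R1"},
    "N2": {"T2", "T3", "R2", "C2"},
    "R1": {"R2", "C2"},
    "R2": {"R1", "S1"},
    "S1": {"C3"},
    "C1": set(), "C2": set(), "C3": set(),
}
layers = classify(demo)
print(layers)
```

In this toy graph the four fully meshed nodes form the dense layer, and the node with two peerings into that mesh is pulled into the transit layer because the in-way cut of the mesh alone is not weak. The two peering nodes `R1` and `R2` survive pruning, which mirrors the observation that a peering loop keeps an AS out of the small regional ISP layer.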

$|cut_{in}(X_k)| < k/2$. Surprisingly, we found that the value of $k$ at which $|cut_{in}(X_k)| < k/2$ also satisfied the property that $conn(k+1) = 1$. This means that no two edges in $cut_{in}(X_k)$ have the same source. A weak cut also means that more than 50% of the ASes in $X_k$ do not have any peering relationship with any of the ASes in $v(G) - X_k$. Hence by this definition, $X_k$ should indeed contain all the transit providers.

C.2 Properties of the Transit Core

Applying the in-way cut algorithm to our graph, we discover a transit core consisting of 162 ASes, not including the 20 ASes in the dense core of the graph. These 162 ASes have 213 peering links with ASes in the dense core. Concert, Singapore Telecommunications, UUNet European division, Teleglobe European division and KDDI Corporation, Japan are some example ISPs in our transit core. We found many of the top providers in Europe and Asia to be present in our transit core.

D. Outer Core

We classify all of the remaining ASes in the core as the outer core. The members of the outer core typically represent regional ISPs which have a few customer ASes and a few peering relationships with other such regional ISPs. The outer core consists of 675 ASes that have 8 peering sessions with ASes in the dense core and 74 peering sessions with ASes in the transit core. We observed that many members of our outer core are regional ISPs. Some examples include Turkish Telecomm, Williams Communications Group, CAIS Internet, Southwestern Bell Internet Services and Minnesota Regional Network. It is interesting to note that while Exodus Communications (AS 4197) is present in our outer core, Exodus.net (AS 3967) is present in the dense core.

E. Summary

Table V summarizes the number of ASes at each level in the hierarchy: dense core (layer 0), transit core (layer 1), outer core (layer 2), small regional ISPs (layer 3), and customers (layer 4). Table VI summarizes the connectivity between various layers in the AS hierarchy. Each number in the table refers to the total number of edges from the layer represented in the same row to the layer represented in the same column. For example, 776 is the total number of edges from layer 0 to layer 1. The tables show several key properties of the Internet topology:

• The ASes in the dense core are very well connected.

• As we move from the dense core toward customers, the inter-layer and intra-layer connectivity drops significantly.

• The large number of customer ASes have their providers distributed across all the layers. The ASes in layer 0 support a large number of customer ASes. This indicates that the connectivity across layers is not strictly hierarchical, as also observed in [9].

• The number of edges within the outer core is less than the total number of vertices in the outer core. This indicates the presence of multiple disconnected groups of ASes in the outer core; ASes in different groups communicate via ASes in the dense core and the transit core.

The graphs in Figure 4 explore the relationship between node degree and the layers in the hierarchy. We define node degree as the number of neighboring ASes without regard to the relationship. The top graph plots the cumulative distribution of node degree on a logarithmic scale and the bottom graph focuses on the large number of ASes with no more than 15 neighbors. In general, layer 0 and 1 ASes have high degree, and layer 3 and 4 ASes tend to have low degree. However, this is not universally true. Some customers at layer 4 have a large number of upstream providers, and some ASes in the dense core at layer 0 have a relatively small number of neighbors. For example, our results suggest that AS 1833 (TeliaNet USA) has a degree of only 40. Yet, we classify TeliaNet as part of the dense core due to its rich collection of peering relationships. A hierarchy based solely on degree distribution would not be able to make this distinction.

VI. CONCLUSIONS

The relationships between ASes have a significant impact on the flow of traffic through the Internet. Our work makes two important contributions toward understanding the structure of the Internet in terms of these relationships:

• An algorithm for inferring AS relationships from partial views of the AS graph from different vantage points.

• A mechanism for dividing the Internet hierarchy into layers based on AS relationships and node connectivity.

The complete structure of the Internet is unknown and difficult, if not impossible, to obtain. Our approach is comprised of many heuristics, with certain limitations:

• We draw our inferences based on only ten vantage points available from Telnet Looking Glass servers. Ideally, we would have a larger collection of routing tables from more diverse vantage points, including smaller customer ASes.

• We treat the RouteViews routing table as a view from a single AS. In future work, we plan to extract a separate view for each AS participating in the RouteViews project.

• Multiple ASes may fall under the administrative control of a single institution, due to historical artifacts and market forces. We plan to extend our methodology to incorporate more complex routing policies that are not captured by the traditional customer-provider and peer-peer relationship.

Despite these limitations, we have shown that our approach provides a detailed view of the Internet topology in terms of the relationships between ASes.

Fig. 4. Cumulative distribution of AS degree by layer. [Figure not reproduced: two panels plot % of ASes (cumulative) against degree for Layer0 through Layer4, the top panel on a logarithmic degree axis from 1 to 10000 and the bottom panel on a linear degree axis from 0 to 14.]

REFERENCES

[1] J. W. Stewart, BGP4: Inter-Domain Routing in the Internet. Addison-Wesley, 1998.
[2] S. Halabi and D. McPherson, Internet Routing Architectures. Cisco Press, second ed., 2001.
[3] G. Huston, “Interconnection, peering, and settlements,” in Proc. INET, June 1999.
[4] C. Alaettinoglu, “Scalable router configuration for the Internet,” in Proc. IEEE IC3N, October 1996.
[5] B. Krishnamurthy and J. Wang, “On network-aware clustering of web clients,” in Proc. ACM SIGCOMM, August/September 2000.
[6] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map discovery,” in Proc. IEEE INFOCOM, 2000.
[7] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, “Network topologies, power laws, and hierarchy,” Tech. Rep. 01-746, Computer Science Department, University of Southern California, 2001.
[8] “CAIDA Web Site.” http://www.caida.org.
[9] R. Govindan and A. Reddy, “An analysis of inter-domain topology and route stability,” in Proc. IEEE INFOCOM, 1997.
[10] L. Gao, “On inferring autonomous system relationships in the Internet,” in Proc. IEEE INFOCOM, November 2000.
[11] “University of Oregon RouteViews project.” http://www.routeviews.org/.
[12] “BGP tables from the University of Oregon RouteViews Project.” http://moat.nlanr.net/AS/Data/.
[13] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the Internet topology,” in Proc. ACM SIGCOMM, pp. 251–262, August/September 1999.
[14] C. Jin, Q. Chen, and S. Jamin, “Inet: Internet topology generator,” Tech. Rep. CSE-TR-433-00, U. Michigan, September 2000.
[15] L. Gao, T. G. Griffin, and J. Rexford, “Inherently safe backup routing with BGP,” in Proc. IEEE INFOCOM, April 2001.
[16] L. Gao and J. Rexford, “Stable Internet routing without global coordination,” in Proc. ACM SIGMETRICS, June 2000.
[17] J. Hawkinson, “Guidelines for creation, selection and registration of an Autonomous System,” RFC 1930, IETF, March 1996.
[18] “RADB Whois Server.” whois.radb.net.
[19] “Infonet Europe Web Site.” http://www.infonet-europe.com/.
[20] “Preston Gates Ellis Acquisitions.” http://www.prestongates.com/meetpge/pro.asp?proID=611