Layer 1-Informed Internet Topology Measurement
Ramakrishnan Durairajan
University of Wisconsin-Madison
[email protected]

Joel Sommers
Colgate University
[email protected]

Paul Barford
University of Wisconsin-Madison
[email protected]

ABSTRACT

Understanding the Internet's topological structure continues to be fraught with challenges. In this paper, we investigate the hypothesis that physical maps of service provider infrastructure can be used to effectively guide topology discovery based on network layer TTL-limited measurement. The goal of our work is to focus layer 3-based probing on broadly identifying Internet infrastructure that has a fixed geographic location such as POPs, IXPs and other kinds of hosting facilities. We begin by comparing more than 1.5 years of TTL-limited probe data from the Ark [25] project with maps of service provider infrastructure from the Internet Atlas [15] project. We find that there are substantially more nodes and links identified in the service provider map data versus the probe data. Next, we describe a new method for probe-based measurement of physical infrastructure called POPsicle that is based on careful selection of probe source-destination pairs. We demonstrate the capability of our method through an extensive measurement study using existing "looking glass" vantage points distributed throughout the Internet and show that it reveals 2.4 times more physical node locations versus standard probing methods. To demonstrate the deployability of POPsicle we also conduct tests at an IXP. Our results again show that POPsicle can identify more physical node locations compared with standard layer 3 probes, and through this deployment approach it can be used to measure thousands of networks world wide.

Categories and Subject Descriptors

C.2.1 [Network Architecture and Design]: Network topology; C.2.3 [Network Operations]: Public networks

General Terms

Algorithms, Design, Measurement

Keywords

Physical Internet, POPsicle probing heuristic

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
IMC'14, November 5–7, 2014, Vancouver, BC, Canada.
Copyright 2014 ACM 978-1-4503-3213-2/14/11 ...$15.00.
http://dx.doi.org/10.1145/2663716.2663737.

1. INTRODUCTION

Studies that aim to map the Internet's topological structure have been motivated for many years by a number of compelling applications including the possibilities of improving performance, security and robustness (e.g., [19]). While these motivations remain as compelling as ever, the ability to accurately and comprehensively map the Internet has, for the most part, remained beyond our grasp.

The primary challenges to thoroughly mapping the Internet stem from its enormous size, distributed ownership, and constantly changing characteristics. Faced with these challenges, the most widely used approach to Internet mapping has been based on gathering data from network-layer measurements using TTL-limited probes¹. Great progress has been made on solving some of the specific problems related to using these network layer measurements for understanding Internet topological characteristics. However, the fact remains that layer 3 data are inherently tied to the management policies and operational objectives of service providers, which may be at odds with comprehensive and accurate mapping of the Internet.

¹ Maps can also be created using BGP updates or application-layer data, however those are not the focus of this paper.

So, just what do we mean by an "Internet map"? At the lowest level, the Internet is composed of physical conduits that contain bundles of optical fiber, and that terminate at buildings that house routing and switching equipment. We refer to the collection of these data and their geographic locations as "physical maps" of the Internet. Several recent projects have begun to assemble repositories of physical Internet maps [15, 29]. These maps are valuable because they reflect a ground truth perspective of service provider infrastructure. These are in contrast with maps that have been generated based on layer 3 probes (e.g., [45]), which we refer to as "network-layer maps". Ideally, network-layer maps reflect a timely representation of network topology as well as the dynamic aspects of management and configurations.

In this paper we investigate the hypothesis that physical maps can be used to guide and reinforce the process of collecting layer 3 probe data toward the goal of expanding the scope of physical infrastructure captured in network-layer maps. This conjecture leads directly to two key research questions that are the focus of our work: (i) how do physical layer maps compare and contrast with network-layer maps? and (ii) how can probe methods used by projects like Ark [25] be improved to reveal a larger portion of physical infrastructure? We contend that some of the challenges inherent in generating maps from layer 3 probes can be overcome by using the constructive approach of first identifying key infrastructure (POPs, etc.) and then identifying nodes (identified by disambiguating IP addresses or using DNS names) that reside in those locations.

Our study begins by considering physical map data from the Internet Atlas (or Atlas) project and network-layer map data from the Ark project. We focus specifically on infrastructure in North America. The Atlas repository includes data from 78 Internet service providers with over 2600 nodes and over 3580 links. Nodes in the Atlas data refer to hosting centers or points of presence (POPs), with links referring to physical connections between those locations. We use Ark measurements collected from September 2011 to March 2013 (approximately the same period over which the Atlas repository has been assembled). We resolve the IP addresses from this corpus to DNS names and then use location hints to associate these with physical locations (e.g., cities), which becomes the basis for our comparisons.

Several characteristics are immediately evident in the data. Most prominent is the fact that among the 50 networks that are the focus of our comparison study, we observe many more nodes and links in the physical maps. There can be a number of explanations for this observation, including (i) the limitations of exploiting DNS naming conventions, (ii) the use of tunneling protocols (e.g., MPLS) or the lack of layer 3 services which can render nodes invisible to probes, (iii) the limited perspective of the network mapping infrastructure and (iv) the fact that layer 3 routing configurations may simply obviate the ability to observe all networks, nodes and links. This supposition is supported by the observation that all Ark probes are confined to a minority subset of networks, with the majority of probes traversing an even smaller subset of networks. Despite this, there are still some nodes/locations/links that appear in the network-layer map but are not indicated in the physical map. This can be explained by physical maps that are out of date or are either intentionally or erroneously incomplete.

The differences between the physical and network-layer [...] target networks using two core ideas: (i) source-destination pairs should be proximal to the target geographically and in address space, and (ii) verification of measurements using multiple sources is required. We verify the identification of infrastructure using location hints in DNS names and using records available in PeeringDB [5]. Our analysis shows that probing between sources and destinations that are both within the same autonomous system as the target(s) reveals the most physical infrastructure.

The results of our targeting experiments motivate a new heuristic algorithm for probe targeting that we call POPsicle. We show that POPsicle finds 2.4 times as many nodes as are identified by Ark. We compare the number of POPs found by POPsicle with POPs found using Rocketfuel [45] and in all cases POPsicle performs better. We also found that IXPs play a critical role in the way probes traverse a given network. Specifically, sources that are co-located with IXPs have the advantage of appearing, from a layer 3 perspective, as being internal to any/all of the networks that are connected at that location. Thus, a single source that is co-located within an IXP may enhance the identification of infrastructure across all networks that connect to the IXP. This has the effect of significantly broadening the scope of the infrastructure that can be identified using our approach. To validate this idea, we deployed POPsicle at the Equinix IXP in Chicago, USA, and measured the number of POPs for 10 ISPs and found that POPsicle reveals almost all POPs compared to Atlas and extra POPs (in certain cases) compared to Ark. We also find through a case study of Cogent network that POPsicle identifies over 90% of the nodes identified in Atlas or by the reverse DNS technique of [21], compared with about 65% of the POPs identified through Ark, and only 25% identified in the most recently available Rocketfuel data.

To summarize, the key contributions of our work are as follows. First, we perform a first-of-its-kind comparison of large repositories of physical and network maps and find that physical maps typically reveal a much larger number of nodes (e.g., POPs and hosting infrastructure).