Estimating Internet Address Space Usage Through Passive Measurements
Alberto Dainotti, Karyn Benson, Alistair King, kc claffy
CAIDA, UC San Diego, La Jolla, California, USA
{alberto,karyn,alistair,kc}@caida.org

Michael Kallitsis
Merit Network, Inc., Ann Arbor, Michigan, USA
[email protected]

Eduard Glatz, Xenofontas Dimitropoulos
ETH Zurich, Zurich, Switzerland
{eglatz,fontas}@tik.ee.ethz.ch

ACM SIGCOMM Computer Communication Review, Volume 44, Number 1, January 2014

ABSTRACT

One challenge in understanding the evolution of Internet infrastructure is the lack of systematic mechanisms for monitoring the extent to which allocated IP addresses are actually used. Address utilization has been monitored via actively scanning the entire IPv4 address space. We evaluate the potential to leverage passive network traffic measurements in addition to or instead of active probing. Passive traffic measurements introduce no network traffic overhead, do not rely on unfiltered responses to probing, and could potentially apply to IPv6 as well. We investigate two challenges in using passive traffic for address utilization inference: the limited visibility of a single observation point; and the presence of spoofed IP addresses in packets that can distort results by implying faked addresses are active. We propose a methodology for removing such spoofed traffic on both darknets and live networks, which yields results comparable to inferences made from active probing. Our preliminary analysis reveals a number of promising findings, including novel insight into the usage of the IPv4 address space that would expand with additional vantage points.

Categories and Subject Descriptors

C.2.3 [Network Operations]: Network monitoring; C.2.5 [Local and Wide-Area Networks]: Internet

Keywords

Passive measurements; IPv4 address space; Internet address space; Darknet; Network telescope; Spoofed traffic; Internet census

1. INTRODUCTION AND MOTIVATION

On February 3, 2011, the Internet Assigned Numbers Authority (IANA) allocated its last set of available IPv4 addresses to Regional Internet Registries (RIR), a historic turning point for the Internet. The address pools of the European and Asian-Pacific RIRs were exhausted in 2012; the other RIRs will likely run out within the next few years. Although IPv4 address scarcity is now a reality, so is the fact that allocated addresses are often heavily under-utilized. One challenge in managing Internet address space (both IPv4 and IPv6) is the lack of reliable mechanisms to monitor actual utilization of addresses. Macroscopic measurement of patterns in IPv4 address utilization also reveals insights into Internet growth, including to what extent NAT and IPv6 deployment are reducing the pressure on (and demand for) IPv4 address space.

To our knowledge, the only previous scientific work mapping actual utilization of IPv4 addresses is ISI’s Internet Census project [10] (ISI Census, in the following), which periodically sends ICMP echo requests to every single IPv4 address (excluding private and multicast addresses) to track the active IP address population. This approach has four primary limitations: significant probing overhead; potential to offend probing targets (i.e., the entire Internet) who may request not to be probed or even blacklist probing addresses; inaccuracies due to the fact that many networks either filter out ICMP echo requests or respond to them on behalf of other IP addresses, and inability to scale for use in a future IPv6 census.

We investigate the potential for passive network traffic measurements to inform or even substitute for active methods of measuring address space utilization. Passive measurements introduce no network traffic overhead, do not rely on unfiltered responses to probing, and could potentially apply to IPv6 as well. On the other hand, a passive-measurement approach to address utilization inference has two daunting challenges which we explore in this paper: the limited visibility of any single vantage point of traffic; and the presence of spoofed IP addresses in packets that can significantly distort results by implying faked addresses are active.

We analyze two types of passive traffic data: (i) Internet Background Radiation (IBR) packet traffic¹ [25] captured by darknets (also known as network telescopes); (ii) traffic (net)flow summaries in operational networks. We develop and evaluate techniques to identify and remove likely spoofed packets from both darknet (unidirectional) and two-way traffic data. Our contributions include:

• We demonstrate that, surprisingly, passive traffic measurements, including both packet-level captures from darknets and two-way network flow summary records from operational networks, can reveal substantial insight into Internet address utilization.

• We introduce and validate heuristics for detecting IP address spoofing in captured network traffic data, thus mitigating their impact on our inferences.

• We use our traffic data sets to show that combining passive with active measurements can reduce probing (by 37.9%) and find an additional ≈ 450K active /24s not detected as active by ICMP-based Internet census measurement.

• Our initial results reveal new macroscopic insights into Internet address space utilization, such as increasing activity within large legacy address blocks; additional vantage points would likely illuminate activity in other regions of Internet address space.

• We publish comprehensive maps of IPv4 address space utilization, at a /24 granularity, for 2012 [7].

¹ IBR is generated primarily by malware.

2. FRAMING THE PROBLEM

To investigate the potential of passive measurements to provide a census-like snapshot of IP address space usage, we consider four factors: finding appropriate traffic measurement locations; selecting an observation granularity; identifying and removing traffic with spoofed IP source addresses; and quantitatively evaluating our algorithm including validating against what ground truth data we can gather.

Observation granularity. We analyze the usage of the IPv4 address space with /24 address-block granularity (/24 blocks, in the following), i.e., we consider a /24 block as either active (or inactive) if we observe traffic from at least one (or exactly zero) IP address from that address block. There is no universal IP address segment boundary space (due to sub-netting and varying size of administrative domains), but using a /24 granularity mitigates the effects of dynamic but temporary IP address assignment (e.g., DHCP), as well as having an intuitive relationship with both routing operations and address allocation policy.

Measurement location. We devise techniques for two types of passive measurement data: bidirectional netflow data from all nodes in a large live research network (SWITCH, in Switzerland), and raw unidirectional packet traffic data from two large darknets. Section 3 describes these data sets, and others we used to validate our inferences.

Identifying and removing spoofed traffic. Packets with source IPs other than those assigned to the sending host, i.e., spoofed traffic, will skew our address utilization results. Source IP addresses may be spoofed for a variety of reasons, typically to prevent attribution of an attack and avoid being caught [6]², but also due to transmission or pro-

negatives (tn) and false negatives (fn). We evaluate their performance using four standard metrics:

• Precision = $\frac{tp}{tp+fp}$: fraction of positives that are true positives.

• Recall = $\frac{tp}{tp+fn}$: fraction of “active”-labeled networks correctly reported as active.

• True negative rate = $\frac{tn}{tn+fp}$: fraction of “inactive”-labeled networks correctly reported as inactive.³

• Accuracy = $\frac{tp+tn}{tp+tn+fp+fn}$: fraction of correctly classified positives and negatives.

3. DATASETS

Our passive measurement data include: (i) full packet traces collected from a /8 network telescope operated by the University of California San Diego (UCSD) [24]; (ii) full packet traces collected from a ≈/8 network telescope (space covering 14M different addresses) operated by Merit Network (MERIT) [17]; and (iii) unsampled NetFlow traces collected at SWITCH, a regional academic backbone network that serves 46 single-homed universities and research institutes in Switzerland [22]; the monitored address range of SWITCH contains 2.2 million IP addresses, which correspond to a continuous block slightly larger than a /11. All datasets are from the same collection period, between July 31 and September 2, 2012.

For comparison, we use the ISI Internet Census dataset it49c-20120731, obtained by probing the entire IPv4 address space [12] during this same time window. Based on the suggestions by the data collector [13], we filtered this dataset to only those probes that received an ICMP echo reply, and where the source address in the reply matched the target address of the original probe. We do not consider negative replies (e.g. ICMP time exceeded) since we assume they are not reaching the target. In addition, we filtered out all probes sent to IP addresses with a least significant byte of 0 or 255, which are typically reserved for network and broadcast addresses respectively [10] and third parties along the path may intercept and reply to such probes. Note that the ISI Census experiment was designed to report at a /32 (host) rather than /24 (subnet) granularity, but we apply the resulting data set to a /24 granularity analysis.

To establish a set of unrouted IPv4 address blocks, we take a list of BGP prefixes announced and captured by the route-views2.routeviews.org [3] collector