Who Is Scanning the Internet?
Total Page:16
File Type:pdf, Size:1020Kb
Who Is Scanning the Internet? Roman Trapickin Betreuer: Oliver Gasser, Johannes Naab Innovative Internet-Technologien und Mobilkommunikation SS2015 Lehrstuhl Netzarchitekturen und Netzdienste Fakultät für Informatik, Technische Universität München Email: [email protected] ABSTRACT ning activities in section 2. Popular software tools will be Today’s port scanning software is able to perform massive discussed in section 3. In section 4 we will talk about the port scans up to the complete IPv4 space within minutes. individuals and organizations, which openly perform Internet- Once a device is connected to the Internet, it will be almost wide scans. We will focus on understanding their intentions, immediately scanned for open ports and services. Most of used tools and techniques. Finally, we will endeavor a clas- these scans are done by anonymous individuals, in particular sification model for these entities in section 5. As a short targeting at finding and exploiting system vulnerabilities. supplement we will revitalize the discussion about the legal However, there is also a variety of individuals and organiza- and ethical aspects of (massive) port scanning in section 6. tions openly practicing massive port scanning and pursuing different objectives. In this paper we will introduce several 2. SCANNING BASICS scanning entities, while emphasizing their motives. In addi- The term “Internet(-wide) scan(ning)” describes a port scan- tion we will propose an entity classification model. In order ning procedure performed by a single or multiple scanning to endorse further understanding of the topic we will also entities on a considerable amount of hosts. There is no clear discuss port scanning as a measurement discipline as well as requirement or specification at which point scanning multiple introduce the contemporary port scanning software used for hosts may be dubbed an “Internet scan”, however scanning Internet-wide scans. the complete IP spaces of regional Internet registries or even countries definitely falls into this term. In this section we Keywords will shortly describe the main reason behind such massive port scan, internet-wide scan, measurement, entities, nmap, port scanning initiatives and also port scans in general as zmap, masscan, classification, caida, sonar, sipscan, conficker, well as introduce main concepts and terms used throughout shodan, hacienda, legality, ethics this paper. Although rudimentarily explained in this section, the real motives will be introduced in sections 4 and 5. 1. INTRODUCTION The main purpose of port scanning is gathering information IPv4 address space consists of 232 or almost 4.3 billion possi- about offered services from hosts connected to a network. ble IP addresses, which is within the giga order of magnitude. This is done via sending probe messages to targeted hosts Today’s computers have CPUs with gigahertz clock rates and and prompting a response later on. As the name suggests, gigabytes of RAM and storage. Today’s networks allow band- port scanning is centered around Transport layer ports and widths exceeding 1 gigabit per second. This makes iterations hence mostly based on the Transmission Control Protocol over the entire IPv4 space possible within a comparatively (TCP). Nevertheless both adjoining Network and Application short time period. By comparison, IPv6 address space con- layers often provide additional information about the targeted sists of 2128 or nearly 340 undecillion (1036) addresses, which system. is rather unlikely to iterate over entirely in the nearest future. Since the massive transition to IPv6 has not yet begun and 2.1 Port Status IPv4 is still by far the dominant Internet protocol,[18] we First, we explain what kind of knowledge a scanning entity now have a unique opportunity to reach every IPv4 address can receive from a TCP segment. The essential step in on the Internet in reasonable time. The year 2013 has seen (Internet-wide) port scanning is to determine the port status ZMap and masscan emerging, two port scanning tools, which of a remote host:[24] promised to port scan the entire Internet in under an hour on consumer hardware. Particularly ZMap has received an open port indicates that an application is accepting con- extensive IT media coverage,[2][10][31] which once again nections on this port. Finding an open port is the main raised the discussion about port scanning and associated goal for a scanning entity, since it discloses the most threats. knowledge about targeted host. closed port indicates that there is no active application In this paper we will attempt to answer the question “Who on the other end. Bearing significantly less information is scanning the Internet?” by giving an insight into the cur- about the host, a closed port could still provide some clues rent trends of massive port scanning. First, we will briefly about e. g. the operating system by fingerprinting the introduce the technical terms used for describing port scan- TCP/IP stack. Various operating systems exhibit char- Seminars FI / IITM SS 15, 81 doi: 10.2313/NET-2015-09-1_11 Network Architectures and Services, September 2015 acteristic behavior while generating the freely selectable An extreme example is a /0 vanilla scan. As mentioned TCP/IP fields, so that collecting multiple responses and before, large block scans are poorly scalable, since includ- matching them against a database with known finger- ing a single host to the target domain results in up to 216 prints makes OS detection possible. additional ports. In practice only a small number of ports undetermined port status is the least desirable message per host is scanned. Hence an Internet-wide block scan for a scanning entity. The port scanning software does may rather be considered as a series of few horizontal not receive any response. In most cases the port is fil- scans or alternatively as a /0 strobe scan. Such scans tered – protected by a firewall. Some port scanning are as well interesting for attackers aiming at multiple software, most prominently nmap, which offers various vulnerabilities and security experts doing global research, scanning techniques that – under favorable circumstances – but also useful for various service discovery tasks. may return more informative results. As a matter of consequence, the entire scanning process slows down dras- Figure 1 visualizes the three types of scanning in terms of tically. The scanning techniques will be introduced in targeted scope. section 3.1. 216 In addition to Transport and Application layer knowledge, horizontal vertical block a scanning entity can acquire relevant information from the Network layer. Querying a WHOIS database with targeted IP address will provide the scanning entity with approximate host geolocation and ISP data among other things. A variety of WHOIS services is publicly available on the Internet. In general, retrieving a short description of running services is called banner grabbing. ports 2.2 Defining the Scope of a Scan The next step is to define the scope of Internet scanning. This scope is defined by two dimensions, namely ports and hosts. A na¨ıve calculation for IPv4 address space for every port results in 232 216 = 248 communication endpoints · to be scanned. However, subtracting private IP ranges as 32 well as blacklisting some IP addresses will not change the 0 IPv4 addresses (hosts) 2 order of magnitude. By comparison, brute-forcing a 48-bit cryptographic key space is considered feasible today,[6] but Figure 1: A simplified visualization of scan types still only on dedicated hardware.[29] Even though iterating over 248 combinations alone can take significant amount of time on consumer-grade machines, actually it is the bandwidth 2.3 Measurement Standards that creates the major bottleneck. While scanning a remote Finally, we will discuss port scanning from a scientific, or host on the Internet, the scanning performance depends on more precisely, experimental point of view. Regardless of bandwidth of every path segment a packet has to traverse. the real motivation of a scanning entity, a (massive) port In fact, there are various factors, such as congestion control, scanning initiative is a quantitative research, and, as such, it is which negatively affect the overall bandwidth. As a result, subject to several measurement and quality standards. Vern such scans are indeed possible, albeit rather impractical. Paxson from the International Computer Science Institute Instead, scanning entities concentrate their effort on relevant in Berkeley, California has proposed the following aspects ports, so that time to completion can be thoroughly improved. for sound Internet measurement:[25] There exist three major approaches to reduce the scanning scope:[22] Precision describes a degree of relative proximity between individual measured values. A telling example is the Horizontal scan describes a port scan performed for the sampling rate of a continuous signal, whereas Paxson same port on multiple hosts. An extreme example of a cites an example of clocks with 1 μs and 1 ms precision horizontal scan is a /0 scan (entire IPv4 space) on a each as well as filtering responses. He also states, that, single port. Horizontal scans are often used by attackers concerning the Internet measurement, precision is “readily to detect open ports on a large number of machines in apparent” and hence is rather of secondary importance. order to exploit vulnerabilities of the listening application. However, according to Paxson, a sound measurement In turn, such scans can be used for massive security audits, study should at least respond to precision concerns. i. e. measuring global distribution of a vulnerability. Metadata is the data that accompanies the actually mea- Vertical scan describes scanning multiple ports on a single sured data. A prominent example of metadata preser- host. Scanning every port out of 216 of a host is called vation are (human-readable) logs which save additional vanilla scan, while scanning a small subset is dubbed information during the measurement. Metadata is an strobe scan.