Muninn: Monitoring Changes in the Icelandic Internet through Repeated Port Scanning

Alex Már Gunnarsson, Níels Ingi Jónasson, Sindri Ingólfsson

Thesis of 12 ECTS credits submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science

May 2019

Supervisor: Gylfi Þór Guðmundsson

Examiner: Marcel Kyas

Advisors: Theódór R. Gíslason, Hlynur Óskar Guðmundsson

Acknowledgements

We would like to give special thanks to the following individuals and organizations. Syndis was very generous to accommodate us, providing office space, lunches and caffeine as well as advice from experts in the field. CERT showed great interest in the project and provided advice regarding ethical concerns. Opin Kerfi was kind enough to allow us to perform our scans when no other service provider was willing to host us. Hlynur Þór Óskarsson took time out of his busy schedule to provide us with guidance on a weekly basis. Theódór Ragnar Gíslason gave encouragement and was there when we needed to consult his expertise. Gylfi Þór Guðmundsson was constantly ready to help and molded our mass of stupid ideas into good ones.


Contents

1 Introduction

2 Background
2.1 Standards and Definitions
2.2 Internet Census 2012
2.3 Shodan
2.4 Application for Historical Service Assessment (AHSA)
2.5 Heimdallur

3 Analysis of the Icelandic Internet
3.1 Problematic Firewalls
3.2 Scanning the full port range
3.3 Banners versus CPEs
3.4 Noteworthy banners
3.5 Scan Delta

4 Design and Implementation of Muninn
4.1 Design
4.2 Implementation

5 Evaluation and Results
5.1 Selective scanning with Muninn
5.2 Targeted Scanning with Muninn

6 Discussion
6.1 Ethics
6.2 Computer Emergency Response Team (CERT)
6.3 Limitations

7 Future Work
7.1 Automated tracking of changes
7.2 Deeper scan
7.3 Distribution of the platforms
7.4 CPE extraction
7.5 Going beyond

8 Conclusion

Abstract

The world is becoming ever more connected. Home routers, web cameras, databases, TVs and even garden sprinklers are all examples of devices that are now connected to the Internet. In this connected landscape hackers constantly look for vulnerable devices. A single version upgrade can mean the difference between a safe and a compromised machine. In this paper we analyse the data made available by the port scanner Heimdallur. We aim to answer our research question: Can we monitor changes of the Icelandic Internet in semi real time through repeated port scanning? We constructed a new port scanner, Muninn, which utilizes historical information to scan even faster than previously possible. Muninn has two main uses. Firstly, Muninn can obtain an updated view of all responsive Icelandic Internet services in just a few hours. This allows us to monitor any changes happening on the Internet. Secondly, Muninn can find and monitor any specific set of services very closely. This can be crucial following the discovery of a vulnerability. It enables us to track any abnormal activity and see exactly how long the machines remained vulnerable before updating to a safe version.

1 Introduction

The Internet is a contraction of the words "interconnected network" and it is in this connectivity that the Internet’s greatest strength and its greatest security risks lie. Because of this, any service on the Internet is by design open to everything else unless it is specifically closed or hidden. IoT Analytics estimated that the number of connected devices in 2018 exceeded 17 billion [13]. With so many open and connected devices, anyone with the mind for it can take a look at or even interact with poorly configured services. This inevitably leads to many services being left unintentionally open, often with severe consequences.

This was the case in 2012 when an anonymous researcher decided to scour the Internet for insecure devices. He managed to gain unauthorized access to around 420 thousand devices and turn them into a botnet under his control, known as the Carna Botnet [2]. He then used this botnet to scan the entire Internet within an hour and later that year published his findings. Four years later another person, going by the name janit0r, created a similar botnet [12]. He, however, had a more malicious intent: over the course of 13 months janit0r managed to destroy 10 million devices around the world, mostly modems, routers and gateways but also Hikvision and Dahua web cameras. The botnet was aptly named BrickerBot and caused a lot of commotion [21].

Considering all of this, it is no surprise that scanning tools which find and identify running services have become standard for security experts and hackers alike. Many tools which facilitate this, like Nmap [14], have been around for more than 20 years. Today it is easier than ever to search the Internet for connected devices. In fact, services like Shodan scan the Internet and then allow their users to search through the accumulated data [25].
In late November 2018 a person going by the name Hacker Giraffe used Shodan to find printers open to the Internet, and with just a few lines of code 50,000 printers started printing a custom-made message [1]. In an interview Hacker Giraffe said "I’m usually lurking around Shodan... I’m usually just searching around looking for something to mess with. I was really looking for some protocol that should not be opened to the Internet" [22]. This interview reflects not only how easy it is for anyone to find vulnerable connected devices but also how dangerous and simple to abuse such information can be. It is clear that keeping a close eye on the status and responses of connected devices is crucial today.

Plenty of research has been done on the state and security of the Internet, but this research might not apply to the Internet of a small community like Iceland. There is a lack of research on Internet security in Iceland specifically and our project aims to fill that gap.

What most scanners today have in common is that they only provide a single snapshot of the Internet. However, the Internet is ever changing and such a snapshot lacks the context of change and direction. The sudden appearance of multiple routers or webcams responding to a scan might indicate possibly vulnerable consumer devices. Additionally, by monitoring changes one might be able to glimpse how fast patches roll out following their release. Knowing this can be crucial in case said patches resolve a critical security vulnerability. This information can, for example, be used to estimate the probability of a security breach by determining how long services remain vulnerable and whether other services were hacked during that time.

Everyone seems to agree that the Internet is ever changing and growing, yet there is barely any research on how the services on the Internet change as time progresses. Our aim is to monitor changes in the Internet in as close to real time as possible so that we can see how the Internet changes and evolves.

We have named our infrastructure Muninn, after one of Odin’s two ravens. The 48th verse of the ancient Nordic poem describes the ravens as follows:

Huginn ok Muninn
fljúga hverjan dag
jörmungrund yfir;
óumk ek Hugin,
at hann aftr né komi,
þó sjáumk ek meir of Munin. [26]

Huginn and Muninn are said to fly over the world and return with news to Odin each and every morning. They ensured Odin always knew what was happening in the world of men. This reflects the design of our scanner Muninn, which scans the Icelandic Internet and observes changes happening every day. Additionally, the scanner should be able to find and monitor specific services, ports or IPs more closely if deemed important or interesting.

As mentioned before, vulnerable services are easy to abuse. As important as this kind of research is, it is equally important to be ethical and careful when gathering the information. For this reason we have included a section on ethics in this report.

2 Background

Services such as Shodan constantly scan the Internet, yet few studies have looked into the idea of monitoring the Internet from a security perspective. Actively watching for vulnerable services and monitoring their updates is not common; however, there is some similar work in this field which relates to ours. In this section we present the background material that our contribution builds on, starting with some standards and definitions.

2.1 Standards and Definitions

In this section we present key concepts that are needed to understand the research that Muninn is built upon.

2.1.1 IP Addresses, ports and CIDR

An IP address is used to identify servers that host the various Internet services. However, due to Network Address Translation (NAT), we cannot be sure whether there is a single host or multiple devices behind a single IP address. That is why we will mostly refer to IP addresses, although this should be thought of as interchangeable with hosts or networks. It should also be noted that IP addresses are not always constant. Much like people moving their home address, machines can move from one IP address to another.

CIDR is a method of representing IP addresses. It introduces allocation of addresses to organizations [6], which can be very handy as we will find out in section 3.1. Finally, each IP address has a total of 65,536 ports available for hosting.
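As a brief illustration of CIDR notation, the following sketch uses Python's standard ipaddress module. The block 192.0.2.0/24 is a reserved documentation range chosen here purely as an example; it is not one of the Icelandic ranges from the scans.

```python
import ipaddress

# Hypothetical example block (reserved documentation range), not a scan target.
block = ipaddress.ip_network("192.0.2.0/24")

# A /24 prefix leaves 8 host bits, so the block covers 2**8 addresses.
print(block.num_addresses)                            # 256
print(block[0], block[-1])                            # 192.0.2.0 192.0.2.255
print(ipaddress.ip_address("192.0.2.77") in block)    # True

# Every address has 65,536 ports, so a full port scan of this block
# would have to probe this many (address, port) pairs.
PORTS_PER_ADDRESS = 2 ** 16
print(block.num_addresses * PORTS_PER_ADDRESS)        # 16777216
```

The last figure hints at why scanning the full port range is so much slower than scanning a short list of ports.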

2.1.2 Port Scanners

Each Internet service must listen for incoming connections on a specific port. A port scanner is an application that is used to scan the Internet for open ports and even poke around to figure out what service is running behind a port. There are two main methods that port scanners typically use.

The first is a quick and rather uninformative method called a connectionless scan. It works by sending a SYN packet and abandoning the connection once the response packet is received. If the response is SYN-ACK the port is open, and if the response is RST the port is closed. The second method is called a connection-oriented scan since it finishes the TCP handshake. This method is slow but can give a lot of information about the services running behind a port. We use both of these methods, as they are embedded in the following tools.

Masscan is a free and open source port scanner [10]. Masscan uses its own TCP/IP stack, has similar output to Nmap and can SYN scan the entire Internet in under 6 minutes. We use Masscan as a SYN scanner to determine all open ports; afterwards we do a connection-oriented scan on the open ports that Masscan returned.

Nmap (Network Mapper) is also a free and open source application for port scanning and service discovery [14]. We use Nmap as a connection-oriented scanner. Nmap produces a list of scanned targets with information about which ports were open, along with the protocol, service and state of each port. Nmap has a scripting engine that allows users to write scripts that automate a wide variety of network tasks. We use open source scripts that come with Nmap to gather information about what is running on a host.
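The decision rule of the connectionless scan can be sketched as a small function. This is only an illustration of the rule described above, not code from Masscan or Nmap; the function name and the "no response" label are assumptions made for this example.

```python
# TCP flag bits as defined in the TCP header.
SYN = 0x02
RST = 0x04
ACK = 0x10

def classify_syn_response(flags):
    """Map the flags of the reply to a SYN probe onto a port state.

    flags is the flag byte of the reply, or None if no reply arrived.
    """
    if flags is None:
        # Assumption: silence is reported separately, since a firewall
        # may simply be dropping the probe.
        return "no response"
    if flags & RST:
        return "closed"
    if flags & SYN and flags & ACK:
        return "open"
    return "unknown"

print(classify_syn_response(SYN | ACK))  # open
print(classify_syn_response(RST | ACK))  # closed
print(classify_syn_response(None))       # no response
```

Note that "open" here only means a SYN-ACK was received. As section 2.1.5 explains, a firewall can send that reply on behalf of a port that has no service behind it.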

2.1.3 Banners and Banner Grabbing

A banner in our context refers to the information received from a remote host about the service that it is running on some specified port. This information can contain the name of the service, its version number, the operating system it is running on and more. Banner grabbing refers to the act of collecting this information from a remote host. This is done by connecting to the host and sending various messages in the hope of triggering a response which gives away the aforementioned information. Once the banners have been grabbed they can be processed further, for example by extracting the CPEs discussed in section 2.1.6.

2.1.4 Banner Grabbing Assumptions

The banners that we are working with are provided by the hosts that we scan. Because this is a third party we cannot be sure that the banners give the absolute truth. It is possible that the banners we receive are not giving the whole picture or are even entirely false. This is the best information we have to work with whilst also being passive in our scanning. It should also be noted that this is the exact same information that malicious people such as hackers would use to their advantage when looking for vulnerable services.

2.1.5 Firewall

A firewall is a network security system which monitors incoming and outgoing traffic. Firewalls can filter incoming traffic according to many different rules, some more specific than others. Systems which have firewalls running can keep the port scanner from acquiring information about the services that are running, or even fool it into thinking that a closed port is open. This behaviour can trick the scanner into trying to scan thousands of closed and unresponsive ports, which results in a significant slowdown. Later on we will discuss what can be done to mitigate these consequences.

2.1.6 Common Platform Enumeration (CPE)

In Common Platform Enumeration: Naming Specification Version 2.3 [5], a CPE is described in the following way.

Common Platform Enumeration (CPE) is a standardized method of describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise’s computing assets. CPE can be used as a source of information for enforcing and verifying IT management policies relating to these assets, such as vulnerability, configuration, and remediation policies. IT management tools can collect information about installed products, identify products using their CPE names, and use this standardized information to help make fully or partially automated decisions regarding the assets.

A well-formed CPE name (WFN) is a representation of a product. A WFN can include the name of the vendor, the name of the product, the version number and the type of product: "a" for applications, "o" for operating systems or "h" for hardware devices. Here is an example of a WFN representing Apache HTTP Server version 2.2.24:

wfn:[part="a",vendor="apache",product="http_server",version="2.2.24"]

A WFN can then be turned into a machine readable encoding by a binding procedure. There are two binding procedures defined in the CPE Naming Specification: one produces a Uniform Resource Identifier (URI) and the other a formatted string. We use the URI binding to turn the WFN into a machine readable encoding. The example above would become:

cpe:/a:apache:http_server:2.2.24
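The step from the WFN to the URI above can be sketched as follows. This is a deliberately simplified illustration that handles only the four attributes used in the example; the function name is hypothetical, and the full binding procedure in the specification also covers further attributes and the encoding of special characters, which this sketch does not attempt.

```python
def bind_wfn_to_uri(part, vendor, product, version):
    """Join four WFN attributes into the form cpe:/part:vendor:product:version."""
    if part not in ("a", "o", "h"):
        raise ValueError("part must be 'a', 'o' or 'h'")
    return "cpe:/" + ":".join([part, vendor, product, version])

print(bind_wfn_to_uri("a", "apache", "http_server", "2.2.24"))
# prints cpe:/a:apache:http_server:2.2.24
```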

2.1.7 Common Vulnerabilities and Exposures (CVE)

The Common Vulnerabilities and Exposures list was created with the purpose of standardizing vulnerability naming. Before CVE became a standard, different groups of people used different conventions for naming vulnerabilities. This quickly became confusing. The CVE list is designed to be a dictionary where each vulnerability has one name and a standardized description. The list of CVEs gathered over the years is freely accessible on the Internet [15].

2.2 Internet Census 2012

In 2012 an anonymous researcher decided to scan the Internet for services with easy to guess default passwords. After some time he found thousands of insecure devices and decided to upload a simple binary to these devices to create a distributed port scanner that looked for more insecure telnet services. The number of insecure devices under his control grew exponentially, eventually giving him control of a botnet comprising around 420 thousand devices. This botnet is known as the Carna Botnet [2].

After gaining control of all of these devices the researcher uploaded a downsized version of Nmap to each one and started to scan the entirety of the Internet. The researcher was now able to scan the entire IPv4 range within an hour and in doing so produce a map of the whole Internet in a way nobody had done before. This gave people a complete perspective of the state of the Internet at that time.

The researcher states multiple times in his paper that he did this in the most ethical way he could. He did this by not changing or disturbing the normal flow of the system and only allowing the binaries he uploaded to stay on a device for a couple of days at a time. Even though the researcher thought himself to be ethical and did not do anything malicious, it is still illegal to gain unauthorized access to devices. Trying to replicate work like this is out of the question for this reason.

Even though the methods used to conduct the research are illegal, it gave people, for the first time, a way to look at the exact state of the Internet on a specific date, and it serves as an important milestone of Internet scanning. However, it did not monitor the Internet and how it changes over time.

2.3 Shodan

Shodan was launched in 2009 by a programmer named John Matherly. It is the world’s first search engine for Internet connected services, sometimes called the search engine for service banners. Shodan keeps an up to date state of the Internet by conducting regular port scans. After paying a subscription fee, users of Shodan can search for all kinds of devices including webcams, routers, traffic lights, home heating systems, servers and much more [25]. The people behind Shodan have even done vulnerability assessment, the most well known being the assessment of the Heartbleed vulnerability [24]. Users can also do their own basic vulnerability assessment by, for example, looking up how many devices are open to the Internet with default passwords. However, this is not Shodan’s focus; its main focus is being a simple search engine for Internet connected services.

2.4 Application for Historical Service Assessment (AHSA)

In 2015 B. Genge et al. [9] proposed AHSA, an application that communicates with the Shodan API and retrieves historical data on a list of target services. Their research is quite similar to ours: their technique is passive like ours and gathers historical data on services just like ours. However, they were only interested in the potential of inferring the lifetime of different Internet-facing services. We are more interested in monitoring services to show how the vulnerability of the Internet develops from one state to another over time. Another difference is that they used Shodan while we perform our own scans.

2.5 Heimdallur

This project is based on a BSc final project from 2017 called Heimdallur [11]. The Heimdallur project involved creating a port scanner to scan the Icelandic Internet and connect the extracted information to CVEs. The CVEs were then used to indicate the potential vulnerability of each scanned system. This existing infrastructure was used and expanded upon in this project.

The previous group managed to perform three scans, two of which scanned the top 1000 ports whilst the third scanned the top 7,300 ports. After the project finished one of the group members initiated another scan which covered the whole port range. This means they gathered a lot of data that has yet to be analyzed. This fact was indeed discussed in their report, under the future work chapter, along with other much smaller goals. The analysis of this data is foundational to our work and the contribution described in this report.

3 Analysis of the Icelandic Internet

On December 20th 2017 a scan of the full port¹ range for all IP addresses in Iceland was initiated using Heimdallur. This scan, referred to here as Scan 9, took just over 6 months to finish and it provided some useful and previously unavailable information. When we started to look at the data from Scan 9 we noticed some alarming facts.

¹ Note that whenever we talk about ports we are always referring to TCP ports unless stated otherwise.

1. Around 78% of the open ports were actually closed/unresponsive and had been reported open by a firewall.

2. Only around 0.45% of the "open" services returned banners.

3. Scanning the entire Icelandic IP range took around 6 months, which is way too long for practical purposes.

In this section we discuss our findings from analysing the information gathered by the scanner Heimdallur. For this purpose we used data from old scans already in the database and also conducted some new scans ourselves. First we show how we found the aforementioned alarming facts, then we talk about what data is important and what we are actually interested in. Finally we discuss in depth our findings from comparing two consecutive scans, Scan 13 and Scan 15.

3.1 Problematic Firewalls

The Heimdallur project mentioned firewalls and network level access control as limitations of the project. When we took a look at the data from Scan 9 we noticed a pattern in the number of responses from each port that seemed highly unlikely. Counting the number of responsive IP addresses for each port showed us that 47,833 ports responded around 620 to 680 times. Only 3,109 ports had a response count outside of this range. This indicated that there was a group of around 640 IP addresses that responded as open on almost all ports yet never returned a banner. After consulting with experts in the field we figured out that these responses stemmed from a giant firewall. When looking into the associated CIDR we found that these IP addresses all belonged to the Icelandic Internet service provider Síminn.

Scan 9 returned approximately 32.6 million results, each corresponding to an open port in Iceland. Approximately 30.6 million of these results belonged to the aforementioned group of IP addresses. This indicated that there is a lot less to gain from scanning the full port range than we initially thought. This is discussed further in section 3.2.
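The pattern described above can be looked for with a few lines of code. The sketch below is not the analysis that was actually run on the Heimdallur database. It assumes the scan results are available as (ip, port) pairs plus a mapping from (ip, port) to banner text, and the 0.7 threshold is an arbitrary illustrative value.

```python
from collections import Counter

def ips_open_on_most_ports(results, banners, total_ports, min_fraction=0.7):
    """Return IPs that report most ports open yet never returned a banner.

    results:  iterable of (ip, port) pairs reported open
    banners:  dict mapping (ip, port) to the banner text ("" if empty)
    """
    open_ports_per_ip = Counter(ip for ip, _port in set(results))
    ips_with_banner = {ip for (ip, _port), text in banners.items() if text}
    return {
        ip
        for ip, n_open in open_ports_per_ip.items()
        if n_open / total_ports >= min_fraction and ip not in ips_with_banner
    }

# Tiny synthetic example with a 10-port "port range".
results = [("192.0.2.1", p) for p in range(1, 11)] + [("192.0.2.2", 80)]
banners = {("192.0.2.2", 80): "product: nginx"}
print(ips_open_on_most_ports(results, banners, total_ports=10))
# prints {'192.0.2.1'}
```

An address flagged this way is only a candidate; in our case it took consultation with experts to confirm that a firewall was responsible.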

3.1.1 Attempted mitigations

In light of the impact that the firewall had on the speed of scanning and the complete lack of data received from those IP addresses, we decided to attempt to detect firewalls in our scans. We hoped this would save time and make our scanner more scalable for future potential firewalls. We attempted well known firewall detection methods and hoped that Síminn had configured their firewall badly. Here are the methods we attempted and a short explanation of each.

1. Sending invalid TCP checksums: Normally if you send an invalid TCP checksum the host will not reply. However, there are cases where firewalls do not verify the packets and simply always reply. This did not work in our case since the firewall verified the packets.

2. Checking for shorter than expected round trip times: This can work since firewalls are at least one hop closer than the target host; however, the time difference was negligible and unstable.

3. Checking for TTL consistency: By tracing the route we can look for inconsistent jumps, or the firewall might even be noticeable in the route. This method did show promise but the results could only be interpreted by manual inspection, and automating the process proved difficult enough to be out of scope.

After our failed attempts we decided not to linger on this and simply exclude the Síminn firewall IP range from all future scans. This reduced the duration of a full port range scan of Iceland to around 4 months, which is still too slow for our purposes.
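Excluding a range from the target list can be sketched with Python's standard ipaddress module. The networks below are reserved documentation ranges used as stand-ins; the real Síminn range is deliberately not shown, and the helper name is hypothetical.

```python
import ipaddress

def exclude_range(targets, excluded):
    """Return the networks in targets with the excluded network carved out."""
    remaining = []
    for net in targets:
        if excluded.subnet_of(net):
            # Split net into the pieces that do not cover the excluded range.
            remaining.extend(net.address_exclude(excluded))
        elif not net.overlaps(excluded):
            remaining.append(net)
        # Otherwise net lies entirely inside the excluded range and is dropped.
    return sorted(remaining)

targets = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
firewall_range = ipaddress.ip_network("192.0.2.128/25")  # hypothetical
for net in exclude_range(targets, firewall_range):
    print(net)
# prints 192.0.2.0/25 and then 198.51.100.0/24
```

Port scanners typically accept an exclusion list directly, so in practice a list like this would be handed to the scanner rather than applied by hand.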

3.2 Scanning the full port range

Scan ID | Ports                               | Banners    | Non-empty banners | Unique banners | Unique CPEs | Start time | Time taken
5       | Nmap top 1,000 ports                | 602,296    | 53,030            | 4,784          | 1,345       | 2017-10-27 | 7 days 02:05:00
7       | Nmap top 7,342 ports                | 607,743    | 66,680            | 4,667          | 1,320       | 2017-11-29 | 11 days
8       | Nmap top 1,000 ports with additions | 632,937    | 77,891            | 4,673          | 1,314       | 2017-12-11 | 2 days
9       | All ports                           | 32,592,961 | 148,110           | 3,807          | 1,107       | 2017-12-20 | 185 days 11:00:00
13      | Top 1024 open ports in scan 9       | 473,340    | 95,119            | 3,777          | 1,059       | 2019-02-11 | 6 days 13:15:00
15      | Top 1024 open ports in scan 9       | 595,725    | 76,001            | 3,687          | 1,039       | 2019-04-18 | 9 days 00:02:00
16      | Top 1024 responding ports in scan 9 | 679,867    | 86,563            | 3,538          | 996         | 2019-05-03 | 5 days 17:13:00

Table 1: Heimdallur scans used and mentioned in this report. They are also used as baseline scans for Muninn. Scan 13 and Scan 15 were used for analysing the Scan Delta in section 3.5. The additions of Scan 8 included 12 commonly responding ports in Iceland.

The Heimdallur project [11] limited its scope to scanning only 1,012 ports, arguing that scanning the most active ports would still provide the bulk of the available information in a fraction of the time. By analyzing the data from Scan 9 we could put this claim to the test on the set of Icelandic IP addresses. Although the scan of all 65,536 ports returned 32,592,961 results, only 148,110 (0.45% of the total) gave a non-empty banner. Scanning only the top 1,024 most open ports (Scan 13) gave 95,119 non-empty banners but still provided 3,777 unique banners, or 99.2% of the unique banners found by scanning the full port range. Although the full port scan found more non-empty banners than the top 1,024 port scan, this difference can largely be explained by the presence of what we assume to be honeypots² and of IPs with all ports open that return the same banner from every single port. Finally, there is a negligible difference between the number of unique banners. This indicates that the top 1,024 port scan gives an accurate view of the state of the Icelandic Internet but in a much more reasonable timeframe.

Because Scan 9 took 6 months to finish, the data does not accurately represent a snapshot of the Internet at a specific moment in time. The scan results trickle in over a long period of time, so two banners can be from completely different time periods. As discussed in section 3.5, the Internet changes rapidly and thus such a slow scanner is very limited in its uses.

² Honeypots are servers set up with seemingly vulnerable services. They are used to attract, detect and monitor hackers scouring the Internet for easy targets [7].

Top 10 most responsive ports
Our scans | Nmap
7547      | 80
80        | 23
443       | 443
22        | 21
8080      | 22
21        | 25
25        | 3389
53        | 110
23        | 445
2000      | 139

Table 2: Top 10 most responsive TCP ports according to previous scans compared with Nmap’s default top 10 TCP ports [17].
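Scans 13 and 15 in Table 1 target the top 1,024 open ports found by Scan 9. One way such a port list could be derived from a full scan is sketched below; the function name and the toy data are assumptions for illustration, and the last lines simply recompute the two percentages quoted in this section from the Table 1 figures.

```python
from collections import Counter

def top_open_ports(results, n):
    """Return the n ports reported open on the most distinct IP addresses."""
    counts = Counter(port for _ip, port in set(results))
    return [port for port, _count in counts.most_common(n)]

# Toy data: (ip, port) pairs from a hypothetical full port range scan.
results = [
    ("192.0.2.1", 80), ("192.0.2.2", 80), ("192.0.2.3", 80),
    ("192.0.2.1", 443), ("192.0.2.2", 443),
    ("192.0.2.1", 7547),
]
print(top_open_ports(results, 2))  # [80, 443]

# Shares quoted in the text, recomputed from Table 1.
print(f"{148_110 / 32_592_961:.2%}")  # 0.45% of Scan 9 results had a non-empty banner
print(f"{3_777 / 3_807:.1%}")         # 99.2% of Scan 9's unique banners seen in Scan 13
```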

3.2.1 Most responsive ports

Despite its flaws the information obtained from Scan 9 is still valuable. For example, it can show us the most responsive ports³ in Iceland, the number one being port 7547. This port is not included in Nmap’s top 1,000 ports but hosts around 25,000 Huawei TR-069 consumer routers in Iceland. When comparing the top 10 ports, see table 2, we see that Nmap’s top 10 list is quite close to being correct for Iceland, differing by 4 ports [17]. However, it is clear that if the scanner were to be expanded to other countries, scanning the full port range at least once is an important step to reveal anomalous ports like 7547.

Port 7547 returned, among other things, a banner indicating that the service behind the port is made for routers and follows the TR-069 protocol. This protocol allows routers, which are referred to as Customer Premises Equipment, to receive remote firmware updates, vendor configuration files, commands to update software modules and more [8].

³ We define the most responsive ports as those that respond most frequently with a non-empty banner.

3.3 Banners versus CPEs

The preliminary project Heimdallur extracted CPEs from banners to be able to connect services with vulnerabilities. This extraction, however, was not perfect: it could only extract some CPE from 76% of the banners, which were produced in a controlled environment. This roughly corresponds with the data from Scan 9, where some CPE could be extracted from around 81.9% of all banners. However, in the preliminary project it was discovered that the CPE extraction provided by Nmap was only accurate for about 30% of the services [11].

Although only 30% of the CPEs are accurate, the other 70% can still give us an idea of what is behind the banner. In figure 1 we can see that the most common non-empty CPE is cpe:/o:microsoft:windows, which is very general. The third most extracted CPE is cpe:/a:microsoft:iis:8.5, which is more specific and enables further inspection of the service at hand. Because the banners are unstructured, the CPEs can enable us to compare the state of the Internet more easily. However, significant information gets lost during this extraction. Often version numbers, important information or even the whole banner is removed. For example, figure 1 shows that the most common banner, the aforementioned router using the TR-069 protocol, is nowhere to be found in the top 10 CPEs. When using CPEs to represent states we must be aware that they may in fact be too unspecific and contain too little information.

Figure 1: Bar chart containing the top 10 most common banners on the left and the top 10 most extracted CPEs on the right.

The raw banners contain a lot of data but, as mentioned before, they are mostly unstructured, making them more difficult to analyse and compare. There are some banners that may have the same format but do not contain the exact same information. For instance we have the banners product: Hikvision IP camera httpd devicetype: webcam and Hikvision camera httpd devicetype: webcam. No CPEs were extracted from these banners but by looking at them they seem to describe the same service.

In the end we decided to focus on banners rather than CPEs. The information lost by the CPE extraction was simply too great to justify. Using the CPEs was important for the Heimdallur project since it allowed them to connect services to vulnerabilities, but this was not the focus of our research.
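To make the comparison problem concrete, the sketch below splits a banner of the form shown above into labelled fields. It is an illustration only, not the processing used in Muninn: the list of field names is an assumption based on the banners quoted in this report, and any text before the first recognised label is kept under a made-up "unlabeled" key.

```python
import re

# Assumed field labels; only the ones seen in the quoted banners plus a few
# commonly reported service attributes.
FIELD_KEYS = ("product", "version", "extrainfo", "hostname", "ostype", "devicetype")
_FIELD_RE = re.compile(r"\b(" + "|".join(FIELD_KEYS) + r"):\s*")

def parse_banner(banner):
    """Split a flat banner string into a dict of labelled fields."""
    parts = _FIELD_RE.split(banner)
    fields = {}
    lead = parts[0].strip()
    if lead:
        fields["unlabeled"] = lead
    # After the leading text, parts alternates between label and value.
    for key, value in zip(parts[1::2], parts[2::2]):
        fields[key] = value.strip()
    return fields

a = parse_banner("product: Hikvision IP camera httpd devicetype: webcam")
b = parse_banner("Hikvision camera httpd devicetype: webcam")
print(a)
print(b)
print(a["devicetype"] == b["devicetype"])  # True
```

Even after parsing, the two product descriptions differ as strings, so deciding that they describe the same service still needs fuzzy matching or human judgement.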

3.4 Noteworthy banners

While analysing the data some banners in the system stood out. The Hikvision camera httpd devicetype: webcam banner mentioned earlier draws attention thanks to the words camera and webcam. They indicate that this is indeed a webcam open to the Internet. For ethical reasons we did not attempt to access the camera, but a cyber criminal with the same information could attempt to access the camera feed. Another noteworthy banner is

product: Apple Time Capsule AFP extrainfo: name: anonym-name; protocol 3.3 hostname: XXX.XXX.XXX.XXX devicetype: storage-misc ⁴

An Apple Time Capsule is a product from Apple that allows the owner to back up other products from the company wirelessly via Wi-Fi. According to the Setup Guide it is only meant for devices on your own network, so we find it intriguing that it is open to the Internet and even sends out banners that contain the name of the device [3]. This raises the question of why these devices are open to the Internet. Again, to malicious eyes this can be a potential access point into networks and devices. Exposing the service to the Internet extends the attack surface of the network the Time Capsule is located in.

⁴ For ethical reasons we are not showing the true hostname and name of the device in question.

3.5 Scan Delta

We wanted to see if meaningful changes could be detected by comparing two identical scans conducted at different times. For this purpose Scan 13 and Scan 15 were run 70 days apart and the resulting data was compared. The first observation of interest is the difference in banner counts between the scans, as can be seen in table 3. Scan 15 returned 26% more total banners yet 20% fewer non-empty ones. Scan 13 also had slightly more unique banners and CPEs but the difference is less drastic. There are three plausible explanations for these differences.

1. Some companies or individuals may have noticed the increased scanning activity from our IP address and blacklisted it. We did not resort to spoofing our IP address, so this might have resulted in a loss of banner information.

2. During Scan 15 the connection oriented scanner (Nmap) froze for three days without us noticing. This caused a time gap between the server-discovery phase and the banner grabbing phase, which means that a target service found open may already have closed its port by the time the banner grabbing was done.

3. Finally, the most likely reason is simply that the set of connected devices is ever changing.

| Scan ID | Ports | Banners | Non-empty Banners | Unique Banners | Unique CPE | Start time | Time taken |
|---|---|---|---|---|---|---|---|
| 5 | Nmap top 1,000 ports | 602,296 | 53,030 | 4,784 | 1,345 | 2017-10-27 | 7 days 02:05:00 |
| 7 | Nmap top 7,342 ports | 607,743 | 66,680 | 4,667 | 1,320 | 2017-11-29 | 11 days |
| 8 | Nmap top 1,000 ports with additions | 632,937 | 77,891 | 4,673 | 1,314 | 2017-12-11 | 2 days |
| 9 | All ports | 32,592,961 | 148,110 | 3,807 | 1,107 | 2017-12-20 | 185 days 11:00:00 |
| 13 | Top 1024 open ports in scan 9 | 473,340 | 95,119 | 3,777 | 1,059 | 2019-02-11 | 6 days 13:15:00 |
| 15 | Top 1024 open ports in scan 9 | 595,725 | 76,001 | 3,687 | 1,039 | 2019-04-18 | 9 days 00:02:00 |
| 16 | Top 1024 responding ports in scan 9 | 679,867 | 86,563 | 3,538 | 996 | 2019-05-03 | 5 days 17:13:00 |

Table 3: Heimdallur scans used and mentioned in this report. They are also used as baseline scans for Muninn. Scan 13 and Scan 15 were used for analysing the Scan Delta in section 3.5. The additions of Scan 8 included 12 commonly responding ports in Iceland.

3.5.1 Banners rate of change

Out of all the ports which were reported as open in Scan 13, 84.1% were still open 70 days later. However, this number goes down to 67% when only considering non-empty banners. This indicates that the set of banners changes quite rapidly, despite the fact that the number of unique banners barely changes between consecutive scans, as can be seen in table 3.
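As an illustration, a comparison of this kind can be sketched in Python as follows. This is a minimal sketch rather than the code used in the project. It assumes that each scan is available as a mapping from (IP, port) to a banner string, with an empty string standing for an open port that returned no banner. The function name, the data layout and the sample addresses are hypothetical, and the exact definition used for the non-empty case is an assumption.

```python
def persistence(old_scan, new_scan, non_empty_only=False):
    """Fraction of the entries in old_scan that are still present in new_scan.

    Each scan maps (ip, port) -> banner. An empty string means the port
    responded as open but returned no banner (an assumed representation).
    """
    if non_empty_only:
        old_keys = {key for key, banner in old_scan.items() if banner}
        new_keys = {key for key, banner in new_scan.items() if banner}
    else:
        old_keys = set(old_scan)
        new_keys = set(new_scan)
    if not old_keys:
        return 0.0
    return len(old_keys & new_keys) / len(old_keys)


# Made-up example data (documentation address range), not real scan results.
scan_a = {
    ("192.0.2.1", 80): "Apache httpd",
    ("192.0.2.1", 22): "OpenSSH version: 7.7",
    ("192.0.2.2", 7547): "",
    ("192.0.2.3", 3306): "MySQL version: 5.6.41",
}
scan_b = {
    ("192.0.2.1", 80): "Apache httpd",
    ("192.0.2.1", 22): "",
    ("192.0.2.2", 7547): "",
}

all_ports = persistence(scan_a, scan_b)
with_banner = persistence(scan_a, scan_b, non_empty_only=True)
print(all_ports, with_banner)
```

In the toy data three of the four open ports persist, but only one of the three non-empty banners does, which mirrors the gap between the two percentages reported above.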

3.5.2 Types of banner changes

Just as there is no way to verify with certainty that the banner extracted from a service is correct, there is no way to verify that the underlying service changes when a banner does. However, when a banner changes it is highly likely to be the result of some change in the service. The nature of the banner change can often strongly hint towards the changes that have been made. Table 4 shows examples of interesting banner changes between Scan 13 and Scan 15 as well as how often these specific changes happened. These banner changes are discussed below.

| # | Count | Scan 13 banner | Scan 15 banner |
|---|---|---|---|
| 1 | 368 | Huawei TR-069 remote access devicetype: broadband router | |
| 2 | 5 | 3CX PhoneSystem PBX version: 15.5.15502.6 ostype: Windows | extrainfo: SIP end point; Status: 200 OK |
| 3 | 5 | Apache httpd | Apache httpd version: 2.4.29 extrainfo: (Ubuntu) |
| 4 | 2 | version: 1.6.2 | nginx |
| 5 | 3 | MySQL version: 5.6.43 | MySQL extrainfo: unauthorized |
| 6 | 1 | MySQL extrainfo: unauthorized | MySQL version: 5.6.16-1~exp1 |
| 7 | 6 | | Hikvision IP camera httpd devicetype: webcam |
| 8 | 1 | Axis 210 Network Camera ftpd version: 4.40.2 extrainfo: Nov 04 2008 devicetype: webcam | |
| 9 | 2 | DD-WRT milli_httpd hostname: myndavel 01 | |
| 10 | 2 | Microsoft SChannel TLS ostype: Windows | Microsoft-HTTPAPI/2.0 |
| 11 | 4 | MailEnable smptd version: 9.75– ostype: Windows hostname: HOVUS.home | MailEnable smptd version: 10.12– ostype: Windows hostname: HOVUS.home |
| 12 | 1 | Cisco ASA SSL VPN | Microsoft IIS httpd version: 8.0 ostype: Windows |

Table 4: Banner changes from Scan 13 to Scan 15

1. The Huawei router banner is by far the most common banner in Iceland. These routers, which are very common in people’s homes, almost always have port 7547 open to the Internet. When observing consecutive scans it is common to see many such routers appear and disappear.

2. PBX phone systems are a profitable target for hackers, who use the connected phones to call their own pay-per-minute numbers by the thousands. Here we can see a PBX system in Scan 13 which changes to a cryptic response in Scan 15. It is uncertain whether the underlying phone system has changed, but the SIP endpoint hints that the PBX system remains in place [19].

3-4 Show how banners can become more specific or less specific when changed.

5-6 Show cases where our scanning behavior might trigger firewalls on open SQL servers, which then block the connection and prevent further information from being returned.

7-9 Show webcams appearing and disappearing in our scans. For security reasons webcams open to the Internet are especially concerning. Item 7 is even more so, as it shows 6 different ports that previously responded as open in our scans but displayed no further information. In a follow-up scan the cameras suddenly appear, possibly due to some configuration change or firmware update. The owners of said cameras might not be aware of their sudden appearance as open devices on the Internet.

10 Shows a change where both banners are similar but not equal. The exact nature of the change is hard to determine.

11 Indicates a simple version upgrade on a MailEnable server.

12 These banners are hard to analyse exactly but suggest a complete service change.

3.5.3 Version updates

One of our main goals in comparing Scan 13 and Scan 15 was to analyse whether we could observe version number changes between the scans. Knowing how and when version numbers change can be crucial following the discovery of a new vulnerability, as it can indicate how many systems are vulnerable at any given time and how long it took them to move to a safe version. Out of all the banners in Scan 13, 20,516 included version numbers and 1,321 (6.4%) of those had changed 70 days later. In total we saw 401 unique version updates. Some examples of version number updates are listed in table 5. The most obvious of these is the massive number of MySQL servers which were updated to version 5.6.43. This version was released on the 21st of January 2019, roughly a month before we started scanning, and included security updates to the linked OpenSSL library [18].

| # | Count | Scan 13 banner | Scan 15 banner |
|---|---|---|---|
| 1 | 276 | MySQL version: 5.6.41 | MySQL version: 5.6.43 |
| 2 | 17 | MySQL version: 5.6.39 | MySQL version: 5.6.43 |
| 3 | 26 | Apache httpd version: 2.4.38 extrainfo: (cPanel) OpenSSL/1.0.2q mod_bwlimited/1.4 | Apache httpd version: 2.4.39 extrainfo: (cPanel) OpenSSL/1.0.2r mod_bwlimited/1.4 |
| 4 | 24 | nginx version: 1.13.12 | nginx version: 1.15.9 |
| 5 | 9 | Kerio Connect imapd version: 9.2.8 | Kerio Connect imapd version: 9.2.8 patch 1 |
| 6 | 4 | Kerio Connect imapd version: 9.2.8 patch 1 | Kerio Connect imapd version: 9.2.9 |
| 7 | 4 | AirTunes rtspd version: 375.3 | AirTunes rtspd version: 380.20.1 |
| 8 | 3 | MikroTik router ftpd version: 6.43.7 hostname: zurich5.fink.network devicetype: router | MikroTik router ftpd version: 6.44 hostname: zurich5.fink.network devicetype: router |
| 9 | 3 | OpenSSH version: 7.7 extrainfo: protocol 2.0 | OpenSSH version: 7.9 extrainfo: protocol 2.0 |
| 10 | 3 | KiwiSDR_Mongoose/1.264 | KiwiSDR_Mongoose/1.282 |
| 11 | 1 | aiohttp version: 3.5.4 extrainfo: Python 3.6 | Python/3.7 aiohttp/3.5.4 |
| 12 | 1 | Apache httpd version: 2.4.33 extrainfo: (Unix) | Apache httpd version: 2.4.34 extrainfo: (Unix) |

Table 5: Examples of banners indicating version updates between the 17th of February and the 27th of April
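One way to detect version updates such as those in table 5 is to split each banner into a product part and a version part and compare the two scans. The following Python sketch is illustrative only and is not the method used in the project. It assumes the "version:" label and the field labels seen in the banners quoted in this report, and the function names are hypothetical. Banners that carry the version in another form, such as KiwiSDR_Mongoose/1.264, are not recognised by this rule.

```python
import re
from collections import Counter

# Field labels as they appear in the banners quoted in this report.
FIELDS = ("extrainfo", "ostype", "hostname", "devicetype")
VERSION_RE = re.compile(
    r"^(?P<product>.*?)\s*version: (?P<version>.*?)(?= (?:%s):|$)" % "|".join(FIELDS)
)


def parse(banner):
    """Return (product, version), or None if the banner has no version field."""
    match = VERSION_RE.search(banner)
    if match is None:
        return None
    return match.group("product"), match.group("version")


def version_update(old_banner, new_banner):
    """Return (old_version, new_version) when the same product reports a new version."""
    old, new = parse(old_banner), parse(new_banner)
    if old is None or new is None:
        return None
    if old[0] != new[0] or old[1] == new[1]:
        return None
    return old[1], new[1]


# Banner pairs taken from tables 4 and 5.
pairs = [
    ("MySQL version: 5.6.41", "MySQL version: 5.6.43"),
    ("MySQL version: 5.6.41", "MySQL version: 5.6.43"),
    ("Kerio Connect imapd version: 9.2.8", "Kerio Connect imapd version: 9.2.8 patch 1"),
    ("MySQL version: 5.6.43", "MySQL extrainfo: unauthorized"),
]
updates = Counter(u for u in (version_update(a, b) for a, b in pairs) if u)
print(updates)
```

Only the product name is required to match, so a pair where other fields also change (for example the OpenSSL string in row 3 of table 5) still counts as a version update under this rule.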

4 Design and Implementation of Muninn

4.1 Design

From the discoveries made in the preliminary data analysis phase our research question was born: can we monitor changes of the Icelandic Internet in semi real time through repeated port scanning?⁵ Heimdallur is a great system for checking the current status of the Internet; however, we sought the ability to monitor the changes of the Internet over time. To succeed in this we had to design a system which can scan in less than a day and do so repeatedly. This is why we devised Muninn, a new system which is a branch of Heimdallur.

There are two main goals we wanted to achieve with Muninn. Firstly, the ability to quickly get a reasonably accurate, updated state of the Icelandic Internet so that such states can be gathered daily. Secondly, if the need arises, we wanted Muninn to be able to monitor very specific services by using the historical scan information to track them. It was also very important that Muninn be non-invasive, ethical and scalable.

4.1.1 Motivation of Muninn

As shown in section 3.5 there is a wealth of changes observable when comparing consecutive scans, both general changes and version updates. However, the changes observed could have happened at any time between the 17th of February and the 27th of April, a 70 day time period. Even if we ran Heimdallur constantly we could at best complete a scan every week. As discussed in section 3.5.1 the Internet changes fast and any new information is only as relevant as it is new. Time is probably the most important context for changes. Only by knowing when changes occur can we start to understand why they occurred and what that implies. By monitoring the Icelandic Internet we can see what changes happen and also when they do.

4.1.2 How is Muninn different from Heimdallur

At its core Heimdallur discovers open services on the Internet and then extracts some information from them. Muninn on the other hand receives information about the last known state of the Internet and monitors some areas of interest. This can be anything from the whole set of responding ports to only specific versions of mail servers. Muninn relies on Heimdallur to do the initial discovery of the Internet for it, but it is capable of scanning much more quickly and of tracking changes in a way Heimdallur never could. However, because of this, the quality of the information gathered by Muninn is directly tied to the age of the last discovery scan by Heimdallur (from here on called the baseline scan). We will explore this connection in detail in section 5.1.1.

⁵ Note that by "semi" real time we mean that we can complete a reasonably accurate scan of the Internet in less than 12 hours.

4.1.3 Architecture

Here we discuss the general architecture of the Muninn monitoring system and how it works from a high level perspective, along with a simple illustration of the system (figure 2). Muninn comprises three main blocks: Heimdallur, which we refer to as the discovery phase, the monitoring phase and a database. The database has two distinct sections. One section handles data from the discovery phase while the other handles data from the monitoring phase.

For the discovery phase we utilize Heimdallur. Heimdallur starts with a list of IP addresses that includes all 910,720 IP addresses of Iceland, excluding IPs which are a part of Síminn’s firewall. This list is then shuffled and a connectionless scan is started. The ports that respond as open are then fed into the connection oriented scan, which is run in parallel to the connectionless scan. The discovery phase finishes within a week. Once the discovery phase is finished the data is processed and inserted into the database.

The monitoring phase starts off by querying the database for all responsive ports which were found in the discovery phase. Each port comes with a list of the IPs which responded on that port. The ports are then distributed into n buckets, where n is the number of processes. The buckets are balanced by the number of target IP addresses, since further parallelization is done based on IP addresses. Each process gets its respective bucket and starts a connection oriented scan. Once the scanning is finished Muninn waits for a set interval before starting another scan.

Figure 2: The architecture of Heimdallur’s data gathering phase (marked as Discovery Phase) along with the architecture of Muninn (marked as Monitoring phase).

The interval is by default set to 24 − δ hours, where δ is the time it took for Muninn to scan. We wait because we do not want to be too invasive, and a daily scan gives a good overview of a remote host’s status for each day. Muninn performs a set number of these iterations. Once all iterations are finished Muninn stops scanning and starts to process the data. The data is parsed and inserted into the Muninn part of the database. The information gathered about these services can then be analyzed and a user can view the changes the services underwent over the course of the monitoring phase.
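The iteration scheme with a waiting period of 24 − δ hours can be sketched as follows. This is a minimal Python sketch under stated assumptions and not the actual Muninn implementation: the function name run_iterations and its parameters are hypothetical, and the scan itself is passed in as a callable. The clock and sleep functions are injectable so that the example can be run without waiting for a day.

```python
import time


def run_iterations(scan, iterations, period_s=24 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Run scan(i) a fixed number of times, starting one every period_s seconds.

    After each scan the loop waits for period_s minus the scan duration
    (delta), so that consecutive scans start roughly one period apart.
    """
    durations = []
    for i in range(iterations):
        start = clock()
        scan(i)
        delta = clock() - start
        durations.append(delta)
        if i < iterations - 1:
            # Never wait a negative amount if a scan overruns the period.
            sleep(max(0.0, period_s - delta))
    return durations


# Simulated time, so the example finishes instantly.
now = [0.0]
waits = []


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    waits.append(seconds)
    now[0] += seconds


def fake_scan(iteration):
    now[0] += 2.5 * 3600  # pretend every scan takes two and a half hours


durations = run_iterations(fake_scan, 3, sleep=fake_sleep, clock=fake_clock)
print(waits)
```

With a simulated scan time of two and a half hours the loop waits 21.5 hours between scans, and it does not wait after the final iteration.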

4.2 Implementation

As described in section 4.1, Muninn can be broken into three main parts: the discovery phase, the monitoring phase and the database. In this section we go into more detail on how Muninn was implemented and what sort of issues and dilemmas we faced along the way.

4.2.1 Discovery Phase

In the Discovery phase we use Heimdallur’s data gathering phase to our advantage. Because Heimdallur is explained in great detail in the Heimdallur technical report we will only describe it briefly. The Discovery phase consists of three simple steps. As mentioned in section 2.1.2 we use Masscan for the connectionless scan and Nmap for the connection oriented scan. The connectionless scan is used to find all open ports by initializing a new Masscan instance like so:

masscan -c --port --excludefile --output-filename --http-user-agent

The configuration file, exclude file, output file, excuse and port range are provided by the scan manager. The ports that are detected as open by Masscan are then fed into the connection oriented scan. The connection oriented scan receives these ports and starts to run a service detection on them. This is done by starting an instance of Nmap with the following parameters:

nmap -Pn -sV -T4 -oA -p

Because Masscan has already detected the ports as open we add the -Pn flag to disable host discovery.

Finally, the scan manager is responsible for initiating all processes that perform connectionless and connection oriented scans. The Heimdallur team modeled this after the producer-consumer problem. Masscan processes produce open ports to run service detection on while Nmap processes consume the open ports. This is a great idea; however, we noticed that because Masscan is so much faster than Nmap, the ports that the Nmap consumers receive are eventually outdated. This can result in a significant loss of speed and information. It can be fixed by slowing Masscan down and increasing the number of consumers.

Once the Discovery phase is done, the data is parsed and inserted into the database for use by the monitoring phase.
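The producer-consumer arrangement and the staleness problem can be illustrated with a small Python sketch. It is not how Heimdallur is implemented: the producer and consumers below are stand-ins for Masscan and Nmap, all names are hypothetical, and the bounded queue is an assumption of this sketch, chosen as one simple way of slowing the producer down so that consumers never work through a long backlog of outdated ports.

```python
import queue
import threading


def discover(targets, out_q, n_consumers):
    """Producer (stand-in for Masscan): emits targets that were found open."""
    for target in targets:
        out_q.put(target)      # blocks while the queue is full
    for _ in range(n_consumers):
        out_q.put(None)        # one stop marker per consumer


def grab(in_q, results, lock):
    """Consumer (stand-in for Nmap): takes a target and records a banner."""
    while True:
        target = in_q.get()
        if target is None:
            break
        banner = "banner for %s:%d" % target  # placeholder for real detection
        with lock:
            results.append((target, banner))


def run(targets, n_consumers=4, max_backlog=8):
    """Scan all targets with a bounded backlog between producer and consumers."""
    q = queue.Queue(maxsize=max_backlog)
    results, lock = [], threading.Lock()
    consumers = [threading.Thread(target=grab, args=(q, results, lock))
                 for _ in range(n_consumers)]
    for consumer in consumers:
        consumer.start()
    discover(targets, q, n_consumers)
    for consumer in consumers:
        consumer.join()
    return results


# Made-up targets in a documentation address range.
targets = [("192.0.2.%d" % i, 80) for i in range(1, 51)]
results = run(targets)
print(len(results))
```

Because the queue holds at most max_backlog items, the producer can never get further ahead of the consumers than that, which bounds how old a discovered port can be when its banner is grabbed. Raising n_consumers corresponds to the second remedy mentioned above.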

4.2.2 Monitoring Phase

The monitoring phase is carried out by Muninn. Muninn receives as input a CPE, some form of a banner, a service name or even a raw SQL query, along with an optional IP or port range. The monitor manager parses the input, opens a connection to the database and queries for all targets which fit the requirements of the input. Note that Muninn only performs a connection oriented scan like the one mentioned in the Discovery phase and does not perform a connectionless scan. There is no need for one since the data is assumed to be very recent. This has the side effect that we cannot use the producer-consumer approach described in the Discovery phase. The monitor manager instead distributes the targets evenly into buckets using a greedy approach based on the number of IP addresses. Next the manager starts a connection oriented scanning process for each bucket. Each process gets a single port and a list of IPs to scan, because Nmap parallelizes its execution across IP addresses extremely well. That is why it is optimal to start a few processes with large lists of IPs instead of many processes with small lists of IPs.

Muninn also receives as input the number of scan iterations to conduct and the time that should elapse between each iteration. The time it takes to scan is accounted for when considering the scan interval. A final argument Muninn receives is the number of processes used for the scans. We conducted some experiments and found that 16 processes were optimal on our current hardware when doing a selective scan. However, depending on Muninn’s target set this number might change.
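A greedy distribution of this kind can be sketched as follows. The sketch assumes one common greedy rule, namely handling the ports in order of decreasing IP count and always adding the next port to the currently lightest bucket. The exact rule used by the monitor manager is not specified here, so the function name distribute and the rule itself should be read as assumptions.

```python
import heapq


def distribute(targets, n_buckets):
    """Greedily spread ports over n_buckets, balancing the number of IPs.

    targets maps port -> list of IPs that responded on that port.
    Returns a list of buckets, each a list of ports.
    """
    heap = [(0, index) for index in range(n_buckets)]  # (load, bucket index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_buckets)]
    by_size = sorted(targets.items(), key=lambda item: len(item[1]), reverse=True)
    for port, ips in by_size:
        load, index = heapq.heappop(heap)   # lightest bucket so far
        buckets[index].append(port)
        heapq.heappush(heap, (load + len(ips), index))
    return buckets


def make_ips(count):
    """Made-up addresses in a documentation range."""
    return ["192.0.2.%d" % i for i in range(1, count + 1)]


targets = {80: make_ips(6), 443: make_ips(5), 22: make_ips(4),
           7547: make_ips(3), 21: make_ips(2)}
buckets = distribute(targets, 2)
loads = [sum(len(targets[port]) for port in bucket) for bucket in buckets]
print(buckets, loads)
```

For the toy input the two buckets end up with 11 and 9 target IPs, which is as even as this rule gets for these sizes.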

4.2.3 Database

Because we were going to redesign the database and set up the project on new machines from CERT, we had the opportunity to change the type of database we were using. The main alternative to PostgreSQL was Elasticsearch, whose very fast full text search seemed useful to our project. However, after further consideration we realised that accessing information from the database was never a bottleneck for anything we wanted to do. Additionally, we all knew how to work with PostgreSQL, so sticking to it allowed us to focus on our primary objectives.

We extended the pre-existing PostgreSQL database used by Heimdallur with some additional tables to accommodate the data gathered by Muninn. This was done because Heimdallur and Muninn are essentially gathering the same data, although the context of the data is completely different. Because Muninn uses the historical data in the database we needed to interact with the database during the runtime of Muninn. To achieve this we utilized the Python library SQLAlchemy. The structure of our database is shown in figure 3.

Figure 3: Relationship diagram of the changes made to Heimdallur’s database.

• The qscans table contains the start time, end time and an explanation as to what services were targeted by Muninn during the scan.

• The qiterations table contains information about the start time and end time of each iteration of the Muninn scan.

• The qservice table includes the information from the banner grabbing for each target that was successfully scanned in a specified Muninn pair.

• All other tables in the figure are a part of the original Heimdallur database and can be viewed in more detail in the respective report.
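To make the role of the three Muninn tables more concrete, the following Python sketch builds a simplified stand-in for them and queries the banner history of one target. It is only an illustration: it uses the standard library sqlite3 module instead of PostgreSQL and SQLAlchemy so that it is self-contained, and every column name, key and row below is hypothetical rather than taken from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qscans (
    id          INTEGER PRIMARY KEY,
    start_time  TEXT,
    end_time    TEXT,
    description TEXT              -- which services the scan targeted
);
CREATE TABLE qiterations (
    id         INTEGER PRIMARY KEY,
    qscan_id   INTEGER REFERENCES qscans(id),
    start_time TEXT,
    end_time   TEXT
);
CREATE TABLE qservice (
    id            INTEGER PRIMARY KEY,
    qiteration_id INTEGER REFERENCES qiterations(id),
    ip            TEXT,
    port          INTEGER,
    banner        TEXT
);
""")

# Made-up rows: one scan with two daily iterations over a single target.
conn.execute("INSERT INTO qscans VALUES (1, '2019-05-01 15:34', '2019-05-02 18:00', 'example')")
conn.executemany(
    "INSERT INTO qiterations VALUES (?, 1, ?, ?)",
    [(1, "2019-05-01 15:34", "2019-05-01 18:00"),
     (2, "2019-05-02 15:34", "2019-05-02 18:00")],
)
conn.executemany(
    "INSERT INTO qservice (qiteration_id, ip, port, banner) VALUES (?, ?, ?, ?)",
    [(1, "192.0.2.10", 80, "nginx version: 1.15.9"),
     (2, "192.0.2.10", 80, "nginx version: 1.15.10")],
)

# Banner history of one target across the iterations of the scan.
rows = conn.execute(
    """
    SELECT i.start_time, s.banner
    FROM qservice AS s
    JOIN qiterations AS i ON i.id = s.qiteration_id
    WHERE s.ip = ? AND s.port = ?
    ORDER BY i.start_time
    """,
    ("192.0.2.10", 80),
).fetchall()
print(rows)
```

Keeping one qservice row per target and iteration is what allows a banner to be followed over time with a single join.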

5 Evaluation and Results

Muninn has two main uses, as was discussed in section 4.1. The first of these involves selectively scanning only the set of ports in Iceland which were previously responsive. This is done in an effort to quickly get an updated state of the Icelandic Internet. The second use is to conduct a targeted scan for some specific service by utilizing historical data from a previous scan. The evaluation and results of these two use cases are discussed below.

5.1 Selective scanning with Muninn

There are two questions we seek to answer in this section: 1) what is the quality of the information that we gather with a selective scan, and 2) how quickly can selective scans be completed? We start with the quality evaluation.

5.1.1 Accuracy and Coverage

Muninn avoids all server discovery and only scans targeted ports, as mentioned in section 4.1. However, services are constantly appearing, disappearing and changing, as discussed in section 3.5. Thus the age of the baseline scan directly affects the quality of the data from Muninn. We define two measures to indicate the quality of the information retrieved from Muninn. Hit-rate is described in equation 1, where |T_hit| and |F_hit| are the number of non-empty banners in a Muninn scan and in the corresponding baseline scan respectively. Completeness is shown in equation 2, where |T_unique| and |F_unique| stand for the number of unique banners in the Muninn scan and in the corresponding baseline scan respectively. The measures give an indication of the amount and the diversity of the data attained from Muninn compared to a thorough baseline scan done by Heimdallur.

\[ \text{Hit-rate} = \frac{|T_{hit}|}{|F_{hit}|} \tag{1} \]

\[ \text{Completeness} = \frac{|T_{unique}|}{|F_{unique}|} \tag{2} \]
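The two measures are simple ratios and can be computed directly from lists of banners, as in the following Python sketch. The function names are hypothetical, and the sketch assumes that unique banners are counted among the non-empty ones. The last lines check the formulas against the counts reported for scan 15A in table 6 and its baseline, Scan 15, in table 3.

```python
def non_empty(banners):
    """Banners that actually contain text."""
    return [banner for banner in banners if banner]


def hit_rate(muninn_banners, baseline_banners):
    """Equation 1: non-empty banners in the Muninn scan over those in the baseline."""
    return len(non_empty(muninn_banners)) / len(non_empty(baseline_banners))


def completeness(muninn_banners, baseline_banners):
    """Equation 2: unique banners in the Muninn scan over those in the baseline."""
    return len(set(non_empty(muninn_banners))) / len(set(non_empty(baseline_banners)))


# Toy example with made-up banners.
baseline = ["Apache httpd", "Apache httpd", "nginx", "OpenSSH version: 7.7", ""]
muninn = ["Apache httpd", "Apache httpd", "nginx", "", ""]
toy_hit = hit_rate(muninn, baseline)
toy_completeness = completeness(muninn, baseline)
print(toy_hit, toy_completeness)

# Counts from the tables: 15A returned 68,844 non-empty and 3,499 unique
# banners, its baseline Scan 15 had 76,001 and 3,687 respectively.
table_hit = round(68844 / 76001, 3)
table_completeness = round(3499 / 3687, 3)
print(table_hit, table_completeness)  # 0.906 0.949
```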

A series of scans was done with Muninn using baseline scans of different ages, ranging from a week old to one and a half years old. The Hit-rate and Completeness were then calculated to estimate the effects of the age of the baseline scan. The results are listed in table 6. Because no scans were done for over a year following the end of the original Heimdallur project, there are not as many baseline scans to test on as we would have liked. Additionally, all of these scans can be grouped into three time periods: the Heimdallur development, the beginning of the Muninn project and finally the end of the Muninn project. This lack of varied data points may distort our resulting analysis. Finally, there is no guarantee that the behaviour observed in the past two years accurately reflects future changes.

| ID | Hit-Rate | Completeness | Age of Baseline | Banners | Unique Banners |
|---|---|---|---|---|---|
| 15A | 0.906 | 0.949 | 7 days 05:03:00 | 68,844 | 3,499 |
| 15B | 0.906 | 0.949 | 8 days 05:03:00 | 68,826 | 3,498 |
| 15C | 0.905 | 0.948 | 9 days 05:03:00 | 68,747 | 3,494 |
| 15D | 0.900 | 0.948 | 10 days 05:03:00 | 68,425 | 3,494 |
| 13A | 0.659 | 0.877 | 72 days 22:30:30 | 62,684 | 3,313 |
| 13B | 0.647 | 0.873 | 74 days 04:30:30 | 61,582 | 3,296 |
| 13C | 0.652 | 0.871 | 74 days 08:20:30 | 62,025 | 3,289 |
| 8A | 0.530 | 0.572 | 514 days 04:56:00 | 41,291 | 2,673 |
| 7A | 0.500 | 0.560 | 521 days 03:51:00 | 33,364 | 2,681 |
| 5A | 0.492 | 0.564 | 556 days 12:11:30 | 26,081 | 2,633 |

Table 6: The scans conducted using Muninn and their results. The number in the ID references a corresponding baseline scan from table 3.

As suspected, figure 4 shows that the age of the baseline scan drastically affects the Hit-rate of Muninn. The drop is most drastic for the first few weeks but evens out to ∼50% as the baseline scan becomes more than a year old. This indicates that having an up to date baseline scan is crucial to capture as many banners as possible. Based on table 6, running a baseline scan weekly should keep the Hit-rate at roughly 90% for every run of Muninn. This is feasible, since the baseline scan currently takes approximately 6 days using a single machine with 3 cores for scanning (the fourth core is exclusive to the database). The baseline scan could be distributed to more machines if increased speed is required. Interestingly, figure 5 shows that the Completeness follows a very different trend than the Hit-rate. The drop in unique banners is not as drastic as the drop in total banners.

Figure 4: Hit-rate as age of baseline scan increases. The function used to indicate the trend is logarithmical of the form a ∗ log(b ∗ x + c) + d.

Figure 5: Completeness as age of baseline scan increases. The trend seems to be linear.

The Completeness seems to follow a slow downwards linear trend with respect to the age. This indicates that if observing the number of unique banners is a goal, conducting frequent baseline scans is not as urgent as it is for the total number of banners.
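A linear trend of this kind can be estimated with an ordinary least-squares fit on the Completeness values of table 6. The Python sketch below is an illustration and not necessarily the procedure used to draw figure 5. The helper names are hypothetical, and with only three clusters of data points the fitted numbers should be read with the same caution as the rest of this analysis.

```python
def age_in_days(days, clock):
    """Convert an age such as (7, '05:03:00') to fractional days."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return days + (hours * 3600 + minutes * 60 + seconds) / 86400


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (age in days, age clock part, Completeness) as listed in table 6.
TABLE_6 = [
    (7, "05:03:00", 0.949), (8, "05:03:00", 0.949), (9, "05:03:00", 0.948),
    (10, "05:03:00", 0.948), (72, "22:30:30", 0.877), (74, "04:30:30", 0.873),
    (74, "08:20:30", 0.871), (514, "04:56:00", 0.572), (521, "03:51:00", 0.560),
    (556, "12:11:30", 0.564),
]

ages = [age_in_days(days, clock) for days, clock, _ in TABLE_6]
values = [value for _, _, value in TABLE_6]
slope, intercept = least_squares(ages, values)
print("slope per day: %.6f, intercept: %.3f" % (slope, intercept))
```

The slope of the fitted line is the estimated loss of Completeness per day of baseline age.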

5.1.2 Speed of scanning all responsive ports

For our goal of monitoring the Icelandic Internet, Heimdallur simply was not fast enough. Using Muninn with a 7-10 day old baseline scan we managed to selectively scan the whole Icelandic IP range in an average of around two and a half hours. This is a massive improvement on the approximately 7 days it took for Heimdallur to complete the baseline scan, especially considering that the two scanners run on the same machine and share the same bandwidth limitation. The speed of Muninn is more than enough for our purposes, as it allows us to get a rather accurate and detailed state of the Icelandic Internet within a few hours.

| QS ID | Time taken (h:m:s) | Age of Baseline | Banners | Baseline Banners |
|---|---|---|---|---|
| 5A | 01:17:38 | 556 days 12:11:30 | 26081 | 53030 |
| 7A | 01:22:29 | 521 days 03:51:00 | 33364 | 66680 |
| 8A | 01:31:03 | 514 days 04:56:00 | 41291 | 77891 |
| 13C | 01:48:23 | 74 days 08:20:30 | 62025 | 95119 |
| 15A | 02:26:43 | 7 days 05:03:00 | 68844 | 76001 |

Table 7: The time it took Muninn to complete one scan iteration. Note that scans take longer when we use the most recent baseline. This happens because using older scans results in a lower Hit-rate, and thus more banner grabbing operations return instantly with no banner information from trying to scan closed ports.

When conducting selective scans with older baseline scans, each scan iteration by Muninn was noticeably faster. After some analysis we concluded that this likely stems from the scan having a lower Hit-rate and thus fewer successful port probes. When we scan a closed port we instantly get a response, which is far quicker than waiting for further information. As can be seen in figure 6, the time of the scan appears to be strongly linearly related to the number of non-empty banners and to the age of the baseline, but less strongly correlated with the total number of banners in the baseline scan.
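The strength of these relationships can be quantified with the Pearson correlation coefficient, computed here from the five rows of table 7. The Python sketch below is illustrative, the helper names are hypothetical, and five data points are far too few for a firm statistical conclusion.

```python
from math import sqrt


def to_seconds(clock):
    """Convert 'hh:mm:ss' to seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (time taken, non-empty banners returned, banners in baseline) from table 7.
TABLE_7 = [
    ("01:17:38", 26081, 53030),
    ("01:22:29", 33364, 66680),
    ("01:31:03", 41291, 77891),
    ("01:48:23", 62025, 95119),
    ("02:26:43", 68844, 76001),
]

times = [to_seconds(clock) for clock, _, _ in TABLE_7]
returned = [banners for _, banners, _ in TABLE_7]
baseline = [banners for _, _, banners in TABLE_7]

r_returned = pearson(times, returned)
r_baseline = pearson(times, baseline)
print("time vs returned banners: %.2f" % r_returned)
print("time vs baseline banners: %.2f" % r_baseline)
```

On these five rows the coefficient for the returned banners is clearly the larger of the two, in line with the observation above.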

Figure 6: Relationship between the number of banners in the baseline scan, the age of the baseline and the number of non-empty banners returned to the time taken to scan with Muninn. The time seems to be most strongly correlated with the number of non-empty banners returned.

5.1.3 Finding a better set of ports

Throughout the project we tried using three different sets of top 1,000 ports for our scans in hopes of finding the best set.

1. For Scan 8 the top 1,000 ports from Nmap were used with 12 additional ports commonly open in Iceland.

2. For Scan 13 and Scan 15 we used the information from Scan 9 to find the top 1,024 open ports in Iceland.

3. For Scan 16 we tried the top 1,024 responsive ports in Scan 9.

The difference between the scans seemed negligible and, if anything, just using the Nmap specified ports with a few additions seemed to be best. This might be because the data from Scan 9 was already more than a year old and so might not be useful for deciding what to scan today. Additionally, we might be overfitting to our data by only considering the one full port range scan we have access to. Without a newer scan of all 65,536 ports it is hard to find a better set of ports.

5.1.4 Daily Scanning of all Icelandic Internet Services

Muninn was run once per day for four consecutive days from the 30th of April to the 3rd of May 2019. All scans were started at exactly 15:34 Greenwich Mean Time. These scans are listed in table 6 with the IDs 15A-15D. From the table we can see that the scans returned very similar results, with slightly fewer banners and unique banners for each consecutive scan. This agrees with our observations in section 5.1.1 that the Hit-rate and Completeness have a general downwards trend. The rate of consecutive scans is by far faster than anything we have previously been capable of. However, four days of scan time is a very narrow time window for changes to happen. Some of the changes observed are listed in table 8, which shows that a wide variety of changes are observable and, due to the frequency of the scans, they can be pinpointed to a narrow time range.

| # | Date Range | Before | After |
|---|---|---|---|
| 0 | 30.04-01.05 | ProFTPD | ProFTPD version: 1.2.10 |
| 1 | 30.04-01.05 | MySQL version: 5.7.25-0ubuntu0.18.04.2 | MySQL version: 5.7.26-0ubuntu0.18.04.1 |
| 2 | 30.04-01.05 | Apache httpd version: 2.2.3 extrainfo: (CentOS) | Apache httpd |
| 3 | 30.04-01.05 | DNVRS-Webs | Hikvision Network Video Recorder http admin devicetype: webcam |
| 4 | 30.04-01.05 | Apache httpd extrainfo: PHP 7.1.27 | Apache httpd extrainfo: PHP 7.1.28 |
| 5 | 01.05-02.05 | nginx version: 1.15.9 | nginx version: 1.15.10 |
| 6 | 01.05-02.05 | MySQL version: 5.5.54 | MySQL extrainfo: blocked - too many connection errors |
| 7 | 01.05-02.05 | Fortinet FortiGate named | ISC BIND version: Knot DNS 2.4.0 |
| 8 | 02.05-03.05 | Kerio Connect imapd version: 9.2.8 patch 1 | Kerio Connect imapd version: 9.2.9 patch 1 |
| 9 | 02.05-03.05 | Kerio Connect smtpd version: 9.2.8 patch 1 hostname: mail.kvikmyndasafn.is | Kerio Connect smtpd version: 9.2.9 patch 1 hostname: mail.kvikmyndasafn.is |
| 10 | 02.05-03.05 | Point to Point Tunneling Protocol | DrayTek version: (Firmware: 1) hostname: Vigoro |
| 11 | 02.05-03.05 | ISC BIND version: 9.11.5-P4 | ISC BIND version: 9.11.6-P1 |
| 12 | 02.05-03.05 | Boa httpd | Boa HTTPd version: 0.94.14rc21 |
| 13 | 02.05-03.05 | version: 19.9.0 | HAProxy http proxy version: 1.3.1 or later devicetype: load balancer |

Table 8: Examples of changes that were tracked between the 30th of April and the 3rd of May by scanning the whole Icelandic IP range with Muninn. For privacy reasons the specific IP addresses and ports these changes were observed on were left out of the table.

With the current capabilities of Muninn we could have scanned much more frequently, up to every three hours if we wanted. This would have given us even more accurate information as to when exactly changes happen. However, such aggressive scanning could possibly trigger alarms or disrupt some services and for that reason we avoided it. On the other hand, if an emergency should arise and some services need to be monitored very frequently, it is entirely within the capabilities of Muninn to do so.
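Pinpointing a change to a date range amounts to comparing the banner of a target in each pair of consecutive iterations. A minimal Python sketch, assuming the banner history of a single target is available as a chronological list of (date, banner) pairs, could look like this. The function name and the data layout are hypothetical, and the example history is modelled on row 5 of table 8.

```python
def change_windows(history):
    """Return (date_before, date_after, banner_before, banner_after) per change.

    history is a chronological list of (date, banner) pairs for one target,
    with one entry per scan iteration.
    """
    changes = []
    for (date_a, banner_a), (date_b, banner_b) in zip(history, history[1:]):
        if banner_a != banner_b:
            changes.append((date_a, date_b, banner_a, banner_b))
    return changes


history = [
    ("30.04", "nginx version: 1.15.9"),
    ("01.05", "nginx version: 1.15.9"),
    ("02.05", "nginx version: 1.15.10"),
    ("03.05", "nginx version: 1.15.10"),
]
changes = change_windows(history)
print(changes)
```

The width of each reported window equals the interval between iterations, which is why more frequent scanning narrows down the time of a change.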

5.2 Targeted Scanning with Muninn

Prior to building Muninn there was no proper way to scan specified services which were not identified purely by their IP address and/or port. By mining previously collected banner information, a targeted set of IP address and port pairs can be created that focuses on scanning all known instances of a specific service or services. This targeted scanning capability was built into Muninn, making it quick and easy to initiate scans of specific services.

5.2.1 Updating all webcameras

Muninn’s capability to quickly check on some specific services was first tested by finding and scanning all webcams encountered in a previous scan. The set of webcams is rather large and varied and can return a wide variety of banners. We therefore made Muninn look through the most recent scan for any banners including the words "camera", "webcam" or "myndavel" (case-insensitive) and scan only the specific IPs and ports returned. The scan finished in 34 minutes and 23 seconds.
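The keyword selection can be sketched in a few lines of Python. This is an illustration only: in Muninn the lookup is done against the database, whereas the sketch filters an in-memory list of (IP, port, banner) rows. The function name and the sample addresses are hypothetical, while the banners are taken from the examples in this report.

```python
import re

KEYWORDS = ("camera", "webcam", "myndavel")
PATTERN = re.compile("|".join(re.escape(word) for word in KEYWORDS), re.IGNORECASE)


def select_targets(services):
    """Return the sorted (ip, port) pairs whose banner mentions any keyword."""
    return sorted({(ip, port) for ip, port, banner in services
                   if PATTERN.search(banner)})


# Made-up addresses combined with banners quoted earlier in the report.
services = [
    ("192.0.2.5", 80, "Hikvision IP camera httpd devicetype: webcam"),
    ("192.0.2.6", 8080, "DD-WRT milli_httpd hostname: myndavel 01"),
    ("192.0.2.7", 21, "Axis 210 Network Camera ftpd version: 4.40.2"),
    ("192.0.2.8", 80, "Apache httpd"),
]
targets = select_targets(services)
print(targets)
```

The Axis banner only matches because the search is case-insensitive, which is why that detail matters for the size of the target set.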

5.2.2 Scanning for Vulnerable Services

On the 14th of May a new CVE was published, specifically CVE-2019-0708 [16]. The CVE states that there exists a remote code execution vulnerability in Remote Desktop Services (RDS), formerly known as Terminal Services. This is a component of Microsoft Windows which allows users to take control of a remote machine. This is an incredibly critical vulnerability which can result in massive damage. The vulnerability is wormable, meaning that malware created for the vulnerability can easily propagate between vulnerable machines [20]. In fact this vulnerability is so critical that Microsoft even released patches for it on unsupported systems. Microsoft also states that this vulnerability can be exploited in a similar way to WannaCry [23].

We initiated Muninn the same morning. Muninn looked up all known RDS instances in Iceland, scanned each one and performed a detailed OS detection on the systems. We found 641 known services in Iceland that are vulnerable to this CVE. The scan itself took less than 6 minutes while a more detailed OS detection scan took 24 minutes and 47 seconds. The information gathered from this scan will later be used to notify each service of their vulnerability.

5.2.3 Targeted scanning capabilities

As can be seen from the experiments above, there is a vast number of possibilities available with targeted scanning. By utilizing historical information on the last known state of the Internet we can create custom targeted scans in whatever manner we want. Examples other than webcams and vulnerable Windows machines include all SQL servers of version higher than 5.4, all Exim servers and/or all Apple Time Capsules. The possibilities are endless. Services can then be tracked for an extended period of time to gather historical data on how they change, which version is most popular and under what circumstances they typically update.

6 Discussion

In this section we will discuss the ethics regarding this project, our collaboration with CERT and the limitations we faced along the way.

6.1 Ethics

We feel that it is important to address the ethics of our project. There are real cyber threats lurking on the Internet, as we have stated many times before. When we scan services, the people behind them have no way of knowing whether we are doing so with malicious intent. We do not want to alarm them unnecessarily. This is why Heimdallur sends a User-Agent string containing a link to scan.syndis.is, which leads to a webpage explaining the purpose of the project. Through the website people can ask questions regarding the project or even opt out of our scanning. Muninn, however, only scans services that Heimdallur has already scanned. Thus, if logging is in place at the IP address, it has already received the User-Agent string [11].

During the banner grabbing process we do not perform any sort of invasive probing. We only try simple commands that return basic information about what sort of service is running. We do not try to gain access to the service by trying easily guessed default passwords, and we most certainly do not send any commands that can harm the service in any way. Additionally, to avoid causing congestion or strain on networks, Heimdallur scans passively. We followed their example with Muninn, but note that because Muninn scans more efficiently there is a possibility of a shorter interval between scans of the same machine.

Over the course of this project we received a couple of emails and phone calls asking about this project. After we sent an explanation, the people who contacted us were happy to take part in our scanning. It should be noted, however, that after scanning the services that return banners with Muninn we have not received a single email. This can be either because Muninn does not send the link to the aforementioned website, or because the operators of services that return banners are not worried about scans or simply do not notice them.

6.2 Computer Emergency Response Team (CERT)

CERT-IS, CERT for short, is the Icelandic National Computer Security Incident Response Team (CSIRT). CERT operates under the telecommunication sector in Iceland. Its main role is to analyze cyber security threats and aid its primary constituency members. In a cyber crisis its role is to coordinate the response to the situation [4].

This project is a collaboration between Syndis and CERT. CERT explained what they needed in order for the system to be able to assist them. The creative work with them led to what we now call Muninn. CERT showed much interest in the progress of the project, which tells us how valuable software of this kind can be. With the software we can assist them in carrying out their duties as an incident response team.

6.3 Limitations

The platform created in this project comes with its limitations. Because of the scope of the project, and because the data from scans before the project covered only the Icelandic IP range, we decided to continue exploring only that range. Thus all the data collected is tied to Iceland.

In table 3 there are only 148,110 services which returned a non-empty banner. The other 31,111,851 services returned an empty banner. The only information we get about a service that returns an empty banner is that it has an open port, along with whatever can be deduced by hand from the hostname. This limits the reach of the data significantly.

Also note that the value of the platform only goes as far as the validity of the banners. As mentioned in section 2.1.4, we assume that all banners acquired are completely legitimate. We must acknowledge, however, that some of the banners may well be illegitimate, which limits the performance and value of the platform.

Limitations also lie in the coverage of Heimdallur. Heimdallur now excludes an IP range under the CIDR of Síminn; see section 3.1 for details.

An important limitation of Heimdallur is the speed of its scans. This may be somewhat mitigated by the firewalls as noted in [11]. Distributing the scanner is likely the main way to achieve better speeds and remains a possibility if we require faster service discovery for Muninn. In this project we refrained from doing so, however, as more machines are an added cost, both in setup and maintenance.
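To put the banner counts quoted in this section into proportion, the share of services that returned a non-empty banner can be worked out directly from the two figures. The snippet below is only a small worked calculation; the variable names are our own.

```python
# Counts quoted in the text (table 3).
non_empty = 148_110      # services that returned a non-empty banner
empty = 31_111_851       # services that returned an empty banner

total = non_empty + empty
share = non_empty / total

# Prints the total number of services and the share with a usable banner.
print(f"{total:,} services, {share:.2%} with a non-empty banner")
```

In other words, fewer than one in a hundred of the recorded services yields a banner that can be analysed further.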

7 Future Work

In this section we go over topics that expand upon the research covered in this paper. The improvements we discuss aim to give a better understanding of the services behind IP addresses, to increase the speed of the platforms and to improve the reach of the data. We will be discussing automated tracking of changes, deeper scans, distribution of the platforms, improving the CPE extraction, and finally expanding the scope of both platforms.

7.1 Automated tracking of changes

Over the course of the project we have mostly analysed the changes observable in consecutive scans by hand. This was a fine starting point when setting out to see if changes could be monitored, but it is not efficient in the long term. An exciting future direction for the project would be to automate the tracking of changes between scan iterations of Muninn. By this we mean a system that lets us view the changes between iterations both visually and statistically. Automated data analysis would allow for the creation of a perpetual real-time feed of the Internet. No new services or important version updates would go unnoticed. This sort of tool would be invaluable for computer security researchers and companies.

We also want to add a feature where the platforms are connected to a vulnerability feed. This would let us know immediately when vulnerabilities are published and how many services are possibly at risk. This would significantly benefit both the reach of the data and CERT, who would have a system that helps them notice impactful vulnerabilities so that they can fulfill their duties.
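A first step towards such automation could be a plain comparison of two scan iterations. The sketch below is one possible shape for it, not an existing part of Muninn: it assumes that a scan can be represented as a mapping from (IP, port) to banner, and the function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

ServiceKey = Tuple[str, int]      # (ip, port)
Scan = Dict[ServiceKey, str]      # service -> banner


def scan_delta(old: Scan, new: Scan) -> Dict[str, List[ServiceKey]]:
    """Classify services as added, removed or changed between two scans."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        key for key in old.keys() & new.keys() if old[key] != new[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample scans using documentation-only IP addresses.
old_scan = {
    ("192.0.2.10", 22): "SSH-2.0-OpenSSH_7.4",
    ("192.0.2.10", 80): "Apache/2.4.25",
    ("192.0.2.11", 21): "vsFTPd 3.0.3",
}
new_scan = {
    ("192.0.2.10", 22): "SSH-2.0-OpenSSH_7.9",
    ("192.0.2.10", 80): "Apache/2.4.25",
    ("192.0.2.12", 443): "Apache/2.4.38",
}
print(scan_delta(old_scan, new_scan))
```

The three resulting lists could feed both the statistical view (counts per iteration) and the visual view mentioned above.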

7.2 Deeper scan

In section 6.3 we mentioned how empty banners limit the reach of the data significantly. The authors of the preliminary project, Heimdallur, mentioned that they could use tools like Wappalyzer. Wappalyzer would allow us to uncover the technologies behind websites, for example content management systems, ecommerce platforms, server software and more [27]. We still believe that this is relevant future work, but it was out of scope for this project. Performing a deeper scan is well within reach of the Muninn platform, as it is in our opinion quite fast and can thus sacrifice speed for depth. We believe that this would also benefit Heimdallur, which handles the discovery phase but does not have the speed to sacrifice.

7.3 Distribution of the platforms

One way to speed up the platforms, Heimdallur in particular, would be to distribute them. A possible design would be a queue of scan tasks to which multiple machines subscribe and from which they execute scans. This would make both platforms more scalable, but we would also want to keep the speed of a single machine at a maximum.
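The queue-based design can be illustrated with a small single-process sketch in which threads stand in for the separate machines. This is an assumption made purely for illustration: a real deployment would need a networked queue, and `fake_scan` is a placeholder that sends no network traffic.

```python
import queue
import threading
from typing import List, Tuple

Target = Tuple[str, int]  # (ip, port)


def fake_scan(target: Target) -> str:
    """Placeholder for a real banner grab; no network traffic is sent."""
    ip, port = target
    return f"scanned {ip}:{port}"


def worker(tasks: "queue.Queue[Target]", results: List[str],
           lock: threading.Lock) -> None:
    """Take targets from the shared queue until it is empty."""
    while True:
        try:
            target = tasks.get_nowait()
        except queue.Empty:
            return
        outcome = fake_scan(target)
        with lock:  # protect the shared result list
            results.append(outcome)
        tasks.task_done()


def run_distributed(targets: List[Target], n_workers: int = 3) -> List[str]:
    """Fill the queue, start the workers and collect their results."""
    tasks: "queue.Queue[Target]" = queue.Queue()
    for target in targets:
        tasks.put(target)
    results: List[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


sample_targets = [("192.0.2.1", port) for port in (22, 80, 443, 3389)]
print(run_distributed(sample_targets))
```

Because each worker pulls its next target only when it is free, faster workers naturally take on more of the work, which fits the wish to keep every single machine running at full speed.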

7.4 CPE extraction

In section 3.3 we touched on the subject of CPE extraction in the preliminary project. Their results show that the extraction is lacking, which leads to a weak relation to CVEs. This means that neither Heimdallur nor Muninn can accurately connect services with vulnerabilities. A better and more sophisticated CPE extraction would considerably improve the reach of the data by enabling more accurate vulnerability detection.
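To make the idea of CPE extraction concrete, the sketch below maps a banner to a CPE 2.3 formatted string using a small rule table. It is not the extraction used by Heimdallur or Muninn: the two rules, the function name and the sample banners are hypothetical, and the vendor and product names are assumed to follow common CPE dictionary naming.

```python
import re
from typing import Optional

# Hypothetical rules: (banner pattern, CPE vendor, CPE product).
# A real rule set would need far more entries than these two.
RULES = [
    (re.compile(r"OpenSSH[_/ ](\d+(?:\.\d+)*)"), "openbsd", "openssh"),
    (re.compile(r"Apache/(\d+(?:\.\d+)*)"), "apache", "http_server"),
]


def banner_to_cpe(banner: str) -> Optional[str]:
    """Return a CPE 2.3 formatted string for the banner, or None."""
    for pattern, vendor, product in RULES:
        match = pattern.search(banner)
        if match:
            version = match.group(1)
            # part 'a' marks an application; unknown fields are wildcards.
            return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
    return None


# Made-up banners for illustration.
for banner in ("SSH-2.0-OpenSSH_7.4", "Server: Apache/2.4.38 (Debian)", "hello"):
    print(banner, "->", banner_to_cpe(banner))
```

Banners that match no rule yield no CPE at all, which is exactly the gap that a more sophisticated extraction would have to close.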

7.5 Going beyond Iceland

Both platforms have the potential to scale easily by means of distribution. This would enable us to start scanning the IP ranges of other countries. Consulting with experts at Syndis we were told that we could expand the project to the with permission. However, at the start of the project we had not yet come to grips with the data from Iceland, and bringing more data into the project would only have caused confusion. There are also some legal considerations that must be addressed before scanning other countries. Still, it would be interesting to compare the Internet between countries.

8 Conclusion

We have analysed the data made available by the port scanner Heimdallur and found that important changes could be observed by more careful monitoring of the Internet. To serve this purpose we created Muninn, which has two main capabilities. Firstly, Muninn can obtain an updated view of all responsive services on the Icelandic Internet in less than 3 hours. This allows us to monitor exactly what changes happen, and when, with unprecedented precision and speed. Secondly, Muninn is capable not only of scanning certain ports and IP addresses but also of searching for and monitoring specific services using historical scan information. This allows for precise monitoring of services of particular interest. We demonstrated this capability by scanning for all vulnerable Microsoft Remote Desktop Services in a mere 6 minutes, following the disclosure of a critical vulnerability on the 14th of May.

We analysed the data gathered by Muninn to confirm that the repeated scans provide useful information about changes happening on the Internet. By further automating the process of analysing and tracking changes, Muninn could be used to give a real-time feed of the ebb and flow of the Internet. This is invaluable information for computer security researchers and for organizations like Syndis and CERT.

References

[1] Ajdellinger. Hacker hijacks 50,000 printers to tell people to subscribe to pewdiepie. [Online]. Available: ://www.engadget.com/2018/11/30/pewdiepie-printer-hack-thehackergiraffe/, November 2018. [Accessed: May. 9, 2019].

[2] Anonymous. Internet census 2012: Port scanning /0 using insecure embedded devices: Carna botnet. [Online]. Available: http://census2012.sourceforge.net/paper.html, 2012. [Accessed: May. 10, 2019].

[3] Apple. Airport time capsule setup guide. [Online]. Available: https://manuals.info.apple.com/MANUALS/1000/MA1645/en_US/airport_time_capsule_80211ac_setup.pdf, 2013. [Accessed: May. 13, 2019].

[4] CERT-IS. About cert-is. [Online]. Available: https://www.cert.is/en.html, [No year]. [Accessed: May. 13, 2019].

[5] Brant A. Cheikes, David Waltermire, and Karen Scarfone. Common platform enumeration: Naming specification, version 2.3. NIST Interagency Report 7695, August 2011. [Accessed: May. 11, 2019].

[6] Justin Ellingwood. Understanding IP Addresses, Subnets, and CIDR Notation for Networking. [Online]. Available: https://www.digitalocean.com/community/tutorials/understanding-ip-addresses-subnets-and-cidr-notation-for-networking, 2014. [Accessed: May. 16, 2019].

[7] Symantec employee. What is a honeypot? how can it lure cyberattackers. [Online]. Available: https://us.norton.com/internetsecurity-iot-what-is-a-honeypot.html, [No year]. [Accessed: May. 16, 2019].

[8] The Broadband Forum. Tr-069. Broadband Forum, pages 19,26, 2011.

[9] Béla Genge and Calin Enachescu. Non-intrusive historical assessment of internet-facing services in the internet of things. MACRo 2015-5th International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics, 2015.

[10] Robert David Graham. Masscan - tcp port scanner, spews syn packets asynchronously, scanning entire internet in under 5 minutes. [Online]. Available: https://github.com/robertdavidgraham/masscan, 2013. [Accessed: May. 12, 2019].

[11] Tinna Sigurðardóttir Heiðar Karl Ragnarsson, Hlynur Óskar Guðmundsson. Heimdallur - vulnerability detection through internet scanning. skemman, February 2018.

[12] janit0r. Internet chemotherapy. [Online]. Available: https://ghostbin.com/paste/ q2vq2, 2017. [Accessed: May. 14, 2019].

[13] Knud Lasse Lueth. State of the IoT 2018: Number of IoT devices now at 7B – Market accelerating. [Online]. Available: https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/, 2018. [Accessed: May. 14, 2019].

[14] Gordon "Fyodor" Lyon. Nmap network scanning. [Online]. Available: https://nmap.org/book/, [No year]. [Accessed: May. 10, 2019].

[15] Robert A Martin. Managing vulnerabilities in networked systems. Computer, 34(11):32–38, 2001.

[16] Microsoft. Cve-2019-0708 | remote desktop services remote code execution vulnerability. [Online]. Available: https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2019-0708, May 2019. [Accessed: May. 16, 2019].

[17] Mike. Top 1,000 tcp and udp ports (nmap default). [Online]. Available: https://nullsec.us/top-1-000-tcp-and-udp-ports-nmap-default/, 2016. [Accessed: May. 10, 2019].

[18] MySQL. Mysql 5.6 release notes :: Changes in mysql 5.6.43 (2019-01-21, general availability). MySQL, January 2019. [Accessed: May. 13, 2019].

[19] Neal. What are sip endpoints? how do they work? [Online]. Available: https://support.voicepulse.com/hc/en-us/articles/202526855-What-are-SIP-Endpoints-How-do-they-work-, 2014. [Accessed: May. 12, 2019].

[20] Simon Pope. Prevent a worm by updating remote desktop services (cve-2019-0708). [Online]. Available: https://blogs.technet.microsoft.com/msrc/2019/05/14/prevent-a-worm-by-updating-remote-desktop-services-cve-2019-0708/, May 2019. [Accessed: May. 16, 2019].

[21] Radware. "brickerbot" results in pdos attack. [Online]. Available: https://security.radware.com/ddos-threats-attacks/brickerbot-pdos-permanent-denial-of-service/, 2017. [Accessed: May. 14, 2019].

[22] Jack Rhysider. Darknet diaries, ep 31: Hacker giraffe, February 2019. [Accessed: May. 10, 2019].

[23] Ian Sherr. WannaCry ransomware: Everything you need to know. [Online]. Available: https://www.cnet.com/news/wannacry-wannacrypt-uiwix-ransomware-everything-you-need-to-know/, May 2017. [Accessed: May. 16, 2019].

[24] Shodan. Heartbleed - a report of devices vulnerable to heartbleed. [Online]. Available: https://www.shodan.io/report/89bnfUyJ, 2016. [Accessed: May. 12, 2019].

[25] Shodan. Shodan - the world’s first search engine for internet-connected devices. [Online]. Available: https://shodan.io, [No year]. [Accessed: May. 12, 2019].

[26] . Snorra Sturlusonar. Heimskringla, [Accessed: May. 16 2019].

[27] Wappalyzer. Identify technology on websites. [Online]. Available: https://www.wappalyzer.com/, [No year]. [Accessed: May. 15, 2019].
