ADVANCE PUBLICATION
Annals of Business Administrative Science
https://doi.org/10.7880/abas.0200908a
Received: September 8, 2020; accepted: October 16, 2020
Published in advance on J-STAGE: December 5, 2020

Deep Web, Dark Web, Dark Net: A Taxonomy of “Hidden” Internet

Masayuki HATTAa)

Abstract: Recently, online black markets and virtual currencies have become the subject of academic research, and we have gained a certain degree of knowledge about the dark web. However, as terms such as deep web and dark net, which have different meanings, have been used in similar contexts, discussions related to the dark web tend to be confusing. Therefore, in this paper, we discuss the differences between these concepts in an easy-to-understand manner, including their historical circumstances, and explain the technology, known as Tor, that is used on the dark web.

Keywords: Tor, deep web, dark web, dark net, privacy

a) Faculty of Economics and Management, Surugadai University. 698 Azu, Hanno, Saitama, Japan. [email protected] A version of this paper was presented at the ABAS Conference 2020 Summer (Hatta, 2020b). © 2020 Masayuki Hatta. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


Introduction

In recent years, the term “dark web” has become popular. The dark web, i.e., a web wherein your anonymity is guaranteed and one that cannot be accessed without using special software, was, until recently, of interest to only a few curious people. However, in 2011, the world’s largest online black market, Silk Road (Bilton, 2017), was established on the dark web; with the presence of virtual currencies, which incorporate the anonymity provided on the dark web (Todorof, 2019), it has become a topic of economic and business research. Words similar to “dark web” (such as “deep web” and “dark net”) are used in the same context, but they are completely different technical concepts; this leads to confusion.

Deep Web

Historically, among the three terms (“dark web,” “deep web,” and “dark net”), the term “deep web” was the first to emerge. The technologist and entrepreneur Michael K. Bergman first used it in his white paper “The deep web: Surfacing hidden value” (Bergman, 2001). Bergman likened web searches to fishing and stated that legacy search engines were nothing more than fishing nets dragged along the surface of the sea, even though there is a lot of important information deep in the sea, where the nets do not reach. Therefore, he stated that, moving forward, it was important to reach the deep areas as well. This was the advent of the deep web. Bergman stated that the deep web was 400–550 times larger than the normal web, and that the quality of the information found in the deep web was 1,000–2,000 times that of the normal web. The problem is that these figures are still quoted, even now, in the context of the dark web. What Bergman (2001) first raised as detailed examples of the “deep” web were the National Oceanic and Atmospheric Administration (NOAA) and United States Patent and Trademark Office (USPTO) databases, the JSTOR and Elsevier fee-based academic literature search services, and the eBay and Amazon electronic commerce sites; these are still referred to as the “deep web” today. In short, Bergman referred to the following as the deep web:

(a) Special databases that can only be accessed within an organization
(b) Sites with paywalls wherein content can only be partly seen, or not seen at all, without registration
(c) Sites in which content is dynamically generated each time they are accessed
(d) Pages that cannot be accessed without using that site’s search system
(e) Electronic mail and chat logs

That is to say, it refers to a web that normal search engines, such as Google, cannot crawl or index. Incidentally, according to Bergman, there were already people using the term “invisible web” in 1994, in the sense that it could not be searched by a search engine. However, Bergman asserted that the deep web was just deep, not “invisible,” i.e., it could be searched with technological innovations. The start-up that he was managing at that time was selling this very technology. Furthermore, following this, Google formed separate agreements with the owners of databases and started the Google Books project with university libraries, becoming involved in indexing the “deep” field; thus, in 20 years, the deep web, in the sense that Bergman used it, is considered to have shrunk considerably. In this manner, the “deep” in deep web originally just meant somewhere that was deep and difficult to web-crawl, and did not carry nuances of good or evil. Despite this, “deep” is a powerful word and, as will be described later, it has led the way in entrenching the image associated with the dark web as something thick and murky.

Dark Net

The term “dark net” became popular at virtually the same time as the term “dark web” did. There is a hypothesis that it has been used since the 1970s, and even today, in concrete terms, an IP address that is not allocated to a host computer is referred to as a dark net. However, the trigger for it being used as a general term, as it is now, was a paper written in 2002 (published in 2003) by four engineers, including Peter Biddle (who was working at Microsoft at that time), that described the dark net as the future of content distribution (Biddle, England, Peinado, & Willman, 2003). Sweeping the world at that time were the P2P file-sharing services Napster (started in 1999) and Gnutella (released in 2000). Operation of File Rogue started at around the same time in Japan. There were fears of copyright infringement, and in the paper, written as part of research on Digital Rights Management (DRM) and copy protection (Biddle et al., 2003), the term “dark net” was clearly being used in the negative sense of illegal activity. Biddle et al. (2003) broadly defined the dark net as “a collection of networks and technologies used to share digital content” (Biddle et al., 2003, p. 155). Based on this, it can be summarized as follows.

(1) This started with the manual carrying of physical media such as CDs, DVDs, and, more recently, USB memory: the so-called “sneakernet.”
(2) With the spread of the Internet, files such as music files began to be stored on one server, giving birth to the “central server” model. However, if the central server were destroyed, that would be the end.



(3) Files, or parts of files, were shared on multiple servers using Napster or Gnutella, with the sharing servers (peers) communicating with one another. Thus appeared the Peer-to-Peer (P2P) model, meaning that even if one point of the network were destroyed, the network as a whole would survive.

This P2P model was realized on top of the existing physical network, using technology known as an overlay network, which utilizes non-standard applications and protocols. Additionally, Biddle et al. (2003) noted that, as Napster had a central server for searching, it could be controlled through that server. Moreover, although Gnutella was completely distributed, the individual peers were not anonymous and their IP addresses could be learned, so it was possible to track them and hold them legally responsible. In this way, measures could be taken against the P2P networks of the time, but it was predicted that a new dark net that overcame these weaknesses would emerge. Biddle et al. (2003) considered that even copy-protected content could be widely diffused via the dark net, and that the dark net would continue to evolve. They reached the conclusion that DRM was fundamentally meaningless, and that to eradicate pirated versions, official versions also needed to have a reasonable price and be convenient for customers, as well as compete on the same ground. This pronouncement put the jobs of Biddle et al. at risk (Lee, 2017). However, considering that attempts at anti-piracy measures through copyright enforcement have continually failed, and that piracy is currently being driven out by the emergence of superior platforms, such as Netflix and Spotify, the pronouncement has proven to be correct.



Yet Another Dark Net: F2F

Possibly due to the fact that dark net is an attractive name, around the same time as Biddle et al. (2003), the term “dark net” began to be used as a general term for a slightly different technology. This is called Friend-to-Friend (F2F),1 and as this was implemented as darknet mode by Freenet, which is one of the main types of dark web software (to be described later), this also became known as Darknet. In this sense, Darknet, or F2F, is a type of P2P network in which the user only directly connects with acquaintances (in many cases, people they have met in real life and built up trust with via a non-online route). A password or cryptographic key is used for authentication. The basic concept behind F2F is that a P2P overlay network is constructed over the existing relationships of trust between users.

Figure 1. Topology of Darknet. A participant with malicious intent (e.g., the red node) cannot easily grasp the entire network.

Source: the author.

1 The term F2F itself was invented in the year 2000 (Bricklin, 2000).


Figure 2. Topology of Opennet. An outside observer can grasp the entire network thanks to the existence of a directory server.

Source: the author.

This is a method in which the network is subdivided and, rather than connecting to an unspecified large number of people, each user connects to a much smaller group of, say, five people whom they know well and trust, as shown as an example in Figure 1. In this sense, the term Opennet, used in Figure 2, is an antonym of Darknet.

Overlay network

Here, an overlay network is a general term for a network constructed “over” a separate network. Typically, it refers to a virtual network constructed over the Internet. The problem in this case is one of routing. On the Internet, based on TCP/IP, it is possible to reach other servers by specifying an IP address. However, in the case of an overlay network, this IP address is not necessarily known or usable, so technology such as a distributed hash table (DHT) is utilized to route to an existing node using a logical address.
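To illustrate, the following is a minimal Python sketch of DHT-style routing over an overlay, under simplifying assumptions of my own: the node names are hypothetical, and a real DHT (e.g., Chord or Kademlia) would use ring or XOR distance metrics and iterative lookups rather than this one-shot global search.

```python
# Minimal sketch of DHT-style routing on an overlay network.
# Node names are hypothetical; real DHTs (Chord, Kademlia) use ring/XOR
# distances and iterative lookups instead of a global minimum.
import hashlib

def logical_address(name: str) -> int:
    """Hash a node name or content key onto the logical address space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

# Overlay nodes this peer knows about; their underlying IP addresses
# play no role in the routing decision below.
overlay_nodes = ["node-alpha", "node-beta", "node-gamma", "node-delta"]

def responsible_node(key: str) -> str:
    """Route a key to the overlay node with the closest logical address."""
    target = logical_address(key)
    return min(overlay_nodes, key=lambda n: abs(logical_address(n) - target))

print(responsible_node("some-file.dat"))  # the peer that should hold this key
```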


In F2F, each user operates as an overlay node. Contrary to an Opennet P2P network, with F2F it is not possible to connect to an arbitrary node and exchange information. Instead, each user manages “acquaintances” that they trust and establishes safe, authenticated channels only with a fixed number of other nodes. As pointed out by Biddle et al. (2003), in the Gnutella network, for example, there was the problem that the attributes of network participants, such as IP addresses, were known to all network participants. The participants could be infiltrated by police or intelligence agencies, and if their attributes are known, there is the danger of them being tracked and of legal action being taken against them. Additionally, as connections concentrate on powerful nodes with an abundance of network resources, as shown in Figure 3, when such a node is run by an adversary, the overall image of the network can be grasped in a so-called “harvesting” attack. With F2F, it is possible to create a P2P network that can withstand harvesting.

Figure 3. Harvesting attack. If there is a powerful server (possibly run by adversaries), all nodes would try to connect to that server; thus, the entire network would be revealed.

Source: the author.


Contrary to other dark web implementations such as Tor or I2P (described later), F2F network users are unable to know who is participating in the network other than their own “acquaintances,” so the scale of the network can be expanded without losing the anonymity of the network as a whole. In other words, “dark” here means that it is difficult to see and grasp an overall image of the network. In a “simple” F2F network, there is no path that reaches beyond “acquaintances,” so the average network distance is infinite. However, if indirect anonymous communication between users who do not know or trust each other is supported, then even nodes between which trust has not been established can communicate anonymously, as long as there is a common node that is an acquaintance of both and the communication goes via this node (a small-world network; Watts & Strogatz, 1998). It is interesting how both the dark net described herein and social networks were established once this small-world phenomenon became commonplace. It can be said that the dark net is like a twin sibling of social networks, which are at the peak of their prosperity.
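To make the “via a common acquaintance” idea concrete, here is a minimal Python sketch of routing over an F2F trust graph. The users and trust links are hypothetical, and actual F2F software such as Freenet’s darknet mode routes probabilistically rather than with the global breadth-first search used here.

```python
# Minimal sketch of F2F routing: messages may travel only along
# friend-to-friend trust links. Users and links are hypothetical.
from collections import deque

friends = {
    "alice": ["bob"],            # alice trusts (and connects to) only bob
    "bob":   ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave":  ["carol"],
}

def route(src: str, dst: str) -> list:
    """Breadth-first search for a path made only of trust links.
    alice never learns dave's address directly, yet can reach him
    anonymously via the common acquaintances bob and carol."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in friends.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []

print(route("alice", "dave"))    # ['alice', 'bob', 'carol', 'dave']
```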

Dark Web

It is unclear when the dark web first appeared. The term dark web began to be used around 2009, but conflation with the deep web was already seen at that time (Beckett, 2009). To understand the dark web, which is different from the deep web and the dark net (both of which are comparatively simple, technologically), an understanding of computer network basics is required.

Internet basics

On the Internet, “access” is realized through the exchange of a large quantity of messages between a computer at the user’s location and a server in a remote location. For example, when viewing a web page using a web browser, a request message saying “send the data on this page here” is sent from the viewer’s computer to the web server in accordance with fixed rules (known as “protocols”). The web server receiving this message then sends the requested data. At this time, the message is minutely subdivided into chunks of data of a fixed size, called “packets.” Data called a “header,” wherein control information such as the IP addresses of the sender and the destination is described, is attached at the start of each fragment, and these packets are exchanged. The side receiving the packets joins them, reconstructs the message, and takes action accordingly. On the Internet, such packets are sent, in a packet relay, via many server machines to the destination server. This flow of packets is called “traffic.” Looking at the header and deciding where and how to send a packet is known as “routing,” and the general name given to the devices and software that make these decisions and transmit packets is “routers.”
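As a concrete illustration of this request–response exchange, here is a minimal Python sketch that sends a plain HTTP/1.1 request message over a TCP socket; the operating system’s TCP/IP stack handles splitting the exchange into packets and reassembling them.

```python
# Minimal sketch of the "send the data on this page here" exchange,
# using a raw TCP socket and a plain HTTP/1.1 request.
import socket

with socket.create_connection(("example.com", 80)) as sock:
    request = (
        "GET / HTTP/1.1\r\n"      # the request message ("protocol": HTTP)
        "Host: example.com\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode())
    reply = b""
    while True:
        chunk = sock.recv(4096)   # TCP reassembles the packets for us
        if not chunk:
            break
        reply += chunk

print(reply.split(b"\r\n")[0])    # status line, e.g., b'HTTP/1.1 200 OK'
```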

Assumed anonymity versus real anonymity

The above text describes the mechanism by which data is exchanged on the ordinary Internet (described as the clearnet, in contrast to the dark net); however, when the Internet is accessed, a “record” always remains. For example, if you view a website from a PC, the server hosting the website will have a record (access log) showing at what hour and what minute the page was accessed and from where. In many cases, the only thing recorded is an identifier number known as the IP address. IP addresses are allocated to individual communications devices, and as the IP address assigned may change with every connection, it may be difficult to identify the location and person involved using the IP address alone. However, as the Internet service provider (ISP) used by the device for the connection is known, information on the contracted party can then be obtained from the ISP. So, each step can be traced back one by one.
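As an illustration, the sketch below shows the kind of record such an access log holds. The log entry is hypothetical, written in the common Apache “combined” format, and uses an IP address from a documentation-reserved range.

```python
# Minimal sketch of reading a (hypothetical) access-log entry in the
# common Apache "combined" format: who accessed what, and when.
log_line = ('203.0.113.7 - - [05/Dec/2020:14:23:05 +0900] '
            '"GET /page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')

ip_address = log_line.split(" ", 1)[0]            # the visitor's IP address
timestamp = log_line.split("[")[1].split("]")[0]  # when the page was accessed
request = log_line.split('"')[1]                  # which page was requested

print(ip_address)  # 203.0.113.7
print(timestamp)   # 05/Dec/2020:14:23:05 +0900
print(request)     # GET /page.html HTTP/1.1
```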


The log is often stored on the server side for a fixed period of time (in many cases, from three months to one year or more). Therefore, if an investigative body obtains the log from the organization managing the server, the ISP, etc., it can start to track down the sender. Of course, there are issues of freedom of expression and secrecy of communications, so even investigative authorities are unable to acquire sender information in an unlimited way. However, identification of information senders, whether through requests for disclosure of sender information based on the Law on Restrictions on the Liability for Damages of Specified Telecommunications Service Providers or by the police or the prosecution after acquiring a warrant from the courts, is an everyday occurrence.

Anonymization by Tor: Onion routing

Therefore, there are systems, such as Tor, that make it difficult for information senders to be identified. Tor is software designed to enable Internet users to communicate while maintaining anonymity and is named after the initial letters of “The Onion Router.” Tor is open-source software that runs on various platforms2 and can be obtained freely by anyone (and free of charge). In recent years, the development of Tor has proceeded on a volunteer basis. Originally, however, the technology was developed at a US Navy laboratory in the mid-1990s. Tor adds a tweak to the basic mechanism of data exchange over the normal Internet. Tor constructs a virtual network over the Internet and functions as a special router on this network. This is a unique method of routing and, as the name suggests, it uses a type of technology known as “onion routing.” Ordinarily, in the same way that mail will not be delivered if the destination on a postal item is written in code, the sender IP address in a packet header is described in unencrypted data (plain text).

2 For the development process of open-source software, refer to Hatta (2018, 2020a), etc.

Thus, the access log can be captured on the server. Additionally, as the destination IP address is also described in plain text, it is possible for someone lying in wait on a server along the course of the traffic, i.e., the packet relay, to eavesdrop on packet headers as they pass and statistically analyze the type and frequency of access, exposing the sender’s identity. For example, if somebody in a remote part of Afghanistan frequently accesses a US Army-related server, then even without knowing the content of the communications, one can infer that there is a high possibility that the person is an intelligence agent connected with the US Army. The general name for this type of method is “traffic analysis.” Onion routing was invented as a means of countering such traffic analysis.3 If you want to access a particular server anonymously, you should install Tor on your own computer, change the proxy settings of your web browser, and set all packets leaving your computer to go via Tor. If you do so, Tor will provide you with anonymity based on the following steps.
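For instance, the following minimal Python sketch routes a web request through Tor. It assumes a Tor daemon is already running locally on its default SOCKS port 9050 and that the requests library is installed with SOCKS support (pip install "requests[socks]"); the socks5h scheme makes DNS resolution also go through Tor.

```python
# Minimal sketch: send a request via a locally running Tor daemon.
# Assumes Tor is listening on its default SOCKS port 9050 and that
# requests has SOCKS support installed (pip install "requests[socks]").
import requests

proxies = {
    "http":  "socks5h://127.0.0.1:9050",  # socks5h: resolve DNS via Tor too
    "https": "socks5h://127.0.0.1:9050",
}

# The destination server will log the exit node's IP address, not yours.
response = requests.get("https://check.torproject.org/", proxies=proxies)
print(response.status_code)
```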

Step 1: Choosing relay nodes randomly

Tor, which picks up packets leaving your computer, obtains from directory servers on the Internet a list of the IP addresses of servers on which the Tor onion router runs (called Tor nodes or relays) and selects at least three of these nodes at random. If the selected nodes are designated Tor node A, Tor node B, and Tor node C, routing is then performed on your own computer:

Your computer → Tor node A → Tor node B → Tor node C → destination server

3 I2P, known as a non-Tor implementation of the dark web, uses garlic routing, an improved version of onion routing. Although Freenet uses a different algorithm, it is basically the same as onion routing.


It is thus determined that packets are sent along this route. (Tor node C is the terminal point of the virtual network created by Tor; as it is the connecting point that reconnects to the ordinary Internet, it is referred to in particular as an exit node.)
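A minimal Python sketch of this step follows, with a hypothetical relay list standing in for the directory information that Tor actually downloads.

```python
# Minimal sketch of Step 1: pick three distinct relays at random.
# The relay list is hypothetical; real Tor fetches a signed consensus
# from directory servers and weights its choices, rather than sampling
# uniformly as done here.
import random

relays = ["192.0.2.%d" % i for i in range(1, 21)]   # hypothetical relay IPs

node_a, node_b, node_c = random.sample(relays, 3)   # entry, middle, exit
route = ["your computer", node_a, node_b, node_c, "destination server"]
print(" -> ".join(route))
```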

Step 2: Peeling the onion at each node

Next, Tor attaches Tor headers to the packets to be sent. For each node, a header containing the IP address of the next Tor node to which the packet should be passed is attached:

A) Tor node A header: Tor node B IP address
B) Tor node B header: Tor node C IP address
C) Tor node C header: Destination server IP address

In practice, as shown in Figure 4, the headers are wrapped from the inside, in reverse order. First, for C), the IP address of the destination server to which you want to send the packet is written in the Tor node C header, and the whole thing is encrypted with a key that can only

Figure 4. Onion routing

Source: the author.


be decoded by Tor node C. Additionally, on top of that, B) the Tor node B header is attached, and the whole thing is encrypted with a key that can only be decoded by Tor node B. The same process is carried out for A) the Tor node A header. In other words, the header closest to the final destination is placed on the inside, and each layer is locked with a key so that, at each stage, it can be decoded only by a specific Tor node. On top of this, the packet is passed to Tor node A. Each Tor node decodes the packet header it receives and then transfers the packet, as in a packet relay, to the next node written there. In this way, it is as if the skin of an onion is being peeled off one layer at a time: each node opens the header addressed to itself, decodes it, and passes the packet to the next node. This is the reason it is called “onion” routing. The peeled-off skin, i.e., the header that a node has decoded for itself, is discarded by that node. So, for example, Tor node C knows that a packet has come from Tor node B, but it does not know that the node before Tor node B was Tor node A. Additionally, Tor node A knows that it has received a packet from the departure point and that it needs to pass it to Tor node B, but it does not know the node after that, because the contents of the packet are encrypted with Tor node B’s key. Most important of all, from the destination server’s perspective, the packet does not appear to come from the departure point but from Tor node C. Therefore, even if an access log is taken on the destination server, what is recorded in the log is the IP address of the exit node, Tor node C, not the IP address of the departure point. Additionally, Tor node C is simply a Tor node selected at random, and there is effectively nothing linking the departure point and Tor node C.
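The layering and peeling can be made concrete with a minimal Python sketch. The per-node Fernet keys, the JSON headers, and the node names are simplifications of my own: real Tor negotiates ephemeral keys for each circuit and uses fixed-size cells, but the nesting logic is the same.

```python
# Minimal sketch of onion wrapping and peeling. Assumes the
# `cryptography` package; Fernet keys stand in for the per-hop keys
# that Tor actually negotiates when a circuit is built.
import json
from cryptography.fernet import Fernet

keys = {name: Fernet(Fernet.generate_key()) for name in ("A", "B", "C")}

def build_onion(destination: str, message: str) -> str:
    """Wrap innermost-first: C's layer names the final destination."""
    layer = json.dumps({"next": destination, "data": message})
    layer = keys["C"].encrypt(layer.encode()).decode()  # only C can open
    layer = json.dumps({"next": "C", "data": layer})
    layer = keys["B"].encrypt(layer.encode()).decode()  # only B can open
    layer = json.dumps({"next": "B", "data": layer})
    return keys["A"].encrypt(layer.encode()).decode()   # only A can open

def peel(node: str, onion: str):
    """A node decrypts its own layer and learns only the next hop."""
    inner = json.loads(keys[node].decrypt(onion.encode()))
    return inner["next"], inner["data"]

onion = build_onion("destination-server", "GET /page")
hop, onion = peel("A", onion)    # A learns only that B comes next
hop, onion = peel("B", onion)    # B learns only that C comes next
hop, payload = peel("C", onion)  # C learns the destination and payload
print(hop, payload)              # destination-server GET /page
```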



Table 1. Deep web, dark web, and (two) dark nets

Type             The opposite                     The strength of anonymity
Deep web         WWW                              None
Dark net         Legitimate content distribution  Medium
Dark net (F2F)   Opennet                          Strong
Dark web         Clearnet                         Strong

Source: the author.

Conclusion

In this paper, we have provided a simple explanation of the deep web, the dark web, and the (two) dark nets, including their technical aspects. As shown in Table 1, these concepts are easy to understand based on what each one is the opposite of. Discussions of the dark web and related topics tend to be hampered by the images that the words evoke. When researching them moving forward, a precise understanding based on their historical and technical nature is required.

Acknowledgments

This work was supported by JSPS Grant-in-Aid for Publication of Scientific Research Results, Grant Number JP16HP2004.



References

Beckett, A. (2009, November 26). The dark side of the internet. The Guardian. Retrieved from http://www.theguardian.com/technology/2009/nov/26/dark-side-internet-freenet

Bergman, M. K. (2001). White paper: The deep web: Surfacing hidden value. Journal of Electronic Publishing, 7(1). doi: 10.3998/3336451.0007.104

Biddle, P., England, P., Peinado, M., & Willman, B. (2003). The darknet and the future of content protection. In J. Feigenbaum (Ed.), Digital rights management (pp. 155–176). Berlin, Heidelberg, Germany: Springer. doi: 10.1007/978-3-540-44993-5_10

Bilton, N. (2017). American kingpin: The epic hunt for the criminal mastermind behind the Silk Road. New York, NY: Portfolio/Penguin.

Bricklin, D. (2000, August 11). Friend-to-friend networks [Web log message]. Retrieved from http://www.bricklin.com/f2f.htm

Hatta, M. (2018). The role of mailing lists for policy discussions in open source development. Annals of Business Administrative Science, 17, 31–43. doi: 10.7880/abas.0170904a

Hatta, M. (2020a). The right to repair, the right to tinker, and the right to innovate. Annals of Business Administrative Science, 19, 143–157. doi: 10.7880/abas.0200604a

Hatta, M. (2020b, August). Deep web, dark web, dark net: A taxonomy of “hidden” Internet. Paper presented at ABAS Conference 2020 Summer, University of Tokyo, Japan.

Lee, T. B. (2017, November 24). How four Microsoft engineers proved that the “darknet” would defeat DRM [Web log message]. Retrieved from https://arstechnica.com/tech-policy/2017/11/how-four-microsoft-engineers-proved-copy-protection-would-fail/

Todorof, M. (2019). FinTech on the dark web: The rise of cryptos. ERA Forum, 20(1), 1–20. doi: 10.1007/s12027-019-00556-y

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442. doi: 10.1038/30918
