<<

Unveiling the web structure: a connectivity analysis

Roberto Magan-Carri´ on,´ Alberto Abellan-Galera,´ Gabriel Macia-Fern´ andez´ and Pedro Garc´ıa-Teodoro Network Engineering & Security Group Dpt. of Theory, Telematics and Communications - CITIC University of Granada - Spain : [email protected], [email protected], [email protected], [email protected]

Abstract—Web is a primary and essential service to the literature have analyzed the content and services offered information among users and organizations at present all over through this kind of technologies [6], [7], [2], as well as the world. Despite the current significance of such a kind of other relevant aspects like site popularity [8], topology and traffic on the , the so-called Surface Web traffic has been estimated in just about 5% of the total. The rest of the dimensions [9], or classifying network traffic and volume of this type of traffic corresponds to the portion of applications [10], [11], [12], [13], [14]. Web known as . These contents are not accessible Two of the most popular at present are The Onion by search engines because they are authentication protected (; ://www.torproject.org/) and The Invisible contents or pages that are only reachable through the well Internet Project (I2P;https://geti2p.net/en/). This paper is fo- known as darknets. To browse through darknets special authorization or specific and configurations are needed. cused on exploring and investigating the contents and structure Despite TOR is the most used darknet nowadays, there are of the websites in I2P, the so-called eepsites. In particular, we other alternatives such as I2P or , which offer different are interested on how they are interconnected and which kind features for end users. In this work, we perform an analysis of of relationships exist among them. Although some works have the connectivity of websites in the I2P network (named eepsites) been proposed for TOR [15], [16] in this research line, no aimed to discover if different patterns and relationships from those used in legacy web are followed in I2P, and also to get efforts can be found regarding I2P. In fact, we only found the insights about its dimension and structure. For that, a novel work [17] where the authors attempt to discover and analyze tool is specifically developed by the authors and deployed on eepsites claiming they discovered the 80% of total eepsites a distributed scenario. Main results conclude the decentralized in I2P. However, no eepsites’ relationships and connectivity nature of the I2P network, where there is a structural part of analysis are there provided. interconnected eepsites while other several nodes are isolated probably due to their intermittent presence in the network. To analyze the structure and connectivity of websites in I2P, Index Terms—Deep Web, Darknet, I2P, Crawling, Eepsite, a crawling and scrapping tool has been specifically developed Connectivity. here: c4i2p (Crawling for I2P), which is able to extract some useful information from eepsites to characterize them and how I.INTRODUCTION they are inter-connected. For that, a 3 months long experiment The quintessential Internet service, the has been carried out in a distributed environment composed by (WWW), simply known as “the Web”, has a hidden face different nodes located at different places. As we will discuss which is ignored by a great number of users and organizations. below in the document, around only ∼ 1.5% of the total The bulk of them browse the so-called Surface Web, which observed services correspond to eepsites. Besides, they are corresponds to web resources able to be indexed by common generally small and simple, composed of few pages with not search engines. However, the Surface Web is just the peak so many text and hypertext. Moreover, about ∼ 66% of the of an enormous WWW iceberg. The great part of the WWW eepsites are isolated from the rest, thus comprising a hidden contents are hosted on the so-called Deep Web [1]. These part of the overall set of eepsites. That means that I2P darknet contents are not accessible by search engines mainly due to can be seen as an heterogeneous and decentralized network, arXiv:2101.03212v1 [cs.CR] 8 Jan 2021 they are authentication protected contents or pages that are where also some popular sites exist. These popular eepsites only reachable through the well known darknets [2], [3]. To are characterized by their persistence in the I2P network over browse through darknets websites a special authorization or the time, thus making a kind of eepsite network backbone. The specific software and configurations are needed. rest of them seem to be appearing and disappearing randomly Darknets have been widely publicized to users as technolo- in/from the network as it can be seen through the present gies mainly intended to elude restrictions imposed work. by totalitarian governments [4], and to preserve fundamental The rest of the document is organized as follows. A main security rights like or . Although this is background on analysis of darknet related technologies is true, the generalized use of darknets surpasses nowadays provided in Section II. After that, Section III introduces the freedom demands to become a main tool for illegal actions main fundamentals of I2P and presents the c4i2p tool aimed and cibercriminality because of the impunity provided by at discovering the structure of eepsites. The experimental them [5]. setup using the proposed tool is detailed in Section IV, while Mainly motivated by that, darknets have received the atten- the results extracted from the experimentation are afterwards tion of researchers in the last years. This way, some works in discussed in Section V. Finally, the main conclusions and future work are presented in Section VI. In 2018, the creation of a called DUTA was presented in [7]. The database contains a list of a multitude II.BACKGROUNDONDARKNETS of TOR hidden services classified by content. It is noteworthy to notice that more than 250,000 addresses of hidden services The Surface Web or simply the Web, has been widely were found, but only 7,000 of them were accessible while studied by the research community with different and het- the rest were down or unavailable. More recently and in erogeneous aims. A lot of works have been focused on the same line of the previous work, a new algorithm called different related topics, such as performance optimization for TORank is proposed in [2] to classify hidden services in search engines [18], [19], [20], [21], analysis of connectivity, TOR. Authors thoroughly analyze site contents, then creating dimension and structure of the sites [22], [23], [24], [25] and a dataset (DUTA-10K) which updates DUTA. In addition, classification of the sites and their contents [26], [27], [28]. ToRank is assessed and quantitatively compared with some of Regarding the Deep Web, many works have focused their the most popular classification algorithms, such as PageRank, attention on darknets, specially on TOR. For instance, a HITS and Katz. Among the most interesting conclusions we generic crawling framework is proposed in [29] to discover can mention is the fact that only 20% of the available hidden resources with different contents hosted both on the Surface services refer to suspicious activities, and 48% of them are Web or the Deep Web. In particular, the authors carry out associated with normal activities. They also discovered that, in an experiment intended to search websites with content about general, domains related to suspicious activities have multiple homemade explosives. Another study focused on measuring clones under different locations (URLs), which is susceptible performance parameters for TOR can be found in [30]. There, to be used as an additional feature to identify them. response time, throughput and latency are measured, among others parameters. The authors conclude that browsing TOR Some other works analyze the main weaknesses of TOR is slower than doing it through the Surface Web. In [8], and I2P regarding anonymity that might compromise user an approach to enumerate all hidden services in TOR is identities and communication links. This is the case of [33], introduced. The authors take advantage of defects in both where the authors conclude that some of the attacks require the design and implementation of TOR hidden services that considerable resources to be effective and that, therefore, it could allow an attacker to deanonymize them. The analysis is very unlikely that they will succeed against this kind of of popularity of hidden services in TOR is also carried out networks. As a consequence, both darknets are considered in [6] and [9]. Both works are aimed at analyzing how much highly safe. A theoretical comparison between TOR and I2P traffic is due to the use of hidden services and how many is carried out in [34] from a number of perspectives: visibility unique *.onion addresses exist in the network, in order to by the community, scalability, memory usage, latency, band- estimate the number of TOR hidden services. width, documentation, vulnerability to DoS attacks, number On the other hand, a system for crawling the Deep Torrent, of exit nodes, etc. Authors in [35] try to characterize the file- that is the torrents available in BitTorrent that cannot be found environment within I2P, they evaluating how it affects through public sites or regular search engines, is presented the anonymity provided by the network. It is concluded that in [31]. Authors estimate what percentage of resources shared most of the activities within I2P are oriented to file sharing in the BitTorrent network are hidden or part of the Deep and web hosting. Moreover, it is also concluded Web. In [15], a very complete survey about the topology and that the nodes are geographically distributed. content of TOR is presented. Authors make use of a specific Additionally, several works have been proposed based on crawler that recursively explores the links found. Using it, the use of Machine Learning (ML) techniques and algorithms a total of 34,000 hidden services are found, 10,000 of them with the aim of analyzing and classifying darknet’s network corresponding to online services. The work concludes that traffic in some sense. For instance, a ML-based approach to most of hidden services are well connected through central analyze traffic flows generated by I2P applications and users sites such as and forums. That is, the structure of is introduced in [36]. The work concludes that it is possible to this network is somewhat centralized. Regarding contents, create both user and application profiles, and that the accuracy half of the sites involve lawful activities, while those with in creating such profiles depends on the amount of shared illegal hidden services are mainly related with , sale of bandwidth. More recently, the authors in [10] introduce a counterfeit products and drug markets. hierarchical machine learning based framework to classify the In [32], the authors make use of graphs to analyze the most type of network, the type of traffic and the applications that popular emerging products in TOR. For that, authors make use generated it. For that, a labeled dataset (Anon17) compris- of text information extracted from the domains corresponding ing network traffic collected from I2P, TOR and JonDonym to markets to create a Product Correlation Graph (PCG). darknets is used. In comparison to their previous work [11] In a similar line, a link inter-connectivity study is carried where they used a flat approach with several classic ML out in [16] about the structure and privacy of TOR hidden algorithm, they obtain an improved performance, specially in services. The authors analyze more than 1.5 million URLs traffic classification. Moreover, the authors conclude that I2P hosted in 7,257 TOR domains. For each page, links, resources applications are the hardest ones to classify in comparison and redirection graphics, as well as the language used, are to TOR. With the same aim, the authors in [12] proposed a analyzed. Very relevant conclusions are extracted, such as the three step XGBoost ML-based framework. As in the previous fact that domains in TOR are highly interconnected and that works, the Anon17 dataset is used to validate the solution. A there are many links within TOR pointing to pages hosted on relevant improvement, in terms of classification accuracy, is the Surface Web (the reverse case is also quite common). achieved in comparison to other works, e.g., [36]. Moreover, they also tested the suitability of the approach for classifying eepsites. In [17], an attempt to discover and analyze eepsites different but similar network traffic coming from Virtual is made. For that, authors propose the use of Floodfills, Private Networks (VPN) or intentionally encrypted. Similar the active collection of host.txt files and the network works, addressing the classification of I2P network data traffic crawling. A total of 1,861 online eepsites were discovered, are also found in the literature [13], [14]. this amount corresponding to 80% of eepsites existing in The I2P network has also been analyzed from a forensic this network as the authors claim. However, no eepsites’ point of view in [37]. This work is aimed to help a forensic relationships and connectivity analysis is provided. Because analyst to find clues related to the activity of cybercriminals of that, the current work is mainly focused on the study of in I2P. More deeply, a study on the performance and safety of the I2P eepsites relationships and interconnections to unveil I2P is presented in [38]. The authors compare the design of the I2P web structure. the NetDB with the design of the popular KAD () III. I2P DARKNET:BASICSAND ANALYSIS algorithm. Among other conclusions, they stand out that I2P users are, in general, more stable than those of other non- In this section, we first present some fundamentals of I2P networks. Because of that, KAD design is and afterwards describe the specific crawler developed to considered less vulnerable to different attacks than the current analyze that darknet. NetDB design is. Liu et. al introduce in [39] two passive and A. I2P fundamentals active methods to discover I2P routers. An experiment was The I2P project [43] was started in 2003 by a group of carried out for two weeks, where around 25,640 routers were hackers, developers and software architects as a useful set of discovered every day. The authors claim that they almost cover tools against censorship, with anonymity and privacy in mind. 94.9% of the I2P network compared to the data announced on I2P operation is similar to that in other anonymous net- the official . A week-long experiment was also carried works. Thus, traffic is routed through various points in the out to monitor the I2P network in [40]. Some interesting network using chains of proxy servers. Each data packet is results were obtained in this study. Among others, the authors anonymously routed to the corresponding destination. More- conclude that 37% of the published leaseSets were offline after over, when users register their I2P instances (routers), these publication on NetDB. Around 30% of the complete leaseSets become nodes of the network, thus contributing with their set were not identified, which means that 30% of the network own network bandwidth to transport communications among was not running neither a web server nor an I2PSnark . other users. In addition, more than 50% of the anonymous web services The use of I2P is accomplished by using applications discovered were 70% of the time available online. It should specifically designed to interact with I2P in a transparent be noticed that the experiment had a relatively short duration way. Two relevant examples of that are I2PSnark (client for and it was performed when the network was possibly not yet BitTorrent), and I2PTunnel (services to route TCP/IP flows mature. for common applications such as web browsers, IRC or SSH), Another study is conducted in [41] to analyze I2P nodes. among others. The authors collected 16,040 I2P nodes and analyzed some One of the main elements that conform the I2P architecture of their properties, such as distribution by country, bandwidth are tunnels, since they allow sending (outbound tunnel) and utilization and FloodFill attributes. Finally, and perhaps receiving (inbound tunnel) between users/nodes in one of the most complete academic studies on I2P to date, the network. A tunnel could be defined as the composition of can be found in [4]. In it, a comprehensive empirical study of a set of routers that are in charge of sending and receiving the network is carried out where population, cancellation rate data packets to and from specific destinations or sources. (abandonment of the network by users), type of router and I2P makes use of a simple combination of symmetric geographic distribution of the I2P pairs are measured. At the and asymmetric algorithms to provide data con- time of the study, there were about 32,000 active I2P pairs in fidentiality and message integrity. It is commonly named as the network per day, and 14,000 of them were behind NAT or or encryption. More specifically, ElGamal/AES firewalls. Moreover, despite the decentralized nature of I2P, + SessionTags are used for the encryption with 2048-bit a censor actor could block more than 95% of the peer IP ElGamal, AES256, SHA256, and 32-byte nonces (single-use addresses known by a stable I2P client by only controlling 10 numbers). routers in the network, what constitutes a serious deterioration The Network Database or NetDB is an important part of of the connectivity of an I2P client. I2P as well. NetDB is a Distributed (DHT) with Through this review of the state of the art, we can conclude a structure based on the Kademlia algorithm. It contains the that the Surface Web has been explored and analyzed to a information about I2P services so that nodes in the network large extent from many points of view and approaches. We can communicate with each other. NetDB is queried the first have focused the discussion on research examples mainly time a running instance of I2P wants to contact to another based on crawling, information gathering, search engines router in the network. NetDB is disseminated by means of the and graph-shaped connectivity. Although to a lesser extent technique called Floodfill, which is based on the use of a series than the Surface Web, TOR has also been studied in depth, of special routers with the same name. Such special routers from several points of view too, e.g., content, structure (best are in charge of distributing the database and synchronizing result of this analysis is the TorMetrics Project, [42]), etc. the corresponding information about the routerInfo and the Nevertheless, I2P is still somehow unknown and further works leaseSets found on the network [44]. Unlike TOR directory about its operation and structure are desirable, specially for authorities, the Floodfill routers that compose the NetDB are not fixed or trusted routers. Indeed, any router on the network • Discoverers. As the main part of the Discovery module, can become a Floodfill if it is configured in that way. they implements the main functionality of the discover- ing procedure. Each one is responsible for checking if an B. C4i2p: A Crawling Tool for the I2P Darknet eepsite is active (reachable and operational). How many of them and when they are launched is controlled by the Although some crawling tools have been developed to date Discovery module. with different aims [45], none of them cover completely the • Spiders. Each of them, separately, is in charge of car- main objectives this work is intended for: a connectivity rying out the Crawling process of a certain eepsite. An analysis among eepsites. For this reason, a new and ready- spider dives recursively into all the pages to extract the to-use tool has been devised and developed: c4i2p, which desired information. It is launched right after an eepsite stands for ’crawling for I2P’. Beyond its usefulness to achieve is discovered. the objectives of the current work, we are convinced about • Database. The database stores in an ordered and struc- its potential use by the research community to be improved tured way all the information extracted from eepsites, as or extended in many ways. Because of that, c4i2p has been well as the inherent data of the system itself. conceived as an open source project and thus it is publicly • I2P network. The I2P network is where the eepsites are available for downloading at [46]. hosted, which are the entities from which we want to Through the rest of this section, the main modules, pro- extract information. cesses and components that shape the developed tool are • I2P router. Finally, the I2P router allows c4i2p to introduced and described. communicate with the I2P network. 1) Modules: As mentioned before, c4i2p tries to be a 3) Eepsites functional states: During its lifecycle in c4i2p, specific tool that allows users to interact with the I2P network. an eepsite goes through several states. The states and transi- Thanks to this tool, relationships among eepsites can be tions among them are depicted in Fig. 2: observed by gathering different kind of information. For that, several modules are involved in the operation of c4i2p, as • DISCOVERING. When c4i2p takes a new eepsite from shown in Fig. 1: the list of sources until the eepsite is successfully con- tacted, the eepsite is being discovered. This is the initial • Management. It is in charge of controlling the entire state of every observed eepsite. workflow of the tool. It governs the interaction between • PENDING. Once an eepsite is successfully contacted, the rest of the procedures and elements composing the i.e., it is reachable and available, it goes to the pending system. The management module is in turn managed by status where it is awaiting to be crawled. Indeed, the the end user. eepsite is queued into a FIFO based queue for that. • Discovery. It is responsible for checking the availability • ONGOING. The eepsite is being crawled. of the observed eepsites in a controlled manner over time. • ERROR. An eepsite is in the ERROR state if there • Crawling. It is responsible for carrying out the task of was an error while crawling it. In order to avoid not crawling and everything related with searching for and contemplated errors during the crawling process, the scrapping specific information from I2P eepsites. application allows a certain number crawling processes • Data Storage. It is a module aimed to provide data stora- until the eepsite is proposed to be discovered again. ge services to others. It is responsible for the storage, • FINISHED. A node is labeled FINISHED if it was persistence and management of all the information. successfully crawled. • DISCARDED. An eepsite reaches this state after a cer- 2) Processes, components and relationships: The main tain number of discovering attempts or when a predefined components that intervene in the operation of c4i2p are as on-discovering time is exceeded. This state concludes follows (see Fig. 1): that the eepsite is no longer available or it never existed. • Sources. They are eepsite addresses from I2P. They can be obtained from three different origins: IV. EXPERIMENTAL SETUP – SEED. It is obtained from the list of initial seeds In order to crawl the I2P network a distributed experimental (seeds.txt) from which the experiment starts. environment is devised. It is composed of 11 virtual machines They were selected from a preliminary search for organized in two separated groups as shown in Fig. 3. They eepsites in I2P. are distributed in different geographical places running in – FLOODFILL. They are discovered by the Floodfill different computation clusters too. On the one hand, 7 of I2P deployed routers. the VMs are deployed on the facilities of the University of – DISCOVERED. The ones discovered through the Granada (Spain): 6 named as i2pProjectM[1-6] in the crawling process, i.e. from links obtained from each figure, and the one acting as the central database identified scrapped eepsite. as i2pProjectBBDD. On the other hand, the remaining • Manager. The main procedure of the Management mod- 4 VMs are deployed on the facilities of the University ule, the Manager is the brain and core of the entire tool. of Cadiz´ (Spain), which are identified in the figure as It is in charge of coordinating and orchestrating the rest i2pProjectM[7-10]. All the virtual machines run of elements and to keep track of eepsite’s status and Ubuntu 18.04.1 LTS distribution with a total of 5 GBs RAM, lifecycle. 100 GBs HD and two 2.40 GHz vCPUs. Management seeds.txt Manager

Discovery

eepsite_1.i2p

Discoverers Data Storage I2P

Database eepsite_2.i2p I2P router I2P Crawling

Spiders eepsite_n.i2p

Figure 1. c4i2p modules and interactions.

i2pProjectM7 i2pProjectM8 i2pProjectM9 i2pProjectM10

DISCOVERING

CÁDIZ No

Yes max_attemps? || max_time?

GRANADA i2pProjectBBDD HTTP Response No == 2XX, 3XX ? DISCARDED GRANADA

Yes Yes

No F F F max_errors? PENDING F F F

i2pProjectM1 i2pProjectM2 i2pProjectM3 i2pProjectM4 i2pProjectM5 i2pProjectM6

Unexpected Successful error crawling ERROR ONGOING FINISHED Figure 3. Experimental setup where 11 virtual machines are de- ployed and distributed in different locations: i2pProjectM[1-6] and i2pProjectBBDD, in Granada (Spain); and i2pProjectM[7-10], in Cadiz´ (Spain). All the machines (except that corresponding to BBDD) run an I2P router. The ones with IDs i2pProjectM[1-6] are configured as floodfill I2P routers. Figure 2. Eeepsite’s lifecycle in c4i2p, where states (squared boxes) and available transitions (polygonal shapes) among them are depicted. eepsites. Both, Discovery and Crawling modules were accord- ingly set up in each c4i2p instance. Among other configuration Every VM executes an I2P router instance for accessing parameters, the number of simultaneous running spiders is the I2P darknet, except for the i2pProjectBBDD machine. limited to 10 to efficiently manage the VM computation Machines i2pProjectM[1-6] were set up as floodfill capacities. Moreover, a maximum value is established for the I2P routers, while the rest i2pProjectM[7-10] run as discovery time and the number of attempts. If one of such normal I2P routers. Floodfill routers, as mentioned in Sec- limits is reached, the site in process of being discovered will tion III-A, allow to discover additional eepsites since they be discarded. At the beginning, some initial I2P URLs (called all are in charge of maintaining and managing the NetDB. seeds) are equally distributed among all c4i2p instances. In Following recommendations from work [47], we set up shared our experiment we initially count on 3,938 seeds, which router’s communication bandwidth to 8 MBps, the number means that ∼ 394 initials seeds are assigned to each instance. of maximum participants in tunnels being equal to 10K. Table I shows the configuration parameters of c4i2p. In addition, it is necessary to activate floodfill routers by Finally, the i2pProjectBBDD machine runs a MySQL setting up router.floodfillParticipant=true in DataBase Management System (DBMS) to provide data per- router.config configuration file. sistence and storage for the rest of the crawling instances. The All VMs also include a c4i2p instance. They altogether DBMS engine has to be configured to support a high number provide a distributed crawling procedure where each c4i2p of concurrent connections. As it can be seen in Table I, every instance is in charge of managing and discovering its own c4i2p instance simultaneously runs 50 discovery threads (the Parameter Value Description MAX_ONGOING_SPIDERS 10 Number of simultaneous spiders MAX_CRAWLING_ATTEMPTS_ON_ERROR 2 Number of crawling attempts per eepsite MAX_DISCOVERY_ATTEMPTS 30d × 24h Number of discovery attempts per eepsite: one per hour during a month MAX_DISCOVERY_DURATION (m) 30d × 24h × 60m Maximum time for which an eepsite is tried to be discovered MAX_DISCOVERY_SINGLE_THREADS 50 Number of eepsites to be simultaneously discovered HTTP_TIMEOUT (s) 30s HTTP request timeout from an I2P URL INITIAL_SEEDS seeds.txt Path to the file containing the set of eepsite URLs INITIAL_SEEDS_BATCH_SIZE ∼394 Number of initial seeds assigned to a c4i2p instance Table I C4I2P CONFIGURATION PARAMETERS.

104 services services-SMA eepsites 103

102

101 # of observed services/eepsites

100

0 2019/06/08 2019/06/10 2019/06/12 2019/06/14 2019/06/16 2019/06/18 2019/06/20 2019/06/22 2019/06/24 2019/06/26 2019/06/28 2019/06/30 2019/07/02 2019/07/04 2019/07/06 2019/07/08 2019/07/10 2019/07/12 2019/07/14 2019/07/16 2019/07/18 2019/07/20 2019/07/22 2019/07/24 2019/07/26 2019/07/28 2019/07/30 2019/08/01 2019/08/03 2019/08/05 2019/08/07 2019/08/09 2019/08/11 2019/08/13 2019/08/15 2019/08/17 2019/08/19 2019/08/21 2019/08/23 2019/08/25 2019/08/27 2019/08/29 2019/08/31 2019/09/02 2019/09/04 2019/09/06 2019/09/08 2019/09/10 2019/09/12 2019/09/14 2019/09/16 2019/09/18 2019/09/20 2019/09/22 2019/09/24 2019/09/26

Figure 4. Daily number of observed I2P services (blue crosses), SMA (Simple Moving Average) (orange line) and eepsites (green dots). Notice the high number of services/eepsites at the beginning of the experiment where the system added more than 6,000 I2P services from initial seeds and floodfill sources.

900 eepsites (crawled) Cumulated eepsites eepsites (crawled)-SMA 800 102 700

600

101 500

400 # of eepsites

300 Cumulated eepsites

100 200

100 0 0 2019/06/08 2019/06/10 2019/06/12 2019/06/14 2019/06/16 2019/06/18 2019/06/20 2019/06/22 2019/06/24 2019/06/26 2019/06/28 2019/06/30 2019/07/02 2019/07/04 2019/07/06 2019/07/08 2019/07/10 2019/07/12 2019/07/14 2019/07/16 2019/07/18 2019/07/20 2019/07/22 2019/07/24 2019/07/26 2019/07/28 2019/07/30 2019/08/01 2019/08/03 2019/08/05 2019/08/07 2019/08/09 2019/08/11 2019/08/13 2019/08/15 2019/08/17 2019/08/19 2019/08/21 2019/08/23 2019/08/25 2019/08/27 2019/08/29 2019/08/31 2019/09/02 2019/09/04 2019/09/06 2019/09/08 2019/09/10 2019/09/12 2019/09/14 2019/09/16 2019/09/18 2019/09/20 2019/09/22 2019/09/24 2019/09/26

Figure 5. Detailed view of the daily rate of eepsites found and crawled (green dots), SMA (Simple Moving Average) (orange line) and cumulated sum (blue crosses). At the beginning of the experiment, the system crawled more than 200 eepsites from seed, floodfill and discovered sources. discoverers), the DBMS engine being configured to support Figures 4 and 5 show the number of daily observed services a minimum of 10 (VMs) × 50 (threads) = 500 concurrent during the experiment. As expected, the number of eepsites connections. found (green dots in the figures) is much lower than the number of total observed services in the darknet (blue crosses V. RESULTS ON I2P EPPSITESAND DISCUSSION in the figures). In Figure 5, a detailed view of the total number of eepsites is shown together with their cumulated sum (blue The experiment took a total of 111 days, it starting on dots in the figure). Additionally, a SMA (Simple Moving June 7th 2019 and finishing on September 26th 2019. During Average) for 5 days (continuous orange line in both figures) this period we managed a total of 54,974 I2P URLs, most is computed in order to see trends in the discovering and of them obtained from leasesets of floodfill routers. They crawling process. correspond to services that can be web related (eepsites) or As it can be seen in the previous figures, new I2P services not. In fact, only 787 of them were eepsites and, thus, they are found every day. Indeed, a daily average of ∼ 500 services were successfully crawled. is computed at the end of the experiment. Similarly, eepsites 103 FINISHED 96.66 100 DISCOVERING DISCARDED

78.77 80 74.85 102

60 101 # of sites values (%) 40

23.96 19.86 20 100 3.31 0.00 1.18 1.03 0 0 SEED FLOODFILL DISCOVERED 0 100 200 300 400 500 600 700 Source Discovery attempts

Figure 6. Source vs. status results: Percentage of services from a specific Figure 7. Distribution of discovery attempts for finished eepsites. source in a specific status.

of attempts to be finally crawled. Indeed, ∼ 13% of eepsites are crawled almost every day, that resulting in an average of needed more than or equal to 200 attempts to be crawled. ∼ 5 which motivates the linear increment in the number of The main hypothesis behind this result is that these sites are found eepsites from the beginning of the experiment shown in continuously appearing and disappearing from the network Figure 5 (blue dots). The high number of eepsites observed in and they are difficult to be reached with just few discovering the first day is motivated by the initial system contribution to attempts. the total number of services. These eepsites come from seed, In summary, new I2P services are continuously appearing floodfill and discovered sources. Moreover, no significant in the darknet though most of them are not eepsites and have trends can apparently be seen for both total services and been obtained from floodfill routers. Moreover, most of active eepsites, according to the SMA values. eepsites in the past, those included as seed sources in this experiment, are not currently available. This could mean that, Some interesting conclusions can be also obtained from the in general, eepsites have a short life. In fact, only 3.31% analysis of the status and sources of the observed services. of the eepsites from seed sources have been finally crawled Figure 6 shows the rate of observed services in a specific according to Table II. As it will be seen in the next section, status, ordered by their source. These results are numerically such eepsites correspond to either source or sink eepsites shown in Table II too. From them, it can be seen that the (nodes) in the darknet. Moreover, even having a low number number of discarded eepsites is very high in comparison with of finished eepsites from discovered sources, it is expected the ones finally crawled (finished). Moreover, the number of to find a higher number of eepsites obtained from crawling finished eepsites is only ∼ 1.5% (787) of the total observed eepsites (discovered) than from other kind of sources due to services in the darknet. Notice that most of the services their interconnections and relationships. Finally, active and were obtained from floodfill routers (50,784), the number persistent eepsites are easy to be discovered. of discovered ones being really low (292). This last value is clearly motivated by the low number of eepsites finally A. Size analysis crawled. However, in percentage, the number of finished After the various results and behaviors previously de- eepsites obtained by crawling other eepsites (discovered) is scribed, we now perform an analysis about the size of I2P higher than those obtained from floodfill routers. Additionally, eepsites. For that, information about their number of pages as expected, the higher number of discovered eepsites comes is extracted during the crawling procedure. Additionally, the from floodfill sources. Finally, 96.66% of services from seed number of words, letters, images and scripts in the home related sources have been discarded. The initial list of seeds page, as well as the predominant eepsite’s language, are also was chosen manually from a previous work but, as the results estimated. shown, it can be obsolete with most of the eepsites no longer According to Figure 8, the I2P darknet is mostly composed available currently. As a consequence, we can conclude that of small eepsites. In fact, ∼ 80% of the total number of the number of eepsites comprising the I2P network changes eepsites have less than or equal to 30 pages. However, there over time, some of them disappearing from the network while are some exceptions. For instance, extraordinary large eepsites some others are created. with 5, 6 or even 11 thousand pages exist, the largest one It is also interesting to see the distribution of the number of having 20,630 pages. Table III shows sites with more than discovering attempts consumed for finished eepsites. This is 1,000 pages. shown in Figure 7. A deep inspection of the results concludes Regarding the size of eepsites related pages, Figure 9 shows that only ∼ 20% of eepsites needed 1 attempt to be finally that almost all eepsites have a home page with a few number crawled. From that we can conclude the poor availability of of letters, words, images and scripts. eepsites in the I2P network. Another interesting result is the Finally, also an analysis about the predominant language fact that some other nodes required an exceptional number used in eepsites is performed. To do that, we considered a Source # services by source Status # by status % by status # total services % of total services FINISHED 129 3.31 0.25 SEED 3,938 DISCOVERING 0 0 0 DISCARDED 3,768 96.66 6.85 FINISHED 600 1.18 1.09 FLOODFILL 50,784 DISCOVERING 12,167 23.96 54,974 22.13 DISCARDED 38,012 74.85 69.14 FINISHED 58 19.86 0.15 DISCOVERED 292 DISCOVERING 3 1.03 0.005 DISCARDED 230 78.77 0.42 Table II SOURCEVS. STATUS RESULTS:AGGREGATEDRESULTSOBTAINEDFROMTHEEXPERIMENTDEPENDINGONTHESOURCEANDTHEEEPSITES’ STATUS.

# of pages Eepsite 20630 ux6prousphswf56bym7yo7kst4ybh45y2z2wrnw7dujmrz56hq4q.b32.i2p 11059 qiii4iqrj3fwv4ucaj2oykcvsob75jviycv3ghw7dhzxg2kq53q.b32.i2p 6336 qa4boq364ndjdgow4kadycr5vvch7hofzblcqangh3nobzvyew7a.b32.i2p 5913 whba2ljn2sjvke45yjkyudzmelwkjcop3m7r6kubohngq3pb6cqa.b32.i2p 3614 a2lnpfsrhy5d3yky6xsut6gj6j76vn3lsy7kvabvedtu2d37s65q.b32.i2p 3516 gwqdodo2stgwgwusekxpkh3hbtph5jjc3kovmov2e2fbfdxg3woq.b32.i2p 3118 u6pciacxnpbsq7nwc3tgutywochfd6aysgayijr7jxzoysgxklvq.b32.i2p 2186 i2pforum.i2p 1194 25cb5kixhxm6i6c6wequrhi65mez4duc4l5qk6ictbik3tnxlu6a.b32.i2p Table III I2P EEPSITESWITHMORETHAN 1,000 PAGES.

Table IV 103 LANGUAGEOFEEPSITESACCORDINGTO GOOGLE TRANSLATOR ENGINE.

Language % English 96.31 2 10 French 0.86 German 0.86 Spanish 0.62 Norwegian 0.25 101 Latin 0.25

# of sites Italian 0.25 Welsh 0.12 Turkish 0.12 Portuguese 0.12 100 Dutch 0.12 Catalan 0.12 0 0 5000 10000 15000 20000 # of pages connection from u to v. This way, the number of edges u (u, v ), (u, v ), ··· , (u, v ) Figure 8. Eepsites size: Distribution of the number of pages of found from node to some other nodes: 1 2 n , eepsites. is denoted as out-degree (eepsite outgoing links). Similarly, the in-degree (eepsite incoming links) value of a node u corresponds to the number of edges pointing to node u from widely accepted language detection engine: Google Transla- some other nodes: (w1, u), (w2, u), ··· , (wm, u). Based on tor. The results obtained are shown in Table IV, from which that, we can define four types of nodes: source nodes, with we can conclude that English is the most used language. only outgoing links; sink nodes, with only incoming links; However, we cannot be very confident about these results due connected nodes, with both incoming and outgoing links; and to the small number of crawled eepsites. isolated nodes, which are not connected to any other node in the graph. B. Connectivity analysis The distribution of outgoing and incoming links is shown Several works in the literature have successfully addressed in Figures 10 and 11, respectively. We can observe that most the study of the Surface Web connectivity as a graph [23], of the crawled nodes have not significant out-degree or in- [24]. In this section, we will adopt a similar strategy to degree values. In fact, ∼ 10% of the nodes correspond to understand and unveil I2P eepsites relationships and orga- source nodes, ∼ 12% to sink nodes, and ∼ 12% to connected nization patterns. First of all, a brief explanation of graph nodes. Finally, ∼ 66% of nodes are isolated nodes. theory concepts is introduced. Second, specific results will be We should remark that there are some eepsites that can presented and discussed in detail. be considered as anomalous in terms of out- and in-degree Let G = (V,E) be a directed graph comprising a set values. Nodes with highest out- and in-degree values are of nodes V (eepsites) an a set of edges E (connections). summarized in Tables V and VI, respectively. In particular, Each edge connects a pair of nodes (u, v) through a direct the node with the highest value of outgoing links, 385 in total, 103 103

102 102

1 10 101 # of sites # of sites

0 10 100

0 0 0 0 25000 50000 75000 5000 100000 125000 150000 175000 10000 15000 20000 25000 30000 35000 40000 # of letters at home # of words at home

(a) (b)

103 103

102 102

1 10 101 # of sites # of sites

0 10 100

0 0 0 50 0 5 100 150 200 10 15 20 25 30 35 # of images at home # of scripts at home

(c) (d)

Figure 9. Eepsites content analysis in terms of number of (a) letters, (b) words, (c) images and (d) scripts.

103 103

102 102

101 101 # of sites # of sites

100 100

0 0 0 50 100 150 200 250 300 350 400 0 10 20 30 40 50 Outgoing links (out-degree) Incoming links (in-degree)

Figure 10. Distribution of eepsite outgoing links (out-degree). Figure 11. Distribution of eepsite incoming links (in-degree). is the identiguy.i2p eepsite. It contains a dynamically nodes are inconsistent, or many of the eepsites are unattended updated list of eepsites but does not cover the whole I2P, and not frequently updated. i.e. the hidden part. On the other hand, the node with the In addition, all the highest out- and in-degree values corres- highest value of incoming links, 58 in total, corresponds with pond to eepsites found from seed source. Based on that, we forum.i2p. It is worth noting that most of the biggest in- can confirm the existence and persistence of some relevant degree nodes were offline (discarded) probably due to two I2P eepsites supporting a kind of network backbone among principal reasons: the number and relationships among the eepsites. However, there is also an important group of eepsites, trac.i2p2.i2p forces.

stats.i2p s6lagaqbn572fvnr7vsx... i2pwiki.i2p In this work, we inspect the I2P darknet by devising new i2p-projekt.i2p

zy37tq6ynucp3ufoyeeg... methods and tools based on crawling procedures. The results obtained conclude that I2P is a highly decentralized network, where new services dynamically appear and disappear over 5ypxuqf2ufqdsf3ejv5x... pwgma3snbsgkddxgb54m... time. A small portion of the I2P services are web related

uda2rkhskjdb4w7xiftz... services, called eepsites. Regarding content and size aspects, i2pforum.i2p it can be concluded that eepsites are mainly small websites. From the perspective of connectivity and relationships, eep- sites are very disconnected from the rest, i.e., they present low

4bpcp4fmvyr46vb4kqjv... fex6v4zccrovs7dixqbi... isxls447iuumsb35pq5r... in- and out-going connectivity degrees. On the contrary, few of them highlight from the rest in the same terms thus having a kind of eepsite network backbone. Moreover, more than

inr.i2p a half of the total observed eepsites in our experimentation identiguy.i2p were completely isolated, thus becoming a hidden part of the overall set of eepsites. Figure 12. Connectivity graph for nodes with highest out-degree values. Beyond the results obtained, some further work should be done. This way, we plan to extend the experimentation to sperrbezirk.i2p cover as many I2P nodes as possible. Another relevant focus tino.i2p inproxy.tino.i2p is to study isolated eepsites and those separated from the rest making groups. It would be also interesting to analyze some perv.i2pi2host.i2pno.i2p www.i2p2.i2pugha.i2p other type of services used in I2P. Finally, aimed to get a more forum.i2p in-depth knowledge of the Deep Web, it would be necessary to extend the tool to address other darknet technologies, like Freenet (https://freenetproject.org/) for a further comparison. i2pwiki.i2p ACKNOWLEDGEMENT i2p-projekt.i2p www..i2p This work has been partially supported by Spanish Government-MINECO (Ministerio de Econom´ıa y Compet- killyourtv.i2p itividad), the ERDF (European Regional Development Fund) through project TIN2017-83494-R (MDSM) and the 5B starting research grants program of the University of Granada.

str4d.i2p REFERENCES visibility.i2p [1] M. K. Bergman, “White Paper: The Deep Web: Surfacing Hidden bote.i2p diftracker.i2p Value,” Journal of Electronic Publishing, vol. 7, no. 1, 2001. [2] M. W. Al-Nabki, E. Fidalgo, E. Alegre, and L. Fernandez-Robles,´ Figure 13. Connectivity graph for nodes with highest in-degree values. “TORank: Identifying the most influential suspicious domains in the tor network,” Expert Systems with Applications, vol. 123, pp. 212–226, 2019. [3] G. Kalpakis, T. Tsikrika, N. Cunningham, C. Iliou, S. Vrochidis, the isolated ones. J. Middleton, and I. Kompatsiaris, “OSINT and the ,” in Open Figures 12 and 13 depict the relationships among top Source Intelligence Investigation: From Strategy to Implementation, ser. Advanced Sciences and Technologies for Security Applications, eepsites with the highest out- and in-degree values. In the B. Akhgar, P. S. Bayerl, and F. Sampson, Eds. Cham: Springer figures, the size of the labels and nodes is directly proportional International Publishing, 2016, pp. 111–132. to their out- and in-degree values, respectively. As shown, all [4] N. P. Hoang, P. Kintis, M. Antonakakis, and M. Polychronakis, “An empirical study of the I2P anonymity network and its censorship the nodes are connected. resistance,” in Proceedings of the Internet Measurement Conference An overall view of the connectivity of the I2P network is 2018. ACM, 2018, pp. 379–392. finally shown in Figure 14, where only nodes with at least [5] A. Kumar and E. Rosenbach, “The Truth About the Dark Web,” Finance & Development, vol. 56, no. 3, pp. 22–25, 2019. one incoming or outgoing link are depicted. In this figure, the [Online]. Available: https://www.imf.org/external/pubs/ft/fandd/2019/ neighborhood for node identiguy.i2p can be observed, 09/pdf/the-truth-about-the-dark-web-kumar.pdf where all the nodes (colored nodes and links) can be accessed [6] A. Biryukov, I. Pustogarov, F. Thill, and R.-P. Weinmann, “Content identiguy.i2p and popularity analysis of tor hidden services,” in 2014 IEEE 34th in three or less hops. It is also remarkable International Conference on Systems Workshops the existence of pairs of nodes separated from the rest. (ICDCSW). IEEE, 2014, pp. 188–193. [7] M. W. Al Nabki, E. Fidalgo, E. Alegre, and I. de Paz, “Classifying VI.CONCLUSIONSANDFUTUREWORK illegal activities on TOR network based on web textual contents,” in Proceedings of the 15th Conference of the European Chapter of the Crawling the Surface Web has been widely addressed by Association for Computational Linguistics: Volume 1, Long Papers, the research community with different aims. However, the 2017, pp. 35–43. [8] A. Biryukov, I. Pustogarov, and R.-P. Weinmann, “Trawling for tor Deep Web part is still highly unknown despite its big interest hidden services: Detection, measurement, deanonymization,” in 2013 not only for researchers, but also for authorities and security IEEE Symposium on Security and Privacy. IEEE, 2013, pp. 80–94. out-degree Eepsite Status Source 385 identiguy.i2p FINISHED SEED 248 i2pwiki.i2p FINISHED SEED 152 inr.i2p FINISHED SEED 70 s6lagaqbn572fvnr7vsxsqwbwxb6m3gr5hu6eetstykdm2opp2ia.b32.i2p FINISHED FLOODFILL 54 zy37tq6ynucp3ufoyeegswqjaeofmj57cpm5ecd7nbanh2h6f2ja.b32.i2p FINISHED SEED 45 5ypxuqf2ufqdsf3ejv5xwrgwatjxf2uw7tyxz2av44pka4w3pvza.b32.i2p FINISHED FLOODFILL 31 fex6v4zccrovs7dixqbigbbqtrb7ylrmpgphwnwoyjutrg56qmoa.b32.i2p FINISHED FLOODFILL 28 i2pforum.i2p FINISHED DISCOVERED 26 stats.i2p FINISHED DISCOVERED 26 pwgma3snbsgkddxgb54mrxxkt3l4jzchrtp52vxmw7rbkjygylxq.b32.i2p FINISHED SEED 19 trac.i2p2.i2p FINISHED SEED 18 i2p-projekt.i2p FINISHED SEED 17 isxls447iuumsb35pq5r3di6xrxr2igugvshqwhi5hj5gvhwvqba.b32.i2p FINISHED SEED 17 4bpcp4fmvyr46vb4kqjvtxlst6puz4r3dld24umooiy5mesxzspa.b32.i2p FINISHED SEED 15 uda2rkhskjdb4w7xiftz3btfpl7bhxsy5gwpiiiongte4gulbuza.b32.i2p FINISHED FLOODFILL Table V EEPSITESWITHLARGESTOUT-DEGREE VALUES.

in-degree Eepsite Status Source 58 forum.i2p DISCARDED SEED 55 ugha.i2p DISCARDED SEED 52 www.i2p2.i2p DISCARDED SEED 49 i2p-projekt.i2p FINISHED SEED 47 no.i2p DISCARDED SEED 46 inproxy.tino.i2p DISCARDED SEED 45 i2host.i2p DISCARDED SEED 45 perv.i2p DISCARDED SEED 45 tino.i2p DISCARDED SEED 41 i2pwiki.i2p FINISHED SEED 21 sperrbezirk.i2p DISCARDED SEED 17 diftracker.i2p FINISHED SEED 13 visibility.i2p DISCARDED SEED 12 vstr4d.i2p DISCARDED SEED 11 www.imule.i2p DISCARDED SEED 11 bote.i2p DISCARDED SEED 11 bkillyourtv.i2p DISCARDED SEED Table VI EEPSITESWITHHIGHESTIN-DEGREE VALUES.

[9] G. Kadianakis and K. Loesing, “Extrapolating network totals from and empirical analysis for I2P eepsites,” in 2017 IEEE Symposium on hidden-service statistics,” , Tech. Rep. 2015-01-001, Computers and Communications (ISCC). IEEE, 2017, pp. 444–449. 2015. [Online]. Available: https://research.torproject.org/techreports/ [18] E. Spertus, “Parasite: Mining structural information on the web,” extrapolating-hidserv-stats-2015-01-31.pdf Computer Networks and ISDN Systems, vol. 29, no. 8-13, pp. 1205– [10] A. Montieri, D. Ciuonzo, G. Bovenzi, V. Persico, and A. Pescape,´ 1215, 1997. “A Dive into the Dark Web: Hierarchical Traffic Classification of [19] S. J. Carriere` and R. Kazman, “Webquery: Searching and visualizing Anonymity Tools,” IEEE Transactions on Network Science and En- the web through connectivity,” Computer Networks and ISDN Systems, gineering, vol. 7, no. 3, pp. 1043–1054, 2020. vol. 29, no. 8-13, pp. 1257–1267, 1997. [11] A. Montieri, D. Ciuonzo, G. Aceto, and A. Pescape,´ “Anonymity [20] S. Lawrence and C. L. Giles, “Accessibility of information on the web,” Services Tor, I2P, JonDonym: Classifying in the Dark (Web),” IEEE Nature, vol. 400, no. 6740, p. 107, 1999. Transactions on Dependable and Secure Computing, vol. 17, no. 3, pp. [21] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web 662–675, 2020. ,” Computer networks and ISDN systems, vol. 30, no. 1-7, [12] Z. Cai, B. Jiang, Z. Lu, J. Liu, and P. Ma, “isAnon: Flow-Based pp. 107–117, 1998. Anonymity Network Traffic Identification Using Extreme Gradient [22] K. Bharat and M. R. Henzinger, “Improved algorithms for topic distil- Boosting,” in 2019 International Joint Conference on Neural Networks lation in a hyperlinked environment,” in Proceedings of the 21st annual (IJCNN), 2019, pp. 1–8. international ACM SIGIR conference on Research and development [13] H. Yin and Y. He, “I2P Anonymous Traffic Detection and Identifica- in information retrieval, ser. SIGIR ’98. New York, NY, USA: tion,” in 2019 5th International Conference on Advanced Computing Association for Computing Machinery, 1998, pp. 104–111. Communication Systems (ICACCS), 2019, pp. 157–162. [23] J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. [14] K. Shahbar and A. N. Zincir-Heywood, “How far can we push flow Tomkins, “The web as a graph: measurements, models, and methods,” analysis to identify encrypted anonymity network traffic?” in NOMS in International Computing and Combinatorics Conference. Springer, 2018 - 2018 IEEE/IFIP Network Operations and Management Sympo- 1999, pp. 1–17. sium, 2018, pp. 1–6. [24] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, [15] G. Avarikioti, R. Brunner, A. Kiayias, R. Wattenhofer, and D. Zin- R. Stata, A. Tomkins, and J. Wiener, “Graph structure in the web,” dros, “Structure and content of the visible darknet,” arXiv preprint Computer networks, vol. 33, no. 1-6, pp. 309–320, 2000. arXiv:1811.01348, 2018. [25] R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer, “Graph structure in [16] I. Sanchez-Rola, D. Balzarotti, and I. Santos, “The onions have eyes: A the web—revisited: a trick of the heavy tail,” in Proceedings of the comprehensive structure and privacy analysis of tor hidden services,” in 23rd international conference on World Wide Web. ACM, 2014, pp. Proceedings of the 26th International Conference on World Wide Web. 427–432. International World Wide Web Conferences Steering Committee, 2017, [26] R. Baeza-Yates and C. Castillo, “Relating web characteristics with link pp. 1251–1260. based web page ranking,” in Proceedings Eighth Symposium on String [17] Y. Gao, Q. Tan, J. Shi, X. Wang, and M. Chen, “Large-scale discovery Processing and Information Retrieval. IEEE, 2001, pp. 21–32. s5b7l3hzs3ea535vqc5q...

evilchat.i2p

www.itphx.i2p

wdoh7rf7i2ze7xqwocxt...

cv72xsje5ihg6e24atit...

radio.r4sas.i2p

privacy.i2p qgv64klyy4tgk4ranazn...

tmlxlzag7lmkgwf6g2ms... fe6pk5sibhkr64veqxkf...

ov27guhkdyhgdkqcsi7i... hjmvfhdarw... n6uw4n5lhvi25qmybh... hackerculture.i2p ah4r4vzunzfa67atljlb... rnmosuwvtftfcrk5sk7z... arc2.i2p q3x6msrirtu5gy5vilpk...

update.dg.i2p outproxy-tor.meeh.i2p zh2qyqgeuzj5llclb2h6... xxu3lso4h2rh6wmrxiou...jikx.i2p anoncoin.i2p privacytools.i2p chen.i2p susi.i2p 3fjk4nfk3mccch4hdreg... lolicatgirls.i2p a4lzmjyba7aq7hl6okqp...pjhlh4h7zn3slglx46ff... bm2fib3tumer72lopjh4... li5kue3hfaqhhvaoxiw2...opal.i2p rs-tor.i2p jrandom.dev.i2p 3gvn4kwd7z74jxc2sn4u... cerapadus.i2p hiddenanswers.i2p ht4cn7mltt3bn7fm5uvb... onepost.i2p oitoja234mwfbeuhtuib... kj2kbzt27naifij4ki6b... rduua7bhest6rwsmmytt... shoieq3.i2p lawiki.i2p aktie.i2p mystery.i2p andmp.i2pdkkl3ab6iovxfqnp44ws... webviewer.aktie.i2p dkb5f63obsb6wmzcgile...

tracker.psi.i2p nine.i2p www.i2pcn6t5izb84928fjizjfi... lcjxhuo5lbrol3e5ijj3...openarea.i2p y7hb4kzomfzzhekib2mb...open4you.i2p

4qingryacppi5ed2vk26... ccea4jybmr3xbfiykaoa... 2pei6r2naufy54vby3iq...dnca-anarplex.i2pecc-anarplex.i2p temporarypluginlist.... mqllejokgwjpp5kc2bl5... yhs7tsjeznxdenmdho5g...exchanged.i2p ylkch2nkrwehakx4z6wi... anarplex.i2p 4qwhpshwlp6ndzvtiwvf...tyw3jaszslpsngs7tbrz... ygzrfsuuytq6chx26qtd... secure.thetinhat.i2p io.i2p

o5zdq6iuire5eatwehhh... rv6zugykqdhmwwsuglv7...ux6prousphswf56bym7y... 7uea4hzuqwbow5c6hm3g...l6ly7szjqvbffz7hacvj... nmju26n2m7tyqbuis6gh... www.i2ch.i2p wue3ycaccf4texikza3f...pl4pccq65gaee64zgo2s... pinkpaste.i2p j5l4jf4ddovwfr4geyom... wfewiqkdt5bvowfp3igm... sperrbezirk.i2p bf6ixuhfjftzk5hhag3e...aazr55itvyns4lwppvx5... bmtuitygmdsggzwrjiit... example.i2p i2p-bt.postman.i2p almmkofqaprptb7maojy... xrdc2qc7quhumhglpbcu...azkhqh2yvg7twsrhfqaq...tino.i2p zv2etzurnqjuuajmeb5g... rs-freenet.i2p update.killyourtv.i2p 7jq3ww5ex5f7esvixhez...bcdxkhp5tq2kw4kzumn4... gvofjbuvkl65f2npmyyi... wwjh534xtxaw6ofyklm6...ejokc34fr3kcprasmdvv...q36ixmjw7niomtxvsptv... http.i2p tjtcervddo6xi2krj76k...inproxy.tino.i2p zroed2cxga5zeuu6rcvm... .i2p kjff4hohxnrwwozi4rzu... gcqohmcgbq3naq5fxen6... paste.crypthost.i2p bluepaper.i2p lnmwcbqedc5s4xfyrayp...oartgetziabrdemxctow...df2kxa5jvbfm36bfuw2g... baltijateam.i2psmeghead.i2ppostman.i2p hq.postman.i2p avl2n23ukdvucdk5upvr...i2host.i2p no.i2p irc.devfs.i2p paste.i2p2.i2pdocs.i2p2.i2p mv32lpcx5gpsioobdbo4... perv.i2p hcogmvou7ogzae37h4l5...2xa7va6jsuyluoy4ocdp... fomjl7cori4juycw55kd...jhpvnnhfrozore2wlhi5... ugha.i2p deic2twnqvpeksyrhecz...j4layma3akpqdhjygbvu... o5p4gocoulbw2rg2uadu... -escrow.i2p nntp-overchan.i2p fnibopri3v7ou4n4t2fl... cgfnyxv62msfynsfbv3k... man.doc.i2p www.i2p2.i2p ui6vb2oubjxeljsyfbp6... xosberjz2geveh5dcstz... gallery.i2p trac.i2p2.i2p 2y7fgouz2sirzg62mj3k... wvup5klmiv5mam55jmwo... 2ps7wt3v67ni2d535lwa... q6jlwyjwgaqnhj6dohzj...rh4gqqzqkcfkqrof5sou...k4icuorgtmobxekbtfhp...ghlra7mm6fjqntmadgiv...riz5ztjyumwat5mmdwes... r6m7cy2ifxea54porru4...pds246heu42hlydlihdm...w4lgoexstg4hkunnhr74...iqwixx4vukgsjjdcjr7i... vmow3h54yljn7zvzbqep... ezmtfe5smkcw6hmytbaw...xdffxnxtjd6ji2zig3cg... d5rw2gplqr5tb4e2b3e2...frrl6jltlsxh5hsyqmhz... duck.i2p 6d2tru665udiixwb7miy...www..i2p forum.i2ptx5erhwh4inxu7zepn36...wfvgyyp3lbztwkmc75l7...qa5qxgica62szhwbuhuv... dxzos3skva6irjef2tet... freenet3.i2p pqiur6srlfszrl342x5q... ecduxoion5uc5hnvzjxf... nntp-paste.i2parchbsdmirror.i2p ipredia.i2p pastethis.i2p larvalstage.i2p xvghkss2pomhxxk4ch7p...666xzh44njtf3jnan6q4...62bm7v3hshk2qmgn7fnb... thadsabsi3fujk3e34gx... fx5dz54unkvktwmccrcu... classica.i2p i2p2tor.i2pflz.i2pmempo.i2p plugins.i2p .i2p dev.i2p openmusic.i2pihave2p.i2p rusleaks.i2p onh62ijqpgq3pr6wuzhb... anarchistfaq.i2p projects.i2p eepsites.i2p tpb.i2p shadowlife.i2ppartager.i2p epsilon.i2p who.i2p www.allofthis.i2pbtdigg.i2p ihave2paste.i2parchlinux.i2p codevoid.i2p 657xcllunctawyjtar5k... qtbwkqmdjdbb4owr3q4o... orion.i2p eepcast.i2pi2pfind.i2p thepiratebay.i2p stats.i2p direct.i2p rkpctabyjlyecpjdqqup... ofs.i2p darkreboot.i2pelgoog.i2p wqjsmjiwdcpx7dtm6rsz... syndie.welterde.i2p py-i2phosts.i2p afvtspvugtd32rsalxir... sigterm.i2p s6lagaqbn572fvnr7vsx... echelon.i2p qa4boq364ndjdgow4kad... hush.i2p itoopie.i2p bob.i2p lenta.i2p i2pwiki.i2p kxcyhuzgp2wlxz4sczyz... i2pbote.i2p darknet-.i2p privacyhawk.i2p i2p-projekt.i2p sponge.i2p linkz.i2p i4j37hcmfssokfb6w3np... n0pe.i2p u4s3jneepk3akoez46kq... lists.i2p2.i2p www.imule.i2p jwebcache.i2p dcherukhin.i2p shinobiwan.i2p abzmusjcm3p3llj4z7b5... jisko.i2p chitanka.i2p bxgqwfsnr3bgnr6adn62...icu812.i2p wiki.meeh.i2p y6d4fs3rpqrctuv77ltf... deadman.i2psatori-wiki.i2p q.i2p i2pfox.i2p zy37tq6ynucp3ufoyeeg... i2pchat.i2p i2jump.i2p syndie.darrob.i2parchive1.i2p jct3gx5wv2etpfhivy5x...halp.i2p killyourtv.i2p syndie.meeh.i2p yellowpages.i2pzephyr.i2pwrrwzdgsppwl2g2bdohh... salt.i2psyndie.inscrutable.i2psyndie.killyourtv.i2p archive2.i2p quotes.i2p tgtphg4wsipkb4r2hdze... archive3.i2p xmpp.volatile.i2p i2pmonkey.i2p archiv.tutorials.i2p det.i2pbhjhc3lsdqzoyhxwzyrd... q6rve733tvhgyys57jfw...mail.volatile.i2p inscrutable.i2p tracker.welterde.i2peye.i2p volatile.i2p reseed.i2p thetinhat.i2p qiii4iqrj3fwv4ucaji2...dn3tvalnjz432qkqsvpf... bigbrother.i2p whnxvjwjhzsske5yevyo... zab.i2p cbobsax3rhoyjbk7ii2n... bkf2zg3x5grgr2l4wjcv... muwire.i2p lawiki2p.i2p zzz.i2p xl33tvill3.i2p skank.i2p epi74hioo3hhffw3tpl6... syndie.echelon.i2p z3xmunpknnetfm7okrjy... forums.i2p thornworld.i2p repo.i2p 7etudb75nhe6vxslitpt...id3nt.i2p zrxzv2vggrnzjzvzi4md... str4d.i2p mabdiml4busx53hjh4el... que23xpe7o3lzq6auv6s... giqnkltyugfmsb4ot5yw... 3xqs2da7pnsqzmxqudsi... irongeeks.i2p aplus.i2ptx22i6crnorzuti3x6va... r5vbv2akbp6txy5amkft... co4edoxfpapksavjpbkd... git.repo.i2p tahoe.fatsvin.i2p visibility.i2p 3mvo5eifcwplcsoubtvq... redliberty.i2p bobthebuilder.i2p planet.i2p zerobin.i2pfzbdltgsg7jrpz7gmjfv... eepsitelist.i2p traveller.i2p fsec.i2p theanondog.i2p37v2wqt3x2d2s54ezo54... vev6dqg7kcdo7dslanvd... jf2g5hus6gp2jfgk7zgc...simplestore.i2p 5ypxuqf2ufqdsf3ejv5x... christianworks.i2p freedominthedeepweb.... halcyon.i2p v2stz6bsot7sbjzix5tk...fifi4all.i2p animes.i2p sglg.i2p bote.i2p tracker2.postman.i2p zerofiles.i2p diftracker.i2p torrfreedom.i2p pwgma3snbsgkddxgb54m... uxe3lqueuuyklel23sf5...torrent.repo.i2ptor.halp.i2p ginbb7blr6rvgfkuyx74... ebovdlxw47kodamth7p3... nfrjvknwcw47itotkzmk... code.halp.i2p utnamrwujeccu7aj7aau... i2p.halp.i2p tinyslibrary.i2p 3jvbwc7sy4lnhj25nj7y... bvpy6xf6ivyws6mshhqm... anongw.i2p5m3pd32zx43xk3uz6hvr... freedomforum.i2p marxists.i2p mpc73okj7wq2xl6clofl... tracker.thebland.i2pseeker.i2p fa.i2p red.i2p oiviukgwapzxsrwxsouc... omax2s5n4mzvymidpuxp... fs.i2p owrnciwubb3f3dctvlmn... gbh4eerxdsph7etxsxzn... sqz.i2p torrentfinder.i2p fbfvkq35z52cansvucog...iq26r2ls2qlkhbn62cvg... i2p_address.i2plepah55qyp2fhuwxlz7b... tahoeserve.i2p prmbv3xm63ksoetnhbzq...stream.i2p mosfet.i2p bikpeyxci4zuyy36eau5... qktkxwawgixrm5lzofnj... ehkjj4ptsagxlo27wpv4... synar.i2p opendiftracker.i2p tome.i2p theyosh.i2p s5ynkgagndmpxpf2kmne... redzara.i2p 5m7ygxhcdyfa3kx3wfjb... anodex.i2p wa11ed.city.i2p xotc.i2p go.i2p bioq5jbcnfopqwvk7qss... k1773r.i2p vathk2pyvaskeie63yyg...pharos.i2p libertor.i2p rutor.i2p projectmayhem2012-08... i2pbuggenie.i2p papel.i2p 6n6p3aj6xqhevfojj36d... hiddenchan.i2p 7mxwtmala3ycg2sybjww...city.i2p pasta-nojs.i2p in.i2p gqgvzum3xdgtaahkjfw3... nacl.i2p tracker.crypthost.i2puda2rkhskjdb4w7xiftz... tutorials.i2p i2peek-a-boo.i2p garden.i2p q7r32j7lc3xgrcw2ym33... uajd4nctepxpac4c4bdy... 2fwlpuouwxlk2nj4xklv... i2push.i2p qkk2dqx6nocycgt3vins... ipll7sit24oyhnwawpvo... i2pforum.i2p scussuni.i2p deb-mirror.i2p nvolgdj333esokdlf6z4...5iov5rqup5ji6os3zpxv... ll6q4lsirhwkln4dqxwq... lvxboy6ni52i7dwe4a54... git.psi.i2p noxan.i2p jz4quyw7zt63tmw65jfp... zma5du344hy2ip5xcu6x...em763732l4b7b7zhaolc...25cb5kixhxm6i6c6wequ... statistic.i2p ipfs-gw.i2p psi.i2p 1st.i2p p4swjett52gpafbbdkxw...ynkz7ebfkllljitiodcq... irc.i2p z7kw4nhoorv47jewfpvo... pyqjnycm55cuxow22voq... cw7ge5ey4ekp5iep2kaw...es.hiddenanswers.i2p l3ohmm4ccxvyuxuajead... jnpykdpp46zenz4p64eb... keyserver.sigterm.i2p invisible-internet.i2p 6ox7c4vu2dwezfxbtlv6... nnkikjorplul4dlytwfo... trlr7pylub7xl7lolfv3...ipfs.kycklingar.i2p gctswdhp4447yibxfbqg...dszr2z2swmeczeao2bqg...g7c54d4b7yva4ktpbaab... vsyjsbbf2pyggtilpqwq...i.i2p 5vrpjx4xika3gtdoazc5...lib.i2p zmw2cyw2vj7f6obx3msm... legwork.i2p kycklingar.i2p cuss2sgthm5wfipnnztr... ud3gcmvysjch4lbjr2kh... lifebox.i2p sbnoqznp5yzxals3vs6n... pool.gostcoin.i2p darkrealm.i2p 4gzcllfxktrqzv3uys5k... 0xcc.i2pirc.00.i2p edvccoobtjukjgw2os5e... deepwebradio.i2p diasporg.i2p

explorer.gostcoin.i2pvodkv54jaetjw7q2t2ie... chess.i2p wallet.gostcoin.i2p rus.i2p 6u2rfuq3cyeb7ytjzjxg...ilcosmista.i2p 102chan.i2p ptt.i2p lg.i2p

flibusta.i2p

fbdd4syj2v52x5zord67... gstbtc.i2p dreadpiratessociety.... hiddenbooru.i2p ctvfe2fimcsdfxmzmd42... wiki.ilita.i2p lainchan.i2p 4bpcp4fmvyr46vb4kqjv... fex6v4zccrovs7dixqbi... nvspc.i2p fsoc.i2p i2pd.i2p isxls447iuumsb35pq5r... black.i2p gostcoin.i2ponelon.i2p check.kovri.i2p clfi6ogwn6v72h5ghq3b.lnd.i2 7ko27dxvicr2sezvykkr...dead.i2p imule.i2p .i2p knijka.i2p hata.i2pmeshforum.i2p fantasy-worlds.i2panonyradio.i2p web.telegram.i2p false.i2p magix.i2p 333.i2p obmen.i2p i2podisy.i2prutracker.i2p pravtor.i2p dropbox.i2p infosecurity.i2p inclib.i2p deepdankmemes.i2p ebooks.i2p irkvgdnlc6tidoqomre4... isitup.i2pdosje.i2pnews-ru.i2p 5vik2232yfwyltuwzq7h...pizdabol.i2p hiddengate.i2p s3elzoj3wo6v6wqu5ehd... 2ch.i2p r4sas.i2p isotoxin.i2pblog.tinlans.i2p thebland.i2p co.i2p i2pmetrics.i2pforum-ru.i2p dumpteam.i2p statone.i2p 65tnauyus3odr47c7tya... tracker.lodikon.i2p u5safmawcxj5vlrdtqrs... i522xmu63hfbaw2k54ct...v3gkh5kqzawn2l3uzhw6...

ou7g64d5in4sugv5fgmm... irc.r4sas.i2p ca.i2pd.i2p libertas.i2p

x5oovnhnuli5fnwtgkbd... sqpxarqcgkozjywghang... lm.i2phodhusp73gltozgrnian... m16.i2p jd3agbakybnhfvkeoxrx... ucsr3eveuc4mx5y6gxno... exch.i2p e2t6rqd2ysbvs53t5nna... dbpegthe42sx2yendpes...

2nait2gdeozkgf6gyhzj... live.m16.i2p false2.i2p

epub-eepsite.i2p retrobbs2.i2p rufurus.i2p def4.i2p xs6lr2g5jiaajtb3nkno... repo.r4sas.i2p auchan.i2p v65p4czypwxrn35zlrfk... j7xszhsjy7orrnbdys7y... o46lsq4u7udxg3qqlidr... mt45a2z3sb2iyy2mwauj... o6rmndvggfwnuvxwyq54... cepsrw27kdegwo7ihzou...rslight.i2p pypz7ca24n3lyp4tm3kv...bible4u.i2p 7gajvk4dnnob6wlkoo2z... ru.hiddenanswers.i2p novospice.i2p re6cgwg2yrkgaixlqvt5... inr.i2p bitag46q3465nylvzuik...nyccgjh5cjapsf7frrhx... def2.i2p retrobbs.i2p traditio.i2pj5fex634r2altzb3kjvu... obscuratus.i2p pannuba.i2p xbf3ots2purqun7orn72...def3.i2p 6vxz4yp3vhjwbkmxajj7...suzp44odgixf5lthy5ng... unqueued.i2p 6iqqxuxzgpebvzlncwdd...t65u5hjx3ahgzbi5cvoy... homosexualchan.i2p novabbs.i2p 2gafixvoztrndawkmhfx... t7o2tw36cffedgfr6kah... j5i2tfumh3ti5sdtafwz... identiguy.i2p bnb46culzbssz6aipcjk...6y4tltjdgqwfdcz6tqwc... e-reading.i2p

kepphem42whle3rkfv26... m7fqktjgtmsb3x7bvfrd... vaqc4jm2trq7lx2kkglv... uk.i2p vinz4ygmodxarocntyjl...hz2xhtpeo6btgiwi6od4... short.i2p normal.i2p ktoacmumifddtqdw6ewn...5iedafy32swqq4t2wcmj...3c2jzypzjpxuq2ncr3wn...reuvum7lgetglafn72ch... z5mt5rvnanlex6r3x3jn... ru.i2p ot.knotwork.i2p tipped8.i2p tink.i2p bee.i2p brown.proxynet.i2p psy.i2p knotwork.i2p yayponiesreloaded.i2p 7msryymfdta3ssyz34qu... ciphercraft.i2p blackbox.i2p .i2p syndie-project.i2p vwrl2qmcif722fdkn3ld... erik-the-history.i2p ir6hzi66izmvqx3usjl6...psyco.i2p 3z5k3anbbk32thinvwcy...pt.hiddenanswers.i2p aalttzlt3z25ktokesce... 00.i2p esuwiki.i2p di.i2p heisenberg.i2p gray.proxynet.i2p nsa.i2p cerise.proxynet.i2pbarry.i2p z55khrnlj6bzhs5zielu... deploy.i2p boerse.i2p rideronthestorm.i2p vudu.i2p alice.i2p exchange.gostcoin.i2p 56p5qxsrvo5ereibevet...rpi.i2p bible.i2p keys.echelon.i2pi2p-epub-eepsite.i2p orange.proxynet.i2p eboochka.i2p kk566cv37hivbjafiij5... 6zlio2ipbnmir26nrspd... popki.i2p bugfuzz.i2p eddysblog.i2p me.i2p mauve.proxynet.i2psuicidal.i2pgit-ssh.crypthost.i2p forum.rus.i2p relatelist.i2p cgan.i2pbnc.i2p ujyqbawolalwvdpy33v2... darrob.i2p r233yskmowqe4od4he4b... znc.str4d.i2p metrics.i2ptnxiifs6uticzyg6ac4l... zam7u6vslhemddz347uu... xmpp.crypthost.i2p sto-man.i2pcosteira.i2p 0ipfs.i2parmada.i2pblue.proxynet.i2p ivorytower.i2p k2r2wda4eavt4hoq5hpt... 0pl.i2p anlncoi2fzbsadbujidq... proxynet.i2pjrrecology.i2p xn--l2bl5aw.i2p xc.i2p lithium.i2p xmpp.rpi.i2p freefallheavens.i2p z54dnry6rxtmzcg7e6y3...rebel.i2p mudgaard.i2p crypto.i2p investigaciones.i2p anonsfw.i2p archaicbinarybbs.i2p tabak.i2p ukqap24nwac4gns77s4z... pink.proxynet.i2p cbs.i2p animal.i2p rospravosudie.i2p amber.proxynet.i2p dashninja.i2p bofh.i2p omt56v4jxa4hurbwk44v... green.proxynet.i2pbitlox.i2p tr7ffmkmr6fcope2pix2... m4f4k3eeaj7otbc254cc... ruslibgen.i2p jade.proxynet.i2p fq7xhxkdcauhwn4loufc... redpanda.i2p manas.i2p zener.i2p ardor-wallet.i2p git.crypthost.i2p cases.i2p aport.i2p arf.i2p xk6ypey2az23vtdkitjx...62a4xcyyhvfrcq2bkckb... schwarzwald.i2prevo-ua.i2p www.infoserver.i2p eoilbrgyaiikxzdtmk2z... 2channel.i2p amazingmisc.i2p ymzx5zgt6qzdg6nhxnec... djpn5cbcgmpumwcriuzq...zd37rfivydhkiyvau27q...gh6655arkncnbrzq5tmq... statoneu43konjhym7yp... fido.r4sas.i2pn4xen5sohufgjhv327ex... darknetnow.i2p wkpjjloylf6jopu2itgp... utrer5zgnou72hs4eztm... ato242wckzs4eaawlr5m... gd7qe2pu2jwqabz4zcf3... mqamk4cfykdvhw5kjez2... docs.i2p pop.postman.i2p j2xorlcb3qxubnthzqu7... bwxry5alzx5ihgrd3gla...easygpg2.i2p lldr2miowq6353fxy44p... mattermost.i2p tracker.icu812.i2p knjkodsakcxihwk5w5ne... q2a7tqlyddbyhxhtuia4...5wiyndm44bysev22kxvc... o5jlxbbnx3byzgmihqye...oniichan.i2p vydbychnep3mzkzhg43p... jpv23ou4splo2yd7pqj2... publicwww.i2p instantexchange.i2p ilita.i2p hagen.i2p cathugger.i2p znc.i2p ol.i2p o35ut7hgywy35okvgkjk...itemname.i2p vq43xjjcnejqpzfprws5... gzrjm62ektpqjfsem3r3... vadino.i2p heligoland.i2p seomon.i2p ir2ky5ejx4f646l4fsnu... 5mvpsy4h45w4fx7upen7... falafel.i2p yxvzjwd4vin6pnjauekd... project-future.i2p uw2yt6njjl676fupd72h... newanarchistfaq.i2p vthumgmniwvac2wcrjui... kmpmk2fmineaiwublteq... w.i2p 7633w56hd53sesr6b532... infoserver.i2pyeyar743vuwmm6fpgf3x... mosbot.i2p pomoyka.i2p anotoi.i2p u3f67staiwhqxpacya3c...lodikon.i2p uvpczf5ljaptviombwxi... ts.i2p nebcjgfx3f7q4wzihqmg... serien.i2p 5bhmrp43mjwlzf4x64xg... overchan.oniichan.i2p h77hk3pr622mx5c6qmyb... mwfpkdmjur5ytq4og36y... nch2arl45crkyk6bklyk... fpwrfvidfexsz7dspofk... i2pjump.i2p sjwueu62qpe6dtv5b322... y5o2vwb6kart7ivpnbpk... reg.rus.i2p gtigajpl3dg7ux3hngim... u6pciacxnpbsq7nwc3tg...tumbach.i2p ransack.i2p 00-klartext.i2p ivqynpfwxzl746gxf376... isoxvnflrdn7cm76yjlf...

auvuinzogu6gc4pwsgbj...

5hkhjehj3ct2pvcah7dc...

neodome.i2p

git.volatile.i2p gwqdodo2stgwgwusekxp... paste.r4sas.i2p csen43keji3qiw6uobsg...

kbhfkzx5jeqhfgss4xix...

ginnegappen.i2p

Figure 14. Overall view of the I2P eepsites with at least one incoming or outgoing link.

[27] M. R. Henzinger, “Hyperlink analysis for the web,” IEEE Internet and N. Debnath, “TOR vs I2P: A comparative study,” in 2016 IEEE computing, vol. 5, no. 1, pp. 45–50, 2001. International Conference on Industrial Technology (ICIT). IEEE, 2016, [28] C. H. Ding, H. Zha, X. He, P. Husbands, and H. D. Simon, “Link pp. 1748–1751. analysis: hubs and authorities on the world wide web,” SIAM review, [35] J. P. Timpanaro, I. Chrisment, and O. Festor, “A bird’s eye view on the vol. 46, no. 2, pp. 256–268, 2004. I2P anonymous file-sharing environment,” in International Conference [29] C. Iliou, G. Kalpakis, T. Tsikrika, S. Vrochidis, and I. Kompatsiaris, on Network and System Security. Springer, 2012, pp. 135–148. “Hybrid focused crawling for homemade explosives discovery on [36] K. Shahbar and A. N. Zincir-Heywood, “Effects of shared bandwidth surface and dark web,” in 2016 11th International Conference on on anonymity of the I2P network users,” in 2017 IEEE Security and Availability, Reliability and Security (ARES). IEEE, 2016, pp. 229–234. Privacy Workshops (SPW). IEEE, 2017, pp. 235–240. [30] T. Sochor, “Fuzzy control of configuration of web anonymization using [37] M. Wilson and B. Bazli, “Forensic analysis of I2P activities,” in 2016 TOR,” in 2013 The International Conference on Technological Ad- 22nd International Conference on Automation and Computing (ICAC). vances in Electrical, Electronics and Computer Engineering (TAEECE). IEEE, 2016, pp. 529–534. IEEE, 2013, pp. 115–120. [38] J. P. Timpanaro, T. Cholez, I. Chrisment, and O. Festor, “Evaluation [31] R. A. Rodr´ıguez-Gomez,´ G. Macia-Fern´ andez,´ and A. Casares-Andres,´ of the anonymous I2P network’s design choices against performance “On understanding the existence of a deep torrent,” IEEE Communica- and security,” in 2015 International Conference on Information Systems tions Magazine, vol. 55, no. 7, pp. 64–69, 2017. Security and Privacy (ICISSP). IEEE, 2015, pp. 1–10. [32] M. W. Al-Nabki, E. Fidalgo Fernandez,´ E. Alegre Gutierrez,´ [39] P. Liu, L. Wang, Q. Tan, Q. Li, X. Wang, and J. Shi, “Empirical V. Gonzalez´ Castro et al., “Detecting emerging products in TOR measurement and analysis of I2P routers,” Journal of Networks, vol. 9, network based on k-shell graph decomposition,” in Proceeding of the no. 9, p. 2269, 2014. 3th Spanish Conference on Cybersecurity Research, 2017, pp. 24–30. [40] J. P. Timpanaro, I. Chrisment, and O. Festor, “Monitoring the I2P [33] E. Erdin, C. Zachor, and M. H. Gunes, “How to find hidden users: network,” [Research Report] RR-7844, INRIA, p. 16, 2011. A survey of attacks on anonymity networks,” IEEE Communications [41] L. Liu, H. Zhang, J. Shi, X. Yu, and H. Xu, “I2P anonymous communi- Surveys & Tutorials, vol. 17, no. 4, pp. 2296–2316, 2015. cation network measurement and analysis,” in International Conference [34] A. Ali, M. Khan, M. Saddique, U. Pirzada, M. Zohaib, I. Ahmad, on Smart Computing and Communication. Springer, 2019, pp. 105– 115. [42] TOR, “TOR metrics project,” https://metrics.torproject.org, 2020, [On- line; Accessed 12-24-2020]. [43] I2P, “I2p official site,” https://geti2p.net/, 2020, [Online; Accessed 12- 11-2019]. [44] D. Echeverri Montoya, Deep Web: TOR, FreeNET & I2P. Privacidad y Anonimato. 0xWord, 2016. [45] J. Nurmi, “ crawler official site,” https://github.com/ahmia/ ahmia-crawler, 2015, [Online; Accessed 12-24-2020]. [46] R. Magan-Carri´ on,´ A. Abellan-Galera,´ and G. Macia-Fern´ andez,´ “C4i2p official repository code,” https://github.com/nesg-ugr/I2P Crawler, 2019, [Online; accessed 12-24-2020]. [47] M. T. Khan, J. DeBlasio, G. M. Voelker, A. C. Snoeren, C. Kanich, and N. Vallina-Rodriguez, “An Empirical Analysis of the Commercial VPN Ecosystem,” in Proceedings of the Internet Measurement Conference 2018, ser. IMC ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 443–456.